“data wants to be free”

6 November 2011
…I said that once to a friend from grad school when he asked why a project* I worked on was uploading anonymized data we had collected and managed for over a decade for others to analyze. I was brought up in the tradition of sharing your data with others. There has been a lot of focus on the hassles of doing so (anonymizing, cleaning, getting “scooped”, etc.).

But there are a number of reasons to do it. First, only by sharing the data do you allow others to improve upon your ideas (and, hopefully, selfishly, to cite your work). In fact, one study showed that sharing detailed research data is associated with an increased citation rate.

The principle of sharing your data also strikes me as a way to signal that your findings are honest. A professor of mine demonstrated to us in an advanced methods course how difficult it is to replicate findings if an author doesn’t think about potential replication when submitting a piece for publication. I decided from that point forward that I would always submit a final paper only after drafting an intelligible do-file and paring down a data file that could be uploaded online for someone else to replicate. A new study in PLoS ONE finds that the willingness to share data is related to the strength of evidence and the quality of reporting results. From the introduction:

…The unwillingness to share data of published research has been documented in a number of fields and is often ascribed in part to the fear among authors that independent reanalysis will expose statistical or analytical errors in their work and will produce conclusions that differ from theirs…

Here we study whether researchers’ willingness to share data for reanalysis is associated with the strength of the evidence (defined as the statistical evidence against the null hypothesis of no effect) and the quality of the reporting of statistical results (defined in terms of the prevalence of inconsistencies in reported statistical results)…

The authors find:

In this sample of psychology papers, the authors’ reluctance to share data was associated with more errors in reporting of statistical results and with relatively weaker evidence (against the null hypothesis). The documented errors are arguably the tip of the iceberg of potential errors and biases in statistical analyses and the reporting of statistical results.

HT Andrew Gelman.

_________________________________

"Mchinji Hall", or the table where we worked on the diaries so that they were eventually set free.

*If you listen to This American Life, you might have heard about the project — it was the “gossip” diaries from Malawi.
