What is best research practice in terms of reproducibility? At the recent workshop in As (Norway), I had a discussion with Marc-Oliver Gewaltig, similar to discussions I had earlier with some other colleagues as well. So I decided to put it up here. All feedback is welcome!
The discussion boils down to the following question: in terms of reproducibility, is it better to make code and data available online, so that users can repeat your experiments (or simulations, as Marc-Oliver would call them) and obtain the same results, or to describe your theory (model, in Marc-Oliver’s terminology) in sufficient detail that people can verify your results by re-implementing your experiments and checking that they get the same outcome?
I personally believe both approaches have their pros and cons. With the first one, a reader can download the related code and data, and very easily verify that they obtain the same results as presented in the paper. If they want to analyze things further, there is already a first implementation available to start from, or to test on other data. However, that certainly doesn’t take away the need for a good and clear description in the paper!
With the second approach, one avoids the risk that a bug in the code behind the results goes unnoticed, which can easily happen when the reader just “double-clicks” to repeat the experiment and gets the same (buggy) output. The second approach allows a thorough verification of the presented concept/theory, as the reader independently re-implements the work and checks the results. I believe certain standardization bodies like MPEG use this approach to make sure that descriptions are sufficiently precise.
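To make the second approach a bit more concrete, here is a minimal sketch in Python of what such a check could look like. Everything in it is hypothetical: `published_mean_std` stands in for the code released with a paper, `reimplemented_mean_std` for an independent implementation written from the paper's description alone (here using Welford's online algorithm), and the data and tolerance are made up for illustration.

```python
import math


def published_mean_std(samples):
    """Hypothetical stand-in for the code released with a paper:
    two-pass sample mean and sample standard deviation."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((x - mean) ** 2 for x in samples) / (n - 1)
    return mean, math.sqrt(var)


def reimplemented_mean_std(samples):
    """Hypothetical independent re-implementation, written from the
    description only, using Welford's one-pass algorithm."""
    mean = 0.0
    m2 = 0.0  # running sum of squared deviations from the mean
    for k, x in enumerate(samples, start=1):
        delta = x - mean
        mean += delta / k
        m2 += delta * (x - mean)
    return mean, math.sqrt(m2 / (len(samples) - 1))


def results_agree(a, b, rel_tol=1e-9):
    """True if both result tuples match within a relative tolerance."""
    return len(a) == len(b) and all(
        math.isclose(x, y, rel_tol=rel_tol) for x, y in zip(a, b)
    )


# Made-up data, only for illustration.
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

published = published_mean_std(data)
reimplemented = reimplemented_mean_std(data)
print("results agree:", results_agree(published, reimplemented))
```

Re-running `published_mean_std` only tells you that the released code still produces the same numbers. Agreement between two independently written implementations is stronger evidence that the description in the paper is precise and that the result is not an artifact of one particular piece of code.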
Personally, I think the second approach is the better, more thorough one in an ideal world. Currently, I prefer the first one, because most people won’t go to the trouble of re-implementing things, and the first approach already gives those people something: more than just the paper, and a chance to get their hands dirty with it. And “more interested readers” may still re-implement, or start analyzing the code in detail.
Elsevier Executable Paper Grand Challenge
On two recent occasions, I heard about Elsevier’s Executable Paper contest. The intention was to show concepts for the next generation of publications. Or as Elsevier put it:
Executable Paper Grand Challenge is a contest created to improve the way scientific information is communicated and used.
It asks:
By now, the contest is over, and the winners have been announced:
First Prize: The Collage Authoring Environment by Nowakowski et al.
Second Prize: SHARE: a web portal for creating and sharing executable research papers by Van Gorp and Mazanek.
Third Prize: A Universal Identifier for Computational Results by Gavish and Donoho.
Congratulations to all! At the AMP Workshop where I am now, we were lucky to have a presentation about the work by Gavish and Donoho, which sounds very cool! I also know the work by Van Gorp and Mazanek, using virtual machines to allow others to reproduce results. Still need to look into the winner’s work…
If any of this sounds interesting to you, and I believe it should, please take a look at the Grand Challenge website, and also check out some of the other participants’ contributions!
Here at the workshop, we also had an interesting related presentation yesterday by James Quirk about all that can be done with a PDF. Quite impressive! For examples, see his Amrita work and webpage.