I was pointed to another recent article, published as a column in Nature:
Nick Barnes, Publish your computer code: it is good enough, Nature 467, 753 (2010), doi:10.1038/467753a.
It contains many very recognizable arguments for (and against) putting your code online! I enjoyed reading it, and I hope you will too.
It's been quite a while since my last post, so it is high time for a new one. This is definitely not due to a lack of interesting topics to blog about; it's more related to a continued lack of time…
Anyway, last week I was reading about the Open Source Software Competition at ACM Multimedia this year. I didn't know about it before, but it seems to be a yearly event. Some more information about it is available here. In this competition, a short paper describing the software has to be submitted together with the entire open source software package. The page also states clearly that reviewers will make a reasonable attempt at testing and running the software, and that submissions that don't run will be rejected. I think this is a great event, and I would like to congratulate the winners!
At the same time, I believe this raises a question: why do we need a separate competition for this, instead of applying the same conditions (being able to verify results) to regular conference or journal submissions? Is it too much to require both novelty (which I believe is checked less strictly in this competition) and (easy) reproducibility from paper submissions?
Published on August 29, 2010 in general.
It's a good idea to try to reproduce someone else's results. But when doing so, it's important to give credit to the original authors by citing them appropriately.
It's not a good idea, however, to copy someone else's article, change only the author list, and submit it to another conference or journal. I'm glad to see that IEEE takes appropriate measures when such a thing happens. And I am still amazed that it actually happened…
Published on August 15, 2010 in general.
I'm happy to announce a call for papers for a special issue of the EURASIP Journal on Advances in Signal Processing on reproducible research in signal processing. Any (novel) reproducible signal processing work qualifies. The submission deadline is November 1, 2010.

More information can be found in the call for papers.
Another data competition:
Machine Learning for Signal Processing (MLSP) TC Announces the Winners of the 6th Annual Data Analysis Competition
See here for more info.
Published on June 18, 2010 in general.
Earlier this week, I was at the Workshop in Computational Systems Biology: Models, Methods, Meaning at the Norwegian University of Life Sciences in Ås, Norway. I gave a talk there on reproducible research, and there were several other excellent talks on modeling and simulation, research methods, and more. I liked it a lot; it was a really excellent workshop! Thanks for organizing it, Hans Ekkehard!
As stated on the website, the workshop addressed the following topic, which to me described both its content and its spirit very well: "Modeling and simulation are essential tools in systems biology and many other branches of science. This workshop is an invitation to step back from the day-to-day struggle with our simulations and to reflect about the nature of modeling and its relation to simulation: How do modeling and simulation contribute to the development of knowledge? Is a simulation per se a valid scientific experiment?"
Both the speakers and the audience had very diverse backgrounds, ranging from physicists, chemists and engineers (like me) all the way to philosophers working in metaphysics. This often resulted in very enthusiastic and interesting discussions. It was also very interesting for me to see how neuroscientists struggle with issues similar to mine, and how they approach them. I learned a lot of new things, some of which will show up in separate posts on this blog in the future. Stay tuned!
I just read the following two articles/notes. While they are not entirely about reproducible research, I think they reflect well the worries that many researchers have about current "publish or perish" research practices. I'm not sure I agree with all of it, but they do make a number of good points.
D. Geman, Ten Reasons Why Conference Papers Should be Abolished, Johns Hopkins University, Nov. 2007.
Y. Ma, Warning Signs of Bogus Progress in Research in an Age of Rich Computation and Information, ECE, University of Illinois, Nov. 2007.
I just stumbled upon Neil Lawrence's page on reproducible research. It's a nice page; see also the large number of reproducible publications on his publications page. I think machine learning now has another nice reference set of reproducible publications!
It’s interesting to see how Jon Claerbout’s work inspired a large number of people all around the globe.
What is the best reproducible research?
What is the best research practice in terms of reproducibility? At the recent workshop in Ås (Norway), I had a discussion about this with Marc-Oliver Gewaltig, similar to discussions I had earlier with other colleagues. So I decided to put it up here. All feedback is welcome!
The discussion boils down to the following question: in terms of reproducibility, is it better to make code and data available online, so that users can repeat your experiments (or simulations, as Marc-Oliver would call them) and obtain the same results? Or is it better to describe your theory (your model, in Marc-Oliver's terminology) in enough detail that others can verify your results by re-implementing your experiments and checking that they obtain the same results?
I believe both approaches have their pros and cons. With the first, a reader can download the code and data and very easily verify that he/she obtains the same results as presented in the paper. If he/she wants to analyze things further, a first implementation is already available to start from, or to test on other data. However, that certainly doesn't remove the need for a good and clear description in the paper!
With the second approach, one avoids the risk that a bug in the code producing those results goes unnoticed because a reader can simply "double-click" to repeat the experiment. The second approach allows a thorough verification of the presented concept/theory, as the reader independently re-implements the work and checks the results. I believe certain standardization bodies, such as MPEG, use this approach to make sure that their descriptions are sufficiently precise.
Personally, I think the second approach is the better, more thorough one in an ideal world. For now, however, I prefer the first, because most people won't go to the trouble of re-implementing things, and the first approach at least gives those people something: something more than just the paper, which allows them to get their hands dirty. And "more interested readers" can still re-implement the work, or analyze the code in detail.