See Andrew Gelman’s post Suspiciously high correlations in brain imaging studies.
Archive for the 'Uncategorized' Category
Page 4 of 6
I just found out about BioMed Critical Commentary. Here’s an excerpt from the site’s philosophy statement.
The current system of scientific journals serves well certain constituencies: the advertisers, the journals themselves, and the authors. It is the underlying philosophy of BioMed Critical Commentary to serve the readers in preference to any other constituency.
In particular, this site could serve as a public forum for criticism that journals are not eager to publish. It could be a good place to discuss specific examples of irreproducible analyses.
Even when lab work and statistical analysis carried out perfectly, microarray experiment conclusions have a high probability of being incorrect for probabilistic reasons. Of course lab work and statistical analysis are not carried out perfectly. I went to a talk earlier this week that demonstrated reproducibility problems coming both from the wet lab and from the statistical analysis.
The talk presented a study that supposedly discovered genes that can distinguish those who will respond to a certain therapy from those who will not. On closer analysis, the paper actually demonstrated that is it possible to distinguish microarray experiments conducted on one day from experiments conducted another day. That is, batch effects from the lab were much larger than differences between patients who did and did not respond to therapy. I hear that this is typical unless gene expression levels vary dramatically between subgroups.
The talk also discussed problems with reproducing the statistical analysis. As is so often the case, data were mislabeled. In fact, 3/4 of the samples were mislabeled. Simply keeping up with indexes is the biggest barrier to reproducibility. It is shocking how often studies simply did not analyze the data they say they analyzed. This seems like a simple matter to get right; perhaps people give little attention to it precisely because it seems so simple.
So, three reasons to be skeptical of microarray experiment conclusions:
- High probability of false discovery
- Statistical reproducibility problems
- Physical reproducibility problems
Sergey Fomel just told me about a special session on reproducible research at the “Berlin 6 Open Access Conference” in Dusseldorf, Germany. Presentations from the conference are available online.
Sergey Fomel and Sünje Dallmeier-Tiessen gave presentations in geophysics. Patrick Vandewalle and Jelena Kovacevic gave presentations in signal processing. Mark Liberman, Kai von Fintel, and Steven Krauwer gave presentations related to language and technology.
Video of the presentations is available here.
Thomas Guest has a new blog post Books, blogs, comments and code samples discussing the challenges of writing a book that contains code samples, may be rendered to multiple devices as well as paper, etc. He points to a project by author Scott Meyers called Fastware that explores ways of meeting these challenges. I haven’t had time to explore Fastware yet, but it sounds like it is concerned with some of the same problems that come up in reproducible research.
My previous post discussed Keith Baggerly and his efforts as a “forensic bioinformatician.”
In that article, the reporter asks Keith to name the biggest problem he sees in trying to reproduce results.
It’s not sexy, it’s not higher mathematics. It’s bookkeeping … keeping track of the labels and keeping track of what goes where. The thing that we have found repeatedly in our analyses is that it actually is one of the most difficult steps in performing some of these analyses.
I’ve seen presentations where Keith discusses specific bookkeeping errors. Quite often columns get transposed in spreadsheets, so researchers are not analyzing the data they say they are analyzing.
The October 2008 issue of AMSTAT News has an article entitled “Forensic Bioinformatician Aims To Solve Mysteries of Biomarker Studies.” The article is about Keith Baggerly of M. D. Anderson Cancer Center and his efforts to reproduce analyses in bioinformatics papers.
The article quotes David Ransohoff, professor of medicine at UNC Chapel Hill, saying this about Keith Baggerly.
I think Keith is doing a wonderful and needed job … But the fact that we need people like him means that our journals are failing us. The kinds of things that Keith spends time finding out — what [the researchers] actually do — that’s what methods and results are supposed to be for in journals. … We need to figure out how to do science without needing people like Keith.
One of the reasons for lack of reproducibility is that journals press authors for space and so statistics sections get abbreviated. (Why not put the full details online?) Another reason is that bioinformatics articles are inherently cross-disciplinary and it may be that no single person is responsible for or even understands the entire article.
I recently heard out about some interesting tools from Blue Reference. I haven’t had a chance to try them out yet, but they look promising.
Sweave has received a fair amount of attention with regard to reproducibility because it lets you embed R code in LaTeX. Code stays with the presentation document, reducing the chance of error and increasing transparency. However, the number of people who use R and LaTeX is small, and asking people to learn these two packages before they can reproducible research is not going to fly. The number of people who use C# and Microsoft Word is orders of magnitude larger than the number of folks who use R and LaTeX.
It looks like Blue Reference’s product Inference for .NET lets .NET programmers do the kinds of things Sweave lets R programmers do, embedding .NET code in Microsoft Office documents. The also make a product Inference for MATLAB for embedding MATLAB code in Office documents.
Python developers who don’t think of themselves as .NET developers might want to use Inference for .NET to embed Python code in Word documents via Iron Python. Ruby developers might want to use Iron Ruby similarly.
Bjarne Stroustrup’s book The C++ Programming Language begins with the quote
Programming is understanding.
Many times I’ve thought I understood something until I tried to implement it in software. Then the process of writing and testing the software exposed my lack of understanding
One thing that can make reproducible research difficult is that you have to deeply understand what you’re doing. Making work reproducible may require automating steps that you do not fully understand, and don’t realize that you don’t understand until you try automating them.
Stated more positively, attempts to make research reproducible can lead to new insights into the research itself.
Related post: Paper doesn’t abort
In a recent interview on the Hanselminutes podcast, Jeff Web said that if he were to teach a computer science class, he would have the class work on an assignment, then a week later make everyone move over one chair, i.e. have everyone take over the code their neighbor started. Aside from the difficulty of assigning individual grades in such a class, I think that’s a fantastic idea.
Suppose students did have to take over a new code base every week. People who write mysterious code would be chastised by their peers. Hopefully people who think they write transparent code would realize that they don’t. The students might even hold a meeting outside of class to set a strategy. I could imagine someone standing up to argue that they’re all going to do poorly in the class unless they agree on some standards. It would be fantastic if the students would discover a few principles of software engineering out of self-defense.
I had a small taste of this in college. My first assignment in one computer science class was to add functionality to a program the instructor had started. Then he asked us to add the same functionality to a program that a typical student had written the previous semester. As the instructor emphasized, he didn’t pick out the worst program turned in, only a typical one. As I recall, the student code wasn’t terrible, but it wasn’t exactly clear either. This was by far the most educational homework problem I had in a CS class. I realized that the principles we’d been taught about how to write good code were not just platitudes but survival skills. Later my experience as a professional programmer and as a project manager reinforced the same conclusion.
In some environments, it’s not practical to have people switch projects unless it is absolutely necessary. Maybe the code is high quality (and maybe it’s not!) but there is a large amount of domain knowledge necessary before someone could contribute to the code. But at least software developers ought to be able to build each other’s code, even if they couldn’t maintain it.
When I was managing a group of around 20 programmers, mostly working on one-person projects, I had what I called reproducibility drills. These were similar to Jeff Webb’s idea for teaching computer science. I had everyone try to build someone else’s project. These exercises turned out to be far more difficult than anyone anticipated, but they caused us to improve our development procedures.
We later added a policy that a build master would have to extract a project from version control and build it without help from the developer before the project could be deployed. The developer was allowed (required) to create written instructions for how to build the project, and these instructions were to be in a location dictated by convention. The build master position rotated so we wouldn’t become too dependent on one person’s implicit knowledge.
Having a rotating build master is great improvement, but it lacked some of the benefits of the reproducibility drills. The build master procedure only requires a project to be reproducible before it’s deployed. That is essential, but it could foster an attitude that it’s OK for a project to be in bad shape until the very end. Also, some projects never actually deploy, such as research projects, and so they never go to the build master.