Reproducible machine learning

I just stumbled upon this one: Neil Lawrence’s page on reproducible research. It's a nice page; see also the large number of reproducible publications on his publications page. I think machine learning now has another nice reference set of reproducible publications!

It’s interesting to see how Jon Claerbout’s work inspired a large number of people all around the globe.

Citations

Something struck me recently while reading a paper…

In academia, the game is all about publishing, and getting others to cite your articles. And I guess, to a certain extent, article counts and citation counts indeed give a measure of someone’s work. Until you start overfitting your system. But anyway, that’s another story…

So, to get back to my story: citations are taken as a measure of the quality of a work. In general, people try to be correct and cite the researchers who started a certain line of work. And then, once a work becomes really well known, it somehow stops being cited. So the ultimate reward for good work is no longer being cited. Or did you cite a reference the last time you wrote about the Fourier transform, wavelets, least squares or filtering? For some of them I don’t even know who it was, but someone must have invented them…

Open Access Week

This week is the first International Open Access Week. You can find more information here about the international events, or here for the Dutch website, to which I made a modest contribution. I am truly convinced that making publications available online in open access is a great start. And the next step is to do the same for your code and data!

Research data. Who cares?

Today I attended the mini-symposium “Research data. Who cares?“, organized by Leon Osinski at TU Eindhoven. The symposium marked the launch of the 3TU.Data Centre, an organization set up by the libraries of the three Dutch technical universities for the storage, sharing and preservation of research data. I gave a presentation there about my experiences with reproducible research.

Another presentation there that I liked a lot was given by Pieter Van Gorp, about “SHARE” (Sharing Hosted Autonomous Research Environments). This is an exciting setup he developed. It allows researchers to put their research results in a safe and well-controlled environment on a virtual machine. Other researchers can then log in to that virtual machine and reproduce the results in exactly the same environment as the author used, as if they were working on the author’s own machine. I am not yet entirely sure about its advantages for my typical Matlab scripts (which do not require complex installations), but it is certainly a tremendous help when presenting more complex tools and results. It seems like a great step towards one-click reproduction of results, and I am certainly going to try it out!

Welcome (back) !

Welcome (or welcome back) to this blog! September 1st is a good moment for a (new) start!

First of all, I’d like to welcome all readers from John Cook’s Reproducible Ideas blog at reproducibleresearch.org. I hope I’ll be able to live up to the standards John has set.

And of course, welcome also to readers of reproducibleresearch.net and to readers joining from pixeltje.be.

This blog was created when reproducibleresearch.net and reproducibleresearch.org were merged. I’ve taken this occasion to merge John’s posts (thus keeping those links valid) with my earlier posts related to reproducible research from two other sites: blog.epfl.ch/rr and blog.pixeltje.be.

I hope I’ll be able to write many interesting posts here. Please feel free to comment on any of my writings! If you are interested in writing guest posts, please let me know!

One more RR web site

Just learned about Reproducible Research Planet.

Plan for merging .org and .net sites

Patrick Vandewalle and I will be combining our efforts to develop a web site to promote reproducible research. He has the domain name reproducibleresearch.net while I have reproducibleresearch.org. His site is better than the one I’ve developed, so I’d rather support his effort than continue my own.

I plan to leave this web site up for a few more weeks and then hand the .org name over to Patrick. During that time, some of the content from this site will be merged into the framework of his new site. Please go over to the new site and participate in the forums.

I plan to continue blogging about reproducible research from time to time, but future posts will be on my personal blog, The Endeavour. I may write a few more posts here regarding the status of the transition.

New web site devoted to RR

Check out the new web site http://www.reproducibleresearch.net by Patrick Vandewalle, Jelena Kovačević, and Martin Vetterli.

Reproducible Research logo

Reproducible Research in Signal Processing

Patrick Vandewalle, Jelena Kovačević, and Martin Vetterli have published a new article “Reproducible Research in Signal Processing: What, Why, and How” in IEEE Signal Processing Magazine (37) May 2009.

Preserving (the memory of) documents

The Long Now Foundation has produced a Rosetta disk containing 13,000 pages of information regarding 1,500 human languages. The text is engraved, not encoded. The text starts out large enough to read with the naked eye and becomes continuously smaller, strongly suggesting one should examine the disk under magnification to read further.

Long Now is trying to preserve documentation for thousands of years, but I just want to know how to preserve documents for even a few months or years. They want to hold on to knowledge as civilizations come and go. I’m just trying to hold on to knowledge as personnel come and go.

Mundane document preservation is a very difficult problem. Preserving the Declaration of Independence is easy; preserving meeting notes is hard. Preserving the Declaration is a technical problem. If you keep it in a glass case filled with nitrogen, keep the lights low, and make sure Nicolas Cage doesn’t steal it, you’re OK. Millions of people know that the document exists, and they know where to look for it. And besides the original paper copy, the text is available electronically in countless locations.

How do I preserve the document that describes why my internal software application uses the parameters it does? Make notes in the source code? Good idea, but most of the people who want to know about the parameters are not software developers. What about version control systems or content management systems? Great idea: put everything associated with a project in one place. But wherever you put the information, someone has to remember that it exists and know where to look for it.