Citogenesis in science and the importance of real problems
Daniel Lemire writes:
Many papers in Computer Science tell the following story:
- There is a pre-existing problem P.
- There are a few relatively simple but effective solutions to problem P. Among them is solution X.
- We came up with a new solution X+ which is a clever variation on X. It looks good on paper.
- We ran some experiments and tweaked our results until X+ looked good. We found a clever way to avoid comparing X+ and X directly and fairly, as it might then become obvious that the gains are small, or even negative! We would gladly report negative results, but then our paper could not be published.
It is a very convenient story for reviewers: the story is simple and easy to assess superficially. The problem is that sometimes, especially if the authors are famous and the idea is compelling, the results will spread. People will adopt X+ and cite it in their work. And the more they cite it, the more enticing it is to use X+ as every citation becomes further validation for X+. And why bother with algorithm X given that it is older and X+ is the state-of-the-art?
Occasionally, someone might try both X and X+, and they may report results showing that the gains due to X+ are small, or negative. But they have no incentive to make a big deal of it because they are trying to propose yet another better algorithm (X++).
But don’t we see the same thing in blogs, where writers say “some,” “many,” “often,” etc., none of which are claims that others can evaluate?
Make no mistake: given the rate of mis-citation that I find in published proceedings, I really want to agree with Daniel, but I think the matter is more complex than simply saying that “engineers” work with “real” tests.
One of my pet peeves is the lack of history that I find in most CS papers. They may go back ten years, but what about thirty or even forty years ago?
But as for engineers, why is there so little code reuse if they are so interested in being efficient? Is rewriting code really more efficient, or is it just NWH (Not Written Here)?