Software Updates: the “Unknown Unknown” of the Replication Crisis
Anastasia Ershova and Gerald Schneider (via Simon Hix):
By trying to replicate the results of a conditionally accepted article, we uncovered discrepancies between the reported results calculated by the author and the ones obtained by us. These divergences spurred an intensive exchange between the author and us and, finally, resulted in the discovery that they are due to changes in an algorithm used by the (commercial) software company for calculations done with a certain estimator. The software company, which pressures universities and research institutes to buy the expensive updates of their statistical package every second year at least, reports that it has since modified its algorithm. Yet, the company does not justify which version of the program is the correct one to use in order to get as close as possible to the underlying true relationship. It could be the case that the new algorithm saves us computing times, while the older versions calculate more accurate coefficients.
[…]
Researchers need to report which version of the software they used and, if this information is available, precisely when they last updated their software. In addition, they should be encouraged to replicate their findings with another software in the case that they are using a relatively newly developed estimator.
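As a rough illustration of the first recommendation, here is a minimal Python sketch that records version information to publish alongside results. It assumes the analysis runs in Python rather than in the commercial package discussed above. The helper name `environment_report` and the package names are hypothetical, chosen only for illustration.

```python
import json
import platform
from importlib import metadata


def environment_report(packages):
    """Collect interpreter, OS, and package versions for a replication note."""
    versions = {}
    for name in packages:
        try:
            # Version string of the installed distribution, e.g. "1.26.4".
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = "not installed"
    return {
        "python": platform.python_version(),
        "implementation": platform.python_implementation(),
        "platform": platform.platform(),
        "packages": versions,
    }


if __name__ == "__main__":
    # Placeholder package names; list whatever the analysis actually imports.
    print(json.dumps(environment_report(["numpy", "statsmodels"]), indent=2))
```

Saving this output next to the estimates gives a replicator a concrete starting point when numbers diverge. It does not record when the software was last updated, so that date would still need to be noted by hand.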
1 Comment
This is ridiculous. When a software company enters into a contract with a university, the contract must guarantee the university access to the source code for these algorithms. Otherwise, if you don't know how your data was generated, its value is greatly diminished. Or just stick to open-source software.
Similarly, when publishing data, scientists should also publish the algorithms they used to generate the data.
(For example, the 2010 paper on debt-to-GDP ratio by Carmen Reinhart and Ken Rogoff was used to justify a lot of harmful economic policies. Turns out their Excel sheet had a bug that was only found years later, because the Excel sheet was never published.)