Chapter 2 Bad practices in research

2.1 Manual steps in analysis

A mantra in some areas of statistics and biomedical research is “reproducibility”. In some contexts, reproducibility of results means that one researcher could run experiments that are essentially identical to another researcher’s (presumably with different subjects, etc.) and get similar results. But in the context of research computing, especially with archival data sets, the idea is that one could follow the steps outlined in a paper, using the same (sometimes public) data sets, and produce results similar to those in the paper. In the limit, if the data sets are publicly available and the processes used to transform them for the paper are described precisely enough, then it should be possible to obtain exactly the coefficient estimates, etc., reported in the original paper.
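As a minimal sketch of what this limiting case might look like in practice, the following script keeps every step from raw data to coefficient estimates in code, so that rerunning it yields the same numbers. The data values and the function names (`load_data`, `fit_ols`, `run_analysis`) are hypothetical and purely illustrative; they are not drawn from any actual paper or data set.

```python
import csv
import io

# Hypothetical data standing in for a public data set (illustrative values only).
RAW_CSV = """x,y
1,2.1
2,3.9
3,6.2
4,7.8
5,10.1
"""


def load_data(text):
    """Read (x, y) pairs from CSV text; no values are edited by hand."""
    reader = csv.DictReader(io.StringIO(text))
    return [(float(row["x"]), float(row["y"])) for row in reader]


def fit_ols(pairs):
    """Return (intercept, slope) from a least-squares regression of y on x."""
    n = len(pairs)
    mean_x = sum(x for x, _ in pairs) / n
    mean_y = sum(y for _, y in pairs) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in pairs)
    sxx = sum((x - mean_x) ** 2 for x, _ in pairs)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def run_analysis():
    """Run the whole pipeline, from raw data to coefficient estimates."""
    return fit_ols(load_data(RAW_CSV))


# Two independent runs of the scripted pipeline give identical estimates.
first = run_analysis()
second = run_analysis()
print(first == second)
```

Because no step depends on manual intervention, anyone with the same data and the same script should be able to recover the same estimates.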

2.2 Manual modification of data

2.3 Bad (or no) documentation

2.4 Poor version control

2.5 Limited sharing of code and data

2.6 No data exploration

2.7 Casual approach to merging data sets