Open data and data sharing in articles about COVID-19 published in preprint servers medRxiv and bioRxiv

This study aimed to analyze the content of data availability statements (DAS) and the actual sharing of raw data in preprint articles about COVID-19. The study combined a bibliometric analysis and a cross-sectional survey. We analyzed preprint articles on COVID-19 published on medRxiv and bioRxiv from January 1, 2020 to March 30, 2020. We extracted data sharing statements, tried to locate raw data when authors indicated they were available, and surveyed authors. The authors were surveyed in 2020–2021. We surveyed authors whose articles did not include DAS, who indicated that data are available on request, or their manuscript reported that raw data are available in the manuscript, but raw data were not found. Raw data collected in this study are published on Open Science Framework (https://osf.io/6ztec/). We analyzed 897 preprint articles. There were 699 (78%) articles with Data/Code field present on the website of a preprint server. In 234 (26%) preprints, data/code sharing statement was reported within the manuscript. For 283 preprints that reported that data were accessible, we found raw data/code for 133 (47%) of those 283 preprints (15% of all analyzed preprint articles). Most commonly, authors indicated that data were available on GitHub or another clearly specified web location, on (reasonable) request, in the manuscript or its supplementary files. In conclusion, preprint servers should require authors to provide data sharing statements that will be included both on the website and in the manuscript. Education of researchers about the meaning of data sharing is needed.