“Ensuring data are archived and open thus seems a no-brainer. Several funders and journals now require authors to make their data public, and a recent White House mandate that data from federally funded research must be made available immediately on publication is a welcome stimulus. Various data repositories exist to support these requirements, and journals and preprint servers also provide storage options. Consequently, publications now often include accession numbers, stand-alone data citations and/or supplementary files.
But as the director of the National Library of Medicine, Patti Brennan, once noted, “data are like pictures of children: the people who created them think they’re beautiful, but they’re not always useful”. So, although the above trends are to be applauded, we should think carefully about that word ‘useful’ and ask what exactly we mean by ‘the data’, how and where they should be archived, and whether some data should be kept at all….
Researchers, institutions and funders should collaborate to develop an overarching strategy for data preservation — a plan D. There will doubtless be calls for a ‘PubMed Central for data’. But what we really need is a federated system of repositories with functionality tailored to the information that they archive. This will require domain experts to agree on standards for different types of data from different fields: what should be archived and when, in which format, where, and for how long. We can learn from the genomics, structural biology and astronomy communities, and funding agencies should cooperate to define subdisciplines and establish surveys of them to ensure comprehensive coverage of the data landscape, from astronomy to zoology….”