Digitizing the vast "dark data" in museum fossil collections

“The uniqueness of each museum collection means that scientists routinely make pilgrimages worldwide to visit them. It also means that the loss of a collection, as in the recent heart-wrenching fire in Rio de Janeiro, represents an irreplaceable loss of knowledge. It’s akin to the loss of family history when a family elder passes away. In Rio, these losses included one-of-a-kind dinosaurs, perhaps the oldest human remains ever found in South America, and the only audio recordings and documents of indigenous languages, including many that no longer have native speakers. Things we once knew, we know no longer; things we might have known can no longer be known.

But now digital technologies — including the internet, interoperable databases and rapid imaging techniques — make it possible to electronically aggregate museum data. Researchers, including a multi-institutional team I am leading, are laying the foundation for the coherent use of these millions of specimens. Across the globe, teams are working to bring these “dark data” — currently inaccessible via the web — into the digital light….

The sheer size of fossil collections, and the fact that most of their contents were collected before the invention of computers and the internet, make it very difficult to aggregate the data associated with museum specimens. From a digital point of view, most of the world’s fossil collections represent “dark data.” …

The Integrated Digitized Biocollections (iDigBio) site hosts all the major museum digitization efforts in the United States funded by the current NSF initiative that began in 2011….

Our group, called EPICC for Eastern Pacific Invertebrate Communities of the Cenozoic, quantified just how much “dark data” are present in our joint collections. We found that our 10 museums contain fossils from 23 times the number of collection sites in California, Oregon and Washington than are currently documented in a leading online electronic database of the paleontological scientific literature, the Paleobiology Database….”