Open and reusable annotated mass spectrometry dataset of a chemodiverse collection of 1,600 plant extracts | GigaScience | Oxford Academic

Abstract: As privileged structures, natural products often display potent biological activities. However, the discovery of novel bioactive scaffolds is often hampered by the chemical complexity of the biological matrices in which they are found. Large natural extract collections are thus extremely valuable for their potential chemical novelty but also difficult to exploit in the context of drug-discovery projects. Ultimately, it is the pure chemical substances that are needed for structural determination and bioactivity evaluation. Researchers interested in exploring large and chemodiverse extract collections should thus establish strategies to efficiently tackle this chemical complexity and access these structures. Establishing carefully crafted digital layers that document the spectral and chemical complexity, as well as the bioactivity results, of natural extract collections can help prioritize time-consuming but mandatory isolation efforts. In this note, we report the results of our initial exploration of a collection of 1,600 plant extracts in the context of a drug-discovery effort. After describing the taxonomic coverage of this collection, we present the results of its liquid chromatography high-resolution mass spectrometric profiling and the exploitation of these profiles using computational solutions. The resulting annotated mass spectral dataset and associated chemical and taxonomic metadata are made available to the community, and data reuse cases are proposed. We are continuing our exploration of this plant extract collection for drug-discovery purposes (notably looking for novel antitrypanosomatid, anti-infective, and prometabolic compounds) and for ecometabolomics insights. We believe that this dataset can be exploited and reused by researchers interested in computational natural products exploration.