-
Sensors (Basel, Switzerland) Apr 2022The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural... (Review)
Review
The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with a review of tasks related to musical instrument identification. It focuses on tasks performed, input type, algorithms employed, and metrics used. The paper starts with the background presentation, i.e., metadata description and a review of related works. This is followed by showing the dataset prepared for the experiment and its division into subsets: training, validation, and evaluation. Then, the analyzed architecture of the neural network model is presented. Based on the described model, training is performed, and several quality metrics are determined for the training and validation sets. The results of the evaluation of the trained network on a separate set are shown. Detailed values for precision, recall, and the number of true and false positive and negative detections are presented. The model efficiency is high, with the metric values ranging from 0.86 for the guitar to 0.99 for drums. Finally, a discussion and a summary of the results obtained follows.
Topics: Algorithms; Benchmarking; Deep Learning; Metadata; Neural Networks, Computer
PubMed: 35459018
DOI: 10.3390/s22083033 -
Ecology and Evolution Jul 2021Metadata plays an essential role in the long-term preservation, reuse, and interoperability of data. Nevertheless, creating useful metadata can be sufficiently difficult...
Metadata plays an essential role in the long-term preservation, reuse, and interoperability of data. Nevertheless, creating useful metadata can be sufficiently difficult and weakly enough incentivized that many datasets may be accompanied by little or no metadata. One key challenge is, therefore, how to make metadata creation easier and more valuable. We present a solution that involves creating domain-specific metadata schemes that are as complex as necessary and as simple as possible. These goals are achieved by co-development between a metadata expert and the researchers (i.e., the data creators). The final product is a bespoke metadata scheme into which researchers can enter information (and validate it) via the simplest of interfaces: a web browser application and a spreadsheet.We provide the R package dmdScheme (dmdScheme: An R package for working with domain specific MetaData schemes (Version v0.9.22), 2019) for creating a template domain-specific scheme. We describe how to create a domain-specific scheme from this template, including the iterative co-development process, and the simple methods for using the scheme, and simple methods for quality assessment, improvement, and validation.The process of developing a metadata scheme following the outlined approach was successful, resulting in a metadata scheme which is used for the data generated in our research group. The validation quickly identifies forgotten metadata, as well as inconsistent metadata, therefore improving the quality of the metadata. Multiple output formats are available, including XML.Making the provision of metadata easier while also ensuring high quality must be a priority for data curation initiatives. We show how both objectives are achieved by close collaboration between metadata experts and researchers to create domain-specific schemes. A near-future priority is to provide methods to interface domain-specific schemes with general metadata schemes, such as the Ecological Metadata Language, to increase interoperability.
PubMed: 34306613
DOI: 10.1002/ece3.7764 -
Computational and Structural... 2023In the fast-evolving landscape of biomedical research, the emergence of big data has presented researchers with extraordinary opportunities to explore biological... (Review)
Review
In the fast-evolving landscape of biomedical research, the emergence of big data has presented researchers with extraordinary opportunities to explore biological complexities. In biomedical research, big data imply also a big responsibility. This is not only due to genomics data being sensitive information but also due to genomics data being shared and re-analysed among the scientific community. This saves valuable resources and can even help to find new insights . To fully use these opportunities, detailed and correct metadata are imperative. This includes not only the availability of metadata but also their correctness. Metadata integrity serves as a fundamental determinant of research credibility, supporting the reliability and reproducibility of data-driven findings. Ensuring metadata availability, curation, and accuracy are therefore essential for bioinformatic research. Not only must metadata be readily available, but they must also be meticulously curated and ideally error-free. Motivated by an accidental discovery of a critical metadata error in patient data published in two high-impact journals, we aim to raise awareness for the need of correct, complete, and curated metadata. We describe how the metadata error was found, addressed, and present examples for metadata-related challenges in omics research, along with supporting measures, including tools for checking metadata and software to facilitate various steps from data analysis to published research.
PubMed: 37860229
DOI: 10.1016/j.csbj.2023.10.006 -
Scientific Data Apr 2023We present a database resulting from high throughput experimentation, primarily on metal oxide solid state materials. The central relational database, the Materials...
We present a database resulting from high throughput experimentation, primarily on metal oxide solid state materials. The central relational database, the Materials Provenance Store (MPS), manages the metadata and experimental provenance from acquisition of raw materials, through synthesis, to a broad range of materials characterization techniques. Given the primary research goal of materials discovery of solar fuels materials, many of the characterization experiments involve electrochemistry, along with optical, structural, and compositional characterizations. The MPS is populated with all information required for executing common data queries, which typically do not involve direct query of raw data. The result is a database file that can be distributed to users so that they can independently execute queries and subsequently download the data of interest. We propose this strategy as an approach to manage the highly heterogeneous and distributed data that arises from materials science experiments, as demonstrated by the management of over 30 million experiments run on over 12 million samples in the present MPS release.
Topics: Semantics; Databases, Factual; Metadata
PubMed: 37024515
DOI: 10.1038/s41597-023-02107-0 -
Scientific Data Jan 2021Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has...
Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual taxa and communities of both microbes and eukaryotes. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833 ) is a collection of annotated metagenomic sample lists derived from published studies that provide basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These tables are community-curated and span multiple sub-disciplines to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database. Internal guidelines and automated checks facilitate compatibility with established sequence-read archives and term-ontologies, and ensure consistency and interoperability for future meta-analyses. This collection will also assist in standardising metadata reporting for future ancient metagenomic studies.
Topics: Databases, Genetic; Humans; Metadata; Metagenome; Metagenomics; Publications
PubMed: 33500403
DOI: 10.1038/s41597-021-00816-y -
Bioinformatics (Oxford, England) Sep 2022Environmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered...
MOTIVATION
Environmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered sequences, and taxonomic assignments to sequences. High-quality community shareable eDNA resources rely heavily on comprehensive metadata documentation that captures the complex workflows covering field sampling, molecular biology lab work, and bioinformatic analyses. There are limited sources that provide documentation of database development on comprehensive metadata for eDNA and these workflows and no open-source software.
RESULTS
We present medna-metadata, an open-source, modular system that aligns with Findable, Accessible, Interoperable, and Reusable guiding principles that support scholarly data reuse and the database and application development of a standardized metadata collection structure that encapsulates critical aspects of field data collection, wet lab processing, and bioinformatic analysis. Medna-metadata is showcased with metabarcoding data from the Gulf of Maine (Polinski et al., 2019).
AVAILABILITY AND IMPLEMENTATION
The source code of the medna-metadata web application is hosted on GitHub (https://github.com/Maine-eDNA/medna-metadata). Medna-metadata is a docker-compose installable package. Documentation can be found at https://medna-metadata.readthedocs.io/en/latest/?badge=latest. The application is implemented in Python, PostgreSQL and PostGIS, RabbitMQ, and NGINX, with all major browsers supported. A demo can be found at https://demo.metadata.maine-edna.org/.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metadata; DNA, Environmental; Data Management; Software; Databases, Factual
PubMed: 35960154
DOI: 10.1093/bioinformatics/btac556 -
Nature Communications Oct 2021The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset... (Review)
Review
The amount of public proteomics data is rapidly increasing but there is no standardized format to describe the sample metadata and their relationship with the dataset files in a way that fully supports their understanding or reanalysis. Here we propose to develop the transcriptomics data format MAGE-TAB into a standard representation for proteomics sample metadata. We implement MAGE-TAB-Proteomics in a crowdsourcing project to manually curate over 200 public datasets. We also describe tools and libraries to validate and submit sample metadata-related information to the PRIDE repository. We expect that these developments will improve the reproducibility and facilitate the reanalysis and integration of public proteomics datasets.
Topics: Big Data; Data Analysis; Databases, Protein; Humans; Metadata; Proteomics; Reproducibility of Results; Software; Transcriptome
PubMed: 34615866
DOI: 10.1038/s41467-021-26111-3 -
Genome Biology Jul 2021
Topics: Cell Lineage; Computational Biology; Datasets as Topic; High-Throughput Nucleotide Sequencing; Humans; Metadata; Reproducibility of Results; Sequence Analysis, RNA; Single-Cell Analysis; Transcriptome
PubMed: 34311752
DOI: 10.1186/s13059-021-02422-y -
Nucleic Acids Research Jan 2021ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for...
ArrayExpress (https://www.ebi.ac.uk/arrayexpress) is an archive of functional genomics data at EMBL-EBI, established in 2002, initially as an archive for publication-related microarray data and was later extended to accept sequencing-based data. Over the last decade an increasing share of biological experiments involve multiple technologies assaying different biological modalities, such as epigenetics, and RNA and protein expression, and thus the BioStudies database (https://www.ebi.ac.uk/biostudies) was established to deal with such multimodal data. Its central concept is a study, which typically is associated with a publication. BioStudies stores metadata describing the study, provides links to the relevant databases, such as European Nucleotide Archive (ENA), as well as hosts the types of data for which specialized databases do not exist. With BioStudies now fully functional, we are able to further harmonize the archival data infrastructure at EMBL-EBI, and ArrayExpress is being migrated to BioStudies. In future, all functional genomics data will be archived at BioStudies. The process will be seamless for the users, who will continue to submit data using the online tool Annotare and will be able to query and download data largely in the same manner as before. Nevertheless, some technical aspects, particularly programmatic access, will change. This update guides the users through these changes.
Topics: Animals; Cell Line; DNA Methylation; Databases, Genetic; Epigenesis, Genetic; Gene Expression Profiling; Genomics; High-Throughput Nucleotide Sequencing; Humans; Internet; Metadata; Oligonucleotide Array Sequence Analysis; Organ Specificity; Plants; Single-Cell Analysis; Software
PubMed: 33211879
DOI: 10.1093/nar/gkaa1062 -
BioRxiv : the Preprint Server For... Mar 2023A large number of genomic and imaging datasets are being produced by consortia that seek to characterize healthy and disease tissues at single-cell resolution. While...
A large number of genomic and imaging datasets are being produced by consortia that seek to characterize healthy and disease tissues at single-cell resolution. While much effort has been devoted to capturing information related to biospecimen information and experimental procedures, the metadata standards that describe data matrices and the analysis workflows that produced them are relatively lacking. Detailed metadata schema related to data analysis are needed to facilitate sharing and interoperability across groups and to promote data provenance for reproducibility. To address this need, we developed the Matrix and Analysis Metadata Standards (MAMS) to serve as a resource for data coordinating centers and tool developers. We first curated several simple and complex "use cases" to characterize the types of feature-observation matrices (FOMs), annotations, and analysis metadata produced in different workflows. Based on these use cases, metadata fields were defined to describe the data contained within each matrix including those related to processing, modality, and subsets. Suggested terms were created for the majority of fields to aid in harmonization of metadata terms across groups. Additional provenance metadata fields were also defined to describe the software and workflows that produced each FOM. Finally, we developed a simple list-like schema that can be used to store MAMS information and implemented in multiple formats. Overall, MAMS can be used as a guide to harmonize analysis-related metadata which will ultimately facilitate integration of datasets across tools and consortia. MAMS specifications, use cases, and examples can be found at https://github.com/single-cell-mams/mams/.
PubMed: 36945543
DOI: 10.1101/2023.03.06.531314