-
Frontiers in Cellular and Infection... 2018Eukaryotic parasites and pathogens continue to cause some of the most detrimental and difficult to treat diseases (or disease states) in both humans and animals, while... (Review)
Review
Eukaryotic parasites and pathogens continue to cause some of the most detrimental and difficult to treat diseases (or disease states) in both humans and animals, while also continuously expanding into non-endemic countries. Combined with the ever growing number of reports on drug-resistance and the lack of effective treatment programs for many metazoan diseases, the impact that these organisms will have on quality of life remain a global challenge. Vaccination as an effective prophylactic treatment has been demonstrated for well over 200 years for bacterial and viral diseases. From the earliest variolation procedures to the cutting edge technologies employed today, many protective preparations have been successfully developed for use in both medical and veterinary applications. In spite of the successes of these applications in the discovery of subunit vaccines against prokaryotic pathogens, not many targets have been successfully developed into vaccines directed against metazoan parasites. With the current increase in -omics technologies and metadata for eukaryotic parasites, target discovery for vaccine development can be expedited. However, a good understanding of the host/vector/pathogen interface is needed to understand the underlying biological, biochemical and immunological components that will confer a protective response in the host animal. Therefore, systems biology is rapidly coming of age in the pursuit of effective parasite vaccines. Despite the difficulties, a number of approaches have been developed and applied to parasitic helminths and arthropods. This review will focus on key aspects of vaccine development that require attention in the battle against these metazoan parasites, as well as successes in the field of vaccine development for helminthiases and ectoparasites. Lastly, we propose future direction of applying successes in pursuit of next generation vaccines.
Topics: Animals; Antigens, Protozoan; Arthropods; Drug Discovery; Drug Resistance; Helminths; Host-Parasite Interactions; Metadata; Parasites; Parasitic Diseases, Animal; Protozoan Vaccines; Systems Biology; Vaccination
PubMed: 29594064
DOI: 10.3389/fcimb.2018.00067 -
Scientific Data Oct 2022Metadata describe information about data source, type of creation, structure, status and semantics and are prerequisite for preservation and reuse of medical data. To...
Metadata describe information about data source, type of creation, structure, status and semantics and are prerequisite for preservation and reuse of medical data. To overcome the hurdle of disparate data sources and repositories with heterogeneous data formats a metadata crosswalk was initiated, based on existing standards. FAIR Principles were included, as well as data format specifications. The metadata crosswalk is the foundation of data provision between a Medical Data Integration Center (MeDIC) and researchers, providing a selection of metadata information for research design and requests. Based on the crosswalk, metadata items were prioritized and categorized to demonstrate that not one single predefined standard meets all requirements of a MeDIC and only a maximum data set of metadata is suitable for use. The development of a convergence format including the maximum data set is the anticipated solution for an automated transformation of metadata in a MeDIC.
Topics: Metadata; Information Storage and Retrieval; Semantics; Reference Standards
PubMed: 36307424
DOI: 10.1038/s41597-022-01792-7 -
Nucleic Acids Research Jan 2024MetaboLights is a global database for metabolomics studies including the raw experimental data and the associated metadata. The database is cross-species and...
MetaboLights is a global database for metabolomics studies including the raw experimental data and the associated metadata. The database is cross-species and cross-technique and covers metabolite structures and their reference spectra as well as their biological roles and locations where available. MetaboLights is the recommended metabolomics repository for a number of leading journals and ELIXIR, the European infrastructure for life science information. In this article, we describe the continued growth and diversity of submissions and the significant developments in recent years. In particular, we highlight MetaboLights Labs, our new Galaxy Project instance with repository-scale standardized workflows, and how data public on MetaboLights are being reused by the community. Metabolomics resources and data are available under the EMBL-EBI's Terms of Use at https://www.ebi.ac.uk/metabolights and under Apache 2.0 at https://github.com/EBI-Metabolights.
Topics: Metabolomics; Metadata; Databases, Genetic; Internet
PubMed: 37971328
DOI: 10.1093/nar/gkad1045 -
AMIA ... Annual Symposium Proceedings.... 2017In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those...
In biomedicine, high-quality metadata are crucial for finding experimental datasets, for understanding how experiments were performed, and for reproducing those experiments. Despite the recent focus on metadata, the quality of metadata available in public repositories continues to be extremely poor. A key difficulty is that the typical metadata acquisition process is time-consuming and error prone, with weak or nonexistent support for linking metadata to ontologies. There is a pressing need for methods and tools to speed up the metadata acquisition process and to increase the quality of metadata that are entered. In this paper, we describe a methodology and set of associated tools that we developed to address this challenge. A core component of this approach is a value recommendation framework that uses analysis of previously entered metadata and ontology-based metadata specifications to help users rapidly and accurately enter their metadata. We performed an initial evaluation of this approach using metadata from a public metadata repository.
Topics: Biological Ontologies; Biomedical Research; Data Accuracy; Data Analysis; Metadata; Methods
PubMed: 29854196
DOI: No ID Found -
Sensors (Basel, Switzerland) Apr 2022The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural... (Review)
Review
The work aims to propose a novel approach for automatically identifying all instruments present in an audio excerpt using sets of individual convolutional neural networks (CNNs) per tested instrument. The paper starts with a review of tasks related to musical instrument identification. It focuses on tasks performed, input type, algorithms employed, and metrics used. The paper starts with the background presentation, i.e., metadata description and a review of related works. This is followed by showing the dataset prepared for the experiment and its division into subsets: training, validation, and evaluation. Then, the analyzed architecture of the neural network model is presented. Based on the described model, training is performed, and several quality metrics are determined for the training and validation sets. The results of the evaluation of the trained network on a separate set are shown. Detailed values for precision, recall, and the number of true and false positive and negative detections are presented. The model efficiency is high, with the metric values ranging from 0.86 for the guitar to 0.99 for drums. Finally, a discussion and a summary of the results obtained follows.
Topics: Algorithms; Benchmarking; Deep Learning; Metadata; Neural Networks, Computer
PubMed: 35459018
DOI: 10.3390/s22083033 -
Scientific Data Mar 2021Using brain atlases to localize regions of interest is a requirement for making neuroscientifically valid statistical inferences. These atlases, represented in...
Using brain atlases to localize regions of interest is a requirement for making neuroscientifically valid statistical inferences. These atlases, represented in volumetric or surface coordinate spaces, can describe brain topology from a variety of perspectives. Although many human brain atlases have circulated the field over the past fifty years, limited effort has been devoted to their standardization. Standardization can facilitate consistency and transparency with respect to orientation, resolution, labeling scheme, file storage format, and coordinate space designation. Our group has worked to consolidate an extensive selection of popular human brain atlases into a single, curated, open-source library, where they are stored following a standardized protocol with accompanying metadata, which can serve as the basis for future atlases. The repository containing the atlases, the specification, as well as relevant transformation functions is available in the neuroparc OSF registered repository or https://github.com/neurodata/neuroparc .
Topics: Brain; Brain Mapping; Humans; Image Processing, Computer-Assisted; Metadata
PubMed: 33686079
DOI: 10.1038/s41597-021-00849-3 -
BMC Bioinformatics Apr 2022Heterogeneous omics data, increasingly collected through high-throughput technologies, can contain hidden answers to very important and still unsolved biomedical...
BACKGROUND
Heterogeneous omics data, increasingly collected through high-throughput technologies, can contain hidden answers to very important and still unsolved biomedical questions. Their integration and processing are crucial mostly for tertiary analysis of Next Generation Sequencing data, although suitable big data strategies still address mainly primary and secondary analysis. Hence, there is a pressing need for algorithms specifically designed to explore big omics datasets, capable of ensuring scalability and interoperability, possibly relying on high-performance computing infrastructures.
RESULTS
We propose RGMQL, a R/Bioconductor package conceived to provide a set of specialized functions to extract, combine, process and compare omics datasets and their metadata from different and differently localized sources. RGMQL is built over the GenoMetric Query Language (GMQL) data management and computational engine, and can leverage its open curated repository as well as its cloud-based resources, with the possibility of outsourcing computational tasks to GMQL remote services. Furthermore, it overcomes the limits of the GMQL declarative syntax, by guaranteeing a procedural approach in dealing with omics data within the R/Bioconductor environment. But mostly, it provides full interoperability with other packages of the R/Bioconductor framework and extensibility over the most used genomic data structures and processing functions.
CONCLUSIONS
RGMQL is able to combine the query expressiveness and computational efficiency of GMQL with a complete processing flow in the R environment, being a fully integrated extension of the R/Bioconductor framework. Here we provide three fully reproducible example use cases of biological relevance that are particularly explanatory of its flexibility of use and interoperability with other R/Bioconductor packages. They show how RGMQL can easily scale up from local to parallel and cloud computing while it combines and analyzes heterogeneous omics data from local or remote datasets, both public and private, in a completely transparent way to the user.
Topics: Big Data; Cloud Computing; Genomics; Metadata; Software
PubMed: 35392801
DOI: 10.1186/s12859-022-04648-4 -
Nucleic Acids Research Jul 2022Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into...
Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases can be illuminated. SigCom LINCS is a webserver that serves over a million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built with Signature Commons, a cloud-agnostic skeleton Data Commons with a focus on serving searchable signatures. SigCom LINCS provides a rapid signature similarity search for mimickers and reversers given sets of up and down genes, a gene set, a single gene, or any search term. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) with metadata linked to standard ontologies and vocabularies. In addition, all the data and signatures within SigCom LINCS are available via a well-documented API. In summary, SigCom LINCS, available at https://maayanlab.cloud/sigcom-lincs, is a rich webserver resource for accelerating drug and target discovery in systems pharmacology.
Topics: Transcriptome; Metadata; Search Engine
PubMed: 35524556
DOI: 10.1093/nar/gkac328 -
GigaScience Dec 2022The life sciences are one of the biggest suppliers of scientific data. Reusing and connecting these data can uncover hidden insights and lead to new concepts. Efficient...
BACKGROUND
The life sciences are one of the biggest suppliers of scientific data. Reusing and connecting these data can uncover hidden insights and lead to new concepts. Efficient reuse of these datasets is strongly promoted when they are interlinked with a sufficient amount of machine-actionable metadata. While the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles have been accepted by all stakeholders, in practice, there are only a limited number of easy-to-adopt implementations available that fulfill the needs of data producers.
FINDINGS
We developed the FAIR Data Station, a lightweight application written in Java, that aims to support researchers in managing research metadata according to the FAIR principles. It implements the ISA metadata framework and uses minimal information metadata standards to capture experiment metadata. The FAIR Data Station consists of 3 modules. Based on the minimal information model(s) selected by the user, the "form generation module" creates a metadata template Excel workbook with a header row of machine-actionable attribute names. The Excel workbook is subsequently used by the data producer(s) as a familiar environment for sample metadata registration. At any point during this process, the format of the recorded values can be checked using the "validation module." Finally, the "resource module" can be used to convert the set of metadata recorded in the Excel workbook in RDF format, enabling (cross-project) (meta)data searches and, for publishing of sequence data, in an European Nucleotide Archive-compatible XML metadata file.
CONCLUSIONS
Turning FAIR into reality requires the availability of easy-to-adopt data FAIRification workflows that are also of direct use for data producers. As such, the FAIR Data Station provides, in addition to the means to correctly FAIRify (omics) data, the means to build searchable metadata databases of similar projects and can assist in ENA metadata submission of sequence data. The FAIR Data Station is available at https://fairbydesign.nl.
Topics: Metadata; Biological Science Disciplines; Databases, Factual; Nucleotides; Publishing
PubMed: 36879493
DOI: 10.1093/gigascience/giad014 -
Conservation Biology : the Journal of... Aug 2023Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to...
Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome-scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and direct communication with authors. Starting with 848 candidate genomic data sets (reduced representation and whole genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 data sets (n = 440 data sets with data on 45,105 individuals from 762 species in 17 phyla). Examining papers and online repositories was much more fruitful than contacting 351 authors, who replied to our email requests 45% of the time. Overall, 23% of our email queries to authors unearthed useful metadata. The probability of retrieving spatiotemporal metadata declined significantly as age of the data set increased. There was a 13.5% yearly decrease in metadata associated with published papers or online repositories and up to a 22% yearly decrease in metadata that were only available from authors. This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data-sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever.
Topics: Humans; Conservation of Natural Resources; Metadata; Biodiversity; Probability; Genetic Variation
PubMed: 36704891
DOI: 10.1111/cobi.14061