-
Scientific Data Mar 2021Using brain atlases to localize regions of interest is a requirement for making neuroscientifically valid statistical inferences. These atlases, represented in...
Using brain atlases to localize regions of interest is a requirement for making neuroscientifically valid statistical inferences. These atlases, represented in volumetric or surface coordinate spaces, can describe brain topology from a variety of perspectives. Although many human brain atlases have circulated the field over the past fifty years, limited effort has been devoted to their standardization. Standardization can facilitate consistency and transparency with respect to orientation, resolution, labeling scheme, file storage format, and coordinate space designation. Our group has worked to consolidate an extensive selection of popular human brain atlases into a single, curated, open-source library, where they are stored following a standardized protocol with accompanying metadata, which can serve as the basis for future atlases. The repository containing the atlases, the specification, as well as relevant transformation functions is available in the neuroparc OSF registered repository or https://github.com/neurodata/neuroparc .
Topics: Brain; Brain Mapping; Humans; Image Processing, Computer-Assisted; Metadata
PubMed: 33686079
DOI: 10.1038/s41597-021-00849-3 -
GigaScience Sep 2021The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling...
BACKGROUND
The Investigation/Study/Assay (ISA) Metadata Framework is an established and widely used set of open source community specifications and software tools for enabling discovery, exchange, and publication of metadata from experiments in the life sciences. The original ISA software suite provided a set of user-facing Java tools for creating and manipulating the information structured in ISA-Tab-a now widely used tabular format. To make the ISA framework more accessible to machines and enable programmatic manipulation of experiment metadata, the JSON serialization ISA-JSON was developed.
RESULTS
In this work, we present the ISA API, a Python library for the creation, editing, parsing, and validating of ISA-Tab and ISA-JSON formats by using a common data model engineered as Python object classes. We describe the ISA API feature set, early adopters, and its growing user community.
CONCLUSIONS
The ISA API provides users with rich programmatic metadata-handling functionality to support automation, a common interface, and an interoperable medium between the 2 ISA formats, as well as with other life science data formats required for depositing data in public databases.
Topics: Biological Science Disciplines; Databases, Factual; Metadata; Software
PubMed: 34528664
DOI: 10.1093/gigascience/giab060 -
Journal of the American Medical... Apr 2024Despite federally mandated collection of sex and gender demographics in the electronic health record (EHR), longitudinal assessments are lacking. We assessed sex and...
OBJECTIVES
Despite federally mandated collection of sex and gender demographics in the electronic health record (EHR), longitudinal assessments are lacking. We assessed sex and gender demographic field utilization using EHR metadata.
MATERIALS AND METHODS
Patients ≥18 years of age in the Mass General Brigham health system with a first Legal Sex entry (registration requirement) between January 8, 2018 and January 1, 2022 were included in this retrospective study. Metadata for all sex and gender fields (Legal Sex, Sex Assigned at Birth [SAAB], Gender Identity) were quantified by completion rates, user types, and longitudinal change. A nested qualitative study of providers from specialties with high and low field use identified themes related to utilization.
RESULTS
1 576 120 patients met inclusion criteria: 100% had a Legal Sex, 20% a Gender Identity, and 19% a SAAB; 321 185 patients had field changes other than initial Legal Sex entry. About 2% of patients had a subsequent Legal Sex change, and 25% of those had ≥2 changes; 20% of patients had ≥1 update to Gender Identity and 19% to SAAB. Excluding the first Legal Sex entry, administrators made most changes (67%) across all fields, followed by patients (25%), providers (7.2%), and automated Health Level-7 (HL7) interface messages (0.7%). Provider utilization varied by subspecialty; themes related to systems barriers and personal perceptions were identified.
DISCUSSION
Sex and gender demographic fields are primarily used by administrators and raise concern about data accuracy; provider use is heterogenous and lacking. Provider awareness of field availability and variable workflows may impede use.
CONCLUSION
EHR metadata highlights areas for improvement of sex and gender field utilization.
Topics: Infant, Newborn; Humans; Male; Female; Gender Identity; Electronic Health Records; Metadata; Retrospective Studies; Demography; Transgender Persons
PubMed: 38308819
DOI: 10.1093/jamia/ocae016 -
Nucleic Acids Research Jul 2022Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into...
Millions of transcriptome samples were generated by the Library of Integrated Network-based Cellular Signatures (LINCS) program. When these data are processed into searchable signatures along with signatures extracted from Genotype-Tissue Expression (GTEx) and Gene Expression Omnibus (GEO), connections between drugs, genes, pathways and diseases can be illuminated. SigCom LINCS is a webserver that serves over a million gene expression signatures processed, analyzed, and visualized from LINCS, GTEx, and GEO. SigCom LINCS is built with Signature Commons, a cloud-agnostic skeleton Data Commons with a focus on serving searchable signatures. SigCom LINCS provides a rapid signature similarity search for mimickers and reversers given sets of up and down genes, a gene set, a single gene, or any search term. Additionally, users of SigCom LINCS can perform a metadata search to find and analyze subsets of signatures and find information about genes and drugs. SigCom LINCS is findable, accessible, interoperable, and reusable (FAIR) with metadata linked to standard ontologies and vocabularies. In addition, all the data and signatures within SigCom LINCS are available via a well-documented API. In summary, SigCom LINCS, available at https://maayanlab.cloud/sigcom-lincs, is a rich webserver resource for accelerating drug and target discovery in systems pharmacology.
Topics: Transcriptome; Metadata; Search Engine
PubMed: 35524556
DOI: 10.1093/nar/gkac328 -
GigaScience Dec 2022The life sciences are one of the biggest suppliers of scientific data. Reusing and connecting these data can uncover hidden insights and lead to new concepts. Efficient...
BACKGROUND
The life sciences are one of the biggest suppliers of scientific data. Reusing and connecting these data can uncover hidden insights and lead to new concepts. Efficient reuse of these datasets is strongly promoted when they are interlinked with a sufficient amount of machine-actionable metadata. While the FAIR (Findable, Accessible, Interoperable, Reusable) guiding principles have been accepted by all stakeholders, in practice, there are only a limited number of easy-to-adopt implementations available that fulfill the needs of data producers.
FINDINGS
We developed the FAIR Data Station, a lightweight application written in Java, that aims to support researchers in managing research metadata according to the FAIR principles. It implements the ISA metadata framework and uses minimal information metadata standards to capture experiment metadata. The FAIR Data Station consists of 3 modules. Based on the minimal information model(s) selected by the user, the "form generation module" creates a metadata template Excel workbook with a header row of machine-actionable attribute names. The Excel workbook is subsequently used by the data producer(s) as a familiar environment for sample metadata registration. At any point during this process, the format of the recorded values can be checked using the "validation module." Finally, the "resource module" can be used to convert the set of metadata recorded in the Excel workbook in RDF format, enabling (cross-project) (meta)data searches and, for publishing of sequence data, in an European Nucleotide Archive-compatible XML metadata file.
CONCLUSIONS
Turning FAIR into reality requires the availability of easy-to-adopt data FAIRification workflows that are also of direct use for data producers. As such, the FAIR Data Station provides, in addition to the means to correctly FAIRify (omics) data, the means to build searchable metadata databases of similar projects and can assist in ENA metadata submission of sequence data. The FAIR Data Station is available at https://fairbydesign.nl.
Topics: Metadata; Biological Science Disciplines; Databases, Factual; Nucleotides; Publishing
PubMed: 36879493
DOI: 10.1093/gigascience/giad014 -
Studies in Health Technology and... 2017The vision for Australia's national electronic health record system included empowering consumers to become active participants in their own health care. This paper aims... (Review)
Review
The vision for Australia's national electronic health record system included empowering consumers to become active participants in their own health care. This paper aims to critically review the literature on consumer perspectives of Australia's My Health Record (formerly PCEHR). The review is based on a subset of articles (n=12) identified in the Australian EHR Repository (N=143), a repository of metadata of Australian Research on EHR located at Flinders University. Results show low levels of awareness and concerns about sharing records and equity of access for all Australians, which in view of the change from opt in to opt out raises concerns about explicit consent. Improved promotion and support, along with different models of access might lead to higher consumer engagement with, and use, of My Health Record, especially for populations at risk of digital exclusion.
Topics: Australia; Computer Systems; Electronic Health Records; Humans; Metadata; Patient Portals
PubMed: 28756450
DOI: No ID Found -
Conservation Biology : the Journal of... Aug 2023Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to...
Genetic diversity within species represents a fundamental yet underappreciated level of biodiversity. Because genetic diversity can indicate species resilience to changing climate, its measurement is relevant to many national and global conservation policy targets. Many studies produce large amounts of genome-scale genetic diversity data for wild populations, but most (87%) do not include the associated spatial and temporal metadata necessary for them to be reused in monitoring programs or for acknowledging the sovereignty of nations or Indigenous peoples. We undertook a distributed datathon to quantify the availability of these missing metadata and to test the hypothesis that their availability decays with time. We also worked to remediate missing metadata by extracting them from associated published papers, online repositories, and direct communication with authors. Starting with 848 candidate genomic data sets (reduced representation and whole genome) from the International Nucleotide Sequence Database Collaboration, we determined that 561 contained mostly samples from wild populations. We successfully restored spatiotemporal metadata for 78% of these 561 data sets (n = 440 data sets with data on 45,105 individuals from 762 species in 17 phyla). Examining papers and online repositories was much more fruitful than contacting 351 authors, who replied to our email requests 45% of the time. Overall, 23% of our email queries to authors unearthed useful metadata. The probability of retrieving spatiotemporal metadata declined significantly as age of the data set increased. There was a 13.5% yearly decrease in metadata associated with published papers or online repositories and up to a 22% yearly decrease in metadata that were only available from authors. This rapid decay in metadata availability, mirrored in studies of other types of biological data, should motivate swift updates to data-sharing policies and researcher practices to ensure that the valuable context provided by metadata is not lost to conservation science forever.
Topics: Humans; Conservation of Natural Resources; Metadata; Biodiversity; Probability; Genetic Variation
PubMed: 36704891
DOI: 10.1111/cobi.14061 -
PloS One 2022To adopt the FAIR principles (Findable, Accessible, Interoperable, Reusable) to enhance data sharing, the Cure Sickle Cell Initiative (CureSCi) MetaData Catalog (MDC)...
OBJECTIVES
To adopt the FAIR principles (Findable, Accessible, Interoperable, Reusable) to enhance data sharing, the Cure Sickle Cell Initiative (CureSCi) MetaData Catalog (MDC) was developed to make Sickle Cell Disease (SCD) study datasets more Findable by curating study metadata and making them available through an open-access web portal.
METHODS
Study metadata, including study protocol, data collection forms, and data dictionaries, describe information about study patient-level data. We curated key metadata of 16 SCD studies in a three-tiered conceptual framework of category, subcategory, and data element using ontologies and controlled vocabularies to organize the study variables. We developed the CureSCi MDC by indexing study metadata to enable effective browse and search capabilities at three levels: study, Patient-Reported Outcome (PRO) Measures, and data element levels.
RESULTS
The CureSCi MDC offers several browse and search tools to discover studies by study level, PRO Measures, and data elements. The "Browse Studies," "Browse Studies by PRO Measures," and "Browse Studies by Data Elements" tools allow users to identify studies through pre-defined conceptual categories. "Search by Keyword" and "Search Data Element by Concept Category" can be used separately or in combination to provide more granularity to refine the search results. This resource helps investigators find information about specific data elements across studies using public browsing/search tools, before going through data request procedures to access controlled datasets. The MDC makes SCD studies more Findable through browsing/searching study information, PRO Measures, and data elements, aiding in the reuse of existing SCD data.
Topics: Humans; Metadata; Information Dissemination; Anemia, Sickle Cell
PubMed: 36508412
DOI: 10.1371/journal.pone.0256248 -
Journal of Digital Imaging Oct 2019In the last decades, the amount of medical imaging studies and associated metadata has been rapidly increasing. Despite being mostly used for supporting medical... (Review)
Review
In the last decades, the amount of medical imaging studies and associated metadata has been rapidly increasing. Despite being mostly used for supporting medical diagnosis and treatment, many recent initiatives claim the use of medical imaging studies in clinical research scenarios but also to improve the business practices of medical institutions. However, the continuous production of medical imaging studies coupled with the tremendous amount of associated data, makes the real-time analysis of medical imaging repositories difficult using conventional tools and methodologies. Those archives contain not only the image data itself but also a wide range of valuable metadata describing all the stakeholders involved in the examination. The exploration of such technologies will increase the efficiency and quality of medical practice. In major centers, it represents a big data scenario where Business Intelligence (BI) and Data Analytics (DA) are rare and implemented through data warehousing approaches. This article proposes an Extract, Transform, Load (ETL) framework for medical imaging repositories able to feed, in real-time, a developed BI (Business Intelligence) application. The solution was designed to provide the necessary environment for leading research on top of live institutional repositories without requesting the creation of a data warehouse. It features an extensible dashboard with customizable charts and reports, with an intuitive web-based interface that empowers the usage of novel data mining techniques, namely, a variety of data cleansing tools, filters, and clustering functions. Therefore, the user is not required to master the programming skills commonly needed for data analysts and scientists, such as Python and R.
Topics: Data Mining; Data Warehousing; Humans; Metadata; Radiology Information Systems
PubMed: 31201587
DOI: 10.1007/s10278-019-00184-5 -
PloS One 2022Crowdfunding platforms allow entrepreneurs to publish projects and raise funds for realizing them. Hence, the question of what influences projects' fundraising success... (Comparative Study)
Comparative Study
Crowdfunding platforms allow entrepreneurs to publish projects and raise funds for realizing them. Hence, the question of what influences projects' fundraising success is very important. Previous studies examined various factors such as project goals and project duration that may influence the outcomes of fundraising campaigns. We present a novel model for predicting the success of crowdfunding projects in meeting their funding goals. Our model focuses on semantic features only, whose performance is comparable to that of previous models. In an additional model we developed, we examine both project metadata and project semantics, delivering a comprehensive study of factors influencing crowdfunding success. Further, we analyze a large dataset of crowdfunding project data, larger than reported in the art. Finally, we show that when combining semantics and metadata, we arrive at F1 score accuracy of 96.2%. We compare our model's accuracy to the accuracy of previous research models by applying their methods on our dataset, and demonstrate higher accuracy of our model. In addition to our scientific contribution, we provide practical recommendations that may increase project funding success chances.
Topics: Algorithms; Crowdsourcing; Fund Raising; Humans; Metadata; Models, Theoretical; Semantics
PubMed: 35148341
DOI: 10.1371/journal.pone.0263891