-
PloS One 2023
Considerable scientific work involves locating, analyzing, systematizing, and synthesizing other publications, often with the help of online scientific publication databases and search engines. However, use of online sources suffers from a lack of repeatability and transparency, as well as from technical restrictions. Alexandria3k is a Python software package and an associated command-line tool that can populate embedded relational databases with slices from the complete set of several open publication metadata sets. These can then be employed for reproducible processing and analysis through versatile and performant queries. We demonstrate the software's utility by visualizing the evolution of publications in diverse scientific fields and relationships among them, by outlining scientometric facts associated with COVID-19 research, and by replicating commonly-used bibliometric measures and findings regarding scientific productivity, impact, and disruption.
Topics: Databases, Factual; Search Engine; Bibliometrics; Metadata; Research Design
PubMed: 38032908
DOI: 10.1371/journal.pone.0294946 -
Journal of Digital Imaging Oct 2022
Clinical images are vital for diagnosing and monitoring skin diseases, and their importance has increased with the growing popularity of machine learning. Lack of standards has stifled innovation in dermatological imaging, unlike other image-intensive specialties such as radiology. We investigate the meta-requirements for utilizing the popular DICOM standard for metadata management of images in dermatology. We propose practical design solutions and provide open-source tools to integrate dermatologists' workflow with enterprise imaging systems. Using the tool, dermatologists can tag, search, organize and convert clinical images to the DICOM format. We believe that our less disruptive approach will improve the adoption of standards in the specialty.
Topics: Humans; Dermatology; Diagnostic Imaging; Metadata; Radiology Information Systems; Workflow
PubMed: 35488074
DOI: 10.1007/s10278-022-00636-5 -
Bioinformatics (Oxford, England) Sep 2022
MOTIVATION
Environmental DNA (eDNA), as a rapidly expanding research field, stands to benefit from shared resources including sampling protocols, study designs, discovered sequences, and taxonomic assignments to sequences. High-quality community shareable eDNA resources rely heavily on comprehensive metadata documentation that captures the complex workflows covering field sampling, molecular biology lab work, and bioinformatic analyses. Few sources document database development for comprehensive eDNA metadata and these workflows, and no open-source software is available.
RESULTS
We present medna-metadata, an open-source, modular system that aligns with Findable, Accessible, Interoperable, and Reusable guiding principles that support scholarly data reuse and the database and application development of a standardized metadata collection structure that encapsulates critical aspects of field data collection, wet lab processing, and bioinformatic analysis. Medna-metadata is showcased with metabarcoding data from the Gulf of Maine (Polinski et al., 2019).
AVAILABILITY AND IMPLEMENTATION
The source code of the medna-metadata web application is hosted on GitHub (https://github.com/Maine-eDNA/medna-metadata). Medna-metadata is a docker-compose installable package. Documentation can be found at https://medna-metadata.readthedocs.io/en/latest/?badge=latest. The application is implemented in Python, PostgreSQL and PostGIS, RabbitMQ, and NGINX, with all major browsers supported. A demo can be found at https://demo.metadata.maine-edna.org/.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Metadata; DNA, Environmental; Data Management; Software; Databases, Factual
PubMed: 35960154
DOI: 10.1093/bioinformatics/btac556 -
Current Opinion in Urology Jul 2019 (Review)
PURPOSE OF REVIEW
The purpose of this review is to examine and evaluate similarities and differences in bladder cancer expression subtypes and to understand the clinical implications of the molecular subtyping.
RECENT FINDINGS
Four independent classification systems have been described, and there are broad similarities among the subtyping callers. Two major subtypes have been identified, that is, luminal and basal, with underlying subcategories based on various distinct characteristics. Luminal tumors generally bear a better prognosis and increased survival than basal tumors, although there is subtle variation in prognosis among the different subtypes within the luminal and basal classifications. Clinical subtyping is now commercially available, although there are limitations to its generalizability and application.
SUMMARY
Expression subtyping is a new method to personalize bladder cancer management. However, there is probably not sufficient evidence to incorporate use into current standards-of-care. Validation cohorts with clinically meaningful outcomes may further establish the clinical relevance of molecular subtyping of bladder cancer. Additionally, genetic alterations in bladder cancer may 'color' the interpretation of individual tumors beyond the expression subtype to truly personalize care for bladder cancer.
Topics: Biomarkers, Tumor; Gene Expression Profiling; Humans; Immunophenotyping; Metadata; Mutation; Neoplasm Invasiveness; Prognosis; Urinary Bladder Neoplasms
PubMed: 31158107
DOI: 10.1097/MOU.0000000000000641 -
Scientific Data Sep 2023
The Two Weeks in the World research project has resulted in a dataset of 3087 clinically relevant bacterial genomes with associated metadata, collected from 59 diagnostic units in 35 countries around the world during 2020. A relational database is available with metadata and summary data from selected bioinformatic analyses, such as species prediction and identification of acquired resistance genes.
Topics: Bacteria; Computational Biology; Databases, Factual; Genome, Bacterial; Metadata
PubMed: 37717051
DOI: 10.1038/s41597-023-02502-7 -
Trials Mar 2019 (Review)
BACKGROUND
Data repositories have the potential to play an important role in the effective and safe sharing of individual-participant data (IPD) from clinical studies. We analysed the current landscape of data repositories to create a detailed description of available repositories and assess their suitability for hosting data from clinical studies, from the perspective of the clinical researcher.
METHODS
We assessed repositories that enable storage, sharing, discoverability, and re-use of the IPD and associated documents from clinical studies, using a pre-defined set of 34 items and publicly available information from April to June 2018. For this purpose, we developed an indicator set to capture the maturity of the repositories' procedures and their suitability for the hosting of IPD. The indicators cover guidelines for data upload and data de-identification, data quality controls, contracts for upload and storage, flexibility of access, application of identifiers, availability of metadata, and long-term preservation.
RESULTS
We analysed 25 repositories, from an initial set of 55 identified as possibly relevant. Half of the included repositories were generic, i.e. not limited to a specific disease or clinical area, and 13 were launched in the last 8 years. The sample was extremely heterogeneous and included repositories developed by research funders, infrastructures, universities, and editors. All but three repositories charge no fee for uploading, storage or access to data. None of the repositories completely demonstrated all the items included in the indicator set, but three repositories (Dryad, Drum, EASY) met, fully or partially, all items. Flexibility of data-access modalities appears to be limited, being lacking in half of the repositories.
CONCLUSIONS
Our evaluation, though often hampered by the lack of sufficient information, can help researchers to find a suitable repository for their datasets. Some repositories are more mature because of their support for clinical dataset preparation, contractual agreements, metadata and identifiers, different modalities of access, and long-term preservation of data. Further work is now required to achieve a more robust and accurate system for evaluation, which in turn may encourage the sharing of clinical study data.
TRIAL REGISTRATION
Study protocol available at https://zenodo.org/record/1438261#.W64kW9Egrcs .
Topics: Access to Information; Big Data; Clinical Studies as Topic; Data Collection; Data Mining; Databases, Factual; Humans; Information Dissemination; Metadata
PubMed: 30876434
DOI: 10.1186/s13063-019-3253-3 -
Data in Brief Feb 2020
The OpenScience Slovenia metadata dataset contains metadata entries for Slovenian public domain academic documents which include undergraduate and postgraduate theses, research and professional articles, along with other academic document types. The data within the dataset was collected as a part of the establishment of the Slovenian Open-Access Infrastructure which defined a unified document collection process and cataloguing for universities in Slovenia within the infrastructure repositories. The data was collected from several already established but separate library systems in Slovenia and merged into a single metadata scheme using metadata deduplication and merging techniques. It consists of text and numerical fields, representing attributes that describe documents. These attributes include document titles, keywords, abstracts, typologies, authors, issue years and other identifiers such as URL and UDC. The dataset is especially suited to text mining and text classification tasks, and it can also be used in the development or benchmarking of content-based recommender systems on real-world data.
PubMed: 31890793
DOI: 10.1016/j.dib.2019.104942 -
PLoS Computational Biology Oct 2020
Topics: Computational Biology; Gene Ontology; Genomics; Metadata; Molecular Sequence Annotation; Sequence Analysis, DNA
PubMed: 33017400
DOI: 10.1371/journal.pcbi.1008260 -
Nucleic Acids Research Jan 2020
The Therapeutic Structural Antibody Database (Thera-SAbDab; http://opig.stats.ox.ac.uk/webapps/therasabdab) tracks all antibody- and nanobody-related therapeutics recognized by the World Health Organisation (WHO), and identifies any corresponding structures in the Structural Antibody Database (SAbDab) with near-exact or exact variable domain sequence matches. Thera-SAbDab is synchronized with SAbDab to update weekly, reflecting new Protein Data Bank entries and the availability of new sequence data published by the WHO. Each therapeutic summary page lists structural coverage (with links to the appropriate SAbDab entries), alignments showing where any near-matches deviate in sequence, and accompanying metadata, such as intended target and investigated conditions. Thera-SAbDab can be queried by therapeutic name, by a combination of metadata, or by variable domain sequence - returning all therapeutics that are within a specified sequence identity over a specified region of the query. The sequences of all therapeutics listed in Thera-SAbDab (461 unique molecules, as of 5 August 2019) are downloadable as a single file with accompanying metadata.
Topics: Antibodies; Clinical Trials as Topic; Databases, Protein; Humans; Internet; Metadata; Sequence Alignment; User-Computer Interface
PubMed: 31555805
DOI: 10.1093/nar/gkz827 -
Scientific Data Jan 2021
Ancient DNA and RNA are valuable data sources for a wide range of disciplines. Within the field of ancient metagenomics, the number of published genetic datasets has risen dramatically in recent years, and tracking this data for reuse is particularly important for large-scale ecological and evolutionary studies of individual taxa and communities of both microbes and eukaryotes. AncientMetagenomeDir (archived at https://doi.org/10.5281/zenodo.3980833 ) is a collection of annotated metagenomic sample lists derived from published studies that provide basic, standardised metadata and accession numbers to allow rapid data retrieval from online repositories. These tables are community-curated and span multiple sub-disciplines to ensure adequate breadth and consensus in metadata definitions, as well as longevity of the database. Internal guidelines and automated checks facilitate compatibility with established sequence-read archives and term-ontologies, and ensure consistency and interoperability for future meta-analyses. This collection will also assist in standardising metadata reporting for future ancient metagenomic studies.
Topics: Databases, Genetic; Humans; Metadata; Metagenome; Metagenomics; Publications
PubMed: 33500403
DOI: 10.1038/s41597-021-00816-y