Annual International Conference of the... Nov 2021
In this work, we compare the performance of six state-of-the-art deep neural networks on classification tasks when using image features alone versus when these are combined with patient metadata. We utilise transfer learning from networks pretrained on ImageNet to extract image features from the ISIC HAM10000 dataset prior to classification. Using several classification performance metrics, we evaluate the effect of including metadata alongside the image features. We also repeat our experiments with data augmentation. Our results show an overall improvement in the performance of each network across all metrics, with degradation noted only for a VGG16 architecture. These results indicate that this improvement may be a general property of deep networks and should be explored in other areas. Moreover, the improvement comes at a negligible additional cost in computation time, making it a practical method for other applications.
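One common way to combine image features with patient metadata is simple concatenation ("early fusion") before the classifier. The abstract does not describe how the metadata were encoded, so the sketch below is an assumption throughout: the vocabularies, the age scaling, the missing-age flag and all names (`PatientMetadata`, `encode_metadata`, `fuse`) are hypothetical. In practice `image_features` would be the pooled output of an ImageNet-pretrained network.

```python
from dataclasses import dataclass
from typing import List, Optional

# Hypothetical vocabularies; the paper's metadata encoding is not given in the abstract.
SEX_VALUES = ["male", "female", "unknown"]
SITE_VALUES = ["back", "face", "lower extremity", "trunk", "upper extremity", "unknown"]


@dataclass
class PatientMetadata:
    age: Optional[float]  # years; None if missing
    sex: str
    site: str  # anatomical site of the lesion


def one_hot(value: str, vocabulary: List[str]) -> List[float]:
    """One-hot encode a categorical value; unseen values map to 'unknown'."""
    if value not in vocabulary:
        value = "unknown"
    return [1.0 if v == value else 0.0 for v in vocabulary]


def encode_metadata(meta: PatientMetadata, max_age: float = 100.0) -> List[float]:
    """Scaled age, a missing-age flag, then one-hot sex and site."""
    missing = meta.age is None
    age = 0.5 if missing else min(meta.age, max_age) / max_age
    return ([age, 1.0 if missing else 0.0]
            + one_hot(meta.sex, SEX_VALUES)
            + one_hot(meta.site, SITE_VALUES))


def fuse(image_features: List[float], meta: PatientMetadata) -> List[float]:
    """Early fusion: append the encoded metadata to the CNN feature vector."""
    return list(image_features) + encode_metadata(meta)
```

The fused vector, for example `fuse(cnn_vector, PatientMetadata(age=45, sex="female", site="back"))`, would then be passed to whatever classifier sits on top of the frozen feature extractor. Because the metadata adds only a handful of dimensions, this is consistent with the negligible extra computation the abstract reports.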
Topics: Humans; Machine Learning; Metadata; Neural Networks, Computer
PubMed: 34891799
DOI: 10.1109/EMBC46164.2021.9630047

Studies in Health Technology and... Aug 2019
Metadata matching is an important step towards integrating heterogeneous healthcare data and facilitating its secondary use. MDRCupid supports this step by providing a configurable metadata matching toolbox that incorporates lexical and statistical matching approaches. The matching configuration can be adapted to different purposes, either by manually selecting algorithms and their weights or by using the optimization module with corresponding training data. The toolbox can be accessed as a web service through either a programming interface or a user interface. In the user interface, the metadata elements with the highest similarity scores are presented for every selected metadata element and can be confirmed manually, whereas the programming interface uses a similarity threshold to select corresponding elements. Matches are saved as an HL7 FHIR ConceptMap. Manually confirmed matches may be used as new training data for the optimizer to further improve the matching parameters.
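The two access modes described above (ranked candidates for manual confirmation versus threshold-based selection) can be illustrated with a weighted combination of similarity measures. This is a minimal sketch, not MDRCupid's code: the abstract does not name its algorithms, so two simple lexical measures (difflib's character ratio and token Jaccard overlap) stand in for them, and the weights, threshold and function names are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional, Tuple


def char_similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (difflib's ratio)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def token_jaccard(a: str, b: str) -> float:
    """Jaccard overlap of whitespace-separated tokens."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


# Hypothetical matcher registry; weights select and balance these by name.
MATCHERS = {"char": char_similarity, "token": token_jaccard}


def combined_score(a: str, b: str, weights: Dict[str, float]) -> float:
    """Weighted average of the selected matchers' scores."""
    total = sum(weights.values())
    return sum(w * MATCHERS[name](a, b) for name, w in weights.items()) / total


def rank_candidates(source: str, targets: List[str], weights: Dict[str, float],
                    top_k: int = 3) -> List[Tuple[float, str]]:
    """UI-style: return the top-k candidates for manual confirmation."""
    scored = sorted(((combined_score(source, t, weights), t) for t in targets), reverse=True)
    return scored[:top_k]


def auto_match(source: str, targets: List[str], weights: Dict[str, float],
               threshold: float) -> Optional[str]:
    """API-style: accept the best candidate only if it clears the threshold."""
    best_score, best = max((combined_score(source, t, weights), t) for t in targets)
    return best if best_score >= threshold else None
```

With equal weights, `auto_match("Body weight", ["Body Weight (kg)", "Heart rate", "Body height"], {"char": 0.5, "token": 0.5}, 0.6)` selects `"Body Weight (kg)"`: the token measure penalises "Body height" even though its character similarity is high, which is the kind of trade-off the configurable weights are meant to tune.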
Topics: Algorithms; Delivery of Health Care; Metadata
PubMed: 31437891
DOI: 10.3233/SHTI190189

Database: the Journal of Biological... Jun 2022
The Gene Expression Omnibus (GEO) is a public archive containing >4 million digital samples from functional genomics experiments collected over almost two decades. The accompanying metadata describing the experiments suffer from redundancy, inconsistency and incompleteness due to the prevalence of free text and the lack of well-defined data formats and validation. To remedy this situation, we created Genomic Metadata Integration (GeMI; http://gmql.eu/gemi/), a web application that learns to automatically extract structured metadata (in the form of key-value pairs) from the plain-text descriptions of GEO experiments. The extracted information can then be indexed for structured search and used for various downstream data mining activities. GeMI works in continuous interaction with its users. The transformer-based natural language processing model at the core of our system is a fine-tuned version of the Generative Pre-trained Transformer 2 (GPT2) model that can learn continuously from user feedback thanks to an active learning framework designed for this purpose. As part of this framework, a machine learning interpretation mechanism based on saliency maps allows users to judge quickly and easily whether the model's predictions are correct, improving overall usability. GeMI's ability to extract attributes not explicitly mentioned (such as sex, tissue type, cell type, ethnicity and disease) allows researchers to perform specific queries and classification of experiments, which was previously possible only after spending time and resources on tedious manual annotation. The usefulness of GeMI is demonstrated on practical research use cases. Database URL http://gmql.eu/gemi/.
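GeMI's extraction model (a fine-tuned GPT2) is beyond a short example, but the downstream step of indexing extracted key-value pairs for structured search can be sketched. The following is a minimal illustration only: the record layout, attribute names and sample IDs are hypothetical, and it uses a plain inverted index rather than anything GeMI is known to use.

```python
from collections import defaultdict
from typing import Dict, Optional, Set, Tuple

Index = Dict[Tuple[str, str], Set[str]]


def build_index(records: Dict[str, Dict[str, str]]) -> Index:
    """Map each (key, value) pair to the set of sample IDs that carry it.

    records: sample_id -> extracted key-value pairs (hypothetical layout).
    """
    index: Index = defaultdict(set)
    for sample_id, pairs in records.items():
        for key, value in pairs.items():
            index[(key.lower(), value.lower())].add(sample_id)
    return index


def query(index: Index, **criteria: str) -> Set[str]:
    """Return sample IDs matching all key=value criteria (AND semantics)."""
    result: Optional[Set[str]] = None
    for key, value in criteria.items():
        # .get avoids inserting empty entries into the defaultdict.
        hits = index.get((key.lower(), value.lower()), set())
        result = set(hits) if result is None else result & hits
    return set() if result is None else result
```

For instance, `query(index, sex="female", tissue_type="liver")` intersects the posting sets for both attributes, the kind of specific query the abstract says previously required manual annotation.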
Topics: Data Mining; Genomics; Machine Learning; Metadata; Software
PubMed: 35657113
DOI: 10.1093/database/baac036

Nature Methods Dec 2021
Topics: Cell Nucleus; Humans; Image Processing, Computer-Assisted; Information Dissemination; Medical Informatics; Metadata; Microscopy; Software
PubMed: 34862504
DOI: 10.1038/s41592-021-01342-w

Computational Intelligence and... 2022
The study of ethnic traditional sports is a comprehensive interdisciplinary field that draws on and integrates theories and methods from related disciplines, particularly anthropology, to reveal the laws governing the emergence and development of ethnic traditional sports. Historically, the evolution of ethnic traditional sports has contributed to the construction of theoretical systems, the transmission of traditional culture, the formation of distinctive fitness techniques, the intensification of socioeconomic interests, and the development of cultural self-confidence. However, most teaching resources for ethnic traditional sports are unstructured data, including documents, images, animations, programs, tools, and videos. Instructional design relies heavily on teachers of ethnic traditional physical education, and there is no unified platform for integrating and sharing these resources. This paper focuses on the effective integration of high-quality digital resources for traditional ethnic sports for network sharing, covering the preparation, supplementation, and description of resource metadata specifications; the clustering, integration, and cataloging of digital resources; and the implementation of retrieval and service interfaces.
Topics: Metadata; Physical Education and Training; Sports
PubMed: 36131894
DOI: 10.1155/2022/6505770

Bioinformatics (Oxford, England) Feb 2022
MOTIVATION
Disorders of the metabolome and microbiome are closely associated with human health, and there is great demand for dual-omics interaction analysis. Here, we designed and developed an integrative platform, 3MCor, for metabolome and microbiome correlation analysis that is guided by phenotype and takes confounders into account.
RESULTS
Many traditional and novel correlation analysis methods were integrated for intra- and inter-correlation analysis. Three inter-correlation pipelines are provided: global, hierarchical and pairwise. The built-in network analysis function helps to rapidly identify network clusters and key nodes in a complex correlation network. Complete numerical results (CSV files) and rich figures (PDF files) are generated within minutes. To our knowledge, 3MCor is the first platform developed specifically for the correlation analysis of metabolome and microbiome. Its functions were compared with the corresponding modules of existing omics data analysis platforms. A real-world dataset was used to demonstrate its simple and flexible operation, comprehensive outputs and distinctive contribution to dual-omics studies.
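The pairwise inter-correlation pipeline can be illustrated by computing a rank correlation for every metabolite-microbe pair. This is a minimal Python sketch assuming Spearman correlation as the method; 3MCor itself integrates many methods, runs on an R backend, and adds phenotype guidance and confounder adjustment, none of which is reproduced here. The function names and data layout are hypothetical.

```python
from itertools import product
from math import sqrt
from typing import Dict, List, Tuple


def average_ranks(values: List[float]) -> List[float]:
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x: List[float], y: List[float]) -> float:
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return float("nan")  # correlation undefined for a constant profile
    return cov / (sx * sy)


def spearman(x: List[float], y: List[float]) -> float:
    """Spearman's rho = Pearson correlation of the (tie-averaged) ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def pairwise_inter_correlation(
    metabolites: Dict[str, List[float]], microbes: Dict[str, List[float]]
) -> Dict[Tuple[str, str], float]:
    """Correlate every metabolite with every microbe over the same samples."""
    return {
        (m, b): spearman(mv, bv)
        for (m, mv), (b, bv) in product(metabolites.items(), microbes.items())
    }
```

A real analysis would also correct the resulting p-values for multiple testing, since the number of pairs grows as metabolites × microbes.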
AVAILABILITY AND IMPLEMENTATION
3MCor is available at http://3mcor.cn and the backend R script is available at https://github.com/chentianlu/3MCorServer.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Humans; Software; Metadata; Metabolome; Computers; Microbiota
PubMed: 34874987
DOI: 10.1093/bioinformatics/btab818

Studies in Health Technology and... May 2023
The OMOP Common Data Model (CDM) is designed for analyzing large clinical datasets and building cohorts for medical research, which requires Extract-Transform-Load (ETL) processes for local heterogeneous medical data. We present a concept for developing and evaluating a modularized, metadata-driven ETL process that can transform data into the OMOP CDM regardless of 1) the source data format, 2) its version and 3) its context of use.
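To make "metadata-driven" concrete: the transformation code stays generic, while a mapping table keyed by source and version describes where each target field comes from. This is a minimal sketch targeting a few columns of the OMOP CDM PERSON table; the source names, field names and mapping layout are hypothetical and not taken from the paper. It assumes the standard OMOP gender concepts 8507 (male) and 8532 (female), with 0 meaning "no matching concept".

```python
from datetime import date
from typing import Dict

# Hypothetical mapping metadata: one entry per (source, version).
# Supporting a new source version means adding metadata, not code.
MAPPINGS: Dict[tuple, dict] = {
    ("hospital_a", "v1"): {
        "id": "pid", "sex": "gender", "birth_date": "dob",
        "sex_codes": {"M": 8507, "F": 8532},
    },
    ("hospital_a", "v2"): {
        "id": "patient_id", "sex": "sex", "birth_date": "birth_date",
        "sex_codes": {"male": 8507, "female": 8532},
    },
}


def to_omop_person(row: Dict[str, str], source: str, version: str) -> dict:
    """Transform one source record into a subset of OMOP PERSON columns."""
    m = MAPPINGS[(source, version)]
    birth = date.fromisoformat(row[m["birth_date"]])  # expects YYYY-MM-DD
    return {
        "person_id": int(row[m["id"]]),
        # Unmapped source codes fall back to concept 0 ("no matching concept").
        "gender_concept_id": m["sex_codes"].get(row[m["sex"]], 0),
        "year_of_birth": birth.year,
        "month_of_birth": birth.month,
        "day_of_birth": birth.day,
    }
```

A complete PERSON record also needs further required columns (for example race and ethnicity concepts), which would be added to the mapping metadata in the same way.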
Topics: Metadata; Electronic Health Records; Databases, Factual; Biomedical Research
PubMed: 37203486
DOI: 10.3233/SHTI230256

BMC Medical Informatics and Decision... May 2021
BACKGROUND
The variety of medical documentation often leads to incompatible data elements that impede data integration between institutions. A common approach to standardizing and distributing metadata definitions is the use of ISO/IEC 11179-compliant metadata repositories with top-down standardization. To the best of our knowledge, however, reusing the content of publicly accessible metadata repositories to create case report forms or routine documentation is not yet common practice. We suggest an alternative concept, the pragmatic metadata repository, which enables a community-driven, bottom-up approach to agreeing on data collection models. A pragmatic metadata repository collects real-world documentation and regards frequently used metadata definitions as high quality with potential for reuse.
METHODS
We implemented a pragmatic metadata repository proof of concept application and filled it with medical forms from the Portal of Medical Data Models. We applied this prototype in two use cases to demonstrate its capabilities for reusing metadata: first, integration into a study editor for the suggestion of data elements and, second, metadata synchronization between two institutions. Moreover, we evaluated the emergence of bottom-up standards in the prototype and two medical data managers assessed their quality for 24 medical concepts.
RESULTS
The resulting prototype contained 466,569 unique metadata definitions. Integration into the study editor led to the reuse of 1836 items and item groups. During metadata synchronization, semantic codes of 4608 data elements were transferred. Our evaluation revealed that weak bottom-up standards could be established for less complex medical concepts. However, more diverse disease-related concepts showed no convergence of data elements due to the enormous heterogeneity of metadata. The survey showed fair agreement (κ = 0.50, 95% CI 0.43-0.56) for good item quality of bottom-up standards.
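The agreement figure above is a kappa coefficient. Assuming it is Cohen's kappa for the two data managers' ratings (the abstract does not state which variant was used), it corrects the observed agreement p_o for the agreement p_e expected by chance: κ = (p_o − p_e) / (1 − p_e). A minimal sketch follows; the ratings in the usage note are toy data, not the study's.

```python
from collections import Counter
from typing import Sequence


def cohens_kappa(rater_a: Sequence[str], rater_b: Sequence[str]) -> float:
    """Cohen's kappa for two raters who labelled the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    # Observed agreement p_o: share of items given identical labels.
    p_o = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    # Chance agreement p_e from each rater's marginal label frequencies.
    ca, cb = Counter(rater_a), Counter(rater_b)
    p_e = sum(ca[label] * cb[label] for label in ca.keys() | cb.keys()) / (n * n)
    if p_e == 1:
        return 1.0  # both raters used one identical label for every item
    return (p_o - p_e) / (1 - p_e)
```

With toy ratings `["good", "good", "bad", "bad"]` and `["good", "bad", "bad", "bad"]`, p_o = 0.75 and p_e = 0.5, giving κ = 0.5: the raters agree on three of four items, but half of that agreement would be expected by chance.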
CONCLUSIONS
We demonstrated the feasibility of the pragmatic metadata repository concept for medical documentation. Applications of the prototype in two use cases suggest that it facilitates the reuse of data elements. Our evaluation showed that bottom-up standardization based on a large collection of real-world metadata can yield useful results. The proposed concept is not intended to replace existing top-down approaches; rather, it complements them by showing what is commonly used in the community to guide other researchers.
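The core idea of a pragmatic metadata repository, treating frequently used definitions as candidate standards, can be sketched as counting normalized data element definitions collected for one medical concept. This is a minimal illustration under stated assumptions: the (label, datatype, permissible values) tuple, the normalization rule and the `min_share` cutoff are hypothetical and do not describe the prototype's actual logic.

```python
from collections import Counter
from typing import List, Optional, Sequence, Tuple

# Hypothetical representation: (label, datatype, permissible values).
Element = Tuple[str, str, Sequence[str]]
Definition = Tuple[str, str, Tuple[str, ...]]


def normalize(element: Element) -> Definition:
    """Canonical form: trimmed lowercase label, datatype, sorted lowercase values."""
    label, datatype, values = element
    return (label.strip().lower(), datatype,
            tuple(sorted(v.strip().lower() for v in values)))


def bottom_up_standard(
    elements: List[Element], min_share: float = 0.5
) -> Tuple[Optional[Definition], float]:
    """Return the most frequent definition if its share reaches min_share."""
    if not elements:
        raise ValueError("no data elements collected for this concept")
    counts = Counter(normalize(e) for e in elements)
    definition, freq = counts.most_common(1)[0]
    share = freq / len(elements)
    return (definition, share) if share >= min_share else (None, share)
```

A low share for the top definition corresponds to the lack of convergence the authors report for heterogeneous, disease-related concepts; a high share signals a usable bottom-up standard.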
Topics: Documentation; Humans; Metadata; Reference Standards; Semantics
PubMed: 34001121
DOI: 10.1186/s12911-021-01524-8

Handbook of Experimental Pharmacology 2020
While research data has become integral to the scholarly endeavour, a number of challenges hinder its development, management and dissemination. This chapter follows the life cycle of research data, considering aspects ranging from storage and preservation to sharing and legal factors. Besides providing a wide overview of the current ecosystem, it identifies the elements of modern research data sharing practice, such as metadata creation, the FAIR principles, identifiers, Creative Commons licensing and the various repository options. Furthermore, the chapter discusses the mandates and regulations that influence data sharing and possible technological means of overcoming their complexity, such as blockchain systems.
Topics: Data Collection; Ecosystem; Information Dissemination; Information Storage and Retrieval; Metadata
PubMed: 31792682
DOI: 10.1007/164_2019_288

Studies in Health Technology and... 2018
Metadata management is an important task in medical informatics and strongly affects how much value can be gained from existing health information data. Data warehouse solutions such as Informatics for Integrating Biology and the Bedside (i2b2) are common tools for identifying patient cohorts and analyzing collected clinical data while respecting patient privacy. The Resource Description Framework (RDF) is designed for highly interoperable ontology representation in various formats, facilitating ontology and metadata management. Our approach combines the benefits of i2b2 and RDF by importing the easy-to-edit RDF ontology into the i2b2 software, which supports extensive research. We do so by using a SPARQL Protocol and RDF Query Language (SPARQL) interface, which enables RDF data queries, and by developing a Java program that generates i2b2-specific SQL insert statements. To demonstrate the feasibility of our solution, we transcribe our lung-disease-specific ontology to RDF and import it into our i2b2 data warehouse.
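The last step described above, turning an ontology hierarchy retrieved via SPARQL into i2b2-specific SQL insert statements, can be sketched as follows. The authors used a Java program; this Python sketch assumes the SPARQL result has already been reduced to parent and label dictionaries, writes only a simplified subset of i2b2 ontology-table columns into a hypothetical table `i2b2_ontology`, and assumes a database with standard SQL string literals (backslashes not treated as escapes).

```python
from typing import Dict, List, Optional


def sql_literal(text: str) -> str:
    """Quote a string for SQL by doubling embedded single quotes."""
    return "'" + text.replace("'", "''") + "'"


def build_paths(parents: Dict[str, Optional[str]], labels: Dict[str, str]) -> Dict[str, str]:
    """Build i2b2-style backslash paths (e.g. \\Root\\Child\\) for each concept."""
    paths: Dict[str, str] = {}

    def path_of(concept: str) -> str:
        if concept not in paths:
            parent = parents[concept]
            prefix = "\\" if parent is None else path_of(parent)
            paths[concept] = prefix + labels[concept] + "\\"
        return paths[concept]

    for concept in parents:
        path_of(concept)
    return paths


def i2b2_inserts(parents: Dict[str, Optional[str]], labels: Dict[str, str],
                 table: str = "i2b2_ontology") -> List[str]:
    """Emit one INSERT per concept; folders ('FA') have children, leaves ('LA') do not."""
    paths = build_paths(parents, labels)
    has_children = {p for p in parents.values() if p is not None}
    statements = []
    for concept, path in sorted(paths.items(), key=lambda kv: kv[1]):
        level = path.count("\\") - 2  # root concept -> level 0 in this sketch
        visual = "FA" if concept in has_children else "LA"
        statements.append(
            f"INSERT INTO {table} (c_hlevel, c_fullname, c_name, c_visualattributes, c_basecode) "
            f"VALUES ({level}, {sql_literal(path)}, {sql_literal(labels[concept])}, "
            f"'{visual}', {sql_literal(concept)});"
        )
    return statements
```

For a toy hierarchy with a root "Lung diseases" and children "Asthma" and "COPD", this yields three statements: one folder row at level 0 and two leaf rows at level 1. A production import would also fill the other columns that the i2b2 ontology table requires.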
Topics: Biological Ontologies; Data Warehousing; Humans; Medical Informatics; Metadata; Software
PubMed: 30147037
DOI: No ID Found