-
Journal of Microbiology & Biology... Jun 2024
The current and ongoing challenges brought on by climate change will require future scientists who have hands-on experience using advanced molecular techniques, can work with large data sets, and can make correlations between metadata and microbial diversity. A course-embedded research project can prepare students to answer complex research questions that might help plants adapt to climate change. The project described herein uses plants as a host to study the impact of climate change-induced drought on host-microbe interactions through next-generation DNA sequencing and analysis using a command-line program. Specifically, the project studies the impact of simulated drought on the rhizosphere microbiome of Fast Plants rapid cycling using inexpensive greenhouse supplies and 16S rRNA V3/V4 Illumina sequencing. Data analysis is performed with the freely accessible Python-based microbiome bioinformatics platform QIIME 2.
PubMed: 38888313
DOI: 10.1128/jmbe.00046-24 -
ArXiv Jun 2024
BACKGROUND –
Limited universally adopted data standards in veterinary science hinder data interoperability and therefore integration and comparison; this ultimately impedes the application of existing information-based tools to support advancement in veterinary diagnostics, treatments, and precision medicine.
HYPOTHESIS/OBJECTIVES –
Creation of a Vertebrate Breed Ontology (VBO) as a single, coherent logic-based standard for documenting breed names in animal health, production and research-related records will improve data use capabilities in veterinary and comparative medicine.
ANIMALS –
No live animals were used in this study.
METHODS –
A list of breed names and related information was compiled from relevant sources, organizations, communities, and experts using manual and computational approaches to create VBO. Each breed is represented by a VBO term that includes all provenance and the breed's related information as metadata. VBO terms are classified using description logic to allow computational applications and Artificial Intelligence-readiness.
RESULTS –
VBO is an open, community-driven ontology representing over 19,000 livestock and companion animal breeds covering 41 species. Breeds are classified based on community and expert conventions (e.g., horse breed, cattle breed). This classification is supported by relations to the breeds' genus and species indicated by NCBI Taxonomy terms. Relationships between VBO terms, e.g. relating breeds to their foundation stock, provide additional context to support advanced data analytics. VBO term metadata includes common names and synonyms, breed identifiers/codes, and attributed cross-references to other databases.
CONCLUSION AND CLINICAL IMPORTANCE –
Veterinary data interoperability and computability can be enhanced by the adoption of VBO as a source of standard breed names in databases and veterinary electronic health records.
PubMed: 38883236
DOI: No ID Found -
Digital Health 2024
OBJECTIVE
This study aimed to determine the status of scientific production on biosensor usage for human health monitoring.
METHODS
We used bibliometrics based on the data and metadata retrieved from the Web of Science between 2007 and 2022. Articles unrelated to health and medicine were excluded. The databases were processed using the VOSviewer software and auxiliary spreadsheets. Data extraction yielded 275 articles published in 161 journals (mainly concentrated in 13 journals) and 881 Keywords Plus terms.
RESULTS
Twenty-seven Keywords Plus terms of high occurrence were identified, each with 7 to 30 occurrences. Of the 1595 identified authors, 125 were consistently connected in the coauthorship network in the total set and were grouped into nine clusters. Using Lotka's law, we identified 24 prolific authors, and Hirsch index analysis revealed that 45 articles were cited more than 45 times. Overlaps were identified between 17 articles in the Hirsch index and 17 prolific authors, highlighting the presence of a large set of prolific authors from various interconnected clusters, a triad, and a solitary prolific author.
CONCLUSION
An exponential trend was observed in biosensor research for health monitoring, identifying areas of innovation, collaboration, and technological challenges that can guide future research on this topic.
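The Hirsch index analysis mentioned in the results above can be illustrated with a minimal sketch. This is not the authors' procedure (they report using VOSviewer and spreadsheets); the function name `h_index` and the citation counts are hypothetical and chosen only to show how the index is conventionally defined: the largest h such that h articles each have at least h citations.

```python
def h_index(citations):
    """Return the largest h such that h items have at least h citations each."""
    # Rank items from most cited to least cited.
    ranked = sorted(citations, reverse=True)
    h = 0
    for rank, count in enumerate(ranked, start=1):
        if count >= rank:
            h = rank  # the top `rank` items all have at least `rank` citations
        else:
            break
    return h


# Hypothetical citation counts, not data from the study.
example_counts = [10, 8, 5, 4, 3, 0]
print(h_index(example_counts))
```

Under this definition, a set in which 45 articles each reach the 45-citation mark would have an index of at least 45, which is how the figure in the results can be read.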
PubMed: 38882252
DOI: 10.1177/20552076241256876 -
Health Informatics Journal 2024
This study aims to address the critical challenges of data integrity, accuracy, consistency, and precision in the application of electronic medical record (EMR) data within the healthcare sector, particularly within the context of Chinese medical information data management. The research seeks to propose a solution in the form of a medical metadata governance framework that is efficient and suitable for clinical research and transformation. The article begins by outlining the background of medical information data management and reviews the advancements in artificial intelligence (AI) technology relevant to the field. It then introduces the "Service, Patient, Regression, base/Away, Yeast" (SPRAY)-type AI application as a case study to illustrate the potential of AI in EMR data management. The research identifies the scarcity of scientific research on the transformation of EMR data in Chinese hospitals and proposes a medical metadata governance framework as a solution. This framework is designed to achieve scientific governance of clinical data by integrating metadata management and master data management, grounded in clinical practices, medical disciplines, and scientific exploration. Furthermore, it incorporates an information privacy security architecture to ensure data protection. The proposed medical metadata governance framework, supported by AI technology, offers a structured approach to managing and transforming EMR data into valuable scientific research outcomes. This framework provides guidance for the identification, cleaning, mining, and deep application of EMR data, thereby addressing the bottlenecks currently faced in the healthcare scenario and paving the way for more effective clinical research and data-driven decision-making.
Topics: Artificial Intelligence; China; Humans; Electronic Health Records; Data Management; Metadata
PubMed: 38881290
DOI: 10.1177/14604582241262961 -
Journal of Biomedical Informatics Jun 2024
BACKGROUND
Studies confirm that significant biases exist in online recommendation platforms, exacerbating pre-existing disparities and leading to less-than-optimal outcomes for underrepresented demographics. We study issues of bias in inclusion and representativeness in the context of healthcare information disseminated via videos on the YouTube social media platform, a widely used online channel for multi-media rich information. With one in three US adults using the Internet to learn about a health concern, it is critical to assess inclusivity and representativeness regarding how health information is disseminated by digital platforms such as YouTube.
METHODS
Leveraging methods from fair machine learning (ML), natural language processing and voice and facial recognition methods, we examine inclusivity and representativeness of video content presenters using a large corpus of videos and their metadata on a chronic condition (diabetes) extracted from the YouTube platform. Regression models are used to determine whether presenter demographics impact video popularity, measured by the video's average daily view count. A video that generates a higher view count is considered to be more popular.
RESULTS
The voice and facial recognition methods predicted the gender and race of the presenter with reasonable success. Gender is predicted through voice recognition (accuracy = 78%, AUC = 76%), while the gender and race predictions use facial recognition (accuracy = 93%, AUC = 92% and accuracy = 82%, AUC = 80%, respectively). The gender of the presenter is more significant for video views only when the face of the presenter is not visible, while videos with male presenters with no face visibility have a positive relationship with view counts. Furthermore, videos with white and male presenters have a positive influence on view counts, while videos with female and non-white presenters have high view counts.
CONCLUSION
Presenters' demographics do have an influence on average daily view count of videos viewed on social media platforms as shown by advanced voice and facial recognition algorithms used for assessing inclusion and representativeness of the video content. Future research can explore short videos and those at the channel level because popularity of the channel name and the number of videos associated with that channel do have an influence on view counts.
PubMed: 38880237
DOI: 10.1016/j.jbi.2024.104669 -
Scientific Data Jun 2024
In low- and middle-income countries, the substantial costs associated with traditional data collection pose an obstacle to facilitating decision-making in the field of public health. Satellite imagery offers a potential solution, but image extraction and analysis can be costly and require specialized expertise. We introduce SatelliteBench, a scalable framework for satellite image extraction and vector embeddings generation. We also propose a novel multimodal fusion pipeline that utilizes a series of satellite imagery and metadata. The framework was evaluated by generating a dataset of 12,636 images and embeddings, accompanied by comprehensive metadata, from 81 municipalities in Colombia between 2016 and 2018. The dataset was then evaluated on three tasks: dengue case prediction, poverty assessment, and access to education. The performance showcases the versatility and practicality of SatelliteBench, offering a reproducible, accessible and open tool to enhance decision-making in public health.
Topics: Satellite Imagery; Colombia; Public Health; Humans; Dengue; Metadata
PubMed: 38879585
DOI: 10.1038/s41597-024-03366-1 -
JMIR AI Mar 2024
BACKGROUND
Large curated data sets are required to leverage speech-based tools in health care. These are costly to produce, resulting in increased interest in data sharing. As speech can potentially identify speakers (ie, voiceprints), sharing recordings raises privacy concerns. This is especially relevant when working with patient data protected under the Health Insurance Portability and Accountability Act.
OBJECTIVE
We aimed to determine the reidentification risk for speech recordings, without reference to demographics or metadata, in clinical data sets considering both the size of the search space (ie, the number of comparisons that must be considered when reidentifying) and the nature of the speech recording (ie, the type of speech task).
METHODS
Using a state-of-the-art speaker identification model, we modeled an adversarial attack scenario in which an adversary uses a large data set of identified speech (hereafter, the known set) to reidentify as many unknown speakers in a shared data set (hereafter, the unknown set) as possible. We first considered the effect of search space size by attempting reidentification with various sizes of known and unknown sets using VoxCeleb, a data set with recordings of natural, connected speech from >7000 healthy speakers. We then repeated these tests with different types of recordings in each set to examine whether the nature of a speech recording influences reidentification risk. For these tests, we used our clinical data set composed of recordings of elicited speech tasks from 941 speakers.
RESULTS
We found that the risk was inversely related to the number of comparisons an adversary must consider (ie, the search space), with a positive linear correlation between the number of false acceptances (FAs) and the number of comparisons (r=0.69; P<.001). The true acceptances (TAs) stayed relatively stable, and the ratio between FAs and TAs rose from 0.02 at 1 × 10 comparisons to 1.41 at 6 × 10 comparisons, with a near 1:1 ratio at the midpoint of 3 × 10 comparisons. In effect, risk was high for a small search space but dropped as the search space grew. We also found that the nature of a speech recording influenced reidentification risk, with nonconnected speech (eg, vowel prolongation: FA/TA=98.5; alternating motion rate: FA/TA=8) being harder to identify than connected speech (eg, sentence repetition: FA/TA=0.54) in cross-task conditions. The inverse was mostly true in within-task conditions, with the FA/TA ratio for vowel prolongation and alternating motion rate dropping to 0.39 and 1.17, respectively.
CONCLUSIONS
Our findings suggest that speaker identification models can be used to reidentify participants in specific circumstances, but in practice, the reidentification risk appears small. The variation in risk due to search space size and type of speech task provides actionable recommendations to further increase participant privacy and considerations for policy regarding public release of speech recordings.
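The two quantities the results above rely on, the size of the search space and the FA/TA ratio, can be sketched as follows. This is an illustrative reading, not the study's code: the helper names are hypothetical, the sketch assumes every unknown recording is compared against every known speaker (so comparisons = known × unknown), and the counts in the usage lines are invented to reproduce a ratio of the same magnitude as one reported figure.

```python
def search_space(n_known, n_unknown):
    """Number of pairwise comparisons, assuming each unknown speaker
    is scored against each known speaker (an assumption of this sketch)."""
    return n_known * n_unknown


def fa_ta_ratio(false_acceptances, true_acceptances):
    """Ratio of false acceptances (FAs) to true acceptances (TAs).

    Values above 1 mean wrong matches outnumber correct ones, so an
    accepted match is more likely to be a misidentification.
    """
    if true_acceptances == 0:
        raise ValueError("ratio is undefined when there are no true acceptances")
    return false_acceptances / true_acceptances


# Hypothetical counts, not data from the study.
print(search_space(1000, 50))
print(fa_ta_ratio(27, 50))
```

Read this way, a ratio well below 1 (as reported for sentence repetition in cross-task conditions) corresponds to higher reidentification risk, while a ratio far above 1 (as for vowel prolongation) means most accepted matches are wrong.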
PubMed: 38875581
DOI: 10.2196/52054 -
PloS One 2024
Data curators play an important role in assessing data quality and take actions that may ultimately lead to better, more valuable data products. This study explores the curation practices of data curators working within US-based data repositories. We performed a survey in January 2021 to benchmark the levels of curation performed by repositories and assess the perceived value and impact of curation on the data sharing process. Our analysis included 95 responses from 59 unique data repositories. Respondents primarily were professionals working within repositories and examined curation performed within a repository setting. A majority (72.6%) of respondents reported that "data-level" curation was performed by their repository, and around half reported their repository took steps to ensure interoperability and reproducibility of their repository's datasets. Curation actions most frequently reported include checking for duplicate files, reviewing documentation, reviewing metadata, minting persistent identifiers, and checking for corrupt/broken files. The most "value-add" curation action across generalist, institutional, and disciplinary repository respondents was related to reviewing and enhancing documentation. Respondents reported high perceived impact of curation by their repositories on specific data sharing outcomes including usability, findability, understandability, and accessibility of deposited datasets; respondents associated with disciplinary repositories tended to perceive higher impact on most outcomes. Most survey participants strongly agreed that data curation by the repository adds value to the data sharing process and that it outweighs the effort and cost. We found some differences between institutional and disciplinary repositories, both in the reported frequency of specific curation actions as well as the perceived impact of data curation. Interestingly, we also found variation in the perceptions of those working within the same repository regarding the level and frequency of curation actions performed, which exemplifies the complexity of repository curation work. Our results suggest data curation may be better understood in terms of specific curation actions and outcomes than broadly defined curation levels and that more research is needed to understand the resource implications of performing these activities. We share these results to provide a more nuanced view of curation, and how curation impacts the broader data lifecycle and data sharing behaviors.
Topics: Humans; Data Curation; Surveys and Questionnaires; United States; Information Dissemination; Data Accuracy; Databases, Factual; Reproducibility of Results
PubMed: 38875230
DOI: 10.1371/journal.pone.0301171 -
Scientific Data Jun 2024
Facilitating data sharing in scientific research, especially in the domain of animal studies, holds immense value, particularly in mitigating distress and enhancing the efficiency of data collection. This study unveils a meticulously curated collection of neural activity data extracted from six electrophysiological datasets recorded from three parietal areas (V6A, PEc, PE) of two Macaca fascicularis during an instructed-delay foveated reaching task. This valuable resource is now accessible to the public, featuring spike timestamps, behavioural event timings and supplementary metadata, all presented alongside a comprehensive description of the encompassing structure. To enhance accessibility, data are stored as HDF5 files, a convenient format due to its flexible structure and the capability to attach diverse information to each hierarchical sub-level. To guarantee ready-to-use datasets, we also provide some MATLAB and Python code examples, enabling users to quickly familiarize themselves with the data structure.
Topics: Animals; Parietal Lobe; Macaca fascicularis
PubMed: 38871737
DOI: 10.1038/s41597-024-03479-7 -
Data in Brief Jun 2024
This paper presents the data (images, observations, metadata) of three different deployments of camera traps in the Amsterdam Water Supply Dunes, a Natura 2000 nature reserve in the coastal dunes of the Netherlands. The pilots were aimed at determining how different types of camera deployment (e.g. regular vs. wide lens, various heights, inside/outside exclosures) might influence species detections, and how to deploy autonomous wildlife monitoring networks. Two pilots were conducted in herbivore exclosures and mainly detected European rabbits (Oryctolagus cuniculus) and red fox (Vulpes vulpes). The third pilot was conducted outside exclosures, with the European fallow deer (Dama dama) being most prevalent. Across all three pilots, a total of 47,597 images were annotated using the Agouti platform. All annotations were verified and quality-checked by a human expert. A total of 2,779 observations of 20 different species (including humans) were recorded using 11 wildlife cameras during 2021-2023. The raw image files (excluding humans), image metadata, deployment metadata and observations from each pilot are shared using the Camtrap DP open standard and the extended data publishing capabilities of GBIF to increase the findability, accessibility, interoperability, and reusability of this data. The data are freely available and can be used for developing artificial intelligence (AI) algorithms that automatically detect and identify species from wildlife camera images.
PubMed: 38868386
DOI: 10.1016/j.dib.2024.110544