BMC Medicine Mar 2024
BACKGROUND
The specific microbiota and associated metabolites linked to non-alcoholic fatty liver disease (NAFLD) are still controversial. Thus, we aimed to understand how the core gut microbiota and metabolites impact NAFLD.
METHODS
The data for the discovery cohort were collected from the Guangzhou Nutrition and Health Study (GNHS) follow-up conducted between 2014 and 2018. We collected 272 metadata points from 1546 individuals. The metadata were input into four interpretable machine learning models to identify important gut microbiota associated with NAFLD. These models were subsequently applied to two validation cohorts [the internal validation cohort (n = 377), and the prospective validation cohort (n = 749)] to assess generalizability. We constructed an individual microbiome risk score (MRS) based on the identified gut microbiota and conducted animal faecal microbiome transplantation experiment using faecal samples from individuals with different levels of MRS to determine the relationship between MRS and NAFLD. Additionally, we conducted targeted metabolomic sequencing of faecal samples to analyse potential metabolites.
RESULTS
Among the four machine learning models used, the lightGBM algorithm achieved the best performance. A total of 12 taxa-related features of the microbiota were selected by the lightGBM algorithm and further used to calculate the MRS. Increased MRS was positively associated with the presence of NAFLD, with odds ratio (OR) of 1.86 (1.72, 2.02) per 1-unit increase in MRS. An elevated abundance of the faecal microbiota (f__veillonellaceae) was associated with increased NAFLD risk, whereas f__rikenellaceae, f__barnesiellaceae, and s__adolescentis were associated with a decreased presence of NAFLD. Higher levels of specific gut microbiota-derived metabolites of bile acids (taurocholic acid) might be positively associated with both a higher MRS and NAFLD risk. FMT in mice further confirmed a causal association between a higher MRS and the development of NAFLD.
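How a microbiome risk score relates to the reported odds ratio can be sketched in a few lines. The taxa weights below are hypothetical (only their signs follow the directions of association in the abstract; the paper's fitted values are not given), and the odds calculation simply restates OR = 1.86 per 1-unit MRS increase on the log-odds scale:

```python
import math

# Hypothetical taxa weights: signs follow the directions of association
# reported in the abstract; the paper's fitted coefficients are not given.
WEIGHTS = {
    "f__veillonellaceae": +1.0,   # higher abundance -> higher NAFLD risk
    "f__rikenellaceae":   -1.0,   # associated with lower NAFLD presence
    "f__barnesiellaceae": -1.0,
    "s__adolescentis":    -1.0,
}

def microbiome_risk_score(abundances):
    """Simplified MRS: a weighted sum over the selected taxa."""
    return sum(w * abundances.get(taxon, 0.0) for taxon, w in WEIGHTS.items())

# OR = 1.86 per 1-unit MRS increase corresponds to a log-odds slope of
# ln(1.86), so a delta-unit increase multiplies the odds by 1.86**delta.
OR_PER_UNIT = 1.86

def odds_multiplier(delta_mrs):
    return math.exp(math.log(OR_PER_UNIT) * delta_mrs)
```

For example, a 2-unit rise in MRS would correspond to roughly 1.86² ≈ 3.5 times the odds of NAFLD under this model.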
CONCLUSIONS
We confirmed that an alteration in the composition of the core gut microbiota might be biologically relevant to NAFLD development. Our work demonstrated the role of the microbiota in the development of NAFLD.
Topics: Middle Aged; Humans; Animals; Mice; Aged; Non-alcoholic Fatty Liver Disease; Gastrointestinal Microbiome; Liver; Independent Living; Microbiota
PubMed: 38454425
DOI: 10.1186/s12916-024-03317-y
Foodborne Pathogens and Disease Sep 2023
Salmonella enterica (S. enterica) is a commensal organism or a pathogen causing disease in animals and humans, and it is widespread in the environment. Antimicrobial resistance (AMR) has increasingly affected both animal and human health and continues to raise public health concerns. A decade ago, it was estimated that the increased use of whole genome sequencing (WGS), combined with the sharing of public data, would drastically change and improve the surveillance and understanding of epidemiology and AMR. This study aimed to evaluate the current usefulness of public WGS data for surveillance and to investigate the associations between serovars, antibiotic resistance genes (ARGs), and metadata. Out of 191,306 genomes deposited in the European Nucleotide Archive and NCBI databases, 47,452 WGS records of S. enterica with sufficient minimum metadata (country, year, and source) were retrieved, representing isolates from 116 countries collected between 1905 and 2020. For analysis of the WGS data, KmerFinder, SISTR, and ResFinder were used for species, serovar, and AMR identification, respectively. The results showed that the five most common isolation sources of S. enterica are human (29.10%), avian (22.50%), environment (11.89%), water (9.33%), and swine (6.62%). The most common ARG profiles for each class of antimicrobials are β-lactam (; 6.78%), fluoroquinolone [([T57S], ); 0.87%], folate pathway antagonist (; 8.35%), macrolide [(A); 0.39%], phenicol (; 5.94%), polymyxin B (; 0.09%), and tetracycline [(A); 12.95%]. Our study reports the first overview of ARG profiles in publicly available S. enterica genomes from online databases. All data sets from this study can be searched at Microreact.
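The reported source percentages are, at heart, a tally over the minimum metadata fields. A minimal sketch of that aggregation, assuming an illustrative `source` field name (the study's actual pipeline and schema are not specified here):

```python
from collections import Counter

def source_breakdown(records):
    """Percentage of genomes per isolation source, rounded to two
    decimals -- the kind of tally behind the human/avian/environment/
    water/swine figures reported above. Field names are illustrative."""
    counts = Counter(r["source"] for r in records)
    total = sum(counts.values())
    return {src: round(100.0 * n / total, 2) for src, n in counts.items()}
```

Applied to a toy list of four records with sources human, human, avian, water, it returns 50%, 25%, and 25% respectively.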
Topics: Humans; Animals; Swine; Anti-Bacterial Agents; Metadata; Drug Resistance, Bacterial; Salmonella; Salmonella enterica; Drug Resistance, Multiple, Bacterial
PubMed: 37540138
DOI: 10.1089/fpd.2022.0080
Frontiers in Neuroinformatics 2023
BACKGROUND
Despite the efforts of the neuroscience community, there are many published neuroimaging studies with data that are still not findable, accessible, interoperable, or reusable (FAIR). Users face significant challenges in reusing neuroimaging data due to the lack of provenance metadata, such as experimental protocols, study instruments, and details about the study participants, which is also required for reproducible research. To implement the FAIR guidelines for neuroimaging data, we have developed an iterative ontology engineering process and used it to create the NeuroBridge ontology. The NeuroBridge ontology is a computable model of provenance terms that implements the FAIR principles; together with an international effort to annotate full-text articles with ontology terms, the ontology enables users to locate relevant neuroimaging datasets.
METHODS
Building on our previous work in metadata modeling, and in concert with an initial annotation of a representative corpus, we modeled diagnosis terms (e.g., schizophrenia, alcohol use disorder), magnetic resonance imaging (MRI) scan types (T1-weighted, task-based, etc.), clinical symptom assessments (PANSS, AUDIT), and a variety of other assessments. We used the annotation team's feedback to identify missing metadata terms, which were added to the NeuroBridge ontology, and we restructured the ontology to support both the final annotation of the corpus of neuroimaging articles by a second, independent set of annotators and the functionality of the NeuroBridge search portal for neuroimaging datasets.
RESULTS
The NeuroBridge ontology consists of 660 classes, 49 properties, and 3,200 axioms. The ontology includes mappings to existing ontologies, enabling NeuroBridge to interoperate with other domain-specific terminological systems. Using the ontology, we annotated 186 neuroimaging full-text articles, describing the participant types, scan types, and clinical and cognitive assessments.
CONCLUSION
The NeuroBridge ontology is the first computable metadata model that represents the types of data available in recent neuroimaging studies in schizophrenia and substance use disorder research; it can be extended to include more granular terms as needed. This metadata ontology is expected to form the computational foundation that helps investigators make their data FAIR-compliant and supports users in conducting reproducible neuroimaging research.
PubMed: 37554248
DOI: 10.3389/fninf.2023.1216443
MedRxiv : the Preprint Server For... Aug 2023
Text mining biomedical literature to identify extremely unbalanced data for digital epidemiology and systematic reviews: dataset and methods for a SARS-CoV-2 genomic epidemiology study.
There are many studies that require researchers to extract specific information from the published literature, such as details about sequence records or about a randomized controlled trial. While manual extraction is cost-efficient for small studies, larger studies such as systematic reviews are much more costly and time-consuming. To avoid exhaustive manual searches and extraction, and their associated cost and effort, natural language processing (NLP) methods can be tailored for the more subtle extraction and decision tasks that typically only humans have performed. The need for studies that use the published literature as a data source became even more evident as the COVID-19 pandemic raged through the world and millions of sequenced samples were deposited in public repositories such as GISAID and GenBank. These deposits promised large genomic epidemiology studies, yet more often than not they lacked important details that prevented large-scale analyses: granular geographic location and the most basic patient-relevant data, such as demographic information or clinical outcomes, were not noted in the sequence record. However, some of these data were indeed published, but in the text, tables, or supplementary material of a corresponding article. We present here methods to identify journal articles that report having produced, and made available in GenBank or GISAID, new SARS-CoV-2 sequences, as the articles that initially produced and shared the sequences are the most likely to include high-level details about the patients from whom the sequences were obtained. Human annotators validated the approach, creating a gold-standard set for training and validation of a machine learning classifier.
Identifying these articles is a crucial step toward future automated informatics pipelines that will apply machine learning and natural language processing to identify patient characteristics such as comorbidities, outcomes, age, gender, and race, enriching SARS-CoV-2 sequence databases with actionable information for defining large genomic epidemiology studies. Enriched patient metadata can then enable secondary data analysis, at scale, to uncover associations between the viral genome (including variants of concern and their sublineages), transmission risk, and health outcomes. For such enrichment to happen, however, the right papers need to be found and very detailed data need to be extracted from them. Further, finding the very specific articles needed for inclusion also facilitates scoping and systematic reviews, greatly reducing the time needed for full-text analysis and extraction.
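A first-pass filter for such articles can be sketched with simple textual signals. The patterns below are purely illustrative: the study trained a machine learning classifier on a human-annotated gold standard, not a rule list, but signals like these are the kind of features such a classifier could consume:

```python
import re

# Illustrative patterns only: the study's approach is a trained
# classifier, not hand-written rules.
DEPOSIT_PATTERNS = [
    r"deposited (in|into|to) (genbank|gisaid)",
    r"submitted to (genbank|gisaid)",
    r"available (in|from) (genbank|gisaid)",
]

def likely_reports_new_sequences(text):
    """Crude first-pass filter for articles that report depositing new
    SARS-CoV-2 sequences; such signals could seed or complement a
    trained classifier."""
    lowered = text.lower()
    return any(re.search(pattern, lowered) for pattern in DEPOSIT_PATTERNS)
```

A rule filter like this trades recall for simplicity, which is exactly why the authors built an annotated training set for a proper classifier instead.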
PubMed: 37577535
DOI: 10.1101/2023.07.29.23293370
PloS One 2023
Reference data are key to producing reliable crop type and cropland maps. Although research projects, national and international programs, and local initiatives constantly gather crop-related reference data, finding, collecting, and harmonizing data from different sources is a challenging task. Furthermore, ethical, legal, and consent-related restrictions associated with data sharing represent a common dilemma faced by international research projects. We address these dilemmas by building a community-based, open, harmonised reference data repository of global extent, ready for model training or product validation. Our repository contains data from different sources such as the Group on Earth Observations Global Agricultural Monitoring Initiative (GEOGLAM) Joint Experiment for Crop Assessment and Monitoring (JECAM) sites, the Radiant MLHub, the Future Harvest (CGIAR) centers, the National Aeronautics and Space Administration Food Security and Agriculture Program (NASA Harvest), the International Institute for Applied Systems Analysis (IIASA) citizen science platforms (LACO-Wiki and Geo-Wiki), as well as individual project contributions. Data from 2016 onwards were collected, harmonised, and annotated. The data sets' spatial, temporal, and thematic quality was assessed by applying rules developed in this research. Currently, the repository holds around 75 million harmonised observations with standardized metadata, a large share of which is available to the public. The repository, funded by ESA through the WorldCereal project, can be used either for the calibration of image classification deep learning algorithms or for the validation of Earth Observation products, such as global cropland extent and maize and wheat maps. We recommend continuing and institutionalizing this reference data initiative, e.g. through GEOGLAM, and encouraging the community to publish land cover and crop type data following open science and open data principles.
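Harmonising heterogeneous contributions onto one schema is the core mechanical step described above. A minimal sketch, assuming illustrative field names and a toy label map (the project's real harmonisation rules cover many legends, hierarchies, and quality flags):

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class CropObservation:
    lat: float
    lon: float
    year: int
    crop: str      # harmonised label, e.g. "maize", "wheat"
    source: str    # contributing dataset

# Toy label mapping for illustration only.
LABEL_MAP = {"corn": "maize", "maize": "maize", "winter wheat": "wheat"}

def harmonise(record, source):
    """Map a raw contributed record onto the common schema, dropping
    labels we cannot map and data collected before 2016 (the repository
    keeps 2016 onwards)."""
    label = LABEL_MAP.get(str(record["crop_label"]).strip().lower())
    if label is None or record["year"] < 2016:
        return None
    return CropObservation(record["lat"], record["lon"],
                           record["year"], label, source)
```

Keeping the provenance (`source`) on every harmonised observation is what lets downstream users respect per-dataset sharing restrictions.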
Topics: Agriculture; Algorithms
PubMed: 37440484
DOI: 10.1371/journal.pone.0287731
BMC Biology Oct 2023
BACKGROUND
Current solutions for the analysis of Western blot images either lack transparency and reproducibility or are tedious to use if one has to ensure that the analysis is reproducible.
RESULTS
Here, we present an open-source gel image analysis program, IOCBIO Gel. It is designed to simplify image analysis and to link the analysis results with the metadata describing the measurements. The software runs on all major desktop operating systems. It can be used either in a single-researcher environment with local storage of the data or in a multiple-researcher environment with a central database that facilitates data sharing within the research team and beyond. By recording the original image and all operations performed on it, such as image cropping, background subtraction, sample lane selection, and integration boundaries, the software ensures the reproducibility of the analysis and simplifies making corrections at any stage of the research. The analysis results are available either through direct access to the database used to store them or through export of the relevant data.
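The crop, background-subtract, and integrate sequence recorded by the software can be illustrated with a toy one-dimensional version. This is not IOCBIO Gel's actual algorithm (its source is open for that), just a sketch of the order of operations on a lane's densitometry profile:

```python
def integrate_lane(profile, background=None):
    """Integrate a one-dimensional densitometry profile of a gel lane
    after subtracting a constant background -- a toy version of the
    crop -> background-subtract -> integrate workflow. A naive
    minimum-value background estimate is used when none is given."""
    if background is None:
        background = min(profile)
    return sum(max(value - background, 0.0) for value in profile)
```

Recording the chosen `background` alongside the result, as IOCBIO Gel records every operation, is what makes such an analysis reproducible and correctable later.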
CONCLUSIONS
The software is not limited to Western blot image analysis; it can also be used to analyze images obtained from many other widely used biochemical techniques, such as isoelectric focusing. By recording the original data and all the analysis steps, the program improves reproducibility of the analysis and contributes to the implementation of FAIR principles in the related fields.
Topics: Reproducibility of Results; Software; Image Processing, Computer-Assisted; Blotting, Western
PubMed: 37864184
DOI: 10.1186/s12915-023-01734-8
Diagnostics (Basel, Switzerland) Jul 2023
Review
Diffuse lung disorders (DLDs) and interstitial lung diseases (ILDs) are pathological conditions affecting the lung parenchyma and interstitial network. There are approximately 200 different entities within this category. Radiologists play an increasingly important role in diagnosing and monitoring ILDs, as they can provide non-invasive, rapid, and repeatable assessments using high-resolution computed tomography (HRCT). HRCT offers a detailed view of the lung parenchyma, resembling a low-magnification anatomical preparation from a histological perspective. The intrinsic contrast provided by air in HRCT enables the identification of even the subtlest morphological changes in the lung tissue. By interpreting the findings observed on HRCT, radiologists can make a differential diagnosis and provide a pattern diagnosis in collaboration with the clinical and functional data. The use of quantitative software and artificial intelligence (AI) further enhances the analysis of ILDs, providing an objective and comprehensive evaluation. The integration of "meta-data" such as demographics, laboratory, genomic, metabolomic, and proteomic data through AI could lead to a more comprehensive clinical and instrumental profiling beyond the human eye's capabilities.
PubMed: 37510077
DOI: 10.3390/diagnostics13142333
BioRxiv : the Preprint Server For... Jan 2024
Alzheimer's disease (AD) and related dementias (ADRD) are complex diseases with multiple pathophysiological drivers that determine clinical symptomology and disease progression. These diseases develop insidiously over time, through many pathways and disease mechanisms, and continue to have a huge societal impact on affected individuals and their families. While emerging blood-based biomarkers, such as plasma p-tau181 and p-tau217, accurately detect Alzheimer neuropathology and are associated with faster cognitive decline, the full extent of plasma proteomic changes in ADRD remains unknown. Earlier detection and better classification of the different subtypes may provide opportunities for earlier, more targeted interventions, and perhaps a higher likelihood of successful therapeutic development. In this study, we aimed to leverage unbiased mass spectrometry proteomics to identify novel blood-based biomarkers associated with cognitive decline. A total of 1,786 plasma samples from 1,005 patients were collected over 12 years from participants in the Massachusetts Alzheimer's Disease Research Center Longitudinal Cohort Study. Patient metadata include demographics, final diagnoses, and concurrently obtained clinical dementia rating (CDR) scores. The Proteograph Product Suite (Seer, Inc.) and liquid chromatography-mass spectrometry (LC-MS) analysis were used to process the plasma samples in this cohort and generate unbiased proteomics data. Data-independent acquisition (DIA) mass spectrometry yielded 36,259 peptides and 4,007 protein groups. Linear mixed-effects models revealed 138 differentially abundant proteins between AD and healthy controls. Machine learning classification models for AD diagnosis identified potential candidate biomarkers including MBP, BGLAP, and ApoD. Cox regression models were built to determine the association of proteins with disease progression and suggest CLNS1A, CRISPLD2, and GOLPH3 as potential biomarkers warranting further investigation.
The Proteograph workflow provided deep, unbiased coverage of the plasma proteome at a speed that enabled a cohort study of almost 1,800 samples, which is the largest, deep, unbiased proteomics study of ADRD conducted to date.
PubMed: 38260620
DOI: 10.1101/2024.01.05.574446
Frontiers in Cellular Neuroscience 2023
BACKGROUND
We performed a systematic review that identified at least 9,000 scientific papers on PubMed that include immunofluorescent images of cells from the central nervous system (CNS). These CNS papers contain tens of thousands of immunofluorescent neural images supporting the findings of over 50,000 associated researchers. While many existing reviews discuss different aspects of immunofluorescent microscopy, such as image acquisition and staining protocols, few papers discuss immunofluorescent imaging from an image-processing perspective. We analyzed the literature to determine which image processing methods were commonly published alongside the associated CNS cell type, microscopy technique, and animal model, and to highlight gaps in image processing documentation and reporting in the CNS research field.
METHODS
We completed a comprehensive search of PubMed publications using Medical Subject Headings (MeSH) terms and other general search terms for CNS cells and common fluorescent microscopy techniques. Publications were found on PubMed using a combination of column description terms and row description terms. We manually tagged the comma-separated values file (CSV) metadata of each publication with the following categories: animal or cell model, quantified features, threshold techniques, segmentation techniques, and image processing software.
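The manual tagging of each publication's CSV metadata can be sketched with the standard library's CSV reader. Column names here are illustrative assumptions, not the review's actual schema; the five categories are the ones named above:

```python
import csv
import io

# The five tag categories named in the methods; column names are
# illustrative assumptions about the underlying CSV layout.
CATEGORIES = ["model", "quantified_features", "threshold",
              "segmentation", "software"]

def collect_tags(csv_text):
    """Parse tagged publication metadata and return one dict of
    category values per publication, with missing columns left empty."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [{cat: row.get(cat, "") for cat in CATEGORIES} for row in reader]
```

Leaving absent categories as empty strings (rather than dropping the row) is what later makes it possible to count how many papers omit, say, thresholding details.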
RESULTS
Of the almost 9,000 immunofluorescent imaging papers identified in our search, only 856 explicitly include image processing information. Moreover, hundreds of those 856 papers are missing the thresholding, segmentation, and morphological feature details necessary for explainable, unbiased, and reproducible results. In our assessment of the literature, we visualized current image processing practices, compiled the image processing options from the top twelve software programs, and designed a road map to enhance image processing. We found that thresholding and segmentation methods were often left out of publications, and were underreported or underutilized in quantitative CNS cell research.
DISCUSSION
Less than 10% of papers with immunofluorescent images include image processing in their methods. A few authors are applying advanced image analysis methods to quantify over 40 different CNS cell features, which can provide quantitative insights that will advance CNS research. However, our review argues that image analysis methods will remain limited in rigor and reproducibility without more detailed reporting of image processing methods.
CONCLUSION
Image processing is a critical part of CNS research that must be improved to increase scientific insight, explainability, reproducibility, and rigor.
PubMed: 37545881
DOI: 10.3389/fncel.2023.1188858
Microbial Genomics Aug 2023
Inferring the spatiotemporal spread of severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) via Bayesian phylogeography has been complicated by the overwhelming sampling bias present in the global genomic dataset. Previous work has demonstrated the utility of metadata in addressing this bias. Specifically, including the recent travel history of SARS-CoV-2-positive individuals in extended phylogeographical models has increased the accuracy of estimates and proposed alternative hypotheses that were not apparent from genomic and geographical data alone. However, as the availability of comprehensive epidemiological metadata is limited, many current estimates rely on sequence data and basic metadata (i.e. sample date and location). Because the bias within the SARS-CoV-2 sequence dataset is extensive, the degree to which we can rely on results drawn from standard phylogeographical models (i.e. discrete trait analysis) that lack integrated metadata is of great concern. This is particularly important when estimates influence and inform public health policy. We compared results generated from the same dataset using two discrete phylogeographical models: one including travel history metadata and one without. We used sequences from Victoria, Australia, as a case study because of two unique properties: first, the high proportion of cases sequenced throughout 2020 within Victoria and the rest of Australia; and second, the individual travel history collected from returning travellers in Victoria during the first wave (January to May) of the coronavirus disease 2019 (COVID-19) pandemic. We found that incorporating individual travel history was essential for estimating SARS-CoV-2 movement with discrete phylogeography models. Without the additional information provided by the travel history metadata, the discrete trait analysis could not be fit to the data due to numerical instability.
We also suggest that during the first wave of the COVID-19 pandemic in Australia, the primary driving force behind the spread of SARS-CoV-2 was viral importation from international locations. This case study demonstrates the necessity of robust genomic datasets supplemented with epidemiological metadata for generating accurate estimates from phylogeographical models in datasets that have significant sampling bias. For future work, we recommend the collection of metadata in conjunction with genomic data. Furthermore, we highlight the risk of applying phylogeographical models to biased datasets without incorporating appropriate metadata, especially when estimates influence public health policy decision making.
Topics: Humans; SARS-CoV-2; Phylogeography; COVID-19; Bayes Theorem; Metadata; Pandemics; Victoria
PubMed: 37650865
DOI: 10.1099/mgen.0.001099