-
FEMS Microbiology Ecology Jun 2024
Many R packages provide statistical approaches for elucidating the diversity of soil microbes, yet they still struggle to visualize microbial traits on a geographical map. This creates challenges in interpreting microbial biogeography on a regional scale, especially when the spatial scale is large or the distribution of sampling sites is uneven. Here, we developed a lightweight, flexible, and user-friendly R package called microgeo. This package integrates many functions involved in reading, manipulating, and visualizing geographical boundary data; downloading spatial datasets; and calculating microbial traits and rendering them onto a geographical map using grid-based visualization, spatial interpolation, or machine learning. Using this R package, users can visualize any trait calculated by microgeo or other tools on a map and can analyze microbiome data in conjunction with metadata derived from a geographical map. In contrast to other R packages that statistically analyze microbiome data, microgeo provides more intuitive approaches to illustrating the biogeography of soil microbes on a large geographical scale. It serves as an important supplement to statistically driven comparisons and makes the biogeographic analysis of publicly accessible microbiome data at a large spatial scale more convenient and efficient. The microgeo R package can be installed from the Gitee (https://gitee.com/bioape/microgeo) and GitHub (https://github.com/ChaonanLi/microgeo) repositories. Detailed tutorials for the microgeo R package are available at https://chaonanli.github.io/microgeo.
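To make the grid-based visualization idea concrete, the following is a minimal Python sketch. It is not microgeo's R API; the function name `grid_mean`, the 1-degree cell size, and the sample sites are all hypothetical. It bins sampling sites into square longitude/latitude cells and averages a trait within each cell, which is one simple way to keep densely sampled areas from dominating a map.

```python
import math
from collections import defaultdict


def grid_mean(sites, cell_size=1.0):
    """Average a trait over square lon/lat cells.

    sites: iterable of (lon, lat, value) tuples (hypothetical input format).
    cell_size: cell edge length in degrees (an assumed default).
    Returns {(col, row): mean value}, where col/row index the grid cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for lon, lat, value in sites:
        # floor() assigns each site to the cell whose lower-left corner it follows
        cell = (math.floor(lon / cell_size), math.floor(lat / cell_size))
        sums[cell] += value
        counts[cell] += 1
    return {cell: sums[cell] / counts[cell] for cell in sums}


# Hypothetical sites: three clustered samples and one isolated sample.
sites = [
    (100.2, 30.1, 4.0),
    (100.7, 30.4, 6.0),
    (100.9, 30.8, 5.0),
    (103.5, 31.2, 2.0),
]
cells = grid_mean(sites)
```

Here the three clustered sites collapse into a single cell value, so the isolated site carries equal visual weight on the resulting grid.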
Topics: Soil Microbiology; Microbiota; Software; Bacteria; Phylogeography
PubMed: 38866720
DOI: 10.1093/femsec/fiae087
-
Journal of Infection in Developing... May 2024
INTRODUCTION
Global monitoring of severe acute respiratory syndrome related coronavirus 2 (SARS-CoV-2) genetic sequences and associated metadata is essential for coronavirus disease 2019 (COVID-19) response. Therefore, Sanger's partial genome sequencing technique was used to monitor the circulating variants of SARS-CoV-2 in Cameroon.
METHODOLOGY
Nasopharyngeal specimens were collected from persons suspected of SARS-CoV-2 infection, following the national guidelines, between January and December 2021. All specimens with a cycle threshold (Ct) below 30 after amplification were eligible for sequencing of the partial spike (S) gene of SARS-CoV-2 using the Sanger sequencing method.
RESULTS
During the year 2021, 1481 real time reverse transcriptase polymerase chain reaction (RT-PCR) SARS-CoV-2 positive samples were selected for partial sequencing of the S gene of SARS-CoV-2. Amongst these, 878 yielded good sequencing products. A total of 231 probable variants (26.3%) were identified. The variants were mainly represented by Delta (70.6%), Alpha (15.6%), Omicron (7.4%), Beta (3.5%), Mu (1.7%) and Gamma (0.4%). Phylogenetic analysis of the probable variants from Cameroon with reference strains confirmed that all prior and current variants of concern (VOC) clustered with their respective reference sequences.
CONCLUSIONS
The surveillance strategy implemented in Cameroon, based on partial sequencing of the S gene, enabled identification of the major circulating variants and provided information on their distribution, which contributed to implementing public health measures to control disease spread in the country.
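The proportions reported in the results can be checked with a few lines of arithmetic. This sketch assumes the 26.3% figure uses the 878 good sequencing products as its denominator (the abstract does not say so explicitly, but that choice reproduces the reported value). The sequencing yield is derived here and is not a figure from the abstract.

```python
def percent(part, whole):
    """Return part/whole as a percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


selected = 1481   # RT-PCR-positive samples selected for partial S-gene sequencing
sequenced = 878   # samples that yielded good sequencing products
variants = 231    # probable variants identified

# Share of good sequences classified as probable variants (assumed denominator: 878)
variant_share = percent(variants, sequenced)

# Derived value, not reported in the abstract
sequencing_yield = percent(sequenced, selected)

# Lineage shares as reported; they need not sum to exactly 100%
lineage_shares = {
    "Delta": 70.6, "Alpha": 15.6, "Omicron": 7.4,
    "Beta": 3.5, "Mu": 1.7, "Gamma": 0.4,
}
itemized_total = round(sum(lineage_shares.values()), 1)
```

The itemized lineage shares add up to slightly less than 100%, so a small fraction of the probable variants is not broken out by lineage in the abstract.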
Topics: Humans; Cameroon; SARS-CoV-2; Spike Glycoprotein, Coronavirus; COVID-19; Male; Female; Adult; Adolescent; Child; Middle Aged; Young Adult; Child, Preschool; Nasopharynx; Aged; Phylogeny; Infant
PubMed: 38865404
DOI: 10.3855/jidc.18155
-
Journal of Korean Academy of Nursing May 2024
PURPOSE
This study aimed to investigate healthcare consumers' interest in patient safety on social media using structural topic modeling (STM) and to identify changes in interest over time.
METHODS
This study analyzed 105,727 posts from Naver news comments, blogs, internet cafés, and Twitter published between 2010 and 2022, using a Python script for data collection and preprocessing. STM analysis was conducted using R, with the documents' publication years serving as metadata to trace the evolution of discussions on patient safety.
RESULTS
The analysis identified a total of 13 distinct topics, organized into three primary communities: (1) "Demand for systemic improvement of medical accidents," underscoring the need for legal and regulatory reform to enhance accountability; (2) "Efforts of the government and organizations for safety management," highlighting proactive risk mitigation strategies; and (3) "Medical accidents exposed in the media," reflecting widespread concerns over medical negligence and its repercussions. These findings indicate pervasive concerns regarding medical accountability and transparency among healthcare consumers.
CONCLUSION
The findings emphasize the importance of transparent healthcare policies and practices that openly address patient safety incidents. There is clear advocacy for policy reforms aimed at increasing the accountability and transparency of healthcare providers. Moreover, this study highlights the significance of educational and engagement initiatives involving healthcare consumers in fostering a culture of patient safety. Integrating consumer perspectives into patient safety strategies is crucial for developing a robust safety culture in healthcare.
Topics: Humans; Social Media; Patient Safety
PubMed: 38863193
DOI: 10.4040/jkan.23156
-
Genomics, Proteomics & Bioinformatics May 2024
During the last decade, the generation and accumulation of petabase-scale high-throughput sequencing data have resulted in great challenges, including access to human data, as well as transfer, storage, and sharing of enormous amounts of data. To promote data-driven biological research, the Korean government announced that all biological data generated from government-funded research projects should be deposited at the Korea BioData Station (K-BDS), which consists of multiple databases for individual data types. Here, we introduce the Korean Nucleotide Archive (KoNA), a repository of nucleotide sequence data. As of July 2022, the Korean Read Archive in KoNA has collected over 477 TB of raw next-generation sequencing data from national genome projects. To ensure data quality and prepare for international alignment, a standard operating procedure was adopted, which is similar to that of the International Nucleotide Sequence Database Collaboration. The standard operating procedure includes quality control processes for submitted data and metadata using an automated pipeline, followed by manual examination. To ensure fast and stable data transfer, a high-speed transmission system called GBox is used in KoNA. Furthermore, the data uploaded to or downloaded from KoNA through GBox can be readily processed using a cloud computing service called Bio-Express. This seamless coupling of KoNA, GBox, and Bio-Express enhances the data experience, including submission, access, and analysis of raw nucleotide sequences. KoNA not only satisfies the unmet needs for a national sequence repository in Korea but also provides datasets to researchers globally and contributes to advances in genomics. The KoNA is available at https://www.kobic.re.kr/kona/.
Topics: Republic of Korea; Databases, Nucleic Acid; Humans; High-Throughput Nucleotide Sequencing
PubMed: 38862433
DOI: 10.1093/gpbjnl/qzae017
-
Abdominal Radiology (New York) Jun 2024
Accurate, automated MRI series identification is important for many applications, including display ("hanging") protocols, machine learning, and radiomics. Using either the series description or a pixel-based classifier alone has limitations. We demonstrate a combined approach utilizing a DICOM metadata-based classifier and selective use of a pixel-based classifier to identify abdominal MRI series. The metadata classifier was assessed alone (Group metadata) and combined with selective use of the pixel-based classifier for predictions with less than 70% certainty (Group combined). The overall accuracies (mean and 95% confidence interval) for Groups metadata and combined on the test dataset were 0.870 (95% CI 0.824, 0.912) and 0.930 (95% CI 0.893, 0.963), respectively. With this combined metadata and pixel-based approach, we demonstrate classification accuracy of 95% or greater for all pre-contrast MRI series and improved performance for some post-contrast series.
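The selective fallback described above can be sketched as a small confidence-gated cascade. This is an illustration under stated assumptions, not the authors' implementation: the function `classify_series`, the two toy classifiers, and the series labels are hypothetical, and the sketch assumes the pixel-based prediction simply replaces a low-certainty metadata prediction (the abstract does not specify how the two are combined).

```python
from typing import Callable, Tuple

Prediction = Tuple[str, float]  # (series label, confidence in [0, 1])


def classify_series(
    metadata: dict,
    pixels: object,
    metadata_clf: Callable[[dict], Prediction],
    pixel_clf: Callable[[object], Prediction],
    threshold: float = 0.70,
) -> Tuple[str, str]:
    """Return (label, source), using the pixel model only below the threshold."""
    label, confidence = metadata_clf(metadata)
    if confidence >= threshold:
        return label, "metadata"
    # Low-certainty case: defer to the (typically costlier) pixel-based model.
    pixel_label, _ = pixel_clf(pixels)
    return pixel_label, "pixel"


# Hypothetical stand-ins for trained models, for demonstration only.
def toy_metadata_clf(metadata: dict) -> Prediction:
    if metadata.get("SeriesDescription", "").lower().startswith("t2"):
        return "t2", 0.95
    return "portal_venous", 0.55


def toy_pixel_clf(pixels: object) -> Prediction:
    return "arterial", 0.80


confident = classify_series({"SeriesDescription": "T2 HASTE"}, None,
                            toy_metadata_clf, toy_pixel_clf)
uncertain = classify_series({"SeriesDescription": "POST"}, None,
                            toy_metadata_clf, toy_pixel_clf)
```

Gating on confidence keeps the cheap metadata path for most series and reserves image inference for the ambiguous ones, which is consistent with the abstract reporting gains mainly for some post-contrast series.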
PubMed: 38860997
DOI: 10.1007/s00261-024-04379-5
-
Virus Evolution 2024
Seasonal influenza virus predominantly evolves through antigenic drift, marked by the accumulation of mutations at antigenic sites. Because of antigenic drift, influenza vaccines are frequently updated, though their efficacy may still be limited due to strain mismatches. Despite the high levels of viral diversity observed across populations, most human studies reveal limited intrahost diversity, leaving the origin of population-level viral diversity unclear. Previous studies show that host characteristics, such as immunity, might affect within-host viral evolution. Here we investigate influenza A viral diversity in children aged between 6 months and 18 years. Influenza virus evolution in children is less well characterized than in adults, yet may be associated with higher levels of viral diversity given the lower level of pre-existing immunity and longer durations of infection in children. We obtained influenza isolates from banked influenza A-positive nasopharyngeal swabs collected at the Children's Hospital of Philadelphia during the 2017-18 influenza season. Using next-generation sequencing, we evaluated the population of influenza viruses present in each sample. We characterized within-host viral diversity using the number and frequency of intrahost single-nucleotide variants (iSNVs) detected in each sample. We related viral diversity to clinical metadata, including subjects' age, vaccination status, and comorbid conditions, as well as sample metadata such as virus strain and cycle threshold. Consistent with previous studies, most samples contained low levels of diversity, with no clear association with the subjects' age, vaccination status, or health status. Further, there was no enrichment of iSNVs near known antigenic sites. Taken together, these findings are consistent with previous observations that most intrahost influenza virus infections are characterized by low viral diversity without evidence of diversifying selection.
PubMed: 38859985
DOI: 10.1093/ve/veae034
-
Journal of the American Medical... Jun 2024
OBJECTIVES
Precise literature recommendation and summarization are crucial for biomedical professionals. While the latest iteration of generative pretrained transformer (GPT) incorporates 2 distinct modes (real-time search and pretrained model utilization), it encounters challenges in dealing with these tasks. Specifically, the real-time search can pinpoint some relevant articles but occasionally provides fabricated papers, whereas the pretrained model excels in generating well-structured summaries but struggles to cite specific sources. In response, this study introduces RefAI, an innovative retrieval-augmented generative tool designed to synergize the strengths of large language models (LLMs) while overcoming their limitations.
MATERIALS AND METHODS
RefAI utilized PubMed for systematic literature retrieval, employed a novel multivariable algorithm for article recommendation, and leveraged GPT-4 turbo for summarization. Ten queries under 2 prevalent topics ("cancer immunotherapy and target therapy" and "LLMs in medicine") were chosen as use cases and 3 established counterparts (ChatGPT-4, ScholarAI, and Gemini) as our baselines. The evaluation was conducted by 10 domain experts through standard statistical analyses for performance comparison.
RESULTS
The overall performance of RefAI surpassed that of the baselines across 5 evaluated dimensions (relevance and quality for literature recommendation; accuracy, comprehensiveness, and reference integration for summarization), with the majority exhibiting statistically significant improvements (P-values <.05).
DISCUSSION
RefAI demonstrated substantial improvements in literature recommendation and summarization over existing tools, addressing issues like fabricated papers, metadata inaccuracies, restricted recommendations, and poor reference integration.
CONCLUSION
By augmenting LLMs with external resources and a novel ranking algorithm, RefAI is uniquely capable of recommending high-quality literature and generating well-structured summaries, holding the potential to meet the critical needs of biomedical professionals in navigating and synthesizing vast amounts of scientific literature.
PubMed: 38857454
DOI: 10.1093/jamia/ocae129
-
Journal of Medical Internet Research Jun 2024
BACKGROUND
It is necessary to harmonize and standardize data variables used in case report forms (CRFs) of clinical studies to facilitate the merging and sharing of the collected patient data across several clinical studies. This is particularly true for clinical studies that focus on infectious diseases. Public health may be highly dependent on the findings of such studies. Hence, there is an elevated urgency to generate meaningful, reliable insights, ideally based on a high sample number and quality data. The implementation of core data elements and the incorporation of interoperability standards can facilitate the creation of harmonized clinical data sets.
OBJECTIVE
This study's objective was to compare, harmonize, and standardize variables focused on diagnostic tests used as part of CRFs in 6 international clinical studies of infectious diseases, in order to ultimately make the panstudy common data elements (CDEs) available for ongoing and future studies and thereby foster interoperability and comparability of collected data across trials.
METHODS
We reviewed and compared the metadata that comprised the CRFs used for data collection in and across all 6 infectious disease studies under consideration in order to identify CDEs. We examined the availability of international semantic standard codes within the Systematized Nomenclature of Medicine - Clinical Terms, the National Cancer Institute Thesaurus, and the Logical Observation Identifiers Names and Codes system for the unambiguous representation of diagnostic testing information that makes up the CDEs. We then proposed 2 data models that incorporate semantic and syntactic standards for the identified CDEs.
RESULTS
Of 216 variables that were considered in the scope of the analysis, we identified 11 CDEs to describe diagnostic tests (in particular, serology and sequencing) for infectious diseases: viral lineage/clade; test date, type, performer, and manufacturer; target gene; quantitative and qualitative results; and specimen identifier, type, and collection date.
CONCLUSIONS
The identification of CDEs for infectious diseases is the first step in facilitating the exchange and possible merging of a subset of data across clinical studies (and with that, large research projects) for possible shared analysis to increase the power of findings. The path to harmonization and standardization of clinical study data in the interest of interoperability can be paved in 2 ways. First, a mapping to standard terminologies ensures that each data element's (variable's) definition is unambiguous and that it has a single, unique interpretation across studies. Second, the exchange of these data is assisted by "wrapping" them in a standard exchange format, such as Fast Healthcare Interoperability Resources or the Clinical Data Interchange Standards Consortium's Clinical Data Acquisition Standards Harmonization Model.
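The two steps named in the conclusions (mapping to a standard terminology, then wrapping in an exchange format) can be illustrated with a small Python sketch. Everything study-specific here is hypothetical: the variable names `pcr_result` and `sars_pcr_outcome`, the mapping table, and the code `PLACEHOLDER-1`, which is deliberately not a real terminology code. The output dictionary is only loosely modeled on the general shape of a Fast Healthcare Interoperability Resources Observation and is not validated against that standard or against the data models proposed in the study.

```python
# Hypothetical mapping from study-specific variable names to one shared CDE.
# "PLACEHOLDER-1" stands in for a real terminology code and must be replaced.
CDE_MAP = {
    "pcr_result": {
        "cde": "qualitative_result",
        "system": "http://loinc.org",
        "code": "PLACEHOLDER-1",
    },
    "sars_pcr_outcome": {
        "cde": "qualitative_result",
        "system": "http://loinc.org",
        "code": "PLACEHOLDER-1",
    },
}


def to_exchange_record(study_variable, value, collected_on):
    """Wrap one study value in a dict loosely shaped like a FHIR Observation."""
    entry = CDE_MAP[study_variable]  # KeyError signals an unmapped variable
    return {
        "resourceType": "Observation",
        "status": "final",
        "code": {
            "coding": [
                {
                    "system": entry["system"],
                    "code": entry["code"],
                    "display": entry["cde"],
                }
            ]
        },
        "valueString": value,
        "effectiveDateTime": collected_on,
    }


# Two studies that named the same concept differently yield identical records.
record_a = to_exchange_record("pcr_result", "positive", "2021-03-01")
record_b = to_exchange_record("sars_pcr_outcome", "positive", "2021-03-01")
```

Because both source variables resolve to the same coded element, records from different studies become directly comparable, which is the point of the harmonization step.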
Topics: Humans; Communicable Diseases; Semantics; Common Data Elements
PubMed: 38857066
DOI: 10.2196/50049
-
Frontiers in Public Health 2024
BACKGROUND
Türkiye confirmed its first case of SARS-CoV-2 on March 11, 2020, coinciding with the declaration of the global COVID-19 pandemic. Subsequently, Türkiye swiftly increased testing capacity and implemented genomic sequencing in 2020. This paper describes Türkiye's journey of establishing genomic surveillance as a middle-income country with limited prior sequencing capacity and analyses sequencing data from the first two years of the pandemic. We highlight the achievements and challenges experienced and distill globally relevant lessons.
METHODS
We tracked the evolution of the COVID-19 pandemic in Türkiye from December 2020 to February 2022 through a timeline and analysed epidemiological, vaccination, and testing data. To investigate the phylodynamic and phylogeographic aspects of SARS-CoV-2, we used Nextstrain to analyze 31,629 high-quality genomes sampled from seven regions nationwide.
RESULTS
Türkiye's epidemiological curve, mirroring global trends, featured four distinct waves, each coinciding with the emergence and spread of variants of concern (VOCs). Utilizing locally manufactured kits to expand testing capacity and introducing variant-specific quantitative reverse transcription polymerase chain reaction (RT-qPCR) tests developed in partnership with a private company was a strategic advantage in Türkiye, given the scarcity and fragmented global supply chain early in the pandemic. Türkiye contributed more than 86,000 genomic sequences to global databases by February 2022, ensuring that Turkish data was reflected globally. The synergy of variant-specific RT-qPCR kits and genomic sequencing enabled cost-effective monitoring of VOCs. However, data analysis was constrained by a weak sequencing sampling strategy and fragmented data management systems, limiting the application of sequencing data to guide the public health response. Phylodynamic analysis indicated that Türkiye's geographical position as an international travel hub influenced both national and global transmission of each VOC despite travel restrictions.
CONCLUSION
This paper provides valuable insights into the testing and genomic surveillance systems adopted by Türkiye during the COVID-19 pandemic, proposing important lessons for countries developing national systems. The findings underscore the need for robust testing and sampling strategies, streamlined sample referral, and integrated data management, with metadata linkage and data quality being crucial for impactful epidemiological analysis. We recommend developing national genomic surveillance strategies to guide sustainable and integrated expansion of capacities built for COVID-19 and to optimize the effective utilization of sequencing data for public health action.
Topics: Humans; COVID-19; SARS-CoV-2; Genomics; Pandemics; Genome, Viral; Male
PubMed: 38855447
DOI: 10.3389/fpubh.2024.1332109
-
Frontiers in Water May 2024
Antimicrobial resistance (AMR) is a worldwide public health threat that is projected to lead to 10 million annual deaths globally by 2050. The AMR public health issue has led to the development of action plans to combat AMR, including improved antimicrobial stewardship, development of new antimicrobials, and advanced monitoring. The National Antimicrobial Resistance Monitoring System (NARMS), led by the United States (U.S.) Food and Drug Administration along with the U.S. Centers for Disease Control and U.S. Department of Agriculture, has monitored antimicrobial-resistant bacteria in retail meats, humans, and food animals since the mid-1990s. NARMS is currently exploring an integrated One Health monitoring model, recognizing that human, animal, plant, and environmental systems are linked to public health. Since 2020, the U.S. Environmental Protection Agency has led an interagency NARMS environmental working group (EWG) to implement a surface water AMR monitoring program (SWAM) at watershed and national scales. The NARMS EWG divided the development of the environmental monitoring effort into five areas: (i) defining objectives and questions, (ii) developing the study/sampling design, (iii) selecting AMR indicators, (iv) establishing analytical methods, and (v) developing data management/analytics/metadata plans. For each of these areas, the consensus among the scientific community and literature was reviewed and carefully considered prior to the development of this environmental monitoring program.
The data produced from the SWAM effort will help develop robust surface water monitoring programs with the goal of assessing public health risks associated with AMR pathogens in surface water (e.g., recreational water exposures), provide a comprehensive picture of how resistant strains are related spatially and temporally within a watershed, and help assess how anthropogenic drivers and intervention strategies impact the transmission of AMR within human, animal, and environmental systems.
PubMed: 38855419
DOI: 10.3389/frwa.2024.1359109