Microbiology Spectrum Aug 2023
Staphylococcus aureus is an opportunistic pathogen and a leading cause of morbidity and mortality worldwide. Genomic-based surveillance has greatly improved our ability to track the emergence and spread of high-risk clones, but the full potential of genomic data is only reached when used in conjunction with detailed metadata. Here, we demonstrate the utility of an integrated approach by leveraging a curated collection of clinical and epidemiological metadata of S. aureus in the San Matteo Hospital (Italy) through a semisupervised clustering strategy. We sequenced 226 sepsis S. aureus samples, recovered over a period of 9 years. By using existing antibiotic profiling data, we selected strains that capture the full diversity of the population. Genome analysis revealed 49 sequence types, 16 of which are novel. Comparative genomic analyses of hospital- and community-acquired infection ruled out the existence of genomic features differentiating them, while evolutionary analyses of genes and traits of interest highlighted different dynamics of acquisition and loss between antibiotic resistance and virulence genes. Finally, highly resistant clones belonging to clonal complexes (CC) 8 and 22 were found to be responsible for abundant infections and deaths, while the highly virulent CC30 was responsible for rare but deadly episodes of infections.
Genome sequencing is an important tool in clinical microbiology, as it allows in-depth characterization of isolates of interest and can propel genome-based surveillance studies. Such studies can benefit from methods of sample selection to capture the genomic diversity present in a data set. Here, we present an approach based on clustering of antibiotic resistance profiles that allows optimal sample selection for bacterial genomic surveillance. We apply the method to a 9-year collection of Staphylococcus aureus from a large hospital in northern Italy. Our method allows us to sequence the genomes of a large variety of strains of this important pathogen, which we then leverage to characterize the epidemiology in the hospital and to perform evolutionary analyses on genes and traits of interest. These analyses highlight different dynamics of acquisition and loss between antibiotic resistance and virulence genes.
Topics: Humans; Staphylococcus aureus; Metadata; Staphylococcal Infections; Genome, Bacterial; Anti-Bacterial Agents; Hospitals; Methicillin-Resistant Staphylococcus aureus; Microbial Sensitivity Tests
PubMed: 37458594
DOI: 10.1128/spectrum.01010-23

Data in Brief Dec 2023
The real-time detection of multinational banknotes remains an ongoing research challenge within the academic community. Numerous studies have been conducted to address the need for rapid and accurate banknote recognition, counterfeit detection, and identification of damaged banknotes [1], [2], [3]. State-of-the-art techniques, such as machine learning (ML) and deep learning (DL), have supplanted traditional digital image processing methods in banknote recognition and classification. However, the success of ML or DL projects critically hinges on the size and comprehensiveness of the datasets employed. Existing datasets suffer from several limitations. First, there is a notable absence of a Peruvian banknote dataset suitable for training ML or DL models. Second, the lack of annotated data with specific labels and metadata for Peruvian currency hinders the development of effective supervised learning models for banknote recognition and classification. Lastly, datasets from different regions may not align with the unique characteristics, design, and security features of Peruvian banknotes, limiting the accuracy and applicability of models in a Peruvian context [4]. To address these limitations, we have meticulously curated a comprehensive dataset comprising a total of 9,315 images of Peruvian banknotes, encompassing both old and new denominations from 2011 (old) and 2019 (new) [5]. The Peruvian banknote dataset includes denominations of 10, 20, 50, and 100 Peruvian soles. Importantly, as indicated by [5], both the 2011 and 2019 families of banknotes are currently in circulation, further enhancing the dataset's relevance for real-world applications in currency recognition and verification. This dataset serves as a vital resource for addressing the challenges in real-time multinational banknote detection. By offering a comprehensive collection of images of Peruvian banknotes, both old and new, this dataset fills a critical gap in the field of banknote recognition. Researchers can utilize it to train and evaluate advanced machine learning and deep learning models, ultimately enhancing the accuracy of banknote processing systems.
PubMed: 37965616
DOI: 10.1016/j.dib.2023.109715

BioRxiv : the Preprint Server For... Oct 2023
Biological invasions carry substantial practical and scientific importance, and represent natural evolutionary experiments on contemporary timescales. Here, we investigated genomic diversity and environmental adaptation of the crop pest using whole-genome sequencing data and environmental metadata for 29 population samples from its native and invasive range. Through a multifaceted analysis of this population genomic data, we increase our understanding of the genome, its diversity and its evolution, and we identify an appropriate genotype-environment association pipeline for our data set. Using this approach, we detect genetic signals of local adaptation associated with nine distinct environmental factors related to altitude, wind speed, precipitation, temperature, and human land use. We uncover unique functional signatures for each environmental variable, such as a prevalence of cuticular genes associated with annual precipitation. We also infer biological commonalities in the adaptation to diverse selective pressures, particularly in terms of the apparent contribution of nervous system evolution to enriched processes (ranging from neuron development to circadian behavior) and to top genes associated with all nine environmental variables. Our findings therefore depict a finer-scale adaptive landscape underlying the rapid invasion success of this agronomically important species.
PubMed: 37461625
DOI: 10.1101/2023.07.03.547576

BMC Genomics Sep 2023
Review
Comparative genomics is the comparison of genetic information within and across organisms to understand the evolution, structure, and function of genes, proteins, and non-coding regions (Sivashankari and Shanmughavel, Bioinformation 1:376-8, 2007). Advances in sequencing technology and assembly algorithms have resulted in the ability to sequence large genomes and provided a wealth of data that are being used in comparative genomic analyses. Comparative analysis can be leveraged to systematically explore and evaluate the biological relationships and evolution between species, aid in understanding the structure and function of genes, and gain a better understanding of disease and potential drug targets. As our knowledge of genetics expands, comparative genomics can help identify emerging model organisms among a broader span of the tree of life, positively impacting human health. This impact includes, but is not limited to, zoonotic disease research, therapeutics development, microbiome research, xenotransplantation, oncology, and toxicology. Despite advancements in comparative genomics, new challenges have arisen around the quantity, quality assurance, annotation, and interoperability of genomic data and metadata. New tools and approaches are required to meet these challenges and fulfill the needs of researchers. This paper focuses on how the National Institutes of Health (NIH) Comparative Genomics Resource (CGR) can address both the opportunities for comparative genomics to further impact human health and confront an increasingly complex set of challenges facing researchers.
Topics: United States; Humans; Genomics; Algorithms; Comparative Genomic Hybridization; Drug Delivery Systems; National Institutes of Health (U.S.)
PubMed: 37759191
DOI: 10.1186/s12864-023-09643-4

Journal of Chemical Information and... Nov 2023
Web ontologies are important tools in modern scientific research because they provide a standardized way to represent and manage web-scale amounts of complex data. In chemistry, a semantic database for chemical species is indispensable for its ability to interrelate and infer relationships, enabling a more precise analysis and prediction of chemical behavior. This paper presents OntoSpecies, a web ontology designed to represent chemical species and their properties. The ontology serves as a core component of The World Avatar knowledge graph chemistry domain and includes a wide range of identifiers, chemical and physical properties, chemical classifications and applications, and spectral information associated with each species. The ontology includes provenance and attribution metadata, ensuring the reliability and traceability of data. Most of the information about chemical species is sourced from PubChem and ChEBI data on the respective compound Web pages using a software agent, making OntoSpecies a comprehensive semantic database of chemical species able to solve novel types of problems in the field. Access to this reliable source of chemical data is provided through a SPARQL endpoint. The paper presents example use cases to demonstrate the contribution of OntoSpecies in solving complex tasks that require integrated semantically searchable chemical data. The approach presented in this paper represents a significant advancement in the field of chemical data management, offering a powerful tool for representing, navigating, and analyzing chemical information to support scientific research.
Topics: Knowledge Discovery; Reproducibility of Results; Software; Databases, Factual; Semantics
PubMed: 37883649
DOI: 10.1021/acs.jcim.3c00820

Journal of Pathology Informatics Dec 2024
Review
Advancements in digital pathology and computing resources have made a significant impact in the field of computational pathology for breast cancer diagnosis and treatment. However, access to high-quality labeled histopathological images of breast cancer is a big challenge that limits the development of accurate and robust deep learning models. In this scoping review, we identified the publicly available datasets of breast H&E-stained whole-slide images (WSIs) that can be used to develop deep learning algorithms. We systematically searched 9 scientific literature databases and 9 research data repositories and found 17 publicly available datasets containing 10 385 H&E WSIs of breast cancer. Moreover, we reported image metadata and characteristics for each dataset to assist researchers in selecting proper datasets for specific tasks in breast cancer computational pathology. In addition, we compiled 2 lists of breast H&E patches and private datasets as supplementary resources for researchers. Notably, only 28% of the included articles utilized multiple datasets, and only 14% used an external validation set, suggesting that the performance of other developed models may be susceptible to overestimation. The TCGA-BRCA was used in 52% of the selected studies. This dataset has a considerable selection bias that can impact the robustness and generalizability of the trained algorithms. There is also a lack of consistent metadata reporting of breast WSI datasets that can be an issue in developing accurate deep learning models, indicating the necessity of establishing explicit guidelines for documenting breast WSI dataset characteristics and metadata.
PubMed: 38405160
DOI: 10.1016/j.jpi.2024.100363

Behavior Research Methods Mar 2024
To study visual and semantic object representations, the need for well-curated object concepts and images has grown significantly over the past years. To address this, we have previously developed THINGS, a large-scale database of 1854 systematically sampled object concepts with 26,107 high-quality naturalistic images of these concepts. With THINGSplus, we significantly extend THINGS by adding concept- and image-specific norms and metadata for all 1854 concepts and one copyright-free image example per concept. Concept-specific norms were collected for the properties of real-world size, manmadeness, preciousness, liveliness, heaviness, naturalness, ability to move or be moved, graspability, holdability, pleasantness, and arousal. Further, we provide 53 superordinate categories as well as typicality ratings for all their members. Image-specific metadata includes a nameability measure, based on human-generated labels of the objects depicted in the 26,107 images. Finally, we identified one new public domain image per concept. Property (M = 0.97, SD = 0.03) and typicality ratings (M = 0.97, SD = 0.01) demonstrate excellent consistency, with the subsequently collected arousal ratings as the only exception (r = 0.69). Our property (M = 0.85, SD = 0.11) and typicality (r = 0.72, 0.74, 0.88) data correlated strongly with external norms, again with the lowest validity for arousal (M = 0.41, SD = 0.08). To summarize, THINGSplus provides a large-scale, externally validated extension to existing object norms and an important extension to THINGS, allowing detailed selection of stimuli and control variables for a wide range of research interested in visual object processing, language, and semantic memory.
Topics: Humans; Metadata; Language; Semantics; Memory; Databases, Factual
PubMed: 37095326
DOI: 10.3758/s13428-023-02110-8

The Lancet. Planetary Health Feb 2024
Review
Although the effects of antimicrobial resistance (AMR) are most obvious at clinical treatment failure, AMR evolution, transmission, and dispersal happen largely in environmental settings, for example within farms, waterways, livestock, and wildlife. We argue that systems-thinking, One Health approaches are crucial for tackling AMR, by understanding and predicting how anthropogenic activities interact within environmental subsystems, to drive AMR emergence and transmission. Innovative computational methods integrating big data streams (eg, from clinical, agricultural, and environmental monitoring) will accelerate our understanding of AMR, supporting decision making. There are challenges to accessing, integrating, synthesising, and interpreting such complex, multidimensional, heterogeneous datasets, including the lack of specific metrics to quantify anthropogenic AMR. Moreover, data confidentiality, geopolitical and cultural variation, surveillance gaps, and science funding cause biases, uncertainty, and gaps in AMR data and metadata. Combining systems-thinking with modelling will allow exploration, scaling-up, and extrapolation of existing data. This combination will provide vital understanding of the dynamic movement and transmission of AMR within and among environmental subsystems, and its effects across the greater system. Consequently, strategies for slowing down AMR dissemination can be modelled and compared for efficacy and cost-effectiveness.
Topics: Animals; Anti-Bacterial Agents; Drug Resistance, Bacterial; One Health; Animals, Wild; Agriculture
PubMed: 38331529
DOI: 10.1016/S2542-5196(23)00278-4

Open Research Europe 2023
Survey data on migration aspirations, plans and intentions is important for understanding the drivers and dynamics of migration. Such data has been collected since the 1960s but has expanded massively in recent decades. This paper provides the first comprehensive overview of existing survey data in an inventory of 212 surveys with recorded metadata on geographic and temporal coverage, survey population, sample size, and other characteristics. 'A survey' is not always a clear-cut unit of analysis, but we adopted procedures that enable systematic comparisons, and identified surveys through systematic searches and follow-up investigation. The paper has three objectives. First, it facilitates reuse of survey data and secondary analysis, albeit with limitations in data access, which we document. Second, it helps consolidate a sprawling field and thereby contribute to methodological and theoretical strengthening. Third, it informs debates on the ethics, politics and biases of data collection by documenting broad patterns in the body of knowledge. The inventory of survey data on migration aspirations and related concepts gives migration researchers a new tool for locating existing data and strengthening the foundations for collecting new data.
PubMed: 38323224
DOI: 10.12688/openreseurope.15800.1

Scientific Data Dec 2023
Metadata from epidemiological studies, including chronic disease outcome metadata (CDOM), must be findable to support interpretability and reusability. We proposed a comprehensive metadata schema and used it to assess public availability and findability of CDOM from German population-based observational studies participating in the consortium National Research Data Infrastructure for Personal Health Data (NFDI4Health). Additionally, principal investigators from the included studies completed a checklist evaluating consistency with FAIR principles (Findability, Accessibility, Interoperability, Reusability) within their studies. Overall, six of sixteen studies had complete publicly available CDOM. The most frequent CDOM source was scientific publications and the most frequently missing metadata were availability of codes of the International Classification of Diseases, Tenth Revision (ICD-10). Principal investigators' main perceived barriers for consistency with FAIR principles were limited human and financial resources. Our results reveal that CDOM from German population-based studies have incomplete availability and limited findability. There is a need to make CDOM publicly available in searchable platforms or metadata catalogues to improve their FAIRness, which requires human and financial resources.
Topics: Humans; Metadata; Publications; Chronic Disease
PubMed: 38052810
DOI: 10.1038/s41597-023-02726-7