Journal of Molecular Biology Jul 2023
ModelCIF (github.com/ihmwg/ModelCIF) is a data information framework developed for and by computational structural biologists to enable delivery of Findable, Accessible, Interoperable, and Reusable (FAIR) data to users worldwide. ModelCIF describes the specific set of attributes and metadata associated with macromolecular structures modeled solely by computational methods and provides an extensible data representation for deposition, archiving, and public dissemination of predicted three-dimensional (3D) models of macromolecules. It is an extension of the Protein Data Bank Exchange / macromolecular Crystallographic Information Framework (PDBx/mmCIF), which is the global data standard for representing experimentally determined 3D structures of macromolecules and associated metadata. The PDBx/mmCIF framework and its extensions (e.g., ModelCIF) are managed by the Worldwide Protein Data Bank partnership (wwPDB, wwpdb.org) in collaboration with relevant community stakeholders such as the wwPDB ModelCIF Working Group (wwpdb.org/task/modelcif). This semantically rich and extensible data framework for representing computed structure models (CSMs) accelerates the pace of scientific discovery. Herein, we describe the architecture, contents, and governance of ModelCIF, together with the tools and processes for maintaining and extending the data standard. Community tools and software libraries that support ModelCIF are also described.
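The category.attribute data model that ModelCIF inherits from PDBx/mmCIF can be illustrated with a toy parser. The sketch below is illustrative only: it handles bare "_category.attribute value" pairs and ignores loop_ constructs and multi-line values, so real ModelCIF files should be read with a dedicated library (e.g. python-modelcif or gemmi). The item names in the example follow the mmCIF/ModelCIF naming style but are used here purely for demonstration.

```python
# Minimal sketch of the PDBx/mmCIF key-value representation that ModelCIF
# extends. Toy parser for illustration only: it handles simple
# "_category.attribute value" pairs and skips data_ headers and loops.

def parse_simple_cif(text):
    """Collect _category.attribute items into nested dicts."""
    data = {}
    for line in text.splitlines():
        line = line.strip()
        if not line.startswith("_"):
            continue  # skip data_ headers, loop_ blocks, comments in this sketch
        key, _, value = line.partition(" ")
        category, _, attribute = key.lstrip("_").partition(".")
        data.setdefault(category, {})[attribute] = value.strip().strip("'\"")
    return data

example = """\
data_example_model
_struct.title 'Example computed structure model'
_ma_model_list.model_type 'Ab initio model'
"""

parsed = parse_simple_cif(example)
print(parsed["struct"]["title"])              # Example computed structure model
print(parsed["ma_model_list"]["model_type"])  # Ab initio model
```

The nested-dict layout mirrors how mmCIF groups attributes under categories, which is what makes the format extensible: ModelCIF adds new categories without disturbing existing ones.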
Topics: Databases, Protein; Macromolecular Substances; Protein Conformation; Software
PubMed: 36828268
DOI: 10.1016/j.jmb.2023.168021
Microbial Genomics Sep 2023
Non-typhoidal Salmonella are extremely diverse, and different serovars can exhibit varied phenotypes, including host adaptation and the ability to cause clinical illness in animals and humans. In the USA, serovar Kentucky is infrequently found to cause human illness, despite being the top serovar isolated from broiler chickens. Conversely, in Europe, this serovar falls within the top 10 serovars linked to human salmonellosis. Serovar Kentucky is polyphyletic and has two lineages, Kentucky-I and Kentucky-II; isolates belonging to Kentucky-I are frequently isolated from poultry in the USA, while Kentucky-II isolates tend to be associated with human illness. In this study, we analysed whole-genome sequences and associated metadata deposited in public databases between 2017 and 2021 by federal agencies to determine serovar Kentucky incidence across different animal and human sources. Of 5151 genomes, 90.3 % were from isolates that came from broilers, while 5.9 % were from humans and 3.0 % were from cattle. Kentucky-I isolates were associated with broilers, while isolates belonging to Kentucky-II and a new lineage, Kentucky-III, were more commonly associated with cattle and humans. Very few serovar Kentucky isolates were associated with turkey and swine sources. Phylogenetic analyses showed that Kentucky-III genomes were more closely related to Kentucky-I, and this was confirmed by CRISPR typing and multilocus sequence typing (MLST). In a macrophage assay, serovar Kentucky-II isolates were able to replicate over eight times better than Kentucky-I isolates. Analysis of virulence factors showed unique patterns across these three groups, and these differences may be linked to their association with different hosts.
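The incidence analysis described above boils down to tallying genome records by lineage and source from deposited metadata. A minimal sketch of that tally, with invented records (the real analysis covered 5151 genomes deposited between 2017 and 2021):

```python
# Hypothetical sketch of counting genome records by (lineage, source)
# from public-database metadata. The records are invented for illustration.
from collections import Counter

records = [
    {"lineage": "Kentucky-I",   "source": "broiler"},
    {"lineage": "Kentucky-I",   "source": "broiler"},
    {"lineage": "Kentucky-II",  "source": "human"},
    {"lineage": "Kentucky-III", "source": "cattle"},
]

by_lineage_source = Counter((r["lineage"], r["source"]) for r in records)
for (lineage, source), n in sorted(by_lineage_source.items()):
    print(f"{lineage:12s} {source:8s} {n}")
```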
Topics: Humans; Animals; Cattle; Swine; Serogroup; Salmonella enterica; Chickens; Kentucky; Multilocus Sequence Typing; Phylogeny; Genomics; Phenotype
PubMed: 37750759
DOI: 10.1099/mgen.0.001089
Cureus Dec 2023
Review
This review is a bibliometric analysis of anesthesiology, a medical specialty that deals with a patient's complete preoperative, intraoperative, and postoperative care. The objective of this review is to analyze the bibliometric characteristics of the 100 most-cited articles on anesthesiology. The metadata for the study were collected from the Web of Science Core Collection database. A title search was employed, with "Anesthesia" and "Anesthesiology" entered in two search boxes separated by the Boolean operator "OR". The results were then sorted by citation count in descending order; "article" was selected in the document-type filter, and all other document types were excluded. Finally, the bibliographic details of the 100 top-cited articles were downloaded. VOSviewer software (version 1.6.10, by van Eck and Waltman) was used for bibliometric network analysis of co-authors and keywords, and the Pearson chi-square test was used for statistical analysis. The 100 top-cited articles were published between 1971 and 2018 and received between 276 and 1006 citations each, with an average of 384.57 citations per article. Open-access articles gained a slightly higher ratio of citations, and more than half of the articles were published in the two leading journals, Anesthesiology and Anesthesia and Analgesia. There was no statistically significant difference in citations between open- and closed-access journals or between anesthesia and non-anesthesia journals. Thirty-six articles were published in journals not specifically related to anesthesia. Most of the top-cited articles were contributed by the United States, and "surgery" and "general anesthesia" were the two most frequently occurring keywords. We conclude that the top-cited articles in anesthesiology were contributed by authors from developed nations, with the United States outclassing the rest of the world.
This bibliometric analysis would be valuable to practitioners, academics, researchers, and students to understand the dynamics of progress in the field of anesthesiology.
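The Pearson chi-square comparison mentioned above can be sketched for a 2x2 contingency table (access type versus above/below-median citations). The counts below are hypothetical, since the paper's raw contingency data are not reproduced here; the study itself found no significant difference.

```python
# Sketch of the Pearson chi-square statistic on an invented 2x2 table
# (open vs. closed access, above vs. below median citations).
# Counts are hypothetical, for illustration only.

def chi_square_2x2(table):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / grand
            stat += (observed - expected) ** 2 / expected
    return stat

#              above median  below median
table = [[28, 22],   # open access (hypothetical counts)
         [22, 28]]   # closed access
print(round(chi_square_2x2(table), 3))  # 1.44
```

With these balanced marginals every expected cell count is 25, so the statistic is 4 x (3^2 / 25) = 1.44, well below the 3.84 critical value at one degree of freedom.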
PubMed: 38249230
DOI: 10.7759/cureus.50959
Frontiers in Oncology 2023
Review
INTRODUCTION
Research on hepatocellular carcinoma (HCC) has grown significantly, to the point that researchers can no longer keep pace with the vast literature. This study aimed to explore the progress of HCC research over the past 30 years using a machine learning-based bibliometric analysis and to suggest future research directions.
METHODS
A comprehensive search was conducted for publications indexed between 1991 and 2020 in the public version of the PubMed database using the MeSH term "hepatocellular carcinoma." The complete records of the retrieved results were downloaded in Extensible Markup Language (XML) format, and the metadata of each publication, such as the publication year, the type of research, the corresponding author's country, the title, the abstract, and the MeSH terms, were analyzed. We adopted a latent Dirichlet allocation topic-modeling method on the Python platform to analyze the research topics of the scientific publications.
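The XML metadata extraction described in the methods can be sketched with Python's standard library. The record below is a trimmed, hypothetical PubMed entry; real exports carry many more fields, and the LDA step would follow on the collected text.

```python
# Sketch of extracting per-publication metadata (title, MeSH terms) from
# PubMed-style XML. The snippet is a trimmed, hypothetical record.
import xml.etree.ElementTree as ET

xml_data = """\
<PubmedArticleSet>
  <PubmedArticle>
    <MedlineCitation>
      <Article>
        <ArticleTitle>Example HCC study</ArticleTitle>
        <Abstract><AbstractText>Example abstract.</AbstractText></Abstract>
      </Article>
      <MeshHeadingList>
        <MeshHeading><DescriptorName>Carcinoma, Hepatocellular</DescriptorName></MeshHeading>
        <MeshHeading><DescriptorName>Liver Cirrhosis</DescriptorName></MeshHeading>
      </MeshHeadingList>
    </MedlineCitation>
  </PubmedArticle>
</PubmedArticleSet>
"""

root = ET.fromstring(xml_data)
for article in root.iter("PubmedArticle"):
    title = article.findtext(".//ArticleTitle")
    mesh = [d.text for d in article.iter("DescriptorName")]
    print(title, mesh)
```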
RESULTS
In the last 30 years, there has been significant and constant growth in the annual number of publications about HCC (annual percentage growth rate: 7.34%). Overall, 62,856 articles related to HCC from the past 30 years were retrieved and included in this study. Among the diagnosis-related terms, "Liver Cirrhosis" was the most studied. However, in the 2010s, "Biomarkers, Tumor" began to outpace "Liver Cirrhosis." Regarding the treatment-related MeSH terms, "Hepatectomy" was the most studied; however, recent studies related to "Antineoplastic Agents" showed a tendency to supersede it. Regarding basic research, the term "Cell Line, Tumor" appeared after 2000 and has since been the most studied among these terms.
CONCLUSION
This was the first machine learning-based bibliometric study to analyze more than 60,000 publications about HCC over the past 30 years. Despite significant efforts in analyzing the literature on basic research, its connection with the clinical field is still lacking. Therefore, more efforts are needed to convert and apply basic research results to clinical treatment. Additionally, it was found that microRNAs have potential as diagnostic and therapeutic targets for HCC.
PubMed: 37664017
DOI: 10.3389/fonc.2023.1227991
Scientific Data May 2024
Datasets consist of measurement data and metadata. Metadata provides the context that is essential for understanding and (re-)using data. Various metadata standards exist for different methods, systems, and contexts; however, relevant information arises at different stages across the data lifecycle. Often, this information is defined and standardized only at the publication stage, which can lead to data loss and increased workload. In this study, we developed the Metadatasheet, a metadata standard based on interviews with members of two biomedical consortia and on systematic screening of data repositories. It aligns with the data lifecycle, allowing synchronous metadata recording within Microsoft Excel, a widespread data-recording software. Additionally, we provide an implementation, the Metadata Workbook, that offers user-friendly features such as automation, dynamic adaptation, metadata integrity checks, and export options for various metadata standards. By design, and owing to its extensive documentation, the proposed metadata standard simplifies the recording and structuring of metadata for biomedical scientists, promoting practicality and convenience in data management. This framework can accelerate scientific progress by enhancing collaboration and knowledge transfer throughout the intermediate steps of data creation.
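A metadata integrity check of the kind the Metadata Workbook performs can be sketched as follows. The field names and rules below are invented for illustration and are not the actual Metadatasheet standard.

```python
# Illustrative sketch of a metadata integrity check: required fields
# must be present and no value may be empty. Field names are invented,
# not the actual Metadatasheet standard.

REQUIRED_FIELDS = {"sample_id", "organism", "assay", "date"}

def check_record(record):
    """Return a list of integrity problems for one metadata record."""
    problems = [f"missing field: {f}" for f in sorted(REQUIRED_FIELDS - record.keys())]
    for field, value in record.items():
        if value in ("", None):
            problems.append(f"empty value: {field}")
    return problems

record = {"sample_id": "S001", "organism": "Homo sapiens", "assay": ""}
print(check_record(record))  # ['missing field: date', 'empty value: assay']
```

An empty return list would mean the record passes; running such checks at recording time, rather than at publication, is the point of the lifecycle-aligned approach described above.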
Topics: Biomedical Research; Data Management; Metadata; Software
PubMed: 38778016
DOI: 10.1038/s41597-024-03349-2
Scientific Data Apr 2024
Human infections caused by viral pathogens trigger a complex gamut of host responses that limit disease, resolve infection, generate immunity, and contribute to severe disease or death. Here, we present experimental methods and multi-omics data capture approaches representing the global host response to infection, generated from 45 individual experiments involving human viruses from the Orthomyxoviridae, Filoviridae, Flaviviridae, and Coronaviridae families. Analogous experimental designs were implemented across human or mouse host model systems, longitudinal samples were collected over defined time courses, and global multi-omics data (transcriptomics, proteomics, metabolomics, and lipidomics) were acquired by microarray, RNA sequencing, or mass spectrometry analyses. For comparison, we have included transcriptomics datasets from cells treated with type I and type II human interferon. Raw multi-omics data and metadata were deposited in public repositories, and we provide a central location linking the raw data with the experimental metadata and with ready-to-use, quality-controlled, statistically processed multi-omics datasets not previously available in any public repository. This compendium of infection-induced host-response data is available for reuse and will be useful to those endeavouring to understand viral disease pathophysiology and network biology.
Topics: Animals; Humans; Mice; Gene Expression Profiling; Metabolomics; Multiomics; Proteomics; Viruses; Virus Diseases; Host-Pathogen Interactions
PubMed: 38565538
DOI: 10.1038/s41597-024-03124-3
Journal of Pathology Informatics Dec 2024
Review
Advancements in digital pathology and computing resources have made a significant impact on the field of computational pathology for breast cancer diagnosis and treatment. However, access to high-quality labeled histopathological images of breast cancer remains a major challenge that limits the development of accurate and robust deep learning models. In this scoping review, we identified the publicly available datasets of breast H&E-stained whole-slide images (WSIs) that can be used to develop deep learning algorithms. We systematically searched 9 scientific literature databases and 9 research data repositories and found 17 publicly available datasets containing 10 385 H&E WSIs of breast cancer. Moreover, we report the image metadata and characteristics of each dataset to assist researchers in selecting suitable datasets for specific tasks in breast cancer computational pathology. In addition, we compiled 2 lists of breast H&E patch datasets and private datasets as supplementary resources for researchers. Notably, only 28% of the included articles utilized multiple datasets, and only 14% used an external validation set, suggesting that the reported performance of the remaining models may be overestimated. The TCGA-BRCA dataset was used in 52% of the selected studies; it has a considerable selection bias that can impact the robustness and generalizability of the trained algorithms. There is also a lack of consistent metadata reporting across breast WSI datasets, which can hinder the development of accurate deep learning models and indicates the need for explicit guidelines for documenting breast WSI dataset characteristics and metadata.
PubMed: 38405160
DOI: 10.1016/j.jpi.2024.100363
Journal of Applied Crystallography Aug 2023
This article demonstrates spatial mapping of the local and nanoscale structure of thin-film objects using spatially resolved pair distribution function (PDF) analysis of synchrotron X-ray diffraction data. This is exemplified on a lab-on-chip combinatorial array of sample spots containing catalytically interesting nanoparticles deposited from liquid precursors using an ink-jet liquid-handling system. A software implementation of the whole protocol is presented, including an approach for automated data acquisition and analysis using the atomic PDF method. The protocol software handles semi-automated data reduction, normalization, and modeling, with user-defined recipes generating a comprehensive collection of metadata and analysis results. By slicing the collection with the included functions, it is possible to build images of different user-chosen contrast features, giving insights into different aspects of the local structure.
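The "slicing" idea, pulling one analysis feature from each sample spot to form a spatial contrast image, can be sketched as below. The grid shape, feature names, and values are invented, and the real protocol software operates on a much richer collection of metadata and fit results.

```python
# Hypothetical sketch of slicing a collection of per-spot analysis
# results into a 2D contrast map. Grid, features, and values invented.

results = {
    (0, 0): {"particle_size_nm": 3.1, "rw": 0.12},
    (0, 1): {"particle_size_nm": 4.5, "rw": 0.15},
    (1, 0): {"particle_size_nm": 2.8, "rw": 0.11},
    (1, 1): {"particle_size_nm": 5.0, "rw": 0.18},
}

def slice_feature(results, feature, shape):
    """Arrange one scalar feature from each sample spot into a 2D grid."""
    rows, cols = shape
    return [[results[(r, c)][feature] for c in range(cols)] for r in range(rows)]

size_map = slice_feature(results, "particle_size_nm", (2, 2))
print(size_map)  # [[3.1, 4.5], [2.8, 5.0]]
```

Swapping the feature name (e.g. the fit residual instead of the particle size) yields a different contrast image from the same collection, which is the point of the user-chosen-feature design.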
PubMed: 37555210
DOI: 10.1107/S1600576723005927
Open Research Europe 2023
Opportunistic sensors are increasingly used for rainfall measurement. However, their raw data are collected by a variety of systems that are often not primarily intended for rainfall monitoring, resulting in a plethora of different data formats and a lack of common standards. This hinders the sharing of opportunistic sensing (OS) data, their automated processing and, ultimately, their practical use and integration into standard observation systems. This paper summarises the experiences of the more than 100 members of the OpenSense COST Action involved in the OS of rainfall. We review current practice in collecting and storing precipitation OS data and the corresponding metadata, and propose new common guidelines that describe the requirements for data and metadata collection, harmonise naming conventions, and define human-readable and machine-readable file formats for data and metadata storage. We focus on three sensors identified by the OpenSense community as prominent representatives of the OS of precipitation: commercial microwave links (CML), fixed point-to-point radio links mainly used as backhaul connections in telecommunication networks; satellite microwave links (SML), radio links between geostationary Earth orbit (GEO) satellites and ground user terminals; and personal weather stations (PWS), non-professional meteorological sensors owned by citizens. The conventions presented in this paper are primarily designed for storing, handling, and sharing historical time series and do not consider the specific requirements of using OS data in real time for operational purposes. The conventions are already accepted by the ever-growing OpenSense community and represent an important step towards automated processing of OS raw data and community development of joint OS software packages.
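A machine-readable metadata record for one of the three sensor types might look like the sketch below. The field names here are illustrative only, not the exact OpenSense naming conventions, and the real guidelines also specify formats for the time-series data themselves.

```python
# Hedged sketch of machine-readable metadata for one commercial microwave
# link (CML). Field names and values are illustrative, not the exact
# OpenSense conventions.
import json

cml_metadata = {
    "sensor_type": "CML",
    "link_id": "cml_0001",
    "frequency_ghz": 38.0,
    "site_a": {"lat": 52.52, "lon": 13.40},
    "site_b": {"lat": 52.53, "lon": 13.42},
    "polarization": "vertical",
}

serialized = json.dumps(cml_metadata, indent=2, sort_keys=True)
assert json.loads(serialized)["sensor_type"] == "CML"  # round-trips cleanly
print(serialized)
```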
PubMed: 38405183
DOI: 10.12688/openreseurope.16068.2
BMC Research Notes Sep 2023
OBJECTIVES
This release note describes the Maize GxE project datasets within the Genomes to Fields (G2F) Initiative. The Maize GxE project aims to understand genotype by environment (GxE) interactions and use the information collected to improve resource allocation efficiency and increase genotype predictability and stability, particularly in scenarios of variable environmental patterns. Hybrids and inbreds are evaluated across multiple environments and phenotypic, genotypic, environmental, and metadata information are made publicly available.
DATA DESCRIPTION
The datasets include phenotypic data for the hybrids and inbreds evaluated in 30 locations across the US and one location in Germany in 2020 and 2021; soil and climatic measurements and metadata for all environments (combinations of year and location); and ReadMe and description files for each data type. A set of common hybrids is present in each environment to connect with previous evaluations. Each environment had a collaborator responsible for collecting and submitting the data; the GxE coordination team combined all the collected information and removed obviously erroneous data. Collaborators then received the combined data to use and verify, and declared that the data generated in their own environments were accurate. The combined data are released to the public with minimal filtering to maintain fidelity to the original data.
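The combine-and-filter step described above can be sketched as merging per-environment phenotype tables and dropping obviously erroneous rows. The column names, environment codes, and plausibility bounds below are hypothetical, not the actual G2F schema.

```python
# Sketch of combining per-environment phenotype tables and removing
# obviously erroneous rows. Column names and bounds are hypothetical.
import csv
import io

env_files = {
    "DEH1_2021": "hybrid,yield_mg_ha\nH1,9.2\nH2,-1.0\n",   # -1.0 is erroneous
    "IAH2_2021": "hybrid,yield_mg_ha\nH1,10.5\nH3,8.7\n",
}

combined = []
for env, text in env_files.items():
    for row in csv.DictReader(io.StringIO(text)):
        row["environment"] = env          # tag each row with its environment
        if 0.0 <= float(row["yield_mg_ha"]) <= 25.0:  # minimal plausibility filter
            combined.append(row)

print(len(combined))  # 3 rows survive the filter
```

The "minimal filtering" mentioned in the release note corresponds to loose bounds like these: only physically impossible values are dropped, preserving fidelity to the original submissions.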
Topics: Zea mays; Seasons; Genotype; Germany; Resource Allocation
PubMed: 37710302
DOI: 10.1186/s13104-023-06430-y