Nature, Aug 2023
Comparative Study
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
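The multiple-choice portion of MultiMedQA reduces to exact-match accuracy over option letters. A minimal sketch of that scoring, illustrative only and not the paper's evaluation code (the example items are invented):

```python
def accuracy(predictions, gold):
    """Fraction of items where the predicted option letter matches the answer key."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical model outputs vs. answer key for five MedQA-style items.
preds = ["A", "C", "B", "D", "A"]
keys  = ["A", "C", "D", "D", "B"]
score = accuracy(preds, keys)  # 3 of 5 correct -> 0.6
```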
Topics: Benchmarking; Bias; Clinical Competence; Comprehension; Computer Simulation; Datasets as Topic; Knowledge; Licensure; Medicine; Natural Language Processing; Patient Safety; Physicians
PubMed: 37438534
DOI: 10.1038/s41586-023-06291-2
Annual Review of Biomedical Data Science, Aug 2023
Review
Advances in single-cell proteomics technologies have resulted in high-dimensional datasets comprising millions of cells that are capable of answering key questions about biology and disease. The advent of these technologies has prompted the development of computational tools to process and visualize the complex data. In this review, we outline the steps of single-cell and spatial proteomics analysis pipelines. In addition to describing available methods, we highlight benchmarking studies that have identified advantages and pitfalls of the currently available computational toolkits. As these technologies continue to advance, robust analysis tools should be developed in tandem to take full advantage of the potential biological insights provided by these data.
Topics: Proteomics; Computational Biology; Benchmarking
PubMed: 37040735
DOI: 10.1146/annurev-biodatasci-020422-050255
Environmental Science &amp; Technology, Sep 2023
The effects and risks of microplastics correlate with three-dimensional (3D) properties, such as the volume and surface area of the biologically accessible fraction of the diverse particle mixtures as they occur in nature. However, these 3D parameters are difficult to estimate because spectroscopic and visible-light image analysis methods yield data in only two dimensions (2D). The best existing 2D-to-3D conversion models require calibration for each new set of particles, which is labor-intensive. Here we introduce a new model that does not require calibration and compare its performance with existing models, including calibration-based ones. For the evaluation, we developed a new method in which the volumes of environmentally relevant microplastic mixtures are estimated in one go instead of on a cumbersome particle-by-particle basis. On this basis, the new Barchiesi model can be considered the most universal. The new model can be implemented in software used for the analysis of infrared spectroscopy and visible-light image data and is expected to increase the accuracy of risk assessments based on particle volumes and surface areas as toxicologically relevant metrics.
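The 2D-to-3D conversion problem can be illustrated with the simplest possible calibration-free baseline: treating each particle's projected area as that of a sphere. This is a hypothetical toy baseline for intuition, not the Barchiesi model; real microplastic fragments deviate from spheres, which is what dedicated conversion models correct for.

```python
import math

def sphere_equivalent_3d(area_2d):
    """Naive 2D->3D conversion: derive the area-equivalent sphere diameter
    from a particle's projected area, then return (volume, surface area)."""
    d = 2.0 * math.sqrt(area_2d / math.pi)  # area-equivalent diameter
    volume = math.pi / 6.0 * d ** 3
    surface = math.pi * d ** 2              # equals 4 * area_2d for a sphere
    return volume, surface

# Projected area of one particle, e.g. in square micrometres.
v, s = sphere_equivalent_3d(100.0)
```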
Topics: Microplastics; Plastics; Benchmarking; Calibration; Light
PubMed: 37683039
DOI: 10.1021/acs.est.3c03620
Genome Biology, Oct 2023
Review
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Topics: Benchmarking; Genomics; Computational Biology; Genome; High-Throughput Nucleotide Sequencing
PubMed: 37798733
DOI: 10.1186/s13059-023-03061-1
Bioinformatics (Oxford, England), Nov 2023
Review
MOTIVATION
Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups, making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad-coverage KB UMLS, leaving their performance on more specialized KBs, e.g. for genes or variants, understudied.
RESULTS
We therefore developed BELB, a biomedical entity linking benchmark providing access, in a unified format, to 11 corpora linked to 7 KBs and spanning six entity types: gene, disease, chemical, species, cell line, and variant. BELB greatly reduces preprocessing overhead when testing BEL systems on multiple corpora, offering a standardized testbed for reproducible experiments. Using BELB, we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture: neural approaches fail to perform consistently across entity types, highlighting the need for further work toward entity-agnostic models.
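The per-entity-type breakdown under which BELB exposed inconsistent neural performance can be sketched as follows. This is illustrative code, not part of BELB, and the KB identifiers are made up:

```python
from collections import defaultdict

def per_type_accuracy(mentions):
    """mentions: list of (entity_type, predicted_kb_id, gold_kb_id).
    Returns linking accuracy broken down by entity type."""
    hits, totals = defaultdict(int), defaultdict(int)
    for etype, pred, gold in mentions:
        totals[etype] += 1
        if pred == gold:
            hits[etype] += 1
    return {t: hits[t] / totals[t] for t in totals}

# Invented mentions: two gene links (one wrong) and one correct disease link.
results = per_type_accuracy([
    ("gene",    "NCBI:7157",    "NCBI:7157"),
    ("gene",    "NCBI:672",     "NCBI:675"),
    ("disease", "MESH:D001943", "MESH:D001943"),
])
```

A system that scores well in aggregate can still show exactly this kind of per-type gap, which is why an entity-type-aware testbed matters.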
AVAILABILITY AND IMPLEMENTATION
The source code of BELB is available at: https://github.com/sg-wbi/belb. The code to reproduce our experiments can be found at: https://github.com/sg-wbi/belb-exp.
Topics: Benchmarking; Data Mining; Software; Language; Natural Language Processing
PubMed: 37975879
DOI: 10.1093/bioinformatics/btad698
BMC Bioinformatics, Dec 2023
BACKGROUND
Biclustering is increasingly used in biomedical data analysis, recommendation tasks, and text mining, with hundreds of biclustering algorithms proposed. When assessing the performance of these algorithms, real datasets alone are not sufficient, as they do not offer a solid ground truth. Synthetic data overcome this limitation by providing reference solutions against which the discovered patterns can be compared. However, generating synthetic datasets is challenging, since the generated data must ensure reproducibility, pattern representativity, and resemblance to real data.
RESULTS
We propose G-Bic, a dataset generator conceived to produce synthetic benchmarks for the normative assessment of biclustering algorithms. Beyond expanding on aspects of pattern coherence, data quality, and positioning properties, it further handles specificities related to mixed-type datasets and time-series data. G-Bic has the flexibility to replicate real data regularities from diverse domains. We provide the default configurations to generate reproducible benchmarks to evaluate and compare diverse aspects of biclustering algorithms. Additionally, we discuss empirical strategies to simulate the properties of real data.
CONCLUSION
G-Bic is a parametrizable generator for biclustering analysis, offering a solid means of robustly assessing biclustering solutions against both internal and external metrics.
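The core idea of a generator with a planted ground truth can be sketched in a few lines. This is a toy illustration of the principle, not G-Bic's actual API; the parameter names and defaults are assumptions:

```python
import random

def plant_bicluster(n_rows, n_cols, row_idx, col_idx, value, noise=1.0, seed=0):
    """Generate a noisy background matrix with one constant-value bicluster
    planted at the given row/column indices. The planted indices serve as
    the ground-truth reference solution for external evaluation."""
    rng = random.Random(seed)  # fixed seed -> reproducible benchmark
    data = [[rng.gauss(0.0, noise) for _ in range(n_cols)] for _ in range(n_rows)]
    for r in row_idx:
        for c in col_idx:
            data[r][c] = value
    return data, (set(row_idx), set(col_idx))

# A 6x5 matrix with a 2x2 bicluster planted at rows {1, 2}, columns {0, 3}.
matrix, truth = plant_bicluster(6, 5, [1, 2], [0, 3], value=5.0)
```

A biclustering algorithm's output can then be scored by how well its recovered row/column sets match `truth`.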
Topics: Gene Expression Profiling; Reproducibility of Results; Benchmarking; Cluster Analysis; Algorithms
PubMed: 38053078
DOI: 10.1186/s12859-023-05587-4
Bioinformatics (Oxford, England), Dec 2023
MOTIVATION
The steady increase in whole-genome/exome sequencing and the development of novel next-generation sequencing-based gene panels require continuous testing and validation of variant calling (VC) pipelines, as well as detection of sequencing-related issues, to keep them up to date and feasible for clinical settings. State-of-the-art tools are reliable when used to compute standard performance metrics. However, the need for automated software that discriminates between bioinformatic and sequencing issues and optimizes VC parameters remains unmet.
RESULTS
Here we present RecallME, a bioinformatic suite that tracks down difficult-to-detect variants, such as insertions and deletions in highly repetitive regions, thus providing the maximum reachable recall for both single-nucleotide variants and small insertions and deletions, and that precisely guides the user through the pipeline optimization process.
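The recall-oriented benchmarking such a suite automates boils down to comparing a call set against a truth set. A minimal sketch of those metrics, not RecallME itself (the variant records are invented; real comparisons also require normalization of representation):

```python
def recall_precision(called, truth):
    """Compare a variant call set against a truth set, both keyed by
    (chrom, pos, ref, alt). Returns (recall, precision)."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)                        # true positives
    recall = tp / len(truth) if truth else 1.0      # found / should-have-found
    precision = tp / len(called) if called else 1.0 # found / reported
    return recall, precision

truth  = {("chr1", 100, "A", "T"), ("chr1", 205, "G", "GA"), ("chr2", 50, "CT", "C")}
called = {("chr1", 100, "A", "T"), ("chr2", 50, "CT", "C"), ("chr3", 9, "C", "G")}
r, p = recall_precision(called, truth)  # one missed indel, one false positive
```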
AVAILABILITY AND IMPLEMENTATION
Source code is freely available under MIT license at https://github.com/mazzalab-ieo/recallme. RecallME web application is available at https://translational-oncology-lab.shinyapps.io/recallme/. To use RecallME, users must obtain a license for ANNOVAR by themselves.
Topics: Benchmarking; Software; Computational Biology; Exome; High-Throughput Nucleotide Sequencing
PubMed: 38092052
DOI: 10.1093/bioinformatics/btad722
Biomolecules &amp; Biomedicine, Mar 2024
I read the article "Scientometrics and academia" by Dr. Zerem and colleagues. My perspective on citation metrics and scientometrics is more cautious. Therefore, in this article, I present my viewpoint on this subject.
Topics: Benchmarking
PubMed: 38197799
DOI: 10.17305/bb.2023.10233
Cell Reports Methods, Nov 2023
Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs, which represent molecular topology at the atomic level but ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning (GDL) model that comprehensively considers both covalent and non-covalent interactions when predicting the properties of molecules. The essential idea is to incorporate a more general molecular representation into GDL models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve better performance than state-of-the-art (SOTA) methods. Extensive tests demonstrate the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.
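The representational idea, augmenting covalent bonds with distance-based non-covalent contact edges, can be sketched as follows. This is an illustrative toy, not Mol-GDL's implementation; the cutoff value and coordinates are assumptions:

```python
import math

def build_edges(coords, bonds, cutoff=4.0):
    """Build an undirected edge set combining covalent bonds with
    distance-based non-covalent contacts, so a downstream geometric
    model also sees within-molecule spatial proximity.
    coords: list of (x, y, z) atom positions; bonds: covalent index pairs."""
    edges = {tuple(sorted(b)) for b in bonds}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))  # non-covalent contact edge
    return sorted(edges)

# Three atoms: two bonded and close, one far away (no contact edge).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
bonds = [(0, 1)]
edges = build_edges(coords, bonds)  # only the (0, 1) pair qualifies
```

Varying the cutoff changes which spatial contacts enter the graph, which is one way a "more general" representation can be parameterized.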
Topics: Deep Learning; Benchmarking; Models, Molecular
PubMed: 37875121
DOI: 10.1016/j.crmeth.2023.100621
Bioinformatics (Oxford, England), Jun 2023
MOTIVATION
Predicting the regulatory function of non-coding DNA using only the DNA sequence continues to be a major challenge in genomics. With the advent of improved optimization algorithms, faster GPU speeds, and more intricate machine-learning libraries, hybrid convolutional and recurrent neural network architectures can be constructed and applied to extract crucial information from non-coding DNA.
RESULTS
Using a comparative analysis of the performance of thousands of deep learning architectures, we developed ChromDL, a neural network architecture combining bidirectional gated recurrent units, convolutional neural networks, and bidirectional long short-term memory units, which significantly improves upon a range of prediction metrics compared with its predecessors in transcription factor binding site, histone modification, and DNase I hypersensitive site detection. Combined with a secondary model, it can be used for accurate classification of gene regulatory elements. The model can also detect weak transcription factor binding compared with previously developed methods and has the potential to help delineate transcription factor binding motif specificities.
AVAILABILITY AND IMPLEMENTATION
The ChromDL source code can be found at https://github.com/chrishil1/ChromDL.
Topics: Algorithms; Benchmarking; DNA; Deoxyribonuclease I; Transcription Factors
PubMed: 37387183
DOI: 10.1093/bioinformatics/btad217