Nature, Aug 2023
Comparative Study
Large language models (LLMs) have demonstrated impressive capabilities, but the bar for clinical applications is high. Attempts to assess the clinical knowledge of models typically rely on automated evaluations based on limited benchmarks. Here, to address these limitations, we present MultiMedQA, a benchmark combining six existing medical question answering datasets spanning professional medicine, research and consumer queries and a new dataset of medical questions searched online, HealthSearchQA. We propose a human evaluation framework for model answers along multiple axes including factuality, comprehension, reasoning, possible harm and bias. In addition, we evaluate Pathways Language Model (PaLM, a 540-billion parameter LLM) and its instruction-tuned variant, Flan-PaLM on MultiMedQA. Using a combination of prompting strategies, Flan-PaLM achieves state-of-the-art accuracy on every MultiMedQA multiple-choice dataset (MedQA, MedMCQA, PubMedQA and Measuring Massive Multitask Language Understanding (MMLU) clinical topics), including 67.6% accuracy on MedQA (US Medical Licensing Exam-style questions), surpassing the prior state of the art by more than 17%. However, human evaluation reveals key gaps. To resolve this, we introduce instruction prompt tuning, a parameter-efficient approach for aligning LLMs to new domains using a few exemplars. The resulting model, Med-PaLM, performs encouragingly, but remains inferior to clinicians. We show that comprehension, knowledge recall and reasoning improve with model scale and instruction prompt tuning, suggesting the potential utility of LLMs in medicine. Our human evaluations reveal limitations of today's models, reinforcing the importance of both evaluation frameworks and method development in creating safe, helpful LLMs for clinical applications.
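The multiple-choice portion of MultiMedQA reduces to exact-match accuracy over option letters. A minimal sketch of that scoring, illustrative only and not the paper's evaluation code (the example items are invented):

```python
def accuracy(predictions, gold):
    """Fraction of items where the predicted option letter matches the answer key."""
    assert len(predictions) == len(gold)
    correct = sum(p == g for p, g in zip(predictions, gold))
    return correct / len(gold)

# Hypothetical model outputs vs. answer key for five MedQA-style items.
preds = ["A", "C", "B", "D", "A"]
keys  = ["A", "C", "D", "D", "B"]
score = accuracy(preds, keys)  # 3 of 5 correct -> 0.6
```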
Topics: Benchmarking; Bias; Clinical Competence; Comprehension; Computer Simulation; Datasets as Topic; Knowledge; Licensure; Medicine; Natural Language Processing; Patient Safety; Physicians
PubMed: 37438534
DOI: 10.1038/s41586-023-06291-2
Annual Review of Biomedical Data Science, Aug 2023
Review
Advances in single-cell proteomics technologies have resulted in high-dimensional datasets comprising millions of cells that are capable of answering key questions about biology and disease. The advent of these technologies has prompted the development of computational tools to process and visualize the complex data. In this review, we outline the steps of single-cell and spatial proteomics analysis pipelines. In addition to describing available methods, we highlight benchmarking studies that have identified advantages and pitfalls of the currently available computational toolkits. As these technologies continue to advance, robust analysis tools should be developed in tandem to take full advantage of the potential biological insights provided by these data.
Topics: Proteomics; Computational Biology; Benchmarking
PubMed: 37040735
DOI: 10.1146/annurev-biodatasci-020422-050255
Environmental Science &amp; Technology, Sep 2023
The effects and risks of microplastics correlate with three-dimensional (3D) properties, such as the volume and surface area of the biologically accessible fraction of the diverse particle mixtures as they occur in nature. However, these 3D parameters are difficult to estimate because spectroscopic and visible-light image analysis methods yield data in only two dimensions (2D). The best existing 2D-to-3D conversion models require calibration for each new set of particles, which is labor-intensive. Here we introduce a new model that does not require calibration and compare its performance with existing models, including calibration-based ones. For the evaluation, we developed a new method in which the volumes of environmentally relevant microplastic mixtures are estimated in one go instead of on a cumbersome particle-by-particle basis. On this basis, the new Barchiesi model can be considered the most universal. The new model can be implemented in software used for the analysis of infrared spectroscopy and visible-light image data and is expected to increase the accuracy of risk assessments based on particle volumes and surface areas as toxicologically relevant metrics.
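The 2D-to-3D conversion problem can be illustrated with the simplest possible calibration-free baseline: treating each particle's projected area as that of a sphere. This is a hypothetical toy baseline for intuition, not the Barchiesi model; real microplastic fragments deviate from spheres, which is what dedicated conversion models correct for.

```python
import math

def sphere_equivalent_3d(area_2d):
    """Naive 2D->3D conversion: derive the area-equivalent sphere diameter
    from a particle's projected area, then return (volume, surface area)."""
    d = 2.0 * math.sqrt(area_2d / math.pi)  # area-equivalent diameter
    volume = math.pi / 6.0 * d ** 3
    surface = math.pi * d ** 2              # equals 4 * area_2d for a sphere
    return volume, surface

# Projected area of one particle, e.g. in square micrometres.
v, s = sphere_equivalent_3d(100.0)
```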
Topics: Microplastics; Plastics; Benchmarking; Calibration; Light
PubMed: 37683039
DOI: 10.1021/acs.est.3c03620
Genome Biology, Oct 2023
Review
Genomic benchmark datasets are essential to driving the field of genomics and bioinformatics. They provide a snapshot of the performances of sequencing technologies and analytical methods and highlight future challenges. However, they depend on sequencing technology, reference genome, and available benchmarking methods. Thus, creating a genomic benchmark dataset is laborious and highly challenging, often involving multiple sequencing technologies, different variant calling tools, and laborious manual curation. In this review, we discuss the available benchmark datasets and their utility. Additionally, we focus on the most recent benchmark of genes with medical relevance and challenging genomic complexity.
Topics: Benchmarking; Genomics; Computational Biology; Genome; High-Throughput Nucleotide Sequencing
PubMed: 37798733
DOI: 10.1186/s13059-023-03061-1
Bioinformatics (Oxford, England), Nov 2023
Review
MOTIVATION
Biomedical entity linking (BEL) is the task of grounding entity mentions to a knowledge base (KB). It plays a vital role in information extraction pipelines for the life sciences literature. We review recent work in the field and find that, as the task is absent from existing benchmarks for biomedical text mining, different studies adopt different experimental setups, making comparisons based on published numbers problematic. Furthermore, neural systems are tested primarily on instances linked to the broad-coverage KB UMLS, leaving their performance on more specialized KBs, e.g. for genes or variants, understudied.
RESULTS
We therefore developed BELB, a biomedical entity linking benchmark providing access, in a unified format, to 11 corpora linked to 7 KBs and spanning six entity types: gene, disease, chemical, species, cell line, and variant. BELB greatly reduces preprocessing overhead when testing BEL systems on multiple corpora, offering a standardized testbed for reproducible experiments. Using BELB, we perform an extensive evaluation of six rule-based entity-specific systems and three recent neural approaches leveraging pre-trained language models. Our results reveal a mixed picture: neural approaches fail to perform consistently across entity types, highlighting the need for further work toward entity-agnostic models.
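The per-entity-type breakdown under which BELB exposed inconsistent neural performance can be sketched as follows. This is illustrative code, not part of BELB, and the KB identifiers are made up:

```python
from collections import defaultdict

def per_type_accuracy(mentions):
    """mentions: list of (entity_type, predicted_kb_id, gold_kb_id).
    Returns linking accuracy broken down by entity type."""
    hits, totals = defaultdict(int), defaultdict(int)
    for etype, pred, gold in mentions:
        totals[etype] += 1
        if pred == gold:
            hits[etype] += 1
    return {t: hits[t] / totals[t] for t in totals}

# Invented mentions: two gene links (one wrong) and one correct disease link.
results = per_type_accuracy([
    ("gene",    "NCBI:7157",    "NCBI:7157"),
    ("gene",    "NCBI:672",     "NCBI:675"),
    ("disease", "MESH:D001943", "MESH:D001943"),
])
```

A system that scores well in aggregate can still show exactly this kind of per-type gap, which is why an entity-type-aware testbed matters.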
AVAILABILITY AND IMPLEMENTATION
The source code of BELB is available at: https://github.com/sg-wbi/belb. The code to reproduce our experiments can be found at: https://github.com/sg-wbi/belb-exp.
Topics: Benchmarking; Data Mining; Software; Language; Natural Language Processing
PubMed: 37975879
DOI: 10.1093/bioinformatics/btad698
BMC Bioinformatics, Dec 2023
BACKGROUND
Biclustering is increasingly used in biomedical data analysis, recommendation tasks, and text mining, with hundreds of biclustering algorithms proposed. When assessing the performance of these algorithms, real datasets alone are not sufficient, as they do not offer a solid ground truth. Synthetic data overcome this limitation by providing reference solutions against which the discovered patterns can be compared. However, generating synthetic datasets is challenging, since the generated data must ensure reproducibility, pattern representativity, and resemblance to real data.
RESULTS
We propose G-Bic, a dataset generator conceived to produce synthetic benchmarks for the normative assessment of biclustering algorithms. Beyond expanding on aspects of pattern coherence, data quality, and positioning properties, it further handles specificities related to mixed-type datasets and time-series data. G-Bic has the flexibility to replicate real data regularities from diverse domains. We provide the default configurations to generate reproducible benchmarks to evaluate and compare diverse aspects of biclustering algorithms. Additionally, we discuss empirical strategies to simulate the properties of real data.
CONCLUSION
G-Bic is a parametrizable generator for biclustering analysis, offering a solid means of robustly assessing biclustering solutions against both internal and external metrics.
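The core idea of a generator with a planted ground truth can be sketched in a few lines. This is a toy illustration of the principle, not G-Bic's actual API; the parameter names and defaults are assumptions:

```python
import random

def plant_bicluster(n_rows, n_cols, row_idx, col_idx, value, noise=1.0, seed=0):
    """Generate a noisy background matrix with one constant-value bicluster
    planted at the given row/column indices. The planted indices serve as
    the ground-truth reference solution for external evaluation."""
    rng = random.Random(seed)  # fixed seed -> reproducible benchmark
    data = [[rng.gauss(0.0, noise) for _ in range(n_cols)] for _ in range(n_rows)]
    for r in row_idx:
        for c in col_idx:
            data[r][c] = value
    return data, (set(row_idx), set(col_idx))

# A 6x5 matrix with a 2x2 bicluster planted at rows {1, 2}, columns {0, 3}.
matrix, truth = plant_bicluster(6, 5, [1, 2], [0, 3], value=5.0)
```

A biclustering algorithm's output can then be scored by how well its recovered row/column sets match `truth`.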
Topics: Gene Expression Profiling; Reproducibility of Results; Benchmarking; Cluster Analysis; Algorithms
PubMed: 38053078
DOI: 10.1186/s12859-023-05587-4
Bioinformatics (Oxford, England), Dec 2023
MOTIVATION
The steady increase in whole-genome/exome sequencing and the development of novel next-generation sequencing-based gene panels require continuous testing and validation of variant calling (VC) pipelines, as well as detection of sequencing-related issues, to keep them up to date and feasible for clinical settings. State-of-the-art tools are reliable when used to compute standard performance metrics. However, the need for automated software that discriminates between bioinformatic and sequencing issues and optimizes VC parameters remains unmet.
RESULTS
Here we present RecallME, a bioinformatic suite that tracks down difficult-to-detect variants, such as insertions and deletions in highly repetitive regions, thus providing the maximum reachable recall for both single-nucleotide variants and small insertions and deletions, and that precisely guides the user through the pipeline optimization process.
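The recall-oriented benchmarking such a suite automates boils down to comparing a call set against a truth set. A minimal sketch of those metrics, not RecallME itself (the variant records are invented; real comparisons also require normalization of representation):

```python
def recall_precision(called, truth):
    """Compare a variant call set against a truth set, both keyed by
    (chrom, pos, ref, alt). Returns (recall, precision)."""
    called, truth = set(called), set(truth)
    tp = len(called & truth)                        # true positives
    recall = tp / len(truth) if truth else 1.0      # found / should-have-found
    precision = tp / len(called) if called else 1.0 # found / reported
    return recall, precision

truth  = {("chr1", 100, "A", "T"), ("chr1", 205, "G", "GA"), ("chr2", 50, "CT", "C")}
called = {("chr1", 100, "A", "T"), ("chr2", 50, "CT", "C"), ("chr3", 9, "C", "G")}
r, p = recall_precision(called, truth)  # one missed indel, one false positive
```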
AVAILABILITY AND IMPLEMENTATION
Source code is freely available under MIT license at https://github.com/mazzalab-ieo/recallme. RecallME web application is available at https://translational-oncology-lab.shinyapps.io/recallme/. To use RecallME, users must obtain a license for ANNOVAR by themselves.
Topics: Benchmarking; Software; Computational Biology; Exome; High-Throughput Nucleotide Sequencing
PubMed: 38092052
DOI: 10.1093/bioinformatics/btad722
Biomolecules &amp; Biomedicine, Mar 2024
I read the article "Scientometrics and academia" by Dr. Zerem and colleagues. My perspective on citation metrics and scientometrics is more cautious. Therefore, in this article, I present my viewpoint on this subject.
Topics: Benchmarking
PubMed: 38197799
DOI: 10.17305/bb.2023.10233
Cell Reports Methods, Nov 2023
Molecular representation learning plays an important role in molecular property prediction. Existing molecular property prediction models rely on the de facto standard of covalent-bond-based molecular graphs, which represent molecular topology at the atomic level but ignore the non-covalent interactions within the molecule. In this study, we propose a molecular geometric deep learning (GDL) model that comprehensively considers both covalent and non-covalent interactions when predicting the properties of molecules. The essential idea is to incorporate a more general molecular representation into GDL models. We systematically test molecular GDL (Mol-GDL) on fourteen commonly used benchmark datasets. The results show that Mol-GDL can achieve better performance than state-of-the-art (SOTA) methods. Extensive tests demonstrate the important role of non-covalent interactions in molecular property prediction and the effectiveness of Mol-GDL models.
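The representational idea, augmenting covalent bonds with distance-based non-covalent contact edges, can be sketched as follows. This is an illustrative toy, not Mol-GDL's implementation; the cutoff value and coordinates are assumptions:

```python
import math

def build_edges(coords, bonds, cutoff=4.0):
    """Build an undirected edge set combining covalent bonds with
    distance-based non-covalent contacts, so a downstream geometric
    model also sees within-molecule spatial proximity.
    coords: list of (x, y, z) atom positions; bonds: covalent index pairs."""
    edges = {tuple(sorted(b)) for b in bonds}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if math.dist(coords[i], coords[j]) <= cutoff:
                edges.add((i, j))  # non-covalent contact edge
    return sorted(edges)

# Three atoms: two bonded and close, one far away (no contact edge).
coords = [(0.0, 0.0, 0.0), (1.5, 0.0, 0.0), (10.0, 0.0, 0.0)]
bonds = [(0, 1)]
edges = build_edges(coords, bonds)  # only the (0, 1) pair qualifies
```

Varying the cutoff changes which spatial contacts enter the graph, which is one way a "more general" representation can be parameterized.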
Topics: Deep Learning; Benchmarking; Models, Molecular
PubMed: 37875121
DOI: 10.1016/j.crmeth.2023.100621
Bioinformatics (Oxford, England), Jun 2023
MOTIVATION
Predicting the regulatory function of non-coding DNA using only the DNA sequence continues to be a major challenge in genomics. With the advent of improved optimization algorithms, faster GPU speeds, and more intricate machine-learning libraries, hybrid convolutional and recurrent neural network architectures can be constructed and applied to extract crucial information from non-coding DNA.
RESULTS
Using a comparative analysis of the performance of thousands of deep learning architectures, we developed ChromDL, a neural network architecture combining bidirectional gated recurrent units, convolutional neural networks, and bidirectional long short-term memory units, which significantly improves upon a range of prediction metrics compared with its predecessors in transcription factor binding site, histone modification, and DNase I hypersensitive site detection. Combined with a secondary model, it can be used for accurate classification of gene regulatory elements. The model can also detect weak transcription factor binding compared with previously developed methods and has the potential to help delineate transcription factor binding motif specificities.
AVAILABILITY AND IMPLEMENTATION
The ChromDL source code can be found at https://github.com/chrishil1/ChromDL.
Topics: Algorithms; Benchmarking; DNA; Deoxyribonuclease I; Transcription Factors
PubMed: 37387183
DOI: 10.1093/bioinformatics/btad217