-
NeuroImage Aug 2022
The field of neuroimaging has embraced methods from machine learning in a variety of ways. Although an increasing number of initiatives have published open-access neuroimaging datasets, specifically designed benchmarks are rare in the field. In this article, we first describe how benchmarks in computer science and biomedical imaging have fostered methodological progress in machine learning. Second, we identify the special characteristics of neuroimaging data and outline what researchers have to ensure when establishing a neuroimaging benchmark, how datasets should be composed and how adequate evaluation criteria can be chosen. Based on lessons learned from machine learning benchmarks, we argue for an extended evaluation procedure that, next to applying suitable performance metrics, focuses on scientifically relevant aspects such as explainability, robustness, uncertainty, computational efficiency and code quality. Lastly, we envision a collaborative neuroimaging benchmarking platform that combines the discussed aspects in a collaborative and agile framework, allowing researchers across disciplines to work together on the key predictive problems of the field of neuroimaging and psychiatry.
Topics: Benchmarking; Humans; Machine Learning; Neuroimaging; Psychiatry
PubMed: 35561945
DOI: 10.1016/j.neuroimage.2022.119298 -
Genome Biology and Evolution Dec 2020
Orthobench is the standard benchmark to assess the accuracy of orthogroup inference methods. It contains 70 expert-curated reference orthogroups (RefOGs) that span the Bilateria and cover a range of different challenges for orthogroup inference. Here, we leveraged improvements in tree inference algorithms and computational resources to reinterrogate these RefOGs and carry out an extensive phylogenetic delineation of their composition. This phylogenetic revision altered the membership of 31 of the 70 RefOGs, with 24 subject to extensive revision and 7 that required minor changes. We further used these revised and updated RefOGs to provide an assessment of the orthogroup inference accuracy of widely used orthogroup inference methods. Finally, we provide an open-source benchmarking suite to support the future development and use of the Orthobench benchmark.
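One common way to score a predicted orthogroup against a reference orthogroup is set-based precision and recall. The sketch below is only an illustration of that general idea; it is not taken from the Orthobench benchmarking suite, whose exact scoring may differ, and the function name and gene identifiers are hypothetical.

```python
def orthogroup_scores(predicted, reference):
    """Set-based precision, recall and F-score of one predicted orthogroup
    against one reference orthogroup (illustrative scoring, an assumption)."""
    predicted, reference = set(predicted), set(reference)
    shared = len(predicted & reference)  # genes placed correctly
    precision = shared / len(predicted) if predicted else 0.0
    recall = shared / len(reference) if reference else 0.0
    total = precision + recall
    f_score = 2 * precision * recall / total if total else 0.0
    return precision, recall, f_score


# Hypothetical gene identifiers, for illustration only.
ref_og = {"geneA", "geneB", "geneC", "geneD"}
pred_og = {"geneA", "geneB", "geneX"}
print(orthogroup_scores(pred_og, ref_og))
```

A predicted group that misses reference genes loses recall, and one that pulls in unrelated genes loses precision, so the two numbers separate under-clustering from over-clustering.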
Topics: Benchmarking; Biological Evolution; Computational Biology; Databases, Factual; Genetic Techniques
PubMed: 33022036
DOI: 10.1093/gbe/evaa211 -
Sensors (Basel, Switzerland) Dec 2021
Recent advances in the control of overground exoskeletons are being centered on improving balance support and decreasing the reliance on crutches. However, appropriate methods to quantify the stability of these exoskeletons (and their users) are still under development. A reliable and reproducible balance assessment is critical to enrich exoskeletons' performance and their interaction with humans. In this work, we present the BenchBalance system, which is a benchmarking solution to conduct reproducible balance assessments of exoskeletons and their users. Integrating two key elements, i.e., a hand-held perturbator and a smart garment, BenchBalance is a portable and low-cost system that provides a quantitative assessment related to the reaction and capacity of wearable exoskeletons and their users to respond to controlled external perturbations. A software interface is used to guide the experimenter throughout a predefined protocol of measurable perturbations, taking into account antero-posterior and mediolateral responses. In total, the protocol is composed of sixteen perturbation conditions, which vary in magnitude and location while still controlling their orientation. The data acquired by the interface are classified and saved for a subsequent analysis based on synthetic metrics. In this paper, we present a proof of principle of the BenchBalance system with a healthy user in two scenarios: subject not wearing and subject wearing the H2 lower-limb exoskeleton. After a brief training period, the experimenter was able to provide the manual perturbations of the protocol in a consistent and reproducible way. The balance metrics defined within the BenchBalance framework were able to detect differences in performance depending on the perturbation magnitude, location, and the presence or not of the exoskeleton. The BenchBalance system will be integrated at EUROBENCH facilities to benchmark the balance capabilities of wearable exoskeletons and their users.
Topics: Benchmarking; Crutches; Exoskeleton Device; Humans; Lower Extremity; Wearable Electronic Devices
PubMed: 35009661
DOI: 10.3390/s22010119 -
The ISME Journal Jan 2021
Growth rates are central to understanding microbial interactions and community dynamics. Metagenomic growth estimators have been developed, specifically codon usage bias (CUB) for maximum growth rates and "peak-to-trough ratio" (PTR) for in situ rates. Both were originally tested with pure cultures, but natural populations are more heterogeneous, especially in individual cell histories pertinent to PTR. To test these methods, we compared predictors with observed growth rates of freshly collected marine prokaryotes in unamended seawater. We prefiltered and diluted samples to remove grazers and greatly reduce virus infection, so net growth approximated gross growth. We sampled over 44 h for abundances and metagenomes, generating 101 metagenome-assembled genomes (MAGs), including Actinobacteria, Verrucomicrobia, SAR406, MGII archaea, etc. We tracked each MAG population by cell-abundance-normalized read recruitment, finding growth rates of 0 to 5.99 per day, the first reported rates for several groups, and used these rates as benchmarks. PTR, calculated by three methods, rarely correlated to growth (r ≈ -0.26 to 0.08), except for rapidly growing γ-Proteobacteria (r ≈ 0.63 to 0.92), while CUB correlated moderately well to observed maximum growth rates (r = 0.57). This suggests that current PTR approaches poorly predict actual growth of most marine bacterial populations, but maximum growth rates can be approximated from genomic characteristics.
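The comparison described above rests on two calculations: a per-day growth rate from abundances at two time points, and a correlation between predicted and observed rates. The sketch below assumes simple exponential growth, mu = ln(N_end / N_start) / t, and a plain Pearson coefficient; neither the formula nor the function names come from the article, and the numbers are made up for illustration.

```python
import math


def net_growth_rate(n_start, n_end, days):
    """Specific growth rate per day, assuming exponential growth."""
    if n_start <= 0 or n_end <= 0 or days <= 0:
        raise ValueError("abundances and elapsed time must be positive")
    return math.log(n_end / n_start) / days


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    dx = [x - mean_x for x in xs]
    dy = [y - mean_y for y in ys]
    ss_x = sum(d * d for d in dx)
    ss_y = sum(d * d for d in dy)
    if ss_x == 0 or ss_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return sum(a * b for a, b in zip(dx, dy)) / math.sqrt(ss_x * ss_y)


# Hypothetical abundances (cells per mL) at the start and end of a 44 h window.
observed = [net_growth_rate(n0, n1, 44 / 24)
            for n0, n1 in [(1e4, 3e4), (2e4, 2.5e4), (5e3, 4e4)]]
predicted = [1.4, 1.1, 1.9]  # hypothetical estimator output for the same populations
print(observed, pearson_r(predicted, observed))
```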
Topics: Archaea; Bacteria; Benchmarking; Metagenome; Metagenomics
PubMed: 32939027
DOI: 10.1038/s41396-020-00773-1 -
Briefings in Bioinformatics May 2023
Review
Since the 1980s, dozens of computational methods have addressed the problem of predicting RNA secondary structure. Among them are those that follow standard optimization approaches and, more recently, machine learning (ML) algorithms. The former were repeatedly benchmarked on various datasets. The latter, on the other hand, have not yet undergone extensive analysis that could suggest to the user which algorithm best fits the problem to be solved. In this review, we compare 15 methods that predict the secondary structure of RNA, of which 6 are based on deep learning (DL), 3 on shallow learning (SL) and 6 control methods on non-ML approaches. We discuss the ML strategies implemented and perform three experiments in which we evaluate the prediction of (I) representatives of the RNA equivalence classes, (II) selected Rfam sequences and (III) RNAs from new Rfam families. We show that DL-based algorithms (such as SPOT-RNA and UFold) can outperform SL and traditional methods if the data distribution is similar in the training and testing set. However, when predicting 2D structures for new RNA families, the advantage of DL is no longer clear, and its performance is inferior or equal to that of SL and non-ML methods.
Topics: Humans; RNA; Machine Learning; Algorithms; Benchmarking
PubMed: 37096592
DOI: 10.1093/bib/bbad153 -
Scientific Data Sep 2023
Computational drug repositioning methods have emerged as an attractive and effective solution to find new candidates for existing therapies, reducing the time and cost of drug development. Repositioning methods based on biomedical knowledge graphs typically offer useful supporting biological evidence. This evidence is based on reasoning chains or subgraphs that connect a drug to a disease prediction. However, there are no databases of drug mechanisms that can be used to train and evaluate such methods. Here, we introduce the Drug Mechanism Database (DrugMechDB), a manually curated database that describes drug mechanisms as paths through a knowledge graph. DrugMechDB integrates a diverse range of authoritative free-text resources to describe 4,583 drug indications with 32,249 relationships, representing 14 major biological scales. DrugMechDB can be employed as a benchmark dataset for assessing computational drug repositioning models or as a valuable resource for training such models.
Topics: Benchmarking; Databases, Factual; Drug Development; Drug Repositioning; Knowledge
PubMed: 37717042
DOI: 10.1038/s41597-023-02534-z -
Journal of Nuclear Medicine Technology Jun 2020
Review
U.S. Pharmacopeia (USP) general chapter <825>, "Radiopharmaceuticals: Preparation, Compounding, Dispensing, and Repackaging," is a new standard proposed to provide minimum requirements for the preparation, compounding, dispensing, and repackaging of sterile and nonsterile radiopharmaceuticals. This new standard represents endeavors on the part of the USP to respond to appeals by nuclear medicine professionals to move beyond a minimal supplement to USP <797> and provide policies specific to radiopharmaceuticals. USP <825> provides nuclear pharmacies and nuclear medicine departments in hospitals and clinics with the benchmarks to assess current practice activities and integrate needed changes to meet regulatory and accreditation audit reviews. This continuing education article focuses on components of USP <825> specific to the nuclear medicine technologist for a better understanding of obligations when preparing sterile radiopharmaceuticals for clinical use.
Topics: Benchmarking; Humans; Nuclear Medicine; Organizations, Nonprofit; United States
PubMed: 32499321
DOI: 10.2967/jnmt.120.243378 -
Proteins Sep 2020
Protein docking is essential for structural characterization of protein interactions. Besides providing the structure of protein complexes, modeling of proteins and their complexes is important for understanding the fundamental principles and specific aspects of protein interactions. The accuracy of protein modeling, in general, is still less than that of the experimental approaches. Thus, it is important to investigate the applicability of docking techniques to modeled proteins. We present new comprehensive benchmark sets of protein models for the development and validation of protein docking, as well as a systematic assessment of free and template-based docking techniques on these sets. As opposed to previous studies, the benchmark sets reflect the real case modeling/docking scenario where the accuracy of the models is assessed by the modeling procedure, without reference to the native structure (which would be unknown in practical applications). We also expanded the analysis to include docking of protein pairs where proteins have different structural accuracy. The results show that, in general, the template-based docking is less sensitive to the structural inaccuracies of the models than the free docking. The near-native docking poses generated by the template-based approach, typically, also have higher ranks than those produced by the free docking (although the free docking is indispensable in modeling the multiplicity of protein interactions in a crowded cellular environment). The results show that docking techniques are applicable to protein models in a broad range of modeling accuracy. The study provides clear guidelines for practical applications of docking to protein models.
Topics: Amino Acid Sequence; Benchmarking; Binding Sites; Databases, Protein; Molecular Docking Simulation; Protein Binding; Protein Structure, Secondary; Proteins; Software
PubMed: 32170770
DOI: 10.1002/prot.25889 -
Annals of Surgical Oncology May 2024
INTRODUCTION
Benchmarking in surgery has been proposed as a means to compare results across institutions to establish best practices. We sought to define benchmark values for hepatectomy for intrahepatic cholangiocarcinoma (ICC) across an international population.
METHODS
Patients who underwent liver resection for ICC between 1990 and 2020 were identified from an international database, including 14 Eastern and Western institutions. Patients operated on at high-volume centers who had no preoperative jaundice, ASA class <3, body mass index <35 kg/m², and no need for bile duct or vascular resection were chosen as the benchmark group.
RESULTS
Among 1193 patients who underwent curative-intent hepatectomy for ICC, 600 (50.3%) were included in the benchmark group. Among benchmark patients, median age was 58.0 years (interquartile range [IQR] 49.0-67.0), only 28 (4.7%) patients received neoadjuvant therapy, and most patients had a minor resection (n = 499, 83.2%). Benchmark values included ≥3 lymph nodes retrieved when lymphadenectomy was performed, blood loss ≤600 mL, perioperative blood transfusion rate ≤42.9%, and operative time ≤339 min. The postoperative benchmark values included TOO achievement ≥59.3%, positive resection margin ≤27.5%, 30-day readmission ≤3.6%, Clavien-Dindo III or more complications ≤14.3%, and 90-day mortality ≤4.8%, as well as hospital stay ≤14 days.
CONCLUSIONS
Benchmark cutoffs targeting short-term perioperative outcomes can help to facilitate comparisons across hospitals performing liver resection for ICC, assess inter-institutional variation, and identify the highest-performing centers to improve surgical and oncologic outcomes.
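As a rough illustration of how such cutoffs could be used for comparison across hospitals, the sketch below encodes the benchmark values listed in the results and checks one center's figures against them. The metric names, the function and the example center are hypothetical; only the cutoff values and their directions come from the abstract.

```python
import operator

# Cutoffs as reported in the abstract; key names are illustrative assumptions.
BENCHMARKS = {
    "lymph_nodes_retrieved": (operator.ge, 3),
    "blood_loss_ml": (operator.le, 600),
    "transfusion_rate_pct": (operator.le, 42.9),
    "operative_time_min": (operator.le, 339),
    "too_achievement_pct": (operator.ge, 59.3),
    "positive_margin_pct": (operator.le, 27.5),
    "readmission_30d_pct": (operator.le, 3.6),
    "clavien_dindo_3plus_pct": (operator.le, 14.3),
    "mortality_90d_pct": (operator.le, 4.8),
    "hospital_stay_days": (operator.le, 14),
}


def compare_to_benchmarks(center):
    """Return {metric: True/False} for each benchmarked metric the center reports."""
    return {
        name: meets(center[name], cutoff)
        for name, (meets, cutoff) in BENCHMARKS.items()
        if name in center
    }


# Hypothetical center-level summary values.
example_center = {"blood_loss_ml": 450, "hospital_stay_days": 16, "mortality_90d_pct": 4.8}
for metric, ok in compare_to_benchmarks(example_center).items():
    print(metric, "meets benchmark" if ok else "outside benchmark")
```

Storing the comparison direction next to each cutoff keeps "at least" metrics (lymph nodes, TOO) from being checked the wrong way round.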
Topics: Humans; Middle Aged; Bile Ducts, Intrahepatic; Benchmarking; Hepatectomy; Bile Duct Neoplasms; Cholangiocarcinoma; Retrospective Studies
PubMed: 38214817
DOI: 10.1245/s10434-023-14880-8 -
RNA (New York, N.Y.) Dec 2023
Review
The tremendous rate with which data is generated and analysis methods emerge makes it increasingly difficult to keep track of their domain of applicability, assumptions, limitations, and consequently, of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks, and for the provision of infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for continuous extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies, while the containers and reproducible workflows could easily be deployed and extended to evaluate new methods or data sets.
Topics: RNA; Benchmarking; RNA-Seq; Polyadenylation; Sequence Analysis, RNA
PubMed: 37816550
DOI: 10.1261/rna.079849.123