Nature Methods Nov 2023
The lack of benchmark data sets with inbuilt ground truth makes it challenging to compare the performance of existing long-read isoform detection and differential expression analysis workflows. Here, we present a benchmark experiment using two human lung adenocarcinoma cell lines that were each profiled in triplicate together with synthetic, spliced, spike-in RNAs (sequins). Samples were deeply sequenced on both Illumina short-read and Oxford Nanopore Technologies long-read platforms. Alongside the ground truth available via the sequins, we created in silico mixture samples to allow performance assessment in the absence of true positives or true negatives. Our results show that StringTie2 and bambu outperformed the other of the six isoform detection tools tested; that DESeq2, edgeR and limma-voom were best among the five differential transcript expression tools tested; and that there was no clear front-runner among the five differential transcript usage tools compared, which suggests that further method development is needed for this application.
Topics: Humans; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; Benchmarking; RNA; Protein Isoforms
PubMed: 37783886
DOI: 10.1038/s41592-023-02026-3
Current Opinion in Critical Care Oct 2009
Review
PURPOSE OF REVIEW
In this article we discuss our experiences benchmarking eight ICUs in The Netherlands. Benchmarks must be carefully designed and implemented to generate meaningful results. We describe the prerequisites we have identified for successful benchmarking and discuss the development, implementation and results of the ICU benchmarks we have completed.
RECENT FINDINGS
Previous articles have discussed benchmarking ICUs, but there are still few studies of significant size and appropriate design that measure the impact of benchmarking on outcomes. Perhaps the best-known, and still the best, example of a benchmarking study designed to measure outcome improvements is the work of Pronovost et al. in Michigan ICUs.
SUMMARY
Benchmarking is an increasingly common activity; however, it is difficult to prove that benchmarks result in improved outcomes. Concurrent with our benchmarking activities, the Standardized Mortality Ratio in Dutch ICUs has decreased. We have been able to show that the larger ICUs in our benchmarks generally had better outcomes despite a higher average patient severity. Quality assurance in healthcare is maturing, and benchmarks will become an increasingly useful way of comparing performance between institutions.
Topics: Benchmarking; Efficiency, Organizational; Humans; Intensive Care Units; Outcome Assessment, Health Care; Quality of Health Care
PubMed: 19633547
DOI: 10.1097/MCC.0b013e32833079fb
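The Standardized Mortality Ratio mentioned in the summary above is simple to compute once a case-mix model supplies per-admission death probabilities. A minimal, hypothetical sketch (names and numbers are invented; real ICU benchmarks use validated severity models such as APACHE):

```python
# Illustrative SMR computation: expected deaths are the sum of
# model-predicted death probabilities, one per admission.
def standardized_mortality_ratio(observed_deaths, predicted_probs):
    """SMR = observed deaths / expected deaths."""
    expected = sum(predicted_probs)
    if expected <= 0:
        raise ValueError("expected deaths must be positive")
    return observed_deaths / expected

# Toy unit: 3 deaths among 6 admissions with these predicted risks.
probs = [0.9, 0.6, 0.5, 0.4, 0.4, 0.2]  # expected deaths ~ 3.0
smr = standardized_mortality_ratio(3, probs)  # ~ 1.0, i.e. as predicted
```

An SMR above 1 flags more deaths than the case mix predicts; below 1, fewer.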
Bioinformatics (Oxford, England) Dec 2019
MOTIVATION
Protein structure comparison plays a fundamental role in understanding the evolutionary relationships between proteins. Here, we release a new version of the DaliLite standalone software. The novelties are hierarchical search of the structure database organized into sequence based clusters, and remote access to our knowledge base of structural neighbors. The detection of fold, superfamily and family level similarities by DaliLite and state-of-the-art competitors was benchmarked against a manually curated structural classification.
RESULTS
Database search strategies were evaluated using Fmax with query-specific thresholds. DaliLite and DeepAlign outperformed TM-score-based methods at all levels of the benchmark, and DaliLite outperformed DeepAlign at the fold level. Hierarchical and knowledge-based searches approached the performance of systematic pairwise comparison, with the knowledge-based search four times as efficient as the hierarchical search. The knowledge-based search dynamically adjusts the depth of the search, enabling a trade-off between speed and recall.
AVAILABILITY AND IMPLEMENTATION
http://ekhidna2.biocenter.helsinki.fi/dali/README.v5.html.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Algorithms; Benchmarking; Databases, Factual; Proteins; Sequence Analysis, Protein; Software
PubMed: 31263867
DOI: 10.1093/bioinformatics/btz536
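The Fmax criterion used above to score database searches can be sketched per query: treat hits above a score cutoff as predicted neighbours and take the best F1 over all cutoffs. A minimal, hypothetical implementation (the function name and data shapes are assumptions, not the authors' code):

```python
def fmax(scored_hits, true_set):
    """scored_hits: list of (hit_id, score); true_set: set of true neighbours.
    Returns the maximum F1 over all score thresholds for this query."""
    best = 0.0
    for threshold in sorted({score for _, score in scored_hits}):
        predicted = {h for h, score in scored_hits if score >= threshold}
        true_positives = len(predicted & true_set)
        if true_positives == 0:
            continue
        precision = true_positives / len(predicted)
        recall = true_positives / len(true_set)
        best = max(best, 2 * precision * recall / (precision + recall))
    return best

hits = [("a", 0.9), ("b", 0.8), ("c", 0.3)]
fmax(hits, {"a", "b"})  # cutoff 0.8 gives precision = recall = 1 -> 1.0
```

A query-specific threshold, as in the benchmark, is exactly the cutoff at which this per-query maximum is attained.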
Chemphyschem : a European Journal of... Sep 2022
The potentials of mean force (PMFs) along the end-to-end distance of two different helical peptides have been obtained and benchmarked using the adaptive steered molecular dynamics (ASMD) method. The results depend strongly on the choice of force field driving the underlying all-atom molecular dynamics, and are reported with respect to the three most popular CHARMM force field versions: c22, c27 and c36. Two small peptides, one of which is 1PEF, serve as the case studies. The comparisons between the versions of the CHARMM force fields provide both a qualitative and a quantitative look at their performance in forced-unfolding simulations, in which peptides undergo large changes in structural conformation. We find that ASMD with the underlying c36 force field provides the most robust results for the selected benchmark peptides.
Topics: Benchmarking; Molecular Conformation; Molecular Dynamics Simulation; Peptides
PubMed: 35594194
DOI: 10.1002/cphc.202200175
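PMF reconstruction in steered MD rests on Jarzynski's equality, dF = -kT ln⟨exp(-W/kT)⟩, averaged over pulling trajectories. A minimal sketch of that work average (illustrative only; ASMD additionally resamples trajectories stage by stage, which is omitted here):

```python
import math

def jarzynski_free_energy(works, kT):
    """Free-energy difference from nonequilibrium pulling work values:
    dF = -kT * ln( (1/N) * sum_i exp(-W_i / kT) )."""
    boltzmann_avg = sum(math.exp(-w / kT) for w in works) / len(works)
    return -kT * math.log(boltzmann_avg)

# Identical work values recover that work exactly; scattered values give
# a free energy below the mean work, discounting the dissipated part.
dF = jarzynski_free_energy([2.0, 2.0, 2.0], kT=0.593)  # ~ 2.0 kcal/mol
```

The exponential average is dominated by rare low-work trajectories, which is why staged schemes like ASMD are needed in practice to keep the estimator's variance under control.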
Journal of Infection and Public Health Oct 2013
Review
Growing numbers of healthcare facilities are routinely collecting standardized data on healthcare-associated infection (HAI), which can be used not only to track internal performance but also to compare local data to national and international benchmarks. Benchmarking overall (crude) HAI surveillance metrics without accounting or adjusting for potential confounders can result in misleading conclusions. Methods commonly used to provide risk-adjusted metrics include multivariate logistic regression analysis, stratification, indirect standardization, and restriction. The characteristics of recognized benchmarks worldwide, including their advantages and limitations, are described. Choosing the right benchmark for data from the Gulf Cooperation Council (GCC) states is challenging: the chosen benchmark should have similar data collection and presentation methods, and differences in surveillance environments, including regulations, should be taken into consideration. The GCC center for infection control has taken some steps to unify HAI surveillance systems in the region, but GCC hospitals still need to overcome legislative and logistic difficulties in sharing data to create their own benchmark. The availability of a regional GCC benchmark would better enable healthcare workers and researchers to obtain more accurate and realistic comparisons.
Topics: Benchmarking; Communicable Disease Control; Cross Infection; Epidemiologic Methods; Humans; Topography, Medical
PubMed: 23999329
DOI: 10.1016/j.jiph.2013.05.001
Briefings in Bioinformatics May 2022
Data analysis is a critical part of quantitative proteomics studies, underpinning the interpretation of biological questions. Numerous computational tools for protein quantification, imputation and differential expression (DE) analysis have been generated over the past decade, and the search for optimal tools is still ongoing. Moreover, owing to the rapid development of RNA sequencing (RNA-seq) technology, a vast number of DE analysis methods have been created for that purpose, and the applicability of these newly developed RNA-seq-oriented tools to proteomics data remains in doubt. To benchmark these analysis methods, this study generated a proteomics dataset consisting of proteins derived from human, yeast and Drosophila in defined ratios. Based on this dataset, DE analysis tools, including microarray- and RNA-seq-based ones, imputation algorithms and protein quantification methods were compared and benchmarked. Furthermore, applying these approaches to two public datasets showed that RNA-seq-based DE tools achieved higher accuracy in identifying differentially expressed proteins (DEPs). This study provides useful guidelines for analyzing quantitative proteomics datasets. All the methods used in this study were integrated into the Perseus software, version 2.0.3.0, which is available at https://www.maxquant.org/perseus.
Topics: Algorithms; Benchmarking; Proteins; Proteomics; Sequence Analysis, RNA; Software
PubMed: 35397162
DOI: 10.1093/bib/bbac138
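One ingredient shared by essentially all of the DE tools compared above is multiple-testing correction of per-protein p-values. A minimal, hypothetical sketch of a log2 fold change plus Benjamini-Hochberg adjustment (not one of the benchmarked tools; the p-values would come from whichever per-protein test is used):

```python
from statistics import mean

def log2_fold_change(group_a, group_b):
    """On log2-transformed intensities, the difference of group means
    is the log2 fold change."""
    return mean(group_a) - mean(group_b)

def benjamini_hochberg(pvals):
    """Benjamini-Hochberg adjusted p-values, returned in input order."""
    n = len(pvals)
    order = sorted(range(n), key=lambda i: pvals[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * n / rank)
        adjusted[i] = running_min
    return adjusted

benjamini_hochberg([0.001, 0.03, 0.01, 0.5])  # ~ [0.004, 0.04, 0.02, 0.5]
```

A protein is then called a DEP when its adjusted p-value falls below the chosen FDR and its fold change exceeds a threshold.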
The British Journal of Surgery Jan 2019
Benchmarking is a popular quality-improvement tool in economic practice. Its basic principle consists of identifying the best (the benchmark), comparing with the best, and learning from the best. In healthcare, the concept of benchmarking has been less precise: comparisons often target not the best but average results. The goal, however, remains improvement in patient outcomes. This article outlines the application of benchmarking and proposes a standard approach for determining benchmarks in surgery, including the establishment of best achievable real-world postoperative outcomes. Parameters used for this purpose must be reproducible, objective and universal. A systematic approach to determining benchmarks enables self-assessment of surgical outcomes and facilitates the detection of areas for improvement. The intention of benchmarking is to stimulate surgeons' genuine endeavour for perfection, rather than to judge centre or surgeon performance.
Topics: Benchmarking; Clinical Competence; Humans; Quality Improvement; Surgeons; Surgical Procedures, Operative
PubMed: 30485405
DOI: 10.1002/bjs.10976
Sensors (Basel, Switzerland) May 2022
Object detection is an essential capability for performing complex tasks in robotic applications. Today, deep learning (DL) approaches are the basis of state-of-the-art solutions in computer vision, where they provide very high accuracy, albeit with high computational costs. Owing to the physical limitations of robotic platforms, embedded devices are not as powerful as desktop computers, and deep learning models must be adjusted before being transferred to robotic applications. This work benchmarks deep learning object detection models on embedded devices. Furthermore, some hardware selection guidelines are included, together with a description of the most relevant features of the two boards selected for this benchmark. These embedded devices integrate a powerful AI co-processor to accelerate DL applications; to take advantage of it, models must be converted to a specific embedded runtime format. Five quantization levels applied to a collection of DL models are considered; two of them allow the execution of models on the embedded general-purpose CPU and are used as the baseline to assess the improvements obtained when running the same models with the three remaining quantization levels on the AI co-processors. The benchmark procedure is explained in detail, and a comprehensive analysis of the collected data is presented. Finally, the feasibility and challenges of implementing embedded object detection applications are discussed.
Topics: Algorithms; Benchmarking; Computers; Deep Learning; Electronics
PubMed: 35684827
DOI: 10.3390/s22114205
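The quantization step described above can be illustrated with its simplest form, symmetric per-tensor int8 quantization. A hedged sketch (real conversion toolchains such as TensorFlow Lite also calibrate activations and fuse operators, which is not shown here):

```python
def quantize_int8(weights):
    """Symmetric per-tensor quantization: a single scale factor maps
    float weights onto the int8 range [-128, 127]."""
    scale = max(abs(w) for w in weights) / 127.0  # assumes a nonzero tensor
    quantized = [max(-128, min(127, round(w / scale))) for w in weights]
    return quantized, scale

def dequantize(quantized, scale):
    """Recover approximate float weights from the int8 values."""
    return [q * scale for q in quantized]

weights = [0.5, -1.27, 0.0, 0.9]
q, scale = quantize_int8(weights)
recovered = dequantize(q, scale)  # round-trip error is at most scale / 2
```

Storing int8 instead of float32 cuts weight memory four-fold and lets the AI co-processor run integer arithmetic, which is where the speed-ups measured in such benchmarks come from.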
Nature Communications Nov 2021
Intratumour heterogeneity provides tumours with the ability to adapt and acquire treatment resistance. The development of more effective and personalised treatments for cancers, therefore, requires accurate characterisation of the clonal architecture of tumours, enabling evolutionary dynamics to be tracked. Many methods exist for achieving this from bulk tumour sequencing data, involving identifying mutations and performing subclonal deconvolution, but there is a lack of systematic benchmarking to inform researchers on which are most accurate, and how dataset characteristics impact performance. To address this, we use the most comprehensive tumour genome simulation tool available for such purposes to create 80 bulk tumour whole exome sequencing datasets of differing depths, tumour complexities, and purities, and use these to benchmark subclonal deconvolution pipelines. We conclude that i) tumour complexity does not impact accuracy, ii) increasing either purity or purity-corrected sequencing depth improves accuracy, and iii) the optimal pipeline consists of Mutect2, FACETS and PyClone-VI. We have made our benchmarking datasets publicly available for future use.
Topics: Benchmarking; Exome; High-Throughput Nucleotide Sequencing; Humans; Software
PubMed: 34737285
DOI: 10.1038/s41467-021-26698-7
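Subclonal deconvolution hinges on the purity correction of variant allele frequencies noted above. A minimal sketch under strong simplifying assumptions (diploid locus, one mutated copy per cancer cell; tools such as PyClone-VI model copy number explicitly, which this does not):

```python
def ccf_from_vaf(vaf, purity):
    """Cancer cell fraction for a heterozygous mutation at a diploid locus:
    tumour and normal cells each carry two copies and one of the tumour
    copies is mutated, so VAF = purity * CCF / 2, i.e. CCF = 2 * VAF / purity."""
    return min(2.0 * vaf / purity, 1.0)  # clamp sampling noise at clonal

# A clonal (CCF = 1) mutation in a 50%-pure sample sits at VAF = 0.25.
ccf_from_vaf(0.25, purity=0.5)  # -> 1.0
```

Clustering mutations by CCF rather than raw VAF is what lets the benchmarked pipelines separate clonal from subclonal populations across samples of differing purity.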
Microbial Genomics Mar 2022
Phylogenetic analyses are widely used in microbiological research, for example to trace the progression of bacterial outbreaks based on whole-genome sequencing data. In practice, multiple analysis steps such as assembly, alignment and phylogenetic inference are combined to form phylogenetic workflows. Comprehensive benchmarking of the accuracy of complete phylogenetic workflows is lacking. To benchmark different phylogenetic workflows, we simulated bacterial evolution under a wide range of evolutionary models, varying the relative rates of substitution, insertion, deletion, gene duplication, gene loss and lateral gene transfer events. The generated datasets corresponded to a genetic diversity usually observed within bacterial species (≥95 % average nucleotide identity). We replicated each simulation three times to assess replicability. In total, we benchmarked 19 distinct phylogenetic workflows using 8 different simulated datasets. We found that recently developed k-mer alignment methods such as kSNP and ska achieve similar accuracy to reference mapping. The high accuracy of k-mer alignment methods can be explained by the large fractions of genomes these methods can align, relative to other approaches. We also found that the choice of assembly algorithm influences the accuracy of phylogenetic reconstruction, with workflows employing SPAdes or skesa outperforming those employing Velvet. Finally, we found that the results of phylogenetic benchmarking are highly variable between replicates. We conclude that for phylogenomic reconstruction, k-mer alignment methods are relevant alternatives to reference mapping at the species level, especially in the absence of suitable reference genomes. We show genome assembly accuracy to be an underappreciated parameter required for accurate phylogenomic reconstruction.
Topics: Algorithms; Benchmarking; Genome; Phylogeny; Workflow
PubMed: 35290758
DOI: 10.1099/mgen.0.000799
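The ≥95 % average nucleotide identity cutoff used above to delimit species-level diversity reduces, for any one aligned pair of sequences, to percent identity over aligned positions. A toy sketch (real ANI tools first identify orthologous fragments between genomes, a step omitted here):

```python
def percent_identity(aligned_a, aligned_b):
    """Percent identity over aligned columns, ignoring gap positions ('-')."""
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b)
             if a != "-" and b != "-"]
    matches = sum(1 for a, b in pairs if a == b)
    return 100.0 * matches / len(pairs)

percent_identity("ACGT-ACGT", "ACGTTACGA")  # 7 of 8 columns match -> 87.5
```

Averaging this value over all orthologous fragments of two genomes gives the ANI figure against which the 95 % species threshold is applied.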