Journal of Chemical Information and Modeling, Aug 2022
Review
The field of machine learning for drug discovery is witnessing an explosion of novel methods. These methods are often benchmarked on simple physicochemical properties such as solubility or general druglikeness, which can be readily computed. However, these properties are poor representatives of objective functions in drug design, mainly because they do not depend on the candidate compound's interaction with the target. By contrast, molecular docking is a widely applied method in drug discovery to estimate binding affinities. However, docking studies require a significant amount of domain knowledge to set up correctly, which hampers adoption. Here, we present dockstring, a bundle for meaningful and robust comparison of ML models using docking scores. dockstring consists of three components: (1) an open-source Python package for straightforward computation of docking scores, (2) an extensive dataset of docking scores and poses of more than 260,000 molecules for 58 medically relevant targets, and (3) a set of pharmaceutically relevant benchmark tasks such as virtual screening or design of selective kinase inhibitors. The Python package implements a robust ligand and target preparation protocol that allows nonexperts to obtain meaningful docking scores. Our dataset is the first to include docking poses, as well as the first of its size that is a full matrix, thus facilitating experiments in multiobjective optimization and transfer learning. Overall, our results indicate that docking scores are a more realistic evaluation objective than simple physicochemical properties, yielding benchmark tasks that are more challenging and more closely related to real problems in drug discovery. (A minimal usage sketch of the Python package follows this entry.)
Topics: Benchmarking; Drug Design; Ligands; Molecular Docking Simulation; Protein Binding; Proteins
PubMed: 35849793
DOI: 10.1021/acs.jcim.1c01334
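For orientation, here is a minimal sketch of the dockstring Python package described above. The load_target/dock call pattern follows the package's documented interface as we understand it, and the target name and SMILES string are illustrative choices, so check the dockstring documentation before relying on the exact signature.

```python
# Minimal sketch of scoring one ligand with dockstring (pip install dockstring).
# Assumes the load_target()/dock() interface from the package documentation;
# the target and ligand below are illustrative examples.
from dockstring import load_target

target = load_target("DRD2")           # one of the 58 prepared targets
smiles = "CC(=O)Oc1ccccc1C(=O)O"       # aspirin, as an example ligand

# dock() handles ligand preparation and runs the docking engine, returning
# the best docking score (lower is better) plus auxiliary data such as poses.
score, aux = target.dock(smiles)
print(f"DRD2 docking score: {score:.2f}")
```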
Genome Biology, Jan 2022
Review
Researchers view the abundant zeros in single-cell RNA-seq data differently: some regard zeros as biological signals representing no or low gene expression, while others regard zeros as missing data to be corrected. To help address the controversy, here we discuss the sources of biological and non-biological zeros; introduce five mechanisms for adding non-biological zeros in computational benchmarking; evaluate the impacts of non-biological zeros on data analysis; benchmark three input data types: observed counts, imputed counts, and binarized counts; discuss the open questions regarding non-biological zeros; and advocate for transparent analysis. (A short sketch of the binarization step follows this entry.)
Topics: Benchmarking; Biology; Sequence Analysis, RNA; Single-Cell Analysis; Exome Sequencing
PubMed: 35063006
DOI: 10.1186/s13059-022-02601-5
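To make the third benchmarked input type concrete, below is a small, self-contained sketch of binarizing a count matrix, i.e., keeping only zero-versus-nonzero information. The matrix here is a random stand-in, not data from the study.

```python
# Illustrative sketch: "binarized counts", one of the three input data types
# benchmarked above. The count matrix is reduced to 0/1 detection indicators,
# so only zero-vs-nonzero information is retained. Data are synthetic.
import numpy as np

rng = np.random.default_rng(0)
counts = rng.poisson(lam=0.3, size=(1000, 200))  # hypothetical genes x cells

binarized = (counts > 0).astype(np.int8)         # 1 = gene detected in cell
print(f"Fraction of zeros: {(counts == 0).mean():.2f}")
print(f"Binarized matrix: {binarized.shape}, dtype {binarized.dtype}")
```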
RNA (New York, N.Y.), Dec 2023
Review
The tremendous rate at which data are generated and analysis methods emerge makes it increasingly difficult to keep track of the methods' domains of applicability, assumptions, and limitations, and consequently of the efficacy and precision with which they solve specific tasks. Therefore, there is an increasing need for benchmarks and for infrastructure for continuous method evaluation. APAeval is an international community effort, organized by the RNA Society in 2021, to benchmark tools for the identification and quantification of the usage of alternative polyadenylation (APA) sites from short-read, bulk RNA-sequencing (RNA-seq) data. Here, we reviewed 17 tools and benchmarked eight on their ability to perform APA identification and quantification, using a comprehensive set of RNA-seq experiments comprising real, synthetic, and matched 3'-end sequencing data. To support continuous benchmarking, we have incorporated the results into the OpenEBench online platform, which allows for continuous extension of the set of methods, metrics, and challenges. We envisage that our analyses will assist researchers in selecting the appropriate tools for their studies, while the containers and reproducible workflows could easily be deployed and extended to evaluate new methods or data sets. (A toy example of a site-matching metric follows this entry.)
Topics: RNA; Benchmarking; RNA-Seq; Polyadenylation; Sequence Analysis, RNA
PubMed: 37816550
DOI: 10.1261/rna.079849.123
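As a toy illustration of what an APA identification metric can look like, the sketch below matches predicted poly(A) sites to ground-truth sites within a tolerance window and counts true/false positives. The window size and coordinates are hypothetical and are not APAeval's actual settings.

```python
# Hedged sketch of an identification metric for APA benchmarking: a predicted
# poly(A) site counts as a true positive if it lies within a tolerance window
# of an unmatched ground-truth site. Window and positions are hypothetical.
def match_sites(predicted, truth, window=50):
    """Return (TP, FP, FN) for 1D site positions on a single strand."""
    matched = set()
    tp = 0
    for p in sorted(predicted):
        hit = next((t for t in sorted(truth)
                    if abs(p - t) <= window and t not in matched), None)
        if hit is not None:
            matched.add(hit)
            tp += 1
    return tp, len(predicted) - tp, len(truth) - tp

tp, fp, fn = match_sites(predicted=[1010, 2500, 9000], truth=[1000, 2480, 7000])
print(f"TP={tp} FP={fp} FN={fn}")  # TP=2 FP=1 FN=1
```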
Therapeutic Innovation & Regulatory Science, Jan 2023
BACKGROUND
Benchmark data characterizing protocol design practices and performance inform clinical trial design decisions and serve as important baseline measures for assessing protocol design behaviors and their impact during and after the pandemic.
METHODS
Tufts CSDD, in collaboration with a working group of 20 major and mid-sized pharmaceutical companies and CROs, gathered phase I-III data from protocols completed just prior to the start of the global pandemic.
RESULTS
Data for 187 protocols were analyzed to derive benchmarks overall and for two primary subgroups: oncology vs. non-oncology protocols and rare disease vs. non-rare disease protocols. The results show a continuing upward trend across all protocol design variables. Phase II and III protocols average more endpoints, eligibility criteria, protocol pages, investigative sites, countries, and data points collected. Oncology and rare disease protocols have much lower enrolled-to-completion rates, involve a much higher average number of countries and investigative sites, require more planned patient visits, and generate considerably more clinical research data. As such, oncology and rare disease clinical trial cycle times are longer, most notably in the periods after study startup and prior to database lock, due to intense patient recruitment and retention challenges.
CONCLUSIONS
The results of this study present valuable design insights and comparative baseline measures. The implications of these results and the expected impact of decentralized clinical trials on protocol design practices and performance are discussed.
Topics: Humans; Benchmarking; Pandemics; Patient Selection; Clinical Trials as Topic
PubMed: 35960455
DOI: 10.1007/s43441-022-00438-5
Structure (London, England: 1993), Jan 2023
Recent advancements in computational tools have enabled protein structure prediction with high accuracy. Computational prediction methods have been used to model many soluble and membrane proteins, but their performance in modeling peptide structures has not yet been systematically investigated. We benchmarked the accuracy of AlphaFold2 in predicting 588 peptide structures of 10 to 40 amino acids, using experimentally determined NMR structures as reference. Our results show that AlphaFold2 predicts α-helical, β-hairpin, and disulfide-rich peptides with high accuracy, and that it performed at least as well as, if not better than, alternative methods developed specifically for peptide structure prediction. However, AlphaFold2 showed several shortcomings: errors in predicted Φ/Ψ angles and disulfide bond patterns, and a failure of the lowest-RMSD structures to coincide with the structures ranked best by pLDDT. In summary, computation can be a powerful tool for predicting peptide structures, but additional steps may be necessary to analyze and validate the results. (A small sketch of the RMSD-pLDDT rank check follows this entry.)
Topics: Protein Structure, Secondary; Benchmarking; Peptides; Membrane Proteins; Disulfides; Protein Conformation
PubMed: 36525975
DOI: 10.1016/j.str.2022.11.012
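The last shortcoming noted above, that the lowest-RMSD models are not the ones ranked best by pLDDT, can be probed with a simple rank correlation. The per-model values below are hypothetical, not the study's data.

```python
# Hedged sketch: does model confidence (pLDDT) track model accuracy (RMSD to
# the NMR reference)? Per-model values below are hypothetical placeholders.
import numpy as np
from scipy.stats import spearmanr

rmsd = np.array([1.2, 0.8, 2.5, 1.9, 0.9])        # Angstroms, lower is better
plddt = np.array([78.0, 85.0, 81.0, 70.0, 88.0])  # confidence, higher is better

rho, p = spearmanr(rmsd, plddt)
print(f"Spearman rho = {rho:.2f} (p = {p:.2f})")
# A strongly negative rho would mean pLDDT reliably ranks the most accurate
# models; a weak correlation reflects the shortcoming reported in the study.
```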
IEEE Transactions on Bio-medical Engineering, Mar 2022
Review
Machine learning techniques used in computer-aided medical image analysis usually suffer from the domain shift problem caused by different distributions between source/reference data and target data. As a promising solution, domain adaptation has attracted considerable attention in recent years. The aim of this paper is to survey the recent advances in domain adaptation methods for medical image analysis. We first present the motivation for introducing domain adaptation techniques to tackle domain heterogeneity issues in medical image analysis. Then we review recent domain adaptation models in various medical image analysis tasks. We categorize the existing methods into shallow and deep models, and each of them is further divided into supervised, semi-supervised, and unsupervised methods. We also provide a brief summary of the benchmark medical image datasets that support current domain adaptation research. This survey will enable researchers to gain a better understanding of the current status, challenges, and future directions of this active research field. (A generic sketch of one unsupervised adaptation building block follows this entry.)
Topics: Benchmarking; Image Processing, Computer-Assisted; Machine Learning
PubMed: 34606445
DOI: 10.1109/TBME.2021.3117407
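As a generic illustration of the unsupervised category surveyed above, one common building block is aligning source- and target-domain feature distributions, for example via a maximum mean discrepancy (MMD) penalty. This sketch is a textbook example, not a method from the survey, and the feature matrices are random stand-ins.

```python
# Generic, hedged sketch of one unsupervised domain-adaptation ingredient:
# a linear-kernel maximum mean discrepancy (MMD) between source and target
# feature embeddings. Features here are random stand-ins for real embeddings.
import numpy as np

def linear_mmd2(source, target):
    """Squared MMD with a linear kernel: ||mean(source) - mean(target)||^2."""
    delta = source.mean(axis=0) - target.mean(axis=0)
    return float(delta @ delta)

rng = np.random.default_rng(42)
source_feats = rng.normal(0.0, 1.0, size=(128, 64))  # e.g., scanner A features
target_feats = rng.normal(0.5, 1.0, size=(128, 64))  # shifted scanner B features

print(f"MMD^2 before adaptation: {linear_mmd2(source_feats, target_feats):.3f}")
# Deep DA methods add a term like this (or a kernelized variant) to the
# training loss so that features from both domains become indistinguishable.
```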
Wiener Klinische Wochenschrift, Feb 2022
BACKGROUND AND AIMS
The need for patient safety through consistent diagnostic performance has come increasingly into focus during the last two decades. Around the globe, the operational efficiency of diagnostic laboratories plays a key role in satisfying this need, as has been impressively shown during the recent months of the SARS-CoV-2 pandemic. On a global level, however, there has been no systematic effort to collate and benchmark data for diagnostic laboratories. The goals of this study were to design and pilot a questionnaire addressing key aspects of diagnostic laboratory management.
METHODS
The questionnaire was designed using an iterative process and taking into consideration information that could be extracted from the literature, author experience and feedback from informal focus groups of laboratory professionals. The resulting tool consisted of 50 items, either relating to general information or more specifically addressing the topics of "operational performance", "integrated clinical care performance", and "financial sustainability". A limited number of laboratories were surveyed to be able to further improve the newly developed tool and motivate the global laboratory community to participate in further benchmarking activity.
RESULTS AND CONCLUSION
Altogether, 65 laboratories participated in the survey: 42 hospital laboratories and 23 commercial laboratories. Potential for further improvement and standardization became apparent across the board, e.g., in the use of IT for order management, auto-validation, and turn-around time (TAT) monitoring. Notably, a gap was identified regarding services provided to physicians, in particular "reflexive test suggestions", "proactive consultation on complex cases", and "diagnostic pathways guidance", which were each provided by only about two thirds of laboratories. Similarly, within-laboratory TAT (Lab TAT) was monitored by about 80% of respondents, while sample-to-result TAT, arguably the TAT most relevant to clinicians, was monitored by only 32% of respondents. Altogether, the need for stronger integration of the laboratory into the clinical care process became apparent and should be a main focus of future laboratory management.
Topics: Austria; Benchmarking; COVID-19; Germany; Humans; Laboratories; SARS-CoV-2; Surveys and Questionnaires; Switzerland
PubMed: 34709471
DOI: 10.1007/s00508-021-01962-4
Genome Biology, Dec 2021
BACKGROUND
Single-cell RNA-sequencing (scRNA-seq) technologies and associated analysis methods have rapidly developed in recent years. This includes preprocessing methods, which assign sequencing reads to genes to create count matrices for downstream analysis. While several packaged preprocessing workflows have been developed to provide users with convenient tools for handling this process, how they compare to one another and how they influence downstream analysis have not been well studied.
RESULTS
Here, we systematically benchmark the performance of 10 end-to-end preprocessing workflows (Cell Ranger, Optimus, salmon alevin, alevin-fry, kallisto bustools, dropSeqPipe, scPipe, zUMIs, celseq2, and scruff) using datasets of varying biological complexity generated on the CEL-Seq2 and 10x Chromium platforms. We compare these workflows both directly, in terms of their quantification properties, and by their impact on normalization and clustering, evaluating the performance of different method combinations. While the scRNA-seq preprocessing workflows vary in their detection and quantification of genes across datasets, after downstream analysis with performant normalization and clustering methods, almost all combinations produce clustering results that agree well with the known cell type labels that served as ground truth in our analysis. (A toy example of such an agreement score follows this entry.)
CONCLUSIONS
In summary, the choice of preprocessing method was found to be less important than other steps in the scRNA-seq analysis process. Our study comprehensively compares common scRNA-seq preprocessing workflows and summarizes their characteristics to guide workflow users.
Topics: Benchmarking; Cluster Analysis; Gene Expression Profiling; RNA-Seq; Sequence Analysis, RNA; Single-Cell Analysis; Software; Transcriptome; Workflow
PubMed: 34906205
DOI: 10.1186/s13059-021-02552-3
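A toy example of the agreement measure implied above: scoring clustering output against known cell-type labels, for instance with the adjusted Rand index. The labels are made up.

```python
# Hedged sketch: quantifying how well clustering results agree with known
# cell type labels, e.g. via the adjusted Rand index (ARI). Labels are toy data.
from sklearn.metrics import adjusted_rand_score

known_labels = ["B", "B", "T", "T", "T", "NK", "NK", "B"]
cluster_ids = [0, 0, 1, 1, 2, 2, 2, 0]

ari = adjusted_rand_score(known_labels, cluster_ids)
print(f"Adjusted Rand index vs. ground truth: {ari:.2f}")
# ARI of 1 means perfect agreement and ~0 means chance level, which makes it
# a convenient single number for comparing workflow/normalization/clustering
# combinations against ground-truth cell types.
```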
BMC Bioinformatics, Feb 2021
BACKGROUND
Benchmarking the performance of complex analytical pipelines is an essential part of developing Lab Developed Tests (LDTs). Reference samples and benchmark calls published by the Genome in a Bottle (GIAB) consortium have enabled the evaluation of analytical methods. The performance of such methods is not uniform across different genomic regions of interest and variant types. Several benchmarking tools, such as hap.py, vcfeval, and vcflib, are available to assess the analytical performance characteristics of variant calling algorithms. However, assessing the performance characteristics of an overall LDT assay still requires stringing together several such tools, plus experienced bioinformaticians to interpret the results. In addition, these tools depend on the hardware, operating system, and other software libraries, making it difficult to reliably repeat the analytical assessment when any of the underlying dependencies in the assay change. Here we present a scalable, reproducible, cloud-based benchmarking workflow that is independent of the laboratory, the technician executing it, and the underlying compute hardware, and that can rapidly and continually assess the performance of LDT assays across their regions of interest and reportable range using a broad set of benchmarking samples.
RESULTS
The benchmarking workflow was used to evaluate the performance characteristics of secondary analysis pipelines commonly used by clinical genomics laboratories in their LDT assays, such as GATK HaplotypeCaller v3.7 and the SpeedSeq workflow based on FreeBayes v0.9.10. Five reference-sample truth sets generated by the GIAB consortium, six samples from the Personal Genome Project (PGP), and several samples with validated clinically relevant variants from the Centers for Disease Control were used in this work. The performance characteristics were evaluated and compared for multiple reportable ranges, such as the whole exome and the clinical exome.
CONCLUSIONS
We have implemented a benchmarking workflow for clinical diagnostic laboratories that generates metrics such as specificity, precision, and sensitivity for germline SNPs and InDels within a reportable range, using whole exome or genome sequencing data. Combining these benchmarking results with validation using known variants of clinical significance in publicly available cell lines, we were able to establish the performance of variant calling pipelines in a clinical setting. (A toy computation of these metrics follows this entry.)
Topics: Benchmarking; Exome; Germ Cells; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide; Software; Workflow
PubMed: 33627090
DOI: 10.1186/s12859-020-03934-3
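As a toy computation of the metrics named in the conclusions, the sketch below derives precision, sensitivity (recall), and F1 from the true-positive/false-positive/false-negative counts that comparison tools such as hap.py or vcfeval report. The counts are hypothetical.

```python
# Hedged sketch: precision, sensitivity, and F1 from TP/FP/FN counts of the
# kind hap.py or vcfeval emit for germline SNPs/InDels. Counts are made up.
def variant_calling_metrics(tp: int, fp: int, fn: int) -> dict:
    precision = tp / (tp + fp) if tp + fp else 0.0
    sensitivity = tp / (tp + fn) if tp + fn else 0.0  # a.k.a. recall
    f1 = (2 * precision * sensitivity / (precision + sensitivity)
          if precision + sensitivity else 0.0)
    return {"precision": precision, "sensitivity": sensitivity, "f1": f1}

# Hypothetical germline SNP results for one pipeline vs. a GIAB truth set:
print(variant_calling_metrics(tp=39_500, fp=500, fn=1_000))
# precision ~0.988, sensitivity ~0.975, F1 ~0.981
```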
Molecular Medicine Reports, Apr 2021
Genome assemblers are computational tools that assemble genomes from a wealth of primary sequencing data. The quality of genome assemblies is estimated by their contiguity and the occurrence of misassemblies (duplications, deletions, translocations, or inversions). The rapid development of sequencing technologies has enabled the rise of novel genome assembly strategies, whose ultimate goal is to exploit the features of each sequencing platform, address the existing weaknesses of each sequencing type, and compose a complete and correct genome map. In the present study, the hybrid strategy, which combines Illumina short paired-end reads with Nanopore long reads, was benchmarked using the MaSuRCA and Wengan assemblers. Moreover, the long-read assembly strategy was benchmarked using Canu on Nanopore reads and using Hifiasm and HiCanu on PacBio HiFi reads. The assemblies were performed on a computational cluster with limited resources, and their outputs were evaluated in terms of accuracy and computational performance. The PacBio HiFi assembly strategy outperformed the others, while Hi-C scaffolding, which exploits chromatin 3D structure, was required to increase contiguity, accuracy, and completeness when assembling large and complex genomes such as the human genome. Hi-C data were also necessary when using the hybrid assembly strategy. The results revealed that HiFi sequencing has enabled novel algorithms that require lower genome coverage than other strategies, making assembly less computationally demanding. Taken together, these developments may democratize genome assembly projects, which are now approachable by smaller labs with limited technical and financial resources. (A minimal N50 computation follows this entry.)
Topics: Algorithms; Animals; Benchmarking; Drosophila melanogaster; Genome, Human; Genome, Insect; High-Throughput Nucleotide Sequencing; Humans
PubMed: 33537807
DOI: 10.3892/mmr.2021.11890
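Since the study judges assemblies partly by contiguity, here is a minimal sketch of the standard contiguity statistic, N50, computed from a hypothetical list of contig lengths.

```python
# Hedged sketch: N50, the standard assembly contiguity statistic. N50 is the
# largest length L such that contigs of length >= L cover at least half of
# the total assembly. The contig lengths below are hypothetical.
def n50(contig_lengths):
    total = sum(contig_lengths)
    running = 0
    for length in sorted(contig_lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length

contigs = [5_000_000, 3_000_000, 2_000_000, 800_000, 200_000]
print(f"Assembly size: {sum(contigs):,} bp, N50: {n50(contigs):,} bp")
# Total is 11 Mb; the cumulative sum reaches half (5.5 Mb) at the second
# contig, so N50 = 3,000,000 bp.
```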