-
Methods in Molecular Biology (Clifton,... 2024
Knowledge of the expected accuracy of HLA typing algorithms is important when choosing between algorithms and when evaluating the HLA typing predictions of an algorithm. This chapter guides the reader through an example benchmarking study that evaluates the performance of four NGS-based HLA typing algorithms and outlines factors to consider when designing and running such a study. The code related to this benchmarking workflow can be found at https://github.com/nikolasthuesen/springers-hla-benchmark/.
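To make "expected accuracy" concrete, the following is a minimal sketch of one way an accuracy score could be computed for an HLA typing algorithm. It is not taken from the chapter or its repository: the function names, the input layout (a dictionary keyed by sample and locus), and the choice to compare alleles at two-field resolution are all assumptions made for illustration.

```python
from collections import Counter


def truncate_allele(allele, fields=2):
    """Reduce an allele name such as 'A*02:01:01:01' to the given number of fields."""
    gene, _, rest = allele.partition("*")
    return gene + "*" + ":".join(rest.split(":")[:fields])


def count_matches(predicted, truth, fields=2):
    """Count alleles shared by two genotypes, ignoring the order of the alleles."""
    pred = Counter(truncate_allele(a, fields) for a in predicted)
    true = Counter(truncate_allele(a, fields) for a in truth)
    return sum((pred & true).values())


def typing_accuracy(predictions, reference, fields=2):
    """Fraction of reference alleles recovered across all samples and loci."""
    correct = 0
    total = 0
    for key, truth in reference.items():
        predicted = predictions.get(key, [])  # a missing call counts as wrong
        correct += count_matches(predicted, truth, fields)
        total += len(truth)
    return correct / total if total else 0.0


# Hypothetical toy data: (sample, locus) -> two alleles.
reference = {
    ("sample1", "A"): ["A*02:01:01:01", "A*24:02:01:01"],
    ("sample1", "B"): ["B*07:02:01", "B*44:02:01"],
}
predictions = {
    ("sample1", "A"): ["A*24:02", "A*02:01"],
    ("sample1", "B"): ["B*07:02", "B*44:03"],
}

print(typing_accuracy(predictions, reference))  # 0.75
```

Comparing at a fixed resolution keeps algorithms that report different numbers of fields on equal footing; a real study would also need to decide how to treat ambiguous or missing calls.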
Topics: Histocompatibility Testing; Benchmarking; Algorithms; Humans; High-Throughput Nucleotide Sequencing; Software; HLA Antigens
PubMed: 38907892
DOI: 10.1007/978-1-0716-3874-3_6
-
International Journal of Molecular... Sep 2023
Much of today's molecular science revolves around next-generation sequencing. Frequently, the first step in analyzing such data is aligning sequencing reads to a reference genome. This step is often taken for granted, but any analysis downstream of the alignment will be affected by the aligner's ability to correctly map sequences. In most cases, for research into chromatin structure and nucleosome positioning, ATAC-seq, ChIP-seq, and MNase-seq experiments use short read lengths. How well aligners manage these reads is critical. Most aligner programs will output mapped reads and unmapped reads. However, from a biological point of view, reads will fall into one of three categories: correctly mapped, incorrectly mapped, and unmapped. While increased sequencing depth can often compensate for unmapped reads, incorrectly and correctly mapped reads appear algorithmically identical but can produce biologically significant alterations in the results. For this reason, we are benchmarking various alignment programs to determine their propensity to incorrectly map short reads. As short-read alignment is an important step in ATAC-seq, ChIP-seq, and MNase-seq experiments, caution should be taken in mapping reads to ensure that the most accurate conclusions can be made from the data generated. Our analysis is intended to help investigators new to the field pick the alignment program best suited for their experimental conditions. In general, the aligners we tested performed well. BWA, Bowtie2, and Chromap were all exceptionally accurate, and we recommend using them. Furthermore, we show that longer read lengths do in fact lead to more accurate mappings.
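The three biological categories can only be separated when the true origin of each read is known, for example with simulated reads. The sketch below illustrates that bookkeeping under this assumption; the function names, the `(chromosome, start)` position format, and the tolerance parameter are hypothetical and do not describe the authors' pipeline.

```python
from collections import Counter


def classify_read(true_pos, mapped_pos, tolerance=0):
    """Label one read by comparing its alignment with its known origin.

    Positions are (chromosome, start) tuples. mapped_pos is None when the
    aligner reported the read as unmapped.
    """
    if mapped_pos is None:
        return "unmapped"
    same_chrom = true_pos[0] == mapped_pos[0]
    close_enough = abs(true_pos[1] - mapped_pos[1]) <= tolerance
    return "correct" if same_chrom and close_enough else "incorrect"


def summarize(truth, alignments, tolerance=0):
    """Count reads per category; reads absent from alignments count as unmapped."""
    return Counter(
        classify_read(pos, alignments.get(read_id), tolerance)
        for read_id, pos in truth.items()
    )


# Hypothetical toy data: read id -> (chromosome, start).
truth = {
    "r1": ("chr1", 100),
    "r2": ("chr1", 500),
    "r3": ("chr2", 40),
    "r4": ("chr2", 900),
}
alignments = {
    "r1": ("chr1", 100),
    "r2": ("chr3", 500),  # wrong chromosome
    "r3": ("chr2", 42),   # within tolerance
}

print(dict(summarize(truth, alignments, tolerance=5)))
# {'correct': 2, 'incorrect': 1, 'unmapped': 1}
```

The aligner's own output would put r1, r2, and r3 in the same "mapped" bucket; only the comparison against known origins shows that r2 is incorrectly mapped.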
Topics: Chromatin; Benchmarking; Sequence Alignment; Genome; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Software; Algorithms
PubMed: 37762379
DOI: 10.3390/ijms241814074
-
Journal of Zhejiang University.... May 2024
Review
Infectious diseases are a great threat to human health. Rapid and accurate detection of pathogens is important in the diagnosis and treatment of infectious diseases. Metagenomic next-generation sequencing (mNGS) is an unbiased and comprehensive approach for detecting all RNA and DNA in a sample. With the development of sequencing and bioinformatics technologies, mNGS is moving from research to clinical application, which opens a new avenue for pathogen detection. Numerous studies have revealed good potential for the clinical application of mNGS in infectious diseases, especially for difficult-to-detect, rare, and novel pathogens. However, there are several hurdles in the clinical application of mNGS, such as: (1) lack of universal workflow validation and quality assurance; (2) insensitivity to high-host background and low-biomass samples; and (3) lack of standardized instructions for mass data analysis and report interpretation. Therefore, a complete understanding of this new technology will help promote the clinical application of mNGS to infectious diseases. This review briefly introduces the history of next-generation sequencing, mainstream sequencing platforms, and the mNGS workflow, and discusses the clinical applications of mNGS to infectious diseases and its advantages and disadvantages.
Topics: Metagenomics; Humans; High-Throughput Nucleotide Sequencing; Communicable Diseases; Computational Biology; Workflow
PubMed: 38910493
DOI: 10.1631/jzus.B2300029
-
Methods in Molecular Biology (Clifton,... 2024
DNA barcodes are short, standardized DNA segments that geneticists can use to identify all living taxa. DNA barcoding, in turn, identifies species by comparing these specific regions against a DNA barcode reference library. In its initial years, DNA barcodes sequenced by Sanger's method were extensively used by taxonomists for the characterization and identification of species. In recent years, however, DNA barcoding by next-generation sequencing (NGS) has found broader applications, such as quality control, biomonitoring of protected species, and biodiversity assessment. Technological advancements have also paved the way to metabarcoding, which has enabled massively parallel sequencing of complex bulk samples using high-throughput sequencing techniques. In the future, DNA barcoding combined with high-throughput techniques is expected to drive substantial progress in taxonomic classification with reference to available sequence data.
Topics: DNA Barcoding, Taxonomic; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Biodiversity; DNA; Animals
PubMed: 38683316
DOI: 10.1007/978-1-0716-3581-0_8
-
Methods in Molecular Biology (Clifton,... 2024
Computational pangenomics deals with the joint analysis of all genomic sequences of a species. It has already been successfully applied to various tasks in many research areas. Further advances in DNA sequencing technologies constantly let more and more genomic sequences become available for many species, leading to an increasing attractiveness of pangenomic studies. At the same time, larger datasets also pose new challenges for data structures and algorithms that are needed to handle the data. Efficient methods oftentimes make use of the concept of k-mers. Core detection is a common way of analyzing a pangenome. The pangenome's core is defined as the subset of genomic information shared among all individual members. Classically, it is not only determined on the abstract level of genes but can also be described on the sequence level. In this chapter, we provide an overview of k-mer-based methods in the context of pangenomics studies. We first revisit existing software solutions for k-mer counting and k-mer set representation. Afterward, we describe the usage of two k-mer-based approaches, Pangrowth and Corer, for pangenomic core detection.
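The sequence-level definition of the core can be illustrated with plain k-mer sets. The sketch below is a naive illustration only and is not how Pangrowth or Corer work: it holds every k-mer set in memory, ignores reverse complements, and uses hypothetical function names and toy sequences.

```python
def kmers(sequence, k):
    """Return the set of all length-k substrings of a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def core_kmers(genomes, k):
    """k-mers present in every genome of the collection."""
    kmer_sets = [kmers(seq, k) for seq in genomes.values()]
    return set.intersection(*kmer_sets) if kmer_sets else set()


def pan_kmers(genomes, k):
    """k-mers present in at least one genome of the collection."""
    return set().union(*(kmers(seq, k) for seq in genomes.values()))


# Hypothetical toy "genomes".
genomes = {
    "g1": "ACGTACGA",
    "g2": "TACGTACC",
    "g3": "GGACGTAC",
}

print(sorted(core_kmers(genomes, 4)))  # ['ACGT', 'CGTA', 'GTAC']
print(len(pan_kmers(genomes, 4)))      # 8
```

With real genomes, the set-based approach quickly becomes memory-bound, which is why dedicated k-mer counting tools and compact set representations matter at that scale.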
Topics: Software; Genomics; Algorithms; Computational Biology; Sequence Analysis, DNA; Humans; High-Throughput Nucleotide Sequencing
PubMed: 38819557
DOI: 10.1007/978-1-0716-3838-5_4
-
Nucleic Acids Research Dec 2023
We presented an experimental method called FLOUR-seq, which combines BD Rhapsody and nanopore sequencing to detect the RNA lifecycle (including nascent, mature, and degrading RNAs) in cells. Additionally, we updated our HIT-scISOseq V2 to discover a more accurate RNA lifecycle using 10x Chromium and PacBio sequencing. Most importantly, to explore how single-cell full-length RNA sequencing technologies could help improve the RNA velocity approach, we introduced a new algorithm called 'Region Velocity' to more accurately configure cellular RNA velocity. We applied this algorithm to study spermiogenesis and compared the performance of FLOUR-seq with PacBio-based HIT-scISOseq V2. Our findings demonstrated that 'Region Velocity' is more suitable for analyzing single-cell full-length RNA data than traditional RNA velocity approaches. These novel methods could be useful for researchers looking to discover full-length RNAs in single cells and comprehensively monitor the RNA lifecycle in cells.
Topics: Algorithms; High-Throughput Nucleotide Sequencing; Nanopore Sequencing; Sequence Analysis, RNA; Single-Cell Analysis
PubMed: 37941145
DOI: 10.1093/nar/gkad969
-
BMC Cancer Jul 2023
BACKGROUND
Gene fusions are important cancer drivers in pediatric cancer and their accurate detection is essential for diagnosis and treatment. Clinical decision-making requires high confidence and precision of detection. Recent developments show that RNA sequencing (RNA-seq) is promising for genome-wide detection of fusion products but is hindered by many false positives that require extensive manual curation and impede discovery of pathogenic fusions.
METHODS
We developed Fusion-sq to overcome existing disadvantages of detecting gene fusions. Fusion-sq integrates and "fuses" evidence from RNA-seq and whole genome sequencing (WGS) using intron-exon gene structure to identify tumor-specific protein coding gene fusions. Fusion-sq was then applied to the data generated from a pediatric pan-cancer cohort of 128 patients by WGS and RNA sequencing.
RESULTS
In a pediatric pan-cancer cohort of 128 patients, we identified 155 high confidence tumor-specific gene fusions and their underlying structural variants (SVs). This includes all clinically relevant fusions known to be present in this cohort (30 patients). Fusion-sq distinguishes healthy-occurring from tumor-specific fusions and resolves fusions in amplified regions and copy number unstable genomes. A high gene fusion burden is associated with copy number instability. We identified 27 potentially pathogenic fusions involving oncogenes or tumor-suppressor genes characterized by underlying SVs, in some cases leading to expression changes indicative of activating or disruptive effects.
CONCLUSIONS
Our results indicate how clinically relevant and potentially pathogenic gene fusions can be identified and their functional effects investigated by combining WGS and RNA-seq. Integrating RNA fusion predictions with underlying SVs advances fusion detection beyond extensive manual filtering. Taken together, we developed a method for identifying candidate gene fusions that is suitable for precision oncology applications. Our method provides multi-omics evidence for assessing the pathogenicity of tumor-specific gene fusions for future clinical decision making.
Topics: Child; Humans; Neoplasms; RNA-Seq; High-Throughput Nucleotide Sequencing; Precision Medicine; Sequence Analysis, RNA; Gene Fusion; Whole Genome Sequencing
PubMed: 37400763
DOI: 10.1186/s12885-023-11054-3
-
Genome Biology Jan 2024
Sample multiplexing enables pooled analysis during single-cell RNA sequencing workflows, thereby increasing throughput and reducing batch effects. A challenge for all multiplexing techniques is to link sample-specific barcodes with cell-specific barcodes, then demultiplex sample identity post-sequencing. However, existing demultiplexing tools fail under many real-world conditions where barcode cross-contamination is an issue. We therefore developed deMULTIplex2, an algorithm inspired by a mechanistic model of barcode cross-contamination. deMULTIplex2 employs generalized linear models and expectation-maximization to probabilistically determine the sample identity of each cell. Benchmarking reveals superior performance across various experimental conditions, particularly on large or noisy datasets with unbalanced sample compositions.
Topics: Single-Cell Gene Expression Analysis; Single-Cell Analysis; Algorithms; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing
PubMed: 38291503
DOI: 10.1186/s13059-024-03177-y
-
Biochemical Society Transactions Jun 2024
Review
Whole genome sequencing of viruses provides high-resolution molecular insights, enhancing our understanding of viral genome function and phylogeny. Beyond fundamental research, viral sequencing is increasingly vital for pathogen surveillance, epidemiology, and clinical applications. As sequencing methods rapidly evolve, the diversity of viral genomics applications and catalogued genomes continues to expand. Advances in long-read, single molecule, real-time sequencing methodologies present opportunities to sequence contiguous, haplotype resolved viral genomes in a range of research and applied settings. Here we present an overview of nucleic acid sequencing methods and their applications in studying viral genomes. We emphasise the advantages of different viral sequencing approaches, with a particular focus on the benefits of third-generation sequencing technologies in elucidating viral evolution, transmission networks, and pathogenesis.
Topics: Genome, Viral; Humans; High-Throughput Nucleotide Sequencing; Viruses; Genomics; Phylogeny; Whole Genome Sequencing; Sequence Analysis, DNA
PubMed: 38747720
DOI: 10.1042/BST20231322
-
Indian Journal of Medical Microbiology 2023
Review
BACKGROUND
Detection of infectious diseases, especially among immunocompromised patients and patients on prolonged antimicrobial treatment, remains challenging, limited by conventional techniques with low sensitivity and long turnaround times. Molecular detection by polymerase chain reaction (PCR) also has limited utility as it requires a targeted approach with prior suspicion of the infecting organism. Advancements in sequencing methodologies, specifically next-generation sequencing (NGS), have presented a promising opportunity to identify pathogens in cases where conventional techniques may be inadequate. However, the direct application of these techniques for diagnosing invasive infections is still limited by the need for invasive sampling, highlighting the pressing need to develop and implement non-invasive or minimally invasive approaches to improve the diagnosis of invasive infections.
OBJECTIVES
The objectives of this article are to explore the notable features, clinical utility, and constraints associated with the detection of microbial circulating cell-free DNA (mcfDNA) as a minimally invasive diagnostic tool for infectious diseases.
CONTENT
Detection of mcfDNA provides an opportunity to identify micro-organisms in the blood of a patient. It is especially beneficial in immunocompromised patients where invasive sampling is not possible or where repeated cultures are negative. This review will discuss the applications and constraints of detecting mcfDNA for diagnosing infections and the various platforms available for its detection.
Topics: Humans; Cell-Free Nucleic Acids; Communicable Diseases; Polymerase Chain Reaction; Specimen Handling; High-Throughput Nucleotide Sequencing
PubMed: 37945127
DOI: 10.1016/j.ijmmb.2023.100433