-
Comparing assembly strategies for third-generation sequencing technologies across different genomes.
Genomics Sep 2023
The recent advent of long-read sequencing technologies, such as Pacific Biosciences (PacBio) and Oxford Nanopore Technologies (ONT), has led to substantial accuracy and computational cost improvements. However, de novo whole-genome assembly still presents significant challenges related to the computational cost and the quality of the results. Meanwhile, sequencing accuracy and throughput continue to improve, and many tools are constantly emerging. Therefore, selecting the correct sequencing platform, the proper sequencing depth, and the right assembly tools is necessary to perform high-quality assembly. This paper evaluates the primary assembly reconstruction from recent hybrid and non-hybrid pipelines on different genomes. We find that PacBio high-fidelity (HiFi) long reads play an essential role in haplotype construction compared with ONT reads. However, we observe a substantial improvement in the correctness of the assembly when using high-fidelity ONT datasets and when combining them with HiFi or short reads.
Topics: High-Throughput Nucleotide Sequencing; Genome; Sequence Analysis, DNA
PubMed: 37598732
DOI: 10.1016/j.ygeno.2023.110700
-
Haematologica Feb 2024
Review
Innovations in molecular diagnostics have often evolved through the study of hematologic malignancies. Examples include the pioneering characterization of the Philadelphia chromosome by cytogenetics in the 1970s, the implementation of polymerase chain reaction for high-sensitivity detection and monitoring of mutations and, most recently, targeted next-generation sequencing to drive the prognostic and therapeutic assessment of leukemia. Hematologists and hematopathologists have continued to advance in the past decade with new innovations improving the type, amount, and quality of data generated for each molecule of nucleic acid. In this review article, we touch on these new developments and discuss their implications for diagnostics in hematopoietic malignancies. We review advances in sequencing platforms and library preparation chemistry that can lead to faster turnaround times, novel sequencing techniques, the development of mobile laboratories with implications for worldwide benefits, the current status of sample types, improvements to quality and reference materials, bioinformatic pipelines, and the integration of machine learning and artificial intelligence into molecular diagnostic tools for hematologic malignancies.
Topics: Humans; Artificial Intelligence; Mutation; Hematologic Neoplasms; Polymerase Chain Reaction; High-Throughput Nucleotide Sequencing
PubMed: 37584286
DOI: 10.3324/haematol.2022.282442
-
American Journal of Human Genetics Dec 2023
DNA sample contamination is a major issue in clinical and research applications of whole-genome and -exome sequencing. Even modest levels of contamination can substantially affect the overall quality of variant calls and lead to widespread genotyping errors. Currently, popular tools for estimating the contamination level use short-read data (BAM/CRAM files), which are expensive to store and manipulate and often not retained or shared widely. We propose a metric to estimate DNA sample contamination from variant-level whole-genome and -exome sequence data called CHARR, contamination from homozygous alternate reference reads, which leverages the infiltration of reference reads within homozygous alternate variant calls. CHARR uses a small proportion of variant-level genotype information and thus can be computed from single-sample gVCFs or callsets in VCF or BCF formats, as well as efficiently stored variant calls in Hail VariantDataset format. Our results demonstrate that CHARR accurately recapitulates results from existing tools with substantially reduced costs, improving the accuracy and efficiency of downstream analyses of ultra-large whole-genome and exome sequencing datasets.
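The idea of measuring reference reads inside homozygous alternate calls can be illustrated with a minimal Python sketch. It is not the authors' implementation: the record type `HomAltCall`, the function `charr_estimate`, the depth and frequency thresholds, and the normalization by the population reference-allele frequency are all assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass
class HomAltCall:
    """One homozygous-alternate genotype call (hypothetical record type)."""
    ref_depth: int   # reads supporting the reference allele
    alt_depth: int   # reads supporting the alternate allele
    ref_af: float    # population reference-allele frequency at this site


def charr_estimate(
    calls: Iterable[HomAltCall],
    min_af: float = 0.1,    # illustrative threshold, an assumption
    max_af: float = 0.9,    # illustrative threshold, an assumption
    min_depth: int = 10,    # illustrative threshold, an assumption
) -> Optional[float]:
    """Mean reference-read fraction at hom-alt sites, scaled by reference AF.

    In an uncontaminated sample, a homozygous-alternate site should carry
    almost no reference reads, so a high average suggests contamination.
    """
    ratios = []
    for call in calls:
        total = call.ref_depth + call.alt_depth
        if total < min_depth:
            continue  # too few reads to give a stable fraction
        if not (min_af <= call.ref_af <= max_af):
            continue  # skip sites where the reference allele is very rare or very common
        # Assumed normalization: a contaminating genome contributes a
        # reference read with probability proportional to ref_af.
        ratios.append(call.ref_depth / (call.ref_af * total))
    if not ratios:
        return None  # no usable sites
    return sum(ratios) / len(ratios)
```

Only per-site allele depths and an allele frequency are needed, which is why a metric of this kind can be computed from variant-level files rather than from read-level BAM/CRAM data.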
Topics: Humans; Animals; Trout; Sequence Analysis, DNA; Genotype; Homozygote; DNA; High-Throughput Nucleotide Sequencing; Software
PubMed: 38000370
DOI: 10.1016/j.ajhg.2023.10.011
-
Yi Chuan = Hereditas Feb 2024
Review
As a key supporting technology in the fields of life sciences and medicine, high-throughput sequencing has developed rapidly and become increasingly mature. The workflow of this technology can be divided into nucleic acid extraction, library construction, sequencing, and data analysis. Among these, library construction is a pivotal step that bridges the previous and subsequent stages. The effectiveness of library construction is contingent on the quality of upstream samples and also impacts the data analysis following sequence data output. The selection and implementation of library construction quality control techniques are crucial for enhancing the reliability of results and reducing errors in sequencing data. This review provides an in-depth discussion of library construction quality control techniques, summarizing and evaluating their principles, advantages and disadvantages, and applicability. It also discusses the selection of relevant technologies in practical application scenarios. The aim is to offer theoretical foundations and references for researchers, disease prevention and control personnel, and others when choosing library quality control techniques, thereby promoting the quality and efficiency of high-throughput sequencing work.
Topics: High-Throughput Nucleotide Sequencing; Reproducibility of Results; Gene Library; Cloning, Molecular; Quality Control; Sequence Analysis, DNA
PubMed: 38340004
DOI: 10.16288/j.yczz.23-262
-
Genes Mar 2024
BACKGROUND
The advancement of next-generation sequencing (NGS) technologies provides opportunities for large-scale pharmacogenetic (PGx) studies and pre-emptive PGx testing to cover a wide range of genotypes present in diverse populations. However, NGS-based PGx testing is limited by the lack of comprehensive computational tools to support genetic data analysis and clinical decisions.
METHODS
Bioinformatics utilities specialized for human genomics and the latest cloud-based technologies were used to develop a bioinformatics pipeline for analyzing the genomic sequence data and reporting PGx genotypes. A database was created and integrated in the pipeline for filtering the actionable PGx variants and clinical interpretations. Strict quality verification procedures were conducted on variant calls with the whole genome sequencing (WGS) dataset of the 1000 Genomes Project (G1K). The accuracy of PGx allele identification was validated using the WGS dataset of the Pharmacogenetics Reference Materials from the Centers for Disease Control and Prevention (CDC).
RESULTS
The newly created bioinformatics pipeline, Pgxtools, can analyze genomic sequence data, identify actionable variants in 13 PGx relevant genes, and generate reports annotated with specific interpretations and recommendations based on clinical practice guidelines. Verified with two independent methods, we have found that Pgxtools consistently identifies variants more accurately than the results in the G1K dataset on GRCh37 and GRCh38.
CONCLUSIONS
Pgxtools provides an integrated workflow for large-scale genomic data analysis and PGx clinical decision support. Implemented with cloud-native technologies, it is highly portable in a wide variety of environments from a single laptop to High-Performance Computing (HPC) clusters and cloud platforms for different production scales and requirements.
Topics: Humans; Pharmacogenomic Testing; Pharmacogenetics; High-Throughput Nucleotide Sequencing; Genomics; Computational Biology
PubMed: 38540411
DOI: 10.3390/genes15030352
-
Scientific Reports Jan 2024
This paper focuses on the correction of Illumina WGS sequencing reads. We provide an extensive evaluation of the existing correctors. To this end, we measure the impact of correction on variant calling (VC) as well as on de novo assembly. The results show that, in selected cases, read correction improves the quality of VC results. We also examine the algorithms' behaviour when processing Illumina NovaSeq reads, whose quality characteristics differ from those of older sequencers. We show that most of the algorithms are ready to cope with such reads. Finally, we introduce a new version of RECKONER, our read corrector, which we have optimized and equipped with a new correction strategy. Currently, RECKONER can correct high-coverage human reads in less than 2.5 h, copes with two types of read errors (indels and substitutions), and uses a new correction verification technique based on two lengths of oligomers.
Topics: Humans; Aged; Sequence Analysis, DNA; Algorithms; High-Throughput Nucleotide Sequencing; INDEL Mutation
PubMed: 38278837
DOI: 10.1038/s41598-024-52386-9
-
International Journal of Molecular... Jun 2024
Comparative Study
RNA sequencing (RNA-Seq) is a powerful technique that is increasingly used in clinical research and drug development. Several RNA-Seq methods have been developed. However, the relative advantage of each method for degraded RNA and low-input RNA, such as RNA samples collected in the field or in clinical settings, has remained unknown. The Standard RNA-Seq method captures mRNA by poly(A) selection using oligo(dT) beads, which is not suitable for degraded RNA. Here, we used three commercially available RNA-Seq library preparation kits (SMART-Seq, xGen Broad-range, and RamDA-Seq) that use random primers instead of oligo(dT) beads. To evaluate the performance of these methods, we compared the correlation, the number of detected expressed genes, and the expression levels against the Standard RNA-Seq method. Although the performance of RamDA-Seq was similar to that of Standard RNA-Seq, its performance decreased for low-input and degraded RNA. SMART-Seq performed better than xGen and RamDA-Seq on low-input and degraded RNA. Furthermore, depletion of ribosomal RNA (rRNA) improved the performance of SMART-Seq and xGen due to increased expression levels. SMART-Seq with rRNA depletion has relative advantages for RNA-Seq using low-input and degraded RNA.
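Two of the comparison measures named in this abstract, correlation with a reference method and the number of detected genes, can be sketched in a few lines of Python. The abstract does not state which correlation coefficient, transformation, or detection threshold was used, so the Pearson correlation on log2(x + 1) values, the threshold of 1.0, and the function names below are assumptions chosen for illustration.

```python
import math
from typing import Dict


def detected_genes(expr: Dict[str, float], threshold: float = 1.0) -> int:
    """Count genes whose expression value reaches the (assumed) threshold."""
    return sum(1 for value in expr.values() if value >= threshold)


def log_pearson(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Pearson correlation of log2(x + 1) expression over genes in both samples."""
    genes = sorted(set(a) & set(b))
    n = len(genes)
    if n < 2:
        return float("nan")  # correlation is undefined for fewer than two genes
    x = [math.log2(a[g] + 1) for g in genes]
    y = [math.log2(b[g] + 1) for g in genes]
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var_x = sum((xi - mean_x) ** 2 for xi in x)
    var_y = sum((yi - mean_y) ** 2 for yi in y)
    if var_x == 0 or var_y == 0:
        return float("nan")  # a constant profile has no defined correlation
    return cov / math.sqrt(var_x * var_y)


# Toy expression profiles (made-up values) for a reference and a test library.
standard = {"g1": 0.0, "g2": 1.0, "g3": 3.0}
test_kit = {"g1": 0.0, "g2": 3.0, "g3": 15.0}
print(detected_genes(standard), log_pearson(standard, test_kit))
```

The log transform keeps a handful of highly expressed genes from dominating the coefficient, which is a common but not universal choice.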
Topics: Sequence Analysis, RNA; Humans; RNA Stability; Gene Expression Profiling; High-Throughput Nucleotide Sequencing; RNA; RNA, Ribosomal; RNA, Messenger; RNA-Seq
PubMed: 38892331
DOI: 10.3390/ijms25116143
-
Methods in Molecular Biology (Clifton,... 2024
Next-generation sequencing (NGS) has transformed genomics by allowing researchers to sequence DNA and RNA with high speed, accuracy, and cost-effectiveness. With the help of NGS, researchers can investigate DNA interactions and obtain a great deal of information. Over the last decade, NGS technologies have advanced significantly, with the development of several platforms, including Illumina, PacBio, and Oxford Nanopore, each offering distinct advantages and uses. The use of NGS has aided in the discovery of genetic variations, gene expression patterns, and epigenetic modifications connected with a variety of diseases, including cancer, neurological disorders, and infectious diseases. By identifying these regions, we can control the expression of genes, cellular signaling pathways, and other key biological processes. NGS is an effective method for researching DNA interactions that has completely transformed the area of genomics. NGS has also played an important part in personalized medicine, enabling the discovery of disease-causing mutations and the creation of targeted medicines. Overall, NGS has led to new discoveries and applications in medicine, environmental sciences, and other fields.
Topics: Sequence Analysis, DNA; Genomics; Precision Medicine; DNA; High-Throughput Nucleotide Sequencing
PubMed: 37803122
DOI: 10.1007/978-1-0716-3461-5_14
-
Frontiers in Immunology 2024
BACKGROUND AND AIMS
A complete understanding of disease pathophysiology in advanced liver disease is hampered by the challenges posed by clinical specimen collection. Notably, in these patients, a transjugular liver biopsy (TJB) is the only safe way to obtain liver tissue. However, it remains unclear whether successful sequencing of this extremely small and fragile tissue can be achieved for downstream characterization of the hepatic landscape.
METHODS
Here we leveraged in-house single-cell (scRNA-seq) and single-nucleus (snRNA-seq) RNA-sequencing technologies and accompanying tissue processing protocols and performed an in-patient comparison on TJBs from decompensated cirrhosis patients (n = 3).
RESULTS
We confirmed a high concordance between nuclear and whole cell transcriptomes and captured 31,410 single nuclei and 6,152 single cells, respectively. The two platforms revealed similar diversity since all 8 major cell types could be identified, albeit with different cellular proportions thereof. Most importantly, hepatocytes were most abundant in snRNA-seq, while lymphocyte frequencies were elevated in scRNA-seq. We next focused our attention on hepatic myeloid cells due to their key role in injury and repair during chronic liver disease. Comparison of their transcriptional signatures indicated that these were largely overlapping between the two platforms. However, the scRNA-seq platform failed to recover sufficient Kupffer cell numbers, and other monocytes/macrophages featured elevated expression of stress-related parameters.
CONCLUSION
Our results indicate that single-nucleus transcriptome sequencing provides an effective means to overcome complications associated with clinical specimen collection and could sufficiently profile all major hepatic cell types including all myeloid cell subsets.
Topics: Humans; Gene Expression Profiling; Sequence Analysis, RNA; High-Throughput Nucleotide Sequencing; RNA, Small Nuclear; Liver Diseases; Liver Cirrhosis
PubMed: 38380322
DOI: 10.3389/fimmu.2024.1346520
-
ACS Synthetic Biology Dec 2023
A comprehensive error analysis of DNA-stored data during processing, such as DNA synthesis and sequencing, is crucial for reliable DNA data storage. Both synthesis and sequencing errors depend on the sequence and the transition of bases of nucleotides; ignoring either one of the error sources leads to technical challenges in minimizing the error rate. Here, we present a methodology and toolkit that utilizes an oligonucleotide library generated from a 10-base-shifted sequence array, which is individually labeled with unique molecular identifiers, to delineate and profile DNA synthesis and sequencing errors simultaneously. This methodology enables position- and sequence-independent error profiling of both DNA synthesis and sequencing. Using this toolkit, we report base transitional errors in both synthesis and sequencing in general DNA data storage as well as degenerate-base-augmented DNA data storage. The methodology and data presented will contribute to the development of DNA sequence designs with minimal error.
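One way unique molecular identifiers (UMIs) could help separate the two error sources is sketched below in Python. The attribution rule is an assumption, not the method described by the authors: a mismatch shared by most reads of one UMI group is attributed to synthesis (the molecule itself was wrong), while a mismatch seen in a minority of reads is attributed to sequencing. The function name `profile_errors` and the parameters `majority` and `min_reads` are hypothetical, and only substitutions are handled.

```python
from collections import Counter, defaultdict
from typing import Iterable, Tuple


def profile_errors(
    reads: Iterable[Tuple[str, str]],
    designed: str,
    majority: float = 0.5,  # assumed cut-off for "most reads agree"
    min_reads: int = 2,     # groups smaller than this cannot be classified
) -> Tuple[Counter, Counter]:
    """Tally (designed_base, observed_base) substitutions per error source.

    `reads` yields (umi, sequence) pairs. Sequences whose length differs
    from `designed` are skipped because indels would need an alignment step.
    """
    groups = defaultdict(list)
    for umi, seq in reads:
        if len(seq) == len(designed):
            groups[umi].append(seq)

    synthesis, sequencing = Counter(), Counter()
    for seqs in groups.values():
        if len(seqs) < min_reads:
            continue
        for pos, ref_base in enumerate(designed):
            observed = Counter(seq[pos] for seq in seqs)
            for base, count in observed.items():
                if base == ref_base:
                    continue
                if count / len(seqs) > majority:
                    synthesis[(ref_base, base)] += 1       # once per molecule
                else:
                    sequencing[(ref_base, base)] += count  # once per read
    return synthesis, sequencing
```

The resulting counters give a base-transition profile for each error source, for example how often a designed `A` was synthesized as `T` versus how often a `G` was misread as `T`.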
Topics: Sequence Analysis, DNA; High-Throughput Nucleotide Sequencing; DNA; DNA Replication; Nucleotides
PubMed: 37961855
DOI: 10.1021/acssynbio.3c00308