Briefings in Bioinformatics Nov 2019
Review
MOTIVATION
With the recent advances in DNA sequencing technologies, the study of the genetic composition of living organisms has become more accessible to researchers. This has enabled several advances, especially in the health sciences. However, many challenges arising from the complexity of sequencing projects remain unsolved. Among them is the task of assembling DNA fragments from previously unsequenced organisms, which is classified as an NP-hard (nondeterministic polynomial time hard) problem, for which no known algorithm finds an exact solution in reasonable (polynomial) execution time. Nevertheless, several tools that produce approximate solutions have been used, with results that have facilitated scientific discoveries, although there is ample room for improvement. As with other NP-hard problems, machine learning algorithms have been one of the approaches used in recent years to find better solutions to the DNA fragment assembly problem, although so far only on a small scale.
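To make the fragment assembly problem concrete, the sketch below implements a classic greedy heuristic: repeatedly merge the pair of fragments with the longest suffix-prefix overlap until no overlap of at least `min_len` remains. This is a textbook approximation for the shortest-common-superstring formulation, not a method taken from the review; it assumes error-free reads from a single strand and ignores repeats and reverse complements.

```python
from itertools import permutations
from typing import List


def overlap(a: str, b: str, min_len: int) -> int:
    """Length of the longest suffix of `a` equal to a prefix of `b` (>= min_len), else 0."""
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(fragments: List[str], min_len: int = 3) -> List[str]:
    """Greedily merge the best-overlapping pair until no overlap >= min_len remains."""
    # Drop duplicates and fragments contained in another fragment; sort for determinism.
    frags = sorted(f for f in set(fragments)
                   if not any(f != g and f in g for g in fragments))
    while len(frags) > 1:
        best_k, best_pair = 0, None
        for a, b in permutations(frags, 2):
            k = overlap(a, b, min_len)
            if k > best_k:
                best_k, best_pair = k, (a, b)
        if best_pair is None:
            break  # no sufficient overlap: the remaining fragments stay separate contigs
        a, b = best_pair
        frags.remove(a)
        frags.remove(b)
        frags.append(a + b[best_k:])  # merge, keeping the overlap only once
    return frags


print(greedy_assemble(["ATTAGACC", "GACCTGCC", "TGCCGGAA"]))
```

Because each step commits to the locally best merge, the heuristic can collapse repeats or join fragments in the wrong order, which is one reason assemblers explore other strategies, including the machine learning approaches surveyed here.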
RESULTS
This paper presents a broad review of the pioneering literature on artificial intelligence-based DNA assemblers, particularly those that use machine learning, to provide an overview of state-of-the-art approaches and to serve as a starting point for further study in this field.
Topics: Algorithms; Genome; High-Throughput Nucleotide Sequencing; Machine Learning; Sequence Analysis, DNA
PubMed: 30137230
DOI: 10.1093/bib/bby072
Nature Reviews. Neurology Nov 2019
Review
Charcot-Marie-Tooth disease and the related disorders hereditary motor neuropathy and hereditary sensory neuropathy, collectively termed CMT, are the commonest group of inherited neuromuscular diseases, and they exhibit wide phenotypic and genetic heterogeneity. CMT is usually characterized by distal muscle atrophy, often with foot deformity, weakness and sensory loss. In the past decade, next-generation sequencing (NGS) technologies have revolutionized genomic medicine and, as these technologies are applied to clinical practice, they are changing our diagnostic approach to CMT. In this Review, we discuss the application of NGS technologies, including disease-specific gene panels, whole-exome sequencing, whole-genome sequencing (WGS), mitochondrial sequencing and high-throughput transcriptome sequencing, to the diagnosis of CMT. We discuss the growing challenge of variant interpretation and consider how the clinical phenotype can be combined with genetic, bioinformatic and functional evidence to assess the pathogenicity of genetic variants in patients with CMT. Compared with the other techniques we discuss, WGS has several advantages, including unparalleled coverage of coding, non-coding and intergenic regions of both the nuclear and mitochondrial genomes, the ability to identify structural variants and the opportunity to perform genome-wide dense homozygosity mapping. We propose an algorithm for incorporating WGS into the CMT diagnostic pathway.
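Homozygosity mapping searches for long runs of consecutive homozygous variant calls, which in recessively inherited disease can point to regions inherited identically from both parents. The sketch below scans genotype calls ordered by position and reports runs of at least `min_snps` homozygous calls. The 0/1/2 genotype encoding and the threshold are assumptions for illustration; dedicated tools (for example PLINK's `--homozyg`) also handle missing calls, tolerated heterozygotes and physical-length thresholds.

```python
from typing import List, Tuple


def runs_of_homozygosity(
    positions: List[int],
    genotypes: List[int],
    min_snps: int = 5,
) -> List[Tuple[int, int, int]]:
    """Return (start_pos, end_pos, n_snps) for each run of consecutive homozygous calls.

    Genotypes are alternate-allele counts: 0 and 2 are homozygous, 1 is heterozygous.
    """
    runs = []
    start = None  # index where the current homozygous run began
    for i, g in enumerate(genotypes + [1]):  # sentinel heterozygote closes the final run
        if g in (0, 2):
            if start is None:
                start = i
        else:
            if start is not None and i - start >= min_snps:
                runs.append((positions[start], positions[i - 1], i - start))
            start = None
    return runs


pos = [100, 200, 300, 400, 500, 600, 700, 800, 900, 1000]
gt = [0, 2, 2, 0, 2, 1, 0, 0, 1, 2]
print(runs_of_homozygosity(pos, gt, min_snps=5))  # [(100, 500, 5)]
```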
Topics: Charcot-Marie-Tooth Disease; Genetic Testing; Genetic Variation; High-Throughput Nucleotide Sequencing; Humans
PubMed: 31582811
DOI: 10.1038/s41582-019-0254-5
Methods in Molecular Biology (Clifton,... 2021
A-to-I RNA editing plays an important role in humans because it can influence gene expression and increase proteome diversity. In addition, its deregulation has been linked to a variety of human diseases, including neurological disorders and cancer. In the last decade, high-throughput transcriptome sequencing by RNA-seq has dramatically improved the investigation of RNA editing at single-nucleotide resolution. Several bioinformatics resources for discovering and/or cataloguing A-to-I events are now available. Here, we first provide an overview of the state-of-the-art RNA editing databases and then focus on REDIportal, the largest collection of A-to-I events, with more than 4.5 million sites from 2660 human GTEx samples.
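Because reverse transcriptase reads inosine as guanosine, A-to-I editing appears in RNA-seq alignments as A-to-G mismatches relative to the transcribed strand. A common per-site summary is the editing level, the fraction of G-supporting reads among A plus G reads. The sketch below computes it from per-site base counts; the `SiteCounts` input structure and the coverage and minimum-G thresholds are illustrative assumptions, not the filtering criteria used by REDIportal.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class SiteCounts:
    """Base counts observed at one position, oriented to the transcribed strand (hypothetical)."""
    ref_base: str
    counts: Dict[str, int]  # e.g. {"A": 30, "G": 10}


def editing_level(site: SiteCounts, min_coverage: int = 10, min_g: int = 3) -> Optional[float]:
    """Return G / (A + G) for a candidate A-to-I site, or None if it fails basic filters."""
    if site.ref_base != "A":
        return None  # only reference adenosines can undergo A-to-I editing
    a = site.counts.get("A", 0)
    g = site.counts.get("G", 0)
    if a + g < min_coverage or g < min_g:
        return None  # too little coverage, or too little evidence of editing
    return g / (a + g)


print(editing_level(SiteCounts("A", {"A": 30, "G": 10})))  # 0.25
```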
Topics: Animals; Computational Biology; Databases, Genetic; Genome, Human; Genomics; High-Throughput Nucleotide Sequencing; Humans; Internet; RNA Editing; Sequence Analysis, RNA; Software; Transcriptome; User-Computer Interface
PubMed: 33835458
DOI: 10.1007/978-1-0716-1307-8_25
Methods in Molecular Biology (Clifton,... 2023
Multiplexed inter-simple sequence repeat (ISSR) genotyping by sequencing (MIG-seq) is a simple, rapid, and inexpensive method for detecting single-nucleotide polymorphisms (SNPs) using next-generation sequencing (NGS). A key advantage of MIG-seq is that it can be easily applied to diverse species without prior genetic information. In addition, this method opens the door to genome-wide nucleotide sequence analyses of low-quality and trace-level deoxyribonucleic acid (DNA) samples, which have previously been difficult to analyze. Another advantage is that the procedure is simple, time-saving, and inexpensive. Recently, MIG-seq has been applied to wild and cultivated plants and has produced novel results. Using this otherwise invisible DNA information, questions about gene flow through pollination and seed dispersal, the genetic structure and diversity of populations, clonality, and the hybridization of wild and cultivated plants are being rapidly answered. In this chapter, I present the results of plant research based on MIG-seq and describe the procedure from the perspective of a MIG-seq user.
Topics: Genotype; Genome; High-Throughput Nucleotide Sequencing; Polymorphism, Single Nucleotide; Genotyping Techniques
PubMed: 36781659
DOI: 10.1007/978-1-0716-3024-2_29
Bioinformatics (Oxford, England) May 2022
MOTIVATION
Regulatory elements (REs), such as enhancers and promoters, are regulatory sequences that function within a heterogeneous regulatory network, controlling gene expression in a context-specific way by recruiting transcription regulators and harboring genetic variants. Annotating these REs across many cellular contexts relies on costly and labor-intensive next-generation sequencing and RNA-guided editing technologies.
RESULTS
We propose a systematic Gene Ontology Annotation method for Regulatory Elements (RE-GOA) that leverages word-embedding techniques from natural language processing. We first assemble a heterogeneous network by integrating context-specific regulations, protein-protein interactions and gene ontology (GO) terms. We then perform network embedding and associate regulatory elements with GO terms by assessing their similarity in a low-dimensional vector space. In three applications, we show that RE-GOA outperforms existing methods in annotating transcription factor (TF) binding sites from ChIP-seq data, in functional enrichment analysis of differentially accessible peaks from ATAC-seq data, and in revealing genetic correlation among phenotypes from their GWAS summary statistics.
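The final step described above, associating a regulatory element with GO terms by similarity in the embedding space, can be sketched as ranking GO-term vectors by cosine similarity to the element's vector. This is only an illustration under assumptions: the tiny hand-made vectors and the term names are hypothetical, cosine similarity is assumed as the similarity measure, and the network-embedding step that would produce real vectors is not shown.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def rank_go_terms(
    re_vec: Sequence[float],
    go_vecs: Dict[str, Sequence[float]],
    top_k: int = 3,
) -> List[Tuple[str, float]]:
    """Rank GO terms by cosine similarity to a regulatory element's embedding."""
    scored = [(term, cosine(re_vec, vec)) for term, vec in go_vecs.items()]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Hypothetical 3-dimensional embeddings for illustration only.
go_vecs = {"term_x": [1.0, 0.1, 0.0], "term_y": [0.0, 1.0, 0.0], "term_z": [0.7, 0.7, 0.0]}
print([term for term, _ in rank_go_terms([1.0, 0.0, 0.0], go_vecs)])  # ['term_x', 'term_z', 'term_y']
```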
AVAILABILITY AND IMPLEMENTATION
The source code and the systematic RE annotation for human and mouse are available at https://github.com/AMSSwanglab/RE-GOA.
SUPPLEMENTARY INFORMATION
Supplementary data are available at Bioinformatics online.
Topics: Animals; Chromatin Immunoprecipitation Sequencing; High-Throughput Nucleotide Sequencing; Mice; Molecular Sequence Annotation; Promoter Regions, Genetic; Regulatory Sequences, Nucleic Acid
PubMed: 35561169
DOI: 10.1093/bioinformatics/btac185
Methods in Molecular Biology (Clifton,... 2020
Next-generation sequencing (NGS), particularly the RNA sequencing (RNA-Seq) technique, allows detection and quantification of different RNA transcripts in a tissue sample, in our case toxin transcripts from snake venom glands. Using this approach, novel toxin transcripts can be detected and the abundances of different isoforms of each toxin measured. The analytical pipeline can be briefly outlined as follows. Isolating mRNA from tissue under RNase-free conditions is essential to keep the mRNA intact before sequencing. After mRNA fragmentation, adapters are added to both ends of the fragments to synthesize complementary DNAs (cDNAs). The resulting cDNA library is then sequenced on the Illumina HiSeq 2000 platform. The quality of the millions of reads produced is checked, and adapter sequences and low-quality reads are removed. The NGS data are then subjected to a workflow of de novo assembly, quantification of expression levels, annotation of transcripts, and identification of ORFs, signal peptides, structurally conserved domains, and functional motifs. In this report, we describe these methodological steps and techniques in detail and refer to the platforms and software that may be adopted for similar studies.
Topics: Animals; Exocrine Glands; Gene Library; High-Throughput Nucleotide Sequencing; Sequence Analysis, RNA; Snake Venoms
PubMed: 31576524
DOI: 10.1007/978-1-4939-9845-6_5
BMC Bioinformatics May 2022
BACKGROUND
De novo genome assembly typically produces a set of contigs instead of the complete genome. Thus, additional data such as genetic linkage maps, optical maps, or Hi-C data are needed to resolve the complete structure of the genome. Most previous work uses such data to order and orient contigs.
RESULTS
Here, we introduce a framework that guides genome assembly with additional data. Our approach is based on clustering the reads such that, according to the additional data, the reads in each cluster originate from nearby positions in the genome. These clusters are then assembled independently, and the resulting contigs are further assembled in a hierarchical manner. We implemented our approach for genetic linkage maps in a tool called HGGA.
CONCLUSIONS
Our experiments on simulated and real Pacific Biosciences long reads and genetic linkage maps show that HGGA produces a more contiguous assembly with fewer contigs, with 1.2 to 9.8 times higher NGA50 or N50 than a plain assembly of the reads and 1.03 to 6.5 times higher NGA50 or N50 than a previous approach integrating genetic linkage maps with contig assembly. Furthermore, assembly correctness remains similar or improves compared with an assembly using only the read data.
Topics: Genome; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
PubMed: 35525918
DOI: 10.1186/s12859-022-04701-2
Current Opinion in Hematology Mar 2023
Review
PURPOSE OF REVIEW
The aim of this study was to provide insight into how novel next-generation sequencing (NGS) techniques are set to revolutionize clinical practice.
RECENT FINDINGS
Advances in sequencing technologies have focused on improving the capture of mutations and reads and on cellular resolution. Both short- and long-read DNA sequencing technologies are being refined and combined in novel ways with other multiomic approaches to gain unprecedented biological insight into disease. Single-cell (sc)DNA-seq and scDNA-seq integrated with immunophenotyping provide granular information on disease composition, such as clonal hierarchy, co-mutation status, zygosity, clonal diversity and genotype-phenotype correlations. These and other techniques can identify rare cell populations, offering increased sensitivity in measurable residual disease monitoring and precise characterization of residual clones, which allows leukemic clones to be distinguished from premalignant or nonmalignant clones.
SUMMARY
Growing genetics-based mechanistic insight into, and classification of, myeloid diseases, together with the falling cost of high-throughput NGS, mean that novel sequencing technologies are closer to becoming a reality in standard clinical practice. These technologies are poised to improve diagnostics and our ability to monitor treatment response and minimal residual disease, and to enable the study of premalignant conditions such as clonal haematopoiesis.
Topics: Humans; Sequence Analysis, DNA; Mutation; High-Throughput Nucleotide Sequencing; Genetic Association Studies
PubMed: 36602939
DOI: 10.1097/MOH.0000000000000754
BMC Bioinformatics Jul 2020
Review
BACKGROUND
A key use of high-throughput sequencing technology is the sequencing and assembly of full genome sequences. These genome assemblies are commonly assessed using statistics relating to the contiguity of the assembly. Measures of contiguity are not strongly correlated with the biological completeness or correctness of the assembly, and a commonly reported metric, N50, can be misleading. Over the years, multiple research groups have argued against the overuse of N50 and sought to develop more informative metrics.
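N50 is the length of the shortest contig among the largest contigs that together cover at least half of the total assembly length; NG50 uses half of an (estimated) genome size as the target instead. The sketch below computes both and illustrates one way N50 can mislead: discarding short contigs raises N50 even though less sequence is retained, whereas NG50 does not reward this. The contig lengths and genome size are made-up numbers; NGA50, which additionally requires aligning contigs to a reference and breaking them at misassemblies, is not shown.

```python
from typing import List, Optional


def nx(lengths: List[int], target: float) -> Optional[int]:
    """Smallest contig length L such that contigs of length >= L cover >= target bases."""
    covered = 0
    for length in sorted(lengths, reverse=True):
        covered += length
        if covered >= target:
            return length
    return None  # the contigs never reach the target (possible for NG50)


def n50(lengths: List[int]) -> Optional[int]:
    return nx(lengths, sum(lengths) / 2)


def ng50(lengths: List[int], genome_size: int) -> Optional[int]:
    return nx(lengths, genome_size / 2)


contigs = [50, 40, 30, 20, 10] + [5] * 20   # 250 bp assembled in total (illustrative)
filtered = [c for c in contigs if c >= 10]  # drop short contigs: only 150 bp remain
print(n50(contigs), n50(filtered))                 # 20 40
print(ng50(contigs, 300), ng50(filtered, 300))     # 10 10
```

Here N50 doubles after filtering even though 100 bp of assembled sequence were thrown away, while NG50, anchored to the genome size, stays the same.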
RESULTS
This paper reviews the problems that arise from relying solely on contiguity as a measure of genome assembly quality, as well as current alternative methods. The alternative methods are compared on how informative they are about the biological quality of the assembly and how easy they are to use. A comprehensive method that combines multiple assembly quality metrics is presented.
CONCLUSIONS
This study aims to report on the status of assembly assessment methods and compare them, and to offer a comprehensive method that incorporates multiple facets of quality assessment. The strengths and weaknesses of the various methods are presented and explained, with recommendations based on speed of analysis and user-friendliness.
Topics: Genomics; High-Throughput Nucleotide Sequencing; Humans
PubMed: 32631298
DOI: 10.1186/s12859-020-3382-4
Methods in Molecular Biology (Clifton,... 2023
Single-cell sequencing allows sequence information to be measured from individual cells with next-generation sequencing (NGS). However, its application to third-generation sequencing platforms such as Oxford Nanopore has been challenging because of their lower basecalling accuracy. Here, we describe a method for highly accurate single-cell COrrected Long-Read sequencing (scCOLOR-seq), based on droplet-based encapsulation of cells and sequencing on the Oxford Nanopore Sequencing system.
Topics: High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA; Nanopores
PubMed: 36781734
DOI: 10.1007/978-1-0716-2996-3_18