-
Applied and Environmental Microbiology Jan 2020 (Review)
More than 10 years ago, we published the paper describing the mothur software package in Applied and Environmental Microbiology. Our goal was to create a comprehensive package that allowed users to analyze amplicon sequence data using the most robust methods available. mothur has helped lead the community through the ongoing sequencing revolution and continues to provide this service to the microbial ecology community. Beyond its success and impact on the field, mothur's development exposed a series of observations that are generally translatable across science. Perhaps the observation that stands out the most is that all science is done in the context of prevailing ideas and available technologies. Although it is easy to criticize choices that were made 10 years ago through a modern lens, if we were to wait for all of the possible limitations to be solved before proceeding, science would stall. Even preceding the development of mothur, it was necessary to address the most important problems and work backwards to other problems that limited access to robust sequence analysis tools. At the same time, we strive to expand mothur's capabilities in a data-driven manner to incorporate new ideas and accommodate changes in data and desires of the research community. It has been edifying to see the benefit that a simple set of tools can bring to so many other researchers.
Topics: Environmental Microbiology; Sequence Analysis; Software
PubMed: 31704678
DOI: 10.1128/AEM.02343-19 -
Genes Jan 2019
The adoption of single molecule real-time (SMRT) sequencing [...].
Topics: Animals; High-Throughput Nucleotide Sequencing; Humans; Plants; Sensitivity and Specificity; Sequence Analysis, DNA
PubMed: 30621217
DOI: 10.3390/genes10010024 -
Briefings in Bioinformatics May 2014 (Review)
With the development of next-generation sequencing (NGS) technologies, a large amount of short read data has been generated. Assembly of these short reads can be challenging for genomes and metagenomes without template sequences, making alignment-based genome sequence comparison difficult. In addition, sequence reads from NGS can come from different regions of various genomes and they may not be alignable. Sequence signature-based methods for genome comparison based on the frequencies of word patterns in genomes and metagenomes can potentially be useful for the analysis of short reads data from NGS. Here we review the recent development of alignment-free genome and metagenome comparison based on the frequencies of word patterns with emphasis on the dissimilarity measures between sequences, the statistical power of these measures when two sequences are related and the applications of these measures to NGS data.
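As a rough illustration of comparing sequences by word-pattern frequencies, the sketch below counts overlapping words of length k in two sequences and derives two simple measures from the counts: the D2 statistic (the inner product of the two word-count vectors) and a cosine-based dissimilarity. The abstract does not name the specific measures the review covers, so these two are chosen here only as familiar examples; the function names and the choice of k are assumptions for illustration.

```python
from collections import Counter
from math import sqrt


def kmer_counts(seq, k):
    """Count every overlapping word of length k in seq (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def d2_statistic(a, b, k):
    """D2: inner product of the word-count vectors of the two sequences."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    return sum(n * cb[word] for word, n in ca.items())


def cosine_dissimilarity(a, b, k):
    """1 - cosine similarity of the word-count vectors (0 means identical profiles)."""
    ca, cb = kmer_counts(a, k), kmer_counts(b, k)
    dot = sum(n * cb[word] for word, n in ca.items())
    sq_a = sum(n * n for n in ca.values())
    sq_b = sum(n * n for n in cb.values())
    if sq_a == 0 or sq_b == 0:
        # A sequence shorter than k has no words; treat as maximally dissimilar.
        return 1.0
    return 1.0 - dot / sqrt(sq_a * sq_b)
```

Because only word counts are used, no alignment is computed, which is why measures of this kind can be applied to unassembled short reads: the counts from all reads in a sample can simply be pooled before comparison.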
Topics: Algorithms; Computational Biology; Genomics; High-Throughput Nucleotide Sequencing; Markov Chains; Models, Statistical; Sequence Alignment; Sequence Analysis
PubMed: 24064230
DOI: 10.1093/bib/bbt067 -
International Journal of Molecular... Aug 2021 (Review)
Aptamers feature a number of advantages, compared to antibodies. However, their application has been limited so far, mainly because of the complex selection process. 'High-throughput sequencing fluorescent ligand interaction profiling' (HiTS-FLIP) significantly increases the selection efficiency and is consequently a very powerful and versatile technology for the selection of high-performance aptamers. It is the first experiment to allow the direct and quantitative measurement of the affinity and specificity of millions of aptamers simultaneously by harnessing the potential of optical next-generation sequencing platforms to perform fluorescence-based binding assays on the clusters displayed on the flow cells and determining their sequence and position in regular high-throughput sequencing. Many variants of the experiment have been developed that allow automation and in situ conversion of DNA clusters into base-modified DNA, RNA, peptides, and even proteins. In addition, the information from mutational assays, performed with HiTS-FLIP, provides deep insights into the relationship between the sequence, structure, and function of aptamers. This enables a detailed understanding of the sequence-specific rules that determine affinity, and thus, supports the evolution of aptamers. Current variants of the HiTS-FLIP experiment and its application in the field of aptamer selection, characterisation, and optimisation are presented in this review.
Topics: Aptamers, Nucleotide; Automation, Laboratory; High-Throughput Nucleotide Sequencing; Mutagenesis; Optical Devices; Sequence Analysis, DNA
PubMed: 34502110
DOI: 10.3390/ijms22179202 -
Briefings in Bioinformatics May 2014 (Review)
Among alignment-free methods, Iterated Maps (IMs) are on a particular extreme: they are also scale free (order free). The use of IMs for sequence analysis is also distinct from other alignment-free methodologies in being rooted in statistical mechanics instead of computational linguistics. Both of these roots go back over two decades to the use of fractal geometry in the characterization of phase-space representations. The time series analysis origin of the field is betrayed by the title of the manuscript that started this alignment-free subdomain in 1990, 'Chaos Game Representation'. The clash between the analysis of sequences as continuous series and the better established use of Markovian approaches to discrete series was almost immediate, with a defining critique published in the same journal 2 years later. The rest of that decade would go by before the scale-free nature of the IM space was uncovered. The ensuing decade saw this scalability generalized for non-genomic alphabets as well as an interest in its use for graphic representation of biological sequences. Finally, in the past couple of years, in step with the emergence of BigData and MapReduce as a new computational paradigm, there is a surprising third act in the IM story. Multiple reports have described gains in computational efficiency of multiple orders of magnitude over more conventional sequence analysis methodologies. The stage appears to be now set for a recasting of IMs with a central role in processing nextgen sequencing results.
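For readers unfamiliar with the Chaos Game Representation mentioned above, a minimal sketch of the iterated map follows. Each nucleotide is assigned a corner of the unit square, and every base moves the current point halfway toward its corner. The corner assignment, the starting point at the centre of the square, and the decision to skip ambiguous symbols are assumptions made for this illustration, not details taken from the review.

```python
# Corner assignment is a common convention, assumed here for illustration.
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def chaos_game_points(seq, start=(0.5, 0.5)):
    """Return the CGR coordinates produced by iterating over seq.

    Each base moves the current point halfway toward that base's corner.
    Symbols outside A/C/G/T are skipped (an assumption of this sketch).
    """
    x, y = start
    points = []
    for base in seq.upper():
        if base not in CORNERS:
            continue
        cx, cy = CORNERS[base]
        x, y = (x + cx) / 2.0, (y + cy) / 2.0
        points.append((x, y))
    return points
```

After each step the point lies in the quadrant belonging to the base just read, and the preceding bases determine progressively smaller sub-squares within it, so a single coordinate pair encodes the recent sequence history at every resolution.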
Topics: Computational Biology; Fractals; Models, Statistical; Nonlinear Dynamics; Sequence Alignment; Sequence Analysis
PubMed: 24162172
DOI: 10.1093/bib/bbt072 -
Microbiology Spectrum Jan 2019 (Review)
In the decade and a half since the introduction of next-generation sequencing (NGS), the technical feasibility, cost, and overall utility of sequencing have changed dramatically, including applications for infectious disease epidemiology. Massively parallel sequencing technologies have decreased the cost of sequencing by more than 6 orders of magnitude over this time, with a corresponding increase in data generation and complexity. This review provides an overview of the basic principles, chemistry, and operational mechanics of current sequencing technologies, including both conventional Sanger and NGS approaches. As the generation of large amounts of sequence data becomes increasingly routine, the role of bioinformatics in data analysis and reporting becomes all the more critical, and the successful deployment of NGS in public health settings requires careful consideration of changing information technology, bioinformatics, workforce, and regulatory requirements. While there remain important challenges to the sustainable implementation of NGS in public health, in terms of both laboratory and bioinformatics capacity, the impact of these technologies on infectious disease surveillance and outbreak investigations has been nothing short of revolutionary. Understanding the important role that NGS plays in modern public health laboratory practice is critical, as is the need to ensure appropriate workforce, infrastructure, facilities, and funding consideration for routine NGS applications, future innovation, and rapidly scaling NGS-based infectious disease surveillance and outbreak response activities. *This article is part of a curated collection.
Topics: Computational Biology; DNA; Data Analysis; Gene Library; High-Throughput Nucleotide Sequencing; Humans; Sequence Analysis, DNA
PubMed: 30737915
DOI: 10.1128/microbiolspec.AME-0005-2018 -
BMC Bioinformatics Jan 2022
BACKGROUND
Quality control checks are the first step in RNA-Sequencing analysis, which enable the identification of common issues that occur in the sequenced reads. Checks for sequence quality, contamination, and complexity are commonplace, and allow users to implement steps downstream which can account for these issues. Strand-specificity of reads is frequently overlooked and is often unavailable even in published data, yet when unknown or incorrectly specified can have detrimental effects on the reproducibility and accuracy of downstream analyses.
RESULTS
To address these issues, we developed how_are_we_stranded_here, a Python library that helps to quickly infer strandedness of paired-end RNA-Sequencing data. Testing on both simulated and real RNA-Sequencing reads showed that it correctly measures strandedness, and measures outside the normal range may indicate sample contamination.
CONCLUSIONS
how_are_we_stranded_here is fast and user friendly, making it easy to implement in quality control pipelines prior to analysing RNA-Sequencing data. how_are_we_stranded_here is freely available at https://github.com/betsig/how_are_we_stranded_here .
Topics: High-Throughput Nucleotide Sequencing; RNA-Seq; Reproducibility of Results; Sequence Analysis, DNA; Sequence Analysis, RNA; Software
PubMed: 35065593
DOI: 10.1186/s12859-022-04572-7 -
Developmental Medicine and Child... Aug 2011
Topics: Genomics; Humans; Sequence Analysis, DNA
PubMed: 21679360
DOI: 10.1111/j.1469-8749.2011.04016.x -
BMC Bioinformatics Jan 2022
BACKGROUND
Recent development of bioinformatics tools for Next Generation Sequencing data has facilitated complex analyses and prompted large scale experimental designs for comparative genomics. When combined with the advances in network inference tools, this can lead to powerful methodologies for mining genomics data, allowing development of pipelines that stretch from sequence reads mapping to network inference. However, integrating various methods and tools available over different platforms requires a programmatic framework to fully exploit their analytic capabilities. Integrating multiple genomic analysis tools faces challenges from standardization of input and output formats, normalization of results for performing comparative analyses, to developing intuitive and easy to control scripts and interfaces for the genomic analysis pipeline.
RESULTS
We describe here NetSeekR, a network analysis R package that includes the capacity to analyze time series of RNA-Seq data, to perform correlation and regulatory network inferences and to use network analysis methods to summarize the results of a comparative genomics study. The software pipeline includes alignment of reads, differential gene expression analysis, correlation network analysis, regulatory network analysis, gene ontology enrichment analysis and network visualization of differentially expressed genes. The implementation provides support for multiple RNA-Seq read mapping methods and allows comparative analysis of the results obtained by different bioinformatics methods.
CONCLUSION
Our methodology increases the level of integration of genomics data analysis tools to network inference, facilitating hypothesis building, functional analysis and genomics discovery from large scale NGS data. When combined with network analysis and simulation tools, the pipeline allows for developing systems biology methods using large scale genomics data.
Topics: Computational Biology; High-Throughput Nucleotide Sequencing; RNA-Seq; Sequence Analysis, RNA; Time Factors
PubMed: 35090393
DOI: 10.1186/s12859-021-04554-1 -
GigaScience Dec 2020
BACKGROUND
Following the miniaturization of integrated circuitry and other computer hardware over the past several decades, DNA sequencing is on a similar path. Leading this trend is the Oxford Nanopore sequencing platform, which currently offers the hand-held MinION instrument and even smaller instruments on the horizon. This technology has been used in several important applications, including the analysis of genomes of major pathogens in remote stations around the world. However, despite the simplicity of the sequencer, an equally simple and portable analysis platform is not yet available.
RESULTS
iGenomics is the first comprehensive mobile genome analysis application, with capabilities to align reads, call variants, and visualize the results entirely on an iOS device. Implemented in Objective-C using the FM-index, banded dynamic programming, and other high-performance bioinformatics techniques, iGenomics is optimized to run in a mobile environment. We benchmark iGenomics using a variety of real and simulated Nanopore sequencing datasets of viral and bacterial genomes and show that iGenomics has performance comparable to the popular BWA-MEM/SAMtools/IGV suite, without necessitating a laptop or server cluster.
CONCLUSIONS
iGenomics is available open source (https://github.com/stuckinaboot/iGenomics) and for free on Apple's App Store (https://apple.co/2HCplzr).
Topics: Computational Biology; Genome, Bacterial; High-Throughput Nucleotide Sequencing; Nanopore Sequencing; Nanopores; Sequence Analysis, DNA; Smartphone
PubMed: 33284326
DOI: 10.1093/gigascience/giaa138