- CUSHAW3: sensitive and accurate base-space and color-space short-read alignment with hybrid seeding. PloS One 2014
Comparative Study
The majority of next-generation sequencing short-reads can be properly aligned by leading aligners at high speed. However, the alignment quality can still be further improved, since usually not all reads can be correctly aligned to large genomes, such as the human genome, even for simulated data. Moreover, even slight improvements in this area are important but challenging, and usually require significantly more computational endeavor. In this paper, we present CUSHAW3, an open-source parallelized, sensitive and accurate short-read aligner for both base-space and color-space sequences. In this aligner, we have investigated a hybrid seeding approach to improve alignment quality, which incorporates three different seed types, i.e. maximal exact match seeds, exact-match k-mer seeds and variable-length seeds, into the alignment pipeline. Furthermore, three techniques: weighted seed-pairing heuristic, paired-end alignment pair ranking and read mate rescuing have been conceived to facilitate accurate paired-end alignment. For base-space alignment, we have compared CUSHAW3 to Novoalign, CUSHAW2, BWA-MEM, Bowtie2 and GEM, by aligning both simulated and real reads to the human genome. The results show that CUSHAW3 consistently outperforms CUSHAW2, BWA-MEM, Bowtie2 and GEM in terms of single-end and paired-end alignment. Furthermore, our aligner has demonstrated better paired-end alignment performance than Novoalign for short-reads with high error rates. For color-space alignment, CUSHAW3 is consistently one of the best aligners compared to SHRiMP2 and BFAST. The source code of CUSHAW3 and all simulated data are available at http://cushaw3.sourceforge.net.
Topics: Base Sequence; Computational Biology; Computer Simulation; High-Throughput Nucleotide Sequencing; Sequence Alignment; Software
PubMed: 24466273
DOI: 10.1371/journal.pone.0086869
- Nucleic Acids Research Feb 2020
The double-helical structure of DNA results from canonical base pairing and stacking interactions. However, variations from steady-state conformations resulting from mechanical perturbations in cells have physiological relevance but their dependence on sequence remains unclear. Here, we use molecular dynamics simulations showing sequence differences result in markedly different structural motifs upon physiological twisting and stretching. We simulate overextension on different sequences of DNA ((AA)12, (AT)12, (CC)12 and (CG)12) with supercoiling densities at 200 and 50 mM salt concentrations. We find that DNA denatures in the majority of stretching simulations, surprisingly including those with over-twisted DNA. GC-rich sequences are observed to be more stable than AT-rich ones, with the specific response dependent on the base pair order. Furthermore, we find that (AT)12 forms stable periodic structures with non-canonical hydrogen bonds in some regions and non-canonical stacking in others, whereas (CG)12 forms a stacking motif of four base pairs independent of supercoiling density. Our results demonstrate that 20-30% DNA extension is sufficient for breaking B-DNA around and significantly above cellular supercoiling, and that the DNA sequence is crucial for understanding structural changes under mechanical stress. Our findings have important implications for the activities of protein machinery interacting with DNA in all cells.
Topics: Base Pairing; Base Sequence; Biophysical Phenomena; DNA; GC Rich Sequence; Hydrogen Bonding; Molecular Dynamics Simulation; Molecular Structure; Nucleic Acid Conformation
PubMed: 31930331
DOI: 10.1093/nar/gkz1227
- Future Microbiology Nov 2012
Review
Evolution of bacterial pathogen populations has been detected in a variety of ways, including phenotypic tests, such as metabolic activity, reaction to antisera and drug resistance, and genotypic tests that measure variation in chromosome structure, repetitive loci and individual gene sequences. While informative, these methods only capture a small subset of the total variation and, therefore, have limited resolution. Advances in sequencing technologies have made it feasible to capture whole-genome sequence variation for each sample under study, providing the potential to detect all changes at all positions in the genome from single nucleotide changes to large-scale insertions and deletions. In this review, we focus on recent work that has applied this powerful new approach and summarize some of the advances that this has brought in our understanding of the details of how bacterial pathogens evolve.
Topics: Bacteria; Bacterial Proteins; Base Sequence; Evolution, Molecular; Genome, Bacterial; Polymorphism, Single Nucleotide; Sequence Analysis, DNA
PubMed: 23075447
DOI: 10.2217/fmb.12.108
- Journal of Bacteriology Oct 1988
The ref gene of bacteriophage P1 stimulates recombination between two defective lacZ genes in the Escherichia coli chromosome (lac x lac recombination) and certain other RecA-dependent recombination processes. We determined the DNA sequence of the 5' portion of the ref gene and tested various regions for functionality by inserting DNA fragments lacking increasing amounts of 5' sequence into plasmid and lambda phage vectors and measuring the ability of the constructs to stimulate lac x lac recombination. The region found essential for Ref activity in the absence of external heterologous promoters encodes two presumptive promoters, pref-1 and pref-2, whose -10 regions fall in a nearly perfect 13-base-pair (bp) tandem repeat. The -10 region of the putative pref-1 is part of a phage P1 c1 repressor recognition sequence. The first two ATG codons in the ref reading frame are, respectively, 90 and 216 bp downstream from the putative promoter-operator region. Deletion analysis indicated that translation can initiate at either ATG (although neither is associated with a canonical ribosome-binding sequence) and that the 42 amino acids in between are not indispensable for Ref stimulation of lac x lac recombination. However, the shorter reading frame appears to encode a less active polypeptide. The 91-bp leader region between the putative promoter-operator and the first ATG contains 30 codons in frame with the ref structural sequence, but its frame can be shifted without affecting Ref activity. The leader region ends with an apparent rho-independent termination sequence (attenuator). Deletion of 18 bp of early leader sequence drastically reduced Ref activity, even when ref was driven by a heterologous promoter (plac). An 8-bp internal deletion in the putative attenuator sequence relieved this requirement for the early leader sequence. 
This latter observation, along with nucleotide complementarity between portions of the early leader and attenuator sequences, is consistent with preemption of attenuation by the early leader.
Topics: Base Sequence; Chromosome Deletion; Coliphages; DNA Mutational Analysis; Gene Expression Regulation; Genes, Regulator; Genes, Viral; Hydrogen Bonding; Molecular Sequence Data; Molecular Structure; Operator Regions, Genetic; Promoter Regions, Genetic; Recombination, Genetic; Regulatory Sequences, Nucleic Acid; Terminator Regions, Genetic
PubMed: 3170487
DOI: 10.1128/jb.170.10.4881-4889.1988
- The Journal of Biological Chemistry Apr 1990
Review
Topics: Animals; Base Sequence; Chromosomes; DNA; DNA Nucleotidyltransferases; Molecular Sequence Data; Repetitive Sequences, Nucleic Acid
PubMed: 2180936
DOI: No ID Found
- Genome Biology Jun 2016
An association between hammerhead ribozymes and non-autonomous, long terminal repeat retrotransposons is uncovered in plants, shedding light on the biological function of genomically encoded ribozymes.
Topics: Base Sequence; Nucleic Acid Conformation; RNA, Catalytic; RNA, Viral; Retroelements; Terminal Repeat Sequences
PubMed: 27339278
DOI: 10.1186/s13059-016-1007-z
- Nature Reviews. Neuroscience Jun 2016
Review
A nucleotide repeat expansion (NRE) within the chromosome 9 open reading frame 72 (C9orf72) gene was the first of this type of mutation to be linked to multiple neurological conditions, including amyotrophic lateral sclerosis and frontotemporal dementia. The pathogenic mechanisms through which the C9orf72 NRE contributes to these disorders include loss of C9orf72 function and gain-of-function mechanisms of C9orf72 driven by toxic RNA and protein species encoded by the NRE. These mechanisms have been linked to several cellular defects - including nucleocytoplasmic trafficking deficits and nuclear stress - that have been observed in both patients and animal models.
Topics: Animals; Base Sequence; C9orf72 Protein; Humans; Neurodegenerative Diseases; Proteins; Trinucleotide Repeat Expansion
PubMed: 27150398
DOI: 10.1038/nrn.2016.38
- FEMS Microbiology Reviews Jan 2000
Review
On the basis of established knowledge of microbial genetics one can distinguish three major natural strategies in the spontaneous generation of genetic variations in bacteria. These strategies are: (1) small local changes in the nucleotide sequence of the genome, (2) intragenomic reshuffling of segments of genomic sequences and (3) the acquisition of DNA sequences from another organism. The three general strategies differ in the quality of their contribution to microbial evolution. Besides a number of non-genetic factors, various specific gene products are involved in the generation of genetic variation and in the modulation of the frequency of genetic variation. The underlying genes are called evolution genes. They act for the benefit of the biological evolution of populations as opposed to the action of housekeeping genes and accessory genes which are for the benefit of individuals. Examples of evolution genes acting as variation generators are found in the transposition of mobile genetic elements and in so-called site-specific recombination systems. DNA repair systems and restriction-modification systems are examples of modulators of the frequency of genetic variation. The involvement of bacterial viruses and of plasmids in DNA reshuffling and in horizontal gene transfer is a hint for their evolutionary functions. Evolution genes are thought to undergo biological evolution themselves, but natural selection for their functions is indirect, at the level of populations, and is called second-order selection. In spite of an involvement of gene products in the generation of genetic variations, evolution genes do not programmatically direct evolution towards a specific goal. Rather, a steady interplay between natural selection and mixed populations of genetic variants gives microbial evolution its direction.
Topics: Bacteria; Base Sequence; DNA Repair; DNA Transposable Elements; Evolution, Molecular; Gene Rearrangement; Gene Transfer, Horizontal; Genetic Variation; Genome, Bacterial; Recombination, Genetic
PubMed: 10640595
DOI: 10.1111/j.1574-6976.2000.tb00529.x
- PloS One 2013
The tertiary motifs in complex RNA molecules play vital roles, either stabilizing the formation of RNA 3D structure or providing important biological functionality to the molecule. In order to better understand the roles of these tertiary motifs in riboswitches, we examined 11 representative riboswitch PDB structures for potential agreement of both motif occurrences and conservations. A total of 61 unique tertiary interactions were found in the reference structures. In addition to the expected common A-minor motifs and base-triples mainly involved in linking distant regions of the riboswitch structures, three highly conserved variants of A-minor interactions called G-minors were found in the SAM-I and FMN riboswitches, where they appear to be involved in the recognition of the respective ligand's functional groups. From our structural survey as well as corresponding structure and sequence alignments, the agreement between motif occurrences and conservations is very prominent across the representative riboswitches. Our analysis provides evidence that some of these tertiary interactions are essential components of the structure, with sequence positions that are conserved despite a high degree of diversity in other parts of the respective riboswitch sequences. This is indicative of a vital role for these tertiary interactions in determining the specific biological function of riboswitches.
Topics: Base Sequence; Conserved Sequence; Databases, Nucleic Acid; Genetic Variation; Models, Molecular; Molecular Sequence Annotation; Molecular Sequence Data; Nucleic Acid Conformation; Nucleotide Motifs; RNA, Messenger; Riboswitch; Sequence Alignment
PubMed: 24040136
DOI: 10.1371/journal.pone.0073984
- BMC Bioinformatics Feb 2021
BACKGROUND
An inverted repeat is a DNA sequence followed downstream by its reverse complement, potentially with a gap in the centre. Inverted repeats are found in both prokaryotic and eukaryotic genomes and they have been linked with countless possible functions. Many international consortia provide a comprehensive description of common genetic variation making alternative sequence representations, such as IUPAC encoding, necessary for leveraging the full potential of such broad variation datasets.
RESULTS
We present IUPACPAL, an exact tool for efficient identification of inverted repeats in IUPAC-encoded DNA sequences allowing also for potential mismatches and gaps in the inverted repeats.
CONCLUSION
Within the parameters that were tested, our experimental results show that IUPACPAL compares favourably to a similar application packaged with EMBOSS. We show that IUPACPAL identifies many previously unidentified inverted repeats when compared with EMBOSS, and that this is also performed with orders of magnitude improved speed.
Topics: Base Sequence; Genome; Inverted Repeat Sequences; Prokaryotic Cells; Repetitive Sequences, Nucleic Acid
PubMed: 33549041
DOI: 10.1186/s12859-021-03983-2
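
Note on the inverted-repeat definition in the BMC Bioinformatics entry above: an inverted repeat is described there as a DNA sequence followed downstream by its reverse complement, possibly with a gap in the centre. The following is a minimal brute-force sketch of that definition only. It is not the IUPACPAL algorithm, it assumes plain A/C/G/T input rather than IUPAC codes, it allows no mismatches, and the names `reverse_complement`, `find_inverted_repeats`, `arm_len` and `max_gap` are hypothetical, chosen for illustration.

```python
# Illustrative sketch only: exact inverted repeats over plain ACGT input.
# Function and parameter names are assumptions, not taken from IUPACPAL.

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Complement each base, then reverse the string."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq: str, arm_len: int, max_gap: int):
    """Return (start, arm_len, gap) for every exact inverted repeat.

    A hit means seq[start:start + arm_len] is followed, after `gap`
    bases, by the reverse complement of that same arm.
    """
    hits = []
    n = len(seq)
    for start in range(n - 2 * arm_len + 1):
        left = seq[start:start + arm_len]
        target = reverse_complement(left)
        for gap in range(max_gap + 1):
            right_start = start + arm_len + gap
            if right_start + arm_len > n:
                break  # the right arm would run past the end of the sequence
            if seq[right_start:right_start + arm_len] == target:
                hits.append((start, arm_len, gap))
    return hits


if __name__ == "__main__":
    # GAATTC is its own reverse complement, so its two halves form
    # an inverted repeat with no gap.
    print(find_inverted_repeats("GAATTC", arm_len=3, max_gap=0))
```

This scan costs time proportional to sequence length times `max_gap` times `arm_len`, which is fine for illustrating the definition but would not scale to genome-sized input.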