structural gene - OpenMD.com Journal Search

PIntron: a fast method for detecting the gene structure due to alternative splicing via maximal pairings of a pattern and a text.

BMC Bioinformatics Apr 2012

Authors: Yuri Pirola, Raffaella Rizzi, Ernesto Picardi...

BACKGROUND

A challenging issue in designing computational methods for predicting the gene structure into exons and introns from a cluster of transcript (EST, mRNA) sequences is guaranteeing accuracy as well as efficiency in time and space when large clusters of more than 20,000 ESTs and genes longer than 1 Mb are processed. Traditionally, the problem has been addressed by combining different tools not specifically designed for this task.

RESULTS

We propose a fast method based on ad hoc procedures for solving the problem. Our method combines two ideas: a novel algorithm of proved small time complexity for computing spliced alignments of a transcript against a genome, and an efficient algorithm that exploits the inherent redundancy of information in a cluster of transcripts to select, among all possible factorizations of EST sequences, those that allow inferring splice site junctions that are largely confirmed by the input data. The EST alignment procedure is based on the construction of maximal embeddings, which are sequences obtained from paths of a graph structure, called embedding graph, whose vertices are the maximal pairings of a genomic sequence T and an EST P. The procedure runs in time linear in the length of P and T and in the size of the output. The method was implemented into the PIntron package. PIntron requires as input a genomic sequence or region and a set of EST and/or mRNA sequences. Besides the prediction of the full-length transcript isoforms potentially expressed by the gene, the PIntron package includes a module for the CDS annotation of the predicted transcripts.
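To make the notion of a maximal pairing concrete, the following is a minimal sketch, not PIntron's implementation. It assumes a maximal pairing is a common substring of P and T that cannot be extended to the left or right, and it finds them with a naive quadratic-time dynamic program rather than the linear-time procedure described in the abstract. The function name `maximal_pairings` and the `min_len` cutoff are hypothetical.

```python
def maximal_pairings(pattern, text, min_len=3):
    """Return sorted (p_start, t_start, length) triples for common
    substrings of pattern and text that cannot be extended either way.
    Naive O(|P|*|T|) sketch; min_len is an illustrative cutoff."""
    pairings = []
    n, m = len(pattern), len(text)
    # prev[j] holds the length of the longest common suffix of
    # pattern[:i-1] and text[:j-1]; this makes every match left-maximal.
    prev = [0] * (m + 1)
    for i in range(1, n + 1):
        cur = [0] * (m + 1)
        for j in range(1, m + 1):
            if pattern[i - 1] == text[j - 1]:
                cur[j] = prev[j - 1] + 1
                # Right-maximal: a string ends or the next characters differ.
                at_end = i == n or j == m
                if at_end or pattern[i] != text[j]:
                    length = cur[j]
                    if length >= min_len:
                        pairings.append((i - length, j - length, length))
        prev = cur
    return sorted(pairings)


# Toy EST whose two blocks map to separated regions of the genomic text,
# loosely mimicking two exons split by an intron.
est = "ACGTTTGCA"
genome = "ACGTAAAAATTGCA"
print(maximal_pairings(est, genome))
```

In this toy input the two pairings correspond to the two exon-like blocks; chaining such pairings in order along both sequences is the role the abstract assigns to paths in the embedding graph.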

CONCLUSIONS

PIntron, the software tool implementing our methodology, is available at http://www.algolab.eu/PIntron under GNU AGPL. PIntron has been shown to outperform state-of-the-art methods, and to quickly process some critical genes. At the same time, PIntron exhibits high accuracy (sensitivity and specificity) when benchmarked with ENCODE annotations.

Topics: Algorithms; Alternative Splicing; Animals; Exons; Expressed Sequence Tags; Genomics; Humans; Introns; Sequence Alignment; Software

PubMed: 22537006
DOI: 10.1186/1471-2105-13-S5-S2

Myosin light-chain 1 and 3 gene has two structurally distinct and differentially regulated promoters evolving at different rates.

Molecular and Cellular Biology Nov 1985

Comparative Study

Authors: E E Strehler, M Periasamy, M A Strehler-Page...

DNA fragments located 10 kilobases apart in the genome and containing, respectively, the first myosin light chain 1 (MLC1f) and the first myosin light chain 3 (MLC3f) specific exon of the rat myosin light chain 1 and 3 gene, together with several hundred base pairs of upstream flanking sequences, have been shown in runoff in vitro transcription assays to direct initiation of transcription at the cap sites of MLC1f and MLC3f mRNAs used in vivo. These results establish the presence of two separate, functional promoters within that gene. A comparison of the nucleotide sequence of the rat MLC1f/3f gene with the corresponding sequences from mouse and chicken shows that: the MLC1f promoter regions have been highly conserved up to position -150 from the cap site while the MLC3f promoter regions display a very poor degree of homology and even the absence or poor conservation of typical eucaryotic promoter elements such as TATA and CAT boxes; the exon/intron structure of this gene has been completely conserved in the three species; and corresponding exons, except for the regions encoding most of the 5' and 3' untranslated sequences, show greater than 75% homology while corresponding introns are similar in size but considerably divergent in sequence. The above findings indicate that the overall structure of the MLC1f/3f genes has been maintained between avian and mammalian species and that these genes contain two functional and widely spaced promoters. The fact that the structures of the alkali light chain gene from Drosophila melanogaster and of other related genes of the troponin C supergene family resemble a MLC3f gene without an upstream promoter and first exon strongly suggests that the present-day MLC1f/3f genes of higher vertebrates arose from a primordial alkali light chain gene through the addition of a far-upstream MLC1f-specific promoter and first exon. 
The two promoters have evolved at different rates, with the MLC1f promoter being more conserved than the MLC3f promoter. This discrepant evolutionary rate might reflect different mechanisms of promoter activation for the transcription of MLC1f and MLC3f RNA.

Topics: Amino Acid Sequence; Animals; Base Sequence; Chickens; DNA Restriction Enzymes; Drosophila melanogaster; Endonucleases; Genes; Genes, Regulator; Mice; Myosin Subfragments; Myosins; Peptide Fragments; Promoter Regions, Genetic; Rats; Single-Strand Specific DNA and RNA Endonucleases; Species Specificity; Templates, Genetic; Transcription, Genetic

PubMed: 3018505
DOI: 10.1128/mcb.5.11.3168-3182.1985

Structural and functional relationships between fumarase and aspartase. Nucleotide sequences of the fumarase (fumC) and aspartase (aspA) genes of Escherichia coli K12.

The Biochemical Journal Jul 1986

Comparative Study

Authors: S A Woods, J S Miles, R E Roberts...

The nucleotide sequences of two segments of DNA (2250 and 2921 base-pairs) containing the functionally related fumarase (fumC) and aspartase (aspA) genes of Escherichia coli K12 were determined. The fumC structural gene comprises 1398 base-pairs (466 codons, excluding the initiation codon), and it encodes a polypeptide of Mr 50353 that resembles the fumarases of Bacillus subtilis 168 (citG-gene product), rat liver and pig heart. The fumC gene starts 140 base-pairs downstream of the structurally-unrelated fumA gene, but there is no evidence that both genes form part of the same operon. The aspA structural gene comprises 1431 base-pairs (477 codons excluding the initiation codon), and it encodes a polypeptide of Mr 52190, similar to that predicted from maxicell studies and for the enzyme from E. coli W. Remarkable homologies were found between the primary structures of the fumarase (fumC and citG) and aspartase (aspA) genes and their products, suggesting close structural and evolutionary relationships.
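The base-pair and codon counts quoted above can be checked with simple arithmetic, since each codon spans three base pairs. The helper below is a small illustrative sketch; the function name `codon_count` is hypothetical.

```python
def codon_count(length_bp):
    """Number of whole codons in a coding region of length_bp base pairs."""
    if length_bp % 3 != 0:
        raise ValueError("length is not a multiple of three")
    return length_bp // 3


# Figures quoted in the abstract for fumC and aspA.
for gene, length_bp in (("fumC", 1398), ("aspA", 1431)):
    print(gene, length_bp, "bp =", codon_count(length_bp), "codons")
```

Both results agree with the codon counts stated in the abstract (466 for fumC and 477 for aspA).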

Topics: Amino Acid Sequence; Amino Acids; Ammonia-Lyases; Aspartate Ammonia-Lyase; Base Sequence; Cloning, Molecular; Codon; DNA, Bacterial; Escherichia coli; Fumarate Hydratase; Genes

PubMed: 3541901
DOI: 10.1042/bj2370547

Gene composer: database software for protein construct design, codon engineering, and gene synthesis.

BMC Biotechnology Apr 2009

Authors: Don Lorimer, Amy Raymond, John Walchli...

BACKGROUND

To improve efficiency in high throughput protein structure determination, we have developed a database software package, Gene Composer, which facilitates the information-rich design of protein constructs and their codon engineered synthetic gene sequences. With its modular workflow design and numerous graphical user interfaces, Gene Composer enables researchers to perform all common bio-informatics steps used in modern structure guided protein engineering and synthetic gene engineering.

RESULTS

An interactive Alignment Viewer allows the researcher to simultaneously visualize sequence conservation in the context of known protein secondary structure, ligand contacts, water contacts, crystal contacts, B-factors, solvent accessible area, residue property type and several other useful property views. The Construct Design Module enables the facile design of novel protein constructs with altered N- and C-termini, internal insertions or deletions, point mutations, and desired affinity tags. The modifications can be combined and permuted into multiple protein constructs, and then virtually cloned in silico into defined expression vectors. The Gene Design Module uses a protein-to-gene algorithm that automates the back-translation of a protein amino acid sequence into a codon engineered nucleic acid gene sequence according to a selected codon usage table with minimal codon usage threshold, defined G:C% content, and desired sequence features achieved through synonymous codon selection that is optimized for the intended expression system. The gene-to-oligo algorithm of the Gene Design Module plans out all of the required overlapping oligonucleotides and mutagenic primers needed to synthesize the desired gene constructs by PCR, and for physically cloning them into selected vectors by the most popular subcloning strategies.
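The back-translation step can be illustrated with a short sketch. This is not Gene Composer's protein-to-gene algorithm: it simply picks, for each residue, the most frequent codon whose usage fraction meets a minimum threshold, and then reports the G:C percentage of the result. The codon usage values, the function names and the `min_fraction` parameter are all hypothetical and chosen only for illustration.

```python
# Illustrative codon usage fractions (made-up values, not a real table).
CODON_USAGE = {
    "M": {"ATG": 1.00},
    "K": {"AAA": 0.74, "AAG": 0.26},
    "L": {"CTG": 0.50, "TTA": 0.13, "CTC": 0.10, "CTA": 0.04},
    "E": {"GAA": 0.68, "GAG": 0.32},
}


def back_translate(protein, usage=CODON_USAGE, min_fraction=0.10):
    """Back-translate a protein, choosing for each residue the most
    frequent codon whose usage fraction is at least min_fraction."""
    codons = []
    for residue in protein:
        allowed = {c: f for c, f in usage[residue].items() if f >= min_fraction}
        if not allowed:
            raise ValueError(f"no codon for {residue!r} meets the threshold")
        codons.append(max(allowed, key=allowed.get))
    return "".join(codons)


def gc_percent(dna):
    """Percentage of G and C bases in a DNA string."""
    dna = dna.upper()
    return 100.0 * (dna.count("G") + dna.count("C")) / len(dna)


gene = back_translate("MKLE")
print(gene, round(gc_percent(gene), 1))
```

A tool that also targets a defined G:C content would need to trade codon frequency against composition, for example by choosing among the allowed synonymous codons rather than always taking the most frequent one.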

CONCLUSION

We present a complete description of Gene Composer functionality, and an efficient PCR-based synthetic gene assembly procedure with mis-match specific endonuclease error correction in combination with PIPE cloning. In a sister manuscript we present data on how Gene Composer designed genes and protein constructs can result in improved protein production for structural studies.

Topics: Algorithms; Cloning, Molecular; Codon; Computational Biology; Databases, Genetic; Genes, Synthetic; Protein Engineering; Sequence Alignment; Software; User-Computer Interface

PubMed: 19383142
DOI: 10.1186/1472-6750-9-36

Patterns of exon-intron architecture variation of genes in eukaryotic genomes.

BMC Genomics Jan 2009

Authors: Liucun Zhu, Ying Zhang, Wen Zhang...

BACKGROUND

The origin and importance of exon-intron architecture comprises one of the remaining mysteries of gene evolution. Several studies have investigated the variations of intron length, GC content, ordinal position in a gene and divergence. However, there has been little study of the structural variation of exons and introns.

RESULTS

We investigated the length, GC content, ordinal position and divergence in both exons and introns of 13 eukaryotic genomes, representing plant and animal. Our analyses revealed that three basic patterns of exon-intron variation were present in nearly all analyzed genomes (P < 0.001 in most cases): an ordinal reduction of length and divergence in both exon and intron, a co-variation between exon and its flanking introns in their length, GC content and divergence, and a decrease of average exon (or intron) length, GC content and divergence as the total exon numbers of a gene increased. In addition, we observed that the shorter introns had either low or high GC content, and the GC content of long introns was intermediate.
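Two of the quantities analyzed above, GC content and length by ordinal position, are simple to compute. The sketch below shows one way to do it on toy data; it is not the authors' pipeline, and the function names and input layout (one list of exon sequences per gene, in transcript order) are assumptions made for illustration.

```python
def gc_content(seq):
    """Fraction of G and C bases in a sequence (0.0 for an empty string)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def mean_length_by_ordinal(genes):
    """Mean exon length at each ordinal position (1 = first exon).
    genes is a list of genes, each a list of exon sequences in order."""
    totals = {}
    for exons in genes:
        for position, exon in enumerate(exons, start=1):
            length_sum, count = totals.get(position, (0, 0))
            totals[position] = (length_sum + len(exon), count + 1)
    return {pos: s / n for pos, (s, n) in sorted(totals.items())}


# Toy genes; real analyses would read exon coordinates from annotation files.
toy_genes = [["ATGGCC", "ATG"], ["ATGG"]]
print(mean_length_by_ordinal(toy_genes))
print(gc_content("GGCCAATT"))
```

The same grouping by ordinal position applies to introns, and replacing `len(exon)` with `gc_content(exon)` would give the per-position GC averages.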

CONCLUSION

Although the factors contributing to these patterns have not been identified, our results provide three important clues: common factor(s) exist and may shape both exons and introns; the ordinal reduction patterns may reflect a time-orderly evolution; and the larger first and last exons may be splicing-required. These clues provide a framework for elucidating mechanisms involved in the organization of eukaryotic genomes and particularly in building exon-intron structures.

Topics: Animals; Base Composition; Evolution, Molecular; Exons; Genetic Variation; Genome; Humans; Introns; Plants; Sequence Alignment; Sequence Analysis, DNA; Species Specificity

PubMed: 19166620
DOI: 10.1186/1471-2164-10-47

Intron exon boundary junctions in human genome have in-built unique structural and energetic signals.

Nucleic Acids Research Mar 2021

Authors: Akhilesh Mishra, Priyanka Siwach, Pallavi Misra...

Precise identification of correct exon-intron boundaries is a prerequisite to analyze the location and structure of genes. The existing framework for genomic signals, delineating exons and introns in a genomic segment, seems insufficient, predominantly due to poor sequence consensus as well as limitations of training on available experimental data sets. We present here a novel concept for characterizing exon-intron boundaries in genomic segments on the basis of structural and energetic properties. We analyzed boundary junctions on both sides of all the exons (328,368) of protein coding genes from the human genome (GENCODE database) using 28 structural and three energy parameters. Study of sequence conservation at these sites shows very poor consensus. It is observed that DNA adopts a unique structural and energy state at the boundary junctions. Also, signals are somewhat different for housekeeping and tissue specific genes. Clustering of 31 parameters into four derived vectors gives some additional insights into the physical mechanisms involved in this biological process. Sites of structural and energy signals correlate well to the positions playing important roles in pre-mRNA splicing.

Topics: Exons; Genes, Essential; Genome, Human; Genomics; Humans; Introns; RNA Splice Sites

PubMed: 33621338
DOI: 10.1093/nar/gkab098

The complete nucleotide sequence of the tryptophan operon of Escherichia coli.

Nucleic Acids Research Dec 1981

Review

Authors: C Yanofsky, T Platt, I P Crawford...

The tryptophan (trp) operon of Escherichia coli has become the basic reference structure for studies on tryptophan metabolism. Within the past five years the application of recombinant DNA and sequencing methodologies has permitted the characterization of the structural and functional elements in this gene cluster at the molecular level. In this summary report we present the complete nucleotide sequence for the five structural genes of the trp operon of E. coli together with the internal and flanking regions of regulatory information.

Topics: Base Sequence; Biological Evolution; Codon; DNA, Bacterial; Escherichia coli; Genes; Genes, Bacterial; Operon; Peptides; Tryptophan

PubMed: 7038627
DOI: 10.1093/nar/9.24.6647

Discovery of 17 conserved structural RNAs in fungi.

Nucleic Acids Research Jun 2021

Authors: William Gao, Thomas A Jones, Elena Rivas...

Many non-coding RNAs with known functions are structurally conserved: their intramolecular secondary and tertiary interactions are maintained across evolutionary time. Consequently, the presence of conserved structure in multiple sequence alignments can be used to identify candidate functional non-coding RNAs. Here, we present a bioinformatics method that couples iterative homology search with covariation analysis to assess whether a genomic region has evidence of conserved RNA structure. We used this method to examine all unannotated regions of five well-studied fungal genomes (Saccharomyces cerevisiae, Candida albicans, Neurospora crassa, Aspergillus fumigatus, and Schizosaccharomyces pombe). We identified 17 novel structurally conserved non-coding RNA candidates, which include four H/ACA box small nucleolar RNAs, four intergenic RNAs and nine RNA structures located within the introns and untranslated regions (UTRs) of mRNAs. For the two structures in the 3' UTRs of the metabolic genes GLY1 and MET13, we performed experiments that provide evidence against them being eukaryotic riboswitches.

Topics: 3' Untranslated Regions; Computational Biology; Genome, Fungal; Introns; Lysine-tRNA Ligase; Markov Chains; Nucleic Acid Conformation; RNA, Fungal; RNA, Small Nucleolar; RNA, Untranslated; Ribosomal Proteins; Riboswitch; Sequence Alignment; Thioredoxins

PubMed: 34086938
DOI: 10.1093/nar/gkab355

Genomic, regulatory and epigenetic mechanisms underlying duplicated gene evolution in the natural allotetraploid Oryza minuta.

BMC Genomics Jan 2014

Authors: Yi Sui, Bo Li, Jinfeng Shi...

BACKGROUND

Polyploid species contribute to Oryza diversity. However, the mechanisms underlying gene and genome evolution in Oryza polyploids remain largely unknown. The allotetraploid Oryza minuta, which is estimated to have formed less than one million years ago, along with its putative diploid progenitors (O. punctata and O. officinalis), are quite suitable for the study of polyploid genome evolution using a comparative genomics approach.

RESULTS

Here, we performed a comparative study of a large genomic region surrounding the Shattering4 locus in O. minuta, as well as in O. punctata and O. officinalis. Duplicated genomes in O. minuta have maintained the diploid genome organization, except for several structural variations mediated by transposon movement. Tandem duplicated gene clusters are prevalent in the Sh4 region, and segmental duplication followed by random deletion is illustrated to explain the gene gain-and-loss process. Both copies of most duplicated genes still persist in O. minuta. Molecular evolution analysis suggested that these duplicated genes are equally evolved and mostly manipulated by purifying selection. However, cDNA-SSCP analysis revealed that the expression patterns were dramatically altered between duplicated genes: nine of 29 duplicated genes exhibited expression divergence in O. minuta. We further detected one gene silencing event that was attributed to gene structural variation, but most gene silencing could not be related to sequence changes. We identified one case in which DNA methylation differences within promoter regions that were associated with the insertion of one hAT element were probably responsible for gene silencing, suggesting a potential epigenetic gene silencing pathway triggered by TE movement.

CONCLUSIONS

Our study revealed both genetic and epigenetic mechanisms involved in duplicated gene silencing in the allotetraploid O. minuta.

Topics: DNA Methylation; Epigenesis, Genetic; Evolution, Molecular; Gene Silencing; Genes, Duplicate; Genes, Plant; Genetic Loci; Genomics; Homologous Recombination; Multigene Family; Oryza; Plant Proteins; Promoter Regions, Genetic; Tetraploidy

PubMed: 24393121
DOI: 10.1186/1471-2164-15-11

Combined protein construct and synthetic gene engineering for heterologous protein expression and crystallization using Gene Composer.

BMC Biotechnology Apr 2009

Authors: Amy Raymond, Scott Lovell, Don Lorimer...

BACKGROUND

With the goal of improving yield and success rates of heterologous protein production for structural studies we have developed the database and algorithm software package Gene Composer. This freely available electronic tool facilitates the information-rich design of protein constructs and their engineered synthetic gene sequences, as detailed in the accompanying manuscript.

RESULTS

In this report, we compare heterologous protein expression levels from native sequences to that of codon engineered synthetic gene constructs designed by Gene Composer. A test set of proteins including a human kinase (P38alpha), viral polymerase (HCV NS5B), and bacterial structural protein (FtsZ) were expressed in both E. coli and a cell-free wheat germ translation system. We also compare the protein expression levels in E. coli for a set of 11 different proteins with greatly varied G:C content and codon bias.

CONCLUSION

The results consistently demonstrate that protein yields from codon engineered Gene Composer designs are as good as or better than those achieved from the synonymous native genes. Moreover, structure guided N- and C-terminal deletion constructs designed with the aid of Gene Composer can lead to greater success in gene to structure work as exemplified by the X-ray crystallographic structure determination of FtsZ from Bacillus subtilis. These results validate the Gene Composer algorithms, and suggest that using a combination of synthetic gene and protein construct engineering tools can improve the economics of gene to structure research.

Topics: Algorithms; Base Composition; Cell-Free System; Codon; Escherichia coli; Gene Expression; Genes, Synthetic; Humans; Protein Engineering; Protein Structure, Tertiary; Sequence Alignment; Software; User-Computer Interface

PubMed: 19383143
DOI: 10.1186/1472-6750-9-37