Genome Biology 2001
BACKGROUND
The recent draft assembly of the human genome provides a unified basis for describing genomic structure and function. The draft is sufficiently accurate to provide useful annotation, enabling direct observations of previously inferred biological phenomena.
RESULTS
We report here a functionally annotated human gene index placed directly on the genome. The index is based on the integration of public transcript, protein, and mapping information, supplemented with computational prediction. We describe numerous global features of the genome and examine the relationship of various genetic maps with the assembly. In addition, initial sequence analysis reveals highly ordered chromosomal landscapes associated with paralogous gene clusters and distinct functional compartments. Finally, these annotation data were synthesized to produce observations of gene density and number that accord well with historical estimates. Such a global approach had previously been described only for chromosomes 21 and 22, which together account for 2.2% of the genome.
CONCLUSIONS
We estimate that the genome contains 65,000-75,000 transcriptional units, with exon sequences comprising 4% of the genome. The creation of a comprehensive gene index requires the synthesis of all available computational and experimental evidence.
Topics: Chromosome Mapping; Gene Expression Profiling; Genes; Genome, Human; Humans; Transcription, Genetic
PubMed: 11516338
DOI: 10.1186/gb-2001-2-7-research0025
Nucleic Acids Research Jan 2009
IMGT, the international ImMunoGeneTics information system (http://www.imgt.org), was created in 1989 by Marie-Paule Lefranc at the Laboratoire d'ImmunoGénétique Moléculaire LIGM (Université Montpellier 2 and CNRS), Montpellier, France, to standardize and manage the complexity of immunogenetics data. The construction of a unique ontology, IMGT-ONTOLOGY, has made IMGT the global reference in immunogenetics and immunoinformatics. IMGT is a high-quality integrated knowledge resource specializing in the immunoglobulins (antibodies), T cell receptors and major histocompatibility complex of human and other vertebrate species, proteins of the IgSF and MhcSF, and related proteins of the immune systems of any species. IMGT provides common access to standardized data from genome, proteome, genetics and 3D structures. IMGT consists of five databases (IMGT/LIGM-DB, IMGT/GENE-DB, IMGT/3Dstructure-DB, etc.), fifteen interactive online tools for sequence, genome and 3D structure analysis, and more than 10,000 HTML pages of synthesis and knowledge. IMGT is used in medical research (autoimmune diseases, infectious diseases, AIDS, leukemias, lymphomas and myelomas), veterinary research, biotechnology related to antibody engineering (phage display, combinatorial libraries, chimeric, humanized and human antibodies), diagnostics (clonality, detection and follow-up of residual disease) and therapeutic approaches (grafts, immunotherapy, vaccinology). IMGT is freely available at http://www.imgt.org.
Topics: Animals; Databases, Genetic; Genes, Immunoglobulin; Genes, T-Cell Receptor; Humans; Immunogenetic Phenomena; Immunoglobulins; Internet; Major Histocompatibility Complex; Mice; Receptors, Antigen, T-Cell; Software; Terminology as Topic
PubMed: 18978023
DOI: 10.1093/nar/gkn838
European Journal of Biochemistry Dec 2002
Understanding peroxidase function in plants is complicated by the lack of substrate specificity, the high number of genes, their structural diversity and our limited knowledge of peroxidase gene transcription and translation. In the present study we sequenced expressed sequence tags (ESTs) encoding novel heme-containing class III peroxidases from Arabidopsis thaliana and annotated 73 full-length genes identified in the genome. In total, transcripts of 58 of these genes have now been observed. The expression of individual peroxidase genes was assessed in organ-specific EST libraries and compared with the expression of 33 peroxidase genes that we analyzed in whole plants 3, 6, 15, 35 and 59 days after sowing. Expression was assessed in root, rosette leaf, stem, cauline leaf, flower bud and cell culture tissues using gene-specific, highly sensitive reverse transcriptase-polymerase chain reaction (RT-PCR). We predicted that 71 genes could yield stable proteins folded similarly to horseradish peroxidase (HRP). The putative mature peroxidases derived from these genes showed 28-94% amino acid sequence identity and were all targeted to the endoplasmic reticulum by N-terminal signal peptides. In 20 peroxidases these signal peptides were followed by various N-terminal extensions of unknown function that are not present in HRP. Ten peroxidases showed a C-terminal extension indicating vacuolar targeting. We found that the majority of peroxidase genes were expressed in root. In total, class III peroxidases accounted for a remarkably high 2.2% of root ESTs. Relatively few peroxidases showed organ specificity. Most importantly, genes expressed constitutively in all organs and genes with a preference for root represented structurally diverse peroxidases (< 70% sequence identity). Furthermore, genes appearing in tandem showed distinct expression profiles.
The alignment of the 73 Arabidopsis peroxidase sequences provides easy access to the identification of orthologous peroxidases in other plant species and will provide a common platform for combining knowledge of peroxidase structure-function relationships obtained in various species.
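The 28-94% identity range reported for the mature peroxidases is a pairwise statistic over aligned sequences. As an illustration only, the sketch below computes percent identity for two sequences already aligned to equal length, assuming gaps are written as '-' and that identity is counted over columns where neither sequence has a gap. The abstract does not state the study's exact convention, and the example sequences are hypothetical.

```python
def percent_identity(a: str, b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Assumption: identical columns are divided by the number of columns
    in which neither sequence has a gap ('-'). Other conventions divide
    by the full alignment length or by the shorter ungapped sequence.
    """
    if len(a) != len(b):
        raise ValueError("aligned sequences must have the same length")
    compared = identical = 0
    for x, y in zip(a.upper(), b.upper()):
        if x == "-" or y == "-":
            continue  # gapped column: not scored
        compared += 1
        if x == y:
            identical += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * identical / compared


# Hypothetical aligned fragments, not real peroxidase sequences.
print(percent_identity("ACDEFGH-IKL", "ACSEFGHWIKM"))  # 80.0
```

Because the denominator depends on the gap convention, identity values produced by different tools are not always directly comparable.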
Topics: Amino Acid Sequence; Arabidopsis; DNA, Complementary; Expressed Sequence Tags; Genome, Plant; Heme; Horseradish Peroxidase; Introns; Models, Molecular; Molecular Sequence Data; Peroxidases; Protein Structure, Tertiary; Reverse Transcriptase Polymerase Chain Reaction; Sequence Homology, Amino Acid; Transcription, Genetic
PubMed: 12473102
DOI: 10.1046/j.1432-1033.2002.03311.x
Proceedings of the National Academy of... May 1982
We have determined the nucleotide sequence of the structural gene for colicin E1, which consists of 1,566 base pairs. The amino acid sequence (522 residues) of the protein was derived from the DNA sequence, and the molecular weight was calculated to be 57,279. From the analysis of the predicted secondary structure, there appear to be three consecutive long alpha-helices in the NH2-terminal half of the polypeptide, spanning 40, 100, and 35 amino acid residues. In addition, there is a polypeptide region near the COOH terminus that shows homology to the NH2-terminal signal portions of outer membrane lipoprotein in Escherichia coli and beta-lactamase in Bacillus licheniformis. Most of the homologous amino acids are located in the region where either alpha-helix or beta-sheet would be expected to occur, as determined from the amino acid sequence. These characteristics of the predicted protein structure might correspond to properties of colicin E1 as an ionophore in its antimicrobial action and also as an exported protein during its induced synthesis.
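Two figures in this abstract can be checked with simple arithmetic. First, 1,566 bp divides evenly into 522 codons, consistent with the 522 residues reported. Second, a molecular weight can be estimated from a derived amino acid sequence by summing average residue masses and adding one water. The sketch below assumes standard average residue masses as commonly tabulated (for example by ExPASy), which should be verified before reuse. The colicin E1 sequence itself is not reproduced here, and the authors' exact calculation method is not stated.

```python
# Average residue masses in daltons (free amino acid minus one water).
# Assumption: values as commonly tabulated (e.g. ExPASy average masses).
AVG_RESIDUE_MASS = {
    "A": 71.0788, "R": 156.1875, "N": 114.1038, "D": 115.0886,
    "C": 103.1388, "E": 129.1155, "Q": 128.1307, "G": 57.0519,
    "H": 137.1411, "I": 113.1594, "L": 113.1594, "K": 128.1741,
    "M": 131.1926, "F": 147.1766, "P": 97.1167, "S": 87.0782,
    "T": 101.1051, "W": 186.2132, "Y": 163.1760, "V": 99.1326,
}
WATER = 18.01528  # accounts for the free N-terminal H and C-terminal OH


def codon_count(cds_length_bp: int) -> int:
    """Number of codons in a coding sequence of the given length."""
    if cds_length_bp % 3 != 0:
        raise ValueError("length is not a multiple of 3")
    return cds_length_bp // 3


def protein_mw(sequence: str) -> float:
    """Average molecular weight (Da) of an unmodified polypeptide."""
    return sum(AVG_RESIDUE_MASS[aa] for aa in sequence.upper()) + WATER


print(codon_count(1566))           # 522
print(round(protein_mw("GG"), 2))  # 132.12 (glycylglycine)
```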
Topics: Amino Acid Sequence; Base Sequence; Colicins; DNA, Bacterial; Genes; Genes, Bacterial; Protein Conformation; Protein Precursors; Structure-Activity Relationship
PubMed: 6953432
DOI: 10.1073/pnas.79.9.2827
BMC Molecular and Cell Biology Jul 2020
BACKGROUND
Trichomonas vaginalis, the causative agent of a prevalent urogenital infection in humans, is an evolutionarily divergent protozoan. Protein-coding genes in T. vaginalis are largely controlled by two core promoter elements, producing mRNAs with short 5' UTRs. The specific mechanisms adopted by T. vaginalis to fine-tune the translation efficiency (TE) of mRNAs remain largely unknown.
RESULTS
Using both computational and experimental approaches, this study investigated two key factors influencing TE in T. vaginalis: codon usage and mRNA secondary structure. Statistical dependence between TE and codon adaptation index (CAI) highlighted the impact of codon usage on mRNA translation in T. vaginalis. A genome-wide interrogation revealed that low structural complexity at the 5' end of mRNA followed closely by a highly structured downstream region correlates with TE variation in this organism. To validate these findings, a synthetic library of 15 synonymous iLOV genes was created, representing five mRNA folding profiles and three codon usage profiles. Fluorescence signals produced by the expression of these synonymous iLOV genes in T. vaginalis were consistent with and validated our in silico predictions.
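The codon adaptation index (CAI) mentioned above has a standard definition. Each codon receives a relative adaptiveness w, which is its count in a reference set of highly expressed genes divided by the count of the most used synonymous codon. A gene's CAI is the geometric mean of w over its codons. The sketch below makes several assumptions: it uses the standard genetic code, it skips stop codons and the single-codon amino acids Met and Trp, and it gives unobserved codons a count of 0.5, which is a common convention. The reference set and parameter choices used in the study are not given here, and the toy reference below is hypothetical.

```python
import math
from collections import Counter
from itertools import product

BASES = "TCAG"
# Standard genetic code in TCAG order (assumption: applies to the organism).
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def split_codons(cds: str) -> list[str]:
    """In-frame codons of a coding sequence (trailing partial codon dropped)."""
    seq = cds.upper().replace("U", "T")
    return [seq[i:i + 3] for i in range(0, len(seq) - 2, 3)]


def relative_adaptiveness(reference_cds: list[str]) -> dict[str, float]:
    """w = codon count / count of the most used synonymous codon."""
    counts = Counter(c for cds in reference_cds for c in split_codons(cds)
                     if c in CODE)
    # Highest count among the synonymous codons of each amino acid.
    best: dict[str, int] = {}
    for codon, aa in CODE.items():
        best[aa] = max(best.get(aa, 0), counts[codon])
    w = {}
    for codon, aa in CODE.items():
        if aa in "*MW" or best[aa] == 0:
            continue  # no synonymous choice, or amino acid absent from reference
        w[codon] = max(counts[codon], 0.5) / best[aa]
    return w


def cai(cds: str, w: dict[str, float]) -> float:
    """Geometric mean of w over the scorable codons of a gene."""
    logs = [math.log(w[c]) for c in split_codons(cds) if c in w]
    if not logs:
        raise ValueError("no scorable codons")
    return math.exp(sum(logs) / len(logs))


w = relative_adaptiveness(["GCTGCTGCTGCC"])  # toy reference set
print(round(cai("GCTGCC", w), 3))  # 0.577, i.e. sqrt(1 * 1/3)
```

Working in log space keeps the geometric mean numerically stable for long genes, where a direct product of many values below 1 could underflow.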
CONCLUSIONS
This study demonstrates the role of codon usage bias and mRNA secondary structure in TE of T. vaginalis mRNAs, contributing to a better understanding of the factors that influence, and possibly regulate, gene expression in this human pathogen.
Topics: Base Sequence; Biological Evolution; Codon; Gene Library; Genes, Reporter; Nucleic Acid Conformation; Open Reading Frames; Protein Biosynthesis; RNA, Messenger; Trichomonas vaginalis
PubMed: 32689943
DOI: 10.1186/s12860-020-00297-8
Scientific Reports Jul 2022
Cancer is a disease caused by errors within the multicellular system and it represents a major health issue in multicellular organisms. Although cancer research has advanced substantially, new approaches focusing on fundamental aspects of cancer origin and mechanisms of spreading are necessary. Comparative genomic studies have shown that most genes linked to human cancer emerged during the early evolution of Metazoa. Thus, basal animals without true tissues and organs, such as sponges (Porifera), might be an innovative model system for understanding the molecular mechanisms of proteins involved in cancer biology. One of these proteins is developmentally regulated GTP-binding protein 1 (DRG1), a GTPase stabilized by interaction with DRG family regulatory protein 1 (DFRP1). This study reveals a high evolutionary conservation of the DRG1 gene and protein in metazoans. Our biochemical analysis and structural predictions show that both recombinant sponge and human DRG1 are predominantly monomers that form complexes with DFRP1 and bind non-specifically to RNA and DNA. We demonstrate the conservation of sponge and human DRG1 biological features, including intracellular localization and DRG1:DFRP1 binding, the function of DRG1 in α-tubulin dynamics, and its role in cancer biology, as shown by increased proliferation, migration and colonization in human cancer cells. These results suggest that the ancestor of all Metazoa already possessed a DRG1 that was structurally and functionally similar to human DRG1, even before the development of true tissues or tumors, indicating an important function of DRG1 in fundamental cellular pathways.
Topics: Animals; GTP-Binding Proteins; Genomics; Humans; Neoplasms; Oncogenes; RNA; Transcription Factors
PubMed: 35790840
DOI: 10.1038/s41598-022-15242-2
Molecular and Cellular Biology Sep 1988
We have investigated the structural features of spontaneous deletions in Caenorhabditis elegans. We cloned and sequenced the junctions of 16 spontaneous deletions affecting the unc-54 myosin heavy-chain gene and compared their sequences with those of the wild type. We analyzed these sequences in an attempt to identify structural features of the gene that are consistently involved in the spontaneous deletion process. Most deletions (15 of 16) removed a single contiguous region of DNA, with no nucleotides inserted or rearranged at the deletion junctions; one deletion was more complex. unc-54 deletions were small, averaging 600 base pairs in length, and were randomly distributed throughout the gene. Unlike deletions that occur in Escherichia coli, spontaneous unc-54 deletions did not contain statistically significant direct or inverted repeats at or near their termini. Except for their small average size, we have not identified any distinguishing features of their sequence or structure. We discuss these results with regard to the mechanisms for spontaneous deletion in eucaryotic and procaryotic cells.
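Asking whether a deletion has direct repeats at its termini amounts to comparing the sequence at the two ends of the deleted segment. The function below is a minimal sketch and not the authors' procedure, which also involved statistical comparison across deletions. It reports the longest direct repeat shared by the two junctions of a single deletion, given the wild-type sequence and 0-based, half-open deletion coordinates. The example sequence and coordinates are hypothetical. A non-empty result means the breakpoint cannot be placed uniquely.

```python
def junction_direct_repeat(seq: str, start: int, end: int) -> str:
    """Direct repeat flanking the deletion of seq[start:end].

    The returned string occurs both at seq[start - left : start + right]
    and at seq[end - left : end + right], so the deletion product is the
    same whichever copy is considered deleted.
    """
    if not 0 < start < end < len(seq):
        raise ValueError("deletion must lie strictly inside the sequence")
    size = end - start
    # Extend right: first bases of the deleted segment vs. bases after it.
    right = 0
    while (right < size and end + right < len(seq)
           and seq[start + right] == seq[end + right]):
        right += 1
    # Extend left: bases before the deletion vs. last bases of the segment.
    left = 0
    while (left + right < size and start - 1 - left >= 0
           and seq[start - 1 - left] == seq[end - 1 - left]):
        left += 1
    return seq[start - left:start + right]


wild_type = "GGATCCCCCATCGG"  # hypothetical sequence
print(junction_direct_repeat(wild_type, 2, 9))  # ATC
```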
Topics: Alleles; Animals; Base Sequence; Caenorhabditis; Chromosome Deletion; Cloning, Molecular; DNA; Genes; Molecular Sequence Data; Myosin Subfragments; Myosins; Peptide Fragments
PubMed: 3221864
DOI: 10.1128/mcb.8.9.3748-3754.1988
PloS One 2011
Genomic DNA sequences display compositional heterogeneity on many scales. In this paper we analyzed tendencies and anomalies in the occurrence of mono-, di- and trinucleotides in structural regions of plant genes. Representing these trends as a function of position along genic sequences highlighted compositional features peculiar to either monocots or eudicots that were remarkably uniform within each of these two evolutionary clades. The most evident of these features was a gradient of base content along the direction of transcription. The robustness of this representation was validated in sequence sub-datasets generated by considering structural and compositional features such as total CDS length, overall GC content and genic orientation in the genome. Piecewise regression analyses indicated that the gradients could be conveniently approximated by a two-segment model, in which a first region with a steep slope is followed by a second segment with a milder variation. In general, monocot species showed steeper segments than eudicots. The guanine gradient was the most distinctive feature between the two evolutionary clades, increasing moderately in eudicots and decreasing clearly in monocots. Single-gene investigation revealed that a high proportion of genes show compositional trends compatible with a segmented model, suggesting that these features are essential attributes of gene organization. Dinucleotide and trinucleotide biases were evaluated against the expectation based on a random union of their component nucleotides. The average bias at the dinucleotide level identified a significant underrepresentation of some dinucleotides and an overrepresentation of others. The bias at the trinucleotide level was, on average, low. Finally, the analysis of bryophyte coding sequences showed mononucleotide, dinucleotide and trinucleotide compositional trends resembling those of higher plants.
This finding suggested that the emergence of compositional bias is an ancient event in evolution, already present at the time green plants conquered land.
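Comparing dinucleotide frequencies with the expectation from a random union of their component nucleotides is commonly done with an odds ratio, rho(XY) = f(XY) / (f(X) f(Y)). Values below 1 indicate underrepresentation and values above 1 indicate overrepresentation. The sketch below assumes this ratio as the bias measure, counts overlapping dinucleotides on a single strand, and ignores non-ACGT symbols. The abstract does not specify the exact statistic or normalization used in the study.

```python
from collections import Counter


def dinucleotide_odds_ratios(seq: str) -> dict[str, float]:
    """rho(XY) = f(XY) / (f(X) * f(Y)) for each dinucleotide XY."""
    seq = "".join(b for b in seq.upper() if b in "ACGT")
    if len(seq) < 2:
        raise ValueError("need at least two nucleotides")
    mono = Counter(seq)
    di = Counter(seq[i:i + 2] for i in range(len(seq) - 1))  # overlapping
    n_mono, n_di = len(seq), len(seq) - 1
    ratios = {}
    for x in "ACGT":
        for y in "ACGT":
            expected = (mono[x] / n_mono) * (mono[y] / n_mono)
            if expected > 0:  # skip pairs involving an absent nucleotide
                ratios[x + y] = (di[x + y] / n_di) / expected
    return ratios


rho = dinucleotide_odds_ratios("ACGT")
print(round(rho["AC"], 3), rho["CA"])  # 5.333 0.0
```

A sequence this short gives extreme ratios. Counts pooled over many genes, or over a sliding window along genic sequences, are needed for meaningful estimates.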
Topics: Genes, Plant; Nucleotides; Untranslated Regions
PubMed: 21829660
DOI: 10.1371/journal.pone.0022855
BMC Evolutionary Biology Oct 2008
Comparative Study
BACKGROUND
Globin isoforms with variant properties and functions have been found in the pseudocoel, body wall and cuticle of various nematode species and even in the eyespots of the insect-parasite Mermis nigrescens. In fact, much higher levels of complexity exist, as shown by recent whole genome analysis studies. In silico analysis of the genome of Caenorhabditis elegans revealed an unexpectedly high number of globin genes featuring a remarkable diversity in gene structure, amino acid sequence and expression profiles.
RESULTS
In the present study we have analyzed whole genomic data from C. briggsae, C. remanei, Pristionchus pacificus and Brugia malayi and EST data from several other nematode species to study the evolutionary history of the nematode globin gene family. We find a high level of conservation of the C. elegans globin complement, with even distantly related nematodes harboring orthologs to many Caenorhabditis globins. Bayesian phylogenetic analysis resolves all nematode globins into two distinct globin classes. Analysis of the globin intron-exon structures suggests extensive loss of ancestral introns and gain of new positions in deep nematode ancestors, and mainly loss in the Caenorhabditis lineage. We also show that the Caenorhabditis globin genes are expressed in distinct, mostly non-overlapping, sets of cells and that they are all under strong purifying selection.
CONCLUSION
Our results enable reconstruction of the evolutionary history of the globin gene family in the nematode phylum. A duplication of an ancestral globin gene occurred before the divergence of the Platyhelminthes and the Nematoda and one of the duplicated genes radiated further in the nematode phylum before the split of the Spirurina and Rhabditina and was followed by further radiation in the lineage leading to Caenorhabditis. The resulting globin genes were subject to processes of subfunctionalization and diversification leading to cell-specific expression patterns. Strong purifying selection subsequently dampened further evolution and facilitated fixation of the duplicated genes in the genome.
Topics: Algorithms; Amino Acid Sequence; Animals; Caenorhabditis; Evolution, Molecular; Expressed Sequence Tags; Gene Expression Profiling; Genes, Helminth; Genome, Helminth; Globins; Introns; Likelihood Functions; Molecular Sequence Data; Multigene Family; Phylogeny; Selection, Genetic; Sequence Alignment
PubMed: 18844991
DOI: 10.1186/1471-2148-8-279
Genome Research May 2001
Comparative Study
With the availability of a nearly complete sequence of the human genome, aligning expressed sequence tags (ESTs) to the genomic sequence has become a practical and powerful strategy for gene prediction. Elucidating gene structure is a complex problem requiring the identification of splice junctions, gene boundaries, and alternative splicing variants. We have developed a software tool, Transcript Assembly Program (TAP), to delineate gene structures using genomically aligned EST sequences. TAP assembles the joint gene structure of an entire genomic region from individual splice junction pairs, using a novel algorithm that exploits EST-encoded connectivity and redundancy information to resolve complex alternative splicing patterns. A method called polyadenylation site scan (PASS) was developed to detect poly-A sites in the genome. TAP uses these predictions to identify gene boundaries by segmenting the joint gene structure at polyadenylated terminal exons. When reconstructing 1007 known transcripts, TAP achieved a sensitivity (Sn) of 60% and a specificity (Sp) of 92% at the exon level. Gene boundary identification was accurate 78% of the time. TAP also reports alternative splicing patterns in EST alignments. An analysis of alternative splicing in 1124 genic regions suggested that more than half of human genes undergo alternative splicing. Surprisingly, the majority of the detected alternative splicing events affected the coding region. Furthermore, the evolutionary conservation of alternative splicing between human and mouse was analyzed using an EST-based approach. (See http://stl.wustl.edu/~zkan/TAP/)
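In the gene-prediction literature, exon-level sensitivity and specificity are usually defined as Sn = TP / (number of annotated exons) and Sp = TP / (number of predicted exons). Here TP counts predicted exons that match an annotated exon, and "specificity" in this sense is what other fields call precision. The sketch below assumes exact matching of both exon boundaries and represents exons as hypothetical (start, end) pairs. The matching criteria actually used in TAP's evaluation are not given in this abstract.

```python
def exon_level_sn_sp(predicted, annotated):
    """Exon-level (Sn, Sp) assuming exact matches of both exon boundaries."""
    pred, true = set(predicted), set(annotated)
    if not pred or not true:
        raise ValueError("need at least one predicted and one annotated exon")
    tp = len(pred & true)  # predicted exons identical to an annotated exon
    return tp / len(true), tp / len(pred)


annotated = [(100, 200), (300, 400), (500, 600)]  # hypothetical coordinates
predicted = [(100, 200), (300, 410)]              # second exon's end is off
sn, sp = exon_level_sn_sp(predicted, annotated)
print(f"Sn={sn:.2f} Sp={sp:.2f}")  # Sn=0.33 Sp=0.50
```

Under these definitions, a high Sp combined with a lower Sn, as reported for TAP, means that predicted exons are usually correct while a larger share of annotated exons is not recovered.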
Topics: Alternative Splicing; Computational Biology; Expressed Sequence Tags; Genes; Genome, Human; Humans; RNA, Messenger; Sequence Alignment; Software; Software Validation; Transcription, Genetic
PubMed: 11337482
DOI: 10.1101/gr.155001