-
Current Opinion in Structural Biology Dec 2021 (Review)
RAG1/2 (RAG) is an RNH-type DNA recombinase specially evolved to initiate V(D)J gene rearrangement for generating the adaptive immune response in jawed vertebrates. After decades of frustration with little mechanistic understanding of RAG, the crystal structure of mouse RAG recombinase opened the floodgates in early 2015. Structures of three different chordate RAG recombinases, including protoRAG, and the evolutionarily preceding transib transposase have been determined in complex with various DNA substrates. Biochemical studies along with the abundant structural data have shed light on how RAG has evolved from an ordinary transposase to a specialized recombinase in initiating gene rearrangement. RAG has also become one of the best characterized RNH-type recombinases, illustrating how a single active site can cleave the two antiparallel DNA strands of a double helix.
Topics: Adaptive Immunity; Animals; Genes, RAG-1; Homeodomain Proteins; Mice; Recombinases; V(D)J Recombination
PubMed: 34245989
DOI: 10.1016/j.sbi.2021.05.014 -
Human Genomics Sep 2015 (Review)
"CCN" is an acronym referring to the first letter of each of the first three members of this original group of mammalian functionally and phylogenetically distinct extracellular matrix (ECM) proteins [i.e., cysteine-rich 61 (CYR61), connective tissue growth factor (CTGF), and nephroblastoma-overexpressed (NOV)]. Although "CCN" genes are unlikely to have arisen from a common ancestral gene, their encoded proteins share multimodular structures in which most cysteine residues are strictly conserved in their positions within several structural motifs. The CCN genes can be subdivided into members developmentally indispensable for embryonic viability (e.g., CCN1, 2 and 5), each assuming unique tissue-specific functions, and members not essential for embryonic development (e.g., CCN3, 4 and 6), probably due to a balance of functional redundancy and specialization during evolution. The temporo-spatial regulation of the CCN genes and the structural information contained within the sequences of their encoded proteins reflect diversity in their context and tissue-specific functions. Genetic association studies and experimental anomalies, replicated in various animal models, have shown that altered CCN gene structure or expression is associated with "injury" stimuli, whether mechanical (e.g., trauma, shear stress) or chemical (e.g., ischemia, hyperglycemia, hyperlipidemia, inflammation). Consequently, increased organ-specific susceptibility to structural damage ensues. These data underscore the critical functions of CCN proteins in the dynamics of tissue repair and regeneration and in the compensatory responses preceding organ failure. A better understanding of the regulation and mode of action of each CCN member will be useful in developing specific gain- or loss-of-function strategies for therapeutic purposes.
Topics: Amino Acid Sequence; Animals; CCN Intercellular Signaling Proteins; Disease; Exons; Gene Expression Regulation, Developmental; Genetic Predisposition to Disease; Humans; Introns; Molecular Sequence Data
PubMed: 26395334
DOI: 10.1186/s40246-015-0046-y -
BMC Genomics Oct 2019 (Comparative Study)
BACKGROUND
The location and modular structure of eukaryotic protein-coding genes in genomic sequences can be automatically predicted by gene annotation algorithms. These predictions are often used for comparative studies on gene structure, gene repertoires, and genome evolution. However, automatic annotation algorithms do not yet correctly identify all genes within a genome, and manual annotation is often necessary to obtain accurate gene models and gene sets. As manual annotation is time-consuming, only a fraction of the gene models in a genome is typically manually annotated, and this fraction often differs between species. To assess the impact of manual annotation efforts on genome-wide analyses of gene structural properties, we compared the structural properties of protein-coding genes in seven diverse insect species sequenced by the i5k initiative.
RESULTS
Our results show that the subset of genes chosen for manual annotation by a research community (3.5-7% of gene models) may have structural properties (e.g., lengths and exon counts) that are not necessarily representative of a species' gene set as a whole. Nonetheless, the structural properties of automatically generated gene models are only altered marginally (if at all) through manual annotation. Major correlative trends, for example a negative correlation between genome size and exonic proportion, can be inferred from the automatically predicted and the manually annotated gene models alike. Conversely, some previously reported trends did not appear in either the automatic or manually annotated gene sets, pointing towards insect-specific gene structural peculiarities.
CONCLUSIONS
In our analysis of gene structural properties, automatically predicted gene models proved to be sufficiently reliable to recover the same gene-repertoire-wide correlative trends that we found when focusing on manually annotated gene models only. We acknowledge that analyses on the individual gene level clearly benefit from manual curation. However, as genome sequencing and annotation projects often differ in the extent of their manual annotation and curation efforts, our results indicate that comparative studies analyzing gene structural properties in these genomes can nonetheless be justifiable and informative.
Topics: Amino Acid Sequence; Base Composition; Base Sequence; Exons; Genes, Insect; Genome, Insect; Introns; Molecular Sequence Annotation
PubMed: 31623555
DOI: 10.1186/s12864-019-6064-8 -
BMC Evolutionary Biology Jul 2017 (Comparative Study)
BACKGROUND
The ever-increasing availability of genomes makes it possible to investigate and compare not only the genomic complements of genes and proteins, but also of RNAs. One class of RNAs, the long noncoding RNAs (lncRNAs) and, in particular, their subclass of long intergenic noncoding RNAs (lincRNAs), have recently gained much attention because of their roles in the regulation of important biological processes such as immune response or cell differentiation and as possible evolutionary precursors of protein-coding genes. lincRNAs seem to be poorly conserved at the sequence level, but at least some lincRNAs have conserved structural elements and syntenic genomic positions. Previous studies showed that transposable elements are a main contributor to the evolution of lincRNAs in mammals. In contrast, plant lincRNA emergence and evolution have been linked with local duplication events. However, little is known about their evolutionary dynamics in general and in insect genomes in particular.
RESULTS
Here we compared lincRNAs between seven insect genomes and investigated possible evolutionary changes and functional roles. We find very low sequence conservation between different species and that similarities within a species are mostly due to their association with transposable elements (TE) and simple repeats. Furthermore, we find that TEs are less frequent in lincRNA exons than in their introns, indicating that TEs may have been removed by selection. When we analysed the predicted thermodynamic stabilities of lincRNAs we found that they are more stable than their randomized controls which might indicate some selection pressure to maintain certain structural elements. We list several of the most stable lincRNAs which could serve as prime candidates for future functional studies. We also discuss the possibility of de novo protein coding genes emerging from lincRNAs. This is because lincRNAs with high GC content and potentially with longer open reading frames (ORF) are candidate loci where de novo gene emergence might occur.
CONCLUSION
The processes responsible for the emergence and diversification of lincRNAs in insects remain unclear. Both duplication and transposable elements may be important for the creation of new lincRNAs in insects.
Topics: Animals; DNA Transposable Elements; Exons; Genome, Insect; Insecta; Introns; Open Reading Frames; RNA, Long Noncoding
PubMed: 28673235
DOI: 10.1186/s12862-017-0985-0 -
The Plant Journal : For Cell and... Sep 2022
Spruces (Picea spp.) are coniferous trees widespread in boreal and mountainous forests of the northern hemisphere, with large economic significance and enormous contributions to global carbon sequestration. Spruces harbor very large genomes with high repetitiveness, hampering their comparative analysis. Here, we present and compare the genomes of four different North American spruces: the genome assemblies for Engelmann spruce (Picea engelmannii) and Sitka spruce (Picea sitchensis) together with improved and more contiguous genome assemblies for white spruce (Picea glauca) and for a naturally occurring introgress of these three species known as interior spruce (P. engelmannii × glauca × sitchensis). The genomes were structurally similar, and a large part of scaffolds could be anchored to a genetic map. The composition of the interior spruce genome indicated asymmetric contributions from the three ancestral genomes. Phylogenetic analysis of the nuclear and organelle genomes revealed a topology indicative of ancient reticulation. Different patterns of expansion of gene families among genomes were observed and related to presumed diversifying ecological adaptations. We identified rapidly evolving genes that harbored high rates of non-synonymous polymorphisms relative to synonymous ones, indicative of positive selection and its hitchhiking effects. These gene sets were mostly distinct between the genomes of ecologically contrasted species, and signatures of convergent balancing selection were detected. Stress and stimulus response was identified as the most frequent function assigned to expanding gene families and rapidly evolving genes. These two aspects of genomic evolution were complementary in their contribution to divergent evolution of presumed adaptive nature.
These more contiguous spruce giga-genome sequences should strengthen our understanding of conifer genome structure and evolution, as their comparison offers clues into the genetic basis of adaptation and ecology of conifers at the genomic level. They will also provide tools to better monitor natural genetic diversity and improve the management of conifer forests. The genomes of four closely related North American spruces indicate that their high similarity at the morphological level is paralleled by the high conservation of their physical genome structure. Yet, the evidence of divergent evolution is apparent in their rapidly evolving genomes, supported by differential expansion of key gene families and large sets of genes under positive selection, largely in relation to stimulus and environmental stress response.
Topics: Expressed Sequence Tags; Genome, Plant; Multigene Family; Phylogeny; Picea; Tracheophyta
PubMed: 35789009
DOI: 10.1111/tpj.15889 -
Genome Research Oct 2010 (Review)
Ever since the pre-molecular era, the birth of new genes with novel functions has been considered to be a major contributor to adaptive evolutionary innovation. Here, I review the origin and evolution of new genes and their functions in eukaryotes, an area of research that has made rapid progress in the past decade thanks to the genomics revolution. Indeed, recent work has provided initial whole-genome views of the different types of new genes for a large number of different organisms. The array of mechanisms underlying the origin of new genes is compelling, extending way beyond the traditionally well-studied source of gene duplication. Thus, it was shown that novel genes also regularly arose from messenger RNAs of ancestral genes, protein-coding genes metamorphosed into new RNA genes, genomic parasites were co-opted as new genes, and that both protein and RNA genes were composed from scratch (i.e., from previously nonfunctional sequences). These mechanisms then also contributed to the formation of numerous novel chimeric gene structures. Detailed functional investigations uncovered different evolutionary pathways that led to the emergence of novel functions from these newly minted sequences and, with respect to animals, attributed a potentially important role to one specific tissue, the testis, in the process of gene birth. Remarkably, these studies also demonstrated that novel genes of the various types significantly impacted the evolution of cellular, physiological, morphological, behavioral, and reproductive phenotypic traits. Consequently, it is now firmly established that new genes have indeed been major contributors to the origin of adaptive evolutionary novelties.
Topics: Animals; Biological Evolution; Eukaryota; Evolution, Molecular; Genes; Genomics; Humans; Male; Phenotype; Testis
PubMed: 20651121
DOI: 10.1101/gr.101386.109 -
Genes Jan 2022
De novo genes are novel genes which emerge from non-coding DNA. Until now, little is known about the properties of de novo genes in relation to their age and mechanisms of emergence. In this study, we investigate four related properties: introns, upstream regulatory motifs, 5' untranslated regions (UTRs) and protein domains, in 23,135 human proto-genes. We found that proto-genes contain introns, whose number and position correlate with the genomic position of proto-gene emergence. The origin of these introns is debated, as our results suggest that 41% of proto-genes might have captured existing introns, and 13.7% of them do not splice the ORF. We show that proto-genes which emerged via overprinting tend to be more enriched in core promoter motifs, while intergenic and intronic genes are more enriched in enhancers, even if the TATA motif is most commonly found upstream in these genes. Intergenic and intronic 5' UTRs of proto-genes have a lower potential to stabilise mRNA structures than exonic proto-genes and established human genes. Finally, we confirm that proteins expressed by proto-genes gain new putative domains with age. Overall, we find that regulatory motifs inducing transcription and translation of previously non-coding sequences may facilitate proto-gene emergence. Our study demonstrates that introns, 5' UTRs, and domains have specific properties in proto-genes. We also emphasize that the genomic positions of de novo genes strongly impact these properties.
Topics: 5' Untranslated Regions; Exons; Genomics; Humans; Introns; Promoter Regions, Genetic
PubMed: 35205330
DOI: 10.3390/genes13020284 -
Genome Research Oct 2018
Despite the importance of duplicate genes for evolutionary adaptation, accurate gene annotation is often incomplete, incorrect, or lacking in regions of segmental duplication. We developed an approach combining long-read sequencing and hybridization capture to yield full-length transcript information and confidently distinguish between nearly identical genes/paralogs. We used biotinylated probes to enrich for full-length cDNA from duplicated regions, which were then amplified, size-fractionated, and sequenced using single-molecule, long-read sequencing technology, permitting us to distinguish between highly identical genes by virtue of multiple paralogous sequence variants. We examined 19 gene families as expressed in developing and adult human brain, selected for their high sequence identity (average >99%) and overlap with human-specific segmental duplications (SDs). We characterized the transcriptional differences between related paralogs to better understand the birth-death process of duplicate genes and particularly how the process leads to gene innovation. In 48% of the cases, we find that the expressed duplicates have changed substantially from their ancestral models due to novel sites of transcription initiation, splicing, and polyadenylation, as well as fusion transcripts that connect duplication-derived exons with neighboring genes. We detect unannotated open reading frames in genes currently annotated as pseudogenes, while relegating other duplicates to nonfunctional status. Our method significantly improves gene annotation, specifically defining full-length transcripts, isoforms, and open reading frames for new genes in highly identical SDs. The approach will be more broadly applicable to genes in structurally complex regions of other genomes where the duplication process creates novel genes important for adaptive traits.
Topics: Brain; Evolution, Molecular; Gene Duplication; Gene Expression Profiling; Humans; Molecular Sequence Annotation; Multigene Family; Open Reading Frames; Pseudogenes; Segmental Duplications, Genomic; Sequence Analysis, DNA; Sequence Analysis, RNA
PubMed: 30228200
DOI: 10.1101/gr.237610.118 -
Database : the Journal of Biological... 2015 (Comparative Study)
Homeobox genes are a group of genes coding for transcription factors with a DNA-binding helix-turn-helix structure called a homeodomain and which play a crucial role in pattern formation during embryogenesis. Many homeobox genes are located in clusters and some of these, most notably the HOX genes, are known to have antisense or opposite strand long non-coding RNA (lncRNA) genes that play a regulatory role. Because automated annotation of both gene clusters and non-coding genes is fraught with difficulty (over-prediction, under-prediction, inaccurate transcript structures), we set out to manually annotate all homeobox genes in the mouse and human genomes. This includes all supported splice variants, pseudogenes and both antisense and flanking lncRNAs. One of the areas where manual annotation has a significant advantage is the annotation of duplicated gene clusters. After comprehensive annotation of all homeobox genes and their antisense genes in human and in mouse, we found some discrepancies with the current gene set in RefSeq regarding exact gene structures and coding versus pseudogene locus biotype. We also identified previously un-annotated pseudogenes in the DUX, Rhox and Obox gene clusters, which helped us re-evaluate and update the gene nomenclature in these regions. We found that human homeobox genes are enriched in antisense lncRNA loci, some of which are known to play a role in gene or gene cluster regulation, compared to their mouse orthologues. Of the annotated set of 241 human protein-coding homeobox genes, 98 have an antisense locus (41%), while of the 277 orthologous mouse genes, only 62 protein-coding genes have an antisense locus (22%), based on publicly available transcriptional evidence.
Topics: Animals; Databases, Nucleic Acid; Genome, Human; Helix-Turn-Helix Motifs; Homeodomain Proteins; Humans; Mice; Molecular Sequence Annotation; Multigene Family; Pseudogenes; RNA, Long Noncoding
PubMed: 26412852
DOI: 10.1093/database/bav091 -
PloS One 2023 (Review)
Seven IN Absentia (SINA) is a small family of genes coding for ubiquitin ligases that play major roles in regulating various plant growth and developmental processes, as well as in plant responses to diverse biotic and abiotic stresses. Here, we studied the SINA gene family in bread wheat (Triticum aestivum), a crop of major importance for food security worldwide. One hundred and forty-one SINA family genes were identified in bread wheat, a number that is very high compared to other plant species such as A. thaliana or rice. The expansion of this family seems to have been more important in monocots than in eudicots. In bread wheat, the chromosome 3 distal region is the site of a massive amplification of the SINA family, since we found that 83 of the 141 SINA genes are located on this chromosome in the Chinese Spring variety. This amplification probably occurred as a result of local duplications, followed by sequence divergence. The study was then extended to 4856 SINA proteins from 97 plant species. Phylogenetic and structural analyses identified a group of putative ancestral SINA proteins in plants containing a specific 58-amino-acid signature. Based on sequence homology and the search for that "Ancestral SINA motif" of 58 amino acids, a methodological process has been proposed that led to the identification of functional SINA genes in a large family such as that of the Triticeae and that might be used for other species. Finally, this paper gives a comprehensive overview of wheat gene family organization and functionalization, taking the SINA genes as an example.
Topics: Bread; Gene Expression Regulation, Plant; Genes, Plant; Multigene Family; Phylogeny; Plant Proteins; Stress, Physiological; Triticum
PubMed: 38127955
DOI: 10.1371/journal.pone.0295021