- Nature Reviews. Cancer Jul 2017 (Review)
Transposable elements give rise to interspersed repeats, sequences that comprise most of our genomes. These mobile DNAs have been historically underappreciated - both because they have been presumed to be unimportant, and because their high copy number and variability pose unique technical challenges. Neither impediment now seems steadfast. Interest in the human mobilome has never been greater, and methods enabling its study are maturing at a fast pace. This Review describes the activity of transposable elements in human cancers, particularly long interspersed element-1 (LINE-1). LINE-1 sequences are self-propagating, protein-coding retrotransposons, and their activity results in somatically acquired insertions in cancer genomes. Altered expression of transposable elements and animation of genomic LINE-1 sequences appear to be hallmarks of cancer, and can be responsible for driving mutations in tumorigenesis.
Topics: Cell Transformation, Neoplastic; DNA Transposable Elements; Humans; Long Interspersed Nucleotide Elements; Minisatellite Repeats; Neoplasms; Open Reading Frames; Promoter Regions, Genetic; RNA; Short Interspersed Nucleotide Elements; Terminal Repeat Sequences
PubMed: 28642606
DOI: 10.1038/nrc.2017.35
- Cell Jan 2019
In order to provide a comprehensive resource for human structural variants (SVs), we generated long-read sequence data and analyzed SVs for fifteen human genomes. We sequence resolved 99,604 insertions, deletions, and inversions including 2,238 (1.6 Mbp) that are shared among all discovery genomes with an additional 13,053 (6.9 Mbp) present in the majority, indicating minor alleles or errors in the reference. Genotyping in 440 additional genomes confirms the most common SVs in unique euchromatin are now sequence resolved. We report a ninefold SV bias toward the last 5 Mbp of human chromosomes with nearly 55% of all VNTRs (variable number of tandem repeats) mapping to this portion of the genome. We identify SVs affecting coding and noncoding regulatory loci improving annotation and interpretation of functional variation. These data provide the framework to construct a canonical human reference and a resource for developing advanced representations capable of capturing allelic diversity.
Topics: Alleles; Euchromatin; Gene Frequency; Genome, Human; Genomic Structural Variation; Genomics; Humans; Minisatellite Repeats; Sequence Analysis, DNA
PubMed: 30661756
DOI: 10.1016/j.cell.2018.12.019
- Nucleic Acids Research Nov 2023
SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied compared to other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.
Topics: Humans; Alu Elements; Genome, Human; Minisatellite Repeats; Retroelements; Short Interspersed Nucleotide Elements
PubMed: 37823611
DOI: 10.1093/nar/gkad821
- International Journal of Molecular... Feb 2021 (Review)
Repetitive DNA in humans is still widely considered to be meaningless, and variations within this part of the genome are generally considered to be harmless to the carrier. In contrast, for euchromatic variation, one becomes more careful in classifying inter-individual differences as meaningless and rather tends to see them as possible influencers of the so-called 'genetic background', being able to at least potentially influence disease susceptibilities. Here, the known 'bad boys' among repetitive DNAs are reviewed. Variable numbers of tandem repeats (VNTRs = micro- and minisatellites), small-scale repetitive elements (SSREs) and even chromosomal heteromorphisms (CHs) may therefore have direct or indirect influences on human diseases and susceptibilities. Summarizing this specific aspect here for the first time should contribute to stimulating more research on human repetitive DNA. It should also become clear that these kinds of studies must be done at all available levels of resolution, i.e., from the base pair to chromosomal level and, importantly, the epigenetic level, as well.
Topics: Chromosomes, Human; DNA, Satellite; Genome, Human; Humans; Microsatellite Repeats; Minisatellite Repeats; Repetitive Sequences, Nucleic Acid
PubMed: 33669810
DOI: 10.3390/ijms22042072
- Science (New York, N.Y.) Sep 2021
Many human proteins contain domains that vary in size or copy number because of variable numbers of tandem repeats (VNTRs) in protein-coding exons. However, the relationships of VNTRs to most phenotypes are unknown because of difficulties in measuring such repetitive elements. We developed methods to estimate VNTR lengths from whole-exome sequencing data and impute VNTR alleles into single-nucleotide polymorphism haplotypes. Analyzing 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 786 phenotypes identified some of the strongest associations of common variants with human phenotypes, including height, hair morphology, and biomarkers of health. Accounting for large-effect VNTRs further enabled fine-mapping of associations to many more protein-coding mutations in the same genes. These results point to cryptic effects of highly polymorphic common structural variants that have eluded molecular analyses to date.
Topics: Aggrecans; Antigens; Black People; Body Height; Genetic Association Studies; Genome, Human; Hair; Haplotypes; Humans; Intermediate Filament Proteins; Kidney; Lipoprotein(a); Minisatellite Repeats; Mucin-1; Phenotype; Polymorphism, Genetic; Polymorphism, Single Nucleotide; Polynucleotide Adenylyltransferase; White People; Exome Sequencing
PubMed: 34554798
DOI: 10.1126/science.abg8289
- Veterinary Research Dec 2022
African swine fever virus (ASFV) is a large DNA virus that infects domestic pigs with high morbidity and mortality rates. Repeat sequences, which are DNA sequence elements that are repeated more than twice in the genome, play an important role in the ASFV genome. The majority of repeat sequences, however, have not been identified and characterized in a systematic manner. In this study, three types of repeat sequences, including microsatellites, minisatellites and short interspersed nuclear elements (SINEs), were identified in the ASFV genome, and their distribution, structure, function, and evolutionary history were investigated. Most repeat sequences were observed in noncoding regions and at the 5' end of the genome. Noncoding repeat sequences tended to form enhancers, whereas coding repeat sequences had a lower ratio of alpha-helix and beta-sheet and a higher ratio of loop structure and surface amino acids than nonrepeat sequences. In addition, the repeat sequences tended to encode penetrating and antimicrobial peptides. Further analysis of the evolution of repeat sequences revealed that the pan-repeat sequences presented an open state, showing the diversity of repeat sequences. Finally, CpG islands were observed to be negatively correlated with repeat sequence occurrences, suggesting that they may affect the generation of repeat sequences. Overall, this study emphasizes the importance of repeat sequences in ASFVs, and these results can aid in understanding the virus's function and evolution.
Topics: Animals; Swine; African Swine Fever Virus; Sus scrofa; Amino Acids; Antimicrobial Peptides; Minisatellite Repeats
PubMed: 36461107
DOI: 10.1186/s13567-022-01119-9
- Genome Research Aug 2021
There are more than 55,000 variable number tandem repeats (VNTRs) in the human genome, notable for both their striking polymorphism and mutability. Despite their role in human evolution and genomic variation, they have yet to be studied collectively and in detail, partially owing to their large size, variability, and predominant location in noncoding regions. Here, we examine 467 VNTRs that are human-specific expansions, unique to one location in the genome, and not associated with retrotransposons. We leverage publicly available long-read genomes, including from the Human Genome Structural Variant Consortium, to ascertain the exact nucleotide composition of these VNTRs and compare their composition of alleles. We then confirm repeat unit composition in more than 3000 short-read samples from the 1000 Genomes Project. Our analysis reveals that these VNTRs contain highly structured repeat motif organization, modified by frequent deletion and duplication events. Although overall VNTR compositions tend to remain similar between 1000 Genomes Project superpopulations, we describe a notable exception with substantial differences in repeat composition, as well as several VNTRs that are significantly different in length between superpopulations. We also observe that most of these VNTRs are expanded in archaic human genomes, yet remain stable in length between single generations. Collectively, our findings indicate that repeat motif variability, repeat composition, and repeat length are all informative modalities to consider when characterizing VNTRs and their contribution to genomic variation.
Topics: Genome, Human; Genomic Structural Variation; Humans; Minisatellite Repeats; Nucleotides; Polymorphism, Genetic
PubMed: 34244228
DOI: 10.1101/gr.275560.121
- American Journal of Human Genetics Sep 2020
Tandem repeats are proposed to contribute to human-specific traits, and more than 40 tandem repeat expansions are known to cause neurological disease. Here, we characterize a human-specific 69 bp variable number tandem repeat (VNTR) in the last intron of WDR7, which exhibits striking variability in both copy number and nucleotide composition, as revealed by long-read sequencing. In addition, greater repeat copy number is significantly enriched in three independent cohorts of individuals with sporadic amyotrophic lateral sclerosis (ALS). Each unit of the repeat forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells. We leveraged its remarkable sequence variability to align the repeat in 288 samples and uncover its mechanism of expansion. We found that the repeat expands in the 3'-5' direction, in groups of repeat units divisible by two. The expansion patterns we observed were consistent with duplication events, and a replication error called template switching. We also observed that the VNTR is expanded in both Denisovan and Neanderthal genomes but is fixed at one copy or fewer in non-human primates. Evaluating the repeat in 1000 Genomes Project samples reveals that some repeat segments are solely present or absent in certain geographic populations. The large size of the repeat unit in this VNTR, along with our multiplexed sequencing strategy, provides an unprecedented opportunity to study mechanisms of repeat expansion, and a framework for evaluating the roles of VNTRs in human evolution and disease.
Topics: Adaptor Proteins, Signal Transducing; Aged; Alzheimer Disease; Amyotrophic Lateral Sclerosis; DNA Repeat Expansion; Evolution, Molecular; Female; Gene Expression Regulation; Humans; Male; Minisatellite Repeats; Phenotype; Species Specificity; Tandem Repeat Sequences
PubMed: 32750315
DOI: 10.1016/j.ajhg.2020.07.004
- Journal of Infection in Developing... Nov 2023 (Review)
INTRODUCTION
Mycobacterium tuberculosis genotyping has impacted evolutionary studies worldwide. Nonetheless, its application and the knowledge generated depend on the genetic marker evaluated and the detection technologies that have evolved over the years. Here we describe the timeline of main genotypic methods related to M. tuberculosis in Latin America and the main findings obtained.
METHODOLOGY
Systematic searches through the PubMed database were performed from 1993 to May 2021. A total of 345 articles met the inclusion criteria and were selected.
RESULTS
Spacer oligonucleotide typing (spoligotyping) was the most widely used method in Latin America, with decreasing use in parallel with increasing use of mycobacterial interspersed repetitive unit-variable number tandem repeat (MIRU-VNTR) and whole genome sequencing (WGS). Among the countries, Brazil, Mexico, and Argentina had the most publications, and a considerable part of the articles were in collaboration with Latin American or non-Latin American institutions; a small proportion of studies needed partnerships to perform the genotypic methods. The genotypic methods allowed the identification of M. tuberculosis genotypes with greater capacity for clonal expansion and revealed the predominance of the Euro-American lineage in Latin America. There was a notable presence of the Beijing family in Peru and Colombia.
CONCLUSIONS
The data obtained demonstrated the importance of expanding collaborative networks of tuberculosis (TB) research groups to countries with low productivity in this area, the commitment of the few Latin American countries to advance TB research, as well as the inestimable value of building a Latin America database, considering ease of population mobility between countries.
Topics: Humans; Latin America; Genotype; Polymorphism, Restriction Fragment Length; Bacterial Typing Techniques; Tuberculosis; Mycobacterium tuberculosis; Minisatellite Repeats
PubMed: 37956372
DOI: 10.3855/jidc.17840
- Nature Communications Jul 2021
Variable number tandem repeats (VNTRs) are composed of consecutive repetitive DNA with hypervariable repeat count and composition. They include protein coding sequences and associations with clinical disorders. It has been difficult to incorporate VNTR analysis in disease studies that use short-read sequencing because the traditional approach of mapping to the human reference is less effective for repetitive and divergent sequences. In this work, we solve VNTR mapping for short reads with a repeat-pangenome graph (RPGG), a data structure that encodes both the population diversity and repeat structure of VNTR loci from multiple haplotype-resolved assemblies. We develop software to build a RPGG, and use the RPGG to estimate VNTR composition with short reads. We use this to discover VNTRs with length stratified by continental population, and expression quantitative trait loci, indicating that RPGG analysis of VNTRs will be critical for future studies of diversity and disease.
Topics: Chromosome Mapping; Gene Expression Regulation; Genetic Loci; Genetic Variation; Genetics, Population; Genome, Human; Humans; Minisatellite Repeats; Nucleotide Motifs; Quantitative Trait Loci
PubMed: 34253730
DOI: 10.1038/s41467-021-24378-0
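
The repeat-pangenome graph described in the last abstract can be illustrated with a toy sketch. The abstract does not specify the graph representation, so this example assumes a simple de Bruijn-style k-mer graph built from several haplotype sequences of one locus, and uses the number of read k-mers found in the graph as a crude proxy for VNTR content. The function names `build_repeat_graph` and `count_graph_kmers`, the sequences, and the choice of k are all hypothetical and are not taken from the authors' software.

```python
from collections import defaultdict


def build_repeat_graph(haplotypes, k):
    """Toy de Bruijn-style graph (an assumption, not the published method).

    Each k-mer seen in any haplotype is a node; an edge links two k-mers
    that occur at adjacent positions in at least one haplotype.
    """
    edges = defaultdict(set)
    for seq in haplotypes:
        for i in range(len(seq) - k):
            edges[seq[i:i + k]].add(seq[i + 1:i + k + 1])
        if len(seq) >= k:
            # Touch the final k-mer so it is registered as a node
            # even when it has no outgoing edge.
            edges[seq[-k:]]
    return edges


def count_graph_kmers(reads, graph, k):
    """Count read k-mers that are nodes of the graph.

    Longer alleles contribute more matching k-mers, so this count is a
    rough, unnormalized stand-in for VNTR length.
    """
    total = 0
    for read in reads:
        for i in range(len(read) - k + 1):
            if read[i:i + k] in graph:
                total += 1
    return total


# Two hypothetical haplotypes of one locus: same motif, one with a variant unit.
graph = build_repeat_graph(["ACGACGACG", "ACGACGTCG"], k=3)
matches = count_graph_kmers(["ACGACG"], graph, k=3)
```

Pooling haplotypes into one graph means a read carrying either motif variant still matches a node, which is the intuition behind mapping to a graph rather than to a single linear reference.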