-
Cell Aug 2023
Many regions in the human genome vary in length among individuals due to variable numbers of tandem repeats (VNTRs). To assess the phenotypic impact of VNTRs genome-wide, we applied a statistical imputation approach to estimate the lengths of 9,561 autosomal VNTR loci in 418,136 unrelated UK Biobank participants and 838 GTEx participants. Association and statistical fine-mapping analyses identified 58 VNTRs that appeared to influence a complex trait in UK Biobank, 18 of which also appeared to modulate expression or splicing of a nearby gene. Non-coding VNTRs at TMCO1 and EIF3H appeared to generate the largest known contributions of common human genetic variation to risk of glaucoma and colorectal cancer, respectively. Each of these two VNTRs associated with a >2-fold range of risk across individuals. These results reveal a substantial and previously unappreciated role of non-coding VNTRs in human health and gene regulation.
Topics: Humans; Calcium Channels; Colorectal Neoplasms; Genome, Human; Glaucoma; Minisatellite Repeats; Polymorphism, Genetic; Eukaryotic Initiation Factor-3
PubMed: 37527660
DOI: 10.1016/j.cell.2023.07.002 -
Experimental Biology and Medicine... May 2022
Review
SINE-VNTR-Alus (SVAs) are the youngest retrotransposon family in the human genome. Their ongoing mobilization has generated genetic variation within the human population. At least 24 insertions to date, detailed in this review, have been associated with disease. The predominant mechanisms through which this occurs are alterations to normal splicing patterns, exonic insertions causing loss-of-function mutations, and large genomic deletions. Dissecting the functional impact of these SVAs and the mechanism through which they cause disease provides insight into the consequences of their presence in the genome and how these elements could influence phenotypes. Many of these disease-associated SVAs have been difficult to characterize and would not have been identified through routine analyses. However, the number identified has increased in recent years as DNA and RNA sequencing data became more widely available. Therefore, as the search for complex structural variation in disease continues, it is likely to yield further disease-causing SVA insertions.
Topics: Alu Elements; Genome, Human; Humans; Minisatellite Repeats; Retroelements
PubMed: 35387528
DOI: 10.1177/15353702221082612 -
Nucleic Acids Research Nov 2023
SINE-VNTR-Alu (SVA) retrotransposons are evolutionarily young and still-active transposable elements (TEs) in the human genome. Several pathogenic SVA insertions have been identified that directly mutate host genes to cause neurodegenerative and other types of diseases. However, due to their sequence heterogeneity and complex structures as well as limitations in sequencing techniques and analysis, SVA insertions have been less well studied compared to other mobile element insertions. Here, we identified polymorphic SVA insertions from 3646 whole-genome sequencing (WGS) samples of >150 diverse populations and constructed a polymorphic SVA insertion reference catalog. Using 20 long-read samples, we also assembled reference and polymorphic SVA sequences and characterized the internal hexamer/variable-number-tandem-repeat (VNTR) expansions as well as differing SVA activity for SVA subfamilies and human populations. In addition, we developed a module to annotate both reference and polymorphic SVA copies. By characterizing the landscape of both reference and polymorphic SVA retrotransposons, our study enables more accurate genotyping of these elements and facilitates the discovery of pathogenic SVA insertions.
Topics: Humans; Alu Elements; Genome, Human; Minisatellite Repeats; Retroelements; Short Interspersed Nucleotide Elements
PubMed: 37823611
DOI: 10.1093/nar/gkad821 -
International Journal of Molecular... Feb 2021
Review
Repetitive DNA in humans is still widely considered to be meaningless, and variations within this part of the genome are generally considered to be harmless to the carrier. In contrast, for euchromatic variation, one becomes more careful in classifying inter-individual differences as meaningless and rather tends to see them as possible influencers of the so-called 'genetic background', being able to at least potentially influence disease susceptibilities. Here, the known 'bad boys' among repetitive DNAs are reviewed. Variable numbers of tandem repeats (VNTRs = micro- and minisatellites), small-scale repetitive elements (SSREs) and even chromosomal heteromorphisms (CHs) may therefore have direct or indirect influences on human diseases and susceptibilities. Summarizing this specific aspect here for the first time should contribute to stimulating more research on human repetitive DNA. It should also become clear that these kinds of studies must be done at all available levels of resolution, i.e., from the base pair to chromosomal level and, importantly, the epigenetic level, as well.
Topics: Chromosomes, Human; DNA, Satellite; Genome, Human; Humans; Microsatellite Repeats; Minisatellite Repeats; Repetitive Sequences, Nucleic Acid
PubMed: 33669810
DOI: 10.3390/ijms22042072 -
Frontiers in Genetics 2022
Review
Expanded tandem repeat DNAs are associated with various unusual chromosomal lesions, despiralizations, multi-branched inter-chromosomal associations, and fragile sites. Fragile sites cytogenetically manifest as localized gaps or discontinuities in chromosome structure and are important genetic, biological, and health-related phenomena. Common fragile sites (∼230), present in most individuals, are induced by aphidicolin and can be associated with cancer; of the 27 molecularly-mapped common sites, none are associated with a particular DNA sequence motif. Rare fragile sites (≥40 known), present in ≤5% of the population (possibly in as few as a single individual), can be associated with neurodevelopmental disease. All 10 molecularly-mapped folate-sensitive fragile sites, the largest category of rare fragile sites, are caused by gene-specific CGG/CCG tandem repeat expansions that are aberrantly CpG methylated and include FRAXA, FRAXE, FRAXF, FRA2A, FRA7A, FRA10A, FRA11A, FRA11B, FRA12A, and FRA16A. The minisatellite-associated rare fragile sites, FRA10B and FRA16B, can be induced by AT-rich DNA-ligands or nucleotide analogs. Despiralized lesions and multi-branched inter-chromosomal associations at the heterochromatic satellite repeats of chromosomes 1, 9, and 16 are inducible by de-methylating agents like 5-azadeoxycytidine and can spontaneously arise in patients with ICF syndrome (Immunodeficiency, Centromeric instability, and Facial anomalies) with mutations in genes regulating DNA methylation. ICF individuals have hypomethylated satellites I-III, alpha-satellites, and subtelomeric repeats. Ribosomal repeats and subtelomeric D4Z4 megasatellites/macrosatellites are associated with chromosome location, fragility, and disease. Telomere repeats can also assume fragile sites. Dietary deficiencies of folate or vitamin B12, or drug insults, are associated with megaloblastic and/or pernicious anemia, which display chromosomes with fragile sites.
The many recently discovered tandem repeat expansion loci, with varied repeat motifs whose lengths can range from mono-nucleotides to megabase units, could be the molecular cause of new fragile sites or other chromosomal lesions. This review focuses on repeat-associated fragility, covering its induction, cytogenetics, epigenetics, cell type specificity, genetic instability (repeat instability, micronuclei, deletions/rearrangements, and sister chromatid exchange), unusual heritability, disease association, and penetrance. Understanding tandem repeat-associated chromosomal fragile sites provides insight into chromosome structure, genome packaging, genetic instability, and disease.
PubMed: 36468036
DOI: 10.3389/fgene.2022.985975 -
Science (New York, N.Y.) Sep 2021
Many human proteins contain domains that vary in size or copy number because of variable numbers of tandem repeats (VNTRs) in protein-coding exons. However, the relationships of VNTRs to most phenotypes are unknown because of difficulties in measuring such repetitive elements. We developed methods to estimate VNTR lengths from whole-exome sequencing data and impute VNTR alleles into single-nucleotide polymorphism haplotypes. Analyzing 118 protein-altering VNTRs in 415,280 UK Biobank participants for association with 786 phenotypes identified some of the strongest associations of common variants with human phenotypes, including height, hair morphology, and biomarkers of health. Accounting for large-effect VNTRs further enabled fine-mapping of associations to many more protein-coding mutations in the same genes. These results point to cryptic effects of highly polymorphic common structural variants that have eluded molecular analyses to date.
Topics: Aggrecans; Antigens; Black People; Body Height; Genetic Association Studies; Genome, Human; Hair; Haplotypes; Humans; Intermediate Filament Proteins; Kidney; Lipoprotein(a); Minisatellite Repeats; Mucin-1; Phenotype; Polymorphism, Genetic; Polymorphism, Single Nucleotide; Polynucleotide Adenylyltransferase; White People; Exome Sequencing
PubMed: 34554798
DOI: 10.1126/science.abg8289 -
Cell Reports Sep 2022
Since formation of the first proto-eukaryotes, gene repertoire and genome complexity have significantly increased. Among genetic elements responsible for this increase are tandem repeats. Here we describe a genome-wide analysis of large tandem repeats, called megasatellites, in 58 vertebrate genomes. Two bursts occurred, one after the radiation between Agnatha and Gnathostomata fishes and the second one in therian mammals. Megasatellites are enriched in subtelomeric regions and frequently encoded in genes involved in transcription regulation, intracellular trafficking, and cell membrane metabolism, reminiscent of what is observed in fungus genomes. The presence of many introns within young megasatellites suggests that an exon-intron DNA segment is first duplicated and amplified before accumulation of mutations in intronic parts partially erases the megasatellite in such a way that it becomes detectable only in exons. Our results suggest that megasatellite formation and evolution is a dynamic and still ongoing process in vertebrate genomes.
Topics: Animals; Evolution, Molecular; Exons; Genome, Fungal; Introns; Mammals; Vertebrates
PubMed: 36103826
DOI: 10.1016/j.celrep.2022.111347 -
Veterinary Research Dec 2022
African swine fever virus (ASFV) is a large DNA virus that infects domestic pigs with high morbidity and mortality rates. Repeat sequences, which are DNA sequence elements that are repeated more than twice in the genome, play an important role in the ASFV genome. The majority of repeat sequences, however, have not been identified and characterized in a systematic manner. In this study, three types of repeat sequences, including microsatellites, minisatellites and short interspersed nuclear elements (SINEs), were identified in the ASFV genome, and their distribution, structure, function, and evolutionary history were investigated. Most repeat sequences were observed in noncoding regions and at the 5' end of the genome. Noncoding repeat sequences tended to form enhancers, whereas coding repeat sequences had a lower ratio of alpha-helix and beta-sheet and a higher ratio of loop structure and surface amino acids than nonrepeat sequences. In addition, the repeat sequences tended to encode penetrating and antimicrobial peptides. Further analysis of the evolution of repeat sequences revealed that the pan-repeat sequences presented an open state, showing the diversity of repeat sequences. Finally, CpG islands were observed to be negatively correlated with repeat sequence occurrences, suggesting that they may affect the generation of repeat sequences. Overall, this study emphasizes the importance of repeat sequences in ASFVs, and these results can aid in understanding the virus's function and evolution.
Topics: Animals; Swine; African Swine Fever Virus; Sus scrofa; Amino Acids; Antimicrobial Peptides; Minisatellite Repeats
PubMed: 36461107
DOI: 10.1186/s13567-022-01119-9 -
Genome Research Aug 2021
There are more than 55,000 variable number tandem repeats (VNTRs) in the human genome, notable for both their striking polymorphism and mutability. Despite their role in human evolution and genomic variation, they have yet to be studied collectively and in detail, partially owing to their large size, variability, and predominant location in noncoding regions. Here, we examine 467 VNTRs that are human-specific expansions, unique to one location in the genome, and not associated with retrotransposons. We leverage publicly available long-read genomes, including from the Human Genome Structural Variant Consortium, to ascertain the exact nucleotide composition of these VNTRs and compare their composition of alleles. We then confirm repeat unit composition in more than 3000 short-read samples from the 1000 Genomes Project. Our analysis reveals that these VNTRs contain highly structured repeat motif organization, modified by frequent deletion and duplication events. Although overall VNTR compositions tend to remain similar between 1000 Genomes Project superpopulations, we describe a notable exception with substantial differences in repeat composition (in ), as well as several VNTRs that are significantly different in length between superpopulations (in , and ). We also observe that most of these VNTRs are expanded in archaic human genomes, yet remain stable in length between single generations. Collectively, our findings indicate that repeat motif variability, repeat composition, and repeat length are all informative modalities to consider when characterizing VNTRs and their contribution to genomic variation.
Topics: Genome, Human; Genomic Structural Variation; Humans; Minisatellite Repeats; Nucleotides; Polymorphism, Genetic
PubMed: 34244228
DOI: 10.1101/gr.275560.121 -
American Journal of Human Genetics Sep 2020
Tandem repeats are proposed to contribute to human-specific traits, and more than 40 tandem repeat expansions are known to cause neurological disease. Here, we characterize a human-specific 69 bp variable number tandem repeat (VNTR) in the last intron of WDR7, which exhibits striking variability in both copy number and nucleotide composition, as revealed by long-read sequencing. In addition, greater repeat copy number is significantly enriched in three independent cohorts of individuals with sporadic amyotrophic lateral sclerosis (ALS). Each unit of the repeat forms a stem-loop structure with the potential to produce microRNAs, and the repeat RNA can aggregate when expressed in cells. We leveraged its remarkable sequence variability to align the repeat in 288 samples and uncover its mechanism of expansion. We found that the repeat expands in the 3'-5' direction, in groups of repeat units divisible by two. The expansion patterns we observed were consistent with duplication events, and a replication error called template switching. We also observed that the VNTR is expanded in both Denisovan and Neanderthal genomes but is fixed at one copy or fewer in non-human primates. Evaluating the repeat in 1000 Genomes Project samples reveals that some repeat segments are solely present or absent in certain geographic populations. The large size of the repeat unit in this VNTR, along with our multiplexed sequencing strategy, provides an unprecedented opportunity to study mechanisms of repeat expansion, and a framework for evaluating the roles of VNTRs in human evolution and disease.
Topics: Adaptor Proteins, Signal Transducing; Aged; Alzheimer Disease; Amyotrophic Lateral Sclerosis; DNA Repeat Expansion; Evolution, Molecular; Female; Gene Expression Regulation; Humans; Male; Minisatellite Repeats; Phenotype; Species Specificity; Tandem Repeat Sequences
PubMed: 32750315
DOI: 10.1016/j.ajhg.2020.07.004