-
Genome Research Apr 2023
There has been tremendous progress in phased genome assembly production by combining long-read data with parental information or linked-read data. Nevertheless, a typical phased genome assembly generated by trio-hifiasm still generates more than 140 gaps. We perform a detailed analysis of gaps, assembly breaks, and misorientations from 182 haploid assemblies obtained from a diversity panel of 77 unique human samples. Although trio-based approaches using HiFi are the current gold standard, chromosome-wide phasing accuracy is comparable when using Strand-seq instead of parental data. Importantly, the majority of assembly gaps cluster near the largest and most identical repeats (including segmental duplications [35.4%], satellite DNA [22.3%], or regions enriched in GA/AT-rich DNA [27.4%]). Consequently, 1513 protein-coding genes overlap assembly gaps in at least one haplotype, and 231 are recurrently disrupted or missing from five or more haplotypes. Furthermore, we estimate that 6-7 Mbp of DNA are misorientated per haplotype irrespective of whether trio-free or trio-based approaches are used. Of these misorientations, 81% correspond to bona fide large inversion polymorphisms in the human species, most of which are flanked by large segmental duplications. We also identify large-scale alignment discontinuities consistent with 11.9 Mbp of deletions and 161.4 Mbp of insertions per haploid genome. Although 99% of this variation corresponds to satellite DNA, we identify 230 regions of euchromatic DNA with frequent expansions and contractions, nearly half of which overlap with 197 protein-coding genes. Such variable and incompletely assembled regions are important targets for future algorithmic development and pangenome representation.
Topics: Humans; DNA, Satellite; Polymorphism, Genetic; Haplotypes; Segmental Duplications, Genomic; Sequence Analysis, DNA
PubMed: 37164484
DOI: 10.1101/gr.277334.122 -
eLife Mar 2018
A universal and unquestioned characteristic of eukaryotic cells is that the genome is divided into multiple chromosomes and encapsulated in a single nucleus. However, the underlying mechanism to ensure such a configuration is unknown. Here, we provide evidence that pericentromeric satellite DNA, which is often regarded as junk, is a critical constituent of the chromosome, allowing the packaging of all chromosomes into a single nucleus. We show that the multi-AT-hook satellite DNA-binding proteins, D1 and mouse HMGA1, play an evolutionarily conserved role in bundling pericentromeric satellite DNA from heterologous chromosomes into 'chromocenters', a cytological association of pericentromeric heterochromatin. Defective chromocenter formation leads to micronuclei formation due to budding from the interphase nucleus, DNA damage and cell death. We propose that chromocenter and satellite DNA serve a fundamental role in encapsulating the full complement of the genome within a single nucleus, the universal characteristic of eukaryotic cells.
Topics: Animals; Cell Nucleus; Chromosomes; DNA, Satellite; DNA-Binding Proteins; Drosophila Proteins; Drosophila melanogaster; Eukaryotic Cells; HMGA1a Protein; Mice
PubMed: 29578410
DOI: 10.7554/eLife.34122 -
G3 (Bethesda, Md.) Nov 2020
Satellite DNAs (satDNAs) are a ubiquitous feature of eukaryotic genomes and are usually the major components of constitutive heterochromatin. The 1.688 satDNA, also known as the 359 bp satellite, is one of the most abundant repetitive sequences in Drosophila melanogaster and has been linked to several different biological functions. We investigated the presence and evolution of the 1.688 satDNA in 16 Drosophila genomes. We find that the 1.688 satDNA family is much more ancient than previously appreciated, being shared among part of the melanogaster group that diverged from a common ancestor ∼27 Mya. We found that the 1.688 satDNA family has two major subfamilies spread throughout Drosophila phylogeny (∼360 bp and ∼190 bp). Phylogenetic analysis of ∼10,000 repeats extracted from 14 of the species revealed that the 1.688 satDNA family is present within heterochromatin and euchromatin. A high number of euchromatic repeats are gene proximal, suggesting the potential for local gene regulation. Notably, heterochromatic copies display concerted evolution and a species-specific pattern, whereas euchromatic repeats display a more typical evolutionary pattern, suggesting that chromatin domains may influence the evolution of these sequences. Overall, our data indicate that the 1.688 satDNA is the most perduring satDNA family described in Drosophila phylogeny to date. Our study provides a strong foundation for future work on the functional roles of 1.688 satDNA across many Drosophila species.
Topics: Animals; DNA, Satellite; Drosophila; Drosophila melanogaster; Evolution, Molecular; Phylogeny; Repetitive Sequences, Nucleic Acid
PubMed: 32934018
DOI: 10.1534/g3.120.401727 -
Genes May 2020
Comparative Study
Bioinformatic and molecular characterization of satellite repeats was performed to understand the impact of their diversification on genome evolution. Satellite repeat diversity was evaluated in four cultivated and wild Vaccinium species, two of them diploid and two tetraploid. We comparatively characterized six satellite repeat families using a total of 76 clones with 180 monomers. We observed that the monomer units of VaccSat1, VaccSat2, VaccSat5, and VaccSat6 showed a higher order repeat (HOR) structure, likely originating from the organization of two adjacent subunits with differing similarity, length and size. Moreover, VaccSat1, VaccSat3, VaccSat6, and VaccSat7 were found to have sequence similarity to parts of transposable elements. We detected satellite-typical tandem organization for VaccSat1 and VaccSat2 in long arrays, while VaccSat5 and VaccSat6 were distributed at multiple sites over all chromosomes of one of the tetraploid species, presumably in long arrays. In contrast, very short arrays of VaccSat3 and VaccSat7 are dispersed over all chromosomes in the same species, likely as internal parts of transposable elements. We provide a comprehensive overview of satellite species specificity in Vaccinium; the satellites are potentially useful as molecular markers to address the taxonomic complexity of the genus and provide information for genome studies of this genus.
Topics: Chromosomes, Plant; Computational Biology; DNA Transposable Elements; DNA, Satellite; Genome, Plant; Genotype; Phylogeny; Ploidies; Sequence Alignment; Species Specificity; Vaccinium
PubMed: 32397417
DOI: 10.3390/genes11050527 -
Epigenetics & Chromatin Oct 2021
BACKGROUND
Trimethylation of histone H3 on lysine 9 (H3K9me3) at satellite DNA sequences has been primarily studied at (peri)centromeric regions, where its level shows differences associated with various processes such as development and malignant transformation. However, the dynamics of H3K9me3 at distal satellite DNA repeats has not been thoroughly investigated.
RESULTS
We exploit publicly available data sets derived from chromatin immunoprecipitation combined with massively parallel DNA sequencing (ChIP-Seq), produced by the Encyclopedia of DNA Elements (ENCODE) project, to analyze H3K9me3 at assembled satellite DNA repeats in genomes of human cell lines and during mouse fetal development. We show that annotated satellite elements are generally enriched for H3K9me3, but its level in cancer cell lines is on average lower than in normal cell lines. We find 407 satellite DNA instances with differential H3K9me3 enrichment between cancer and normal cells, including a large 115-kb cluster of GSATII elements on chromosome 12. Differentially enriched regions are not limited to satellite DNA instances, but instead encompass a wider region of flanking sequences. We found no correlation between the levels of H3K9me3 and noncoding RNA at corresponding satellite DNA loci. The analysis of data derived from multiple tissues identified 864 instances of satellite DNA sequences in the mouse reference genome that are differentially enriched between fetal developmental stages.
CONCLUSIONS
Our study reveals significant differences in H3K9me3 level at a subset of satellite repeats between biological states and as such contributes to understanding of the role of satellite DNA repeats in epigenetic regulation during development and carcinogenesis.
Topics: Animals; Cell Line; DNA, Satellite; Epigenesis, Genetic; Fetal Development; Histones; Humans; Mice
PubMed: 34663449
DOI: 10.1186/s13072-021-00423-6 -
Cells Jun 2022
Centromeric satellite DNA (cen-satDNA) consists of highly divergent repeat monomers, each approximately 171 base pairs in length. Here, we investigated the genetic diversity in the centromeric region of two primate species: long-tailed (Macaca fascicularis) and rhesus (Macaca mulatta) macaques. Fluorescence in situ hybridization and bioinformatic analysis showed the chromosome-specific organization and dynamic nature of cen-satDNA sequences, and their substantial diversity, with distinct subfamilies across macaque populations, suggesting increased turnover. Comparative genomics identified high-level polymorphisms spanning a 120 bp deletion region and a remarkable interspecific variability in cen-satDNA size and structure. Population structure analysis detected admixture patterns within populations, indicating their high divergence and rapid evolution. However, differences in cen-satDNA profiles appear not to be involved in hybrid incompatibility between the two species. Our study provides a genomic landscape of centromeric repeats in wild macaques and opens new avenues for exploring their impact on the adaptive evolution and speciation of primates.
Topics: Animals; DNA, Satellite; Genomics; In Situ Hybridization, Fluorescence; Macaca fascicularis; Macaca mulatta
PubMed: 35741082
DOI: 10.3390/cells11121953 -
Genome Biology and Evolution Oct 2021
A large portion of animal and plant genomes consists of noncoding DNA. This part includes tandemly repeated sequences and gained attention because it offers exciting insights into genome biology. We investigated satellite-DNA elements of the platyhelminth Schistosoma mansoni, a parasite with remarkable biological features. Schistosoma mansoni lives in the vasculature of humans causing schistosomiasis, a disease of worldwide importance. Schistosomes are the only trematodes that have evolved separate sexes, and the sexual maturation of the female depends on constant pairing with the male. The schistosome karyotype comprises eight chromosome pairs, males are homogametic (ZZ) and females are heterogametic (ZW). Part of the repetitive DNA of S. mansoni are W-elements (WEs), originally discovered as female-specific satellite DNAs in the heterochromatic block of the W-chromosome. Based on new genome and transcriptome data, we performed a reanalysis of the W-element families (WEFs). Besides a new classification of 19 WEFs, we provide first evidence for stage-, sex-, pairing-, gonad-, and strain-specific/preferential transcription of WEs as well as their mobile nature, deduced from autosomal copies of full-length and partial WEs. Structural analyses suggested roles as sources of noncoding RNA-like hammerhead ribozymes, for which we obtained functional evidence. Finally, the variable WEF occurrence in different schistosome species revealed remarkable divergence. From these results, we propose that WEs potentially exert enduring influence on the biology of S. mansoni. Their variable occurrence in different strains, isolates, and species suggests that schistosome WEs may represent genetic factors taking effect on variability and evolution of the family Schistosomatidae.
Topics: Animals; Biology; DNA, Satellite; Female; Male; Repetitive Sequences, Nucleic Acid; Schistosoma mansoni; Sex Chromosomes
PubMed: 34469545
DOI: 10.1093/gbe/evab204 -
BMC Genomics Apr 2021
BACKGROUND
Mammalian centromeres are satellite-rich chromatin domains that execute conserved roles in kinetochore assembly and chromosome segregation. Centromere satellites evolve rapidly between species, but little is known about population-level diversity across these loci.
RESULTS
We developed a k-mer based method to quantify centromere copy number and sequence variation from whole genome sequencing data. We applied this method to diverse inbred and wild house mouse (Mus musculus) genomes to profile diversity across the core centromere (minor) satellite and the pericentromeric (major) satellite repeat. We show that minor satellite copy number varies more than 10-fold among inbred mouse strains, whereas major satellite copy numbers span a 3-fold range. In contrast to widely held assumptions about the homogeneity of mouse centromere repeats, we uncover marked satellite sequence heterogeneity within single genomes, with diversity levels across the minor satellite exceeding those at the major satellite. Analyses in wild-caught mice implicate subspecies and population origin as significant determinants of variation in satellite copy number and satellite heterogeneity. Intriguingly, we also find that wild-caught mice harbor dramatically reduced minor satellite copy number and elevated satellite sequence heterogeneity compared to inbred strains, suggesting that inbreeding may reshape centromere architecture in pronounced ways.
CONCLUSION
Taken together, our results highlight the power of k-mer based approaches for probing variation across repetitive regions, provide an initial portrait of centromere variation across Mus musculus, and lay the groundwork for future functional studies on the consequences of natural genetic variation at these essential chromatin domains.
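The idea of a k-mer based copy number estimate can be illustrated with a minimal sketch. This is not the authors' method: it assumes a known consensus monomer, reads given as plain strings, and a separately measured k-mer depth for single-copy sequence. All function and parameter names (`estimate_copy_number`, `single_copy_kmer_depth`, and so on) are hypothetical.

```python
from collections import Counter

_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.upper().translate(_COMPLEMENT)[::-1]


def canonical(kmer):
    """Strand-independent form: the smaller of a k-mer and its reverse complement."""
    return min(kmer, revcomp(kmer))


def kmers(seq, k):
    """Yield every overlapping k-mer of seq (upper-cased)."""
    seq = seq.upper()
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def estimate_copy_number(reads, consensus, k, single_copy_kmer_depth):
    """Rough copy number estimate for one satellite family (illustrative only).

    Counts read k-mers that match k-mers of the consensus monomer, averages
    the counts over the consensus k-mers, and divides by the depth expected
    for a single-copy k-mer (an input assumed to be measured elsewhere).
    """
    targets = {canonical(km) for km in kmers(consensus, k)}
    if not targets:
        raise ValueError("consensus is shorter than k")
    counts = Counter()
    for read in reads:
        for km in kmers(read, k):
            c = canonical(km)
            if c in targets:
                counts[c] += 1
    mean_depth = sum(counts.values()) / len(targets)
    return mean_depth / single_copy_kmer_depth
```

With a toy consensus `"AACACCCAAACC"`, `k = 5`, six reads that each cover the monomer once (on either strand), and a single-copy depth of 2.0, the function returns 3.0. Real data would also need handling of sequencing errors, monomer variants, and GC bias, none of which this sketch attempts.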
Topics: Animals; Centromere; DNA, Satellite; Mice; Mice, Inbred Strains; Repetitive Sequences, Nucleic Acid
PubMed: 33865332
DOI: 10.1186/s12864-021-07591-5 -
Genome Biology and Evolution Apr 2019
Comparative Study
Repetitive satellite DNA (satDNA) sequences are abundant in eukaryote genomes, with a structural and functional role in centromeric function. We analyzed the nucleotide sequence and chromosomal location of the five known cattle (Bos taurus) satDNA families in seven species from the tribe Tragelaphini (Bovinae subfamily). One of the families (SAT1.723) was present at the centromeres of the chromosomes of the Tragelaphini species, as well as in two more distantly related bovid species, Ovis aries and Capra hircus. Analysis of the interaction of SAT1.723 with centromeric proteins revealed that this satDNA sequence is involved in centromeric activity in all the species analyzed and that it has been preserved for at least 15-20 Myr across Bovidae species. The satDNA sequence similarity among the analyzed species reflected different stages of homogeneity/heterogeneity, revealing the evolutionary history of each satDNA family. The SAT1.723 monomer-flanking regions showed the presence of transposable elements, explaining the extensive shuffling of this satDNA between different genomic regions.
Topics: Animals; Centromere; Centromere Protein A; DNA Transposable Elements; DNA, Satellite; Genetic Variation; Multigene Family; Ruminants
PubMed: 30888421
DOI: 10.1093/gbe/evz061 -
PLoS Genetics Feb 2010
In a previous study, we showed that centromere repositioning, that is the shift along the chromosome of the centromeric function without DNA sequence rearrangement, has occurred frequently during the evolution of the genus Equus. In this work, the analysis of the chromosomal distribution of satellite tandem repeats in Equus caballus, E. asinus, E. grevyi, and E. burchelli highlighted two atypical features: 1) several centromeres, including the previously described evolutionary new centromeres (ENCs), seem to be devoid of satellite DNA, and 2) satellite repeats are often present at non-centromeric termini, probably corresponding to relics of ancestral now inactive centromeres. Immuno-FISH experiments using satellite DNA and antibodies against the kinetochore protein CENP-A demonstrated that satellite-less primary constrictions are actually endowed with centromeric function. The phylogenetic reconstruction of centromere repositioning events demonstrates that the acquisition of satellite DNA occurs after the formation of the centromere during evolution and that centromeres can function over millions of years and many generations without detectable satellite DNA. The rapidly evolving Equus species gave us the opportunity to identify different intermediate steps along the full maturation of ENCs.
Topics: Animals; Autoantigens; Base Sequence; Cell Line; Centromere; Centromere Protein A; Chromosomal Proteins, Non-Histone; Chromosomes, Mammalian; DNA, Satellite; Equidae; Evolution, Molecular; Female; In Situ Hybridization, Fluorescence; Male; Phylogeny; Protein Transport
PubMed: 20169180
DOI: 10.1371/journal.pgen.1000845