-
Molecular Plant Jan 2023 (Review)
Plant genomes are so highly diverse that a substantial proportion of genomic sequences are not shared among individuals. The variable DNA sequences, along with the conserved core sequences, compose the more sophisticated pan-genome that represents the collection of all non-redundant DNA in a species. With rapid progress in genome sequencing technologies, pan-genome research in plants is now accelerating. Here we review recent advances in plant pan-genomics, including major driving forces of structural variations that constitute the variable sequences, methodological innovations for representing the pan-genome, and major successes in constructing plant pan-genomes. We also summarize recent efforts toward decoding the remaining dark matter in telomere-to-telomere or gapless plant genomes. These new genome resources, which have remarkable advantages over numerous previously assembled less-than-perfect genomes, are expected to become new references for genetic studies and plant breeding.
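The core/variable split described in this abstract can be illustrated with plain set operations. The following is only a minimal sketch: the accession names and sequence identifiers are hypothetical, and real pan-genome construction works on assembled sequences or graphs rather than pre-labelled IDs.

```python
def partition_pan_genome(accessions: dict[str, set[str]]):
    """Split sequence IDs into pan, core, and variable sets.

    `accessions` maps a (hypothetical) accession name to the set of
    non-redundant sequence IDs found in that individual.
    """
    sets = list(accessions.values())
    pan = set().union(*sets)                           # everything seen in any individual
    core = set.intersection(*sets) if sets else set()  # shared by all individuals
    variable = pan - core                              # absent from at least one individual
    return pan, core, variable


# Hypothetical toy data: three accessions, four sequence blocks.
toy = {
    "acc_A": {"s1", "s2", "s3"},
    "acc_B": {"s1", "s2", "s4"},
    "acc_C": {"s1", "s2"},
}
pan, core, variable = partition_pan_genome(toy)
print(sorted(pan), sorted(core), sorted(variable))
```

Here the pan-genome is the union of all sets, the core is their intersection, and the variable portion is whatever remains.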
Topics: Genomics; Genome, Plant; Chromosome Mapping
PubMed: 36523157
DOI: 10.1016/j.molp.2022.12.009
-
Nature Reviews. Genetics Apr 2020 (Review)
Since the early days of the genome era, the scientific community has relied on a single 'reference' genome for each species, which is used as the basis for a wide range of genetic analyses, including studies of variation within and across species. As sequencing costs have dropped, thousands of new genomes have been sequenced, and scientists have come to realize that a single reference genome is inadequate for many purposes. By sampling a diverse set of individuals, one can begin to assemble a pan-genome: a collection of all the DNA sequences that occur in a species. Here we review efforts to create pan-genomes for a range of species, from bacteria to humans, and we further consider the computational methods that have been proposed in order to capture, interpret and compare pan-genome data. As scientists continue to survey and catalogue the genomic variation across human populations and begin to assemble a human pan-genome, these efforts will increase our power to connect variation to human diversity, disease and beyond.
Topics: Genome, Bacterial; Genome, Human; Genome, Plant; Genomics; Humans
PubMed: 32034321
DOI: 10.1038/s41576-020-0210-7
-
Briefings in Bioinformatics Jul 2019 (Review)
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the purpose of evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
Topics: Computational Biology; Databases, Protein; Evolution, Molecular; Genome, Microbial; Genomics; Molecular Sequence Annotation; Multigene Family; Phylogeny; Proteins
PubMed: 28968633
DOI: 10.1093/bib/bbx117
-
Nature Medicine Feb 2022 (Review)
Two decades ago, the sequence of the first human genome was published. Since then, advances in genome technologies have resulted in whole-genome sequencing and microarray-based genotyping of millions of human genomes. However, genetic and genomic studies are predominantly based on populations of European ancestry. As a result, the potential benefits of genomic research-including better understanding of disease etiology, early detection and diagnosis, rational drug design and improved clinical care-may elude the many underrepresented populations. Here, we describe factors that have contributed to the imbalance in representation of different populations and, leveraging our experiences in setting up genomic studies in diverse global populations, we propose a roadmap to enhancing inclusion and ensuring equal health benefits of genomics advances. Our Perspective highlights the importance of sincere, concerted global efforts toward genomic equity to ensure the benefits of genomic medicine are accessible to all.
Topics: Genome, Human; Genomics; Humans; Whole Genome Sequencing
PubMed: 35145307
DOI: 10.1038/s41591-021-01672-4
-
Nucleic Acids Research Jan 2023
KEGG (https://www.kegg.jp) is a manually curated database resource integrating various biological objects categorized into systems, genomic, chemical and health information. Each object (database entry) is identified by the KEGG identifier (kid), which generally takes the form of a prefix followed by a five-digit number, and can be retrieved by appending /entry/kid in the URL. The KEGG pathway map viewer, the Brite hierarchy viewer and the newly released KEGG genome browser can be launched by appending /pathway/kid, /brite/kid and /genome/kid, respectively, in the URL. Together with an improved annotation procedure for KO (KEGG Orthology) assignment, an increasing number of eukaryotic genomes have been included in KEGG for better representation of organisms in the taxonomic tree. Multiple taxonomy files are generated for classification of KEGG organisms and viruses, and the Brite hierarchy viewer is used for taxonomy mapping, a variant of Brite mapping in the new KEGG Mapper suite. The taxonomy mapping enables analysis of, for example, how functional links of genes in the pathway and physical links of genes on the chromosome are conserved among organism groups.
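The URL scheme described in this abstract (appending /entry/kid, /pathway/kid, /brite/kid or /genome/kid to the site URL) can be sketched as a small helper. This is a minimal illustration built only from the abstract's wording: the function name `kegg_url` is hypothetical, the example identifiers are illustrative, and no request is made to check that a given page exists.

```python
KEGG_BASE = "https://www.kegg.jp"

# Viewer path segments named in the abstract.
VIEWS = ("entry", "pathway", "brite", "genome")


def kegg_url(view: str, kid: str) -> str:
    """Build a KEGG URL by appending /<view>/<kid> to the base URL.

    `kid` is a KEGG identifier, generally a prefix followed by a
    five-digit number. This helper only formats the string.
    """
    if view not in VIEWS:
        raise ValueError(f"unknown view {view!r}; expected one of {VIEWS}")
    return f"{KEGG_BASE}/{view}/{kid}"


# Illustrative identifiers (assumed examples, not taken from the abstract).
print(kegg_url("entry", "K00001"))      # https://www.kegg.jp/entry/K00001
print(kegg_url("pathway", "map00010"))  # https://www.kegg.jp/pathway/map00010
```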
Topics: Genome; Genomics; Databases, Factual; Databases, Genetic
PubMed: 36300620
DOI: 10.1093/nar/gkac963
-
Genome Research May 2017
While metagenomics has emerged as a technology of choice for analyzing bacterial populations, the assembly of metagenomic data remains challenging, thus stifling biological discoveries. Moreover, recent studies revealed that complex bacterial populations may be composed from dozens of related strains, thus further amplifying the challenge of metagenomic assembly. metaSPAdes addresses various challenges of metagenomic assembly by capitalizing on computational ideas that proved to be useful in assemblies of single cells and highly polymorphic diploid genomes. We benchmark metaSPAdes against other state-of-the-art metagenome assemblers and demonstrate that it results in high-quality assemblies across diverse data sets.
Topics: Contig Mapping; Genome, Bacterial; Genomics; Metagenome; Sequence Analysis, DNA; Software
PubMed: 28298430
DOI: 10.1101/gr.213959.116
-
Genome Research May 2022
The concept of the pan-genome, the collection of all genomes from a population, has shown great potential in genomics, especially in crop science. The rice pan-genome constructed from second-generation sequencing (SGS) data is about 270 Mb larger than the rice reference genome (NipRG), but it is still disadvantaged by incompleteness and loss of genomic context. Third-generation sequencing (TGS) with long reads can help to construct better pan-genomes. In this paper, we report a high-quality rice pan-genome construction method that introduces a series of new steps for handling long-read data, including unmapped sequence block filtering, redundancy removal, and sequence block elongation. Compared to NipRG, the long-read sequencing-based pan-genome constructed from 105 rice accessions contains 604 Mb of novel sequence and is much more comprehensive than the one constructed from ∼3000 rice genomes sequenced with short reads. Repetitive sequences are the main component of the novel sequences, which partially explains the differences between the pan-genomes based on TGS and SGS. With six wild rice accessions added, the rice pan-genome contains about 879 Mb of novel sequence and 19,000 novel genes in total. In addition, we have created high-quality reference genomes for all representative rice populations, including five gapless reference genomes. This study has made significant progress in our understanding of the rice pan-genome, and this pan-genome construction method for long-read data can be applied to accelerate a broad range of genomics studies.
Topics: Genome; Genomics; High-Throughput Nucleotide Sequencing; Oryza; Sequence Analysis, DNA
PubMed: 35396275
DOI: 10.1101/gr.276015.121
-
Current Opinion in Plant Biology Apr 2020 (Review)
Plant genomes span several orders of magnitude in size, vary in levels of ploidy and heterozygosity, and contain old and recent bursts of transposable elements, which render them challenging but interesting to assemble. Recent advances in single molecule sequencing and physical mapping technologies have enabled high-quality, chromosome scale assemblies of plant species with increasing complexity and size. Single molecule reads can now exceed megabases in length, providing unprecedented opportunities to untangle genomic regions missed by short read technologies. However, polyploid and heterozygous plant genomes are still difficult to assemble but provide opportunities for new tools and approaches. Haplotype phasing, structural variant analysis and de novo pan-genomics are the emerging frontiers in plant genome assembly.
Topics: DNA Transposable Elements; Genome, Plant; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
PubMed: 31981929
DOI: 10.1016/j.pbi.2019.12.009
-
Journal of Genetics and Genomics = Yi... Sep 2022 (Review)
Pan-genomics can encompass most of the genetic diversity of a species or population and has proved to be a powerful tool for studying genomic evolution and the origin and domestication of species, and for providing information for plant improvement. Plant genomics has greatly progressed because of improvements in sequencing technologies and the rapid reduction of sequencing costs. Nevertheless, pan-genomics still presents many challenges, including computationally intensive assembly methods, high costs with large numbers of samples, ineffective integration of big data, and difficulty in applying it to downstream multi-omics analysis and breeding research. In this review, we summarize the definition and recent achievements of plant pan-genomics, computational technologies used for pan-genome construction, and the applications of pan-genomes in plant genomics and molecular breeding. We also discuss challenges and perspectives for future pan-genomics studies and provide a detailed pipeline for sample selection, genome assembly and annotation, structural variation identification, and construction and application of graph-based pan-genomes. The aim is to provide important guidance for plant pan-genome research and a better understanding of the genetic basis of genome evolution, crop domestication, and phenotypic diversity for future studies.
Topics: Domestication; Genome, Plant; Genomics
PubMed: 35750315
DOI: 10.1016/j.jgg.2022.06.004
-
Proceedings of the National Academy of... Jan 2022
Genomics encompasses the entire tree of life, both extinct and extant, and the evolutionary processes that shape this diversity. To date, genomic research has focused on humans, a small number of agricultural species, and established laboratory models. Fewer than 18,000 of ∼2,000,000 eukaryotic species (<1%) have a representative genome sequence in GenBank, and only a fraction of these have ancillary information on genome structure, genetic variation, gene expression, epigenetic modifications, and population diversity. This imbalance reflects a perception that human studies are paramount in disease research. Yet understanding how genomes work, and how genetic variation shapes phenotypes, requires a broad view that embraces the vast diversity of life. We have the technology to collect massive and exquisitely detailed datasets about the world, but expertise is siloed into distinct fields. A new approach, integrating comparative genomics with cell and evolutionary biology, ecology, archaeology, anthropology, and conservation biology, is essential for understanding and protecting ourselves and our world. Here, we describe potential for scientific discovery when comparative genomics works in close collaboration with a broad range of fields as well as the technical, scientific, and social constraints that must be addressed.
Topics: Animals; Biodiversity; Biological Evolution; Evolution, Molecular; Genetic Variation; Genome; Genomics; Humans; Phylogeny
PubMed: 35042807
DOI: 10.1073/pnas.2115644119