-
Nature Reviews. Genetics Apr 2020 (Review)
Since the early days of the genome era, the scientific community has relied on a single 'reference' genome for each species, which is used as the basis for a wide range of genetic analyses, including studies of variation within and across species. As sequencing costs have dropped, thousands of new genomes have been sequenced, and scientists have come to realize that a single reference genome is inadequate for many purposes. By sampling a diverse set of individuals, one can begin to assemble a pan-genome: a collection of all the DNA sequences that occur in a species. Here we review efforts to create pan-genomes for a range of species, from bacteria to humans, and we further consider the computational methods that have been proposed in order to capture, interpret and compare pan-genome data. As scientists continue to survey and catalogue the genomic variation across human populations and begin to assemble a human pan-genome, these efforts will increase our power to connect variation to human diversity, disease and beyond.
Topics: Genome, Bacterial; Genome, Human; Genome, Plant; Genomics; Humans
PubMed: 32034321
DOI: 10.1038/s41576-020-0210-7
-
Briefings in Bioinformatics Jul 2019 (Review)
For the past 20 years, the Clusters of Orthologous Genes (COG) database has been a popular tool for microbial genome annotation and comparative genomics. Initially created for the evolutionary classification of protein families, the COGs have been used, apart from straightforward functional annotation of sequenced genomes, for such tasks as (i) unification of genome annotation in groups of related organisms; (ii) identification of missing and/or undetected genes in complete microbial genomes; (iii) analysis of genomic neighborhoods, in many cases allowing prediction of novel functional systems; (iv) analysis of metabolic pathways and prediction of alternative forms of enzymes; (v) comparison of organisms by COG functional categories; and (vi) prioritization of targets for structural and functional characterization. Here we review the principles of the COG approach and discuss its key advantages and drawbacks in microbial genome analysis.
Topics: Computational Biology; Databases, Protein; Evolution, Molecular; Genome, Microbial; Genomics; Molecular Sequence Annotation; Multigene Family; Phylogeny; Proteins
PubMed: 28968633
DOI: 10.1093/bib/bbx117
-
Trends in Genetics : TIG Sep 2021 (Review)
The reference genome serves two distinct purposes within the field of genomics. First, it provides a persistent structure against which findings can be reported, allowing for universal knowledge exchange between users. Second, it reduces the computational costs and time required to process genomic data by creating a scaffold that can be relied upon by analysis software. Here, we posit that current efforts to extend the linear reference to a graph-based structure while trying to fulfil both of these purposes concurrently will face a trade-off between comprehensiveness and computational efficiency. In this article, we explore how the reference genome is used and suggest an alternative structure, The Genome Atlas (TGA), to fulfil the bipartite role of the reference genome.
Topics: Computer Graphics; Genetics, Medical; Genome; Genomics; Humans
PubMed: 33419587
DOI: 10.1016/j.tig.2020.12.002
-
Science China. Life Sciences Dec 2018 (Review)
Whole genome engineering is now feasible with the aid of genome editing and synthesis tools. Synthesizing a genome from scratch allows modifications of the genomic structure and function to an extent that was hitherto not possible, which will ultimately lead to new insights into the basic principles of life and enable valuable applications. With several recent genome synthesis projects as examples, this perspective addresses the technical details of synthesizing a genome and the applications of synthetic genomes. A series of ongoing or future synthetic genomics projects, including the different genomes to be synthesized in GP-write, a synthetic minimal genome, a massively recoded genome, a chimeric genome and a synthetic genome with an expanded genetic alphabet, are also discussed, with a special focus on theoretical and technical impediments in the design and synthesis process. With the development of high-efficiency, low-cost genome synthesis and assembly technologies, synthetic genomics will become a commonplace way to engineer pathways and genomes according to arbitrary sets of design principles.
Topics: Genes, Synthetic; Genetic Engineering; Genome; Genomics; Models, Biological; Sequence Analysis; Synthetic Biology
PubMed: 30465231
DOI: 10.1007/s11427-018-9403-y
-
Yi Chuan = Hereditas Nov 2021 (Review)
With the release of high-quality reference genomes assembled from long reads produced by third-generation sequencing technology, as well as extensive re-sequencing and population genetic analysis, researchers have found that a single reference genome does not represent the diversity within a species. Sequences missing from the reference genome result in an incomplete map of population genetic polymorphism. The pan-genome can remedy this deficiency of a single reference genome. It comprises the core genome (responsible for basic biological functions and the main phenotypic characteristics within a species) and the variable genome (related to genetic diversity or biological characteristics). Depending on the proportions of core and variable genome, a pan-genome can be either open or closed. Here, we review current pan-genome studies across a range of species and discuss the characteristics of pan-genomes in various biological groups. The pan-genomes of mammals are more likely to be closed, while the pan-genomes of microbes, angiosperms, and some invertebrates are likely to be open. Pan-genomic studies make it possible to complete the reference genome and obtain complete variation information, which will contribute to the study of the molecular mechanisms of genetic diversity and phenotypic evolution.
Topics: Genome; Genomics
PubMed: 34815206
DOI: 10.16288/j.yczz.21-214
-
Annual Review of Animal Biosciences Feb 2019 (Review)
Affordable, high-throughput DNA sequencing has accelerated the pace of genome assembly over the past decade. Genome assemblies from high-throughput, short-read sequencing, however, are often not as contiguous as the first generation of genome assemblies. Whereas early genome assembly projects were often aided by clone maps or other mapping data, many current assembly projects forego these scaffolding data and only assemble genomes into smaller segments. Recently, new technologies have been invented that allow chromosome-scale assembly at a lower cost and faster speed than traditional methods. Here, we give an overview of the problem of chromosome-scale assembly and traditional methods for tackling this problem. We then review new technologies for chromosome-scale assembly and recent genome projects that used these technologies to create highly contiguous genome assemblies at low cost.
Topics: Animals; Chromosome Mapping; Genome; Genomics; High-Throughput Nucleotide Sequencing; Sequence Analysis, DNA
PubMed: 30485757
DOI: 10.1146/annurev-animal-020518-115344
-
Swiss Medical Weekly Jan 2020 (Review)
Technological advances in the ability to read the human genome have accelerated the speed of sequencing, such that today we can perform whole genome sequencing (WGS) in one day. Until recently, genomic studies have largely been limited to seeking novel scientific discoveries. The application of new insights gained through cancer WGS to the clinical domain has been relatively limited. Looking ahead, a vast amount of data can be generated by genomic studies. Of note, excellent organisation of genomic and clinical data permits the application of machine-learning methods, which can lead to the development of clinical algorithms that could assist future clinicians and genomicists in the analysis and interpretation of individual cancer genomes. Here, we describe what can be gleaned from holistic whole cancer genome profiling and argue that we must build the infrastructure and educational frameworks to support the modern clinical genomicist to prepare for a future where WGS will be the norm.
Topics: Algorithms; Genome, Human; Genomics; Humans; Neoplasms; Whole Genome Sequencing
PubMed: 31986218
DOI: 10.4414/smw.2020.20158
-
DNA Research : An International Journal... Jan 2021 (Review)
Pan-genomic studies aim at representing the entire sequence diversity within a species to provide useful resources for evolutionary studies, functional genomics and breeding of cultivated plants. Cost reductions in high-throughput sequencing and advances in sequence assembly algorithms have made it possible to create multiple reference genomes along with a catalogue of all forms of genetic variations in plant species with large and complex or polyploid genomes. In this review, we summarize the current approaches to building pan-genomes as an in silico representation of plant sequence diversity and outline relevant methods for their effective utilization in linking structural with phenotypic variation. We propose as future research avenues (i) transcriptomic and epigenomic studies across multiple reference genomes and (ii) the development of user-friendly and feature-rich pan-genome browsers.
Topics: Computational Biology; Epigenomics; Gene Expression Profiling; Genetic Variation; Genome, Plant; Genomics; High-Throughput Nucleotide Sequencing; Plants; Sequence Analysis, DNA; Sequence Analysis, RNA; Transcriptome
PubMed: 33484244
DOI: 10.1093/dnares/dsaa030
-
Methods in Molecular Biology (Clifton,... 2012 (Review)
The science of genomes: only within the past few decades have scientists progressed from the analysis of a single or a small number of genes at once to the investigation of thousands of genes, going from the study of the units of inheritance to the investigation of the whole genome of an organism. The science of genomes, or "genomics," initially dedicated to the determination of DNA sequences (the nucleotide order on a given fragment of DNA), has rapidly expanded toward a more functional level, studying the expression profiles and the roles of both genes and proteins. The aim of the chapter is to review some basic assumptions and definitions that are the fabric of genomics, and to elucidate key concepts and approaches on which genomics relies.
Topics: Animals; Chromosome Mapping; Computational Biology; Genome; Genomics; Humans; Sequence Analysis, DNA
PubMed: 22081340
DOI: 10.1007/978-1-60327-216-2_6
-
Methods in Molecular Biology (Clifton,... 2018 (Comparative Study; Review)
Bacteria and archaea, collectively known as prokaryotes, have in general genomes that are much smaller than those of eukaryotes. As a result, thousands of these genomes have been sequenced. In prokaryotes, gene architecture lacks the intron-exon structure of eukaryotic genes (with an occasional exception). These two facts mean that there is an abundance of data for prokaryotic genomes, and that they are easier to study than the more complex eukaryotic genomes. In this chapter, we provide an overview of genome comparison tools that have been developed primarily (sometimes exclusively) for prokaryotic genomes. We cover methods that use only the DNA sequences, methods that use only the gene content, and methods that use both data types.
Topics: Algorithms; Computational Biology; Evolution, Molecular; Genes, Archaeal; Genes, Bacterial; Genome, Archaeal; Genome, Bacterial; Genomics; Phylogeny; Sequence Alignment; Sequence Analysis, DNA; Software
PubMed: 29277863
DOI: 10.1007/978-1-4939-7463-4_3