- BMC Bioinformatics 2012
BACKGROUND
Pedigree genotype datasets are used for analysing genetic inheritance and to map genetic markers and traits. Such datasets consist of hundreds of related animals genotyped for thousands of genetic markers and invariably contain multiple errors in both the pedigree structure and in the associated individual genotype data. These errors manifest as apparent inheritance inconsistencies in the pedigree, and invalidate analyses of marker inheritance patterns across the dataset. Cleaning raw datasets of bad data points (incorrect pedigree relationships, unreliable marker assays, suspect samples, bad genotype results etc.) requires expert exploration of the patterns of exposed inconsistencies in the context of the inheritance pedigree. In order to assist this process we are developing VIPER (Visual Pedigree Explorer), a software tool that integrates an inheritance-checking algorithm with a novel space-efficient pedigree visualisation, so that reported inheritance inconsistencies are overlaid on an interactive, navigable representation of the pedigree structure.
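The kind of inheritance check that such a tool relies on can be illustrated with a minimal sketch. This is not VIPER's algorithm, which the abstract does not specify. The data layout (a `pedigree` dict mapping each animal to its sire and dam, and a `genotypes` dict of allele pairs per marker) and the function names are assumptions made for illustration only.

```python
def is_consistent(child, sire=None, dam=None):
    """Check one marker: can the child's two alleles come one from each parent?

    Genotypes are 2-tuples of alleles, e.g. ("A", "G"); None means missing
    (hypothetical encoding). A missing parent genotype never causes an
    inconsistency.
    """
    if child is None:
        return True
    sire_alleles = set(sire) if sire is not None else None
    dam_alleles = set(dam) if dam is not None else None

    def ok(from_sire, from_dam):
        return ((sire_alleles is None or from_sire in sire_alleles) and
                (dam_alleles is None or from_dam in dam_alleles))

    a, b = child
    # Try both assignments of the child's alleles to the two parents.
    return ok(a, b) or ok(b, a)


def find_inconsistencies(pedigree, genotypes):
    """Return (animal, marker) pairs whose genotype conflicts with the parents.

    pedigree:  {animal: (sire_id or None, dam_id or None)}
    genotypes: {animal: {marker: (allele, allele) or None}}
    """
    errors = []
    for animal, (sire, dam) in pedigree.items():
        for marker, child_gt in genotypes.get(animal, {}).items():
            sire_gt = genotypes.get(sire, {}).get(marker)
            dam_gt = genotypes.get(dam, {}).get(marker)
            if not is_consistent(child_gt, sire_gt, dam_gt):
                errors.append((animal, marker))
    return errors
```

A reported pair only says that the trio disagrees at that marker. Deciding whether the cause is a wrong pedigree link, a bad assay or a bad sample is the exploratory step the abstract assigns to the domain expert.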
METHODS AND RESULTS
This paper describes an evaluation of how VIPER displays the different scales and types of dataset that occur experimentally, with a description of how VIPER's display interface and functionality meet the challenges presented by such data. We examine a range of possible error types found in real and simulated pedigree genotype datasets, demonstrating how these errors are exposed and explored using the VIPER interface, and we evaluate the utility and usability of the interface to the domain expert. Evaluation was performed as a two-stage process with the assistance of domain experts (geneticists). The initial evaluation drove the iterative implementation of further features in the software prototype, as required by the users, prior to a final functional evaluation of the pedigree display for exploring the various error types, data scales and structures.
CONCLUSIONS
The VIPER display was shown to effectively expose the range of errors found in experimental genotyped pedigrees, allowing users to explore the underlying causes of reported inheritance inconsistencies. This interface will provide the basis for a full data cleaning tool that will allow the user to remove isolated bad data points, and reversibly test the effect of removing suspect genotypes and pedigree relationships.
Topics: Algorithms; Animals; Animals, Domestic; Computational Biology; Female; Genetic Markers; Genotype; Humans; Male; Pedigree; Software
PubMed: 22607476
DOI: 10.1186/1471-2105-13-S8-S5
- Molecular Ecology Resources Sep 2017
Data on hundreds or thousands of single nucleotide polymorphisms (SNPs) provide detailed information about the relationships between individuals, but currently few tools can turn this information into a multigenerational pedigree. I present the R package sequoia, which assigns parents, clusters half-siblings sharing an unsampled parent and assigns grandparents to half-sibships. Assignments are made after consideration of the likelihoods of all possible first-, second- and third-degree relationships between the focal individuals, as well as the traditional alternative of being unrelated. This careful exploration of the local likelihood surface is implemented in a fast, heuristic hill-climbing algorithm. Distinction between the various categories of second-degree relatives is possible when likelihoods are calculated conditional on at least one parent of each focal individual. Performance was tested on simulated data sets with realistic genotyping error rate and missingness, based on three different large pedigrees (N = 1000-2000). This included a complex pedigree with overlapping generations, occasional close inbreeding and some unknown birth years. Parentage assignment was highly accurate down to about 100 independent SNPs (error rate <0.1%) and fast (<1 min), as most pairs can be excluded from being parent-offspring based on opposite homozygosity. For full pedigree reconstruction, 40% of parents were assumed nongenotyped. Reconstruction resulted in low error rates (<0.3%) and high assignment rates (>99%) in limited computation time (typically <1 h) when at least 200 independent SNPs were used. In three empirical data sets, relatedness estimated from the inferred pedigree was strongly correlated to genomic relatedness.
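The opposite-homozygosity filter mentioned above can be sketched as follows. This is an illustration in Python, not code from the sequoia package. The 0/1/2 allele-count coding, the use of `None` for missing calls, the function names and the `max_mismatch` tolerance for genotyping error are all assumptions.

```python
def opposite_homozygous_count(geno_a, geno_b):
    """Count SNPs where the two individuals are homozygous for different alleles.

    Genotypes are lists of allele counts (0, 1 or 2 copies of a reference
    allele); None marks a missing call and is skipped (assumed encoding).
    """
    return sum(
        1
        for a, b in zip(geno_a, geno_b)
        if a is not None and b is not None and {a, b} == {0, 2}
    )


def candidate_parents(offspring, candidates, max_mismatch=1):
    """Keep candidates that are not excluded as a parent of `offspring`.

    A true parent-offspring pair shares at least one allele at every SNP, so
    opposite homozygotes can only arise from genotyping error or mutation.
    `max_mismatch` (hypothetical parameter) tolerates a few such errors.
    """
    return [
        name
        for name, geno in candidates.items()
        if opposite_homozygous_count(offspring, geno) <= max_mismatch
    ]
```

Because the count needs only one comparison per SNP, a filter of this kind can discard most pairs cheaply before any likelihoods are computed, which is consistent with the speed reported in the abstract.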
Topics: Animals; Cluster Analysis; Computational Biology; Genotyping Techniques; Humans; Pedigree; Polymorphism, Single Nucleotide
PubMed: 28271620
DOI: 10.1111/1755-0998.12665
- Genetic Epidemiology Jul 2011
The need to collect accurate and complete pedigree information has been a drawback of family-based linkage and association studies. Even in case-control studies, investigators should be aware of, and condition on, familial relationships. In single nucleotide polymorphism (SNP) genome scans, relatedness can be directly inferred from the genetic data rather than determined through interviews. Various methods of estimating relatedness have previously been implemented, most notably in PLINK. We present new fast and accurate algorithms for estimating global and local kinship coefficients from dense SNP genotypes. These algorithms require only a single pass through the SNP genotype data. We also show that these estimates can be used to cluster individuals into pedigrees. With these estimates in hand, quantitative trait locus linkage analysis proceeds via traditional variance components methods without any prior relationship information. We demonstrate the success of our algorithms on simulated and real data sets. Our procedures make linkage analysis as easy as a typical genomewide association study.
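To make the idea of estimating kinship directly from SNP genotypes concrete, here is a minimal sketch of a widely used moment estimator (half the standardized genotype covariance). It is not the algorithm presented in the paper, whose details the abstract does not give. The 0/1/2 genotype coding, the assumption that allele frequencies are supplied, and the function name are all illustrative.

```python
def kinship_estimate(geno_i, geno_j, allele_freqs):
    """Moment estimate of the kinship coefficient between two individuals.

    geno_i, geno_j: allele counts (0, 1, 2) per SNP; None = missing.
    allele_freqs:   frequency p of the counted allele at each SNP
                    (assumed known, e.g. estimated from the whole sample).

    For each SNP the product of centred genotypes is scaled by the
    binomial variance 2p(1-p); half the average is the kinship estimate.
    """
    total = 0.0
    used = 0
    for x_i, x_j, p in zip(geno_i, geno_j, allele_freqs):
        if x_i is None or x_j is None or p <= 0.0 or p >= 1.0:
            continue  # skip missing calls and monomorphic SNPs
        total += (x_i - 2 * p) * (x_j - 2 * p) / (2 * p * (1 - p))
        used += 1
    if used == 0:
        raise ValueError("no informative SNPs shared by the pair")
    return 0.5 * total / used
```

In expectation, and given accurate allele frequencies, the estimate is close to 0 for unrelated pairs and close to 0.25 for parent-offspring or full-sib pairs. Individual estimates are noisy unless many SNPs are used, and thresholds on such values are one simple way to group individuals into families.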
Topics: Algorithms; Alleles; Databases, Genetic; Female; Genetic Linkage; Genome-Wide Association Study; Humans; Male; Models, Genetic; Models, Statistical; Pedigree; Polymorphism, Single Nucleotide; Quantitative Trait Loci
PubMed: 21465549
DOI: 10.1002/gepi.20584
- Molecular Ecology Jan 2022
Over the past 50 years conservation genetics has developed a substantive toolbox to inform species management. One of the most long-standing tools available to manage genetics-the pedigree-has been widely used to characterize diversity and maximize evolutionary potential in threatened populations. Now, with the ability to use high throughput sequencing to estimate relatedness, inbreeding, and genome-wide functional diversity, some have asked whether it is warranted for conservation biologists to continue collecting and collating pedigrees for species management. In this perspective, we argue that pedigrees remain a relevant tool, and when combined with genomic data, create an invaluable resource for conservation genomic management. Genomic data can address pedigree pitfalls (e.g., founder relatedness, missing data, uncertainty), and in return robust pedigrees allow for more nuanced research design, including well-informed sampling strategies and quantitative analyses (e.g., heritability, linkage) to better inform genomic inquiry. We further contend that building and maintaining pedigrees provides an opportunity to strengthen trusted relationships among conservation researchers, practitioners, Indigenous Peoples, and Local Communities.
Topics: Conservation of Natural Resources; Genetics, Population; Genome; Genomics; Inbreeding; Pedigree
PubMed: 34553796
DOI: 10.1111/mec.16192
- Beijing Da Xue Xue Bao. Yi Xue Ban =... Jun 2023
OBJECTIVE
To use the baseline data of the Beijing Fangshan Family Cohort Study to estimate whether the association between a healthy lifestyle and arterial stiffness might be modified by genetic effects.
METHODS
Probands and their relatives from 9 rural areas in Fangshan district, Beijing were included in this study. We developed a healthy lifestyle score based on five lifestyle behaviors: smoking, alcohol consumption, body mass index (BMI), dietary pattern, and physical activity. The measurements of arterial stiffness were brachial-ankle pulse wave velocity (baPWV) and ankle-brachial index (ABI). A variance component model was used to determine the heritability of arterial stiffness. Genotype-environment interaction effects were estimated by maximum likelihood methods. Subsequently, 45 candidate single nucleotide polymorphisms (SNPs) located in the glycolipid metabolism pathway were selected, and generalized estimating equations were used to assess the gene-environment interaction effects between particular genetic loci and healthy lifestyles.
RESULTS
A total of 6 302 study subjects across 3 225 pedigrees were enrolled in this study, with a mean age of 56.9 years and 45.1% male. Heritability of baPWV and ABI was 0.360 (95% CI: 0.302-0.418) and 0.243 (95% CI: 0.175-0.311), respectively. Significant genotype-healthy diet interaction on baPWV and genotype-BMI interaction on ABI were observed. Following the findings of genotype-environment interaction analysis, we further identified two SNPs located in and might modify the association between healthy dietary pattern and arterial stiffness, indicating that adherence to a healthy dietary pattern might attenuate the genetic risk on arterial stiffness. Three SNPs in , and were shown to interact with BMI, implying that maintaining BMI within a healthy range might decrease the genetic risk of arterial stiffness.
CONCLUSION
The current study discovered that genotype-healthy dietary pattern and genotype-BMI interactions might affect the risk of arterial stiffness. Furthermore, we identified five genetic loci that might modify the relationship between healthy dietary pattern and BMI with arterial stiffness. Our findings suggested that a healthy lifestyle may reduce the genetic risk of arterial stiffness. This study has laid the groundwork for future research exploring mechanisms of arterial stiffness.
Topics: Humans; Male; Middle Aged; Female; Ankle Brachial Index; Cohort Studies; Gene-Environment Interaction; Vascular Stiffness; Pedigree; Pulse Wave Analysis; Genotype
PubMed: 37291913
DOI: 10.19723/j.issn.1671-167X.2023.03.003
- PLoS Genetics May 2019
The rapid digitization of genealogical and medical records enables the assembly of extremely large pedigree records spanning millions of individuals and trillions of pairs of relatives. Such pedigrees provide the opportunity to investigate the sociological and epidemiological history of human populations in scales much larger than previously possible. Linear mixed models (LMMs) are routinely used to analyze extremely large animal and plant pedigrees for the purposes of selective breeding. However, LMMs have not been previously applied to analyze population-scale human family trees. Here, we present Sparse Cholesky factorIzation LMM (Sci-LMM), a modeling framework for studying population-scale family trees that combines techniques from the animal and plant breeding literature and from human genetics literature. The proposed framework can construct a matrix of relationships between trillions of pairs of individuals and fit the corresponding LMM in several hours. We demonstrate the capabilities of Sci-LMM via simulation studies and by estimating the heritability of longevity and of reproductive fitness (quantified via number of children) in a large pedigree spanning millions of individuals and over five centuries of human history. Sci-LMM provides a unified framework for investigating the epidemiological history of human populations via genealogical records.
Topics: Animals; Computer Simulation; Female; Genealogy and Heraldry; Genetic Fitness; Genetics, Population; Humans; Linear Models; Longevity; Male; Models, Genetic; Pedigree; Plants
PubMed: 31071088
DOI: 10.1371/journal.pgen.1008124
- Journal of Dairy Science Nov 2019
The objectives of this study were to investigate bias in genomic predictions for dairy cattle and to find a practical approach to reduce the bias. The simulated data included phenotypes, pedigrees, and genotypes, mimicking a dairy cattle population (i.e., cows with phenotypes and bulls with no phenotypes) and assuming selection by breeding values or no selection. With the simulated data, genomic estimated breeding values (GEBV) were calculated with a single-step genomic BLUP and compared with true breeding values. Phenotypes and genotypes were simulated in 10 generations and in the last 4 generations, respectively. Phenotypes in the last generation were removed to predict breeding values for those individuals using only genomic and pedigree information. Complete pedigrees and incomplete pedigrees with 50% missing dams were created to construct the pedigree-based relationship matrix with and without inbreeding. With missing dams, unknown parent groups (UPG) were assigned in relationship matrices. Regression coefficients (b) and coefficients of determination (R²) of true breeding values on (G)EBV were calculated to investigate inflation and accuracy in GEBV for genotyped animals, respectively. In addition to the simulation study, 18 linear type traits of US Holsteins were examined. For the 18 type traits, b and R² of GEBV with full data sets on GEBV with partial data sets for young genotyped bulls were calculated. The results from the simulation study indicated inflation in GEBV for genotyped males that were evaluated with only pedigree and genomic information under BLUP selection. However, when UPG for only pedigree-based relationships were included, the inflation was reduced, accuracy was highest, and genetic trends had no bias. For the linear type traits, when UPG for only pedigree-based relationships were included, the results were generally in agreement with those from the simulation study, implying less bias in genetic trends. However, when including no UPG, UPG in pedigree-based relationships, or UPG in genomic relationships, inflation and accuracy in GEBV were similar. The results from the simulation and type traits suggest that UPG must be defined accurately to be estimable and inbreeding should be included in pedigree-based relationships. In dairy cattle, known pedigree information with inbreeding and estimable UPG plays an important role in improving compatibility between pedigree-based and genomic relationship matrices, resulting in more reliable genomic predictions.
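The two validation statistics used in this abstract, the regression coefficient b and the coefficient of determination of true breeding values on GEBV, can be computed as in the following minimal sketch. The function name and the plain-list inputs are assumptions, and the study's own software is not described in the abstract.

```python
def regress_true_on_predicted(true_bv, gebv):
    """Simple linear regression of true breeding values on predictions.

    Returns (b, r2):
      b  - slope of true_bv on gebv; a value below 1 means the predictions
           are spread too widely, i.e. inflated
      r2 - squared correlation, used as a measure of accuracy
    """
    n = len(gebv)
    if n != len(true_bv) or n < 2:
        raise ValueError("need two equally long series with at least 2 values")
    mean_x = sum(gebv) / n
    mean_y = sum(true_bv) / n
    sxx = sum((x - mean_x) ** 2 for x in gebv)
    syy = sum((y - mean_y) ** 2 for y in true_bv)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(gebv, true_bv))
    b = sxy / sxx
    r2 = (sxy * sxy) / (sxx * syy)
    return b, r2
```

For example, predictions that are exactly twice the true values give b = 0.5 with a squared correlation of 1: the ranking is perfect but the predictions are inflated.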
Topics: Animals; Bias; Cattle; Female; Genotype; Male; Models, Genetic; Pedigree; Phenotype; Selective Breeding
PubMed: 31495630
DOI: 10.3168/jds.2019-16789
- Human Genetics Oct 2012 (Review)
Rare variation is the current frontier in human genetics. The large pedigree design is practical, efficient, and well-suited for investigating rare variation. In large pedigrees, specific rare variants that co-segregate with a trait will occur in sufficient numbers so that effects can be measured, and evidence for association can be evaluated, by making use of methods that fully use the pedigree information. Evidence from linkage analysis can focus investigation, both reducing the multiple testing burden and expanding the variants that can be evaluated and followed up, as recent studies have shown. The large pedigree design requires only a small fraction of the sample size needed to identify rare variants of interest in population-based designs, and many highly suitable, well-understood, and available statistical and computational tools already exist. Samples consisting of large pedigrees with existing rich phenotype and genome scan data should be prime candidates for high-throughput sequencing in the search of the determinants of complex traits.
Topics: Computational Biology; Genetic Association Studies; Genetic Linkage; Genetic Variation; High-Throughput Nucleotide Sequencing; Humans; Pedigree
PubMed: 22714655
DOI: 10.1007/s00439-012-1190-2
- Journal of Animal Science May 2022
This study investigated using imputed genotypes from non-genotyped animals which were not in the pedigree for the purpose of genetic selection and improving genetic gain for economically relevant traits. Simulations were used to mimic a 3-breed crossbreeding system that resembled a modern swine breeding scheme. The simulation consisted of three purebred (PB) breeds A, B, and C each with 25 and 425 mating males and females, respectively. Males from A and females from B were crossed to produce AB females (n = 1,000), which were crossed with males from C to produce crossbreds (CB; n = 10,000). The genome consisted of three chromosomes with 300 quantitative trait loci and ~9,000 markers. Lowly heritable reproductive traits were simulated for A, B, and AB (h² = 0.2, 0.2, and 0.15, respectively), whereas a moderately heritable carcass trait was simulated for C (h² = 0.4). Genetic correlations between reproductive traits in A, B, and AB were moderate (rg = 0.65). The goal trait of the breeding program was AB performance. Selection was practiced for four generations where AB and CB animals were first produced in generations 1 and 2, respectively. Non-genotyped AB dams were imputed using FImpute beginning in generation 2. Genotypes of PB and CB were used for imputation. Imputation strategies differed by three factors: 1) AB progeny genotyped per generation (2, 3, 4, or 6), 2) known or unknown mates of AB dams, and 3) genotyping rate of females from breeds A and B (0% or 100%). PB selection candidates from A and B were selected using estimated breeding values for AB performance, whereas candidates from C were selected by phenotype. Response to selection using imputed genotypes of non-genotyped animals was then compared to the scenarios where true AB genotypes (trueGeno) or no AB genotypes/phenotypes (noGeno) were used in genetic evaluations. The simulation was replicated 20 times. The average increase in genotype concordance between unknown and known sire imputation strategies was 0.22. Genotype concordance increased as the number of genotyped CB increased with little additional gain beyond 9 progeny. When mates of AB were known and more than 4 progeny were genotyped per generation, the phenotypic response in AB did not differ (P > 0.05) from trueGeno yet was greater (P < 0.05) than noGeno. Imputed genotypes of non-genotyped animals can be used to increase performance when 4 or more progeny are genotyped and sire pedigrees of CB animals are known.
Topics: Animals; Female; Genotype; Hybridization, Genetic; Male; Models, Genetic; Pedigree; Phenotype; Polymorphism, Single Nucleotide; Quantitative Trait Loci; Swine
PubMed: 35451025
DOI: 10.1093/jas/skac148
- Heredity Mar 2017
The proportion of an individual's genome that is identical by descent (GWIBD) can be estimated from pedigrees (inbreeding coefficient 'Pedigree F') or molecular markers ('Marker F'), but both estimators come with error. Assuming unrelated pedigree founders, Pedigree F is the expected proportion of GWIBD given a specific inbreeding constellation. Meiotic recombination introduces variation around that expectation (Mendelian noise) and related pedigree founders systematically bias Pedigree F downward. Marker F is an estimate of the actual proportion of GWIBD but it suffers from the sampling error of markers plus the error that occurs when a marker is homozygous without reflecting common ancestry (identical by state). We here show via simulation of a zebra finch and a human linkage map that three aspects of meiotic recombination (independent assortment of chromosomes, number of crossovers and their distribution along chromosomes) contribute to variation in GWIBD and thus the precision of Pedigree and Marker F. In zebra finches, where the genome contains large blocks that are rarely broken up by recombination, the Mendelian noise was large (nearly twofold larger s.d. values compared with humans) and Pedigree F thus less precise than in humans, where crossovers are distributed more uniformly along chromosomes. Effects of meiotic recombination on Marker F were reversed, such that the same number of molecular markers yielded more precise estimates of GWIBD in zebra finches than in humans. As a consequence, in species inheriting large blocks that rarely recombine, even small numbers of microsatellite markers will often be more informative about inbreeding and fitness than large pedigrees.
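Pedigree F, as defined above under the assumption of unrelated founders, equals the kinship coefficient between an individual's parents. The following sketch computes it with the standard recursive kinship definition. The input layout (a list of `(id, sire, dam)` tuples with parents listed before their offspring and `None` for unknown parents) and the function name are assumptions made for illustration, and this is not the simulation code used in the study.

```python
from functools import lru_cache


def inbreeding_coefficients(pedigree):
    """Compute Pedigree F for every individual.

    pedigree: list of (id, sire, dam); parents must appear earlier in the
    list than their offspring, and founders have sire = dam = None.
    Founders are treated as unrelated and non-inbred.
    """
    order = {ind: k for k, (ind, _, _) in enumerate(pedigree)}
    parents = {ind: (sire, dam) for ind, sire, dam in pedigree}

    @lru_cache(maxsize=None)
    def kinship(a, b):
        if a is None or b is None:
            return 0.0  # unknown parent: assumed unrelated to everyone
        if a == b:
            sire, dam = parents[a]
            return 0.5 * (1.0 + kinship(sire, dam))
        if order[a] < order[b]:
            a, b = b, a  # always expand the later-born individual
        sire, dam = parents[a]
        return 0.5 * (kinship(sire, b) + kinship(dam, b))

    return {ind: kinship(sire, dam) for ind, sire, dam in pedigree}
```

For the offspring of a full-sib mating this gives F = 0.25, the expectation around which the realised proportion of the genome that is identical by descent varies because of Mendelian noise.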
Topics: Animals; Chromosome Mapping; Finches; Genetic Linkage; Genetic Markers; Genotyping Techniques; Homozygote; Humans; Inbreeding; Meiosis; Microsatellite Repeats; Pedigree; Recombination, Genetic
PubMed: 27804967
DOI: 10.1038/hdy.2016.95