-
Genome Medicine May 2024
BACKGROUND
Polygenic prediction studies in continental Africans are scarce. Africa's genetic and environmental diversity poses a challenge that limits the generalizability of polygenic risk scores (PRS) for body mass index (BMI) within the continent. Studies to understand the factors that affect PRS variability within Africa are required.
METHODS
Using the first multi-ancestry genome-wide association study (GWAS) meta-analysis for BMI involving continental Africans, we derived a multi-ancestry PRS and compared its performance to a European ancestry-specific PRS in continental Africans (AWI-Gen study) and a European cohort (Estonian Biobank). We then evaluated the factors affecting the performance of the PRS in Africans which included fine-mapping resolution, allele frequencies, linkage disequilibrium patterns, and PRS-environment interactions.
RESULTS
Polygenic prediction of BMI in continental Africans is poor compared to that in European ancestry individuals. However, we show that the multi-ancestry PRS is more predictive than the European ancestry-specific PRS due to its improved fine-mapping resolution. We noted regional variation in polygenic prediction across Africa's East, South, and West regions, which was driven by a complex interplay of the PRS with environmental factors, such as physical activity, smoking, alcohol intake, and socioeconomic status.
CONCLUSIONS
Our findings highlight the role of gene-environment interactions in PRS prediction variability in Africa. PRS methods that correct for these interactions, coupled with the increased representation of Africans in GWAS, may improve PRS prediction in Africa.
Topics: Humans; Body Mass Index; Multifactorial Inheritance; Genome-Wide Association Study; Africa; Black People; Polymorphism, Single Nucleotide; White People; Genetic Predisposition to Disease; Gene Frequency; Gene-Environment Interaction; Linkage Disequilibrium; Male; Female
PubMed: 38816834
DOI: 10.1186/s13073-024-01348-x
-
Scientific Reports May 2024
We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We developed a two-model ensemble, consisting of a baseline model, where prediction is based on demographic and clinical variables only, and a genetic model, where we also include PRSs. We evaluate the use of a linear versus a non-linear model at both the baseline and the genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model's performance as percentage variance explained (PVE) on a held-out test dataset. A non-linear baseline model improved the PVEs from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP) compared with a linear baseline model. Including seven PRSs in the genetic model, computed from the largest available GWAS of SBP/DBP, improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5% (DBP) compared with using a single PRS. Adding 14 additional PRSs computed from two independent GWASs further increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, with primarily all non-White groups benefitting from the inclusion of additional PRSs. In summary, non-linear ML models improve BP prediction in models incorporating diverse populations.
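The performance metric used above, percentage variance explained on held-out data, can be illustrated with a short sketch. It assumes the usual convention PVE = 100 × (1 − residual sum of squares / total sum of squares); the abstract does not state the exact formula, and the function name and blood pressure values below are hypothetical.

```python
from typing import Sequence


def percent_variance_explained(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Return 100 * (1 - SS_residual / SS_total) for held-out observations.

    Assumed convention (not confirmed by the abstract): the same quantity as
    the coefficient of determination, expressed as a percentage.
    """
    if len(observed) != len(predicted) or len(observed) < 2:
        raise ValueError("need two or more paired observations and predictions")
    mean_observed = sum(observed) / len(observed)
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    return 100.0 * (1.0 - ss_residual / ss_total)


# Hypothetical held-out systolic readings (mmHg) and model predictions.
observed_sbp = [110.0, 120.0, 130.0, 140.0]
predicted_sbp = [115.0, 120.0, 130.0, 135.0]
pve = percent_variance_explained(observed_sbp, predicted_sbp)
```

Computing the metric on data the model never saw during fitting is what makes the comparison between linear and non-linear models meaningful; an in-sample value would favor the more flexible model by construction.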
Topics: Humans; Machine Learning; Blood Pressure; Multifactorial Inheritance; Phenotype; Genome-Wide Association Study; Risk Factors; Male; Female; Genetic Predisposition to Disease; Models, Genetic; Hypertension; Middle Aged; Genetic Risk Score
PubMed: 38816422
DOI: 10.1038/s41598-024-62945-9
-
Translational Psychiatry May 2024
Substance use disorder (SUD) is a global health problem with a significant impact on individuals and society. The presentation of SUD is diverse, involving various substances, ages at onset, comorbid conditions, and disease trajectories. Current treatments for SUD struggle to address this heterogeneity, resulting in high relapse rates. SUD often co-occurs with other psychiatric and mental health-related conditions that contribute to the heterogeneity of the disorder and predispose to adverse disease trajectories. Family and genetic studies highlight the role of genetic and environmental factors in the course of SUD, and point to a shared genetic liability between SUDs and comorbid psychopathology. In this study, we aimed to disentangle SUD heterogeneity using a deeply phenotyped SUD cohort and polygenic scores (PGSs) for psychiatric disorders and related traits. We explored associations between PGSs and various SUD-related phenotypes, as well as PGS-environment interactions using information on lifetime emotional, physical, and/or sexual abuse. Our results identify clusters of individuals who exhibit differences in their phenotypic profile and reveal different patterns of associations between SUD-related phenotypes and the genetic liability for mental health-related traits, which may help explain part of the heterogeneity observed in SUD. In our SUD sample, we found associations linking the genetic liability for attention-deficit hyperactivity disorder (ADHD) with lower educational attainment, the genetic liability for post-traumatic stress disorder (PTSD) with higher rates of unemployment, the genetic liability for educational attainment with lower rates of criminal records and unemployment, and the genetic liability for well-being with lower rates of outpatient treatments and fewer problems related to family and social relationships. 
We also found evidence of PGS-environment interactions showing that genetic liability for suicide attempts worsened the psychiatric status in SUD individuals with a history of emotional, physical, and/or sexual abuse. Collectively, these data contribute to a better understanding of the role of genetic liability for mental health-related conditions and adverse life experiences in SUD heterogeneity.
Topics: Humans; Substance-Related Disorders; Multifactorial Inheritance; Male; Female; Adult; Phenotype; Genetic Predisposition to Disease; Middle Aged; Genome-Wide Association Study; Gene-Environment Interaction; Young Adult; Comorbidity; Mental Disorders
PubMed: 38811559
DOI: 10.1038/s41398-024-02923-x
-
Nature Communications May 2024
Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide an efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.
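To make the distinction between additive and dominance effects concrete, the following is a minimal sketch of a per-variant score with a dominance deviation term. It assumes the common coding in which the additive term scales with the allele count and the dominance term applies only to heterozygotes. The function and parameter names are hypothetical, and this is not GenoBoost's algorithm or implementation.

```python
from typing import List


def variant_score(allele_count: int, additive: float, dominance: float) -> float:
    """Score one variant from its allele count (0, 1, or 2).

    additive:  effect per copy of the counted allele.
    dominance: extra deviation applied to heterozygotes only (assumed coding).
    """
    if allele_count not in (0, 1, 2):
        raise ValueError("allele_count must be 0, 1, or 2")
    heterozygote = 1 if allele_count == 1 else 0
    return additive * allele_count + dominance * heterozygote


# Purely additive variant: the heterozygote sits midway between homozygotes.
additive_only: List[float] = [variant_score(g, additive=0.5, dominance=0.0) for g in (0, 1, 2)]

# Fully dominant variant (dominance equals additive): one copy scores like two.
fully_dominant: List[float] = [variant_score(g, additive=0.5, dominance=0.5) for g in (0, 1, 2)]
```

A score built only from the additive term cannot represent the second pattern, which is why a model restricted to additive effects may lose accuracy at loci with strong dominance.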
Topics: Multifactorial Inheritance; Humans; Models, Genetic; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease; Autoimmune Diseases; Genes, Dominant; Psoriasis
PubMed: 38811555
DOI: 10.1038/s41467-024-48654-x
-
The New Phytologist Jul 2024
Understanding the genetic basis of how plants defend against pathogens is important to monitor and maintain resilient tree populations. Swiss needle cast (SNC) and Rhabdocline needle cast (RNC) epidemics are responsible for major damage of forest ecosystems in North America. Here we investigate the genetic architecture of tolerance and resistance to needle cast diseases in Douglas-fir (Pseudotsuga menziesii) caused by two fungal pathogens: SNC caused by Nothophaeocryptopus gaeumannii, and RNC caused by Rhabdocline pseudotsugae. We performed case-control genome-wide association analyses and found disease resistance and tolerance in Douglas-fir to be polygenic and under strong selection. We show that stomatal regulation as well as ethylene and jasmonic acid pathways are important for resisting SNC infection, and secondary metabolite pathways play a role in tolerating SNC once the plant is infected. We identify a major transcriptional regulator of plant defense, ERF1, as the top candidate for RNC resistance. Our findings shed light on the highly polygenic architectures underlying fungal disease resistance and tolerance and have important implications for forestry and conservation as the climate changes.
Topics: Disease Resistance; Plant Diseases; Pseudotsuga; Genome-Wide Association Study; Ascomycota; Trees; Adaptation, Physiological; Multifactorial Inheritance; Gene Expression Regulation, Plant; Genes, Plant
PubMed: 38803110
DOI: 10.1111/nph.19797
-
Heredity May 2024
When a population is isolated and composed of few individuals, genetic drift is the paramount evolutionary force and results in the loss of genetic diversity. Inbreeding might also occur, resulting in genomic regions that are identical by descent, manifesting as runs of homozygosity (ROHs) and the expression of recessive traits. Likewise, the genes underlying traits of interest can be revealed by comparing fixed SNPs and divergent haplotypes between affected and unaffected individuals. Populations of white-tailed deer (Odocoileus virginianus) on the islands of Saint Pierre and Miquelon (SPM, France) have high incidences of leucism and malocclusions, both considered genetic defects; on the Florida Keys islands (USA) deer exhibit smaller body sizes, a polygenic trait. Here we aimed to reconstruct island demography and identify the genes associated with these traits in a pseudo case-control design. The two island populations showed reduced levels of genomic diversity and a build-up of deleterious mutations compared to mainland deer; there was also significant genome-wide divergence in Key deer. Key deer showed higher inbreeding levels, but not longer ROHs, consistent with long-term isolation. We identified multiple trait-related genes in ROHs including LAMTOR2, which has links to pigmentation changes, and NPVF, which is linked to craniofacial abnormalities. Our mixed approach of linking ROHs, fixed SNPs and haplotypes matched a high number (~50) of a priori body size candidate genes in Key deer. This suite of biomarkers and candidate genes should prove useful for population monitoring, noting all three phenotypes show patterns consistent with a complex trait and non-Mendelian inheritance.
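Runs of homozygosity, as described above, are stretches of consecutive homozygous genotypes along a chromosome. The sketch below scans an ordered list of genotype calls and reports runs of a minimum length. It assumes genotypes coded as alternate-allele counts (0 or 2 homozygous, 1 heterozygous) and counts markers only; dedicated ROH callers typically also consider physical length and tolerate occasional heterozygous or missing calls. The function name and example calls are hypothetical and do not reflect the study's pipeline.

```python
from typing import List, Sequence, Tuple


def find_roh(genotypes: Sequence[int], min_snps: int = 3) -> List[Tuple[int, int]]:
    """Return inclusive (start, end) index pairs for runs of homozygous calls."""
    runs: List[Tuple[int, int]] = []
    start = None  # index where the current homozygous run began, if any
    for i, genotype in enumerate(genotypes):
        if genotype in (0, 2):
            if start is None:
                start = i
        else:
            # A heterozygous call ends the current run; keep it if long enough.
            if start is not None and i - start >= min_snps:
                runs.append((start, i - 1))
            start = None
    # Close a run that extends to the last marker.
    if start is not None and len(genotypes) - start >= min_snps:
        runs.append((start, len(genotypes) - 1))
    return runs


# Hypothetical genotype calls along one chromosome, in map order.
calls = [0, 2, 2, 1, 0, 0, 1, 2, 2, 2, 0]
runs = find_roh(calls, min_snps=3)
```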
PubMed: 38802598
DOI: 10.1038/s41437-024-00685-2
-
Journal of Alzheimer's Disease : JAD 2024
BACKGROUND
Polygenic risk scores (PRS) are linear combinations of genetic markers weighted by effect size that are commonly used to predict disease risk. For complex heritable diseases such as late-onset Alzheimer's disease (LOAD), PRS models fail to capture much of the heritability. Additionally, PRS models are highly dependent on the population structure of the data on which effect sizes are assessed and have poor generalizability to new data.
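The definition above, a PRS as a linear combination of marker genotypes weighted by effect size, can be sketched in a few lines. This is a minimal illustration, assuming genotypes are coded as effect-allele dosages (0, 1, or 2) and that the weights are per-marker effect sizes from some GWAS; the function name and the example values are hypothetical and are not taken from the study.

```python
from typing import Sequence


def polygenic_risk_score(dosages: Sequence[float], weights: Sequence[float]) -> float:
    """Return the weighted sum of effect-allele dosages (a plain linear PRS).

    dosages: effect-allele counts per marker (0, 1, or 2; imputed values may be fractional).
    weights: per-marker effect sizes (for example log odds ratios), in the same marker order.
    """
    if len(dosages) != len(weights):
        raise ValueError("dosages and weights must cover the same markers")
    return sum(d * w for d, w in zip(dosages, weights))


# Hypothetical individual genotyped at three markers.
score = polygenic_risk_score([2, 1, 0], [0.5, -0.25, 0.125])
```

Because each marker contributes independently, a score of this form has no term for interactions between loci, which is the limitation that motivates the epistatic features described in this abstract.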
OBJECTIVE
The goal of this study is to construct a paragenic risk score that, in addition to single genetic marker data used in PRS, incorporates epistatic interaction features and machine learning methods to predict risk for LOAD.
METHODS
We construct a new state-of-the-art genetic model for risk of Alzheimer's disease. Our approach innovates over PRS models in two ways: First, by directly incorporating epistatic interactions between SNP loci using an evolutionary algorithm guided by shared pathway information; and second, by estimating risk via an ensemble of non-linear machine learning models rather than a single linear model. We compare the paragenic model to several PRS models from the literature trained on the same dataset.
RESULTS
The paragenic model is significantly more accurate than the PRS models under 10-fold cross-validation, obtaining an AUC of 83% and near-clinically significant matched sensitivity/specificity of 75%. It remains significantly more accurate when evaluated on an independent holdout dataset and maintains accuracy within APOE genotype strata.
CONCLUSIONS
Paragenic models show potential for improving disease risk prediction for complex heritable diseases such as LOAD over PRS models.
Topics: Humans; Alzheimer Disease; Machine Learning; Multifactorial Inheritance; Epistasis, Genetic; Genetic Predisposition to Disease; Female; Male; Polymorphism, Single Nucleotide; Aged; Genome-Wide Association Study; Apolipoproteins E; Models, Genetic; Genetic Risk Score
PubMed: 38788065
DOI: 10.3233/JAD-230236
-
ELife May 2024
Rich data from large biobanks, coupled with increasingly accessible association statistics from genome-wide association studies (GWAS), provide great opportunities to dissect the complex relationships among human traits and diseases. We introduce BADGERS, a powerful method to perform polygenic score-based biobank-wide association scans. Compared to traditional approaches, BADGERS uses GWAS summary statistics as input and does not require multiple traits to be measured in the same cohort. We applied BADGERS to two independent datasets for late-onset Alzheimer's disease (AD; n=61,212). Among 1738 traits in the UK biobank, we identified 48 significant associations for AD. Family history, high cholesterol, and numerous traits related to intelligence and education showed strong and independent associations with AD. Furthermore, we identified 41 significant associations for a variety of AD endophenotypes. While family history and high cholesterol were strongly associated with AD subgroups and pathologies, only intelligence and education-related traits predicted pre-clinical cognitive phenotypes. These results provide novel insights into the distinct biological processes underlying various risk factors for AD.
Topics: Alzheimer Disease; Humans; Risk Factors; Endophenotypes; Genome-Wide Association Study; Male; Biological Specimen Banks; Female; United Kingdom; Aged; Genetic Predisposition to Disease; Multifactorial Inheritance; Aged, 80 and over
PubMed: 38787369
DOI: 10.7554/eLife.91360
-
Journal of the Association For Research... May 2024
PURPOSE
Age-related hearing loss is the most common form of permanent hearing loss that is associated with various health traits, including Alzheimer's disease, cognitive decline, and depression. The present study aims to identify genetic comorbidities of age-related hearing loss. Past genome-wide association studies identified multiple genomic loci involved in common adult-onset health traits. Polygenic risk scores (PRS) could summarize the polygenic inheritance and quantify the genetic susceptibility of complex traits independent of trait expression. The present study conducted a PRS-based association analysis of age-related hearing difficulty in the UK Biobank sample (N = 425,240), followed by a replication analysis using hearing thresholds (HTs) and distortion-product otoacoustic emissions (DPOAEs) in 242 young adults with self-reported normal hearing. We hypothesized that young adults with genetic comorbidities associated with age-related hearing difficulty would exhibit subclinical decline in HTs and DPOAEs in both ears.
METHODS
A total of 111,243 participants reported age-related hearing difficulty in the UK Biobank sample (> 40 years). The PRS models were derived from the polygenic risk score catalog to obtain 2627 PRS predictors across the health spectrum. HTs (0.25-16 kHz) and DPOAEs (1-16 kHz, L1/L2 = 65/55 dB SPL, F2/F1 = 1.22) were measured on 242 young adults. Saliva-derived DNA samples were subjected to low-pass whole genome sequencing, followed by genome-wide imputation and PRS calculation. Logistic regression analyses were performed to identify PRS predictors of age-related hearing difficulty in the UK Biobank cohort. Linear mixed model analyses were performed to identify PRS predictors of HTs and DPOAEs.
RESULTS
The PRS-based association analysis identified 977 PRS predictors across the health spectrum associated with age-related hearing difficulty. Hearing difficulty and hearing aid use PRS predictors revealed the strongest association with the age-related hearing difficulty phenotype. Youth with a higher genetic predisposition to hearing difficulty revealed a subclinical elevation in HTs and a decline in DPOAEs in both ears. PRS predictors associated with age-related hearing difficulty were enriched for mental health, lifestyle, metabolic, sleep, reproductive, digestive, respiratory, hematopoietic, and immune traits. Fifty PRS predictors belonging to various trait categories were replicated for HTs and DPOAEs in both ears.
CONCLUSION
The study identified genetic comorbidities associated with age-related hearing loss across the health spectrum. Youth with a high genetic predisposition to age-related hearing difficulty and other related complex traits could exhibit sub-clinical decline in HTs and DPOAEs decades before clinically meaningful age-related hearing loss is observed. We posit that effective communication of genetic risk, promoting a healthy lifestyle, and reducing exposure to environmental risk factors at younger ages could help prevent or delay the onset of age-related hearing difficulty at older ages.
PubMed: 38782831
DOI: 10.1007/s10162-024-00947-0
-
Familial Cancer May 2024
The best linear unbiased prediction (BLUP) method as a tool to estimate the lifetime risk of pancreatic ductal adenocarcinoma in high-risk individuals with no known pathogenic germline variants.
Pancreatic ductal adenocarcinoma (PDAC) is the fourth leading cause of cancer-related death in the Western world. The number of diagnosed cases and the mortality rate are almost equal, as the majority of patients present with advanced disease at diagnosis. Between 4 and 10% of pancreatic cancer cases have an apparent hereditary background, known as hereditary pancreatic cancer (HPC) or, when the genetic basis is unknown, familial pancreatic cancer (FPC). Surveillance of high-risk individuals (HRIs) from these families by imaging aims to detect PDAC at an early stage to improve prognosis. However, the genetic basis is unknown in the majority of HRIs, with only around 10-13% of families carrying known pathogenic germline mutations. The aim of this study was to assess an individual's genetic cancer risk based on sex and personal and family history of cancer. The Best Linear Unbiased Prediction (BLUP) methodology was used to estimate an individual's predicted risk of developing cancer during their lifetime. The model uses different demographic factors in order to estimate heritability. A reliable heritability estimate for pancreatic cancer of 0.27 on the liability scale and 0.07 on the observed scale was obtained; because it differs from zero, it indicates a polygenic inheritance pattern of PDAC. BLUP was able to correctly discriminate PDAC cases from healthy individuals and those with other cancer types. It thus provides an additional tool to assess PDAC risk in HRIs with an assumed genetic predisposition in the absence of known pathogenic germline mutations.
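The abstract reports heritability on two scales. One standard way to relate them, for a binary trait in an unascertained population sample, is the threshold-model transformation h²(liability) = h²(observed) × K(1 − K) / z², where K is the population prevalence and z is the standard normal density at the liability threshold. The sketch below implements that transformation. Whether the authors used this exact conversion is an assumption, and the prevalence value in the example is a placeholder, not a figure from the study.

```python
from statistics import NormalDist


def observed_to_liability_h2(h2_observed: float, prevalence: float) -> float:
    """Convert observed-scale (0/1) heritability to the liability scale.

    Uses h2_liability = h2_observed * K * (1 - K) / z**2, with K the
    population prevalence and z the standard normal density at the
    liability threshold. Assumes no case-control ascertainment.
    """
    if not 0.0 < prevalence < 1.0:
        raise ValueError("prevalence must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    threshold = standard_normal.inv_cdf(1.0 - prevalence)
    z = standard_normal.pdf(threshold)
    return h2_observed * prevalence * (1.0 - prevalence) / z ** 2


# Placeholder prevalence for illustration only; not taken from the abstract.
h2_liability = observed_to_liability_h2(0.07, prevalence=0.05)
```

For rare outcomes the conversion factor is greater than one, which is why a liability-scale estimate can be several times larger than the observed-scale estimate for the same data.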
PubMed: 38780705
DOI: 10.1007/s10689-024-00397-w