-
Scientific Reports May 2024
We construct non-linear machine learning (ML) prediction models for systolic and diastolic blood pressure (SBP, DBP) using demographic and clinical variables and polygenic risk scores (PRSs). We developed a two-model ensemble, consisting of a baseline model, where prediction is based on demographic and clinical variables only, and a genetic model, where we also include PRSs. We evaluate the use of a linear versus a non-linear model at both the baseline and the genetic model levels and assess the improvement in performance when incorporating multiple PRSs. We report the ensemble model's performance as percentage variance explained (PVE) on a held-out test dataset. A non-linear baseline model improved the PVE from 28.1% to 30.1% (SBP) and from 14.3% to 17.4% (DBP) compared with a linear baseline model. Including seven PRSs computed from the largest available GWAS of SBP/DBP improved the genetic model PVE from 4.8% to 5.1% (SBP) and from 4.7% to 5% (DBP) compared with using a single PRS. Adding 14 additional PRSs computed from two independent GWASs further increased the genetic model PVE to 6.3% (SBP) and 5.7% (DBP). PVE differed across self-reported race/ethnicity groups, with primarily all non-White groups benefitting from the inclusion of additional PRSs. In summary, non-linear ML models improve BP prediction in models incorporating diverse populations.
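The abstract above reports performance as percentage variance explained (PVE) on held-out data but does not state the formula. A minimal sketch, assuming the common definition PVE = 100 × (1 − residual sum of squares / total sum of squares); the function name and toy values are illustrative and not taken from the paper:

```python
def percent_variance_explained(observed, predicted):
    """Return 100 * (1 - SS_residual / SS_total) for paired values.

    Assumed definition; the study may compute PVE differently.
    """
    if len(observed) != len(predicted) or not observed:
        raise ValueError("observed and predicted must be non-empty and equal length")
    mean_observed = sum(observed) / len(observed)
    # Total variation of the outcome around its own mean.
    ss_total = sum((y - mean_observed) ** 2 for y in observed)
    # Variation left unexplained by the predictions.
    ss_residual = sum((y - p) ** 2 for y, p in zip(observed, predicted))
    if ss_total == 0:
        raise ValueError("observed values have zero variance")
    return 100.0 * (1.0 - ss_residual / ss_total)


# Toy held-out blood pressure values (hypothetical, for illustration only).
held_out_sbp = [110.0, 120.0, 130.0]
print(percent_variance_explained(held_out_sbp, held_out_sbp))           # perfect prediction
print(percent_variance_explained(held_out_sbp, [120.0, 120.0, 120.0]))  # predicting the mean
```

Under this definition, a model that only predicts the mean scores 0, and a model can score below 0 on held-out data if it predicts worse than the mean.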
Topics: Humans; Machine Learning; Blood Pressure; Multifactorial Inheritance; Phenotype; Genome-Wide Association Study; Risk Factors; Male; Female; Genetic Predisposition to Disease; Models, Genetic; Hypertension; Middle Aged; Genetic Risk Score
PubMed: 38816422
DOI: 10.1038/s41598-024-62945-9 -
Translational Psychiatry May 2024
Substance use disorder (SUD) is a global health problem with a significant impact on individuals and society. The presentation of SUD is diverse, involving various substances, ages at onset, comorbid conditions, and disease trajectories. Current treatments for SUD struggle to address this heterogeneity, resulting in high relapse rates. SUD often co-occurs with other psychiatric and mental health-related conditions that contribute to the heterogeneity of the disorder and predispose to adverse disease trajectories. Family and genetic studies highlight the role of genetic and environmental factors in the course of SUD, and point to a shared genetic liability between SUDs and comorbid psychopathology. In this study, we aimed to disentangle SUD heterogeneity using a deeply phenotyped SUD cohort and polygenic scores (PGSs) for psychiatric disorders and related traits. We explored associations between PGSs and various SUD-related phenotypes, as well as PGS-environment interactions using information on lifetime emotional, physical, and/or sexual abuse. Our results identify clusters of individuals who exhibit differences in their phenotypic profile and reveal different patterns of associations between SUD-related phenotypes and the genetic liability for mental health-related traits, which may help explain part of the heterogeneity observed in SUD. In our SUD sample, we found associations linking the genetic liability for attention-deficit hyperactivity disorder (ADHD) with lower educational attainment, the genetic liability for post-traumatic stress disorder (PTSD) with higher rates of unemployment, the genetic liability for educational attainment with lower rates of criminal records and unemployment, and the genetic liability for well-being with lower rates of outpatient treatments and fewer problems related to family and social relationships. 
We also found evidence of PGS-environment interactions showing that genetic liability for suicide attempts worsened the psychiatric status of individuals with SUD who had a history of emotional, physical, and/or sexual abuse. Collectively, these data contribute to a better understanding of the role of genetic liability for mental health-related conditions and adverse life experiences in SUD heterogeneity.
Topics: Humans; Substance-Related Disorders; Multifactorial Inheritance; Male; Female; Adult; Phenotype; Genetic Predisposition to Disease; Middle Aged; Genome-Wide Association Study; Gene-Environment Interaction; Young Adult; Comorbidity; Mental Disorders
PubMed: 38811559
DOI: 10.1038/s41398-024-02923-x -
Nature Communications May 2024
Dominance heritability in complex traits has received increasing recognition. However, most polygenic score (PGS) approaches do not incorporate non-additive effects. Here, we present GenoBoost, a flexible PGS modeling framework capable of considering both additive and non-additive effects, specifically focusing on genetic dominance. Building on statistical boosting theory, we derive provably optimal GenoBoost scores and provide an efficient implementation for analyzing large-scale cohorts. We benchmark it against seven commonly used PGS methods and demonstrate its competitive predictive performance. GenoBoost is ranked the best for four traits and second-best for three traits among twelve tested disease outcomes in UK Biobank. We reveal that GenoBoost improves prediction for autoimmune diseases by incorporating non-additive effects localized in the MHC locus and, more broadly, works best in less polygenic traits. We further demonstrate that GenoBoost can infer the mode of genetic inheritance without requiring prior knowledge. For example, GenoBoost finds non-zero genetic dominance effects for 602 of 900 selected genetic variants, resulting in 2.5% improvements in predicting psoriasis cases. Lastly, we show that GenoBoost can prioritize genetic loci with genetic dominance not previously reported in the GWAS catalog. Our results highlight the increased accuracy and biological insights from incorporating non-additive effects in PGS models.
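To make the distinction between additive and non-additive (dominance) effects concrete, the following sketch shows a generic textbook-style coding of a single biallelic variant. It is not GenoBoost's algorithm; the function names and effect values are hypothetical and chosen only for illustration:

```python
def additive_coding(genotype):
    """Number of effect alleles carried (0, 1, or 2)."""
    return float(genotype)


def dominance_coding(genotype):
    """Heterozygote indicator: 1 for genotype 1, else 0."""
    return 1.0 if genotype == 1 else 0.0


def variant_score(genotype, additive_effect, dominance_effect=0.0):
    """Score contribution of one variant under additive plus dominance coding."""
    if genotype not in (0, 1, 2):
        raise ValueError("genotype must be 0, 1, or 2 effect-allele copies")
    return (additive_effect * additive_coding(genotype)
            + dominance_effect * dominance_coding(genotype))


# Purely additive: the score rises by the same step per allele copy.
print([variant_score(g, 1.0) for g in (0, 1, 2)])
# Fully dominant effect allele: one copy already gives the full effect.
print([variant_score(g, 1.0, 1.0) for g in (0, 1, 2)])
```

With a zero dominance effect the score is linear in allele count; a non-zero dominance effect lets the heterozygote depart from the midpoint of the two homozygotes, which is the kind of deviation a purely additive PGS cannot represent.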
Topics: Multifactorial Inheritance; Humans; Models, Genetic; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease; Autoimmune Diseases; Genes, Dominant; Psoriasis
PubMed: 38811555
DOI: 10.1038/s41467-024-48654-x -
The New Phytologist Jul 2024
Understanding the genetic basis of how plants defend against pathogens is important to monitor and maintain resilient tree populations. Swiss needle cast (SNC) and Rhabdocline needle cast (RNC) epidemics are responsible for major damage to forest ecosystems in North America. Here we investigate the genetic architecture of tolerance and resistance to needle cast diseases in Douglas-fir (Pseudotsuga menziesii) caused by two fungal pathogens: SNC caused by Nothophaeocryptopus gaeumannii, and RNC caused by Rhabdocline pseudotsugae. We performed case-control genome-wide association analyses and found disease resistance and tolerance in Douglas-fir to be polygenic and under strong selection. We show that stomatal regulation as well as ethylene and jasmonic acid pathways are important for resisting SNC infection, and secondary metabolite pathways play a role in tolerating SNC once the plant is infected. We identify a major transcriptional regulator of plant defense, ERF1, as the top candidate for RNC resistance. Our findings shed light on the highly polygenic architectures underlying fungal disease resistance and tolerance and have important implications for forestry and conservation as the climate changes.
Topics: Disease Resistance; Plant Diseases; Pseudotsuga; Genome-Wide Association Study; Ascomycota; Trees; Adaptation, Physiological; Multifactorial Inheritance; Gene Expression Regulation, Plant; Genes, Plant
PubMed: 38803110
DOI: 10.1111/nph.19797 -
Journal of Alzheimer's Disease : JAD 2024
BACKGROUND
Polygenic risk scores (PRS) are linear combinations of genetic markers weighted by effect size that are commonly used to predict disease risk. For complex heritable diseases such as late-onset Alzheimer's disease (LOAD), PRS models fail to capture much of the heritability. Additionally, PRS models are highly dependent on the population structure of the data on which effect sizes are assessed and have poor generalizability to new data.
OBJECTIVE
The goal of this study is to construct a paragenic risk score that, in addition to single genetic marker data used in PRS, incorporates epistatic interaction features and machine learning methods to predict risk for LOAD.
METHODS
We construct a new state-of-the-art genetic model for risk of Alzheimer's disease. Our approach innovates over PRS models in two ways: First, by directly incorporating epistatic interactions between SNP loci using an evolutionary algorithm guided by shared pathway information; and second, by estimating risk via an ensemble of non-linear machine learning models rather than a single linear model. We compare the paragenic model to several PRS models from the literature trained on the same dataset.
RESULTS
The paragenic model is significantly more accurate than the PRS models under 10-fold cross-validation, obtaining an AUC of 83% and near-clinically significant matched sensitivity/specificity of 75%. It remains significantly more accurate when evaluated on an independent holdout dataset and maintains accuracy within APOE genotype strata.
CONCLUSIONS
Paragenic models show potential for improving disease risk prediction for complex heritable diseases such as LOAD over PRS models.
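The Background of this abstract describes a PRS as a linear combination of genetic markers weighted by effect size. A minimal sketch of that weighted sum, assuming per-variant allele dosages (0, 1, or 2) and effect sizes such as GWAS log odds ratios; the function name and the toy numbers are hypothetical:

```python
def polygenic_risk_score(dosages, effect_sizes):
    """Weighted sum of allele dosages: sum over variants of beta_i * dosage_i."""
    if len(dosages) != len(effect_sizes):
        raise ValueError("need exactly one effect size per variant")
    return sum(beta * dosage for beta, dosage in zip(effect_sizes, dosages))


# One individual's effect-allele counts at three variants (made-up values).
individual_dosages = [0, 1, 2]
# Per-variant effect sizes, e.g. log odds ratios (made-up values).
variant_effects = [0.10, -0.20, 0.30]

print(polygenic_risk_score(individual_dosages, variant_effects))
```

Because the score is a plain sum over variants, it has no term for interactions between loci; the epistatic features described in the Methods are what the paragenic model adds on top of this form.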
Topics: Humans; Alzheimer Disease; Machine Learning; Multifactorial Inheritance; Epistasis, Genetic; Genetic Predisposition to Disease; Female; Male; Polymorphism, Single Nucleotide; Aged; Genome-Wide Association Study; Apolipoproteins E; Models, Genetic; Genetic Risk Score
PubMed: 38788065
DOI: 10.3233/JAD-230236 -
Molecular Genetics and Genomics : MGG May 2024
Hereditary spherocytosis (HS) is one of the most common causes of hereditary hemolytic anemia. The current diagnostic guidelines for HS are mainly based on a combination of physical examination and laboratory investigation. However, some patients present with complicated clinical manifestations that cannot be explained by routine diagnostic protocols. Here, we report a rare HS case of mild anemia with extremely high indirect bilirubin levels and high expression of fetal hemoglobin. Using whole exome sequencing analysis, this patient was identified as a heterozygous carrier of a de novo SPTB nonsense mutation (c.605G>A; p.W202*) and a compound heterozygous carrier of known UGT1A1 and KLF1 mutations. This genetic analysis, based on the interpretation of the patient's genomic data, not only achieved a precise diagnosis by explaining the complicated phenotype but also provided valuable suggestions for subsequent treatment, surveillance, and prophylaxis.
Topics: Humans; Spherocytosis, Hereditary; Phenotype; Kruppel-Like Transcription Factors; Spectrin; Glucuronosyltransferase; Exome Sequencing; Codon, Nonsense; Male; Heterozygote; Female
PubMed: 38787432
DOI: 10.1007/s00438-024-02150-5 -
ELife May 2024
Rich data from large biobanks, coupled with increasingly accessible association statistics from genome-wide association studies (GWAS), provide great opportunities to dissect the complex relationships among human traits and diseases. We introduce BADGERS, a powerful method to perform polygenic score-based biobank-wide association scans. Compared to traditional approaches, BADGERS uses GWAS summary statistics as input and does not require multiple traits to be measured in the same cohort. We applied BADGERS to two independent datasets for late-onset Alzheimer's disease (AD; n=61,212). Among 1738 traits in the UK Biobank, we identified 48 significant associations for AD. Family history, high cholesterol, and numerous traits related to intelligence and education showed strong and independent associations with AD. Furthermore, we identified 41 significant associations for a variety of AD endophenotypes. While family history and high cholesterol were strongly associated with AD subgroups and pathologies, only intelligence and education-related traits predicted pre-clinical cognitive phenotypes. These results provide novel insights into the distinct biological processes underlying various risk factors for AD.
Topics: Alzheimer Disease; Humans; Risk Factors; Endophenotypes; Genome-Wide Association Study; Male; Biological Specimen Banks; Female; United Kingdom; Aged; Genetic Predisposition to Disease; Multifactorial Inheritance; Aged, 80 and over
PubMed: 38787369
DOI: 10.7554/eLife.91360 -
Arteriosclerosis, Thrombosis, and... Jul 2024
BACKGROUND
Heterozygous familial hypercholesterolemia (FH) is among the most common genetic conditions worldwide, affecting ≈1 in 300 individuals. FH is characterized by increased levels of low-density lipoprotein cholesterol (LDL-C) and increased risk of coronary artery disease (CAD), but there is a wide spectrum of severity within the FH population. This variability in expression is incompletely explained by known risk factors. We hypothesized that genome-wide genetic influences, as represented by polygenic risk scores (PRSs) for cardiometabolic traits, would influence the phenotypic severity of FH.
METHODS
We studied individuals with clinically diagnosed FH (n=1123) from the FH Canada National Registry, as well as individuals with genetically identified FH from the UK Biobank (n=723). For all individuals, we used genome-wide gene array data to calculate PRSs for CAD, LDL-C, lipoprotein(a), and other cardiometabolic traits. We compared the distribution of PRSs in individuals with clinically diagnosed FH, genetically diagnosed FH, and non-FH controls and examined the association of the PRSs with the risk of atherosclerotic cardiovascular disease.
RESULTS
Individuals with clinically diagnosed FH had higher levels of LDL-C, and the incidence of atherosclerotic cardiovascular disease was higher in individuals with clinically diagnosed compared with genetically identified FH. Individuals with clinically diagnosed FH displayed enrichment for higher PRSs for CAD, LDL-C, and lipoprotein(a) but not for other cardiometabolic risk factors. The CAD PRS was associated with a risk of atherosclerotic cardiovascular disease among individuals with an FH-causing genetic variant.
CONCLUSIONS
Genetic background, as expressed by genome-wide PRSs for CAD, LDL-C, and lipoprotein(a), influences the phenotypic severity of FH, expanding our understanding of the determinants that contribute to the variable expressivity of FH. A PRS for CAD may aid in risk prediction among individuals with FH.
Topics: Humans; Hyperlipoproteinemia Type II; Female; Male; Middle Aged; Multifactorial Inheritance; Cholesterol, LDL; Phenotype; Registries; Genetic Predisposition to Disease; Coronary Artery Disease; Risk Assessment; Genome-Wide Association Study; Lipoprotein(a); Adult; Aged; Canada; United Kingdom; Severity of Illness Index; Risk Factors; Case-Control Studies; Biomarkers; Incidence
PubMed: 38779854
DOI: 10.1161/ATVBAHA.123.320287 -
Human Genomics May 2024
BACKGROUND
Given the high prevalence of benign prostatic hyperplasia (BPH) among elderly men, pinpointing those at elevated risk can aid in early intervention and effective management. This study aimed to explore whether a polygenic risk score (PRS) is effective in predicting BPH incidence, prognosis, and risk of operation in Han Chinese.
METHODS
A retrospective cohort study included 12,474 male participants (6,237 with BPH and 6,237 non-BPH controls) from the Taiwan Precision Medicine Initiative (TPMI). Genotyping was performed using the Affymetrix Genome-Wide TWB 2.0 SNP Array. PRS was calculated using PGS001865, comprising 1,712 single nucleotide polymorphisms. Logistic regression models assessed the association between PRS and BPH incidence, adjusting for age and prostate-specific antigen (PSA) levels. The study also examined the relationship between PSA, prostate volume, and response to 5-α-reductase inhibitor (5ARI) treatment, as well as the association between PRS and the risk of transurethral resection of the prostate (TURP).
RESULTS
Individuals in the highest PRS quartile (Q4) had a significantly higher risk of BPH compared to the lowest quartile (Q1) (OR = 1.51, 95% CI = 1.274-1.783, p < 0.0001), after adjusting for PSA level. The Q4 group exhibited larger prostate volumes and a smaller volume reduction after 5ARI treatment. The Q1 group had a lower cumulative TURP probability at 3, 5, and 10 years compared to the Q4 group. PRS Q4 was an independent risk factor for TURP.
CONCLUSIONS
In this Han Chinese cohort, higher PRS was associated with an increased susceptibility to BPH, larger prostate volumes, poorer response to 5ARI treatment, and a higher risk of TURP. Larger prospective studies with longer follow-up are warranted to further validate these findings.
Topics: Humans; Male; Prostatic Hyperplasia; Aged; Middle Aged; Polymorphism, Single Nucleotide; Genetic Predisposition to Disease; Retrospective Studies; Multifactorial Inheritance; Asian People; Risk Factors; 5-alpha Reductase Inhibitors; Prostate-Specific Antigen; Taiwan; Prognosis; Prostate; Genetic Risk Score; East Asian People
PubMed: 38778357
DOI: 10.1186/s40246-024-00619-3 -
Scientific Reports May 2024
In recent years, the utility of polygenic risk scores (PRS) in forecasting disease susceptibility from genome-wide association studies (GWAS) results has been widely recognised. Yet, these models face limitations due to overfitting and the potential overestimation of effect sizes in correlated variants. To surmount these obstacles, we devised the Stacked Neural Network Polygenic Risk Score (SNPRS). This novel approach synthesises outputs from multiple neural network models, each calibrated using genetic variants chosen based on diverse p-value thresholds. By doing so, SNPRS captures a broader array of genetic variants, enabling a more nuanced interpretation of the combined effects of these variants. We assessed the efficacy of SNPRS using the UK Biobank data, focusing on the genetic risks associated with breast and prostate cancers, as well as quantitative traits like height and BMI. We also extended our analysis to the Korea Genome and Epidemiology Study (KoGES) dataset. Impressively, our results indicate that SNPRS surpasses traditional PRS models and an isolated deep neural network in terms of accuracy, highlighting its promise in refining the efficacy and relevance of PRS in genetic studies.
Topics: Humans; Multifactorial Inheritance; Genome-Wide Association Study; Neural Networks, Computer; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide; Female; Male; Prostatic Neoplasms; Breast Neoplasms; Risk Factors; Genetic Risk Score
PubMed: 38773257
DOI: 10.1038/s41598-024-62513-1