-
Nature Communications Apr 2024
There is a long-standing debate about the magnitude of the contribution of gene-environment interactions to phenotypic variations of complex traits owing to the low statistical power and few reported interactions to date. To address this issue, the Gene-Lifestyle Interactions Working Group within the Cohorts for Heart and Aging Research in Genetic Epidemiology Consortium has been spearheading efforts to investigate G × E in large and diverse samples through meta-analysis. Here, we present a powerful new approach to screen for interactions across the genome, an approach that shares substantial similarity to the Mendelian randomization framework. We identify and confirm 5 loci (6 independent signals) that interact with either cigarette smoking or alcohol consumption for serum lipids, and empirically demonstrate that interaction and mediation are the major contributors to genetic effect size heterogeneity across populations. The estimated lower bound of the interaction and environmentally mediated heritability is significant (P < 0.02) for low-density lipoprotein cholesterol and triglycerides in Cross-Population data. Our study improves the understanding of the genetic architecture and environmental contributions to complex traits.
Topics: Gene-Environment Interaction; Humans; Genome-Wide Association Study; Multifactorial Inheritance; Male; Triglycerides; Female; Alcohol Drinking; Polymorphism, Single Nucleotide; Phenotype; Cholesterol, LDL; Cigarette Smoking; Quantitative Trait Loci; Middle Aged
PubMed: 38649715
DOI: 10.1038/s41467-024-47806-3
-
Twin Research and Human Genetics : the... Apr 2024
While it is known that vitamin D deficiency is associated with adverse bone outcomes, it remains unclear whether low vitamin D status may increase the risk of a wider range of health outcomes. We had the opportunity to explore the association between common genetic variants associated with both 25 hydroxyvitamin D (25OHD) and the vitamin D binding protein (DBP, encoded by the GC gene) with a comprehensive range of health disorders and laboratory tests in a large academic medical center. We used summary statistics for 25OHD and DBP to generate polygenic scores (PGS) for 66,482 participants with primarily European ancestry and 13,285 participants with primarily African ancestry from the Vanderbilt University Medical Center Biobank (BioVU). We examined the predictive properties of the 25OHD PGS and two scores related to DBP concentration with respect to 1322 health-related phenotypes and 315 laboratory-measured phenotypes from electronic health records. In those with European ancestry: (a) the 25OHD and DBP PGS scores, and individual SNPs rs4588 and rs7041, were associated with both 25OHD concentration and 1,25 dihydroxyvitamin D concentrations; (b) higher PGS was associated with decreased concentrations of triglycerides and cholesterol, and reduced risks of vitamin D deficiency, disorders of lipid metabolism, and diabetes. In general, the findings for the African ancestry group were consistent with findings from the European ancestry analyses. Our study confirms the utility of PGS and two key variants within the GC gene (rs4588 and rs7041) to predict the risk of vitamin D deficiency in clinical settings and highlights the shared biology between vitamin D-related genetic pathways and a range of health outcomes.
Topics: Humans; Vitamin D-Binding Protein; Vitamin D; Female; Male; Middle Aged; Adult; Genome-Wide Association Study; Polymorphism, Single Nucleotide; White People; Phenotype; Aged; Vitamin D Deficiency; Multifactorial Inheritance
PubMed: 38644690
DOI: 10.1017/thg.2024.19
-
ELife Apr 2024
We propose a new framework for human genetic association studies: at each locus, a deep learning model (in this study, Sei) is used to calculate the functional genomic activity score for two haplotypes per individual. This score, defined as the Haplotype Function Score (HFS), replaces the original genotype in association studies. Applying the HFS framework to 14 complex traits in the UK Biobank, we identified 3619 independent HFS-trait associations with a significance of p < 5 × 10. Fine-mapping revealed 2699 causal associations, corresponding to a median increase of 63 causal findings per trait compared with single-nucleotide polymorphism (SNP)-based analysis. HFS-based enrichment analysis uncovered 727 pathway-trait associations and 153 tissue-trait associations with strong biological interpretability, including 'circadian pathway-chronotype' and 'arachidonic acid-intelligence'. Lastly, we applied least absolute shrinkage and selection operator (LASSO) regression to integrate HFS prediction score with SNP-based polygenic risk scores, which showed an improvement of 16.1-39.8% in cross-ancestry polygenic prediction. We concluded that HFS is a promising strategy for understanding the genetic basis of human complex traits.
Topics: Humans; Haplotypes; Multifactorial Inheritance; Quantitative Trait Loci; Genome-Wide Association Study; Polymorphism, Single Nucleotide; Phenotype
PubMed: 38639992
DOI: 10.7554/eLife.92574
-
Journal of the American Heart... May 2024
BACKGROUND
A study was designed to investigate whether the coronary artery disease polygenic risk score (CAD-PRS) may guide lipid-lowering treatment initiation as well as deferral in primary prevention beyond established clinical risk scores.
METHODS AND RESULTS
Participants were 311 799 individuals from the UK Biobank free of atherosclerotic cardiovascular disease, diabetes, chronic kidney disease, and lipid-lowering treatment at baseline. Participants were categorized as statin indicated, statin indication unclear, or statin not indicated as defined by the European and US guidelines on statin use. Over a median follow-up of 11.9 (11.2-12.6) years, 8196 major coronary events developed. CAD-PRS added to European-Systematic Coronary Risk Evaluation 2 (European-SCORE2) and US-Pooled Cohort Equation (US-PCE) identified 18% and 12% of statin-indication-unclear individuals whose risk of major coronary events was the same as or higher than the average risk of statin-indicated individuals and 16% and 12% of statin-indicated individuals whose major coronary event risks were the same as or lower than the average risk of statin-indication-unclear individuals. For major coronary and atherosclerotic cardiovascular disease events, CAD-PRS improved C-statistics more among statin-indicated or statin-indication-unclear individuals than among statin-not-indicated individuals. For atherosclerotic cardiovascular disease events, CAD-PRS added to the European evaluation and US equation resulted in a net reclassification improvement of 13.6% (95% CI, 11.8-15.5) and 14.7% (95% CI, 13.1-16.3) among statin-indicated, 10.8% (95% CI, 9.6-12.0) and 15.3% (95% CI, 13.2-17.5) among statin-indication-unclear, and 0.9% (95% CI, 0.6-1.3) and 3.6% (95% CI, 3.0-4.2) among statin-not-indicated individuals.
CONCLUSIONS
CAD-PRS may guide statin initiation as well as deferral among statin-indication-unclear or statin-indicated individuals as defined by the European and US guidelines. CAD-PRS had little clinical utility among statin-not-indicated individuals.
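The C-statistic reported above measures discrimination: the probability that a randomly chosen individual with an event is assigned a higher risk than one without. The sketch below illustrates that idea for a plain binary outcome only. It ignores censoring and follow-up time, so it is a simplification and not the survival-data version used in cohort analyses; the risks and outcomes are made-up toy values.

```python
def c_statistic(risk, outcome):
    """Concordance for a binary outcome: share of (case, control) pairs
    in which the case has the higher predicted risk; ties count as 0.5."""
    cases = [r for r, y in zip(risk, outcome) if y == 1]
    controls = [r for r, y in zip(risk, outcome) if y == 0]
    if not cases or not controls:
        raise ValueError("need at least one case and one control")
    concordant = 0.0
    for case_risk in cases:
        for control_risk in controls:
            if case_risk > control_risk:
                concordant += 1.0
            elif case_risk == control_risk:
                concordant += 0.5
    return concordant / (len(cases) * len(controls))

# Toy data (hypothetical): two cases, four controls.
risk = [0.9, 0.4, 0.35, 0.8, 0.2, 0.4]
outcome = [1, 1, 0, 0, 0, 0]
c = c_statistic(risk, outcome)  # 6.5 concordant pairs out of 8
```

A value of 0.5 corresponds to chance-level ranking and 1.0 to perfect ranking, which is why small absolute gains from adding a PRS can still be meaningful.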
Topics: Humans; Hydroxymethylglutaryl-CoA Reductase Inhibitors; Coronary Artery Disease; Male; Female; Middle Aged; Risk Assessment; Practice Guidelines as Topic; United States; Aged; Primary Prevention; Europe; Eligibility Determination; United Kingdom; Risk Factors; Genetic Predisposition to Disease; Multifactorial Inheritance; Patient Selection; Adult
PubMed: 38639378
DOI: 10.1161/JAHA.123.032831
-
Nature Genetics May 2024
Rare damaging variants in a large number of genes are known to cause monogenic developmental disorders (DDs) and have also been shown to cause milder subclinical phenotypes in population cohorts. Here, we show that carrying multiple (2-5) rare damaging variants across 599 dominant DD genes has an additive adverse effect on numerous cognitive and socioeconomic traits in UK Biobank, which can be partially counterbalanced by a higher educational attainment polygenic score (EA-PGS). Phenotypic deviators from expected EA-PGS could be partly explained by the enrichment or depletion of rare DD variants. Among carriers of rare DD variants, those with a DD-related clinical diagnosis had a substantially lower EA-PGS and more severe phenotype than those without a clinical diagnosis. Our results suggest that the overall burden of both rare and common variants can modify the expressivity of a phenotype, which may then influence whether an individual reaches the threshold for clinical disease.
Topics: Humans; Multifactorial Inheritance; Phenotype; Developmental Disabilities; Female; Male; Genetic Predisposition to Disease; Genetic Variation; United Kingdom; Genes, Modifier; Middle Aged; Genome-Wide Association Study
PubMed: 38637616
DOI: 10.1038/s41588-024-01710-0
-
Alzheimer's & Dementia : the Journal of... Jun 2024
BACKGROUND
Alzheimer's disease (AD) prevalence increases with age, yet a small fraction of the population reaches ages > 100 years without cognitive decline. We studied the genetic factors associated with such resilience against AD.
METHODS
Genome-wide association studies identified 86 single nucleotide polymorphisms (SNPs) associated with AD risk. We estimated SNP frequency in 2281 AD cases, 3165 age-matched controls, and 346 cognitively healthy centenarians. We calculated a polygenic risk score (PRS) for each individual and investigated the functional properties of SNPs enriched/depleted in centenarians.
RESULTS
Cognitively healthy centenarians were enriched with the protective alleles of the SNPs associated with AD risk. The protective effect concentrated on the alleles in/near ANKH, GRN, TMEM106B, SORT1, PLCG2, RIN3, and APOE genes. This translated to >5-fold lower PRS in centenarians compared to AD cases (P = 7.69 × 10), and 2-fold lower compared to age-matched controls (P = 5.83 × 10).
DISCUSSION
Maintaining cognitive health until extreme ages requires complex genetic protection against AD, which concentrates on the genes associated with the endolysosomal and immune systems.
HIGHLIGHTS
Cognitively healthy centenarians are enriched with the protective alleles of genetic variants associated with Alzheimer's disease (AD). The protective effect is concentrated on variants involved in the immune and endolysosomal systems. Combining variants into a polygenic risk score (PRS) translated to > 5-fold lower PRS in centenarians compared to AD cases, and ≈ 2-fold lower compared to middle-aged healthy controls.
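For readers unfamiliar with the score itself, a PRS is conventionally a weighted sum of allele dosages, with weights taken from GWAS effect sizes. The sketch below illustrates that generic definition only. The SNP identifiers, weights, and dosages are made-up placeholders, not values from the study, and the study's own weighting scheme may differ.

```python
def polygenic_risk_score(dosages, weights):
    """Weighted sum of allele dosages (0, 1, or 2 copies) over the SNPs in `weights`."""
    missing = set(weights) - set(dosages)
    if missing:
        raise ValueError(f"missing genotypes for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)

# Hypothetical per-allele effect sizes (e.g. log odds ratios).
weights = {"snp_a": 0.25, "snp_b": -0.5, "snp_c": 0.125}

# Hypothetical genotypes: number of copies of the effect allele per SNP.
person_1 = {"snp_a": 2, "snp_b": 0, "snp_c": 1}
person_2 = {"snp_a": 0, "snp_b": 2, "snp_c": 0}

score_1 = polygenic_risk_score(person_1, weights)  # 0.5 + 0.0 + 0.125
score_2 = polygenic_risk_score(person_2, weights)  # 0.0 - 1.0 + 0.0
```

Negative weights correspond to protective alleles, so carrying more of them lowers the score, which is the direction of effect described for the centenarians.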
Topics: Humans; Alzheimer Disease; Polymorphism, Single Nucleotide; Female; Male; Aged, 80 and over; Genome-Wide Association Study; Genetic Predisposition to Disease; Multifactorial Inheritance; Alleles; Case-Control Studies
PubMed: 38634500
DOI: 10.1002/alz.13810
-
Nature Genetics May 2024
We report a multi-ancestry genome-wide association study on liver cirrhosis and its associated endophenotypes, alanine aminotransferase (ALT) and γ-glutamyl transferase. Using data from 12 cohorts, including 18,265 cases with cirrhosis, 1,782,047 controls, up to 1 million individuals with liver function tests and a validation cohort of 21,689 cases and 617,729 controls, we identify and validate 14 risk associations for cirrhosis. Many variants are located near genes involved in hepatic lipid metabolism. One of these, PNPLA3 p.Ile148Met, interacts with alcohol intake, obesity and diabetes on the risk of cirrhosis and hepatocellular carcinoma (HCC). We develop a polygenic risk score that associates with the progression from cirrhosis to HCC. By focusing on prioritized genes from common variant analyses, we find that rare coding variants in GPAM associate with lower ALT, supporting GPAM as a potential target for therapeutic inhibition. In conclusion, this study provides insights into the genetic underpinnings of cirrhosis.
Topics: Humans; Liver Cirrhosis; Genome-Wide Association Study; Genetic Predisposition to Disease; Liver Neoplasms; Carcinoma, Hepatocellular; Alanine Transaminase; Polymorphism, Single Nucleotide; Male; Lipase; Female; gamma-Glutamyltransferase; Membrane Proteins; Cohort Studies; Case-Control Studies; Multifactorial Inheritance; Risk Factors; Genetic Variation
PubMed: 38632349
DOI: 10.1038/s41588-024-01720-y
-
PLoS Genetics Apr 2024
Population differences in risk of disease are common, but the potential genetic basis for these differences is not well understood. A standard approach is to compare genetic risk across populations by testing for mean differences in polygenic scores, but existing studies that use this approach do not account for statistical noise in effect estimates (i.e., the GWAS betas) that arises due to the finite sample size of GWAS training data. Here, we show using Bayesian polygenic score methods that the level of uncertainty in estimates of genetic risk differences across populations is highly dependent on the GWAS training sample size, the polygenicity (number of causal variants), and genetic distance (FST) between the populations considered. We derive a Wald test for formally assessing the difference in genetic risk across populations, which we show to have calibrated type 1 error rates under a simplified assumption that all SNPs are independent, which we achieve in practise using linkage disequilibrium (LD) pruning. We further provide closed-form expressions for assessing the uncertainty in estimates of relative genetic risk across populations under the special case of an infinitesimal genetic architecture. We suggest that for many complex traits and diseases, particularly those with more polygenic architectures, current GWAS sample sizes are insufficient to detect moderate differences in genetic risk across populations, though more substantial differences in relative genetic risk (relative risk > 1.5) can be detected. We show that conventional approaches that do not account for sampling error from the training sample, such as using a simple t-test, have very high type 1 error rates. When applying our approach to prostate cancer, we demonstrate a higher genetic risk in men of African ancestry, with lower risk in men of European followed by East Asian ancestry.
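To make the role of effect-estimate noise concrete, the sketch below shows one generic way a Wald statistic can be formed for a difference in mean polygenic score between two populations when SNPs are treated as independent. It is an illustration under simplifying assumptions (allele frequencies treated as known, only the sampling error of the GWAS betas propagated) and is not the derivation from the paper. All numbers are hypothetical.

```python
import math

def wald_test_mean_pgs_difference(betas, ses, freq_pop1, freq_pop2):
    """Wald z and two-sided p for the difference in mean PGS between two populations.

    Mean PGS in a population is sum_j 2 * p_j * beta_j, so the difference is a
    linear combination of the betas; with independent SNPs its variance is the
    sum of squared coefficients times squared standard errors.
    """
    coefs = [2.0 * (p1 - p2) for p1, p2 in zip(freq_pop1, freq_pop2)]
    diff = sum(c * b for c, b in zip(coefs, betas))
    var = sum((c * s) ** 2 for c, s in zip(coefs, ses))
    z = diff / math.sqrt(var)
    p_value = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided normal tail
    return z, p_value

# Hypothetical GWAS estimates, standard errors, and allele frequencies.
betas = [0.1, -0.2]
ses = [0.05, 0.1]
freq_pop1 = [0.3, 0.5]
freq_pop2 = [0.2, 0.6]
z, p_value = wald_test_mean_pgs_difference(betas, ses, freq_pop1, freq_pop2)
```

Because the standard errors shrink as the GWAS training sample grows, the same frequency differences yield a larger test statistic with better-powered training data, which matches the dependence on sample size described above.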
Topics: Male; Humans; Bayes Theorem; Risk Factors; Linkage Disequilibrium; Multifactorial Inheritance; Prostatic Neoplasms; Genome-Wide Association Study; Genetic Predisposition to Disease; Polymorphism, Single Nucleotide
PubMed: 38630784
DOI: 10.1371/journal.pgen.1011212
-
Cancer Epidemiology, Biomarkers &... Jun 2024
BACKGROUND
Previous studies have demonstrated that incorporating a polygenic risk score (PRS) into existing risk prediction models for breast cancer improves model fit, but to determine its clinical utility, the impact on risk categorization needs to be established. We add a PRS to two well-established models and quantify the difference in classification using the net reclassification improvement (NRI).
METHODS
We analyzed data from 126,490 post-menopausal women of "White British" ancestry, aged 40 to 69 years at baseline from the UK Biobank prospective cohort. The breast cancer outcome was derived from linked registry data and hospital records. We combined a PRS for breast cancer with 10-year risk scores from the Tyrer-Cuzick and Gail models, and compared these to the risk scores from the models using phenotypic variables alone. We report metrics of discrimination and classification, and consider the importance of the risk threshold selected.
RESULTS
The Harrell's C statistic of the 10-year risk from the Tyrer-Cuzick and Gail models was 0.57 and 0.54, respectively, increasing to 0.67 when the PRS was included. Inclusion of the PRS gave a positive NRI for cases in both models [0.080 (95% confidence interval (CI), 0.053-0.104) and 0.051 (95% CI, 0.030-0.073), respectively], with negligible impact on controls.
CONCLUSIONS
The addition of a PRS for breast cancer to the well-established Tyrer-Cuzick and Gail models provides a substantial improvement in the prediction accuracy and risk stratification.
IMPACT
These findings could have important implications for the ongoing discussion about the value of PRS in risk prediction models and screening.
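The NRI quoted above summarizes how many individuals move across a risk threshold in the right direction once the PRS is added. The sketch below shows the standard two-category calculation, reported separately for cases and controls as in the abstract. The threshold of 0.05 and the toy risks are hypothetical choices for illustration, not the study's values.

```python
def net_reclassification(old_risk, new_risk, outcome, threshold):
    """Two-category NRI components at a single risk threshold.

    Returns (nri_events, nri_nonevents):
      events:     P(moved up | event)      - P(moved down | event)
      non-events: P(moved down | no event) - P(moved up | no event)
    """
    up = {0: 0, 1: 0}
    down = {0: 0, 1: 0}
    count = {0: 0, 1: 0}
    for old, new, y in zip(old_risk, new_risk, outcome):
        count[y] += 1
        old_high, new_high = old >= threshold, new >= threshold
        if new_high and not old_high:
            up[y] += 1
        elif old_high and not new_high:
            down[y] += 1
    nri_events = (up[1] - down[1]) / count[1]
    nri_nonevents = (down[0] - up[0]) / count[0]
    return nri_events, nri_nonevents

# Toy data (hypothetical): first four individuals are cases, last four controls.
old_risk = [0.02, 0.04, 0.06, 0.01, 0.07, 0.02, 0.03, 0.06]
new_risk = [0.06, 0.07, 0.08, 0.02, 0.03, 0.01, 0.06, 0.04]
outcome = [1, 1, 1, 1, 0, 0, 0, 0]
nri_events, nri_nonevents = net_reclassification(
    old_risk, new_risk, outcome, threshold=0.05
)
```

Because both components depend on where the threshold sits, the same pair of models can give different NRI values at different cut-offs, which is why the choice of risk threshold matters.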
Topics: Humans; Female; Breast Neoplasms; Middle Aged; United Kingdom; Aged; Adult; Biological Specimen Banks; Risk Assessment; Prospective Studies; Risk Factors; Multifactorial Inheritance; Genetic Predisposition to Disease; Genetic Risk Score; UK Biobank
PubMed: 38630597
DOI: 10.1158/1055-9965.EPI-23-1432
-
PloS One 2024
Detecting epistatic drivers of human phenotypes is a considerable challenge. Traditional approaches use regression to sequentially test multiplicative interaction terms involving pairs of genetic variants. For higher-order interactions and genome-wide large-scale data, this strategy is computationally intractable. Moreover, multiplicative terms used in regression modeling may not capture the form of biological interactions. Building on the Predictability, Computability, Stability (PCS) framework, we introduce the epiTree pipeline to extract higher-order interactions from genomic data using tree-based models. The epiTree pipeline first selects a set of variants derived from tissue-specific estimates of gene expression. Next, it uses iterative random forests (iRF) to search training data for candidate Boolean interactions (pairwise and higher-order). We derive significance tests for interactions, based on a stabilized likelihood ratio test, by simulating Boolean tree-structured null (no epistasis) and alternative (epistasis) distributions on hold-out test data. Finally, our pipeline computes PCS epistasis p-values that probabilistically quantify improvement in prediction accuracy via bootstrap sampling on the test set. We validate the epiTree pipeline in two case studies using data from the UK Biobank: predicting red hair and multiple sclerosis (MS). In the case of predicting red hair, epiTree recovers known epistatic interactions surrounding MC1R and novel interactions, representing non-linearities not captured by logistic regression models. In the case of predicting MS, a more complex phenotype than red hair, epiTree rankings prioritize novel interactions surrounding HLA-DRB1, a variant previously associated with MS in several populations. Taken together, these results highlight the potential for epiTree rankings to help reduce the design space for follow-up experiments.
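Two points in the abstract can be illustrated with a few lines of code: how quickly the number of exhaustive interaction tests grows, and how a multiplicative regression term differs from a Boolean, threshold-style interaction of the kind a decision tree encodes. This is a minimal sketch under assumed values (one million variants, dosage thresholds of 1) and does not reproduce the epiTree implementation.

```python
from math import comb

def n_interaction_tests(n_variants, order):
    """Number of distinct variant sets of a given order to test exhaustively."""
    return comb(n_variants, order)

def multiplicative_term(g1, g2):
    """Interaction feature used in a standard regression model: product of dosages."""
    return g1 * g2

def boolean_and_term(g1, g2, t1=1, t2=1):
    """Tree-style interaction: 1 only if both dosages reach their (hypothetical) thresholds."""
    return int(g1 >= t1 and g2 >= t2)

# Exhaustive search cost for a hypothetical panel of one million variants.
pairs = n_interaction_tests(1_000_000, 2)    # 499_999_500_000 pairwise tests
triples = n_interaction_tests(1_000_000, 3)  # roughly 1.7e17 three-way tests

# The two encodings agree on whether any interaction signal is present here,
# but scale differently: the product grows with dosage, the Boolean term does not.
examples = [(g1, g2, multiplicative_term(g1, g2), boolean_and_term(g1, g2))
            for g1 in (0, 1, 2) for g2 in (0, 1, 2)]
```

The jump from pairs to triples is what makes sequential regression testing impractical at genome scale and motivates first narrowing the variant set and then searching for candidate interactions with tree ensembles.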
Topics: Humans; Epistasis, Genetic; Genome-Wide Association Study; Phenotype; Multifactorial Inheritance; Logistic Models; Polymorphism, Single Nucleotide
PubMed: 38625909
DOI: 10.1371/journal.pone.0298906