You are here

Non-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations.

TitleNon-linear machine learning models incorporating SNPs and PRS improve polygenic prediction in diverse human populations.
Publication TypeJournal Article
Year of Publication2022
AuthorsElgart, M, Lyons, G, Romero-Brufau, S, Kurniansyah, N, Brody, JA, Guo, X, Lin, HJ, Raffield, L, Gao, Y, Chen, H, de Vries, P, Lloyd-Jones, DM, Lange, LA, Peloso, GM, Fornage, M, Rotter, JI, Rich, SS, Morrison, AC, Psaty, BM, Levy, D, Redline, S, Sofer, T
Corporate/Institutional AuthorsNHLBI’s Trans-Omics in Precision Medicine (TOPMed) Consortium
JournalCommun Biol
Volume5
Issue1
Pagination856
Date Published2022 08 22
ISSN2399-3642
KeywordsGenetic Predisposition to Disease, Genome-Wide Association Study, Humans, Machine Learning, Multifactorial Inheritance, Polymorphism, Single Nucleotide
Abstract<p>Polygenic risk scores (PRS) are commonly used to quantify the inherited susceptibility for a trait, yet they fail to account for non-linear and interaction effects between single nucleotide polymorphisms (SNPs). We address this via a machine learning approach, validated in nine complex phenotypes in a multi-ancestry population. We use an ensemble method of SNP selection followed by gradient boosted trees (XGBoost) to allow for non-linearities and interaction effects. We compare our results to the standard, linear PRS model developed using PRSice, LDpred2, and lassosum2. Combining a PRS as a feature in an XGBoost model results in a relative increase in the percentage variance explained compared to the standard linear PRS model by 22% for height, 27% for HDL cholesterol, 43% for body mass index, 50% for sleep duration, 58% for systolic blood pressure, 64% for total cholesterol, 66% for triglycerides, 77% for LDL cholesterol, and 100% for diastolic blood pressure. Multi-ancestry trained models perform similarly to specific racial/ethnic group trained models and are consistently superior to the standard linear PRS models. This work demonstrates an effective method to account for non-linearities and interaction effects in genetics-based prediction models.</p>
DOI10.1038/s42003-022-03812-z
Alternate JournalCommun Biol
PubMed ID35995843
PubMed Central IDPMC9395509
Grant ListR01 HL120393 / HL / NHLBI NIH HHS / United States
U54 HG003067 / HG / NHGRI NIH HHS / United States
U01 HL120393 / HL / NHLBI NIH HHS / United States
R01 HL146860 / HL / NHLBI NIH HHS / United States
HHSN268201800001C / HL / NHLBI NIH HHS / United States
UM1 HG008898 / HG / NHGRI NIH HHS / United States
R01 HL117626 / HL / NHLBI NIH HHS / United States
U24 HG008956 / HG / NHGRI NIH HHS / United States
HHSN268201100037C / HL / NHLBI NIH HHS / United States
HHSN268201600032C / ES / NIEHS NIH HHS / United States
R21 HL145425 / HL / NHLBI NIH HHS / United States
R01 HL092577 / HL / NHLBI NIH HHS / United States
R01 HL059367 / HL / NHLBI NIH HHS / United States
U54 HG003273 / HG / NHGRI NIH HHS / United States
R01 HL142711 / HL / NHLBI NIH HHS / United States
R35 HL135818 / HL / NHLBI NIH HHS / United States
R01 HL098433 / HL / NHLBI NIH HHS / United States
HHSN268201500015C / HL / NHLBI NIH HHS / United States
R21 AG070644 / AG / NIA NIH HHS / United States
HHSN268201600033C / ES / NIEHS NIH HHS / United States
HHSN268201500014C / HL / NHLBI NIH HHS / United States
KL2 TR002490 / TR / NCATS NIH HHS / United States
ePub date: 
22/08