Study design
The UKB is a prospective cohort study from the UK that contains more than 500,000 volunteers between 40 and 69 years of age at inclusion. The study design, sample characteristics and genotype data have been described elsewhere66,67. The UKB was approved by the National Research Ethics Service Committee North West Multi-Centre Haydock and all study procedures were performed in accordance with the World Medical Association Declaration of Helsinki ethical principles for medical research. We included 460,036 individuals across the three major ancestries in UKB in our analyses for whom inclusion criteria (given consent to further usage of the data, availability of genetic data and passed quality control (QC) of genetic data) applied. Data from UKB were linked to death registries and hospital episode statistics (HES). We used the ancestry assignments as defined by the pan-UKB68 and further assigned unclassified individuals to their respective ancestries based on a k-nearest neighbor approach using genetic principal components. All analyses were conducted under UKB applications 44448 and 30418.
Metabolomic measurements
Up to 249 targeted metabolomic measurements were quantified using the Nightingale NMR platform in human EDTA plasma samples. Detailed experimental procedures for the NMR platform are described elsewhere65,69. The NMR platform covers a wide range of metabolic biomarkers, including lipoprotein lipids, fatty acids and small molecules such as amino acids, ketone bodies and glycolysis metabolites, quantified in molar concentration units. We combine here three data releases that cover the full breadth of the UKB. Metabolomics data were available for 482,276 individuals, including 19,699 samples with data from baseline and repeat visit.
Metabolites were reliably detected, with only one biomarker over 2.5% missingness in releases 1/2 (creatinine) and release 3 (3-hydroxybutyrate). Ninety-eight percent of the samples had <5% missingness over all biomarkers in releases 1/2 and release 3. We used the ukbnmr70 R package (v2.2, R v4.3.2) for QC and removal of technical variation in the NMR data. This includes technical confounders such as sample preparation time, shipping plate well, spectrometer effects, time drift within spectrometers and outlier plates.
We removed samples that were flagged by Nightingale for poor quality and used the MICE (Multivariate Imputation by Chained Equations)71 R package to impute the remaining dataset. In total, we imputed 0.16% and 0.17% of data in releases 1/2 and release 3, respectively.
We observed overall good consistency with the overlapping routine blood biomarkers previously measured in the same cohort (median r = 0.9, range 0.62–0.94) (Extended Data Fig. 9).
Adjustment of metabolomic data for medication use
We sought to adjust the NMR data for medication use, especially cholesterol-lowering medication, to avoid false-positive results driven by medication use in downstream genetic analyses. For male and female participants separately, we fit linear models to quantify the impact of six drug categories on each NMR phenotype: cholesterol-lowering medicine, blood pressure medication, diabetic medication including Metformin usage, oral contraceptive pill or minipill (female only) and hormone replacement therapy (female only) (UKB fields 6177 and 6153) (Supplementary Fig. 6 and Supplementary Table 18).
We used data from individuals with both baseline (NMRbaseline) and repeat (NMRfollow-up) assessment metabolic data available and estimated the effect of medication (med terms) in individuals that did not take any drugs at the time of the baseline visit (n = 6,312 male, n = 6,713 female participants) using the following model:
$$\begin{array}{l}{\mathrm{NMR}}_{\mathrm{baseline}} \sim {\mathrm{NMR}}_{{{\mathrm{follow}}}{{\text{-}}}{{\mathrm{up}}}}+\mathrm{age}+\mathrm{BMI} \\+{\mathrm{med}}_{\mathrm{cholesterol}}+{\mathrm{med}}_{\mathrm{diabetic}}+{\mathrm{med}}_{\mathrm{contraception}}+{\mathrm{med}}_{\mathrm{hormone}}+{\mathrm{error}}.\end{array}$$
We note that the sample sizes for diabetic medication (nmale = 45, nfemale = 29), oral contraceptive medication (n = 27) and hormone replacement therapy (n = 148) were too small to reliably estimate any effects. Effect estimates for diabetic medication were correlated to estimates for cholesterol-lowering medicine. The effect estimates for blood pressure medication were minimal across the phenotypes. We considered thus only the impact of cholesterol-lowering medicine and corrected the metabolic data in a sex-specific manner.
Genotyping and GWAS analyses
GWAS was performed on 249 metabolic traits measured by the NMR platform on British European (n = 434,646), British Central/South Asian (n = 8,796) and British African participants (n = 6,573) that had complete phenotypic, covariate and genetic information available. We used the Haplotype Reference Consortium-imputed genetic data, including all autosomal chromosomes and the X chromosome. We performed GWAS under the additive model using REGENIE (v3.2.5)72 that uses a two-step procedure to account for population structure. We derived a set of high-quality genotyped variants per population by applying the following filters: (MAF >1%, minor allele count (MAC) >100, missingness rate <10%, PHWE > 1 × 10−15). Furthermore, linkage disequilibrium pruning was performed using a 1,000-kb window, shifting by 100 variants and removing variants with LD (r2) >0.8. We used these variants as input for the first step of REGENIE to generate individual trait predictions using the leave-one-chromosome-out scheme. These predictions are used in the second step where individual variants are tested. Models were adjusted for age, sex and the first ten genetic principal components. We tested variants with a MAF >0.5%, amounting to 11.5 million variants in British European individuals, 11.5 million variants in British Central/South Asian individuals and 19.3 million variants in British African individuals.
For initial discovery, we performed a meta-analysis across the three ancestral groups using METAL73. We required variants to be present in at least two ancestral groups. To declare significance, we considered a stringent P-value threshold (2.0 × 10−10) by dividing the standard genome-wide threshold by the number of metabolic phenotypes (5.0 × 10−8/249).
We tested our results for genomic inflation and calculated the single-nucleotide polymorphism (SNP)-based heritability using LD-score regression74 (Supplementary Table 19).
Regional clumping and fine-mapping
We used regional clumping (±500 kb) around sentinel variants from the analyses including British European samples to select independent genomic regions associated with a metabolic phenotype and collapsed neighboring regions using BEDtools (v2.30.0). We treated the extended MHC region (chr6: 25.5–34.0 Mb) as one region.
Within each region of interest, excluding the MHC region, we performed statistical fine-mapping for all phenotypes associated with that region using the ‘Sum of single effects’ model (SuSiE) implemented in the susieR (v0.12.35) R package75. In brief, SuSiE uses a Bayesian framework for variable selection in a multiple regression problem with the aim to identify sets of independent variants each of which probably contains the true causally underlying genetic variant. We implemented the workflow using default prior and parameter settings, apart from the minimum absolute correlation, which we set to 0.1. Because SuSiE is implemented in a linear regression framework, we used the GWAS summary statistics with a matching correlation matrix of dosage genotypes instead of individual-level data to implement fine-mapping (susie_rss()) as recommended by the authors75.
To determine the appropriate number of credible sets within each region, we iterated over the maximum credible sets parameter in susieR from two to ten, thus generating fine-mapped results constrained to a range of maximum number of credible sets. For each collection of credible sets, we pruned sets where the lead variant was correlated to the lead variant of other credible sets (r2 > 0.25). After pruning, we considered the fine-mapped results with the largest number of credible sets.
We performed several sensitivity analyses by computing joint models per locus–phenotype combination, jointly modeling the effect of all distinct lead credible set variants in a single linear model. Subsequently, we retained only credible sets where the lead variant reached genome-wide significance (P = 5.0 × 10−8) in both marginal and joint statistics. Furthermore, we ensured the estimated coefficients were directionally concordant and of similar magnitude between joint and marginal models (±25%). Linear models were implemented in R using the glm() function and used only unrelated British European participants and the same set of covariates as described above.
Finally, we used LD clumping (r2 > 0.6) to identify credible sets shared across metabolic phenotypes.
We computed the correlation matrix with LDscore v2.0 using genetic data from 50,000 randomly selected, unrelated White European UKB participants. In situations where SuSiE did not deliver a credible set, we used the Wakefield approximation76 to compute 95%-credible sets.
Replication of genetic associations
We replicated our trans-ancestral genetic signals using two independent studies: (1) the so-far largest published mGWAS3 and (2) a parallel effort using overlapping UKB data9, both using the same NMR platform. We considered a set of metabolic traits that were directly measured by the NMR platform and not inferred from other traits to avoid multiplicative errors in these more sensitive phenotypes. In total, we were able to match 144 (Karjalainen et al.3) and 169 (Tambets et al.9) metabolic traits, for which we compared sentinel variants that passed metabolome-adjusted, genome-wide significance in our trans-ancestral meta-analysis and that overlapped between the studies.
Causal gene assignment
To assign candidate genes for all metabolite QTLs residing outside the MHC region, we first collected annotations for each genetic variant or proxies thereof (r2 > 0.6), including distance to the gene body and putative functional consequences based on the Variant Effect Predictor (VEP) tool offered by Ensembl. We further collated up to ten closest genes within a 2-Mb window and subsequent gene features such as: (1) eQTL evidence for a given variant–gene pair for each tissue available in the eQTL Catalogue release 777; (2) evidence of being annotated as metabolic in the MGI or Orphanet databases as defined in ProGem19; (3) evidence of being listed in the Online Mendelian Inheritance in Man (OMIM) database39; (4) and evidence of being an already assigned drug target in Open Targets78 clinical stages III and IV.
With no universally accepted standard for variant-to-gene assignments, we relied on prior biological and genomic information to create three sets of ‘putative true positive’ (PTP) set: genes part of cholesterol pathway in the Kyoto Encyclopedia of Genes and Genomes (KEGG)79 or REACTOME80 database (n = 6,791, 722 unique SNPs), lipid pathway (n = 5,670, 603 unique SNPs) and amino acid-related pathway (n = 8,349, 895 unique SNPs). We used all fine-mapped SNPs associated with metabolites classified in the respective NMR metabolite class (Cholesterol: cholesterol, cholesteryl esters, free cholesterol; Lipid: total lipids, other lipids, relative lipid concentration, phospholipids; Amino Acid: amino acid) in the PTP set and used overlapping SNPs in only one PTP set. We trained (7:3 training:test ratio without overlapping variants) a random forest classifier using fivefold cross-validation with subsampling to account for the unbalanced datasets (scikit-learn v1.4.1). We used the balanced accuracy score to choose the best-performing forest from each training set. Subsequently, we used the best-performing classifier from each PTP set to assign candidate scores for all putative effector genes across the entire set of metabolite QTLs. We calculated the median score across classifiers and selected the highest-scoring gene per variant. Within each PTP set, we omitted features used to define true positive sets. Each of the three classifiers exhibited consistent performance (mean ROC-AUC: 0.80, mean balanced accuracy score 0.69) (Supplementary Fig. 7). We used the sum across all three classifiers to assign effector gene scores but present only genes as potential effector genes that reached sufficient support as indicated by largest difference between consecutively prioritized genes.
To provide another layer of evidence for assignment of causal genes at metabolic loci, we performed cis-colocalization with protein targets measured in the independent Fenland study22. Cis (for example, gene body ± 500 kb) summary statistics were preprocessed using MungeSumStats81. To relax the single causal variant assumption, we used a colocalization approach where we fine-mapped all traits with SuSiE and then performed colocalization among all credible sets using functionality of the coloc (v5.2.3)82,83 and susieR (v0.12.35)75 R packages. For this, we set the prior probability that a SNP is associated with both traits to 5 × 10−6 and restricted the maximum number of credible sets for the outcome data to five82.
Tissue enrichment of metabolic loci
We tested whether genes proximal to metabolic loci and assigned effector genes were enriched in tissue compartments by leveraging data from the Human Protein Atlas84. Specifically, we used a two-sided Fisher’s test whether metabolic genes were enriched among tissue-specific genes (tissue-enriched or tissue-enhanced as defined by the Protein Atlas) against all protein-coding genes as background.
Pleiotropy assignment and overlap with the GWAS Catalog
To assign modes of pleiotropy for each mQTL, we first clumped lead credible set variants across NMR measures by LD, collating variants with r2 ≥ 0.6 as a single signal, referred to hereafter as mQTL group. This was done based on dosage files of all unrelated British European UKB participants and implemented with the igraph (v.2.0.1.1) package in R. For each mQTL, we computed pairwise Pearson correlation coefficients among associated NMR measures. We classified each mQTL group on: (1) the 25th percentile of all pairwise correlations, and (2) the Pearson correlation coefficient between the association strengths for each measure (− log10(P value)) and its correlation coefficient with the most strongly associated measure within the mQTL. The latter is a measure to what extent the association between NMR measures at a given locus (‘pleiotropy’) can be explained by being correlated with the most proximal associated measure. Based on opposing those two measures for all mQTLs we defined the following five groups: (1) ‘specific’ mQTLs associated with only ≤3 highly correlated NMR measures (rho ≥0.6); (2) ‘pathway pleiotropic’ mQTLs associated with highly correlated NMR measures (rho ≥0.6) that followed the described association pattern (rho ≥0.6); (3) ‘proportional pleiotropic’ mQTL groups associated with, in part, uncorrelated NMR measures but highly correlated association statistics (rho ≥0.6); (4) ‘disproportional pleiotropic’ mQTLs associated with highly correlated NMR measures (rho ≥0.6), but without evidence that this translated into a correlation of association statistics (rho <0.6), and; (5) all remaining mQTLs as ‘unspecific pleiotropic’ groups.
To quantify the extent to which our pleiotropy assignment extends beyond the NMR measures analyzed here, we intersected mQTLs and proxies thereof with results reported in the GWAS Catalog (downloaded 20 May 2024). We first pruned GWAS Catalog entries for those with mapped traits (to minimize double counting), results that met genome-wide significance (P < 5 × 10−8) and had location information available. We further dropped results similar to NMR measures based on broad Experimental Factor Ontology (EFO) terms (for example, EFO:0005105 and child terms indicating ‘lipid or lipoprotein measurement’). To further account for traits mapping to similar categories, we iteratively traced back-mapped EFO terms to broader parent terms. We finally classified mQTLs to be ‘specific’ in the GWAS Catalog if they associated with fewer than five parent EFO terms and ‘unspecific’ otherwise.
Integration with cardiovascular endpoints
We next aimed to investigate the shared genetic basis of the 249 NMR and 25 selected CVD traits. We utilized public databases (GWAS Catalog, openGWAS, CVD-KP) to collect CVD data comprising the largest currently publicly available GWAS datasets on CAD and myocardial infarction, angina pectoris, aortic aneurysm, heart failure and stroke, and peripheral arterial disease, including two to five subtypes for some phenotypes (Supplementary Table 13). Data were harmonized and, if necessary, lifted over to GRCh37 using the MungeSumstats (v1.13.2) R package81. We queried mQTL lead variants and proxies in strong LD (r2 > 0.8; LD backbone based on UKB, as described above) of each NMR trait in each region and corresponding summary statistics for each CVD trait.
To investigate ‘locus’ effects, we performed statistical colocalization for all combinations of the NMR traits–CVD traits as described before (see ‘Causal gene assignment’ section).
To estimate ‘level’ effects of NMR metabolite concentrations on CVD outcomes, we performed Mendelian Randomization analysis using the TwoSampleMR package (v0.5.1), implementing the inverse-variance weighted and the MR-Egger methods. We used all 249 NMR metabolites as exposure variables, the 25 CVDs as outcome variables and assessed separately four sets of instruments: (1) sentinel variants, (2) lead credible set variants, (3) lead credible set variants restricted for molecular pleiotropy (for example, ‘pathway pleiotropy’) and (4) lead credible set variants restricted for both molecular and phenotypic pleiotropy. We used the Wald ratio method to estimate the effect of NMR concentrations on CVD outcomes using only single genetic variants85. We used MR-Egger to test for evidence of a pleiotropic association, an intercept P value >0.0001 indicating evidence of no pleiotropy and checked for concordance between the effect estimates of inverse-variance weighted Mendelian randomisation (IVW-MR), MR-Egger and single genetic variant MR. We controlled the FDR at 5% (ref. 86). To further limit the possible extent of pleiotropic associations, we only reported ‘level effects’ passing these filters in the variant sets 2–4, prioritizing the association in the more stringent variant set.
The overlap of ‘locus effects’ showing no ‘disproportional pleiotropy’ according to the section ‘Pleiotropy assignment and overlap with the GWAS Catalog’ as well as a significant single variant MR (FDR 5%) and ‘level effects’ calculated from metabolite-specific or metabolite- and phenome-specific variants was used to identify gene–metabolite pairs associated with CVD risk independent of LDL metabolism. We considered loci as independent from LDL metabolism if they did not associate with clinical LDL cholesterol at the locus with P < 2.0 × 10−10 and the effect estimate of any variant on clinical LDL-C ranked upward the 80th percentile of all effect estimates at the locus.
Whole exome sequencing data QC for rare variant analyses
An in-depth description of whole exome sequencing, including experimental details, variant calling and standard QC measures for the UKB has been extensively reported by Backman et al.87. We performed additional QC steps at the UKB Research Analysis Platform (RAP; https://ukbiobank.dnanexus.com/).
We used bcftools (v1.15.1) to process population-level Variant Call Format (pVCF) files. Initially, we normalized the data using the reference sequence GRCh38 build, followed by splitting multiallelic variants. Subsequently, we conducted QC on these variants using a set of parameters outlined below to filter high-quality variants for downstream genetic analyses. Genotypes for SNPs were set to missing if the read depth was less than 7 (or less than 10 for INDELs) or if the genotype quality was below 20. Furthermore, we excluded variants if the allele balance was less than 0.25 or greater than 0.8 in heterozygous carriers. Finally, we excluded variants with missingness >50%.
Variant annotation and gene burden masks
Variants were annotated using ENSEMBL VEP88 (v106.1) with the most severe consequence for each variant chosen across all protein-coding transcripts. We further utilized additional plugins REVEL89, CADD v1.690 and LOFTEE91 for variant annotation. Based on these scores, we defined six partially overlapping variant masks: (1) high-confidence predicted LoF (pLOF, based on LOFTEE and includes stop-gained, splice site disrupting, and frameshift variants); (2) any pLOF assigned high impact by VEP; (3) pLOF and high-impact missense variants (CADD score >20 or REVEL score >0.5); (4) pLOF and any missense variants; (5) only high-impact variants; and (6) any missense variants but not pLOF. We tested synonymous variants separately as a negative control. We tested each mask in different MAF bins, using 0.5% and 0.005% as thresholds.
We performed rare variant association testing (RVAT) using whole exome sequencing (WES) data across 249 NMR phenotypes using REGENIE (v3.1.1) via the DNAnexus Swiss Army Knife tool (v4.9.1). Similar to common variant GWASs, we used a two-step approach by REGENIE. We additionally generated step 1 leave-one-chromosome-out (LOCO) files with and without adjusting for common signals via a polygenic score (PGS derived from all lead credible set variant per NMR trait) in the RVAT models per phenotype. All RVAT models were then adjusted for PGS in addition to age, biological sex, fasting duration and the first ten genetic PCs. We first performed aggregated gene burden testing across for 19,026 genes using a set of masks as defined above. For gene burden testing, we used the aggregated Cauchy association test to estimate P values for each gene across masks and allele frequency bins. The aggregated Cauchy association test first computes P values for all sets defined by various masks within a gene and then takes these P values as input to compute one P value for the respective gene via a well-approximated Cauchy distribution.
We performed single variant association testing for exonic variants (ExWAS). For the ExWAS, we tested variants with MAC >5 and reported results for variants with MAF <0.0005. We have performed these analyses in individuals of British European, British African and British Central/South Asian ancestry.
We considered findings as robust if they passed multiple-testing-corrected statistical significance (gene burden: P < 1.2 × 10−8 (corrected for the number of genes × number of traits); ExWAS: P < 2.0 × 10−10 (same as for common variant GWAS, conventional genome-wide significance corrected for the number of traits)) in both the model with and without adjusting for the common variant PGS and effect sizes did not differ by more than 20% between these models, as this might otherwise indicate that rare variant findings cannot clearly be distinguished from common variant effects.
Phenotype definition
To systematically test for phenotypic consequences of genes identified through rare variant analysis, we collated 626 disease entities following previous work1 by aggregating information from self-report, HES, death certificates and primary care data (45% of the UKB population). Each disease entity had at least one significant common variant, and we used a similar analysis workflow using REGENIE as described for NMR measures but using logistic regression with saddle point approximation.
Integration of OMIM
We downloaded the OMIM gene–disease list (9 November 2023) and kept 7,327 unique entries after filtering for gene entries with high confidence (level 3). We computed the enrichment of genes associated with any NMR measure from rare variant or gene burden analysis against a background of 19,989 protein coding genes using Fisher’s exact test.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Source link