Study population
We studied three independent cohorts of patients with NSCLC who were treated with PD-1-based immunotherapies in the advanced or metastatic setting. The clinical characteristics of the cohorts are presented in Supplementary Tables 1 and 2. The first cohort, which served as the training cohort, was obtained from the Yale Cancer Center (New Haven). This cohort comprises 113 tissue samples collected between 2012 and 2019 (Fig. 1a). Tissue microarray (TMA) master blocks were created using nonadjacent tumor cores, each with a diameter of 0.6 mm, from each patient’s biopsy. To represent tumor heterogeneity, 5 µm-thick cuts from four independent blocks per TMA, each containing a different core (ROI) from the same patient tumor, were used for transcriptomic profiling using the DSP-GeoMx WTA or CTA (cancer transcriptome atlas, Bruker Spatial Biology, US) platforms and single cores were used for proteomic profiling using the PhenoCycler fusion (PCF) platform (Akoya Biosciences, US). The GeoMx WTA platform and two ROIs per TMA from the same patient tumor were used for proteomic profiling using the PCF platform. These samples were collected and used under the authorization of the Yale Human Investigation Committee (protocol 9505008219), with assurances filed with and approved by the U.S. Department of Health and Human Services. The Yale cohort TMA is designated as YTMA-471. The second cohort, used as a validation cohort, was obtained from the UQ. TMAs were constructed from resected NSCLC tissue specimens from patients who subsequently recurred and were treated with PD-1-based immunotherapies in the advanced or metastatic setting. This cohort contained 42 tissue samples collected between 2009 and 2018 (Fig. 1a) and is designated as UQTMA-4301. The study received ethical approval from the Queensland University of Technology Human Research Ethics Committee (2000000494) and was ratified by the UQ. The third cohort, serving as a second validation dataset, was obtained from Sotiria General Hospital, Medical School, National and Kapodistrian University of Athens, Greece. It comprised 79 patients who received PD-1-based therapy between 2019 and 2023 and was used to validate the generalizability of the findings from the Yale training cohort. The study was conducted under HIC protocol 16760/23-06-2023. The Greek cohort TMAs are YTMA-642 + YTMA-709. Written informed consent or waiver of consent was obtained from all participants before enrollment. A schematic of the workflow is shown in Fig. 1b. Clinical data are provided in Supplementary Data 1 and 2.
Study design
As stated above, the Yale cohort served as a training set, while the UQ cohort and the University of Athens (Greek cohort) served as validation sets. The training cohort facilitated the development of biomarkers associated with immunotherapy outcomes in machine learning models, while the validation cohorts provided independent data to assess the generalizability of these models. Treatment response was classified according to Response Evaluation Criteria In Solid Tumors version 1.1 (Complete Response, Partial Response, Stable Disease and Progressive Disease), and PFS was recorded for all cohorts. To minimize biases from prior therapies, the study focused solely on patients receiving first-line immunotherapy. As a result, the Yale cohort was narrowed down to 41 patients. The clinical endpoint was set to 2-year PFS, consistent with prior real-world and clinical trial data supporting this timeframe as a clinically meaningful surrogate for durable benefit in advanced/metastatic NSCLC. Major immunotherapy trials that led to regulatory approval, such as KEYNOTE-189 and KEYNOTE-407, implemented a 2-year cap on treatment duration2,11. In real-world clinical practice, treatment duration beyond two years was common due to the absence of data defining optimal therapy length. However, recent retrospective evidence from over 1,000 patients showed no significant OS benefit beyond two years36. Furthermore, approximately 20% of patients in that study discontinued treatment at two years despite the absence of progression. Taken together with the toxicity and financial burdens of prolonged therapy, these findings support a shift toward 2-year treatment as a clinical standard. Therefore, adopting 2-year PFS as an endpoint enhances the real-world relevance and interpretability of our findings37. Exclusion criteria for each cohort are depicted in the CONSORT diagrams (Fig. 1a,b). A schematic of data acquisition and analysis workflow is shown in Fig. 1b. We began with data acquisition using the PCF platform and validated findings using the DSP-GeoMx platform. The final analysis focused on findings based on cell types identified through the PCF-derived proteomic data.
Spatial proteomic profiling for Yale and UQ NSCLC cohorts
The CODEX staining method was used for Yale, UQ and Greek cohorts, using the Phenocycler Fusion platform. Staining and imaging were performed by the Akoya Biosciences Spatial Tissue Exploration Program (STEP) following the manufacturer’s instructions38. Formalin-fixed, paraffin-embedded TMA sections were mounted on SuperFrost charged slides, rehydrated and subjected to heat-induced epitope retrieval. A 26-antibody cocktail (190 µl per section) was incubated for 3 h at room temperature in a humidity chamber. The list of antibodies is provided in Supplementary Table 3. This was followed by several cycles of washing and fixation. Details on antibody barcoding, dilution and imaging cycles are provided in Supplementary Data 3. Imaging was performed using the PCF platform and registered qptiff files were exported for image analysis.
Image analysis
Images were imported into QuPath (version 0.4.2)39 and segmented using the Cellpose plugin (version 2.0)40 on the DAPI2 channel with the nuclear pretrained model (‘.cellExpansion = 4 µm’ and ‘.cellConstrainScale = 1.5’). Segmentation quality was visually inspected, and TMA cores with poor quality, such as those that were fragmented, folded, necrotic, or exhibiting high nonspecific fluorescence, were excluded. An artificial neural network pixel classifier was trained on PanCK (pan-cytokeratin) signals to define a tumor/stroma mask, capturing PanCK+ pixels as tumor. Only tumor nests larger than 100 µm2 were annotated as ‘tumor’. Cell metrics, including universally unique identifier (UUID) codes, spatial coordinates, nuclear size and median cell expression per channel, were exported for analysis in Python. Cell classifications from subsequent unsupervised clustering were re-imported into QuPath and matched by their UUIDs for visual inspection and quality control. A detailed image analysis workflow is illustrated in Extended Data Fig. 1a. Raw fluorescence intensity data (16-bit) underwent arcsinh transformation to normalize and reduce skewness, followed by marker scaling across cells. CD45 was used to illustrate preprocessing effectiveness (Extended Data Fig. 1b).
Data integration, clustering and phenotyping workflow
Expression matrices and cell metadata were imported into Anndata for quality control, preprocessing, clustering and phenotyping41,42,43 in Python 3.10. Data were first subset into canonical lineage markers (CD45, CD3e, CD4, CD8, CD20, CD14, CD68, CD11b, CD11c, CD31, CD34, SMA, Vimentin, E-cadherin, PanCK, nuclear size). Each slide was preprocessed individually by arcsinh transform (cofactor 150), marker-wise scaling, and cell-wise normalization, following recommended CODEX workflows44. Preprocessed data were concatenated and integrated using Scanpy’s Harmonypy (version 0.0.5)45 by slide, and adjusted principal components were used for clustering via Phenograph (version 1.5.7)46. Multiple Leiden resolutions (0.1–0.5) were tested across k = 15–30 and r = 0.2, with k = 15 empirically chosen to best resolve canonical cell types. Eighteen clusters were manually annotated to identify CD4, CD8, B cells, granulocytes, macrophages, dendritic cells (DCs), tumor cells, stromal cells, vessels and unidentified cells. Further classification in QuPath defined functional subsets, including CD4 TFH (PD-1+, ICOS+), Treg (FOXP3+), exhausted CD8 (PD-1+), cytotoxic CD8 (granzyme B+), M2 macrophage (CD163+), proliferating tumor (PanCK+Ki67+) and PD-L1 tumor (PD-L1+; Fig. 2a), using an object classifier trained on representative images. Labeled cells were then merged into 14 final cell types (Fig. 2b). For the analysis of PD-L1 expression in progressors and nonprogressors in tumor cells and macrophages, PanCK+ cells above the median were used to compute mean tumor PD-L1 per patient. Similarly, cells expressing CD68 and CD163 above the threshold were used to quantify macrophage PD-L1.
Spatial analysis
Cell proportions were calculated by normalizing the number of each cell type within a TMA core or tumor/stromal region to the total number of cells within that region, yielding cell percentages. Spatial interactions were analyzed using the ‘spatial_interaction’ function in Scimap (https://github.com/labsyspharm/scimap) with the radius method (radius = 30 µm, k-nearest neighbor = 10) for all pairwise combinations. Group means were compared between PFS groups. Cellular neighborhoods were defined using the Neighborhood Identification Pipeline (https://github.com/nolanlab/NeighborhoodCoordination)9. For each cell, the types of its ten nearest neighbors were recorded as features, which were then clustered into ten groups using a k-nearest neighbor unsupervised algorithm. The selection of ten neighbors and ten clusters was guided by prior literature9,47. Each cell was assigned to a ‘neighborhood’, based on the most frequent nearby cell types, and these clusters were manually annotated by their dominant phenotype. Differential enrichment of cell types across neighborhoods was assessed using an ordinary least squares linear model from the same pipeline9, measuring fold change significance based on binary PFS outcome.
DSP-GeoMx WTA for the Yale and Greek NSCLC cohorts
For the Yale cohort, TMA slides were processed using the DSP-GeoMx manual slide preparation protocol (MAN-10150-01) as described in our prior study5. Briefly, four different ROI containing distinct tumor areas of the same tumors were used to enhance reproducibility. Antigen retrieval of the formalin-fixed, paraffin-embedded tissue was conducted for 20 min, followed by deparaffinization and rehydration. The slides were then exposed to proteinase K for 20 min, after which RNA probes were applied to the tissues for in situ hybridization. The following day, stringent washes were performed to remove off-target probes. To delineate areas within each ROI, morphology markers were used for each AOI—CD68 for macrophage, CD45 for leukocyte, PanCK for tumor compartments and SYTO 13 for nuclear staining. The slides were incubated, washed and loaded onto the DSP-GeoMx instrument. Scanning, AOI selection and probe collection were performed as per the user manual (MAN-10088-03). Each AOI representing a compartment, such as tumor region, macrophage region or leukocyte region from a patient tumor core was collected in a 96-well plate. Next, a GeoMx-next-generation sequencing (NGS) readout library was prepared according to the user manual (MAN-10117-01). Sequencing was performed by Yale Center for Genome Analysis, and the sequencing data were processed with the GeoMx-NGS Pipeline (MAN-10153-01), converting FASTQ files to digital count conversion (.dcc) files for read counts. Quality control was conducted on each ROI to ensure accurate analysis of the approximately 18,000 genes targeted using the WTA panel.
DSP-GeoMx CTA processing for the UQ NSCLC cohort
For the UQ cohort, pretreatment TMA slides were processed using the DSP-GeoMx CTA manual slide preparation protocol. This methodology was adapted from ref. 5, which profiles approximately 1,800 genes (Cancer Transcriptome Atlas Panel). After antigen retrieval and deparaffinization, proteinase K treatment was performed, followed by in situ hybridization using CTA RNA probes. The stringent washes were then performed to remove off-target probes. CD3, CD68 and PanCK morphology markers were used to identify tumor AOIs (PanCK+) and stromal (PanCK−) regions. The slides were loaded onto the DSP-GeoMx instrument, and scanning, AOI selection, and probe collection were conducted per the DSP-GeoMx user manual. Each AOI representing a compartment, tumor or stroma, was collected into a 96-well plate. A GeoMx-NGS readout library was prepared following the DSP-GeoMx user manual (MAN-10117-01). PCR reactions were pooled into two mixtures for the tumor and stromal compartments. Sequencing was performed by the Australian Genome Research Facility, and data were processed using the GeoMx-NGS Pipeline (MAN-10153-01) to convert FASTQ files to digital counts conversion (.dcc) files for read counts. Quality control was performed according to preset thresholds to ensure accurate and high-quality profiling.
Univariable analysis of cell-type fractions associated with treatment outcome
We conducted univariable analysis to assess the association between cell-type fractions and treatment outcomes for 14 cell types in both tumor and stromal compartments. Cell fractions were derived using the PCF platform. HRs and two-sided log-rank test P values were calculated using Cox proportional hazards models to evaluate the impact of each cell type on 2-year PFS. BH correction was applied for multiple hypothesis testing.
Computational pipeline for the development of cell-type signatures
We developed cell-type signatures for both tumor and stromal compartments using a robust signature training framework based on a prior study5. The input consists of matrices \({M}_{{ij}}^{\,\mathrm{tumor}}\) and \({M}_{{ij}}^{\,\mathrm{stroma}}\) containing cell fractions for each patient i and cell-type j in tumor and stroma compartments, respectively, where there are 14 distinct cell types across tumor and stromal compartments. For each patient, we have PFS data (time in months and an event indicator: 1 = progression/death and 0 = alive/progression free). For each compartment, we generated n splits of the training data into tenfolds, generating M LASSO-penalized Cox models. For each split, tenfold cross-validation was used to select penalty parameter λ, and a LASSO model was trained using the optimal penalty for that split by pooling all tenfolds. We set lower.limits = 0 in R glmnet (version 4.1-9) for resistance models (positive nonzero coefficients) and upper.limits = 0 for response models (negative nonzero coefficients). This yielded M coefficient sets \({\beta }_{m,j}\), where \({\beta }_{m,j}\) is the coefficient of cell type j in model m. Cell types with a nonzero coefficient in at least t of the models were retained. A final unpenalized Cox model was fit using these selected cell types to obtain coefficients \({\beta }_{j}^{\mathrm{final}}\). The final signature score for patient i was calculated as \({S}_{i}^{\mathrm{final}}={\Sigma }_{j}{\beta }_{j}^{\mathrm{final}}{M}_{{ij}}\). Hence, the final signature is a Cox regression model trained on LASSO-selected cell types. This pipeline was applied to both tumor and stromal compartments to derive resistance and response signatures, where we set M = 50 for stroma and M = 100 for tumor, since we observed that these values were sufficient to identify a stable set of nonzero predictors for each compartment, and the threshold t was set to the maximum value such that each signature contained at least three cell types.
Independent validation of compartment-specific cell-type signatures
We computed the final signature scores for resistance and response models in both tumor and stromal compartments. Each patient received a final score, indicating their predicted risk of progression based on the specific compartment and cell-type signature. We assessed out-of-sample accuracy by evaluating HR and survival probabilities in the validation cohorts. In resistance models, positive coefficients predicted increased risk \(\mathrm{HR} > 1.0\); in response models, negative coefficients predicted reduced risk \(\mathrm{HR} < 1.0\). Statistical significance of HRs was assessed using Cox proportional hazards models. P values were calculated using a two-tailed log-rank test by binarizing the scores at the median and refitting Cox models.
Cell-type deconvolution using DSP-GeoMx- data for validation of spatial proteomic data
To validate the proteomically identified cell types, we analyzed gene expression data generated from the stromal compartment using the DSP software (version 0.4). The quantile-normalized matrix was deconvolved with CIBERSORTx, using the leukocyte gene signature matrix (LM22), which defines reference gene expression profiles for 22 immune cell types. This enabled estimation of stromal cell fractions for each patient. The deconvolution model solved for the cell fractions \({F}_{{ji}}\) using: \(B\times {F}_{i}\approx {M}_{i}^{\mathrm{DSPGeoMx}}\), where B is the LM22 signature matrix, \({F}_{{ji}}\) is the fraction of cells belonging to cell type j in patient i, and \({M}_{i}^{\mathrm{DSPGeoMx}}\) is the stromal compartment matrix for patient i. We compared cell fraction distributions between patients with different PFS outcomes (alive versus dead) using two-sided t tests. This analysis identified stromal cell types enriched in either group. By cross-validating our compartment-specific signatures against DSP-derived CIBERSORTx results, we confirmed the consistency and predictive relevance of the spatial proteomic cell-type features.
Computational pipeline for the development of gene signatures from cell-type signatures (cell-to-gene signatures)
We developed compartment-specific gene signatures based on the proteomics-derived cell-type signatures using a robust signature training framework based on the approach developed in a prior study5. Gene expression reference profiles were obtained from CIBERSORTx, and a public lung cancer scRNA-seq dataset (LuCA) focusing on CD4 T cells, macrophages, granulocytes and malignant cells. Reference gene lists for both models are provided in Supplementary Data 4. Using the mastR (version 1.8.0) R package48, 892,296 cells from LuCA were aggregated into 2,215 pseudobulk samples (>50 cells each), spanning 14 cell types. Marker genes were identified using differential expression and a ranked product permutation test with default parameters. We constructed matrices \({M}_{{ij}}^{\,\mathrm{tumor}}\) and \({M}_{{ij}}^{\,\mathrm{stroma}}\) containing gene expression counts for each patient i and gene j in tumor and stroma compartments, respectively, where genes were restricted to those associated with cell types occurring in the associated proteomics cell-type signatures. Each patient had PFS data (time in months and event indicator—1 = progression/death, 0 = progression free). For each compartment (tumor and stroma) and each cell-type gene set, we generated 50 data splits, each split 80/20 for training/testing. In each training split, differentially expressed genes between progression-free and nonprogression-free patients (P < 0.05, negative binomial exact test) were identified. For each split, 100 LASSO Cox regression models were optimized with different random seeds, by first using tenfold cross-validation on the current training split to select the penalty λ, and then fitting a final LASSO Cox regression model with the best value of λ using the current training set. Any gene appearing in at least one of these models was retained; a signature for the current data split was then trained by fitting an unpenalized Cox regression model on the current training split restricted to the selected genes, and the HR of the signature was estimated on the current test set. To constrain coefficient directions, we set lower.limits = 0 for resistance models (positive nonzero coefficients) and upper.limits = 0 for response models (negative nonzero coefficients). Each data split m thus produced a coefficient set \({\beta }_{m,j}\) and an associated HR estimate. Genes with nonzero coefficients in at least 0.05 of the models having \(\mathrm{HR} > 1.5\) (resistance) or \(\mathrm{HR} < 0.7\) (response) were selected for inclusion in the final signatures. These thresholds were set so that the number of genes included was at most ten. Final Cox regression models were then refit without the penalty using the entire discovery cohort (not split into train/test), and only the selected genes, to derive the final signature coefficients, \({\beta }_{j}^{\mathrm{final}}\). The final signature score for patient i was calculated as \({S}_{i}^{\mathrm{final}}={\Sigma }_{j}{\beta }_{j}^{\mathrm{final}}{M}_{{ij}}\). For training the resistance gene model, we used genes from cell types in the resistance cell-type signature models (for example, proliferating tumor, granulocytes and vessels), and for training the response gene model, we used genes from the cell types in the response cell-type signature models (for example, M1 macrophages, M2 macrophages and CD4 T cells). This approach was applied uniformly across tumor and stromal compartments to define gene-level resistance and response signatures derived from cell-type-level proteomic signatures.
Independent validation of cell-to-gene signatures
To validate the gene signatures, we calculated gene expression matrices for tumor and stromal compartments in both the UQ cohort and the Greek cohort. Sequencing and data processing for the UQ cohort were performed by the UQ research team. For the Greek cohort, sequencing and data preprocessing were conducted by the Yale team. Each matrix contained expression counts per patient and gene in the respective compartments. We computed final resistance and response signature scores for both tumor and stromal compartments, with each score indicating the predicted risk of progression. Out-of-sample performance was evaluated using HRs of the signature scores evaluated in the validation cohorts. Genes with positive coefficients in the resistance signature predicted increased risk \(\mathrm{HR} > 1.0\), while genes with negative coefficients in the response signature predicted reduced risk \(\mathrm{HR} < 1.0\). Statistical significance was assessed using Cox proportional hazards models, with two-sided log-rank test P values derived from each model to test the out-of-sample accuracy of our signatures. We also conducted a multivariable analysis adjusting for clinical factors when available (age, sex, disease stage, prior treatment, type of immunotherapy, line of immunotherapy, histology and smoking status). Cox multivariable regression and Chi-square tests (Supplementary Tables 4–7) were used to evaluate associations between gene signatures and outcomes. This validation step is designed to test the generalizability of the spatially defined compartment-specific gene signatures.
Statistics and reproducibility
No statistical method was used to predetermine sample size. One ROI was excluded from transcriptomic analysis due to pre-identified operator inconsistencies, as documented in Supplementary Fig. 6. The remaining data were included in all analyses. Samples were not prospectively randomized. The investigators were not blinded to allocation during experiments and outcome assessment.
Reporting summary
Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
Source link