Basal cell of origin resolves neuroendocrine–tuft lineage plasticity in cancer

Resource availability

Materials availability

There are limitations to the availability of basal-derived organoid lines generated in this study owing to their derivation from primary cells in the tracheal epithelium. The human SCLC tissue used in this study was not available because of sample scarcity. Human transcriptomic data from Caris Life Sciences used for this study are not publicly available but can be made available upon reasonable request. The de-identified sequencing data are owned by Caris Life Sciences, and qualified researchers can apply for access by signing a data usage agreement.

Experimental models and study participant details

Mice

Rb1^fl/fl;Trp53^fl/fl;H11b–LSL–Myc^T58A/T58A–Ires–Luciferase (RPM; The Jackson Laboratory (JAX) no. 029971)⁴⁰, RPM–R26–LSL–Cas9–Ires–Gfp (RPM–Cas9)^26,27, RPM;Ascl1^fl/fl (RPMA)²⁷, Rb1^fl/fl;Trp53^fl/fl;Rbl2^fl/fl (RPR2)⁶² and Rb1^fl/fl;Trp53^fl/fl;Pten^fl/fl (RPP)^63,64 mice have been previously described. SCID/beige mice (CBSCGB) were purchased from Taconic and Charles River Laboratories. Ai9-tdTomato Cre-reporter mice (B6.Cg-Gt(ROSA)26Sor^{tm9(CAG-tdTomato)Hze}/J) were a gift from C.-L. Lee at Duke University and are available for purchase through JAX (strain no. 007909; RRID:IMSR_JAX:007909).

All mice were housed and treated according to regulations set by the Institutional Animal Care and Use Committee (IACUC) of Duke University. Specifically, the mice were housed under conditions described by the Guide for the Care and Use of Laboratory Animals with a 12-h light/12-h dark cycle, at temperatures from 18–24 °C and in a 40–60% humidity range. Sample groups were randomly allocated but with intention to equally distribute sex among cohorts. Viral infections were performed in a Biosafety Level 2+ room following the guidelines from Duke University Institutional Biosafety Committee. Male and female mice were distributed equally for all experiments. Mice with symptoms including but not limited to inability to ambulate, eat or drink; weight loss in excess of 15% of body weight; tumours exceeding 10% of body weight; tumours with necrosis or ulceration of skin surface; laboured breathing; or other signs of poor body condition were killed before study end points to ensure humane end points, as defined by Duke University’s Policy on Tumor Burden in Rodents and as permitted by IACUC protocol no. A057-22-03(2025). Tumour volume or end points, as defined by our IACUC protocol, were not exceeded in any of our studies.

Basal-derived organoid cultures and cell lines

Basal-derived organoid cultures from RPM, RPMA and RPR2 mice were obtained and transformed ex vivo. Organoid lines were determined to be free of pathogens by IDEXX 18-panel mouse pathogen testing and were confirmed to be Mycoplasma-negative before implantation to SCID/beige hosts. The cell lines used in this study include HEK-293T/17 cells (American Type Culture Collection (ATCC) CRL-11268) to produce lentivirus and H1048 SCLC cells (ATCC CRL-5853). Cell lines were tested for Mycoplasma every 3 months and were negative. Cell line identities were confirmed through short tandem repeat profiling within 6 months of usage, last performed in July 2024. No commonly misidentified cell lines were used in this study.

Patient tissue for immunostaining

Biopsies for the establishment of PDX models were performed after obtaining written informed consent from patients under an Institutional Review Board-approved protocol at Memorial Sloan Kettering (IRB14-091). Models were established and characterized, as previously described⁶⁵. As previously described²⁶ for human biopsies from Huntsman Cancer Institute, all patients provided informed consent for the collection of specimens, approved by the University of Utah Institutional Review Board (IRB_00010924 and IRB_00089989) in accordance with the US Common Rule. For tissue microarrays, human biopsies collected at Washington University in St. Louis were acquired with approval under IRB_202008098.

Caris Life Sciences patient cohort

Caris real-world data were derived from a retrospective review of patient tumour specimens (n = 944) with a diagnosis of SCLC (on the basis of pathological confirmation by local pathologists) submitted to a Clinical Laboratory Improvement Amendments-certified laboratory (Caris Life Sciences) for molecular profiling. This study was conducted in accordance with the guidelines of the Declaration of Helsinki, Belmont Report and US Common Rule and in compliance with policy 45 CFR 46.101(b). This study was conducted using retrospective, de-identified clinical data, and patient consent was therefore not required. Human samples were derived from individuals in the age range of 65–70 years old and included approximately 49% male and 51% female samples. Population characteristics are reported in Supplementary Table 8. De-identified patient demographics and treatment information can also be found in Supplementary Table 8.

Method details

Naphthalene injury model and tumour initiation in mice

Mice at 6–8 weeks of age were treated intraperitoneally with 275 mg kg⁻¹ naphthalene in corn oil before 9 AM, as described⁶⁶, 72 h before administration of adenoviral Cre, a time point where KRT5⁺ basal cells are shown to be abundant and proliferative⁶⁷, which we have validated (Extended Data Fig. 1e,f). After naphthalene treatment, the mice were infected by intratracheal (RPM and RPP) or intranasal (RPP) instillation with 1 × 10⁸ plaque-forming units (pfu) of Ad5–K5–Cre adenovirus (University of Iowa VVC-Berns-1547) using established methods^68,69. No differences in latency or tumour phenotype were observed in RPP mice between intratracheal and intranasal inoculation; therefore, both were included in the results. In brief, the mice were anaesthetized with isoflurane at a flow rate of 20–25 ml h⁻¹, depending on the size and sex of the mouse. The optimal breathing rate was approximately one breath every 2–3 s. For intratracheal instillation, the mice were positioned on a platform with their chest hanging vertically beneath them. A steel feeding tube or Exel Safelet IV catheter (needle removed) was slid into the trachea, and 63 μl of viral cocktail consisting of 10 mM CaCl₂ (Sigma; C5670), 1 × 10⁸ pfu adenovirus and MEM (Thermo Fisher Scientific; 11095080) up to 63 μl was administered through a P200 pipette to the catheter opening. The mice were maintained in this position until the entire volume was dispensed and then monitored until they regained full motility and recovered from anaesthesia. For intranasal instillation, the mice were held in a supine position and administered 63 μl of identical viral cocktail through a P20 pipette, alternating between the left and right nares for each drop. 
Administration of other Ad–Cre viruses (CGRP, VVC-U of Iowa-1160; SPC, VVC-U of Iowa-1168; CCSP, VVC-U of Iowa-1166; CMV, VVC-U of Iowa-5) also occurred in mice 6–8 weeks of age with identical methods, intratracheally, but without naphthalene injury, as previously described^27,40,70.
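
The 63-μl viral cocktail described above (10 mM CaCl₂, 1 × 10⁸ pfu adenovirus, MEM to volume) can be worked out from stock concentrations. The following is a minimal sketch only; the stock titre and the CaCl₂ stock concentration are hypothetical values chosen for illustration and are not reported in this study.

```python
def viral_cocktail_ul(virus_stock_pfu_per_ul, cacl2_stock_mM,
                      total_ul=63.0, dose_pfu=1e8, cacl2_final_mM=10.0):
    """Return component volumes (in microlitres) for one instillation."""
    # Volume of virus stock that contains the target dose
    virus_ul = dose_pfu / virus_stock_pfu_per_ul
    # C1*V1 = C2*V2 for the CaCl2 stock
    cacl2_ul = cacl2_final_mM * total_ul / cacl2_stock_mM
    # MEM fills the remaining volume
    mem_ul = total_ul - virus_ul - cacl2_ul
    if mem_ul < 0:
        raise ValueError("stocks are too dilute to fit in the total volume")
    return {"virus": virus_ul, "CaCl2": cacl2_ul, "MEM": mem_ul}


# Hypothetical stocks: 1e7 pfu per microlitre virus, 1 M (1,000 mM) CaCl2
cocktail = viral_cocktail_ul(virus_stock_pfu_per_ul=1e7, cacl2_stock_mM=1000.0)
print(cocktail)
```

With these assumed stocks, the cocktail would contain 10 μl of virus, about 0.63 μl of CaCl₂ and MEM to a total of 63 μl.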

Micro-computed tomography imaging

To monitor tumour development in autochthonous models, mice were imaged beginning 4 weeks after Ad–Cre administration for RPM mice, 8 weeks for RPP mice and every 2 weeks thereafter. The mice were anaesthetized with isoflurane and imaged using a small animal Quantum GX2 micro-computed tomography (PerkinElmer). Quantum GX2 images were acquired with 18-s scans at 45-μm resolution, 90 kV, with 88 mA of current. The mice were killed when the tumour burden resulted in any difficulty in breathing or significant weight loss, as permitted by IACUC.

Immunohistochemistry

For immunohistochemistry of autochthonous mouse models, lungs were inflated with 1× PBS and extracted, and individual lung lobes and trachea were collected for fixation. Tissues were fixed in 10% neutral buffered formalin for 24 h at room temperature, washed in PBS and transferred to 70% ethanol. Formalin-fixed paraffin-embedded (FFPE) sections at 4–5 μm were dewaxed, rehydrated and subjected to high-temperature antigen retrieval by boiling 20 min in a pressure cooker in 0.01 M citrate buffer at pH 6.0. Slides were quenched of endogenous peroxide in 3% H₂O₂ for 15 min, blocked in 5% goat serum in PBS/0.1% Tween 20 (PBS-T) for 1 h and then stained overnight with primary antibodies in blocking buffer (5% goat serum or SignalStain antibody diluent (Cell Signaling Technology (CST); 8112)). For non-CST primary antibodies, an HRP-conjugated secondary antibody (Vector Laboratories) was used at 1:200 dilution in PBS-T, incubated for 45 min at room temperature and followed by DAB staining (Vector Laboratories). Alternatively, CST primary antibodies were detected using 150 μl of SignalStain Boost IHC Detection Reagent (CST; 8114). All staining was performed using the Sequenza cover plate technology. The primary antibodies included ASCL1 (Abcam; ab211327) 1:300, NEUROD1 (Abcam; ab213725; using Tris/EDTA buffer (pH 9.0) instead of citrate buffer for antigen retrieval) 1:300, POU2F3 (Sigma; HPA019652) 1:300, YAP1 (CST; 14074) 1:300, HES1 (CST; 11988) 1:300, DNP63 (R&D Systems; AF1916) 1:400, phospho-AKT Ser473 (CST; 4060) 1:100, MYC (Abcam; ab32072; Tris/EDTA buffer (pH 9.0)) 1:300, NKX2-1 (Abcam; ab76013) 1:250 and KRT5 (BioLegend; 905501) 1:1,000. For manual H-score quantification, whole slides were scanned using a Pannoramic MIDI II automatic digital slide scanner (3DHISTECH), and images were acquired using SlideViewer software (3DHISTECH). Immunohistochemistry quantification from primary tumour models included tumours from both the trachea and lung lobes. 
H-score was quantified on stained slides on a scale of 0–300, taking into consideration the percentage of positive cells and staining intensity, as previously described⁷¹, where H-score = positive cells (%) × intensity score of 0–3. For example, a tumour with 80% positive cells with high intensity of 3 has a 240 H-score.
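
The H-score definition above can be expressed as a short calculation. This is an illustrative sketch of the single-intensity form stated in the text (percentage of positive cells multiplied by one intensity score of 0–3); the function name is hypothetical, and scoring in this study was performed manually.

```python
def h_score(percent_positive, intensity):
    """H-score = positive cells (%) x intensity score (0-3); range 0-300."""
    if not 0 <= percent_positive <= 100:
        raise ValueError("percent_positive must be between 0 and 100")
    if intensity not in (0, 1, 2, 3):
        raise ValueError("intensity must be 0, 1, 2 or 3")
    return percent_positive * intensity


# Worked example from the text: 80% positive cells at intensity 3
example = h_score(80, 3)
print(example)  # 240
```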

Immunofluorescence

Lung and tumour tissue was collected and fixed for at least 24 h in 10% neutral buffered formalin and then transferred to 70% ethanol before embedding in paraffin. Wild-type and transformed organoids (more than 1 × 10⁶ cells) were collected in approximately 0.5–1 ml of organoid medium using a P1000 pipette tip and then transferred to a conical tube containing 10 ml of 10% formalin. The organoids were fixed at room temperature in formalin for 24 h. After fixation, the organoids were spun down at 500g for 5 min and then washed in 70% ethanol. Ethanol was removed, and the organoids were resuspended in approximately 300 μl of 3% low-melting agarose gel (microwaved to melt and then incubated in a 50 °C water bath for 30 min) using a wide-bore P1000 pipette tip and then transferred to one well of a 96-well V-bottom plate. When the agarose solidified (approximately 3–5 min at room temperature), agarose plugs containing organoids were transferred from the well plate to histology cassettes, placed in 70% ethanol and then subjected to FFPE and sectioning for slides. Before staining, the slides were rehydrated in CitriSolv (2 × 3 min), 100% ethanol (2 × 3 min), 90% ethanol (1 × 3 min), 70% ethanol (1 × 3 min), 40% ethanol (1 × 3 min) and dH₂O (1 × 5 min). Rehydrated tissue was subjected to high-temperature antigen retrieval by boiling for 15 min in a pressure cooker containing 0.01 M citrate buffer at pH 6.0. The slides were cooled at room temperature for 2 h and positioned for staining in Sequenza staining racks (Thermo Fisher Scientific; 10129-584). The slides were blocked at room temperature for 1 h in 10% donkey serum in PBS/0.2% Tween 20 (PBS-T). For primary mouse-on-mouse (M.O.M.) tissue staining, M.O.M. IgG Blocking Reagent (VectorLabs; PK-2200) was also added according to the manufacturer’s protocol. The primary antibody was diluted in 10% donkey serum in PBS-T and added to slides, and the slides were incubated overnight at 4 °C. 
The following day, the slides were washed three times with PBS-T and then stained with the secondary antibody diluted in 10% donkey serum. For M.O.M. staining, M.O.M. protein concentrate (VectorLabs; PK-2200) was added to the secondary antibody solution, according to the manufacturer’s protocol. The slides were then washed a further three times with PBS-T, followed by DAPI staining (1 μg ml⁻¹ in PBS-T) for 20 min. After three more washes in PBS-T, the slides were coverslipped with Aqua-Poly/Mount mounting medium (Polysciences; 18606-20). The primary antibodies included anti-mouse ASCL1 (BD Pharmingen; 556604) 1:25, anti-rabbit ASCL1 (Abcam; ab211327) 1:100, anti-goat NEUROD1 (R&D Systems; AF2746) 1:50, anti-rabbit NEUROD1 (Abcam; ab213725) 1:200, POU2F3 (Sigma; HPA019652) 1:100, anti-rabbit CCSP/SCGB1A1 (MilliporeSigma; 07-623) 1:75, anti-rat KRT8 (Developmental Studies Hybridoma Bank; TROMA-I) 1:100, anti-mouse FOXJ1 (eBioscience; 14-9965-80) 1:100, anti-mouse KI67 (BD Pharmingen; BDB556003) 1:100, anti-goat P63 (R&D Systems; AF1916) 1:40, anti-chicken mCherry/tdTomato (Sigma; AB356481) 1:100, anti-rabbit KRT13 (Abcam; ab92551) 1:200, anti-guinea-pig KRT13 (Origene; BP5076) 1:100, anti-rabbit KRT5 (BioLegend; 905501) 1:200, anti-mouse KRT5 (GeneTex; GTX60580) 1:100 and anti-rabbit CGRP (Sigma; C8198) 1:100. The secondary antibodies for immunofluorescence were all used at a concentration of 10 μg ml⁻¹ and included donkey anti-rabbit AF488 (Invitrogen; A21206), donkey anti-rat AF568 (Invitrogen; A78946), donkey anti-rat AF647 (Invitrogen; A78947), donkey anti-mouse AF647 (Invitrogen; A31571), goat anti-mouse IgG2a AF647 (Invitrogen; A21241), donkey anti-goat AF594 (Invitrogen; A11058), donkey anti-chicken AF594 (Invitrogen; A78951) and donkey anti-guinea-pig AF594 (Jackson ImmunoResearch; 706-585-148). 
The slides were imaged on an EVOS M5000 (Invitrogen; AMF5000) digital inverted benchtop microscope or on a Leica STELLARIS SP8 FALCON confocal microscope on an upright DM6 stand using a ×20 objective and laser illumination. Fluorescence signal was collected with two photomultiplier tubes and three HyS detectors (plus one HyX detector for four-colour staining). Images were acquired using Leica LAS X Microscope Software, including the Navigator function for imaging whole cross sections of the mouse trachea.

Analysis of K5–Cre-based reporter activity in mice

R26R–Ai9 mice were infected intratracheally with 1 × 10⁸ pfu of Ad5–K5–Cre adenovirus 72 h after naphthalene injury, as described above. Tissues from tracheas and lungs were collected at 0, 3, 4 or 7 days post-infection and subjected to FFPE using the methods described above, with tracheas and lung lobes embedded in separate cassettes. Tissues from the tracheas and all lung lobes of four mice were subjected to co-immunofluorescent staining using the methods described above, and one cross section of the lungs and trachea per animal was assessed for co-expression of tdTom with KRT5, P63 and/or KRT13. In total, 160 tdTom⁺ cells were manually assessed from the lung and tracheal tissues for co-expression of KRT13 and/or KRT5, and 90 tdTom⁺ cells were manually assessed for co-expression of KRT5 and/or P63, on the basis of overlapping fluorescent signal from images of all tdTom⁺ cells (acquired with a Leica STELLARIS SP8 confocal microscope and LAS X software; see ‘Immunofluorescence’ section). The results of manual quantification are included in Extended Data Fig. 11e and represent the percentage of tdTom⁺ cells in the tracheal epithelium, lung airway and total airway (trachea + lung airway epithelium) that co-express the indicated basal markers or are within two cell distances from KRT5⁺, P63⁺ or KRT13⁺ cells. TdTom⁺ cells lacking co-expression of a basal marker but within two cell distances from cells with high basal marker expression were included in the quantification because their proximity to basal cells suggests that they may have been targeted by K5–Cre in a basal state but basal marker expression was downregulated during rapid differentiation/regeneration after naphthalene injury.

Semi-automated image quantification

Quantification of cell types present in the murine lung and tracheal epithelium on days 0, 1.5, 2, 3, 5 and 10 after naphthalene injury (Extended Data Fig. 1e,f) was semi-automated using QuPath open software for bioimage analysis (v.0.5.1-x64). At each time point, stained tissues from one to four mice were imaged using a Leica STELLARIS SP8 confocal microscope and LAS X software (see ‘Immunofluorescence’). For tracheal quantification, whole tracheal cross sections were captured per set of co-stains using the LAS X Navigator software with a ×20 objective. For lung airway quantification, approximately three to ten images of distinct airways were captured per animal per time point. Images of tracheas and lung airways were then exported with scale bars as multi-page TIFFs for import to QuPath with ‘image type’ set as ‘fluorescence’. The lung and tracheal epithelium per image were manually annotated in QuPath, and then all annotations were subjected to automated analysis for positive cells per fluorescent marker using the appropriate channel (analyse > cell detection > cell detection). Cell detection parameters were selected for each individual stain on the basis of the positive and negative control regions of the images. After cell detection, the number of positive cells per annotation was exported to a .tsv file (measure > export measurements; export type = ‘annotations’) that lists the ‘perimeter’ and number of cells detected in each annotation. The ‘perimeter’ measurements of each annotation were divided by 2 to estimate the length of the epithelium quantified (approximate length = perimeter/2) and to ultimately obtain the estimated number of positive cells per millimetre of lung or tracheal epithelium. For lung airways, approximately 5–35 total millimetres of epithelium was quantified per cell-type marker per time point (CCSP, KRT5, KRT13 and CGRP). 
For tracheal airways, approximately 18–50 total millimetres of epithelium was quantified per cell-type marker per time point (CCSP, KRT5, KRT13, CGRP and P63).
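
The perimeter-based length estimate described above (approximate length = perimeter/2) can be sketched as follows. This sketch assumes that perimeters are exported in micrometres and that counts are pooled across all annotations for a given marker and time point; both are assumptions made for illustration, and the function name is hypothetical.

```python
def positive_cells_per_mm(annotations):
    """Estimate positive cells per millimetre of epithelium.

    `annotations` is a list of (n_positive_cells, perimeter_um) pairs,
    one pair per manually drawn epithelial annotation.
    """
    total_cells = sum(n for n, _ in annotations)
    # Epithelial length is approximated as half of each annotation perimeter
    total_length_mm = sum(perimeter / 2 for _, perimeter in annotations) / 1000
    if total_length_mm == 0:
        raise ValueError("no epithelial length was annotated")
    return total_cells / total_length_mm


# Hypothetical example: two annotations with perimeters of 4,000 and 2,000 um
density = positive_cells_per_mm([(30, 4000.0), (10, 2000.0)])
print(round(density, 2))  # 13.33
```

Here the two annotations correspond to roughly 3 mm of epithelium in total, so 40 positive cells give about 13.3 cells per millimetre.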

Quantification of co-expression of subtype markers ASCL1, NEUROD1 and POU2F3 in RPM K5–Cre-initiated tumours (Extended Data Fig. 2f) was semi-automated and performed using QuPath (v.0.5.1). RPM K5–Cre-initiated tumours were subjected to immunofluorescence for SCLC subtype markers and DAPI (to mark nuclei) (see ‘Immunofluorescence’), and images of ten unique tumours from five mice were captured on an EVOS M5000 (Invitrogen; AMF5000) digital inverted benchtop microscope with a ×10 or ×20 objective per channel. Images from all four channels per tumour were exported with scale bars and imported individually to QuPath with image type set as fluorescence. The DAPI image per tumour was first subjected to manual annotation of the tumour region in QuPath (whole image annotation if the region only included tumour, or custom annotation if the image included normal regions). Next, the DAPI image was subjected to automated detection of all cells (analyse > cell detection), and then the identified cells were converted to ‘annotations’ by first creating a single measurement classifier on the basis of DAPI cell detections (classify > object classifier > create single measurement classifier). Settings included ‘object filter’ set to ‘detections (all)’, ‘channel filter’ set to ‘blue’, ‘measurement’ set to ‘cell: blue mean’ and ‘classifier name’ set to ‘blue’. Live preview was checked to manually ensure that the classifier captured all cells. After creating the single measurement classifier, all objects (‘cells’) were converted to annotations using the QuPath script editor. After running the command, all resulting annotations (each should highlight one cell in the image) in the ‘annotations’ pane in QuPath were copied (select all with cursor then edit > copy to clipboard > selected objects) and then pasted onto each image from the other three channels. 
For each of the other three images, automatic detection of positive cells for all annotations was performed (analyse > cell detection > positive cell detection) using appropriate parameters per channel to detect positive cells on the basis of the positive/negative control regions of images. Once positive cells were identified for all channels (all three images), annotation measurements were exported (measure > export measurements; ‘export type’ set to ‘annotations’). The resulting .tsv file contained ordered annotations representing individual tumour cells in each image, with corresponding columns of whether that annotation ‘cell’ was positive or not for each channel/image. Positive detections from each channel per annotation ‘cell’ were assessed to obtain the number of co-expressing cells per imaged region of ten distinct tumours.
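
The final tallying step, in which per-cell positivity calls from each channel are combined to count co-expressing cells, can be sketched as below. The input is a hypothetical in-memory list of per-cell records rather than the QuPath .tsv export itself, because the exported column names are not specified here.

```python
from collections import Counter

MARKERS = ("ASCL1", "NEUROD1", "POU2F3")


def coexpression_counts(cells, markers=MARKERS):
    """Count cells by the combination of markers for which they are positive.

    Each item in `cells` is a dict mapping a marker name to True/False for
    one annotation ('cell'); a missing marker is treated as negative.
    """
    counts = Counter()
    for cell in cells:
        positive = tuple(m for m in markers if cell.get(m, False))
        counts[positive] += 1
    return counts


# Hypothetical per-cell calls for four cells in one tumour image
cells = [
    {"ASCL1": True, "NEUROD1": False, "POU2F3": False},
    {"ASCL1": True, "NEUROD1": True, "POU2F3": False},
    {"ASCL1": False, "NEUROD1": False, "POU2F3": False},
    {"ASCL1": True, "NEUROD1": True},
]
counts = coexpression_counts(cells)
print(counts[("ASCL1", "NEUROD1")])  # 2
```

Keying the counts on the tuple of positive markers keeps single-positive, double-positive and negative cells in one table, so percentages per tumour can be derived by dividing by the total number of cells.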

Human SCLC cell infections

Human SCLC cell line H1048 was obtained from ATCC and cultured in RPMI medium supplemented with 10% fetal bovine serum (FBS), 1% l-glutamine and 1% penicillin–streptomycin antibiotic cocktail. To generate H1048 sgNTC, sgPTEN and myristoylated AKT cell lines, cells were infected with a non-targeting sgRNA (sgNTC) or an sgRNA against PTEN (sgPTEN: 5′-GACTGGGAATAGTTACTCCC-3′) in the LCV2-hygro backbone (Addgene plasmid no. 98291) or infected with the pHRIG–AKT1 lentiviral construct (Addgene; 53583). In brief, high-titre virus (approximately 1–5 × 10⁷ pfu) was produced using HEK-293T cells transfected with a three-plasmid system, including the targeting construct and lentiviral packaging plasmids pCMV delta R8.2 (Addgene plasmid no. 8455) and pCMV–VSVG (Addgene plasmid no. 8454). Viruses were collected at 48 h and 72 h post-transfection, concentrated by means of ultracentrifugation (25,000 rpm for 1.45 h), resuspended in 1× sterile PBS and stored at −80 °C until use. H1048 cells were subjected to spinoculation at 37 °C, 900g, for 30 min. During spinoculation, 0.5–1 million cells per well of a six-well plate were cultured with 2-ml RPMI, 25-μl HEPES buffer (Thermo Fisher Scientific; 15630080), 8 μg ml⁻¹ of Polybrene (Santa Cruz Biotechnology; sc-134220) and 25-μl high-titre virus. Cells were selected 48 h after spinoculation with hygromycin (for LCV2-hygro-infected cells) or sorted for GFP to enrich for cells infected with pHRIG–AKT1.

Immunoblotting

For human cell line and mouse tumour western blots, protein lysates were prepared, as described^40,72, separated by means of SDS–PAGE and transferred to polyvinylidene fluoride (PVDF) membranes (Bio-Rad; 1704157) using the Trans-Blot Turbo Transfer System (Bio-Rad; 1704150). Membranes were blocked for 1 h in 5% milk, followed by overnight incubation with primary antibodies at 4 °C. The membranes were washed for 3 × 10 min at room temperature in Tris-buffered saline with Tween 20 (TBS-T). Mouse and rabbit HRP-conjugated secondary antibodies (Jackson ImmunoResearch; 1:10,000) were incubated for 1 h in 5% milk at room temperature, followed by washing 3 × 10 min at room temperature in Tris-buffered saline with Tween 20. The membranes were exposed to WesternBright Quantum HRP substrate (Advansta; K-12045-D50) and detected on HyBlot CL film (Denville Scientific). The primary antibodies included ASCL1 (1:1,000; Abcam; ab211327), NEUROD1 (1:1,000; CST; 62953), MYC (1:1,000; CST; 5605), PTEN (1:1,000; CST; 9559), POU2F3 (1:1,000; Sigma; HPA019652), pAKT (Ser473) (1:1,000; CST; 4060), pAKT (Thr308) (1:1,000; CST; 13038), total AKT (1:1,000; CST; 9272) and HSP90 (1:1,000; CST; 4877) as loading control.

Basal organoids

Tracheal basal cell isolation and organoid culture

Live, normal tracheal basal cells were isolated from RPM, RPR2 and RPMA mice (not treated with Ad–Cre) and grown as organoids, as described previously^12,73,74. In brief, the mice were euthanized, and three to four tracheas per genotype were isolated in cold DMEM/F12-Advanced media (Thermo Fisher Scientific; 12-634-238 + 10% FBS, 1% l-glutamine and 1% penicillin–streptomycin). Tracheas were opened to expose the lumen using a razor blade and forceps. Each trachea was placed in a 1.5-ml Eppendorf tube in 500-μl dispase (50 U ml⁻¹; Corning; 354235) diluted in HBSS-free medium (Thermo Fisher Scientific; 14175-095) to 16 U ml⁻¹ and incubated at room temperature for 30 min. After incubation, tracheas were transferred to new Eppendorf tubes containing 500 μl of 0.5 mg ml⁻¹ of DNAse (Thermo Fisher Scientific; NC9709009) diluted in HBSS-free medium and incubated for an extra 40 min at room temperature. Tracheas from each genotype were pooled in a 10-cm dish containing DMEM/F12-Advanced media, and forceps were used to gently pull apart the epithelial layers/sheets from the cartilage of each trachea. The media containing all tracheal epithelial sheets per genotype were transferred to a 15-ml conical tube and centrifuged at 4 °C, 2,000 rpm, for 5 min. The supernatant was removed. The remaining cell pellet was resuspended in 1 ml of TrypLE Express (Invitrogen; 12604013) and then incubated at 37 °C for 5 min. TrypLE was quenched through the addition of 10-ml DMEM/F12-Advanced media and then transferred through a 100-μm cell strainer into a 50-ml conical tube. Excess tissue was pushed through the cell strainer gently using a plunger from a syringe. Filtered cells were spun down at 2,000 rpm for 5 min at 4 °C, and the supernatant was removed. 
The remaining cell pellet was resuspended in 1 ml of fluorescence-activated cell sorting (FACS) buffer (20-ml PBS, 400-μl FBS and 80-μl 0.5 M EDTA), spun down and then stained in 100 μl of FACS buffer containing 1-μg anti-rat ITGA6/CD49 (eBioscience; 14-0495-85) primary antibody for 30–60 min on ice. The samples were washed three times in FACS buffer and then resuspended in 100 μl of FACS buffer with 1 μg of secondary antibody goat anti-rat APC (BioLegend; 405407) and incubated for 30 min in the dark on ice. The samples were washed three times with FACS buffer and then stained for 15 min with 1 μg ml⁻¹ of DAPI. After three more washes, the cells were subjected to FACS, and DAPI⁻ITGA6⁺ cells were isolated. The resulting basal cells were resuspended in 100% Matrigel (Corning/Fisher; CB-40234C or homemade), 20,000–100,000 cells per one 50-μl Matrigel dome, and then 50 μl of Matrigel was plated per well of a pre-warmed 24-well plate. After Matrigel solidified at 37 °C, 500 μl of organoid culture medium (OCM) consisting of 50% L-WRN conditioned medium⁷⁵ (Sigma; SCM105 or homemade), 50% DMEM/F12-Advanced media (supplemented with 10% FBS, 1% l-glutamine, 1% penicillin–streptomycin, 10 ng ml⁻¹ EGF (Thermo Fisher Scientific; PHG0311), 10 ng ml⁻¹ FGF (Thermo Fisher Scientific; PHG0369) and 10 μM Y-27632 Rho kinase/ROCK inhibitor (MedchemExpress; HY-10071)) was added per well. OCM was changed every 2–3 days, and organoids were split and expanded using TrypLE upon confluence.
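
The dilution steps in this protocol reduce to simple C₁V₁ = C₂V₂ arithmetic. As an illustrative sketch (the function names are hypothetical), the dispase dilution from 50 U ml⁻¹ to 16 U ml⁻¹ in 500 μl and the approximate final composition of the FACS buffer can be computed as follows.

```python
def stock_volume_ul(stock_conc, final_conc, final_volume_ul):
    """Volume of stock needed (C1*V1 = C2*V2); concentrations share one unit."""
    return final_conc * final_volume_ul / stock_conc


def final_conc(stock_conc, added_volume_ul, total_volume_ul):
    """Final concentration after adding a stock volume to a total volume."""
    return stock_conc * added_volume_ul / total_volume_ul


# Dispase: 50 U/ml stock diluted to 16 U/ml in a final volume of 500 ul
dispase_ul = stock_volume_ul(50, 16, 500)
diluent_ul = 500 - dispase_ul

# FACS buffer: 20 ml PBS + 400 ul FBS + 80 ul of 0.5 M (500 mM) EDTA
total_ul = 20_000 + 400 + 80
fbs_percent = final_conc(100, 400, total_ul)
edta_mM = final_conc(500, 80, total_ul)

print(dispase_ul, diluent_ul)  # 160.0 340.0
print(round(fbs_percent, 2), round(edta_mM, 2))  # 1.95 1.95
```

Under these calculations, each 500-μl dispase aliquot is 160 μl of stock plus 340 μl of diluent, and the FACS buffer is approximately 2% FBS and 2 mM EDTA.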

Lentiviral transduction of organoids with CellTag Library V1

The CellTag V1 plasmid library was purchased from Addgene (plasmid no. 124591) and amplified according to the published protocol for this technology⁵². In brief, the plasmid library was transformed using Stellar Competent Cells at an efficiency of approximately 220 colony-forming units per unique CellTag in the V1 library. The library was isolated from Escherichia coli culture through the Plasmid Plus Mega Kit (QIAGEN; 12981) and assessed for complexity through high-throughput DNA sequencing with the Illumina MiSeq (75-cycle paired-end sequencing v.3). Generation of the CellTag whitelist from sequencing data resulted in 13,836 unique CellTags in the 90th percentile of detection frequency. High-titre lentivirus was generated from the CellTag V1 library following published protocols^26,76 and titred on the basis of GFP fluorescence with 293T cells (ATCC; CRL-11268).

To generate ‘CellTagged pre-Cre’ organoids and allografts, normal (no Cre-mediated recombination) RPM, RPMA and RPR2 basal organoids were expanded for approximately 3.5 weeks post-isolation, and then approximately 1 × 10⁶ cells were transduced with the CellTag V1 lentiviral library. To generate ‘CellTagged post-Cre’ organoids and allografts, previously transformed RPM basal organoids (approximately 6–8 weeks before) were expanded and then subjected to CellTag V1 lentiviral transduction. For transduction, normal or transformed organoids were dissociated into single cells with TrypLE (Invitrogen; 12604013) for 30 min and subjected to mechanical dissociation every 10 min of TrypLE incubation. TrypLE was quenched, and cells were pelleted and resuspended in 500 μl of OCM plus 8 μg ml⁻¹ of Polybrene (Santa Cruz Biotechnology; sc-134220) and 25 μl of CellTag V1 high-titre lentivirus and then plated in one well of a 24-well plate. The cells were spinoculated at 300g for 30 min at room temperature to increase the transduction efficiency, incubated immediately after spinoculation for 3–6 h at 37 °C and then pelleted and replated in 50 μl of Matrigel and 500 μl of fresh viral supernatant. The organoids were incubated for 24 h, and then viral medium was replaced with normal OCM for organoid expansion. GFP was visible in more than 50% of cells as soon as 24 h after viral transduction.

Basal organoid Cre administration

CellTagged pre-Cre normal basal organoids (RPM, RPMA and RPR2) were expanded for approximately 8–10 weeks after CellTagging to allow clonal expansion and then were subjected to Cre-mediated transformation. CellTagged post-Cre normal basal organoids (RPM) were subjected to Cre-mediated transformation approximately 3–6 weeks after generation before CellTagging occurred. Because TAT–Cre treatment resulted in unreliable levels of recombination (Extended Data Figs. 3b, 4g and 5a), we used high-titre adenoviral CMV–Cre (University of Iowa; VVC-U of Iowa-5) to recombine all genotypes, including those for CellTagging experiments. For all samples, successful recombination with Ad–CMV–Cre was achieved by (1) dissociating organoids into single cells (approximately 500,000–1 million cells) using TrypLE (Invitrogen; 12604013) for 30 min at 37 °C with mechanical dissociation every 10 min; (2) spinoculating (300g; room temperature; 30 min) organoids in 2.5–5 × 10⁷ pfu CMV–Cre in 500-μl OCM + 10 μg ml⁻¹ of Polybrene in a 24-well plate; (3) incubating 4–6 h at 37 °C; and (4) seeding in Matrigel and propagating as normal organoid cultures, as described above. Full recombination of all alleles for each genotype was confirmed using recombination PCR approximately 4 weeks after Cre treatment.

PCR validation of recombination efficiency

The QIAGEN DNeasy kit (69506) was used to isolate genomic DNA from basal-derived organoids after exposure to Cre. Fully recombined tumour-derived cell lines from each genotype were used for positive recombination controls. DNA concentrations were measured on a BioTek Synergy HT plate reader. Equal quantities of tumour genomic DNA (100 ng) were amplified by PCR with GoTaq (Promega; M7123) using primers to detect Rb1 recombination: D1 5′-GCAGGAGGCAAAAATCCACATAAC-3′, 1lox5′ 5′-CTCTAGATCCTCTCATTCTTCCC-3′ and 3′ lox 5′-CCTTGACCATAGCCCAGCAC-3′. The PCR conditions were 94 °C for 3 min, 30 cycles of (94 °C for 30 s, 55 °C for 1 min and 72 °C for 1.5 min), 72 °C for 5 min and held at 4 °C. The expected band sizes were approximately 500 bp for the recombined Rb1 allele and 310 bp for the unrecombined/floxed allele. Primers to detect Trp53 recombination included the following: A 5′-CACAAAAACAGGTTAAACCCAG-3′, B 5′-AGCACATAGGAGGCAGAGAC-3′ and D 5′-GAAGACAGAAAAGGGGAGGG-3′. The PCR conditions were 94 °C for 2 min, 30 cycles of (94 °C for 30 s, 58 °C for 30 s and 72 °C for 50 s), 72 °C for 5 min and held at 4 °C. The expected band sizes were 612 bp for the Trp53 recombined allele and 370 bp for the unrecombined/floxed allele. Primers to detect Myc^T58A recombination included the following: CAG-F2 5′-CTGGTTATTGTGCTGTCTCATCAT-3′ and MycT-R 5′-GCAGCTCGAATTTCTTCCAGA-3′. The PCR conditions used were 94 °C for 2 min, 35 cycles of (95 °C for 30 s, 60 °C for 30 s and 72 °C for 1.5 min), 72 °C for 7 min and held at 4 °C. The expected band sizes were approximately 350 bp for the recombined allele and approximately 1,239 bp for the unrecombined/floxed allele. Primers to detect Ascl1 recombination included the following: Sense Ascl1 5′UTR: 5′-AACTTTCCTCCGGGGCTCGTTTC-3′ (for Cre recombined fwd), VR2: 5′-TAGACGTTGTGGCTGTTGTAGT-3′ (for Cre recombined rev), MF1 5′-CTCTGTCCAAACGCAAAGTGG-3′ (for floxed fwd) and VR2 5′-TAGACGTTGTGGCTGTTGTAGT-3′ (for floxed rev). 
The PCR conditions were 94 °C for 5 min, 30 cycles of (94 °C for 1 min, 64 °C for 1.5 min and 72 °C for 1 min), 72 °C for 10 min and held at 4 °C. The expected band sizes were approximately 700–850 bp for the Ascl1 recombined allele and approximately 857 bp for the unrecombined/floxed allele. Recombination PCR to detect Rbl2 (also known as p130) recombined (approximately 350 bp) and floxed (more than 1,500 bp) alleles was performed under conditions and with primers as previously described⁶². The PCR products were run on 1.2% agarose/Tris–acetate–EDTA (TAE) gels containing ethidium bromide or SYBR Safe, and images were acquired using a Bio-Rad Gel Doc XR imaging system.
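As an illustration of how the expected band sizes translate into a recombination call, the following minimal Python sketch compares observed gel band sizes with the expected sizes for Rb1, Trp53 and Myc^T58A. The function names, the 10% size tolerance and the example band sizes are assumptions made for illustration and are not part of the protocol; gels in the study were interpreted visually.

```python
# Expected band sizes (bp) taken from the text. Ascl1 and Rbl2 are omitted
# because their expected sizes are given only as ranges or bounds.
EXPECTED_BANDS = {
    "Rb1": {"recombined": 500, "floxed": 310},
    "Trp53": {"recombined": 612, "floxed": 370},
    "MycT58A": {"recombined": 350, "floxed": 1239},
}


def call_alleles(locus, observed_bp, tolerance=0.10):
    """Return sorted allele labels whose expected band matches an observed band.

    tolerance is a hypothetical fractional window around the expected size.
    """
    calls = set()
    for label, expected in EXPECTED_BANDS[locus].items():
        window = expected * tolerance
        if any(abs(band - expected) <= window for band in observed_bp):
            calls.add(label)
    return sorted(calls)


def recombination_status(locus, observed_bp, tolerance=0.10):
    """Summarize one lane as 'full', 'partial', 'none' or 'no call'."""
    calls = call_alleles(locus, observed_bp, tolerance)
    if calls == ["recombined"]:
        return "full"
    if calls == ["floxed", "recombined"]:
        return "partial"
    if calls == ["floxed"]:
        return "none"
    return "no call"


if __name__ == "__main__":
    print(recombination_status("Rb1", [510]))         # only the ~500-bp band
    print(recombination_status("Trp53", [600, 365]))  # both bands present
```

A lane showing only the recombined band is reported as fully recombined, which is the state required of organoids before implantation.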

Generation of basal organoid-derived allografts

After validating the recombination in basal organoids, fully recombined RPM, RPR2 and RPMA basal organoids were implanted as whole or partially digested organoids into flanks of SCID/beige mice that were between 6 and 12 weeks old (Taconic/Charles River Laboratories). Subcutaneous implants of approximately 0.5–3 × 10⁶ cells per flank in 50 μl of 50:50 Matrigel:OCM mix were performed. After implantation, basal organoid allografts were measured once to thrice weekly with calipers and collected when tumours reached an average of 1 cm³ but no greater than 2 cm³ or upon ulceration, loss of more than 10% of the baseline animal body weight or interference with animal eating, drinking or moving, whichever was earlier, in accordance with Duke University’s Policy on Tumor Burden in Rodents and to ensure a humane end point. A tumour volume of 2 cm³ is the maximum, as allowed by our IACUC protocol, and permitted end points were not exceeded in any study. Tumour tissue was then subjected to FFPE and/or dissociation for scRNA-seq experiments and/or reimplantation.

CRISPR editing of organoids and validation

To generate Pten knockout RPM and RPMA organoids, basal organoids transformed with CMV–Cre were infected with either a non-targeting sgRNA (sgCtrl) or an sgRNA targeting Pten (sgPten: 5′-TCATCAAAGAGATCGTTAGC-3′), both cloned into the LCV2-puro backbone (Addgene plasmid no. 52961). In brief, a high-titre virus (approximately 1–5 × 10⁷ pfu) was produced using HEK-293T cells transfected with a three-plasmid system, including LCV2–sgCtrl or sgPten and lentiviral packaging plasmids pCMV delta R8.2 (Addgene plasmid no. 8455) and pCMV–VSVG (Addgene plasmid no. 8454). Viruses were collected at 48 h and 72 h after transfection, concentrated by means of ultracentrifugation (25,000 rpm for 1.45 h), resuspended in 1× sterile PBS and stored at −80 °C until use. Fully recombined RPM and RPMA organoids were subjected to spinoculation with the high-titre virus using methods described in ‘Lentiviral transduction of organoids with CellTag Library V1’. Successful editing of Pten was validated through T7 endonuclease genome mismatch assays using the following primers: Fwd: 5′-CTCTCGTCGTCTGTCTA-3′ and Rev: 5′-CGAACACTCCCTAGGTGAATAC-3′. In brief, an approximately 1,000-bp region containing the sgPten site was amplified using Q5 High-Fidelity DNA Polymerase (New England Biolabs; M0492). The PCR conditions used were 98 °C for 30 s, 35 cycles of (98 °C for 10 s, 65 °C for 20 s and 72 °C for 30 s), 72 °C for 2 min and held at 4 °C. The PCR product was purified using a PCR DNA Clean & Concentrator Kit (ZYMO; D4030), and 200 ng of the annealed PCR product was subjected to T7 Endonuclease I digestion for 15 min at 37 °C. The digestion was quenched with 0.25 M EDTA. The products (digested and undigested controls) were run on agarose/Tris–acetate–EDTA gels containing ethidium bromide or SYBR Safe, and images were acquired using a Bio-Rad Gel Doc XR imaging system.

Immunoblotting was performed to validate PTEN loss through downstream induction of phospho-AKT (Ser473) in RPM and RPMA basal organoids, using methods described above for human cell lines. The primary antibodies included pAKT (Ser473) (1:1,000; CST; 4060S), total AKT (1:1,000; CST; 9272) and HSP90 (1:1,000; CST; 4877) as loading control.

Single-cell transcriptomics

scRNA-seq sample information

Samples sequenced for this study included (1) n = 1 wild-type basal organoid sample (no CellTag data); (2) a series of basal organoids and resulting allograft tumours that were ‘CellTagged pre-CMV–Cre’ (to trace the lineage from a single normal basal cell of origin) (n = 1 RPM organoid sample, n = 1 RPM allograft tumour, n = 1 RPMA organoid sample, n = 1 RPMA allograft (pool of three tumours), n = 1 RPR2 organoid sample and n = 1 RPR2 allograft tumour); (3) a series of organoids and resulting allograft tumours that were ‘CellTagged post-CMV–Cre’ (to trace the lineage from a single transformed basal cell of origin) (n = 1 RPM organoid sample and n = 2 RPM allograft tumours); and (4) primary lung tumours from autochthonous GEMMs (n = 2 RPM Cgrp–Cre initiated tumours and n = 2 RPM K5–Cre initiated tumour samples, each of which was a pool of two distinct lung tumours from one mouse). RPR2 transformed organoids were not analysed in this study, but data were included in the associated Gene Expression Omnibus (GEO) deposition. N = 5 extra RPM Cgrp–Cre initiated primary tumours were used in the analyses in this study but were previously published (GEO: GSE149180 and GSE1555692)^26,27.

RPM, RPMA and RPR2 organoid samples CellTagged pre-Cre were prepared for single-cell transcriptomics approximately 3 months after the initial CellTagging and approximately 1 month after Cre treatment (for transformed organoids). RPM organoids CellTagged post-Cre were prepared for single-cell transcriptomics approximately 6–8 weeks after transformation and approximately 2–3 weeks after CellTagging. All transformed organoid samples were prepared for scRNA-seq on the same day of implant to SCID/beige hosts (see Fig. 3a for experimental timeline).

Preparation of single-cell suspensions for scRNA-seq

Organoid samples were dissociated into single-cell suspensions using TrypLE Express for 30 min at 37 °C with mechanical disruption approximately every 10 min. Allografts and primary tumours used for scRNA-seq were isolated fresh from the lung or flank of mice and immediately subjected to digestion into single-cell suspensions and preparation for sequencing. Tumour tissue was mechanically dissociated into small clumps using scissors in 1 ml of an enzymatic digestion cocktail per sample and then incubated for 30 min at 37 °C. The digestion cocktail consisted of 4,200 μl of HBSS-free medium (Thermo Fisher Scientific; 14175), 600 μl of TrypLE Express (Invitrogen; 12604013), 600 μl of 10 mg ml⁻¹ of collagenase type 4 (Worthington Biochemical; LS004186) prepared in HBSS-free medium (Thermo Fisher Scientific; 14175-095) and 600 μl of dispase (50 U ml⁻¹; Corning; 354235) and sterilized using a 0.22-μm syringe filter. Enzymatic digestion was quenched on ice with 500 μl of quench medium containing 7.2 ml of Leibovitz’s L-15 medium (Thermo Fisher Scientific; 11415), 800 μl of FBS and 30 μl of 5 mg ml⁻¹ of DNase (Thermo Fisher Scientific; NC9709009) in HBSS-free medium. Tissue suspension was passed through a 100-μm cell strainer. Cells were spun at 2,500g for 5 min at 4 °C. The supernatant was removed and replaced with 500 μl of ammonium chloride–potassium lysis buffer per sample to remove red blood cell contamination (3 min incubation at 37 °C; VWR; 10128-808). The reaction was quenched with 10 ml of cold DMEM/F12-Advanced media supplemented with 10% FBS, 1% l-glutamine and 1% penicillin–streptomycin. The cells were spun at 500g for 5 min at 4 °C and resuspended in cold, filtered medium or cold, filtered 0.04% BSA in PBS at a concentration of 1–2 × 10⁶ cells ml⁻¹ and counted manually with a haemocytometer. Single-cell organoid or tumour cell suspensions were immediately subjected to multiplexing with 10X CellPlex or 10x Genomics library preparation, as described below.
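The reagent volumes above imply the final working concentrations, which can be checked with a short calculation. The sketch below is illustrative only: it assumes that the listed volumes make up a single batch of digestion cocktail, that the stated collagenase (10 mg ml⁻¹), dispase (50 U ml⁻¹) and DNase (5 mg ml⁻¹) concentrations refer to stocks, and the variable and function names are hypothetical.

```python
# Component volumes (µl) of the enzymatic digestion cocktail, from the text.
COCKTAIL_UL = {
    "HBSS-free medium": 4200,
    "TrypLE Express": 600,
    "collagenase type 4 (10 mg/ml stock)": 600,
    "dispase (50 U/ml stock)": 600,
}


def final_concentration(stock, component_ul, total_ul):
    """Concentration after dilution: stock * (component volume / total volume)."""
    return stock * component_ul / total_ul


total_ul = sum(COCKTAIL_UL.values())  # total batch volume in µl
samples_per_batch = total_ul // 1000  # 1 ml of cocktail is used per sample

collagenase_mg_per_ml = final_concentration(10, 600, total_ul)
dispase_u_per_ml = final_concentration(50, 600, total_ul)

# Quench medium: 7.2 ml L-15 + 800 µl FBS + 30 µl DNase stock (5 mg/ml).
quench_total_ul = 7200 + 800 + 30
fbs_percent = 100 * 800 / quench_total_ul
dnase_ug_per_ml = 1000 * final_concentration(5, 30, quench_total_ul)

if __name__ == "__main__":
    print(f"cocktail batch: {total_ul} µl, enough for {samples_per_batch} samples")
    print(f"collagenase: {collagenase_mg_per_ml:.1f} mg/ml; dispase: {dispase_u_per_ml:.1f} U/ml")
    print(f"quench: {fbs_percent:.1f}% FBS, {dnase_ug_per_ml:.1f} µg/ml DNase")
```

Under these assumptions, one batch covers six samples at roughly 1 mg ml⁻¹ collagenase and 5 U ml⁻¹ dispase.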

RNA-seq library preparation

Multiplexed samples included transformed organoids CellTagged pre-Cre (RPM, RPMA and RPR2), one K5–Cre RPM sample containing one tumour more central in the lung and one tumour nearer the trachea, and the two primary RPM Cgrp–Cre-initiated tumours. The samples were multiplexed before library preparation using the 10x Genomics 3′ CellPlex Kit Set A (1000261) and Feature Barcodes (1000262), following the 10x Genomics Demonstrated Protocol (CG000391) and its suggestions for dissociated tumour cells. After CellPlexing, the samples were loaded onto a Chromium X series controller (10x Genomics; 1000331), targeting 10,000 cells per sample. Samples not subjected to multiplexing were immediately loaded onto a 10x Chromium X controller targeting 10,000–20,000 cells per sample. Library preparation was performed following manufacturer’s protocols using the 10x Chromium Next GEM Single Cell 3′ Kit, v.3.1 (10x Genomics; PN-1000268). Completed libraries were sequenced on an Illumina NovaSeq 6000, Illumina NextSeq 1000 or a NovaSeq X Plus to target more than 30,000 reads per cell with the 10x Genomics-recommended paired-end sequencing mode for dual indexed samples. The individual sample details, including CellPlex oligo information, are provided as metadata with the GEO submission.

Demultiplexing and data alignment

scRNA-seq data were demultiplexed and processed into FASTQ files through the Cell Ranger v.7.2.0 pipeline (10x Genomics). The primary tumour samples were aligned to a custom mouse genome (GRCm38-mm10-2020-A), including eGFP, Cas9, Firefly luciferase (fLuc) and Venus, to detect recombined alleles in our various mouse models. RPM tumours in this publication express fLuc⁴⁰ following recombination of the MycT58A-Ires-Luc^LSL/LSL allele, and RPMA tumours express fLuc and Venus following recombination of the MycT58A-Ires-Luc^LSL/LSL and Ascl1^fl/fl alleles, respectively. Some RPM primary tumours were derived from RPM–Cas9–GFP mice and express eGFP and Cas9 in addition to fLuc after recombination. CellTagged basal allograft tumour samples were aligned to a custom mouse genome (GRCm38-mm10-2020-A) that included fLuc and Venus, to aid in detecting recombined tumour cells, and GFP.CDS and CellTag.UTR transcripts, to allow detection of CellTags according to the published workflow^52,77 (https://github.com/morris-lab/CellTagR). Sequences used for custom genome builds are included in this publication’s GitHub repository. Count barcodes and unique molecular identifiers were generated using the Cell Ranger count pipeline or, for CellPlexed samples, the Cell Ranger multi pipeline.

Initial quality control and normalization

Quality control and downstream analysis were performed in Python (v.3.8.8) using Scanpy (v.1.10.0), according to current expert recommendations for single-cell best practices⁷⁸. Anndata objects were created from filtered feature matrices with sc.read_10x_mtx(), and quality metrics were calculated using sc.pp.calculate_qc_metrics(). Low-quality cells and potential doublets were initially excluded by selecting for cells with 15% or lower mitochondrial content (‘pct_counts_mito’), greater than 1,000–2,000 total counts (‘total_counts’; sample dependent) and greater than 500–2,000 but less than 7,000–9,000 genes detected (‘n_genes_by_counts’; sample dependent). Normalized counts were calculated with sc.pp.normalize_total() and a target sum of 10,000. Anndata objects from multiple scRNA-seq datasets were combined into integrated objects using adata.concatenate() with join=‘outer’.
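The cell-level filters and the total-count normalization can be sketched in plain Python as follows. This is not the Scanpy implementation used in the study; it only mirrors the logic described above. The function names are hypothetical, and the default cut-offs are example values taken from the lower end of the sample-dependent ranges reported in the text.

```python
def passes_qc(cell, max_pct_mito=15.0, min_counts=1000, min_genes=500, max_genes=7000):
    """Apply the three cell-level filters described in the text.

    The count and gene cut-offs were sample dependent in the study; the
    defaults here are example values, not the thresholds for any one sample.
    """
    return (
        cell["pct_counts_mito"] <= max_pct_mito
        and cell["total_counts"] > min_counts
        and min_genes < cell["n_genes_by_counts"] < max_genes
    )


def normalize_total(counts, target_sum=10_000):
    """Scale one cell's raw counts so that they sum to target_sum."""
    total = sum(counts.values())
    return {gene: value * target_sum / total for gene, value in counts.items()}


if __name__ == "__main__":
    # Toy cells with made-up QC metrics.
    cells = [
        {"id": "cell_1", "pct_counts_mito": 5.0, "total_counts": 4000, "n_genes_by_counts": 1800},
        {"id": "cell_2", "pct_counts_mito": 22.0, "total_counts": 4000, "n_genes_by_counts": 1800},
        {"id": "cell_3", "pct_counts_mito": 4.0, "total_counts": 60000, "n_genes_by_counts": 9500},
    ]
    kept = [cell["id"] for cell in cells if passes_qc(cell)]
    print(kept)  # cell_2 fails on mitochondrial content, cell_3 on gene number
    print(normalize_total({"gene_a": 3, "gene_b": 1}))
```

The upper bound on detected genes acts as a first-pass doublet filter, because droplets containing two cells tend to show unusually high gene and count totals.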

Further quality control and clustering

Data were further processed using Scanpy (v.1.10.0), scvi-tools (v.0.17.4) and benchmarking standards to minimize batch effects while maintaining true biological variability, particularly across integrated objects⁷⁹. First, highly variable genes were determined with sc.pp.highly_variable_genes() using 5,000 (for organoids and allograft tumours) or 10,000 (for primary tumours) top genes, flavour set to ‘seurat_v3’ and batch key set to the name of the specific scRNA-seq batch. Poisson gene selection was then calculated with scvi.data.poisson_gene_selection() with the same n_top_genes and batch_key used for highly variable gene selection. The probabilistic deep learning model was set up through scvi.model.SCVI.setup_anndata() to initialize the integration and clustering of datasets from anndata objects containing only the top genes identified by the highly variable Poisson gene selection. Continuous covariate keys were set as percentage of mitochondrial counts, and batch keys were set to distinguish samples that were prepared or sequenced at different times. The model was trained with default parameters, an early stopping patience of 20 and a maximum of 500 epochs using the model.train() function. The latent representation of the model was obtained with model.get_latent_representation() and added to the .obsm of each full anndata object (including all genes, not just the highly variable genes). Neighbours were then calculated with sc.pp.neighbors() with use_rep set to the .obsm category added from the latent representation. UMAP embedding was performed using sc.tl.umap() with min_dist=0.5. Finally, Leiden clusters were generated with sc.tl.leiden() with resolution set to 1.0 for initial steps. As required throughout the scvi pipeline, we used raw counts for all of the steps described above.

Extra rounds of quality control and data filtering were performed per dataset by assessing n_genes_by_counts, total counts and percentage of mitochondrial counts per cluster. In general, the model tends to cluster low-quality and doublet cells together; therefore, clusters with exceptionally high or low average n_genes_by_counts, total counts or mitochondrial content were labelled as low quality and considered for removal from the dataset. Removal was performed after assessing gene expression on the basis of known markers of tumour and normal lung cells and marker genes per cluster, as determined by sc.tl.rank_genes_groups(), to help ensure that biological cells that normally have higher or lower n_genes_by_counts, total_counts or percentage of mitochondrial content were not aberrantly filtered out. In addition, in tumour samples from autochthonous or allograft models, we removed non-tumour cells by assessing common immune and normal lung cell type gene expression. Each time a cluster was removed, we ran the scvi pipeline on the new anndata object iteratively through this quality control step until there were no longer any low-quality or non-tumour cell clusters in the anndata object. Additionally, each time clusters were removed from a larger anndata object, the pipeline was re-run for optimal clustering. Final Leiden clusters were determined with sc.tl.leiden() resolution set to 0.5–1.0 for all datasets (sample dependent, on the basis of heterogeneity in marker expression determined to be biologically relevant). The full source code used to reproduce all scRNA-seq analysis methods is on GitHub (https://github.com/TGOliver-lab/Ireland_Basal_SCLC_2025) and is publicly accessible upon publication.
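The cluster-level screen can be pictured with the following sketch, which computes the mean of a quality metric per Leiden cluster and flags clusters that deviate strongly from the others. In the study this assessment was made by inspection together with marker gene expression; the twofold rule, the function names and the toy values below are assumptions introduced purely for illustration, and flagged clusters would still need to be reviewed before removal.

```python
from collections import defaultdict
from statistics import mean, median


def cluster_means(cells, metric):
    """Mean value of one QC metric for each Leiden cluster."""
    grouped = defaultdict(list)
    for cell in cells:
        grouped[cell["leiden"]].append(cell[metric])
    return {cluster: mean(values) for cluster, values in grouped.items()}


def flag_outlier_clusters(cells, metric, fold=2.0):
    """Flag clusters whose mean lies more than `fold` above or below the
    median of all cluster means (a hypothetical rule, not the study's)."""
    means = cluster_means(cells, metric)
    centre = median(means.values())
    return sorted(
        cluster
        for cluster, value in means.items()
        if value > centre * fold or value < centre / fold
    )


if __name__ == "__main__":
    # Toy data: cluster "2" has far fewer detected genes than the others.
    cells = [
        {"leiden": "0", "n_genes_by_counts": 3000},
        {"leiden": "0", "n_genes_by_counts": 3200},
        {"leiden": "1", "n_genes_by_counts": 2800},
        {"leiden": "1", "n_genes_by_counts": 3000},
        {"leiden": "2", "n_genes_by_counts": 600},
        {"leiden": "2", "n_genes_by_counts": 800},
    ]
    print(flag_outlier_clusters(cells, "n_genes_by_counts"))
```

After a flagged cluster is reviewed and removed, the embedding and clustering would be recomputed on the remaining cells, as described above.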

Plot generation

UMAP plots showing clustering, sample information and/or expression of specific genes were generated with sc.pl.umap() with vmin = 0, vmax = ‘p99.5’ and the normalized count layer as input. Dot plots of normalized counts were generated with sc.pl.dotplot(), and, if clustered, the dendrogram was set to ‘True’. For violin plots, data were first converted from anndata objects to Seurat objects using the readH5AD() function in the zellkonverter package and the CreateSeuratObject() function in the SeuratObject package and then plotted in R using Seurat’s VlnPlot() function or the plotColData function in the scater package. Signature scores in UMAP were also generated in Seurat using the FeaturePlot() function.

Transcriptomic gene signatures and differential gene expression analysis

For signature score assessment and differential gene expression analysis, anndata objects were converted to Seurat objects in R, and normalized and log-transformed counts were used for visualization after the raw count data were subjected to Seurat’s NormalizeData() function. Cell cycle scores were assigned to Seurat objects using the CellCycleScoring() function and Seurat’s cc.genes gene lists, converted to mouse homologues.

MYC²⁶, ASCL1⁴² and NEUROD1²⁷ target gene signatures were previously generated and represent conserved transcriptional targets identified through ChIP–seq ± RNA-seq on mouse and human SCLC cell lines and/or tumours. To generate a POU2F3 target gene signature, published .bed files from POU2F3 ChIP on two human POU2F3⁺ SCLC cell lines (H526 and H1048) were downloaded (GEO: GSE247951)¹⁸. Peaks were called and annotated using ChIPseeker in R. Genes with peaks in the promoter region (less than 2 kb from gene TSS) were considered target genes. Conserved target genes between H1048 and H526 were identified, and only conserved target genes also enriched by log₂FC > 0.5 in POU2F3-high versus POU2F3-low human SCLC tumours by published bulk RNA-seq²³ were included in the final POU2F3 target gene score. The ATOH1 target gene score was derived from the binding and expression target analysis (BETA) of ChIP–seq data from SCLC patient-derived xenograft cells, as previously published⁴⁵. The YAP1 activity score was derived from a published 22-gene YAP/TAZ target signature derived from RNA-seq and ChIP–seq data from 891 cancer cell lines and including genes exhibiting a twofold or greater decrease in expression following YAP/TAZ knockdown or upregulated in YAP1 overexpression/activation⁴⁶. ASCL1, NEUROD1, ATOH1, POU2F3 and MYC target gene scores and YAP1 activity score were assigned to the metadata of converted Seurat objects using the AddModuleScore() function. All target gene and activity scores are included in Supplementary Table 2.
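The filtering steps used to build the POU2F3 target gene signature amount to a set intersection followed by an expression cut-off, as sketched below. The peak annotation in the study was performed with ChIPseeker in R; this Python sketch only mirrors the selection logic. The function names, the placeholder gene names and all numerical values are hypothetical.

```python
def promoter_targets(peaks, max_distance_bp=2000):
    """Genes with at least one peak less than max_distance_bp from the TSS.

    peaks is a list of (gene, signed distance to TSS in bp) pairs.
    """
    return {gene for gene, distance in peaks if abs(distance) < max_distance_bp}


def conserved_enriched_targets(peaks_a, peaks_b, log2fc, min_log2fc=0.5):
    """Targets shared by both cell lines and enriched above min_log2fc."""
    conserved = promoter_targets(peaks_a) & promoter_targets(peaks_b)
    return sorted(
        gene for gene in conserved
        if log2fc.get(gene, float("-inf")) > min_log2fc
    )


if __name__ == "__main__":
    # Placeholder annotations for two cell lines (toy values).
    peaks_h526 = [("GENE_A", 150), ("GENE_B", -1200), ("GENE_C", 5400)]
    peaks_h1048 = [("GENE_A", -300), ("GENE_B", 800), ("GENE_D", 100)]
    # Placeholder log2 fold changes, POU2F3-high versus POU2F3-low tumours.
    log2fc = {"GENE_A": 1.8, "GENE_B": 0.2, "GENE_D": 2.0}
    print(conserved_enriched_targets(peaks_h526, peaks_h1048, log2fc))
```

In this toy example only GENE_A is retained: GENE_B is bound in both lines but not enriched, GENE_C lies outside the promoter window and GENE_D is bound in only one line.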

Normal NE, tuft and basal cell scores were previously published as consensus gene lists derived from mouse scRNA-seq data on normal lung cell types¹⁰, but proliferation genes, which included ‘Rpl’ and ‘Rps’ genes, were removed to eliminate cell cycle differences and focus on fate-specific markers. The published ionocyte consensus signature was limited to only 19 genes¹⁰; therefore, we applied an expanded ionocyte signature derived from genes >0.5 log₂FC enriched in droplet-based scRNA-seq data on mouse trachea/lungs from the same study¹⁰ (63 genes in total). In addition, we used a human ionocyte signature derived from the top 100 human ionocyte markers established in the same study from scRNA-seq on human bronchial epithelium¹⁰. Normal basal hillock and luminal hillock signatures were derived from mouse scRNA-seq data⁴³, and methods to generate these gene signatures are previously described⁴⁴. The normal cell type signatures are listed in Supplementary Table 2. Gene signatures to assess basal cell heterogeneity in organoid samples (Extended Data Fig. 7) represent the top 100 genes in published enrichment signatures of cell types identified in scRNA-seq data of human tracheal epithelium⁴¹ (listed in Supplementary Table 6).

SCLC archetype signatures (A, A2, N and P) are previously published^29,80,81 and were derived from archetype assignments of human SCLC cell lines. The archetype signatures per subtype are listed in Supplementary Table 2. Gene sets for human SCLC subtypes A, N and P (Extended Data Fig. 2m) were obtained from scRNA-seq data on human tumours and have been previously published⁸². Inflamed SCLC tumour signatures (Fig. 5g) were derived from published non-negative matrix factorization (NMF) studies on bulk RNA-seq data from n = 81 human SCLC tumours (Gay et al.⁸ and George et al.³⁹) or mRNA, protein and phosphorylation data from n = 107 human SCLC tumours (Liu et al.²³), where distinct inflammatory subsets were identified (annotated as NMF3 in both studies). We generated the Gay et al. inflammatory signature from the publication’s NMF-derived gene signature (n = 1,300 total genes)⁸ by taking genes greater than 1 log₂FC enriched in SCLC-I versus other SCLC subtypes, resulting in a signature with n = 379 human genes converted to mouse homologues (Supplementary Table 7). The Liu et al.²³ inflammatory signature comprised the top 100 genes enriched in NMF3 versus other NMF groups (greater than 1 log₂FC)²³, converted to mouse homologues (Supplementary Table 7).

Human SCLC subtype signatures were generated from the real-world Caris dataset and three extra bulk RNA-seq datasets of human SCLC tumours (Liu et al.²³, George et al.³⁹ and Lissa et al.⁶¹). All tumours were initially subtyped as hSCLC-A, hSCLC-N, hSCLC-Y, hSCLC-P, hSCLC-mixed or Lin⁻ according to the methods described below in Gene expression profiling and SCLC subtype classification. Caris subtype signatures represent the top 100 enriched genes per human subtype compared with all other samples. Liu et al.²³, George et al.³⁹ and Lissa et al.⁶¹ subtype signatures represent genes showing a log₂FC of 2 or greater in each subtype compared with samples from other subtypes (further detailed in Supplementary Table 7). Seurat’s AddModuleScore() function was used to apply human SCLC subtype signatures (after converting human genes to mouse homologues) from the real-world tumour data and the Liu et al.²³, George et al.³⁹ and Lissa et al.⁶¹ datasets to mouse tumour scRNA-seq data. The resulting subtype scores in our mouse data were then compared for similarity by means of the Pearson correlation matrix in Fig. 5e. The subtyping results and all human SCLC subtype signatures are listed in Supplementary Table 7.

Finally, the NE score was determined on the basis of Spearman correlation with an established expression vector, in which approximately 41 NE and 87 non-NE human cell lines were used to identify a core 50-gene signature that comprised 25 NE genes and 25 non-NE genes that robustly predicted NE phenotype⁸³. Seurat objects were converted to SingleCellExperiment⁸⁴ objects with the as.SingleCellExperiment() function, and then NE score was added as metadata and visualized using the scater plotColData() function. Other signatures were visualized similarly with Scanpy or using FeaturePlot() and/or VlnPlot() functions in Seurat.
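A correlation-based score of this kind can be sketched in a few lines. The example below computes a Spearman correlation (Pearson correlation of average ranks) between one cell's expression of the signature genes and a reference vector. The published 50-gene reference vector is not reproduced here; the gene names, values and function names are placeholders, and any further scaling applied in the study is not shown.

```python
def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based ranks start+1..end+1
        for position in range(start, end + 1):
            ranks[order[position]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / (var_x * var_y) ** 0.5


def spearman(x, y):
    """Spearman correlation as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def ne_score(cell_expression, reference_vector):
    """Correlate a cell's expression with the reference over shared genes."""
    genes = [gene for gene in reference_vector if gene in cell_expression]
    return spearman(
        [cell_expression[gene] for gene in genes],
        [reference_vector[gene] for gene in genes],
    )


if __name__ == "__main__":
    # Placeholder reference: positive weights for NE genes, negative for non-NE.
    reference = {"NE_1": 1.0, "NE_2": 1.0, "NON_NE_1": -1.0, "NON_NE_2": -1.0}
    cell = {"NE_1": 5.0, "NE_2": 3.0, "NON_NE_1": 0.5, "NON_NE_2": 0.1}
    print(round(ne_score(cell, reference), 3))
```

A cell that expresses the NE genes more highly than the non-NE genes scores close to 1, and the reverse pattern scores close to −1.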

Marker genes of Leiden clusters in UMAPs throughout the study were calculated using the sc.tl.rank_genes_groups() function in Scanpy on normalized count data with Wilcoxon rank-sum test and the number of genes set to 500.

CellTag analysis and clone calling

CellTags were identified in the RPM and RPMA basal organoid and allograft scRNA-seq samples using processed binary alignment/map (BAM) files (from Cell Ranger count output; see above methods) and following the CellTagR pipeline documentation (https://github.com/morris-lab/CellTagR). In short, BAM files were filtered to exclude unmapped reads and include reads that align to the GFP.CDS transgene or CellTag.UTR. CellTag objects were created in R, and CellTags were extracted from the filtered BAM files to generate the matrices of cell barcodes, 10x unique molecular identifiers and CellTags. The matrix was further filtered to include only barcodes identified as cells by the Cell Ranger pipeline and then subjected to error correction through Starcode. CellTags not detected in our whitelist with representation above the 75th percentile (generated from assessment of our lentiviral library complexity, as described above) were also removed. Clones were assigned as cells expressing more than two but fewer than 20 CellTags with similar combinations of CellTags (Jaccard similarity greater than 0.8). For scRNA-seq analysis and CellTag visualization, we used Scanpy (v.1.10.0) for initial quality control and scvi-tools (v.0.17.4) for integration and clustering in Python, following the methods described above. The resulting anndata objects were converted to Seurat objects in R for further CellTag analyses. CellTag-based clonal information was added as metadata by 10x-assigned barcode to visualize clone distribution and cell identity per clone using standard visualization functions in R. The final clonal analyses only considered clones with more than five cells to ensure sufficient sampling of each clone. CellTag metadata for RPM and RPMA organoid and allograft tumour samples CellTagged pre-Cre and RPM organoid and allograft samples CellTagged post-Cre are included in Supplementary Table 4.
CellTagging on RPR2 basal organoids and allografts occurred, but clonal representation was limited in the allograft tumour because of long latency of this model (approximately 6 months) and strong bottlenecks; thus, no CellTag information on RPR2 samples is included in this study.
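The clone-calling criteria can be illustrated with a small Python sketch. Clone calling in the study was performed with the CellTagR package in R; the code below only mirrors the stated thresholds (more than two but fewer than 20 CellTags per cell, Jaccard similarity greater than 0.8 and clones of more than five cells). Grouping cells by linking every sufficiently similar pair is an assumption about how similarities are turned into clones, and all names and barcodes are hypothetical.

```python
from itertools import combinations


def jaccard(a, b):
    """Jaccard similarity of two non-empty sets of CellTags."""
    return len(a & b) / len(a | b)


def call_clones(cell_tags, min_tags=3, max_tags=19, min_similarity=0.8, min_clone_size=6):
    """Group cells into clones by CellTag overlap.

    cell_tags maps a cell barcode to its set of CellTags. Cells linked by a
    chain of pairwise similarities above min_similarity share one clone.
    """
    eligible = {
        cell: tags for cell, tags in cell_tags.items()
        if min_tags <= len(tags) <= max_tags
    }
    parent = {cell: cell for cell in eligible}

    def find(cell):
        # Union-find lookup with path halving.
        while parent[cell] != cell:
            parent[cell] = parent[parent[cell]]
            cell = parent[cell]
        return cell

    for a, b in combinations(eligible, 2):
        if jaccard(eligible[a], eligible[b]) > min_similarity:
            parent[find(a)] = find(b)

    clones = {}
    for cell in eligible:
        clones.setdefault(find(cell), []).append(cell)
    kept = [sorted(members) for members in clones.values() if len(members) >= min_clone_size]
    return sorted(kept, key=lambda members: members[0])


if __name__ == "__main__":
    # Toy input: six cells share three tags, one cell has too few tags and
    # two cells form a clone that is too small to keep.
    cell_tags = {f"cell{i}": {"tagA", "tagB", "tagC"} for i in range(6)}
    cell_tags["cell6"] = {"tagA", "tagB"}
    cell_tags["cell7"] = {"tagX", "tagY", "tagZ"}
    cell_tags["cell8"] = {"tagX", "tagY", "tagZ"}
    print(call_clones(cell_tags))
```

Requiring several CellTags per cell makes it unlikely that two unrelated cells share the same combination by chance, whereas the upper limit excludes cells with implausibly many tags.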

ForceAtlas2 mapping, diffusion pseudotime and CellRank analyses

To visualize clonal trajectories, scRNA-seq data from RPM and RPMA basal-derived allografts were projected through force-directed graphing with ForceAtlas2⁸⁵ in Scanpy using the sc.pl.draw_graph() function with default settings. We chose to model the combined RPM and RPMA allograft cells on one trajectory because we observed that cells from each genotype occupied all Leiden clusters, although at variable frequencies, suggesting that the cells in each model have the potential to reach all transcriptional phenotypes. Diffusion pseudotime was calculated with sc.tl.dpt() with default settings (n_dcs=10) after setting root cells as cluster 17 basal-like cells (the phenotypic starting point of the experiment, that is, basal organoids) with adata.uns[‘iroot’]. Next, diffusion pseudotime was used as input to perform CellRank2 analysis⁸⁶, which computes a transition matrix of cellular dynamics and uses estimators to calculate subsequent fate probabilities, driver genes and gene expression trends. First, we set a PseudotimeKernel (pk) and then calculated a cell–cell transition matrix using pk.compute_transition_matrix(). Next, the Generalized Perron Cluster Cluster Analysis (GPCCA) estimator, a Markov chain-based estimator in CellRank⁸⁷, was initialized with g=GPCCA(pk). To assign macrostates, the estimator was fit using g.fit() with n_states set to 9 (determined after assessment of the top 20 eigenvalues) and cluster_key set to assigned Leiden clusters. Setting n_states to 9 allowed for all Leiden clusters with distinct SCLC fate phenotypes to be picked up as macrostates by the estimator. Cluster 17, the basal-like cluster, was assigned to the estimator as an initial state using g.set_initial_states(), consistent with the assigned pseudotime root state. Terminal states were predicted by the estimator on the basis of the highest stability values using default settings with the g.predict_terminal_states() function and allow_overlap set to True. 
The predicted terminal states are shown in Extended Data Fig. 6e. Next, g.compute_fate_probabilities() was used to assign probability values to all cells on the basis of how probable each cell is to reach each terminal state. Fate probabilities per cell, annotated by Leiden cluster or SCLC fate, were visualized using cr.pl.circular_projection(). The resulting graphs were manually annotated with the assigned cell fate groupings for Fig. 3i. The predicted lineage drivers of each terminal state were computed with the g.compute_lineage_drivers() function that predicts driver genes by correlating gene expression with fate probability values. The predicted CellRank lineage drivers for each terminal state are available in Supplementary Table 5. Finally, we focused on individual trajectories of interest to visualize temporal expression patterns of predicted driver genes along pseudotime using the generalized additive models (GAMs)⁸⁸. To initialize a model for GAM fitting, the cr.models.GAM() function was run with max_iter set to 6,000, spline_order set to 2 and n_knots set to 10. The top 50–100 predicted driver genes were fit to the GAM model per lineage. Gene expression changes over pseudotime were visualized using the cr.pl.heatmap() function and sorted according to peak expression in pseudotime (Fig. 3j and Extended Data Fig. 6f). The key parameters for the cr.pl.heatmap() function included data_key set to normalized counts, show_fate_probabilities set to True, time_key set to diffusion pseudotime and show_all_genes set to True. Source code to reproduce these analyses is deposited at GitHub (https://github.com/TGOliver-lab/Ireland_Basal_SCLC_2025).
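Fate probabilities of the kind computed by CellRank are absorption probabilities of a Markov chain, that is, the probability that a random walk started in a given state ends in each terminal state. The toy sketch below computes them by repeated propagation on a four-state chain. It is not the CellRank implementation and operates on a hand-written transition matrix rather than on a kernel-derived cell–cell matrix; the states, transition probabilities and function name are invented for illustration.

```python
def fate_probabilities(transition, terminal_states, n_iter=1000):
    """Probability that each state is eventually absorbed in each terminal state.

    transition is a row-stochastic matrix (list of lists). Terminal states are
    treated as absorbing, whatever their outgoing transitions are.
    """
    n = len(transition)
    n_terminal = len(terminal_states)
    terminal_index = {state: k for k, state in enumerate(terminal_states)}
    # Terminal states start fully committed to themselves; others start at zero.
    probs = [
        [1.0 if terminal_index.get(i) == k else 0.0 for k in range(n_terminal)]
        for i in range(n)
    ]
    for _ in range(n_iter):
        updated = []
        for i in range(n):
            if i in terminal_index:
                updated.append(probs[i])
            else:
                updated.append([
                    sum(transition[i][j] * probs[j][k] for j in range(n))
                    for k in range(n_terminal)
                ])
        probs = updated
    return probs


if __name__ == "__main__":
    # State 0: root-like; state 1: intermediate; states 2 and 3: terminal.
    transition = [
        [0.5, 0.5, 0.0, 0.0],
        [0.0, 0.5, 0.3, 0.2],
        [0.0, 0.0, 1.0, 0.0],
        [0.0, 0.0, 0.0, 1.0],
    ]
    for state, row in enumerate(fate_probabilities(transition, [2, 3])):
        print(state, [round(value, 3) for value in row])
```

In this chain the intermediate state reaches the two terminal states with probabilities 0.6 and 0.4, and the root-like state inherits the same values because it can only exit through the intermediate state.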

Human SCLC data from Caris Life Sciences

Whole transcriptome sequencing sample preparation and data alignment

Whole transcriptome sequencing uses a hybrid capture method to pull down the full transcriptome from FFPE tumour samples using the SureSelect Human All Exon V7 bait panel (Agilent Technologies) and the NovaSeq platform (Illumina). The FFPE specimens underwent pathology review to discern the percentage of tumour content and tumour size; a minimum of 10% tumour content in the area selected for microdissection was required to enable enrichment and extraction of tumour-specific RNA. The QIAGEN RNA FFPE tissue extraction kit was used for extraction, and the RNA quality and quantity were determined using the Agilent TapeStation. Biotinylated RNA baits were hybridized to the synthesized and purified complementary DNA targets, and the bait–target complexes were amplified in a post-capture PCR. The resultant libraries were quantified and normalized, and the pooled libraries were denatured, diluted and sequenced. Raw data were demultiplexed using the Illumina DRAGEN FFPE accelerator. FASTQ files were aligned with STAR aligner (release 2.7.4a at GitHub). Expression data were produced using Salmon, which provides fast and bias-aware quantification of transcript expression⁸⁹. BAM files from the STAR aligner were further processed for RNA variants using a proprietary custom detection pipeline. The reference genome used was GRCh37/hg19, and analytical validation of this test demonstrated 97% or higher positive percent agreement, 99% or higher negative percent agreement and 99% or higher overall percent agreement with a validated comparator method.

Gene expression profiling and SCLC subtype classification

For stratification of patient samples into subgroups, RNA expression values for established and putative lineage-defining transcription factors ASCL1 (A), NEUROD1 (N), POU2F3 (P) and YAP1 (Y) were standardized to Z scores. Samples with a single positive Z score among A/N/P/Y were assigned to the respective gene-associated subgroups, samples with multiple positive Z scores were classified as ‘mixed’ and samples with negative Z scores for all four genes were classified as ‘lineage-negative’ (Lin⁻). This method was also used to ‘subtype’ tumours from the George et al.³⁹, Liu et al.²³ and Lissa et al.⁶¹ datasets (results in Supplementary Table 7).
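The classification rule can be written out as a short Python sketch. The expression values below are toy numbers, and the function names are hypothetical. The use of the sample standard deviation is an assumption, although the subtype calls depend only on whether each value lies above the cohort mean; a Z score of exactly zero is treated here as not positive.

```python
from statistics import mean, stdev

MARKERS = ("ASCL1", "NEUROD1", "POU2F3", "YAP1")
SUBTYPE = {"ASCL1": "hSCLC-A", "NEUROD1": "hSCLC-N", "POU2F3": "hSCLC-P", "YAP1": "hSCLC-Y"}


def z_scores(values):
    """Standardize values to Z scores across the cohort."""
    mu, sd = mean(values), stdev(values)
    return [(value - mu) / sd for value in values]


def classify(samples):
    """Assign each sample a subtype from the signs of its four Z scores.

    samples maps a sample name to a dict of marker gene expression values.
    """
    names = list(samples)
    z = {
        gene: dict(zip(names, z_scores([samples[name][gene] for name in names])))
        for gene in MARKERS
    }
    calls = {}
    for name in names:
        positive = [gene for gene in MARKERS if z[gene][name] > 0]
        if len(positive) == 1:
            calls[name] = SUBTYPE[positive[0]]
        elif positive:
            calls[name] = "hSCLC-mixed"
        else:
            calls[name] = "Lin-"
    return calls


if __name__ == "__main__":
    # Toy cohort with made-up expression values.
    cohort = {
        "S1": {"ASCL1": 10, "NEUROD1": 1, "POU2F3": 1, "YAP1": 1},
        "S2": {"ASCL1": 1, "NEUROD1": 10, "POU2F3": 1, "YAP1": 1},
        "S3": {"ASCL1": 1, "NEUROD1": 1, "POU2F3": 10, "YAP1": 1},
        "S4": {"ASCL1": 1, "NEUROD1": 1, "POU2F3": 1, "YAP1": 10},
        "S5": {"ASCL1": 10, "NEUROD1": 10, "POU2F3": 1, "YAP1": 1},
        "S6": {"ASCL1": 1, "NEUROD1": 1, "POU2F3": 1, "YAP1": 1},
    }
    for name, call in classify(cohort).items():
        print(name, call)
```

Because the Z scores are computed within each cohort, the same tumour could receive a different call if it were standardized against a different set of samples.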

Gene set enrichment analyses

GSEA was performed using human homologues of normal tuft, basal, NE and ionocyte cell signatures, previously established from mouse and/or human scRNA-seq datasets¹⁰, as described above in the scRNA-seq-related methods and included in Supplementary Table 2. The input included rank-ordered gene lists for each subtype classification on the basis of log-transformed fold change expression (log-transformed fold change of ASCL1 subtype versus all other samples, NEUROD1 subtype versus all other samples, and so on).

Quantification and statistical analyses

All statistical analyses were performed in R or using GraphPad Prism (v.6). Detailed statistical methods, including those for immunostaining quantification, bulk and single-cell transcriptomics analyses, are described in relevant figure legends and Methods. In general, for data with normal distributions, pairwise comparisons were subjected to Student’s two-tailed unpaired t-tests, whereas multi-group comparisons were subjected to one-way ANOVA tests followed by post hoc Tukey’s or Fisher’s LSD multiple comparison tests. Non-parametric data, as determined by normality testing (Anderson–Darling, Shapiro–Wilk and Kolmogorov–Smirnov), were subjected to two-tailed Mann–Whitney or Wilcoxon rank-sum tests for pairwise comparisons or Kruskal–Wallis tests followed by post hoc Dunn’s multiple comparison tests (uncorrected or with Bonferroni correction) for comparisons of more than two groups. For Kaplan–Meier survival analysis, P values were calculated using log-rank (Mantel–Cox) testing. P values < 0.05 were considered statistically significant for all tests, unless otherwise specified. No statistical methods were used to predetermine sample sizes. All violin plots with box–whisker overlays show median and quartiles. Error bars for all data shown represent mean ± s.d. or s.e.m., as indicated in figure legends. Manual immunostaining and quantification, allograft measurements and quantification of histopathologies were performed blinded to treatment status and/or genotype to prevent bias from skewing the results. Blinding was not reported for other experiments given that no subjective measurements were considered to be involved.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.
