The evolution of hominin bipedalism in two steps

Sample collection

Human developmental stages E45–E72 were obtained from first-trimester termination procedures conducted at the Birth Defects Research Laboratory (BDRL; supported by National Institutes of Health (NIH) award number 5R24HD000836) at the University of Washington. Consent was obtained from donors by the BDRL prior to the procedures. All protocols followed ethical guidelines established by the NIH. The University of Washington Institutional Review Boards (IRBs) approved the collection and distribution of tissues for research, and permission to receive and use the samples for research purposes was obtained from Harvard University’s IRB (IRB16-1504) and Committee on Microbiological Safety (COMS) (18-103). Later stage (through 27 weeks) human samples were received via digital transfer of fully anonymized post-mortem imaging data from the Great Ormond Street Hospital for Children NHS Foundation Trust (GOSH). On procedure day, fresh samples from BDRL were washed in Hanks’ solution (pH 6.8–7.3) and shipped overnight on ice. Upon arrival in the lab, samples were dissected in a petri dish filled with 5% FBS/DMEM and quickly inspected for completeness. Three to four specimens from each stage—E45, E53/54, E57, E59, E67 and E72—were analysed in line with previous studies19 (see Supplementary Table 1 for sample details). Subsequently, samples were used for morphological (histology or computed tomography (CT) scanning) or genetic studies (multiomics and spatial transcriptomics) and were either fixed or frozen, as described in the corresponding sections.

Mouse (M. musculus) E15.5, E16.5 and E18.5 were obtained from the wild-type C57BL/6 NJ mouse line housed in the Capellini laboratory at Harvard University. Mice were euthanized by exposure to carbon dioxide for 5 min in accordance with ethical guidelines. Samples were then dissected under a microscope in ice-cold 1× PBS and used for histology or CT scanning (additional details below).

Morphological analysis

Histology of human and mice samples

The pelvic girdle (both left and right sides) and surrounding soft tissues were dissected under a light microscope. The left pelvis was fixed in Bouin’s fixative for 24–48 h. Samples were then washed in 1× PBS (3 times, 30 min per wash), dehydrated using an ethanol series, and embedded in paraplast at different orientations (transverse, sagittal, and coronal) for each developmental stage (see Supplementary Table 1 for sample details, embedding orientations and number of replicates). Paraplast-embedded samples were serially sectioned at 10 µm using a microtome and stored at room temperature until staining. The staining procedure followed a modified Mallory trichrome protocol57: a 10-min haematoxylin stain was followed by Mallory I, phosphomolybdic acid, Mallory II, dehydration and mounting in a xylene-based clear mount. Stained slides were stored in individual slide boxes (Supplementary Table 1). Trichrome stain highlights collagen and cartilage in bright blue; muscle in red; keratin in orange, nuclei and surrounding extracellular matrix in dark brown, and blood vessels in red.

Photography of histological slides

The histological collection of human and mouse pelvic girdle samples resulted in an extensive slide archive, with separate samples being sectioned in three planes: transverse, sagittal and coronal. Each slide contained 5–6 paraffin sections, leading to a total of over 1,500 sections requiring documentation (Supplementary Table 1). To streamline the photography of these slides, the ZEISS Axioscan 7 at the Harvard Center for Biological Imaging was used, enabling the simultaneous imaging of up to 125 slides. A custom smart profile was created to automatically detect and define tissue margins before analysis. ZEISS Blue software was then used to individually import the images for further analysis. A 100-year-old slide series (H&E-stained) for Microcebus myoxinus was loaned from the American Museum of Natural History, New York. Histological sections were carefully examined to identify those that highlighted pelvic girdle morphology across a developmental series (with specimens ranging from 12 mm to 18 mm). The 100-year-old slides were not compatible with the Axioscan; therefore, these invaluable slides were photographed using a TissueScope (LE 120) located at the Museum of Comparative Zoology (MCZ) in Cambridge, employing custom-made slide trays (Supplementary Table 1 for additional details). A developmental series of Saguinus oedipus geoffroyi housed at the MCZ was also added to the comparative dataset. High-resolution slide photographs are available on the MCZ website, under the special collections (Museum of Comparative Zoology, Harvard University, President and Fellows of Harvard College; license: https://creativecommons.org/licenses/by-sa/4.0/).

CT scanning of developmental samples

The developmental samples (dissected right or left sides that were not used for histological analysis) were fixed in 4% PFA58,59 overnight at 4 °C, dehydrated in methanol, and then transferred sequentially to 20% and 30% sucrose until they sank to the bottom of the collection tubes. Next, the samples were stained with 3–5% phosphomolybdic acid in 1× PBS for 1 week. If samples appeared overstained at the end of this period, they were rinsed in distilled water to remove excess stain. The samples were then scanned using the UChicago PaleoCT scanner (GE Phoenix v/tome/x 240 kV/180 kV scanner). Scanning parameters are provided in Supplementary Table 1.

CT scanning of primate specimens

Thirty-four developmental specimens representing 12 primate genera were selected for μCT scan imaging via the Field Museum of Natural History, Chicago (see Supplementary Table 1 for a list of specimens and their respective scanning parameters). Because museum specimens are typically stored in the same solution for years, the selected samples were placed in a freshly prepared solution of 70% ethanol in deionised water for 7 days to refresh the tissues. After this, the samples were transferred into a 10% iodine solution (prepared by mixing iodine (Sigma 207772) and potassium iodide (Sigma 221945) in a 1:2 ratio) diluted in 70% ethanol60,61. The specimens remained in this solution for three to ten weeks, depending on their size.

X-ray computed microtomographic (μCT) scanning of the specimens was performed at the PaleoCT Scanner Facility in the Department of Organismal Biology and Anatomy, University of Chicago (https://oba.bsd.uchicago.edu/node/261), using a Phoenix v|tome|x scanner equipped with a 180 kV nano-focus and a high-power 240 kV micro-focus CT tube.

CT scanning of chimpanzee specimens

The two chimpanzee specimens, estimated historically to be 100–200 years old, were obtained from the Natural History Museum of Germany. Owing to the rarity and value of these specimens, staining was avoided, and the samples were scanned in their original jars. Scanning parameters are provided in Supplementary Table 1.

Segmentation of CT scans

Scans of human, mouse and primate specimens were imported into Amira (v.2022.1) for segmentation of bone, cartilage, muscles and blood vessels associated with the pelvic girdle. Each structure of interest was assigned as a separate material. The volumes were then extracted based on the selected materials to create morphological meshes, which were exported as .stl files for comparative analysis.

10X Multiomics and Visium spatial transcriptomics

Human tissue samples were acquired from the BDRL. Next, samples were rinsed in 1× PBS and 5% FBS/DMEM. We analysed 1–2 specimens from each of the following stages: E45, E53/54, E57, E59, E67 and E72. At each timepoint, left and right sides of the sample were dissected under the microscope and were used for 10X Multiomics and 10X spatial transcriptomics, respectively (refer to Supplementary Table 1 for sample details and Extended Data Fig. 2 for experimental set-up). The right side was immediately embedded and snap frozen in OCT compound. The tissue blocks were stored at −80 °C. The left side was dissected under the microscope and the illium and soft tissues were separated. The ilium tissues were digested in 0.25% type I collagenase (Corning) for up to 2 h at 37 °C. Cells were lysed and nuclei were isolated following manufacturers’ instructions as detailed in Nuclei Isolation for Single Cell Multiomics ATAC + Gene Expression Sequencing protocol. Lysis time of the tissues varied based on developmental stage: E53/54, 5 min; E57, 7 min; E67, 8 min; E72, 9 min. Single-cell ATAC-seq and RNA-sequencing libraries were prepared according to the 10X Chromium Single Cell Multiome ATAC + Gene Expression kit. Targeted nuclei recovery ranged between 4,000–10,000 for the developmental stages. Isolated nuclei were transposed, loaded onto a Chromium Next GEM chip, which was next run on a Chromium Controller to generate a GEl beads-in-emulsion (GEM) with nuclei. For post GEM incubation, nuclei were cleaned using Dynabeads and SPRIselect on a magnetic separator. The GEM incubated nuclei were then separated into two mixtures and the following steps were performed to construct scRNA-seq and scATAC-seq libraries separately: the 10X barcoded DNA was generated from the transposed fragments, and 10X barcoded cDNA was generated from the single-cell RNA. Libraries were prepared according to manufacturers’ instructions. Sequencing was done using a NovSeq S2 platform (8 samples in 1 lane), 100 bp paired-end reads, at the Harvard University Bauer Core sequencing facility.

For spatial transcriptomics, the samples were sectioned sagittally and transversely, thoroughly choosing 8–10 sections that depict the morphology of the ilium and its surrounding tissues. The remaining sections were H&E stained for anatomical guidance. An optimization step was carried out using the 10X Genomics optimization slide and reagents kit (PN-1000193) prior to the experimental protocol to identify permeabilization times for each developmental stage. This was done by analysing the fluorescent signal for each stage under the LSM Zeiss 900 Confocal microscope: E45, 12 min; E54, 18 min; E57 and E59, 20 min; E67, 22 min; E72, 24 min. Once the optimization times were identified, spatial transcriptomics was followed using the 10X Visium spatial transcriptomics kit for fresh frozen tissues. Each gene expression slide was equipped with four slots (each slot was a 6.5 mm × 6.5 mm capture area) with 5,000 oligonucleotide barcoded spots.

Flash-frozen OCT tissue blocks were equilibrated at −20 °C prior to sectioning. The temperature of the cryostat blade was optimized to obtain proper sections (for example, −18 °C for human pelvis girdle samples). From each sample, sections of 10 µM thickness that depict the ilium morphology were placed on the capture areas. The slides were always kept on dry ice and were stored at −80 °C until processed or stained with H&E immediately after sectioning. The stained sections were imaged under a Leica BF microscope at ×10 magnification. The tissue sections were permeabilized following the identified optimization steps for each stage. Next, reverse transcription and second strand synthesis were performed. cDNA quantification was carried out using quantitative PCR (qPCR) using a KAPA SYBR fast qPCR kit on a Bio-Rad CFX96 real-time PCR machine. For each capture area, cDNA amplification cycles were calculated using the qPCR results (amplification cycles = 25% peak fluorescence of the Cq value). Amplified cDNA was cleaned up using SPRI beads, and quality and quantity of cDNA were measured using the Agilent 4200 TapeStation High Sensitivity D5000 ScreenTape. Based on the cDNA quantity for each sample, the total number of sample index PCR cycles was calculated for the subsequent Visium Spatial Gene expression library construction. Libraries were cleaned using a double-sided selection using SPRI beads. Adaptors were ligated and a final double-sided cleanup were done using SPRI beads. Quality of the constructed libraries was measured using the Agilent 4200 TapeStation High Sensitivity D5000 ScreenTape.

Post-library construction, libraries were quantified using the KAPA-Illumina PCR quantification kit, pooled, and sequenced on an Illumina NovaSeq at the Harvard University, Bauer core. Eight samples were run in one lane and sequenced to 100-bp reads per tissue based on manufacturers recommended depth.

10X Genomics multiomics analysis

The 10X Genomics multiomics protocol generated both scRNA-seq and scATAC-seq data for each cell processed during the data acquisition step. The quality of each raw FASTQ file (for both scRNA-seq and scATAC-seq) was initially checked using MultiQC (6.14). The number of cells obtained from single-cell multiomics is as follows: E53, 3,000–5,000; E57, 6,000–7,000; E59, 8,000; E67, 6,000; E72, 12,000 (Supplementary Table 1 shows total cell numbers per stage; Extended Data Fig. 4 highlights the exact cell numbers and quality control plots). Raw sequences were mapped onto the GRCh38 human genome using Cell Ranger software v.2.0.0 (developed by 10X Genomics). For scRNA-seq, output from the Cell Ranger software was analysed using two different pipelines: (1) Scanpy, which is explained in detail under the SCENIC+ analysis; and (2) the Seurat pipeline, which is explained below.

Seurat v.4.362 and Signac v.1.1063 were used for subsequent analyses. To assess the percentage of reads mapping to the mitochondrial genome, mitochondrial quality control metrics were calculated using the PercentageFeatureSet function, focusing on genes starting with ‘MT-’. The VlnPlot function was used to visualize the number of unique genes and reads (total RNA-sequencing, ATAC-seq and mitochondrial percentage). Cells with low-quality DNA (evidenced by low gene counts), high mitochondrial DNA levels (indicative of cell death) or abnormally high gene counts (suggestive of cell doublets) were excluded from the analysis to ensure data quality64 (Extended Data Fig. 4 and Supplementary Table 1). Data were then normalized using sctransform (available via the SCT assay). Next, principal components analysis (visualized using ggplot2) was performed on highly variable data. Cells were clustered using a graph-based clustering approach (a non-linear dimensionality reduction technique) in Seurat, grouping cells with similar gene expression profiles and open chromatin regions (using the smart local algorithm (SLM) with a 0.8 resolution). Genes unique to each cluster at each stage were catalogued using the function FindMarkers (Supplementary Table 2) and distinct expression patterns were visualized using dot plots (DotPlot), feature plots (FeaturePlot), violin plots (VlnPlot) and heat maps (DoHeatmap) (Extended Data Fig. 4, Supplementary Table 2 and Supplementary Fig. 2). Transcriptional activity for each gene identified in Seurat analysis was assessed by calculating ATAC-seq counts upstream (±2 kb) of the genes of interest. UMAPs were also generated using only scATAC-seq data (Extended Data Fig. 4). Annotations from the scRNA-seq dataset served as anchors when transferred to the scATAC-seq data. Using WNN, multiple modalities (RNA-seq and ATAC-seq) were integrated within each cell, generating new UMAPs (Supplementary Fig. 1). These WNN UMAPs were then used in subsequent analyses.

SCENIC+ analysis

To understand the underlying GRNs in chondrocytes, osteoblasts and the early perichondrium, single-cell gene expression data, single-cell open chromatin regions and transcription factor motif enrichment scores were combined to predict eRegulons (enhancer-driven GRNs) using SCENIC+65 (Supplementary Fig. 6). Four Python packages were used prior to running SCENIC+: scanpy66, pycisTopic67, pycisTarget65 and create_cis_Target_databases.

Pre-processing of scRNA-seq data

The output obtained from Cell Ranger was read as an AnnData object. For quality control, cells expressing more than 200 genes and genes expressed in more than 3 cells were retained for subsequent analysis. Doublets were filtered out using Scrublet68. Mitochondrial reads were removed to improve data quality. For the SCENIC+ analysis, the raw data matrix was used (as adata.raw), bypassing the normalization step. Cell-type annotations from the preceding Seurat analysis were used as a reference to label the clusters, ensuring consistency in cluster identification and marker genes across all subsequent analyses. The preprocessed scRNA-seq data were saved as an.h5ad file for later import into SCENIC+.

Pre-processing of scATAC-seq data

PycisTopic was used to cluster cells and accessible regions into regulatory topics, generating pseudobulk profiles for each cell type based on prior single-cell multiomic data cell annotations. For each pseudobulk profile, consensus peaks were inferred using MACS269, which produced.bed files for each cell type that were subsequently used in further analyses. Peaks were merged into consensus peak sets. For each cell, the log number of unique fragments per cell barcode, transcription start site enrichment per cell barcode, and duplication rate per cell barcode were calculated. Based on these quality control metrics, barcodes that failed the quality control check were filtered out and only those passing quality control were retained for downstream analysis.

Creating the cisTopic object

Next, a cisTopic object was created with metadata (for example, annotations of each cell), and a model with the optimum number of topics was selected. Next, for each cell type differentially accessible regions were computed. The differentially accessible regions were used to infer candidate enhancers per cell type in the subsequent SCENIC+ analysis. An already available CisTarget database for humans (hg38) was downloaded (https://resources.aertslab.org/cistarget/databases/homo_sapiens/hg38/screen/mc_v10_clust/region_based/).

Finally, using SCENIC+, the gene expression patterns identified from our scRNA-seq data, accessibility peaks and region sets from PycisTopic analysis, and the transcription factor binding sites from the CisTarget human database were combined together to identify potential enhancer-driven GRNs (eGRNs). Gene set enrichment analyses were performed to identify eRegulons, as described65. The subsequent SCENIC+ analysis was conducted using default parameters, retaining eRegulons with more than ten targets. The AUCell algorithm was employed to calculate the enrichment scores (area under the curve) for each eRegulon. After retaining eRegulons with ten or more targets, our analysis revealed the top three eRegulons for each cell type at E53 and E57 (based on the eRSS scores), with a primary focus on chondrocyte populations, mesenchymal cells, perichondral and osteoblast cells (Supplementary Table 7 and Supplementary Fig. 6).

Generation of eGRNs

The regulatory network output, with its transcription factors (eRegulons) and the targeted genes with enriched motifs, generated from the SCENIC+ analysis was imported into Cytoscape70 for visualization (Supplementary Fig. 4) and further analysis. The .csv files (Supplementary Table 7) containing transcription factors, their predicted target genes and associated eRegulon scores were formatted and loaded as network tables in Cytoscape. Target genes were sorted based on their triplet ranking65; genes with the lowest ranking (high triplet ranking values) were considered to have stronger interactions with the eRegulon of interest. eGRNs were generated for individual cell types at E53 and E57, as well as for combined chondrocyte populations, to visualize the underlying networks between cell types.

To improve visualization, nodes representing transcription factors were colour-coded based on their eRegulon association, while targeted genes were displayed with distinct circular node shapes for improved clarity. Edges were weighted according to confidence scores from the SCENIC+ analysis, providing insights into the strength of regulatory interactions.

GO enrichment analysis was performed for all targeted genes, and the results were overlaid onto the eGRNs. Special emphasis was given to genes previously implicated in abnormalities of the human pelvic girdle, which were highlighted in green (refer to Supplementary Table 7 for GO terms, P values and descriptions for each stage). The resulting network visualizations enabled the identification of key transcriptional regulators and their downstream targets, facilitating the interpretation of cell-type-specific regulatory programmes and interactions between chondrocyte populations.

10X Genomics Visium spatial transcriptomic analysis

Integration with scRNA-seq data

Each morphological section sequenced for spatial transcriptomic analysis produced raw FASTQ files, whose quality was checked as before using MultiQC (v.1.14)71. Sequences were mapped onto the GRCh38 human genome using Space Ranger (v.2.1.0). The data were preprocessed similarly to the single-cell multiomics data and normalized using sctransform. Seurat v.4.3 was then used to analyse the Space Ranger outputs and generate UMAPs for each histomorphological section. Marker genes identified from the single-cell data for each anatomical location were visualized using the SpatialFeaturePlot function. To increase the resolution of the Visium spatial dataset, the spatial data were integrated with stage-matched single-cell multiomics data (where the preprocessed scRNA-seq data served as an anchoring reference) to visualize cell cluster expression patterns (Supplementary Fig. 1). We implemented a generalized linear negative binomial model for each gene that is expressed in each spot to identify significantly expressed genes in tissue sections. The Space Ranger output also generated a CLOUPE file, which was further processed in Loupe Browser (v.8.0.0) to visualize gene expression across each histological section.

CellChat analysis

To investigate potential signalling pathways mediating communication between external cell populations (mesenchymal cell population at E45 and the perichondrium at E53) and the internal cartilaginous model in spatial transcriptomic sections at E45 and E53, we employed CellChat72. CellChat was used to infer ligand–receptor interactions from spatially resolved transcriptomic data, enabling the identification of key signalling networks within the tissues of interest. For this analysis, cell-type annotations from the preceding spatial and seurat analysis were integrated to define distinct populations. Ligand–receptor interactions were predicted by mapping expressed genes to known signalling pairs from the CellChat database (CellChatDB). Here, to visualize cell–cell communications, the ‘Secreted Signalling’ subset from CellChatDB was used. To prioritize biologically relevant interactions, we focused on pathways enriched in external cell populations that showed predicted signalling activity directed toward the internal cartilaginous model, using the identifyOverExpressedGenes and identifyOverExpressedInteractions functions. Next, average gene expression was calculated for each cell (using truncatedMean). Cells with low interaction counts were filtered out. Key pathways associated with cartilage development, extracellular matrix remodelling, and cellular proliferation were highlighted (Supplementary Fig. 3). To visualize the predicted signalling networks, we generated chord plots, heat maps and hierarchical plots, illustrating the directionality and strength of intercellular communication. These chord and interaction plots were mapped on to the spatial transcriptomic profiles to generate a hierarchical view of the cell–cell interactions across the spatial profile (using the function netVisual_aggregate). The results provided insights into potential signalling networks that may influence the development and maintenance of the cartilaginous model. Similar analyses were done for each multiomic samples as well to look deeper into the ligand–receptor interactions at a larger scale. However, for brevity we only focused on our early spatial transcriptomics data (Supplementary Fig. 3).

Intrinsic (internal) versus extrinsic (external) analysis

Each section (at E45 and E53) was re-clustered in Loupe Browser to distinguish genes that are differentially expressed between external and internal cell populations (external and internal cues) during the anteroposterior widening of the ilium (see Supplementary Note 2). The initial clustering performed using the Seurat pipeline identified only 5–7 clusters (Supplementary Fig. 1). Therefore, the re-clustering step further refined the cell clusters based on differentially expressed genes. Five to six sections were selected for both E45 and E53 (Extended Data Fig. 6), and the freehand tool was used to delineate regions of interest. For the internal cues, only dots overlaying the cartilaginous anlagen were selected, whereas for the extrinsic analysis, dots overlaying both cartilage and adjacent soft tissues were included. Each section was reanalysed to identify subclusters within the regions of interest (see Supplementary Tables 3 and 4 for detailed information on all subclusters and their genes). The top 300 genes in each subcluster from both internal and external cues were examined based on their log2-transformed fold change and P values. Their expression patterns were manually visualized in each section of interest. A comprehensive GO analysis was performed for the top genes in each cluster, identifying genes linked to pelvic girdle morphology, cell migration, cell proliferation and cell division (Supplementary Tables 3 and 4). From this refined gene list, populations of interest were identified: the E45 external mesenchymal population and internal iliac chondrocyte population and the E53 external perichondral cells and internal chondrocyte population. Genes of interest were further analysed using the UCSC Genome Browser to identify potential regions of significance (Supplementary Note 2).

Deciphering the underlying evolutionary signals and enrichment assay

The consensus peak files obtained from the MACS2 analysis (described in the preceding SCENIC+ analysis) were used to investigate evolutionary signals. For each cell type identified at various developmental stages, individual BED files were overlapped with HARs, HAQERs and hCONDELs (Supplementary Note 2) using Bedtools v.2.31.0 (Supplementary Table 5). To systematically evaluate the enrichment of HARs within these cell-type-specific peak sets across distinct developmental timepoints, enrichment analyses were performed relative to randomly sampled genomic backgrounds. Specifically, enrichment significance was determined by calculating fold enrichments and associated P values, with P values subsequently adjusted across all developmental stages using the Benjamini–Hochberg false discovery rate (FDR) correction method (results detailed in Supplementary Table 5). Peaks displaying significant enrichment were defined based on stringent thresholds: adjusted P values less than 0.05 and fold enrichment values substantially exceeding those expected by random chance (fold >1.5). For within-time-point comparative enrichment analyses, a rigorous permutation strategy involving 10,000 random shuffles was implemented to generate empirical background distributions. Each shuffled background was precisely matched to the median size of peaks and the total number of peaks within the tested cell-type-specific set. This provided a robust statistical baseline against which observed HAR overlaps could be compared. Furthermore, to specifically test whether particular cell types exhibited preferential enrichment for HARs relative to other co-occurring cell types at the same developmental stage, a pooled peak background was created. This pooled set comprised peaks from all cell types at a given stage, excluding those from the target cell type under examination. This exclusion strategy ensured a stringent comparative background, reducing bias from high-activity regions and thereby strengthening the specificity of enrichment detection. By employing this dual-background approach, utilizing both broad genomic and stringent within-time-point comparative backgrounds, the analyses robustly discriminated genuine evolutionary enrichment signals from background noise. This comprehensive and rigorous strategy enabled precise identification of cell types significantly enriched for HARs.

GREAT analysis

To study the biological significance of genomic regions associated with scATAC-seq peaks (for chondrocyte, mesodermal, perichondral and osteoblast cells at E53, E57, E67 and E72) identified from the MACS2 analysis, we performed Genomic Regions Enrichment of Annotations Tool (GREAT) analysis73. To ensure robust enrichment results, we conducted the analysis using two distinct background sets: a custom-generated background and a whole-genome background. The custom background was constructed by generating genomic regions that matched the size and distribution characteristics of the peaks of interest (a combined dataset with all the resulting peaks), ensuring a tailored reference set for more precise enrichment analysis. In parallel, the whole-genome background was employed to provide a broader context for assessing functional enrichment. Both analyses were conducted using the default association rule, which assigns genomic regions to genes based on proximity to the transcription start site and regulatory domain extension criteria. Functional enrichment terms, including GO, biological pathways and tissue-specific regulatory elements, were examined to identify potential biological processes linked to the identified peaks (refer to Supplementary Table 6 for more details).

Cell lineage tracing analysis

Velocyto (v.0.17)74 was used to generate the initial.loom file (a format to store the scRNA-seq data for each stage) from the 10X Genomics multiomic output files produced via Cell Ranger. Here, the run10X function was used along with the GRCh38 human genome and the corresponding annotation file to generate the.loom output file. This output file from Velocyto, along with cell annotations and UMAPs from Seurat for each developmental stage, served as input files for scVelo (v.0.24 in a Python environment)75 to generate RNA velocity plots75,76 (Fig. 4a). RNA velocity is a predictive method for visualizing a cell’s gene expression over time and space. The dynamical modelling approach was applied, with pre-processing of RNA data that included normalization by total size. For the dynamical model, a likelihood-based expectation-maximization framework was implemented. Spliced (mature cells) and unspliced (young cells) counts were calculated using the scv.tl.velocity function to trace each cell’s trajectory. The trajectory analysis was embedded onto UMAPs from the Seurat analysis (using the scv.pl.velocity_embedding_stream function) and visualized with the scv.pl.velocity function.

Hi-C analysis and visualizing the chromatin architecture

To investigate chromatin organization and identify topologically associating domains (TADs), we performed Hi-C analysis77 using publicly available data for chondrocytes (NCBI Gene Expression Omnibus (GEO) accession GSE200345). The.hic file was downloaded and processed using the Juicer pipeline78. Using Juicer78, contact matrices were generated, and chromatin interaction loops were called to identify regions with significant 3D genome interactions. To visualize chromatin architecture, including TAD boundaries and interaction hotspots, the processed data along with the cell-type-specific peak files were visualized using Juicebox (Extended Data Fig. 9). This enabled clear identification of TAD structures, interaction peaks, and regions of chromatin compaction near genes of interest.

T410R Jansen mouse line generation

The generation of mice expressing PTH1R with a HA tag (referred to here as WT-PTH1R) has been previously described79. To replace Thr at position 410 with Arg, improved genome editing via oviductal nucleic acids delivery (i-GONAD) was performed approximately 16 h post-coitus on wild-type females (CD1), which had been mated with a male homozygous for WT-PTH1R (C57BL6/CD1 mixed strain), following the protocol outlined previously80.

In brief, ovaries and oviducts were surgically exposed through dorso-lateral incisions. A tip-tapered glass capillary pipette was used to inject 2 μl of a premixed CRISPR genome editing solution into the lumen of each oviduct. This solution included 1.5 μl of a two-piece CRISPR guide RNA (crRNA) (5′-GAAGCTGCTCAAATCCACGCTGG-3′) and trans-activating CIRSPR RNA (tracrRNA) at a final concentration of 30 μM (IDT), 1.5 μl of a single-strand DNA repair template of 75 nucleotides (5′-CACACGGCAGCAGTACCGGAAGCTGCTCAAATCTAGACTAGTGCTCATGCCCCTCTTTGGCGTCCACTACATTGT-3′) at 3 μg μl−1 (IDT), which introduced the T410R mutation and three silent nucleotide changes, creating an XbaI restriction site, 1 μl of recombinant Cas9 (4 μg μl−1; IDT), and 0.4 μl of Fast Green dye (0.1 mg ml−1; Sigma-Aldrich). Immediately following injection, the oviducts were covered with a sterile PBS-soaked Kimwipe (Kimberly-Clark), and electroporation was performed using a BTX-820 square pulse generator with electrode tweezers, CUY652P2.5×4 (Nepa Gene). The pulse generator was set to 50 V, 5 ms, 8 pulses, with a 0.5 cm electrode gap. The ovaries and oviducts were then returned to their original positions, and the incisions were sutured. Genome-edited and non-edited embryos were allowed to develop to term, and pups were genotyped to verify mutation introduction into the human WT-PTH1R.

Genomic DNA from i-GONAD offspring was isolated from tail (age <20 days) or ear tissue (age >20 days). PCR was performed using 2 μl of DNA with a forward primer (5′-CTGTGACCTTCTTCCTTTACTTCCT-3′) and reverse primer (5′-AGGTACAAGCTGAGATCAAGAAAT-3′); thermocycling conditions were: initial denaturation at 95 °C for 5 min, followed by 35 cycles of denaturation at 95 °C for 15 s, annealing at 58 °C for 10 s, and extension at 72 °C for 20 s, with a final extension at 72 °C for 10 min. After gel electrophoresis (1.6% agarose stained with Gel Green, Biotium 41005), DNA bands of the expected size (567 bp) were excised, purified, and submitted for nucleotide sequence analysis at the Massachusetts General Hospital sequencing core using amplification primers. To confirm the presence of the repair template, including silent mutations establishing an XbaI restriction site (TCTAGA) and the C>G change introducing the T410R mutation, PCR amplicons were incubated with XbaI (60 min, 35 °C) before another gel electrophoresis. Expected band sizes were 379 bp and 188 bp for the mutant allele, and 567 bp for the wild-type allele.

LacZ staining of mice

To observe the activity of the putative regulatory region of RUNX2, the region of interest (Hg38 chr. 6: 44917111–44919226) was cloned into the Hsp68 lacZ vector. At the same time a similar vector was created using the orthologous region of this putative RUNX2 enhancer in the chimpanzee genome (panTro6, chr. 6: 44438191–44440307). Progeny (at E14.5) were genotyped for the lacZ transgene, and both lacZ-positive (13 total) and lacZ-negative mouse samples (12 total) for both constructs were subsequently subjected to X-gal staining. E14.5 specimens were first fixed in 4% PFA overnight at 4 °C and then washed in 1× PBS to remove excess fixative. Next, the specimens were washed in a wash buffer consisting of MgCl2, deoxycholate, NP-40, and sodium phosphate (pH 7.3). X-gal staining solution was then added to completely cover the specimens, and they were left in a dark chamber, nutating, while colour developed (Fig. 4d). Once the specimens displayed blue colouration, X-gal was removed, and they were subsequently washed in wash buffer and re-fixed for long-term storage.

Inclusion and ethics statement

Specimens from developmental (E45–E72) human musculoskeletal and adjacent tissues were obtained from the BDRL at the University of Washington with ethics board approval and maternal written consent. This study was performed in accordance with ethical and legal guidelines and ethical guidelines established by the NIH. The University of Washington IRBs approved the collection and distribution of human tissues for research, and permission to receive and use the samples for research purposes was obtained from Harvard University’s IRB (Capellini: IRB16-1504) and Committee on Microbiological Safety (COMS) (18-103). All mouse protocols were approved by the Harvard University Institutional Animal Care and Use Committee, and all mouse work was covered under the Capellini laboratory IACUC protocol (13-04-161-3).

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.


Source link

Leave a Reply

Your email address will not be published. Required fields are marked *