A human-specific regulatory mechanism revealed in a pre-implantation model

Ethics

This work was performed following the 2021 ISSCR Guidelines59. The use of blood-derived induced naive pluripotent stem cells for the experiments described in this article was approved by the Stanford Stem Cell Research Oversight committee (SCRO protocol number 900).

Cell culture

Induced hnPSCs were generated by Masaki et al.60 from peripheral blood cells by overexpressing NANOG and KLF2; the official name of this cell line is PB004. Cells were authenticated by short tandem repeat (STR) profiling. Cells were grown on irradiated Cf1 mouse embryonic fibroblast (MEF) feeder layers (A34181, Fisher Scientific). Before experiments entailing next-generation sequencing, hnPSCs were plated without MEFs (feeder-free conditions) using Geltrex (A1413301, Gibco) as a matrix. hnPSCs were grown in PXGL20. This medium uses N2B27 as its base, which is made by mixing 1:1 DMEM/F-12 (D8437, Sigma) and Neurobasal (21103049, Thermo) and adding the following supplements: 2 mM l-glutamine (25030024, Thermo), 100 µM 2-mercaptoethanol (M3148, Sigma), N2 and B27 supplements (17502048 and 7504044, Gibco) and 1× antibiotic–antimycotic solution (A5955, Sigma-Aldrich). To make PXGL, we freshly added the following: 1 µM PD0325901 (S1036, Selleckchem), 2 µM XAV939 (SM38-200, Cell Guidance Systems), 2 µM Go 6983 (2285, Bio-Techne), 10 ng ml−1 recombinant human LIF (300-05, PeproTech) and 1 µg ml−1 of doxycycline to sustain NANOG and KLF2 transgene expression in hnPSCs. Doxycycline was omitted from the media for the nontarg-CARGO or LTR5Hs-CARGO induction experiments. Cells were passaged using TrypLE Express (12-605-010, Fisher Scientific) every 3–4 days or whenever colonies were too confluent. The cell incubator was kept at 37 °C and humidified at 7% CO2 and 5% O2 (hypoxia). All cell lines were tested monthly for Mycoplasma. For KRAB–dCas9 and ZNF729 (rescue) inductions, we used 2× water-soluble cumate (QM150A-1, System Biosciences).

Derivation of nontarg-CARGO and LTR5Hs-CARGO hnPSCs

hnPSCs were nucleofected in a Lonza 4D-Nucleofector using the P3 Primary X Kit-S (V4XP-3032, Lonza) with the DN100 program. Per nucleofection, we used 400,000 cells without MEF depletion. To generate the KRAB–dCas9 hnPSCs, 0.8 µg of a piggyBac construct containing KRAB–dCas9 under a cumate-inducible promoter61, the cumate repressor CymR and a puromycin selection cassette was co-nucleofected with 0.2 µg of the super piggyBac transposase (PB210PA-1, System Biosciences). Clones containing the integration were selected with puromycin (0.5 µg ml−1) for three passages. KRAB–dCas9 hnPSCs were later nucleofected with 0.8 µg of the piggyBac constructs containing nontarg-CARGO (#191319, Addgene)10 or LTR5Hs-CARGO (#191316, Addgene)10 and a neomycin selection cassette and 0.2 µg of the super piggyBac transposase. Cells were then selected with 200 µg ml−1 of G418 for 10 days. Cells (n = 2,000) were subsequently plated in a 10-cm2 plate containing MEFs and fed every day. On days 8–9, sparse colonies were visible and were picked and expanded for the experiments. Cells were treated with puromycin and G418 every few passages to sustain proper KRAB–dCas9 and CARGO array expression, as we noticed that these transgenes become silenced over passages in hnPSCs. For the ‘orthogonal repression of LTR5Hs’ experiments, a distinct array of gRNAs targeting LTR5Hs was designed and cloned into piggyBac using CARGO23 (gRNA sequences are in Supplementary Table 1). The LTR5Hs-Ortho-CARGO hnPSCs were generated as described above, the only difference being that the KRAB–dCas9 transgene was under a cumate-inducible Ef1a promoter to ensure high repression at the population level.
Analysis of the role of the HERVK proteins in the dark spheres phenotype (‘rescue with HERVK ORFs’ experiment) was performed by selecting three high-repression LTR5Hs-CARGO clones that were previously demonstrated to give rise to dark spheres, and integrating into them multiple copies of a piggyBac transgene encoding a tagBFP and the proteins gag, pro and pol24 under a constitutive Ef1a promoter to ensure robust expression. High-repression LTR5Hs-CARGO hnPSCs positive for tagBFP were isolated and utilized for blastoid formation under cumate treatment to induce LTR5Hs repression.

Genetic deletion of LTR5Hs and ZNF729 overexpression

Selected LTR5Hs elements were deleted from the genome using pairs of gRNAs designed using Benchling (Biology Software, 2023; Supplementary Table 1). crRNAs were purchased from IDT with the XT modification for stability. Cells (n = 400,000) were nucleofected with a ribonucleoprotein complex containing 1.65 µg of HiFi Cas9 Nuclease V3 (1081059, IDT) and 0.85 µl of a 1:1 ratio of 100 µM annealed tracrRNA and crRNA. Cells were passaged once and then 2,000 cells were plated on a 10-cm2 plate with MEFs for colony picking. Clones were genotyped using PCR and Sanger sequencing, and heterozygous and homozygous clones were kept for experiments. For the rescue experiments in Fig. 3g,h, 400,000 ΔLTR5Hs ZNF729−/− hnPSCs were nucleofected with a piggyBac plasmid subcloned from a pcDNA3 vector, containing ZNF729–HA cDNA (purchased from Genscript) and a puromycin selection cassette. Super piggyBac transposase was co-nucleofected. Cells were selected with 0.5 µg ml−1 puromycin for 10 days and ZNF729–HA expression was tested by western blot.

Derivation of ZNF729–FKBP(F36V)–HA hnPSCs

To endogenously tag ZNF729, we performed homology-directed repair at the locus with a donor DNA providing the FKBP(F36V)62,63 and HA tags. To this end, we drew upon a previously published method64 that combines Cas9 ribonucleoproteins with delivery of the donor template by AAV6 viral vectors. To generate the AAV viral particles, 2 × 15 cm2 dishes of 293FT cells (R70007, Invitrogen) at 60% confluency were transfected. On the day of transfection, the 293FT ‘complete cell media’ (DMEM/high-glucose medium (SH30243.FS, Cytiva), 10% FBS (100-106, GeminiBio), 1× non-essential amino acids (1114-0050, Gibco), 1× GlutaMAX (4109-0036, Gibco) and 1× antibiotic–antimycotic (1524-0062, Gibco)) was refreshed 6 h before transfection. Transfection was carried out using 120 µg polyethylenimine (PEI) per 15-cm2 plate, 22 µg of pDGM6 (#110660, Addgene)65 and 6 µg of AAV template (cloned in the pAAV-GFP backbone; #32395, Addgene)66. After 24 h, the medium was changed to ‘slow growth media’ (same as complete media, but with 2% FBS instead of 10%), and after a further 48 h of culture, the AAV viral particles were purified using one reaction of the AAVpro kit (6675, Takara Bio) and stored at −80 °C. The crRNA used to target the ZNF729 C-terminal region was purchased from IDT with the XT modification for stability. Wild-type hnPSCs (n = 400,000) were nucleofected with the ribonucleoprotein complex containing 1.65 µg of HiFi Cas9 Nuclease V3 (1081059, IDT) and 0.85 µl of a 1:1 ratio of 100 µM annealed tracrRNA and crRNA. Cells were seeded in a plate containing MEFs, PXGL, the ROCK inhibitor Y-27632 and the AAV viral particles containing the donor template. Media were changed after 24 h. Cells were passaged once and then 2,000 cells were plated on a 10-cm2 plate with MEFs for colony picking. Correct editing was analysed by PCR, Sanger sequencing and western blot. ZNF729 depletion was obtained upon addition of 500 nM of dTAGv-1 (6914, Tocris) for the indicated times.

Blastoid formation

To generate blastoids, the protocol described in Kagawa et al.17,67 was followed with minor changes. hnPSCs were grown on MEFs and dissociated the day of the experiment into single cells. MEFs were depleted by culturing the dissociated cells in PXGL over a gelatin matrix (G1393, Sigma-Aldrich) for 1 h. We used 24-well Aggrewell 400 (34415, StemCell Technologies) plates as vessels. Upon multiple tests, we determined that starting from 76 cells per intended blastoid was optimal, so 91,200 hnPSCs were plated per well of the microwell plate (76 × 1,200 microwells). On the day of plating, cells were cultured in N2B27 base medium containing 10 µM Y-27632 (72304, StemCell Technologies). After 20–24 h, medium was changed to PALLY medium (N2B27 base medium supplemented with PD0325901 (1 µM), A83-01 (1 µM; HY-10432, MedChemExpress), 1-oleoyl lysophosphatidic acid sodium salt (LPA; 500 nM; 3854, Tocris), hLIF (10 ng ml−1) and Y-27632 (10 µM)). PALLY medium was refreshed the next day. Seventy-two hours after plating, medium was replaced with medium containing 500 nM of LPA. At 96 h, structures were collected and analysed as needed. In those experiments in which 2× cumate was added to induce KRAB–dCas9 expression, the drug was added on the day of plating concomitantly with cell aggregation.
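The seeding arithmetic above (76 cells per intended blastoid across 1,200 microwells) can be sketched as a small helper. This is only an illustration: the function names and the example suspension concentration are hypothetical and are not part of the published protocol.

```python
def cells_per_well(cells_per_blastoid: int, microwells_per_well: int) -> int:
    """Total cells to plate in one well of the microwell plate."""
    return cells_per_blastoid * microwells_per_well


def suspension_volume_ul(cells_needed: int, cells_per_ml: float) -> float:
    """Volume (in µl) of a counted suspension that contains cells_needed cells.

    cells_per_ml is whatever the cell count gave; the value used below is
    an assumed example, not a number from the text.
    """
    if cells_per_ml <= 0:
        raise ValueError("cells_per_ml must be positive")
    return cells_needed / cells_per_ml * 1000.0


if __name__ == "__main__":
    needed = cells_per_well(76, 1200)
    print(needed)  # 91200, matching the number stated in the protocol
    # Hypothetical suspension at 1e6 cells per ml
    print(suspension_volume_ul(needed, 1_000_000))
```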

hnPSCs differentiation towards the trophectoderm lineage

Trophectoderm monolayer differentiation was completed as described previously29,68. In brief, hnPSCs were washed with PBS and then incubated with TrypLE Express for 10 min at 37 °C. Dissociated cells were washed in DMEM/F-12 (11-330-057, Thermo Fisher Scientific) with 0.1% Bovine Albumin Fraction V (15260037, Thermo Fisher Scientific) and resuspended in nTE-1 media (N2B27 media supplemented with 2 µM PD0325901, 2 µM A83-01 and 10 ng ml−1 BMP4 (314-BP-010, R&D Systems)). Cells were counted and seeded onto plates coated with 0.15 µg cm−2 laminin511-E8 (AMS.892 021, Amsbio) at a density of 2 × 10^4 cells per cm2. Twenty-four hours after plating, media were changed to nTE-2 media (N2B27 media supplemented with 2 µM PD0325901, 2 µM A83-01 and 1 µg ml−1 JAK inhibitor I (74022, StemCell Technologies)). Forty-eight hours after plating, the media were again changed to fresh nTE-2 media. To repress LTR5Hs elements during the differentiation, media were supplemented with 2× water-soluble cumate. Differentiations took place under hypoxic conditions.

hnPSCs differentiation towards the hypoblast lineage

Hypoblast monolayer differentiation from hnPSCs was completed as previously described55. In brief, hnPSCs were washed with PBS and then incubated with TrypLE Express for 10 min at 37 °C. Dissociated cells were washed in DMEM/F-12 with 0.1% Bovine Albumin Fraction V and resuspended in a six-factor ‘6F media’ (N2B27 media supplemented with 25 ng ml−1 FGF4 (100-31, PeproTech; stabilized with 1 µg ml−1 heparin sodium), 10 ng ml−1 recombinant human BMP4, 10 ng ml−1 recombinant human PDGF-AA (100-13A, PeproTech), 1 µM XAV939 (SM38-10, Cell Guidance Systems), 3 µM A83-01 (HY-10432, MedChemExpress) and 0.1 µM retinoic acid (R2625, Sigma-Aldrich)). Cells were counted and seeded onto plates coated with 0.15 µg cm−2 laminin511-E8 at a density of 5 × 10^4 cells per cm2. Twenty-four hours after plating, the medium was replaced with fresh 6F media. Forty-eight hours after plating, the medium was changed to a seven-factor ‘7F media’, which includes the same factors used in the 6F media, with the addition of 10 ng ml−1 recombinant human IL-6 (200-06, PeproTech). To repress LTR5Hs elements during the differentiation, media were supplemented with 2× water-soluble cumate. Differentiations took place under hypoxic conditions and flow cytometry measurements were taken on day 3.

Flow cytometry

After 3 days of trophectoderm or hypoblast differentiation, 200,000 cells were used for staining. Cells were pelleted and resuspended in 100 µl of N2B27 supplemented with 10 µM Y-27632 and either a 1:100 dilution of TACSTD2-BV421 for the trophectoderm differentiations (563243, BD Biosciences) or a 1:200 dilution of ANPEP-BV421 for the hypoblast differentiations (301716, BioLegend). Cells were incubated on ice in the dark for 1 h and then washed twice with N2B27 supplemented with 10 µM Y-27632. Flow cytometry was performed on the SONY MA900 cell sorter and data were analysed using FlowJo (v10.10.0).

RNA extraction and RT–qPCR

RNA was extracted using TRIzol (15596018, Life Technologies) directly from dissociated hnPSCs carrying the indicated perturbation. Blastoids and dark spheres were first dissociated in a 1:1 mixture of trypsin-EDTA 0.5% (15-400-054, Fisher Scientific) and Accutase (07920, StemCell Technologies) for 5 min, diluted in N2B27 and spun down before RNA extraction. Extraction was followed by RNA purification using a Direct-zol RNA-prep kit (R2052, Zymo Research) with DNase treatment. We reverse-transcribed 1 µg of RNA into cDNA using a SensiFAST cDNA synthesis kit (BIO-65053, Bioline); the cDNA was diluted 1:4 with molecular-grade water and 2 µl of this dilution was used for quantitative PCR (qPCR) with primers for each amplicon (Supplementary Table 1). qPCR was performed in a LightCycler 480 Instrument (II) using SensiFAST SYBR (BIO-98020, Bioline). For experiments using Taqman probes, qPCR Primetime probes were purchased from IDT (sequences in Supplementary Table 1) and were combined for qPCR with the LightCycler 480 Probes Master mix (04707494001, Roche).

CUT&RUN

The protocol was performed according to Meers et al.69 using the CUTANA reagents from EpiCypher (concanavalin A-conjugated paramagnetic beads (21-1401), pAG-Tn5 (15-1017) and Escherichia coli spike-in DNA (18-1401)). We used 500,000 nontarg-CARGO or LTR5Hs-CARGO hnPSCs per condition. Cells were permeabilized using 0.005% digitonin, and 0.5 µg of H3K9me3 primary antibody was used per sample (Supplementary Table 1). DNA was extracted using phenol–chloroform, and library preparation was performed using the NEBNext Ultra II Library Prep kit (E7645S, New England Biolabs). Libraries were sequenced paired-end for 150 cycles in a Novaseq 6000 Illumina sequencer.

ChIP–seq, ChIP–seq libraries construction and sequencing

Cells were grown under feeder-free conditions using Geltrex to minimize MEF contamination in the sequencing. One 10-cm2 plate (approximately 6 × 10^6 hnPSCs) was used per ChIP. Cells were crosslinked in PBS containing 1% methanol-free formaldehyde (28908, Pierce) for 10 min. Fixation was quenched for 10 min by adding glycine to a final concentration of 0.1 M. Upon harvesting, cells were resuspended in buffer 1 (50 mM HEPES-KOH pH 7.5, 140 mM NaCl, 1 mM EDTA, 10% glycerol, 0.5% NP-40 and 0.25% Triton X-100) and incubated for 10 min, rotating at 4 °C, before centrifugation at 1,350g for 5 min at 4 °C. The pellet was lysed in buffer 2 (10 mM Tris pH 8, 200 mM NaCl, 1 mM EDTA and 0.5 mM EGTA), incubated for 10 min at 4 °C and once again centrifuged at 1,350g for 5 min. Then, the pellet was lysed in buffer 3 (10 mM Tris pH 8, 100 mM NaCl, 1 mM EDTA, 0.5 mM EGTA, 0.1% sodium deoxycholate and 0.5% N-lauroylsarcosine), incubated for 20 min on ice and sonicated in a Bioruptor sonicator (Diagenode) until DNA fragments of 400–600 bp were obtained. Chromatin was quantified, and approximately 10–25 µg of chromatin was used for immunoprecipitation in a total of 500 µl of buffer 3 containing the antibodies indicated in Supplementary Table 1. After overnight incubation, 100 µl of magnetic protein G beads (10004D, Life Technologies) was added to each immunoprecipitation. After 2–3 h of incubation, the immunocomplexes were washed five times with RIPA wash buffer (50 mM HEPES-KOH pH 7.5, 500 mM LiCl, 1 mM EDTA, 1% NP-40 and 0.7% sodium deoxycholate) and once with TE-NaCl buffer (50 mM Tris pH 8, 10 mM EDTA and 50 mM NaCl). To recover the DNA, the immunocomplexes were eluted in elution buffer (50 mM Tris pH 8, 10 mM EDTA and 1% SDS) at 65 °C for 15 min with vortexing every 5 min. The bead eluate was decrosslinked overnight at 65 °C. After RNase A treatment for 30 min (FEREN0531, Thermo Fisher Scientific) and proteinase K treatment (EO0492, Thermo Fisher) for 2 h, the DNA was purified using a Qiagen kit (28106, Qiagen).

To prepare ChIP–seq libraries for sequencing, we utilized the NEBNext Ultra II DNA kit (E7645S, NEB), and Agencourt AMPure XP beads (A63881, Beckman Coulter) were used for the cleanings. We started from approximately 20 to 50 ng of ChIP or input DNA and followed the manufacturer’s instructions. Paired-end sequencing (150 cycles) was performed in a Novaseq X Plus sequencer (Illumina) including 1% of PhiX.

Bulk RNA-seq and library preparation

RNA was extracted using TRIzol from nontarg-CARGO and LTR5Hs-CARGO hnPSCs treated with cumate for 4 days in the absence of doxycycline, from ZNF729–FH hnPSCs treated with dTAGv-1 for 3 h and 24 h or from blastoids and dark spheres. mRNA was purified using poly-T oligo-attached magnetic beads. After fragmentation, the first-strand cDNA was synthesized using random hexamer primers followed by the second-strand cDNA synthesis. Libraries were prepared by end repair, A-tailing, adapter ligation, size selection, amplification and purification, and they were checked with Qubit and qPCR for quantification and Bioanalyzer for size-distribution detection. Quantified libraries were pooled and sequenced on a Novaseq 6000 Illumina sequencer.

Blastoid immunostainings

Immunostaining of blastoids was performed ‘in well’. Media from the Aggrewell were carefully aspirated (more than 90%). For fixation, 1 ml of 4% paraformaldehyde was added to the well and incubated at room temperature for 15 min. The paraformaldehyde was carefully aspirated and replaced with a rinse buffer composed of PBS with 3 mg ml−1 polyvinylpyrrolidone. After one rinse, blastoids were permeabilized in PBS–polyvinylpyrrolidone containing 0.25% Triton X-100 for 30 min. The permeabilization solution was aspirated and replaced with blocking buffer (0.1% BSA (A9418, Sigma-Aldrich), 0.01% Tween 20 (P1379, Sigma-Aldrich) and 2% donkey serum (017-000-121, Jackson Immunoresearch)), which was dispensed in the well with a 5-ml serological pipette to subsequently collect all the blastoids from the well and deposit them into a well of a six-well plate containing more blocking solution. Blocking took place for at least 3 h at 4 °C. Blastoids were picked using standard mouth pipetting or 20-µl pipette tips and moved to primary antibodies (Supplementary Table 1) diluted in blocking solution in Nunc MicroWell MiniTrays (12-565-154, Fisher Scientific) at 4 °C overnight. Blastoids were washed three times with blocking buffer and stained with Alexa Fluor secondary antibodies for 3 h, washed three times and imaged in blocking buffer using an 18-well microslide (81826, Ibidi) in an Inverted Zeiss LSM 780 confocal microscope.

PIP-seq

PIP-seq70 is an alternative to microfluidics-based scRNA-seq methods that captures cells by vortexing and can be performed from start to library preparation at the experimenter’s bench. Blastoids from two wells of an Aggrewell plate per condition were collected and centrifuged in a 15-ml tube for 2 min at 250g, then the supernatant was aspirated. The blastoid pellet was resuspended in collagenase IV (07909, StemCell Technologies) and incubated at 37 °C with mild agitation for 40 min. Blastoids were centrifuged again in N2B27 medium and the pellet was resuspended in 0.5% trypsin-EDTA (15-400-054, Fisher Scientific) and incubated for 10 min at 37 °C. Two further washes were performed with N2B27, and the dissociated cells were passed through a 40-µm Flowmi cell strainer. The experiment continued only when viability exceeded 80%. Cells (40,000 or fewer) were counted, captured and used for completing the PIP-seq T20 3′ Single Cell RNA Kit protocol (FBS-SCR-T20-4-V4.05, Fluent Biosciences) without changes and using 12 cycles of cDNA amplification. Libraries were prepared with the reagents in the kit and were sequenced in an Illumina Novaseq X instrument.

Western blot

After SDS–PAGE electrophoresis, protein transfer was carried out on a multilayered cassette including a nitrocellulose membrane. The transfer buffer was composed of 25 mM Tris-HCl, 192 mM glycine, 0.05% SDS and 10% methanol. The power source was set to 125 V for 90 min. The nitrocellulose membrane was blocked with 5% milk for 1 h and incubated with a HA tag antibody (Supplementary Table 1) overnight to detect ZNF729–HA or ZNF729–FH. For gel source data, see Supplementary Fig. 1.

Image obtention and quantification

All bright-field images were taken using the EVOS FL Imaging System. The fluorescent immunostainings were imaged using an Inverted Zeiss LSM 780 confocal microscope. To obtain the inner cell mass (ICM):trophectoderm (TE) ratios of blastoids, we used the Fiji software71. We calculated the diameter of the blastoid cavity and the ICM size by measuring the distance from the point of contact with the trophectoderm to the end of the ICM. To count the number of cells expressing specific lineage markers, we used a combination of software and manual counting. KLF17, GATA4 and cleaved CASP3-positive cells were counted with Fiji’s cell counter in each stack. GATA3-positive cells were counted using the 3D Object Counter plug-in from Fiji, manually curating the assigned positive cells and correcting when necessary (for example, removing fluorescent artefacts that are not cells).

Quantification of blastoid formation efficiency

To determine blastoid efficiencies, end-point (96 h) blastoids were moved to a 15-ml conical tube and the total volume was measured. Next, two technical duplicates of 50-µl aliquots were dispensed into a 96-well plate and the structures were evaluated and counted, ultimately extrapolating to the total conical tube volume and to the 1,200 microwells present in the Aggrewell. To consider a 3D structure as a blastoid, we followed previously established criteria17. In brief, its morphology should resemble stage B6 of the human blastocyst, with an accumulation of cells surrounded by a monolayered cyst mimicking the ICM and the trophectoderm, respectively. The ICM of the blastoids is often outside the cyst; in such cases, we still considered the structure a blastoid. Blastoids have an approximate total diameter between 150 µm and 250 µm, and in the case of LTR5Hs-CARGO blastoids, the cavity should be larger than 150 µm, with no upper limit. When tested by scRNA-seq or immunofluorescence, blastoids must express markers consistent with the lineages of the blastocyst. Dark spheres are structures that appear darker in bright field and are not cavitated. We note that the efficiencies calculated may be an underestimation, as some aggregates or blastoids are accidentally aspirated during the medium changes.
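The extrapolation from counted aliquots to a per-well efficiency can be sketched as follows. This is a minimal illustration under stated assumptions: the function name, the averaging of the technical duplicates, and the example counts and tube volume are hypothetical and are not taken from the article.

```python
from typing import Sequence


def blastoid_efficiency(
    counts_per_aliquot: Sequence[int],
    aliquot_volume_ul: float,
    total_volume_ul: float,
    n_microwells: int = 1200,
) -> float:
    """Estimate the fraction of microwells that yielded a blastoid.

    Averages the structure counts from the technical aliquots, scales the
    mean to the total tube volume, and divides by the number of microwells.
    """
    if not counts_per_aliquot:
        raise ValueError("at least one aliquot count is required")
    if aliquot_volume_ul <= 0 or total_volume_ul <= 0 or n_microwells <= 0:
        raise ValueError("volumes and microwell number must be positive")
    mean_count = sum(counts_per_aliquot) / len(counts_per_aliquot)
    estimated_total = mean_count * (total_volume_ul / aliquot_volume_ul)
    return estimated_total / n_microwells


if __name__ == "__main__":
    # Hypothetical example: duplicates of 12 and 14 blastoids in 50 µl
    # aliquots taken from a 2,000 µl suspension.
    eff = blastoid_efficiency([12, 14], 50.0, 2000.0)
    print(f"{eff:.1%}")
```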

Recording of blastoid formation video

The blastoid formation protocol was performed as indicated above, with changes. Instead of using Aggrewells (which have an opaque bottom), we used Elplasia 24-well plates that allow imaging from below (4441, Corning). We note that the initial cell aggregation in these plates is not as robust, so end-point blastoid formation is less efficient than in Aggrewells. Cell aggregates were monitored, and when there were early signs of cavitation (small ‘bubbles’ around the aggregates), the plate was moved to a Nikon Eclipse Ti-E microscope equipped with a system for CO2 and temperature control (OKOlab). Blastoids were imaged at 37 °C and 5% CO2 for 24 h.

CUT&RUN analysis

Standard Illumina adapters were trimmed from the Illumina reads using Cutadapt72, and reads were then aligned to a combined hg38 and E. coli genome version using Bowtie2 (ref. 73), with the --dovetail parameter on and all other parameters at their defaults. This means that, in case of multimapping, all the valid alignments are reported. PCR duplicates were removed from the analysis. Coverage bigwig files were generated with Deeptools74 bamCoverage, and --scaleFactor was set to the number obtained from the normalization of fragments mapped to the human genome (hg38) and the mapped fragments to the E. coli K-12 MG1655 genome. Browser captures were obtained from IGV75.
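The text does not give the exact formula used for the spike-in scale factor. As an illustration only, the sketch below uses one common convention, in which the factor is an arbitrary constant divided by the number of fragments mapped to the E. coli spike-in genome. The function name and the constant of 10,000 are assumptions and may differ from what was actually used in this analysis.

```python
def spikein_scale_factor(ecoli_fragments: int, constant: float = 10_000.0) -> float:
    """Scale factor inversely proportional to spike-in fragment counts.

    ASSUMPTION: this constant / spike-in convention is one common choice
    for CUT&RUN spike-in normalization; the article does not state its formula.
    """
    if ecoli_fragments <= 0:
        raise ValueError("spike-in fragment count must be positive")
    return constant / ecoli_fragments


if __name__ == "__main__":
    # Hypothetical fragment counts for two samples
    for name, n_ecoli in {"nontarg": 20_000, "LTR5Hs": 40_000}.items():
        print(name, spikein_scale_factor(n_ecoli))
```

The resulting number would then be passed as the value of bamCoverage's --scaleFactor option, so that a sample with more spike-in fragments is scaled down proportionally.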

ChIP–seq analysis

Standard Illumina adapters were cut from the Illumina reads using Cutadapt72. Reads were aligned to the Homo sapiens (hg38) genome using Bowtie2 (ref. 73) in its default behaviour. PCR duplicates were removed from the analysis using Samtools76. Coverage bigwig files were generated with Deeptools74 bamCoverage. Browser captures were obtained from IGV75.

Peaks were called using MACS3 (ref. 77). Identification of ZNF729–FH-bound repetitive DNA was performed by intersecting ZNF729–FH peaks with RepeatMasker (RRID:SCR_012954)78 using Bedtools79 intersect with -f 0.3. A peak was assigned to a promoter when ZNF729–FH or TRIM28 bound within the window from −1 kb to +200 bp relative to the transcription start site. Motif discovery analysis at the non-repetitive ZNF729 peaks was performed on the top 3,000 ZNF729-bound peaks using SeqPos58. Full SeqPos output can be found in Supplementary Table 7.
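The promoter criterion can be sketched as a strand-aware interval test. This is a hedged illustration: the function names are hypothetical, and treating any overlap between the peak and the window as a promoter peak is an assumption (the text does not say whether overlap, summit position or another rule was applied).

```python
from typing import Tuple


def promoter_window(
    tss: int, strand: str, upstream: int = 1000, downstream: int = 200
) -> Tuple[int, int]:
    """Return (start, end) of the promoter window around a TSS.

    Upstream and downstream are defined relative to the direction of
    transcription, so the window is mirrored for minus-strand genes.
    """
    if strand == "+":
        return tss - upstream, tss + downstream
    if strand == "-":
        return tss - downstream, tss + upstream
    raise ValueError("strand must be '+' or '-'")


def peak_in_promoter(peak_start: int, peak_end: int, tss: int, strand: str) -> bool:
    """True if the peak interval overlaps the promoter window (assumed rule)."""
    win_start, win_end = promoter_window(tss, strand)
    return peak_start <= win_end and peak_end >= win_start


if __name__ == "__main__":
    # Hypothetical coordinates on a single chromosome
    print(peak_in_promoter(10_300, 10_500, tss=10_000, strand="+"))
    print(peak_in_promoter(10_300, 10_500, tss=10_000, strand="-"))
```

For real peak sets, the same windows could be written to a BED file and intersected with Bedtools, which is the tool the analysis already uses for the repeat annotation.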

RNA-seq and Gene Ontology analysis

Illumina adapters were trimmed from reads using Skewer80. Transcript alignment and quantification were performed using Salmon81 against the human genome assembly version Gencode (v47)82. For differential gene expression analysis, we used DESeq2 (ref. 83) after excluding transcripts with fewer than 10 reads across the tested samples. DESeq2 compared the effect of nontarg-CARGO and LTR5Hs-CARGO in hnPSCs, the differences between blastoids and dark spheres, or the gene expression changes upon dTAGv-1 addition to the ZNF729–FH hnPSCs. Biological replicates were used as covariates. Analysis of chimpanzee naive PSCs bulk RNA-seq39 was performed in the same manner but using the Pan troglodytes panTro6 Clint_PTRv2 genome assembly. The rhesus Mmul_10 (RheMac10) genome reference lacks a ZNF729 transcript model. To assess the presence and expression of ZNF729 in the rhesus macaque naive state, we performed unguided transcriptome assembly from rhesus monkey (Macaca mulatta) naive PSCs bulk RNA-seq data40 using the Trinity pipeline84. We constructed a blast database from the Trinity output and searched for the human ZNF729 nucleotide sequence using blastn and tblastx algorithms85. The highest matches were searched against the non-redundant NCBI database with blastn and blastx algorithms. None of the sequences returned ZNF729 as the top match in the reciprocal blast. When a sequence did match ZNF729, the match was below 60% identity. Differentially regulated genes (FDR 5%) were used for human Gene Ontology analysis with GOrilla86, using as background the list of genes expressed in hnPSCs, blastoids and dark spheres.

TEtranscripts

The software ‘TEtranscripts’87 was used to find differentially regulated transposable elements in the ZNF729–FH dTAGv-1 bulk RNA-seq experiments. To this end, the RNA-seq reads were aligned using HISAT2 (ref. 88), and following the tool’s manual, we allowed 100 alignments per read (-k 100) to optimize transposable element quantification and differential analysis. This analysis searches for differences at the level of transposable element families, so we cannot exclude the possibility that within a family, specific individual insertions could still be affected by ZNF729 depletion.

PIP-seq and pseudobulk differential gene expression analysis

For each sample and replicate, reads obtained from the Novaseq X were analysed with Fluent Bio’s proprietary software Pipseeker with default parameters, aligning against the GRCh38 transcriptome index (Gencode v40 2022.04, Ensembl 106). A background removal step was performed in all samples using CellBender89 with parameters --fpr 0.01 and --epochs 150. Full count matrices with background RNA removed were further analysed using Seurat90. Cells with more than 10% of mitochondrial counts were eliminated and the number of genes detected was also used for filtering the data (see Supplementary Table 3 for specific parameters applied for each sample). Each object was normalized using Seurat’s LogNormalize method and transformed using the ScaleData function, removing the unwanted variation originating from mitochondrial contamination or the cell-cycle stage. Upon examining elbow plots, 20 principal components were considered significant for unsupervised clustering with the FindNeighbors and FindClusters Seurat functions. Cluster identities were assigned based on the genes specifically marking each cluster according to Seurat’s FindMarkers function and comparing them with lineage markers uncovered in human pre-implantation datasets18,27. Seurat objects belonging to the nontarg-CARGO and the LTR5Hs-CARGO blastoids were merged and the LTR5Hs-CARGO object was downsampled to the same number of cells as the nontarg-CARGO object for comparison purposes. Multiple iterations of downsampling were performed with comparable results. We subset cells belonging to the epiblast or the neo-epiblast into a single group and performed differential gene expression analysis using DESeq2 on the sample-level aggregated counts (pseudobulk). Specifically, DESeq2 tested the effect of the repression of LTR5Hs elements (nontarg-CARGO versus LTR5Hs-CARGO) using the PIP-seq replicate as a covariate. Only genes with FDR of 5% and fold changes < −1 or fold changes > 1 were considered statistically significant. Genes related to the trophectoderm or placental development in Extended Data Fig. 5g are established markers or were obtained from Petropoulos et al.18 and other literature searches. Regarding the number of cells belonging to the different clusters, we note that scRNA-seq may not be accurate for trophectoderm cell counting, as we have observed an accelerated lysis of trophectoderm cells upon blastoid dissociation before cell capture. In agreement with this possibility, the overall proportion of trophectoderm cells was systematically lower in our scRNA-seq analyses than in immunostainings, irrespective of the LTR5Hs activity status.
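The significance rule (FDR of 5% combined with a fold-change cut-off of ±1) can be sketched as a simple predicate. This is an illustration, not the authors' code: the function name is hypothetical, and the threshold is applied to whatever fold-change value is passed in, because the text does not state the scale (DESeq2 itself reports a log2 fold change).

```python
import math
from typing import Optional


def is_significant(
    padj: Optional[float],
    fold_change: float,
    fdr: float = 0.05,
    fc_cutoff: float = 1.0,
) -> bool:
    """Apply the FDR and fold-change thresholds to one gene.

    padj may be None or NaN (DESeq2 leaves the adjusted P value undefined
    for some genes); such genes are never called significant.
    """
    if padj is None or math.isnan(padj):
        return False
    return padj < fdr and (fold_change < -fc_cutoff or fold_change > fc_cutoff)


if __name__ == "__main__":
    # Hypothetical results table: gene -> (padj, fold change)
    results = {
        "geneA": (0.001, 2.3),
        "geneB": (0.001, 0.4),
        "geneC": (0.20, -3.0),
        "geneD": (None, 5.0),
    }
    hits = [g for g, (p, fc) in results.items() if is_significant(p, fc)]
    print(hits)
```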

Projection of transcriptomes into human embryo datasets

To identify the human embryo counterparts of the transcriptomes of cells dissociated from nontarg-CARGO or LTR5Hs-CARGO blastoids, we projected such transcriptomes into a collection of human embryo reference datasets18,27,91,92,93,94. Counts from each gene in each cell (slot ‘counts’ in Seurat) were extracted and uploaded to a human embryogenesis online prediction tool51 (https://petropoulos-lanner-labs.clintec.ki.se/). The identified annotations and UMAP values were used in our plots and conclusions.

scRNA-seq data analysis including transposons

Raw data from Kagawa et al.17 were downloaded from the Gene Expression Omnibus database entry GSE177689. Smart-seq2 PCR adapters were trimmed using Skewer80 and the resulting reads were aligned using HISAT2 (ref. 88) with the parameters --dta --no-mixed --no-discordant -k 100, to allow enough multimappers for transposon analysis. The resulting bam files were processed with the scTE95 software for transposon family identification and quantification. The resulting matrix was subset to contain only cells dissociated from the 96-h blastoids and was analysed using Seurat, filtering out cells with more than 25% of mitochondrial counts, fewer than 2,000 genes or more than 16,000 genes. Downstream unsupervised clustering was performed using 20 principal components. To compare blastoids from Kagawa et al. with those from this article, we integrated a nontarg-CARGO scRNA-seq object with the data from Kagawa et al. after removing the transposons, using default Seurat data integration functions.
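The cell-filtering thresholds described above (more than 25% mitochondrial counts, fewer than 2,000 or more than 16,000 detected genes) can be expressed as a short predicate. The analysis itself was run in Seurat; this Python sketch only restates the rule, and the function name and example cells are hypothetical.

```python
def keep_cell(
    pct_mito: float,
    n_genes: int,
    max_pct_mito: float = 25.0,
    min_genes: int = 2000,
    max_genes: int = 16000,
) -> bool:
    """True if a cell passes the mitochondrial and gene-number filters."""
    return pct_mito <= max_pct_mito and min_genes <= n_genes <= max_genes


if __name__ == "__main__":
    # Hypothetical cells: (percent mitochondrial counts, genes detected)
    cells = {"c1": (5.0, 8000), "c2": (30.0, 8000), "c3": (5.0, 1500), "c4": (5.0, 17000)}
    kept = [name for name, (mito, genes) in cells.items() if keep_cell(mito, genes)]
    print(kept)
```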

Human–mouse and human–marmoset comparisons

LTR5Hs-regulated genes in the blastoid epiblast located within 250 kb of an LTR5Hs element were used as LTR5Hs target genes. To obtain genes expressed in the mouse and the marmoset epiblast, we used a previously published table containing expression levels and orthology analysis of human, mouse and marmoset genes. Specifically, we used the epiblast data (early ICM in the dataset)34. Only genes with an average expression of more than 2 fragments per kilobase of transcript per million mapped reads (FPKM) were considered expressed, with the caveat that this analysis may overlook more subtle, quantitative differences in expression between species, which can nonetheless be functionally important. This cut-off was validated by visual inspection of scRNA-seq datasets from mouse embryos96. Genes were assigned to evolutionary branches using data from the Gentree database38.
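The two criteria in this section (proximity to an LTR5Hs element and an expression cut-off in the other species) can be sketched as below. This is a hedged illustration: the function names are hypothetical, and measuring distance as the gap between the gene interval and the element interval is an assumption, since the text does not specify whether distance was taken from the gene body or from the transcription start site.

```python
def within_distance(
    gene_start: int,
    gene_end: int,
    elem_start: int,
    elem_end: int,
    max_dist: int = 250_000,
) -> bool:
    """True if two intervals on the same chromosome lie within max_dist bp.

    The gap is zero when the intervals overlap (assumed distance definition).
    """
    gap = max(0, max(gene_start, elem_start) - min(gene_end, elem_end))
    return gap <= max_dist


def is_expressed(mean_fpkm: float, cutoff: float = 2.0) -> bool:
    """Expression call used for the cross-species comparison (> 2 FPKM)."""
    return mean_fpkm > cutoff


if __name__ == "__main__":
    # Hypothetical coordinates and expression values
    print(within_distance(1_000_000, 1_050_000, 1_200_000, 1_201_000))
    print(within_distance(1_000_000, 1_050_000, 1_400_000, 1_401_000))
    print(is_expressed(2.0), is_expressed(3.5))
```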

ZNF729 locus conservation

Figure 3e is a cartoon depiction of Zoonomia project’s Cactus genomic alignments53,54 in the UCSC genome browser97.

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

