Breaking News

Combined single-cell profiling of chromatin–transcriptome and splicing across brain cell types, regions and disease state

Ethics statement

All experiments were conducted in accordance with the 2011 Eighth Edition of the NIH Guide for the Care and Use of Laboratory Animals. Animal procedures were performed according to protocols approved by the Animal Care and Use Committee of Rockefeller University.

Macaque brain tissue acquisition

Brains were collected from two adult male rhesus macaques (M1 and M2, ages 29 and 26) that were humanely killed via intramuscular administration of ketamine, followed by intravenous administration of a pentobarbital overdose for approximately 10 min. These primates had not been exposed to any experimental pharmacological treatment for ≥6 months before being killed and had no recorded infections. Brains were collected within 20 min after pentobarbital administration (post mortem interval: 2 h and 1 h), placed on ice and dissected into 5- to 10-mm coronal slices of PFC and visual cortex using a brain mold guided by the Allen Brain Atlas. Samples were flash-frozen and maintained at −80 °C until processing.

Human brain tissue acquisition

All human samples were deidentified postmortem frozen samples, which were requested from the tissue banks maintained by the Center for Neurodegenerative Disease Research (CNDR) and the University of Pennsylvania Alzheimer’s Disease Core Center (ADCC), according to Weill Cornell Medicine institutional review board-approved protocols. Sample collection was conducted by CNDR/ADCC. A total of nine PFC samples from individuals with AD (five males and four females) and ten control PFC samples from individuals not diagnosed with dementia (six males and four females) were included in this study. Participant sex, age and diagnosis information was supplied by CNDR/ADCC and can be found in Supplementary Table 2. This study is considered ‘non-human subject research’.

Single-nuclei isolation

Single-nuclei isolation was performed for fresh-frozen human brain samples using the SnISOr–Seq8 protocol and the ATAC–seq protocol published by Corces et al.77.

10x Single-nuclei cDNA generation, gene expression and ATAC library construction and Illumina sequencing

A 10x Multiome ATAC + Gene Expression assay was performed by following the manufacturer’s instructions (10x Genomics, CG000338_ChromiumNextGEM_Multiome_ATAC_GEX_User_Guide_RevE, Chromium Next GEM Single Cell Multiome Reagent Kit A, 16 reactions PN-1000282). The quality of full-length 10x cDNA, ATAC and 3′ gene expression short-read libraries was measured by Qubit dsDNA HS assay (Invitrogen, Q32854) and TapeStation Genomic DNA assay (Agilent, 5067-5365 and 5067-5366). Sequencing libraries were loaded on Illumina NovaSeq6000 with PE 2 × 100 paired-end kits by setting the following read length: 28 cycles read 1, 8 cycles i7 index and 91 cycles read 2 for gene expression libraries and 50 cycles read 1N, 8 cycles i7 index, 24 cycles i5 index and 49 cycles read 2N for ATAC libraries. The fastq files were generated by running bcl2fastq v2.20.

Linear/asymmetric PCR and exome capture

Linear/asymmetric PCR was applied to naive full-length 10x cDNA derived from the last step to remove the nonbarcoded cDNA. Spliced barcoded cDNA was enriched by performing exome capture using custom SureSelect probe sets designed for macaques/humans and the reagent kit SureSelectXT HSQ (Agilent, G9611A). The detailed linear/asymmetric PCR and exome capture protocol is described in the SnISOr–Seq pipeline8,14.

Library preparation for ONT

For all samples, ~75 fmol of cDNA processed with linear/asymmetric PCR and exome capture underwent ONT library construction by using a Ligation Sequencing kit (Oxford Nanopore, SQK-LSK110 and SQK-LSK114) according to the manufacturer’s protocol (Nanopore Protocol, Amplicons by Ligation). The ONT library was loaded onto a PromethION sequencer by using a PromethION flow cell (Oxford Nanopore, FLO-PRO002 and FLO-PRO114M) and sequenced for 72 h. ONT long reads were base called using MinKNOW 20.06 or MinKNOW 23.07 and filtered for a base quality score of >7.

Exon–exon junction probe design

A list of 3,630 human genes (3,224 ortholog genes in macaques), including synaptic genes46 (659 for macaques and 720 for humans), TDP-43 binding targets47 (30 for macaques and 33 for humans), genes with cell-type-specific highly variable exons in the human PFC8 (259 for macaques and 271 for humans) and genes associated with missplicing in AD16 (173 for macaques and 202 for humans), ASD48,49,50 (1,875 for macaques and 2,102 for humans), ALS52 (391 for macaques and 428 for humans) and schizophrenia51 (962 for macaques and 1,080 for humans), was assembled. Using the GENCODE human annotation (release 34)78, all protein-coding transcripts of these genes were identified. For each exon–exon junction present in at least one transcript, 140 bases spanning the junction were selected, with 70 exonic bases on either side. If an exon was shorter than 70 bases, adjacent exon sequence was included to reach the required length. Sequences shorter than 130 bp or mapping to more than five genomic loci were excluded. Genes with fewer than five valid probes were also removed. A 120-mer was chosen from within the initial (130- to 140-base) sequence using Agilent Technology’s method for maximizing hybridization efficiency.

Short-read data processing

Both RNA and ATAC fastq files of the M1 PFC sample were subsampled randomly using seqtk 1.3 (https://github.com/lh3/seqtk) to reach a close reads per cell number with the other three samples. The cellranger-arc reference for macaques was built based on the gene annotation of mulatta.Mmul_10 release 104 and genome assembly of Mmul_10 downloaded from Ensembl79. The cellranger-arc reference for human was downloaded from 10x Genomics (References-2020-A Human reference, GRCh38).

Gene expression data processing and cell-type annotation

Gene × cell matrices processed with cellranger-arc-2.0.1 (refs. 80,81) were loaded into Seurat 4.2.0 (refs. 37), and cells were filtered per sample. Doublets were removed before clustering with DoubletFinder 2.0.3 (ref. 39) with an expected doublet ratio of 8~16%. After filtering for high-quality cells, each sample was scaled and normalized using default parameters and clustered using Seurat37. All samples from the same species were merged, scaled and normalized, and variable features were identified. Batch effect correction was performed using Harmony38. Cells were annotated based on published cell-type markers31,32, as well as the Azimuth37 human dataset and other published datasets (https://compbio.mit.edu/ad_aging_brain/) as references82,83. The marker genes used for cell-type/subtype annotation are shown in Supplementary Figs. 1 and 10.

Differential gene expression analysis

For each cell type/subtype, the set of differentially expressed genes detected from the comparison between conditions (macaque visual cortex versus PFC) was obtained by running the FindMarkers function of Seurat37 (test = MAST, FDR < 0.05, | log2 (fold change) | > 0). Gene Ontology enrichment analysis for the differentially expressed genes was performed by using the enrichGO function of clusterProfiler84 4.2.2 (OrgDb = org.Mmu.eg.db, pAdjustMethod = ‘BH’).

Compositional data analysis for cell types

Compositional data analysis for cell types identified in the species comparison between human and macaque PFC samples was performed using scCODA 0.1.9 (ref. 85), and results are shown in Supplementary Fig. 14d.

ATAC data processing

Fragments and peak × cell matrices processed with cellranger-arc-2.0.1 were loaded into Signac63 and Seurat37, and each sample was preprocessed individually with a unified set of peaks generated from bed files of all four samples to build the ‘ATAC’ assay. High-quality cells were selected after quality control and doublets removal using Signac63 and DoubletFinder 2.0.3 (ref. 39). Subsequently, normalization and dimensional reduction were performed after sample merging and batch effect correction using Harmony38. We used two methods to perform cell-type annotation for ATAC data. (1) Cells with matched barcodes in both single-cell RNA-seq and single-cell ATAC–seq data were retained using the barcode translation output from cellranger-arc-2.0.1, and cell-type identities from single-cell RNA-seq were directly assigned to corresponding single-cell ATAC–seq cells. This method was applied to all peak-related figures except for S8e. (2) Single-cell ATAC–seq cells were annotated via label transfer using Signac63. This method was applied to S8e only. The peaks were called per cell type/subtype using MACS2 (ref. 64) by running the CallPeaks function of Signac63. The ‘Peak’ assay was built for downstream analysis using the Signac functions FeatureMatrix and CreateChromatinAssay. The annotation object supplied for CreateChromatinAssay63 was built based on the gene annotation of mulatta.Mmul_10.104 (macaque) or Hsapiens.v86.annotation.hg38 (human) released by Ensembldb. Peaks found in >2% of cells and located on standard chromosomes were tested for differential accessibility between conditions (test method = LR, log (fold change) cutoff = 0), among which the peaks with an FDR of <0.05 were considered significant. Using the Grange files generated by reading the Ensembl-based annotation of macaques/humans with the function import.gff (rtracklayer V1.54.0)86, peak annotation was performed by running the ClosestFeature function of Signac_1.2.1 (ref. 63) or bedtools closest (V2.30.0)87 with the bed-formatted gene annotation transformed by the gtf2bed function of BEDOPS V2.4.41 (ref. 88). The ratios of significant peaks closest to the target genes were calculated as the peaks with an FDR of <0.05 among the peaks closest to the genes targeted in the splicing analysis.

Evaluate differential accessibility between conditions/cell types by downsampling

Peak calling, normalization, batch effect correction, differential accessibility analysis and generation of the peak annotation pipeline were performed as described in ‘ATAC data processing’ for all downsampling experiments. All the related box plots, scatter plots and density plots were generated using ggplot2 (ref. 89).

Brain region comparison and species comparison

For each cell type or subtype, 1,000 cells from each condition were randomly sampled. For condition comparisons (macaque PFC versus visual cortex or human PFC versus macaque PFC), 10,000 peaks were randomly subsampled among all peaks called from 2,000 cells (sum of cells from both conditions) per cell type or subtype and differential accessibility of peaks that were found in >2% of cells were tested (test method = LR, log (fold change) cutoff = 0, FDR < 0.05). Subsampling was repeated 20 times.

Excitatory neuron subtype comparison

For each pair of excitatory neuron subtypes shown in Fig. 3d, 10,000 peaks were randomly subsampled from the peaks called from 4,000 cells (1,000 cells of each subtype per brain region) per subtype comparison, and the differential accessibility of peaks observed in >2% of cells was tested (test method = LR, log2 (fold change) cutoff = 0, FDR < 0.05). Random subsampling was repeated 20 times.

Human AD versus control

To evaluate the differential accessibility per major cell type between AD and control samples, we randomly chose seven of ten control samples and six of nine AD samples for downsampling. For 7 random control samples, 150 random cells were selected per sample to make a total of 1,050 cells as the control group. Similarly, for 6 random control samples, 175 random cells were selected per sample to make a total of 1,050 cells as the AD group. For the condition comparison between AD and control samples, peaks were called from 2,100 random subsampled cells per cell type, followed by random sampling of 20,000 peaks for the differential accessibility test. Only peaks that were detected in >2% cells were tested (test method = LR, log (fold change) cutoff = 0, FDR < 0.05). Subsampling was repeated 20 times.

Neurons versus glia and comparison within neurons (human control PFC)

This experiment was performed as a proof of concept for downsampling shown in Supplementary Fig. 17b. The same protocol performed for AD versus control samples was applied to evaluate differential accessibility in neurons versus glia (positive control) and within neurons (negative control). Subsampling was repeated 100 times.

Differential accessibility of different peak categories between conditions evaluated by downsampling

To evaluate chromatin accessibility differences between excitatory neuron subtypes by genomic location, peaks were divided into different categories (exon/intron/UTR/intergenic) according to the closest features defined by the annotation. Of note, only the peaks whose closest features were either protein-coding genes or long noncoding RNA genes and that were located on standard chromosomes were considered here. For each peak category, 5,000 peaks were randomly selected from the peaks called from 1,000 cells randomly subsampled per condition (PFC/visual cortex). Random subsampling was performed 20 times.

Evaluation of the similarity of cell-type-specific peak sets of different conditions with the Jaccard similarity index

For each excitatory neuron subtype (RORB, CUX2 and CUX2.RORB), peaks were called for PFC or visual cortex cells separately. The peak calling, normalization and batch effect correction pipeline was performed as described in ‘ATAC data processing’. The peak coordinates were exported using granges function of GenomicRanges 1.46.1 (ref. 90) and written in sorted bed format. The Jaccard similarity index of the comparison between peaks called from PFC and visual cortex cells of each excitatory cell type was calculated using the bedtoolsr::bt.jaccard function of BedtoolsR 2.30.0-5 (ref. 91).

Differential motif enrichment analysis

For each excitatory neuron subtype (RORB, CUX2 and CUX2.RORB), we used the getMatrixSet and AddMotifs functions of Signac63 to get the motif information. Overrepresented motifs (FDR < 0.05) were detected by setting the significant brain-region-specific peaks as background (parameters for finding differentially accessible peaks: FDR < 0.05, test method = LR, min.pct = 0. 02, | log2 (fold change) | > 0). The enrichment score violin plot of one of the top hits is shown in Fig. 3i.

Mapping orthologous exons in human data

The TransMap92 projection alignment algorithm was used to map exons between human and macaque assemblies. LASTZ93 1.04.15 genomic alignments between the human GRCh38 and macaque RheMac10 reference assemblies were used to map reference transcript annotations between assemblies. TransMap was used instead of UCSC Genome Browser liftOver94, as it produces base-level alignments, allowing observation of indels and other differences between the LASTZ chain and net alignments files. These were obtained from the UCSC Genome Browser site, along with the below-mentioned programs to process them. Syntenic genomic alignments were obtained by filtering the net files to obtain the syntenic nets using ‘netFilter -syn’ and then using ‘netChainSubset -wholeChains’ to obtain a set of syntenic chain alignments for mappings. GENCODE78 human v35 and macaque were mapped to the other assembly using the ‘pslMap’ program95.

Species comparison of peaks in conserved exons

The most conserved exons pairs between humans and macaques were considered for chromatin accessibility comparison. A total of 157,596 human exons and 157,562 macaque exons composed of 159,279 pairs, which indicates for each exon in one species, only the one with the highest ortholog similarity in another species was considered. For all normal human PFC samples, the command bedtools intersect87 was used to filter for the fragments that overlapped with human exons (≥1 bp) in the conserved exon pairs. The conserved exon-covering fragments were sorted and indexed for each sample. The same procedure was performed for all macaque PFC samples except for that fragments were mapped to the hg38 genome by rtracklayer::liftOver96. For the bed file of ATAC peaks in Cell Ranger output, only the peaks covering conserved exons were kept for each sample. Similar to the fragment file processing procedure, all macaque PFC sample peaks were mapped to the human hg38 genome and combined with human PFC sample peaks. The conservative exon-covering peaks and fragments were used for ATAC assay creation. To build the ‘Peak’ assay, the peaks were called either by major cell types or subtypes by MACS2 (ref. 64; by running the CallPeaks function of Signac63). By running the Signac functions FeatureMatrix and CreateChromatinAssay, the ‘Peak’ assay was built for downstream analysis. The annotation used for CreateChromatinAssay63 was built based on the human gene annotation EnsDb.Hsapiens.v86. Only standard chromosome peaks were considered. Similarly, the combined data of the ‘Peak’ assay were scaled and normalized, and the top features were identified. Integration of data to control for sample-specific batch effects was performed using Harmony38.

Long-read data processing

ONT fastq files were first filtered for barcoded reads with the GetBarcodes function from scisorseqR9. Reads were mapped using minimap2 (ref. 53), followed by differential splicing analysis with scisorseqR using the commands MapAndFilter() and InfoPerLongRead() with default settings. Given that the default setting of the command InfoPerLongRead() requires a ‘minTimesIsoObserve’ equal to 5, only the spliced reads that support the unique isoforms observed at least five times were kept and recorded in AllInfo files of each sample. The generated AllInfo files were then UMI corrected, where UMIs were required to have an edit distance of ≥4. If multiple reads with similar UMIs did not meet this criterion, then only one read of the group was kept. UMI-filtered AllInfo files were used in scisorATAC’s casesVcontrols function with basic settings to yield differentially spliced exons. The cell-type-resolved single-cell long-read assignments per example gene with alternative exons were plotted using ScISOrWiz59.

Validation of POLN exon inclusion using quantitative PCR with reverse transcription

RNA was extracted from macaque tissue isolated from the PFC or visual cortex using an RNeasy Mini kit (Qiagen, 74104), which involved on-column DNase I digestion before RNA elution. cDNA was synthesized using SuperScript IV Reverse Transcriptase (Invitrogen, 18090200), according to the manufacturer’s protocol. Quantitative PCR with reverse transcription was performed using 30 ng of cDNA as template per sample, validated primers (see below) and PowerUp SYBR Green Master Mix (Applied Biosystems, A25742) on a QuantStudio 3 Real-Time PCR System (Thermo Fisher Scientific). Primers for quantitative PCR with reverse transcription were designed by using Primer-BLAST and were synthesized by Thermo Fisher Scientific. The primers targeted mutually shared POLN exons (5′-TGAGCAGTAACCAGCTTCGAG-3′ and 5′-GATGAAGGTCTCGCAGAGCA-3′) or visual cortex-specific exons (5′-AGAGTAGAGTCAGGGAGCCA-3′ and 5′-TGCCTCCTGGGTTCAAGCGA-3′). Comparisons were made using the comparative cycling threshold (Ct) method, and data were normalized to the PFC and are shown as fold change.

Merge macaque and human expression data by liger

We used liger97 to integrate the RNA assay data from six human normal PFC samples and two macaque PFC samples. We annotated a total of 16 cell types using the pipeline described in the gene expression data analysis section. Among all nine excitatory neuron subtypes, we only considered the three most abundant subtypes, which were L2–L3 IT_CUX2.CBLN2, L2–L3 IT_NRGN.CBLN2 and L3–L5 IT_RORB cells, for species comparison. The other two abundant subtypes (L2–L4 IT_CUX2.RORB and L2–L4 IT_CUX2.RORB.ACAP3 (2,843 and 4,549)) were excluded from the following analysis as they are under-represented in the human samples. Additional excitatory neuron subtypes (L5/L6 NP, L5 ET, L6 CT/L6b and L6 IT CAR3/L6 IT) were recovered in both species but were also excluded as each comprise less than 1,000 cells.

Calling differentially included exons

For each cell type and alternative exon, inclusion counts and exclusion counts were collected as previously performed. Before testing for differential exon inclusion, a χ2 criterion was applied for filtering. To compare exon inclusion for two distinct comparisons, a 2 × 2 table was populated for inclusion and exclusion counts for the two conditions, and a two-sided Fisher’s exact test was used following a Benjamini–Yekutieli correction for multiple testing. See the Supplementary tables for lists of significant excitatory subtype exons between the PFC and visual cortex.

Downsampling experiments for differential splicing analysis

Downsampling splicing experiments for the PFC versus visual cortex, PFC cell-type comparison and species comparison

To compare two comparisons (that is, differences between the PFC and visual cortex in RORB+ cells against the same areas in CUX2+ cells) with equal power, we performed downsampling experiments. We selected all exons that had at least 20 exclusion or inclusion counts in both brain areas. This was followed by randomly selecting 20 reads among the total. These reads were then used to recalculate the difference in percent isoform inclusion between the areas (ΔΨ). Next, we selected 100 exons randomly for this cell type between two brain areas, enforcing that there be at most one exon per gene. We then repeated these steps for all cell types that were compared. This yielded 100 2 × 2 tables for all comparisons, with exactly equal column sums and the same characteristics (table number) for multiple testing correction. We then performed a Fisher’s exact test and Benjamini–Yekutieli correction for multiple testing and recorded the number of significant events for all comparisons. The procedure was repeated 100 times, giving a distribution of significant percentages for both comparisons. These two distributions were compared with a two-sided Wilcoxon rank-sum test. For disease downsampling in Fig. 2f, we used 20 exons rather than 100 due to smaller sample size. This process was done for all downsampling comparisons except for AD versus control samples in Fig. 5 to account for individual variation due to the high number of individuals that were used. We describe this process below.

Downsampling splicing experiments for AD versus control samples

To account for individual variation due to the high number of human samples involved in this analysis, we designed an updated downsampling process to equalize comparisons and ensure that observed changes were not contributed by one or a few individuals. In this process, we first selected exons that were observed in seven or more samples in both the AD and control groups. We then randomly selected seven of those available samples to use in this analysis. We next filtered all exons that had five or more reads per sample, yielding a minimum of 35 reads per group. We then randomly selected 4 reads per sample, yielding a total of 28 reads per group. Twenty-five exons per cell type with sufficient data were selected randomly between AD and control samples. This yielded 25 2 × 2 tables for all comparisons, with exactly equal column sums and the same characteristics (table number) for multiple testing correction. We then performed a Fisher’s exact test and Benjamini–Yekutieli correction for multiple testing and recorded the number of significant events for all comparisons. The procedure was repeated 100 times, giving a distribution of significant percentages for cell types. These distributions were compared with a Wilcoxon rank-sum test.

Downsampling splicing experiments for neurons versus glia and comparisons within neurons

This experiment was performed as a proof of concept in control human PFC data and is shown in Supplementary Fig. 17a. For the neuronal control group (neuron 1 versus neuron 2), all neuronal cell types were combined and split into two equal groups (neuron 1 and neuron 2). In comparison, we compared all neurons to all glia. Downsampling experiments were performed the same as in the PFC versus visual cortex, PFC cell-type comparison and species comparison sections.

Comparing SynGO terms shared by neurons versus glia splicing analysis and downsampled analysis

To validate the downsampling analysis shown in Supplementary Fig. 17c, we compared the Gene Ontology of differentially spliced genes from both the full splicing and downsampling analyses.

Exome capture efficiency comparison between probe sets

To compare the exome capture efficiency between different probe sets, we used two datasets derived from two human PFC samples (C4 and C6), which were exome captured by two probe sets (whole-exome probe and brain gene exon–exon junction probe) separately. The whole-exome probe-captured dataset was released in our previous publication8. For each dataset, the differential splicing analysis was performed by comparing neurons and non-neurons by running the ‘casesVcontrols’ function of the scisorATAC package. The correlation between the ΔΨ values of shared tested exons derived from two datasets is shown in Fig. 4e.

Statistical sensitivity simulations

We made large numbers of matrices with a ΔΨ of 0.1 (1,000 total counts in each column). All such matrices have P values of ≤10 × 10. We then downsampled these to combined counts of 0–9 (in each column), 10–19 (in each column), 20–29 (in each column), 30–39 (in each column), 40–49 (in each column) and 50–249 (in each column) and recorded the fraction of matrices that passed Benjamini–Yekutieli correction for multiple testing (at a corrected P value of <0.05 for 100 tests in each case). These fractions give an idea of how many reads are required to find a true ΔΨ of 0.1. We repeated this process for ΔΨ 0.2, 0.3, 0.4 and 0.5. In summary, for a ΔΨ of ≥0.4 and read numbers (in each column) of ≥30, one reaches a sensitivity of 82%. These data are shown in Supplementary Fig. 6e.

Correlation between cell-type-specific splicing and cell states revealed by transcription and chromatin accessibility

To get the cell states defined by transcription and chromatin accessibility per gene per cell type, we followed the tutorial of Velocyto98 and MultiVelo10 0.1.3. Loom files were obtained by running Velocyto 0.17 for all human PFC samples (ten control and nine AD samples) and macaque samples (two PFC and 2 visual cortex). With the spliced and unspliced counts stored in loom files, running MultiVelo velocity stream and latent time was performed for the genes that had exons tested for differential splicing of each comparison (1,571 genes for the macaque brain region comparison, 2,936 genes for the species comparison and 1,874 genes for the AD versus control comparison).

A state value of 0, 1, 2 or 3 (corresponding to cell states priming, coupled-on, decoupled and coupled-off, respectively) was assigned to each cell × gene pair by Multivelo based on the RNA-seq and ATAC–seq expression dynamics. These state values (Sgc {0,1,2,3}) were then used to connect to the exon splicing levels per cell. If a gene only exhibited one state across all cells, then it was classified as a single-state gene and excluded from further analysis. For each cell in a given cell type, and all tested exons for a particular condition, we used the UMI-corrected AllInfo files as input to obtain the inclusion or exclusion of an exon–cell pair. Using the state value assigned for a gene (Sgc) as a proxy for all exons in that gene (Sec), the exon inclusion and exclusion vectors for a cell type were decomposed into individual state vectors. Thus, a matrix containing the state values as rows and inclusion or exclusion values as columns was populated. A state-wise percent spliced in (Ψ) value was therefore obtained by dividing the inclusion counts for a state by the total number of molecules arising from that gene containing that state. A matrix was only considered for testing for differential inclusion if it fulfilled the χ2 criteria. A P value using the χ2 test was reported, and if the number of states was limited to two, an LOR was also explicitly calculated. This process was repeated for all cell types and conditions in a comparison (for example, AD versus control).

Evaluation of the association between splicing and cell state

LOR was used for quantifying the strength of the association between two events, splicing and cell state. For an exon of a specific cell type, we calculated a P value, and a Benjamini–Yekutieli correction was performed for multiple testing. For a significant exon, we then used the LOR to quantify the difference in inclusion between both states. Thus, in addition to knowing that the Ψ values are significantly different, we can also assess how different they are. The P values are derived from the χ2 test for a 2 × 2 table. Likewise, the LOR is also deduced from the counts of the 2 × 2 table. ‘Ainc’ represents the number of reads that support the exon of a specific cell type in query for state A. ‘Aexc’ represents the number of reads that mapped to the gene but exclude the exon of a specific cell type in query for state A. A similar definition applies to ‘Binc’ and ‘Bexc for state B.

$${{\mathrm{LOR}}}=\log_2\left(\frac{{A}_{{{\mathrm{inc}}}}/{A}_{{{\mathrm{exc}}}}}{{B}_{{{\mathrm{inc}}}}/{B}_{{{\mathrm{exc}}}}}\right)$$

As described in ‘Long-read data processing’, only spliced reads supporting isoforms observed five or more times per sample (default) were retained in the AllInfo file. We also tested a relaxed cutoff, requiring isoforms to appear five or more times across all samples in a comparison. This led to a modest increase in significantly differentially included exons. Notably, over 80% of exons identified using the strict cutoff were also found with the relaxed cutoff, indicating high consistency. For each cell type in a comparison, only the exons where the total read counts were greater than 10 were retained. Using this, an exon × state matrix of Ψ values was obtained per condition, and the matrix for one condition was subtracted from the other, which was defined the as statePSI matrix. To identify the outliers, we limited the statePSI matrix to values that showed at least a 5% difference between conditions and then normalized each row of the statePSI matrix by the RNA-only Ψ, thus defining the normState matrix. Finally, in cases where the same exon was tested in both conditions for the same cell type and showed significance in at least one, the state Ψ values were plotted against the state to show the divergence in exon inclusion depending on chromatin–RNA state dynamics.

Definitions of Ψ, state ΔΨ, overall ΔΨ and normalized-state ΔΨ:

$$\Psi=\frac{{{\rm{inclusion}}\;{\rm{reads}}}}{{{\rm{inclusion}}\;{\rm{reads}}}+{{\rm{exclusion}}\;{\rm{reads}}}}$$

$${\rm{overall}}\,{{\Delta }}\varPsi ={\varPsi }^{{{\mathrm{Case}}}}-{\varPsi }^{{{\mathrm{Ctrl}}}}$$

$${\rm{state}}\,x\,{{\Delta }}\varPsi ={\varPsi }{\rm{(Case}}\,{\rm{state}}\,x{\rm{)}}-{\varPsi }{\rm{(}}{\rm{Ctrl}}\,{\rm{state}}\,x{\rm{)}}$$

$${\rm{normalized}}\; {\rm{state}}\,x\,{{\Delta }}\varPsi =\frac{{\rm{state}}\,x\,\Delta \varPsi }{{\rm{overall}}\,\Delta \varPsi }$$

Reporting summary

Further information on research design is available in the Nature Portfolio Reporting Summary linked to this article.

Related Articles

Leave a Reply

Your email address will not be published. Required fields are marked *

Back to top button