Mouse Cerebellum (Slideseq-V2)

[2]:
import numpy as np
import pandas as pd
import pathlib
import matplotlib.pyplot as plt
import matplotlib as mpl
import seaborn as sns
import scanpy as sc

import sys
sys.path.append('../src')
from utils import *
import pickle

import warnings
warnings.filterwarnings('ignore')

Preprocessing scRNA-ref

Ideally, the scRNAseq reference should from the exact same tissue that contains all the celltypes you expect to find in ST spots.

We will use a Mouse cerebellum scRNA-seq as reference dataset, the raw dataset in h5ad format (scRefSubsampled1000_cerebellum_singlenucleus.h5ad) is available in here.

Conveniently, we provided the pre-pocessed scRNA-ref (Ckpts_scRefs/Cerebellum/Ref_snRNA_cerebellum_qc.h5ad) as well as other relevent materials involved in the following example in here, so you can skip this part

[3]:
ad_sc = sc.read('scRefSubsampled1000_cerebellum_singlenucleus.h5ad')

ad_sc.obs['cell_type'] = ad_sc.obs['liger_ident'].map(lambda x:x.split('_')[-1])
cell_type_column = 'cell_type'

sc.pp.filter_cells(ad_sc, min_counts=500)
sc.pp.filter_cells(ad_sc, max_counts=20000)
sc.pp.filter_genes(ad_sc, min_cells=10)
ad_sc = ad_sc[:,~np.array([_.startswith('MT-') for _ in ad_sc.var.index])]
ad_sc = ad_sc[:,~np.array([_.startswith('mt-') for _ in ad_sc.var.index])]

ad_sc.X.max(),ad_sc.shape
[3]:
(7175.0, (14748, 20179))
[4]:
if ad_sc.X.max()<20:
    ad_sc.X = np.exp(ad_sc.X)-1
plt.hist(ad_sc.X.sum(1),bins=100)
plt.show()
../_images/notebooks_Mouse-Cerebellum-Slideseq_5_0.png
[5]:
ad_sc.raw = ad_sc.copy()
sc.pp.normalize_total(ad_sc, inplace=True, target_sum=5e3)

sc.pp.highly_variable_genes(ad_sc, flavor='seurat_v3',n_top_genes=1000)
sc.tl.rank_genes_groups(ad_sc, groupby=cell_type_column, method='wilcoxon')
markers_df = pd.DataFrame(ad_sc.uns["rank_genes_groups"]["names"]).iloc[0:50, :]
markers = list(np.unique(markers_df.melt().value.values))
markers = list(set(ad_sc.var.loc[ad_sc.var['highly_variable']==1].index)|set(markers)) # highly variable genes + cell type marker genes

ligand_recept = list(set(pd.read_csv('../extdata/ligand_receptors.txt',sep='\t').melt()['value'].values))
# if scRNA-seq reference is from human tissue, run following code to make gene name consistent
# ligand_recept = [_.upper() for _ in ligand_recept]
add_genes = ['Cadm3','Ptprt','Aldoc','Gria1'] # some important genes that we interested

markers = markers+add_genes+ligand_recept
len(markers)
WARNING: It seems you use rank_genes_groups on the raw count data. Please logarithmize your data before calling rank_genes_groups.
[5]:
2217
[6]:
ad_sc.var.loc[ad_sc.var.index.isin(markers),'Marker'] = True
ad_sc.var['Marker'] = ad_sc.var['Marker'].fillna(False)
ad_sc.var['highly_variable'] = ad_sc.var['Marker']

sc.pp.log1p(ad_sc)
sc.pp.pca(ad_sc,svd_solver='arpack', n_comps=30, use_highly_variable=True)

ad_sc.X.max()
[6]:
7.6586394
[7]:
sc.pp.neighbors(ad_sc, metric='cosine', n_neighbors=30, n_pcs = 30)
sc.tl.umap(ad_sc, min_dist = 0.5, spread = 1, maxiter=60)
[8]:
fig, axs = plt.subplots(1, 1, figsize=(10, 10))
sc.pl.umap(
    ad_sc, color=cell_type_column, size=15, frameon=False, show=False, ax=axs,legend_loc='on data'
)
plt.tight_layout()
../_images/notebooks_Mouse-Cerebellum-Slideseq_9_0.png
[9]:
#ad_sc.write('../Ckpts_scRefs/Cerebellum/Ref_snRNA_cerebellum_qc.h5ad')
[ ]:

Learning the gene expression distribution of scRNA-seq reference using score-based model

We use four GPUs to train scRNA-seq reference in parallel.

python ./src/Train_scRef.py \
--ckpt_path ./Ckpts_scRefs/Cerebellum \
--scRef ./Ckpts_scRefs/Cerebellum/Ref_snRNA_cerebellum_qc.h5ad \
--cell_class_column cell_type \
--gpus 0,1,2,3

The checkpoints and sampled psuedo-cells will be saved in ./Ckpts_scRefs/Cerebellum, e.g, model_5000.pt, model_5000.h5ad

Conveniently, we provided the pre-trained checkpoint (Ckpts_scRefs/Cerebellum/model_5000.pt) in here, so you can skip this part

[ ]:

[ ]:

Run SpatialScope

Step1: Singlet Doublet Classification

Because Slide-seq data lacks paired histological image, we replace the first step ”Nuclei segmentation” with ”Singlet/Doublet classification” inspired by RCTD. Of note, this step is also applicable to other high resolution ST data such as HDST, DBiT-seq etc.

python ./src/Singlet_Doublet_Classification.py
–cell_class_column cell_type
–tissue cere
–out_dir ./output
–ST_Data ./demo_data/slideseq-cerebellum.h5ad
–SC_Data ./Ckpts_scRefs/Cerebellum/Ref_snRNA_cerebellum_qc.h5ad

Step2: Cell type identification

python ./src/Cell_Type_Identification.py
–cell_class_column cell_type
–tissue cere
–out_dir ./output
–ST_Data ./output/cere/sp_adata_sdc.h5ad
–SC_Data ./Ckpts_scRefs/Cerebellum/Ref_snRNA_cerebellum_qc.h5ad
–hs_ST

Step 3: Gene expression decomposition

set -ex;

declare -a arr=(“0,1000” “1000,2000” “2000,3000” “3000,4000” “4000,5000” “5000,6000” “6000,7000” “7000,8000” “8000,9000” “9000,10000” “10000,11000” “11000,11626”)

for i in “${arr[@]}”

do

python ./src/Decomposition.py
–tissue cere
–out_dir ./output
–SC_Data ./Ckpts_scRefs/Cerebellum/Ref_snRNA_cerebellum_qc.h5ad
–cell_class_column cell_type
–ckpt_path ./Ckpts_scRefs/Cerebellum/model_5000.pt
–spot_range $i
–gpu 2,3,4,8

done

[ ]:

Cell type identification result (Fig. a)

[114]:
ad_sp = sc.read('../output/cere/sp_adata.h5ad')
ad_sc = sc.read('../Ckpts_scRefs/Cerebellum/Ref_snRNA_cerebellum_qc.h5ad')
[110]:
ad_sp.uns['cell_locations'].head()
[110]:
spot_index spot_class first_type second_type min_score singlet_score first_type_weight second_type_weight x y cell_nums cell_index cell_index_n cell_type discrete_label discrete_label_ct
0 GCCTTGCTGAGCTC singlet Bergmann Granule 445.194605 463.437230 0.824780 0.175220 1632.575000 3463.375000 1 GCCTTGCTGAGCTC_0 0 Bergmann 1 Bergmann
1 GGTTCGCGACCACA singlet Granule Ependymal 237.770038 252.347030 0.745520 0.254480 3078.088889 1795.177778 1 GGTTCGCGACCACA_0 0 Granule 9 Granule
2 CAGAGGCTTTTTTT singlet Astrocytes Granule 378.295186 383.238369 0.882583 0.117417 4682.637363 2342.197802 1 CAGAGGCTTTTTTT_0 0 Astrocytes 0 Astrocytes
3 CTTCGGATTTTTTT singlet Oligodendrocytes Astrocytes 327.364298 327.304644 0.998508 0.001492 4178.563830 2645.021277 1 CTTCGGATTTTTTT_0 0 Oligodendrocytes 15 Oligodendrocytes
4 TGTGGGGAATTTTT singlet Granule Bergmann 207.347290 208.740884 0.950068 0.049932 2757.945652 1932.380435 1 TGTGGGGAATTTTT_0 0 Granule 9 Granule
[108]:
color_dict = {'Astrocytes': '#1f77b4',
 'Bergmann': '#ff7f0e',
 'Granule': '#2ca02c',
 'Purkinje': '#d62728',
 'MLI2': '#9467bd',
 'MLI1': '#8c564b',
 'Oligodendrocytes': '#e377c2',
 'Lugaro': '#7f7f7f',
 'Polydendrocytes': '#bcbd22',
 'Golgi': '#17becf',
 'Candelabrum': '#98df8a',
 'Globular': '#f7b6d2',
 'Microglia': '#aec7e8',
 'Macrophages': '#dbdb8d',
 'Choroid': '#c7c7c7',
 'UBCs': '#c49c94',
 'Ependymal': '#9edae5',
 'Fibroblast': '#ff9896',
 'Endothelial': '#ffbb78'}
[111]:
plt.subplots(figsize=(10,8.5))
sns.scatterplot(data=ad_sp.uns['cell_locations'], x="x", y="y",s=10,hue='discrete_label_ct',palette=color_dict,hue_order=np.unique(res_p2['cell_type']),legend=True)#rocket_r
plt.axis('off')
plt.legend(bbox_to_anchor=(0.97, .98),framealpha=0)
plt.show()
../_images/notebooks_Mouse-Cerebellum-Slideseq_29_0.png
[134]:
ad_sp.shape
[134]:
(11626, 2312)
[ ]:

Dropouts correction (Fig. d)

[140]:
# We start by concatenating all the generated_cells_spotX_X.h5ad together
ad_scst = ConcatCells(s=0,e=12000,inter=1000,es=11626,file_path='../output/cere/',prefix='generated_cells_spot',suffix='.h5ad')
[189]:
sc.pp.log1p(ad_scst)
sc.pp.log1p(ad_sp)
[192]:
gene = 'Ninj2'
VisualDE(ad_sp,gene,0)
../_images/notebooks_Mouse-Cerebellum-Slideseq_35_0.png
[195]:
VisualDE(ad_scst,gene,.01)
../_images/notebooks_Mouse-Cerebellum-Slideseq_36_0.png
[173]:
VisualscDE(ad_sc,gene)
../_images/notebooks_Mouse-Cerebellum-Slideseq_37_0.png
[182]:
gene = 'Foxp4'
VisualDE(ad_sp,gene,0)
../_images/notebooks_Mouse-Cerebellum-Slideseq_38_0.png
[181]:
VisualDE(ad_scst,gene,0.01)
../_images/notebooks_Mouse-Cerebellum-Slideseq_39_0.png
[183]:
VisualscDE(ad_sc,gene)
../_images/notebooks_Mouse-Cerebellum-Slideseq_40_0.png
[155]:
fig, axs = plt.subplots(1, 1, figsize=(10, 10))
sc.pl.umap(
    ad_sc, color="cell_type", size=10, frameon=False, show=False, ax=axs,legend_loc='on data',palette=color_dict
)
plt.tight_layout()
../_images/notebooks_Mouse-Cerebellum-Slideseq_41_0.png
[ ]:

Cellular communications detection

refer to cell-cell interaction analysis in Human heart and Human-Heart-CCell-interaction.ipynb

Representative examples (Fig. f)

[196]:
L_gene,R_gene,L_ct,R_ct = 'Apoe','Sorl1','MLI2','Purkinje'
fig, ax = plt.subplots(1, 1, figsize=(10.2, 8),dpi=100,facecolor='#fafafa')
PlotLRGenes(ad_scst,L_gene,R_gene,L_ct,R_ct,title='${}$'.format(L_gene+'-'+R_gene),ax=ax,invertY=False,s=10)
../_images/notebooks_Mouse-Cerebellum-Slideseq_46_0.png
[198]:
L_gene,R_gene,L_ct,R_ct = 'Mdk','Ptprz1','Bergmann','Granule'
fig, ax = plt.subplots(1, 1, figsize=(10.2, 8),dpi=100,facecolor='#fafafa')
PlotLRGenes(ad_scst,L_gene,R_gene,L_ct,R_ct,title='${}$'.format(L_gene+'-'+R_gene),ax=ax,invertY=False,s=20)
../_images/notebooks_Mouse-Cerebellum-Slideseq_47_0.png
[199]:
L_gene,R_gene,L_ct,R_ct = 'Nlgn2','Nrxn3','Granule','Purkinje'
fig, ax = plt.subplots(1, 1, figsize=(10.2, 8),dpi=100,facecolor='#fafafa')
PlotLRGenes(ad_scst,L_gene,R_gene,L_ct,R_ct,title='${}$'.format(L_gene+'-'+R_gene),ax=ax,invertY=False,s=20)
../_images/notebooks_Mouse-Cerebellum-Slideseq_48_0.png
[ ]:

[ ]:

Cellular communications only detected in the corrected Slide-seq data by SpatialScope (Fig. g)

[208]:
with open('../output/cere/InitProp.pickle', 'rb') as handle:
    InitProp = pickle.load(handle)
[212]:
ad_sp.obs = ad_sp.obs.merge(InitProp['results']['results_df'][['spot_class','first_type','second_type']],left_index=True,right_index=True,how='left')
[215]:
ad_sp.obs['discrete_label_ct'] = ad_sp.obs['first_type']
[224]:
sns.set_context('paper',font_scale=3.5)
L_gene,R_gene,L_ct,R_ct = 'Psap','Gpr37l1','MLI1','Astrocytes'
fig, ax = plt.subplots(1, 1, figsize=(10.2, 8),dpi=100,facecolor='white')
PlotLRGenes(ad_sp,L_gene,R_gene,L_ct,R_ct,title='${}$'.format(L_gene+'-'+R_gene),ax=ax,invertY=False,s=30,perc=0.01)
../_images/notebooks_Mouse-Cerebellum-Slideseq_55_0.png
[225]:
sns.set_context('paper',font_scale=3.5)
L_gene,R_gene,L_ct,R_ct = 'Psap','Gpr37l1','MLI1','Astrocytes'
fig, ax = plt.subplots(1, 1, figsize=(10.2, 8),dpi=100,facecolor='white')
PlotLRGenes(ad_scst,L_gene,R_gene,L_ct,R_ct,title='${}$'.format(L_gene+'-'+R_gene),ax=ax,invertY=False,s=30,perc=0.01)
../_images/notebooks_Mouse-Cerebellum-Slideseq_56_0.png
[217]:
gene = L_gene
celltype = L_ct
sns.set_context('paper',font_scale=2.4)
fig, ax = plt.subplots(1, 1, figsize=(8, 6),dpi=100)
sns.kdeplot(data=ad_sp[ad_sp.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#D6D657',label='Slide-seq')
sns.kdeplot(data=ad_scst[ad_scst.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#BF8AD0',label='SpatialScope',clip=[-1,10])
ax.set_xlabel('${}$ expression in {}'.format(gene,celltype))
plt.legend(loc='best')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
../_images/notebooks_Mouse-Cerebellum-Slideseq_57_0.png
[218]:
gene = R_gene
celltype = R_ct
sns.set_context('paper',font_scale=2.4)
fig, ax = plt.subplots(1, 1, figsize=(8, 6),dpi=100)
sns.kdeplot(data=ad_sp[ad_sp.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#D6D657',label='Slide-seq')
sns.kdeplot(data=ad_scst[ad_scst.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#BF8AD0',label='SpatialScope',clip=[-1,10])
ax.set_xlabel('${}$ expression in {}'.format(gene,celltype))
#plt.legend(loc='best')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
../_images/notebooks_Mouse-Cerebellum-Slideseq_58_0.png
[222]:
sns.set_context('paper',font_scale=3.5)
L_gene,R_gene,L_ct,R_ct = 'Ptn','Ptprz1','Astrocytes','MLI2'
fig, ax = plt.subplots(1, 1, figsize=(10.2, 8),dpi=100,facecolor='#fafafa')
PlotLRGenes(ad_sp,L_gene,R_gene,L_ct,R_ct,title='${}$'.format(L_gene+'-'+R_gene),ax=ax,invertY=False,s=30)
../_images/notebooks_Mouse-Cerebellum-Slideseq_59_0.png
[223]:
sns.set_context('paper',font_scale=3.5)
L_gene,R_gene,L_ct,R_ct = 'Ptn','Ptprz1','Astrocytes','MLI2'
fig, ax = plt.subplots(1, 1, figsize=(10.2, 8),dpi=100,facecolor='#fafafa')
PlotLRGenes(ad_scst,L_gene,R_gene,L_ct,R_ct,title='${}$'.format(L_gene+'-'+R_gene),ax=ax,invertY=False,s=30)
../_images/notebooks_Mouse-Cerebellum-Slideseq_60_0.png
[220]:
gene = L_gene
celltype = L_ct
sns.set_context('paper',font_scale=2.4)
fig, ax = plt.subplots(1, 1, figsize=(8, 6),dpi=100)
sns.kdeplot(data=ad_sp[ad_sp.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#D6D657',label='Slide-seq')
sns.kdeplot(data=ad_scst[ad_scst.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#BF8AD0',label='SpatialScope',clip=[-1,10])
ax.set_xlabel('${}$ expression in {}'.format(gene,celltype))
plt.legend(loc='best')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
plt.show()
../_images/notebooks_Mouse-Cerebellum-Slideseq_61_0.png
[221]:
gene = R_gene
celltype = R_ct
sns.set_context('paper',font_scale=2.4)
fig, ax = plt.subplots(1, 1, figsize=(8, 6),dpi=100)
sns.kdeplot(data=ad_sp[ad_sp.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#D6D657',label='Slide-seq')
sns.kdeplot(data=ad_scst[ad_scst.obs['discrete_label_ct']==celltype,gene].to_df(), x=gene,ax=ax,fill=True,color='#BF8AD0',label='SpatialScope',clip=[-1,10])
ax.set_xlabel('${}$ expression in {}'.format(gene,celltype))
plt.legend(loc='best')
ax.spines['top'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.set_ylabel('')
plt.show()
../_images/notebooks_Mouse-Cerebellum-Slideseq_62_0.png
[ ]:

Comparison of cell counts between raw Slide-seq data and corrected Slide-seq data (Fig. i)

[226]:
cellcounts = pd.DataFrame(ad_scst.obs.discrete_label_ct.value_counts()).merge(pd.DataFrame(ad_sp.obs.discrete_label_ct.value_counts()),how='left',right_index=True,left_index=True)
cellcounts = cellcounts.fillna(0)
[227]:
cellcounts.head()
[227]:
discrete_label_ct_x discrete_label_ct_y
Granule 6241 5737
Oligodendrocytes 1042 930
Purkinje 975 750
Bergmann 715 1435
Astrocytes 698 1066
[228]:
sns.set_context('paper',font_scale=1.7)
width=0.4
fig, ax = plt.subplots(1, 1, figsize=(4, 16), dpi=100)
#plt.figure()
ax.margins(0.1, 0.01)
bars2 = ax.barh(np.arange(cellcounts.shape[0]-1)-width/2,cellcounts['discrete_label_ct_x'].values[1:],height=width,color='#BF8AD0',label='SpatialScope')
bars1 = ax.barh(np.arange(cellcounts.shape[0]-1)+width/2,cellcounts['discrete_label_ct_y'].values[1:],height=width,color='#D6D657',label='Slide-seq')
plt.yticks(np.arange(cellcounts.shape[0]-1))
ax.set_yticklabels(cellcounts.index.values[1:],rotation=30,ha='right')
plt.legend(loc=4)
#plt.ylabel('cell type')
plt.xlabel('cell counts')
ax.spines['bottom'].set_visible(False)
ax.spines['right'].set_visible(False)
ax.bar_label(bars1,fontsize=14,padding=5)
ax.bar_label(bars2,fontsize=14,padding=5)
ax.invert_yaxis()
ax.xaxis.tick_top()
ax.xaxis.set_label_position('top')
plt.show()
../_images/notebooks_Mouse-Cerebellum-Slideseq_67_0.png
[ ]: