SCENIC+ Webinar Notes

CZHanoi, May 18, 2025

Speaker: Seppe De Winter, second author of the paper, from the heavyweight Stein Aerts Lab (VIB - KU Leuven)

Install

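Honestly, I don't want to say much about this part. For well-known reasons, installing it requires help from a powerful "external" force (a proxy/VPN), and because of the pybigwig==0.3.23 dependency I never managed to install it on Windows despite many attempts.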

git clone https://github.com/aertslab/scenicplus
# During the livestream this step finished instantly; I was so jealous
#git clone git@github.com:aertslab/scenicplus.git
mamba create --name scenicplus python=3.11.8
# Python 3.11 is recommended
mamba activate scenicplus
cd scenicplus/
git checkout development
pip install .

Notes: if you see the error below, it is because git cannot reach GitHub (a classic).

fatal: unable to access 'https://github.com/aertslab/LoomXpy/': Recv failure: Connection was reset
  error: subprocess-exited-with-error

  × git clone --filter=blob:none --quiet https://github.com/aertslab/LoomXpy 'C:\Users\Hanoi\AppData\Local\Temp\pip-install-vyuu71m4\loomxpy_152eb1f7ca014b02893fa317b79a7d6c' did not run successfully.
  │ exit code: 128
  ╰─> See above for output.

  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× git clone --filter=blob:none --quiet https://github.com/aertslab/LoomXpy 'C:\Users\Hanoi\AppData\Local\Temp\pip-install-vyuu71m4\loomxpy_152eb1f7ca014b02893fa317b79a7d6c' did not run successfully.
│ exit code: 128
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

My (temporary) fix was to switch to a US VPN node.

There is also a very laborious alternative installation method; it will need a separate page of its own 😲

Verify the installation:

Run the commands directly in a terminal:

(base) [chenzhh@nodecw4 ~]$ conda activate scenicplus
(scenicplus) [chenzhh@nodecw4 ~]$ scenicplus

   ____   ____ _____ _   _ ___ ____
  / ___| / ___| ____| \ | |_ _/ ___| _
  \___ \| |   |  _| |  \| || | |   _|.|_
   ___) | |___| |___| |\  || | |__|_..._|
  |____/ \____|_____|_| \_|___\____||_|


scenicplus verions: 1.0a2
usage: scenicplus [-h] {init_snakemake,prepare_data,grn_inference} ...

Single-Cell Enhancer-driven gene regulatory Network Inference and Clustering

positional arguments:
  {init_snakemake,prepare_data,grn_inference}

options:
  -h, --help            show this help message and exit
(scenicplus) [chenzhh@nodecw4 ~]$ pycistarget


██████╗ ██╗   ██╗ ██████╗██╗███████╗████████╗ █████╗ ██████╗  ██████╗ ███████╗████████╗
██╔══██╗╚██╗ ██╔╝██╔════╝██║██╔════╝╚══██╔══╝██╔══██╗██╔══██╗██╔════╝ ██╔════╝╚══██╔══╝
██████╔╝ ╚████╔╝ ██║     ██║███████╗   ██║   ███████║██████╔╝██║  ███╗█████╗     ██║
██╔═══╝   ╚██╔╝  ██║     ██║╚════██║   ██║   ██╔══██║██╔══██╗██║   ██║██╔══╝     ██║
██║        ██║   ╚██████╗██║███████║   ██║   ██║  ██║██║  ██║╚██████╔╝███████╗   ██║
╚═╝        ╚═╝    ╚═════╝╚═╝╚══════╝   ╚═╝   ╚═╝  ╚═╝╚═╝  ╚═╝ ╚═════╝ ╚══════╝   ╚═╝


pycistarget version: 1.1
usage: pycistarget [-h] {cistarget,dem} ...

Motif enrichment analysis.

positional arguments:
  {cistarget,dem}

options:
  -h, --help       show this help message and exit
(scenicplus) [chenzhh@nodecw4 ~]$ pycistopic
usage: pycistopic [-h] {qc,topic_modeling,tss} ...
pycistopic: error: the following arguments are required: command

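**Notes:** Don't ask why pycistopic prints an error: it looked exactly the same when De Winter demoed it. The command simply expects a subcommand (qc, topic_modeling or tss).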
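If you prefer to check from Python (e.g. in a Jupyter kernel pointed at the scenicplus environment), here is a minimal sketch using only the standard library's `importlib.metadata`. The distribution names `scenicplus`, `pycisTopic` and `pycistarget` are my assumptions about how the packages register themselves; adjust them if a lookup fails.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {distribution name: version string, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Not installed in the currently active environment
            versions[name] = None
    return versions


if __name__ == "__main__":
    # Assumed distribution names; adjust if your install registers them differently
    for pkg, ver in installed_versions(["scenicplus", "pycisTopic", "pycistarget"]).items():
        print(f"{pkg}: {ver if ver else 'NOT FOUND'}")
```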

Figure 0: Installed

Workflow Overview

Figure 1: Workflow

The main goal is to infer enhancer-driven GRNs from ATAC data plus expression data (multiome data).

The pipeline is split across the following three independent repositories:

①aertslab/pycisTopic

Processes the ATAC data and derives the consensus peaks.

②aertslab/pycisTarget

Motif enrichment: infers candidate TFs and the regions they may bind.

③aertslab/scenicplus

Integrates the expression data to infer the GRNs.

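Notes:

The method is only recommended for human and model organisms: partly because the authors only provide blacklists for rat, mouse, human and fruit fly, and partly, perhaps, because GRNs genuinely differ a lot between species.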

pycisTopic

Figure 2a: pycisTopic purpose

Inputs:

① A fragments.tsv.gz file per sample, plus its tabix index (.tbi)

② Cell annotations (a GitHub issue describes a workaround without annotations, but it is strongly discouraged)

Outputs:

① BED file: the consensus peak set

②Imputed accessibility matrix*

Notes: imputation is needed because scATAC-seq matrices are highly sparse (compared with scRNA-seq).

③ BED files: co-accessible regions (used downstream by pycisTarget)

Topics: describe patterns of chromatin accessibility.
DARs (differentially accessible regions): regions whose accessibility differs significantly between cell types or conditions.

Figure 2b: pycisTopic workflow

① Infer consensus peaks (to obtain a new feature set)

<1> Build pseudobulk profiles: aggregate all fragments from barcodes of the same cell type into a single fragment file.
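Step <1> is essentially a group-by on the barcode column. Below is a minimal pure-Python sketch, assuming uncompressed 10x-style fragment lines (`chrom start end barcode count`) and a hypothetical `barcode_to_cell_type` dict built from the cell annotations. In practice pycisTopic does this for you (I believe via `export_pseudobulk` in `pycisTopic.pseudobulk_peak_calling`) and writes one file per cell type.

```python
from collections import defaultdict


def split_fragments_by_cell_type(fragment_lines, barcode_to_cell_type):
    """Group fragment records into per-cell-type pseudobulk lists.

    fragment_lines: iterable of tab-separated 'chrom start end barcode count' lines
    barcode_to_cell_type: dict mapping barcode -> cell type label
    Barcodes without a label are dropped.
    """
    pseudobulk = defaultdict(list)
    for line in fragment_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines and header comments
        barcode = line.split("\t")[3]
        cell_type = barcode_to_cell_type.get(barcode)
        if cell_type is not None:
            pseudobulk[cell_type].append(line)
    return dict(pseudobulk)


# Toy example (hypothetical barcodes and labels)
fragments = [
    "chr1\t100\t250\tAAAC-1\t1",
    "chr1\t300\t420\tTTTG-1\t2",
    "chr2\t50\t180\tAAAC-1\t1",
    "chr2\t70\t200\tGGGA-1\t1",  # unlabeled barcode, dropped
]
labels = {"AAAC-1": "ExN", "TTTG-1": "InN"}
for cell_type, lines in split_fragments_by_cell_type(fragments, labels).items():
    print(cell_type, len(lines))
```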

<2>Call Peaks on each individual fragment file per cell type.

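import os
from pycisTopic.pseudobulk_peak_calling import peak_calling

# bed_paths (per-cell-type pseudobulk BED files) and out_dir come from the previous step
macs_path = "macs2"  # path to the MACS2 executable
os.makedirs(os.path.join(out_dir, "consensus_peak_calling/MACS"), exist_ok = True)

narrow_peak_dict = peak_calling(
    macs_path = macs_path,
    bed_paths = bed_paths,
    outdir = os.path.join(out_dir, "consensus_peak_calling/MACS"),
    genome_size = 'hs',  # 'hs' for human; use 'mm' for mouse
    n_cpu = 10,
    input_format = 'BEDPE',
    shift = 73,
    ext_size = 146,
    keep_dup = 'all',
    q_value = 0.05,
    _temp_dir = "/tmp"
)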

<3> Consensus peaks: merge the per-cell-type peaks into a single peak set.

import os
from pycisTopic.iterative_peak_calling import get_consensus_peaks

peak_half_width = 250
# Each consensus peak will be 500 bp wide (2 × peak_half_width)
path_to_blacklist = "/public/home/chenzhh/packgae_python/pycisTopic/blacklist/hg38-blacklist.v2.bed"
# hg38 blacklist for human data; use the mm10 blacklist for mouse
# chromsizes: chromosome-size table loaded in an earlier step
consensus_peaks = get_consensus_peaks(
    narrow_peaks_dict = narrow_peak_dict,
    peak_half_width = peak_half_width,
    chromsizes = chromsizes,
    path_to_blacklist = path_to_blacklist)
# Export to a BED file
consensus_peaks.to_bed(
    path = os.path.join(out_dir, "consensus_peak_calling/consensus_regions.bed"),
    keep = True,
    compression = 'infer',
    chain = False)
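My understanding is that get_consensus_peaks extends each summit by peak_half_width and then resolves overlaps by iteratively keeping the most significant peak and discarding the peaks that overlap it. The sketch below shows only that core idea on made-up summits. It omits score normalization and blacklist removal and is not pycisTopic's code.

```python
def fixed_width_peaks(summits, half_width=250):
    """Turn (chrom, summit, score) into fixed-width (chrom, start, end, score) peaks."""
    return [(chrom, max(0, pos - half_width), pos + half_width, score)
            for chrom, pos, score in summits]


def iterative_overlap_filter(peaks):
    """Keep the best-scoring peak, drop peaks overlapping it, repeat with the rest.

    peaks: (chrom, start, end, score) tuples, half-open [start, end).
    Walking peaks from highest to lowest score and keeping a peak only if it
    overlaps no already-kept peak gives the same result as the iterative loop.
    """
    kept = []
    for chrom, start, end, score in sorted(peaks, key=lambda p: p[3], reverse=True):
        if not any(k_chrom == chrom and start < k_end and k_start < end
                   for k_chrom, k_start, k_end, _ in kept):
            kept.append((chrom, start, end, score))
    return sorted(kept, key=lambda p: (p[0], p[1]))


# Made-up summits: (chrom, summit position, score)
summits = [("chr1", 1000, 5.0), ("chr1", 1200, 9.0), ("chr1", 2000, 3.0), ("chr2", 1000, 1.0)]
consensus = iterative_overlap_filter(fixed_width_peaks(summits, half_width=250))
print(consensus)
```

Here the weaker chr1 peak at 750-1250 overlaps the stronger one at 950-1450 and is dropped, while the non-overlapping peaks survive.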

Figure 2c: IGV views of Peaks

Notes:

The result viewed in IGV: each track is one cell type;

the black boxes are the final consensus peaks. The following steps rebuild the matrix using these consensus peaks as features and each cell barcode as the index.
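Rebuilding the matrix with consensus peaks as features amounts to counting, for every barcode, the fragments that overlap each peak. Here is a naive sketch for illustration only. It is quadratic, keeps everything in memory, and counts any overlap of a whole fragment. pycisTopic's `create_cistopic_object_from_fragments` does this far more efficiently and may count differently (e.g. by insertion sites).

```python
from collections import defaultdict


def count_matrix(fragments, regions):
    """Count overlapping fragments per barcode and consensus region.

    fragments: iterable of (chrom, start, end, barcode), half-open intervals
    regions: list of (chrom, start, end) consensus peaks
    Returns (barcodes, region_names, rows); rows[i][j] is the count for
    barcodes[i] in region_names[j].
    """
    region_names = [f"{chrom}:{start}-{end}" for chrom, start, end in regions]
    counts = defaultdict(lambda: [0] * len(regions))
    for chrom, f_start, f_end, barcode in fragments:
        row = counts[barcode]  # barcodes with no overlap still get a zero row
        for j, (r_chrom, r_start, r_end) in enumerate(regions):
            if chrom == r_chrom and f_start < r_end and r_start < f_end:
                row[j] += 1
    barcodes = sorted(counts)
    return barcodes, region_names, [counts[bc] for bc in barcodes]


# Toy data: two consensus peaks, two barcodes
regions = [("chr1", 100, 600), ("chr1", 1000, 1500)]
fragments = [("chr1", 150, 300, "A"), ("chr1", 550, 1100, "A"), ("chr1", 2000, 2100, "B")]
barcodes, region_names, rows = count_matrix(fragments, regions)
print(barcodes, region_names, rows)
```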

② QC + generate the new count matrix

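Filter high-quality cell barcodes → generate the count matrix. (Personally I think the filtering can be skipped if you already did QC upstream; De Winter's live demo also crashed right at this point, haha.)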

<1>QC

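The principle is the same as in other tools. Here is a brief summary of how De Winter reads the QC plots (I have always been a bit fuzzy on them myself).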

Figure 2d: QC

Left (number of fragments vs. barcode rank): you want to see a sharp knee.

Middle (fragment ratio vs. fragment size): it should look like this, with the two peaks corresponding to mono-nucleosomal and di-nucleosomal fragments.

Right (normalized TSS enrichment): a nice, clean TSS enrichment.
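To make the knee plot concrete, rank barcodes by fragment count and keep the ones above the knee. The sketch below uses a crude, hypothetical knee rule (the steepest drop in log10 count between consecutive ranks). It is not the thresholding pycisTopic uses, and on real data you would still eyeball the plot.

```python
import math


def barcode_rank(fragment_counts):
    """Sort barcodes by fragment count, descending -> [(rank, barcode, count), ...]."""
    ordered = sorted(fragment_counts.items(), key=lambda kv: kv[1], reverse=True)
    return [(rank, bc, n) for rank, (bc, n) in enumerate(ordered, start=1)]


def knee_threshold(fragment_counts):
    """Crude knee: the count just before the steepest drop in log10(count)."""
    ranked = barcode_rank(fragment_counts)
    if len(ranked) < 2:
        return ranked[0][2] if ranked else 0
    logs = [math.log10(max(n, 1)) for _, _, n in ranked]
    drops = [logs[i] - logs[i + 1] for i in range(len(logs) - 1)]
    i_max = max(range(len(drops)), key=drops.__getitem__)
    return ranked[i_max][2]


def passing_barcodes(fragment_counts, min_fragments):
    """Barcodes with at least `min_fragments` fragments."""
    return {bc for bc, n in fragment_counts.items() if n >= min_fragments}


# Toy counts: three real cells followed by two background barcodes
counts = {"a": 10000, "b": 8000, "c": 7000, "d": 50, "e": 30}
print(passing_barcodes(counts, knee_threshold(counts)))
```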

<2> Create the cistopic_obj

import os
import polars as pl
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

pycistopic_qc_output_dir = "outs/qc"

# fragments_dict, path_to_regions, path_to_blacklist and
# sample_id_to_barcodes_passing_filters come from the previous steps
cistopic_obj_list = []
for sample_id in fragments_dict:
    # Per-barcode QC metrics, restricted to the barcodes that passed filtering
    sample_metrics = pl.read_parquet(
        os.path.join(pycistopic_qc_output_dir, f'{sample_id}.fragments_stats_per_cb.parquet')
    ).to_pandas().set_index("CB").loc[ sample_id_to_barcodes_passing_filters[sample_id] ]
    cistopic_obj = create_cistopic_object_from_fragments(
        path_to_fragments = fragments_dict[sample_id],
        path_to_regions = path_to_regions,
        path_to_blacklist = path_to_blacklist,
        metrics = sample_metrics,
        valid_bc = sample_id_to_barcodes_passing_filters[sample_id],
        n_cpu = 1,
        project = sample_id,
        split_pattern = '-'
    )
    cistopic_obj_list.append(cistopic_obj)

If you don't use their QC, set metrics and valid_bc to None, as follows:

import os
from pycisTopic.cistopic_class import create_cistopic_object_from_fragments

# out_dir and fragments_dict are defined in earlier steps
path_to_regions = os.path.join(out_dir, "consensus_peak_calling/consensus_regions.bed")
path_to_blacklist = "/public/home/chenzhh/packgae_python/pycisTopic/blacklist/hg38-blacklist.v2.bed"

cistopic_obj_list = []
for sample_id in fragments_dict:
    cistopic_obj = create_cistopic_object_from_fragments(
        path_to_fragments = fragments_dict[sample_id],
        path_to_regions   = path_to_regions,
        path_to_blacklist = path_to_blacklist,
        metrics           = None,
        valid_bc          = None,
        n_cpu             = 10,
        project           = sample_id,
        split_pattern     = '-'
    )
    cistopic_obj_list.append(cistopic_obj)

③Topic modeling

<1>Impute accessibility

<2> Obtain co-accessible topics: sets of co-accessible regions
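Topics are probability distributions over regions, so turning them into region sets means binarizing them. Below is a simplified sketch of a "top-N regions per topic" rule, plus a helper that turns `chr:start-end` names into BED lines. As far as I know, pycisTopic's `binarize_topics` (in `pycisTopic.topic_binarization`) offers this as `method='ntop'` alongside Otsu thresholding. The toy scores are made up.

```python
def top_regions_per_topic(region_topic_scores, n_top):
    """{topic: {region: score}} -> {topic: [the n_top highest-scoring regions]}."""
    return {
        topic: [region for region, _ in
                sorted(scores.items(), key=lambda kv: kv[1], reverse=True)[:n_top]]
        for topic, scores in region_topic_scores.items()
    }


def region_to_bed_line(region):
    """Convert 'chr5:125638655-125639155' into a tab-separated BED line."""
    chrom, coords = region.rsplit(":", 1)
    start, end = coords.split("-")
    return f"{chrom}\t{start}\t{end}"


# Made-up topic-region probabilities
scores = {
    "Topic1": {"chr1:1-501": 0.5, "chr1:1000-1500": 0.1, "chr2:1-501": 0.3},
    "Topic2": {"chr1:1-501": 0.05, "chr1:1000-1500": 0.6, "chr2:1-501": 0.2},
}
for topic, regions in top_regions_per_topic(scores, 2).items():
    print(topic, [region_to_bed_line(r) for r in regions])
```

Each topic's lines can then be written to its own .bed file inside a subfolder of the region-set folder used later in the SCENIC+ config.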

pycisTarget

Figure 3a: pycisTarget purpose

Purpose: find enriched motifs in co-accessible regions.

Inputs:

<1> Sets of co-accessible regions: either topics or DARs

→※CisTarget Database

<2> Motif collection: see the directory listing "Index of /cistarget/motif_collections/v10nr_clust_public" on resources.aertslab.org

Output:

Sets of enriched motifs, annotated to TFs.
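How does cisTarget decide that a motif is enriched? As I understand it, every region is ranked by its score for a given motif. Walking down that ranking gives a recovery curve of how many query regions have been found. The area under the first part of that curve (the fraction set by `ctx_auc_threshold` in the config later) is normalized across all motifs into an NES, and motifs above `ctx_nes_threshold` are kept. A toy sketch of that computation, not pycistarget's implementation:

```python
import statistics


def recovery_auc(ranking, query, auc_fraction):
    """Area under the recovery curve of `query` within the top of `ranking`.

    ranking: region IDs ordered from best to worst motif score
    query: set of region IDs (one topic or DAR set)
    auc_fraction: fraction of the ranking used for the AUC (cf. ctx_auc_threshold)
    """
    if not query:
        return 0.0
    cutoff = max(1, int(len(ranking) * auc_fraction))
    recovered = 0
    area = 0
    for region in ranking[:cutoff]:
        if region in query:
            recovered += 1
        area += recovered  # height of the step curve at this rank
    # Scale by cutoff * |query| so the value falls in [0, 1]
    return area / (cutoff * len(query))


def normalized_enrichment_scores(aucs):
    """NES = (AUC - mean AUC over all motifs) / population std of the AUCs."""
    values = list(aucs.values())
    mean = statistics.mean(values)
    sd = statistics.pstdev(values)
    return {motif: (auc - mean) / sd if sd > 0 else 0.0 for motif, auc in aucs.items()}


# Toy example: the two query regions sit at the top of motif m1's ranking
ranking = [f"r{i}" for i in range(1, 11)]
query = {"r1", "r2"}
aucs = {
    "m1": recovery_auc(ranking, query, 0.5),
    "m2": recovery_auc(ranking[::-1], query, 0.5),
    "m3": recovery_auc(ranking[::-1], query, 0.5),
}
print(normalized_enrichment_scores(aucs))
```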

Figure 3b: create_cisTarget_databases WorkFlow

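Notes: although they provide precomputed motif-to-TF annotations (cisTarget databases) for human, mouse and fly, these are built on generic regions, so it is recommended to create your own database from your consensus peaks.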

①Create Cistarget Database

<1> Get the aertslab/create_cisTarget_databases repository

It scores DNA sequences for motifs.

<2> Download the Cluster-Buster tool

It computes the scores with a hidden Markov model.

wget https://resources.aertslab.org/cistarget/programs/cbust
chmod a+x cbust

<3> Download the motif collection

<4> Convert the consensus peaks into a FASTA file (with bedtools)

Optional: add 1 kb of background padding, which Cluster-Buster uses as the background sequence.
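The padding widens every region by 1 kb on each side without running past the chromosome ends, much like `bedtools slop -b 1000 -g genome.chrom.sizes`. Here is a small sketch of that coordinate arithmetic, assuming a UCSC-style chrom.sizes file. It does not reproduce how `create_fasta_with_padded_bg_from_bed.sh` marks the padded part as background.

```python
def read_chrom_sizes(lines):
    """Parse 'chrom<TAB>length' lines (UCSC chrom.sizes format) into a dict."""
    sizes = {}
    for line in lines:
        if line.strip():
            chrom, length = line.split()[:2]
            sizes[chrom] = int(length)
    return sizes


def padded_interval(chrom, start, end, padding, chrom_sizes):
    """Extend a BED interval by `padding` bp on both sides, clipped to [0, chrom length]."""
    return chrom, max(0, start - padding), min(chrom_sizes[chrom], end + padding)


sizes = read_chrom_sizes(["chr1\t5000", "chr2\t300"])
print(padded_interval("chr1", 1500, 2000, 1000, sizes))  # ('chr1', 500, 3000)
```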

<5> Run

Figure 3c: pycisTarget WorkFlow

My scripts are as follows:

①run_create_fasta.sh

#!/usr/bin/env bash

eval "$(/public/home/chenzhh/bin/micromamba shell hook -s bash)"
micromamba activate create_cistarget_databases

GENOME_FA="/public/home/chenzhh/udanmas/Dianmu/outs/mm10.fa"
CHRSZS="/public/home/chenzhh/udanmas/Dianmu/outs/mm10.chrom.sizes"
REG_BED="/public/home/chenzhh/udanmas/Dianmu/outs/consensus_peak_calling/consensus_regions.bed"
OUT_FA="/public/home/chenzhh/udanmas/Dianmu/outs/mouse_brain_with1kb_bg.fa"
PADDING=1000

SCRIPT_DIR="/public/home/chenzhh/packgae_python/create_cisTarget_databases"
CREATE_SH="${SCRIPT_DIR}/create_fasta_with_padded_bg_from_bed.sh"

command -v bedtools >/dev/null 2>&1 || { echo >&2 "Error: bedtools not found in PATH."; exit 1; }
[ -x "${CREATE_SH}" ] || { echo >&2 "Error: script ${CREATE_SH} not found or not executable."; exit 1; }

echo "[$(date)] Starting create_fasta_with_padded_bg_from_bed..."
bash "${CREATE_SH}" \
    "${GENOME_FA}" \
    "${CHRSZS}" \
    "${REG_BED}" \
    "${OUT_FA}" \
    "${PADDING}" \
    yes

RET=$?
if [ $RET -eq 0 ]; then
  echo "[$(date)] Done successfully: ${OUT_FA}"
else
  echo "[$(date)] ERROR (exit code $RET)"
fi

exit $RET

② run_create_cistarget_db.sh

#!/usr/bin/env bash

OUT_DIR="/public/home/chenzhh/udanmas/Dianmu"
SCRIPT_DIR="/public/home/chenzhh/packgae_python/create_cisTarget_databases"
DATABASE_PREFIX="mouse_brain"
FASTA_FILE="${OUT_DIR}/outs/mouse_brain_with1kb_bg.fa"
CBDIR="/public/home/chenzhh/udanmas/Mindulle/aertslab_motif_colleciton/v10nr_clust_public/singletons"
MOTIF_LIST="${OUT_DIR}/motifs.txt"
CORES=180

MICROMAMBA_BIN="/public/home/chenzhh/bin/micromamba"

cd "${OUT_DIR}" || { echo "Cannot cd to ${OUT_DIR}"; exit 1; }

ls "${CBDIR}" > "${MOTIF_LIST}"

if [ ! -x "${SCRIPT_DIR}/create_cistarget_motif_databases.py" ]; then
  echo "Error: ${SCRIPT_DIR}/create_cistarget_motif_databases.py not found or not executable"
  exit 1
fi

echo "[$(date)] START cisTarget DB build with ${CORES} cores"
"${MICROMAMBA_BIN}" run -n create_cistarget_databases python \
    "${SCRIPT_DIR}/create_cistarget_motif_databases.py" \
      -f "${FASTA_FILE}" \
      -M "${CBDIR}" \
      -m "${MOTIF_LIST}" \
      -o "${OUT_DIR}/${DATABASE_PREFIX}" \
      --bgpadding 1000 \
      -t ${CORES}

RET=$?
if [ $RET -eq 0 ]; then
  echo "[$(date)] DONE cisTarget DB: ${OUT_DIR}/${DATABASE_PREFIX}"
else
  echo "[$(date)] ERROR (exit ${RET})"
fi
exit $RET

③ Run

chmod +x /public/home/chenzhh/udanmas/Dianmu/run_create_fasta.sh
nohup bash run_create_fasta.sh > create_fasta.log 2>&1 &
tail -f create_fasta.log
chmod +x /public/home/chenzhh/udanmas/Dianmu/run_create_cistarget_db.sh
nohup bash run_create_cistarget_db.sh > create_cistarget_db.log 2>&1 &
tail -f create_cistarget_db.log

Notes:

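Setting up the environment involves many, many pitfalls (so many I nearly lost my mind). I expect they will streamline it later, so I'll just briefly note what I still remember after getting it to work.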

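# For the standard setup see the GitHub README; I really didn't feel like repeating it (fine, recording it once anyway)
# Activate first so that ${CONDA_PREFIX} points to this environment
micromamba activate create_cistarget_databases
git clone -b change_f4_output https://github.com/ghuls/cluster-buster/
cd cluster-buster/
make cbust
# By the way, building cbust_amd_libm_aocc is optional: it is only a speed-up, and compiling it is a real pain
cp -a cbust "${CONDA_PREFIX}/bin/cbust"
cd "${CONDA_PREFIX}/bin"
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/liftOver
wget http://hgdownload.soe.ucsc.edu/admin/exe/linux.x86_64/bigWigAverageOverBed
chmod a+x liftOver bigWigAverageOverBed

# bedtools must be >= 2.31 (see the note below)
mamba install -c conda-forge -c bioconda 'bedtools>=2.31'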

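Also, an environment built with the officially recommended commands is very unlikely to run as-is. Some important points are never mentioned, including but not limited to the fact that bedtools must be version 2.31 or later (by default it installs 2.14).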

Here is the environment I recommend:

name: create_cistarget_databases
channels:
  - conda-forge
  - bioconda
dependencies:
  - python=3.10
  - numpy=1.21.6
  - pandas=1.5.3
  - pyarrow>=7.0.0
  - numba=0.56.4
  - python-flatbuffers
  - bedtools=2.31.1

SCENICPlus

Speaker: Darina Abaffyová, a PhD student in the same lab

Figure 4a: scenic+ WorkFlow

Finally, finally, we have overcome countless obstacles... and yet the scenicplus pipeline proper is only just getting started (sigh).

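They wrap the whole pipeline in Snakemake, so all we need to do is write a config.yaml pointing to our files (plus endless debugging of the input files, because the pipeline code itself is off-limits).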

① Configuration

The Snakemake working directory looks like this:

conda activate scenicplus
cd /public/home/chenzhh/udanmas/Lionrock/formal_scenic
scenicplus init_snakemake --out_dir scplus_pipeline
tree scplus_pipeline
#scplus_pipeline/
#└── Snakemake
#    ├── config
#    │   └── config.yaml
#    └── workflow
#        └── Snakefile
mkdir -p outs
mkdir -p tmp
vim scplus_pipeline/Snakemake/config/config.yaml

When filling in this file, the following entries need special attention:

input_data:
  cisTopic_obj_fname: "/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Farewell/Lionrock/outs/cistopic_obj.pkl"
# The object saved after selecting the best LDA model (number of topics). Normalization, imputation etc. can be skipped here because the pipeline runs them as jobs.
  GEX_anndata_fname: "/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Farewell/Lionrock/rna.h5ad"
# h5ad file with the gene expression data only. Its barcodes must match the cistopic_obj (cistopic_obj.cell_data.index and cistopic_obj.cell_names).
  region_set_folder: "/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Farewell/Lionrock/outs/region_sets"
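# Directory holding the BED files. This is where most errors originate: remove all empty files (e.g. check sizes with `stat`), and keep the folder structure and naming consistent.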
  ctx_db_fname: "/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Farewell/Lionrock/mm10_screen_v10_clust.regions_vs_motifs.rankings.feather"
# The regions_vs_motifs.rankings.feather file. So far I have only gotten the pipeline to run with the officially provided files.
  dem_db_fname: "/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Farewell/Lionrock/mm10_screen_v10_clust.regions_vs_motifs.scores.feather"
# The regions_vs_motifs.scores.feather file. Same caveat: so far only the officially provided files worked for me.
  path_to_motif_annotations: "/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Data/aertslab_motif_colleciton/v10nr_clust_public/snapshots/motifs-v10-nr.mgi-m0.00001-o0.0.tbl"
# Motif annotation file, from the official site.
output_data:
  # output for prepare_GEX_ACC .h5mu
  combined_GEX_ACC_mudata: "ACC_GEX.h5mu"
  # output for motif enrichment results .hdf5
  dem_result_fname: "dem_results.hdf5"
  ctx_result_fname: "ctx_results.hdf5"
  # output html for motif enrichment results .html
  output_fname_dem_html: "dem_results.html"
  output_fname_ctx_html: "ctx_results.html"
  # output for prepare_menr .h5ad
  cistromes_direct: "cistromes_direct.h5ad"
  cistromes_extended: "cistromes_extended.h5ad"
  # output tf names .txt
  tf_names: "tf_names.txt"
  # output for download_genome_annotations .tsv
  genome_annotation: "genome_annotation.tsv"
  chromsizes: "chromsizes.tsv"
  # output for search_space .tsv
  search_space: "search_space.tsv"
  # output tf_to_gene .tsv
  tf_to_gene_adjacencies: "tf_to_gene_adj.tsv"
  # output region_to_gene .tsv
  region_to_gene_adjacencies: "region_to_gene_adj.tsv"
  # output eGRN .tsv
  eRegulons_direct: "eRegulon_direct.tsv"
  eRegulons_extended: "eRegulons_extended.tsv"
  # output AUCell .h5mu
  AUCell_direct: "AUCell_direct.h5mu"
  AUCell_extended: "AUCell_extended.h5mu"
  # output scplus mudata .h5mu
  scplus_mdata: "scplusmdata.h5mu"

params_general:
  temp_dir: "/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Farewell/Lionrock/tmp"
  n_cpu: 30
  seed: 666
# Configure the following parameters carefully; errors are common here.
params_data_preparation:
  # Params for prepare_GEX_ACC
  bc_transform_func: "\"lambda x: f'{x}'\""
  is_multiome: True
  key_to_group_by: ""
  nr_cells_per_metacells: 10
  # Params for prepare_menr
  direct_annotation: "Direct_annot"
  extended_annotation: "Orthology_annot"
  # Params for download_genome_annotations
  species: "mmusculus"
  # hsapiens or mmusculus
  biomart_host: "http://www.ensembl.org"
  # Use this host from 2024 onward
  # Params for search_space
  search_space_upstream: "1000 150000"
  search_space_downstream: "1000 150000"
  search_space_extend_tss: "10 10"

params_motif_enrichment:
  species: "mus_musculus"
  # homo_sapiens or mus_musculus
  annotation_version: "v10nr_clust"
  motif_similarity_fdr: 0.001
  orthologous_identity_threshold: 0.0
  annotations_to_use: "Direct_annot Orthology_annot"
  fraction_overlap_w_dem_database: 0.4
  dem_max_bg_regions: 500
  dem_balance_number_of_promoters: True
  dem_promoter_space: 1_000
  dem_adj_pval_thr: 0.05
  dem_log2fc_thr: 1.0
  dem_mean_fg_thr: 0.0
  dem_motif_hit_thr: 3.0
  fraction_overlap_w_ctx_database: 0.4
  ctx_auc_threshold: 0.005
  ctx_nes_threshold: 3.0
  ctx_rank_threshold: 0.05




params_inference:
  # Params for tf_to_gene
  tf_to_gene_importance_method: "GBM"
  # Params regions_to_gene
  region_to_gene_importance_method: "GBM"
  region_to_gene_correlation_method: "SR"
  # Params for eGRN inference
  order_regions_to_genes_by: "importance"
  order_TFs_to_genes_by: "importance"
  gsea_n_perm: 1000
  quantile_thresholds_region_to_gene: "0.85 0.90 0.95"
  top_n_regionTogenes_per_gene: "5 10 15"
  top_n_regionTogenes_per_region: ""
  min_regions_per_gene: 0
  rho_threshold: 0.05
  min_target_genes: 10
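Mismatched barcodes between GEX_anndata_fname and cisTopic_obj_fname are a common failure point, so a quick pre-flight check before launching Snakemake helps. A sketch, assuming you pass e.g. `adata.obs_names` and `cistopic_obj.cell_names`. The `transform` argument plays the role of `bc_transform_func` (identity by default, matching the config above). The demo barcodes and the `___cisTopic` suffix are hypothetical; as far as I know, pycisTopic appends a `___<project>`-style tag to its cell names, which is the usual reason a transform is needed.

```python
def check_barcodes(rna_barcodes, atac_barcodes, transform=lambda bc: bc):
    """Compare expression barcodes (after `transform`) with cisTopic cell names."""
    rna = {transform(bc) for bc in rna_barcodes}
    atac = set(atac_barcodes)
    return {
        "shared": len(rna & atac),
        "only_rna": sorted(rna - atac),
        "only_atac": sorted(atac - rna),
    }


# Hypothetical barcodes; the suffix mimics a per-project tag on the ATAC side
rna_bcs = ["AAAC-1", "TTTG-1"]
atac_bcs = ["AAAC-1___cisTopic", "GGGA-1___cisTopic"]
report = check_barcodes(rna_bcs, atac_bcs, transform=lambda bc: f"{bc}___cisTopic")
print(report)
```

If "shared" is far below the number of cells, fix the barcodes (or bc_transform_func) before running the pipeline.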

Then run it in the terminal:

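conda activate scenicplus
# Run from the directory that contains config/ and workflow/
cd scplus_pipeline/Snakemake
snakemake --cores 30 # may differ from n_cpu in config.yaml; in my tests this value takes precedence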

Or run everything under nohup:

nohup bash run_create_fasta.sh > create_fasta_$(date +%Y%m%d_%H%M%S)_pandas223-222.log 2>&1 &
nohup bash run_create_cistarget_db.sh > create_db_$(date +%Y%m%d_%H%M%S)_pandas223-222.log 2>&1 &
nohup snakemake --cores 20 --rerun-incomplete       > snakemake_$(date +%Y%m%d_%H%M%S).log 2>&1 &

② Bugs

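The bugs I hit include, but are not limited to: bad data, bad libraries, bad environments, bad me... Below I go through how I solved (read: compromised on) each of them.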

(1) IndexError: list index out of range

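This bug usually shows up in the very first job, localrule motif_enrichment_cistarget: it dies before anything has even started, without any useful message. In my case the cause was empty BED files in the region-set folder: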

> pwd
/cpfs01/projects-HDD/cfff-afe2df89e32e_HDD/zy_22111220045/Farewell/Lionrock/outs/region_sets/DARs_cell_type
> find . -maxdepth 1 -type f -name "*.bed" -size 0 -print
./ExN-CA2.bed
./ExN-DEGLU-1.bed
# These two files are empty, so just delete them. They come from the DAR analysis, which often produces empty sets.
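The same cleanup can be done from Python before launching the pipeline, covering every subfolder of region_set_folder at once. A sketch using `pathlib`: it only looks at `*.bed` files and defaults to a dry run, so you can inspect the list before deleting anything.

```python
from pathlib import Path


def find_empty_beds(region_set_folder):
    """Return all empty *.bed files under region_set_folder (recursively), sorted."""
    return sorted(p for p in Path(region_set_folder).rglob("*.bed") if p.stat().st_size == 0)


def remove_empty_beds(region_set_folder, dry_run=True):
    """List (and, unless dry_run, delete) empty BED files; returns the paths found."""
    empty = find_empty_beds(region_set_folder)
    if not dry_run:
        for path in empty:
            path.unlink()
    return empty
```

For example, `remove_empty_beds("outs/region_sets")` prints nothing and deletes nothing; inspect the returned list, then call it again with `dry_run=False`.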

(2) ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements

Symptoms:

localrule motif_enrichment_cistarget:
2025-06-04 00:12:07,271 cisTarget    INFO     Reading cisTarget database
joblib.externals.loky.process_executor._RemoteTraceback:
File "/home/zy_22111220045/miniconda3/envs/scenicplus/lib/python3.11/site-packages/joblib/parallel.py", line 754, in _return_or_raise
    raise self._result
ValueError: Length mismatch: Expected axis has 0 elements, new values have 3 elements

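After repeated testing, the problem clearly comes from the database built in the previous step (the custom cisTarget database).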

③ Interpreting the results & downstream analysis

<1>scplus_mdata.uns["direct_e_regulon_metadata"]

	Region	Gene	importance_R2G	rho_R2G	importance_x_rho	importance_x_abs_rho	TF	is_extended	eRegulon_name	Gene_signature_name	Region_signature_name	importance_TF2G	regulation	rho_TF2G	triplet_rank
0	chr5:125638655-125639155	Tmem132b	0.000466	0.057858	0.000027	0.000027	Ahctf1	False	Ahctf1_direct_+/+	Ahctf1_direct_+/+_(39g)	Ahctf1_direct_+/+_(42r)	0.581941	1	0.106458	53472
1	chr11:35798818-35799318	Tenm2	0.000077	0.416579	0.000032	0.000032	Ahctf1	False	Ahctf1_direct_+/+	Ahctf1_direct_+/+_(39g)	Ahctf1_direct_+/+_(42r)	0.614399	1	0.155765	56352
2	chr15:8169326-8169826	Cplane1	0.148071	0.453076	0.067087	0.067087	Ahctf1	False	Ahctf1_direct_+/+	Ahctf1_direct_+/+_(39g)	Ahctf1_direct_+/+_(42r)	0.332098	1	0.115916	5396
3	chr16:95702923-95703423	Brwd1	0.064941	0.166287	0.010799	0.010799	Ahctf1	False	Ahctf1_direct_+/+	Ahctf1_direct_+/+_(39g)	Ahctf1_direct_+/+_(42r)	0.579776	1	0.073294	11180
4	chr5:150235924-150236424	Fry	0.000058	0.054841	0.000003	0.000003	Ahctf1	False	Ahctf1_direct_+/+	Ahctf1_direct_+/+_(39g)	Ahctf1_direct_+/+_(42r)	0.606757	1	0.122811	56195
...	...	...	...	...	...	...	...	...	...	...	...	...	...	...	...
58282	chr15:54571087-54571587	Enpp2	0.002650	-0.357639	-0.000948	0.000948	Zeb1	False	Zeb1_direct_-/-	Zeb1_direct_-/-_(32g)	Zeb1_direct_-/-_(48r)	10.831843	-1	-0.386763	4901
58283	chr19:10261259-10261759	Myrf	0.000008	-0.229283	-0.000002	0.000002	Zeb1	False	Zeb1_direct_-/-	Zeb1_direct_-/-_(32g)	Zeb1_direct_-/-_(48r)	2.413886	-1	-0.324805	50853
58284	chr2:127518216-127518716	Mal	0.002490	-0.479707	-0.001194	0.001194	Zeb1	False	Zeb1_direct_-/-	Zeb1_direct_-/-_(32g)	Zeb1_direct_-/-_(48r)	6.385099	-1	-0.371317	18872
58285	chr1:55449142-55449642	Plcl1	0.001321	-0.161939	-0.000214	0.000214	Zeb1	False	Zeb1_direct_-/-	Zeb1_direct_-/-_(32g)	Zeb1_direct_-/-_(48r)	1.779930	-1	-0.215149	45186
58286	chr6:85187372-85187872	Exoc6b	0.003271	-0.101657	-0.000333	0.000333	Zeb1	False	Zeb1_direct_-/-	Zeb1_direct_-/-_(32g)	Zeb1_direct_-/-_(48r)	2.747754	-1	-0.204703	34634

58287 rows × 15 columns
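For a first look at the eRegulons, count the unique target genes and regions per eRegulon. Here is a standard-library sketch, assuming rows as dicts (e.g. from `df.to_dict("records")` on the DataFrame above); the rows below are copied from the printed table. The `(39g)`/`(42r)` suffixes appear to encode the gene and region counts of each signature, and `signature_size` parses them.

```python
import re
from collections import defaultdict


def summarize_eregulons(rows):
    """Count unique target genes and regions per eRegulon.

    rows: iterable of dicts with 'eRegulon_name', 'Gene' and 'Region' keys.
    Returns {eRegulon_name: (n_genes, n_regions)}.
    """
    genes, regions = defaultdict(set), defaultdict(set)
    for row in rows:
        genes[row["eRegulon_name"]].add(row["Gene"])
        regions[row["eRegulon_name"]].add(row["Region"])
    return {name: (len(genes[name]), len(regions[name])) for name in genes}


def signature_size(signature_name):
    """Parse the trailing '(39g)' / '(42r)' count from a signature name, else None."""
    match = re.search(r"\((\d+)[gr]\)$", signature_name)
    return int(match.group(1)) if match else None


# A few rows copied from the table above
rows = [
    {"Region": "chr5:125638655-125639155", "Gene": "Tmem132b", "eRegulon_name": "Ahctf1_direct_+/+"},
    {"Region": "chr11:35798818-35799318", "Gene": "Tenm2", "eRegulon_name": "Ahctf1_direct_+/+"},
    {"Region": "chr15:54571087-54571587", "Gene": "Enpp2", "eRegulon_name": "Zeb1_direct_-/-"},
]
print(summarize_eregulons(rows))
print(signature_size("Ahctf1_direct_+/+_(39g)"))
```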

<2>

Useless tip of the day:

ps -eo user,%cpu --no-headers | awk '{cpu[$1]+=$2} END {for (u in cpu) print u, cpu[u]}' | sort -k2 -nr | head

Shows which user is hogging the most CPU (baka!).
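The same per-user aggregation in Python, for when awk isn't at hand. This sketch parses the text printed by `ps -eo user,%cpu --no-headers`. Since `%CPU` is reported per process as a percentage of one core (averaged over each process's lifetime), a user total around 100 roughly means one core's worth of CPU.

```python
from collections import defaultdict


def cpu_by_user(ps_output, top=10):
    """Sum %CPU per user from `ps -eo user,%cpu --no-headers` text, highest first."""
    totals = defaultdict(float)
    for line in ps_output.splitlines():
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        user, cpu = parts
        totals[user] += float(cpu)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)[:top]
```

To feed it live data: `cpu_by_user(subprocess.run(["ps", "-eo", "user,%cpu", "--no-headers"], capture_output=True, text=True).stdout)`.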