Fixing slow or failed dataset downloads in scanpy
2022-06-10 09:21:00 【qq_45759229】
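I have had this problem for a long time. Calls such as
sc.datasets.paul15()
sc.datasets.pbmc3k_processed()
sc.datasets.pbmc68k_reduced()
keep failing to download for me, especially when run inside Jupyter, where they have almost never succeeded. A workaround is to download the file manually and load it from local disk.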
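The paul15 dataset
Copy the following URL into your browser:
http://falexwolf.de/data/paul15.h5
Download the file and save it to a local folder; downloading through the browser is actually very fast. Then load it with the following code: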
import scanpy as sc
import h5py
import anndata as ad
import numpy as np  # needed for np.intersect1d below
filename="/Users/xiaokangyu/scanpy_dataset/paul15/paul15.h5"
with h5py.File(filename, 'r') as f:
    X = f['data.debatched'][()]
    gene_names = f['data.debatched_rownames'][()].astype(str)
    cell_names = f['data.debatched_colnames'][()].astype(str)
    clusters = f['cluster.id'][()].flatten().astype(int)
    infogenes_names = f['info.genes_strings'][()].astype(str)
# each row has to correspond to an observation, therefore transpose
adata = ad.AnnData(X.transpose(), dtype=X.dtype)
adata.var_names = gene_names
adata.row_names = cell_names
# names reflecting the cell type identifications from the paper
cell_type = 6 * ['Ery']
cell_type += 'MEP Mk GMP GMP DC Baso Baso Mo Mo Neu Neu Eos Lymph'.split()
adata.obs['paul15_clusters'] = [f'{i}{cell_type[i-1]}' for i in clusters]
# make string annotations categorical (optional)
#_utils.sanitize_anndata(adata)
# just keep the first of the two equivalent names per gene
adata.var_names = [gn.split(';')[0] for gn in adata.var_names]
# remove 10 corrupted gene names
infogenes_names = np.intersect1d(infogenes_names, adata.var_names)
# restrict data array to the 3461 informative genes
adata = adata[:, infogenes_names]
# usually we'd set the root cell to an arbitrary cell in the MEP cluster
# adata.uns['iroot'] = np.flatnonzero(adata.obs['paul15_clusters'] == '7MEP')[0]
# here, set the root cell as in Haghverdi et al. (2016)
# note that other than in Matlab/R, counting starts at 0
adata.uns['iroot'] = 840
print(adata)
The adata obtained above is the same as the one returned by adata = sc.datasets.paul15(). Other datasets can be handled in the same way.
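If you would rather script the manual download than use a browser, the step can be sketched with the standard library alone. This is only a minimal sketch: the helper name fetch_if_missing and its timeout parameter are hypothetical and not part of scanpy. It fetches the file once, keeps it in a folder of your choice, and reuses the local copy on later runs.

```python
from pathlib import Path
from urllib.request import urlopen
import shutil


def fetch_if_missing(url, dest, timeout=60.0):
    """Download `url` to `dest` unless `dest` already exists; return `dest`.

    Hypothetical helper for illustration, not a scanpy function.
    """
    dest = Path(dest)
    if dest.exists():
        return dest  # reuse the copy that is already on disk
    dest.parent.mkdir(parents=True, exist_ok=True)
    # Write under a temporary name first, so an interrupted download
    # never leaves a truncated file at `dest`.
    tmp = dest.with_name(dest.name + ".part")
    with urlopen(url, timeout=timeout) as response, open(tmp, "wb") as out:
        shutil.copyfileobj(response, out)
    tmp.replace(dest)
    return dest


# Example call (needs network access, so it is left commented out):
# filename = fetch_if_missing(
#     "http://falexwolf.de/data/paul15.h5",
#     Path.home() / "scanpy_dataset" / "paul15" / "paul15.h5",
# )
```

The returned path can then be passed to h5py.File exactly as filename is in the code above.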