Metagenomics (personal notes)
2022-06-21 16:59:00 【违规账号247188】
source /home/dengqr/miniconda3/bin/activate
conda config --set auto_activate_base true
# Back up the data (leave the original raw data untouched)
cp -r 00data 00data2
# Confirm the data upload: list the .gz files
ls -l | grep "\.gz$" > 1.txt
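Beyond listing the files, it can help to check that every R1 file has its R2 mate. A minimal sketch, assuming the `_R1_001.fastq.gz` / `_R2_001.fastq.gz` naming seen in these notes; `check_pairs` is a hypothetical helper name, not part of any tool used here.

```shell
# Hypothetical helper: report R1 files in a directory that lack an R2 mate.
# Assumes the *_R1_001.fastq.gz / *_R2_001.fastq.gz naming used in these notes.
check_pairs() {
    dir="$1"
    missing=0
    for r1 in "$dir"/*_R1_001.fastq.gz; do
        [ -e "$r1" ] || continue            # the glob matched nothing
        r2="${r1%_R1_001.fastq.gz}_R2_001.fastq.gz"
        if [ ! -e "$r2" ]; then
            echo "missing mate for: $r1"
            missing=$((missing + 1))
        fi
    done
    echo "unpaired R1 files: $missing"
}

# Usage: check_pairs 00data
```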
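List the virtual environments
conda env list
Create a virtual environment to avoid polluting environment variables. If a package has been stuck at the Solving environment step for hours and cannot be installed, try a fresh environment
conda create -n meta
Activate the environment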
conda activate meta
### Quality assessment: fastqc
# = pins a version and -c selects the channel; both can speed up installation
# -y answers yes to the install prompt
conda install fastqc=0.11.9 -c bioconda -y
fastqc -v
### Report aggregation: multiqc
# Note: 1.7 runs in a Python 2 environment; the newer 1.8/1.9 releases require Python 3
conda install multiqc=1.9 -c bioconda -y
multiqc --version
### Quality-control pipeline: kneaddata
conda install kneaddata=0.7.4 -c bioconda -y
kneaddata --version
trimmomatic -version # 0.39
bowtie2 --version # 2.4.2
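db=/home/dengqr/dataset/metagenome/ # or simply cd into the target directory
# List the available databases
kneaddata_database
# Includes the human genome (bowtie2/bmtagger), the human transcriptome, ribosomal RNA and the mouse genome
# Download the human genome bowtie2 index (3.44 GB)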
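mkdir -p /home/dengqr/dataset/metagenome/kneaddata/human_genome
kneaddata_database --download human_genome bowtie2 /home/dengqr/dataset/metagenome/kneaddata/human_genome
# If the database download is slow or fails, the appendix has Baidu Cloud and domestic mirror links
# Note: these commands were first run with a stray "$" in front of the path ($/home/dengqr/...), which the shell keeps as a literal directory name, so the files were downloaded to $/home/dengqr/home/dengqr/dataset/metagenome/kneaddata/ and had to be moved:
#mv /home/dengqr/$/home/dengqr/dataset/metagenome/kneaddata/ /home/dengqr/dataset/metagenome/ [done]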
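A stray `$` in front of a path is easy to avoid by storing the database root in a variable (no spaces around `=`) and building paths from it. A minimal sketch; `kneaddata_dir` is a hypothetical helper, and the root path is the one used in these notes.

```shell
# No spaces around "=" in an assignment: "db= /path" would instead try to
# execute /path with db set to an empty string.
db=/home/dengqr/dataset/metagenome

# Hypothetical helper: build the human_genome index directory under a given root.
kneaddata_dir() {
    printf '%s/kneaddata/human_genome\n' "${1%/}"   # strip one trailing slash, if any
}

kneaddata_dir "$db"
```

The printed path can then be passed to `mkdir -p` and to `kneaddata_database --download human_genome bowtie2`.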
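## 1.2 (Optional) FastQC quality assessment
# Record the software version the first time a tool is used; it must be stated clearly in the methods section of a paper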
fastqc --version # 0.11.8
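Since the versions have to be reported later, it may be convenient to collect them in one file. A rough sketch, assuming each tool accepts `--version` (trimmomatic in these notes uses `-version` instead, so it would need separate handling); `log_versions` is a hypothetical helper name.

```shell
# Hypothetical helper: print "tool<TAB>version" for each tool name given,
# or "tool<TAB>not found" when the tool is not on PATH.
log_versions() {
    for tool in "$@"; do
        if command -v "$tool" >/dev/null 2>&1; then
            # Assumes the tool accepts --version; keep only the first output line.
            ver=$("$tool" --version 2>&1 | head -n 1)
            printf '%s\t%s\n' "$tool" "$ver"
        else
            printf '%s\tnot found\n' "$tool"
        fi
    done
}

# Usage: log_versions fastqc multiqc kneaddata bowtie2 > versions.txt
```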
# time reports the run time; fastqc performs the quality assessment
# *.gz are the raw data files; -t sets the number of threads
time fastqc seq/*.gz -t 2
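▼ # 32 threads took 27 minutes; tip: the thread count can be raised further
mkdir -p 01fastqc # fastqc does not create the -o directory itself
time fastqc -o 01fastqc 00data/*.gz -t 32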
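multiqc merges the individual fastqc reports into a single combined report, which makes it easy to view and compare them in bulk
# Record the software version
multiqc --version # 1.5
# Collect the fastqc reports under seq/ and write multiqc_report.html to result/qc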
multiqc -d seq/ -o result/qc
▼
multiqc -d 01fastqc/ -o 02fastqc_result/
Open multiqc_report.html in the result/qc directory (right-hand panel)
# Host removal
# Index directory "/home/dengqr/dataset/metagenome/kneaddata/human_genome/hg37dec_v0.1.1.bt2"
kneaddata -h
# Paired-end reads do not match after host removal: rename the sequences, but check the headers first
zcat 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz |head -n 6
zcat 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz |head -n 6
mkdir -p 00datatrain
cp 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz 00datatrain/
cp 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz 00datatrain/
zcat 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz |head -n 6
zcat 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz |head -n 6
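(Optional) Rename the sequences to resolve duplicate paired-end read IDs in NCBI SRA data; see [MPB: Analysis pipeline and FAQ for quality control and host removal of shotgun metagenomic sequencing data](https://mp.weixin.qq.com/s/ovL4TwalqZvwx5qWb5fsYA).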
gunzip 00datatrain/*.gz
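# R1 headers get the \1 tag and R2 headers the \2 tag
sed -i '1~4 s/$/\\1/g' 00datatrain/*R1_001.fastq
sed -i '1~4 s/$/\\2/g' 00datatrain/*R2_001.fastq
# Check again whether the read labels are still duplicated
# (the files are uncompressed at this point, so use head rather than zcat)
head -n 6 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq
head -n 6 00datatrain/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq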
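The renaming step edits every fourth line starting from line 1, which is the header line of each FASTQ record. The sketch below shows the same selection with awk (`NR % 4 == 1` picks the same lines as the GNU sed address `1~4`) on a copy rather than in place. Note the assumptions: `tag_reads` is a hypothetical helper, and it appends the common `/1` and `/2` suffixes rather than the backslash tags used in the notes.

```shell
# Hypothetical helper: append "/1" or "/2" to every FASTQ header line
# (lines 1, 5, 9, ...). $1 is the FASTQ file, $2 is the read number.
tag_reads() {
    awk -v n="$2" 'NR % 4 == 1 { print $0 "/" n; next } { print }' "$1"
}

# Usage (writes new files instead of editing in place):
# tag_reads sample_R1_001.fastq 1 > sample_R1_001.tagged.fastq
# tag_reads sample_R2_001.fastq 2 > sample_R2_001.tagged.fastq
```

Writing to a new file keeps the untagged copy available for comparison if the headers turn out not to need renaming.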
# Compress the results to save space
gzip seq/*.fq
# pigz is a parallel version of gzip; if it is not installed, use gzip instead
pigz seq/*.fq
time kneaddata -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \
-o 03qc -v -t 32 --remove-intermediate-output \
--reorder --bowtie2-options "--very-sensitive --dovetail" \
-db kneaddata/human_genome
### Java mismatch: reinstall the Java runtime
If the error Unrecognized option: -d64 appears, installing Java fixes it:
conda install -c cyclus java-jdk
/home/dengqr/miniconda2/bin/trimmomatic
type trimmomatic
type fastqc
"/home/dengqr/miniconda3/share/trimmomatic-0.39-2/"
time kneaddata -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \
-o 03qc -v -t 32 --remove-intermediate-output \
-db kneaddata/human_genome
time kneaddata -i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \
-o 03qc -v -t 32 --remove-intermediate-output \
--trimmomatic /home/dengqr/miniconda2/bin/trimmomatic \
--reorder --bowtie2-options "--very-sensitive --dovetail" \
-db kneaddata/human_genome
time kneaddata -t 40 -v \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \
-o 03qc/ \
--trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \
--max-memory 80g \
--trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \
-db kneaddata/human_genome/ \
--reorder --bowtie2-options "--very-sensitive --dovetail" \
--remove-intermediate-output
"/home/dengqr/miniconda2/bin/trimmomatic-0.33.jar"
"/home/dengqr/miniconda3/bin/trimmomatic"
▼
time kneaddata \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \
-o temp/qc -v -t 40 --remove-intermediate-output \
--trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \
--trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \
--reorder --bowtie2-options "--very-sensitive --dovetail" \
-db kneaddata/human_genome
▼
time kneaddata \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R2_001.fastq.gz \
-i 00data/OSCC35A_20211015NA_AGGCAGAA_S156_L002_R1_001.fastq.gz \
-o temp1/qc -v -t 40 --remove-intermediate-output \
--trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \
--trimmomatic-options "SLIDINGWINDOW:4:20 MINLEN:50" \
--reorder --bowtie2-options "--very-sensitive --dovetail" \
-db kneaddata/human_genome
▼
# Use kneaddata_read_count_table, a helper tool shipped with kneaddata
kneaddata_read_count_table --input temp1/qc \
--output temp1/kneaddata.txt
# Keep only the key result columns
cut -f 1,2,4,12,13 temp1/kneaddata.txt | sed 's/_1_kneaddata//' > temp1/qc/sum.txt
cat temp1/qc/sum.txt
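To judge how much data survives quality control and host removal, the read counts in the summary table can be turned into a percentage. A minimal sketch, assuming a tab-separated table with one header row; `retained_pct` is a hypothetical helper, and which columns hold the raw and final counts must be checked against the actual header of sum.txt.

```shell
# Hypothetical helper: for each sample row, print the sample name and the
# percentage of reads kept (final / raw * 100).
# $1 = table file, $2 = raw-count column, $3 = final-count column (1-based).
retained_pct() {
    awk -F '\t' -v r="$2" -v f="$3" '
        NR == 1 { next }                                  # skip the header row
        $r > 0  { printf "%s\t%.1f\n", $1, 100 * $f / $r }
    ' "$1"
}

# Usage, assuming column 2 holds raw reads and column 4 final reads:
# retained_pct temp1/qc/sum.txt 2 4
```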
# QC results
fastqc temp1/qc/*_1_kneaddata_paired_*.fastq -t 2 -o temp1
multiqc -d temp1/ -o temp1/
fastqc temp1/qc/*R2_001_kneaddata_paired_*.fastq -t 2 -o temp1
multiqc -d temp1/ -o temp1/
OSCC35A_20211015NA_AGGCAGAA_S156_L002_
"/home/dengqr/dataset/metagenome/00data/Control105A_R1_001.fastq.gz"
# Run multiple samples in parallel
→ Remember to accept the citation notice # type "will cite" to promise to cite the parallel software
parallel --citation
parallel -j 3 --xapply "echo 00data/{1}_R1_001.fastq.gz 00data/{1}_R2_001.fastq.gz" ::: `tail -n+2 metadata.txt|cut -f1`
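The same dry run can be done without parallel, which is handy for checking the sample IDs before launching a long job. A sketch under the assumption that metadata.txt is tab-separated with a header row and the sample ID in the first column, as the `tail -n+2 | cut -f1` call implies; `sample_ids` and `list_pairs` are hypothetical helper names.

```shell
# Hypothetical helper: print the sample IDs from the first column of a
# tab-separated metadata file, skipping the header row.
sample_ids() {
    tail -n +2 "$1" | cut -f 1
}

# Hypothetical helper: print the R1/R2 paths that would be handed to kneaddata.
# $1 = metadata file, $2 = directory holding the fastq.gz files.
list_pairs() {
    sample_ids "$1" | while IFS= read -r id; do
        [ -n "$id" ] || continue        # ignore blank lines
        printf '%s\t%s\n' "$2/${id}_R1_001.fastq.gz" "$2/${id}_R2_001.fastq.gz"
    done
}

# Usage: list_pairs metadata.txt 00data
```

The IDs in metadata.txt need to match the full file-name prefix before `_R1_001.fastq.gz`, otherwise the generated paths will not exist.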
time parallel -j 2 --xapply \
"kneaddata -i 00data/{1}_R1_001.fastq.gz \
-i 00data/{1}_R2_001.fastq.gz \
-o temp/qc -v -t 40 --remove-intermediate-output \
--trimmomatic /home/dengqr/miniconda3/share/trimmomatic-0.39-2/ \
--trimmomatic-options 'SLIDINGWINDOW:4:20 MINLEN:50' \
--reorder --bowtie2-options '--very-sensitive --dovetail' \
-db kneaddata/human_genome" ::: `tail -n+2 metadata.txt|cut -f1`