December 4, 2021 [Metagenome] - Progress summary: building a metagenome analysis pipeline
2022-06-30 07:38:00 【Muyiqing】
- Analysis framework
- Quality control
- Filtering: Trimmomatic
- trimmomatic PE -phred33 {input.R1} {input.R2} {output.R1_PE} {output.R1_UNPE} {output.R2_PE} {output.R2_UNPE} LEADING:3 TRAILING:3 SLIDINGWINDOW:5:20 MINLEN:50
- Duplicate removal: FastUniq
- echo \"{input.R1}\" > {params.sample_ID}_merge.fastuniq ;echo \"{input.R2}\" >> {params.sample_ID}_merge.fastuniq;fastuniq -i {params.sample_ID}_merge.fastuniq -t q -o {output.R1_uniq} -p {output.R2_uniq} -c 0
- Host removal
- bwa mem -k 30 -R \'@RG\\tID:foo\\tSM:bar\\tLB:Abace\' -t {threads} {params.genome} {input.R1} {input.R2} > {output}
- samtools view -bS {input} -o {output}
- samtools sort {input} -o {output}
- samtools view {input.bam_file}|awk -F '\t' '$3==\"*\"{{print $1}}'|uniq|seqtk subseq {input.R1} - > {output.nohost_file_R1};samtools view {input.bam_file}|awk -F '\t' '$3==\"*\"{{print $1}}'|uniq|seqtk subseq {input.R2} - > {output.nohost_file_R2}
- Report: FastQC
- gzip -c {input.nohost_file_R1} > {output.clean_R1}; gzip -c {input.nohost_file_R2} > {output.clean_R2};fastqc -o {output.fastqc_dir} --extract -q {output.clean_R1} {output.clean_R2}
- Reads-based
- Taxonomic annotation: MetaPhlAn
- cat {input.clean_R1} {input.clean_R2} > {params.combine_file};humann --input {params.combine_file} --output {output.result}
- Functional annotation: HUMAnN
- humann --input 02.align/nohost/4a_combine.fq --output 05.Annotation/4a/
- Taxonomic annotation: Kraken2
- Contig-based
- Assembly: MEGAHIT
- megahit -t {threads} -1 {input.clean_R1} -2 {input.clean_R2} -o {output.dir} --k-min 35 --k-max 95 --k-step 20 --min-contig-len 500 -m 0.1;cp {output.dir}/final.contigs.fa {output.assembly_fa}
- Coverage statistics: pileup
- pileup.sh in={input.bam} ref={input.genome} out={output.covstats} overwrite=true
- Gene prediction: Prodigal
- prodigal -i {input.contig} -o {output.gff} -f gff -p meta
- Specialized annotation (PHI, VFDB, TCDB, SignalP, dbCAN, CARD)
- cd {params.sample_id};mkdir -p card dbCAN phi vfdb tcdb signalp;cd phi;/home/tanchaojun/anaconda3/envs/wgs/bin/diamond blastx -p 6 -k 1 -e 0.00001 --db /home/tanchaojun/database/phi/phi --query ../../../{input.genomic_cds} --out phi_result;cd ../vfdb;/home/tanchaojun/anaconda3/envs/wgs/bin/diamond blastx -p 6 -k 1 -e 0.00001 --db /home/tanchaojun/database/vfdb/vfdb_setA --query ../../../{input.genomic_cds} --out vfdb_result;cd ../tcdb;/home/tanchaojun/anaconda3/envs/wgs/bin/diamond blastx -p 6 -k 1 -e 0.00001 --db /home/tanchaojun/database/tcdb/tcdb --query ../../../{input.genomic_cds} --out tcdb_result;cd ../signalp;signalp -fasta ../../../{input.genomic_cds} -gff3 -mature -prefix signalp_result;cd ../dbCAN;/home/tanchaojun/anaconda3/envs/run_dbcan/bin/run_dbcan.py --db_dir /home/tanchaojun/database/dbCAN --hmm_cov 0.35 --hmm_eval 1e-15 --hmm_cpu 8 --dia_eval 1e-102 --dia_cpu 8 --out_dir ./ --out_pre dbCAN_result ../../../{input.genomic_cds} prok;cd ../card;/home/tanchaojun/anaconda3/envs/rgi/bin/rgi main -n 8 --input_sequence ../../../{input.genomic_cds} --output_file card_result --clean;
- eggNOG, CAZy, and COG annotation
- cp {input.cds_fa} {params.sample_id}/cds.fa;cd {params.sample_id};mkdir -p eggnog COG;cd eggnog;emapper.py --cpu 20 --itype CDS -i ../../../{input.cds_fa} -o out --override -m diamond --evalue 0.001 --score 60 --pident 40 --query_cover 20 --subject_cover 20 --tax_scope auto --target_orthologs all --go_evidence non-electronic --pfam_realign none --report_orthologs --decorate_gff yes --data_dir /home/tanchaojun/anaconda3/envs/eggnog/lib/python3.7/site-packages/data;cd ../../../
- cp scripts/anno/COG/* {params.sample_id};cp scripts/anno/eggnog/* {params.sample_eggnog};cd {params.sample_eggnog};perl emapper2anno.pl out.emapper.annotations > ../COG/eggnog.anno.xls;cd ../COG;python COG.py fun2003-2014.tab ../eggnog/out.emapper.annotations;/usr/bin/Rscript 7.eggnog.plot.R DrawAnnotationPic.R.txt COG.pdf
- mkdir -p {params.CAZy_dir}&&cd {params.CAZy_dir};/home/tanchaojun/anaconda3/envs/run_dbcan/bin/run_dbcan.py --db_dir /home/tanchaojun/database/dbCAN --hmm_cov 0.35 --hmm_eval 1e-15 --hmm_cpu 8 --dia_eval 1e-102 --dia_cpu 8 --out_dir ./ --out_pre dbCAN_result ../../../{input.cds_fa} prok
- Taxonomic annotation: NR (not yet added)
- Downstream analysis (not completed yet)
- Species and functional composition analysis
- Venn
- Heatmap
- Species and function diagrams
- Species composition diagram
- Sample comparison and analysis
- UPGMA Clustering analysis
- Hierarchical clustering heat map
- PCA
- PCoA
- Comparison and analysis between groups
- AMOVA
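The Trimmomatic command in the quality-control step chains four steps on Phred+33 scores: LEADING:3, TRAILING:3, SLIDINGWINDOW:5:20 and MINLEN:50. Below is a simplified Python sketch of what those steps do to a single read. The function names are hypothetical, and the window handling is simplified: it cuts at the start of the first failing window. Treat it as an illustration, not as Trimmomatic's actual implementation.

```python
"""Simplified sketch of the Trimmomatic steps LEADING, TRAILING,
SLIDINGWINDOW and MINLEN (illustration only, not Trimmomatic's code)."""
from typing import List, Optional, Tuple


def phred33(qual: str) -> List[int]:
    # Phred+33 encoding: score = ASCII code minus 33
    return [ord(c) - 33 for c in qual]


def trim_read(seq: str, qual: str, leading: int = 3, trailing: int = 3,
              window: int = 5, min_window_q: float = 20.0,
              min_len: int = 50) -> Optional[Tuple[str, str]]:
    q = phred33(qual)
    start, end = 0, len(q)
    # LEADING:3 - drop 5' bases while their quality is below 3
    while start < end and q[start] < leading:
        start += 1
    # TRAILING:3 - drop 3' bases while their quality is below 3
    while end > start and q[end - 1] < trailing:
        end -= 1
    # SLIDINGWINDOW:5:20 - scan from the 5' end and cut at the start of the
    # first 5-base window whose mean quality is below 20 (simplified)
    for i in range(start, end - window + 1):
        if sum(q[i:i + window]) / window < min_window_q:
            end = i
            break
    # MINLEN:50 - discard the read if it is now shorter than 50 bases
    if end - start < min_len:
        return None
    return seq[start:end], qual[start:end]
```

For example, a 60-base read with a short block of Q10 bases starting at position 30 is cut just before that block. It then falls under MINLEN:50 and is dropped.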
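The duplicate-removal step passes both mates of each pair to FastUniq. Below is a minimal sketch of paired-end deduplication. It assumes a pair is a duplicate only when both the R1 and R2 sequences exactly match an earlier pair. This simplifies what FastUniq does, and the record layout and function name are hypothetical.

```python
"""Sketch of exact-duplicate removal for paired-end reads (simplified;
not FastUniq's actual algorithm)."""
from typing import Iterable, Iterator, Tuple

# One FASTQ record as (name, sequence, quality)
Record = Tuple[str, str, str]


def dedup_pairs(pairs: Iterable[Tuple[Record, Record]]) -> Iterator[Tuple[Record, Record]]:
    seen = set()
    for r1, r2 in pairs:
        # Compare the two sequences only; names and qualities are ignored
        key = (r1[1], r2[1])
        if key in seen:
            continue
        seen.add(key)
        yield r1, r2  # keep the first occurrence of each pair
```

Keying on the pair rather than on each mate separately means a pair survives as long as either mate differs from every earlier pair.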
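The host-removal step keeps reads whose SAM reference name (column 3, RNAME) is `*`, then pulls them out of the FASTQ files with `seqtk subseq`. Below is a Python sketch of the same filter. It assumes SAM records are text lines as printed by `samtools view`, and that FASTQ read names match the SAM QNAME up to the first whitespace. The function names are hypothetical.

```python
"""Sketch of the host-removal filter: keep reads whose SAM RNAME is '*'."""
from typing import Iterable, List, Set


def unmapped_names(sam_lines: Iterable[str]) -> Set[str]:
    """Mirrors the awk filter ($3 == "*", print $1) followed by uniq."""
    names = set()
    for line in sam_lines:
        if not line or line.startswith("@"):  # skip SAM header lines
            continue
        fields = line.rstrip("\n").split("\t")
        if fields[2] == "*":      # column 3 (RNAME) is '*' for unplaced reads
            names.add(fields[0])  # column 1 is the read name (QNAME)
    return names


def subseq_fastq(fastq_lines: List[str], keep: Set[str]) -> List[str]:
    """Rough stand-in for `seqtk subseq reads.fq names`: keep listed reads."""
    out = []
    for i in range(0, len(fastq_lines), 4):  # FASTQ records are 4 lines each
        record = fastq_lines[i:i + 4]
        name = record[0][1:].split()[0]  # header without '@', first word only
        if name in keep:
            out.extend(record)
    return out
```

One consequence of this filter: under the SAM specification, an unmapped read whose mate is mapped is normally given its mate's RNAME. The `$3=="*"` test therefore mainly keeps pairs in which both mates are unmapped.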
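MEGAHIT assembles iteratively over a list of k-mer sizes. With `--k-min 35 --k-max 95 --k-step 20`, that list should be 35, 55, 75, 95. The sketch below derives the list. The odd/even checks follow MEGAHIT's help text. Appending k-max when the step does not land on it exactly is an assumption about MEGAHIT's behavior, and the function name is hypothetical.

```python
"""Sketch: the k-mer sizes implied by MEGAHIT's --k-min/--k-max/--k-step."""
from typing import List


def megahit_k_list(k_min: int, k_max: int, k_step: int) -> List[int]:
    if k_min > k_max:
        raise ValueError("k-min must not exceed k-max")
    if k_min % 2 == 0 or k_max % 2 == 0:
        raise ValueError("k-min and k-max must be odd")
    if k_step % 2 != 0:
        raise ValueError("k-step must be even")
    ks = list(range(k_min, k_max + 1, k_step))
    # Assumption: if the step overshoots, k-max is still used as the last k
    if ks[-1] != k_max:
        ks.append(k_max)
    return ks
```

Usage: `megahit_k_list(35, 95, 20)` gives the four iterations used by this pipeline.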
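`pileup.sh` reports per-contig statistics such as average fold coverage and the percentage of bases covered. Below is a sketch of how those two numbers can be computed from read alignments on one contig. It assumes 0-based, half-open (start, end) alignment intervals, and the function name is hypothetical. It is not BBMap's implementation.

```python
"""Sketch: average fold coverage and percent of bases covered on one contig."""
from typing import Iterable, Tuple


def contig_coverage(length: int,
                    alignments: Iterable[Tuple[int, int]]) -> Tuple[float, float]:
    depth = [0] * length
    for start, end in alignments:
        # Count each aligned base, clipped to the contig boundaries
        for pos in range(max(start, 0), min(end, length)):
            depth[pos] += 1
    avg_fold = sum(depth) / length  # mean per-base depth
    covered_percent = 100.0 * sum(1 for d in depth if d > 0) / length
    return avg_fold, covered_percent
```

For a 10-base contig with alignments (0, 5) and (3, 8), this gives an average fold of 1.0 with 80% of bases covered.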
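With `-f gff -p meta`, Prodigal writes its gene predictions as GFF. Counting predicted genes per contig is a quick sanity check before annotation. The sketch below assumes standard 9-column, tab-separated GFF records with the feature type `CDS` in column 3. The function name is hypothetical.

```python
"""Sketch: count predicted CDS features per contig from GFF lines."""
from collections import Counter
from typing import Iterable


def genes_per_contig(gff_lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    for line in gff_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and GFF comment/header lines
        cols = line.rstrip("\n").split("\t")
        if len(cols) == 9 and cols[2] == "CDS":
            counts[cols[0]] += 1  # column 1 is the contig (seqid)
    return counts
```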
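The DIAMOND blastx searches against PHI, VFDB and TCDB write BLAST-style tabular output by default. The 12 columns are qseqid, sseqid, pident, length, mismatch, gapopen, qstart, qend, sstart, send, evalue and bitscore. Because `-k 1` already limits each query to one target, the sketch below mainly parses the table and re-applies the pipeline's `-e 0.00001` cutoff. Keeping the top-bitscore row per query is a defensive assumption, and the function name is hypothetical.

```python
"""Sketch: parse DIAMOND tabular output and keep the best hit per query."""
from typing import Dict, Iterable, Tuple

COLUMNS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
           "qstart", "qend", "sstart", "send", "evalue", "bitscore"]


def best_hits(lines: Iterable[str],
              max_evalue: float = 1e-5) -> Dict[str, Tuple[str, float, float]]:
    """Return {query: (subject, evalue, bitscore)} for hits passing the cutoff."""
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        if not line.strip():
            continue
        row = dict(zip(COLUMNS, line.rstrip("\n").split("\t")))
        evalue, bitscore = float(row["evalue"]), float(row["bitscore"])
        if evalue > max_evalue:
            continue  # same threshold as -e 0.00001
        q = row["qseqid"]
        if q not in best or bitscore > best[q][2]:
            best[q] = (row["sseqid"], evalue, bitscore)
    return best
```

The length of the returned dictionary gives the number of genes with a hit in that database.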
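The COG step summarizes `out.emapper.annotations` by functional category before plotting. The source does not show how COG.py counts categories, so the sketch below uses one common convention. It assumes the eggNOG-mapper v2.1 layout: a header line starting with `#query` that includes a `COG_category` column. Multi-letter values such as `KL` count once per letter, and `-` means no category. The function name is hypothetical.

```python
"""Sketch: tally COG functional categories from eggNOG-mapper annotations."""
from collections import Counter
from typing import Iterable, Optional


def cog_category_counts(lines: Iterable[str]) -> Counter:
    counts: Counter = Counter()
    col: Optional[int] = None
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("#query"):
            # Locate the COG_category column from the header (assumed name)
            col = line.lstrip("#").split("\t").index("COG_category")
            continue
        if not line or line.startswith("#") or col is None:
            continue  # skip '##' metadata lines and anything before the header
        value = line.split("\t")[col]
        if value != "-":
            counts.update(value)  # 'KL' adds one to K and one to L
    return counts
```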
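The planned Venn diagrams in the post-analysis section come down to set operations on the taxa (or functions) detected in each sample. Below is a sketch that assumes each sample has already been reduced to a set of names. The sample names, taxon names and function name are hypothetical.

```python
"""Sketch of the set logic behind a Venn diagram of taxa per sample."""
from typing import Dict, Set


def venn_regions(samples: Dict[str, Set[str]]) -> Dict[str, Set[str]]:
    """Taxa shared by all samples, plus taxa unique to each sample."""
    names = list(samples)
    regions = {"shared_by_all": set.intersection(*samples.values())}
    for name in names:
        # Union of every other sample's taxa
        others = set().union(*(samples[o] for o in names if o != name))
        regions[f"only_{name}"] = samples[name] - others
    return regions
```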
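UPGMA clustering of samples is average-linkage hierarchical clustering on a distance matrix. The source does not say which distance the pipeline will use. Bray-Curtis dissimilarity on abundance profiles is assumed here because it is a common choice for community data. The pure-Python sketch below builds the tree as a Newick-like string without branch lengths. A real analysis would more likely use scipy or an R package, and the function names are hypothetical.

```python
"""Sketch: Bray-Curtis distances between samples, then UPGMA clustering."""
from itertools import combinations
from typing import Dict, List, Tuple


def bray_curtis(u: List[float], v: List[float]) -> float:
    """sum|u_i - v_i| / sum(u_i + v_i) for two abundance vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def upgma(profiles: Dict[str, List[float]]) -> str:
    """Return the UPGMA tree as a nested string, e.g. '((A,B),(C,D))'."""
    # cluster label -> (subtree string, number of samples in the cluster)
    clusters: Dict[str, Tuple[str, int]] = {n: (n, 1) for n in profiles}
    dist: Dict[Tuple[str, str], float] = {}
    for a, b in combinations(profiles, 2):
        dist[(a, b)] = dist[(b, a)] = bray_curtis(profiles[a], profiles[b])
    while len(clusters) > 1:
        # Merge the closest pair of clusters (first pair wins ties)
        a, b = min(combinations(clusters, 2), key=lambda p: dist[p])
        (tree_a, n_a), (tree_b, n_b) = clusters.pop(a), clusters.pop(b)
        new = f"({tree_a},{tree_b})"
        for c in clusters:
            # UPGMA: size-weighted average of the merged clusters' distances
            d = (n_a * dist[(a, c)] + n_b * dist[(b, c)]) / (n_a + n_b)
            dist[(new, c)] = dist[(c, new)] = d
        clusters[new] = (new, n_a + n_b)
    return next(iter(clusters.values()))[0]
```

Usage: with profiles A = [10, 0, 0], B = [9, 1, 0], C = [0, 0, 10] and D = [0, 1, 9], the two similar pairs cluster first and the tree is `((A,B),(C,D))`.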