December 4, 2021 [Metagenome] - Progress summary of the metagenomic pipeline build
2022-06-30 07:38:00 【Muyiqing】
- Analysis framework
- Quality control
- Filtering
- trimmomatic PE -phred33 {input.R1} {input.R2} {output.R1_PE} {output.R1_UNPE} {output.R2_PE} {output.R2_UNPE} LEADING:3 TRAILING:3 SLIDINGWINDOW:5:20 MINLEN:50
- Duplicate removal
- echo \"{input.R1}\" > {params.sample_ID}_merge.fastuniq ;echo \"{input.R2}\" >> {params.sample_ID}_merge.fastuniq;fastuniq -i {params.sample_ID}_merge.fastuniq -t q -o {output.R1_uniq} -p {output.R2_uniq} -c 0
- Host removal
- bwa mem -k 30 -R \'@RG\\tID:foo\\tSM:bar\\tLB:Abace\' -t {threads} {params.genome} {input.R1} {input.R2} > {output}
- samtools view -bS {input} -o {output}
- samtools sort {input} -o {output}
- samtools view {input.bam_file}|awk -F '\t' '$3==\"*\"{{print $1}}'|uniq|seqtk subseq {input.R1} - > {output.nohost_file_R1};samtools view {input.bam_file}|awk -F '\t' '$3==\"*\"{{print $1}}'|uniq|seqtk subseq {input.R2} - > {output.nohost_file_R2}
- Report
- gzip -c {input.nohost_file_R1} > {output.clean_R1}; gzip -c {input.nohost_file_R2} > {output.clean_R2};fastqc -o {output.fastqc_dir} --extract -q {output.clean_R1} {output.clean_R2}
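
The host-removal commands above keep only reads whose SAM reference name (column 3) is `*`. The following is a minimal Python sketch of that selection, assuming tab-separated SAM records; the function name `unmapped_read_names` and the toy records are made up for illustration and are not part of the pipeline.

```python
from typing import Iterable, List


def unmapped_read_names(sam_lines: Iterable[str]) -> List[str]:
    """Return read names whose RNAME (column 3) is '*', in first-seen order."""
    seen = set()
    names = []
    for line in sam_lines:
        # Skip blank lines and SAM header lines (headers start with '@').
        if not line.strip() or line.startswith("@"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 3:
            continue
        qname, rname = fields[0], fields[2]
        if rname == "*" and qname not in seen:
            seen.add(qname)
            names.append(qname)
    return names


# Toy records (hypothetical): read1 has both mates unmapped, read2 is a
# mapped pair, read3 has one mapped mate and one unmapped mate.
toy_sam = [
    "@SQ\tSN:host_chr1\tLN:1000",
    "read1\t77\t*\t0\t0\t*\t*\t0\t0\tACGT\tIIII",
    "read1\t141\t*\t0\t0\t*\t*\t0\t0\tTTGA\tIIII",
    "read2\t99\thost_chr1\t10\t60\t4M\t=\t50\t44\tACGT\tIIII",
    "read3\t73\thost_chr1\t20\t60\t4M\t=\t20\t0\tACGT\tIIII",
    "read3\t133\thost_chr1\t20\t0\t*\t=\t20\t0\tTTGA\tIIII",
]
print(unmapped_read_names(toy_sam))
```

In the toy data, the unmapped mate of read3 carries its mapped mate's reference name, so a filter on column 3 keeps only pairs in which neither mate aligned to the host. Unlike `uniq`, which collapses adjacent duplicates only, this sketch deduplicates names across the whole input.
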
- Read-based
- Taxonomic annotation: MetaPhlAn
- cat {input.clean_R1} {input.clean_R2} > {params.combine_file};humann --input {params.combine_file} --output {output.result}
- Functional annotation: HUMAnN
- humann --input 02.align/nohost/4a_combine.fq --output 05.Annotation/4a/
- Taxonomic annotation: Kraken2
- Contig-based
- Assembly: MEGAHIT
- megahit -t {threads} -1 {input.clean_R1} -2 {input.clean_R2} -o {output.dir} --k-min 35 --k-max 95 --k-step 20 --min-contig-len 500 -m 0.1;cp {output.dir}/final.contigs.fa {output.assembly_fa}
- Coverage statistics: pileup.sh
- pileup.sh in={input.bam} ref={input.genome} out={output.covstats} overwrite=true
- Gene prediction: Prodigal
- prodigal -i {input.contig} -o {output.gff} -f gff -p meta
- Specialized database annotation (PHI, VFDB, TCDB, SignalP, dbCAN, CARD)
- cd {params.sample_id};mkdir -p card dbCAN phi vfdb tcdb signalp;cd phi;/home/tanchaojun/anaconda3/envs/wgs/bin/diamond blastx -p 6 -k 1 -e 0.00001 --db /home/tanchaojun/database/phi/phi --query ../../../{input.genomic_cds} --out phi_result;cd ../vfdb;/home/tanchaojun/anaconda3/envs/wgs/bin/diamond blastx -p 6 -k 1 -e 0.00001 --db /home/tanchaojun/database/vfdb/vfdb_setA --query ../../../{input.genomic_cds} --out vfdb_result;cd ../tcdb;/home/tanchaojun/anaconda3/envs/wgs/bin/diamond blastx -p 6 -k 1 -e 0.00001 --db /home/tanchaojun/database/tcdb/tcdb --query ../../../{input.genomic_cds} --out tcdb_result;cd ../signalp;signalp -fasta ../../../{input.genomic_cds} -gff3 -mature -prefix signalp_result;cd ../dbCAN;/home/tanchaojun/anaconda3/envs/run_dbcan/bin/run_dbcan.py --db_dir /home/tanchaojun/database/dbCAN --hmm_cov 0.35 --hmm_eval 1e-15 --hmm_cpu 8 --dia_eval 1e-102 --dia_cpu 8 --out_dir ./ --out_pre dbCAN_result ../../../{input.genomic_cds} prok;cd ../card;/home/tanchaojun/anaconda3/envs/rgi/bin/rgi main -n 8 --input_sequence ../../../{input.genomic_cds} --output_file card_result --clean;
- eggNOG, CAZy, and COG annotation
- cp {input.cds_fa} {params.sample_id}/cds.fa;cd {params.sample_id};mkdir -p eggnog COG;cd eggnog;emapper.py --cpu 20 --itype CDS -i ../../../{input.cds_fa} -o out --override -m diamond --evalue 0.001 --score 60 --pident 40 --query_cover 20 --subject_cover 20 --tax_scope auto --target_orthologs all --go_evidence non-electronic --pfam_realign none --report_orthologs --decorate_gff yes --data_dir /home/tanchaojun/anaconda3/envs/eggnog/lib/python3.7/site-packages/data;cd ../../../
- cp scripts/anno/COG/* {params.sample_id};cp scripts/anno/eggnog/* {params.sample_eggnog};cd {params.sample_eggnog};perl emapper2anno.pl out.emapper.annotations > ../COG/eggnog.anno.xls;cd ../COG;python COG.py fun2003-2014.tab ../eggnog/out.emapper.annotations;/usr/bin/Rscript 7.eggnog.plot.R DrawAnnotationPic.R.txt COG.pdf
- mkdir -p {params.CAZy_dir}&&cd {params.CAZy_dir};/home/tanchaojun/anaconda3/envs/run_dbcan/bin/run_dbcan.py --db_dir /home/tanchaojun/database/dbCAN --hmm_cov 0.35 --hmm_eval 1e-15 --hmm_cpu 8 --dia_eval 1e-102 --dia_cpu 8 --out_dir ./ --out_pre dbCAN_result ../../../{input.cds_fa} prok
- Taxonomic annotation: NR (not yet integrated)
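
The DIAMOND searches above are run with `-k 1 -e 0.00001` and no explicit output format, so the result files (`phi_result`, `vfdb_result`, `tcdb_result`) should be in DIAMOND's default 12-column BLAST tabular layout. Under that assumption, a downstream summary could pick the best hit per query roughly as follows. The function name `best_hits` and the toy rows (including the subject IDs) are hypothetical.

```python
from typing import Dict, Iterable, Tuple


def best_hits(
    lines: Iterable[str], max_evalue: float = 1e-5
) -> Dict[str, Tuple[str, float, float]]:
    """Map each query to (subject, evalue, bitscore) of its highest-scoring hit.

    Assumes 12 tab-separated columns: qseqid, sseqid, pident, length,
    mismatch, gapopen, qstart, qend, sstart, send, evalue, bitscore.
    """
    best: Dict[str, Tuple[str, float, float]] = {}
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        cols = line.split("\t")
        if len(cols) < 12:
            continue
        query, subject = cols[0], cols[1]
        evalue, bitscore = float(cols[10]), float(cols[11])
        # Same cutoff as the -e 0.00001 option used in the commands.
        if evalue > max_evalue:
            continue
        if query not in best or bitscore > best[query][2]:
            best[query] = (subject, evalue, bitscore)
    return best


# Toy rows with made-up identifiers and scores.
toy_hits = [
    "cds_1\tsubject_A\t85.0\t120\t18\t0\t1\t360\t5\t124\t1e-50\t180.0",
    "cds_1\tsubject_B\t60.0\t100\t40\t0\t1\t300\t9\t108\t1e-20\t95.5",
    "cds_2\tsubject_C\t35.0\t50\t32\t1\t4\t153\t10\t59\t0.5\t30.1",
]
print(best_hits(toy_hits))
```

Re-applying the e-value cutoff in the parser is redundant when the file was produced with the same `-e` value, but it keeps the summary correct if result files from differently configured runs are mixed.
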
- Downstream analysis (not yet completed)
- Species and functional composition analysis
- Venn
- Heatmap
- Species and function diagram
- Species composition diagram
- Between-sample comparison
- UPGMA clustering analysis
- Hierarchical clustering heat map
- PCA
- PCoA
- Between-group comparison
- AMOVA
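
The sample-comparison items above (UPGMA clustering, PCoA) all start from a pairwise distance matrix between samples. The outline does not say which distance will be used; the sketch below assumes Bray-Curtis dissimilarity on per-sample abundance profiles, which is a common choice but an assumption here. The sample names, taxon names, and abundances are invented.

```python
from itertools import combinations
from typing import Dict


def bray_curtis(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i)."""
    taxa = set(a) | set(b)
    diff = sum(abs(a.get(t, 0.0) - b.get(t, 0.0)) for t in taxa)
    total = sum(a.get(t, 0.0) + b.get(t, 0.0) for t in taxa)
    # Two empty profiles are treated as identical.
    return diff / total if total else 0.0


# Hypothetical abundance profiles; a missing taxon counts as zero.
profiles = {
    "sampleA": {"taxon1": 10, "taxon2": 0, "taxon3": 5},
    "sampleB": {"taxon1": 5, "taxon2": 5, "taxon3": 5},
    "sampleC": {"taxon1": 10, "taxon3": 5},
}

# Pairwise distances keyed by (sample, sample), each pair listed once.
distances = {
    (x, y): bray_curtis(profiles[x], profiles[y])
    for x, y in combinations(sorted(profiles), 2)
}
for pair, d in distances.items():
    print(pair, round(d, 3))
```

The resulting dictionary can be expanded into a square matrix and handed to whichever clustering or ordination tool the downstream analysis ends up using.
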
