C2: Several methods of merging VCF files
2022-07-03 07:41:00 【502 notes on biological evolution】
1. Same samples, different loci: stack the sites (the equivalent of cat on files)
vcf-concat A.vcf.gz B.vcf.gz C.vcf.gz | gzip -c > out.vcf.gz
vcf-concat *.vcf.gz | gzip -c > out.vcf.gz
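Concatenation only makes sense when every input lists the same samples in the same order. A minimal sketch of a pre-check, assuming uncompressed VCF files; the helper names vcf_samples and same_samples are hypothetical and not part of vcftools:

```shell
# Hypothetical helper: print the sample names of an uncompressed VCF,
# one per line (columns 10 onward of the #CHROM header line).
vcf_samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

# Hypothetical helper: succeed only if both VCFs list the same
# samples in the same order.
same_samples() {
    [ "$(vcf_samples "$1")" = "$(vcf_samples "$2")" ]
}
```

For example, same_samples A.vcf B.vcf && echo "safe to concatenate" prints the message only when the sample columns match.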
2. Same loci, different samples: add the samples side by side (the equivalent of paste on files)
bcftools merge file1.vcf.gz file2.vcf.gz file3.vcf.gz > out.vcf
bcftools merge file1.vcf.gz file2.vcf.gz file3.vcf.gz -o out.vcf
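bcftools merge expects sample names to be unique across the input files and exits with an error otherwise, unless --force-samples is given. A minimal sketch of a check to run beforehand, assuming uncompressed VCF files; the helper name duplicate_samples is hypothetical:

```shell
# Hypothetical helper: print every sample name that occurs more than
# once across the given uncompressed VCF files.
duplicate_samples() {
    for f in "$@"; do
        # Sample names are columns 10 onward of the #CHROM header line.
        awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$f"
    done | sort | uniq -d
}
```

Empty output from duplicate_samples file1.vcf file2.vcf file3.vcf means the sample names do not collide.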
3. Different loci, different samples: take the intersection of two VCF files
3.1 Using bcftools
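# Rename the chromosome in A (Chr1 -> 1) so that its names match those in B
grep "^#" A.vcf > header.txt
grep -v "^#" A.vcf | sed 's/Chr1/1/g' > temp.txt
cat header.txt temp.txt > A_new.vcf
# bcftools isec needs bgzip-compressed, indexed input; B.vcf.gz must be indexed too
bgzip A_new.vcf
tabix -p vcf A_new.vcf.gz
bcftools isec -p isec_output -Oz A_new.vcf.gz B.vcf.gz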
The result is a folder, isec_output, containing 4 VCF files (here 1.vcf.gz and 2.vcf.gz stand for the first and second input):
1. isec_output/0000.vcf.gz: variants unique to 1.vcf.gz
2. isec_output/0001.vcf.gz: variants unique to 2.vcf.gz
3. isec_output/0002.vcf.gz: variants shared by 1.vcf.gz and 2.vcf.gz, as represented in 1.vcf.gz
4. isec_output/0003.vcf.gz: variants shared by 1.vcf.gz and 2.vcf.gz, as represented in 2.vcf.gz
Then merge the two shared files to get the SNPs at the loci common to both VCFs:
bcftools merge --merge all isec_output/0002.vcf.gz isec_output/0003.vcf.gz > merged.vcf
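To sanity-check the shared set, the same idea can be sketched in plain awk. This is only an illustration, assuming uncompressed VCF files and exact matching on CHROM, POS, REF and ALT; the helper name shared_sites is hypothetical, and bcftools isec handles cases (such as multiallelic records) that this sketch ignores:

```shell
# Hypothetical helper: print CHROM, POS, REF and ALT for every record
# of the second VCF whose site and alleles also occur in the first.
shared_sites() {
    awk -F '\t' '
        BEGIN { OFS = "\t" }
        /^#/ { next }                        # skip header lines
        { key = $1 FS $2 FS $4 FS $5 }       # CHROM, POS, REF, ALT
        NR == FNR { seen[key] = 1; next }    # first file: remember keys
        (key in seen) { print $1, $2, $4, $5 }
    ' "$1" "$2"
}
```

The number of lines printed by shared_sites A_new.vcf B.vcf can then be compared with the record count of isec_output/0002.vcf.gz.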
3.2 Using bedops
This gives the intersection of the two files as a BED file of regions:
bedops --intersect <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > answer.bed
This gives the intersection of the two files as a BED file of individual sites:
bedops --intersect <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > common-regions.bed
bedops --everything <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > all-elements.bed
bedops --element-of 1 all-elements.bed common-regions.bed > common-elements.bed
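This gets the variants of A that fall in the intersection and the variants of B that fall in the intersection, each as a separate file. bedops --element-of reports elements of the first input, so the inputs are swapped for the second file:
bedops --element-of 1 <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > answer1.bed
bedops --element-of 1 <(vcf2bed < B.vcf) <(vcf2bed < A.vcf) > answer2.bed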
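The commands above rely on vcf2bed to move from VCF coordinates (1-based positions) to BED coordinates (0-based start, exclusive end). A minimal sketch of that coordinate shift for single-base records only, assuming an uncompressed VCF; the helper name vcf_to_bed3 is hypothetical, and the real vcf2bed also carries over the remaining VCF columns:

```shell
# Hypothetical helper: emit a three-column BED line (chrom, start, end)
# for each VCF record, treating every record as a single base.
vcf_to_bed3() {
    awk -F '\t' 'BEGIN { OFS = "\t" } !/^#/ { print $1, $2 - 1, $2 }' "$1"
}
```

A record at position 100 on chromosome 1 therefore becomes the interval 1, 99, 100 in BED terms.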
4. Different loci, different samples: take the union of two VCF files
In this case, building the union requires the BAM files behind both VCF files. First collect the positions from both VCFs:
grep -v "#" A.vcf | cut -f 1,2 > pos1.txt
grep -v "#" B.vcf | cut -f 1,2 > pos2.txt
cat pos1.txt pos2.txt > posAll.txt
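Plain cat keeps positions that occur in both files twice and leaves the list unsorted. A minimal sketch of a sorted, duplicate-free alternative, assuming uncompressed VCF files; the helper name union_positions is hypothetical:

```shell
# Hypothetical helper: print the union of CHROM/POS pairs from the
# given VCF files, sorted by chromosome and position, without duplicates.
union_positions() {
    tab=$(printf '\t')
    for f in "$@"; do
        grep -v '^#' "$f" | cut -f 1,2
    done | sort -t "$tab" -k1,1 -k2,2n -u
}
```

For example, union_positions A.vcf B.vcf > posAll.txt produces a position list in which each site appears once.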
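With the combined position list in hand, run mpileup on each BAM file at those positions, then merge the resulting per-file mpileup outputs. In the command below, -r and -o each take a value (a region and an output file) that the original leaves blank:
samtools mpileup -A -B -q 20 -Q 20 -f ref.fa bamfile.bam -l posAll.txt -r -o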
#!/bin/bash
# Use the mpileup command to generate a VCF file
# In this example only chromosome 1 is processed
echo "SamtoolsMpileupByChr Begin: " `date` && \
samtools mpileup \
    -l chr1Region.bed \
    -r 1 \
    -q 1 \
    -C 50 \
    -t DP,DV \
    -m 2 \
    -F 0.002 \
    -f human.fasta \
    test_3.bam \
    --output test.chr1.raw.vcf && \
echo "SamtoolsMpileupByChr End: " `date`
Explanation of the mpileup options
-C --adjust-MQ INT Coefficient for downgrading the mapping quality of reads that contain excessive mismatches. A value of zero disables this adjustment. The recommended value for BWA is 50.
-A --count-orphans Do not skip anomalous read pairs when calling variants.
-l --positions FILE BED file or position list file containing the regions or sites to use. A position list file has two columns, chromosome and position, counted from 1. A BED file has at least 3 columns, chromosome, start and end, with coordinates counted from 0.
-r --region STR Only generate the pileup in the specified region. Requires an indexed BAM file. Often used together with the -l option.
-q --min-MQ INT Minimum mapping quality for an alignment to be used.
-f --fasta-ref FILE The faidx-indexed reference file in FASTA format. The file can optionally be compressed with bgzip.
-o --output FILE Write the pileup, VCF or BCF output to FILE instead of the default standard output.
-g --BCF Compute genotype likelihoods and output them in BCF format.
-v --VCF Compute genotype likelihoods and output them in VCF format.
-D Output the per-sample read depth.
-V Output the per-sample number of non-reference reads.
-t --output-tags LIST Comma-separated list of FORMAT and INFO tags to output.
-u --uncompressed Generate uncompressed VCF or BCF output.
-I --skip-indels Do not perform INDEL calling.
-m --min-ireads INT Minimum number of gapped reads for an INDEL candidate.
-F --gap-frac FLOAT Minimum fraction of gapped reads.