C2: Several methods of merging VCF files
2022-07-03 07:41:00 【502 notes on biological evolution】
1. Same individuals, additional loci (equivalent to cat on the files)
vcf-concat A.vcf.gz B.vcf.gz C.vcf.gz | gzip -c > out.vcf.gz
vcf-concat *.vcf.gz | gzip -c > out.vcf.gz
2. Same loci, additional individuals (equivalent to paste on the files)
bcftools merge file1.vcf.gz file2.vcf.gz file3.vcf.gz > out.vcf
bcftools merge file1.vcf.gz file2.vcf.gz file3.vcf.gz -o out.vcf
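Whether case 1 or case 2 applies depends on whether the input files carry the same samples. The following is a minimal sketch of that check using only standard tools. The toy files, the sample names S1 to S3 and the helper function `samples` are made up for illustration, and real inputs would usually be compressed.

```shell
# Toy inputs (hypothetical data) so the sketch runs on its own.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS1\tS2\n1\t100\t.\tA\tG\t.\t.\t.\tGT\t0/1\t0/0\n' > A.vcf
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tS3\n1\t100\t.\tA\tG\t.\t.\t.\tGT\t1/1\n' > B.vcf

# Print the sample names: columns 10 and onward of the #CHROM header line.
samples() {
    awk -F '\t' '/^#CHROM/ { for (i = 10; i <= NF; i++) print $i; exit }' "$1"
}

samples A.vcf > A.samples
samples B.vcf > B.samples

# Identical sample lists suggest concatenation, different lists suggest merging.
if cmp -s A.samples B.samples; then
    verdict="same samples: concatenate"
else
    verdict="different samples: merge"
fi
echo "$verdict"
```

This compares sample lists only. It does not check whether the sites in the two files overlap.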
3. Different loci and different individuals: taking the intersection of two VCF files
3.1 Using bcftools isec
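First make the chromosome names of the two files match (here Chr1 in A.vcf is renamed to 1), then compress and index the result, since bcftools isec expects bgzip-compressed, indexed inputs:
grep "^#" A.vcf > header.txt
grep -v "^#" A.vcf | sed 's/Chr1/1/g' > temp.txt
cat header.txt temp.txt > A_new.vcf
bgzip A_new.vcf
tabix -p vcf A_new.vcf.gz
bcftools isec -p isec_output -Oz A_new.vcf.gz B.vcf.gz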
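The sed call above handles only the Chr1 naming. The following is a hedged sketch of a more general rename that strips a leading "Chr" from the CHROM column of every record and leaves header lines unchanged. The toy file content is invented for illustration, and the "Chr" prefix convention is an assumption about the input.

```shell
# Toy input (hypothetical data) with "Chr"-prefixed chromosome names.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\nChr1\t100\t.\tA\tG\t.\t.\t.\nChr2\t40\t.\tG\tA\t.\t.\t.\n' > A.vcf

# Header lines (starting with #) pass through untouched;
# on data lines only the first column is edited.
awk 'BEGIN { FS = OFS = "\t" }
     /^#/ { print; next }
     { sub(/^Chr/, "", $1); print }' A.vcf > A_new.vcf

cat A_new.vcf
```

Any ##contig lines in the header keep their old names with this approach. bcftools annotate has a --rename-chrs option that may be a better fit when the header has to stay consistent.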
After bcftools isec finishes, the isec_output directory contains four files:
1. isec_output/0000.vcf.gz: variants unique to the first input (A_new.vcf.gz)
2. isec_output/0001.vcf.gz: variants unique to the second input (B.vcf.gz)
3. isec_output/0002.vcf.gz: variants shared by both inputs, as represented in the first input
4. isec_output/0003.vcf.gz: variants shared by both inputs, as represented in the second input
Then merge the two shared files to obtain the SNPs at loci common to both VCFs:
bcftools merge --merge all 0002.vcf.gz 0003.vcf.gz > merged.vcf
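To make the unique/shared split concrete, here is a toy sketch that reproduces it with sort and comm on CHROM:POS keys. It is an illustration under simplifying assumptions, not a replacement for bcftools isec: it compares positions only and ignores alleles, and the file names and data are invented.

```shell
# Toy inputs (hypothetical data) so the sketch runs on its own.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t250\t.\tC\tT\t.\t.\t.\n' > A.vcf
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t250\t.\tC\tT\t.\t.\t.\n2\t40\t.\tG\tA\t.\t.\t.\n' > B.vcf

# Build sorted, de-duplicated "CHROM:POS" keys for each file.
# comm needs both inputs sorted in the same collation order, hence LC_ALL=C.
grep -v '^#' A.vcf | awk -F '\t' '{ print $1 ":" $2 }' | LC_ALL=C sort -u > A.sites
grep -v '^#' B.vcf | awk -F '\t' '{ print $1 ":" $2 }' | LC_ALL=C sort -u > B.sites

LC_ALL=C comm -23 A.sites B.sites > only_A.txt   # sites found only in A
LC_ALL=C comm -13 A.sites B.sites > only_B.txt   # sites found only in B
LC_ALL=C comm -12 A.sites B.sites > shared.txt   # sites found in both

echo "only A: $(cat only_A.txt)"
echo "only B: $(cat only_B.txt)"
echo "shared: $(cat shared.txt)"
```

Here only_A.txt and only_B.txt play the role of 0000 and 0001, and shared.txt corresponds to the sites behind 0002 and 0003.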
3.2 Using bedops
The intersection of the two files as a BED file of regions:
bedops --intersect <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > answer.bed
The intersection of the two files as a BED file of individual sites (elements):
bedops --intersect <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > common-regions.bed
bedops --everything <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > all-elements.bed
bedops --element-of 1 all-elements.bed common-regions.bed > common-elements.bed
To get the shared variants as recorded in A and as recorded in B, each in its own file, swap the order of the two inputs:
bedops --element-of 1 <(vcf2bed < A.vcf) <(vcf2bed < B.vcf) > answer1.bed
bedops --element-of 1 <(vcf2bed < B.vcf) <(vcf2bed < A.vcf) > answer2.bed
4. Different loci and different individuals: taking the union of two VCF files
In this case, building the union requires the BAM files behind both VCF files. First collect the positions from both files:
grep -v "#" A.vcf | cut -f 1,2 > pos1.txt
grep -v "#" B.vcf | cut -f 1,2 > pos2.txt
cat pos1.txt pos2.txt > posAll.txt
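Plain cat keeps positions that occur in both files twice and leaves the list unsorted. A minimal sketch of the same step with sorting and de-duplication added follows. The toy VCF content is invented for illustration, and whether the downstream tool needs a sorted, unique list is an assumption.

```shell
# Toy inputs (hypothetical data) so the sketch runs on its own.
workdir=$(mktemp -d)
cd "$workdir" || exit 1
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t100\t.\tA\tG\t.\t.\t.\n1\t250\t.\tC\tT\t.\t.\t.\n' > A.vcf
printf '##fileformat=VCFv4.2\n#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n1\t250\t.\tC\tT\t.\t.\t.\n2\t40\t.\tG\tA\t.\t.\t.\n' > B.vcf

# Extract CHROM and POS from the data lines of each file.
grep -v '^#' A.vcf | cut -f 1,2 > pos1.txt
grep -v '^#' B.vcf | cut -f 1,2 > pos2.txt

# Sort by chromosome, then numerically by position, dropping duplicate sites.
tab=$(printf '\t')
cat pos1.txt pos2.txt | sort -t "$tab" -k1,1 -k2,2n -u > posAll.txt

cat posAll.txt
```

The site 1:250 is present in both inputs but appears only once in posAll.txt.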
With the combined position list in hand, run mpileup on each BAM file at those positions, then merge the resulting per-file mpileup outputs.
samtools mpileup -A -B -q 20 -Q 20 -f ref.fa bamfile.bam -l posAll.txt -r REGION -o OUTPUT
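#!/bin/bash
# Use the mpileup command to generate a VCF file.
# In this example only chromosome 1 is processed.
# -u -v request uncompressed VCF output; without -v or -g, mpileup writes pileup text.
# The -t, -m, -F, -u and -v options belong to older samtools releases;
# recent releases do variant calling through bcftools mpileup instead.
echo "SamtoolsMpileupByChr Begin: $(date)" && \
samtools mpileup \
    -l chr1Region.bed \
    -r 1 \
    -q 1 \
    -C 50 \
    -t DP,DV \
    -m 2 \
    -F 0.002 \
    -u -v \
    -f human.fasta \
    test_3.bam \
    --output test.chr1.raw.vcf && \
echo "SamtoolsMpileupByChr End: $(date)"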
Explanation of the mpileup options
-C --adjust-MQ INT Coefficient for downgrading the mapping quality of reads that contain excessive mismatches. A value of 0 disables the adjustment; the recommended value for BWA is 50.
-A --count-orphans Do not skip anomalous read pairs when detecting variants.
-l --positions FILE BED file or position list file containing the regions or sites to process. A position list has two columns, chromosome and position, counted from 1. A BED file has at least 3 columns (chromosome, start position and end position), with 0-based, half-open coordinates.
-r --region STR Only generate the pileup in the specified region; requires an indexed BAM file. Often used together with -l.
-q --min-MQ INT Minimum mapping quality for an alignment to be used.
-f --fasta-ref FILE Reference file in FASTA format, indexed with faidx. The file may optionally be compressed with bgzip.
-o --output FILE Write the pileup, VCF or BCF output to FILE instead of the default standard output.
-g --BCF Compute genotype likelihoods and output them in BCF format.
-v --VCF Compute genotype likelihoods and output them in VCF format.
-D Output the per-sample read depth.
-V Output the per-sample number of non-reference reads.
-t --output-tags LIST Comma-separated list of FORMAT and INFO tags to output.
-u --uncompressed Generate uncompressed VCF/BCF output.
-I --skip-indels Do not perform INDEL calling.
-m --min-ireads INT Minimum number of gapped reads for an INDEL candidate.
-F --gap-frac FLOAT Minimum fraction of gapped reads.