bedtools tutorial
2022-07-02 09:38:00 【qq_27390023】
bedtools: a powerful toolset for genome arithmetic
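bedtools is a Swiss-army knife for a wide range of genomics analysis tasks. Its most widely used tools perform genome arithmetic, that is, set theory on the genome. For example, bedtools lets you intersect, merge, count, complement, and shuffle genomic intervals from multiple files in widely used genomic formats such as BAM, BED, GFF/GTF, and VCF. Each individual tool is designed to do a relatively simple task (e.g., intersect two interval files), but fairly sophisticated analyses can be built by combining several bedtools operations on the UNIX command line.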
Where to download genome annotation files:
https://genome.ucsc.edu/cgi-bin/hgTables (download BED files from here)
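bedtools --version   # print the version number
bedtools --contact   # contact information for getting help and reporting bugs
# Download the test files used in this tutorial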
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/cpg.bed
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/exons.bed
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/gwas.bed
curl -O https://s3.amazonaws.com/bedtools-tutorials/web/genome.txt
###1 bedtools intersect
## Find overlapping intervals
#Tool: bedtools intersect (aka intersectBed)
#Version: v2.30.0
#Summary: Report overlaps between two feature files.
#Usage: bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>
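#Note: -b accepts multiple files
# Report the parts of cpg.bed intervals that overlap exons.bed (by default only the overlapping sub-interval of A is printed, once per overlap)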
bedtools intersect -a cpg.bed -b exons.bed
# The same, but reporting the overlapping parts of exons.bed intervals
bedtools intersect -a exons.bed -b cpg.bed
# Report the original A and B records for each overlap (-wa -wb)
bedtools intersect -a exons.bed -b cpg.bed -wa -wb
# Report the original A and B records plus the number of overlapping base pairs (-wo)
bedtools intersect -a cpg.bed -b exons.bed -wo
# For each cpg.bed record, count how many exons.bed records overlap it (-c)
bedtools intersect -a cpg.bed -b exons.bed -c
# cpg.bed records that do not overlap any exons.bed interval (-v)
bedtools intersect -a cpg.bed -b exons.bed -v
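To make the overlap rule behind -wo, -c and -v concrete, here is a minimal brute-force sketch in awk. It is not how bedtools works internally (bedtools indexes its input, or sweeps sorted input with -sorted); it only illustrates the arithmetic. It assumes tab-separated BED files without header or track lines, and uses BED's 0-based, half-open coordinates, so two intervals on the same chromosome overlap exactly when max(startA, startB) < min(endA, endB). The function names naive_wo and naive_count are hypothetical, and every B record is held in memory, so this is only suitable for small files.

```shell
#!/usr/bin/env bash
# Hypothetical brute-force helpers, O(|A| x |B|); inputs are tab-separated BED.

# naive_wo A B: print each overlapping A/B pair plus the overlap in bp (like -wo)
naive_wo() {
  awk -F'\t' -v OFS='\t' -v bfile="$2" '
    BEGIN {
      # load every B record into memory
      while ((getline line < bfile) > 0) {
        split(line, x, "\t"); n++
        b[n] = line; bc[n] = x[1]; bs[n] = x[2] + 0; be[n] = x[3] + 0
      }
      close(bfile)
    }
    {
      astart = $2 + 0; aend = $3 + 0
      for (i = 1; i <= n; i++) {
        if (bc[i] != $1) continue
        s = (astart > bs[i]) ? astart : bs[i]   # overlap start = larger start
        e = (aend < be[i]) ? aend : be[i]       # overlap end = smaller end
        if (s < e) print $0, b[i], e - s        # half-open: overlap iff s < e
      }
    }' "$1"
}

# naive_count A B: append the number of overlapping B records to each A record
# (like -c); the rows whose count is 0 are the records -v would report
naive_count() {
  awk -F'\t' -v OFS='\t' -v bfile="$2" '
    BEGIN {
      while ((getline line < bfile) > 0) {
        split(line, x, "\t"); n++
        bc[n] = x[1]; bs[n] = x[2] + 0; be[n] = x[3] + 0
      }
      close(bfile)
    }
    {
      c = 0
      for (i = 1; i <= n; i++)
        if (bc[i] == $1 && bs[i] < $3 + 0 && $2 + 0 < be[i]) c++
      print $0, c
    }' "$1"
}
```

For example, naive_wo cpg.bed exons.bed should report the same A/B pairs as the -wo command above (possibly in a different order), just far more slowly.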
# Set a threshold: keep only overlaps that cover at least 50% of the cpg.bed interval (-f is a fraction of A)
bedtools intersect -a cpg.bed -b exons.bed -wo -f 0.50
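The -f option compares the overlap length with the length of the A interval. The sketch below shows that filter on top of a brute-force pairwise overlap in awk. It assumes tab-separated BED files without header lines and half-open coordinates; the function name naive_wo_frac is hypothetical, and the fallback value 1e-9 mirrors the default documented for -f (effectively "at least 1 bp").

```shell
#!/usr/bin/env bash
# naive_wo_frac A B F: like -wo, but keep a pair only if the overlap covers
# at least fraction F of the A interval (the idea behind -f). Hypothetical helper.
naive_wo_frac() {
  awk -F'\t' -v OFS='\t' -v bfile="$2" -v f="${3:-1e-9}" '
    BEGIN {
      while ((getline line < bfile) > 0) {
        split(line, x, "\t"); n++
        b[n] = line; bc[n] = x[1]; bs[n] = x[2] + 0; be[n] = x[3] + 0
      }
      close(bfile)
    }
    {
      astart = $2 + 0; aend = $3 + 0
      for (i = 1; i <= n; i++) {
        if (bc[i] != $1) continue
        s = (astart > bs[i]) ? astart : bs[i]
        e = (aend < be[i]) ? aend : be[i]
        # require a real overlap AND overlap / length(A) >= f
        if (s < e && e - s >= f * (aend - astart)) print $0, b[i], e - s
      }
    }' "$1"
}
```

For example, naive_wo_frac cpg.bed exons.bed 0.5 corresponds to the -wo -f 0.50 command above. bedtools also has -r (reciprocal fraction) and -F (fraction of B), which this sketch does not model.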
# Overlaps against several B files at once
bedtools intersect -a cpg.bed -b gwas.bed exons.bed
bedtools intersect -a cpg.bed -b gwas.bed exons.bed -wa -wb -names gwas exon  # -names labels each hit with the file it came from
# If all inputs are sorted, -sorted switches to a memory-efficient sweep algorithm and runs faster
time bedtools intersect -a exons.bed -b cpg.bed gwas.bed -sorted > /dev/null
###2 bedtools merge
#Tool: bedtools merge (aka mergeBed)
#Version: v2.30.0
#Summary: Merges overlapping BED/GFF/VCF entries into a single interval.
#Usage: bedtools merge [OPTIONS] -i <bed/gff/vcf>
#Note: bedtools merge requires sorted input.
# Sort by chromosome first, then by start position.
sort -k1,1 -k2,2n test.bed >test.sorted.bed
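Because merge (and the -sorted mode of intersect) depends on this ordering, it can help to check a file before using it. A minimal sketch with hypothetical function names: it pins the C locale so that the chromosome order is byte-wise and identical for every file (an assumption worth keeping consistent across all inputs), and the check uses -s so that records with the same chromosome and start are not flagged just because their remaining columns differ.

```shell
#!/usr/bin/env bash
# sort_bed IN OUT: chromosome (byte order), then numeric start (hypothetical helper)
sort_bed() {
  LC_ALL=C sort -k1,1 -k2,2n "$1" > "$2"
}

# is_sorted_bed FILE: exit status 0 if FILE is already in that order
# (-c only checks, -s compares the keys alone; diagnostics are discarded)
is_sorted_bed() {
  LC_ALL=C sort -c -s -k1,1 -k2,2n "$1" 2>/dev/null
}
```

For example: is_sorted_bed exons.bed || sort_bed exons.bed exons.sorted.bed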
# Print the resulting merged intervals
bedtools merge -i exons.bed | head -n 20
# Count how many overlapping intervals were combined into each merged interval (-c 1 -o count applies "count" to column 1)
bedtools merge -i exons.bed -c 1 -o count | head -n 20
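# For each merged interval, list the column-2 values (start coordinates) of the intervals combined into it, comma-separated (-o collapse)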
bedtools merge -i exons.bed -c 2 -o collapse | head -n 20
# Also merge intervals that are at most 1000 bp apart (-d 1000), counting the intervals in each
bedtools merge -i exons.bed -d 1000 -c 1 -o count | head -20
# Merge intervals at most 90 bp apart; apply different operations to column 1 (count) and column 4, the name field (collapse)
bedtools merge -i exons.bed -d 90 -c 1,4 -o count,collapse | head -20
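The merge rule with -d can be stated as: walking through start-sorted intervals, the next interval joins the current block when its start is at most d bp past the block's current end; with d = 0 (the default), overlapping and book-ended intervals are still joined. Below is a minimal awk sketch of merge -d D -c 1 -o count. The function name merge_sorted is hypothetical, and it assumes sorted, tab-separated BED on standard input and ignores strand.

```shell
#!/usr/bin/env bash
# merge_sorted D: read sorted BED on stdin; print chrom, start, end, count,
# joining intervals whose gap to the current block is <= D bp (hypothetical helper)
merge_sorted() {
  awk -F'\t' -v OFS='\t' -v d="${1:-0}" '
    {
      s = $2 + 0; e = $3 + 0
      if (n > 0 && $1 == chr && s <= bend + d) {
        if (e > bend) bend = e                 # extend the current block
        n++
      } else {
        if (n > 0) print chr, bstart, bend, n  # close the previous block
        chr = $1; bstart = s; bend = e; n = 1
      }
    }
    END { if (n > 0) print chr, bstart, bend, n }'
}
```

For example, assuming exons.bed is sorted, merge_sorted 1000 < exons.bed | head -n 20 should give the same four columns as the -d 1000 -c 1 -o count command above.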
###3 bedtools complement
#Tool: bedtools complement (aka complementBed)
#Version: v2.30.0
#Summary: Returns the base pair complement of a feature file.
#Usage: bedtools complement [OPTIONS] -i <bed/gff/vcf> -g <genome>
#Note: the genome file should be tab-delimited and structured as follows:
# <chromName><TAB><chromSize>
# Regions of the genome in genome.txt that are not covered by any exons.bed interval
bedtools complement -i exons.bed -g genome.txt
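The complement can be computed by walking each chromosome listed in the genome file and emitting the gaps between intervals, plus the stretch from the last interval to the chromosome end. A minimal awk sketch with a hypothetical function name: it assumes the BED input is sorted by chromosome and start and that all intervals lie within the chromosome sizes, and it reports a chromosome with no intervals as one full-length gap, in genome-file order.

```shell
#!/usr/bin/env bash
# complement_sketch BED GENOME: print the gaps not covered by BED for every
# chromosome in GENOME (chrom<TAB>size). Hypothetical helper; BED must be sorted.
complement_sketch() {
  awk -F'\t' -v OFS='\t' -v gfile="$2" '
    { k = ++cnt[$1]; st[$1, k] = $2 + 0; en[$1, k] = $3 + 0 }
    END {
      while ((getline line < gfile) > 0) {
        split(line, g, "\t"); chr = g[1]; size = g[2] + 0
        pos = 0                                  # right edge of covered region so far
        for (i = 1; i <= cnt[chr]; i++) {
          if (st[chr, i] > pos) print chr, pos, st[chr, i]   # gap before this interval
          if (en[chr, i] > pos) pos = en[chr, i]
        }
        if (pos < size) print chr, pos, size     # tail of the chromosome
      }
      close(gfile)
    }' "$1"
}
```

For example, complement_sketch exons.bed genome.txt is meant to mirror the complement command above on sorted input.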
###4 bedtools genomecov
#Tool: bedtools genomecov (aka genomeCoverageBed)
#Version: v2.30.0
#Summary: Compute the coverage of a feature file among a genome.
#Usage: bedtools genomecov [OPTIONS] -i <bed/gff/vcf> -g <genome>
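#Note: the input must be sorted (at least grouped by chromosome).
# Default output: a depth histogram per chromosome and for the whole genome (chrom, depth, bases at that depth, chrom size, fraction of bases at that depth)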
bedtools genomecov -i exons.bed -g genome.txt
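# Output BEDGRAPH (-bg): contiguous runs of identical depth with the depth of each run; zero-depth regions are omitted (-bga keeps them)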
bedtools genomecov -i exons.bed -g genome.txt -bg | head -20
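A BEDGRAPH of depth can be produced with a sweep line: every interval contributes +1 at its start and -1 at its end, the events are sorted by position, and a running sum gives the depth between consecutive event positions. The sketch below is not bedtools' own code and the function name is hypothetical; like the -bg description above, it skips zero-depth stretches and joins adjacent stretches of equal depth into one run. It assumes tab-separated BED on standard input.

```shell
#!/usr/bin/env bash
# bedgraph_sketch: read BED on stdin; print chrom, start, end, depth (depth > 0)
bedgraph_sketch() {
  awk -F'\t' -v OFS='\t' '{ print $1, $2, 1; print $1, $3, -1 }' |
    LC_ALL=C sort -k1,1 -k2,2n |
    awk -F'\t' -v OFS='\t' '
      # emit a run, extending the pending run if it is adjacent with equal depth
      function emit(s, e, d) {
        if (ps != "" && pe == s && pd == d) { pe = e; return }
        flush(); pchr = chr; ps = s; pe = e; pd = d
      }
      function flush() { if (ps != "") print pchr, ps, pe, pd; ps = "" }
      $1 != chr { flush(); chr = $1; depth = 0; prev = $2 + 0 }
      {
        pos = $2 + 0
        if (pos > prev && depth > 0) emit(prev, pos, depth)
        depth += $3; prev = pos                # apply the +1 / -1 event
      }
      END { flush() }'
}
```

For example, bedgraph_sketch < exons.bed | head -20 is meant to resemble the -bg command above.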
###5 bedtools jaccard
#Tool: bedtools jaccard (aka jaccard)
#Version: v2.30.0
#Summary: Calculate Jaccard statistic b/w two feature files.
# Jaccard is the length of the intersection over the union.
# Values range from 0 (no intersection) to 1 (self intersection).
#Usage: bedtools jaccard [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>
# Measure the similarity of the two files (Jaccard index)
bedtools jaccard -a cpg.bed -b exons.bed
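The statistic itself is arithmetic on base pairs: J = |A ∩ B| / |A ∪ B|, and when overlaps within each file are counted once, |A ∩ B| = |A| + |B| - |A ∪ B|. For example, with A = {chr1:10-20, chr1:30-40} and B = {chr1:15-20}, |A| = 20, |B| = 5 and |A ∪ B| = 20, so the intersection is 5 bp and J = 0.25. The minimal sketch below (hypothetical function names) computes the three covered lengths with a merge sweep and derives the intersection from them. It prints intersection, union and J, which is not the exact output layout of bedtools jaccard.

```shell
#!/usr/bin/env bash
# covered_bp: total bp covered by the BED records on stdin, overlaps counted once
covered_bp() {
  LC_ALL=C sort -k1,1 -k2,2n | awk -F'\t' '
    $1 != chr || $2 + 0 > bend {           # new block: add the previous one
      total += bend - bstart; chr = $1; bstart = $2 + 0; bend = $3 + 0; next
    }
    $3 + 0 > bend { bend = $3 + 0 }        # overlapping or touching: extend block
    END { total += bend - bstart; print total + 0 }'
}

# jaccard_sketch A B: print intersection bp, union bp and their ratio (hypothetical)
jaccard_sketch() {
  local a b u
  a=$(covered_bp < "$1")
  b=$(covered_bp < "$2")
  u=$(cat "$1" "$2" | covered_bp)
  awk -v a="$a" -v b="$b" -v u="$u" 'BEGIN {
    i = a + b - u                          # |A and B| = |A| + |B| - |A or B|
    printf "%d\t%d\t%.6g\n", i, u, (u > 0 ? i / u : 0)
  }'
}
```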
###6 bedtools coverage
#Tool: bedtools coverage (aka coverageBed)
#Version: v2.30.0
#Summary: Returns the depth and breadth of coverage of features from B
# on the intervals in A.
#Usage: bedtools coverage [OPTIONS] -a <bed/gff/vcf> -b <bed/gff/vcf>
bedtools coverage -a cpg.bed -b exons.bed
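For each A interval, coverage reports (per the bedtools documentation) the number of B features overlapping it, the number of A bases covered by at least one B feature, the length of A, and the covered fraction. The sketch below reproduces those four columns with a hypothetical function name: B features are clipped to the A interval, sorted by start, and swept so that overlapping B features are not double-counted. It is brute force (every B record is checked for every A record), assumes tab-separated BED without headers, and its number formatting is arbitrary.

```shell
#!/usr/bin/env bash
# coverage_sketch A B: append count, covered bp, length of A and covered fraction
# to each A record (hypothetical helper; brute force over all B records)
coverage_sketch() {
  awk -F'\t' -v OFS='\t' -v bfile="$2" '
    BEGIN {
      while ((getline line < bfile) > 0) {
        split(line, x, "\t"); n++
        bc[n] = x[1]; bs[n] = x[2] + 0; be[n] = x[3] + 0
      }
      close(bfile)
    }
    {
      astart = $2 + 0; aend = $3 + 0; m = 0
      for (i = 1; i <= n; i++) {            # B features overlapping A, clipped to A
        if (bc[i] != $1 || bs[i] >= aend || be[i] <= astart) continue
        m++
        cs[m] = (bs[i] > astart) ? bs[i] : astart
        ce[m] = (be[i] < aend) ? be[i] : aend
      }
      for (i = 2; i <= m; i++) {            # insertion sort of the pieces by start
        ts = cs[i]; te = ce[i]
        for (j = i - 1; j >= 1 && cs[j] > ts; j--) { cs[j + 1] = cs[j]; ce[j + 1] = ce[j] }
        cs[j + 1] = ts; ce[j + 1] = te
      }
      covered = 0; cur = astart             # cur = right edge of bases counted so far
      for (i = 1; i <= m; i++) {
        s = (cs[i] > cur) ? cs[i] : cur
        if (ce[i] > s) { covered += ce[i] - s; cur = ce[i] }
      }
      len = aend - astart
      printf "%s\t%d\t%d\t%d\t%.7f\n", $0, m, covered, len, (len > 0 ? covered / len : 0)
    }' "$1"
}
```

For example, coverage_sketch cpg.bed exons.bed should give the same four appended columns as the coverage command above, only much more slowly.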
Reference:
http://quinlanlab.org/tutorials/bedtools/bedtools.html