Find Transposon Element insertions using long reads (nanopore), by alignment directly. (minimap2)

Overview

find_te_ins

find_te_ins is designed to find Transposon Element (TE) insertions using long reads (nanopore), by alignment directly. (minimap2)

Install

$ git clone https://github.com/bakerwm/find_te_ins.git
$ cd find_te_ins

Change the following variables upon your condition: genome_fa and te_fa in line-10 and line-11;

$ bash run_pipe.sh
run_pipe.sh 
    
    

    
   

Prerequisite

  • minimap2 - 2.17-r974-dirty, align long reads to reference genome
  • featureCounts - v2.0.0, quantification
  • samtools - v1.12, working with BAM files
  • python 3.8+
  • pysam 0.16.0.1, python module, working with BAM files

Getting Started

1 Prepare input files

  • genome_fa - reference genome in fasta format, in script run_pipe.sh, line-10
  • te_fa - TE consensus sequence in fasta format, in script run_pipe.sh, line-11
  • long reads - Long reads from NanoPore or Pacbio, in fasta or fastq format

2 Run pipe

$ cd ~/work/te_ins
# specify the path of long reads data: 
   
    /
   
$ git clone https://github.com/bakerwm/find_te_ins.git 
$ bash find_te_ins/run_pipe.sh <path-to-long-reads>/ results

[1/9] align to reference genome
[2/9] extract raw insertions from BAM, by CIGAR
[3/9] convert raw insertions to fasta format
[4/9] align raw_insertion to transposon
[5/9] extract transposon name for insertions
[6/9] merge raw_insertions by window=100
[7/9] count reads for each insertion
[8/9] save final insertions to file
[9/9] Done!

3 Output

The following files listed below are the output of the pipeline, the TE insertions saved in file *.te_ins.final.bed

$ tree -L 2 results/ONT_sample-1
.
├── ONT_sample-1
│   ├── ONT_sample-1.bam
│   ├── ONT_sample-1.bam.bai
│   ├── ONT_sample-1.raw_ins.bed
│   ├── ONT_sample-1.raw_ins.fa
│   ├── ONT_sample-1.raw_ins.fa.bam
│   ├── ONT_sample-1.raw_ins.fa.bam.bai
│   ├── ONT_sample-1.te_ins.bed
│   ├── ONT_sample-1.te_ins.final.bed
│   ├── ONT_sample-1.te_ins.final.bed6
│   ├── ONT_sample-1.te_ins.gtf
│   ├── ONT_sample-1.te_ins.quant.stderr
│   ├── ONT_sample-1.te_ins.quant.stdout
│   ├── ONT_sample-1.te_ins.quant.txt
│   ├── ONT_sample-1.te_ins.quant.txt.summary
│   ├── ONT_sample-1.te_ins.raw.txt
│   ├── run_minimap2.dm6.stderr
│   └── run_minimap2.dm6_transposon.stderr
...

{sample_name}.te_ins.final.bed

column 1. chr name of reference 
column 2. start pos of Insertion 
column 3. end pos of Insertion 
column 4. insertion name 
column 5. a fixed integer [255]  
column 6. strand # in current version, not consider the dirction of TE insertions !!!
column 7. name of TE consensus 
column 8. length of TE consensus  
column 9. proportion of the TE consensus identified  
column 10. number of supported reads for the insertion 
column 11. number of all reads cover the insertion 
column 12. proportion TE supported reads 
column 13. type of the TE insertions [full, p3, p5]

{sample_name}.te_ins.raw.txt

column 16 (last column), is the type of TE insertions: [full, p3, p5]

  • full, more then cutoff [60%] of the TE consensus were detected
  • p3, only the 3' end of the TE consensus were detected
  • p5, only the 5' end of the TE consensus were detected

In the .final.bed file, ONLY full TE insertions were saved for further analysis

Change criteria

TE types were defined in run_pipe.sh by anno_te.py, the criteria -c 0.6 could be changed to [0-1] float number based on your condition. see line-100 in file run_pipe.sh

# line-100 of run_pipe.sh
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# change criteria to 0.7
[[ ! -f ${te_ins_txt} ]] && python ${src_dir}/anno_te.py -x ${te_fa_fai} -c 0.7 ${te_bam} | sort -k4,4 -k5,5n > ${te_ins_txt}

# remove te_ins files, and run the command again
$ rm results/ONT_sample-1.te_ins*
$ bash find_te_ins/run_pipe.sh 
   
    / results

   

How it works?

  1. extract INSERTIONS
Owner
Ming Wang
Ming Wang
This is a multi-app executor that it used when we have some different task in a our applications and want to run them at the same time

This is a multi-app executor that it used when we have some different task in a our applications and want to run them at the same time. It uses SQLAlchemy for ORM and Alembic for database migrations.

Majid Iranpour 5 Apr 16, 2022
A Blender addon for VSE that auto-adjusts video strip's length, if speed effect is applied.

Blender VSE Speed Adjust Addon When using Video Sequence Editor in Blender, the speed effect strip doesn't auto-adjusts clip length when changing its

Arpit Srivastava 2 Jan 18, 2022
This app is to use algorithms to find the root of the equation

In this repository, I made an amazing app with tkinter python language and other libraries the idea of this app is to use algorithms to find the root of the equation I used three methods from numeric

Mohammad Al Jadallah 3 Sep 16, 2022
A python package template that can be adapted for RAP projects

Warning - this repository is a snapshot of a repository internal to NHS Digital. This means that links to videos and some URLs may not work. Repositor

NHS Digital 3 Nov 08, 2022
A simple spyware in python.

Spyware-Python- Dependencies: Python 3.x OpenCV PyAutoGUI PyMongo (for mongodb connection) Flask (Web Server) Ngrok (helps us push our fla

Abubakar 3 Sep 07, 2022
Create Arrays (Working with For Loops)

DSA with Python Create Arrays (Working with For Loops) CREATING ARRAYS WITH USER INPUT Array is a collection of items stored at contiguous memory loca

1 Feb 08, 2022
Iris-client - Python client for DFIR-IRIS

Python client dfir_iris_client offers a Python interface to communicate with IRI

DFIR-IRIS 11 Dec 22, 2022
API to summarize input text

summaries API to summarize input text normal run $ docker-compose exec web python -m pytest disable warnings $ docker-compose exec web python -m pytes

Brad 1 Sep 08, 2021
Usando Multi Player Perceptron e Regressão Logistica para classificação de SPAM

Relatório dos procedimentos executados e resultados obtidos. Objetivos Treinar um modelo para classificação de SPAM usando o dataset train_data. Class

André Mediote 1 Feb 02, 2022
Powerful virtual assistant in python

Virtual assistant in python Powerful virtual assistant in python Set up Step 1: download repo and unzip Step 2: pip install requirements.txt (if py au

Arkal 3 Jan 23, 2022
External Network Pentest Automation using Shodan API and other tools.

Chopin External Network Pentest Automation using Shodan API and other tools. Workflow Input a file containing CIDR ranges. Converts CIDR ranges to ind

Aditya Dixit 9 Aug 04, 2022
A Unified Framework for Hydrology

Unified Framework for Hydrology The Python package unifhy (Unified Framework for Hydrology) is a hydrological modelling framework which combines inter

Unified Framefork for Hydrology - Community Organisation 6 Jan 01, 2023
Hy - A dialect of Lisp that's embedded in Python

Hy Lisp and Python should love each other. Let's make it happen. Hy is a Lisp dialect that's embedded in Python. Since Hy transforms its Lisp code int

Hy Society 4.4k Jan 02, 2023
Fetch PRs from GitHub and analyze which ones are unmergeable

Set up token Generate a personal access token on GitHub. Add repo permissions. export GH_TOKEN="abcdefg" Pull PR data make Usually, GitHub doesn't h

Stefan van der Walt 1 Nov 05, 2021
Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.

Python project that aims to discover CDP neighbors and map their Layer-2 topology within a shareable medium like Visio or Draw.io.

3 Feb 11, 2022
Python Projects is an Open Source to enhance your python skills

Welcome! 👋🏽 Python Project is Open Source to enhance your python skills. You're free to contribute. 🤓 You just need to give us your scripts written

Tristán 6 Nov 28, 2022
Covid 19 status. Flask application. CovidAPI. Heroku.

Covid 19 In this project we see total count of people who got this virus and total death. How does it works Written in Python. Web app, Flask. package

AmirHossein Mohammadi 12 Jan 16, 2022
Python interface to ISLEX, an English IPA pronunciation dictionary with syllable and stress marking.

pysle Questions? Comments? Feedback? Pronounced like 'p' + 'isle'. An interface to a pronunciation dictionary with stress markings (ISLEX - the intern

Tim 38 Dec 14, 2022
Geodesic Dome Math

dome Geodesic Dome Math Python dome tool dome.py calculates an icosahedron or 2v geodesic dome and creates 3d printable hubs as OpenSCAD sources. usag

Brian Olson 2 Feb 09, 2022
Calibre Libgen Non-fiction / Sci-tech store plugin

CalibreLibgenSci A Libgen Non-Fiction/Sci-tech store plugin for Calibre Installation Download the latest zip file release from here Open Calibre Navig

IDDQD 9 Dec 27, 2022