Discovering local read-level DNA methylation patterns and DNA methylation heterogeneity in intermediately methylated regions

Overview

MeConcord

  • MeConcord is a method used to investigate local read-level DNA methylation patterns for intermediately methylated regions with bisulfite sequencing data.
  • Intermediately methylated regions occupy a significant fraction of the genome and are closely associated with epigenetic regulation and with cell-type deconvolution of bulk data. However, these regions show distinct methylation patterns that correspond to different biological mechanisms. Several metrics have been developed to investigate these regions, but their poor robustness to noise limits their utility for distinguishing distinct methylation patterns.
  • We propose MeConcord, a method with two metrics that use Hamming distance to measure local methylation concordance across reads and across CpGs, respectively. MeConcord showed the most robust performance in distinguishing distinct methylation patterns (identical, uniform, and disordered) compared with other metrics.
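
To give a feel for what a Hamming-distance-based concordance across reads looks like, here is a minimal sketch. It is not the published NRC/NCC definition (which this README does not spell out); the function name read_concordance and the 1/0/None encoding are assumptions made for illustration only.

```python
from itertools import combinations


def read_concordance(reads):
    """Illustrative concordance across reads (NOT MeConcord's NRC formula).

    reads: list of equal-length lists holding 1 (methylated),
           0 (unmethylated) or None (no call at that CpG).
    Returns 1 - mean normalized Hamming distance over all read pairs,
    or None when no pair of reads shares a called CpG.
    """
    distances = []
    for a, b in combinations(reads, 2):
        # Compare only CpGs that are called in both reads.
        shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
        if not shared:
            continue
        mismatches = sum(x != y for x, y in shared)
        distances.append(mismatches / len(shared))
    if not distances:
        return None
    return 1.0 - sum(distances) / len(distances)


agreeing = [[1, 1, 0, 0], [1, 1, 0, 0], [1, 1, 0, 0]]
disagreeing = [[1, 0, 1, 0], [0, 1, 0, 1], [1, 1, 0, 0]]
print(read_concordance(agreeing))     # 1.0: every read carries the same pattern
print(read_concordance(disagreeing))  # lower: reads disagree at many CpGs
```

Both example regions have an average methylation level of 0.5, yet the score separates them, which is the kind of distinction a read-level metric is meant to capture.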

Installation

  • MeConcord is implemented in Python and is compatible with both Python 2 and Python 3.
  • Required Python modules: pysam (only if the input is .bam files), pandas, numpy, scipy, multiprocessing.
  • The scripts can be downloaded and used directly with the command python *.py -i ....
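
Before running the scripts, it may help to confirm that the modules listed above can be imported. The snippet below is a small convenience sketch and is not part of MeConcord; the helper name missing_modules is hypothetical, and it relies on importlib.util.find_spec, so it needs Python 3.

```python
import importlib.util

# Module names taken from the list above; pysam is only needed for .bam input.
REQUIRED = ["pandas", "numpy", "scipy", "multiprocessing"]
OPTIONAL = ["pysam"]


def missing_modules(names):
    """Return the subset of top-level module names that cannot be found."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Missing required modules:", missing_modules(REQUIRED))
print("Missing optional modules:", missing_modules(OPTIONAL))
```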

Usage

Input

MeConcord currently accepts only the output of Bismark (.bam, or converted to .sam): https://github.com/FelixKrueger/Bismark/blob/master/README.md

Run

1. Obtaining CpG positions across the genome

Usage: python pre_cpg_pos.py -i hg38.fa -o ./cpg_pos/

  • i, The path to reference sequences (.fa);
  • o, The directory in which to store the positions of CpG sites; each chromosome gets a separate file;
  • h, Help information
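
Conceptually, this step scans each chromosome of the reference for CG dinucleotides. The sketch below shows the idea on in-memory FASTA lines; it is not the code of pre_cpg_pos.py, and the function names, the 1-based positions, and the return format are assumptions made for illustration.

```python
def read_fasta(lines):
    """Yield (name, sequence) pairs from an iterable of FASTA lines."""
    name, chunks = None, []
    for line in lines:
        line = line.strip()
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            parts = line[1:].split()
            name, chunks = (parts[0] if parts else ""), []
        elif line:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def cpg_positions(sequence):
    """Return the 1-based position of the C in every CG dinucleotide."""
    seq = sequence.upper()  # soft-masked (lowercase) bases still count
    positions = []
    start = seq.find("CG")
    while start != -1:
        positions.append(start + 1)
        start = seq.find("CG", start + 1)
    return positions


example = [">chr1 example", "AC", "GTCGCG"]  # the first CG spans two lines
for chrom, seq in read_fasta(example):
    print(chrom, cpg_positions(seq))  # chr1 [2, 5, 7]
```

Joining the sequence lines before searching matters, because a CG pair can be split across two lines of the FASTA file.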

2. Converting mapped .bam, .sam, or .sam.gz files from Bismark to read-by-read methylation records

Usage: python s1_bamToMeRecord.py -i test.bam -o test -c 0

  • i, The path to input files (.bam or .sam or .sam.gz);
  • o, Output prefix;
  • c, Number of bases to clip from each read end (default 0); useful when the sequencing quality at read ends is poor. For example, -c 5 removes 5 bases from both ends of each read.
  • h, Help information
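
Bismark stores per-base methylation calls in the XM tag of each alignment, where Z marks a methylated and z an unmethylated cytosine in CpG context. The sketch below shows how such a string can be turned into read-level CpG calls with end clipping. It is an illustration only, not the code of s1_bamToMeRecord.py: the function name cpg_calls is hypothetical, and it assumes an alignment without indels or soft clips and ignores strand handling.

```python
def cpg_calls(ref_start, xm_tag, clip=0):
    """Extract CpG methylation calls from a Bismark XM string.

    ref_start: reference coordinate of the first aligned base.
    xm_tag:    Bismark XM string ('Z' = methylated CpG, 'z' = unmethylated CpG).
    clip:      number of bases ignored at each end of the read.
    Returns a list of (position, state) with state 1 or 0.
    ASSUMPTION: offset i in the read maps to ref_start + i (no indels).
    """
    calls = []
    for offset in range(clip, len(xm_tag) - clip):
        symbol = xm_tag[offset]
        if symbol == "Z":
            calls.append((ref_start + offset, 1))
        elif symbol == "z":
            calls.append((ref_start + offset, 0))
    return calls


print(cpg_calls(100, "..Z..z.hZ"))          # [(102, 1), (105, 0), (108, 1)]
print(cpg_calls(100, "..Z..z.hZ", clip=1))  # [(102, 1), (105, 0)]
```

Other contexts (x/X for CHG, h/H for CHH) are skipped because only CpG calls are relevant here.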

3. Splitting the large MeRecord file into smaller per-chromosome files to reduce memory requirements in the next step

Usage: python s2_RecordSplit.py -i ./test_ReadsMethyAndMuts.txt -o ./test -g chr1,chr2,chr3,chr4,chr5

  • i, The path to the s1 output (ends with _ReadsMethyAndMuts.txt);
  • o, Output prefix;
  • g, Chromosomes used (default: chromosomes 1-22); chromosomes should be separated by commas;
  • h, Help information
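
The reason for splitting is that each per-chromosome file can later be loaded on its own. The sketch below streams a record file line by line and writes one file per requested chromosome. It is not the code of s2_RecordSplit.py, and it ASSUMES a tab-separated layout with the chromosome name in the first column, which this README does not document. The output naming follows the test example (prefix_ReadsMethyAndMuts_chr1.txt).

```python
def split_by_chromosome(record_path, out_prefix, chromosomes):
    """Stream record_path and write one file per chromosome in `chromosomes`.

    ASSUMPTION: records are tab-separated with the chromosome in column 1.
    Returns the sorted list of chromosomes for which a file was written.
    """
    wanted = set(chromosomes)
    handles = {}
    try:
        with open(record_path) as records:
            for line in records:
                chrom = line.split("\t", 1)[0]
                if chrom not in wanted:
                    continue
                if chrom not in handles:
                    out_path = "{}_ReadsMethyAndMuts_{}.txt".format(out_prefix, chrom)
                    handles[chrom] = open(out_path, "w")
                handles[chrom].write(line)
    finally:
        # Close every output file even if reading fails part-way.
        for handle in handles.values():
            handle.close()
    return sorted(handles)
```

Only one input line is held in memory at a time, so the memory footprint does not grow with the size of the record file.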

4. Calculating concordance metrics (NRC, NCC and P-values)

Usage: python s3_RecordToMeConcord.py -p 4 -i ./test -o ./test -r ./region.bed -c ./cpgpos/ -b 150 -m 600 -z 0 -g chr1,chr2,chr3

  • i, The path to the s2_RecordSplit.py output, including the file-name prefix;
  • p, Number of threads used for parallel computation (default 4);
  • o, Output prefix;
  • r, The file with genomic regions for computation; columns chrom, start, end, separated by tabs;
  • c, CpG position folder, the output of pre_cpg_pos.py;
  • b, Bin size (default 150 bp);
  • z, Whether the region file is 0-based: 0 (default) or 1; the output is the same as the input bins; if -r is a BED file, -z should be 1;
  • g, Chromosomes used (default: chromosomes 1-22); chromosomes should be separated by commas;
  • m, Maximum fragment length in the sequencing library (default 600 bp for paired-end reads). For single-end reads, -m should be set to the read length; if unsure, the default works for most cases;
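
The -z and -b options concern coordinate conventions and bin size. BED files use 0-based, half-open intervals, which is why -z should be 1 for a BED file. The sketch below converts such an interval to 1-based inclusive coordinates and tiles a region into fixed-size bins. Both helpers are hypothetical illustrations; how s3_RecordToMeConcord.py bins regions internally is not described here.

```python
def bed_to_one_based(start, end):
    """Convert a 0-based, half-open BED interval to 1-based inclusive."""
    return start + 1, end


def tile_region(start, end, bin_size=150):
    """Tile a 1-based inclusive region into consecutive bins.

    The last bin is shorter when the region length is not a multiple
    of bin_size. ASSUMPTION: illustrative tiling, not MeConcord's own.
    """
    bins = []
    pos = start
    while pos <= end:
        bins.append((pos, min(pos + bin_size - 1, end)))
        pos += bin_size
    return bins


start, end = bed_to_one_based(0, 400)  # BED interval covering 400 bases
print(tile_region(start, end))         # [(1, 150), (151, 300), (301, 400)]
```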

5. Methylation recordings to methylation matrix (optional)

Usage: python s4_RecordToMeMatrix.py -i ./test -o ./test -r ./p1.bed -c ./cpgpos/ -m 600 -z 0 -g chr1,chr2

  • i, The path to the s2_RecordSplit.py output, including the file-name prefix;
  • o, Output prefix;
  • r, The file with genomic regions for computation; columns chrom, start, end, separated by tabs;
  • c, CpG position folder, the output of pre_cpg_pos.py;
  • z, Whether the region file is 0-based: 0 (default) or 1; the output is the same as the input bins; if -r is a BED file, -z should be 1;
  • g, Chromosomes used (default: chromosomes 1-22); chromosomes should be separated by commas;
  • m, Maximum read length (default 600 bp for paired-end reads). For single-end reads, -m should be set to the read length; if unsure, the default works for most cases;
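
A methylation matrix arranges reads as rows and the CpG sites of a region as columns. The sketch below builds such a matrix from read-level calls. It is an illustration under assumptions: the function name and the single matrix with 1/0/None entries are hypothetical, whereas s4_RecordToMeMatrix.py writes two files per region (_me.txt and _unme.txt) whose exact layout is not documented here.

```python
def methylation_matrix(read_calls, cpg_sites):
    """Build a reads x CpGs matrix for one region.

    read_calls: list of dicts mapping CpG position -> 1 (methylated)
                or 0 (unmethylated), one dict per read.
    cpg_sites:  sorted CpG positions inside the region.
    Returns a list of rows with 1, 0 or None (site not covered by the read);
    reads that cover none of the sites are dropped.
    """
    matrix = []
    for calls in read_calls:
        row = [calls.get(pos) for pos in cpg_sites]
        if any(value is not None for value in row):
            matrix.append(row)
    return matrix


reads = [{102: 1, 105: 0}, {105: 1, 108: 1}, {500: 1}]
for row in methylation_matrix(reads, [102, 105, 108]):
    print(row)
# [1, 0, None]
# [None, 1, 1]
```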

6. Visualization of methylation matrix (optional)

Usage: visualization_Matlab.m

  • Open this script and set:

    • path_to_matrix to the path where the MeMatrix is stored;
    • path_to_cpgPos to the path where the CpG positions of the genome are stored (the output of pre_cpg_pos.py);
    • name to the name of the MeMatrix, for example 'test_chr1_1287967_1288117';
  • Output: two lollipop plots, one without considering distance between CpGs, one considering distance between CpGs.

    • unmethylated CpGs are labeled as light blue
    • CpGs without signal are labeled as grey
    • methylated CpGs are labeled as dark red

Testing with an example

  • STEP 1 python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.bam -o ./test/test -c 2 (or, if the pysam module is not available, for example on Windows: python s1_bamToMeRecord.py -i ./test/GM12878_chr1_1286017_1294783.sam -o ./test/test -c 2)

    • The error "Could not retrieve index file for './test/GM12878_chr1_1286017_1294783.bam'" does not affect the results.
    • Check that the output file test_ReadsMethyAndMuts.txt has been created in the test folder. If so, the step worked.
  • STEP 2 python s2_RecordSplit.py -i ./test/test_ReadsMethyAndMuts.txt -o ./test/test -g chr1

    • Check that the output file test_ReadsMethyAndMuts_chr1.txt has been created in the test folder. If so, the step worked.
  • STEP 3 python s3_RecordToMeConcord.py -p 1 -i ./test/test -o ./test/test -r ./test/tmp1.bed -c ./test/ -b 150 -m 600 -z 1 -g chr1

    • Check that the output file test_MeConcord.txt has been created in the test folder. If so, the step worked.
  • STEP 4 python s4_RecordToMeMatrix.py -i ./test/test -o ./test/test -r ./test/tmp2.bed -c ./test/ -m 600 -z 1 -g chr1

    • Check that the two output files test_chr1_1287967_1288117_me.txt and test_chr1_1287967_1288117_unme.txt have been created in the test folder. If so, the step worked.
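
The four test steps can also be chained from Python. The sketch below only assembles the command lines given above and runs them in order with subprocess; the helper names example_commands and run_all are hypothetical, and it assumes the scripts sit in the current working directory.

```python
import subprocess
import sys


def example_commands(test_dir="./test", python=sys.executable):
    """Return the four test commands above as argument lists."""
    prefix = test_dir + "/test"
    return [
        [python, "s1_bamToMeRecord.py",
         "-i", test_dir + "/GM12878_chr1_1286017_1294783.bam",
         "-o", prefix, "-c", "2"],
        [python, "s2_RecordSplit.py",
         "-i", prefix + "_ReadsMethyAndMuts.txt",
         "-o", prefix, "-g", "chr1"],
        [python, "s3_RecordToMeConcord.py", "-p", "1",
         "-i", prefix, "-o", prefix, "-r", test_dir + "/tmp1.bed",
         "-c", test_dir + "/", "-b", "150", "-m", "600", "-z", "1", "-g", "chr1"],
        [python, "s4_RecordToMeMatrix.py",
         "-i", prefix, "-o", prefix, "-r", test_dir + "/tmp2.bed",
         "-c", test_dir + "/", "-m", "600", "-z", "1", "-g", "chr1"],
    ]


def run_all(commands):
    """Run each command in order and stop at the first failure."""
    for command in commands:
        subprocess.run(command, check=True)


# To run the whole example: run_all(example_commands())
```

Because check=True raises on a non-zero exit code, a failing step stops the chain before later steps read incomplete output.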