Collapse a set of redundant kmers to use IUPAC degenerate bases

Overview

kmer-collapse

Collapse a set of redundant kmers to use IUPAC degenerate bases

Overview

Given an input set of kmers, find the smallest set of kmers that encapsulates all diversity in the input set using IUPAC degenerate bases. This aims to solve the problem described here: https://www.biostars.org/p/9498272/

Usage

Install the marisa-trie library, if necessary.

Modify the script's input variable to specify desired sequences, and then run the script:

$ python kmer-collapse.py
{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "ACAAAAAAAA",
        "AGAAAAAAAA"
    ],
    "encoded_output": [
        "WAAAAAAAAA",
        "ASAAAAAAAA"
    ]
}

Notes

This has not been tested with any kmer sets but those examples provided. However, it aims to be scalable by pruning combinations of sub-kmers along the way, which would otherwise yield incorrect encodings. This also uses a trie for faster prefix testing. If futher performance is needed, some easy wins would be to cache sub-kmer tests, since most of these test outcomes would be redundant.

Additionally, no error checking is done on the input kmer alphabet or on the consistency of kmer lengths. It may be useful to validate input before using this script.

Examples

These examples are available from the script by uncommenting the relevant input.

A

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA"
    ],
    "encoded_output": [
        "WAAAAAAAAA"
    ]
}

B

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "GCGAAAAAAA"
    ],
    "encoded_output": [
        "GCGAAAAAAA",
        "WAAAAAAAAA"
    ]
}

C

{
    "input": [
        "AAAAAAAAAA"
    ],
    "encoded_output": [
        "AAAAAAAAAA"
    ]
}

D

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "CAAAAAAAAA",
        "GAAAAAAAAA"
    ],
    "encoded_output": [
        "NAAAAAAAAA"
    ]
}

E

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "TTAAAAAAAA",
        "ATAAAAAAAA"
    ],
    "encoded_output": [
        "WWAAAAAAAA"
    ]
}

F

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "CAAAAAAAAA",
        "GAAAAAAAAA",
        "TACAGATACA",
        "AACAGAAAAA"
    ],
    "encoded_output": [
        "NAAAAAAAAA",
        "TACAGATACA",
        "AACAGAAAAA"
    ]
}

G

{
    "input": [
        "AAAAAAAAAA",
        "TAAAAAAAAA",
        "ACAAAAAAAA",
        "AGAAAAAAAA"
    ],
    "encoded_output": [
        "ASAAAAAAAA",
        "WAAAAAAAAA"
    ]
}
Owner
Alex Reynolds
Pug caregiver, curler, cyclist, gardener, beginning French scholar
Alex Reynolds
A way to write regex with objects instead of strings.

Py Idiomatic Regex (AKA iregex) Documentation Available Here An easier way to write regex in Python using OOP instead of strings. Makes the code much

Ryan Peach 18 Nov 15, 2021
Program Input Data Mahasiswa Oop

PROGRAM INPUT NILAI MAHASISWA MENGGUNAKAN OOP PENGERTIAN OOP object-oriented-programing/OOP adalah paradigma pemrograman berdasarkan konsep "objek", y

Maulana Reza Badrudin 1 Jan 05, 2022
This Program Automates The Procces Of Adding Camos On Guns And Saving Them On Modern Warfare Guns

This Program Automates The Procces Of Adding Camos On Guns And Saving Them On Modern Warfare Guns

Flex Tools 6 May 26, 2022
Vaccine for STOP/DJVU ransomware, prevents encryption

STOP/DJVU Ransomware Vaccine Prevents STOP/DJVU Ransomware from encrypting your files. This tool does not prevent the infection itself. STOP ransomwar

Karsten Hahn 16 May 31, 2022
Web app to find your chance of winning at Texas Hold 'Em

poker_mc Web app to find your chance of winning at Texas Hold 'Em A working version of this project is deployed at poker-mc.ue.r.appspot.com. It's run

Aadith Vittala 7 Sep 15, 2021
A QGIS integration plugin for Kart repositories

QGIS Kart Plugin A plugin to work with Kart repositories Installation The Kart plugin is available in the QGIS Plugins server. To install the latest v

Koordinates 27 Jan 04, 2023
pythonOS: An operating system kernel made in python and assembly

pythonOS An operating system kernel made in python and assembly Wait what? It uses a custom compiler called snek that implements a part of python3.9 (

Abbix 69 Dec 23, 2022
Simple kivy project to help new kivy users build android apps with python.

Kivy Calculator A Simple Calculator made with kivy framework.Works on all platforms from Windows/linux to android. Description Simple kivy project to

Oussama Ben Sassi 6 Oct 06, 2022
Improving Representations via Similarities

embetter warning I like to build in public, but please don't expect anything yet. This is alpha stuff! notes Improving Representations via Similaritie

vincent d warmerdam 229 Jan 08, 2023
Two predictive attributes (Speed and Angle) and one attribute target (Power)

Two predictive attributes (Speed and Angle) and one attribute target (Power). A container crane has the function of transporting containers from one point to another point. The difficulty of this tas

Astitva Veer Garg 1 Jan 11, 2022
A toolkit for developing and deploying serverless Python code in AWS Lambda.

Python-lambda is a toolset for developing and deploying serverless Python code in AWS Lambda. A call for contributors With python-lambda and pytube bo

Nick Ficano 1.4k Jan 03, 2023
Add all JuliaLang unicode abbreviations to AutoKey.

Autokey Unicode characters Usage This script adds all the unicode character abbreviations supported by Julia to autokey. However, instead of [TAB], th

Randolf Scholz 49 Dec 02, 2022
An attempt at furthering Factorio Calculator to work in more general contexts.

factorio-optimizer Lets do Factorio Calculator but make it optimize. Why not use Factorio Calculator? Becuase factorio calculator is not general. The

Jonathan Woollett-Light 1 Jun 03, 2022
Active Transport Analytics Model: A new strategic transport modelling and data visualization framework

{ATAM} Active Transport Analytics Model Active Transport Analytics Model (“ATAM”

ATAM Analytics 2 Dec 21, 2022
A Python library to simulate a Zoom H6 recorder remote control

H6 A Python library to emulate a Zoom H6 recorder remote control Introduction This library allows you to control your Zoom H6 recorder from your compu

Matias Godoy 68 Nov 02, 2022
UniPD exam dates finder

UniPD exam dates finder Find dates for exams at UniPD Usage ./finder.py courses.csv It's suggested to save output to a file: ./finder.py courses.csv

Davide Peressoni 1 Jan 25, 2022
OpenTable Reservation Maker For Python

OpenTable-Reservation-Maker The code that corresponds with this blog post on writing a script to make reservations for me on opentable Getting started

JonLuca De Caro 36 Nov 10, 2022
Simple web application, which has a single endpoint, dedicated to annotation parsing and convertion.

Simple web application, which has a single endpoint, dedicated to annotation parsing and conversion.

Pavel Paranin 1 Nov 01, 2021
The LiberaPay archive module for the SeanPM life archive project.

By: Top README.md Read this article in a different language Sorted by: A-Z Sorting options unavailable ( af Afrikaans Afrikaans | sq Shqiptare Albania

Sean P. Myrick V19.1.7.2 1 Aug 26, 2022
The Python agent for Apache SkyWalking

SkyWalking Python Agent SkyWalking-Python: The Python Agent for Apache SkyWalking, which provides the native tracing abilities for Python project. Sky

The Apache Software Foundation 149 Dec 12, 2022