Tuple-sum-filter - Library to play with filtering numeric sequences by sums of their pairs, triplets, etc. With a bonus CLI demo

Overview

Tuple Sum Filter

A library to play with filtering numeric sequences by sums of their pairs, triplets, etc.

Comes with a bonus CLI to demo the functionality.

Requires (and CI tests on) python 3.8 to 3.10. If you need to use python 3.7 then try replacing math.prod(some_iterable) with functools.reduce(lambda x, y: x * y, some_iterable)

Approach

We're thinking of this mostly as a library with the CLI as only for demo purposes. Ways you can see this in the code:

  • logging should really handled by the consumer,
    • our get_logger should be something that is passed into the lib
  • the CLI is pretty light on automated tests
  • we use pretty loose production dependency pinning
    • rather than pip freeze > requirements.txt of a deployed app
    • we want to keep things loose so that consumers can keep installing us alongside other things
    • we should probably set up tox/nox test runs against v.latest of our dependencies

Running the demo

in a fresh virtualenv (python>=3.8)

# install project and deps
pip install git+https://github.com/lbillingham/tuple-sum-filter.git

# create a suitable input file
echo "1721\n979\n366\n299\n675\n1456\n" > example.txt

# run the demo
filter_demo --input_file=example.txt --sum_target=2020 --dimension=2

you should see output like

checking for pairs of numbers that sum to 2020 in example.txt
Results pair (1721, 299) match: sum to 2020 and multiply to 514579

Consuming the library

The main filtering functions are pairs_that_sum_to and triplets_that_sum_to. They both have signatures (numbers: Sequence[int|float], sum_target: int|float) -> things_that_passed_the_filter list[tuple].

There is also a file-reading helper numbers_in_file exported at the top level.

Developing

Run the following to install the project (and dev dependencies) into your active virtualenv:

make dev_install

day-to-day development tasks can be orchestrated via make

  • dependency management
  • test/lint/typecheck running
  • coverage reporting
  • run make without any arguments to see a list

There is a CI suite which runs lint and test on several python versions. We don't run typechecking as a gate in CI because we think that turns a sometimes-useful tool into a Goodhart target.

Performance

We have not been optimizing for performance and it kind of shows.

When we run the benchmarking suite we see ~0.4 seconds fairly consistently for the triplet/3D problem.

We have at least 3 ideas of how to speed things up: several of them include dropping floating-point support.

$ make benchmark

tests/performance_check.py ..                                                                                                                                [100%]


------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
test_input1_pairs          5.4665 (1.0)        6.2297 (1.0)        5.6687 (1.0)      0.1018 (1.0)        5.6575 (1.0)      0.1289 (1.0)          47;3  176.4077 (1.0)         172           1
test_input1_triplets     384.6154 (70.36)    386.5000 (62.04)    385.4776 (68.00)    0.8287 (8.14)     385.4333 (68.13)    1.5047 (11.67)         2;0    2.5942 (0.01)          5           1
---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

--

🍪 ✂️ cookiecut from lbillingham's python-cli-template

You might also like...
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids
Linux GUI app to codon optimize many single-fasta files with coding sequences , using many taxonomy ids

codon_optimize_cds_with_many_taxids_singlefasta Linux GUI app to codon optimize many single-fasta files with coding sequences, using many taxonomy ids

Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a local folder

Ingestinator Ingestinator is my personal VFX pipeline tool for ingesting folders containing frame sequences that have been pulled and downloaded to a

Python Common things by Problem Fighter Library, (Exception, Debug Log, etc.)

In the name of God, the Most Gracious, the Most Merciful. PF-PY-Common Documentation Install and update using pip: pip install -U xxxx Please find the

Devil - Very Semple Auto Filter V1 Bot
Devil - Very Semple Auto Filter V1 Bot

Devil Very Semple Auto Filter V1 Bot

Cairo-bloom - A naive bloom filter implementation in Cairo

🥀 cairo-bloom A naive bloom filter implementation in Cairo. A Bloom filter is a

Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.
Snakemake worflow to process and filter long read data from Oxford Nanopore Technologies.

Nanopore-Workflow Snakemake workflow to process and filter long read data from Oxford Nanopore Technologies. It is designed to compare whole human gen

Runnable Python demo of ArtLine

artline-demo How to run? pip3 install -r requirements.txt python3 app.py How to use? Run the Flask app Open localhost:5000 in browser Select an image(

Tiny demo site for exploring SameSite=Lax

samesite-lax-demo Background on my blog: Exploring the SameSite cookie attribute for preventing CSRF This repo holds some tools for exploring the impl

An extended version of the hotkeys demo code using action classes

An extended version of the hotkeys application using action classes. In adafruit's Hotkeys code, a macro is using a series of integers, assumed to be

Comments
  • Perf: :zap:  merge if you want to go faster but don't need float support

    Perf: :zap: merge if you want to go faster but don't need float support

    This moves away from the shared itertools implimentations for finding pairs, triplets of the input numbers that sum to a given target.

    Instead, we

    • 1st expose the underlying $~O^{dimensions}$ nested loops
    • trade some extra memory and some $O^{1}$ lookups to give us $~O^{dimensions-1}$
    • get a >= 170x speedup in our benchmarks

    However, we:

    • loose the ability to properly work with floating point input
      • the fast lookup uses hasing and hashing floats gets weird due to floating point equality
    • can't trivially extend to higher-dimension problems: 4-element-tuples etc.

    I've moved the float input tests out the their own file and away from the CI test path

    Performance benchmarks

    with these changes:

    $ make benchmark
    tests/performance_check.py ..                                                                                                                             [100%]
    
    -------------------------------------------------------------------------------------------- benchmark: 2 tests --------------------------------------------------------------------------------------------
    Name (time in us)               Min                   Max                  Mean              StdDev                Median                 IQR            Outliers          OPS            Rounds  Iterations
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs           22.1660 (1.0)        166.3790 (1.0)         23.6089 (1.0)        5.0170 (1.0)         23.0000 (1.0)        0.5123 (1.0)        87;521  42,356.8183 (1.0)        7677           1
    test_input1_triplets     1,994.8000 (89.99)    3,561.1120 (21.40)    2,152.1272 (91.16)    204.1428 (40.69)    2,033.1040 (88.40)    299.1878 (584.07)       41;4     464.6565 (0.01)        341           1
    ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    

    itertools, but non-float supporting version

    $ make benchmark
    tests/performance_check.py ..                                                                                                      [100%]
    
    ------------------------------------------------------------------------------------- benchmark: 2 tests ------------------------------------------------------------------------------------
    Name (time in ms)             Min                 Max                Mean            StdDev              Median               IQR            Outliers       OPS            Rounds  Iterations
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    test_input1_pairs          2.8727 (1.0)        4.2386 (1.0)        3.1265 (1.0)      0.1638 (1.0)        3.1067 (1.0)      0.1888 (1.0)          78;9  319.8414 (1.0)         326           1
    test_input1_triplets     211.6325 (73.67)    213.3950 (50.35)    212.4042 (67.94)    0.6555 (4.00)     212.2717 (68.33)    0.8081 (4.28)          2;0    4.7080 (0.01)          5           1
    ---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
    =========================================================== 2 passed in 3.59s ============================================================
    

    Note the change in units this branch is in microseconds, the itertools version is in milliseconds

    opened by lbillingham 0
Releases(v0.0.1)
  • v0.0.1(Feb 17, 2022)

    Initial release. Lib allows filtering by sum over pairs and triplets of numbers loaded from a local file. Plus a bonus CLI app that can be used for demoing the lib.

    Solution is itertools-y and rather slow (probably $O^{n}$ where pairs->n=2 and triplets->n=3).

    This is the version shown to MM

    Source code(tar.gz)
    Source code(zip)
Owner
Laurence Billingham
+ sustainable software + data science
Laurence Billingham
A collection of tips for using MISP.

MISP Tip of the Week A collection of tips for using MISP. Published via BelgoMISP (todo) and this repository. Available in MD and JSON. Do you want to

Koen Van Impe 52 Jan 07, 2023
Mpis-ex7 - Implementation of tasks 1, 2, 3 for Metody Probabilistyczne i Statystyka Lista 7

Implementations of task 1, 2 and 3 from here Author: Maciej Bazela Index: 261743 Each task was implemented in Python 3. I've used Cython to speed up e

Maciej Bazela 1 Feb 27, 2022
A Tandy Color Computer 1, 2, and 3 assembler written in Python

CoCo Assembler and File Utility Table of Contents What is it? Requirements License Installing Assembler Assembler Usage Input File Format Print Symbol

Craig Thomas 16 Nov 03, 2022
FindUncommonShares.py is a Python equivalent of PowerView's Invoke-ShareFinder.ps1 allowing to quickly find uncommon shares in vast Windows Domains.

FindUncommonShares The script FindUncommonShares.py is a Python equivalent of PowerView's Invoke-ShareFinder.ps1 allowing to quickly find uncommon sha

Podalirius 184 Jan 03, 2023
Python wrapper around Apple App Store Api

App Store Connect Api This is a Python wrapper around the Apple App Store Api : https://developer.apple.com/documentation/appstoreconnectapi So far, i

123 Jan 06, 2023
Collections of python projects

nppy, mostly contains projects written in Python. Some projects are very simple while some are a bit lenghty and difficult(for beginners) Requirements

ghanteyyy 75 Dec 20, 2022
Ultimate Score Server for RealistikOsu

USSR Ultimate Score Server for RealistikOsu (well not just us but it makes the acronym work.) Also I wonder how long this name will last. What is this

RealistikOsu! 15 Dec 14, 2022
WATTS provides a set of Python classes that can manage simulation workflows for multiple codes where information is exchanged at a coarse level

WATTS (Workflow and Template Toolkit for Simulation) provides a set of Python classes that can manage simulation workflows for multiple codes where information is exchanged at a coarse level.

13 Dec 23, 2022
A wrapper around the python Tkinter library for customizable and modern ui-elements in Tkinter

CustomTkinter With CustomTkinter you can create modern looking user interfaces in python with tkinter. CustomTkinter is a tkinter extension which prov

4.9k Jan 02, 2023
HOWTO: Downgrade from nYNAB to YNAB4

HOWTO: Downgrade from nYNAB to YNAB4 This page explains how to move from nYNAB to YNAB4 while retaining as much information as possible. See Appendix

Tobias Kunze 10 Dec 29, 2022
Experimental proxy for dumping the unencrypted packet data from Brawl Stars (WIP)

Brawl Stars Proxy Experimental proxy for version 39.99 of Brawl Stars. It allows you to capture the packets being sent between the Brawl Stars client

4 Oct 29, 2021
Simple calculator made in python

calculator Uma alculadora simples feita em python CMD, PowerShell, Bash ✔️ Início 💻 apt-get update apt-get upgrade -y apt-get install python git git

Spyware 8 Dec 28, 2021
Moleey Panel with python 3

Painel-Moleey pkg upgrade && pkg update pkg install python3 pip install pyfiglet pip install colored pip install requests pip install phonenumbers pkg

Moleey. 1 Oct 17, 2021
Thinky nature dots with python

Thinky Nature Welcome to my rice dotfiles of my ThinkPad X230 You surely will need to change some configs to fit your files and stuff and maybe tweak

Daniel Mironow 26 Dec 02, 2022
Simple Python tool to check if there is an Office 365 instance linked to a domain.

o365chk.py Simple Python script to check if there is an Office365 instance linked to a particular domain.

Steven Harris 37 Jan 02, 2023
Windows symbol tables for Volatility 3

Windows Symbol Tables for Volatility 3 This repository is the Windows Symbol Table storage for Volatility 3. How to Use $ git clone https://github.com

JPCERT Coordination Center 31 Dec 25, 2022
A basic layout of atm working of my local database

Software for working Banking service 😄 This project was developed for Banking service. mysql server is required To have mysql server on your system u

satya 1 Oct 21, 2021
Pre-1.0 door/chest sound injector for Minecraft

doorjector Pre-1.0 door/chest sound injector for Minecraft. While the game is running, doorjector hotswaps the new sounds for the old right before the

Sam 1 Nov 20, 2021
A python package for batch import of resume attachments to be parsed in HrFlow.

HrFlow Importer Description A python package for batch import of resume attachments to be parsed in HrFlow. hrflow-importer is an open-source project

HrFlow.ai (ex: Riminder.net) 3 Nov 15, 2022
Python script to commit to your github for a perfect commit streak. This is purely for education purposes, please don't use this script to do bad stuff.

Daily-Git-Commit Commit to repo every day for the perfect commit streak Requirments pip install -r requirements.txt Setup Download this repository. Cr

JareBear 34 Dec 14, 2022