Jug: A Task-Based Parallelization Framework

Overview

Jug: A Task-Based Parallelization Framework

Jug allows you to write code that is broken up into tasks and run different tasks on different processors.

https://travis-ci.com/luispedro/jug.png Join the chat at https://gitter.im/luispedro/jug

It uses the filesystem to communicate between processes and works correctly over NFS, so you can coordinate processes on different machines.

Jug is a pure Python implementation and should work on any platform.

Python versions 3.5 and above are supported.

Website: http://luispedro.org/software/jug

Documentation: https://jug.readthedocs.org/

Video: On vimeo or showmedo

Mailing List: http://groups.google.com/group/jug-users

Testimonials

"I've been using jug with great success to distribute the running of a reasonably large set of parameter combinations" - Andreas Longva

Install

You can install Jug with pip:

pip install Jug

or use, if you are using conda, you can install jug from conda-forge using the following commands:

conda config --add channels conda-forge
conda install jug

Citation

If you use Jug to generate results for a scientific publication, please cite

Coelho, L.P., (2017). Jug: Software for Parallel Reproducible Computation in Python. Journal of Open Research Software. 5(1), p.30.

http://doi.org/10.5334/jors.161

Short Example

Here is a one minute example. Save the following to a file called primes.py (if you have installed jug, you can obtain a slightly longer version of this example by running jug demo on the command line):

from jug import TaskGenerator
from time import sleep

@TaskGenerator
def is_prime(n):
    sleep(1.)
    for j in range(2,n-1):
        if (n % j) == 0:
            return False
    return True

primes100 = [is_prime(n) for n in range(2,101)]

This is a brute-force way to find all the prime numbers up to 100. Of course, this is only for didactical purposes, normally you would use a better method. Similarly, the sleep function is so that it does not run too fast. Still, it illustrates the basic functionality of Jug for embarassingly parallel problems.

Type jug status primes.py to get:

Task name                  Waiting       Ready    Finished     Running
----------------------------------------------------------------------
primes.is_prime                  0          99           0           0
......................................................................
Total:                           0          99           0           0

This tells you that you have 99 tasks called primes.is_prime ready to run. So run jug execute primes.py &. You can even run multiple instances in the background (if you have multiple cores, for example). After starting 4 instances and waiting a few seconds, you can check the status again (with jug status primes.py):

Task name                  Waiting       Ready    Finished     Running
----------------------------------------------------------------------
primes.is_prime                  0          63          32           4
......................................................................
Total:                           0          63          32           4

Now you have 32 tasks finished, 4 running, and 63 still ready. Eventually, they will all finish and you can inspect the results with jug shell primes.py. This will give you an ipython shell. The primes100 variable is available, but it is an ugly list of jug.Task objects. To get the actual value, you call the value function:

In [1]: primes100 = value(primes100)

In [2]: primes100[:10]
Out[2]: [True, True, False, True, False, True, False, False, False, True]

What's New

Version 2.1.1 (18 March 2021)

  • Include requirements files in distribution

Version 2.1.0 (18 March 2021)

  • Improvements to webstatus (by Robert Denham)
  • Removed Python 2.7 support
  • Fix output encoding for Python 3.8
  • Fix bug mixing mapreduce() & status --cache
  • Make block_access (used in mapreduce()) much faster (20x)
  • Fix important redis bug
  • More precise output in cleanup command

Version 2.0.2 (Thu Jun 11 2020)

  • Fix command line argument parsing

Version 2.0.1 (Thu Jun 11 2020)

  • Fix handling of JUG_EXIT_IF_FILE_EXISTS environmental variable
  • Fix passing an argument to jug.main() function
  • Extend --pdb to exceptions raised while importing the jugfile (issue #79)

version 2.0.0 (Fri Feb 21 2020)

  • jug.backend.base_store has 1 new method 'listlocks'
  • jug.backend.base_lock has 2 new methods 'fail' and 'is_failed'
  • Add 'jug execute --keep-failed' to preserve locks on failing tasks.
  • Add 'jug cleanup --failed-only' to remove locks from failed tasks
  • 'jug status' and 'jug graph' now display failed tasks
  • Check environmental exit variables by default (suggested by Renato Alves, issue #66)
  • Fix 'jug sleep-until' in the presence of barrier() (issue #71)

For older version see ChangeLog file or the full history.

Comments
  • Jug doesn't like jobs when they're ended by SIGTERM

    Jug doesn't like jobs when they're ended by SIGTERM

    There seems to be a weird interaction when you run jug cleanup while jobs are still running, but I could be wrong, but in addition, when a job is ended by sigterm jobs seem to enter the completed stage and never finish.

    opened by danpf 12
  • Implement a wrapper to run jug with gridmap

    Implement a wrapper to run jug with gridmap

    A light-weight wrapper to run jug with gridmap on a Grid Engine cluster

    Get the best of both worlds of gridmap and jug

    • http://pythonhosted.org/Jug/
    • http://gridmap.readthedocs.org/en/latest/

    This implements jug.grid_jug to run jug jobs with all the comfortable features of gridmap.

    As I am not sure whether this belongs into the gridmap or the jug package, I create a pull request for both to foster discussion between the two projects. The file jug/grid.py is identical with the file gridmap/jug.py .

    opened by andsor 9
  • Processes only start working after checking status

    Processes only start working after checking status

    When using subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, creationflags=CREATE_NEW_CONSOLE) runing jug process, the subprocess will only starts after checking status. The same thing happens when running by subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT, shell=True)

    This problem doesn't appear when using subprocess.Popen(["jug", "execute", "exmp.py"], stderr=STDOUT)

    Would you mind to help?

    opened by chenxi-shi 7
  • Failed to assign tasks correctly

    Failed to assign tasks correctly

    Hi, I wrote a small piece of python code on a macbook which has 10 + 1 tasks. 10 similar sleeping task and 1 joint task. I am running it by 4 processes and found that sometime Jug cannot assign task correctly. It happens like one process does all tasks along and, at the same time, another 3 share the 11 tasks.

    opened by chenxi-shi 6
  • Raises error if jugdir is not defined

    Raises error if jugdir is not defined

    If jugdir is not provided on command-line to "jug execute", and there is no jug config file, the program errors out, since this is prior to getting to the call to init() which handles the missing jugdir.

    Just define a default:

    parser.add_option('--jugdir',
                    action='store',
                    dest='jugdir',
                    default='jugdir', # define default value
                    help='Where to save intermediate results')
    
    opened by rjplevin 6
  • IterateTask as a decorator

    IterateTask as a decorator

    I have been using a decorator to provide the functionality of iteratetask instead of the function.

    The reason I like this is that it specifies how to handle the output (i.e. how many items in the tuple) closer to the place the output is specified (i.e. at the end of the function) rather than where the workflow is specified (which is usually a different file).

    So it looks like:

    @IterateTask(n=3)
    @jug.TaskGenerator
    def do_thing(args):
        return a, b, c
    

    I've been using the following for a few weeks without incident:

    import jug
    from functools import wraps
    
    def IterateTask(n):
        def decorator(f):
             @wraps(f)
             def wrapper(*args, **kwargs):
                 return jug.iteratetask(f(*args, **kwargs), n=n)
             return wrapper
        return decorator
    

    Is this something you'd be interested in adding? Is there any reason I haven't foreseen that this isn't a good idea?

    opened by justinrporter 5
  • The right way to use Jug with class object

    The right way to use Jug with class object

    Hi, I defined a class which has a init. There are some methods in this class is decorated with @TaskGenerator. These methods cannot get class instance correctly. I tried 2 ways to solve it and they all don't work.

    1. decorate init with @TaskGenerator then init() return's a jug Task instead of None
    2. put a barrier() after defining the class object then got TypeError: cannot serialize '_io.TextIOWrapper' object

    Would you mind to tell me the correct way to do it?

    opened by chenxi-shi 5
  • Backwards compatibility fixes

    Backwards compatibility fixes

    sys.argv now contains jugfile and unparsed arguments (i.e. after --) The subcommand api tests were also leaking causing other parts of the testsuite to fail.

    opened by unode 5
  • `RuntimeError: maximum recursion depth exceeded while calling a Python object`

    `RuntimeError: maximum recursion depth exceeded while calling a Python object`

    Came across this error today:

    CRITICAL:root:Exception while running tangent_lstm.test_lstm: maximum recursion depth exceeded while calling a Python object
    Traceback (most recent call last):
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/bin/jug", line 5, in <module>
        main()
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 415, in main
        execute(options)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 254, in execute
        execution_loop(tasks, options)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/jug.py", line 163, in execution_loop
        t.run(debug_mode=options.debug)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/task.py", line 95, in run
        self.store.dump(self._result, name)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/file_store.py", line 139, in dump
        encode_to(object, output)
      File "/gpfs/main/home/skainswo/Research/kaggle_gal/venv/local/lib/python2.7/site-packages/jug/backends/encode.py", line 77, in encode_to
        write(object, stream)
    RuntimeError: maximum recursion depth exceeded while calling a Python object
    

    I'm getting this error when creating a Task that wraps a function which returns a Keras model.

    opened by samuela 5
  • A few Fixes to the redis backend and the hash function, probably worth to merge, probably not.

    A few Fixes to the redis backend and the hash function, probably worth to merge, probably not.

    • The url decomposition regexp for the redis back-end apparently had a problem.
    • The hash-from-arguments approach didn't work for my particular problem: a type can be 'pickle-able' without the pickle being unique, something that makes tasks non-uniquely identifiable. So, made it possible for the user to control the hash somehow.
    opened by alcidesv 5
  • Allow cleanup of only locks

    Allow cleanup of only locks

    I recently had an issue with a few of my jug processes being killed and therefore locks being kept on some active tasks. Normally, I would just delete the lock files in the jugdir, but I was using the Redis backend, so this was not possible. As suggested in the docs, I ran jug cleanup, expecting only the locks to be cleared; I was surprised to find the completed jobs cleared as well.

    Is there any way to have jug cleanup only clear the locks? If not, can there be an option for this?

    opened by freemansw1 4
  • Invalidate --target is too greedy

    Invalidate --target is too greedy

    -------------------------------------------------------------------------------------
               0           0          1           0           0  jugfile.function
               0           0          1           0           0  jugfile.function_other
    .....................................................................................
               0           0          2           2           0  Total
    
    

    Running jug invalidate --target jugfile.function invalidates everything.

    opened by unode 4
Releases(v2.2.2)
  • v2.2.2(Jul 18, 2022)

  • v2.2.0(May 2, 2022)

  • v2.1.1(Mar 21, 2021)

    Important are the fixes for Python 3.8 and redis.

    Compared to 2.1.0, this patch release includes some missing files in the package.

    Full list of changes * Removed Python 2.7 support * Fix output encoding for Python 3.8 * Fix bug mixing mapreduce() & status --cache * Make block_access (used in mapreduce()) much faster (20x) * Fix important redis bug * More precise output in cleanup command

    Source code(tar.gz)
    Source code(zip)
  • v2.0.2(Jun 12, 2020)

    Bugfix release.

    Compared to 2.0.0:

    • Fix handling of JUG_EXIT_IF_FILE_EXISTS environmental variable
    • Fix passing an argument to jug.main() function
    • Extend --pdb to exceptions raised while importing the jugfile (issue #79)
    Source code(tar.gz)
    Source code(zip)
  • v2.0.1(Jun 12, 2020)

  • v2.0.0(Jun 12, 2020)

    The big changes are failed tasks (contributed by @unode).

    Also, environmental variables are now always checked and creating a file called __jug_please_stop.txt will stop a jug execute run in a clean way.

    Full ChangeLog

    • jug.backend.base_store has 1 new method listlocks
    • jug.backend.base_lock has 2 new methods fail and is_failed
    • Add 'jug execute --keep-failed' to preserve locks on failing tasks.
    • Add 'jug cleanup --failed-only' to remove locks from failed tasks
    • 'jug status' and 'jug graph' now display failed tasks
    • Check environmental exit variables by default (suggested by Renato Alves, issue #66)
    • Fix 'jug sleep-until' in the presence of barrier() (issue #71)
    Source code(tar.gz)
    Source code(zip)
  • v2.0.0rc0(Jan 31, 2020)

  • v1.6.9(Jan 31, 2020)

  • v1.6.3(Nov 1, 2017)

    Add a polite request for citations to the Jug paper:

    Coelho, L.P., (2017). Jug: Software for Parallel Reproducible Computation in
    Python. Journal of Open Research Software. 5(1), p.30.
    
    http://doi.org/10.5334/jors.161
    
    Source code(tar.gz)
    Source code(zip)
  • v1.6.2(Nov 1, 2017)

  • v1.6.1(Nov 1, 2017)

  • v1.6.0(Aug 24, 2017)

    Release 1.6.0

    Adds the jug graph subcommand to generate a little graph with tasks and their state.

    • Add jug graph subcommand
    • Generates a graph of tasks
    • jug execute --keep-going now ends with non-zero exit code in case of failures
    • Fix bug with cleanup in dict_store not providing the number of removed records
    • Add jug cleanup --keep-locks to remove obsolete results without affecting locks

    Almost all of the credit for changes since 1.5.0 goes to Renato Alves (@unode on github).

    Source code(tar.gz)
    Source code(zip)
  • v1.5.0(Jul 16, 2017)

    Several new functions (is_jug_running, invalidate in shell, ...) and a new internal structure which allows for extensions (work by Renato Alves, aka @Unode).

    Full ChangeLog:

    • Add demo subcommand
    • Add is_jug_running() function
    • Fix bug in finding config files
    • Improved --debug mode: check for unsupported recursive task creation
    • Add invalidate() to shell environment
    • Use ~/.config/jug/jugrc as configuration file
    • Add experimental support for extensible commands, use ~/.config/jug/jug_user_commands.py
    • jugrc: execute_wait_cycle_time_secs is now execute_wait_cycle_time
    • Expose sync_move in jug.utils
    Source code(tar.gz)
    Source code(zip)
  • v1.3.0(Nov 1, 2016)

    Major improvements are IPython 5 compatibility, jug_execute, and timing.

    Full ChangeLog:

    • Update shell subcommand to IPython 5
    • Use ~/.config/jugrc as configuration file
    • Cleanup usage string
    • Use bottle instead of web.py for webstatus subcommand
    • Add jug_execute function
    • Add timing functionality
    Source code(tar.gz)
    Source code(zip)
  • release-1.2.1(Jun 8, 2016)

    A bugfix release, especially fixing jug on Windows (previously unusable as Windows does not support fsync on directories).

    Full ChangeLog:

    * Changed execution loop to ensure that all tasks are checked (issue #33
    on github)
    * Fixed bug that made 'check' or 'sleep-until' slower than necessary
    * Fixed jug on Windows (which does not support fsync on directories)
    * Made Tasklets use slightly less memory
    
    Source code(tar.gz)
    Source code(zip)
  • release-1.2.0(Jun 8, 2016)

    Several optimizations in code and minor fixes made a new release worthwhile. Because the hashes have been changed, this needs to be a new (minor) version (see issue #25): upgrading will require you to rerun any tasks that use kwargs.

    Full ChangeLog

    • Use HIGHEST_PROTOCOL when pickle()ing
    • Add compress_numpy option to file_store
    • Add register_hook_once function
    • Optimize case when most (or all) tasks are already run
    • Add --short option to jug status and jug execute
    • Fix bug with dictionary order in kwargs (fix by Andreas Sorge)
    • Fix ipython colors (fix by Andreas Sorge)
    • Sort tasks in jug status
    Source code(tar.gz)
    Source code(zip)
Raise asynchronous exceptions in other thread, control the timeout of blocks or callables with a context manager or a decorator

stopit Raise asynchronous exceptions in other threads, control the timeout of blocks or callables with two context managers and two decorators. Attent

Gilles Lenfant 97 Oct 12, 2022
A Python package for easy multiprocessing, but faster than multiprocessing

MPIRE, short for MultiProcessing Is Really Easy, is a Python package for multiprocessing, but faster and more user-friendly than the default multiprocessing package.

753 Dec 29, 2022
Jug: A Task-Based Parallelization Framework

Jug: A Task-Based Parallelization Framework Jug allows you to write code that is broken up into tasks and run different tasks on different processors.

Luis Pedro Coelho 387 Dec 21, 2022
Thread-safe asyncio-aware queue for Python

Mixed sync-async queue, supposed to be used for communicating between classic synchronous (threaded) code and asynchronous

aio-libs 665 Jan 03, 2023
A curated list of awesome Python asyncio frameworks, libraries, software and resources

Awesome asyncio A carefully curated list of awesome Python asyncio frameworks, libraries, software and resources. The Python asyncio module introduced

Timo Furrer 3.8k Jan 08, 2023
🌀 Pykka makes it easier to build concurrent applications.

🌀 Pykka Pykka makes it easier to build concurrent applications. Pykka is a Python implementation of the actor model. The actor model introduces some

Stein Magnus Jodal 1.1k Dec 30, 2022
rosny is a lightweight library for building concurrent systems.

rosny is a lightweight library for building concurrent systems. Installation Tested on: Linux Python = 3.6 From pip: pip install rosny From source: p

Ruslan Baikulov 6 Oct 05, 2021
AnyIO is an asynchronous networking and concurrency library that works on top of either asyncio or trio.

AnyIO is an asynchronous networking and concurrency library that works on top of either asyncio or trio. It implements trio-like structured concurrenc

Alex Grönholm 1.1k Jan 06, 2023
A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs.

A lightweight (serverless) native python parallel processing framework based on simple decorators and call graphs, supporting both control flow and dataflow execution paradigms as well as de-centrali

102 Jan 06, 2023
Trio – a friendly Python library for async concurrency and I/O

Trio – a friendly Python library for async concurrency and I/O The Trio project aims to produce a production-quality, permissively licensed, async/awa

5k Jan 07, 2023
Unsynchronize asyncio by using an ambient event loop, or executing in separate threads or processes.

unsync Unsynchronize asyncio by using an ambient event loop, or executing in separate threads or processes. Quick Overview Functions marked with the @

Alex Sherman 802 Dec 28, 2022
Simple package to enhance Python's concurrent.futures for memory efficiency

future-map is a Python library to use together with the official concurrent.futures module.

Arai Hiroki 2 Nov 15, 2022
Backport of the concurrent.futures package to Python 2.6 and 2.7

This is a backport of the concurrent.futures standard library module to Python 2. It does not work on Python 3 due to Python 2 syntax being used in th

Alex Grönholm 224 Nov 07, 2022
Ultra fast asyncio event loop.

uvloop is a fast, drop-in replacement of the built-in asyncio event loop. uvloop is implemented in Cython and uses libuv under the hood. The project d

magicstack 9.1k Jan 07, 2023
Python Multithreading without GIL

Multithreaded Python without the GIL

Sam Gross 2.3k Jan 05, 2023
aiomisc - miscellaneous utils for asyncio

aiomisc - miscellaneous utils for asyncio Miscellaneous utils for asyncio. The complete documentation is available in the following languages: English

aiokitchen 295 Jan 08, 2023
A concurrent sync tool which works with multiple sources and targets.

Concurrent Sync A concurrent sync tool which works similar to rsync. It supports syncing given sources with multiple targets concurrently. Requirement

Halit ÅžimÅŸek 2 Jan 11, 2022
Parallelformers: An Efficient Model Parallelization Toolkit for Deployment

Parallelformers: An Efficient Model Parallelization Toolkit for Deployment

TUNiB 559 Dec 28, 2022
SCOOP (Scalable COncurrent Operations in Python)

SCOOP (Scalable COncurrent Operations in Python) is a distributed task module allowing concurrent parallel programming on various environments, from h

Yannick Hold 573 Dec 27, 2022