Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Overview
Luigi Logo
https://img.shields.io/travis/spotify/luigi/master.svg?style=flat https://img.shields.io/codecov/c/github/spotify/luigi/master.svg?style=flat https://landscape.io/github/spotify/luigi/master/landscape.svg?style=flat https://img.shields.io/pypi/v/luigi.svg?style=flat https://img.shields.io/pypi/l/luigi.svg?style=flat

Luigi is a Python (3.6, 3.7 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization, handling failures, command line integration, and much more.

Getting Started

Run pip install luigi to install the latest stable version from PyPI. Documentation for the latest release is hosted on readthedocs.

Run pip install luigi[toml] to install Luigi with TOML-based configs support.

For the bleeding edge code, pip install git+https://github.com/spotify/luigi.git. Bleeding edge documentation is also available.

Background

The purpose of Luigi is to address all the plumbing typically associated with long-running batch processes. You want to chain many tasks, automate them, and failures will happen. These tasks can be anything, but are typically long running things like Hadoop jobs, dumping data to/from databases, running machine learning algorithms, or anything else.

There are other software packages that focus on lower level aspects of data processing, like Hive, Pig, or Cascading. Luigi is not a framework to replace these. Instead it helps you stitch many tasks together, where each task can be a Hive query, a Hadoop job in Java, a Spark job in Scala or Python, a Python snippet, dumping a table from a database, or anything else. It's easy to build up long-running pipelines that comprise thousands of tasks and take days or weeks to complete. Luigi takes care of a lot of the workflow management so that you can focus on the tasks themselves and their dependencies.

You can build pretty much any task you want, but Luigi also comes with a toolbox of several common task templates that you use. It includes support for running Python mapreduce jobs in Hadoop, as well as Hive, and Pig, jobs. It also comes with file system abstractions for HDFS, and local files that ensures all file system operations are atomic. This is important because it means your data pipeline will not crash in a state containing partial data.

Visualiser page

The Luigi server comes with a web interface too, so you can search and filter among all your tasks.

Visualiser page

Dependency graph example

Just to give you an idea of what Luigi does, this is a screen shot from something we are running in production. Using Luigi's visualiser, we get a nice visual overview of the dependency graph of the workflow. Each node represents a task which has to be run. Green tasks are already completed whereas yellow tasks are yet to be run. Most of these tasks are Hadoop jobs, but there are also some things that run locally and build up data files.

Dependency graph

Philosophy

Conceptually, Luigi is similar to GNU Make where you have certain tasks and these tasks in turn may have dependencies on other tasks. There are also some similarities to Oozie and Azkaban. One major difference is that Luigi is not just built specifically for Hadoop, and it's easy to extend it with other kinds of tasks.

Everything in Luigi is in Python. Instead of XML configuration or similar external data files, the dependency graph is specified within Python. This makes it easy to build up complex dependency graphs of tasks, where the dependencies can involve date algebra or recursive references to other versions of the same task. However, the workflow can trigger things not in Python, such as running Pig scripts or scp'ing files.

Who uses Luigi?

We use Luigi internally at Spotify to run thousands of tasks every day, organized in complex dependency graphs. Most of these tasks are Hadoop jobs. Luigi provides an infrastructure that powers all kinds of stuff including recommendations, toplists, A/B test analysis, external reports, internal dashboards, etc.

Since Luigi is open source and without any registration walls, the exact number of Luigi users is unknown. But based on the number of unique contributors, we expect hundreds of enterprises to use it. Some users have written blog posts or held presentations about Luigi:

Some more companies are using Luigi but haven't had a chance yet to write about it:

We're more than happy to have your company added here. Just send a PR on GitHub.

External links

Authors

Luigi was built at Spotify, mainly by Erik Bernhardsson and Elias Freider. Many other people have contributed since open sourcing in late 2012. Arash Rouhani is currently the chief maintainer of Luigi.

Comments
  • New theme for luigi visualiser

    New theme for luigi visualiser

    This is a new theme for the visualiser accessible through .../index.lte.html. The theme takes the d3 theme and adds jQuery DataTables for displaying a filterable table of tasks and AdminLTE for styling.

    Key features are:

    • Table paging
    • Search filter by any text in the table
    • Info boxes displaying count of task status (done, running, etc.) are selectable to filter the table
    • Sidebar displaying task names with counts are selectable to filter the table
    • Exception reports are accessible via expandable rows

    I can imagine improving the layout of the "Parameters" column and making more use of the expandable rows but we are already finding this very useful in production.

    With 3 separate versions of index.html and several new dependencies I think it would be worth refactoring the /static structure and finding a neater way of selecting themes. If you agree I'll take a look at it.

    opened by stephenpascoe 89
  • Add Datadog contrib for monitoring purpose

    Add Datadog contrib for monitoring purpose

    Description

    Datadog is a tool that allows you to send metrics that you create dashboard and add alerting on specific behaviors.

    Adding this contrib will allow for users of this tool to log their pipeline information to Datadog.

    Motivation and Context

    Based on the status change of a task, we log that information to Datadog with the parameters that were used to run that specific task.

    This allows us to easily create dashboards to visualize the health. For example, we can be notified via Datadog if a task has failed, or we can graph the execution time of a specific task over a period of time.

    The implementation idea was strongly based on the stale PR https://github.com/spotify/luigi/pull/2044.

    Have you tested this? If so, how?

    We've been using this contrib for multiple months now (maybe a year?), at Glossier. This is the main point of reference to see the health of our pipeline.

    opened by thisiscab 48
  • Dynamic requirements

    Dynamic requirements

    This is a prototype for dynamic requirement based of erikbern's dynamic-requires branch

    It differs in that it multiple requirements can be returned at once and that it supports multiple threads. There is no concept of suspending tasks. If the requirements are not all available when yielded the task will be put back as pending.

    I struggled with getting access to the dynamic requirements from other processes. How it works now is that you have to add a dynamic_requires function that return the Tasks that may be yielded later on. I'm very open to suggestions on how to improve this.

    Again, just a prototype... Comments please!

    opened by tommyengstrom 43
  • Visible parameter for luigi.Parameters

    Visible parameter for luigi.Parameters

    Description

    Visible parameter for luigi.Parameter class which is needed to show/hide some parameters from WEB page such as passwords to database

    Motivation and Context

    Well, in my case this change is crucial because of some of my tasks connect to database and contain passwords for it which i don't want to show it from web page to all.

    In code it will look like this:

    class MyTask(luigi.Task):
        secret = luigi.Parameter(visible=False)    # default value for visible - True
        public = luigi.Parameter()
    

    Have you tested this? If so, how?

    I tested it on my tasks and it works fine. It just removes this parameters from node represents current task, so it shouldn't break any compatibility

    (Below images are just examples) Before: image After: image

    Maybe, it will be helpful for others too in similar case

    opened by nryanov 39
  • S3Client to use Boto3

    S3Client to use Boto3

    Description

    This work is to move away from boto to start using Boto3 for the S3Client.

    Motivation and Context

    Boto being no longer supported, Luigi should move away from it and use the current Boto3. This would solve a number of issues for example:

    • More stable downloads for big files
    • Built-in support for multipart uploads
    • Encryption with KMS
    • .... One of the main motivation from my part is the lack of support for aws task roles in boto. As more people are using this AWS functionality, It would make sense to move the S3Client to use boto3. More reasons can be found here: https://github.com/spotify/luigi/issues/1344

    Have you tested this? If so, how?

    This is Work In Progress. I'm trying my best to stick to the original tests but sometimes change is inevitable (Happy to chat for more details)

    Note

    This is my very first contribution so feedback and suggestions are more than welcome.

    opened by ouanixi 39
  • Task id hashing

    Task id hashing

    Implementing semi-opaque, hashed task_ids as discussed at #1312.

    This is the version I will be testing on our infrastructure in the next few weeks. This PR is intended to give my proposal better visibility whilst I test.

    The PR replaces task_ids with a value which is reliably unique but you can't extract the full parameter values from it. The actual algorithm is family_pval1_pval2_pval3_hash where:

    • family: The task_family (a.k.a task_name)
    • pval*: A truncated serialisation of the first 3 parameter values, sorted by parameter name
    • hash: A md5hash of the canonical JSON serialisation of the family and parameters, truncated to 10 characters.

    str(task) will return the traditional serialisation.

    The db_task_history database schema is changed to include tash_id in the tasks table. No further use is made of this column at this stage. A migration script is supplied to add the column to an existing database. This has been tested on MySQL only.

    All tests are updated to pass by replacing hard-coded task_ids with reference to Task.task_id.

    opened by stephenpascoe 39
  • Task history, from https://github.com/spotify/luigi/issues/51

    Task history, from https://github.com/spotify/luigi/issues/51

    All the tables are added as per the RFC, though currently only Task, TaskEvent, and TaskParameters is being used right now.

    Added a simple HTML front-end using Flask. I'd be open to using tornado to be consistent with server.py, I just used flask because I've used it before.

    Some screenshots of the pages:

    • Recent runs: http://cl.ly/image/1h01133g1Q07
    • Task details: http://cl.ly/image/1P1i3T0C1y42
    • All "A" runs: http://cl.ly/image/2X3h0X3k3P2h
    opened by JoeEnnever 34
  • Issue when upgrading luigi version

    Issue when upgrading luigi version

    When upgrading to a version of luigi at or after 6e549ef, I receive the following when python setup.py install within an existing venv.

    (venv) [email protected]:~/luigi$ python setup.py install
    running install
    running bdist_egg
    running egg_info
    writing requirements to luigi.egg-info/requires.txt
    writing luigi.egg-info/PKG-INFO
    writing top-level names to luigi.egg-info/top_level.txt
    writing dependency_links to luigi.egg-info/dependency_links.txt
    writing entry points to luigi.egg-info/entry_points.txt
    reading manifest file 'luigi.egg-info/SOURCES.txt'
    reading manifest template 'MANIFEST.in'
    writing manifest file 'luigi.egg-info/SOURCES.txt'
    installing library code to build/bdist.linux-x86_64/egg
    running install_lib
    running build_py
    error: can't copy 'luigi/static/visualiser/lib/URI.js': doesn't exist or not a regular file
    

    This commit added a file luigi/static/visualiser/lib/URI.js/1.18.2/URI.js . I'm suspicious there's a problem here.

    opened by dlstadther 31
  • Py3k

    Py3k

    First step to at python3 support. Majority of tests still fail but Luigi can be imported in python3. The code still run just fine in python2, the only limitation is an extra dependency on six.

    opened by gpoulin 31
  • [fixed] Task status messages

    [fixed] Task status messages

    This PR is a fixed version of PR #1621 which has a faulty commit history.

    This PR adds status messages to tasks which are also visible on the scheduler GUI.

    Examples:

    task list

    worker list

    Status messages are meant to change during the run method in an opportunistic way. Especially for long-running non-Hadoop tasks, the ability to read those messages directly in the scheduler GUI is quite helpful (at least for us). Internally, message changes are propagated to the scheduler via a _status_message_callback which is - in contrast to tracking_url_callback - not passed to the run method, but set by the TaskProcess.

    Usage example:

    class MyTask(luigi.Task):
        ...
        def run(self):
            for i in range(100):
                # do some hard work here
                if i % 10 == 0:
                    self.set_status_message("Progress: %s / 100" % i)
    

    I know that you don't like PR's that affect core code (which is reasonable =) ), but imho this feature is both lightweight and really helpful.

    opened by riga 30
  • Enables running of multiple tasks in batches

    Enables running of multiple tasks in batches

    Enables running of multiple tasks in batches

    Sometimes it's more efficient to run a group of tasks all at once rather than one at a time. With luigi, it's difficult to take advantage of this because your batch size will also be the minimum granularity you're able to compute. So if you have a job that runs hourly, you can't combine their computation when many of them get backlogged. When you have a task that runs daily, you can't get hourly runs.

    In order to gain efficiency when many jobs are queued up, this change allows workers to provide details of how jobs can be batched to the scheduler. If you have several hourly jobs of the same type in the scheduler, it can combine them into a single job for the worker. We allow parameters to be combined in three ways: we can combine all the arguments in a csv, take the min and max to form a range, or just provide the min or max. The csv gives the most specificity, but range and min/max are available for when that's all you need. In particular, the max function provides an implementation of #570, allowing for jobs that overwrite eachother to be grouped by just running the largest one.

    In order to implement this, the scheduler will create a new task based on the information sent by the worker. It's possible (as in the max/min case) that the new task already exists, but if it doesn't it will be cleaned up at the end of the run. While this new task is running, any other tasks will be marked as BATCH_RUNNING. When the head task becomes DONE or FAILED, the BATCH_RUNNING tasks will also be updated accordingly. They'll also have their tracking urls updated to match the batch task.

    This is a fairly big change to how the scheduler works, so there are a few issues with it in the initial implementation:

    • newly created batch tasks don't show up in dependency graphs
    • the run summary doesn't know what happened to the batched tasks
    • batching takes quadratic time for simplicity of implementation
    • I'm not sure what would happen if there was a yield in a batch run function

    For the user, batching is accomplished by setting batch_method in the parameters that you wish to batch. You can limit the number of tasks allowed to run simultaneously in a single batch by setting the class variable batch_size. If you want to exempt a specific task from being part of a batch, simply set its is_batchable property to False.

    opened by daveFNbuck 30
  • fix: using default_scheduler_url as a mounting point for not root pat…

    fix: using default_scheduler_url as a mounting point for not root pat…

    …h. fix #3213

    Signed-off-by: ROUDET Franck INNOV/IT-S [email protected]

    Description

    1 files modified to change the build of URL. Keep full base path from default_scheduler_url. 2nd attempt for that: more cleaner approach.

    Motivation and Context

    Fixes:

    • https://github.com/spotify/luigi/issues/3213
    • and https://groups.google.com/g/luigi-user/c/kJ7aI2GNfcE

    Have you tested this? If so, how?

    I'm able to test:

    • With or Without scheduler-url set
    • With or Without scheduler-url not at root
    opened by franckOL 1
  • using default_scheduler_url as a mounting point (not root `http://address/mount`)  behind proxy not working

    using default_scheduler_url as a mounting point (not root `http://address/mount`) behind proxy not working

    I need to mount luigi behind a nginx, for example luigi at http://address/mount. For that I configure:

    [core]
    default_scheduler_url=http://address/mount
    ....
    

    GUI is ok and works but, CLI not due to url resolution. it happens there https://github.com/spotify/luigi/blob/c13566418c92de3e4d8d33ead4e7c936511afae1/luigi/rpc.py#L54

    To understand what happened:

    parsed=urlparse('http://address/mount')
    url='/api/add_task'
    urljoin(parsed.geturl(), url)
    # ==> give 'http://address/api/add_task'
    # expected http://address/mount/api/add_task
    

    What I must do for working - slash at the end of mount point, no slash for url -:

    parsed=urlparse('http://address/mount/')
    url='api/add_task'
    urljoin(parsed.geturl(), url)
    # ==> http://address/mount/api/add_task
    
    opened by franckOL 0
  • Add support/option to pick `TaskProcess` worker process context

    Add support/option to pick `TaskProcess` worker process context

    Currently TaskProcess uses the default multiprocessing context (inherits from multiprocessing.Process), which ends up being spawn on mac/windows and fork on linux. There's no way to change the context on Linux to spawn or forkserver if the default fork is causing issues in the worker process (see https://github.com/python/cpython/issues/84559).

    opened by ravwojdyla 4
  • Fails to parse list parameter from cli argument: JSONDecodeError

    Fails to parse list parameter from cli argument: JSONDecodeError

    foo.py

    import luigi
    
    class Foo(luigi.Task):
        bar = luigi.ListParameter(default="['foo']")
        def run(self):
            print(f"Running task {self}")
        def output(self):
            return luigi.LocalTarget("./foo.txt")
    

    Running luigi --module foo Foo --bar 'bar' results in json.decoder.JSONDEcodeError: Expecting value: line1 column 1 (char 0)

    Stacktrace:

    ERROR: Uncaught exception in luigi
    Traceback (most recent call last):
      File "/home/lib/python3.8/site-packages/luigi/retcodes.py", line 75, in run_with_retcodes
        worker = luigi.interface._run(argv).worker
      File "/home/lib/python3.8/site-packages/luigi/interface.py", line 213, in _run
        return _schedule_and_run([cp.get_task_obj()], worker_scheduler_factory)
      File "/home/lib/python3.8/site-packages/luigi/cmdline_parser.py", line 114, in get_task_obj
        return self._get_task_cls()(**self._get_task_kwargs())
      File "/home/lib/python3.8/site-packages/luigi/cmdline_parser.py", line 131, in _get_task_kwargs
        res.update(((param_name, param_obj.parse(attr)),))
      File "/home/lib/python3.8/site-packages/luigi/parameter.py", line 1140, in parse
        i = json.loads(x, object_pairs_hook=FrozenOrderedDict)
      File "/usr/lib/python3.8/json/__init__.py", line 370, in loads
        return cls(**kw).decode(s)
      File "/usr/lib/python3.8/json/decoder.py", line 337, in decode
        obj, end = self.raw_decode(s, idx=_w(s, 0).end())
      File "/usr/lib/python3.8/json/decoder.py", line 355, in raw_decode
        raise JSONDecodeError("Expecting value", s, err.value) from None
    json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)
    

    Expected Behavior: The command should complete successfully

    OS: Ubuntu python: 3.8.10 luigi : 3.1.1

    opened by ymentha14 1
  • Failed to return in running luigi.build in luigi tasks

    Failed to return in running luigi.build in luigi tasks

    I have a workflow to trigger another luigi.build in the task run method (like below). Both parent and child level of luigi build use multiple workers.

    import luigi
    
    class Task2(luigi.Task):
        def run(self):
            print("*"*10, "done task2")
    
    class Task1(luigi.Task):
        def run(self):
            luigi.build([Task2()], local_scheduler=True, workers=2)
            print("*"*10, "done task1")
    
    luigi.build([Task1()], local_scheduler=True, workers=2)
    

    The scheduler in the child process (Task.run) stuck and failed to return. I killed the process and it seems to me the worker queue failed to pull the task.

    2022-11-12 01:06:00,411 INFO Informed scheduler that task   Task2__99914b932b   has status   PENDING
    2022-11-12 01:06:00,411 INFO Done scheduling tasks
    2022-11-12 01:06:00,411 INFO Running Worker with 2 processes
    ^C2022-11-12 01:06:40,606 INFO Worker Worker(salt=4670658361, workers=2, host=slurm-node6.development-singapore.chorus, username=gchan, pid=27743) was stopped. Shutting down Keep-Alive thread
    Traceback (most recent call last):
      File "test_luigi.py", line 37, in <module>
        luigi.build([Task1()], local_scheduler=True, workers=2)
      File "/home/gchan/intranet/environments/20220905_211953_12c7ddd0-0e8b-4bcd-b13b-b7e44cd1f344/lib/python3.8/site-packages/luigi/interface.py", line 239, in build
        luigi_run_result = _schedule_and_run(tasks, worker_scheduler_factory, override_defaults=env_params)
      File "/home/gchan/intranet/environments/20220905_211953_12c7ddd0-0e8b-4bcd-b13b-b7e44cd1f344/lib/python3.8/site-packages/luigi/interface.py", line 173, in _schedule_and_run
        success &= worker.run()
      File "/home/gchan/intranet/environments/20220905_211953_12c7ddd0-0e8b-4bcd-b13b-b7e44cd1f344/lib/python3.8/site-packages/luigi/worker.py", line 1242, in run
        self._handle_next_task()
      File "/home/gchan/intranet/environments/20220905_211953_12c7ddd0-0e8b-4bcd-b13b-b7e44cd1f344/lib/python3.8/site-packages/luigi/worker.py", line 1101, in _handle_next_task
        self._task_result_queue.get(
      File "/opt/chorus-python38/lib/python3.8/multiprocessing/queues.py", line 107, in get
        if not self._poll(timeout):
      File "/opt/chorus-python38/lib/python3.8/multiprocessing/connection.py", line 257, in poll
        return self._poll(timeout)
      File "/opt/chorus-python38/lib/python3.8/multiprocessing/connection.py", line 424, in _poll
        r = wait([self], timeout)
      File "/opt/chorus-python38/lib/python3.8/multiprocessing/connection.py", line 931, in wait
        ready = selector.select(timeout)
      File "/opt/chorus-python38/lib/python3.8/selectors.py", line 415, in select
        fd_event_list = self._selector.poll(timeout)
    KeyboardInterrupt
    

    Currently I am using the latest luigi version 3.1.1 and Python 3.8.12.

    opened by gavincyi 4
Releases(3.1.1)
  • 3.1.1(Aug 18, 2022)

    3.1.1

    Added

    luigi

    • Add worker config to cache task completion results. (#3178)

    Fixed

    luigi

    • Close requests.Socket in RemoteScheduler before exiting (#3173) (#3175)
    • Use int-parameters for random.randrange() (#3177)
    • Alternate (probably more correct) fix for #3182 (supercedes #3183) #3185
    Source code(tar.gz)
    Source code(zip)
  • 3.1.0(Jun 20, 2022)

    3.1.0

    Added

    luigi

    • Documentation guidance around release version increments #3074
    • Add support for naming tasks in @requires #3077
    • Add traceback_max_length parameter for error email notifications #3086
    • Document cause of Unfulfilled dependency error #3105
    • Add additional OptionalParameter datatype options #3079
    • UI: Add rerun command snippet on "show error" modal #3117
    • Add support for updating default config parser after loading luigi #3135
    • Allow batch_email.email_interval to be set in config as email-internal-minutes #3125
    • Allow TimeDeltaParameter to accept input as "seconds" #3125
    • Add EnumListParameter to top-level attribute imports #3144
    • Enable metrics_custom_import for MetricCollectors #3146
    • Improve warning when a parameter is not consumed by a task #3170

    luigi.contrib

    • Add configure_job BigQuery property #3098
    • Add parquet support to BigQuery #3099
    • Add network retry logic to BigQuery #3088
    • Add run_task_kwargs property to ECS #3083
    • Add pickle_protocol attribute and configuration option to Spark #3001
    • Add pg8000 driver support to Postgres #3142

    Fixed

    luigi

    • Fix default value for task.disable_window #3081
    • Fix deconstructor of LocalTarget when is_tmp attribute dne #3085
    • Improved documentation reference to luigi.format.Nop import #3047
    • Fix Python 3.10 deprecation warnings #3150
    • Remove unnecessary extra call to cls.get_task_namespace() #3129
    • Fix documentation typo in notifications_test.py #3151
    • Fix docs ci #3158
    • Fix task history rendering #3153
    • Move max_shown_tasks and max_graph_nodes documentation to correct section #3156
    • Fix ability to subclass Task's Register metaclass #3154
    • Replace legacy TravicCI readme badge with GithubActions #3159

    luigi.contrib

    • Fix apache ci
      • #3091
      • #3113
    • Fix documentation typo in sqla #3110
    • Fix spark cluster mode error for missing path #3111
    • Fix connection object passed to rdbms.CopyToTable's _add_metadata_columns() #3011

    Removed

    luigi

    • Remove @Tarrasch from codeowners #3127

    Changed

    luigi

    • Update license copyright year #3108
    • Improve error message on parsing default parameter value #3115
    • Group tasks by task class in svg graph #3122
    • Upgrade tenacity version #3147
    • Enable task search to be case insensitive #3157

    luigi.contrib

    • Update ExternalPythonProgramTask parameters to OptionalParameter #3130
    • Allow newer versions of prometheus-client package #3163
    Source code(tar.gz)
    Source code(zip)
  • 3.0.3(Apr 15, 2021)

    Added:

    luigi
    • Adding retry on _obj_exists #3022
    • don't use absolute path in redirect for visualizer #2785
    luigi.contrib
    • kubernetes: labels are applied to pod #3007

    Fixed:

    luigi
    • Flush stream_for_searching_tracking_url #3000
    luigi.contrib
    • Fix azureblob tests: use json instead of numpy #3032

    Changed:

    luigi
    • replace naive retry with tenacity #3026
    Source code(tar.gz)
    Source code(zip)
  • 3.0.2(Sep 23, 2020)

    3.0.2

    Fixed:

    luigi
    • Garbage collect task result queue when worker context exits #2973
    • Fixing problem with ListParameter and Dynamic Dependencies #2970

    Changed:

    luigi
    • Drop Python 3.3 and 3.4 support #2978
    luigi.contrib
    • Use updated uri for gcs batch reqs #2998
    Source code(tar.gz)
    Source code(zip)
  • 3.0.1(Jul 23, 2020)

    Added:

    luigi
    • Worker_timeout can be 0. #2968
    • Return bq job id from biquery.run_job() #2957
    • Documentation for check_complete_on_run config #2961
    Source code(tar.gz)
    Source code(zip)
  • 3.0.0(Jun 2, 2020)

    3.0.0

    This is a major release without many feature changes compared to 2.8.13. The reason we decided to give it a major bump is the drop of Python2 support. From this version on, Luigi stops supporting Python2 for the obvious reason. 3.0.0 release includes a series of PRs deprecating Python2, plus a few other changes listed below. Special thanks go to @drowoseque for doing all the great work!

    Added:

    luigi
    • Show task progress in visualizer workers tab. #2932

    Fixed:

    luigi
    • Fix TravisCI build #2948
    • Use is_alive in favour of isAlive for Python 3.9 compatibility. #2940
    Source code(tar.gz)
    Source code(zip)
  • 2.8.13(Apr 29, 2020)

    Added:

    luigi.contrib
    • Presto support in Luigi (#2885)

    Fixed:

    luigi
    • removed wrong type of Target.init path arg from doc master (#2927)
    • remove StreamingBodyAdaptor that didn't allow choosing the chunk size (#2929)
    • Fix docs explaining write modes for Luigi Targets. Closes #2783 (#2931)
    • All configuration parameters in docs now use underscore in their names for consistency. (#2890)

    Changed:

    luigi
    • Allowed wider popovers in grapth. (#2093)
    • update documentation to prefer pykube-ng (#2924)
    Source code(tar.gz)
    Source code(zip)
  • 2.8.12(Feb 19, 2020)

    Added:

    luigi
    • EnumListParameter #2801

    Fixed:

    luigi
    • Import ABC from collections.abc instead of collections for Python 3.9 compatibility #2895

    Changed:

    luigi.contrib
    • [luigi.contrib.hive] HiveTableTarget inherits HivePartitionTarget #2872
    • [luigi.contrib.pyspark_runner] SparkSession support in PySparkTask #2862
    Source code(tar.gz)
    Source code(zip)
  • 3.0.0b2(Feb 19, 2020)

    This the second 3.0.0 beta release including a series of PRs deprecating Python2, plus following:

    Special thanks go to @drowoseque for doing all the great work!

    Added:

    luigi
    • Add internal version info #2760
    • EnumListParameter #2801 (new since 3.0.0b1)
    luigi.contrib
    • [luigi.contrib.spark] pyspark python options added #2818

    Fixed:

    luigi
    • Fix params hashing #2540
    • Check for autoload_range istead of autoload-range
    • autoload_range doc fix #2878 (new since 3.0.0b1)

    Removed:

    luigi
    • [luigi.file] removed #2832
    • [luigi.mock.MockFile] removed #2839

    Changed:

    luigi
    • Allow python-daemon >= 2.2.0 if not on windows #2796
    • Make URLLibFetcher aware of basic auth info in scheduler URL. #2791
    luigi.contrib
    • [luigi.contrib.external_program.ExternalProgramTask] logs_output_pattern_to_url provided #2822
    • [luigi.contrib.hive] HiveTableTarget inherits HivePartitionTarget #2872 (new since 3.0.0b1)
    • [luigi.contrib.pyspark_runner] SparkSession support in PySparkTask #2862 (new since 3.0.0b1)
    Source code(tar.gz)
    Source code(zip)
  • 2.8.11(Jan 2, 2020)

    Added:

    luigi
    • Add internal version info #2760
    luigi.contrib
    • [luigi.contrib.spark] pyspark python options added #2818

    Fixed:

    luigi
    • Fix params hashing #2540
    • Check for autoload_range istead of autoload-range
    • autoload_range doc fix #2878

    Removed:

    luigi
    • [luigi.file] removed #2832
    • [luigi.mock.MockFile] removed #2839

    Changed:

    luigi
    • Allow python-daemon >= 2.2.0 if not on windows #2796
    • Make URLLibFetcher aware of basic auth info in scheduler URL. #2791
    luigi.contrib
    • [luigi.contrib.external_program.ExternalProgramTask] logs_output_pattern_to_url provided #2822
    Source code(tar.gz)
    Source code(zip)
  • 2.8.10(Nov 22, 2019)

    Added:

    luigi
    • Add HEAD endpoint to scheduler server for status/health checks #2789
    luigi.contrib
    • [luigi.contrib.hive] WarehouseHiveClient #2826

    Fixed:

    luigi.contrib
    • Add Python version-agnostic get_writer_schema. #2827
    • PySparkTask: handle special characters in name (#2778) #2779

    Changed:

    luigi.contrib
    • [luigi.contrib.spark] tracking_url_pattern as a property #2820
    • Add pod_creation_wait_interal #2813
    • Added optional argument 'aws_session_token' to S3Client #2798
    Source code(tar.gz)
    Source code(zip)
  • 2.8.9(Aug 27, 2019)

    Added:

    luigi
    • Adds "Force Commit" button in UI to set tasks to DONE #2751
    • Show task history link in visualizer when recording. #2759

    Fixed:

    luigi
    • Replace documentation reference to outdated test environment py27-nonhdfs #2762
    • Issue 2644: Tasks can be run several times under certain conditions #2645
    luigi & luigi.contrib
    • Ensure ignored tests are picked up by tox #2758

    Changed:

    luigi
    • Update tornado requirement for new enough python versions #2761
    luigi.contrib
    • contrib/ftp: Clean up temporary files #2755
    Source code(tar.gz)
    Source code(zip)
  • 2.8.8(Aug 12, 2019)

    Added:

    luigi
    • Expandable Namespace Folders for the Visualiser Sidebar #2716
    • Added new companies to the luigi users list: #2730 #2747
    luigi.contrib
    • Enable Overriding Poll Interval for Kubernetes Jobs #2724

    Fixed:

    luigi
    • Code example correction #2754

    Changed:

    luigi
    • Update release process #2727
    • Change GET request to POST requests in luigi/rpc #2732
    • Fix SendGrid email API documentation. #2745
    Source code(tar.gz)
    Source code(zip)
  • 2.8.7(Jun 14, 2019)

    Added:

    luigi
    • Add check_complete_on_run to optionally mark tasks as failed if complete() is false when run finishes (#2710)
    • Add section "Running Luigi on Windows" to docs (#2720)
    • Add Giphy to list of companies using Luigi to docs (#2713)
    luigi.contrib
    • Add DropboxTarget for luigi (#2696)

    Fixed:

    luigi
    • UI: Fix time graph - y axis to account for timezones. (#2711)

    Changed:

    luigi
    • Bump dependencies used by SendGrid integration. (#2715)
    • Update copyright year in LICENSE (#2723)
    luigi.contrib
    • Make RedisTarget compatible with redis-py >= 3 (#2722)
    Source code(tar.gz)
    Source code(zip)
  • 2.8.6(May 22, 2019)

  • 2.8.5(May 9, 2019)

    Fixed:

    luigi
    • (Doc) Fix example of "summary_length" #2700
    • Fix __init__ error when using TOML config #2702
    • add callback to metric collector #2704
    luigi.contrib
    • Fix BigQueryTarget parsing in beam_dataflow module #2705

    Changed:

    luigi.contrib
    • aws batch : job queue as parameter #2689
    Source code(tar.gz)
    Source code(zip)
  • 2.8.4(May 6, 2019)

    This release is broken due to #2628 .

    Added:

    luigi
    • Added support for a detailed LuigiRunResult instead of a plain Boolean (#2630)
    • Add worker option 'max_keep_alive_idle_duration (#2654)
    • Added worker-id commandline parameter (#2655)
    luigi.contrib
    • Add support for specifying kubernetes namespace (#2629)
    • Add a Task wrapper for MicroSoft OpenPAI (#2531)
    • Provide automatic URL tracking for Spark applications (#2661) (#2669)
    • add beam_dataflow_task to luigi/contrib (#2675)
    Both
    • Add Prometheus contrib for monitoring purpose (#2628)

    Fixed:

    luigi
    • setup.py: Support older setuptools (<=20.1.1) (#2623)
    • Fix broken aws tests (#2658)
    • Accept pathlib based path as argument for LocalTarget (#2548)
    • Fix durations in D3 graph (fixes #2620) (#2624)
    • Import collections ABCs from collections.abc, not collections (#2683)
    • Configuration documentation: remove deprecated/wrong [core]max_reschedules entry (#2692)
    luigi.contrib
    • fixes #2223 HdfsTarget is not working with snakebite (#2572)
    • Add port field for PostgresQuery (fixes #2625) (#2627)
    Both
    • Fix flake errors after moving to python 3 (#2695)

    Changed:

    luigi
    • Simplify implementation of temporary_path() (#2652)
    • Prevent range-tasks from autoloading (#2656)
    • Replace Python MapReduce example with Spark example. (#2668)
    • Minor improvements to single-worker-timeout support (#2667)
    • Require at least python-dateutil version 2.7.5 instead of only 2.7.5 (fixes #2662) (#2679)
    • RangeMonthly should deal with whole months (#2666)
    • Reconcile underscore/dash config style handling (#2688) #2691
    luigi.contrib
    • Add autodetect parameter to BigQueryLoadTask (#2363) (#2575)
    Source code(tar.gz)
    Source code(zip)
  • 2.8.3(Jan 16, 2019)

    Added:

    luigi
    • Add BaseTIS to the company list #2607
    • Add Hopper to the company list #2614
    • give a few default values to opts when setting up logging #2612
    • Add range functionality for monthly cadence. #2601
    luigi.contrib
    • Added port to PostgresTarget #2615
    • Support for Azure Blob Storage Target #2585
    • Add Datadog contrib for monitoring purpose #2434

    Fixed:

    luigi
    • Docs: Fixed a mistake with @inherits syntax in luigi/util.py #2613
    • Check type of column before migrating schemas for task db history for postgres dialect (fixes #2563) #2564
    luigi.contrib
    • S3: Fix call to message from TypeError not working with Python 3.6 #2617
    • Use proper API call in bigtable.py's make_dataset #2618
    • Sqla: Fix the table name when reflect is True in sqla.CopyToTable (fixes #2604) #2605

    Changed:

    luigi.contrib
    • Changed to buffered reads when using GCSTarget #2588
    Source code(tar.gz)
    Source code(zip)
  • 2.8.2(Dec 12, 2018)

  • 2.8.1(Dec 11, 2018)

    Note: Broken due to a runtime error in LuigiConfigParser. See https://github.com/spotify/luigi/issues/2592.

    Added:

    luigi
    • Add some docs to interface.run #2582
    • Configure logging via TOML config #2483
    luigi.contrib
    • Added port property to CopyToTable #2561
    • Make it so we can do from luigi.contrib.hdfs import HdfsFlagTarget #2594
    • contrib: Add ExternalDailySnapshot #2591

    Fixed:

    luigi
    • (docstring) Update task.py #2589
    • Docs: Fixed "Github" to fit to the rest of the doc #2596
    • Fix inspect.getargspec() DeprecationWarning in PY3 #2579
    luigi.contrib
    • Fix Travis Moto Test Failures #2586
    • Reduce TravisCI Test Runtime #2541

    Changed:

    luigi
    • Make Worker parameter task_process_context an OptionalParameter #2574
    luigi.contrib
    • S3Client improvements #2569
    Source code(tar.gz)
    Source code(zip)
  • 2.8.0(Nov 2, 2018)

    This is a minor version bump, due to:

    • Dropping Python 3.4 and 3.5 from CI, which means no automated tests to ensure compatibility for those versions
    • [Security Patch] CORS being disabled by default. A new section of configuration [cors] is introduced to enable custom settings. For details, refer to user group topic: https://groups.google.com/forum/#!topic/luigi-user/ZgfRTpBsVUY

    Added

    luigi:
    • Add Python 3.7 compatibility (#2466) This also drops 3.4 and 3.5 from CI.
    • Interpolate environment variables in .cfg config files (#2527)
    luigi.contrib:
    • Add CopyToTable task for MySQL (#2553)
    • Add HdfsFlagTarget (#2559)
    luigi.contrib:

    Fixed

    luigi:
    • Fix ReadTheDocs build (#2546)
    • Make capture_output non-positional in ExternalProgramTask (#2547)
    luigi.contrib:
    • Fix S3Client's _path_to_bucket_and_key to support keys with question marks (#2534)
    • Fix S3Client.remove - add max batch size (#2529)
    • Small fix to logging in contrib/ecs.py (#2556)
    • FIX HdfsAtomicWriteDirPipe.close() when using snakebite and the file do not exist. (#2549)

    Changed:

    luigi:
    • [ImgBot] optimizes images (#2555)
    luigi.contrib:
    • Remove s3 bucket validation prior to file upload (#2528)
    • Refactor s3 copy into sub-methods (#2508)
    Source code(tar.gz)
    Source code(zip)
  • 2.7.9(Sep 28, 2018)

    Added

    luigi.contrib:
    • Added optional choice for hdfs clients (#2487)
    • s3client check for deprecated host keyword and raise error with the details (#2493)
    • Add a "capture_output" parameter to ExternalProgramTask (#2430)

    Fixed

    luigi:
    • Fix exception when toml lib is not installed (#2506)
    • Replace direct attribute accessing by using built-n function getattr (#2509)
    • set upper bound of python-daemon (#2536)
    luigi.contrib:
    • Fix S3Client.copy return value consistency (#2488)
    • Update MockTarget mode to accept r* or w* (#2519)
    Source code(tar.gz)
    Source code(zip)
  • 2.7.8(Aug 24, 2018)

  • 2.7.7(Aug 24, 2018)

    Added

    luigi:
    • Add Data Revenue to the blogged list (#2472)
    • Add default reviewers in CODEOWNERS (#2465)
    • Optional TOML configs support (#2457)
    • Add support for multiple requires and inherits arguments (#2475)
    • Add a visiblity level for luigi.Parameters (#2278)
    • Make logging of RPC retries configurable #2486
    • Added a new event 'progress' (#2498)
    luigi.contrib:
    • Additions to provide support for the Load Sharing Facility (LSF) job scheduler (#2373)
    • Added default port behaviour for Redshift (#2474)
    • Add metadata columns to the RDBMS contrib (#2440)
    • Use passed password when create a redis connection (#2489)

    Changed

    luigi:
    • Update supplementary github files to improve repo organization and maintenance (#2463)
    • Use task_id in Task.eq comparison (#2462)
    • Replace luigi.Task by RunOnceTask in scheduler_visualisation_test (#2476)
    • (Breaking change) Bump tornado milestone version (#2490) This changes requires Python version 2.7.9+ and 3.4+
    luigi.contrib:
    • S3 client refactor (#2482)
    • Update moto to 1.x milestone version (#2471)

    Fixed

    luigi:
    • Fix Scheduler.add_task to overwrite accepts_messages attribute. (#2469)
    • Fix race condition (#2477)
    • Fix attribute forwarding for tasks with dynamic dependencies (#2478)
    luigi.contrib:
    • Fix transfer config import (#2458)

    Removed

    luigi:
    • Remove long-deprecated scheduler config variable alternatives (#2491)
    Source code(tar.gz)
    Source code(zip)
  • 2.7.6(Jul 11, 2018)

    Added

    luigi:
    • Add a configuration parameter to force multiprocessing (#2401)
    • Add a configuration parameter to enable/disable the pause button (#2399)
    • Send messages from scheduler to tasks (via "Send message" UI button) (#2426)
    • Allow to inject a context manager around TaskProcess.run (via task_process_context configuration parameter) (#2449)
    luigi.contrib:
    • S3: use Boto3 for the S3Client (#2423, #2149)
    • GCS: add method to push files using multiprocessing (#2376)
    • HDFS: add get_merge to snakebite client (#2410)
    • Redshift: add schema to DB if it doesn't exist (#2439)
    • Redshift: add table constraints support (#2435)

    Fixed

    luigi:
    • Allow long parameters in task history DB SQL result store (#2404)
    • Fix MissingParameterException when generating execution summary (#2415)
    • Fix luigid crash due to configuration file parsing (#2394)
    • Allow explicit parsing of BoolParameters (via luigi.BoolParameter.parsing variable) (#2427)
    • Make ChoiceParameter check if option is valid within .normalize (#2454)
    • ...and a good deal of documentation fixes and similar.
    luigi.contrib:
    • BigQuery: fix bulk_complete failing when argument is a generator (#2441)
    • Kubernetes: prevent KeyError in KubernetesJobTask (#2433)
    • Kubernetes: don't set activeDeadlineSeconds by default (#2452)
    Source code(tar.gz)
    Source code(zip)
  • 2.7.5(Apr 12, 2018)

  • 2.7.4(Apr 11, 2018)

    Added

    luigi:
    • Add google-auth-httplib2 as dependency (#2384)
    • Add new company user (#2388)
    • Allow release of resources during task running. (#2346)
    luigi.contrib:
    • Add parameterized backoff limit (#2375)

    Fixed

    luigi:
    • Typo in running luigi documentation (#2387)
    • Skip running coverage on unreasonable files
    • Pass kwargs to discovery.build() when instantiating GSCClient. (#2291)
    • Removed message 'No Instance(s) Available.' in Windows when starting … #2294
    Source code(tar.gz)
    Source code(zip)
  • 2.7.3(Mar 13, 2018)

    Added

    luigi:
    • Added generated data files to .gitignore (#2367)
    luigi.contrib:
    • Add possibility to specify Redshift column compression (#2343)

    Changed

    luigi:
    • Show status message button in worker tab when only progress is set (#2344)
    • Verbose worker error logging (#2353)
    luigi.contrib:
    • Replace oauth2client by google-auth (#2361)

    Fixed

    luigi:
    • Fix unicode formatting (#2339)
    luigi.contrib:
    • Fix contrib.docker_runner exit code check (#2356)
    Source code(tar.gz)
    Source code(zip)
  • 2.7.2(Jan 24, 2018)

    Added

    luigi:
    • A button to show errors for disabled tasks in the visualizer (#2266)
    • OptionalParameter parameter class (#2295)
    luigi.contrib:
    • Support for the Docker Python SDK (#2158)

    Changed

    luigi:
    • Change the logging status of prune messages from info to debug (#2254) - reduce repetitiveness of logging
    • Speed up the visualizer so that a refresh doesn’t take minutes when number of tasks gets into the millions (#2239)
    • Hide "re-enable" and "forgive failures" buttons on success (#2281)
    • Convert dates to datetimes for DateHourParameter (#2285)
    • Handle multiprocessing with request sessions (#2290) - Fixes bug with "RPCError: Received null response from remote scheduler"
    • Replaced param_args with dynamic property with deprecation message
    • Check task equality using param_kwargs instead of param_args

    Removed

    luigi.contrib:
    • Copying of Avro field documentation by BigQueryLoadAvro (#2269) - said copying is done by BigQuery natively now

    Fixed

    luigi:
    • Remove invalid entrypoint luigi.tools.migrate (#2257)
    • Preserve filter on server on reload (#2273)
    • Fix MRO on tasks using util.requires (#2307)
    luigi.contrib:
    • Fix Python 3 TypeError in contrib.hive.HiveTableTarget.exists (#2323)
    Source code(tar.gz)
    Source code(zip)
  • 2.7.1(Oct 5, 2017)

    Luigi 2.7.1 is mostly bug fixes and small feature additions.

    • Standardize Redshift credential usage across Redshift tasks: https://github.com/spotify/luigi/pull/2068
    • BigQueryLoadAvro now handles complex Avro types: https://github.com/spotify/luigi/pull/2224
    • Support for user-specified number of paralleled scheduled processes: https://github.com/spotify/luigi/pull/2205
    • ECS support for non-default cluster: https://github.com/spotify/luigi/pull/2045
    • BigQuery ExtractTask support: https://github.com/spotify/luigi/pull/2134
    • Support for autocommitting queries within Redshift and Postgres, allowing VACUUM statement execution: https://github.com/spotify/luigi/pull/2242
    • High Availability (HA) support with WebHdfsClient using multiple namenodes: https://github.com/spotify/luigi/pull/2230
    • Column mapping support for Redshift S3CopyToTable using the column list: https://github.com/spotify/luigi/pull/2245

    There have also been some other feature additions, bugfixes, and docfixes. See all commits here.

    Source code(tar.gz)
    Source code(zip)
Owner
Spotify
Spotify
Distributed-systems-algos - Distributed Systems Algorithms For Python

Distributed Systems Algorithms ISIS algorithm In an asynchronous system that kee

Tony Joo 2 Nov 30, 2022
A lightweight python module for building event driven distributed systems

Eventify A lightweight python module for building event driven distributed systems. Installation pip install eventify Problem Developers need a easy a

Eventify 16 Aug 18, 2022
ZeroNet - Decentralized websites using Bitcoin crypto and BitTorrent network

ZeroNet Decentralized websites using Bitcoin crypto and the BitTorrent network - https://zeronet.io / onion Why? We believe in open, free, and uncenso

ZeroNet 17.8k Jan 03, 2023
Python Stream Processing

Python Stream Processing Version: 1.10.4 Web: http://faust.readthedocs.io/ Download: http://pypi.org/project/faust Source: http://github.com/robinhood

Robinhood 6.4k Jan 07, 2023
Deluge BitTorrent client - Git mirror, PRs only

Deluge is a BitTorrent client that utilizes a daemon/client model. It has various user interfaces available such as the GTK-UI, Web-UI and a Console-UI. It uses libtorrent at it's core to handle the

Deluge team 1.3k Jan 07, 2023
Framework and Library for Distributed Online Machine Learning

Jubatus The Jubatus library is an online machine learning framework which runs in distributed environment. See http://jubat.us/ for details. Quick Sta

Jubatus 701 Nov 29, 2022
蓝鲸基础计算平台(BK-BASE)是一个专注于运维领域的的基础平台,打造一站式、低门槛的基础服务

蓝鲸基础计算平台(BK-BASE)是一个专注于运维领域的的基础平台,打造一站式、低门槛的基础服务。通过简化运维数据的收集、获取,提升数据开发效率,辅助运维人员实时运维决策,助力企业运营体系数字化、智能化转型。

Tencent 80 Dec 16, 2022
An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyperparameter tuning library.

Ray provides a simple, universal API for building distributed applications. Ray is packaged with the following libraries for accelerating machine lear

23.2k Dec 30, 2022
PowerGym is a Gym-like environment for Volt-Var control in power distribution systems.

Overview PowerGym is a Gym-like environment for Volt-Var control in power distribution systems. The Volt-Var control targets minimizing voltage violat

Siemens 44 Jan 01, 2023
Microsoft Distributed Machine Learning Toolkit

DMTK Distributed Machine Learning Toolkit https://www.dmtk.io Please open issues in the project below. For any technical support email to

Microsoft 2.8k Nov 19, 2022
Luigi is a Python module that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow management, visualization etc. It also comes with Hadoop support built in.

Luigi is a Python (3.6, 3.7 tested) package that helps you build complex pipelines of batch jobs. It handles dependency resolution, workflow managemen

Spotify 16.2k Jan 01, 2023
Distributed machine learning platform

Veles Distributed platform for rapid Deep learning application development Consists of: Platform - https://github.com/Samsung/veles Znicz Plugin - Neu

Samsung 897 Dec 05, 2022
PArallel Distributed Deep LEarning: Machine Learning Framework from Industrial Practice (『飞桨』核心框架,深度学习&机器学习高性能单机、分布式训练和跨平台部署)

English | 简体中文 Welcome to the PaddlePaddle GitHub. PaddlePaddle, as the only independent R&D deep learning platform in China, has been officially open

19.4k Dec 30, 2022
Bittorrent software for cats

NyaaV2 Setting up for development This project uses Python 3.7. There are features used that do not exist in 3.6, so make sure to use Python 3.7. This

3k Dec 30, 2022
An distributed automation framework.

Automation Kit Repository Welcome to the Automation Kit repository! Note: This package is progressing quickly but is not yet ready for full production

Automation Mojo 3 Nov 03, 2022
Privacy enhanced BitTorrent client with P2P content discovery

Tribler Towards making Bittorrent anonymous and impossible to shut down. We use our own dedicated Tor-like network for anonymous torrent downloading.

4.2k Dec 31, 2022
Distributed training framework for TensorFlow, Keras, PyTorch, and Apache MXNet.

Horovod Horovod is a distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet. The goal of Horovod is to make dis

Horovod 12.9k Dec 29, 2022
Run MapReduce jobs on Hadoop or Amazon Web Services

mrjob: the Python MapReduce library mrjob is a Python 2.7/3.4+ package that helps you write and run Hadoop Streaming jobs. Stable version (v0.7.4) doc

Yelp.com 2.6k Dec 22, 2022
Ray provides a simple, universal API for building distributed applications.

An open source framework that provides a simple, universal API for building distributed applications. Ray is packaged with RLlib, a scalable reinforcement learning library, and Tune, a scalable hyper

23.5k Jan 05, 2023
Run Python in Apache Storm topologies. Pythonic API, CLI tooling, and a topology DSL.

Streamparse lets you run Python code against real-time streams of data via Apache Storm. With streamparse you can create Storm bolts and spouts in Pyt

Parsely, Inc. 1.5k Dec 22, 2022