Python client for using Prefect Cloud with Saturn Cloud

Overview

prefect-saturn

prefect-saturn is a Python package that makes it easy to run Prefect Cloud flows on a Dask cluster with Saturn Cloud. For a detailed tutorial, see "Fault-Tolerant Data Pipelines with Prefect Cloud".

Installation

prefect-saturn is available on PyPI.

pip install prefect-saturn

prefect-saturn can also be installed directly from GitHub:

pip install git+https://github.com/saturncloud/prefect-saturn.git

Getting Started

prefect-saturn is intended for use inside a Saturn Cloud environment, such as a Jupyter notebook.

import prefect
from prefect import Flow, task
from prefect_saturn import PrefectCloudIntegration


@task
def hello_task():
    logger = prefect.context.get("logger")
    logger.info("hello prefect-saturn")


flow = Flow("sample-flow", tasks=[hello_task])

project_name = "sample-project"
integration = PrefectCloudIntegration(
    prefect_cloud_project_name=project_name
)
flow = integration.register_flow_with_saturn(flow)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Customize Dask

You can customize the size and behavior of the Dask cluster used to run Prefect flows. prefect_saturn.PrefectCloudIntegration.register_flow_with_saturn() accepts arguments to accomplish this, including dask_cluster_kwargs.

For example, the code below tells Saturn that this flow should run on a Dask cluster with 3 xlarge workers, and that Prefect should shut down the cluster once the flow run has finished.

flow = integration.register_flow_with_saturn(
    flow=flow,
    dask_cluster_kwargs={
        "n_workers": 3,
        "worker_size": "xlarge",
        "autoclose": True
    }
)

flow.register(
    project_name=project_name,
    labels=["saturn-cloud"]
)

Contributing

See CONTRIBUTING.md for documentation on how to test and contribute to prefect-saturn.

Comments
  • [CU-feu7x7] saturn labels

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Automatically add saturn-specific labels to the flow environment.

    How does this PR improve prefect-saturn?

    Along with the related change in saturn, this allows flows to be assigned to the correct cluster agent even if there are agents in multiple clusters that are all using the same prefect-cloud tenant.

    opened by bhperry 7
  • Bump development version

    • [X] passes make lint
    • [X] adds tests to tests/ (if appropriate)

    What does this PR change?

    • Bumps version to 0.5.1.9000 for development

    How does this PR improve prefect-saturn?

    With this change, users can rely on installations from source control having a version that is newer than the newest release available on PyPI but guaranteed to be older than the next release on PyPI.

    opened by dotNomad 6
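The ordering that the development-version bump relies on can be illustrated with a small sketch. It assumes purely numeric, dot-separated versions and compares them as integer tuples; real installers follow the fuller PEP 440 ordering rules, and the helper name version_key and the version "0.6.0" used as the next release are illustrative assumptions.

```python
def version_key(version: str) -> tuple:
    """Split a dotted, purely numeric version into a tuple of ints."""
    return tuple(int(part) for part in version.split("."))


latest_release = "0.5.1"
dev_version = "0.5.1.9000"
next_release = "0.6.0"  # illustrative; any later release behaves the same way

# The development version sorts after the latest release...
print(version_key(latest_release) < version_key(dev_version))  # True
# ...and before the next release.
print(version_key(dev_version) < version_key(next_release))  # True
```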
  • [CU-frwyvw] Set flow instance size

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Adds instance size argument for flow node

    How does this PR improve prefect-saturn?

    Enables users to select a larger node size for their flow to run on. This means we are not forcing users to farm all of the work out to Dask clusters; they can instead run flows with a local executor and not be constrained by a medium-tier node.

    opened by bhperry 4
  • pickle is a bad choice for hashing.

    Without this PR, changing the Python version could lead to different hashes for flow metadata. This PR hashes identifying information for flows using json rather than pickle. Because this changes the hashing function, re-registering a flow will create a new flow in Saturn Cloud.

    opened by hhuuggoo 3
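One plausible way a Python upgrade changes a pickle-based hash is the default pickle protocol, which differs between Python versions. The sketch below is not the library's code: the metadata dictionary and both helper functions are hypothetical, and it only contrasts the two serialization approaches.

```python
import hashlib
import json
import pickle

# Hypothetical flow metadata, used only for illustration.
flow_metadata = {"name": "sample-flow", "project": "sample-project"}


def pickle_hash(obj, protocol):
    """Hash the pickled bytes of obj for a given pickle protocol."""
    return hashlib.sha256(pickle.dumps(obj, protocol=protocol)).hexdigest()


def json_hash(obj):
    """Hash a canonical JSON encoding of obj (keys sorted)."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


# Different pickle protocols yield different bytes, hence different hashes.
print(pickle_hash(flow_metadata, 2) == pickle_hash(flow_metadata, 4))  # False

# The JSON encoding does not depend on key order or pickle protocol.
reordered = {"project": "sample-project", "name": "sample-flow"}
print(json_hash(flow_metadata) == json_hash(reordered))  # True
```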
  • add expectation that BASE_URL does not end in a slash

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    Changes PrefectCloudIntegration to expect the environment BASE_URL to NOT end in a trailing slash. In all Saturn production environments, BASE_URL does not end in a trailing slash.

    This PR also bumps the version of prefect-saturn to 0.2.0, since this change in behavior is a breaking change.

    How does this PR improve prefect-saturn?

    Without this fix, prefect-saturn is currently broken in Saturn production environments.

    Why can't we just keep the code that adds or removes slashes as needed?

    This project uses prefect's WebhookStorage (https://docs.prefect.io/orchestration/execution/storage_options.html#webhook). That type of storage uses template strings to reference environment variables, like this:

    flow = Flow(
        "some-flow",
        storage=Webhook(
            build_request_kwargs={
                "url": "${BASE_URL}/api/whatever",
            },
            build_request_http_method="POST",
            # ....
        )
    )

    There is no place in there where we can introduce Saturn-written code that checks the environment variable BASE_URL and deals with missing or extra trailing slashes. So an assumption about whether or not BASE_URL ends in a slash has to be hard-coded into the codebase here.

    opened by jameslamb 2
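The effect of a trailing slash on a templated URL can be seen in a small sketch. It uses Python's string.Template as a stand-in for the storage's own template rendering, which is an assumption, and the host name is a placeholder.

```python
from string import Template

# Same template shape as in the PR description above.
url_template = Template("${BASE_URL}/api/whatever")

# Placeholder host names, for illustration only.
without_slash = url_template.substitute(BASE_URL="https://app.example.com")
with_slash = url_template.substitute(BASE_URL="https://app.example.com/")

print(without_slash)  # https://app.example.com/api/whatever
print(with_slash)     # https://app.example.com//api/whatever
```

Because the substitution is purely textual, nothing between the environment variable and the final URL can normalize the slash, which is why the expectation has to be fixed in one place.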
  • Fix tenant_id attr for Prefect 15

    • [x] passes make lint
    • [ ] adds tests to tests/ (if appropriate)

    I don't believe any tests need to be added.

    What does this PR change?

    This PR changes Client()._active_tenant_id to Client().tenant_id (when using Prefect >= 0.15), since _active_tenant_id is no longer a property of Prefect's Client in 0.15.

    How does this PR improve prefect-saturn?

    This allows prefect-saturn to work with prefect in version 0.15 which removed the private attribute.

    opened by dotNomad 1
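One way to tolerate both attribute names is to check for the public attribute first and fall back to the private one. This is a minimal sketch of that idea, not the PR's implementation (which keys on the Prefect version); the two client classes are stand-ins written for this example, not Prefect's Client.

```python
class OldStyleClient:
    """Stand-in for a client that exposes only the private attribute."""

    _active_tenant_id = "tenant-old"


class NewStyleClient:
    """Stand-in for a client that exposes the public attribute."""

    tenant_id = "tenant-new"


def get_tenant_id(client):
    """Return the tenant id, preferring the public attribute when present."""
    if hasattr(client, "tenant_id"):
        return client.tenant_id
    return client._active_tenant_id


print(get_tenant_id(OldStyleClient()))  # tenant-old
print(get_tenant_id(NewStyleClient()))  # tenant-new
```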
  • [DEV-1227] Replace pyyaml

    Unlike PyYAML, ruamel.yaml supports:

    • YAML <= 1.2. PyYAML only supports YAML <= 1.1. This is vital, as YAML 1.2 intentionally breaks backward compatibility with YAML 1.1 in several edge cases. This would usually be a bad thing. In this case, it renders YAML 1.2 a strict superset of JSON. Since YAML 1.1 is not a strict superset of JSON, this is a good thing.
    • Roundtrip preservation. When calling yaml.dump() to dump a dictionary loaded by a prior call to yaml.load():

    See more details at https://yaml.readthedocs.io/en/latest/pyyaml.html

    opened by wreis 1
  • Switch from Docker storage to Webhook storage

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR replaces Docker storage with Webhook storage.

    This PR also adds a small working example to the README, to show users how it works. A longer-form tutorial will be up on https://www.saturncloud.io/docs/ some time in the next week.

    How does this PR improve prefect-saturn?

    The use of Webhook storage will make the integration between Prefect Cloud and Saturn Cloud much faster and less error-prone. It eliminates several hacks and special cases, should be much quicker, and removes an awkward race condition. With Docker storage, after calling .register_flow_with_saturn() you had to wait for a k8s job that built and pushed the image to complete. That could take up to 10 minutes, and nothing in prefect-saturn or the Saturn UI allowed you to see logs or other progress of that job.

    That job was also very brittle...it required docker-in-docker trickery, patching user-chosen images with several build-time-only dependencies, and running a sequence of multiple gnarly shell scripts.

    With Webhook storage, all of that complexity is eliminated. When you call .register_flow_with_saturn(), the flow is serialized with cloudpickle, sent to Saturn, and stored as bytes in an object store. At run time, when flow.storage.get_flow() is called, Saturn retrieves the binary content of the flow from an object store and sends it back over the wire. That's it! Everything is synchronous, so no weird race conditions, and it's just passing bytes around, so the storage process goes from 10+ minutes to under a second.

    opened by jameslamb 1
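The store-and-retrieve cycle described for Webhook storage can be sketched in a few lines. This is a simplified illustration under stated assumptions: the standard-library pickle stands in for cloudpickle, an in-memory dictionary stands in for Saturn's object store and the HTTP calls, and the function names and flow id are hypothetical.

```python
import pickle

# Hypothetical in-memory stand-in for the remote object store.
object_store = {}


def store_flow(flow_id, flow):
    """Serialize the flow to bytes and store it under its id."""
    object_store[flow_id] = pickle.dumps(flow)


def load_flow(flow_id):
    """Fetch the stored bytes and deserialize them back into a flow."""
    return pickle.loads(object_store[flow_id])


# A plain dictionary stands in for a real Flow object here.
sample_flow = {"name": "sample-flow", "tasks": ["hello_task"]}

store_flow("flow-123", sample_flow)
restored = load_flow("flow-123")

print(isinstance(object_store["flow-123"], bytes))  # True
print(restored == sample_flow)  # True
```

Both steps are synchronous and only move bytes, which matches the PR's point that no image build or background job is involved.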
  • Add Saturn project details and remove unnecessary stuff

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    • removes fields from saturn_details that are unnecessary as of #8
    • renames saturn_details to storage_details since that now only contains information for building storage
    • bump version floor to Python 3.7 (all the Saturn images are Python 3.7)
    • set ignore_healthchecks=True on the storage object.
      • This avoids a weird error where the image being built with flow.storage.build() doesn't have stuff like the Saturn start script in it, which could cause errors.
      • We don't need to care about that because every job in the flow's lifecycle will have the start script added to its command / args

    How does this PR improve prefect-saturn?

    This PR removes unnecessary fields that could have become dependencies in users' code, reducing the surface for breaking changes.

    The ignore_healthchecks thing makes it possible for users to rely on the start script to install libraries.

    opened by jameslamb 1
  • change strategy for identifying flows

    • [x] passes make lint
    • [x] adds tests to tests/ (if appropriate)

    What does this PR change?

    This PR changes the strategy for uniquely identifying flows. Now the flow_hash sent to Saturn Cloud is the sha256 hash of:

    • project name
    • flow name
    • Prefect Cloud tenant id

    This means that flow_hash is equivalent to the Prefect Cloud concept flow_group_id. This PR proposes creating this hash ourselves because Prefect Cloud's flow_group_id can't be known until you've registered a flow with Prefect Cloud, and prefect-saturn needs to register with Saturn first.

    How does this PR improve prefect-saturn?

    This PR provides a reliable way to uniquely identify all versions of the same flow. It improves on the previous model, where tenant id was not considered. Under the previous model, two flows with identical names and code, in Prefect Cloud projects with the same name but in different tenants, would have received the same hash and conflicted.

    Because this PR no longer considers the task graph in the hash, it also means that the hash will not change as a flow's task graph changes. That means pushing Docker storage to a container registry should be a lot faster, since it'll be more likely to hit the registry's cache.

    Notes for Reviewers

    I had to introduce prefect.client.Client in this PR, and then mock it with patch(). A lot of the diff in the test files is just whitespace, the result of adding in a with patch(....). I recommend reviewing with whitespace changes hidden.

    opened by jameslamb 1
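A hash built from the three identifiers named in the PR above might look like the sketch below. The PR states only which fields are hashed and that sha256 is used; how the fields are combined (here, a JSON-encoded list) and the function name are assumptions made for illustration.

```python
import hashlib
import json


def flow_hash(project_name, flow_name, tenant_id):
    """sha256 over project name, flow name, and tenant id.

    Encoding the fields as a JSON list is an assumption of this sketch.
    """
    identifier = json.dumps([project_name, flow_name, tenant_id])
    return hashlib.sha256(identifier.encode("utf-8")).hexdigest()


hash_a = flow_hash("sample-project", "sample-flow", "tenant-a")
hash_b = flow_hash("sample-project", "sample-flow", "tenant-b")

# Same project and flow names in different tenants no longer collide.
print(hash_a != hash_b)  # True

# The task graph is not an input, so editing tasks leaves the hash unchanged.
print(hash_a == flow_hash("sample-project", "sample-flow", "tenant-a"))  # True
```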
  • Add more metadata to flows

    This PR includes a few updates that improve execution and testing for the end-to-end experience with Prefect Cloud.

    What does this PR change?

    This PR adds more details from Saturn to the flow, so the agent running it knows how to configure the first job that loads the flow.

    How does this PR improve prefect-saturn?

    This PR allows flows executed from Prefect Cloud to take advantage of Saturn-y customization features like env secrets, filesystem secrets, and a custom start script.

    opened by jameslamb 1
Releases(v0.6.0)
  • v0.6.0(Nov 4, 2021)

    What's Changed

    • No default dask cluster by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/50
    • Add encoding kwarg to open by @jsignell in https://github.com/saturncloud/prefect-saturn/pull/52

    New Contributors

    • @jsignell made their first contribution in https://github.com/saturncloud/prefect-saturn/pull/50

    Full Changelog: https://github.com/saturncloud/prefect-saturn/compare/v0.5.1...v0.6.0

  • v0.5.1(Jul 15, 2021)

  • v0.5.0(Apr 23, 2021)

    Breaking

    None

    Features

    • replace pyyaml with ruamel-yaml (#28)
    • add support for KubernetesRun "RunConfig", if using this library with prefect >= 0.13.10 (#34, #36, #37)
      • this does not break compatibility with prefect >0.13.0,<=0.13.9

    Bug Fixes

    • fix deprecation warnings from prefect 0.14.x (#32)
      • some modules were reorganized from prefect 0.13.x to 0.14.x, and using the 0.13.x-style paths raises deprecation warnings

    Docs

    • support Python 3.8 (#33)
      • this library was already compatible with Python 3.8, but that is now tested on every build and documented in the package classifiers
    • fix keywords in package metadata (#39)
      • this improves the discoverability of this project on PyPI
  • v0.4.4(Dec 9, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • set an explicit default of autoclose = False for dask_cluster_kwargs (#25)
      • this ensures that, by default, flows registered with prefect-saturn leave their Dask cluster up at the end of execution
      • this avoids the risk of one flow run closing down a Dask cluster that is in use by another flow run
      • this was already prefect-saturn's behavior, but only indirectly because autoclose defaults to False in dask-saturn. Now that is directly the default in prefect-saturn.
    • add tests on describe_sizes() (#24)
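The explicit autoclose default described in the first bug fix above can be pictured as merging user-supplied options over a dictionary of defaults. This is a sketch of the general pattern, not the library's code; the helper name resolve_dask_cluster_kwargs is hypothetical.

```python
def resolve_dask_cluster_kwargs(dask_cluster_kwargs=None):
    """Apply an explicit autoclose=False default under user-supplied options."""
    defaults = {"autoclose": False}
    # Later entries win, so user-supplied values override the defaults.
    return {**defaults, **(dask_cluster_kwargs or {})}


print(resolve_dask_cluster_kwargs())
# {'autoclose': False}
print(resolve_dask_cluster_kwargs({"n_workers": 3, "autoclose": True}))
# {'autoclose': True, 'n_workers': 3}
```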

    Docs

    • added more docs in the README on how to customize the Dask cluster used by DaskExecutor (#25)
  • v0.4.3(Dec 9, 2020)

    Breaking

    None

    Features

    • You can now set the instance size for the node that runs flow.run() (#23)
      • PrefectCloudIntegration.register_flow_with_saturn() gets a new keyword argument instance_size
      • use new function describe_sizes() to list the valid options

    Bug Fixes

    None

    Docs

    • added documentation on changing the size of the instance that a flow runs on
  • v0.4.2(Nov 19, 2020)

    Breaking

    None

    Features

    None

    Bug Fixes

    • fix broken installations from source distribution (prefect-saturn-*.tar.gz) (#22)

    Docs

    • package LICENSE file with package artifacts (#22)
  • v0.4.1(Nov 6, 2020)

  • v0.4.0(Oct 21, 2020)

  • v0.3.0(Oct 16, 2020)

  • v0.2.0(Sep 18, 2020)

    Breaking

    • prefect-saturn now expects that the environment variable BASE_URL does not end in a slash (#17)

    Features

    None

    Bug Fixes

    None

    Docs

    None

  • v0.1.1(Aug 31, 2020)

  • v0.1.0(Aug 13, 2020)

    Breaking

    • Moved .add_storage() and .add_environment() internal, and made .register_flow_with_saturn() do more. (#14) Now the interface is just like this:

      integration = PrefectCloudIntegration("some-project")
      flow = integration.register_flow_with_saturn(flow)
      flow.register(project_name="some-project", labels=["saturn-cloud"])
      
    • Replaced Docker storage with Webhook storage (#14)

    • Bumped prefect version floor to 0.13.0 (the first release that had Webhook) (#14)

    Features

    None

    Bug Fixes

    None

    Docs

    • Added a minimal working example in README (#14)
  • v0.0.2(Jul 23, 2020)

    • moved some details of building KubernetesJobEnvironment into Saturn's back-end and out of this library
    • removed unnecessary elements in saturn_details
    • renamed saturn_details to storage_details since it now only contains things needed for building storage
  • v0.0.1(Jul 21, 2020)

Owner
Saturn Cloud
End-to-End Data Science in Python Featuring Dask and Jupyter