AI Flow is an open source framework that bridges big data and artificial intelligence.

Related tags

Deep Learningai-flow
Overview

Flink AI Flow

Introduction

Flink AI Flow is an open source framework that bridges big data and artificial intelligence. It manages the entire machine learning project lifecycle as a unified workflow, including feature engineering, model training, model evaluation, model service, model inference, monitoring, etc. Throughout the entire workflow, Flink is used as the general purpose computing engine.

In addition to the capability of orchestrating a group of batch jobs, by leveraging an event-based scheduler(enhanced version of Airflow), Flink AI Flow also supports workflows that contain streaming jobs. Such capability is quite useful for complicated real-time machine learning systems as well as other real-time workflows in general.

Features

You can use Flink AI Flow to do the following:

  1. Define the machine learning workflow including batch/stream jobs.

  2. Manage metadata(generated by the machine learning workflow) of date sets, models, artifacts, metrics, jobs etc.

  3. Schedule and run the machine learning workflow.

  4. Publish and subscribe events

To support online machine learning scenarios, notification service and event-based schedulers are introduced. Flink AI Flow's current components are:

  1. SDK: It defines how to build a machine learning workflow and includes the api of the Flink AI Flow.

  2. Notification Service: It provides event listening and notification functions.

  3. Meta Service: It saves the meta data of the machine learning workflow.

  4. Event-Based Scheduler: It is a scheduler that triggered jobs by some events happened.

Documentation

QuickStart

You can use Flink AI Flow according to the guidelines of Quick Start. Besides, you can also take a look at our Tutorial for writing your own workflow.

API

Please refer to the API Documentation to find the API supported by Flink AI Flow.

Design

If you are interested in design principle of Flink AI Flow, please see the Design Documentation for more details.

Examples

You can refer to some examples of Flink AI Flow to have a better understanding of how to write a workflow. Please see the Examples directory.

Reporting bugs

If you encounter any issues please open an issue in github and we encourage you to provide a patch through github pull request as well.

Contributing

We happily welcome contributions to Flink AI Flow. Please see our contribution guide for details.

Contact Us

For more information, you can join the Flink AI Flow Users Group on DingTalk to contact us. The number of the DingTalk group is 35876083.

You can also join the group by scanning the QR code below:

Comments
  • [Notification] Support CLI tool for notification server

    [Notification] Support CLI tool for notification server

    What is the purpose of the change

    Support CLI tool for notification server

    Brief change log

    (for example:)

    • Support CLI tool for notification server

    Verifying this change

    (Please pick either of the following options)

    This change added tests.

    opened by jiangxin369 2
  • [Notification] Add notification cli server command

    [Notification] Add notification cli server command

    What is the purpose of the change

    Add notification cli server command #170

    Brief change log

    • Add notification cli server command

    Verifying this change

    This change added tests.

    opened by Sxnan 2
  • [Airflow]Fix remote log handler bug

    [Airflow]Fix remote log handler bug

    What is the purpose of the change

    Fix #172

    Brief change log

    • Change the log_relative_path in remote log handler like S3, GCS, Azure when they init the context.
    • Add get_provider_info.py to pass airflow test cases.
    • Modified old test case in providers.

    Verifying this change

    This change has modified old test case to fit these new remote log handlers.

    opened by aqua7regia 2
  • [Airflow] The remote logging supports the current log mechanism

    [Airflow] The remote logging supports the current log mechanism

    FileTaskHandler, a python log handler that handles and reads task instance logs, supports the current log mechanism which uses seq_num and try_number to name the log file. The remote log handler like S3TaskHandler should support the log mechanism.

    bug Airflow 
    opened by SteNicholas 2
  • [hotfix] fix typo of spark operator

    [hotfix] fix typo of spark operator

    What is the purpose of the change

    (For example: This pull request makes user can stop workflow externally.)

    Brief change log

    (for example:)

    • Send StopDagEvent to Airflow scheduler via notification service
    • Once received StopDagEvent, Airflow scheduler kills all running dag runs

    Verifying this change

    (Please pick either of the following options)

    This change is a trivial rework / code cleanup without any test coverage.

    (or)

    This change is already covered by existing tests, such as (please describe tests).

    (or)

    This change added tests.

    opened by jiangxin369 1
  • Daily uploading packages to PyPI

    Daily uploading packages to PyPI

    What is the purpose of the change

    (For example: This pull request makes user can stop workflow externally.)

    Brief change log

    (for example:)

    • Send StopDagEvent to Airflow scheduler via notification service
    • Once received StopDagEvent, Airflow scheduler kills all running dag runs

    Verifying this change

    (Please pick either of the following options)

    This change is a trivial rework / code cleanup without any test coverage.

    (or)

    This change is already covered by existing tests, such as (please describe tests).

    (or)

    This change added tests.

    opened by jiangxin369 1
  • Add Tutorial Example

    Add Tutorial Example

    Brief change log

    (for example:)

    • Add tutorial example and docs
    • Each workflow execution should have its own working directory to avoid file overriding

    Verifying this change

    (Please pick either of the following options)

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Add documentations

    Add documentations

    What is the purpose of the change

    (For example: This pull request makes user can stop workflow externally.)

    Brief change log

    (for example:)

    • Send StopDagEvent to Airflow scheduler via notification service
    • Once received StopDagEvent, Airflow scheduler kills all running dag runs

    Verifying this change

    (Please pick either of the following options)

    This change is a trivial rework / code cleanup without any test coverage.

    (or)

    This change is already covered by existing tests, such as (please describe tests).

    (or)

    This change added tests.

    opened by jiangxin369 1
  • Refactor Notification Event

    Refactor Notification Event

    What is the purpose of the change

    Refactor Notification Event

    Verifying this change

    (Please pick either of the following options)

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Fix unittests about existed namespace

    Fix unittests about existed namespace

    What is the purpose of the change

    Fix unittests about existed namespace

    Verifying this change

    (Please pick either of the following options)

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Add docs about concepts

    Add docs about concepts

    What is the purpose of the change

    Add docs about concepts

    Verifying this change

    (Please pick either of the following options)

    This change is a trivial rework / code cleanup without any test coverage.

    opened by jiangxin369 1
  • Unifying TaskStatusEvent and TaskStatusChangedEvent

    Unifying TaskStatusEvent and TaskStatusChangedEvent

    Describe the bug

    Your environment

    Operating system

    Database

    Python version

    To Reproduce

    Steps to reproduce the behavior (if you can):

    1. Submit a '...'
    2. Click on '....'
    3. See error

    Expected behavior

    Actual behavior

    Screenshots

    Additional context

    opened by jiangxin369 0
  • Workflow Execution Status Incorrect

    Workflow Execution Status Incorrect

    Describe the bug

    task1 action_on_task_status(task2, success) when task1 failed, task2 would not be scheduled, but the status of workflow execution is still running. I think the status shoud be failed

    Your environment

    Operating system

    Database

    Python version

    To Reproduce

    Steps to reproduce the behavior (if you can):

    1. Submit a '...'
    2. Click on '....'
    3. See error

    Expected behavior

    Actual behavior

    Screenshots

    Additional context

    opened by jiangxin369 0
  • AIFlow support python3.6

    AIFlow support python3.6

    Describe the feature

    AIFlow support python3.6

    Describe the solution you'd like

    Describe alternatives you've considered

    Additional context

    When run aiflow with python3.6, notification server blocks wich following logs:

    [2022-07-25 14:07:07,682 - server.py:68 [MainThread] - INFO: Notification server started.
    [2022-07-25 14:07:18,780 - server.py:189 [Thread-1] - ERROR: Lock is not acquired.
    Traceback (most recent call last):
      File "/root/venv_for_aiflow/lib64/python3.6/site-packages/notification_service/server.py", line 185, in _call_behavior_async
        return await behavior(argument, context), True
      File "/usr/lib64/python3.6/asyncio/coroutines.py", line 225, in coro
        res = yield from await_meth()
      File "/root/venv_for_aiflow/lib64/python3.6/site-packages/notification_service/service.py", line 221, in _list_all_events
        pass
      File "/usr/lib64/python3.6/asyncio/coroutines.py", line 212, in coro
        res = func(*args, **kw)
      File "/usr/lib64/python3.6/asyncio/locks.py", line 86, in __aexit__
        self.release()
      File "/usr/lib64/python3.6/asyncio/locks.py", line 207, in release
        raise RuntimeError('Lock is not acquired.')
    RuntimeError: Lock is not acquired.
    
    opened by jiangxin369 0
  • action_on_event_received only process events sent by current workflow execution

    action_on_event_received only process events sent by current workflow execution

    Describe the feature

    Currently, the event is broadcasting, once the wanted event is sent, all workflow executions of this workflow would be triggered.

    Describe the solution you'd like

    As the scheduler dispatcher would inspect the context of each event to figure out if it contains the workflow execution id, we can inject the runtime context of each task execution to the user-defined event to make sure the event will only effect on the current workflow execution.

    We need the following changes:

    • Add a global variable _CURRENT_TASK_CONTEXT in context.py, it is used to store the runtime context of each task execution.
    • To read and write the _CURRENT_TASK_CONTEXT, add get_runtime_task_context and set_runtime_task_context functions.
    • Add a public API called wrap_execution_context to inject the runtime info to the context of the event before sending it.
    _CURRENT_TASK_CONTEXT: TaskExecutionContext = None
    
    
    def set_runtime_task_context(context: TaskExecutionContext):
        global _CURRENT_TASK_CONTEXT
        _CURRENT_TASK_CONTEXT = context
    
    
    def get_runtime_task_context():
        return _CURRENT_TASK_CONTEXT
    
    
    def wrap_execution_info_to_context(event: Event):
        """
      The event whose context is wrapped with workflow execution info would only be processed by specific workflow execution.
      """
        pass
    

    How to use it?

    def func():
        notification_client = get_notification_client()
        event = Event(event_key=EVENT_KEY, message='This is a custom message.')
        
        // wrap event with context
        wrap_execution_info_to_context(event)
        // send event
        notification_client.send_event(event)
    
    with Workflow(name='workflow') as w1:
        task = PythonOperator(name='task', python_callable=func)
    

    Describe alternatives you've considered

    Additional context

    opened by jiangxin369 0
  • [NotificationService] Cannot use one EmbeddedNotificationClient instance in multiple threads

    [NotificationService] Cannot use one EmbeddedNotificationClient instance in multiple threads

    Describe the bug

    Cannot use one EmbeddedNotificationClient instance in multiple threads.

    Your environment

    Operating system

    Database

    Python version

    To Reproduce

        def test_create_client_multiple_threads(self):
            import threading
            from notification_service.embedded_notification_client import EmbeddedNotificationClient
            from notification_service.event import Event, EventKey
    
            client = EmbeddedNotificationClient(server_uri="localhost:50052", namespace='default', sender='sender')
    
            def send_event():
                event = Event(event_key=EventKey('key1'), message='a')
                client.send_event(event)
                print(client.sequence_number)
            threads = []
            for i in range(100):
                thread = threading.Thread(target=send_event)
                threads.append(thread)
            for t in threads:
                t.start()
            for t in threads:
                t.join()
    

    Expected behavior

    100 events sent.

    Actual behavior

    less than 100 events sent.

    Screenshots

    Additional context

    opened by jiangxin369 0
Releases(release-0.3.1)
  • release-0.3.1(Feb 22, 2022)

    Features

    1. Flink job plugin supports the multiple Flink versions #236
    2. Support stopping/resuming job scheduling #241
    3. Support log view of job execution for frontend #251
    4. Improve documentation

    Bug Fixes

    1. Airflow operator context does not set correctly #250
    2. Failed to trigger workflow execution #260
    3. Fix update the deployed model version multiple times #269
    4. Fix register_model_version sending wrong event type bug #290

    Welcome to use this release version and give us the feedback of the AIFlow.

    Source code(tar.gz)
    Source code(zip)
  • release-0.3.0(Dec 16, 2021)

    Features

    AIFlow

    1. Introduces the command-line interface to help operation.
    2. Support database version migration.
    3. Supports the worflow development on the Jupyter Notebook #140

    Notification Service

    1. Introduces the command-line interface to help operation.
    2. Support database version migration.

    Bug Fixes

    1. The remote logging supports the current log mechanism #172
    2. Unittest failed cause by init_ai_flow_context #185
    3. AIFlow Webserver cannot sort model version by version #207

    Welcome to use this release version and give us the feedback of the AIFlow.

    Source code(tar.gz)
    Source code(zip)
    examples.tar.gz(32.95 MB)
  • release-0.2.2(Nov 19, 2021)

    Features

    AIFlow

    1. Add the documents of the AIFlow. #117

    Airflow

    1. Allow LocalExecutor to run with SQLite. #41
    2. Airflow webserver supports the Airflow and Notification databases. #49

    Notification Service

    1. Introduces the countEvents interface. #11

    Bug Fixes

    1. Fix oss blob manager download concurrently. #3
    2. Deepcopy tasks in celery executor to avoid race condition. #12
    3. Fix periodic workflow cannot run. #77
    4. Fix HDFSBlobManager failed to download existed file. #87
    5. Removes the uncompleted api action_on_dataset_event. #104
    6. The workflow directory is set incorrect in AIFlow runtime. #111

    Welcome to use this release version and give us the feedback of the AIFlow.

    Source code(tar.gz)
    Source code(zip)
    examples.tar.gz(32.93 MB)
  • release-0.2.0(Feb 9, 2022)

    Features

    AIFlow

    1. Add workflow execution on event and ContextExtractor API #476
    2. Add task execution restful api #478
    3. AIFlow add WorkflowEventManager to listen and handle events #492
    4. Support start new workflow execution with context #479
    5. Introduce the workflow frontend of the AIFlow UI #509
    6. Add FlinkSqlProcessor #527
    7. Support job execution label #529
    8. Frontend support metadata ui #533
    9. Notification service supports the idempotence #553
    10. Add read-only job plugin #555

    Airflow

    1. Support celery executor on event based scheduler #482
    2. Add AirFlowScheduler with airflow restful api #486

    Bug Fixes

    1. Make AI Flow be able to use Notification Service with HA enabled #510
    2. Duplicated entry when create dagrun #570
    3. EventBaseScheduler catches and prints exceptions #586
    4. Scheduler should find schedulable tasks once dagrun finished #587
    5. EventBaseScheduler would trigger task multiple times incorrectly #598
    Source code(tar.gz)
    Source code(zip)
    ai-flow-examples.tar.gz(32.93 MB)
  • release-0.1.0(Feb 9, 2022)

Owner
A neutral organization to host ecosystem projects for Apache Flink
GAN JAX - A toy project to generate images from GANs with JAX

GAN JAX - A toy project to generate images from GANs with JAX This project aims to bring the power of JAX, a Python framework developped by Google and

Valentin Goldité 14 Nov 29, 2022
Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation'

OD-Rec Codes for SIGIR'22 Paper 'On-Device Next-Item Recommendation with Self-Supervised Knowledge Distillation' Paper, saved teacher models and Andro

Xin Xia 11 Nov 22, 2022
Pretrained Pytorch face detection (MTCNN) and recognition (InceptionResnet) models

Face Recognition Using Pytorch Python 3.7 3.6 3.5 Status This is a repository for Inception Resnet (V1) models in pytorch, pretrained on VGGFace2 and

Tim Esler 3.3k Jan 04, 2023
A Fast and Stable GAN for Small and High Resolution Imagesets - pytorch

A Fast and Stable GAN for Small and High Resolution Imagesets - pytorch The official pytorch implementation of the paper "Towards Faster and Stabilize

Bingchen Liu 455 Jan 08, 2023
Official PyTorch implementation of Learning Intra-Batch Connections for Deep Metric Learning (ICML 2021) published at International Conference on Machine Learning

About This repository the official PyTorch implementation of Learning Intra-Batch Connections for Deep Metric Learning. The config files contain the s

Dynamic Vision and Learning Group 41 Dec 10, 2022
Hardware accelerated, batchable and differentiable optimizers in JAX.

JAXopt Installation | Examples | References Hardware accelerated (GPU/TPU), batchable and differentiable optimizers in JAX. Installation JAXopt can be

Google 621 Jan 08, 2023
Taichi Course Homework Template

太极图形课S1-标题部分 这个作业未来或将是你的开源项目,标题的内容可以来自作业中的核心关键词,让读者一眼看出你所完成的工作/做出的好玩demo 如果暂时未想好,起名时可以参考“太极图形课S1-xxx作业” 如下是作业(项目)展开说明的方法,可以帮大家理清思路,并且也对读者非常友好,请小伙伴们多多参

TaichiCourse 30 Nov 19, 2022
A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor

Phase-SLAM A Pose Estimator for Dense Reconstruction with the Structured Light Illumination Sensor This open source is written by MATLAB Run Mode Open

Xi Zheng 14 Dec 19, 2022
PyTorch implementation of "PatchGame: Learning to Signal Mid-level Patches in Referential Games" to appear in NeurIPS 2021

PatchGame: Learning to Signal Mid-level Patches in Referential Games This repository is the official implementation of the paper - "PatchGame: Learnin

Kamal Gupta 22 Mar 16, 2022
State of the Art Neural Networks for Deep Learning

pyradox This python library helps you with implementing various state of the art neural networks in a totally customizable fashion using Tensorflow 2

Ritvik Rastogi 60 May 29, 2022
A symbolic-model-guided fuzzer for TLS

tlspuffin TLS Protocol Under FuzzINg A symbolic-model-guided fuzzer for TLS Master Thesis | Thesis Presentation | Documentation Disclaimer: The term "

69 Dec 20, 2022
An Industrial Grade Federated Learning Framework

DOC | Quick Start | 中文 FATE (Federated AI Technology Enabler) is an open-source project initiated by Webank's AI Department to provide a secure comput

Federated AI Ecosystem 4.8k Jan 09, 2023
Pytorch implementation AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative Adversarial Networks

AttnGAN Pytorch implementation for reproducing AttnGAN results in the paper AttnGAN: Fine-Grained Text to Image Generation with Attentional Generative

Tao Xu 1.2k Dec 26, 2022
Volumetric parameterization of the placenta to a flattened template

placenta-flattening A MATLAB algorithm for volumetric mesh parameterization. Developed for mapping a placenta segmentation derived from an MRI image t

Mazdak Abulnaga 12 Mar 14, 2022
This repository contains code to train and render Mixture of Volumetric Primitives (MVP) models

Mixture of Volumetric Primitives -- Training and Evaluation This repository contains code to train and render Mixture of Volumetric Primitives (MVP) m

Meta Research 125 Dec 29, 2022
Hyperbolic Hierarchical Clustering.

Hyperbolic Hierarchical Clustering (HypHC) This code is the official PyTorch implementation of the NeurIPS 2020 paper: From Trees to Continuous Embedd

HazyResearch 154 Dec 15, 2022
Setup freqtrade/freqUI on Heroku

UNMAINTAINED - REPO MOVED TO https://github.com/p-zombie/freqtrade Creating the app git clone https://github.com/joaorafaelm/freqtrade.git && cd freqt

João 51 Aug 29, 2022
NeRViS: Neural Re-rendering for Full-frame Video Stabilization

Neural Re-rendering for Full-frame Video Stabilization

Yu-Lun Liu 9 Jun 17, 2022
Target Propagation via Regularized Inversion

Target Propagation via Regularized Inversion The present code implements an ideal formulation of target propagation using regularized inverses compute

Vincent Roulet 0 Dec 02, 2021
Ppq - A powerful offline neural network quantization tool with custimized IR

PPL Quantization Tool(PPL 量化工具) PPL Quantization Tool (PPQ) is a powerful offlin

605 Jan 03, 2023