
Azkaban overview

2022-07-05 02:46:00 A vegetable chicken that is working hard

What is Azkaban?

1. Definition

  • Batch workflow task scheduler

2. Explanation

  • It is mainly used to run a group of jobs and processes in a specific order within a workflow. It is configured through simple key:value pairs, and dependencies between jobs are set through the dependencies property
  • Azkaban uses job configuration files to establish dependencies between tasks, and provides an easy-to-use web user interface for maintaining and tracking your workflows
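As a minimal sketch of what such job configuration files can look like, the Python snippet below renders three small job definitions and packs them into a zip archive, which is the form in which a project is typically uploaded through the web interface. The job names (extract, transform, load) and the echo commands are hypothetical, chosen only for illustration. The files are written with = as the separator, as is common in Azkaban job files.

```python
import io
import zipfile

# Hypothetical job names and commands, chosen only for illustration.
# "dependencies" names the job that must finish before this one starts.
JOBS = {
    "extract": {"type": "command", "command": "echo extract data"},
    "transform": {
        "type": "command",
        "command": "echo transform data",
        "dependencies": "extract",
    },
    "load": {
        "type": "command",
        "command": "echo load data",
        "dependencies": "transform",
    },
}


def render_job(properties):
    """Render one job as key=value lines, one property per line."""
    return "".join(f"{key}={value}\n" for key, value in properties.items())


def build_flow_zip(jobs):
    """Pack one <name>.job file per job into an in-memory zip archive."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        for name, properties in jobs.items():
            archive.writestr(f"{name}.job", render_job(properties))
    return buffer.getvalue()
```

Calling render_job(JOBS["transform"]) yields the text of transform.job, whose dependencies line tells the scheduler to run it only after extract has finished.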

Why do we need a workflow scheduling system

1. Solve the dependency between task units

  • A complete data analysis system usually consists of a large number of task units (shell scripts, Java programs, MapReduce programs, Hive scripts, etc.)
  • The task units have time-ordering and upstream/downstream dependency relationships with one another
  • To organize such a complex execution plan well, a workflow scheduling system is needed to schedule execution

2. Timed scheduling

  • Without a scheduler, the whole execution process needs manual participation, and someone has to keep an eye on the progress of each task. But many of our tasks run in the middle of the night, so they are set up to run through scripts triggered by crontab
  • In fact, the whole process resembles a directed acyclic graph (DAG)
  • Each subtask is equivalent to a node in one large task. In other words, what we need is a workflow scheduler, and Azkaban is a scheduler that can solve the problems above
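To make the DAG idea concrete, here is a small sketch that uses Python's standard graphlib module to derive a valid execution order from task dependencies. It illustrates the general technique (topological sorting), not Azkaban's internal implementation, and the task names are hypothetical.

```python
from graphlib import CycleError, TopologicalSorter  # standard library, Python 3.9+

# Hypothetical task graph: each task maps to the set of tasks it depends on.
DEPENDENCIES = {
    "import_logs": set(),
    "clean_logs": {"import_logs"},
    "build_report": {"clean_logs"},
    "export_report": {"build_report"},
}


def execution_order(dependencies):
    """Return the tasks in an order where every dependency runs first."""
    return list(TopologicalSorter(dependencies).static_order())


def is_valid_workflow(dependencies):
    """A workflow is schedulable only if its dependency graph has no cycle."""
    try:
        execution_order(dependencies)
    except CycleError:
        return False
    return True
```

For the chain above, execution_order(DEPENDENCIES) returns import_logs, clean_logs, build_report, export_report. A graph in which two tasks depend on each other is rejected by is_valid_workflow, which is why the workflow has to be acyclic.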

Azkaban features

1. Compatible with any version of Hadoop

2. Easy-to-use web user interface, with convenient and simple operation

3. Modular and pluggable plug-in mechanism

4. Authentication and authorization (permission management)

5. Ability to kill and restart workflows

6. Email alerts on failure and success

Common workflow scheduling systems

1. Simple task scheduling

  • Implement it directly with crontab

2. Complex task scheduling

  • Develop a scheduling platform in-house, or use an off-the-shelf open source scheduling system such as Oozie or Azkaban

Oozie and Azkaban feature comparison

 [Figure: Oozie and Azkaban feature comparison]

Azkaban architecture

1. Architecture diagram

 [Figure: Azkaban architecture diagram]

2. Explanation

  • AzkabanWebServer: the main manager of the whole Azkaban workflow system. It is responsible for user login authentication, project management, scheduled execution of workflows, tracking the progress of workflow execution, and a series of related tasks
  • AzkabanExecutorServer: responsible for the actual submission and execution of workflows. Executor servers use a MySQL database to coordinate task execution
  • Relational database (MySQL): stores most of the execution flow state. Both AzkabanWebServer and AzkabanExecutorServer need to access the database
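The division of roles above can be illustrated with a toy sketch in which two functions, standing in for the web server and an executor, coordinate only through a shared database. This is an assumption-heavy illustration: it uses an in-memory SQLite database in place of MySQL, and the table name, column names, and status values are hypothetical and do not reflect Azkaban's real schema.

```python
import sqlite3


def open_state_store():
    """Create the shared store (SQLite stand-in for MySQL, hypothetical schema)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE executions ("
        "id INTEGER PRIMARY KEY AUTOINCREMENT, "
        "flow TEXT NOT NULL, "
        "status TEXT NOT NULL)"
    )
    return conn


def web_server_submit(conn, flow):
    """Web-server role: record a new execution request and return its id."""
    cursor = conn.execute(
        "INSERT INTO executions (flow, status) VALUES (?, 'QUEUED')", (flow,)
    )
    conn.commit()
    return cursor.lastrowid


def executor_run_next(conn):
    """Executor role: take the oldest queued execution and mark it finished."""
    row = conn.execute(
        "SELECT id FROM executions WHERE status = 'QUEUED' ORDER BY id LIMIT 1"
    ).fetchone()
    if row is None:
        return None  # nothing waiting to run
    execution_id = row[0]
    conn.execute(
        "UPDATE executions SET status = 'RUNNING' WHERE id = ?", (execution_id,)
    )
    # A real executor would launch the flow's jobs at this point.
    conn.execute(
        "UPDATE executions SET status = 'SUCCEEDED' WHERE id = ?", (execution_id,)
    )
    conn.commit()
    return execution_id


def web_server_status(conn, execution_id):
    """Web-server role: read the execution state back for progress tracking."""
    row = conn.execute(
        "SELECT status FROM executions WHERE id = ?", (execution_id,)
    ).fetchone()
    return row[0] if row else None
```

The point of the sketch is that neither role calls the other directly here: the web-server side writes and reads rows, the executor side updates them, and the database holds the execution state that both need.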