Airflow 2.2.3 containerized installation
2022-06-29 04:15:00 【Official account: Yunyuan ecosystem】
The previous post covered Airflow's concepts and usage scenarios at a high level. Today we install Airflow with Docker, and in using it get a deeper understanding of what Airflow can actually do.
1. Airflow Containerized Deployment
Host environment (Alibaba Cloud):
- Operating system: Ubuntu 20.04.3 LTS
- Kernel version: Linux 5.4.0-91-generic
Install Docker
To install Docker, refer to the official documentation [1]. On a fresh system there is no old version to uninstall. Since this is a cloud host, take a snapshot beforehand so a bad configuration cannot ruin the environment.
# Update the apt package index and install prerequisites
sudo apt-get update
sudo apt-get install \
ca-certificates \
curl \
gnupg \
lsb-release
# Add Docker's official GPG key
curl -fsSL https://download.docker.com/linux/ubuntu/gpg | sudo gpg --dearmor -o /usr/share/keyrings/docker-archive-keyring.gpg
# Set up the Docker stable repository
echo \
"deb [arch=$(dpkg --print-architecture) signed-by=/usr/share/keyrings/docker-archive-keyring.gpg] https://download.docker.com/linux/ubuntu \
$(lsb_release -cs) stable" | sudo tee /etc/apt/sources.list.d/docker.list > /dev/null
# List the installable docker-ce versions
root@host:~# apt-cache madison docker-ce
docker-ce | 5:20.10.12~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.11~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.10~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
docker-ce | 5:20.10.9~3-0~ubuntu-focal | https://download.docker.com/linux/ubuntu focal/stable amd64 Packages
# Install command format:
#sudo apt-get install docker-ce=<VERSION_STRING> docker-ce-cli=<VERSION_STRING> containerd.io
# Install a specific version
sudo apt-get install docker-ce=5:20.10.12~3-0~ubuntu-focal docker-ce-cli=5:20.10.12~3-0~ubuntu-focal containerd.io
Optimize the Docker configuration
Add the following to /etc/docker/daemon.json (set registry-mirrors to your own accelerator address, e.g. an Alibaba Cloud mirror; note that real JSON must not contain comments):
{
"data-root": "/var/lib/docker",
"exec-opts": [
"native.cgroupdriver=systemd"
],
"registry-mirrors": [
"https://****.mirror.aliyuncs.com"
],
"storage-driver": "overlay2",
"storage-opts": [
"overlay2.override_kernel_check=true"
],
"log-driver": "json-file",
"log-opts": {
"max-size": "100m",
"max-file": "3"
}
}
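A malformed daemon.json stops the Docker daemon from starting at all, so it is worth validating the file before restarting. A minimal sketch, assuming python3 is installed; it is shown against a scratch copy rather than the live file:

```shell
# Write a trimmed sample of the config above to a scratch path and validate it.
# python3 -m json.tool exits non-zero on malformed JSON.
cat > /tmp/daemon-check.json <<'EOF'
{
  "exec-opts": ["native.cgroupdriver=systemd"],
  "log-driver": "json-file",
  "log-opts": { "max-size": "100m", "max-file": "3" }
}
EOF
python3 -m json.tool /tmp/daemon-check.json > /dev/null && echo "valid JSON"
```

Run the same check against /etc/docker/daemon.json itself once you have edited it.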
Configure Docker to start at boot:
systemctl daemon-reload
systemctl enable --now docker.service
Containerized installation of Airflow
Database selection
According to the official documentation, the recommended databases are MySQL 8+ and PostgreSQL 9.6+. The official docker-compose script [2] uses PostgreSQL, so we need to adjust the contents of docker-compose.yml to switch to MySQL:
---
version: '3'
x-airflow-common:
&airflow-common
# In order to add custom dependencies or upgrade provider packages you can use your extended image.
# Comment the image line, place your Dockerfile in the directory where you placed the docker-compose.yaml
# and uncomment the "build" line below, Then run `docker-compose build` to build the images.
image: ${AIRFLOW_IMAGE_NAME:-apache/airflow:2.2.3}
# build: .
environment:
&airflow-common-env
AIRFLOW__CORE__EXECUTOR: CeleryExecutor
AIRFLOW__CORE__SQL_ALCHEMY_CONN: mysql+mysqldb://airflow:aaaa@mysql/airflow # replace with your MySQL connection string (user:password@service/db)
AIRFLOW__CELERY__RESULT_BACKEND: db+mysql://airflow:aaaa@mysql/airflow # replace with your MySQL connection string
AIRFLOW__CELERY__BROKER_URL: redis://:xxxx@redis:6379/0 # redis auth is enabled for safety; replace xxxx with the redis password
AIRFLOW__CORE__FERNET_KEY: ''
AIRFLOW__CORE__DAGS_ARE_PAUSED_AT_CREATION: 'true'
AIRFLOW__CORE__LOAD_EXAMPLES: 'true'
AIRFLOW__API__AUTH_BACKEND: 'airflow.api.auth.backend.basic_auth'
_PIP_ADDITIONAL_REQUIREMENTS: ${_PIP_ADDITIONAL_REQUIREMENTS:-}
volumes:
- ./dags:/opt/airflow/dags
- ./logs:/opt/airflow/logs
- ./plugins:/opt/airflow/plugins
user: "${AIRFLOW_UID:-50000}:0"
depends_on:
&airflow-common-depends-on
redis:
condition: service_healthy
mysql: # dependency changed to the mysql service
condition: service_healthy
services:
mysql:
image: mysql:8.0.27 # the latest MySQL 8 image at the time of writing
environment:
MYSQL_ROOT_PASSWORD: bbbb # MySQL root password
MYSQL_USER: airflow
MYSQL_PASSWORD: aaaa # password for the airflow user
MYSQL_DATABASE: airflow
command:
--default-authentication-plugin=mysql_native_password # use the legacy auth plugin so clients can connect
--collation-server=utf8mb4_general_ci # collation recommended by the official docs
--character-set-server=utf8mb4 # character set recommended by the official docs
volumes:
- /apps/airflow/mysqldata8:/var/lib/mysql # persist MySQL data
- /apps/airflow/my.cnf:/etc/my.cnf # persist the MySQL config file
healthcheck:
test: mysql --user=$$MYSQL_USER --password=$$MYSQL_PASSWORD -e 'SHOW DATABASES;' # healthcheck command
interval: 5s
retries: 5
restart: always
redis:
image: redis:6.2
expose:
- 6379
command: redis-server --requirepass xxxx # enable password auth on redis-server
healthcheck:
test: ["CMD", "redis-cli","-a","xxxx","ping"] # healthcheck authenticates with the redis password
interval: 5s
timeout: 30s
retries: 50
restart: always
airflow-webserver:
<<: *airflow-common
command: webserver
ports:
- 8080:8080
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:8080/health"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-scheduler:
<<: *airflow-common
command: scheduler
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type SchedulerJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-worker:
<<: *airflow-common
command: celery worker
healthcheck:
test:
- "CMD-SHELL"
- 'celery --app airflow.executors.celery_executor.app inspect ping -d "celery@$${HOSTNAME}"'
interval: 10s
timeout: 10s
retries: 5
environment:
<<: *airflow-common-env
# Required to handle warm shutdown of the celery workers properly
# See https://airflow.apache.org/docs/docker-stack/entrypoint.html#signal-propagation
DUMB_INIT_SETSID: "0"
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-triggerer:
<<: *airflow-common
command: triggerer
healthcheck:
test: ["CMD-SHELL", 'airflow jobs check --job-type TriggererJob --hostname "$${HOSTNAME}"']
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
airflow-init:
<<: *airflow-common
entrypoint: /bin/bash
# yamllint disable rule:line-length
command:
- -c
- |
function ver() {
printf "%04d%04d%04d%04d" $${1//./ }
}
airflow_version=$$(gosu airflow airflow version)
airflow_version_comparable=$$(ver $${airflow_version})
min_airflow_version=2.2.0
min_airflow_version_comparable=$$(ver $${min_airflow_version})
if (( airflow_version_comparable < min_airflow_version_comparable )); then
echo
echo -e "\033[1;31mERROR!!!: Too old Airflow version $${airflow_version}!\e[0m"
echo "The minimum Airflow version supported: $${min_airflow_version}. Only use this or higher!"
echo
exit 1
fi
if [[ -z "${AIRFLOW_UID}" ]]; then
echo
echo -e "\033[1;33mWARNING!!!: AIRFLOW_UID not set!\e[0m"
echo "If you are on Linux, you SHOULD follow the instructions below to set "
echo "AIRFLOW_UID environment variable, otherwise files will be owned by root."
echo "For other operating systems you can get rid of the warning with manually created .env file:"
echo " See: https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#setting-the-right-airflow-user"
echo
fi
one_meg=1048576
mem_available=$$(($$(getconf _PHYS_PAGES) * $$(getconf PAGE_SIZE) / one_meg))
cpus_available=$$(grep -cE 'cpu[0-9]+' /proc/stat)
disk_available=$$(df / | tail -1 | awk '{print $$4}')
warning_resources="false"
if (( mem_available < 4000 )) ; then
echo
echo -e "\033[1;33mWARNING!!!: Not enough memory available for Docker.\e[0m"
echo "At least 4GB of memory required. You have $$(numfmt --to iec $$((mem_available * one_meg)))"
echo
warning_resources="true"
fi
if (( cpus_available < 2 )); then
echo
echo -e "\033[1;33mWARNING!!!: Not enough CPUS available for Docker.\e[0m"
echo "At least 2 CPUs recommended. You have $${cpus_available}"
echo
warning_resources="true"
fi
if (( disk_available < one_meg * 10 )); then
echo
echo -e "\033[1;33mWARNING!!!: Not enough Disk space available for Docker.\e[0m"
echo "At least 10 GBs recommended. You have $$(numfmt --to iec $$((disk_available * 1024 )))"
echo
warning_resources="true"
fi
if [[ $${warning_resources} == "true" ]]; then
echo
echo -e "\033[1;33mWARNING!!!: You have not enough resources to run Airflow (see above)!\e[0m"
echo "Please follow the instructions to increase amount of resources available:"
echo " https://airflow.apache.org/docs/apache-airflow/stable/start/docker.html#before-you-begin"
echo
fi
mkdir -p /sources/logs /sources/dags /sources/plugins
chown -R "${AIRFLOW_UID}:0" /sources/{logs,dags,plugins}
exec /entrypoint airflow version
# yamllint enable rule:line-length
environment:
<<: *airflow-common-env
_AIRFLOW_DB_UPGRADE: 'true'
_AIRFLOW_WWW_USER_CREATE: 'true'
_AIRFLOW_WWW_USER_USERNAME: ${_AIRFLOW_WWW_USER_USERNAME:-airflow}
_AIRFLOW_WWW_USER_PASSWORD: ${_AIRFLOW_WWW_USER_PASSWORD:-airflow}
user: "0:0"
volumes:
- .:/sources
airflow-cli:
<<: *airflow-common
profiles:
- debug
environment:
<<: *airflow-common-env
CONNECTION_CHECK_MAX_COUNT: "0"
# Workaround for entrypoint issue. See: https://github.com/apache/airflow/issues/16252
command:
- bash
- -c
- airflow
flower:
<<: *airflow-common
command: celery flower
ports:
- 5555:5555
healthcheck:
test: ["CMD", "curl", "--fail", "http://localhost:5555/"]
interval: 10s
timeout: 10s
retries: 5
restart: always
depends_on:
<<: *airflow-common-depends-on
airflow-init:
condition: service_completed_successfully
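The ver() helper in the airflow-init script deserves a closer look: it zero-pads each dotted version component to four digits so versions can be compared as plain numbers. A standalone sketch of the same idea (the 10# prefix is my addition, to keep the zero-padded strings from being read as octal):

```shell
# Same helper as in the init script above, runnable on its own.
# "2.2.3" -> words "2 2 3" -> zero-padded to "0002000200030000"
# (the missing fourth component becomes 0).
ver() {
  printf "%04d%04d%04d%04d" ${1//./ }
}
current=$(ver 2.2.3)
minimum=$(ver 2.2.0)
# 10# forces base-10 so leading zeros are not interpreted as octal
if (( 10#$current >= 10#$minimum )); then
  echo "version OK"
fi
```

This is why the script can use a simple numeric comparison instead of parsing each component separately.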
Compared with the official docker-compose.yaml, only the x-airflow-common, MySQL, and Redis sections were changed. The next step is to start the containers; before that, create the persistent directories:
mkdir -p ./dags ./logs ./plugins
echo -e "AIRFLOW_UID=$(id -u)" > .env # careful: AIRFLOW_UID must be a regular user's UID, and that user must have permission to create these persistent directories
If AIRFLOW_UID is not a regular user's UID, the containers will fail at runtime with an error saying the airflow module cannot be found.
docker-compose up airflow-init # initialize the database and create the tables
docker-compose up -d # start the airflow containers
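A quick guard against exactly that failure mode: confirm the UID written to .env is numeric and non-root before bringing the stack up. This sketch checks a scratch copy (the scratch path is mine, for illustration):

```shell
# Recreate the .env line in a scratch file and check the UID is a non-zero number.
echo "AIRFLOW_UID=$(id -u)" > /tmp/env-check
uid=$(cut -d= -f2 /tmp/env-check)
if [ "$uid" -gt 0 ] 2>/dev/null; then
  echo "AIRFLOW_UID=$uid looks fine"
else
  echo "AIRFLOW_UID must be a regular user's UID, not root (0)"
fi
```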
When a container's status is unhealthy, use docker inspect $container_name to find the cause of the error. With that, the Airflow installation is complete.
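For reference, docker inspect emits a JSON array, and the health verdict lives under .State.Health; the probe's output is kept in .State.Health.Log, which usually names the real error. The sketch below parses a saved sample the same way (in practice you would run docker inspect --format '{{json .State.Health}}' $container_name; the payload here is made up for illustration):

```shell
# A fabricated fragment of `docker inspect` output, trimmed to the health part.
cat > /tmp/inspect-sample.json <<'EOF'
[{"State": {"Status": "running",
            "Health": {"Status": "unhealthy",
                       "Log": [{"ExitCode": 1,
                                "Output": "curl: (7) Failed to connect to localhost port 8080"}]}}}]
EOF
python3 - <<'EOF'
import json
state = json.load(open("/tmp/inspect-sample.json"))[0]["State"]
print(state["Health"]["Status"])             # overall verdict
print(state["Health"]["Log"][-1]["Output"])  # last probe output, i.e. the actual error
EOF
```

The last Log entry is usually the fastest route to the root cause, e.g. a webserver healthcheck failing because port 8080 is not up yet.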
References
[1] Install Docker Engine on Ubuntu: https://docs.docker.com/engine/install/ubuntu/
[2] Official docker-compose.yaml: https://airflow.apache.org/docs/apache-airflow/2.2.3/docker-compose.yaml