Apache Airflow with 3 Celery workers in docker-compose

To run Apache Airflow with three named Celery workers under docker-compose, first switch Airflow to the CeleryExecutor in airflow.cfg, with Postgres as the result backend and Redis as the broker:

executor = CeleryExecutor
result_backend = db+postgresql://airflow:airflow@postgres:5432/airflow
broker_url = redis://redis:6379/0
Then add a Redis service to docker-compose.yml:

redis:
  image: redis:5.0.5

One might ask why we do this and give each worker its own name, instead of just using docker-compose scale + nginx.

We would reach for ‘docker-compose scale + nginx’ when we need fault tolerance or have a performance problem to solve; that is an additional ‘tool’ on top of this setup.

With several individually named workers, we can use queues to split our DAGs between workers, either by priority or by the kind of work a DAG does.

For example, suppose one worker lives on a VM that also has a JVM installed and the resources to run Spark; we want DAGs that use Spark to always execute on that worker and nowhere else. One VM with a JVM is enough, while the other workers run Ruby processes or pure Python. If you later need fault tolerance, you can scale that ‘java-airflow worker’ with ‘compose scale + nginx’ (or some other way) to get three instances of the JVM worker, but you will still have a separate ‘ruby-airflow worker’ or ‘pure-python-airflow worker’, which can also be scaled behind nginx. These are simply two levels of scaling, used for different purposes.
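To make that concrete, here is a minimal, purely illustrative sketch of queue routing (the DAG id, command, queue name and worker hostname are all made up): a task is pinned to a queue via the operator's queue argument, and only a worker started with that queue, e.g. airflow worker -q spark -cn worker_spark, will pick it up.

# Hypothetical example: route a Spark task to the 'spark' queue so that only
# the JVM-equipped worker ever executes it. Other tasks stay on the default queue.
from datetime import datetime

from airflow import DAG
from airflow.operators.bash_operator import BashOperator

dag = DAG(
    dag_id="spark_on_dedicated_worker",
    start_date=datetime(2019, 1, 1),
    schedule_interval=None,
)

submit_spark = BashOperator(
    task_id="submit_spark_job",
    bash_command="spark-submit /opt/jobs/example_job.py",
    queue="spark",  # only workers listening on the 'spark' queue receive this task
    dag=dag,
)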

But this article is not about queues or splitting DAGs between workers. Let’s return to our docker-compose.yml.

worker_1:
  build: .
  restart: always
  depends_on:
    - postgres
  volumes:
    - ./airflow_files/dags:/usr/local/airflow/dags
  # -cn sets the Celery hostname, so each worker appears under its own name in Flower
  entrypoint: airflow worker -cn worker_1
  healthcheck:
    test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-worker.pid ]"]
    interval: 30s
    timeout: 30s
    retries: 3
worker_2:
  build: .
  restart: always
  depends_on:
    - postgres
  volumes:
    - ./airflow_files/dags:/usr/local/airflow/dags
  entrypoint: airflow worker -cn worker_2
  healthcheck:
    test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-worker.pid ]"]
    interval: 30s
    timeout: 30s
    retries: 3
worker_3:
  build: .
  restart: always
  depends_on:
    - postgres
  volumes:
    - ./airflow_files/dags:/usr/local/airflow/dags
  entrypoint: airflow worker -cn worker_3
  healthcheck:
    test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-worker.pid ]"]
    interval: 30s
    timeout: 30s
    retries: 3
flower:
  build: .
  restart: always
  depends_on:
    - postgres
  volumes:
    - ./airflow_files/dags:/usr/local/airflow/dags
  entrypoint: airflow flower
  healthcheck:
    test: ["CMD-SHELL", "[ -f /usr/local/airflow/airflow-flower.pid ]"]
    interval: 30s
    timeout: 30s
    retries: 3
  ports:
    - "5555:5555"
Now rebuild and restart the stack:

docker-compose down  # if your cluster is already running
docker-compose up --build
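Once the stack is up, Flower should list the three named workers. As an optional sanity check, here is a small sketch (assuming Flower’s HTTP API is reachable on localhost:5555, as mapped in the compose file above) that queries it directly:

# Sketch, not part of the original setup: list the workers Flower knows about.
import json
import urllib.request

with urllib.request.urlopen("http://localhost:5555/api/workers") as resp:
    workers = json.loads(resp.read().decode("utf-8"))

# The keys are Celery node names; with -cn they should include worker_1, worker_2 and worker_3.
print(sorted(workers.keys()))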
Successful console output from a Celery worker
Empty Celery Flower UI with the 3 workers
Flower UI after Airflow task execution
