CVE-2024-39877: Apache Airflow Arbitrary Code Execution

Apache Airflow is an open-source platform for programmatically authoring, scheduling, and monitoring workflows. While it offers robust features for managing complex workflows, it has had its share of security vulnerabilities. One notable example, CVE-2024-39877, is a DAG (Directed Acyclic Graph) code execution vulnerability. It allows an authenticated DAG author to craft the doc_md parameter so that arbitrary code executes in the scheduler context, which the Airflow security model prohibits.

Patch Diffing

Patch diff showing doc_md content treated as raw data instead of a Jinja2 template

The pull request on GitHub that patches the vulnerability shows that the flaw comes from how Airflow handles the doc_md parameter. This parameter lets a DAG author attach Markdown documentation to a DAG. Airflow rendered its content with Jinja2 without sanitizing it, so a DAG author could embed Jinja2 template expressions that execute arbitrary Python code. Because the Airflow scheduler processes this parameter, any injected code runs in the context of the scheduler. The patch fixes this by treating the value of doc_md as raw data rather than as a template.
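The idea behind the fix can be sketched in a few lines. This is a simplified illustration and not the code from the Airflow pull request: the function name resolve_doc_md is hypothetical, and the fallback for an unreadable file is an assumption made for the example.

```python
from __future__ import annotations

from pathlib import Path


def resolve_doc_md(doc_md: str | None) -> str | None:
    """Return DAG documentation as plain text, never as a template.

    Hypothetical helper illustrating the idea behind the fix; it is not
    the code from the Airflow patch.
    """
    if doc_md is None:
        return None
    if doc_md.endswith(".md"):
        # Treat the value as a path and return the file's text verbatim.
        try:
            return Path(doc_md).read_text(encoding="utf-8")
        except OSError:
            # Assumed fallback: a missing or unreadable file yields the raw string.
            return doc_md
    # Inline documentation is returned untouched, so "{{ ... }}" stays literal.
    return doc_md


payload = "{{ ''.__class__.__mro__[1].__subclasses__() }}"
print(resolve_doc_md(payload))
```

Because no template engine is involved, the payload comes back as the same literal string and nothing is evaluated.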

Testing Lab

1. We will build the lab on Docker. First, we need to pull the vulnerable image:

airflow % docker pull apache/airflow:2.4.0

2. Then, download the Docker Compose file:

airflow % curl -LfO 'https://airflow.apache.org/docs/apache-airflow/2.4.0/docker-compose.yaml'

3. Create the logs, dags, plugins, and config folders, and the .env file:

airflow % mkdir -p ./dags ./logs ./plugins ./config && echo -e "AIRFLOW_UID=$(id -u)" > .env

4. Check the created directories and files:

airflow % ls

config dags docker-compose.yaml logs plugins

5. Initiate Airflow:

airflow % sudo docker compose up airflow-init

6. Now, run Airflow:

airflow % sudo docker compose up

Terminal starting the Airflow Docker Compose lab on port 8080

Once the containers are up, the Airflow web interface is available on port 8080. The default username and password are both airflow (airflow:airflow).

The Analysis

Now, to reproduce the vulnerability, we need to create a DAG.

What is a DAG?

A Directed Acyclic Graph (DAG) is a finite graph with directed edges and no cycles. In the context of Apache Airflow, a DAG is a collection of all the tasks you want to run, organized in a way that reflects their relationships and dependencies.

Directed: Each edge in the graph has a direction, going from one node (task) to another.
Acyclic: There are no cycles in the graph, meaning that you cannot start at one task and follow the directed edges back to the same task.
Graph: A collection of nodes (tasks) and edges (dependencies between tasks).
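
The three properties above can be illustrated with Python’s standard graphlib module (Python 3.9 or later), independently of Airflow. This is only a sketch: the task names are made up, and Airflow has its own machinery for ordering tasks.

```python
from graphlib import CycleError, TopologicalSorter

# Each key is a task; its value is the set of tasks it depends on.
# The task names are made up for illustration.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
}

# A valid DAG always has at least one order that respects every edge.
order = list(TopologicalSorter(dependencies).static_order())
print(order)

# Adding an edge from "load" back to "extract" creates a cycle,
# so the graph is no longer a DAG and no valid order exists.
cyclic = {**dependencies, "extract": {"load"}}
try:
    list(TopologicalSorter(cyclic).static_order())
    has_cycle = False
except CycleError:
    has_cycle = True
print(has_cycle)
```

The acyclic graph yields a run order in which every task follows its dependencies, while the cyclic variant is rejected outright.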

DAGs in Apache Airflow

In Apache Airflow, DAGs are defined in Python scripts, which specify the relationships and dependencies between tasks. Here are some key components:

Tasks: The individual units of work, which can be anything from running a shell command to calling an API or running a machine learning model.
Dependencies: The relationships between tasks, specifying which tasks need to be completed before others can start.
Scheduling: Defines when and how often the DAG should run.

DAG Example

The following DAG includes a doc_md parameter. This parameter allows you to document your DAG using Markdown. The documentation will be visible in the Airflow web interface when you view the DAG details.

from datetime import datetime
from airflow import DAG
from airflow.operators.empty import EmptyOperator
default_args = {
    'owner': 'airflow',
    'start_date': datetime(2023, 1, 1),
    'retries': 1
}
# Define the DAG
dag = DAG(
    'example_dag_with_doc_md',
    default_args=default_args,
    description='An example DAG with doc_md',
    schedule='@daily',
    doc_md="""
    # Example DAG
    This is an example DAG that demonstrates the use of the `doc_md` parameter to add documentation.
    ## Description
    This DAG has two dummy tasks: `start` and `end`.
    ## Tasks
    - `start`: This is the starting task.
    - `end`: This is the ending task.
    ## Dependencies
    The `end` task depends on the `start` task.
    """
)
# Define the tasks
start = EmptyOperator(
    task_id='start',
    dag=dag
)

end = EmptyOperator(
    task_id='end',
    dag=dag
)
# Set the task dependencies
start >> end

doc_md: This parameter is used to add Markdown documentation to the DAG. The content within the doc_md string is written in Markdown and will be rendered in the Airflow web interface when you view the DAG details.
EmptyOperator: This is a simple operator that does nothing. It is used here to create placeholder tasks.

Now, Let’s Try Our DAG

Save the DAG File

Save the above code as a Python file (e.g., example_dag_with_doc_md.py) in the ./dags folder created earlier, which our Docker setup mounts as the Airflow DAGs folder (/opt/airflow/dags/) inside the containers.

Trigger the DAG

Go to the Airflow web interface and trigger the DAG named example_dag_with_doc_md.

View Documentation

Click on the DAG in the Airflow web interface to view its details. You will see the rendered Markdown documentation in the Doc tab.

What Happened Here Exactly?

Let’s take a look at the get_doc_md method from the vulnerable code to see how it resolves the Markdown content from doc_md:

def get_doc_md(self, doc_md: str | None) -> str | None:
    if doc_md is None:
        return doc_md
    env = self.get_template_env(force_sandboxed=True)
    if not doc_md.endswith(".md"):
        template = jinja2.Template(doc_md)
    else:
        try:
            template = env.get_template(doc_md)
        except jinja2.exceptions.TemplateNotFound:
            return f"""
            # Templating Error!
            Not able to find the template file: `{doc_md}`.
            """
    return template.render()

The get_doc_md method is designed to process the doc_md parameter, allowing DAG authors to embed Markdown documentation within their DAGs. Here’s a breakdown of how it works:

1. Check if doc_md is None:

If doc_md is None, the function returns early.

2. Initialize Jinja2 Environment:

It initializes a Jinja2 environment with sandboxing enabled using self.get_template_env(force_sandboxed=True).

3. Handle doc_md Content:

If doc_md does not end with .md, the method creates a Jinja2 template directly from the string using template = jinja2.Template(doc_md). This is the dangerous step: jinja2.Template is built on Jinja2’s default, non-sandboxed environment, so the sandboxed env created in the previous step is never used on this path, and any string provided in doc_md is treated as a Jinja2 template without sanitization. An attacker who controls this content can inject Jinja2 expressions that reach arbitrary Python objects.
If doc_md ends with .md, the method attempts to load the template from the sandboxed environment using env.get_template(doc_md). If the template file is not found, it returns a templating error message. This path is less critical than the direct template creation.

4. Render the Template:

The final step, template.render(), renders the template. Rendering evaluates every expression the template contains, which is where the injected code gets executed.

So, the vulnerability is a classic example of an injection attack (Server-Side Template Injection, SSTI).
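
The difference between the two template-creation paths can be reproduced outside Airflow. The sketch below assumes only that the jinja2 package is installed. It uses a harmless probe that reads a class name, and it uses Jinja2’s stock SandboxedEnvironment as a stand-in for the environment Airflow builds with get_template_env(force_sandboxed=True).

```python
import jinja2
from jinja2.exceptions import SecurityError
from jinja2.sandbox import SandboxedEnvironment

# Harmless probe: it only reads the name of the object class.
probe = "{{ ''.__class__.__mro__[1].__name__ }}"

# jinja2.Template builds on Jinja2's default, non-sandboxed environment,
# so dunder attributes are reachable from inside the template.
unsandboxed = jinja2.Template(probe).render()
print(unsandboxed)

# The same string rendered through a sandboxed environment is rejected.
sandbox = SandboxedEnvironment()
try:
    sandboxed = sandbox.from_string(probe).render()
except SecurityError as exc:
    sandboxed = f"blocked: {type(exc).__name__}"
print(sandboxed)
```

The first render succeeds and exposes Python internals, while the sandboxed render refuses the attribute access. Creating the sandboxed environment is not enough on its own; the template has to be built from it.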

Exploitation

Let’s see how this vulnerability can be exploited:

Airflow login page reached at port 8080 with airflow:airflow credentials

Attack Scenario

Step-by-Step Breakdown

1. Send Malicious doc_md Payload:

The attacker sends a malicious payload through the doc_md parameter to the web server.

2. Forward Payload to Airflow:

The web server forwards this payload to the Airflow application, which then invokes the get_doc_md method.

3. Invoke get_doc_md Method:

The method checks if the doc_md parameter is None and proceeds to initialize the Jinja2 environment.

4. Create Jinja2 Template:

Next, it creates a Jinja2 template from the doc_md content and renders it. During rendering, the malicious expression embedded in the doc_md parameter is evaluated by the Python interpreter inside the Airflow process, and from there it can invoke operating system (OS) commands.

5. Execute Injected Code:

The OS performs the command and returns the output to the Airflow application.

6. Send Response:

Finally, Airflow sends the rendered template output back to the web server, which then delivers the response, including the command output, back to the attacker.

Example of Injected Code

To demonstrate this, let’s inject code to dump the available classes:

doc_md="""
    {{ ''.__class__.__mro__[1].__subclasses__() }}
    """

The {{ ''.__class__.__mro__[1].__subclasses__() }} expression leverages Python’s introspection capabilities to list the direct subclasses of the object class, revealing a large share of the classes loaded in the current Python environment. Here’s how it works:

''.__class__ retrieves the class of an empty string, which is str.
Accessing .__mro__ on this class provides the method resolution order (MRO), a tuple that includes the str class itself and its base classes, including object.
The expression .__mro__[1] selects the object class from this tuple.
Finally, .__subclasses__() lists the direct subclasses of the object class that are currently loaded, allowing us to enumerate classes available in the runtime. This can be used to identify useful classes such as subprocess.Popen, which can run commands on the OS and achieve code execution.
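
The same chain can be followed step by step in ordinary Python, outside any template. The variable names below are chosen for illustration only.

```python
# Step through the payload's attribute chain in ordinary Python.
string_class = "".__class__            # the str class
mro = string_class.__mro__             # (str, object)
base_object = mro[1]                   # the object class
subclasses = base_object.__subclasses__()

print(string_class is str)
print(mro == (str, object))
print(base_object is object)
# The exact list varies with the interpreter and the modules imported so far.
print(len(subclasses) > 0)
```

Each intermediate value is a normal Python object, which is why an unsandboxed template engine that allows attribute access exposes all of them.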

SSTI payload in a DAG enumerating subclasses of the object class to reach os.system

After updating the DAG, our injected expression got rendered and dumped all the available classes.

Rendered DAG output dumping the available Python classes, including subprocess.Popen

Here, we can see that useful classes like subprocess.Popen can be used to execute commands. The exploitation depends on the environment and the availability of the classes.
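
Whether a given class is available depends on which modules the process has already imported. As a minimal sketch in plain Python, the lookup below checks for subprocess.Popen among the direct subclasses of object. It only locates the class and never instantiates it, and it imports subprocess explicitly so that the class is guaranteed to be loaded in this example.

```python
import subprocess  # importing the module is what makes Popen appear in the list

# Filter the direct subclasses of object by module and class name.
matches = [
    cls
    for cls in object.__subclasses__()
    if getattr(cls, "__module__", None) == "subprocess"
    and cls.__name__ == "Popen"
]
print(matches)
```

In a process that has never imported subprocess, the same filter returns an empty list, which is why a payload that works in one environment may fail in another.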

Conclusion

In this analysis, we examined CVE-2024-39877, a vulnerability that allows authenticated DAG authors to exploit the doc_md parameter to execute arbitrary code in the scheduler context, violating Airflow’s security model. The vulnerability arises from improper handling and sanitization of the doc_md parameter, which is rendered as a Jinja2 template. This oversight allows attackers to inject malicious Jinja2 expressions that can execute Python code.

The method get_doc_md in the vulnerable code initializes a Jinja2 environment and directly creates a template from the doc_md string if it does not end with .md, rendering the template without adequate sanitization. This process can be exploited by injecting payloads that leverage Python’s introspection capabilities to enumerate available classes and execute commands, thereby compromising the system.

To mitigate this, the patch ensures proper handling of doc_md as raw data, preventing the execution of arbitrary code.


Frequently Asked Questions

What is CVE-2024-39877 in Apache Airflow?

A code execution flaw affecting Apache Airflow up to version 2.9.2. An authenticated DAG author can place Jinja2 template syntax inside the doc_md parameter, and the scheduler renders it, running arbitrary Python in the scheduler process. The Airflow security model treats DAG authors and operators as separate trust levels, so this crosses a boundary that should not be crossable.

How does the doc_md parameter lead to code execution?

doc_md accepts Markdown for DAG documentation, but Airflow passes its content through Jinja2 rendering. Jinja2 expressions are evaluated, not just displayed, so payloads using template features can reach Python objects and call functions such as os.system. Because the scheduler does the rendering, the injected code runs with scheduler privileges.

Which Apache Airflow versions are affected by CVE-2024-39877?

Versions from 2.4.0 through 2.9.2 are vulnerable. The fix landed in 2.9.3, which stops rendering doc_md as a template and handles the value as raw data instead. Upgrade to 2.9.3 or later to remove the issue.

Who can exploit CVE-2024-39877?

Only an authenticated user with permission to author or modify DAG files. The attacker needs write access to a DAG definition to set a malicious doc_md value. This is a privilege escalation from DAG author to scheduler-level code execution, not an unauthenticated remote exploit.

How do you fix CVE-2024-39877?

Upgrade to Apache Airflow 2.9.3 or newer, where doc_md is no longer passed to Jinja2. If you cannot upgrade right away, restrict DAG-authoring permissions to trusted users and review existing DAGs for template syntax inside doc_md. Audit scheduler logs for unexpected process execution.
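
The DAG review suggested above can be partly automated. The following is a minimal sketch using Python’s ast module; the helper name find_templated_doc_md is hypothetical, and the check is deliberately narrow. It only flags plain string literals passed or assigned as doc_md, so values built with f-strings, variables, or .md files still need manual review.

```python
from __future__ import annotations

import ast

TEMPLATE_MARKERS = ("{{", "{%")


def find_templated_doc_md(source: str) -> list[int]:
    """Return line numbers of doc_md string literals containing Jinja2 delimiters.

    Hypothetical audit helper: it parses the source without executing it.
    """
    hits = []
    for node in ast.walk(ast.parse(source)):
        candidates = []
        if isinstance(node, ast.Call):
            # doc_md passed as a keyword argument, e.g. DAG(..., doc_md="...")
            candidates = [kw.value for kw in node.keywords if kw.arg == "doc_md"]
        elif isinstance(node, ast.Assign):
            # doc_md assigned directly, e.g. dag.doc_md = "..."
            for target in node.targets:
                name = getattr(target, "attr", None) or getattr(target, "id", None)
                if name == "doc_md":
                    candidates.append(node.value)
        for value in candidates:
            if isinstance(value, ast.Constant) and isinstance(value.value, str):
                if any(marker in value.value for marker in TEMPLATE_MARKERS):
                    hits.append(value.lineno)
    return sorted(hits)


sample = """
from airflow import DAG
safe = DAG('safe', doc_md='# Plain Markdown')
risky = DAG('risky', doc_md='{{ 7 * 7 }}')
"""
print(find_templated_doc_md(sample))
```

Since the source is only parsed, the scan can run on untrusted DAG files without Airflow installed and without triggering any payload they contain.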