DAG with AthenaOperator not producing OpenLineage DAG terminal event #40598

Open
2 tasks done
kacpermuda opened this issue Jul 4, 2024 · 0 comments
Labels
area:providers, kind:bug (this is clearly a bug), provider:amazon-aws (AWS/Amazon-related issues), provider:openlineage (AIP-53)

@kacpermuda
Contributor

Apache Airflow Provider(s)

amazon, openlineage

Versions of Apache Airflow Providers

Tested on both the main branch and the latest PyPI versions:
apache-airflow-providers-openlineage==1.8.0
apache-airflow-providers-amazon==8.25.0

Apache Airflow version

main branch and 2.9.2

Operating System

macOS 14.5

Deployment

Astronomer

Deployment details

astro-runtime:11.6.0 (tested with both astro dev and an actual deployment; they should behave the same, but I verified both anyway)

Also tested with breeze on the main branch.

What happened

When running a simple DAG with AthenaOperator, I sometimes do not receive the OpenLineage DAG complete event. The behaviour appears flaky, and I have not yet found the cause. I see nothing suspicious in the scheduler logs.

I'm not sure how the choice of task (AthenaOperator) could affect the DAG complete event, since that event is emitted from the scheduler, so it's possible that I did something wrong here. Let me know if you are able to reproduce this behaviour.

What you think should happen instead

We should always receive OpenLineage DAG complete events.
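
A minimal sketch of how the missing terminal event could be detected from a list of already-parsed OpenLineage events (for example, the Marquez events attached below). It assumes the standard OpenLineage fields `eventType`, `job.name`, and `run.runId`, and assumes that DAG-level events use the bare `dag_id` as the job name while task-level events use `<dag_id>.<task_id>`. The function name and the sample events are hypothetical, written by hand for illustration, and are not taken from the attached logs.

```python
from collections import defaultdict

# OpenLineage event types that close a run.
TERMINAL_TYPES = {"COMPLETE", "FAIL", "ABORT"}


def runs_missing_terminal_event(events, dag_id):
    """Return run IDs of DAG-level runs that have a START but no terminal event.

    `events` is an iterable of OpenLineage events already parsed into dicts.
    Assumption: DAG-level events carry job name == dag_id.
    """
    seen = defaultdict(set)  # runId -> set of eventTypes observed for that run
    for event in events:
        # Skip task-level events such as "athena.task".
        if event.get("job", {}).get("name") != dag_id:
            continue
        run_id = event.get("run", {}).get("runId")
        if run_id is None:
            continue
        seen[run_id].add(event.get("eventType"))
    return sorted(
        run_id
        for run_id, types in seen.items()
        if "START" in types and not (types & TERMINAL_TYPES)
    )


# Hand-written sample: run-1 never gets a DAG-level terminal event.
sample_events = [
    {"eventType": "START", "job": {"name": "athena"}, "run": {"runId": "run-1"}},
    {"eventType": "START", "job": {"name": "athena.task"}, "run": {"runId": "task-run-1"}},
    {"eventType": "COMPLETE", "job": {"name": "athena.task"}, "run": {"runId": "task-run-1"}},
    {"eventType": "START", "job": {"name": "athena"}, "run": {"runId": "run-2"}},
    {"eventType": "COMPLETE", "job": {"name": "athena"}, "run": {"runId": "run-2"}},
]

print(runs_missing_terminal_event(sample_events, "athena"))  # ['run-1']
```

An empty list would mean every DAG run that started also produced a terminal event, which is the expected behaviour.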

How to reproduce

I've run this DAG a couple of times and did not receive the DAG complete event.

On Astro, this is my Dockerfile:

FROM quay.io/astronomer/astro-runtime:11.6.0
ENV AIRFLOW__OPENLINEAGE__TRANSPORT='{"type": "console"}'
ENV AIRFLOW__LOGGING__LOGGING_LEVEL=DEBUG

Example DAG:

from airflow.providers.amazon.aws.operators.athena import AthenaOperator
from airflow import DAG
import datetime as dt
import random
import string

suffix = ''.join(random.choices(string.ascii_lowercase + string.digits, k=3))
table_name = f"t_{suffix}"

query = f"""
CREATE TABLE {table_name} AS
SELECT
    UPPER(name) AS x,
    age * 2 AS y
FROM
    workers_csv;
"""

with DAG(
    dag_id='athena',
    start_date=dt.datetime(2024, 5, 21),
    schedule=None
) as dag:
    task = AthenaOperator(
        task_id="task",
        aws_conn_id="aws",
        query=query,
        database="default",
        output_location="s3://<bucket-name>/results",
        deferrable=False,
        region="eu-central-1",
    )

Anything else

Scheduler logs from astro dev:
astro_scheduler_logs.txt

Scheduler and task logs from breeze:
breeze_scheduler_logs.txt
breeze_task_logs.log

Marquez events:
marquez_events

Are you willing to submit PR?

  • Yes I am willing to submit a PR!

Code of Conduct

@kacpermuda added the area:providers, kind:bug, and needs-triage labels on Jul 4, 2024
@dosubot (bot) added the provider:amazon-aws and provider:openlineage labels on Jul 4, 2024
@shahar1 removed the needs-triage label on Jul 13, 2024