
BeamRunJavaPipelineOperator fails without job_name set #40515

Closed
1 of 2 tasks
turb opened this issue Jul 1, 2024 · 5 comments · Fixed by #40645
Labels
area:providers, kind:bug, provider:google

Comments


turb commented Jul 1, 2024

Apache Airflow Provider(s)

apache-beam

Versions of Apache Airflow Providers

I can't find the exact provider versions in Google Cloud Composer.

Some info is in the GCP docs.

Apache Airflow version

2.7.3

Operating System

Google Cloud Composer

Deployment

Google Cloud Composer

Deployment details

Google Cloud Composer 2.8.2

What happened

BeamRunJavaPipelineOperator was running perfectly on Airflow 2.5.

After upgrading Google Cloud Composer with Airflow 2.6:

Failed to execute job xxx for task yyyy (Invalid job_name ({{task.task_id}}); the name must consist of only the characters [-a-z0-9], starting with a letter and ending with a letter or number ; 497143)

It seems {{task.task_id}} is not being resolved to the actual task ID.

Upgrading then to Airflow 2.7 gives the same result.

Workaround: hardcode "job_name": "ZZZ" in the dataflow_config property (see the sketch below).
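
For reference, a minimal sketch of a DAG applying this workaround; the project, bucket, jar path and class name below are placeholders, not values from the original report:

```python
import pendulum

from airflow import DAG
from airflow.providers.apache.beam.operators.beam import BeamRunJavaPipelineOperator

with DAG(
    dag_id="beam_java_job_name_workaround",
    start_date=pendulum.datetime(2024, 7, 1, tz="UTC"),
    schedule=None,
    catchup=False,
) as dag:
    run_pipeline = BeamRunJavaPipelineOperator(
        task_id="run_java_pipeline",
        jar="gs://my-bucket/pipelines/my-pipeline.jar",  # placeholder
        job_class="com.example.MyPipeline",              # placeholder
        runner="DataflowRunner",
        pipeline_options={"tempLocation": "gs://my-bucket/tmp/"},
        dataflow_config={
            # Hardcoding job_name here avoids the unrendered "{{task.task_id}}" default.
            "job_name": "zzz-pipeline",
            "project_id": "my-gcp-project",
            "location": "europe-west1",
        },
    )
```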

What you think should happen instead

job_name should be automatically resolved from task_id as it was in 2.5.

How to reproduce

Run any BeamRunJavaPipelineOperator on 2.6+ without setting job_name.

Anything else

I am not at all familiar with Airflow internals, so providing a PR would take a lot of time.

Are you willing to submit PR?

  • Yes I am willing to submit a PR!


turb added the area:providers, kind:bug and needs-triage labels on Jul 1, 2024

boring-cyborg bot commented Jul 1, 2024

Thanks for opening your first issue here! Be sure to follow the issue template! If you are willing to raise a PR to address this issue, please do so; there is no need to wait for approval.

RNHTTR added the provider:google and pending-response labels and removed the needs-triage label on Jul 1, 2024
Contributor

RNHTTR commented Jul 1, 2024

Is there a difference in the version of apache-airflow-providers-google before and after the upgrade?

Author

turb commented Jul 1, 2024

@RNHTTR From the documentation I understand it was upgraded from 10.12.0 to 10.17.0, but I am not sure, since I did not keep track of the precise version before upgrading (and GCP does not keep a log of it either).

Contributor

e-galan commented Jul 11, 2024

The parsing of the Dataflow job_name from the task_id parameter stopped working after #37934, where the parsing of the dataflow_config parameter was moved from the operator's __init__() method into execute(). Apparently this was done to resolve unit test failures caused by the use of Airflow's templating syntax in __init__().

I should also note that this feature, where the job name is copied from the task ID, is somewhat unique to the Apache Beam operator; normally the task ID and job name are kept separate.
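
For readers less familiar with Airflow internals: Jinja templates are only rendered for attributes listed in an operator's template_fields, and that rendering happens after __init__() but before execute(). The toy operator below (illustrative only, not the Beam provider's actual code) shows where in that lifecycle a templated default such as "{{ task.task_id }}" becomes usable:

```python
from airflow.models.baseoperator import BaseOperator


class ToyJobOperator(BaseOperator):
    # Only the fields listed here are rendered by Airflow's templating engine.
    template_fields = ("job_name",)

    def __init__(self, *, job_name: str = "{{ task.task_id }}", **kwargs):
        super().__init__(**kwargs)
        # At this point self.job_name is still the literal template string.
        self.job_name = job_name

    def execute(self, context):
        # By now Airflow has rendered template_fields, so self.job_name holds
        # the actual task_id rather than the raw "{{ task.task_id }}" string.
        self.log.info("Effective job name: %s", self.job_name)
```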

Contributor

e-galan commented Jul 11, 2024

Submitted #40645. It should resolve the issue, @turb.
