Send important executor logs to task logs #40468

Open · wants to merge 13 commits into main

Conversation

@vincbeck (Contributor) commented Jun 27, 2024

If the executor fails to start a task, the user sees no logs in the UI because the task never started. This PR leverages the TaskContextLogger implemented in #32646 to forward the important error messages to the task logs when an executor fails to execute a task.

cc @o-nikolas
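
For illustration, here is a minimal sketch of the forwarding pattern (hedged: `TaskContextLogger` comes from #32646, but the constructor argument, the `ti=` keyword, and the `fail_task` helper shown here are assumptions for this sketch, not the exact diff in this PR):

```python
# A minimal sketch, assuming the TaskContextLogger API from #32646.
# The component_name argument, the ti= keyword, and fail_task() are
# illustrative assumptions, not the exact code in this PR.
from airflow.executors.base_executor import BaseExecutor
from airflow.utils.log.task_context_logger import TaskContextLogger


class MyExecutor(BaseExecutor):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Messages sent through this logger are appended to the task's own
        # log, so they appear in the UI even if the task never started.
        self.task_context_logger = TaskContextLogger(component_name="MyExecutor")

    def fail_task(self, ti, reason: str) -> None:
        # Keep the executor-level log line, and forward the same message
        # to the task log so the user can see why the task never ran.
        self.log.error("Executor failed to start task %s: %s", ti.key, reason)
        self.task_context_logger.error(
            "Executor failed to start task %s: %s", ti.key, reason, ti=ti
        )
```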


^ Add meaningful description above
Read the Pull Request Guidelines for more information.
In case of fundamental code changes, an Airflow Improvement Proposal (AIP) is needed.
In case of a new dependency, check compliance with the ASF 3rd Party License Policy.
In case of backwards incompatible changes please leave a note in a newsfragment file, named {pr_number}.significant.rst or {issue_number}.significant.rst, in newsfragments.

@o-nikolas (Contributor) left a comment

Left a few comments. Also what about the Batch executor? Do you plan on fast following with that after this PR?

Resolved review threads: airflow/executors/base_executor.py · airflow/utils/log/task_context_logger.py (outdated)
@vincbeck (Author) commented Jun 27, 2024

> Left a few comments. Also what about the Batch executor? Do you plan on fast following with that after this PR?

Yep, my plan is to start with one executor, gather feedback, and address it. Once the direction is settled, I'll create another PR for the BatchExecutor.

@ferruzzi (Contributor) commented

Had a look and I don't have anything extra to add; I like the direction this is going. I'd need to see some examples to have a real opinion on the question of one logger versus one per executor, but otherwise I think it looks good after the change to session creation that Niko mentioned.

```diff
-        self.log.error(
-            "could not queue task %s (still running after %d attempts)", key, attempt.total_tries
-        )
+        self.task_context_logger.error(
+            "could not queue task %s (still running after %d attempts)",
```
Contributor:

I think it's confusing for a user to see that a task couldn't be queued because it's currently running. Is there any way to make this log message more useful/intuitive?

Author (@vincbeck):

Do you have any suggestions?

Contributor:

No, but I've run into this when troubleshooting and it always confuses me 😅

Author (@vincbeck):

You're the perfect candidate to create a meaningful error message then :)

Author (@vincbeck):

I updated the error message; please let me know if that looks better to you.

Contributor:

What about...

Could not queue task %s. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#still-running-on-executor

And add a blurb in troubleshooting.rst explaining what the comment says (triggerer race condition or the task is soon going to be marked failed)?
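
In code, that suggestion would read roughly like this (a sketch: the `#still-running-on-executor` anchor does not exist yet and would only be valid once the matching blurb lands in troubleshooting.rst; the `ti=` keyword is assumed from the #32646 API):

```python
# Sketch of the suggested message; the docs anchor is hypothetical until
# the corresponding section is added to troubleshooting.rst.
self.task_context_logger.error(
    "Could not queue task %s. Learn more: "
    "https://airflow.apache.org/docs/apache-airflow/stable/"
    "troubleshooting.html#still-running-on-executor",
    key,
    ti=ti,
)
```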

Author (@vincbeck):

That works too. Thanks for the proposal!

@ferruzzi (Contributor) commented Jul 2, 2024

Sorry for the late response. I was thinking about this a bit, and it looks like you've settled on something, but an alternative just came to me. Take it or leave it, of course, but I'll suggest an option:

[Could not | Failed to] queue task %s after %d attempts; executor [reports | notes | claims | states] task [is | as] [currently | already | " "] running.

I'm still juggling the phrasing, but something along the lines of:

Failed to queue task %s after %d attempts; executor reports task is currently running.

Perhaps with the executor name/ID in there.
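
Putting that together with the existing arguments, the call might read as follows (a sketch; using `type(self).__name__` as the executor identifier is my assumption, not part of the quoted proposal, and the `ti=` keyword is assumed from the #32646 API):

```python
# Sketch of the proposed wording with an executor identifier included;
# type(self).__name__ and the ti= keyword are assumptions.
self.task_context_logger.error(
    "Failed to queue task %s after %d attempts; executor %s reports task "
    "is currently running.",
    key,
    attempt.total_tries,
    type(self).__name__,
    ti=ti,
)
```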

Author (@vincbeck):

I like this proposal. @RNHTTR, what do you think? TBH, I have nothing against writing a section in troubleshooting.rst about this special case, but since I have never encountered it, I don't feel confident describing it (I'm not sure how it happens). Or maybe you could describe it? That'd be helpful.

@RNHTTR (Contributor) commented Jul 3, 2024

> I don't feel confident describing it (I'm not sure how it happens)

This is my problem too, which is why I'm hoping we can come up with something more useful or just not surface this log at all. Usually when I encounter this, I mostly ignore it and look for something else that's meaningful.

In my opinion, "executor reports task is currently running" is still tricky for users. If I understand correctly, it's reporting the state of the task in, for example, Celery, which is different from the Airflow state of the task. I think surfacing this log as written will only confuse users more than if it didn't show up at all.

```diff
@@ -386,14 +385,16 @@ def attempt_task_runs(self):
                 )
                 self.pending_tasks.append(ecs_task)
             else:
-                self.log.error(
```
Contributor:

@syedahsn, you should have a review of this if you get a chance; the failure-reason handling has been modified here.
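
For context, the modified branch likely follows the same substitution pattern as in base_executor.py. A hedged sketch (the names `failure_reasons`, `task_key`, and `ti` are illustrative assumptions, not necessarily those in the real ECS executor):

```python
# Hedged sketch of the failure branch in attempt_task_runs() after this
# change; failure_reasons, task_key, and ti are assumed names.
if failure_reasons:
    # Previously only the executor log recorded why run_task failed:
    #     self.log.error("Unable to run task %s: %s", task_key, failure_reasons)
    # Forwarding to the task log makes the reason visible in the UI:
    self.task_context_logger.error(
        "ECS task for %s could not be started: %s", task_key, failure_reasons, ti=ti
    )
```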

Labels: area:Executors-core, area:logging, area:providers, provider:amazon-aws