Send important executor logs to task logs #40468
Left a few comments. Also what about the Batch executor? Do you plan on fast following with that after this PR?
Yep, my plan is to start with one executor, collect and address feedback, and then, once the direction is set, create another PR for
Had a look and I don't have anything extra to add; I like the direction this is going. I'd need to see some examples to have any real opinion on the question of one or one-per-executor, but otherwise I think it looks good after that one change to the session creation that Niko mentioned.
airflow/executors/base_executor.py
- self.log.error(
-     "could not queue task %s (still running after %d attempts)", key, attempt.total_tries
+ self.task_context_logger.error(
+     "could not queue task %s (still running after %d attempts)",
I think it's confusing for a user to see that a task couldn't be queued because it's currently running. Is there any way to make this log message more useful/intuitive?
Do you have any suggestions?
No, but I've run into this when troubleshooting and it always confuses me 😅
You're the perfect candidate to create a meaningful error message then :)
I updated the error message. Please let me know if that looks better to you.
What about...
Could not queue task %s. Learn more: https://airflow.apache.org/docs/apache-airflow/stable/troubleshooting.html#still-running-on-executor
And add a blurb in troubleshooting.rst explaining what the comment says (triggerer race condition or the task is soon going to be marked failed)?
That works too. Thanks for the proposal!
Sorry for the late response, I was thinking about this a bit and it looks like you've settled on something, but an alternative just came to me... take it or leave it, of course... but I'll suggest an option:
[Could not | Failed to] queue task %s after %d attempts; executor [reports | notes | claims | states] task [is | as] [currently | already | " "] running.
I'm still juggling the phrasing.. but something along the lines of
Failed to queue task %s after %d attempts; executor reports task is currently running.
perhaps with the executor name/id in there
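For reference, here is a rough sketch of how that wording would render through Python's standard `logging` placeholders. The logger name, the `ListHandler` helper, the task key, and the attempt count are all made up for illustration; none of them come from the PR.

```python
import logging

# Wording proposed in the comment above; %s and %d are filled in by logging.
MESSAGE = "Failed to queue task %s after %d attempts; executor reports task is currently running."


class ListHandler(logging.Handler):
    """Hypothetical helper that collects rendered messages in a list."""

    def __init__(self):
        super().__init__()
        self.messages = []

    def emit(self, record):
        # getMessage() applies the %-style arguments to the message template.
        self.messages.append(record.getMessage())


log = logging.getLogger("executor_message_demo")  # illustrative logger name
log.propagate = False  # keep the demo output out of the root logger
handler = ListHandler()
log.addHandler(handler)

task_key = "example_dag.example_task"  # hypothetical task key
total_tries = 3  # hypothetical attempt count
log.error(MESSAGE, task_key, total_tries)

rendered = handler.messages[0]
print(rendered)
```

Passing the values as arguments rather than pre-formatting the string keeps the message template constant, which makes it easier to search for in logs.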
I like this proposal. @RNHTTR, what do you think? TBH @RNHTTR, I have nothing against writing a section in troubleshooting.rst about this special use case, but since I have never encountered it, I don't feel confident describing it (I am not sure how it happens). Or maybe you could describe it, @RNHTTR? That'd be helpful.
"I don't feel confident describing it (I am not sure how it happens)"
This is my problem too, which is why I'm hoping we can come up with something more useful or just not surface this log. Usually when I encounter this, I mostly ignore it and look for something else that's meaningful.
In my opinion, "executor reports task is currently running" is still tricky for users. If I understand correctly, it's reporting the state of the task in, for example, Celery, which is different from the Airflow state of the task. I think surfacing this log as written will confuse users more than if it didn't show up at all.
@@ -386,14 +385,16 @@ def attempt_task_runs(self):
)
self.pending_tasks.append(ecs_task)
else:
self.log.error(
@syedahsn You should have a review of this if you get a chance. The failure reason handling has been modified here.
If the executor fails to start a task, the user will not see any logs in the UI because the task has not started. This PR leverages TaskContextLogger, implemented in #32646, to forward the important error messages to the task logs when an executor fails to execute a task.
cc @o-nikolas
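To illustrate the idea only, here is a minimal standalone sketch of forwarding an executor error to a per-task log. `TaskLogForwarder` is a hypothetical stand-in built on the standard library; it is not Airflow's `TaskContextLogger` and does not reproduce its signature or behavior. The in-memory buffers stand in for task log files.

```python
import io
import logging


class TaskLogForwarder:
    """Hypothetical stand-in for illustration; not Airflow's TaskContextLogger."""

    def __init__(self, component_name, call_site_logger):
        self.component_name = component_name
        self.call_site_logger = call_site_logger
        # Maps a task key to an in-memory buffer standing in for that task's log.
        self.task_logs = {}

    def error(self, msg, *args, task_key):
        # Keep the usual line in the component's (executor's) own log.
        self.call_site_logger.error(msg, *args)
        # Also append the rendered message to the task's log, so it would be
        # visible to the user even though the task never started.
        rendered = msg % args if args else msg
        buffer = self.task_logs.setdefault(task_key, io.StringIO())
        buffer.write(f"[{self.component_name}] ERROR - {rendered}\n")


executor_log = logging.getLogger("executor_demo")  # illustrative logger name
forwarder = TaskLogForwarder("Executor", executor_log)

key = "my_dag.my_task"  # hypothetical task key
forwarder.error("could not queue task %s", key, task_key=key)

task_log_text = forwarder.task_logs[key].getvalue()
print(task_log_text, end="")
```

The point of the sketch is the double write: the message still reaches the executor log, and a copy lands where the user looks for task output.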