
Update await_pod_completion and affected unit tests #40568

Closed (wants to merge 4 commits)

Conversation

@ferruzzi (Contributor) commented Jul 3, 2024

Follow-up to #40360

Only modifying one of the methods at the moment. If we like what I did here, I'll do the same for the others. Please triple check my unit test changes to make sure I didn't inadvertently break the underlying mocks.

For the tenacity part of the change, note that I did away with the idea of using exponential backoff from the previous attempt (for now??) and just re-implemented the existing sleep(2) as a tenacity retry to align it with the other methods in this module which were already converted to tenacity. We may go back to trying different retry periods later, but for now that part of the change is just for consistency.
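For context, the shape of that change is roughly the following (a minimal sketch, not the actual diff; the PodNotCompletedYetError marker exception and the standalone function signature are illustrative assumptions):

import tenacity


class PodNotCompletedYetError(Exception):
    """Hypothetical marker exception raised while the pod is still running, so tenacity keeps retrying."""


@tenacity.retry(
    retry=tenacity.retry_if_exception_type(PodNotCompletedYetError),
    wait=tenacity.wait_fixed(2),  # mirrors the old sleep(2) between polls
)
def await_pod_completion(pod_manager, pod):
    """Poll the pod and raise (to trigger a retry) until it reaches a terminal phase."""
    remote_pod = pod_manager.read_pod(pod)
    if remote_pod.status.phase in ("Pending", "Running"):
        raise PodNotCompletedYetError()
    return remote_pod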

@dstandish and @potiuk : You both made good suggestions in the previous attempt at this, let me know what you think.

@potiuk (Member) commented Jul 3, 2024

I like it.

@dstandish (Contributor) left a comment

Personally I am not sure we need to go changing every while loop to tenacity.

But I don't really mind it either.

Re the main part of the change, changing to read_namespaced_pod_status, it's cool that it was not much trouble.

@@ -65,6 +65,10 @@ class PodLaunchFailedException(AirflowException):
"""When pod launching fails in KubernetesPodOperator."""
@ferruzzi (Contributor, Author) commented on this diff:

[merge blocker] If we like how I implemented this, I'll go ahead and apply the same changes to the other waiters. Please do not merge this.

@ferruzzi (Contributor, Author) commented Jul 3, 2024

Personally I am not sure we need to go changing every while loop to tenacity.

I don't disagree. In this case, most of the file was already converted to tenacity; it looked to me like whoever did it either only did the easy ones or never got around to finishing. I also figured that if we do decide to implement the tiered backoff idea from the other PR, it would be easier to make that change. Also, I just irrationally dislike seeing hardcoded while True: sleep() loops for some reason. :P

@dstandish (Contributor)

I don't disagree. In this case, most of the file was already converted to tenacity; it looked to me like whoever did it either only did the easy ones or never got around to finishing. I also figured that if we do decide to implement the tiered backoff idea from the other PR, it would be easier to make that change. Also, I just irrationally dislike seeing hardcoded while True: sleep() loops for some reason. :P

OK, I think I understand my problem here better.

First, I'm with you on the tiered backoff thing and such -- I see the value of tenacity for that kind of thing.

And with a strong caveat of "you do you" :) .....

Re just never getting around to finishing conversion to tenacity... I think it's a little different. The existing uses are a little different in that they are retrying after something unexpected happens. E.g. a connection error or something on a specific network call. But the use here is more "waiting for something expected to happen". When you see that decorator there, it basically signals, "this method retries for these failures". But here it doesn't mean that. It looks like it is retrying after error, but really it's sort of just a mechanism to wait indefinitely.

And if you wanted to add retry-on-unexpected behavior, you'd need to stack another decorator. This, of course, is no big deal, but it sort of illustrates the two different types of usage.

In any case, I think I have a suggestion that may be good enough for everyone.

What I would suggest is, just move the tenacity part interior to the function.

Something like this

        for attempt in tenacity.Retrying(
            retry=tenacity.retry_if_exception_type(ResourceNotReadyException),
            wait=tenacity.wait_fixed(2),
        ):
            with attempt:
                remote_pod = self.read_pod(pod)
                if <pod not ready>:  # placeholder condition, e.g. still in the PENDING phase
                    self.log.info("Pod %s has phase %s", pod.metadata.name, remote_pod.status.phase)
                    raise ResourceNotReadyException

Then we don't need to decorate the whole function with tenacity, so it's clearer that it's a single synchronous wait process that is not actually retried.

Since we're on the topic, I just want to point out that there is some ugliness in our current usage of tenacity, and I think we may have overdone it a bit. For example, I see we have it added on get_container_names. But from the looks of it, the source of retries in there is read_pod, and that itself already has tenacity on it! So instead of retrying 3 times (as the params indicate) when read_pod is having trouble, it will retry 9 times :)
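
To make the multiplication concrete, here is a tiny self-contained illustration (the function names are stand-ins, not the provider's actual methods):

import tenacity

calls = 0


@tenacity.retry(stop=tenacity.stop_after_attempt(3), reraise=True)
def read_pod():
    global calls
    calls += 1
    raise ConnectionError("API unreachable")  # always fails, so every attempt is exhausted


@tenacity.retry(stop=tenacity.stop_after_attempt(3), reraise=True)
def get_container_names():
    return read_pod()


try:
    get_container_names()
except ConnectionError:
    print(calls)  # prints 9: three outer attempts, each making three inner attempts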

Again, you do you - and happy holiday, even though you are Canadian.

@ferruzzi (Contributor, Author) commented Jul 3, 2024

Alright, we may need to have this conversation at a later date, but I'm going to drop that out of scope for this PR. For now I'll leave the loops how they are and just try scoping the get_{foo} calls down to get_{foo}_status calls. These should be two different PRs anyway, I suppose. I just fell into the "while you're in there..." trap.

@ferruzzi force-pushed the ferruzzi/k8s/fix-get-state branch from cee2ab3 to 2b40646 on July 3, 2024 at 23:27

@hussein-awala (Member) left a comment

I don't understand the goal of this PR in its current state (replacing self.read_pod, which supports caching, with a simple K8S client call).

I'm requesting changes to avoid merging it accidentally.

@ferruzzi (Contributor, Author) commented Jul 4, 2024

It was discussed pretty heavily in the previous PR (linked at the top of the description) but basically a user complained about CPU spikes while waiting for the pod and container to start. I thought it was the retry loop, but Jarek suggested that it was more likely the parsing of the api call, and suggested using get_state instead of read_pod when all we actually want is the state anyway.

@dstandish (Contributor) commented Jul 4, 2024

I don't understand the goal of this PR in its current state (replacing self.read_pod, which supports caching, with a simple K8S client call).

I'm requesting changes to avoid merging it accidentally.

@hussein-awala look here #40360 (comment)

That is the impetus for this PR.

"why load the whole pod when you just need the status" is, I think, the idea.

But I'm curious -- can you tell us more about the caching you're referring to?

@hussein-awala (Member) commented Jul 4, 2024

"why load the whole pod when you just need the status" is, i think, the idea.

Unfortunately, in K8S, the pod status route returns the full pod manifest, so the result of read_namespaced_pod_status is the same as read_namespaced_pod with the same performance/cost:

from kubernetes import client as k8s, config

config.load_kube_config(context="minikube")
client = k8s.ApiClient()
api = k8s.CoreV1Api(client)

namespace = "some namespace"
pod_name = "some pod"

# Both calls return a fully populated V1Pod object.
status = api.read_namespaced_pod_status(name=pod_name, namespace=namespace)
pod = api.read_namespaced_pod(name=pod_name, namespace=namespace)
print(pod == status)

The output of this script is:

True

But I'm curious -- can you tell us more about the caching you're referring to?

We have a caching mechanism in read_pod method:

    def read_pod(self):
        _now = utcnow()
        if (
            self.read_pod_cache is None
            or self.last_read_pod_at + timedelta(seconds=self.read_pod_cache_timeout) < _now
        ):
            self.read_pod_cache = self.pod_manager.read_pod(self.pod)
            self.last_read_pod_at = _now
        return self.read_pod_cache

I didn't dive into the code to check whether we use/need it, but between two methods that return the same result, where one of them has a caching mechanism, I prefer the one with caching.

@potiuk (Member) commented Jul 5, 2024

Unfortunately, in K8S, the pod status route returns the full pod manifest, so the result of read_namespaced_pod_status is the same as read_namespaced_pod with the same performance/cost:

Well, as I wrote, I am not an expert in those APIs :). But that's interesting. The API URLs for the two calls are clearly different ("/status" is appended for the status one), and both indeed return a V1Pod object. But I thought there must be a reason why there are two different calls - that's my intuition (yes, not based on documentation, because the documentation is pretty enigmatic):

  • read the specified Pod

vs.

  • read status of the specified Pod

And while in simple cases it might be the same, I was wondering whether there are cases where the Pod returned by the first method is much bigger and contains more information than the Pod returned by the status call.

I could not find any docs or discussions about it, though :)

@ferruzzi (Contributor, Author) commented Jul 5, 2024

Unfortunately, in K8S, the pod status route returns the full pod manifest, so the result of read_namespaced_pod_status is the same as read_namespaced_pod with the same performance/cost:

I actually didn't notice that. I saw the return type was the same, but like Jarek mentioned, I didn't see much detail on it, so I assumed the /status endpoint just didn't "fill in the whole form" - because why would they have two endpoints that return the same thing?

Alright, given this new knowledge...... I'm not sure where else to go with this. Do any of you have a suggestion to reduce CPU usage while stuck in the await_pod_start and await_container_start loops?

@dstandish (Contributor)

Alright... I might have something for you @ferruzzi

We can skip the V1Pod deserialization process (response -> json -> V1Pod) by adding the arg _preload_content=False

Then, we can just convert to json and get the pod status with data["status"]["phase"]

In my testing this results in a 40x speedup, so presumably it is much less CPU intensive.

k8s_api.txt

You can test it yourself by running it locally with the arg pod vs raw. You'll need to update the hardcoded pod name.

But yeah, my results were consistently 20 seconds vs 0.5 seconds (over 10,000 operations).
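
For reference, the raw-response approach would look roughly like this (a sketch assuming the standard kubernetes Python client; the context, pod, and namespace names are placeholders):

import json

from kubernetes import client as k8s, config

config.load_kube_config(context="minikube")
api = k8s.CoreV1Api()

# _preload_content=False returns the raw HTTP response instead of a deserialized V1Pod,
# skipping the expensive response -> dict -> V1Pod conversion.
response = api.read_namespaced_pod(
    name="some-pod", namespace="some-namespace", _preload_content=False
)
phase = json.loads(response.data)["status"]["phase"]
print(phase)  # e.g. "Running"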

@dstandish (Contributor) commented Jul 5, 2024

But yeah, @ferruzzi, re your comments on the first PR, I think it would be a good idea to try and repro the user's situation. You would ask what their setup is and how many tasks are concurrently waiting for pods to start -- I presume this is Celery? This would help you understand whether the issue is real, e.g. whether there isn't another noisy neighbor using lots of CPU. And then, having repro'd it, you could use the same setup to evaluate whether your enhancement helps, and if so, how much and under what scenarios -- whether it is this optimization or some other one, such as longer waits.

@dstandish (Contributor) commented Jul 5, 2024

@hussein-awala

I didn't dive into the code to check whether we use/need it, but between two methods that return the same result, where one of them has a caching mechanism, I prefer the one with caching.

The context here is looking for the pod status to change, so a cache would be unhelpful here. I am actually quite surprised to see that code. I don't think it makes a lot of sense. Pretty much whenever we call read_pod, we're doing it for a reason.

It looks like it was related to the pod logs consumer thing. We should not have such a cache on the method generally. Most calls should not use it.

@potiuk (Member) commented Jul 6, 2024

Then, we can just convert to json and get the pod status with data["status"]["phase"]
In my testing this results in a 40x speedup, so presumably it is much less CPU intensive.

Nice! Good one @dstandish !

The context here is looking for the pod status to change, so a cache would be unhelpful here. I am actually quite surprised to see that code. I don't think it makes a lot of sense. Pretty much whenever we call read_pod, we're doing it for a reason.

I quite agree here.

@hussein-awala (Member)

We can skip the V1Pod deserialization process (response -> json -> V1Pod) by adding the arg _preload_content=False
Then, we can just convert to json and get the pod status with data["status"]["phase"]

Yes, that's a good idea.

@ferruzzi (Contributor, Author) commented Jul 8, 2024

we can just convert to json and get the pod status with data["status"]["phase"]

Nice! Good one @dstandish !

Yes, that's a good idea.

Cool, looks like we have a way forward then. I'll look into implementing it. Do we want to reuse this PR or shall I open a new one when I have that done?

@dstandish (Contributor)

Your call

@ferruzzi closed this on Jul 10, 2024
Labels: area:providers, provider:cncf-kubernetes

5 participants