Get rid of force pod deletion by bart0sh · Pull Request #26 · pohly/kubernetes

bart0sh · 2022-09-16T13:28:14Z

DRA e2e test used force pod deletion for test and plugin pods.
This should not be a default way of deleting pods, but rather an exception.

It is useful to check new code with a stricter configuration because we want it to be of higher quality. Code reviews also become easier when reviewers don't need to point out inefficient code manually. What exactly should be enabled is up for debate. The current config uses the golangci-lint defaults plus everything that is enabled explicitly by the normal .golangci.yaml, just to be on the safe side. Error checking in _test.go files gets disabled because that seems to be fairly common.

When "make verify" is invoked in pull job, it will now also run golangci-lint with the strict configuration and write an $ARTIFACTS/golangci-lint-githubactions.log file with GitHub actions annotations. How to get those published for the GitHub PR is open. When "make verify" is invoked manually or in any other job, the stricter check is skipped. That works because "PR_NUMBER" is only set for pre-merge jobs (https://github.com/kubernetes/test-infra/blob/master/prow/jobs.md#job-environment-variables).

Because the script now explicitly selects the configuration file, the files no longer have to be in the root directory. Having them in hack without the leading dot is better because they then have the same owners as the script and are more visible. The downside is that manual invocations of golangci-lint without the parameter no longer work.

There's no good reason why error checking should be skipped in tests: if a function fails, the test should stop immediately. If errors are acceptable, then ignoring the error should be explicit and visible via "_ = someFunction()".

e2e.test usually writes stdout and stderr to the same destination (file or console). Therefore it makes no sense to print log messages with severity error or higher to stdout and stderr (the default) because they would just be shown twice. Therefore the e2e.test binary overrides the klog default before flag parsing. Command line flags can still change this.

Contextual logging cannot be enabled manually because there is no feature gate flag. Enabling the feature unconditionally: - should be low risk for E2E testing - if it fails, we want to know - is useful to get better log output from code which already supports it

This complements the other operations and will be used by some future tests.

This is useful for running a driver on a subset of all ready nodes: - use e2enode.GetBoundedReadySchedulableNodes with a suitable maximum number of nodes to determine how much nodes are available for a test - define pod anti-affinity in the PodTemplate: affinity: podAntiAffinity: requiredDuringSchedulingIgnoredDuringExecution: - labelSelector: matchLabels: app.kubernetes.io/instance: xxxxxxx topologyKey: kubernetes.io/hostname - set the ReplicaSetSpec.Replicas value to the number of nodes

It's new code, so we should follow the latest recommendations for logging.

ResourceClaimSpec and its associated code are needed for the core API and have to be defined there to avoid import cycles. Therefore the other types also get defined there.

Created with "make generated_files update".

This is needed for "kubectl get". It depends on the generated swagger docs.

This is similar to the support code for generic ephemeral inline volumes. Differences: - to avoid stuttering, the functions are just resourceclaim.Name and resourceclaim.IsForPod - resourceclaim.Name returns the right name for both cases (template and reference), which will simplify some code Other helper functions check some simple status conditions.

The controller uses the exact same logic as the generic ephemeral inline volume controller, just for inline ResourceClaimTemplate -> ResourceClaim.

The plugin handles the interaction with ResourceClaims that are referenced by a Pod.

This is needed by the API server, but can also be used by driver developers to validate user input early.

Dependencies need to be updated to use github.com/container-orchestrated-devices/container-device-interface. github.com/fsnotify/fsnotify v1.5.1 and sigs.k8s.io/yaml v1.3.0 are needed.

This call was implemented incorrectly and not actually needed as DRA Manager doesn't checkpoint its cache due to an idempotent nature of the DRA Plugin APIs. This call can also race with other parts of the DRA manager implementation. This should fix known issue: NodeUnprepareResource was not called for the resources that were removed from the DRA manager cache by the UpdatePodResources.

As we delete pod UID from DRA manager cache before checking how many is left we should call NodeUnprepareResource only when there are no pod UIDs left in the case for a certain resource claim.

Moved addPodUID and removePodUID APIs to the resource struct as they're always called for the particular resource. This should also fix incorrectly implemented deletePodUIs API that was removing pod UID from all claims referencing it.

These helper packages implement the parts of a dynamic resource allocation driver that are generic and can be used by the in-tree test driver as well as out-of-tree vendor drivers. Functional options make it possible to treat some parameters as optional (logger) and extend the API later on.

The driver can be used manually against a cluster started with local-up-cluster.sh and is also used for E2E testing. Because the tests proxy connections from the nodes into the e2e.test binary and create/delete files via the equivalent of "kubectl exec dd/rm", they can be run against arbitrary clusters. Each test gets its own driver instance and resource class, therefore they can run in parallel.

Used sets API to store UIDs of pods referencing resource claims in the DRA Manager cache.

In contrast to ginkgo.AfterEach, ginkgo.DeferCleanup runs the callback in first-in-last-out order. Using it makes the following test code work as expected: f := framework.NewDefaultFramework("some test") ginkgo.AfterEach(func() { // do something with f.ClientSet }) Previously, f.ClientSet was already set to nil by the framework's cleanup code.

By using DeferCleanup we can get rid of artificial nesting because it runs the callbacks in first-in-last-out order. While at it, tests that aren't specifically targeting kubelet are defined for a "cluster" context where they use more than one node, if available. This is a more realistic scenario.

kubelet isn't ready for it yet.

ginkgo.By can be used to record how long a callback runs. This particular feature is not used much (or at all?) in Kubernetes and we don't have tooling for analyzing the measurements. What would be more important is text output. framework.By provides that: - reminders that a step is still running at regular intervals, configurable with --slow-step-threshold - the total duration for slow steps Example: STEP: delete pods and claims 08/24/22 11:55:19.676 STEP: deleting *v1.Pod dra-5514/tester-1 08/24/22 11:55:19.679 STEP: deleting *v1.ResourceClaim dra-5514/tester-1-my-inline-claim 08/24/22 11:55:19.686 STEP still running: delete pods and claims 08/24/22 11:55:29.677 STEP still running: delete pods and claims 08/24/22 11:55:39.676 STEP still running: delete pods and claims 08/24/22 11:55:49.677 <additional output when the objects finally get removed> STEP duration: delete pods and claims 33.253509791s

We don't want to document in detail how to use local-up-cluster.sh, only show the specific options that are needed for DRA. Figuring out which other options may be needed is left as exercise to the reader... Paths for files were stale.

gomega.Eventually with a function that takes a gomega.Gomega instance can be used to poll for certain conditions. The advantage over custom code is that gomega assertions can be used. However, gomega as it stands today does not report why the polling continues. This is annoying when running a test interactively: one has to wait for the timeout before additional information is shown in the failure message. SucceedEventually is a replacement for gomega.Eventually which is tailored towards this use case and adds intermediate output. It's best combined with the By replacement. Example with some artificial failure: STEP: wait for plugin registration 08/24/22 14:17:23.488 ... STEP still running: wait for plugin registration 08/24/22 14:17:33.489 STEP still running: wait for plugin registration 08/24/22 14:17:43.489 STEP still running: wait for plugin registration 08/24/22 14:17:53.489 Aug 24 14:17:53.509: INFO: wait for plugin registration xxx Expected <bool>: false to equal <bool>: true STEP still running: wait for plugin registration 08/24/22 14:18:03.489 STEP still running: wait for plugin registration 08/24/22 14:18:13.489 STEP still running: wait for plugin registration 08/24/22 14:18:23.489 Aug 24 14:18:23.489: FAIL: Timed out after 60.000s. wait for plugin registration Expected <string>: xxx Expected <bool>: false to equal <bool>: true to be empty Full Stack Trace k8s.io/kubernetes/test/e2e/dra.(*Driver).SetUp.func7() /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/deploy.go:195 +0xda k8s.io/kubernetes/test/e2e/framework.By({0x702cfda, 0x1c}, 0xc0029d3df8, {0x0, 0x0, 0x6fbc0cb?}) /nvme/gopath/src/k8s.io/kubernetes/test/e2e/framework/log.go:115 +0x5ea k8s.io/kubernetes/test/e2e/dra.(*Driver).SetUp(0xc0006af700) /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/deploy.go:194 +0xa67 k8s.io/kubernetes/test/e2e/dra.NewDriver.func1() /nvme/gopath/src/k8s.io/kubernetes/test/e2e/dra/deploy.go:67 +0x26

If pod deletion hangs, then we want to see that. Otherwise we only get told that claims aren't getting deleted, but that then isn't the root cause.

By letting the test pods run to completion right away we save time because each process starts and stops as quickly as possible.

Having all processes exit immediately is not realistic and may hide issues that occur for longer-running processes. Another solution with the same effect (fast test execution) is to let kubelet kill each process quickly.

Moved call of the NodePrepareResources from the admission phase to the container configuration phase. This should enable retrying NodePrepareResources if DRA plugin is not ready or previous call of the NodePrepareResources was unsuccessful.

It's not decided yet whether we will implement Topology support for DRA or not. Removing toppology-related code will help to avoid wrong impression that DRA is used as a hint provider for the Topology Manager.

DRA e2e test used force pod deletion for test and plugin pods. This should not be a default way of deleting pods, but rather an exception.

pohly · 2022-09-16T14:35:32Z

The commit message is a bit misleading: "used force pod deletion for test and plugin pods."

The delete itself was not using force-deletion. That would have been a DeleteOption with explicit zero as grace period.

It was a bit more indirect: because of the termination grace period being zero, Kubernetes concluded that deletion should also be immediate. It's worth calling that out in the code comment and commit message. A link to https://github.com/kubernetes/kubernetes/blob/0f582f7c3f504e807550310d00f130cb5c18c0c3/pkg/registry/core/pod/strategy.go#L151-L171 would also be useful, for future reference.

pohly · 2022-09-16T14:36:39Z

FWIW, comments in the code are more important than commit message at the moment. Eventually all "e2e dra" commits will be squashed into one, with a commit message that won't capture all details on how the test came into existence.

bart0sh · 2022-09-16T17:52:08Z

The delete itself with a grace period 0 is a force deletion in my understanding.
k8s users are also aware of that I think as kubectl doesn't allow to set grace period 0 without using --force.

bart0sh · 2022-09-16T17:56:06Z

Referirng to https://github.com/kubernetes/kubernetes/blob/0f582f7c3f504e807550310d00f130cb5c18c0c3/pkg/registry/core/pod/strategy.go#L151-L171 would make it harder to understand in my opinion. At least it did it for me until I read this user-friendly explanation: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced.

pohly · 2022-09-16T18:53:43Z

The delete itself with a grace period 0 is a force deletion in my understanding.

DeleteOptions.GracePeriodSeconds is a pointer, therefore DeleteOptions{} is not setting GracePeriodSeconds to zero. A client has to explicitly choose force deletion, and that was not done here.

pohly · 2022-09-16T18:55:06Z

Referirng to https://github.com/kubernetes/kubernetes/blob/0f582f7c3f504e807550310d00f130cb5c18c0c3/pkg/registry/core/pod/strategy.go#L151-L171 would make it harder to understand in my opinion. At least it did it for me until I read this user-friendly explanation: https://kubernetes.io/docs/concepts/workloads/pods/pod-lifecycle/#pod-termination-forced.

It's okay to refer to both, but the user-facing documentation does not cover the handling of Spec.TerminationGracePeriodSeconds, does it? That is only in the code.

bart0sh · 2022-09-16T20:16:38Z

My point was that we shouldn't give too many details on that. Everybody knows that grace period 0 for pod means its force deletion. The change is self-explanatory in my opinion - we don't want force deletion as a default way of deleting pods in our tests. I can change it to whatever you want if you insist, though.

pohly · 2022-09-19T07:05:34Z

Everybody knows that grace period 0 for pod means its force deletion.

But that's the point: the cleanup code as written at the moment doesn't set grace period 0. It's completely unexpected that it force-deleted pods. It only became a force-delete because of some setting elsewhere, and that needs to be explained. Otherwise it remains unclear why DeleteOptions.GracePeriodSeconds gets set.

pohly · 2022-09-19T07:06:30Z

A better solution might be to create pods with a 1 second deletion period and not do anything special in the cleanup code.

bart0sh · 2022-09-19T12:20:32Z

ok, closing then

pohly and others added 30 commits September 1, 2022 08:39

hack: re-enable errcheck also for tests

32563bc

There's no good reason why error checking should be skipped in tests: if a function fails, the test should stop immediately. If errors are acceptable, then ignoring the error should be explicit and visible via "_ = someFunction()".

golangci: update logcheck original-url

490b73d

e2e storage: add CreateFile to pod IO operations

6fe90b1

This complements the other operations and will be used by some future tests.

logcheck.conf: ensure that dra uses contextual logging

240ab4f

It's new code, so we should follow the latest recommendations for logging.

kube features: add DynamicResourceAllocation

f96aed7

api: add dynamic resource allocation API

a3cf01f

ResourceClaimSpec and its associated code are needed for the core API and have to be defined there to avoid import cycles. Therefore the other types also get defined there.

api: generated files for dynamic resource allocation

3e5f930

Created with "make generated_files update".

api: implement printers for dynamic resource allocation

ef0a936

This is needed for "kubectl get". It depends on the generated swagger docs.

kube-controller-manager: clone resource controller from volume/ephemeral

1b117f7

kube-controller-manager: add ResourceClaim controller

a5a8453

The controller uses the exact same logic as the generic ephemeral inline volume controller, just for inline ResourceClaimTemplate -> ResourceClaim.

scheduler: add dynamic resource allocation plugin

4025f2c

The plugin handles the interaction with ResourceClaims that are referenced by a Pod.

component-helpers: add validation code

8896dc4

This is needed by the API server, but can also be used by driver developers to validate user input early.

kubelet: add support for dynamic resource allocation

b7e799c

Dependencies need to be updated to use github.com/container-orchestrated-devices/container-device-interface. github.com/fsnotify/fsnotify v1.5.1 and sigs.k8s.io/yaml v1.3.0 are needed.

kubelet: fix refcounting mistake

bb5b7a7

As we delete pod UID from DRA manager cache before checking how many is left we should call NodeUnprepareResource only when there are no pod UIDs left in the case for a certain resource claim.

kubelet: move add/removePodUID APIs down the stack

1d2894f

Moved addPodUID and removePodUID APIs to the resource struct as they're always called for the particular resource. This should also fix incorrectly implemented deletePodUIs API that was removing pod UID from all claims referencing it.

kubelet: use sets instead of maps

38c34c4

Used sets API to store UIDs of pods referencing resource claims in the DRA Manager cache.

kubelet: simplified range expressions

2ce3721

kubelet: remove DeviceManager leftovers

01cd7f2

e2e: add tests for external resource claims

b00f1b7

e2e: test inline claim used by multiple containers

4a2651d

pohly and others added 16 commits September 1, 2022 08:51

e2e: disable retry test

1cf57f6

kubelet isn't ready for it yet.

e2e: show progress and duration of long running steps

f6e2af4

e2e: fix and simplify test driver documentation

90263ce

We don't want to document in detail how to use local-up-cluster.sh, only show the specific options that are needed for DRA. Figuring out which other options may be needed is left as exercise to the reader... Paths for files were stale.

e2e: print intermediate failure reports while polling

2b12710

e2e dra: assert that all test pods are gone

c4fa658

If pod deletion hangs, then we want to see that. Otherwise we only get told that claims aren't getting deleted, but that then isn't the root cause.

e2e dra: speed up test execution

e1ab48a

By letting the test pods run to completion right away we save time because each process starts and stops as quickly as possible.

e2e dra: speed up test execution, II

0ff6114

Having all processes exit immediately is not realistic and may hide issues that occur for longer-running processes. Another solution with the same effect (fast test execution) is to let kubelet kill each process quickly.

kubelet: get rid of DRA Topology-related code

a06a115

It's not decided yet whether we will implement Topology support for DRA or not. Removing toppology-related code will help to avoid wrong impression that DRA is used as a hint provider for the Topology Manager.

kubelet: improve plugin store

eb9bbf7

kubelet: remove unused DRA definitions

ced5c3f

e2e dra: delete pods gracefully

67a1752

DRA e2e test used force pod deletion for test and plugin pods. This should not be a default way of deleting pods, but rather an exception.

bart0sh force-pushed the PR015-dra-e2e-get-rid-of-force-deletion branch from 6df3c07 to 67a1752 Compare September 16, 2022 13:32

pohly force-pushed the dynamic-resource-allocation branch 2 times, most recently from dd47224 to df63205 Compare September 19, 2022 10:19

bart0sh closed this Sep 19, 2022

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Get rid of force pod deletion#26

Get rid of force pod deletion#26
bart0sh wants to merge 56 commits into
pohly:dynamic-resource-allocationfrom
bart0sh:PR015-dra-e2e-get-rid-of-force-deletion

bart0sh commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022 •

edited

Loading

Uh oh!

bart0sh commented Sep 16, 2022

Uh oh!

bart0sh commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022

Uh oh!

bart0sh commented Sep 16, 2022

Uh oh!

pohly commented Sep 19, 2022

Uh oh!

pohly commented Sep 19, 2022

Uh oh!

bart0sh commented Sep 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

bart0sh commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

bart0sh commented Sep 16, 2022

Uh oh!

bart0sh commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022

Uh oh!

pohly commented Sep 16, 2022

Uh oh!

bart0sh commented Sep 16, 2022

Uh oh!

pohly commented Sep 19, 2022

Uh oh!

pohly commented Sep 19, 2022

Uh oh!

bart0sh commented Sep 19, 2022

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

pohly commented Sep 16, 2022 •

edited

Loading