Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[BUG] Unexpected resource creation when the ZookeeperCluster is already deleted #410

Open
srteam2020 opened this issue Nov 7, 2021 · 4 comments · May be fixed by #530
Open

[BUG] Unexpected resource creation when the ZookeeperCluster is already deleted #410

srteam2020 opened this issue Nov 7, 2021 · 4 comments · May be fixed by #530

Comments

@srteam2020
Copy link
Contributor

srteam2020 commented Nov 7, 2021

Description

We find that the zookeeper operator, after deleting the PVC and removing the finalizer for a ZookeeperCluster CR with a non-zero DeletionTimestamp (in reconcileFinalizers), will continue running the reconciliation logic if there is no error:

func (r *ReconcileZookeeperCluster) reconcileFinalizers(instance *zookeeperv1beta1.ZookeeperCluster) (err error) {
	...
	if instance.DeletionTimestamp.IsZero() {
		...
	} else {
		if utils.ContainsString(instance.ObjectMeta.Finalizers, utils.ZkFinalizer) {
			if err = r.cleanUpAllPVCs(instance); err != nil {
				return err
			}
			instance.ObjectMeta.Finalizers = utils.RemoveString(instance.ObjectMeta.Finalizers, utils.ZkFinalizer)
			if err = r.client.Update(context.TODO(), instance); err != nil {
				return err
			}
		}
	}
	return nil
}

This can lead to unexpected resource creation, because after removing the finalizer of the CR, the CR is going to be deleted. When the CR is deleted, the resources (e.g., sts, service, secret, etc.) that are owned by the CR will also be deleted. If the operator does not immediately return after removing the finalizer, the original reconcile logic will find these resources do not exist and try to create these resources. For example, in reconcileStatefulSet the operator will create the zookeeper statefulset. Such creation is unexpected and unnecessary, as they will be deleted again since the owner CR does not exist.

Importance

should-have

Location

reconcileFinalizers:
https://github.com/srteam2020/zookeeper-operator/blob/a99c93ba06fbece2c262f7744517c2e537491c29/pkg/controller/zookeepercluster/zookeepercluster_controller.go#L664

Suggestions for an improvement

Instead of creating the resources that are destined to be deleted immediately, we should make the operator return from reconcile if the pvc is deleted successfully and the finalizer is removed from the CR successfully, as no further actions should be taken if the CR is deleted.

We are willing to send a PR to help fix it.

@anishakj
Copy link
Contributor

anishakj commented Nov 9, 2021

@srteam2020 Could you please mention the steps of how you were able to reproduce the issue?

@srteam2020
Copy link
Contributor Author

@anishakj Thanks for your reply!
This is a transient issue and we do not have a way to 100% reliably reproduce it. However, we do observe it quite a few times when deploying zookeeper.
What we did is very simple: we just deleted the zookeepercluster. And from the operator log we can see:

{"level":"info","ts":1636472845.2814996,"logger":"controller_zookeepercluster","msg":"Deleting PVC","Request.Namespace":"default","Request.Name":"zookeeper-cluster","With Name":"data-zookeeper-cluster-0"}
{"level":"info","ts":1636472845.4435692,"logger":"controller_zookeepercluster","msg":"Creating a new Zookeeper Config Map","Request.Namespace":"default","Request.Name":"zookeeper-cluster","ConfigMap.Namespace":"default","ConfigMap.Name":"zookeeper-cluster-configmap"}
{"level":"info","ts":1636472845.4453032,"logger":"controller_zookeepercluster","msg":"Creating a new Zookeeper StatefulSet","Request.Namespace":"default","Request.Name":"zookeeper-cluster","StatefulSet.Namespace":"default","StatefulSet.Name":"zookeeper-cluster"}
{"level":"info","ts":1636472845.448103,"logger":"controller_zookeepercluster","msg":"Creating new client service","Request.Namespace":"default","Request.Name":"zookeeper-cluster","Service.Namespace":"default","Service.Name":"zookeeper-cluster-client"}
{"level":"info","ts":1636472845.451503,"logger":"controller_zookeepercluster","msg":"Creating new headless service","Request.Namespace":"default","Request.Name":"zookeeper-cluster","Service.Namespace":"default","Service.Name":"zookeeper-cluster-headless"}
{"level":"info","ts":1636472845.4550529,"logger":"controller_zookeepercluster","msg":"Creating admin server service","Request.Namespace":"default","Request.Name":"zookeeper-cluster","Service.Namespace":"default","Service.Name":"zookeeper-cluster-admin-server"}
{"level":"info","ts":1636472845.559795,"logger":"controller_zookeepercluster","msg":"Creating new pod-disruption-budget","Request.Namespace":"default","Request.Name":"zookeeper-cluster","PodDisruptionBudget.Namespace":"default","PodDisruptionBudget.Name":"zookeeper-cluster"}

After the operator deleted the PVC (and removed the finalizer from CR), the CR is about to be deleted and there should be no further actions needed from the operator. However, we observed from the log that the operator actually created the zookeeper statefulset and service again even if the zookeepercluster CR is deleted.

I think the root cause is that the operator does not directly return from reconcile() after removing the finalizer. If the operator continues reconcile() after reconcileFinalizers, it will find the statefulset and the service are not found in the following reconcile methods like reconcileStatefulSet and reconcileClientService because these resources have been deleted together with the zookeepercluster CR by k8s. In that case, the operator creates these resources unexpectedly.

@anishakj
Copy link
Contributor

@srteam2020 Would you like to create a PR to fix this?

@srteam2020
Copy link
Contributor Author

Yes I will try to create a PR to fix it

lipov3cz3k added a commit to lipov3cz3k/pravega-zookeeper-operator that referenced this issue Feb 14, 2023
lipov3cz3k added a commit to lipov3cz3k/pravega-zookeeper-operator that referenced this issue Mar 7, 2023
Signed-off-by: Lipovsky, Tomas <[email protected]>
lipov3cz3k added a commit to lipov3cz3k/pravega-zookeeper-operator that referenced this issue Apr 3, 2023
Signed-off-by: Lipovsky, Tomas <[email protected]>
lipov3cz3k added a commit to lipov3cz3k/pravega-zookeeper-operator that referenced this issue Apr 3, 2023
Signed-off-by: Lipovsky, Tomas <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
2 participants