Cisco Bug: CSCvv55870 - DNAC:S&P:112C:Identitymgmt service continuously restarting on DR 6+1 scale cluster after 18 days run

Last Modified

Oct 16, 2020

Products (1)

  • Cisco DNA Center

Known Affected Releases

DNAC-Cyclops

Description (partial)

This is an internal class of errors, seen rarely and only over time in a DR solution cluster under scale load.
In this case it first appeared as an identitymgmt failure, but the logs indicate that one of the mongo PODs has gone into a name resolution lookup failure (meaning the k8s control plane has not fully wired up this POD).
You can check this with the following commands (taking mongodb-0 as an example):
kubectl describe pod -n maglev-system mongodb-0
kubectl get ep -n maglev-system external-mongodb-0 -o yaml
kubectl describe ep -n maglev-system external-mongodb-0

All of these will show the POD as Running but in the NotReady state.

This is service impacting and, for this release, requires manual intervention to heal the runtime.
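
The "Running but NotReady" check above can be scripted. The following is a minimal sketch, not a supported tool: the function names pod_running_but_not_ready and check_pod are hypothetical, and it assumes kubectl access to the cluster. It reads the pod phase and the status of the Ready condition through kubectl's JSONPath output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds when a pod is Running but its Ready
# condition is not "True".
#   $1 = pod phase, $2 = status of the Ready condition
pod_running_but_not_ready() {
  [ "$1" = "Running" ] && [ "$2" != "True" ]
}

# Hypothetical wrapper that queries the cluster (needs kubectl access).
#   $1 = namespace, $2 = pod name
check_pod() {
  local phase ready
  phase=$(kubectl get pod -n "$1" "$2" -o jsonpath='{.status.phase}')
  ready=$(kubectl get pod -n "$1" "$2" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}')
  if pod_running_but_not_ready "$phase" "$ready"; then
    echo "$2: Running but NotReady"
  else
    echo "$2: phase=$phase ready=$ready"
  fi
}
```

For example, check_pod maglev-system mongodb-0 would report on the pod used in the commands above.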

Symptom:
The external symptom may appear as an "identitymanager" failure, but the logs indicate that one of the mongo instances is in the "Not Ready" state (and hence "nslookup" fails).
One or more service PODs may be running while name lookup for them fails.
The following commands for the respective POD will show it in the "Not Ready" state:
kubectl describe pod -n maglev-system mongodb-0
kubectl get ep -n maglev-system external-mongodb-0 -o yaml
kubectl describe ep -n maglev-system external-mongodb-0
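
The endpoint output can also be checked mechanically. This is a sketch under an assumption: that the not-ready mongo instance is listed under the standard notReadyAddresses field of the Endpoints object, which is the usual Kubernetes behavior but is not confirmed by this bug report. The function name endpoints_have_not_ready is hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads Endpoints YAML on stdin and succeeds if the
# object contains a notReadyAddresses key. The pattern is anchored to the
# start of the line (allowing indentation and a list dash) so that other
# keys merely containing the word are not matched.
endpoints_have_not_ready() {
  grep -Eq '^[[:space:]-]*notReadyAddresses:'
}

# Example use on a live cluster (needs kubectl access):
#   if kubectl get ep -n maglev-system external-mongodb-0 -o yaml \
#        | endpoints_have_not_ready; then
#     echo "external-mongodb-0 has not-ready addresses"
#   fi
```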

If you check "journalctl -u kubelet", you will see this error trace:

Aug 30 02:56:30 maglev-master-172-21-21-10 kubelet[138328]: E0830 02:56:30.128184  138328 desired_state_of_world_populator.go:298] Error processing volume "mongodb-data" for pod "mongodb-0_maglev-system(465cdfdd-9b62-4062-9fe1-eb06354944c4)": error processing PVC "maglev-system"/"mongodb-data-mongodb-0": failed to fetch PVC maglev-system/mongodb-data-mongodb-0 from API server. err=Get https://127.0.0.1:9443/api/v1/namespaces/maglev-system/persistentvolumeclaims/mongodb-data-mongodb-0: read tcp 127.0.0.1:42958->127.0.0.1:9443: use of closed network connection

The signature to look for in the kubelet logs is "desired_state_of_world_populator.go" together with "error processing PVC" and "failed to fetch PVC".
Under this condition, the best recovery option is to restart the kubelet systemd service (systemctl restart kubelet).

(Re: 
https://github.com/kubernetes/kubernetes/issues/87615
https://github.com/golang/go/issues/39750
)
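
The log signature described above can be searched for with a small filter. This is only a sketch: the function name has_pvc_fetch_signature is hypothetical, and it requires all three strings to appear on the same log line, as they do in the trace shown above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads kubelet log text on stdin and succeeds if at
# least one line contains all three signature strings.
has_pvc_fetch_signature() {
  grep -F 'desired_state_of_world_populator.go' \
    | grep -F 'error processing PVC' \
    | grep -qF 'failed to fetch PVC'
}

# Example use on the affected node (needs read access to the kubelet journal):
#   if journalctl -u kubelet --no-pager | has_pvc_fetch_signature; then
#     echo "signature found; recovery option: systemctl restart kubelet"
#   fi
```

The restart is left as a printed hint rather than run automatically, since this condition is service impacting and the recovery is described as a manual intervention.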

Conditions:
The externally visible condition is that "nslookup" for a "Running" POD results in failure.
This is a rarely seen issue and is still open in the Kubernetes infra.