Cisco Bug: CSCvt45784 - HX : Resiliency health message shows empty Ip for zookeeper

Last Modified

Mar 25, 2020

Products (1)

  • Cisco HyperFlex HX-Series

Known Affected Releases

4.0(2a)

Description (partial)

Symptom:
In a 2N ROBO cluster, stopping the exhibitor service (stop exhibitor) on one of the nodes produces a resiliency health message that does not show the IP of the offending ZooKeeper node.

The cluster remains healthy.

The aux-zk IP is reachable on the node that took control of it:

root@SpringpathControllerIF9IT6YGSQ:~# echo srvr | nc 169.254.1.253 2181 <<<< aux-zk ip
Zookeeper version: 3.4.12-9a32a0b3d8a6492fa18ed92f6d2408bbfc408912, built on 07/01/2019 19:04 GMT
Latency min/avg/max: 0/0/7
Received: 2985
Sent: 3090
Connections: 4
Outstanding: 0
Zxid: 0x2f00000111
Mode: follower
Node count: 2920

root@SpringpathControllerIF9IT6YGSQ:~# echo srvr | nc 169.254.1.21 2181 <<< eth1 ip of node
Zookeeper version: 3.4.12-9a32a0b3d8a6492fa18ed92f6d2408bbfc408912, built on 07/01/2019 19:04 GMT
Latency min/avg/max: 0/0/26
Received: 261448
Sent: 261602
Connections: 23
Outstanding: 0
Zxid: 0x2f0000011e
Mode: leader
Node count: 2920
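
The two srvr responses above differ mainly in the Mode line (follower on the aux-zk IP, leader on the node's eth1 IP). As a minimal sketch, assuming the response has already been captured as text, the key/value lines can be parsed so the role of each endpoint is compared programmatically. The function name parse_srvr and the SAMPLE constant are hypothetical and not part of any HyperFlex or ZooKeeper tooling; the sample text is copied from the aux-zk output above.

```python
# Hypothetical helper: turn the "key: value" lines of a ZooKeeper `srvr`
# response into a dictionary. Only the text format shown above is assumed.
def parse_srvr(text):
    fields = {}
    for line in text.splitlines():
        # Split on the first colon only, so values that contain colons
        # (such as the build timestamp) stay intact.
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


# Response copied from the aux-zk IP (169.254.1.253) in the report above.
SAMPLE = """Zookeeper version: 3.4.12-9a32a0b3d8a6492fa18ed92f6d2408bbfc408912, built on 07/01/2019 19:04 GMT
Latency min/avg/max: 0/0/7
Received: 2985
Sent: 3090
Connections: 4
Outstanding: 0
Zxid: 0x2f00000111
Mode: follower
Node count: 2920
"""

if __name__ == "__main__":
    info = parse_srvr(SAMPLE)
    print(info["Mode"], info["Node count"])
```

A caller would feed in the text returned by each endpoint and compare the Mode values to confirm that one member is the leader and the other a follower.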


root@SpringpathController2YUYBDZ8CO:~# stcli cluster storage-summary --detail
address: 169.254.1.20
name: HX-15-Edge-2N-A
state: online
uptime: 0 days 3 hours 20 minutes 31 seconds
activeNodes: 2 of 2
compressionSavings: 58.91%
deduplicationSavings: 0.0%
freeCapacity: 2.4T
healingInfo:
    inProgress: False
resiliencyDetails: 
	current ensemble size:2
	# of caching failures before cluster goes to be critical and partially available:2
	minimum cache copies remaining:2
	minimum data copies available for some user data:2
	minimum metadata copies available for cluster metadata:2
	# of unavailable nodes:0
	# of nodes failure tolerable for cluster to be fully available:1
	health state reason:storage cluster is healthy.
	# of node failures before cluster goes to be crticial and partially available:2
	# of node failures before cluster goes into readonly:14
	# of persistent devices failures tolerable for cluster to be fully available:1
	# of node failures before cluster goes to enospace warn trying to move the existing data:na
	# of persistent devices failures before cluster goes to be critical and partially available:2
	# of persistent devices failures before cluster goes into readonly:na
	# of caching failures before cluster goes into readonly:na
	# of caching devices failures tolerable for cluster to be fully available:1
resiliencyInfo:
    messages:
        ----------------------------------------
        Storage cluster is healthy. 
        ----------------------------------------
        Storage cluster manager is not configured on <Missing IP>
        ----------------------------------------
    state: 2
    nodeFailuresTolerable: 0
    cachingDeviceFailuresTolerable: 1
    persistentDeviceFailuresTolerable: 1


On a three-node cluster, the same message includes the offending IP:

    messages:
        ----------------------------------------
        Storage cluster is healthy.
        ----------------------------------------
        storage cluster manager is not configured on 192.168.14.112
        ----------------------------------------
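
The difference between the two-node and three-node output comes down to whether a valid address follows "not configured on". A minimal sketch of a check for that condition is shown below, assuming the resiliencyInfo messages have already been extracted into a list of strings. The function name unresolved_manager_messages and the MARKER constant are hypothetical, and the check illustrates the symptom only; it is not a workaround or part of stcli.

```python
import ipaddress

# Phrase that precedes the node address in the resiliency message.
MARKER = "not configured on"


def unresolved_manager_messages(messages):
    """Return the messages where MARKER is not followed by a valid IP address."""
    flagged = []
    for msg in messages:
        _, sep, tail = msg.partition(MARKER)
        if not sep:
            continue  # message does not name a node at all
        try:
            ipaddress.ip_address(tail.strip())
        except ValueError:
            # Empty or placeholder text where the address should be.
            flagged.append(msg)
    return flagged


if __name__ == "__main__":
    two_node = [
        "Storage cluster is healthy.",
        "Storage cluster manager is not configured on ",
    ]
    three_node = [
        "Storage cluster is healthy.",
        "storage cluster manager is not configured on 192.168.14.112",
    ]
    print(unresolved_manager_messages(two_node))
    print(unresolved_manager_messages(three_node))
```

Validating with ipaddress.ip_address rather than a regular expression keeps the sketch short and accepts both IPv4 and IPv6 addresses.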

Conditions:
2N Edge Cluster
Server: HX220M5
HX: 4.0(2a)