Cisco Bug: CSCup21372 - service not responding after sending FPOAM ping to switch-id
Jan 29, 2017
- Cisco MDS 9000 NX-OS and SAN-OS Software
Known Affected Releases
Symptom:
Depending on the state of the issue, multiple symptoms are seen.

I. What is always seen:

1. Commands like "ping fabricpath switch-id ID" fail as follows:

---snip---
NEXUS# ping fabricpath switch-id 11
Service not responding
---snip---

II. Eventually, further issues will occur:

2. Commands like "show run" hang for 2 minutes and then display an error before completing, as follows:

---snip---
NEXUS# show run      <----- At this point the CLI hangs for 2 minutes
The following SAPs did not respond within the expected timeframe
Pending SAPS:1195
Printing Ascii configuration for remaining SAPs

!Command: show running-config
!Time: Tue Feb 13 04:46:03 2001

version 7.0(5)N1(1)
---snip---

4. Some commands like "show fabricpath oam loopback database" (also included in "show tech fabricpath oam") cause the switch to reload due to "fpoam hap reset":

---snip---
NEXUS# show fabricpath oam loopback database

Broadcast message from root (console) (Wed Dec 10 11:58:19 2014):

The system is going down for reboot NOW!
---snip---

After the switch comes back up, you will see the following:

---snip---
NEXUS# show system reset-reason
----- reset reason for Supervisor-module 1 (from Supervisor in slot 1) ---
1) At 461020 usecs after Sat Apr 18 08:47:22 2009
    Reason: Reset triggered due to HA policy of Reset
    Service: fpoam hap reset
    Version: 7.0(5)N1(1)
[...]
NEXUS# show cores
Module  Instance  Process-name     PID       Date(Year-Month-Day Time)
------  --------  ---------------  --------  -------------------------
1       1         fpoam            3703      2009-04-18 08:53:15
---snip---

Conditions:

1. RSSMem for the "fpoam" process is around 100 MB (the value in the output is in bytes):

---snip---
NEXUS# show processes memory | inc PID|fpoam
PID    MemAlloc  MemLimit   StkSize  RSSMem    LibMem    StackBase/Ptr      Process
10394  96501760  505227955  86016    99651584  33148928  7ff8ef30/7ff8ee14  fpoam
---snip---

2. One of the following has been done on the affected switch:

a. Excessive execution of the "ping fabricpath switch-id ID" command for a reachable FP switch ID (in the lab, around 17,500-17,600 executions of this command were required to trigger this issue).

b. Use of the "ping fabricpath switch-id ID sweep lower-limit upper-limit" command for a reachable FP switch ID, which eventually fails with the "request not sent" ('Q') code. This can be triggered, e.g., by using identical values for lower-limit and upper-limit, or with upper-limit <= lower-limit, as follows:

---snip---
NEXUS# ping fabricpath switch-id 11 sweep 1350 1350
Codes: '!' - success, 'Q' - request not sent, '.' - timeout,
[...]
Sender handle: 6
QQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQQ
[...]
---snip---

Note that such a command never stops on its own; even if aborted with <CTRL>+<C>, it continues to run in the background. Issuing the above command once on a freshly rebooted system causes the "fpoam" process to reach the threshold of 100 MB after around 2.5 hours. Also refer to CSCuo39797: https://tools.cisco.com/bugsearch/bug/CSCuo39797

c. Use of the "ping fabricpath switch-id ID count number" command with number being larger than around 17,500.

3. The switch is running NX-OS 7.x; in previous versions the show fabricpath command does not yet exist.
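
Condition 1 can be checked against captured CLI output. The following is a minimal Python sketch, not a Cisco tool: it assumes the column order shown in the "show processes memory" sample in this report, and the helper names (fpoam_rss_bytes, near_threshold) and the 5% margin are hypothetical choices made for illustration.

```python
# Hypothetical helper for illustration only; the column layout is taken from
# the "show processes memory | inc PID|fpoam" sample in this bug report.

# "Around 100 MB" per the Conditions section (the CLI reports bytes).
RSS_THRESHOLD_BYTES = 100 * 1000 * 1000


def fpoam_rss_bytes(output):
    """Return RSSMem in bytes for the fpoam process, or None if not found."""
    for line in output.splitlines():
        fields = line.split()
        # Expected columns:
        # PID MemAlloc MemLimit StkSize RSSMem LibMem StackBase/Ptr Process
        if len(fields) == 8 and fields[-1] == "fpoam" and fields[0].isdigit():
            return int(fields[4])
    return None


def near_threshold(rss_bytes, margin=0.05):
    """True if RSSMem is at or above the threshold minus an assumed margin."""
    return rss_bytes >= RSS_THRESHOLD_BYTES * (1 - margin)


sample = """\
PID MemAlloc MemLimit StkSize RSSMem LibMem StackBase/Ptr Process
10394 96501760 505227955 86016 99651584 33148928 7ff8ef30/7ff8ee14 fpoam
"""

rss = fpoam_rss_bytes(sample)
print(rss, near_threshold(rss))
```

With the sample output from this report, the parsed RSSMem value is 99651584 bytes, which falls within the assumed margin of the 100 MB threshold.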