Cisco Bug: CSCvv95834 - VSM: Diagnostic DPSwitchLoopback test failure does not trigger a PFM action (automatic reboot)
Oct 23, 2020
- Cisco ASR 9000 Series Aggregation Services Routers
Known Affected Releases
Symptom:
- The customer noticed traffic loss associated with a particular VSM card at the exact time this message was generated:
-- LC/0/1/CPU0:Sep 17 10:33:26.589 UTC: canb-server-lc: %PLATFORM-CANB_SERVER-6-OPERATION_INFO : send_drv_cbc_reset.2163, Info - CPU-ctrl reset CBC
- The problem cleared as soon as the affected VSM was removed from the chassis;
- As per the solution design, removing the card forces this specific traffic to move to another VSM card;
- The log message stopped when we removed the VSM, and service was restored.
- At that time, we noticed that this node was reporting a diagnostic failure associated with the affected VSM card:
------ [SNIP]
A9K-VSM-500 0/1/CPU0:
Overall diagnostic result: MINOR ERROR
Diagnostic level at card bootup: bypass
Test results: (. = Pass, F = Fail, U = Untested)
1 ) LcEobcHeartbeat -----------------> .
2 ) FIAScratchRegister --------------> .
3 ) CUPOLAScratchRegister -----------> .
4 ) NPULoopback ---------------------> .
5 ) DPSwitchLoopback ----------------> F
[SNIP] ------

Conditions:
- The error message "Info - CPU-ctrl reset CBC" seems to be related to the CBC heartbeat mechanism;
-- It looks like there was a heartbeat failure and, as a recovery action, the software tried to reset the CBC;
- Heartbeat messages are exchanged only when the card is up and running;
- The heartbeat does not go over the EOBC link. The packet path is as follows: Active RP (cbc_server) -> UART -> local CBC -> CAN bus -> remote CBC (LC)
- Apparently, the keepalives for both the CBC and DPSwitchLoopback failed at the same time.
- When the DPSwitchLoopback diagnostic fails, the failure is not registered with the PFM; therefore, the automatic reload is never executed;
-- Registering the failure would eliminate the need for manual intervention during a failure and avoid the blackholing of traffic.
- Besides the PFM registration code, this DDTS should also implement commands that run automatically before the recovery action is performed.
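Because the failed DPSwitchLoopback test does not raise a PFM action, the failure is visible only in the diagnostic result output quoted in the symptom. As an illustration, the sketch below scans text in that format and lists the tests marked F. It is a minimal example that assumes the line layout shown in the snippet; the function name `failed_tests` and the sample string are hypothetical and are not part of any Cisco tooling.

```python
import re

# Matches result lines such as "5 ) DPSwitchLoopback ----------------> F".
# Assumption: the layout is exactly as in the snippet quoted in this bug.
RESULT_LINE = re.compile(r"(\d+)\s*\)\s*(\S+)\s+-+>\s*([.FU])")


def failed_tests(diag_output):
    """Return the names of tests marked F (Fail) in the diagnostic output."""
    return [
        name
        for _number, name, status in RESULT_LINE.findall(diag_output)
        if status == "F"
    ]


# Sample text copied from the symptom section (hypothetical usage).
sample = """A9K-VSM-500 0/1/CPU0:
Overall diagnostic result: MINOR ERROR
Test results: (. = Pass, F = Fail, U = Untested)
1 ) LcEobcHeartbeat -----------------> .
2 ) FIAScratchRegister --------------> .
3 ) CUPOLAScratchRegister -----------> .
4 ) NPULoopback ---------------------> .
5 ) DPSwitchLoopback ----------------> F
"""

print(failed_tests(sample))  # ['DPSwitchLoopback']
```

A check like this only reports the failure; it does not replace the PFM registration requested in this bug.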