Cisco Bug: CSCvr91674 - [FMR8]: Device_test core seen on tor after injecting PCIE errors
Sep 16, 2020
- Cisco Nexus 9000 Series Switches
Known Affected Releases
13.2(9a) 14.1(2o) 14.2(2.208)
Symptom: When a switch hits a burst of PCIe, DRAM, or MCE errors, the device_test process sometimes crashes, which can cause the switch to reload.

Conditions: PCIe errors are commonly expected from PCIe devices such as the ASIC, FPGA, and NIC. These errors can be either correctable (soft) or fatal (hard). The PCIe Advanced Error Reporting (AER) driver handles all soft errors, and a kernel crash is enforced on hard errors, so occasional PCIe soft errors are not a concern. DRAM correctable errors (CE) are also common; most are 1-bit ECC errors, and the driver software corrects them. On a fatal DRAM error, the switch is forced to reboot through a kernel panic. Similarly, CPU machine check errors (MCE) can be either soft or hard: the kernel handles soft errors and invokes a kernel panic on a hard error.
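The expected handling described in the conditions can be summarized as a small lookup. The following Python is only an illustrative sketch of that policy, not code from the switch software: the names `Source`, `Severity`, `SOFT_HANDLER`, and `expected_action` are hypothetical, and the handler strings paraphrase the text above. It shows the intended behavior, not the device_test crash itself.

```python
from enum import Enum


class Source(Enum):
    PCIE = "pcie"
    DRAM = "dram"
    MCE = "mce"


class Severity(Enum):
    CORRECTABLE = "correctable"  # soft error
    FATAL = "fatal"              # hard error


# Hypothetical table: which component absorbs a soft error, per the bug text.
SOFT_HANDLER = {
    Source.PCIE: "PCIe AER driver",
    Source.DRAM: "driver software (ECC correction)",
    Source.MCE: "kernel",
}


def expected_action(source: Source, severity: Severity) -> str:
    """Return the handling the bug text describes for one error event."""
    if severity is Severity.FATAL:
        # Hard errors of any source end in a kernel panic and a reload.
        return "kernel panic, switch reloads"
    return f"handled by {SOFT_HANDLER[source]}, switch keeps running"


if __name__ == "__main__":
    for src in Source:
        for sev in Severity:
            print(f"{src.value:5} {sev.value:12} -> {expected_action(src, sev)}")
```

Under this policy, no correctable error should lead to a reload; the defect is that a burst of such errors can still crash device_test.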