Cisco Bug: CSCvs31160 - Gen3 N9K Switch (ACI) - Multiple Resets -> "Reset Requested due to Fatal System Error"
Mar 13, 2020
- Cisco Nexus 9000 Series Switches
Known Affected Releases
Symptom: A Gen3 N9K switch running in ACI mode reset twice in 16 days with the system reset-reason "Reset Requested due to Fatal System Error (3)". Before that, the ToR had reset with the same reset-reason 4 times between May 11 and Nov 14.
- No cores (the last one was for a python process on Sept 19)
- No clear system reset reason
- No kernel panic files for the Nov 14 reset
- Memory logs checked: no memory exhaustion
- DMESG: no traces, no seg faults, no OOM, no PCI interrupt errors, etc.
- Sysmgr shows the SUP being unstable but lists no apparent reason
- SSD lifetime looks okay
Older kernel panic files with dmesg traces might be present on the switch. Those files show the following entries, which result in bootflash being mounted in read-only mode:
- <6>[1337755.626210] Write(10): 2a 00 08 32<2>[1337755.626216] EXT4-fs error (device sda8) in ext4_reserve_inode_write:4915: Journal has aborted
- <snip>
- <2>[1337755.626994] EXT4-fs error (device sda8): ext4_journal_check_start:56: Detected aborted journal
- <2>[1337755.626997] EXT4-fs (sda8): Remounting filesystem read-only
Conditions: No specific conditions. The affected switch in this case had been running for over 6K hours with 48 reboots during its lifetime. No high I/O was observed for log files either. The issue does not appear to be related to SSD lifetime.
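When checking saved dmesg or kernel panic text for the same signature, a small offline filter can help. The following is a minimal sketch, not a Cisco tool: it assumes the log text has already been copied off the switch, and the function name `find_ext4_aborts` and the regular expressions are illustrative choices based only on the three entries quoted in this bug.

```python
import re

# One kernel log record: "<level>[timestamp] message". A record ends at the
# next "<level>[" marker or at the end of the text, so records that were
# fused onto one line (as in the first entry quoted above) are still split.
RECORD = re.compile(r"<(\d)>\[\s*(\d+\.\d+)\]\s*(.*?)(?=<\d>\[|$)", re.S)

# Message fragments taken verbatim from the entries quoted in this bug.
SIGNATURES = (
    "Journal has aborted",
    "Detected aborted journal",
    "Remounting filesystem read-only",
)

# Device name as it appears in "EXT4-fs error (device sda8)" or "EXT4-fs (sda8)".
DEVICE = re.compile(r"EXT4-fs(?: error)? \((?:device )?(\w+)\)")


def find_ext4_aborts(text):
    """Return (timestamp, device, message) for each matching record.

    Hypothetical helper: it only recognises the wording listed in
    SIGNATURES and makes no claim about other failure modes.
    """
    hits = []
    for _level, timestamp, message in RECORD.findall(text):
        message = message.strip()
        if any(signature in message for signature in SIGNATURES):
            match = DEVICE.search(message)
            device = match.group(1) if match else None
            hits.append((float(timestamp), device, message))
    return hits
```

Running `find_ext4_aborts` over the text of an older kernel panic file would list any journal-abort or read-only remount records along with the affected device, which can then be compared against the entries in this bug.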