Cisco Bug: CSCvt08729 - kdump seen during image upgrade with panic unable to handle kernel NULL pointer dereference

Last Modified

Sep 02, 2020

Products (1)

  • Cisco Network Convergence System 5500 Series

Known Affected Releases

7.0.2.BASE

Description (partial)

Release-note

Symptom:
On RP cards, after an IOFPGA stage1 watchdog timeout, an NMI is triggered, which causes a kernel core dump; the card then reloads and comes back up fine.
A stage1 watchdog timeout indicates that the node is not stable and needs attention, so the IOFPGA sends an NMI.
The default behaviour on a stage1 timeout NMI is to reload the card without a kernel core dump.

crash> bt
PID: 0      TASK: ffffffff88c12500  CPU: 0   COMMAND: "swapper/0"
#0 [ffff88063cc05a58] machine_kexec at ffffffff880384df
#1 [ffff88063cc05aa8] crash_kexec at ffffffff880e0a81
#2 [ffff88063cc05b78] oops_end at ffffffff887be9f0
#3 [ffff88063cc05ba0] no_context at ffffffff8803f6cc
#4 [ffff88063cc05be8] __bad_area_nosemaphore at ffffffff8803f8ed
#5 [ffff88063cc05c30] bad_area_nosemaphore at ffffffff8803fa93
#6 [ffff88063cc05c40] __do_page_fault at ffffffff887c0eee
#7 [ffff88063cc05d40] do_page_fault at ffffffff887c133c
#8 [ffff88063cc05d50] page_fault at ffffffff887bdeb2
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff88063cc05e08  RFLAGS: 00010893
    RAX: 0000000000000020  RBX: 0000000000000020  RCX: 0000000000000020
    RDX: 0000000000000020  RSI: 0000000000000020  RDI: 0000000000000000
    RBP: ffff88063cc05e10   R8: ffffffff88d015c0   R9: 000000000000344e
    R10: 0000000000000000  R11: 000000000000344d  R12: 0000000000000020
    R13: 0000000000000000  R14: ffffffffc0ae7cc0  R15: ffffffff88c03fd8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
#9 [ffff88063cc05e08] fpga_cpu_error_check at ffffffffc06777df [klm_cctrli]
#10 [ffff88063cc05e18] obfl_nmi_handler at ffffffffc0ae5bec [klm_obfl]
#11 [ffff88063cc05e48] sup_nmi_handler at ffffffffc0ae5d39 [klm_obfl]
#12 [ffff88063cc05e60] nmi_handle at ffffffff887beb90
#13 [ffff88063cc05ec0] do_nmi at ffffffff8800668f
#14 [ffff88063cc05ef0] end_repeat_nmi at ffffffff887be226
    [exception RIP: intel_idle+218]
    RIP: ffffffff883a8c2a  RSP: ffffffff88c03e80  RFLAGS: 00000046
    RAX: 0000000000000020  RBX: 0000000000000008  RCX: 0000000000000001
    RDX: 0000000000000000  RSI: ffffffff88c03fd8  RDI: 0000000000000000
    RBP: ffffffff88c03ea8   R8: 0000000000613b24   R9: 000000001ac3983c
    R10: 000001dc6eadb838  R11: 0000000000000010  R12: 0000000000000020
    R13: 0000000000000003  R14: 0000000000000004  R15: ffffffff88c03fd8
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
--- <NMI exception stack> ---
#15 [ffffffff88c03e80] intel_idle at ffffffff883a8c2a
#16 [ffffffff88c03eb0] cpuidle_enter_state at ffffffff8863480a
#17 [ffffffff88c03ee0] cpuidle_idle_call at ffffffff88634939
#18 [ffffffff88c03f20] arch_cpu_idle at ffffffff8800c7de
#19 [ffffffff88c03f30] cpu_startup_entry at ffffffff880bc4a5
#20 [ffffffff88c03f80] rest_init at ffffffff887b1ba2
#21 [ffffffff88c03f90] start_kernel at ffffffff88d1ddee
#22 [ffffffff88c03fc8] x86_64_start_reservations at ffffffff88d1d495
#23 [ffffffff88c03fd8] x86_64_start_kernel at ffffffff88d1d58d
crash>
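
In the trace above, the page fault is taken with RIP at 0000000000000000, and the frame directly beneath the page_fault entry is fpga_cpu_error_check in klm_cctrli, reached from the klm_obfl NMI handlers. That pattern is consistent with a call through a NULL function pointer, although the bug text does not state the root cause. The following is a minimal sketch of how that frame could be picked out of saved `bt` output; the helper names parse_frames and first_frame_after_fault are hypothetical and are not part of the crash utility.

```python
import re

# Matches crash-utility frame lines such as:
#   #9 [ffff88063cc05e08] fpga_cpu_error_check at ffffffffc06777df [klm_cctrli]
FRAME_RE = re.compile(
    r"^#(?P<num>\d+)\s+\[(?P<sp>[0-9a-f]+)\]\s+(?P<func>\S+)"
    r"\s+at\s+(?P<addr>[0-9a-f]+)(?:\s+\[(?P<module>[^\]]+)\])?\s*$"
)


def parse_frames(bt_text):
    """Extract frame number, function and module from `bt` output (hypothetical helper)."""
    frames = []
    for line in bt_text.splitlines():
        m = FRAME_RE.match(line.strip())
        if m:
            frames.append({
                "num": int(m.group("num")),
                "func": m.group("func"),
                "module": m.group("module"),  # None for core-kernel symbols
            })
    return frames


def first_frame_after_fault(frames, fault_entry="page_fault"):
    """Return the frame listed right after the fault entry, i.e. the code running when the fault hit."""
    for i, frame in enumerate(frames):
        if frame["func"] == fault_entry and i + 1 < len(frames):
            return frames[i + 1]
    return None


# Excerpt copied from the backtrace in this bug.
SAMPLE = """\
#8 [ffff88063cc05d50] page_fault at ffffffff887bdeb2
    [exception RIP: unknown or invalid address]
    RIP: 0000000000000000  RSP: ffff88063cc05e08  RFLAGS: 00010893
#9 [ffff88063cc05e08] fpga_cpu_error_check at ffffffffc06777df [klm_cctrli]
#10 [ffff88063cc05e18] obfl_nmi_handler at ffffffffc0ae5bec [klm_obfl]
"""

frames = parse_frames(SAMPLE)
culprit = first_frame_after_fault(frames)
print(culprit["func"], culprit["module"])  # prints: fpga_cpu_error_check klm_cctrli
```

Register dump lines are skipped because they do not start with a `#N [address]` prefix, so only stack frames are collected.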

Conditions:
A stage1 watchdog timeout can occur in the following scenarios:
 * Calvados fails to come up
 * Host OS is stuck while booting
 * A process hogs the CPU, causing the software daemon responsible for punching the watchdog timer control register to hang for a long time.
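
The last scenario can be illustrated with a small simulation. This is only a sketch of the general watchdog-punch idea: the function name first_stage1_timeout, the 10-second timeout, and the punch timestamps are all assumptions made for illustration, and the bug text does not give the actual IOFPGA timer values or daemon behaviour.

```python
def first_stage1_timeout(punch_times, timeout):
    """Return when a watchdog would first expire, or None if every punch is in time.

    punch_times: ascending timestamps (seconds) at which the daemon punched the timer.
    timeout:     seconds allowed between punches (hypothetical value).
    The timer is assumed to be armed at t=0, and only gaps up to the last
    observed punch are checked.
    """
    deadline = timeout
    for t in punch_times:
        if t > deadline:
            return deadline  # the punch arrived too late; the timer expired here
        deadline = t + timeout  # each punch in time restarts the countdown
    return None


# Daemon scheduled regularly: every punch lands before the deadline.
healthy = first_stage1_timeout([5, 10, 15, 20], timeout=10)

# Daemon starved of CPU between t=10 and t=35: the deadline at t=20 is missed.
starved = first_stage1_timeout([5, 10, 35, 40], timeout=10)

print(healthy, starved)  # prints: None 20
```

In the starved case the simulated timer expires at t=20, which corresponds to the point where a stage1 timeout would raise the NMI described in the symptom.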