Cisco Bug: CSCuo80884 - NETDEV WATCHDOG: eth7 (enic): transmit timed out

Last Modified

Nov 19, 2014

Products (1)

  • Cisco Videoscape Distribution Suite Transparent Caching

Known Affected Releases

5.1.1

Description (partial)

Cache Engine 2 rebooted by iSCSI connection error

<mgmt server log>

Apr 27 21:18:52 ce-2 iscsid: Kernel reported iSCSI connection 5:0 error
(1011) state (3)

Apr 27 21:19:30 ce-1 pang[16577]: Current leader is me (#9477322257#ce-1)!
Num members = 1 

Apr 27 21:20:31 mg-1 snmpd[7116]: cluster has been degraded

Apr 27 21:24:24 ce-1 pang[16577]: Current leader is me (#9477322257#ce-1)!
Num members = 2 

Apr 27 21:25:06 ce-1 pang[16577]: volume        state         availability  owner  total  free  used  usage  total_writes
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol1     mounted       active        ce-1   538    17    521   96.79  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol2     mounted       active        ce-1   538    18    520   96.57  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol3     mounted       active        ce-1   538    20    518   96.18  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol4     mounted       active        ce-1   538    17    521   96.72  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol5     mounted_cmdb  active        ce-1   538    20    518   96.27  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol6     mounted       active        ce-1   538    20    518   96.26  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol7     mounted       active        ce-1   538    14    524   97.30  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol8     mounted       active        ce-1   538    16    521   96.87  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol9     mounted       active        ce-1   538    14    524   97.30  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol10    mounted       active        ce-1   538    19    519   96.46  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol11    mounted       active        ce-1   538    17    521   96.72  0
Apr 27 21:25:06 ce-1 pang[16577]: /mnt/vol12    mounted       active        ce-1   538    13    525   97.50  0

Apr 27 21:26:07 ce-1 pang[16577]: Verifying blade 2 found mounted volumes
or xfs_repair - will reboot it

Apr 27 21:26:07 ce-1 pang[16577]: Going to execute reboot to blade 2... 

Apr 27 21:26:13 ce-1 pang[16577]: <M> xfs_repair for data partition failed
with return code 1, volume 13 will stay inactive 

Apr 27 21:26:24 ce-1 pang[16577]: <M> xfs_repair for data partition failed
with return code 1, volume 14 will stay inactive 

Apr 27 21:28:07 ce-2 iscsid: iSCSI logger with pid=11465 started!

Apr 27 21:28:07 ce-2 iscsid: transport class version 2.0-870. iscsid
version 2.0-870

Apr 27 21:28:07 ce-2 iscsid: iSCSI daemon with pid=11472 started!

Apr 27 21:28:07 ce-2 iscsid: connection1:0 is operational now

Apr 27 21:28:07 ce-2 iscsid: connection2:0 is operational now

Apr 27 21:28:07 ce-2 iscsid: connection5:0 is operational now

Apr 27 21:28:07 ce-2 iscsid: connection6:0 is operational now

Apr 27 21:28:07 ce-2 iscsid: connection3:0 is operational now

Apr 27 21:28:07 ce-2 iscsid: connection4:0 is operational now

Apr 27 21:28:07 ce-2 iscsid: connection8:0 is operational now

Apr 27 21:28:07 ce-2 iscsid: connection7:0 is operational now

Apr 27 21:28:20 ce-2 snmpd[13534]: watchdog_threshold_reboots_time=54000 ,
reboot_time_diff=0 , g_start_kernel_watchdog 1

Apr 27 21:28:20 ce-2 snmpd[13534]: peerapp snmp agent has been restarted: 0
self-reboots till now , since 01-01-70 09:00:00 

Apr 27 21:31:27 ce-2 pang[16692]: Device state has been set to starting 

Apr 27 21:31:27 ce-2 pang[16692]: PANG version 5.0.3b276_ 

Apr 27 21:31:42 ce-1 pang[16577]: Current leader is me (#9477322257#ce-1)!
Num members = 3 

Apr 27 21:32:28 ce-1 pang[16577]: volume        state         availability  owner  total  free  used  usage  total_writes
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol1     mounted       active        ce-1   538    16    522   96.90  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol2     mounted       active        ce-1   538    17    520   96.65  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol3     mounted       active        ce-1   538    20    518   96.26  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol4     mounted       active        ce-1   538    17    521   96.76  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol5     mounted_cmdb  active        ce-1   538    19    518   96.32  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol6     mounted       active        ce-1   538    19    518   96.31  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol7     mounted       active        ce-1   538    14    524   97.35  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol8     mounted       active        ce-1   538    16    522   96.92  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol9     mounted       active        ce-1   538    14    524   97.38  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol10    mounted       active        ce-1   538    18    520   96.62  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol11    mounted       active        ce-1   538    17    521   96.76  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol12    mounted       active        ce-1   538    13    525   97.55  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol13    mounted       active        ce-1   538    23    515   95.65  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol14    mounted       active        ce-1   538    21    517   95.99  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol15    mounted       active        ce-1   538    19    519   96.45  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol16    mounted       active        ce-1   538    17    521   96.81  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol17    mounted       active        ce-1   538    20    518   96.17  0
Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol18    mounted       active        ce-1   538    27    511   95.68  0

Apr 27 21:32:45 ce-2 pang[16692]: found volume 19. cmdb full path
/mnt/vol19cmdb.

Apr 27 21:32:45 ce-2 pang[16692]: FINISHED -> /mnt/vol19cmdb/PA_stats.db
and /mnt/vol19cmdb/PA_parts.db 

Apr 27 21:32:53 ce-2 pang[16692]: cmdb path /mnt/vol19cmdb 

Apr 27 21:33:42 ce-2 pang[16692]: stored hash count 3032079, found hash
count 3032079 

Apr 27 21:33:42 ce-2 pang[16692]: stored byte count 13049471674078, found
byte count 13049471679926 

Apr 27 21:34:52 ce-2 pang[16692]: Device state has been set to started 

Apr 27 21:34:52 ce-2 pang[16692]: PANG started 

Apr 27 21:35:01 mg-1 snmpd[7116]: cluster has been enabled

Apr 27 21:36:09 ce-1 pang[16577]: detected   major:  too many one direction
sessions 

Apr 27 21:37:16 ce-1 pang[16577]: detected   major:  too many one direction
sessions 

Apr 27 21:38:02 ce-2 pang[16692]: XFS_R_IOE -
/mnt/vol13/3E/B88981ED26A3E09C2CF1E00C076F466A4D12CA - Input/output error
errno 5 

Apr 27 21:38:02 ce-2 pang[16692]: XFS_R_IOE -
/mnt/vol13/48/8A5B4DE00B4B6E1B1ED4D024C06D8C00000000 - Input/output error
errno 5 

Apr 27 21:38:02 ce-2 pang[16692]: warning: volume /dev/sdn3 (id 13) is
turned OFF because of errors 

Apr 27 21:38:02 ce-2 pang[16692]: detected   major:  volume 13 is
unavailable 

 

 

Apr 27 23:22:41 ce-2 pang[16692]: xfsctl(XFS_IOC_GETBMAP)  fd 222 file
/mnt/vol14/02/402A0328CA7BECFA74549632145C343389B9BE offset 0 len 19000 -
Structure needs cleaning 117 

Apr 27 23:22:41 ce-2 pang[16692]:
/mnt/vol14/02/402A0328CA7BECFA74549632145C343389B9BE - Structure needs
cleaning 

Apr 27 23:22:41 ce-2 pang[16692]: warning: volume /dev/sdo3 (id 14) is
turned OFF because of errors 

Apr 27 23:22:41 ce-2 pang[16692]: detected   major:  volume 14 is
unavailable
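
The volume status rows that pang logs in the description above follow the column order given by their header row (volume, state, availability, owner, total, free, used, usage, total_writes). The following is a minimal sketch of how such a row could be parsed for triage. The names `VolumeRow`, `ROW_RE` and `parse_volume_row` are hypothetical helpers and not part of the product, and the units of the size columns are not stated in the log, so the sketch does not assume any.

```python
import re
from typing import NamedTuple, Optional


class VolumeRow(NamedTuple):
    """One pang volume status row; field order follows the logged header."""
    mount: str
    state: str
    availability: str
    owner: str
    total: int
    free: int
    used: int
    usage_pct: float
    total_writes: int


# Hypothetical pattern. \s+ also matches a newline, so a row that was
# wrapped across two lines still parses once the lines are concatenated.
ROW_RE = re.compile(
    r"pang\[\d+\]:\s+(?P<mount>/mnt/vol\d+)\s+(?P<state>\S+)\s+"
    r"(?P<availability>\S+)\s+(?P<owner>\S+)\s+(?P<total>\d+)\s+"
    r"(?P<free>\d+)\s+(?P<used>\d+)\s+(?P<usage>\d+\.\d+)\s+(?P<writes>\d+)"
)


def parse_volume_row(line: str) -> Optional[VolumeRow]:
    """Return the parsed row, or None if the line is not a volume row."""
    m = ROW_RE.search(line)
    if m is None:
        return None
    return VolumeRow(
        m["mount"], m["state"], m["availability"], m["owner"],
        int(m["total"]), int(m["free"]), int(m["used"]),
        float(m["usage"]), int(m["writes"]),
    )


row = parse_volume_row(
    "Apr 27 21:32:28 ce-1 pang[16577]: /mnt/vol13         mounted "
    "active    ce-1          538        23         515        95.65      0"
)
print(row)
```

Filtering the parsed rows on `usage_pct` or `state` gives a quick view of which volumes were close to full at each status dump.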

Symptom:
MGMT server Logs

Apr 27 21:18:52 ce-2 iscsid: Kernel reported iSCSI connection 5:0 error (1011) state (3)
Apr 27 21:19:30 ce-1 pang[16577]: Current leader is me (#9477322257#ce-1)!
Num members = 1 
Apr 27 21:20:31 mg-1 snmpd[7116]: cluster has been degraded
Apr 27 21:24:24 ce-1 pang[16577]: Current leader is me (#9477322257#ce-1)!
Num members = 2 
Apr 27 21:26:07 ce-1 pang[16577]: Verifying blade 2 found mounted volumes
or xfs_repair - will reboot it 
Apr 27 21:26:07 ce-1 pang[16577]: Going to execute reboot to blade 2... 
Apr 27 21:26:13 ce-1 pang[16577]: <M> xfs_repair for data partition failed
with return code 1, volume 13 will stay inactive 
Apr 27 21:26:24 ce-1 pang[16577]: <M> xfs_repair for data partition failed
with return code 1, volume 14 will stay inactive 
Apr 27 21:28:07 ce-2 iscsid: iSCSI logger with pid=11465 started!
Apr 27 21:28:07 ce-2 iscsid: transport class version 2.0-870. iscsid
version 2.0-870
Apr 27 21:28:07 ce-2 iscsid: iSCSI daemon with pid=11472 started!
Apr 27 21:28:07 ce-2 iscsid: connection1:0 is operational now
Apr 27 21:28:07 ce-2 iscsid: connection2:0 is operational now
Apr 27 21:28:07 ce-2 iscsid: connection5:0 is operational now
Apr 27 21:28:07 ce-2 iscsid: connection6:0 is operational now
Apr 27 21:28:07 ce-2 iscsid: connection3:0 is operational now
Apr 27 21:28:07 ce-2 iscsid: connection4:0 is operational now
Apr 27 21:28:07 ce-2 iscsid: connection8:0 is operational now
Apr 27 21:28:07 ce-2 iscsid: connection7:0 is operational now
Apr 27 21:28:20 ce-2 snmpd[13534]: watchdog_threshold_reboots_time=54000 ,
reboot_time_diff=0 , g_start_kernel_watchdog 1
Apr 27 21:28:20 ce-2 snmpd[13534]: peerapp snmp agent has been restarted: 0
self-reboots till now , since 01-01-70 09:00:00 
Apr 27 21:31:27 ce-2 pang[16692]: Device state has been set to starting 
Apr 27 21:31:27 ce-2 pang[16692]: PANG version 5.0.3b276_ 
Apr 27 21:31:42 ce-1 pang[16577]: Current leader is me (#9477322257#ce-1)!
Num members = 3 
Apr 27 21:32:45 ce-2 pang[16692]: found volume 19. cmdb full path
/mnt/vol19cmdb. 
Apr 27 21:32:45 ce-2 pang[16692]: FINISHED -> /mnt/vol19cmdb/PA_stats.db
and /mnt/vol19cmdb/PA_parts.db 
Apr 27 21:32:53 ce-2 pang[16692]: cmdb path /mnt/vol19cmdb 
Apr 27 21:33:42 ce-2 pang[16692]: stored hash count 3032079, found hash
count 3032079 
Apr 27 21:33:42 ce-2 pang[16692]: stored byte count 13049471674078, found
byte count 13049471679926 
Apr 27 21:34:52 ce-2 pang[16692]: Device state has been set to started 
Apr 27 21:34:52 ce-2 pang[16692]: PANG started 
Apr 27 21:35:01 mg-1 snmpd[7116]: cluster has been enabled
Apr 27 21:36:09 ce-1 pang[16577]: detected   major:  too many one direction
sessions 
Apr 27 21:37:16 ce-1 pang[16577]: detected   major:  too many one direction
sessions 
Apr 27 21:38:02 ce-2 pang[16692]: XFS_R_IOE -
/mnt/vol13/3E/B88981ED26A3E09C2CF1E00C076F466A4D12CA - Input/output error
errno 5 
Apr 27 21:38:02 ce-2 pang[16692]: XFS_R_IOE -
/mnt/vol13/48/8A5B4DE00B4B6E1B1ED4D024C06D8C00000000 - Input/output error
errno 5 
Apr 27 21:38:02 ce-2 pang[16692]: warning: volume /dev/sdn3 (id 13) is
turned OFF because of errors 
Apr 27 21:38:02 ce-2 pang[16692]: detected   major:  volume 13 is
unavailable 
Apr 27 23:22:41 ce-2 pang[16692]: xfsctl(XFS_IOC_GETBMAP)  fd 222 file
/mnt/vol14/02/402A0328CA7BECFA74549632145C343389B9BE offset 0 len 19000 -
Structure needs cleaning 117 
Apr 27 23:22:41 ce-2 pang[16692]:
/mnt/vol14/02/402A0328CA7BECFA74549632145C343389B9BE - Structure needs
cleaning 
Apr 27 23:22:41 ce-2 pang[16692]: warning: volume /dev/sdo3 (id 14) is
turned OFF because of errors 
Apr 27 23:22:41 ce-2 pang[16692]: detected   major:  volume 14 is
unavailable
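
The two numeric codes in the log entries above, `errno 5` for volume 13 and `117` for volume 14, are standard Linux error numbers: 5 is `EIO` and, on Linux, 117 is `EUCLEAN`, whose message text is the "Structure needs cleaning" string seen in the log. A small sketch for decoding such codes on the host being examined follows; `describe_errno` is a hypothetical helper name, and the symbolic name for 117 is only available when the code runs on Linux.

```python
import errno
import os


def describe_errno(code: int) -> str:
    """Return 'code NAME: message' using the local platform's errno tables."""
    # errno.errorcode maps numbers to symbolic names on this platform;
    # codes unknown to the platform fall back to "UNKNOWN".
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{code} {name}: {os.strerror(code)}"


# The two codes that appear in the pang log entries.
for code in (5, 117):
    print(describe_errno(code))
```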


Logs (from var logs) from the CE that reloaded:

Apr 27 20:13:33 ce-2 kernel: ------------[ cut here ]------------
Apr 27 20:13:33 ce-2 kernel: WARNING: at net/sched/sch_generic.c:219 dev_watchdog+0x11c/0x1ad()
Apr 27 20:13:33 ce-2 kernel: NETDEV WATCHDOG: eth7 (enic): transmit timed out
Apr 27 20:13:33 ce-2 kernel: Modules linked in: xfs(N) llpf(N) msr(N) af_packet(N) crc32c(N) libcrc32c(N) binfmt_misc(N) ib_iser(N) rdma_cm(N) ib_cm(N) iw_cm(N) ib_sa(N) ib_mad(N) ib_core(N) ib_addr(N) iscsi_tcp(N) libiscsi(N) scsi_transport_iscsi(N) mppVhba mppUpper bonding(N) ipmi_watchdog(N) ipmi_si(N) ipmi_devintf(N) ipmi_msghandler(N) ioatdma(N) fuse(N) loop(N) dm_mod(N) igb(X) rtc_cmos(N) rtc_core(N) button(N) wmi(N) rtc_lib(N) container(N) enic(N) pcspkr(N) joydev(N) dca(N) sg(N) usbhid(N) hid(N) ff_memless(N) ehci_hcd(N) sd_mod(N) crc_t10dif(N) usbcore(N) edd(N) ext3(N) mbcache(N) jbd(N) fan(N) ide_pci_generic(N) ide_core(N) ata_generic(N) ata_piix(N) libata(N) dock(N) megaraid_sas(N) mptsas(N) mptscsih(N) mptbase(N) thermal(N) processor(N) thermal_sys(N) hwmon(N) mpt2sas(X) scsi_transport_sas(N) raid_class(N) scsi_mod(N)
Apr 27 20:13:33 ce-2 kernel: Supported: No
Apr 27 20:13:33 ce-2 kernel: Pid: 0, comm: swapper Tainted: G          2.6.27.19-llpf-5-default #4
Apr 27 20:13:33 ce-2 kernel:
Apr 27 20:13:33 ce-2 kernel: Call Trace:
Apr 27 20:13:33 ce-2 kernel:  [<ffffffff8020d9f9>] show_trace_log_lvl+0x41/0x58
Apr 27 20:13:33 ce-2 kernel:  [<ffffffff80496ad4>] dump_stack+0x69/0x6f
Apr 27 20:13:33 ce-2 kernel:  [<ffffffff8023bf49>] warn_slowpath+0xa9/0xd1
Apr 27 20:13:33 ce-2 kernel:  [<ffffffff80435ce3>] dev_watchdog+0x11c/0x1ad
Apr 27 20:13:33 ce-2 kernel:  [<ffffffff802448f1>] run_timer_softirq+0x18d/0x204
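
To find other occurrences of this warning in collected kernel logs, the interface and driver names can be pulled out of the `NETDEV WATCHDOG` line. This is a sketch only: `WATCHDOG_RE` and `parse_watchdog` are hypothetical names, and the pattern is modeled on the single line shown in this report rather than on every message variant the kernel may emit.

```python
import re
from typing import Optional, Tuple

# Matches lines such as:
#   NETDEV WATCHDOG: eth7 (enic): transmit timed out
WATCHDOG_RE = re.compile(
    r"NETDEV WATCHDOG: (?P<iface>\S+) \((?P<driver>[^)]+)\): transmit timed out"
)


def parse_watchdog(line: str) -> Optional[Tuple[str, str]]:
    """Return (interface, driver) if the line is a transmit-timeout warning."""
    m = WATCHDOG_RE.search(line)
    return (m["iface"], m["driver"]) if m else None


line = "Apr 27 20:13:33 ce-2 kernel: NETDEV WATCHDOG: eth7 (enic): transmit timed out"
print(parse_watchdog(line))  # ('eth7', 'enic')
```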

Logs from the Leader:
Apr 27 20:21:04 ce-1 pang[18246]: Verifying blade 2 found mounted volumes or xfs_repair - will reboot it
Apr 27 20:21:04 ce-1 pang[18246]: Going to execute reboot to blade 2...
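
The interval between the transmit timeout on ce-2 and the leader's reboot decision can be read off the two timestamps. A minimal sketch follows, under two assumptions that the logs do not confirm: the clocks on ce-1 and ce-2 are synchronized, and the year is 2014 (syslog timestamps carry no year; 2014 is taken from the report's Last Modified date). `parse_syslog_time` and `ASSUMED_YEAR` are hypothetical names.

```python
from datetime import datetime

ASSUMED_YEAR = 2014  # assumption: the log lines themselves carry no year


def parse_syslog_time(line: str, year: int = ASSUMED_YEAR) -> datetime:
    """Parse the leading 'Mon DD HH:MM:SS' stamp of a syslog line."""
    stamp = " ".join(line.split()[:3])
    return datetime.strptime(f"{year} {stamp}", "%Y %b %d %H:%M:%S")


watchdog = "Apr 27 20:13:33 ce-2 kernel: NETDEV WATCHDOG: eth7 (enic): transmit timed out"
reboot = "Apr 27 20:21:04 ce-1 pang[18246]: Going to execute reboot to blade 2..."

gap = parse_syslog_time(reboot) - parse_syslog_time(watchdog)
print(int(gap.total_seconds()))  # 451 seconds, i.e. 7 min 31 s
```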

Conditions:
CE reboot due to an iSCSI connection error. The server ce-2 got stuck after the data interface eth7 became congested.