Cisco Bug: CSCuq20755 - DME Core seen continuously when upgrading from 191 to 198

Last Modified

Sep 17, 2019

Products (1)

  • Cisco Unified Computing System

Known Affected Releases

2.0(1q)A 2.2(2.182)A

Description (partial)

This fix is for a race condition. The issue had been seen before, but it started occurring more frequently after the fix for CSCuo34760. With that fix, we started tearing down peer AG connections when the primary knows the secondary DME is down or not responding, which makes the race condition more likely to occur.

Symptom:
DME crashes and restarts. DME is restarted by pmon as soon as it crashes, so the DME restart should not impact any data path.

Conditions:
This appears to be specific to the case where DME tears down its connection to an AG and, at the same time, a genuine write failure happens on the socket bio handle for that same AG. When the write fails, DME tries to disconnect the channel and, in the process, accesses the socket bio handle. Meanwhile, the teardown causes another thread to free the same socket bio handle, which leads to a crash.
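
The race described above can be illustrated with a minimal sketch. It is not the actual DME fix, which is not described here. It only shows one common way to make a teardown safe when two paths (a write-failure handler and a connection teardown) may both try to release the same handle. All names (FakeBioHandle, Channel, race_disconnect) are hypothetical and chosen for illustration.

```python
import threading


class FakeBioHandle:
    """Hypothetical stand-in for a socket bio handle; counts how often it is freed."""

    def __init__(self):
        self.free_count = 0

    def free(self):
        self.free_count += 1


class Channel:
    """Hypothetical channel to one AG that owns a single handle."""

    def __init__(self, handle):
        self._lock = threading.Lock()
        self._handle = handle

    def disconnect(self):
        """Release the handle exactly once, however many threads call this.

        Returns True for the caller that performed the release, False otherwise.
        """
        with self._lock:
            # Take ownership of the handle and clear the shared reference atomically.
            handle, self._handle = self._handle, None
        if handle is None:
            return False  # another path already tore the channel down
        handle.free()
        return True


def race_disconnect(channel, n_threads=2):
    """Call disconnect() from several threads at once and collect the results."""
    barrier = threading.Barrier(n_threads)
    results = []
    results_lock = threading.Lock()

    def worker():
        barrier.wait()  # line the threads up so they contend for the channel
        released = channel.disconnect()
        with results_lock:
            results.append(released)

    threads = [threading.Thread(target=worker) for _ in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results
```

In this sketch, whichever path reaches disconnect() first takes the handle and frees it. The other path sees that the handle is already gone and returns without touching it, so the handle is never accessed after it has been freed.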

More generally, this scenario would occur when a failover is happening (because the primary goes down, someone reboots the primary, or DME restarts) and, at the same time, the HA link becomes unreliable for some reason and writes/reads start failing.