Cisco Bug: CSCvv14088 - DBMON service eventually hangs if IM&P Node is rebooted when CUCM Publisher is not available
Oct 13, 2020
- Cisco Unified Communications Manager IM & Presence Service
Known Affected Releases
Symptom:
- The RTMT client reports that the AMC Service is not available on an IM&P node.
- DBMON stops processing Change Notification messages, and the DB Change Notification Server\QueuedRequestsInDB counter keeps increasing.
- The activelog cm/trace/dbl/sdi/dbmon*.txt logs are no longer present.
- IM&P nodes start to raise IDSReplicationFailure alarms:

    Aug 19 09:49:24 imp01 local7 3 dbmon: 82: imp01.domain.com: Aug 19 2020 13:49:24.215 UTC : %UC_DB-3-IDSReplicationFailure: %[class_id=Realtime Replication is broken][class_msg=replstate = 3][specific_msg=Please run utils dbreplication status from publisher][AppID=Cisco Database Layer Monitor][ClusterID=Cluster1][NodeID=imp01.domain.com]: Combined alarm for emergency and error situations. IDS Replication has failed

Conditions:
An IM&P node is rebooted while the CUCM Publisher is either not reachable or is itself still starting up its services, such as A Cisco DB Replicator.

As the IM&P node starts up its A Cisco DB service, it fails to update a series of database configuration files. As a result, the DB connector cannot reach the CUCM Publisher (jdbcurl_cuptoucm), which in turn prevents services such as AMC from starting up correctly. The AMC logs show failures such as the following:

    2020-07-03 08:33:03,831 ERROR [main] dbl.ConnectionManager$ConnectionManagerEntry - com.cisco.ccm.dbl.ConnectionManager$ConnectionManagerEntry@40d241
    java.sql.SQLException: Invalid database name
    Invalid database name: ''

The same condition causes the DBMON (Cisco Database Layer Monitor) service to leak file descriptors (FDs), which eventually causes the service to hang. The FD leak can be detected by any of the following methods.

- Log in to Cisco Unified IM and Presence Reporting on the IM&P Publisher and run the "IM and Presence Database Status" report. Manually inspect all the "IM and Presence Sqlhosts" files and note whether any IM&P node is missing an entry for the cluster's CUCM Publisher node. For example, a CUCM Publisher entry would look like this:

    ccm_pub_ccm12_5_1_13900_152 onsoctcp 10.10.37.30 ccm_pub_ccm12_5_1_13900_152 b=32767,rto=300

- On all IM&P nodes, run the following command from the platform CLI:

    admin:file search activelog cm/trace/dbl/sdi/start.log "WARNING: IMP subscriber tried pinging CUCM for 3 time to start db.By passing it now"
    Searching path: /var/log/active/cm/trace/dbl/sdi/start.log
    Searching file: /var/log/active/cm/trace/dbl/sdi/start.log
    Mon Aug 24 12:37:59 2020 dbllib.getUCMPubInfo WARNING: IMP subscriber tried pinging CUCM for 3 time to start db.By passing it now
    Search completed

  If you see matches that correspond to the last time this IM&P node was started, then you are exposed to this dbmon FD leak.

- List the open FDs of the Cisco Database Layer Monitor (dbmon) process:

    admin:show process search dbmon
    PasreOption method called
    root 3784 27346 13 16:58 pts/1 00:00:00 sudo /usr/local/platform/cli_scripts/listProcesses.sh -search dbmon
    root 3788 3784 0 16:58 pts/1 00:00:00 /bin/bash /usr/local/platform/cli_scripts/listProcesses.sh -search dbmon
    root 3791 3788 0 16:58 pts/1 00:00:00 grep -i dbmon
    root 25314 11882 0 16:19 ? 00:00:00 /usr/bin/sudo -u database /usr/local/cm/bin/dbmon
    database 25316 25314 0 16:19 ? 00:00:10 /usr/local/cm/bin/dbmon

    admin:show process open-fd 25316
    PasreOption method called
    COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
    dbmon 25316 database cwd DIR 8,2 4096 2 /
    dbmon 25316 database rtd DIR 8,2 4096 2 /
    dbmon 25316 database txt REG 8,2 19715476 1155249 /usr/local/cm/bin/dbmon
    dbmon 25316 database mem REG 0,19 17732028 108261 /dev/shm/CiscoNotifySharedMem
    . . .
    dbmon 25316 database 628r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
    dbmon 25316 database 629r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
    dbmon 25316 database 630r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
    dbmon 25316 database 631r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
    dbmon 25316 database 632r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts

  If you observe an increasing number of open FDs against /usr/local/cm/db/informix/etc/sqlhosts, then you are exposed to this dbmon FD leak.
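
The open-FD check can be partly automated offline. The following is a minimal sketch, not a Cisco-provided tool: it assumes you have saved the text of two "show process open-fd <pid>" runs taken some time apart, and it counts the numeric descriptors held against the sqlhosts file in each. The function names (count_fds_by_name, sqlhosts_fd_growth) and the EARLIER/LATER samples are hypothetical and exist only for illustration; the sample rows are copied from the output shown in this bug description.

```python
from collections import Counter

# Path as it appears in the open-fd output quoted in this bug description.
SQLHOSTS = "/usr/local/cm/db/informix/etc/sqlhosts"


def count_fds_by_name(open_fd_output: str) -> Counter:
    """Count numeric file descriptors per NAME in lsof-style output.

    Expected columns: COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME.
    Rows with non-numeric FD values (cwd, rtd, txt, mem), the header row,
    and any line with fewer than nine columns are ignored.
    """
    counts = Counter()
    for line in open_fd_output.splitlines():
        # Split into at most 9 fields so a NAME containing spaces stays whole.
        fields = line.split(None, 8)
        if len(fields) < 9 or fields[0] == "COMMAND":
            continue
        if not fields[3][0].isdigit():
            continue
        counts[fields[8].strip()] += 1
    return counts


def sqlhosts_fd_growth(earlier: str, later: str) -> int:
    """Return how many more sqlhosts descriptors the later snapshot holds."""
    return count_fds_by_name(later)[SQLHOSTS] - count_fds_by_name(earlier)[SQLHOSTS]


# Hypothetical snapshots assembled from rows shown in this bug description.
EARLIER = """\
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
dbmon 25316 database cwd DIR 8,2 4096 2 /
dbmon 25316 database 628r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon 25316 database 629r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
"""
LATER = EARLIER + """\
dbmon 25316 database 630r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon 25316 database 631r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon 25316 database 632r REG 8,2 768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
"""

if __name__ == "__main__":
    print("sqlhosts FDs in later snapshot:", count_fds_by_name(LATER)[SQLHOSTS])
    print("growth between snapshots:", sqlhosts_fd_growth(EARLIER, LATER))
```

A positive growth value between snapshots is consistent with the leak described above, but it is only an indicator; what counts as significant growth is a judgment call that this sketch does not make.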