Cisco Bug: CSCvv14088 - DBMON service eventually hangs if IM&P Node is rebooted when CUCM Publisher is not available

Last Modified

Oct 13, 2020

Products (1)

  • Cisco Unified Communications Manager IM & Presence Service

Known Affected Releases

12.5(1)

Description (partial)

Symptom:
RTMT client reports AMC Service is not available on an IM&P node

DBMON stops processing Change Notification messages, and the DB Change Notification Server\QueuedRequestsInDB counter keeps increasing.

The activelog cm/trace/dbl/sdi/dbmon*.txt logs are no longer present.

IM&P nodes start to raise IDSReplicationFailure alarms:

Aug 19 09:49:24 imp01 local7 3 dbmon: 82: imp01.domain.com: Aug 19 2020 13:49:24.215 UTC :  %UC_DB-3-IDSReplicationFailure: %[class_id=Realtime Replication is broken][class_msg=replstate = 3][specific_msg=Please run utils dbreplication status from publisher][AppID=Cisco Database Layer Monitor][ClusterID=Cluster1][NodeID=imp01.domain.com]: Combined alarm for emergency and error situations. IDS Replication has failed

Conditions:
This occurs when an IM&P node is rebooted while the CUCM Publisher is either not reachable or is itself still starting up its services, such as A Cisco DB Replicator.

As the IM&P node starts up its A Cisco DB service, it fails to update a series of database configuration files. As a result, the DB connector cannot reach the CUCM Publisher (jdbcurl_cuptoucm), which in turn prevents services such as AMC from starting up correctly.

The AMC logs show the following failures:

2020-07-03 08:33:03,831 ERROR [main] dbl.ConnectionManager$ConnectionManagerEntry - com.cisco.ccm.dbl.ConnectionManager$ConnectionManagerEntry@40d241
java.sql.SQLException: Invalid database name Invalid database name: ''

It also causes the DBMON (Cisco Database Layer Monitor) service to leak file descriptors (FDs), which eventually causes the service to hang. This FD leak can be detected by any of the following methods.

- Log in to Cisco Unified IM and Presence Reporting on the IM&P Publisher and run the "IM and Presence Database Status" report. Manually inspect all the "IM and Presence Sqlhosts" files and note whether any IM&P node is missing an entry for the cluster's CUCM Publisher node. For example, a CUCM Publisher entry looks like this:

ccm_pub_ccm12_5_1_13900_152	onsoctcp	10.10.37.30	ccm_pub_ccm12_5_1_13900_152	b=32767,rto=300
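
When many nodes have to be inspected, the manual check above can be sketched as a small script. This is only an illustration, not a Cisco tool: it assumes the sqlhosts text for each node has been copied out of the report by hand, and that you already know the publisher's server name (the first field of the entry). The function name `has_sqlhosts_entry` is hypothetical.

```python
def has_sqlhosts_entry(sqlhosts_text: str, server_name: str) -> bool:
    """Return True if any non-comment line has server_name as its first field."""
    for line in sqlhosts_text.splitlines():
        fields = line.split()
        # Skip blank lines and comment lines.
        if not fields or fields[0].startswith("#"):
            continue
        if fields[0] == server_name:
            return True
    return False


# Sample text taken from the entry shown above (fields are whitespace separated).
sample_sqlhosts = (
    "ccm_pub_ccm12_5_1_13900_152\tonsoctcp\t10.10.37.30\t"
    "ccm_pub_ccm12_5_1_13900_152\tb=32767,rto=300\n"
)

if __name__ == "__main__":
    print(has_sqlhosts_entry(sample_sqlhosts, "ccm_pub_ccm12_5_1_13900_152"))
```

A node for which this returns False for the CUCM Publisher's server name is missing the entry and should be treated as affected.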

- On all IM&P nodes, run the following command via the Platform CLI:

admin:file search activelog cm/trace/dbl/sdi/start.log "WARNING:  IMP subscriber tried pinging CUCM for 3 time to start db.By passing it now"

Searching path: /var/log/active/cm/trace/dbl/sdi/start.log
Searching file: /var/log/active/cm/trace/dbl/sdi/start.log
Mon Aug 24 12:37:59 2020 dbllib.getUCMPubInfo  WARNING:  IMP subscriber tried pinging CUCM for 3 time to start db.By passing it now

Search completed

If you see matches that correspond to the last time this IM&P node was started, then you are exposed to this dbmon FD leak.
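
Comparing the warning timestamps against the node's last start time can be sketched as follows. This is a rough illustration under stated assumptions: the search output has been saved as text, the timestamp format matches the sample line above, and the last start time is supplied by the operator (this note does not say how to obtain it). The names `warning_times` and `bypassed_since` are hypothetical.

```python
from datetime import datetime
from typing import List

WARNING_TEXT = (
    "IMP subscriber tried pinging CUCM for 3 time to start db.By passing it now"
)


def warning_times(search_output: str) -> List[datetime]:
    """Collect timestamps of start.log lines that contain the bypass warning."""
    times = []
    for line in search_output.splitlines():
        if WARNING_TEXT not in line:
            continue
        # Expected prefix: "Mon Aug 24 12:37:59 2020" (five whitespace fields).
        stamp = " ".join(line.split()[:5])
        try:
            # %a and %b are locale dependent; an English/C locale is assumed.
            times.append(datetime.strptime(stamp, "%a %b %d %H:%M:%S %Y"))
        except ValueError:
            # Not a log line (for example the echoed CLI command); ignore it.
            continue
    return times


def bypassed_since(search_output: str, last_start: datetime) -> bool:
    """True if a bypass warning was logged at or after the given start time."""
    return any(t >= last_start for t in warning_times(search_output))


sample_output = (
    "Searching file: /var/log/active/cm/trace/dbl/sdi/start.log\n"
    "Mon Aug 24 12:37:59 2020 dbllib.getUCMPubInfo  WARNING:  "
    "IMP subscriber tried pinging CUCM for 3 time to start db.By passing it now\n"
    "Search completed\n"
)

if __name__ == "__main__":
    # The start time below is an illustrative value, not taken from the bug note.
    print(bypassed_since(sample_output, datetime(2020, 8, 24, 12, 30, 0)))
```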

- List the open FDs of the Cisco Database Layer Monitor (dbmon) process:

admin:show process search dbmon
PasreOption method called
root      3784 27346 13 16:58 pts/1    00:00:00 sudo /usr/local/platform/cli_scripts/listProcesses.sh -search dbmon
root      3788  3784  0 16:58 pts/1    00:00:00 /bin/bash /usr/local/platform/cli_scripts/listProcesses.sh -search dbmon    
root      3791  3788  0 16:58 pts/1    00:00:00 grep -i dbmon
root     25314 11882  0 16:19 ?        00:00:00 /usr/bin/sudo -u database /usr/local/cm/bin/dbmon
database 25316 25314  0 16:19 ?        00:00:10 /usr/local/cm/bin/dbmon

admin:show process open-fd 25316
PasreOption method called
COMMAND   PID     USER   FD   TYPE             DEVICE  SIZE/OFF    NODE NAME
dbmon   25316 database  cwd    DIR                8,2      4096       2 /
dbmon   25316 database  rtd    DIR                8,2      4096       2 /
dbmon   25316 database  txt    REG                8,2  19715476 1155249 /usr/local/cm/bin/dbmon
dbmon   25316 database  mem    REG               0,19  17732028  108261 /dev/shm/CiscoNotifySharedMem
.
.
.
dbmon   25316 database  628r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  629r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  630r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  631r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  632r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts

If you observe an increasing number of open FDs against /usr/local/cm/db/informix/etc/sqlhosts, then you are exposed to this dbmon FD leak.
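
To judge whether the count is growing, it can help to count the sqlhosts descriptors in two captures of the "show process open-fd" output taken some time apart. The sketch below assumes the output has been saved as text and follows the column layout shown above (COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME). The function name `count_sqlhosts_fds` is hypothetical.

```python
SQLHOSTS_PATH = "/usr/local/cm/db/informix/etc/sqlhosts"


def count_sqlhosts_fds(open_fd_output: str, path: str = SQLHOSTS_PATH) -> int:
    """Count numbered file descriptors whose NAME column equals path."""
    count = 0
    for line in open_fd_output.splitlines():
        fields = line.split()
        if len(fields) < 9:
            continue
        fd_column, name_column = fields[3], fields[-1]
        # Numbered FDs look like "628r"; entries such as cwd, txt, mem are skipped.
        if name_column == path and fd_column[:1].isdigit():
            count += 1
    return count


# Rows copied from the sample output above.
sample_open_fd = """\
COMMAND   PID     USER   FD   TYPE             DEVICE  SIZE/OFF    NODE NAME
dbmon   25316 database  txt    REG                8,2  19715476 1155249 /usr/local/cm/bin/dbmon
dbmon   25316 database  628r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  629r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  630r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  631r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
dbmon   25316 database  632r   REG                8,2       768 1185968 /usr/local/cm/db/informix/etc/sqlhosts
"""

if __name__ == "__main__":
    print(count_sqlhosts_fds(sample_open_fd))  # prints 5
```

If a later capture yields a higher count than an earlier one for the same dbmon PID, that matches the leak described here.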