Cisco Bug: CSCus70331 - Mate Live Scheduler Bug

Last Modified

Feb 08, 2016

Products (1)

  • Cisco MATE Live

Known Affected Releases

6.2

Description (partial)

Symptom:
Problem Details: In MATE Live 6.0.2, the following sequence of actions stops the scheduler from inserting new jobs into the ML database:

1. stop mld
2. stop embedded_web_server
3. start mld
4. start embedded_web_server

The last ml_insert job that was running when this action was performed goes into ABORTED state. After the embedded_web_server is started, the job is stuck in ABORTED state, and no other jobs will go from PENDING to IN_PROGRESS.

The ml_insert_ctl -rerun and -cancel commands do not help. The requests are enqueued, but nothing happens for a long time.
In production, ml_insert jobs take about 28 minutes.

The only workaround I found is to go directly to the ML database and update the values of the status, endtime, workdone and totalwork fields in the jobstatus table for the job that is in the ABORTED state. After this is done (with an SQL UPDATE command) and embedded_web_server is restarted, it continues to the next PENDING job, puts it into the IN_PROGRESS state and starts running the job without issues.
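The report does not give the exact UPDATE statement, the database engine, or the values that were written, so the following is only a rough sketch of the kind of update described above. It runs against a stand-in in-memory SQLite table. The jobstatus table name and the status, endtime, workdone and totalwork fields come from the report; the jobid column, the column types, the "DONE" status value and the work-unit count of 100 are hypothetical placeholders, not the real ML schema.

```python
import sqlite3
from datetime import datetime, timezone


def clear_aborted_jobs(conn, final_status="DONE", work_units=100):
    """Rewrite every ABORTED row as finished; return the number of rows changed.

    final_status and work_units are hypothetical values. The report does not
    say what was actually written to these fields.
    """
    now = datetime.now(timezone.utc).isoformat(timespec="seconds")
    cur = conn.execute(
        "UPDATE jobstatus "
        "SET status = ?, endtime = ?, workdone = ?, totalwork = ? "
        "WHERE status = 'ABORTED'",
        (final_status, now, work_units, work_units),
    )
    conn.commit()
    return cur.rowcount


# Stand-in table: only the four field names are taken from the report.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE jobstatus ("
    "jobid INTEGER PRIMARY KEY, status TEXT, endtime TEXT, "
    "workdone INTEGER, totalwork INTEGER)"
)
conn.executemany(
    "INSERT INTO jobstatus (status, endtime, workdone, totalwork) "
    "VALUES (?, ?, ?, ?)",
    [("ABORTED", None, 40, 100), ("PENDING", None, 0, 100)],
)

changed = clear_aborted_jobs(conn)
rows = conn.execute(
    "SELECT status, workdone, totalwork FROM jobstatus ORDER BY jobid"
).fetchall()
print(changed, rows)
```

Only the ABORTED row is touched; the PENDING row is left alone so that the scheduler can pick it up after embedded_web_server is restarted.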

Also, after embedded_web_server is restarted, we sometimes get a "java.lang.OutOfMemoryError: Java heap space" error when running the ml_insert_ctl -list or -status commands. I checked, and embedded_web_server is started with a 4 GB heap space setting. The server itself does not seem to be running out of memory (it has 32 GB installed).




The customer has a 2-core system with 32 GB of memory. The server is not out of space, but it still shows the heap space error.

Since MATE Live is giving problems, they can move to manual collection.

They are currently using manual collection, which takes 22 to 23 minutes and is driven by the snapshot process from cron.
They cannot provide any logs now, as the issue was mitigated using the workaround.

Conditions:
MATE Live 6.0.2. The problem occurs when mld and embedded_web_server are stopped and then started again (stop mld, stop embedded_web_server, start mld, start embedded_web_server) while an ml_insert job is running. The customer system has 2 cores and 32 GB of memory, and embedded_web_server is started with a 4 GB heap space setting.