Cisco Bug: CSCvt55362 - Server may reload due to P_THERMTRIP without any kind of Thermal event

Last Modified

Jun 15, 2020

Products (1)

  • Cisco Unified Computing System

Known Affected Releases

4.0(2c)H 4.0(4b)H 4.0(4e)C

Description (partial)

Symptom:
The server reboots unexpectedly, with log entries similar to the following:

qpi_logger:1478: cpu_util.c:811:Warning: Cannot read QPI CE counter [cpu=33,cc=80,rs=0,port=0], tried 0 time(s)
qpi_logger:1478: cpu_util.c:811:Warning: Cannot read QPI CE counter [cpu=33,cc=80,rs=0,port=0], tried 1 time(s)
qpi_logger:1478: cpu_util.c:811:Warning: Cannot read QPI CE counter [cpu=33,cc=80,rs=0,port=0], tried 2 time(s)
qpi_logger:1478: cpu_util.c:811:Warning: Cannot read QPI CE counter [cpu=33,cc=80,rs=0,port=0], tried 3 time(s)
qpi_logger:1478: cpu_util.c:811:Warning: Cannot read QPI CE counter [cpu=33,cc=80,rs=0,port=0], tried 4 time(s)

selparser.c:727: # EC 05 00 00 01 02 00 00 03 CA 63 5E 2C 60 04 DC 17 00 00 00 75 A0 0D 03 # 5ec | <TIMESTAMP> | ME | Intel ME  #0x17 | FW Status PECI over DMI interface error DMI timeout of PECI request | Asserted

qpi_logger:1478: cpu_util.c:77:Notice: BMC detected correctable QPI error[cpu=30,msc=30,c=1,p=2]
qpi_logger:1478: cpu_util.c:77:Notice: BMC detected correctable QPI error[cpu=31,msc=30,c=1,p=0]
qpi_logger:1478: cpu_util.c:77:Notice: BMC detected correctable QPI error[cpu=31,msc=30,c=1,p=1]
qpi_logger:1478: cpu_util.c:77:Notice: BMC detected correctable QPI error[cpu=31,msc=30,c=1,p=2]
qpi_logger:1478: cpu_util.c:77:Notice: BMC detected correctable QPI error[cpu=32,msc=30,c=1,p=0]
qpi_logger:1478: cpu_util.c:77:Notice: BMC detected correctable QPI error[cpu=32,msc=30,c=1,p=1]
qpi_logger:1478: cpu_util.c:77:Notice: BMC detected correctable QPI error[cpu=32,msc=30,c=1,p=2]

selparser:1388: [[xxxCVxxx]]:selparser.c:727: # ED 05 00 00 01 02 00 00 04 CA 63 5E 33 00 00 13 06 00 00 00 72 82 02 30 # 5ed | <TIMESTAMP> | BIOS | Critical Interrupt #0x06 | UPI Correctable "Link Layer CRC with successful reset with no degradation"CPU0, Port 2 | COR LL Rx detected CRC error: successful LLR without Phy Reinit | Asserted
selparser:1388: [[xxxCVxxx]]:selparser.c:727: # EE 05 00 00 01 02 00 00 04 CA 63 5E 33 00 00 13 06 00 00 00 72 82 10 30 # 5ee | <TIMESTAMP> | BIOS | Critical Interrupt #0x06 | UPI Correctable "Link Layer CRC with successful reset with no degradation"CPU1, Port 0 | COR LL Rx detected CRC error: successful LLR without Phy Reinit | Asserted
selparser:1388: [[xxxCVxxx]]:selparser.c:727: # EF 05 00 00 01 02 00 00 04 CA 63 5E 33 00 00 13 06 00 00 00 72 82 11 30 # 5ef | <TIMESTAMP> | BIOS | Critical Interrupt #0x06 | UPI Correctable "Link Layer CRC with successful reset with no degradation"CPU1, Port 1 | COR LL Rx detected CRC error: successful LLR without Phy Reinit | Asserted
selparser:1388: [[xxxCVxxx]]:selparser.c:727: # F0 05 00 00 01 02 00 00 04 CA 63 5E 33 00 00 13 06 00 00 00 72 82 12 30 # 5f0 | <TIMESTAMP> | BIOS | Critical Interrupt #0x06 | UPI Correctable "Link Layer CRC with successful reset with no degradation"CPU1, Port 2 | COR LL Rx detected CRC error: successful LLR without Phy Reinit | Asserted
selparser:1388: [[xxxCVxxx]]:selparser.c:727: # F1 05 00 00 01 02 00 00 04 CA 63 5E 33 00 00 13 06 00 00 00 72 82 20 30 # 5f1 | <TIMESTAMP> | BIOS | Critical Interrupt #0x06 | UPI Correctable "Link Layer CRC with successful reset with no degradation"CPU2, Port 0 | COR LL Rx detected CRC error: successful LLR without Phy Reinit | Asserted
selparser:1388: [[xxxCVxxx]]:selparser.c:727: # F2 05 00 00 01 02 00 00 04 CA 63 5E 33 00 00 13 06 00 00 00 72 82 21 30 # 5f2 | <TIMESTAMP> | BIOS | Critical Interrupt #0x06 | UPI Correctable "Link Layer CRC with successful reset with no degradation"CPU2, Port 1 | COR LL Rx detected CRC error: successful LLR without Phy Reinit | Asserted
selparser:1388: [[xxxCVxxx]]:selparser.c:727: # F3 05 00 00 01 02 00 00 04 CA 63 5E 33 00 00 13 06 00 00 00 72 82 22 30 # 5f3 | <TIMESTAMP> | BIOS | Critical Interrupt #0x06 | UPI Correctable "Link Layer CRC with successful reset with no degradation"CPU2, Port 2 | COR LL Rx detected CRC error: successful LLR without Phy Reinit | Asserted

fault-engined:-: %CIMC-3-EQUIPMENT_INOPERABLE:[F0174][major][equipment-inoperable][sys/rack-unit-1/board] P_THERMTRIP: Processor module is inoperable due to high temperature: Check cooling
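
The selparser lines in the excerpt above carry a raw hex dump followed by a decoded, pipe-separated record. A minimal sketch of splitting that decoded part into fields, assuming the layout seen in this excerpt (record ID, timestamp, source, sensor, description, state); the names `SelEntry` and `parse_sel_line` are hypothetical and not part of any Cisco tool:

```python
import re
from typing import NamedTuple, Optional


class SelEntry(NamedTuple):
    record_id: int      # hex record ID, e.g. 0x5ec
    source: str         # e.g. "ME" or "BIOS"
    sensor: str         # e.g. "Critical Interrupt #0x06"
    description: str    # free text; may itself contain " | "
    state: str          # e.g. "Asserted"


# Finds the decoded record that follows the raw bytes: "# 5ec | <TIMESTAMP> | ..."
_DECODED = re.compile(r"#\s*([0-9a-fA-F]+)\s*\|(.*)$")


def parse_sel_line(line: str) -> Optional[SelEntry]:
    """Return the decoded fields of a selparser line, or None if it does not fit."""
    if "selparser" not in line:
        return None
    match = _DECODED.search(line)
    if match is None:
        return None
    fields = [part.strip() for part in match.group(2).split("|")]
    # Expected: timestamp, source, sensor, one or more description parts, state.
    if len(fields) < 5:
        return None
    _timestamp, source, sensor = fields[0], fields[1], fields[2]
    description = " | ".join(fields[3:-1])
    return SelEntry(int(match.group(1), 16), source, sensor, description, fields[-1])
```

Applied to the Intel ME line above, this would yield record ID 0x5ec, source "ME", and state "Asserted". The field layout is inferred only from the lines shown here, so other record types may not fit it.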

Conditions:
As of 3/21/20, this issue is understood to affect only 4.0 firmware on C480-M5 series rack units.
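
For triage, the log signature from the Symptom and the platform scope from the Conditions can be checked together. The sketch below is a heuristic under stated assumptions: the marker strings are copied from the excerpt in this bug, the rule "P_THERMTRIP plus at least one QPI/UPI or ME marker" is an assumption rather than a documented criterion, and all function names are hypothetical.

```python
from typing import Dict, Iterable

# Substrings copied from the log excerpt in this bug report.
MARKERS: Dict[str, str] = {
    "qpi_counter_unreadable": "Cannot read QPI CE counter",
    "me_peci_timeout": "PECI over DMI interface error",
    "bmc_qpi_correctable": "BMC detected correctable QPI error",
    "upi_correctable": "UPI Correctable",
    "thermtrip": "P_THERMTRIP",
}


def count_markers(lines: Iterable[str]) -> Dict[str, int]:
    """Count how many log lines contain each marker substring."""
    counts = {name: 0 for name in MARKERS}
    for line in lines:
        for name, text in MARKERS.items():
            if text in line:
                counts[name] += 1
    return counts


def resembles_symptom(counts: Dict[str, int]) -> bool:
    """Assumed rule: a P_THERMTRIP fault accompanied by QPI/UPI or ME errors."""
    other_errors = sum(n for name, n in counts.items() if name != "thermtrip")
    return counts["thermtrip"] > 0 and other_errors > 0


def platform_in_scope(model: str, firmware: str) -> bool:
    """Scope per the Conditions as of 3/21/20: C480-M5 running 4.0 firmware."""
    return "C480-M5" in model.upper() and firmware.strip().startswith("4.0")
```

A match only suggests that a log resembles this bug. A genuine thermal event also raises P_THERMTRIP, so temperature sensor readings should be reviewed separately before drawing a conclusion.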