Cisco Bug: CSCvt24805 - Tomahawk NP issue due to TCAM errors causing the PRM process to get blocked

Last Modified

Sep 02, 2020

Products (11)

  • Cisco ASR 9000 Series Aggregation Services Routers
  • Cisco ASR 9910 Router
  • Cisco ASR 9922 Router
  • Cisco IOS XR Software
  • Cisco ASR 9010 Router
  • Cisco ASR 9904 Router
  • Cisco ASR 9006 Router
  • Cisco ASR 9901 Router
  • Cisco ASR 9001 Router
  • Cisco ASR 9906 Router

Known Affected Releases

6.2.3.BASE

Description (partial)

Symptom:
A Tomahawk LC running a classic XR (32-bit) image has TCAM interface errors that cause the prm_server_to process to become blocked on LC 0/18/CPU0, which affects the line card's ability to operate.

1) A Tomahawk LC running a classic XR (32-bit) image experiences a communication failure between the PRM process and the TCAM.

From prm error trace output:

Jan 16 10:50:35.841 prm_server/error 0/18/CPU0 t1  prm_int_handle_tcam_err: tcam error on NP1, cause 0x0000000c!

Jan 16 10:50:38.823 prm_server/error 0/18/CPU0 t1  tcam_enable_disable_parity: np 1 write TCAM_PARITY_REG1failed: 1, 4098

Jan 16 10:50:38.823 prm_server/error 0/18/CPU0 t1  prm_int_handle_tcam_err: cxr fail to inject errors: 'prm_server' detected the 'warning' condition 'TCAM error encountered.' 
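
When many trace lines have to be checked, the "tcam error on NP" entries can be pulled out with a small script. The following is a minimal Python sketch, assuming the line layout matches the excerpt above; the function name parse_tcam_errors and the output field names are hypothetical. The meaning of the individual cause bits is not given in this bug text, so the sketch only reports which bit positions are set.

```python
import re

# Regex for the "tcam error on NPx" line layout seen in the excerpt above.
TCAM_ERR = re.compile(
    r"(?P<ts>\w{3}\s+\d+\s+[\d:.]+)\s+prm_server/error\s+(?P<node>\S+)\s+\S+\s+"
    r"prm_int_handle_tcam_err:\s+tcam error on NP(?P<np>\d+),\s+"
    r"cause\s+(?P<cause>0x[0-9a-fA-F]+)"
)


def parse_tcam_errors(lines):
    """Return one dict per matching trace line; other lines are ignored."""
    events = []
    for line in lines:
        m = TCAM_ERR.search(line)
        if not m:
            continue
        cause = int(m.group("cause"), 16)
        events.append({
            "timestamp": m.group("ts"),
            "node": m.group("node"),
            "np": int(m.group("np")),
            "cause": cause,
            # Bit positions set in the cause word. Their meaning is not
            # documented here, so they are reported but not named.
            "set_bits": [b for b in range(32) if (cause >> b) & 1],
        })
    return events


if __name__ == "__main__":
    sample = [
        "Jan 16 10:50:35.841 prm_server/error 0/18/CPU0 t1  "
        "prm_int_handle_tcam_err: tcam error on NP1, cause 0x0000000c!",
    ]
    for event in parse_tcam_errors(sample):
        print(event)
```

Lines that do not contain a "tcam error on NP" message, such as the tcam_enable_disable_parity entry, are skipped rather than reported.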

2) All of the processes that get blocked in this scenario are related to prm_server_to, and they face communication issues with the VIC card.

The communication path involved is:   LC FIB -> PRM -> NP -> TCAM
 
The prm_server_to process is blocked at location 0/18/CPU0:

 RP/0/RP0/CPU0:test# show processes blocked location 0/18/CPU0 

Thu Jan 16 10:50:28.170 BKK

  Jid       Pid Tid            Name State   TimeInState    Blocked-on
  180    172111   1        fab_xbar Reply    0:00:00:0000       1  node 0/RP0/CPU0 kernel
  304    172113   1   prm_server_to Reply    0:00:00:0000       1  node 0/RP0/CPU0 kernel
  304    172113  18   prm_server_to Mutex    0:00:00:0187  172113-21 #1 
  304    172113  21   prm_server_to Mutex    0:00:00:0000  172113-01 #1 
  294    192627   1 pifibm_server_lc Reply    0:00:11:0542  172113  prm_server_to
  366    237738   1         vic_0_0  Send    0:00:02:0900  172113  prm_server_to
  366    237738  12         vic_0_0 Reply    0:00:01:0303  172113  prm_server_to
  366    237738  14         vic_0_0 Mutex    0:00:02:0918  237738-01 #1 
  366    237738  15         vic_0_0 Mutex    0:00:02:0923  237738-01 #1 
  379    237739   1         vic_1_0  Send    0:00:01:0328  172113  prm_server_to
  379    237739  12         vic_1_0  Send    0:00:01:0338  172113  prm_server_to
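
To see at a glance which process the others are waiting on, the output above can be grouped by the Blocked-on target. The following is a minimal Python sketch, assuming the column layout shown in this excerpt (Jid, Pid, Tid, Name, State, TimeInState, Blocked-on); the function names parse_blocked and waiters_by_target are hypothetical. Rows blocked on a remote node's kernel, and threads waiting on a mutex held inside their own process, are left out of the grouping.

```python
from collections import defaultdict

# A few rows copied from the excerpt above.
SAMPLE = """\
  Jid       Pid Tid            Name State   TimeInState    Blocked-on
  304    172113   1   prm_server_to Reply    0:00:00:0000       1  node 0/RP0/CPU0 kernel
  304    172113  18   prm_server_to Mutex    0:00:00:0187  172113-21 #1
  294    192627   1 pifibm_server_lc Reply    0:00:11:0542  172113  prm_server_to
  366    237738   1         vic_0_0  Send    0:00:02:0900  172113  prm_server_to
  366    237738  14         vic_0_0 Mutex    0:00:02:0918  237738-01 #1
  379    237739   1         vic_1_0  Send    0:00:01:0328  172113  prm_server_to
"""


def parse_blocked(text):
    """Parse data rows into dicts; headers, prompts and timestamps are skipped."""
    rows = []
    for line in text.splitlines():
        f = line.split()
        # Data rows start with three numeric fields: Jid, Pid, Tid.
        if len(f) < 7 or not all(x.isdigit() for x in f[:3]):
            continue
        rows.append({
            "pid": int(f[1]),
            "tid": int(f[2]),
            "name": f[3],
            "state": f[4],
            "time_in_state": f[5],
            # Blocked-on is either "<pid>" or "<pid>-<tid>" in this excerpt.
            "target_pid": int(f[6].split("-")[0]),
            # "node <location> kernel" marks a wait on another node.
            "remote": len(f) > 7 and f[7] == "node",
        })
    return rows


def waiters_by_target(rows):
    """Map each local target pid to the names of other processes waiting on it."""
    waiting = defaultdict(set)
    for r in rows:
        if r["remote"] or r["target_pid"] == r["pid"]:
            continue
        waiting[r["target_pid"]].add(r["name"])
    return dict(waiting)


if __name__ == "__main__":
    for pid, names in waiters_by_target(parse_blocked(SAMPLE)).items():
        print(pid, sorted(names))
```

On the rows above, the only target that other processes wait on is pid 172113, which is prm_server_to.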


 3) The NP controller interrupts also show MEM_ERR_SINGLE and MEM_ERR_DOUBLE callback values, which also point to a memory issue:
1                 MEM_ERR_SINGLE   24       0      0  N   Y  434c492  --------------
1                 MEM_ERR_DOUBLE   25       0      0  N   Y  434c580  --------------

From prm trace:
Oct 28 13:20:42.934 prm_server/error 0/16/CPU0 t1 prm_int_handle_tcam_err: tcam error on NP1, cause 0x00000008! >>> This is similar to the error seen in this issue and indicates a TCAM interface error.

The NPdrvrlog file shows errors like the following:

ERROR! 0x80001755 EZapiPrm_ExtTCAMCommand: EZprmCAMC_ExtTCAMCommand failed, channel Id 1. in file 'drivers/chips/np/ezchip-5c/src/host/driver/src/api/EZapiPrm.c' line 1007
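
To count such driver errors per channel or per source line, the entry above can be split into fields. The following is a minimal Python sketch, assuming every entry follows the exact layout of this one line; the function name parse_npdrv_error and the field names are hypothetical.

```python
import re

# Layout assumed from the single NPdrvrlog line quoted above.
NPDRV_ERR = re.compile(
    r"ERROR!\s+(?P<code>0x[0-9a-fA-F]+)\s+(?P<func>\w+):\s+(?P<msg>.*?)"
    r"\s+in file '(?P<file>[^']+)' line (?P<line>\d+)"
)


def parse_npdrv_error(entry):
    """Return the fields of one error entry, or None if it does not match."""
    m = NPDRV_ERR.search(entry)
    if not m:
        return None
    chan = re.search(r"channel Id (\d+)", m.group("msg"))
    return {
        "code": int(m.group("code"), 16),
        "function": m.group("func"),
        "message": m.group("msg"),
        # The channel is optional: not every message may carry one.
        "channel": int(chan.group(1)) if chan else None,
        "file": m.group("file"),
        "line": int(m.group("line")),
    }


if __name__ == "__main__":
    entry = (
        "ERROR! 0x80001755 EZapiPrm_ExtTCAMCommand: EZprmCAMC_ExtTCAMCommand "
        "failed, channel Id 1. in file "
        "'drivers/chips/np/ezchip-5c/src/host/driver/src/api/EZapiPrm.c' line 1007"
    )
    print(parse_npdrv_error(entry))
```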

Conditions:
A Tomahawk LC running a classic XR version that is experiencing TCAM <=> NP communication issues.