Cisco Bug: CSCun46610 - Mismatch errors with vma/vmr real memory

Last Modified

Nov 17, 2016

Products (1)

  • Cisco Nexus 7000 Series Switches

Known Affected Releases

1

Description (partial)

Symptom:
RTL and REF data mismatch.

Conditions:
releasing delay = 0 
=============
runll -tn test_random_enq_deq -seed 923869321 -uq -vo "+define+VMA +define+VMA_MEM +num_enq=90000 +num_deq=90000 +dbg=1 +dbg_ref=1 +define+AV_MEM28_GBL_BLACKOUT=1" -dump

Log:
/auto/nvbu-asic20/users/icozzani/voq/2014_02_21/ip_waverider/sim/linked_list/vma_vmr/runs/test_random_enq_deq_923869321_vma_90k

@  692402.695 ns: [ERROR] [llq_ref_t::check] Data mismatch (qid(0 = 0x0) desc(0x57) nxt_ptr(0x648) prob(1) ipg(-1)) EXP (qid(0 = 0x0) desc(0x23) nxt_ptr(0x93) prob(1) ipg(-1))

RTL is sending back
(qid(0 = 0x0) desc(0x57) nxt_ptr(0x648)

and REF is expecting
(qid(0 = 0x0) desc(0x23) nxt_ptr(0x93)

Note that one enqueue request was sent to qid 0 with desc(0x23) and nxt_ptr(0x93):
@  657362.517 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x23) nxt_ptr(0x93) prob(1) ipg(4)


However, no enqueue request was ever sent to qid 0 with desc(0x57) and nxt_ptr(0x648). Why does the RTL read an entry with desc(0x57) and nxt_ptr(0x648) out of qid(0)?

The following enqueues to qid 0 used desc(0x57):
@  103132.207 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x533) prob(1) ipg(4)
@  117804.873 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x28) prob(1) ipg(7)
@  198133.017 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x539) prob(1) ipg(7)
@  344021.925 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x579) prob(1) ipg(2)
@  456566.169 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x2e0) prob(1) ipg(5)
@  488148.619 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x2a8) prob(1) ipg(6)
@  507343.545 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x604) prob(1) ipg(9)
@  515423.583 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x1e5) prob(1) ipg(7)
@  515810.443 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x62a) prob(1) ipg(8)
@  519470.939 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0xa4) prob(1) ipg(0)
@  543464.263 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x18c) prob(1) ipg(1)
@  548493.443 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x6d0) prob(1) ipg(10)
@  548752.239 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x246) prob(1) ipg(6)
@  668369.351 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x244) prob(1) ipg(3)

and the following used nxt_ptr(0x648):
@  189300.603 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x7f) nxt_ptr(0x648) prob(1) ipg(5)
@  302946.731 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x41) nxt_ptr(0x648) prob(1) ipg(6)
@  473201.149 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x42) nxt_ptr(0x648) prob(1) ipg(6)
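
The manual search above (was this desc/nxt_ptr pair ever enqueued?) could be automated with a short script. The following is a minimal Python sketch, assuming the log lines have exactly the llq_enq_drv_t::run format quoted in this report; the helper names parse_enqueues and was_enqueued are hypothetical and not part of the testbench.

```python
import re

# Matches the "Driving ENQ" lines quoted in this report (format assumed from the log excerpts).
ENQ_RE = re.compile(
    r"@\s*(?P<time>[\d.]+) ns: \[MSG\] \[llq_enq_drv_t::run\] Driving ENQ .*?"
    r"qid\((?P<qid>\d+) = 0x[0-9a-f]+\) desc\((?P<desc>0x[0-9a-f]+)\) "
    r"nxt_ptr\((?P<nxt>0x[0-9a-f]+)\)"
)

def parse_enqueues(lines):
    """Return (time_ns, qid, desc, nxt_ptr) for every enqueue line found."""
    out = []
    for line in lines:
        m = ENQ_RE.search(line)
        if m:
            out.append((float(m["time"]), int(m["qid"]),
                        int(m["desc"], 16), int(m["nxt"], 16)))
    return out

def was_enqueued(enqs, qid, desc, nxt_ptr):
    """True if this exact (qid, desc, nxt_ptr) combination was ever driven."""
    return any(q == qid and d == desc and n == nxt_ptr for _, q, d, n in enqs)

# Three lines copied from the log excerpts in this report.
sample = [
    "@  657362.517 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x23) nxt_ptr(0x93) prob(1) ipg(4)",
    "@  103132.207 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x57) nxt_ptr(0x533) prob(1) ipg(4)",
    "@  189300.603 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(0 = 0x0) desc(0x7f) nxt_ptr(0x648) prob(1) ipg(5)",
]
enqs = parse_enqueues(sample)
print(was_enqueued(enqs, 0, 0x23, 0x93))   # the pair REF expects
print(was_enqueued(enqs, 0, 0x57, 0x648))  # the pair RTL returned
```

Run against the full log, the same check would confirm whether desc(0x57) and nxt_ptr(0x648) only ever appear in separate enqueues, as the excerpts above suggest.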



releasing delay = 10
==============
runll -tn test_random_enq_deq -seed 923869321 -uq -vo "+define+VMA +define+VMA_MEM +num_enq=90000 +num_deq=90000 +free_release_delay=10 +dbg=1 +dbg_ref=1 +define+AV_MEM28_GBL_BLACKOUT=1" -dump


Log:
/auto/nvbu-asic20/users/icozzani/voq/2014_02_21/ip_waverider/sim/linked_list/vma_vmr/runs/test_random_enq_deq_923869321_vma_10dly


[ERROR] [llq_ref_t::check] Data mismatch (qid(15 = 0xf) desc(0x49) nxt_ptr(0xa) prob(1) ipg(-1)) EXP (qid(15 = 0xf) desc(0x55) nxt_ptr(0x7) prob(1) ipg(-1))

The test sends two enqueues to qid 15:
enq_drv:
@     110.055 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(15 = 0xf) desc(0x55) nxt_ptr(0x7) prob(1) ipg(5)
@     140.737 ns: [MSG] [llq_enq_drv_t::run] Driving ENQ w req.prob = 1 req = qid(15 = 0xf) desc(0x49) nxt_ptr(0xa) prob(1) ipg(2)

ref:
@     111.389 ns: [MSG] [llq_ref_t::process_enq_cmd] Received ENQ qid(15 = 0xf) desc(0x55) nxt_ptr(0x7) prob(1) ipg(-1)
@     142.071 ns: [MSG] [llq_ref_t::process_enq_cmd] Received ENQ qid(15 = 0xf) desc(0x49) nxt_ptr(0xa) prob(1) ipg(-1)

It seems that the RTL reads out the second entry before the first one.

[YH]: From the waveform, the first dequeue on qid 15 occurs at 136.269 ns, but it seems the ENV failed to catch that one.
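
If the environment did miss the first dequeue, a per-queue FIFO reference model would report exactly the mismatch logged above. The following Python sketch illustrates that reasoning only; the class FifoRef and its methods are hypothetical and are not the actual llq_ref_t implementation.

```python
from collections import defaultdict, deque

class FifoRef:
    """Toy per-qid FIFO reference model (hypothetical, for illustration)."""

    def __init__(self):
        self.queues = defaultdict(deque)

    def enq(self, qid, desc, nxt_ptr):
        self.queues[qid].append((desc, nxt_ptr))

    def check_deq(self, qid, desc, nxt_ptr):
        """Compare an observed dequeue with the head of the reference queue."""
        expected = self.queues[qid].popleft()
        return expected == (desc, nxt_ptr), expected

ref = FifoRef()
ref.enq(15, 0x55, 0x7)   # first enqueue seen by the reference model
# The RTL dequeues (0x55, 0x7) here, but the monitor never reports it to ref.
ref.enq(15, 0x49, 0xa)   # second enqueue seen by the reference model
ok, expected = ref.check_deq(15, 0x49, 0xa)  # first dequeue the monitor does see
print(ok, [hex(v) for v in expected])
```

In this scenario the RTL output is in order, and the reported error comes from the reference model still holding the entry whose dequeue was not observed.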




[icozzani] Update 3/5/2014
=============
Ying changed the deq_mon to add a 10-cycle delay.
The VMA test passes.
I tried several runs and they all pass.
I have parameterized the delay to use 10 for VMA and 8 for VMR.
Both VMA and VMR pass with both 8-cycle and 10-cycle delays.
Ying said that any value > 5 is good for VMA and VMR.
I retried with delay=0 and delay=4 and all tests still pass, although they should fail for delay<5.
Still under investigation by Ying.

[YH]: 3/7/2014
The command to reproduce the failure should be:
runll  -tn test_random_enq_deq -uq -seed 923869321 -- +define+VMA +define+VMA_MEM +num_enq=2000000 +num_deq=2000000 +free_release_delay=NN +dbg=0 +dbg_ref=0 +define+AV_MEM28_GBL_BLACKOUT=1 
When NN=0, 1, or 2, the test failed. The failure was verified to be caused by the page being recycled too fast.
When NN>=3, the test passed.
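
The effect of the release delay can be pictured with a toy free pool in which a released page only becomes allocatable again after a fixed number of cycles. This is a minimal Python sketch under stated assumptions: the class DelayedFreePool, its methods, and the cycle-based timing are hypothetical and do not reproduce the deq_mon or the +free_release_delay implementation.

```python
from collections import deque

class DelayedFreePool:
    """Toy free pool: released pages are held back for release_delay cycles."""

    def __init__(self, pages, release_delay):
        self.free = deque(pages)
        self.pending = deque()          # (ready_cycle, page), in release order
        self.release_delay = release_delay

    def release(self, page, now):
        # The page may not be handed out again before now + release_delay.
        self.pending.append((now + self.release_delay, page))

    def alloc(self, now):
        # Move every page whose hold time has expired into the free list.
        while self.pending and self.pending[0][0] <= now:
            self.free.append(self.pending.popleft()[1])
        return self.free.popleft() if self.free else None

pool = DelayedFreePool(pages=[], release_delay=3)
pool.release(0x93, now=10)
early = pool.alloc(now=11)   # still held back, nothing to allocate
late = pool.alloc(now=13)    # hold time expired, page is reusable
print(early, hex(late))
```

With release_delay=0 the same page would be reusable in the cycle it was released, which is the "recycled too fast" situation described above.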

[icozzani] 03/07
=============
When I run with delay=1 and 2 million enq/deq, the test still passes:
runll -tn test_random_enq_deq -seed 923869321 -uq -vo "+define+VMR +define+VMR_MEM +num_enq=2000000 +num_deq=2000000 +free_release_delay=1 +dbg=0 +dbg_ref=0 +define+AV_MEM28_GBL_BLACKOUT=1" -dump

log at
/auto/nvbu-asic20/users/icozzani/voq/2014_02_21/ip_waverider/sim/linked_list/vma_vmr/runs/test_random_enq_deq_923869321_vma_2Mil_dly1_pass

[YH]: 3/10/2014
The command to reproduce the failure should be:
runll  -tn test_random_enq_deq -uq -seed 923869321 -- +define+VMA +define+VMA_MEM +num_enq=2000000 +num_deq=2000000 +free_release_delay=NN +dbg=0 +dbg_ref=0 +define+AV_MEM28_GBL_BLACKOUT=1 
When NN=0, 1, or 2, the test failed. The failure was verified to be caused by the page being recycled too fast.
When NN>=3, the test passed.
To generate a failing case for VMR, you should use a different seed.


[icozzani] 3/12/2014
==============
VMA passes for any delay >=3 and fails for delay=0,1,2.
VMR passes with any value of delay from 0 to 10.
VMR also passes with the old deq_mon that does not have the delayed release of pages to the free pool.

I will close the case. If future regressions show any new failure we will investigate further.