Guest

Preview Tool

Cisco Bug: CSCvu56463 - Excessive wait times during tunnel provisioning on ASR and CGR

Last Modified

Jun 18, 2020

Products (1)

  • Network Level Service

Known Affected Releases

4.5(1.11)

Description (partial)

Symptom:
When CGR establishes the netconfig session with ASR. ASR tries to acquire a lock it does "4 commands" & we debugged the commands as below (REFER to Condition section for the records of time ) : 
Show interface
Show ip nhrp 
show ipv6 nhrp
UpdateNetInterface

From the 3rd to 4th command it will take time & that's where lock takes more time.


Recommendations to fix this issue: 
+The ASR shouldn't be touched for FlexVPN tunnels
+Given we touch the ASR, need to fix bug of 20 minutes wait in between the 3rd and 4th show command (e.g. The app processing thread holds the lock where other app servers cant process CGRs behind the same ASR)
+If FND reg fails twice, don't roll back to before tunnel config ... Needs to stay with it's tunnels up until someone can figure out the issue.
+If FND tunnel prov fails four times don't start processing it again if it's already in process by another app server.

Conditions:
1/28/2020 - JAD20240C63
9:29:39AM - Tunnel Prov request (app1)
9:29:41AM - In processing thread
9:29:46AM - Finished a bunch of show commands (4.804s)
9:29:46AM - Check if before-tunnel-config and before-registration-config exists on flash (~2.3s) (both didn't exist)
9:29:55AM - Created before-tunnel-config (6.816s)
9:39:39AM - Received another tunnel prov request (set to come in every 10 minutes) (app4)
9:43:45AM - Acquired lock on ASR (CDC-UOTF-AGG1) (13.852min)
9:43:45AM - Opened NETCONF session with ASR (~1.3s)
9:43:45AM - Sent to ASR "show interfaces | include Description: | Encapsulation | address is | line protocol | packets input, | packets output, | Tunnel protection | Tunnel protocol| Tunnel source]"
9:44:03AM - Received response from show command (17.1s)
9:44:03AM - Sent to ASR "show ip nhrp | include ^[0-9A-F]| Tunnel| NBMA]"
9:44:03AM - Received response from show command (~0.5s)
9:44:05AM - Sent to ASR "show ipv6 nhrp | include ^[0-9A-F]"
9:44:05AM - Received response from show command (~0.6s)
9:49:41AM - Received another tunnel prov request (set to come in every 10 minutes) (app4 again)
9:59:49AM - Acquired lock on CGR (CGR1240/K9+JAD20240C63) (20.156min)
9:59:49AM - Added to processing queue again and starts processing instead of pushing tunnels down!
9:59:55AM - Finished a bunch of show commands (5.011s)
9:58:58AM - Check if before-tunnel-config and before-registration-config exists on flash (~2.3s) (Only before-tunnel-config exist, which FND created at 9:29:55AM)
9:59:58AM - FND decides to apply the before-tunnel-config File [flash:/before-tunnel-config] exists on element [CGR1240/K9+JAD20240C63], indicating configuration has already been applied. To avoid conflicts, existing configuration will be removed before re-applying configuration
10:00:19AM - Acquired lock on CGR (CGR1240/K9+JAD20240C63) (14.726s)
10:00:19AM - Added to processing queue again and starts processing instead of pushing tunnels down!
10:00:25AM - Finished a bunch of show commands (6.765s)
10:00:27AM - Check if before-tunnel-config and before-registration-config exists on flash (~2.6s) (Only before-tunnel-config exist, which FND created at 9:29:55AM)
10:00:27AM - File [flash:/before-tunnel-config] exists on element [CGR1240/K9+JAD20240C63], indicating configuration has already been applied. To avoid conflicts, existing configuration will be removed before re-applying configuration
10:18:28AM - Several show commands ran against ASR again, no Acquired lock statement, so guessing this processing thread is holding the ASR lock
10:18:28AM - Obtained current configuration of element [CDC-UOTF-AGG1] before tunnel provisioning. Time to obtain [967020 ms]
10:18:31AM - Released IPv4 and IPv6 leases back to DHCP server
10:18:33AM - Get IPv6 and IPv4 leases again from DHCP servers
10:18:33AM - Sends tunnel template down via "Processed tunnel template for element"
10:18:45AM - Gets following error => Tunnel provisioning request for element [CGR1240/K9+JAD20240C63] failed [java.io.IOException: java.io.IOException: Unable to configure tunnels on [CGR1240/K9+JAD20240C63]. [invalid configuration commands supplied [BAD_PARAMETER][crypto ikev2 proposal IOS_IKEv2_Proposal]. [PARSE_ERROR_NOMATCH]] Sent [[interface Loopback0, ip address 10.205.52.145 255.255.255.255, ipv6 address fd00:abd:0:0:0:0:0:14d/128, exit, crypto ikev2 proposal IOS_IKEv2_Proposal, encryption aes-cbc-256, group 14, integrity sha256, exit, crypto ikev2 policy IOS_IKEv2_Policy, proposal IOS_IKEv2_Proposal, exit, crypto

12:55PM - Found express config on CGR was missing tunnel license, fixed and reloaded router

1:04:14PM - Tunnel Prov request (app1)

1:09:04PM - Get tunnel provisioning success but nothing shows in the log

1:20:08PM - Request from (CN=CGRJAD20240C63.uof.duke-energy.com, SERIALNUMBER=PID:CGR1240/K9 SN:JAD20240C63) did not come from proxy (CN=lcdcfndtpsp01.duke-energy.com, O=Duke Energy Corporation, L=Charlotte, ST=North Carolina, C=US)
1:20:08PM - Received registration request from element: [CGR1240/K9+JAD20240C63]
1:20:08PM - Processes a bunch of show commands
1:20:08PM - Element CGR1240/K9+JAD20240C63 is running supported firmware version 15.6(3)M3.  Continuing with element configuration
1:20:08PM - Mesh support is disabled for the CGR CGR1240/K9+JAD20240C63 skip processing wpan commands
1:20:08PM - Interface [Wpan2/1] is not the detected master wpan. Skipping processing advanced wpan commands
1:20:14PM - Processes a bunch of other show commands
1:20:17PM - Retrieved status of file [flash:/before-registration-config] on [CGR1240/K9+JAD20240C63]. File exists. Time to determine status [2.075 s]
1:20:17PM - File [flash:/before-registration-config] exists on element [CGR1240/K9+JAD20240C63], indicating configuration has already been applied. To avoid conflicts, existing configuration will be removed before re-applying configuration
1:20:22PM - Reloading element CGR1240/K9+JAD20240C63 in order to complete configuration, Rolling back configuration
1:49:41PM - Received registration request from element: [CGR1240/K9+JAD20240C63]
1:49:50PM - Does a bunch of show commands
1:49:56PM - FND must have deleted the before-reg config before reload, now doesnt exist and it recreates it  Copied running-config of [CGR1240/K9+JAD20240C63] to [flash:/before-registration-config]. Time to copy [5.916 s]
1:50:25PM - Get error - Error during registration process: java.io.IOException: [could not undo changes, configuration is in an unknown state [ROLLBACK_FAILED][no ip sla schedule 1 life forever start-time now]. [PARSE_ERROR_NOERR]] Sent [[cgna gzip, cgna profile cg-nms-periodic, gzip interval 60, add-command show version | format flash:/managed/odm/cg-nms.odm, add-command show environment temperature | format flash:/managed/odm/cg-nms.odm, add-command show hosts | format flash:/managed/odm/cg-nms.odm, add-command show interfaces | format flash:/managed/odm/cg-nms.odm, add-comma

2:23PM - Found express config was missing the following section:
//We could add this to the tunnel template to prevent this from happening
ip sla 1
 icmp-echo 10.200.200.30
 frequency 3600
 timeout 6000
ip sla schedule 1 life forever start-time now

1:51:42PM - After failing FND registration twice it then rolls back to before tunnel

2:28:48PM - Received tunnel provisioning request from [CGR1240/K9+JAD20240C63]. Rate of incoming requests [1/min] (another happened earlier 10 minutes prior.. starting analysis here, moved request to 60 minutes)
2:38:34PM - Acquired lock on element [CDC-UOTF-AGG1]. Time to acquire lock [19.485 min]
2:38:54PM - Opened NETCONF session with CDC-UOTF-AGG1, did a bunch of show commands, ends with successful response to comamnd "show ipv6 nhrp | include ^[0-9A-F]"
2:57:49PM - Huge gap in time before FND sends the next show command!!!!  Maybe this is why it locks for so long.
2:57:49PM - Sending [show ipv6 interface | include address | protocol | subnet] to element [CDC-UOTF-AGG1]
2:58:01PM - Gets response from show command
2:58:07PM - Obtained current configuration of element [CDC-UOTF-AGG1] before tunnel provisioning. Time to obtain [1171309 ms].
Bug details contain sensitive information and therefore require a Cisco.com account to be viewed.

Bug Details Include

  • Full Description (including symptoms, conditions and workarounds)
  • Status
  • Severity
  • Known Fixed Releases
  • Related Community Discussions
  • Number of Related Support Cases
Bug information is viewable for customers and partners who have a service contract. Registered users can view up to 200 bugs per month without a service contract.