Thursday, 6 July 2017
Tuesday, 27 June 2017
DB issue.
Well, well, well... where do I start? It's like the scariest word on planet Earth for me. One time I saw some unusual behavior on our cluster and told my architect, and he went, "Please don't give me a heart attack." Lol
The project I am working on has a supercluster. We have 10 subscribers, 1 publisher, and 2 TFTP servers. Not bad, huh! For more than a year we have been haunted by this DB issue now and then, and we have spent countless hours, weeks, and weekends on calls with Cisco trying to find the root cause and fix it.
Well, we were able to fix the database temporarily, but we were never lucky enough to find a valid root cause. I will try to tell the story from the beginning another time, but for now we will just have to stick with what happened next.
Email copied from the TAC findings; names have been changed for security reasons.
Problem Description:
Phones are in a rejected state when they fail over to the secondary node
Action Plan:
As discussed, please find the summary of the Webex we had:
++ If the phone is associated to the device pool with only secondary node, it registers fine
++ If the phone is associated to the device pool with primary and secondary node, phone fails over fine with secondary
++ Issue is only with one particular device pool
++ Took CCM traces, App and Sys logs, pcaps from secondary node, TFTP logs
++ From the pcaps, the phone sends a register request and receives a 404 Not Found from the CUCM, as if it were not present in the database
Warning: 399 XYZ-CUCMSUB-01 "Unable to find device/user in database"
++ With a SQL query, we can see that the phone is present in the database
++ Checked replication; it's fine
++ CCM traces show that the phone is in the DB, but the CallManager node it is registering to is reported as not a member of the CM group of that device pool
81068379.007 |15:08:57.507 |AppInfo |Device=SEP123456789 in DB already but cannot register. isDeviceNameAllowedToRegister=CallManager Pkid(3a5-880) is not a member of Call Manager Group(PPP-CMG) (isCallManagerMemberOfDevicePool)
++ Created a new DP with all the same subscribers in it
++ The phone registers fine with the secondary node if the primary is down
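When more than one test phone is involved, it helps to pull every "in DB already but cannot register" line out of the CCM traces rather than reading them by eye. Below is a minimal Python sketch of that. The regular expression is modelled only on the single trace line quoted above, so treat its exact wording as an assumption for other CUCM versions. The scan_trace helper and its file path are hypothetical.

```python
import re
from typing import List, NamedTuple, Optional


class RegistrationReject(NamedTuple):
    device: str      # e.g. SEP123456789
    ccm_pkid: str    # pkid of the node the phone tried to register to
    cm_group: str    # CM group that node was not found in


# Modelled on the one trace line quoted in the TAC summary (assumption: other
# CUCM versions may word this message differently).
REJECT_RE = re.compile(
    r"Device=(?P<device>\S+) in DB already but cannot register\..*?"
    r"CallManager Pkid\((?P<pkid>[^)]*)\) is not a member of "
    r"Call Manager Group\((?P<group>[^)]*)\)"
)


def parse_reject(line: str) -> Optional[RegistrationReject]:
    """Return device, node pkid and CM group from a reject trace line, else None."""
    m = REJECT_RE.search(line)
    if m is None:
        return None
    return RegistrationReject(m["device"], m["pkid"], m["group"])


def scan_trace(path: str) -> List[RegistrationReject]:
    """Collect every registration reject in a downloaded CCM trace file (hypothetical helper)."""
    with open(path, encoding="utf-8", errors="replace") as fh:
        return [r for r in map(parse_reject, fh) if r is not None]
```

With the CM group name in hand, you can check straight away whether the node pkid really is missing from that group in the database, or whether only a service's cached view of it is stale.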
Based on these observations, the error above appears to have been caused by the RIS Data Collector (RIS DC) service holding incorrect information about the phone that was trying to register (we were testing with a single phone). Even though the phones were not registered to the first subscriber in the group, RIS DC assumed they were still using the old node in the CCM group. This is why we see "SEP123456789 in DB already but cannot register".
After creating the new CM group, we noticed that the subscriber showed an incorrect status for the phones, while the Publisher now showed the correct status. Since RIS DC is the interface to the in-memory data shared between CCM and Tomcat, it looked like Tomcat was not picking up the correct entries that RIS DC is supposed to provide.
Action Plan –
++ Restart RIS DC, Tomcat and CCM service for that node
++ For further RCA, please collect detailed logs:
1) CUCM
2) TFTP
3) RIS
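The first action item can be done from the affected node's admin CLI with `utils service restart`. This Python snippet only prints those lines in the order TAC gave them, so they can be pasted one at a time. It does not connect to anything. The service names are my assumption of how they appear in `utils service list`, so check them on your release first.

```python
from typing import Iterable, List

# Assumed service names as shown by "utils service list" on CUCM; verify first.
RESTART_ORDER = ("Cisco RIS Data Collector", "Cisco Tomcat", "Cisco CallManager")


def restart_commands(services: Iterable[str] = RESTART_ORDER) -> List[str]:
    """Return the CLI lines to run on the affected node, in the action-plan order."""
    return [f"utils service restart {name}" for name in services]


if __name__ == "__main__":
    for cmd in restart_commands():
        print(cmd)
```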
Sunday, 21 May 2017
CUCM, phone and RTMT troubleshooting commands
CUCM:
CUCMwiki Click here
CUCM quick CLI Click here
utils diagnose test
utils ntp status
show process load cpu
show process load memory
show process using-most cpu
show process using-most memory
utils core active list
IP PHONE:
To view a phone's configuration file in a web browser: http://CUCM:6970/MAC.cnf.xml
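Fetching the same file from a script makes it easy to diff two phones. Here is a small Python sketch using only the standard library. The host name is a placeholder. It also assumes the file is named after the phone's device name, such as SEP123456789 from the TAC trace, which is what MAC.cnf.xml stands for in the URL above.

```python
import urllib.request

TFTP_HTTP_PORT = 6970  # port used in the URL above


def cnf_url(tftp_host: str, device_name: str) -> str:
    """Build the URL of a phone's configuration file on the TFTP node."""
    return f"http://{tftp_host}:{TFTP_HTTP_PORT}/{device_name}.cnf.xml"


def fetch_cnf(tftp_host: str, device_name: str, timeout: float = 5.0) -> str:
    """Download the configuration file as text (needs network access to the node)."""
    with urllib.request.urlopen(cnf_url(tftp_host, device_name), timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

For example, `fetch_cnf("cucm-tftp.example.com", "SEP123456789")` would request `http://cucm-tftp.example.com:6970/SEP123456789.cnf.xml`, where the host name is hypothetical.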
CUBE:
9 Key CUBE Command Click here
VOICE commands Click here
MGCP Tshoot - Click Here
CUBE wiki Click here
RTMT:
#show risdb query phone Click Here
UCCX:
Finesse logs can be collected directly from the web; see:
https://supportforums.cisco.com/discussion/13212531/finesse-logs-user-and-server
MRA and B2B
MRA and B2B Click here
Excellent UC CLI command quick reference Click Here
Tuesday, 16 May 2017
Project home ASAv
http://www.cisco.com/c/en/us/support/docs/ip/layer-two-tunnel-protocol-l2tp/200340-Configure-L2TP-Over-IPsec-Between-Window.html
Sunday, 7 May 2017
Thursday, 4 May 2017
ITSP CUBE SIP Loop
Problem:
1) The call invite comes into the CUBE from ITSP.
2) A call invite is sent from the CUBE to the CUCM.
3) CUCM looks up the number, finds that it is unassigned and replies to the CUBE with a SIP 404 (Not Found).
4) The CUBE looks for alternative matches in the dial-peers and matches against an outbound rule.
5) A call invite is then sent outbound from the CUBE to ITSP.
6) The ITSP sends the invite back into the CUBE.
7) The CUBE detects based upon the SIP call ID that it is a duplicate call.
8) The outbound call on the CUBE is dropped, and the CUBE replies to the inbound call with a 504 (in RFC 3261, 504 is Server Time-out; Server Internal Error is 500).
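The eight steps above reduce to one question: does a 404 from CUCM make the CUBE hunt to another dial-peer? The toy Python model below shows that a single flag decides between a clean 404 and a looped call that dies on the duplicate Call-ID. It is a sketch of the logic only and not how IOS is implemented. The function name, the Call-ID value and the use of one flag for "hunting on unassigned number" are all assumptions.

```python
from typing import Optional, Set

CUCM_UNASSIGNED = 404  # step 3: CUCM rejects the unassigned number
LOOP_REJECT = 504      # step 8: response code reported in the steps above


def cube_answer(call_id: str, hunt_on_unassigned: bool,
                active: Optional[Set[str]] = None) -> int:
    """Return the SIP code the CUBE sends back to the ITSP (toy model)."""
    active = set() if active is None else active
    # Step 7: the same Call-ID arriving again means the INVITE looped back.
    if call_id in active:
        return LOOP_REJECT
    active.add(call_id)
    # Steps 2-3: forward to CUCM, which answers 404 for the unassigned number.
    response = CUCM_UNASSIGNED
    if hunt_on_unassigned:
        # Steps 4-6: the 404 triggers hunting, the outbound dial-peer sends the
        # INVITE to the ITSP, and the ITSP routes it straight back to the CUBE.
        # Step 8: that looped leg is rejected, so the original call fails too.
        response = cube_answer(call_id, hunt_on_unassigned, active)
    return response
```

`cube_answer("abc@itsp", True)` gives 504, while `cube_answer("abc@itsp", False)` passes the 404 straight back to the ITSP.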
Solution:
By default, Cisco CUCM and CUBE do not necessarily drop a call when they receive a 486 Busy Here, a 404 Not Found, or an out-of-bandwidth condition. CUCM reroutes for all cause codes other than Out of Bandwidth, User Busy, and Unallocated Number.
For CUCM, the rerouting decision for those three cause codes is made by Cisco CallManager service parameters under Clusterwide Parameters (Route Plan): Stop Routing on Out of Bandwidth Flag, Stop Routing on User Busy Flag, and Stop Routing on Unallocated Number Flag.
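Put as code, the CUCM rule above is "reroute unless the cause is one of the three, in which case ask the matching flag". The Python sketch below is a toy model of that rule. The short cause keys, including "no_circuit_available" in the usage, are made-up labels. The defaults of the flags are not modelled, so pass in the values configured on your cluster.

```python
from typing import Dict

# The three Clusterwide Parameters (Route Plan) named above, keyed by a short cause label.
STOP_ROUTING_FLAG = {
    "unallocated_number": "Stop Routing on Unallocated Number Flag",
    "user_busy": "Stop Routing on User Busy Flag",
    "out_of_bandwidth": "Stop Routing on Out of Bandwidth Flag",
}


def cucm_reroutes(cause: str, flags: Dict[str, bool]) -> bool:
    """True if CUCM would try the next route after this cause (toy model)."""
    flag_name = STOP_ROUTING_FLAG.get(cause)
    if flag_name is None:
        return True              # any other cause code: reroute
    return not flags[flag_name]  # flag set to True means stop routing
```

For example, `cucm_reroutes("no_circuit_available", {})` returns True because that cause is not one of the three.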
All well and good for CUCM, but what about CUBE...
We can tell CUBE what to do in these circumstances too, just as with CUCM. The magic is the voice hunt command:
#conf t
no voice hunt unassigned-number
no voice hunt invalid-number
no voice hunt user-busy
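To double-check what a pasted config actually disables, the Python sketch below reads the `no voice hunt <keyword>` lines and answers whether a given SIP response would still trigger hunting. Two assumptions go into it. First, mapping SIP codes to these keywords follows RFC 3398 (404 from cause 1, 484 from cause 28, 486 from cause 17). Second, a keyword hunts unless it is disabled, which is what the loop above implies. Check both against the IOS documentation for your release.

```python
from typing import Set

# Assumed mapping from SIP response to the voice hunt keyword (via RFC 3398 causes).
SIP_TO_HUNT_KEYWORD = {
    404: "unassigned-number",
    484: "invalid-number",
    486: "user-busy",
}


def parse_no_hunt(config: str) -> Set[str]:
    """Collect the keywords disabled with 'no voice hunt <keyword>'."""
    disabled = set()
    for line in config.splitlines():
        words = line.split()
        if words[:3] == ["no", "voice", "hunt"] and len(words) == 4:
            disabled.add(words[3])
    return disabled


def keeps_hunting(sip_code: int, disabled: Set[str]) -> bool:
    """True if this toy model would try the next dial-peer after sip_code."""
    keyword = SIP_TO_HUNT_KEYWORD.get(sip_code)
    return keyword is None or keyword not in disabled
```

Fed the three lines above, `keeps_hunting(404, ...)` becomes False, so the 404 from CUCM goes back to the ITSP instead of looping.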
Also, your CSS on the gateway in CUCM should not have access to any patterns pointing back to the voice gateway. If it is set up that way, it is both a toll-fraud and a call-loop vulnerability.
Your trunk CSS shouldn't have access to that route pattern.
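A quick audit for the CSS advice above is to list every pattern the trunk's CSS can reach and flag the ones whose destination is the same gateway. Here is a minimal Python sketch over a hand-built dial plan. The partition names, patterns and the "CUBE-SIP-Trunk" destination are hypothetical, and a real audit would read them from CUCM rather than a dict.

```python
from typing import Dict, List, Tuple

# Hypothetical dial plan: partition -> list of (pattern, destination).
DialPlan = Dict[str, List[Tuple[str, str]]]


def loop_back_patterns(trunk_css: List[str], plan: DialPlan,
                       gateway: str) -> List[Tuple[str, str]]:
    """Return (partition, pattern) pairs reachable from the trunk CSS that route to the gateway."""
    hits = []
    for partition in trunk_css:
        for pattern, destination in plan.get(partition, []):
            if destination == gateway:
                hits.append((partition, pattern))
    return hits


plan = {
    "PT_Internal": [("1XXX", "phones")],
    "PT_PSTN": [("9.!", "CUBE-SIP-Trunk")],
}
```

Here `loop_back_patterns(["PT_Internal", "PT_PSTN"], plan, "CUBE-SIP-Trunk")` flags `("PT_PSTN", "9.!")`. Removing PT_PSTN from the trunk CSS closes that path.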
Reference: (Collected)
https://supportforums.cisco.com/discussion/12005196/cucm-returns-internal-service-error-un-allocated-numbers
https://supportforums.cisco.com/blog/12153411/sip-musings-and-other-matters
Sunday, 12 February 2017
Good to know UC
*Within a CSS, CUCM does not pick the route based on which partition comes first; it picks the best (closest) match. Partition order only breaks ties between equally good matches. What??? True!
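A toy Python model makes the surprise concrete. Every matching pattern in every partition of the CSS is a candidate, the most specific one wins, and CSS order is consulted only when two candidates are equally specific. The pattern syntax is deliberately simplified to digits, X and a trailing !. "Fewer X, fixed length before !" is my stand-in for CUCM's closest-match rule, and the partition names are made up.

```python
from typing import List, Optional, Tuple


def matches(pattern: str, digits: str) -> bool:
    """Simplified pattern match: digits, X (any one digit), trailing ! (one or more digits)."""
    if pattern.endswith("!"):
        head = pattern[:-1]
        return len(digits) > len(head) and matches(head, digits[:len(head)])
    if len(pattern) != len(digits):
        return False
    return all(p == "X" or p == d for p, d in zip(pattern, digits))


def specificity(pattern: str) -> Tuple[int, int]:
    """Smaller is more specific: fixed-length before '!', then fewer X wildcards."""
    return (1 if pattern.endswith("!") else 0, pattern.count("X"))


def route(digits: str, css: List[Tuple[str, List[str]]]) -> Optional[Tuple[str, str]]:
    """Pick (partition, pattern): best match wins, CSS order only breaks ties."""
    candidates = []
    for order, (partition, patterns) in enumerate(css):
        for pattern in patterns:
            if matches(pattern, digits):
                candidates.append((specificity(pattern), order, partition, pattern))
    if not candidates:
        return None
    _, _, partition, pattern = min(candidates)
    return partition, pattern
```

With `css = [("PT_First", ["1XXX"]), ("PT_Second", ["1001"])]`, dialling 1001 lands on `("PT_Second", "1001")` even though PT_First is listed first.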