Thursday, 6 July 2017

Tuesday, 27 June 2017

DB issue.

Well, well, well... where do I start? It's like the scariest phrase on planet Earth for me. One time I saw some unusual behavior on our cluster and told my architect, and he goes, "Please don't give me a heart attack." Lol


The project I am working on has a supercluster. We have like 10 subscribers, 1 publisher, and 2 TFTP servers. Not bad, huh! For more than a year we have been haunted by this DB issue every now and then, and we have spent countless hours, on weekdays and weekends, on calls with Cisco trying to find the root cause and fix it.

Well, we were able to fix the database temporarily, but we were never lucky enough to find a valid root cause. I will try to tell the story from the beginning at some point, but for now we will just have to stick with what happened next.

Email copied from the TAC findings; names have been changed for security reasons.

Problem Description:

Phones are in rejected state when they are failed over to secondary node

Action Plan:

As discussed, please find the summary of the Webex we had:

++ If the phone is associated to the device pool with only secondary node, it registers fine
++ If the phone is associated to the device pool with primary and secondary node, phone fails over fine with secondary
++ Issue is only with one particular device pool
++ Took CCM traces, App and Sys logs, pcaps from secondary node, TFTP logs
++ From pcaps, the phone sends a register request and receives a 404 Not Found from the CUCM, as if it were not present in the database

Warning: 399 XYZ-CUCMSUB-01 "Unable to find device/user in database"


++ With SQL query we can see that the phone is present in the database
++ Checked replication, it's fine
++ From CCM traces, it shows that phone is in DB but it cannot find that it is a member of CM group of the same device pool

81068379.007 |15:08:57.507 |AppInfo  |Device=SEP123456789 in DB already but cannot register. isDeviceNameAllowedToRegister=CallManager Pkid(3a5-880) is not a member of Call Manager Group(PPP-CMG) (isCallManagerMemberOfDevicePool)


++ Created a new DP with all same subs in it
++ phone registers fine with secondary node if primary is down

As per the observation, it looks like the above error was happening because the RIS Data Collector service had incorrect information about the phone that was trying to register (we were testing with one). Even though the phone was not registered on the first subscriber in the group, RIS DC assumed that it was still using the old node in the CCM group. This is why we see "SEP123456789 in DB already but cannot register".

After making a new CM group, we noticed that the subscriber still had an incorrect status for the phones while the Publisher was now showing the right status. Since RIS DC interfaces the memory between CCM and Tomcat, it looked like Tomcat was not picking up the right entries from the memory that RIS DC has to provide.


Action Plan –

++ Restart RIS DC, Tomcat and CCM service for that node
++ For further RCA, please collect detailed logs:

1) CUCM
2) TFTP
3) RIS
4) APP and sys logs
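
When collecting CCM traces for the RCA, the rejection message quoted in the email is easy to grep for. Here is a minimal Python sketch that pulls the device name, CallManager Pkid, and CM group out of matching lines. The regular expression is built only from the one trace line shown above, so treat the wording as an assumption; other CUCM versions may log it differently. The function name is my own.

```python
import re

# Pattern built from the single trace line quoted in the TAC email; other
# CUCM versions may word this message differently (assumption).
REJECT_RE = re.compile(
    r"Device=(?P<device>\S+) in DB already but cannot register\."
    r".*?CallManager Pkid\((?P<pkid>[^)]*)\)"
    r" is not a member of Call Manager Group\((?P<group>[^)]*)\)"
)


def find_rejected_devices(lines):
    """Return one dict per trace line that matches the rejection message."""
    hits = []
    for line in lines:
        match = REJECT_RE.search(line)
        if match:
            hits.append(match.groupdict())
    return hits


# The trace line from the email, split only to keep the source lines short.
sample = (
    "81068379.007 |15:08:57.507 |AppInfo  |Device=SEP123456789 in DB already "
    "but cannot register. isDeviceNameAllowedToRegister=CallManager "
    "Pkid(3a5-880) is not a member of Call Manager Group(PPP-CMG) "
    "(isCallManagerMemberOfDevicePool)"
)

for hit in find_rejected_devices([sample]):
    print(hit["device"], hit["pkid"], hit["group"])
```

Feed it the lines of a collected trace file instead of `sample` to list every phone hitting the same rejection.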



Sunday, 21 May 2017

CUCM phone RTMT tshoot command

 

 CUCM:

CUCMwiki                Click here

CUCM quick CLI    Click here

*/
utils diagnose test
utils ntp status
show process load cpu
show process load memory
show process using-most cpu
show process using-most memory
utils core active list

 /*

 IP PHONE:

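To see a phone's configuration file in a web browser (replace CUCM with the TFTP server address and MAC with the phone's device name, e.g. SEP followed by the MAC address):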

http://CUCM:6970/MAC.cnf.xml
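
If you have a list of MAC addresses, the URL above can be built with a few lines of Python. This is just a sketch: it assumes the file is named after the device name, i.e. SEP followed by the MAC address in upper case, and the function name is mine.

```python
def phone_config_url(tftp_host, mac):
    """Build the HTTP URL of a phone's config file on the TFTP server."""
    # Normalise the MAC: drop common separators and upper-case it.
    cleaned = "".join(ch for ch in mac if ch not in ":-. ").upper()
    if len(cleaned) != 12 or any(ch not in "0123456789ABCDEF" for ch in cleaned):
        raise ValueError(f"not a 12-digit hex MAC address: {mac!r}")
    # Assumption: the device name is "SEP" followed by the MAC address.
    return f"http://{tftp_host}:6970/SEP{cleaned}.cnf.xml"


print(phone_config_url("CUCM", "00:11:22:aa:bb:cc"))
```

Paste the printed URL into a browser, or fetch it with any HTTP client that can reach the TFTP node.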



 CUBE:

9 Key CUBE Command Click here

VOICE commands  Click here

MGCP Tshoot - Click Here


CUBE wiki Click here

 RTMT:

#show risdb query phone  Click Here



UCCX:



Finesse logs can be collected directly from the web, as described at the URLs below:

https://supportforums.cisco.com/document/12356266/how-collect-finesse-logs
https://supportforums.cisco.com/discussion/13212531/finesse-logs-user-and-server



MRA n B2B

MRA n B2B              Click  here





Excellent UC Cli command Quick reference  Click Here


Tuesday, 16 May 2017

Project home ASAv












http://www.cisco.com/c/en/us/support/docs/ip/layer-two-tunnel-protocol-l2tp/200340-Configure-L2TP-Over-IPsec-Between-Window.html



Thursday, 4 May 2017

ITSP CUBE SIP Loop

Problem:


1) The call invite comes into the CUBE from ITSP.

2) A call invite is sent from the CUBE to the CUCM.

3) CUCM looks up the number, finds that it is unassigned, and replies to the CUBE with a SIP 404 (number not found).

4) The CUBE looks for alternative matches in the dial-peers and matches against an outbound rule.

5) A call invite is then sent outbound from the CUBE to ITSP.

6) The ITSP sends the invite back into the CUBE.

7) The CUBE detects based upon the SIP call ID that it is a duplicate call.

8) The outbound call on the CUBE is dropped, and the CUBE replies to the inbound call with a 504 (recorded here as an internal server error; strictly, SIP 500 is Server Internal Error and 504 is Server Time-out).
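
To make steps 5 to 7 concrete, here is a toy Python model of duplicate detection by Call-ID. It is only an illustration of the idea, not how CUBE is actually implemented. The class, the method names, and the Call-ID strings are all made up, and it assumes the ITSP sends the INVITE back with the Call-ID unchanged.

```python
class LoopDetector:
    """Toy model: remember the Call-ID of every call leg in progress."""

    def __init__(self):
        self.active_call_ids = set()

    def outbound_invite(self, call_id):
        # Record the Call-ID of a leg we originate ourselves.
        self.active_call_ids.add(call_id)

    def inbound_invite(self, call_id):
        # An inbound INVITE reusing a known Call-ID is our own call coming back.
        if call_id in self.active_call_ids:
            return "duplicate"
        self.active_call_ids.add(call_id)
        return "accepted"


cube = LoopDetector()
# Step 1: the original INVITE arrives from the ITSP.
print(cube.inbound_invite("leg-in@itsp.example"))
# Step 5: after the 404, the call is hunted to the outbound dial-peer.
cube.outbound_invite("leg-out@cube.example")
# Step 6: the ITSP sends that same INVITE straight back.
print(cube.inbound_invite("leg-out@cube.example"))
```

The second inbound INVITE is flagged as a duplicate, which is the point where the real call gets torn down.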




Solution:

Depending on configuration, CUCM and CUBE do not necessarily drop a call when they receive a 486 Busy, a 404 Not Found, or an out-of-bandwidth rejection; they may try the next available route instead. CUCM reroutes for all cause codes other than Out of Bandwidth, User Busy, and Unallocated Number.

For CUCM, the rerouting decision for those three cause codes is determined by service parameters of the Cisco CallManager service. Under Clusterwide Parameters (Route Plan), the Stop Routing on Out of Bandwidth Flag, Stop Routing on User Busy Flag, and Stop Routing on Unallocated Number Flag service parameters determine whether the call is rerouted in this scenario.
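
The decision described above boils down to a small lookup. Here is a Python sketch of that logic as I understand it from the description. The function name is mine, and the flag values in the example are made up for illustration, not the cluster defaults.

```python
# The three cause codes that have their own "Stop Routing" service parameter.
STOP_FLAG_FOR_CAUSE = {
    "Out of Bandwidth": "Stop Routing on Out of Bandwidth Flag",
    "User Busy": "Stop Routing on User Busy Flag",
    "Unallocated Number": "Stop Routing on Unallocated Number Flag",
}


def should_reroute(cause, service_parameters):
    """Return True if the call should be offered to the next route."""
    flag = STOP_FLAG_FOR_CAUSE.get(cause)
    if flag is None:
        # Every other cause code is rerouted.
        return True
    # For the three special causes the matching flag decides.
    return not service_parameters[flag]


# Example values only (assumption), chosen to show both outcomes.
params = {
    "Stop Routing on Out of Bandwidth Flag": False,
    "Stop Routing on User Busy Flag": True,
    "Stop Routing on Unallocated Number Flag": True,
}
print(should_reroute("Unallocated Number", params))
print(should_reroute("Out of Bandwidth", params))
```

With the Unallocated Number flag set to True, a 404 for an unassigned number stops the routing instead of sending the call on to the next route.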


All well and good for CUCM, but what about CUBE...

We can also tell CUBE what to do in these circumstances, just as with CUCM. The magic is to use the voice hunt command.


#conf t

no voice hunt unassigned-number

no voice hunt invalid-number

no voice hunt user-busy



Also,
the CSS on the gateway in CUCM should not have access to any patterns pointing back to the voice gateway. That's a toll-fraud risk as well as a call-loop vulnerability if you have it set up that way.

Your trunk CSS shouldn't have access to that route pattern.


Reference: (Collected)

https://supportforums.cisco.com/discussion/12005196/cucm-returns-internal-service-error-un-allocated-numbers

https://supportforums.cisco.com/blog/12153411/sip-musings-and-other-matters

Sunday, 12 February 2017

Good to know UC

*Inside a CSS, call routing doesn't pick a pattern based on which partition (PT) is listed first; it picks the best match!!! What??? True. Partition order only breaks a tie between equally good matches.
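
Here is a small Python sketch of that rule. It is heavily simplified: it only understands the X wildcard (any single digit), whereas real CUCM digit analysis handles many more wildcards and options. The partition names, patterns, and function names are made up for illustration.

```python
def potential_matches(pattern):
    """Count how many numbers a pattern can match; each X is any digit 0-9."""
    return 10 ** pattern.count("X")


def matches(pattern, dialed):
    """True if the dialed digits fit the pattern position by position."""
    return len(pattern) == len(dialed) and all(
        p == "X" or p == d for p, d in zip(pattern, dialed)
    )


def select_pattern(css, patterns, dialed):
    """css is an ordered list of partitions; patterns is (pattern, partition)."""
    candidates = [
        (pat, pt) for pat, pt in patterns if pt in css and matches(pat, dialed)
    ]
    if not candidates:
        return None
    # Best match first (fewest potential matches); partition order breaks ties.
    return min(candidates, key=lambda c: (potential_matches(c[0]), css.index(c[1])))


css = ["PT_A", "PT_B"]
patterns = [("1XXX", "PT_A"), ("12XX", "PT_B")]
print(select_pattern(css, patterns, "1234"))
```

Even though PT_A is listed first in the CSS, 12XX in PT_B wins for 1234 because it is the more specific match.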