Just got MRTG up and running…
End-users are calling to complain that one minute the network is up and the next it's down. Hmm… is it a Layer1 problem? Maybe Layer3?
Turns out it’s an intermittent Layer1 issue…
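For what it's worth, this is the kind of check that usually confirms a flapping physical link (the interface name is just an example):
! CRC/input errors and carrier transitions point at cabling, SFPs, or duplex problems
show interfaces GigabitEthernet0/1 | include errors|CRC|carrier
! Link up/down history from the log
show logging | include UPDOWN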
So I get a call from an end-user saying his calls are not forwarding correctly to his mobile phone. The calling party hears "The number you called is not a working number. Please check the number and call again".
Since I always back up configs before I make changes, I take a little time comparing some config checkpoints to see whether a change I made created this problem. No luck.
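The comparison itself is just a diff of the saved checkpoints; on IOS images that include the configuration diff utility, something like this does the job (the backup filename is hypothetical):
show archive config differences flash:pre-change-backup.cfg system:running-config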
This is what the topology looks like…
Time to make some test calls while capturing debug traces to send to Cisco (a sketch of the logging setup follows the command lists)…
VoIP CME:
debug voip ccapi inout
debug ccsip messages
debug voice translation
VoIP GW:
debug voip ccapi inout
debug isdn q931
debug ccsip messages
debug voice translation
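The collection itself is the usual routine of timestamping the debugs and sending them to the logging buffer instead of the console (the buffer size is only an example):
service timestamps debug datetime msec
logging buffered 10000000 debugging
no logging console
! ...run the test calls, then pull the buffer contents for TAC
show logging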
Cisco TAC did a great job in their analysis of the trace files, and it turned out the TOLLFRAUD_APP was being invoked, which was rejecting the call.
Since the VoIP GW is not Internet-connected, there really isn't any risk of toll fraud in this case. So I disabled the service and verified this resolved the issue.
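In IOS 15.1(2)T and later the toll-fraud prevention feature can be handled either by trusting the CME's address or by turning the check off; the fix is roughly this shape (the IPv4 address is a placeholder):
voice service voip
 ip address trusted list
  ipv4 192.0.2.10
!
! ...or disable the trusted-address check outright
voice service voip
 no ip address trusted authenticate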
So I get a call from the HR department. Seems the payroll processor is experiencing very poor performance accessing Abra over the MPLS WAN.
So I add an ACE to the QoS ACL to include the payroll processor's computer so that her traffic gets marked DSCP EF (46).
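For context, the marking config is the usual MQC arrangement on the LAN-facing interface; the names, addresses, and interface below are placeholders:
ip access-list extended ACL-PAYROLL
 permit ip host 10.1.1.50 any
!
class-map match-any CM-PAYROLL
 match access-group name ACL-PAYROLL
!
policy-map PM-MARK-IN
 class CM-PAYROLL
  set ip dscp ef
!
interface GigabitEthernet0/1
 service-policy input PM-MARK-IN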
No improvement. Hmmm.
So then I set up a test across the WAN using the topology below…
I then trigger an extended ping on the source…
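Assuming an IOS device as the source, the ping was along these lines (the destination address is a placeholder; the repeat count matches the 500 packets below):
! 500 repeats gives a usable average RTT and a chance to catch intermittent loss
ping 10.2.2.1 repeat 500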
Notice the average round-trip time of 141ms. Also notice the (1) lost packet out of (500).
I then go to the edge routers to verify packet marking and notice something interesting on the far end…
Notice the (500) packets marked EF.
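That counter comes from the per-class statistics on the far-end router's WAN service policy, something like this (the interface name is a placeholder):
! The EF class packet counter should increment in step with the pings
show policy-map interface Serial0/0/0 output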
So the traffic was dropped in the MPLS network after it egressed the far end router.
Then I think…hmm… maybe I should be doing this test from the near-side edge router….
Maybe the problem is CoS on the trunk link between my Core Switch and the near-side edge router?
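If it were, the trust state on that trunk port would be the thing to look at; assuming a Catalyst platform that uses mls qos, the check and the fix would look roughly like this (the port is a placeholder):
show mls qos interface GigabitEthernet1/0/24
!
! If the port reports "not trusted", DSCP gets rewritten to 0 on ingress; trusting DSCP preserves the EF marking
interface GigabitEthernet1/0/24
 mls qos trust dscp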
Still a high amount of average latency (though no packet loss).
Turns out one of the server guys was copying a huge amount of data over the WAN. As soon as the data copy finished, the issue disappeared….
It does in fact seem Verizon is not policing traffic. Need to investigate what we're paying for in Gold CAR.
Meet the Sirona Orthophos XG. I sure hope the TCP/IP programming in this machine is not indicative of its X-ray reliability.
So I'm replacing some SonicWalls with Cisco ASAs and re-addressing the LAN for a doctor's office. And we have to change the IP on the X-ray machine…
So the default procedure is to go to a PC with software that communicates with the machine and change it from there. I plug in a private Class A address with a /24 subnet mask, and the software pops up an error saying I have a network mask mismatch because it's not a /8 :-|
In other words, the software is still abiding by the classful rules that predate RFC 1519 (CIDR, 1993) and doesn't recognize Variable Length Subnet Masks!!!
So then I go to the X-ray machine console. Uh… Houston… we have a problem with this here machine…
Boy, does lightning do weird things. Had a bad storm come in last Sunday. When it was all over, the ISP cable modem had link LEDs lit up on ports with no devices even plugged in :-)
PoE switch was no longer providing PoE either – but still correctly making forwarding, filtering, and flooding decisions.
But what was equally strange was that the Cisco ASA firewall would not forward traffic out to the Internet correctly. Everything looked fine. The WAN interface was up/up and I saw no obvious problems.
After the ISP replaced their cable modem, I turned on a packet capture and pinged the WAN interface of the ASA to verify packets were at least reaching the Firewall. Packets were reaching the Firewall but the Firewall was NOT REPLYING!!!
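The capture itself was nothing fancy; on the ASA it was along these lines (the capture name is a placeholder):
capture CAPWAN interface outside match icmp any any
! Ping the outside interface from the Internet side, then look at what actually arrived
show capture CAPWAN
! Also worth confirming the interface is allowed to answer pings at all
show running-config icmp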
Below is what it would look like if the Firewall was replying to each of those ICMP echo requests…
Time to call Cisco TAC.