Here’s a problem I initially suspected was caused by our Cisco 4500 series switch, or by the software revision running on it, but it turned out to be something completely different.
Here’s the background of the problem:
Two of our office users started to have problems with their PCs intermittently losing network connectivity for about 10 seconds at a time, at which point the PC would reconnect to the network and behave perfectly for another couple of hours before disconnecting again.
Our new switch had been installed about a year or so ago, with all new Cat6 cabling feeding through to our Comms room patch panel (or so I thought). Our PCs had been behaving fine for a number of months and connecting fine at gigabit speeds.
The cabling from our patch panels out to our office floor ports is all Cat5e Gigaspeed certified.
The fact that two users were having the same problem was strange and initially had me worried. It was doubtful this was a problem with the OS build, otherwise a lot more users would’ve reported the issue. It was also unlikely to be a virus, as the likelihood of two PCs being infected with the same virus at the exact same time was low.
So I started to dig some more – here’s what I found:
I was lucky to be in a position where I was able to catch the network disconnect (as I’m calling it for now) on one of the affected floor boxes, at which point I unplugged the affected PC and plugged in a laptop.
The laptop displayed the same behaviour as the desktop PC – no network communication and then after a few seconds it came to life again.
This is what I was looking at on the PCs after they experienced an outage:
It turned out a number of other users were having the same problem but hadn’t noticed the issue as the network outage was so short.
The above got me worried as the issue looked to be switch-based: either a problem with the line card or the software rev on the switch itself.
At this point I contacted our Network team and laid out what I found.
Problem was they couldn’t find any port outages recorded on the switch. They suggested we were having a problem with the NIC or PC hardware, and that manually setting 1Gb on the NIC and the switch port should fix the problem.
I had a problem with this suggestion, however, as it did not explain why my laptop would not receive traffic (not even a link light) when it was plugged into an affected floor port. Manually forcing 1Gb on the switch and PC is also not best practice.
So a lot more Event Viewer digging on a lot more office PCs revealed the same network outage conditions, with the same Event Viewer error message reported. I was also able to find network outages on a Mac, so this was highly unlikely to be an OS build or hardware issue on our PCs.
The Network team were still unable to see any outages on the switch, so they weren’t likely to dig further until I suggested they check that switch logging was actually switched on and configured to detect disconnects, and check the software revision on the switch for any bugs that matched the disconnect issue…
It was at this point they found out that link state messages were disabled in the image they were using on the switch. After they got link message logging enabled they were finally able to see the port disconnects which they found to be down to link flaps.
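Once link messages are being logged, counting flaps per port is easy to script. The sketch below is only an illustration: it assumes the switch emits the standard Cisco IOS `%LINK-3-UPDOWN` syslog format, and the sample lines, timestamps and interface names are made up for the example rather than taken from our switch.

```python
import re
from collections import Counter

# Assumption: IOS-style link-state messages, for example
# "%LINK-3-UPDOWN: Interface GigabitEthernet2/14, changed state to down"
LINK_RE = re.compile(
    r"%LINK-3-UPDOWN: Interface (?P<intf>[^,\s]+), changed state to (?P<state>up|down)"
)


def count_link_downs(lines):
    """Count 'down' transitions per interface in an iterable of log lines."""
    downs = Counter()
    for line in lines:
        match = LINK_RE.search(line)
        if match and match.group("state") == "down":
            downs[match.group("intf")] += 1
    return downs


# Illustrative log lines only (hypothetical ports and times).
sample_log = [
    "Mar  1 09:14:02: %LINK-3-UPDOWN: Interface GigabitEthernet2/14, changed state to down",
    "Mar  1 09:14:11: %LINK-3-UPDOWN: Interface GigabitEthernet2/14, changed state to up",
    "Mar  1 11:40:37: %LINK-3-UPDOWN: Interface GigabitEthernet2/14, changed state to down",
    "Mar  1 11:40:46: %LINK-3-UPDOWN: Interface GigabitEthernet2/14, changed state to up",
    "Mar  1 11:52:00: %LINK-3-UPDOWN: Interface GigabitEthernet2/15, changed state to up",
]

flaps = count_link_downs(sample_log)
for interface, count in flaps.most_common():
    print(interface, count)
```

A port that goes down several times a day while the attached PC is otherwise healthy is a good candidate for a closer look at the cabling behind it.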
The reason they couldn’t see the drops wasn’t down to what I thought – but I took this as a win as they were now able to see the port outages…
After much back and forth about the possible causes of the issue, including BPDU packets and questions about the level of logging the Network team had enabled, I began to investigate our network cables as the cause. I did this even though I didn’t think it was possible for a cable plugged into a switch to cause a port to disable itself completely for 10 seconds at a time, especially when the PC connected to that cable behaves itself 99 percent of the time…
An incorrect assumption, but you learn something new every day!
My checks indicated all the cables were rated for gigabit speeds until I found a bunch of cables that stood out from the others purely because they did not have “Gigaspeed” written at the end of the cable identifier. The cables in question were Systimax 1074D 4/24 (UL) and they ran from the switch to the patch panel.
It turned out that almost all of the cabling running from the switch to the patch panel had been replaced with Cat6, but not all of it. About 15% of the old cable running from the Cisco 4500 switch to the patch panel, all of it Systimax 1074D 4/24 (UL), was still in place…
I couldn’t find any information about the Systimax 1074D 4/24 (UL) on the internet, so I had no clue if it was Cat5, Cat5e or Cat6 certified. Thankfully a great guy called Alaric Jenkins from network cabling supplier Info Stor responded to my query and was able to tell me that the Systimax 1074D 4/24 (UL) cable is Cat5e but not certified for Gigabit Ethernet.
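If the jacket markings are written down as part of a cable audit, the same "does it end in Gigaspeed?" check can be run over the whole list. This is just a sketch under assumptions: the port names and the `PLACEHOLDER-CABLE` marking are hypothetical, and only the Systimax 1074D 4/24 (UL) marking comes from our actual cabling.

```python
def find_suspect_cables(inventory, marker="Gigaspeed"):
    """Return the ports whose cable marking does not end with the marker."""
    wanted = marker.lower()
    return [
        port
        for port, marking in inventory
        if not marking.strip().lower().endswith(wanted)
    ]


# Hypothetical audit data: (patch panel port, text printed on the cable jacket).
inventory = [
    ("PP-A01", "PLACEHOLDER-CABLE 4/24 (UL) Gigaspeed"),
    ("PP-A02", "Systimax 1074D 4/24 (UL)"),
    ("PP-A03", "PLACEHOLDER-CABLE 4/24 (UL) Gigaspeed"),
]

suspects = find_suspect_cables(inventory)
print(suspects)
```

Matching on the end of the marking mirrors how the odd cables stood out by eye. It only flags cables worth querying with the supplier; it does not prove what a cable is certified for.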
So here’s what I learned:
1. Just because you’re able to get gigabit speed on your PC using a Cat5e cable 99% of the time, that doesn’t mean your Ethernet cable is certified for gigabit…
2. Cabling can adversely affect your switch ports, even going so far as to completely disable them.