SIP-ing Through the Firewall
Here’s an odd puzzle for you. Fire up a SIP soft-phone and connect it to an external provider. Dial a number and you can make a call successfully.
Doesn’t sound particularly unusual. Now fire up another client on the same subnet and try to make a call – it doesn’t work.
That’s the problem we faced recently on a Check Point firewall. It was all the more bizarre because an almost identically configured second firewall didn’t show the problem at all.
So what was causing the trouble? Well, it’s probably worth going through the troubleshooting steps before revealing the (not so) exciting conclusion.
One good step when you get a real head-scratcher of a networking problem is to grab a packet dump. That’s exactly what I did, and you can see the (sanitised) results below:
It mostly looks like normal SIP traffic. There’s the outbound REGISTER message at the top, followed by a 401 (unauthorised) response. That’s the server challenging the client for credentials as part of SIP digest authentication, which is entirely normal.
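To make that exchange a little more concrete, here is a minimal Python sketch that reads a SIP response and checks whether it is the 401 digest challenge described above. The message, addresses and nonce are made up for illustration, and `parse_sip_response` and `is_auth_challenge` are hypothetical helpers: they ignore things a real SIP stack must handle, such as compact header names, repeated headers and message bodies.

```python
# Minimal SIP response parser, a sketch for reading a packet dump.
# The sample message is hypothetical; real 401s carry more headers.


def parse_sip_response(raw: str) -> tuple[int, str, dict[str, str]]:
    """Split a SIP response into (status code, reason phrase, headers)."""
    lines = raw.split("\r\n")
    # Status line looks like: SIP/2.0 401 Unauthorized
    version, code, reason = lines[0].split(" ", 2)
    if version != "SIP/2.0":
        raise ValueError(f"not a SIP/2.0 response: {lines[0]!r}")
    headers: dict[str, str] = {}
    for line in lines[1:]:
        if not line:  # a blank line ends the header section
            break
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return int(code), reason, headers


def is_auth_challenge(raw: str) -> bool:
    """True if the response is a 401 carrying a WWW-Authenticate challenge."""
    code, _, headers = parse_sip_response(raw)
    return code == 401 and "www-authenticate" in headers


sample = (
    "SIP/2.0 401 Unauthorized\r\n"
    "Via: SIP/2.0/UDP 192.168.1.21:5060;branch=z9hG4bK776asdhds\r\n"
    "CSeq: 1 REGISTER\r\n"
    'WWW-Authenticate: Digest realm="example.com", nonce="abc123"\r\n'
    "Content-Length: 0\r\n"
    "\r\n"
)

code, reason, headers = parse_sip_response(sample)
print(code, reason, is_auth_challenge(sample))
```

In a normal exchange the client then re-sends the REGISTER with an Authorization header built from that challenge. The Via header in a response echoes the one the client sent, so it also shows the address and port the phone believes it is using.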
What isn’t normal is the ICMP port unreachable message our local machine sends back in response. A little more digging shows the return UDP traffic is arriving on a completely different port from the one we sent it from.
Strange. It’s almost as if the firewall’s NAT isn’t translating the port back, even though it is definitely sending the traffic to the right host. A bit of debugging with the fw monitor command on the firewall confirms our suspicions.
The firewall knows exactly where to send the packet (i.e. the NAT translation details are in memory). However, something is stopping the port translation from happening.
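The failure mode can be modelled in a few lines of Python. This is a toy model, not how a Check Point firewall works internally: `NatEntry`, `reverse_translate`, `host_reaction` and the `translate_port` flag are hypothetical names, with the flag standing in for whatever stops the port part of the translation. The addresses and the external port 50123 are made up.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NatEntry:
    """One hypothetical NAT mapping: inside host/port <-> outside port."""
    inside_ip: str
    inside_port: int
    outside_port: int


def reverse_translate(entry: NatEntry, dst_port: int,
                      translate_port: bool = True) -> tuple[str, int]:
    """Rewrite an inbound packet's destination using a NAT entry.

    The IP is always restored; the port is only restored when
    translate_port is True (False models the broken behaviour).
    """
    if dst_port != entry.outside_port:
        raise ValueError("inbound packet does not match this NAT entry")
    port = entry.inside_port if translate_port else dst_port
    return entry.inside_ip, port


def host_reaction(listening_ports: set[int], port: int) -> str:
    """What the inside host does with a UDP datagram arriving on `port`."""
    if port in listening_ports:
        return "delivered to SIP client"
    return "ICMP port unreachable"


entry = NatEntry(inside_ip="192.168.1.22", inside_port=5060, outside_port=50123)
phone_ports = {5060}  # the soft-phone only listens on 5060

for restore in (True, False):
    ip, port = reverse_translate(entry, 50123, translate_port=restore)
    print(restore, ip, port, host_reaction(phone_ports, port))
```

With the port restored, the reply reaches the phone. Without it, the packet lands on the right host but on port 50123, where nothing is listening, which is the ICMP unreachable seen in the dump.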
This is where the slight difference between the two almost identical firewalls comes into play. The only significant difference is that one has the IPS feature enabled and the other does not.
So, with that in mind, I set up a little experiment with some VMs: a firewall doing NAT on a single static IP address, with three SIP clients behind it.
It’s not a complicated test rig, but it was enough to figure out what was causing the problem.
Anyway, the first test – a firewall with minimal rules (allow all outbound) performing NAT with the IPS feature disabled. The result – all three endpoints could successfully make calls.
On to the second test – the same firewall and configuration with the IPS feature enabled. The result – one client can make calls, the other two fail.
That’s odd, and it comes down to how the port translation in the NAT works (or doesn’t). The first client gets a direct translation of port 5060 to port 5060 on the external IP address. Since that port is then taken, the other two clients get random, high-numbered ports on the external IP address instead.
This isn’t a problem in the first test, as the IPS feature is turned off and the translation works as expected. It breaks in the second test, where the port part of the translation doesn’t happen. The first client only gets away with it because its internal and external ports are both 5060, so a missing port translation changes nothing.
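Why exactly one client survives can be reproduced with a toy port-address translation table. The allocation rule (keep the source port if it is free, otherwise pick a random high port) is modelled on the behaviour described above, not on Check Point’s documented algorithm, and the 49152 to 65535 pool is an assumption. `PatTable` and `reply_reaches_phone` are hypothetical names, and the client addresses are made up.

```python
import random

# Assumed pool of high ports for translated traffic; the firewall's real range may differ.
HIGH_PORT_MIN, HIGH_PORT_MAX = 49152, 65535


class PatTable:
    """Toy port-address translation table for one external IP address."""

    def __init__(self, seed: int = 0) -> None:
        self._rng = random.Random(seed)  # seeded so runs are repeatable
        self._used: set[int] = set()     # external ports already handed out
        self.mappings: dict[tuple[str, int], int] = {}

    def allocate(self, inside_ip: str, inside_port: int) -> int:
        """Return the external port for (inside_ip, inside_port), creating it if needed."""
        key = (inside_ip, inside_port)
        if key in self.mappings:
            return self.mappings[key]
        if inside_port not in self._used:
            outside = inside_port  # first come: keep the source port
        else:
            outside = self._rng.randint(HIGH_PORT_MIN, HIGH_PORT_MAX)
            while outside in self._used:  # retry until a free port turns up
                outside = self._rng.randint(HIGH_PORT_MIN, HIGH_PORT_MAX)
        self._used.add(outside)
        self.mappings[key] = outside
        return outside


def reply_reaches_phone(inside_port: int, outside_port: int,
                        port_translated_back: bool) -> bool:
    """A reply sent to outside_port only works if it ends up on inside_port.

    port_translated_back=False models the behaviour we saw with IPS enabled.
    """
    delivered_port = inside_port if port_translated_back else outside_port
    return delivered_port == inside_port


pat = PatTable(seed=1)
clients = ["192.168.1.21", "192.168.1.22", "192.168.1.23"]
outside = {ip: pat.allocate(ip, 5060) for ip in clients}

for ip in clients:
    print(ip, "-> external port", outside[ip],
          "| IPS off:", reply_reaches_phone(5060, outside[ip], True),
          "| IPS on:", reply_reaches_phone(5060, outside[ip], False))
```

The printed results line up with the two tests: every phone works when the port is translated back, while without that step only the 5060-to-5060 mapping keeps working.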
A bit of reading suggests the problem lies in how the IPS inspects SIP traffic on these firewalls. The suggested fix at that point is to change what the IPS believes UDP traffic on port 5060 to be.
By default, it’s SIP_UDP, which seems to have some extra processing attached (and that processing is what breaks our port address translation). As a test, I changed it to “none”.
At that point, all three clients could make calls again. Great news, as it meant we could keep the extra protection the IPS provides.
That said, it was a real head-scratcher, and proof that even the most basic assumptions (“NAT can’t be broken – everything else works!”) sometimes need checking.