A common obstacle in the world of IT is dealing with firewalls, particularly when they seem to be blocking something you need. Firewalls are, by design, a pain to work with, but they’re obviously a necessary, proactive security-hardening technique for locking down any network services we don’t want exposed on a network. Network services == network risk: the more you have, the bigger the risk of an attack, so firewalls are a handy Band-Aid for tying down some of these services.
This article is a quick & dirty guide to troubleshooting firewall blocks, and networking in general, for when you have connection issues and need to know what’s going on. I’m going to base the examples on my SharePoint test environment, which is one web-front-end (WFE) server and one application server; the WFE calls the app-server for back-end work, and those calls will break if a firewall cuts in.
If you don’t know the difference between TCP and UDP, read below. Otherwise skip this bit.
Firewalls treat UDP and TCP the same way, by stopping requests from getting any response, but what that means for the sending device depends on the protocol.
For UDP, the sending machine isn’t expecting a reply, so packets that get silently ignored have no impact on it; the sender just keeps blasting out packets anyway.
TCP, on the other hand, needs a full-on handshake to even start talking; if the sending machine doesn’t get back something relevant to the handshake, it’ll sit there and wait until it either gets a reply or the connection times out. This difference in protocol behaviour is kinda key to troubleshooting firewall issues.
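To see that difference from a program’s point of view, here’s a minimal Python sketch (my own illustration, not part of the SharePoint setup; the function names are hypothetical). A UDP sendto() hands the datagram to the OS and returns straight away with the byte count, whether or not anything is listening, whereas a TCP connect can’t return until the handshake has been answered, refused or given up on.

```python
import socket


def udp_fire_and_forget(host: str, port: int, payload: bytes) -> int:
    """Send one UDP datagram and return the number of bytes handed to the OS.

    There is no handshake and no reply is awaited, so this returns immediately
    even if nothing is listening on the destination port.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        return sock.sendto(payload, (host, port))


def tcp_handshake(host: str, port: int, timeout: float = 5.0) -> bool:
    """Attempt a TCP connection.

    create_connection() only returns once the three-way handshake completes
    (True). A refusal (RST) or a timeout raises an OSError, reported as False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Point both at a port with nothing behind it and the UDP send still reports success; only the TCP side tells you something’s wrong.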
This stuff is networking 101 really: UDP is like listening to music; it’s all receive-only, so not “replying” to a packet won’t do anything. TCP is more like a phone call: if you don’t answer, the caller will eventually hang up after a few awkward moments of waiting in silence for someone to pick up, which is the firewall effect.
Knowing the signs of a firewall also involves knowing what a normal TCP/IP conversation looks like. For those who don’t know, the “TCP” part means “we’re going to talk in a formal & controlled way, handshaking first”, and “IP” just refers to how the packets are routed from sender to recipient, which is the same for UDP packets and any other packets in the IP stack too.
But anyway, imagine two people talking over the phone: one friend calls another at his office number, which is pretty similar to how TCP works. Here’s a normal conversation:
If the person wanted isn’t there, normally an answerphone would take the call, although let’s pretend the answerphone just magically knew that nobody could answer and kicked in immediately instead of waiting. Anyway, here’s TCP as if it were humans:
This is how TCP/IP was originally designed to deal with the right person not being available to take the call. Replace “person” with “service”, with the caller picking the right service by port number instead of by name. It’s a nice, convenient way of making the caller aware as soon as possible that the call has been in vain, because let’s be honest, it’s kinda annoying to call someone and not actually be able to get them, so it’s good to get this “active refusal” as quickly as possible.
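From the caller’s side, that “active refusal” surfaces as an immediate error, which is what makes it so handy. Here’s a minimal Python sketch of a port probe that tells the outcomes apart; the function name and the outcome labels are my own hypothetical choices. “open” means the handshake completed, “refused” means the host answered with a reset, and “no response” means nothing came back before the timeout, which is what a dropping firewall (or an offline host) looks like.

```python
import socket
import time


def probe_tcp_port(host: str, port: int, timeout: float = 5.0) -> tuple[str, float]:
    """Try a TCP connect and classify the result.

    Returns (outcome, seconds_taken), where outcome is one of:
      "open"        - handshake completed, something is listening
      "refused"     - the host replied with a reset: nothing is listening
      "no response" - nothing came back before the timeout (dropped, or host offline)
      "error (...)" - any other socket error, e.g. no route to the host
    """
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            outcome = "open"
    except ConnectionRefusedError:
        outcome = "refused"
    except socket.timeout:  # must come before OSError, which it subclasses
        outcome = "no response"
    except OSError as exc:
        outcome = f"error ({exc.strerror})"
    return outcome, time.monotonic() - start
```

The elapsed time is a useful hint in itself: a refusal comes back quickly, while a dropped SYN only shows up as “no response” once the whole timeout has been used up.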
The problem, though, is that attackers can use this active denial or refusal by the destination to quickly figure out which services are available and start launching attacks against them. Because of this, the first thing firewalls do is suppress these helpful “the person you want isn’t here/available” messages, so the conversation basically goes like this:
As you might imagine, this conversation takes much, much longer to reach a conclusion than the “active denial” one, because of the particularly slow & blocking stages 4 & 5. With that extra delay it takes far longer to work out that nothing’s listening than with an active denial, or indeed a positive response from a service listening on that port, and this helps us greatly in securing our server(s) against surface-scanning attacks. The only small problem for network admins is that exactly the same conversation happens if the destination server has gone offline for some reason, which also happens from time to time, so we need to know how to tell the two events apart.
Now that we’re clear on what a firewall should be doing, we just need to know what both the active-denial and timeout conversations look like in a network trace, so we can start to determine whether a firewall is causing inappropriate interruptions.
As already mentioned, I’ll use a very simple real-life example: a two-server SharePoint farm with one web-front-end (WFE) and one search/app server.
When users perform a search via the WFE, it’s passed to the search service/server via the web-services back end, which returns the results, assuming that back-end connection doesn’t fail for whatever reason. If it does, you’ll see a SharePoint “sad face”:
This is SharePoint’s generic “your search died” screen.
So if a host operating system does its normal job of helpfully informing the calling device that it’s wasting its time, how does that look in a network trace? Here’s how:
Here the SharePoint web-front-end has tried to talk via TCP to my app server on port 32843 (1st entry), and the app-server has replied with a reset (2nd entry), or in other words “go away, there’s nothing listening”. The caller tries twice more before deciding it’s no good and raising a network error to the application that made the call: “there was nothing listening – the destination host refused”. This terrible security is also kinda handy: we already know what the problem is!
Given a firewall suppresses this handy diagnostic reply when there’s no service, how does it look when nothing comes back at all? Like so:
Here we see the initial connection attempt by the WFE to the app-server, although this time the firewall on the app-server blocks any talk back. Because there’s no response, the calling machine retransmits the SYN packet, since it’s possible the endpoint is just slow to respond because of network latency or something like that. Notice the frame that the “retransmit” frames reference (#255 in this case), but alas, there’s no luck after 3 attempts, and so we end up with a failed connection all the same.
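To get a feel for why the silent case is so slow, here’s a rough Python calculation of when each SYN goes out and when the caller finally gives up if every SYN is dropped. It assumes classic exponential back-off, where the retransmission timeout doubles after each unanswered attempt; the starting timeout and retry count are illustrative parameters I’ve picked, not values taken from the trace, and real defaults differ between operating systems and versions.

```python
def syn_timeline(initial_rto: float, retransmissions: int) -> list[float]:
    """Return the send time of each SYN, followed by the give-up time.

    Assumes the wait doubles after every unanswered SYN (exponential back-off).
    """
    times = []
    t, rto = 0.0, initial_rto
    for _ in range(retransmissions + 1):  # the original SYN plus each retransmit
        times.append(t)
        t += rto
        rto *= 2
    times.append(t)  # the connect attempt is abandoned at this point
    return times


# Hypothetical settings: 3-second starting timeout, 2 retransmits (3 SYNs in total).
print(syn_timeline(3.0, 2))  # [0.0, 3.0, 9.0, 21.0]
```

With those assumed settings the SYNs go out at 0, 3 and 9 seconds and the caller gives up at 21 seconds, compared with the near-instant answer you get from an active refusal.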
So how do you check whether something is actually listening? Simple: run “netstat -ano” to get the process identifiers and ports, and pipe it to “find” to filter for the port you’re interested in, rather than wading through a load of ports you aren’t. If I want to see whether my app-server has IIS listening for app requests, I’d run this command first: netstat -ano | find "32843"
The first time I run it, “find” doesn’t find anything, meaning there’s nothing listening on port 32843. It’s telling the truth, because IIS is stopped; starting it and running the netstat command again shows quite clearly that port 32843 is now owned by PID 4. PID 4 is always the System process (the Windows kernel, not a standard application), and it owns the port because IIS listens through the kernel-mode HTTP.sys driver, so this tells us requests to this port will be handed to the right application: IIS/SharePoint.
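If you’d rather script the check than eyeball it, a small Python sketch can pull the owning PID out of netstat -ano output. The function name is hypothetical, and it’s fed a made-up sample that mimics the Windows column layout (Proto, Local Address, Foreign Address, State, PID); on a real box you’d capture the actual netstat -ano output instead.

```python
def listening_pids(netstat_output: str, port: int) -> set[int]:
    """Return the PIDs of TCP sockets in LISTENING state on the given local port.

    Expects Windows 'netstat -ano' rows such as:
      TCP    0.0.0.0:32843    0.0.0.0:0    LISTENING    4
    """
    pids = set()
    for line in netstat_output.splitlines():
        fields = line.split()
        # TCP rows have 5 columns: proto, local address, foreign address, state, PID.
        # The header, blank lines and 4-column UDP rows are skipped.
        if len(fields) != 5 or fields[0] != "TCP" or fields[3] != "LISTENING":
            continue
        # rsplit copes with IPv6 addresses like [::]:32843
        local_port = fields[1].rsplit(":", 1)[-1]
        if local_port == str(port):
            pids.add(int(fields[4]))
    return pids


# Hypothetical sample output, not captured from the farm in this article.
SAMPLE = """
  Proto  Local Address          Foreign Address        State           PID
  TCP    0.0.0.0:135            0.0.0.0:0              LISTENING       912
  TCP    0.0.0.0:32843          0.0.0.0:0              LISTENING       4
  TCP    [::]:32843             [::]:0                 LISTENING       4
  TCP    10.0.0.5:50123         10.0.0.6:32843         ESTABLISHED     2840
"""

print(listening_pids(SAMPLE, 32843))  # {4}
```

Matching on the local-address column avoids the false hits a plain find can give when the same digits turn up in a remote address or a PID, like the ESTABLISHED row above.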
Pretty simple really: if you see SynReTransmit frames on the sender twice after the first SYN, with no response from the host, then chances are it’s a firewall issue. If you also see the packets arrive on the host (a network trace on the machine being called will at least show the SYN and retransmitted SYN packets arriving) and there’s definitely a service listening, then it’s almost certainly a firewall block.
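That rule of thumb can be written down as a small decision function. This is a hypothetical Python sketch: the field names and outcome labels are mine, and it assumes you’ve already collected the observations from a trace on the sender, a trace on the destination and a netstat check. The “no listener” and “inconclusive” branches are my own additions for the cases the rule doesn’t cover.

```python
from dataclasses import dataclass


@dataclass
class Observations:
    got_syn_ack: bool        # sender's trace: the destination answered with SYN-ACK
    got_reset: bool          # sender's trace: the destination answered with a reset
    syn_retransmits: int     # sender's trace: SYN retransmits after the first SYN
    syn_seen_on_host: bool   # destination's trace: the SYNs actually arrived
    service_listening: bool  # destination's netstat: the port is LISTENING


def diagnose(obs: Observations) -> str:
    """Apply the firewall rule of thumb to a set of observations."""
    if obs.got_syn_ack:
        return "connected"      # handshake answered: not a TCP-level block
    if obs.got_reset:
        return "refused"        # active denial: host is up, nothing on that port
    went_silent = obs.syn_retransmits >= 2
    if went_silent and not obs.service_listening:
        return "no listener"    # fix the service first, then re-test
    if went_silent and obs.syn_seen_on_host:
        return "firewall"       # SYNs arrive, service listens, nothing comes back
    return "inconclusive"       # e.g. SYNs never reach the host: in-path device or host offline
```

For example, a trace showing two SynReTransmit frames, SYNs arriving on the app-server and netstat showing port 32843 listening maps to “firewall”.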
This should help the next time you have a battle with any particularly zealous network administrators (we love them really) and need some evidence of TCP/UDP shenanigans going on at the firewall.
// Sam Betts