We had some issues at work recently with e-mails from a third party intermittently failing to arrive in our Google Apps mailboxes (we’ve now migrated to Google Apps for Business rather than using Exchange exclusively).
This was a big problem for us as these e-mails are an important source for news stories.
I immediately suspected spam/DNS server issues due to my previous experience trouble-shooting our Ironport issue two years ago.
This time, however, the problem lay with the third party’s DNS, not ours.
Here are the steps I took to troubleshoot then fix the problem:
Symptoms:
The issue was first reported when one of our news reporters noticed that some e-mails sent from this news source’s distribution list were not arriving.
Some e-mails, not all, were being dropped. This is an important factor, as we’ll see below.
Troubleshooting steps:
1. I realised straight away the problem wasn’t related to any e-mail filters in the reporter’s inbox or to SMTP blacklists, since only about 20% of the mails were not arriving. Nonetheless I ran the necessary checks and found nothing causing mail to be filtered into the user’s bin.
2. I checked in with another colleague whose mail domain is different from that of the news reporter who detected the problem (we use multiple domain names for separate titles and business teams). That colleague indicated they were missing the exact same e-mails as the first reporter.
3. The distribution list in question is used by a number of other news organisations so I sent a mail out to another news organisation’s IT dept to ask if they’d noticed those particular e-mails missing – they hadn’t.
4. I then asked an ordinary user at the news source’s site (I’m calling her [email protected] here) to send me an e-mail so I could examine the incoming e-mail headers for anything out of place. The e-mail she sent arrived fine, but the information I found in her mail header provided the light bulb moment when combined with the rest of the information gathered. I found two key entries in [email protected]’s mail header:
Received-SPF: fail (google.com: domain of [email protected] does not designate 137.191.225.35 as permitted sender) client-ip=137.191.225.35
Authentication-Results: mx.google.com;
spf=hardfail (google.com: domain of [email protected] does not designate 137.191.225.35 as permitted sender) smtp.mail=137.191.225.35;
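For anyone wanting to pull these values out of a saved message rather than reading the raw headers by eye, here is a minimal Python sketch. The sample message and the sender@example.com address are placeholders I made up for illustration; only the IP address and the result wording come from the headers above. It assumes the Received-SPF header starts with the result word and carries a client-ip= token, as Google’s does here.

```python
from email import message_from_string

# Hypothetical message: the address is a placeholder; the IP and result
# strings are copied from the headers quoted in this post.
RAW_MESSAGE = (
    "Received-SPF: fail (google.com: domain of sender@example.com does not "
    "designate 137.191.225.35 as permitted sender) client-ip=137.191.225.35\n"
    "Authentication-Results: mx.google.com;\n"
    " spf=hardfail (google.com: domain of sender@example.com does not "
    "designate 137.191.225.35 as permitted sender) smtp.mail=137.191.225.35;\n"
    "Subject: test\n"
    "\n"
    "body\n"
)


def spf_summary(raw):
    """Return (result, client_ip) from the first Received-SPF header, or None."""
    msg = message_from_string(raw)
    header = msg.get("Received-SPF")
    if header is None:
        return None
    tokens = header.split()
    result = tokens[0].lower()  # first word is the SPF result, e.g. "fail"
    client_ip = None
    for token in tokens:
        if token.startswith("client-ip="):
            client_ip = token[len("client-ip="):].rstrip(";")
    return result, client_ip


print(spf_summary(RAW_MESSAGE))  # ('fail', '137.191.225.35')
```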
It was time to brush up on SPF, or Sender Policy Framework.
SPF basically boils down to a DNS entry that indicates which SMTP servers are permitted to send e-mail on behalf of a mail domain. This helps keep spoofed spam out of a user’s inbox, since spam prevention e-mail gateways like Ironport and Postini perform a DNS verification check each time an SMTP server tries to deliver an e-mail purporting to be from a particular e-mail address (domain).
As an example, say a spammer tried to deliver an e-mail to [email protected] by spoofing an innocent individual’s e-mail address (let’s say [email protected]) and used their spamming SMTP server to try to send [email protected] a spam e-mail. When it’s first contacted, the Ironport/Postini/spam gateway examines the mailing domain indicated in the e-mail address of the message, say “hotmail.com”, and the accompanying IP address of the spamming SMTP server.
Ironport/Postini will then query the DNS for hotmail.com (which should always be accessible on the internet) to fetch that domain’s SPF record. If it sees that none of the IP addresses listed in the SPF record match the IP address of the SMTP server that just tried to deliver an e-mail purporting to be from [email protected], it will drop that delivery attempt to [email protected], thereby preventing the spam reaching [email protected]’s inbox. According to hotmail.com’s DNS SPF record, the spamming SMTP server trying to deliver a message purporting to be from [email protected] is not authorized.
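The check described above can be sketched in a few lines of Python. This is a deliberately simplified illustration, not how Ironport or Postini actually implement it: it only understands ip4 mechanisms and the final all term, and it skips mechanisms such as mx, a and include that need further DNS lookups. The function name check_spf and the 192.0.2.x / 203.0.113.x addresses are made up for the example.

```python
import ipaddress

# Qualifier prefixes and the result each one produces when its term matches.
QUALIFIERS = {"+": "pass", "-": "fail", "~": "softfail", "?": "neutral"}


def check_spf(record, sender_ip):
    """Evaluate only the ip4 and all terms of an SPF record against sender_ip."""
    terms = record.split()
    if not terms or terms[0].lower() != "v=spf1":
        return "none"  # not an SPF record at all
    ip = ipaddress.ip_address(sender_ip)
    for term in terms[1:]:
        qualifier = "+"  # default qualifier when none is written
        if term[0] in QUALIFIERS:
            qualifier, term = term[0], term[1:]
        if term.lower().startswith("ip4:"):
            network = ipaddress.ip_network(term[4:], strict=False)
            if ip in network:
                return QUALIFIERS[qualifier]
        elif term.lower() == "all":
            return QUALIFIERS[qualifier]
        # mx, a, include, ... would need DNS lookups; skipped in this sketch
    return "neutral"


# Hypothetical record using documentation-only address ranges.
EXAMPLE_RECORD = "v=spf1 ip4:192.0.2.10 ip4:192.0.2.11 -all"

print(check_spf(EXAMPLE_RECORD, "192.0.2.10"))   # pass
print(check_spf(EXAMPLE_RECORD, "203.0.113.5"))  # fail
```

The -all at the end is what turns “not listed” into a hard fail; with ~all the same lookup would only produce a softfail.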
Now back to our problem. Our news organisation’s Postini gateway was designating the IP address of our third party’s SMTP server, 137.191.225.35, as not being permitted to send e-mail on behalf of that user.
I had two problems with that:
1. First, why did the SPF check fail?
2. Second, the SPF check failed but [email protected]’s e-mail got through anyway.
It was time to probe our news source’s (xxxx.ie’s) DNS configuration. In the process of doing so I discovered it was possible to expose the SPF configuration of DNS servers available on the internet using an online SPF test tool.
So I plugged in 137.191.225.35, the IP address of the SMTP server flagged as not authorized in [email protected]’s e-mail header.
The Beveridge SPF Test tool then exposed the following SPF configuration for the xxxx.ie domain (some IP addresses changed for security purposes):
v=spf1 mx ip4:137.191.xxx.x1 ip4:137.191.xxx.x2 ip4:137.191.xxx.x3 -all [TTL=86400]
But I also noticed that subsequent SPF tests returned a different SPF configuration:
v=spf1 mx ip4:137.191.xxx.x1 ip4:137.191.xxx.x2 ip4:137.191.xxx.x3 ip4:137.191.225.35 ip4:137.191.xxx.x5 -all
So what was I looking at?
Basically, one or more DNS servers for the domain xxxx.ie, which our Postini gateway was performing SPF checks against, had an incorrect/out-of-date SPF configuration. The second SPF configuration was the correct one, as it listed 5 SMTP servers that were authorized to send mail on behalf of the xxxx.ie domain.
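The first, incorrect SPF record also should not have contained the “[TTL=86400]” reference, as this doesn’t conform to SPF standards (although it may simply be the test tool annotating the record’s TTL rather than part of the published record itself).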
Cross-referencing the IP address of the SMTP server in the e-mail header, I found that 137.191.225.35 was one of the addresses missing from the incorrect SPF record: it was present in the second SPF record but not the first.
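That cross-referencing can also be done mechanically. The sketch below simply treats the two records returned by the test tool as whitespace-separated terms and lists what the second one has that the first lacks. The records are the redacted strings quoted in this post, so they are compared as plain text rather than parsed as real IP addresses; the function name missing_terms is my own.

```python
# The two records as returned by the SPF test tool (redacted as in this post).
STALE_RECORD = (
    "v=spf1 mx ip4:137.191.xxx.x1 ip4:137.191.xxx.x2 ip4:137.191.xxx.x3 "
    "-all [TTL=86400]"
)
CURRENT_RECORD = (
    "v=spf1 mx ip4:137.191.xxx.x1 ip4:137.191.xxx.x2 ip4:137.191.xxx.x3 "
    "ip4:137.191.225.35 ip4:137.191.xxx.x5 -all"
)


def missing_terms(stale, current):
    """Return the terms present in current but absent from stale, in order."""
    stale_terms = set(stale.split())
    return [term for term in current.split() if term not in stale_terms]


print(missing_terms(STALE_RECORD, CURRENT_RECORD))
# ['ip4:137.191.225.35', 'ip4:137.191.xxx.x5']
```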
So our Postini was hitting the DNS server with the incorrect SPF configuration about 20% of the time, and on those occasions it determined that 137.191.225.35 wasn’t authorized to send e-mail on behalf of xxxx.ie and dropped the message.
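To see how a single stale server could produce a figure like 20%, here is a rough back-of-the-envelope sketch. It rests on assumptions I can’t verify: that xxxx.ie had five name servers (the names ns1 to ns5 are invented), that exactly one held the stale record, and that lookups were spread evenly across them with no caching effects.

```python
from fractions import Fraction

SENDER_TERM = "ip4:137.191.225.35"

# Redacted records as quoted in this post, compared as plain text.
STALE = "v=spf1 mx ip4:137.191.xxx.x1 ip4:137.191.xxx.x2 ip4:137.191.xxx.x3 -all"
CURRENT = (
    "v=spf1 mx ip4:137.191.xxx.x1 ip4:137.191.xxx.x2 ip4:137.191.xxx.x3 "
    "ip4:137.191.225.35 ip4:137.191.xxx.x5 -all"
)

# Hypothetical layout: five name servers, one of them out of date.
RECORDS_BY_SERVER = {
    "ns1": CURRENT,
    "ns2": CURRENT,
    "ns3": CURRENT,
    "ns4": CURRENT,
    "ns5": STALE,
}


def stale_fraction(records_by_server, sender_term):
    """Fraction of servers whose record does not list sender_term."""
    stale = [
        name
        for name, record in records_by_server.items()
        if sender_term not in record.split()
    ]
    return Fraction(len(stale), len(records_by_server))


rate = stale_fraction(RECORDS_BY_SERVER, SENDER_TERM)
print(float(rate))  # 0.2
```

In practice resolver caching and server selection would blur this figure, so treat it as an illustration of the idea rather than an explanation of the exact percentage.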
I’d figured out the issue was due to an incomplete SPF record on one xxxx.ie DNS server, but why was the e-mail from [email protected] delivered despite an SPF failure, while some e-mails from the distribution list [email protected] were not being delivered at all?
My guess is that SPF checking for Google/Postini is evidently not that strict: our Postini gateway flagged [email protected]’s e-mail as an SPF failure but delivered it anyway. There must be additional spam filtering criteria in place on our Postini that score the [email protected] e-mails based on message content, and these, together with the results returned from the problem DNS server, flagged the [email protected] e-mails as spam 20% of the time. Our Postini services are provided by an outsourced company so I don’t have access to the specific spam filtering criteria to verify this.
Fix:
I contacted the IT Security department manager for xxxx.ie, who confirmed my findings and isolated the problem DNS server. The DNS server in question was not under his dept’s direct control but was managed by another dept, I’m guessing for redundancy purposes.
He logged a change to have the correct SPF details applied to the problem DNS server, which should eliminate our problems receiving mail from any @xxxx.ie addresses in the future.
Hopefully this has given you an insight into the complex world of spam filtering.