The following scenario highlights the importance of having regular server status reporting in place for unexpected events.
For a couple of months at irregular intervals we had users complain that some mails (not all) they’d been expecting from external users were arriving days later or were not arriving at all.
I’d been examining the headers of delayed e-mails when they did arrive and had found that external mail servers delivering e-mails to ours were taking a number of hours or days to do so. Most of the delayed e-mails “seemed” to originate from Eircom (an ISP based in Ireland), so I chalked the problem up to an issue on Eircom’s side. Our Security team agreed. This was an oversight on my part, however, which I’ll explain later…
As I examined more closely, I was able to pull more delayed e-mails from external users whose e-mail was not routed through Eircom, so logically the problem wasn’t specific to Eircom.
As a test, I had some of the users who had reported delays e-mail my employer’s e-mail address whilst CCing my personal Gmail address. Eventually the issue reproduced itself.
Mails arrived in my Gmail inbox within a couple of minutes, whilst e-mail destined for my employer’s e-mail address arrived a number of hours later. In the end I was able to reproduce the issue with the help of four separate senders.
Convincing the Security team was a problem, however, as everything looked fine on our end. They had a hard time accepting the issue was on our side even though I could reproduce the problem with four separate senders. Four separate external mail servers behaving in the same way at the same time, and the problem is not on our end?
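For anyone wanting to repeat this kind of header check, here is a rough sketch of how the delay between hops can be read out of the Received headers. The sample message, host names and times are invented for illustration; only the standard-library calls are real.

```python
from email import message_from_string
from email.utils import parsedate_to_datetime

# Hypothetical message: hosts and timestamps are made up for this example.
RAW = """\
Received: from mx.sender.example by mx.ours.example; Wed, 14 Jul 2010 12:10:00 +0100
Received: from client.sender.example by mx.sender.example; Tue, 13 Jul 2010 17:20:00 +0100
Date: Tue, 13 Jul 2010 17:19:30 +0100
Subject: test

body
"""

def hop_times(raw):
    """Return the Received timestamps, oldest hop first."""
    msg = message_from_string(raw)
    # Each Received header ends with "; <timestamp>"; the newest hop is listed first.
    stamps = [h.rsplit(";", 1)[1].strip() for h in msg.get_all("Received", [])]
    return [parsedate_to_datetime(s) for s in reversed(stamps)]

def hop_delays_hours(raw):
    """Return the delay in hours between each pair of consecutive hops."""
    times = hop_times(raw)
    return [round((b - a).total_seconds() / 3600, 1) for a, b in zip(times, times[1:])]

print(hop_delays_hours(RAW))  # → [18.8]
```

A large gap between two hops shows where the mail sat waiting, but not why it waited.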
They needed more evidence…
So the next port of call was Eircom. They were able to provide the following logs (IP addresses and e-mail addresses removed):
2010-07-13 17:22:19.744009500 starting delivery 2146375: msg 3317173 to remote [email protected]
2010-07-13 17:28:59.581941500 starting delivery 2147602: msg 3317173 to remote [email protected]
2010-07-13 17:49:00.077173500 starting delivery 2150674: msg 3317173 to remote [email protected]
2010-07-13 18:22:19.009197500 starting delivery 2158998: msg 3317173 to remote [email protected]
2010-07-13 19:08:59.078851500 starting delivery 2171856: msg 3317173 to remote [email protected]
2010-07-13 20:08:59.270208500 starting delivery 2188502: msg 3317173 to remote [email protected]
2010-07-13 21:22:20.122931500 starting delivery 2203875: msg 3317173 to remote [email protected]
2010-07-13 22:48:59.138197500 starting delivery 2224308: msg 3317173 to remote [email protected]
2010-07-14 00:28:59.064738500 starting delivery 2244997: msg 3317173 to remote [email protected]
2010-07-14 02:22:19.239298500 starting delivery 2261581: msg 3317173 to remote [email protected]
2010-07-14 04:28:59.989719500 starting delivery 2279211: msg 3317173 to remote [email protected]
2010-07-14 06:48:59.004682500 starting delivery 2299239: msg 3317173 to remote [email protected]
2010-07-14 09:22:19.021063500 starting delivery 2342483: msg 3317173 to remote [email protected]
2010-07-14 12:08:59.033052500 starting delivery 2428140: msg 3317173 to remote [email protected]
2010-07-13 17:22:19.744009500 starting delivery 2146375: msg 3317173 to remote [email protected]
2010-07-13 17:24:09.861329500 delivery 2146375: deferral: Connected_to_<IPAddress>_but_sender_was_rejected./Remote_host_said:_451_#4.1.8_Domain_of_sender_address_<3rdparty@theircompany.ie>_does_not_resolve/
From the above you can see that Eircom’s gateway attempted to deliver the mail in question numerous times across the 13th and 14th, but was being turned away with a temporary 451 error because our server could not resolve the domain of the sender’s e-mail address.
So I asked the Security team to investigate whether any work had been carried out on our DNS infrastructure on the 13th (or whether there had been any failures) that would prevent the Ironport from performing a DNS lookup.
They found that one of the DNS servers our Ironport server was actively using for sender verification was restarting intermittently, causing our Ironport to reject connection attempts whenever an external mail server attempted delivery while the lookup was failing.
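To make the retry pattern in the log easier to see, the gaps between attempts can be worked out from the timestamps. This is a minimal sketch using the first five "starting delivery" timestamps copied from the log above; the function names are my own.

```python
from datetime import datetime

# Timestamps of the first five "starting delivery" lines in the log above.
ATTEMPTS = [
    "2010-07-13 17:22:19.744009500",
    "2010-07-13 17:28:59.581941500",
    "2010-07-13 17:49:00.077173500",
    "2010-07-13 18:22:19.009197500",
    "2010-07-13 19:08:59.078851500",
]

def parse_stamp(stamp):
    """Parse a log timestamp, dropping the fractional seconds."""
    return datetime.strptime(stamp.split(".")[0], "%Y-%m-%d %H:%M:%S")

def retry_gaps_minutes(stamps):
    """Return the gap in whole minutes between consecutive attempts."""
    times = [parse_stamp(s) for s in stamps]
    return [round((b - a).total_seconds() / 60) for a, b in zip(times, times[1:])]

print(retry_gaps_minutes(ATTEMPTS))  # → [7, 20, 33, 47]
```

The gaps widen with every attempt, which is why a message that hits a few failed lookups in a row can end up hours late even if the underlying fault only lasts a short while each time.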
They didn’t have status reporting in place for the DNS servers to report any unexpected events.
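As an illustration of the kind of check that would have caught this, here is a minimal sketch of a resolution probe. It is an assumption-laden example, not what our Security team deployed: it uses the machine's system resolver via the standard library rather than querying a specific DNS server, and the domain names and the idea of running it on a schedule are placeholders.

```python
import socket

def can_resolve(domain, resolver=socket.getaddrinfo):
    """Return True if `domain` resolves through the given resolver."""
    try:
        resolver(domain, None)
        return True
    except OSError:  # socket.gaierror is a subclass of OSError
        return False

def status_report(domains, resolver=socket.getaddrinfo):
    """Return the domains that failed to resolve, so they can be alerted on."""
    return [d for d in domains if not can_resolve(d, resolver)]

if __name__ == "__main__":
    # Placeholder domains; a real check would use senders you depend on.
    failed = status_report(["example.com", "example.org"])
    if failed:
        print("DNS lookup failed for:", ", ".join(failed))
```

Run every few minutes from a scheduler and wired to an alert, a probe like this would flag an intermittently restarting DNS server long before users start reporting late mail. The `resolver` parameter exists only so the check can be exercised without touching the network.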
Moral of the story: don’t rely on e-mail headers to judge mail delivery attempts (they only record successful connections), and make sure you have status reporting in place.