Parsing Emails With Python, A Quick Tip

While some stuff runs in the background, a quick tip on parsing email with Python.

I got the following error message from Python:

Traceback (most recent call last):
File “”, line 20, in
date = dateutil.parser.parse(msg[‘date’])
File “/usr/lib/python2.7/dist-packages/dateutil/”, line 697, in parse
return DEFAULTPARSER.parse(timestr, **kwargs)
File “/usr/lib/python2.7/dist-packages/dateutil/”, line 301, in parse
res = self._parse(timestr, **kwargs)
File “/usr/lib/python2.7/dist-packages/dateutil/”, line 349, in _parse
l = _timelex.split(timestr)
File “/usr/lib/python2.7/dist-packages/dateutil/”, line 143, in split
return list(cls(s))
File “/usr/lib/python2.7/dist-packages/dateutil/”, line 137, in next
token = self.get_token()
File “/usr/lib/python2.7/dist-packages/dateutil/”, line 68, in get_token
nextchar =
AttributeError: ‘NoneType’ object has no attribute ‘read’

I have edited the email header in question but it reproduces the original error:

Received: by with SMTP id w14cs34683wfw;
Wed, 5 Nov 2008 08:11:39 -0800 (PST)
Received: by with SMTP id r1mr728791wad.136.1225901498795;
Wed, 05 Nov 2008 08:11:38 -0800 (PST)
Received: from ( [])
by with ESMTP id m26si29354pof.3.2008.;
Wed, 05 Nov 2008 08:11:38 -0800 (PST)
Received-SPF: pass ( domain of designates
Received: from ([])
by with comcast
id bUBY1a0010b6N64A9UBeJl; Wed, 05 Nov 2008 16:11:38 +0000
Received: from ([])
by with comcast
id bUAV1a00L2JMgtY8PUAV7G; Wed, 05 Nov 2008 16:10:30 +0000
X-Authority-Analysis: v=1.0 c=1 a=1Ht49J2nGmlg0oY3xr8A:9
a=8nxvWDfACCTtBObdks-tTUtrMyYA:4 a=OA_lqj45gZcA:10 a=diNjy0DT58-4uIkuavEA:9
a=e0_VUgpf8QEu0XMU188OmzzKrzoA:4 a=37WNUvjkh6kA:10
Received: from [] by;
Wed, 05 Nov 2008 16:10:28 +0000

To: “Podesta” ,
CC: “Denis McDonough OFA” ,”,,
Subject: DOD leadership – immediate attention
Date: Wed, 05 Nov 2008 16:10:28 +0000
Message-Id: <110520081610.3048.4911C574000C2E2100000BE82216>
X-Mailer: AT&T Message Center Version 1 (Oct 30 2007)
X-Authenticated-Sender: c2V3YWxsY29ucm95QGNvbWNhc3QubmV0
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary=”NextPart_Webmail_9m3u9jl4l_3048_1225901428_0″

Content-Type: text/plain
Content-Transfer-Encoding: 8bit

I’m comparing “Date” to similar emails and getting no joy.

Absence is hard to notice, but once you know the rule, it’s obvious:

RFC822: Standard for ARPA Internet Text Messages says in part:

3. Lexical Analysis of Messages

3.1 General Description

A message consists of header fields and, optionally, a body. The body is simply a sequence of lines containing ASCII characters. It is separated from the headers by a null line (i.e., a line with nothing preceding the CRLF). (emphasis added)

Yep, the blank line I introduced while removing an errant double-quote on a line by itself, created the start for the body of the message.

Meaning that my Python script failed to find the “Date:” field and returning what someone thought would be a useful error message.

When you get errors parsing emails with Python (and I assume in other languages), check the format of your messages!

RFC822 has an appendix of parsing rules and a few examples.

Suggested listings of the most common email/email header format errors?

Comments are closed.