The ambiguity in this post's title is actually quite appropriate, since that is exactly what I hate about dates. You'd think there'd be some structure to it, but then you come across a date that just fucks everything up.
Still don't know what I'm talking about?
Well, if you're expecting me to go on a tirade about my romantic life, you're going to be disappointed. I'm not talking about those yummy fruits either (I like those). No, I'm talking about these dates:
Thu, 21 Apr 2005 00:13:28 GMT
Dates in a format more or less like the above can be seen everywhere on the internet, from email headers to HTTP headers to RSS (2.0) feeds. So how is it ambiguous? Well, it's not syntactically or lexically ambiguous. After all, such date formats are well documented in an unambiguous grammar notation that looks like this:
     date-time   =  [ day "," ] date time        ; dd mm yy
                                                 ;  hh:mm:ss zzz
     day         =  "Mon"  / "Tue" /  "Wed"  / "Thu"
                 /  "Fri"  / "Sat" /  "Sun"
     date        =  1*2DIGIT month 2DIGIT        ; day month year
                                                 ;  e.g. 20 Jun 82
     month       =  "Jan"  /  "Feb" /  "Mar"  /  "Apr"
                 /  "May"  /  "Jun" /  "Jul"  /  "Aug"
                 /  "Sep"  /  "Oct" /  "Nov"  /  "Dec"
     time        =  hour zone                    ; ANSI and Military
     hour        =  2DIGIT ":" 2DIGIT [":" 2DIGIT]
                                                 ; 00:00:00 - 23:59:59
     zone        =  "UT"  / "GMT"                ; Universal Time
                                                 ; North American : UT
                 /  "EST" / "EDT"                ;  Eastern:  - 5/ - 4
                 /  "CST" / "CDT"                ;  Central:  - 6/ - 5
                 /  "MST" / "MDT"                ;  Mountain: - 7/ - 6
                 /  "PST" / "PDT"                ;  Pacific:  - 8/ - 7
                 /  1ALPHA                       ; Military: Z = UT;
                                                 ;  A:-1; (J not used)
                                                 ;  M:-12; N:+1; Y:+12
                 / ( ("+" / "-") 4DIGIT )        ; Local differential
                                                 ;  hours+min. (HHMM)
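To see how literal a machine reading of that grammar is, here is a rough Python sketch that turns it into a regular expression. The names (`RFC822_DATE`, `is_rfc822_date`) are made up for illustration, and it simplifies things: it assumes single spaces between tokens and exact capitalization, which the real spec is more relaxed about.

```python
import re

DAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
# zone: UT/GMT, the eight North American names, one military letter,
# or a signed four-digit offset. Nothing else is listed in the grammar.
ZONE = r"(?:UT|GMT|[ECMP][SD]T|[A-Za-z]|[+-]\d{4})"

RFC822_DATE = re.compile(
    rf"(?:{DAY}, )?"                    # [ day "," ]
    rf"\d{{1,2}} {MONTH} \d{{2}} "      # date: 1*2DIGIT month 2DIGIT
    rf"\d{{2}}:\d{{2}}(?::\d{{2}})? "   # hour: seconds are optional
    rf"{ZONE}"
)


def is_rfc822_date(text: str) -> bool:
    """Return True if text matches the grammar above, read literally."""
    return RFC822_DATE.fullmatch(text) is not None


# The grammar's own example date, with a time and zone added
print(is_rfc822_date("20 Jun 82 10:00 EST"))
```

Anything the grammar doesn't spell out simply fails to match, no matter how reasonable it looks to a person.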
The trouble starts when you run into a date like this one:
Thu, 21 Apr 2005 00:13:28 CET
To a human, it looks more or less identical to the first example. At least, it doesn't look like it should be an invalid date string. But it is. According to the specs I pasted above, "CET" is not one of the enumerated zones, and my software shouldn't have to recognize it. Let's look at more examples. Take a look at the following date strings, and let me know if they conform to the same standards:
Thu, 21 Apr 2005 00:13:28 GMT
Thu, 21 Apr 2005 00:13:28 -0600
Thu, 21 Apr 2005 00:13:28 EST
Thursday, 21-Apr-05 00:13:28 GMT
To most people, they look more or less equivalent. To a machine that strictly implements documented standards (as much as RFCs can be considered standards), the four are very different. The first conforms to RFC 2616 (HTTP 1.1) and to RFC 2822, which technically obsoletes RFC 822, and to RFC 822 itself once you allow the four-digit year that RFC 1123 added. How 'bout the 2nd and 3rd? They're fine for RFC 822 and 2822, but don't conform to 2616, because the only timezone permitted there is "GMT": no other names, and no numeric offsets either. The 4th one is obviously the oddball here, but it actually conforms to RFC 2616, which still accepts the old RFC 850 format.
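For the HTTP side, here is a small Python sketch of the three date forms that RFC 2616 lists under HTTP-date (rfc1123-date, rfc850-date, and asctime-date), written from my reading of that grammar. The function name `http_date_form` is hypothetical. The first two forms end in a literal "GMT", so a date with any other zone name or a numeric offset matches none of them.

```python
import re

WKDAY = r"(?:Mon|Tue|Wed|Thu|Fri|Sat|Sun)"
WEEKDAY = r"(?:Monday|Tuesday|Wednesday|Thursday|Friday|Saturday|Sunday)"
MONTH = r"(?:Jan|Feb|Mar|Apr|May|Jun|Jul|Aug|Sep|Oct|Nov|Dec)"
TIME = r"\d{2}:\d{2}:\d{2}"

# One pattern per HTTP-date alternative in RFC 2616, section 3.3.1
HTTP_DATE_FORMS = {
    "rfc1123-date": re.compile(rf"{WKDAY}, \d{{2}} {MONTH} \d{{4}} {TIME} GMT"),
    "rfc850-date": re.compile(rf"{WEEKDAY}, \d{{2}}-{MONTH}-\d{{2}} {TIME} GMT"),
    "asctime-date": re.compile(rf"{WKDAY} {MONTH} (?:\d{{2}}| \d) {TIME} \d{{4}}"),
}


def http_date_form(text: str):
    """Return the name of the matching HTTP-date form, or None."""
    for name, pattern in HTTP_DATE_FORMS.items():
        if pattern.fullmatch(text):
            return name
    return None


print(http_date_form("Thu, 21 Apr 2005 00:13:28 GMT"))     # rfc1123-date
print(http_date_form("Thursday, 21-Apr-05 00:13:28 GMT"))  # rfc850-date
```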
The bottom line is, this mess makes date parsing a real pain. For example, if IlohaMail gets a message with a date that looks like the "CET" example, and fails to adjust properly for timezones because it doesn't know what "CET" is, users complain. And they're not going to accept "the date in your email doesn't conform to standards" as a valid excuse, largely because as far as anyone can see, it looks like any other valid date string.
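In practice that means being lenient. The sketch below is one way to do it in Python, not what IlohaMail actually does: parse the usual layout, accept the zones from the grammar, and fall back to a hand-maintained table of extra abbreviations. `EXTRA_ZONES` and `parse_lenient` are names I made up, and the table is only an assumption about what you'd want in it. Abbreviations aren't unique worldwide, so any such table is a guess.

```python
import re
from datetime import datetime, timedelta, timezone

MONTHS = {name: number for number, name in enumerate(
    "Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec".split(), start=1)}

# Offsets in hours, taken from the RFC 822 zone rule
RFC822_ZONES = {"UT": 0, "GMT": 0, "EST": -5, "EDT": -4, "CST": -6,
                "CDT": -5, "MST": -7, "MDT": -6, "PST": -8, "PDT": -7}
# Hypothetical extras that show up in real mail but are in no grammar here
EXTRA_ZONES = {"CET": 1, "CEST": 2}

PATTERN = re.compile(
    r"(?:[A-Za-z]{3}, )?(\d{1,2}) ([A-Za-z]{3}) (\d{4}) "
    r"(\d{2}):(\d{2}):(\d{2}) (\S+)")


def parse_lenient(text: str) -> datetime:
    """Parse 'Thu, 21 Apr 2005 00:13:28 ZONE' into an aware datetime."""
    match = PATTERN.fullmatch(text.strip())
    if match is None:
        raise ValueError(f"unrecognized date: {text!r}")
    day, month, year, hour, minute, second, zone = match.groups()
    if month not in MONTHS:
        raise ValueError(f"unknown month: {month!r}")
    if re.fullmatch(r"[+-]\d{4}", zone):
        # Numeric offset: sign, two digits of hours, two of minutes
        sign = -1 if zone[0] == "-" else 1
        offset = sign * timedelta(hours=int(zone[1:3]), minutes=int(zone[3:5]))
    else:
        hours = {**RFC822_ZONES, **EXTRA_ZONES}.get(zone.upper())
        if hours is None:
            raise ValueError(f"unknown zone: {zone!r}")
        offset = timedelta(hours=hours)
    return datetime(int(year), MONTHS[month], int(day), int(hour),
                    int(minute), int(second), tzinfo=timezone(offset))


print(parse_lenient("Thu, 21 Apr 2005 00:13:28 CET").isoformat())
# 2005-04-21T00:13:28+01:00
```

Raising on a zone it doesn't know, rather than quietly assuming GMT, at least makes the failure visible.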
The good news is that people have learned. Sort of. Atom uses a very machine-readable date format. On the other hand, RSS 2.0, which I believe was designed around the same time as Atom, uses RFC 822 (which, as I mentioned earlier, was obsoleted by 2822).
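To show the difference, here is a short Python sketch that reads the same instant both ways. It assumes the Atom date is an RFC 3339 style timestamp (as in Atom 1.0), and `parse_atom_date` is a hypothetical helper. The RSS side uses the standard library's `email.utils.parsedate_to_datetime`.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime


def parse_atom_date(text: str) -> datetime:
    """Parse an RFC 3339 style timestamp such as 2005-04-21T00:13:28Z."""
    # datetime.fromisoformat only accepts a trailing "Z" from Python 3.11 on,
    # so rewrite it as an explicit zero offset first.
    if text.endswith(("Z", "z")):
        text = text[:-1] + "+00:00"
    return datetime.fromisoformat(text)


atom = parse_atom_date("2005-04-21T00:13:28Z")
rss = parsedate_to_datetime("Thu, 21 Apr 2005 00:13:28 GMT")
print(atom == rss)  # True: both are the same moment in time
```

The Atom style has no day names, month names, or zone abbreviations to look up: every field is numeric and the offset is always spelled out.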