Craigslist RSS feeds use invalid dc:date values such as <dc:date>2007-08-28T21:38:22-0700</dc:date>, where the timezone offset should instead be -07:00, with a colon. Since the regex in W3CToIETFDate() already requires [+-] in front of the four digits that come after everything else we want, it ought to be safe to make the colon optional.
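A minimal sketch of the optional-colon idea, in Python rather than the feed code's own language. The pattern and the normalize_offset() name are hypothetical illustrations, not the regex actually used in W3CToIETFDate(); the only point is that requiring [+-] plus four digits at the end of the value leaves the valid forms untouched.

```python
import re

# Hypothetical pattern (not the one in W3CToIETFDate()): a sign, two hour
# digits, an optional colon, and two minute digits at the very end of the value.
TZ_OFFSET = re.compile(r"([+-])(\d{2}):?(\d{2})$")


def normalize_offset(value: str) -> str:
    """Rewrite a trailing +hhmm / -hhmm offset as +hh:mm / -hh:mm."""
    return TZ_OFFSET.sub(r"\1\2:\3", value)


if __name__ == "__main__":
    for sample in (
        "2007-08-28T21:38:22-0700",   # the Craigslist form
        "2007-08-28T21:38:22-07:00",  # already valid
        "2007-08-28T21:38:22Z",       # UTC designator, no offset
        "2007-08-29",                 # date only, no offset to touch
    ):
        print(sample, "->", normalize_offset(sample))
```

Date-only values are not affected because the hyphens in them are never followed by four digits at the end of the string.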
1. Load attachment 278724, a Craigslist search feed as of 2007-08-27.
2. Note that the preview lists the first item with a date of "Tue, Aug 28, 2007 2:38 PM" (if you're in PDT, as I am now), the result of treating 21:38:22 as a UTC time, rather than showing it as "9:38 PM", the intended time.
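The seven-hour discrepancy in step 2 can be reproduced with a little arithmetic. This is only an illustration of the symptom, assuming a fixed -07:00 offset for PDT; it does not model the feed processor itself.

```python
from datetime import datetime, timedelta, timezone

# Assumption for the example: PDT as a fixed -07:00 offset.
PDT = timezone(timedelta(hours=-7), "PDT")

# Misparse: the offset is ignored and 21:38:22 is taken as a UTC time.
as_utc = datetime(2007, 8, 28, 21, 38, 22, tzinfo=timezone.utc)

# Intended reading: 21:38:22 at -07:00.
intended = datetime(2007, 8, 28, 21, 38, 22, tzinfo=PDT)

# What a reader in PDT would see in each case.
wrong_local = as_utc.astimezone(PDT)
right_local = intended.astimezone(PDT)

if __name__ == "__main__":
    print("misparsed:", wrong_local.strftime("%a, %b %d, %Y %I:%M %p"))
    print("intended: ", right_local.strftime("%a, %b %d, %Y %I:%M %p"))
    print("error:    ", intended - as_utc)
```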
I think it's ok to fish for invalid field values as long as we don't break any valid dates.
Which makes step 1 "verify that we have test coverage for every possible form of W3CDTF." (And step 2 "worry about ISO 8601 forms which are not W3CDTF" but I'm not spending CHF 126.00 for a copy of the spec.)
(In reply to comment #2)
> Which makes step 1 "verify that we have test coverage for every possible form
> of W3CDTF."
mmm, ok, I can buy that. maybe lose the melodrama and say "lots of w3cdtf tests". ;)
> (And step 2 "worry about ISO 8601 forms which are not W3CDTF" but
> I'm not spending CHF 126.00 for a copy of the spec.)
Nope, ISO 8601 does much more than timestamps, but no feed date element allows anything but timestamps. The unpleasant interaction of RFC 3339 with xsd:dateTime is probably the nastiest case we have to deal with:
(In reply to comment #3)
> no feed date element allows anything but timestamps.
Sadly, that's not the case. Either the RSS 1.0 authors didn't read the part of http://www.w3.org/TR/NOTE-datetime which says "An adopting standard must specify which of these options it permits", or they meant to allow all six granularities but didn't read the part that says "If a given standard allows more than one granularity, it should specify the meaning of the dates and times with reduced precision". So for dc:date in RSS 1.0, <dc:date>2007-08-29</dc:date> is an absolutely valid value, with an undefined relationship to values like <dc:date>2007-08-29 00:00:00</dc:date>.
Since ISO 8601 is well-designed, I don't see any way that allowing no-colon timezones will get anything wrong, as long as the offset comes after [+-]. But we should still be testing that we get our best guess at what RSS 1.0 would have specified if its authors had done their job, e.g. 2007 == 2007-01-01T00:00:00.0-00:00.
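A rough sketch of what that best guess could look like across the six W3CDTF granularities, written in Python for illustration. The expand() helper and its regex are hypothetical, and filling missing fields with their lowest value in UTC is the assumption stated above, not something RSS 1.0 or the W3C note defines. The pattern also accepts the no-colon offset.

```python
import re
from datetime import datetime, timedelta, timezone

# Hypothetical pattern for the six W3CDTF granularities; a time part must
# carry a zone designator, and the colon in the offset is optional.
W3CDTF = re.compile(
    r"^(\d{4})(?:-(\d{2})(?:-(\d{2})"
    r"(?:T(\d{2}):(\d{2})(?::(\d{2})(?:\.(\d+))?)?"
    r"(Z|[+-]\d{2}:?\d{2}))?)?)?$"
)


def expand(value):
    """Return an aware datetime, or None if the value does not match."""
    m = W3CDTF.match(value)
    if m is None:
        return None
    year, month, day, hour, minute, second, frac, tzd = m.groups()
    try:
        tz = timezone.utc  # assumed default when no zone is given
        if tzd and tzd != "Z":
            sign = -1 if tzd[0] == "-" else 1
            digits = tzd[1:].replace(":", "")
            tz = timezone(sign * timedelta(hours=int(digits[:2]),
                                           minutes=int(digits[2:])))
        # Keep at most microsecond precision from the fractional seconds.
        micro = int((frac or "0")[:6].ljust(6, "0"))
        return datetime(int(year), int(month or 1), int(day or 1),
                        int(hour or 0), int(minute or 0), int(second or 0),
                        micro, tzinfo=tz)
    except ValueError:
        return None  # out-of-range field such as month 13


if __name__ == "__main__":
    for sample in ("2007", "2007-08", "2007-08-29",
                   "2007-08-28T21:38-07:00", "2007-08-28T21:38:22-0700"):
        print(sample, "->", expand(sample))
```

A table of inputs like the one in the main block is also a cheap starting point for the "lots of w3cdtf tests" discussed earlier in the thread.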
(In reply to comment #4)
> (In reply to comment #3)
> > no feed date element allows anything but timestamps.
> Sadly, that's not the case.
Yes, sad. We'll do the best we can with non-timestamp values, but we'll probably be buggy there until we can incorporate a real ISO 8601 parser, hopefully in a library that we can share across the product.
Fixed by bug 682754.