Closed Bug 430930 Opened 12 years ago Closed 10 years ago

Date.parse cannot even parse "2008-04-26" (should understand ISO 8601)

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta1-fixed

People

(Reporter: x00000000, Assigned: bruce)

References

()

Details

(Keywords: dev-doc-complete, Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 7 obsolete files)

ECMA3 doesn't require that Date.parse can parse anything other than the strings produced by Date.prototype.toString or Date.prototype.toUTCString of the same implementation, but it should be able to parse simple international ISO 8601 and similar dates, e.g. the time stamps above Bugzilla comments. Currently it understands US specific date formats only.

http://en.wikipedia.org/wiki/ISO_8601
Parsing an 8601 subset (just timestamps, I think, and not the durations or intervals) is also v. likely to be an ES4 requirement.
Taking this for now, to see if we can find a library to incorporate.
Assignee: general → sayrer
Adding this to the 1.9.1 triage queue by marking this wanted1.9.1?.  
Flags: wanted1.9.1?
Priority: -- → P3
RFC 3339 <http://www.ietf.org/rfc/rfc3339.txt> is a good subset to implement, as it is simple, "SHOULD be used in new protocols on the Internet" (per section 5.6 of the RFC), and is being used in at least one common format (Atom).

A similar (identical?) subset is described more concisely in W3C-DTF <http://www.w3.org/TR/NOTE-datetime>.
Flags: wanted1.9.1? → wanted1.9.1+
The W3C formats are the most minimal of the group described in this bug. 

The ES4 date_and_time proposal covers a larger potential set of formats, but includes a requirement that the formats all include the 'T' separator, even if lacking a trailing time component. I suspect this is a bug. 

The IETF formats are quite a bit more complex still. Full 8601 format would be a significant undertaking.

In the interest of simplicity, this patch implements (I think) the W3C formats and nothing else. Feedback appreciated. I probably made a number of mistakes or style gaffes.
(In reply to comment #5)
> The ES4 date_and_time proposal covers a larger potential set of formats, but
> includes a requirement that the formats all include the 'T' separator, even if
> lacking a trailing time component. I suspect this is a bug. 

Yes, see

http://bugs.ecmascript.org/ticket/147

/be
Blocks: es5
Related to this: the file <a href="http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/loader/ISO8601DateUtils.jsm">js/src/xpconnect/loader/ISO8601DateUtils.jsm</a> parses and produces ISO8601 datestamps as well, and is already in firefox builds, if not the JS shell. It is possibly more robust code. Certainly simpler and a bit more permissive (accepts no-timezone stamps, missing dashes, and 3 digit month numbers). Preferences?
(In reply to comment #7)
> Related to this: the file <a
> href="http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/loader/ISO8601DateUtils.jsm">js/src/xpconnect/loader/ISO8601DateUtils.jsm</a>
> parses and produces ISO8601 datestamps as well, and is already in firefox
> builds, if not the JS shell. It is possibly more robust code. Certainly simpler
> and a bit more permissive (accepts no-timezone stamps, missing dashes, and 3
> digit month numbers). Preferences?

I originally wrote that code for Forumzilla, and it's been in Thunderbird for years, so it's pretty well-exercised (however, note bug 439295 in parts of the code that were added more recently).  But it might not be particularly speedy.
ISO-format parsing support is supposed to be part of the core ES3.next / ES4 date object, which means (I think) that it ought to show up in the JS shell as well, not just builds glued to xpconnect. I wonder if we have a strategy for shipping implementable-in-script bits of standard objects as part of spidermonkey, or if we always do them in C. 

Of course, this bug also probably wants a companion bug for the function that *produces* ISO format dates. Also doable in script or in C.
(In reply to comment #9)
> I wonder if we have a strategy for
> shipping implementable-in-script bits of standard objects as part of
> spidermonkey, or if we always do them in C. 

We don't have such a strategy, but would very much welcome one IMO.
Self-hosting native stuff has been a gleam in various eyes for a while, but the time is now. This is good for several reasons.

The DOM has one or two self-hosted getters, IIRC. We'd want to take compilation out of startup, or any critical path. Suggestions?

/be
Lazy / compile-on-demand at a function level? I may be misreading, but it looks like you have possibly two JSFUN_* bits left. 

Seems to me that you'd get the most bang for the least disruption that way. The important part is being able to code functions in script, not declare the existence of classes and objects.
Use XDR to generate some static arrays of bytes at (Spidermonkey) compile-time, like is done with ABC in tamarin?  (Another bug?)
Hi --- this is blocking ECMAScript 5, for which Brendan has set a rough deadline that's coming up soon.  Any chance this bug could get some loving attention?
I'll get this patch from graydon landed.
I started working on a test case but got side-tracked.  I'll get back to it.  Sayre, if you want to land the patch first and test case later, that's ok with me.
You might also want to look closely at the "final" spec that made its way into ES5. I remember some committee members slipping in some oddball options such as >4-digit years (with a leading "+") and ... something odd I don't recall the details of, in the fractional-seconds part. Didn't go back to check this patch against what actually made it in.
Made Graydon's code build with the current tree:

- needed to update JSSTRING_CHARS_AND_LENGTH


Fixed two bugs in the existing format and calculations:

- timezone offset was added when should be subtracted

- Did not allow 24:00:00 as the same as 00:00:00 the next day.
Assignee: sayrer → bruce
Attachment #329755 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attached file a start on some tests (obsolete) —
Attachment #390199 - Attachment mime type: text/html → text/plain
Need to expand the range of years to handle 6 digit years: +/-YYYYYY

Graydon's code rejects years greater than 9999. This is both
unnecessary (as the parser accepts only 4 digits anyway) and
incorrect. The spec says in 15.9.1.1:

"The actual range of times supported by ECMAScript Date objects is
slightly smaller [than 9,007,199,254,740,991]: exactly –100,000,000
days to 100,000,000 days measured relative to midnight at the
beginning of 01 January, 1970 UTC. This gives a range of
8,640,000,000,000,000 milliseconds to either side of 01 January, 1970
UTC."

One correct implementation therefore would be to fail quickly if the
year is outside a range somewhere between +/- 273973 (ceil(1e8/365))
and +/- 285616 in order to avoid overflowing the millisecond
calculation, and then at the very end reject the date if the
milliseconds lies outside +/- 8,640,000,000,000,000.
Need to increase the flexibility of handling partial date/times

The code currently correctly handles the date-only forms:

YYYY 
YYYY-MM 
YYYY-MM-DD

The code currently also handles a full YYYY-MM-DD date followed by a
partial time in one of the following formats, plus a required
timezone.

THH:mm 
THH:mm:ss 
THH:mm:ss.sss

Unfortunately the spec says that it should also handle:

- time-only forms in the obove formats, but with the timezone
  optional.

- ANY combination of the allowed date-only forms and the time-only
  forms.  This implies that the timezone is also optional on
  date-times, including on the full format!

15.9.1.15 is silent on what to do if the timezone is missing. Assume
local time? That is what our current non-ISO code does.
Heuristics

From the spec:

The function first attempts to parse the format of the string
according to the rules called out in Date Time String Format
(15.9.1.15). If the string does not conform to that format the
function may fall back to any implementation-specific heuristics or
implementation-specific date formats.

One heuristic is plainly to fall back to a completely different date
format (e.g. that implemented prior to this bug) but another sensible
heuristic might be to allow things that are almost but not quite
compliant with the format given in 15.9.1.15:
(YYYY-MM-DDTHH:mm:ss.sssTZ and above extensions)

Possibilities?

- allow space instead of T?

- maybe allow some other date and time separators? What ones?
Sigh. I'm sorry, I should have stayed on top of this during spec finalization; I was worried *exactly* this sort of nonsense would creep back in. Discussion regularly kept drifting into handling "extra cases" that "shouldn't be hard" but of course made the algorithm necessarily ambiguous.
(In reply to comment #22)
> 
> One heuristic is plainly to fall back to a completely different date
> format (e.g. that implemented prior to this bug) but another sensible
> heuristic...

I think that's a different bug, if we decide to do it.
(In reply to comment #23)
> Discussion regularly kept drifting into handling "extra cases" that
> "shouldn't be hard" but of course made the algorithm necessarily
> ambiguous.

> Discussion regularly kept drifting into handling "extra cases" that
> "shouldn't be hard" but of course made the algorithm necessarily
> ambiguous.

I don't think there's anything hard there, other that the lack of
specifying what you do when things are missing.

Surely the only sane thing is to treat missing fields as 0's?

If you only have a year then it's midnight on Jan 1st. If you have
year and month then it's midnight on the 1st of that month. If you
have a day then it's midnight that day.

For times without dates I'd assume what is wanted is that time on Jan
1st 1970 (or maybe Jan 2 or Dec31 1969 in UTC depending on the
timezone given). Then you can add a date and a time together and get
the same thing as if they'd both been presented in the same string
in the first place.

Can't see that anything else would be useful.

Also can't see that "2009T15:30" is especially useful, but it's not
ambiguous or difficult.
Implemented the features described in comments 20 and 21.

Now handles six digit years, including years in the far negative past.

Now handles general combinations of the allowed date formats and time formats
(with or without timezone).
Attached patch fixed bugs, stricter checking (obsolete) — Splinter Review
Fixed some bugs involving defaulted fields. Stricter checking of days in a month, using the actual length of that particular month instead of always 31. Updated function comment to match new semantics.
Attached file updated tests (obsolete) —
added many more unit tests
Attachment #390832 - Attachment mime type: text/html → text/plain
Recast the unit tests into the form used by existing js tests and
added it to the patch as:

  js/tests/ecma_5/Date/15.9.4.2.js

This involved creating js/tests/ecma_5 and the required boilerplate.

I don't know if that's a good place or not.

I'd say this is about ready for some review.
Attachment #390197 - Attachment is obsolete: true
Attachment #390199 - Attachment is obsolete: true
Attachment #390479 - Attachment is obsolete: true
Attachment #390828 - Attachment is obsolete: true
Attachment #390832 - Attachment is obsolete: true
Attachment #391088 - Flags: review?(graydon)
The code looks fine, I'm still nervous about the heuristics. Why 0 in all cases except timezone, then local time? Why not default to UTC / Z if timezone is omitted?
In 15.9.4.2 it states that if the string conforms to the syntax specified in 15.9.1.15 then "The String may be interpreted as a local time, a UTC time, or a time in some other time zone, depending on the contents of a string".

Clearly it can not be a local time if it contains a "Z" or an explicit HH:MM timezone. So unless no timezone means local time there is no way to specify a local time.
Comment on attachment 391088 [details] [diff] [review]
Now with tests integrated into the js test system.

Looks ok, thanks.
Attachment #391088 - Flags: review?(graydon) → review+
Attachment #391088 - Attachment is obsolete: true
Attachment #393977 - Flags: review?(graydon)
Attachment #393977 - Flags: review?(graydon) → review+
http://hg.mozilla.org/mozilla-central/rev/31242b0c35d8
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
js/tests/ecma_5/Date/15.9.4.2.js
Flags: in-testsuite+
Documented:

https://developer.mozilla.org/en/JavaScript/Reference/Global_Objects/Date

Also updated the New in JavaScript 1.8.5 article.
Not properly documented. The doc still talks about RFC822 dates and "Mon, 25 Dec 1995 13:30:00 GMT". Only hint to ISO dates is a small, one-sentence yellow note. Not sufficient.
Ben, seems you have a good idea of what's to be done. Feel free to edit yourself!
Depends on: 698334
You need to log in before you can comment on or make changes to this bug.