Open Bug 1274354 Opened 6 years ago Updated 1 month ago

[meta] Date.parse, amirite

Categories

(Core :: JavaScript Engine, defect, P3)

defect

Tracking

()

Webcompat Priority ?

People

(Reporter: jorendorff, Unassigned)

References

(Depends on 17 open bugs, Blocks 2 open bugs)

Details

(Keywords: DevAdvocacy, meta, Whiteboard: [webcompat] )

OK so Date.parse has no standard, and our implementation is a dumpster fire.

We can fix either problem first. The purpose of this bug is to collect the Date.parse minibugs in one place.
Depends on: 1265136
Depends on: 1275498
Depends on: 1275502
Building us a new Date parser to submit to ECMA, see the work in progress here: http://mrrrgn.com/date/
Depends on: 1205298
The current spec states that Date.parse should expect a date string based on the ISO 8601 standard. If the date string does not fit that requirement, the VM is allowed to do whatever, for example a best effort approach to parse the string.

Why is this an issue? And why does the language have to change instead of simply providing a library? Moment.js already exists.
(In reply to yangguo from comment #2)
> The current spec states that Date.parse should expect a date string based on
> the ISO 8601 standard. If the date string does not fit that requirement, the
> VM is allowed to do whatever, for example a best effort approach to parse
> the string.
> 
> Why is this an issue? 

Because developers expect the behavior to be the same across implementations and that causes cross-compatibility issues. People rarely stick to the ISO 8601 standard, instead they unwittingly rely on implementation specific heuristics. The fallout from this is a steady stream of bugs with titles like "new Date() doesn't behave like Chrome/Safari, please fix."

Implementation defined heuristics have led us to a situation where the most popular engine's parser becomes the de facto standard and everyone else is just left to play catch up.

> And why does the language have to change instead of
> simply providing a library? Moment.js already exists.

"Always use ISO 8601 format dates in `new Date(...)` or it won't work across browsers" is a gotcha. Developers have a reasonable expectation that their code should work across implementations. Breaking this assumption with heuristics leads to confusion. 

I would argue that custom behavior belongs in libraries like Moment.js not in VMs, which should be consistent.

The good news is, while things are breaking, most implementations aren't extremely different. I believe a reasonable grammar can be made which is [at least almost] a superset of their individual quirks.
I see. So you are just trying to put existing behavior into the spec rather than designing a new Date.parse.

Do you have any data on how many websites rely on implementation specific heuristics?

Do you have a set of parsing rules that is compatible with existing implementation? Are there any cases where different implementations would parse differently?
Also, if users cannot rely on implementation specific heuristics to begin with (hence the cross-compatibility issues), why not remove the optional fall back from the spec text always return NaN?
(In reply to yangguo from comment #5)
> Also, if users cannot rely on implementation specific heuristics to begin
> with (hence the cross-compatibility issues), why not remove the optional
> fall back from the spec text always return NaN?

Web developers might only test their site on Chrome and not realize their Date strings rely on heuristics that are not implemented in IE/Edge or Firefox.
So, if the spec requires Date.parse to return NaN, and all browsers implement that, then it should be fine?
(In reply to yangguo from comment #7)
> So, if the spec requires Date.parse to return NaN, and all browsers
> implement that, then it should be fine?

If Date.parse were made very strict, many web sites would break. See comment 3.

The constraints here are:
- minimize breakage
- maximize cross-implementation compatibility
- minimize spec complexity

...in that order, a la Asimov's Three Laws of Robotics. If we determine that implementing V8's Date.pars() quirks fixes more web sites than it breaks, we must do it (and the current spec permits this).
I like the idea of standardizing Date.parse behavior because I am skeptical that it will be web-compatible to simply return NaN for non-ISO 8601 date strings. Yang has put some telemetry in V8 that should give us some data on how often the "legacy" path hits, so that should give some clarification to whether that skepticism is well-founded.

If we do find that we need additional date parsing behavior, seems like the first thing is to catalog what browsers do today and find the commonalities and differences. We
Oops, posted that comment halfway through. Here's the full comment.
----

I like the idea of standardizing Date.parse behavior because I am skeptical that it will be web-compatible to simply return NaN for non-ISO 8601 date strings. Yang has put some telemetry in V8 that should give us some data on how often the "legacy" path hits, so that should give some clarification to whether that skepticism is well-founded.

If we do find that we need additional date parsing behavior, seems like the first thing is to catalog what browsers do today and find the commonalities and differences. Probably it'll be somewhere in between the intersection and the union of what various browsers ship. Browsers can implement telemetry to see how web-compatible a change like this would be--whether the supported cases are necessary, and whether the unsupported cases are OK to ban.

This spec text that's linked seems to be mostly a grammar, and not this comparison. To me that seems a bit backwards--we should first do the comparison, document it, eventually augmented with telemetry data, and then derive the grammar.
> because I am skeptical that it will be web-compatible to simply return NaN for non-ISO 8601 date strings

That will definitely not be web compatible, our bug reports alone speak to that but I am excited to see telemetry data! :)

> If we do find that we need additional date parsing behavior, seems like the first thing is to catalog what browsers do today and find the commonalities and differences.

The grammar on the page from my prior comment is based on the parsing code from SpiderMonkey, V8, and Chakra. I agree that cataloging the differences could have been done beforehand, but adding probes and doing data analysis seems like a more cumbersome way to get started than simply reading the code. 

I'm working on a test suite and using it to build a compatibility table for SpiderMonkey, V8, Chakra, and the parser I've been working on. I'll drop the data here when it's ready. I'm moving more slowly than I'd like at the moment since I'm away on travel/vacation.
Morgan, reading the code seems like a valid way to go about it to me. Really glad you're doing all this hard work; I don't want to sound too critical. How does your grammar relate to what is supported by those three browsers? Does it approximate the union? The intersection? What about JSC?
(In reply to Daniel Ehrenberg from comment #12)

> glad you're doing all this hard work; I don't want to sound too critical.

Likewise I appreciate your feedback and help. I hope my reply didn't seem snippy.

> How does your grammar relate to what is supported by those three browsers? Does it approximate the union? The intersection? What about JSC?

It's approximating a union at this point. SpiderMonkey's date parser was very strict so there aren't many collision points. Another hopeful sign is that v8 appears to have gone out of its way to maintain compatibility with JSC. Because of that I'm feeling less worried about building a universal parser.

One area where telemetry will come in very useful might be in actually trimming it back a little. For example in v8: `new Date(',,adf adsfadfad,a,12')` returns "Sat Dec 01 2001 00:00:00 GMT+0000 (GMT)" which makes sense after looking at the code, but probably shouldn't be considered valid.
I've thrown a few million strings at the engines and tabulated the differences. The pseudo-fuzzer I'm using right now stinks (it's embarrassing) so I'm going to fix it up and expand on this data over the next day or so. Here is what we have now:

https://github.com/mrrrgn/proposal-date-time-string-format/blob/master/data/results.csv
^ I took that file down temporarily, the results are printing out of order. :( https://media.giphy.com/media/NZhO1SEuFmhj2/giphy.gif will be fixed up later.
mrrrgn, Looks like your csv file has V8, JSC and SpiderMonkey. It'd also be nice to have data for IE11 and Edge too (separately, as they are different populations and I'd be curious about compatibility with IE11's slice of the web). Maybe Brian (cc'd) can help you get that testing set up.

Since you've spent so much time looking at browsers' code for date parsing, I wonder if you might give us pointers for where exactly it'd make sense to add UseCounters for things that don't meet your grammar. There's a similar patch at https://codereview.chromium.org/2050733004 , but I fear it is a bit coarse-grained, getting at most usages of the non-IS0 8601 grammar. If I could somehow encourage you to contribute the patch with the UseCounters yourself, then that would be especially great!
> It'd also be nice to have data for IE11 and Edge too (separately, as they are different populations and I'd be curious about compatibility with IE11's slice of the web).

I'll put the tests up on a webpage and have someone visit it on edge/ie. I won't have access to any Windows machines until I get back from travel in July.

> If I could somehow encourage you to contribute the patch with the UseCounters yourself, then that would be especially great!

I'd love to give that a try, it's also a nice excuse to get acquainted with the Chromium workflow. :p
I have a computer on-hand with Edge and IE11. If you tell me what website to go to, I'd be happy to run it and post the results (now or whenever you have your revised version).

You can find out about getting started with V8 here: https://github.com/v8/v8/wiki/Using%20Git . Let me know if you run into any issues.
(In reply to Daniel Ehrenberg from comment #18)
> I have a computer on-hand with Edge and IE11. If you tell me what website to
> go to, I'd be happy to run it and post the results (now or whenever you have
> your revised version).
> 
> You can find out about getting started with V8 here:
> https://github.com/v8/v8/wiki/Using%20Git . Let me know if you run into any
> issues.

Awesome, could you try http://mrrrgn.com/date/data/gen-data.html ? It will take a long while to run unfortunately. Most of the engines are finding about 260,000 invalid date strings there.

The analysis is sloppy but sort of nice regardless. Grepping for any combination of "no,  no,  no,  yes" shows the edge cases particular to each engine.

I believe the grammar I've been working on is way too permissive right now, but it behaves consistently. *shrugs* It needs a lot more work.
(In reply to Daniel Ehrenberg from comment #18)
> I have a computer on-hand with Edge and IE11. If you tell me what website to
> go to, I'd be happy to run it and post the results (now or whenever you have
> your revised version).
> 
> You can find out about getting started with V8 here:
> https://github.com/v8/v8/wiki/Using%20Git . Let me know if you run into any
> issues.

That test page I offered before was written in a dumb way which made it slower than necessary. http://mrrrgn.com/date/data/gen-data.html should now run in < 10 seconds

Apologies for that.
I'm having trouble saving the results of this page. Although it runs fine, I can't copy-paste the results, as Ctrl-A overwhelms my tiny Windows netbook with such a large selection. Any way you could make a version of this site which includes uploading the results somewhere, or somehow making it save-able in a way that I could make useful?
Here you can find Edge 14 results (from latest insider build, approximately what will be released later this summer): https://gist.github.com/bterlson/09448e92a6fc8773092abadbc6c75c1c.

I will attempt to get IE11 numbers...
Depends on: 1328672
Depends on: 1341481
Depends on: 449921
Depends on: 802760
Depends on: 682781
Depends on: 1235045
Depends on: 1366955
The dependencies are starting to pile up here and it's causing compatibility issues.

Do we have any plans to put this work on a roadmap somewhere?
(In reply to Mike Kaply [:mkaply] from comment #23)
> The dependencies are starting to pile up here and it's causing compatibility
> issues.
> 
> Do we have any plans to put this work on a roadmap somewhere?
Flags: needinfo?(jorendorff)
Whiteboard: [webcompat]
Depends on: 1420028
This (old) page has a lot of tests that could be useful for this issue:
http://dygraphs.com/date-formats.html
Depends on: 1439800
Keywords: meta
Priority: -- → P3
Depends on: 1468499
There is a new proposal in the works, it would be great to have feedback on it: https://github.com/gibson042/ecma262-proposal-uniform-interchange-date-parsing
This came up again during this week's TC39 face-to-face.

Whereas Morgan aimed to reverse-engineer and standardize the behavior of Date.parse(), the new proposal is not that ambitious. Here's what it proposes.

The current spec for Date.parse(str) is:
  - if str matches the interchange format, parse it;
  - otherwise, return an implementation-defined result, possibly NaN.

The proposed spec is:
  - if str matches the interchange format, parse it;
  - if str is a "near miss" that almost matches the interchange format
    but deviates from it in one of a few specified ways, return NaN;
  - otherwise, return an implementation-defined result, possibly NaN.

This could be valuable to programmers because, when they write code that uses these "near miss" strings, they'll get NaNs, which should immediately cause noticeable bugs (rather than implementation-defined results, which could work fine until they try it in another browser). Early detection of bugs is good.

So the proposal is good, as long as:
  - The specified "near miss" strings are exactly the mistakes that programmers actually make in practice.
  - The change doesn't cause enough breakage to swamp the benefits.

But just now as I was typing that, I realized those two criteria seem to be at odds. Can both be true?
Flags: needinfo?(jorendorff)
Both can be true to the extent that lots of web content
  - does in fact hit the specific "near miss" strings affected by this proposal
  - but that almost always gets fixed later by the developers.

Imagine my skeptical face.
Depends on: 1515318
Summary: Date.parse, amirite → [meta] Date.parse, amirite

I was kind of concerned about the new proposal being too unambitious as well, but then I spoke with the champion, and we're thinking of specifying things fully as a second step. I didn't realize until the presentation in September that there were all sorts of things permitted by the RFC but rejected by all JS implementations, and it's great to tighten down on that end, for example. The goal is to make some progress on specifying things more closely, and not letting perfect be the enemy of good, while circling back around and reviving as much of Morgan's work as possible as a follow-on. See notes: https://github.com/tc39/proposal-temporal/blob/master/meetings/agenda-minutes-2019-01-08.md

Migrating Webcompat whiteboard priorities to project flags. See bug 1547409.

Webcompat Priority: --- → ?

See bug 1547409. Migrating whiteboard priority tags to program flags.

Depends on: 1557650
Depends on: 1559759
Depends on: 1599375
Depends on: 1600902
Depends on: 1617562
Depends on: 1617258
Depends on: 1655947
Depends on: 1676708
Depends on: 1693692
Depends on: 1723515
Depends on: 1730155
You need to log in before you can comment on or make changes to this bug.