Closed Bug 977849 Opened 10 years ago Closed 7 years ago

Update update-test262.sh to pull from github

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox54 --- fixed

People

(Reporter: mozillabugs, Assigned: sfink)

References

(Blocks 1 open bug)

Details

(Whiteboard: [js:p1])

Attachments

(11 files, 18 obsolete files)

16.04 KB, patch
evilpie
: review+
Details | Diff | Splinter Review
16.48 KB, patch
anba
: review+
Details | Diff | Splinter Review
5.22 MB, patch
anba
: review+
Details | Diff | Splinter Review
2.27 MB, application/x-zip
anba
: review+
Details
42.84 KB, patch
anba
: review+
Details | Diff | Splinter Review
5.14 KB, patch
Details | Diff | Splinter Review
1.65 KB, patch
evilpie
: review+
Details | Diff | Splinter Review
1.55 KB, patch
Waldo
: review+
Details | Diff | Splinter Review
1.44 KB, patch
shu
: review+
Details | Diff | Splinter Review
106.22 KB, patch
evilpie
: review+
Details | Diff | Splinter Review
2.92 KB, patch
dustin
: review+
Details | Diff | Splinter Review
The update-test262.sh script is currently written to pull updates to the test262 test suite from a Mercurial server, with http://hg.ecmascript.org suggested in the usage information.

TC39 now maintains the test suite on github:
https://github.com/tc39/test262

hg.ecmascript.org is about to be decommissioned. update-test262.sh needs to be updated to pull updates from github.
Blocks: 977848
Whiteboard: [js:p1]
Hi!  I'm looking for an ETA on this, not to nag but just to get a timeline of when we can plan to decom hg.ecmascript?

thanks in advance
I think you decom'd it already at this point.  :-)  Not that I'm particularly upset or angry or anything about it.

This is fairly near the top of my radar at this point, as we're starting to land tests and fixes that deviate from es5 semantics and require changes to imported tests.  We can record those changes in a somewhat ad hoc manner as post-pull patch applications in update-test262.sh, and in rare cases that's the right way to do it (when we're doing experiments that deliberately don't track es5/es6).  We need to get this runnable again so those post-pull patches can actually be mechanically applied.
Assignee: nobody → jwalden+bmo
Status: NEW → ASSIGNED
Attached patch bug977849-part1.patch (obsolete) — Splinter Review
A new update script to pull test262 from GitHub and then pre-process each test file to be compatible with the jstests harness.
Assignee: jwalden+bmo → andrebargull
Attachment #8809649 - Flags: review?(evilpies)
Attached patch bug977849-part2.patch (obsolete) — Splinter Review
Remove all old test262 files.
Attachment #8809650 - Flags: review?(evilpies)
Attached file bug977849-part3.zip (obsolete) —
Add the new test262 files (zipped patch because bugzilla doesn't allow 70MB patches *shocking* :-p).
Attachment #8809653 - Flags: review?(evilpies)
Attached patch bug977849-part4.patch (obsolete) — Splinter Review
And finally the updated exclusion list for tests which don't pass in SpiderMonkey.
Attachment #8809655 - Flags: review?(evilpies)
I haven't yet tried to support the async Promise tests, that needs to happen next. :-)
Thanks for doing this work! I wish we had this two years ago :) However we probably need to talk to different people before landing this ... Previously my idea is to use the nodejs runner instead of using our test runner, so we don't pay a cost of keeping the tests running. We also talked about having treeherder/task cluster download the test from git instead of putting the files in mercurial. 70MB is a huge repository increase so we need to discuss this with the appropriate peers.
Another caveat is the increased time required to finish jstests. With the old test262 checkout (~3300 files), I need about 1 minute to run all test262 tests in my development VM (with an opt build). The new test262 checkout (~42300 files, most tests duplicated to run in non-strict and strict mode) requires about 13 minutes on the same machine.
Oh and thanks for classifying all the errors. Must have been a lot of work and we could use this to reduce our number of failures.
Statistics:
1) With test duplication to run tests in non-strict and strict mode:
- Increased run time for jstests.py (with -j5): 13x
- 47.791 objects, total size 59,4 MB

2) Without test duplication:
- Increased run time for jstests.py (with -j5): 6.5x
- 26.284 objects, total size 32,9 MB
Hey gps, is this an acceptable increase? Jan suggested to ping you about this.
Flags: needinfo?(gps)
Attached patch bug977849-part1.patch (obsolete) — Splinter Review
Updated patches to skip the test duplication for strict-mode tests.
Attachment #8809649 - Attachment is obsolete: true
Attachment #8809649 - Flags: review?(evilpies)
Attachment #8809984 - Flags: review?(evilpies)
Attached patch bug977849-part2.patch (obsolete) — Splinter Review
Attachment #8809650 - Attachment is obsolete: true
Attachment #8809650 - Flags: review?(evilpies)
Attachment #8809985 - Flags: review?(evilpies)
Attached file bug977849-part3.zip (obsolete) —
Attachment #8809653 - Attachment is obsolete: true
Attachment #8809653 - Flags: review?(evilpies)
Attachment #8809986 - Flags: review?(evilpies)
Attached patch bug977849-part4.patch (obsolete) — Splinter Review
Attachment #8809655 - Attachment is obsolete: true
Attachment #8809655 - Flags: review?(evilpies)
Attachment #8809987 - Flags: review?(evilpies)
Attachment #8809985 - Flags: review?(evilpies) → review+
Comment on attachment 8809984 [details] [diff] [review]
bug977849-part1.patch

Review of attachment 8809984 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think you need to change anything here, as long as this works! \o/

::: js/src/tests/test262-harness.diff
@@ +2,5 @@
> +index 52c9021..eeabfd8 100644
> +--- a/harness/detachArrayBuffer.js
> ++++ b/harness/detachArrayBuffer.js
> +@@ -1,3 +1,3 @@
> + function $DETACHBUFFER(buffer) {

Why isn't test262 using $.detachArrayBuffer?

::: js/src/tests/test262-host.js
@@ +6,5 @@
> +this.$ = {
> +    __proto__: null,
> +    createRealm() {
> +        var newGlobal = this.newGlobal();
> +        newGlobal.evaluate(`

Couldn't we just define this once at the top level and use eval for that definiton as well?

@@ +30,5 @@
> +    // This function is generally called from within a Promise handler, so any
> +    // exception thrown by this method will be swallowed and most likely
> +    // ignored by the Promise machinery.
> +    if ($mozAsyncTestDone)
> +        reportFailure("$DONE() already called");

Oh we don't use that unhanded rejected promise machinery in the shell? We probably should.

::: js/src/tests/test262-update.py
@@ +32,5 @@
> +    """
> +    import imp
> +
> +    packagingDir = os.path.join(test262Dir, "tools", "packaging")
> +    return imp.load_source("test262parser", os.path.join(packagingDir, "parseTestRecord.py"))

I think that function was removed in future versions. load_module wasn't deprecated till python 3.3.

@@ +67,5 @@
> +    skipIfTest = filterRefTest(refTest, "skip-if")
> +    failsTest = filterRefTest(refTest, "fails")
> +
> +    if skipTest:
> +        comments = ", ".join(imap(itemgetter(2), skipTest))

Personally I find using tuples confusing in this case.

@@ +97,5 @@
> +    # Prepend a possible "use strict" directive.
> +    if directive:
> +        source = directive + "\n" + source
> +
> +    # Add the |reftest| is present.

if

@@ +393,5 @@
> +    strictGroup = parser.add_mutually_exclusive_group()
> +    strictGroup.add_argument("--strict", default=False, action="store_true", dest="strict",
> +                             help="Generate additional strict mode tests.")
> +    strictGroup.add_argument("--no-strict", default=False, action="store_false", dest="strict",
> +                             help="Don't generate additional strict mode tests. This is the default mode.")

How does it know which one is the default?
Attachment #8809984 - Flags: review?(evilpies) → review+
Comment on attachment 8809986 [details]
bug977849-part3.zip

Okay, maybe it would be possible to move some of the imports to the top level, e.g those that provide verifyEnumerable etc. It seems like that is duplicated in a lot of shell.js files.
Comment on attachment 8809987 [details] [diff] [review]
bug977849-part4.patch

Review of attachment 8809987 [details] [diff] [review]:
-----------------------------------------------------------------

This is amazing and super useful. We should try to prioritize some of those failures.
Attachment #8809987 - Flags: review?(evilpies) → review+
I've separated the jstests framework changes into this new patch for clarity.

browser.js:
Test262 requires a host provided function to evaluate source code as global script code. In the shell, we can use the evaluate() function. But in browser, no builtin function is available to evaluate source code as global script code. We can provide this functionality through <script> elements, but there are some caveats:

1. Evaluating source code through <script> doesn't return the completion value. In the previous patch, I named the new browser.js function "evaluate" like it's shell counterpart, but that broke existing jstests which expect evaluate() to return a completion value. So I needed to use a different name for this function (it's now named "evaluateScript").
2. Catching and rethrowing errors needs some extra effort, but it seems to work so far.
3. I had to learn that it's not possible to evaluate source code with <script> in an <iframe>, when the <iframe> is no longer attached to the DOM (*). That means I needed to remove the call to HTMLIFramePrototypeRemove in browser.js' newGlobal() function.

(*) And I also had to learn that the same restriction applies to Workers in <iframe>s. :-)


manifest.py:
manifest.py applies different reftest terms on top of each other, but only stores the last terms string. This leads to different behaviour when running test262 in the shell compared to the browser reftests. In the shell, the reftest terms are cumulative and it's possible to use inline |reftest| comments and jstests.list entries. But in the browser, only the inline |reftest| comments was used.
Attachment #8810552 - Flags: review?(evilpies)
Attached patch bug977849-part1.patch (obsolete) — Splinter Review
Updated part 1 per review comments:
- Removed code duplication in test262-host.js
- Replaced imp.load_source with imp.find_module + imp.load_module
- Simplified reftest string creation ("fails" entries weren't actually used any more)
- Common harness files are now bundled in top level shell.js files to reduce code duplication in the generated test262 files
- And Promise tests are now also enabled for browser jstests

Carrying r+ from evilpie
Attachment #8809984 - Attachment is obsolete: true
Attachment #8810555 - Flags: review+
Attached file bug977849-part3.zip (obsolete) —
Regenerated part 3 again.
Attachment #8809986 - Attachment is obsolete: true
Attachment #8809986 - Flags: review?(evilpies)
Attachment #8810556 - Flags: review?(evilpies)
Attached patch bug977849-part4.patch (obsolete) — Splinter Review
Updated part 4 to exclude additional tests when running test262 in the browser (*) and added new bugzilla links for failing tests.

Carrying r+ from evilpie.

Note: I'll need to update this part again because some tests are timing out on certain platforms. 

(*) Not all test262 are compatible for browser environments!
Attachment #8809987 - Attachment is obsolete: true
Attachment #8810561 - Flags: review+
It also seems to be necessary to use more chunks when jsreftest includes test262 to avoid frequent "Output exceeded 52428800 bytes, remaining output has been truncated" and "command timed out: 7200 seconds elapsed running" issues on Try [1].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=cad4b1851e6d9f237e84eee688317a7d58bbc1d7
Comment on attachment 8810552 [details] [diff] [review]
bug977849-part0.patch

Review of attachment 8810552 [details] [diff] [review]:
-----------------------------------------------------------------

Interesting. I didn't spot anything obviously wrong. The onbeforescriptexecute stuff in evaluateScript might be a bit too complicated, but as long as it works.

::: js/src/tests/lib/manifest.py
@@ +276,5 @@
>              return
>  
>      testcase.tag = matches.group(1)
> +    _append_terms_and_comment(testcase, matches.group(2), matches.group(4))
> +    _parse_one(testcase, matches.group(2), xul_tester)

So it's correct not to use testcase.terms here?
Attachment #8810552 - Flags: review?(evilpies) → review+
Comment on attachment 8810556 [details]
bug977849-part3.zip

rs=me, great to see the smaller shell.js files!
Attachment #8810556 - Flags: review?(evilpies) → review+
(In reply to Tom Schuster [:evilpie] from comment #26)
> ::: js/src/tests/lib/manifest.py
> @@ +276,5 @@
> >              return
> >  
> >      testcase.tag = matches.group(1)
> > +    _append_terms_and_comment(testcase, matches.group(2), matches.group(4))
> > +    _parse_one(testcase, matches.group(2), xul_tester)
> 
> So it's correct not to use testcase.terms here?

Yes, this simply avoids to parse any previous terms again.
(In reply to Tom Schuster [:evilpie] from comment #12)
> Hey gps, is this an acceptable increase? Jan suggested to ping you about
> this.

Thank you for checking for implications of landing this before landing this!

I'm a big proponent of monorepos and vendoring all of the things, like the ~30k new test files this bug wants to add. However, that mindset has to be tempered by the reality that adding tens of thousands of files has performance implications for several processes:

* clone size/time, both for developers and for automation
* working directory update time, both for developers and for automation (it is a bigger deal in automation because machines are doing large updates more frequently than humans)
* `hg status` and `git status` performance (although Mercurial has a mitigation for that via the `watchman` filesystem monitor)

Combined, this can contribute significant one-time and recurring overhead.

We want mozilla-central to be a useful monorepo and to scale to millions of files (if needed). However, before that can realistically happen:

* We need at least CI/automation performing sparse checkouts (only check out the files you need, not every file). This may require Mercurial's sparse checkout functionality being part of the core distribution (as opposed to a 3rd party extension)
* We may also want to support "narrow clones" (cloning a subset of files) so people and machines don't have to download data for thousands of files they likely don't care about.
* Git's scaling story for working directories with hundreds of thousands of files needs to improve. Without filesystem watching, performance of operations like `git status` can go off a cliff once the working directory reaches certain inode thresholds.

We have rough timelines for the first 2 items. And if push came to shove, Git client performance issues would likely not block us from adding more files to mozilla-central. Or the Git tooling would be taught to ignore certain directories to allow it to work/scale.

If we can avoid adding the ~30k new test files at this time, it would be preferred. If it needs to happen (e.g. there is a compelling developer productivity case to be made) we can do it. But I'd prefer to not press the scaling issues if we don't have to.

As for alternatives, we can have automation grab the tests from a separate repository. We don't like automation relying on 3rd party services (e.g. github.com) because we don't like outages of 3rd party services causing tree closures. Unfortunately, we don't have a good Git hosting story that scales well. So your alternate choices seem to be:

1) have automation clone from Git; make jobs tier-1 and incur a tree closure when Git host goes down
2) same as #1 except jobs are tier-2 and can be orange for long periods if Git host goes down
3) mirror Git repository to hg.mozilla.org and have automation consume it (there was precedence for this with FxOS foo and I believe we have existing software/infrastructure to throw at the problem)
Flags: needinfo?(gps)
Thank you gps! See bug 1285372 about the alternative approach of using the nodejs test262-harness and a separate test262 git checkout to run the tests.
I should also mention that this isn't the only project that may want to add tons of new files:

* vendoring servo will add ~10k files
* vendoring WPT CSS tests would add ~100k files (!!!)

mozilla-central is currently ~150k files. From what I've heard from Facebook's experience scaling their monorepo, mozilla-central is on the verge of running into some problems. With vendoring of servo being a sure thing, I really don't want to take the risk that the ~30k files in this bug jeopardize the servo vendoring. Once servo is vendored in and we have some more vcs syncing infrastructure in place around that, I'd feel much better about taking these JS test files and (eventually) the WPT CSS tests.
> * vendoring WPT CSS tests would add ~100k files (!!!)

I don't know where that number comes from, but I think the reality is closer to 35k files.

jgraham@luna:~/develop/csswg-test$ find . | wc -l
34660

FWIW I think that basing a stopgap solution on FxOS precedent would be a bad idea; that infrastructure was fragile and no one was sad to see the back of it. If you need a quick hack, don't want two-way sync and plan to update the tests relatively infrequently, uploading a zip of the testsuite to tooltool could work. It's not a great solution, but nothing other than "vendor the files in tree" is.
Attached patch bug977849-part2.patch (obsolete) — Splinter Review
Updated part 2 to apply cleanly on inbound. Carrying r+ from evilpie.
Attachment #8809985 - Attachment is obsolete: true
Attachment #8816707 - Flags: review+
Attached patch bug977849-part4.patch (obsolete) — Splinter Review
Updated part 4 to enable more test262 tests. Carrying r+.
Attachment #8810561 - Attachment is obsolete: true
Attachment #8816708 - Flags: review+
test262-harness.diff is no longer required, updated part 1 accordingly.
Attachment #8810555 - Attachment is obsolete: true
Attachment #8824405 - Flags: review+
Attached patch bug977849-part2.patch (obsolete) — Splinter Review
Updated part 2 to apply cleanly on inbound.
Attachment #8816707 - Attachment is obsolete: true
Attachment #8824406 - Flags: review+
Attached file bug977849-part3.zip (obsolete) —
Update part 3 to use newer checkout from test262.
Attachment #8810556 - Attachment is obsolete: true
Attachment #8824407 - Flags: review+
Attached patch bug977849-part4.patch (obsolete) — Splinter Review
Update exclusion list to match current test262 status.
Attachment #8816708 - Attachment is obsolete: true
Attachment #8824408 - Flags: review+
Attachment #8824406 - Attachment is obsolete: true
Attachment #8827947 - Flags: review+
Attached file bug977849-part3.zip
Attachment #8824407 - Attachment is obsolete: true
Attachment #8827948 - Flags: review+
Attachment #8824408 - Attachment is obsolete: true
Attachment #8827949 - Flags: review+
https://treeherder.mozilla.org/#/jobs?repo=try&revision=74ca7afe6a081d20bec23557946ab67a4570acba is a try push with SM-tc(262) jobs for running these tests. But the patches in this bug do not reflect the bug title; these patches add the files into the gecko repo directly. I think I heard that gps has changed his opinion since comments 29 and 31? If so, it would be good to document that here. And if not, I can look into changing the job to checking out from github and storing a revision id file in the gecko tree instead.
Flags: needinfo?(gps)
Servo vendoring will add ~2k files instead of ~10k. So I'm less worried about file count and am tentatively OK with vendoring these test-262 files.

Before we commit to vendoring these files in mozilla-central, I think we should prove out the value and feasibility of running these tests in automation. I think we should stand up tier-2 or tier-3 automation that runs these tests after cloning from GitHub. Once those are running for a few weeks or months and all the major issues are ironed out (sheriffing, intermittent failures, etc), we can vendor the test files in mozilla-central, remove the GitHub dependency, and make these a tier-1 test job.

My concern with jumping straight in is that we vendor all these files (an irreversible decision) and find a showstopper or other similar issue preventing or significantly delaying the running of the tests in automation. While I assume the risk is low and we'll eventually run test-262 from mozilla-central, it is a risk. And the cost to initially running from GitHub feels low compared to the cost of having all the extra data in mozilla-central forever.

Also, the VCS syncing infrastructure I'm building for Servo in bug 1322769 can be leveraged to automatically synchronize/vendor the external Git[Hub] repo into a subdirectory of mozilla-central.
Flags: needinfo?(gps)
Oh, if you really want tier-1 automation status, we can mirror the Git repo to hg.mozilla.org and have automation clone from there instead of GitHub. We could live in that state indefinitely and vendor the files in mozilla-central at your leisure. We'd need to pin commit hashes inside mozilla-central so behavior in automation is deterministic, of course.
Darn it, my bugmail somehow ate your reply. Need to track that down.

(In reply to Gregory Szorc [:gps] from comment #44)
> Oh, if you really want tier-1 automation status, we can mirror the Git repo
> to hg.mozilla.org and have automation clone from there instead of GitHub. We
> could live in that state indefinitely and vendor the files in
> mozilla-central at your leisure. We'd need to pin commit hashes inside
> mozilla-central so behavior in automation is deterministic, of course.

Ok, that was what I originally expected your desired outcome to be.

But now I'm thinking that maybe vendoring is a better idea. When I asked on dev-tech-js-internals whether people wanted to separate out these test262 tests so they wouldn't slow down regular test runs during development, the consensus seemed to be that these test262 tests should really be replacing many of our existing in-tree tests, and that we should be relying on the new tests to catch problems during development. And unless we expect people to start pulling down an external repo for testing, that seems to argue for vendoring if it doesn't cause too many other problems.

Also, I looked at the test job time increase. It wasn't as bad as expected -- the worst one is the debug job, which goes from 25 minutes to 40 minutes. For a CI job, that really doesn't seem too awful.

I get what you mean about this being irreversible, though. I'll look more closely at what anba has done here to see how hard it would be to try things out first with a github pull.
(In reply to Steve Fink [:sfink] [:s:] from comment #45)
> Also, I looked at the test job time increase. It wasn't as bad as expected
> -- the worst one is the debug job, which goes from 25 minutes to 40 minutes.
> For a CI job, that really doesn't seem too awful.

Running the tests on Android could be more troublesome, I got timeouts for debug and opt builds. And we may need more than one chunk for test262 to avoid the "Output exceeded 52428800 bytes, remaining output has been truncated" errors.

Try run from comment #24: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cad4b1851e6d9f237e84eee688317a7d58bbc1d7
(In reply to André Bargull from comment #46)
> (In reply to Steve Fink [:sfink] [:s:] from comment #45)
> > Also, I looked at the test job time increase. It wasn't as bad as expected
> > -- the worst one is the debug job, which goes from 25 minutes to 40 minutes.
> > For a CI job, that really doesn't seem too awful.
> 
> Running the tests on Android could be more troublesome, I got timeouts for
> debug and opt builds.

Ok. So we should either add more chunks, or skip the slowest tests.

> And we may need more than one chunk for test262 to
> avoid the "Output exceeded 52428800 bytes, remaining output has been
> truncated" errors.

What is producing so much output during the run? It looks like the logs from your try push have expired, so I couldn't look and see. But 50MB of output seems excessive.
(In reply to André Bargull from comment #46)
> (In reply to Steve Fink [:sfink] [:s:] from comment #45)
> > Also, I looked at the test job time increase. It wasn't as bad as expected
> > -- the worst one is the debug job, which goes from 25 minutes to 40 minutes.
> > For a CI job, that really doesn't seem too awful.
> 
> Running the tests on Android could be more troublesome, I got timeouts for
> debug and opt builds. And we may need more than one chunk for test262 to
> avoid the "Output exceeded 52428800 bytes, remaining output has been
> truncated" errors.
> 
> Try run from comment #24:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=cad4b1851e6d9f237e84eee688317a7d58bbc1d7

I have a try push where I doubled the chunk count for the J reftest tasks across all platforms. Here's the Android stuff: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1051189078e7494e9723778c3fccc669ff1b9482&filter-searchStr=j%20android

That's 12 chunks on opt and 40 chunks on debug. The oranges were actual failures, not timeouts. The raw data:

    // All times in minutes
    var optTimes = [90, 96, 89, 93, 79, 79, 87, 88, 82, 85, 94, 89];
    var debugTimes = [99, 81, 95, 105, 101, 86, 81, 84, 86, 77,
                      90, 88, 80, 88, /* J15 had infra problem */, 83, 88, 89, 82, 87,
                      79, 80, 91, 80, 82, 98, 83, 78, 77, 78,
                      83, 81, 83, 85, 82, 75, 82, 77, 80, 82];

The timeout for these tasks is 120.

The opt times have breathing room and could probably go down to 10 chunks with average duration of 105 minutes. Problem is the durations aren't evenly distributed. J2 and J11 took longer.

While most of the debug chunks have plenty of breathing room, two are pretty close to 120. Again, problem is distribution. I don't know how to control the chunking itself.

:gps, what are the cost tradeoffs of spinning up less tasks that take more time or more tasks that take less time, do you know?
Flags: needinfo?(gps)
I mean cost literally above, as in money.
(In reply to Steve Fink [:sfink] [:s:] from comment #47)
> > And we may need more than one chunk for test262 to
> > avoid the "Output exceeded 52428800 bytes, remaining output has been
> > truncated" errors.
> 
> What is producing so much output during the run? It looks like the logs from
> your try push have expired, so I couldn't look and see. But 50MB of output
> seems excessive.

Maybe it was just the usual noisiness of debug browser builds? Because if each test produces a bit more than 2KB of output, the 50MB limit is easily hit.
I have some cost estimates from :garndt for the try push in comment 48 compared against a recent m-i push [1]:

14:02 < garndt> shu: so the costs for just the android 4.3 platform is: try: opt - 1.32 debug - 3.82....mozilla-inbound: opt - 1.00 
                debug 2.66.....the total cost for each push is not completely reflective of total cost because I'm only grabbing data 
                for taskcluster jobs, not buildbot....for just the task cluster jobs on the try push, it was $14.38 and the 
                mozilla-inbound

So roughly, doubling the chunks of the Android J job is 32% more expensive for opt, and 43% more expensive for debug. That's not great.

sfink, Waldo, and I chatted about simply not running Android J. My intuition is Android J would rarely catch anything Linux/Win/Mac J wouldn't. Waldo is against omitting Android J on principle.

philor also mentioned there's also this thing called SETA, which, with enough historical data, avoids running a particular platform+task combo on every push, if there's never been a backout due to an orange for that particular platform+task combo. SETA is turned on, for example, for macOS opt J.

Given that, I think we should eat the increased cost to the Android J jobs and significantly increase the chunk count, with the intention of subjecting it to SETA sooner than later. I'll probe some more for a reasonable number.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=fbe43964eb8d1595bb67fd679da69053de3a2015
:jamher can help you out with seta at the appropriate time.
* :jmaher

Sorry for the noise.
Flags: needinfo?(gps)
Eurgh. I was implementing a version that checked test262 out from github (which turns out to be trivially easy, since anba fixed everything up nicely). But then I realized it would be specific to a particular job type. As in, it'd be easy to make an SM(262) job that runs the full suite. But it doesn't prove that this is landable, because it's  the in-browser jsreftests that need greening up, plus any jobs that could theoretically fail because the checkout is bigger/slower. But those can only be tested by pushing the actual tests, not by pulling from github.

I pushed my initial attempt anyway, to <https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a1789aafeefdb59b27931b76bece13ba2454d3b>. It will probably fail. It's also a little bogus, in that it already has the tests in-tree, but updates (with the same git rev) over the top. That's just to avoid patch stack juggling.
I ran into test failures for the following tests:

language/expressions/object/method-definition/generator-name-prop-symbol.js
language/expressions/object/method-definition/name-name-prop-symbol.js

These tests have "var name = Symbol(...)" at the top level and triggers the
window's name setter.
Attachment #8832312 - Flags: review?(andrebargull)
(In reply to Shu-yu Guo [:shu] from comment #55)
> Created attachment 8832312 [details] [diff] [review]
> Sanitize uses of "name" and "length" when running in browser.
> 
> I ran into test failures for the following tests:
> 
> language/expressions/object/method-definition/generator-name-prop-symbol.js
> language/expressions/object/method-definition/name-name-prop-symbol.js
> 
> These tests have "var name = Symbol(...)" at the top level and triggers the
> window's name setter.

I think I'd prefer to fix this upstream and in the meantime simply add the failing tests to js/src/tests/jstests.list, similar to the other tests which are already skipped when running in the browser (cf. patch part 4).
(In reply to André Bargull from comment #56)
> (In reply to Shu-yu Guo [:shu] from comment #55)
> > Created attachment 8832312 [details] [diff] [review]
> > Sanitize uses of "name" and "length" when running in browser.
> > 
> > I ran into test failures for the following tests:
> > 
> > language/expressions/object/method-definition/generator-name-prop-symbol.js
> > language/expressions/object/method-definition/name-name-prop-symbol.js
> > 
> > These tests have "var name = Symbol(...)" at the top level and triggers the
> > window's name setter.
> 
> I think I'd prefer to fix this upstream and in the meantime simply add the
> failing tests to js/src/tests/jstests.list, similar to the other tests which
> are already skipped when running in the browser (cf. patch part 4).

Okay, that's fine, I'll blacklist them.
Attachment #8832312 - Attachment is obsolete: true
Attachment #8832312 - Flags: review?(andrebargull)
In the shell we have 3 intl402 tests that should be disabled pending normative updates to the spec (see bug 1332604).

test262/intl402/DateTimeFormat/prototype/12.3_a.js
test262/intl402/Collator/prototype/10.3_a.js
test262/intl402/NumberFormat/prototype/11.3_a.js

There is one failure that I don't understand:

test262/intl402/NumberFormat/11.1.1_20_c.js

The above test fails with "Test262Error: Didn't get correct minimumFractionDigits for currency BYR; expected 0, got 2." So ISO lists the Belarusian Ruble as BYN per [1]. But I guess ICU uses BYR? What's up with that?

[1] https://www.currency-iso.org/dam/downloads/lists/list_one.xml
(In reply to Shu-yu Guo [:shu] from comment #58)
> test262/intl402/NumberFormat/11.1.1_20_c.js

I found out this is already fixed in test262 upstream, so we don't have to worry about it.
In case it's useful for something, here's my patch that creates SM(262) jobs that pull from github. The other SM jobs are not modified; they continue to use whatever is checked into the tree.

In reality, we'd probably put the git rev in a separate file, and use it for jsreftests etc.
Assignee: andrebargull → sphink
Attached patch Sync test262.Splinter Review
Attachment #8833136 - Attachment is obsolete: true
Attachment #8833137 - Flags: review?(evilpies)
Attachment #8833138 - Flags: review?(jwalden+bmo)
Comment on attachment 8833139 [details] [diff] [review]
Update test skip list for Date tests that moved from annexB.

Review of attachment 8833139 [details] [diff] [review]:
-----------------------------------------------------------------

r=me for trivial move.
Attachment #8833139 - Flags: review+
Attachment #8833140 - Flags: review?(evilpies)
Comment on attachment 8833440 [details] [diff] [review]
Increase number of chunks for jsreftest to account for test262.

Review of attachment 8833440 [details] [diff] [review]:
-----------------------------------------------------------------

These numbers were purely arrived at by trial and error to ensure that 1) log size doesn't exceed truncation limit and 2) machines don't time out.
Attachment #8833440 - Flags: review?(dustin)
Depends on: 1336553
Comment on attachment 8833137 [details] [diff] [review]
Disable test262 tests that use "name" in the browser.

Review of attachment 8833137 [details] [diff] [review]:
-----------------------------------------------------------------

Please file a test262 github issue.
Attachment #8833137 - Flags: review?(evilpies) → review+
Attachment #8833140 - Flags: review?(evilpies) → review+
Attachment #8833440 - Flags: review?(dustin) → review+
Attachment #8833138 - Flags: review?(jwalden+bmo) → review+
I looked at the version control aspect of this again today with :shu. Surprisingly, the bundle size of the thousands of added files is only 3-4MB! So, this won't increase the amount of data transferred as part of clone or pull operations significantly. Yay!

In addition, the commits don't do anything sub-optimal like introduce thousands of files and later delete them. So, we're not forever wasting repo storage by adding history that isn't relevant. More yay!

The main version control concern from this vendoring will be file count. Working directory updates will take longer due to having to create more files. `status` and other operations that walk the working directory will also be slower due to having to do more work. (This is mostly mitigated for Mercurial users that have the fsmonitor extension + Watchman enabled.) Developers may complain. Some jobs in automation may run slower. Boo.

The mitigation for coping with the ever-increasing number of files in the working directory is sparse checkouts (only checking out the files/directories you care about). Mercurial and Git both have support for this feature. Alternatively, there are virtual filesystems for working directories. Those are arguably better since every file is always available and you don't have to teach things to deal with cases were some files are available but others aren't. Unfortunately, robust, cross-platform support for virtual filesystems in a way that could be leveraged by both developers and automation is still a ways off.

Anyway, the ~20,000 new files being introduced in this landing may slow down some things. But, there are obvious benefits to having the tests readily available in tree. I don't anticipate any major problems as a result of this landing - certainly not enough to cancel out the benefits it will bring. And if there are problems, there are existing version control features we can lean on to relieve scaling pressure.

So, as the person who would likely have to deal with the fallout of any version control scaling problems this might cause, you have my blessing to land this. Let's hope I don't regret saying that :)
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d14cac631ecc
Initial import of test262. (r=evilpie)
https://hg.mozilla.org/integration/autoland/rev/2da4446dd8f6
Increase number of chunks for jsreftest to account for test262. (r=dustin)
https://hg.mozilla.org/integration/autoland/rev/6fdb946f7b62
Disable test262 tests that use "name" in the browser. (r=evilpie)
https://hg.mozilla.org/integration/autoland/rev/a02efde2019f
Disable test262 tests pending a normative change to ecma402 (PR 122). (r=Waldo)
https://hg.mozilla.org/integration/autoland/rev/a783591e183a
Update test skip list for Date tests that moved from annexB. (r=me)
https://hg.mozilla.org/integration/autoland/rev/3261604e2320
Sync test262. (r=evilpie)
Blocks: 1338882
Depends on: 1338916
Pushed by shu@rfrn.org:
https://hg.mozilla.org/integration/mozilla-inbound/rev/85faf95c2400
Followup: unskip test262 on Android. (r=sfink)
Depends on: 1342963
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: