Bugzilla

Comment 3

•

8 years ago

Attached patch bug977849-part1.patch (obsolete) — Details — Splinter Review

A new update script to pull test262 from GitHub and then pre-process each test file to be compatible with the jstests harness.

Assignee: jwalden+bmo → andrebargull

Attachment #8809649 - Flags: review?(evilpies)

Comment 4

•

8 years ago

Attached patch bug977849-part2.patch (obsolete) — Details — Splinter Review

Remove all old test262 files.

Attachment #8809650 - Flags: review?(evilpies)

Comment 5

•

8 years ago

Attached file bug977849-part3.zip (obsolete) — Details

Add the new test262 files (zipped patch because bugzilla doesn't allow 70MB patches *shocking* :-p).

Attachment #8809653 - Flags: review?(evilpies)

Comment 6

•

8 years ago

Attached patch bug977849-part4.patch (obsolete) — Details — Splinter Review

And finally the updated exclusion list for tests which don't pass in SpiderMonkey.

Attachment #8809655 - Flags: review?(evilpies)

Comment 7

•

8 years ago

I haven't yet tried to support the async Promise tests, that needs to happen next. :-)

Comment 8

•

8 years ago

Thanks for doing this work! I wish we had this two years ago :) However we probably need to talk to different people before landing this ... Previously my idea is to use the nodejs runner instead of using our test runner, so we don't pay a cost of keeping the tests running. We also talked about having treeherder/task cluster download the test from git instead of putting the files in mercurial. 70MB is a huge repository increase so we need to discuss this with the appropriate peers.

Comment 9

•

8 years ago

Another caveat is the increased time required to finish jstests. With the old test262 checkout (~3300 files), I need about 1 minute to run all test262 tests in my development VM (with an opt build). The new test262 checkout (~42300 files, most tests duplicated to run in non-strict and strict mode) requires about 13 minutes on the same machine.

Comment 10

•

8 years ago

Oh and thanks for classifying all the errors. Must have been a lot of work and we could use this to reduce our number of failures.

Comment 11

•

8 years ago

Statistics:
1) With test duplication to run tests in non-strict and strict mode:
- Increased run time for jstests.py (with -j5): 13x
- 47.791 objects, total size 59,4 MB

2) Without test duplication:
- Increased run time for jstests.py (with -j5): 6.5x
- 26.284 objects, total size 32,9 MB

Comment 12

•

8 years ago

Hey gps, is this an acceptable increase? Jan suggested to ping you about this.

Flags: needinfo?(gps)

Comment 13

•

8 years ago

Attached patch bug977849-part1.patch (obsolete) — Details — Splinter Review

Updated patches to skip the test duplication for strict-mode tests.

Attachment #8809649 - Attachment is obsolete: true

Attachment #8809649 - Flags: review?(evilpies)

Attachment #8809984 - Flags: review?(evilpies)

Comment 14

•

8 years ago

Attached patch bug977849-part2.patch (obsolete) — Details — Splinter Review

Attachment #8809650 - Attachment is obsolete: true

Attachment #8809650 - Flags: review?(evilpies)

Attachment #8809985 - Flags: review?(evilpies)

Comment 15

•

8 years ago

Attached file bug977849-part3.zip (obsolete) — Details

Attachment #8809653 - Attachment is obsolete: true

Attachment #8809653 - Flags: review?(evilpies)

Attachment #8809986 - Flags: review?(evilpies)

Comment 16

•

8 years ago

Attached patch bug977849-part4.patch (obsolete) — Details — Splinter Review

Attachment #8809655 - Attachment is obsolete: true

Attachment #8809655 - Flags: review?(evilpies)

Attachment #8809987 - Flags: review?(evilpies)

Updated

•

8 years ago

Attachment #8809985 - Flags: review?(evilpies) → review+

Comment 17

•

8 years ago

Comment on attachment 8809984 [details] [diff] [review]
bug977849-part1.patch

Review of attachment 8809984 [details] [diff] [review]:
-----------------------------------------------------------------

I don't think you need to change anything here, as long as this works! \o/

::: js/src/tests/test262-harness.diff
@@ +2,5 @@
> +index 52c9021..eeabfd8 100644
> +--- a/harness/detachArrayBuffer.js
> ++++ b/harness/detachArrayBuffer.js
> +@@ -1,3 +1,3 @@
> + function $DETACHBUFFER(buffer) {

Why isn't test262 using $.detachArrayBuffer?

::: js/src/tests/test262-host.js
@@ +6,5 @@
> +this.$ = {
> +    __proto__: null,
> +    createRealm() {
> +        var newGlobal = this.newGlobal();
> +        newGlobal.evaluate(`

Couldn't we just define this once at the top level and use eval for that definiton as well?

@@ +30,5 @@
> +    // This function is generally called from within a Promise handler, so any
> +    // exception thrown by this method will be swallowed and most likely
> +    // ignored by the Promise machinery.
> +    if ($mozAsyncTestDone)
> +        reportFailure("$DONE() already called");

Oh we don't use that unhanded rejected promise machinery in the shell? We probably should.

::: js/src/tests/test262-update.py
@@ +32,5 @@
> +    """
> +    import imp
> +
> +    packagingDir = os.path.join(test262Dir, "tools", "packaging")
> +    return imp.load_source("test262parser", os.path.join(packagingDir, "parseTestRecord.py"))

I think that function was removed in future versions. load_module wasn't deprecated till python 3.3.

@@ +67,5 @@
> +    skipIfTest = filterRefTest(refTest, "skip-if")
> +    failsTest = filterRefTest(refTest, "fails")
> +
> +    if skipTest:
> +        comments = ", ".join(imap(itemgetter(2), skipTest))

Personally I find using tuples confusing in this case.

@@ +97,5 @@
> +    # Prepend a possible "use strict" directive.
> +    if directive:
> +        source = directive + "\n" + source
> +
> +    # Add the |reftest| is present.

if

@@ +393,5 @@
> +    strictGroup = parser.add_mutually_exclusive_group()
> +    strictGroup.add_argument("--strict", default=False, action="store_true", dest="strict",
> +                             help="Generate additional strict mode tests.")
> +    strictGroup.add_argument("--no-strict", default=False, action="store_false", dest="strict",
> +                             help="Don't generate additional strict mode tests. This is the default mode.")

How does it know which one is the default?

Attachment #8809984 - Flags: review?(evilpies) → review+

Comment 18

•

8 years ago

Comment on attachment 8809986 [details]
bug977849-part3.zip

Okay, maybe it would be possible to move some of the imports to the top level, e.g those that provide verifyEnumerable etc. It seems like that is duplicated in a lot of shell.js files.

Comment 19

•

8 years ago

Comment on attachment 8809987 [details] [diff] [review]
bug977849-part4.patch

Review of attachment 8809987 [details] [diff] [review]:
-----------------------------------------------------------------

This is amazing and super useful. We should try to prioritize some of those failures.

Attachment #8809987 - Flags: review?(evilpies) → review+

Comment 20

•

8 years ago

Attached patch bug977849-part0.patch — Details — Splinter Review

I've separated the jstests framework changes into this new patch for clarity.

browser.js:
Test262 requires a host provided function to evaluate source code as global script code. In the shell, we can use the evaluate() function. But in browser, no builtin function is available to evaluate source code as global script code. We can provide this functionality through <script> elements, but there are some caveats:

1. Evaluating source code through <script> doesn't return the completion value. In the previous patch, I named the new browser.js function "evaluate" like it's shell counterpart, but that broke existing jstests which expect evaluate() to return a completion value. So I needed to use a different name for this function (it's now named "evaluateScript").
2. Catching and rethrowing errors needs some extra effort, but it seems to work so far.
3. I had to learn that it's not possible to evaluate source code with <script> in an <iframe>, when the <iframe> is no longer attached to the DOM (*). That means I needed to remove the call to HTMLIFramePrototypeRemove in browser.js' newGlobal() function.

(*) And I also had to learn that the same restriction applies to Workers in <iframe>s. :-)


manifest.py:
manifest.py applies different reftest terms on top of each other, but only stores the last terms string. This leads to different behaviour when running test262 in the shell compared to the browser reftests. In the shell, the reftest terms are cumulative and it's possible to use inline |reftest| comments and jstests.list entries. But in the browser, only the inline |reftest| comments was used.

Attachment #8810552 - Flags: review?(evilpies)

Comment 21

•

8 years ago

Attached patch bug977849-part1.patch (obsolete) — Details — Splinter Review

Updated part 1 per review comments:
- Removed code duplication in test262-host.js
- Replaced imp.load_source with imp.find_module + imp.load_module
- Simplified reftest string creation ("fails" entries weren't actually used any more)
- Common harness files are now bundled in top level shell.js files to reduce code duplication in the generated test262 files
- And Promise tests are now also enabled for browser jstests

Carrying r+ from evilpie

Attachment #8809984 - Attachment is obsolete: true

Attachment #8810555 - Flags: review+

Comment 22

•

8 years ago

Attached file bug977849-part3.zip (obsolete) — Details

Regenerated part 3 again.

Attachment #8809986 - Attachment is obsolete: true

Attachment #8809986 - Flags: review?(evilpies)

Attachment #8810556 - Flags: review?(evilpies)

Comment 23

•

8 years ago

Attached patch bug977849-part4.patch (obsolete) — Details — Splinter Review

Updated part 4 to exclude additional tests when running test262 in the browser (*) and added new bugzilla links for failing tests.

Carrying r+ from evilpie.

Note: I'll need to update this part again because some tests are timing out on certain platforms. 

(*) Not all test262 are compatible for browser environments!

Attachment #8809987 - Attachment is obsolete: true

Attachment #8810561 - Flags: review+

Comment 24

•

8 years ago

It also seems to be necessary to use more chunks when jsreftest includes test262 to avoid frequent "Output exceeded 52428800 bytes, remaining output has been truncated" and "command timed out: 7200 seconds elapsed running" issues on Try [1].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=cad4b1851e6d9f237e84eee688317a7d58bbc1d7

Comment 26

•

8 years ago

Comment on attachment 8810552 [details] [diff] [review]
bug977849-part0.patch

Review of attachment 8810552 [details] [diff] [review]:
-----------------------------------------------------------------

Interesting. I didn't spot anything obviously wrong. The onbeforescriptexecute stuff in evaluateScript might be a bit too complicated, but as long as it works.

::: js/src/tests/lib/manifest.py
@@ +276,5 @@
>              return
>  
>      testcase.tag = matches.group(1)
> +    _append_terms_and_comment(testcase, matches.group(2), matches.group(4))
> +    _parse_one(testcase, matches.group(2), xul_tester)

So it's correct not to use testcase.terms here?

Attachment #8810552 - Flags: review?(evilpies) → review+

Comment 27

•

8 years ago

Comment on attachment 8810556 [details]
bug977849-part3.zip

rs=me, great to see the smaller shell.js files!

Attachment #8810556 - Flags: review?(evilpies) → review+

Comment 28

•

8 years ago

(In reply to Tom Schuster [:evilpie] from comment #26)
> ::: js/src/tests/lib/manifest.py
> @@ +276,5 @@
> >              return
> >  
> >      testcase.tag = matches.group(1)
> > +    _append_terms_and_comment(testcase, matches.group(2), matches.group(4))
> > +    _parse_one(testcase, matches.group(2), xul_tester)
> 
> So it's correct not to use testcase.terms here?

Yes, this simply avoids to parse any previous terms again.

Comment 29

•

8 years ago

(In reply to Tom Schuster [:evilpie] from comment #12)
> Hey gps, is this an acceptable increase? Jan suggested to ping you about
> this.

Thank you for checking for implications of landing this before landing this!

I'm a big proponent of monorepos and vendoring all of the things, like the ~30k new test files this bug wants to add. However, that mindset has to be tempered by the reality that adding tens of thousands of files has performance implications for several processes:

* clone size/time, both for developers and for automation
* working directory update time, both for developers and for automation (it is a bigger deal in automation because machines are doing large updates more frequently than humans)
* `hg status` and `git status` performance (although Mercurial has a mitigation for that via the `watchman` filesystem monitor)

Combined, this can contribute significant one-time and recurring overhead.

We want mozilla-central to be a useful monorepo and to scale to millions of files (if needed). However, before that can realistically happen:

* We need at least CI/automation performing sparse checkouts (only check out the files you need, not every file). This may require Mercurial's sparse checkout functionality being part of the core distribution (as opposed to a 3rd party extension)
* We may also want to support "narrow clones" (cloning a subset of files) so people and machines don't have to download data for thousands of files they likely don't care about.
* Git's scaling story for working directories with hundreds of thousands of files needs to improve. Without filesystem watching, performance of operations like `git status` can go off a cliff once the working directory reaches certain inode thresholds.

We have rough timelines for the first 2 items. And if push came to shove, Git client performance issues would likely not block us from adding more files to mozilla-central. Or the Git tooling would be taught to ignore certain directories to allow it to work/scale.

If we can avoid adding the ~30k new test files at this time, it would be preferred. If it needs to happen (e.g. there is a compelling developer productivity case to be made) we can do it. But I'd prefer to not press the scaling issues if we don't have to.

As for alternatives, we can have automation grab the tests from a separate repository. We don't like automation relying on 3rd party services (e.g. github.com) because we don't like outages of 3rd party services causing tree closures. Unfortunately, we don't have a good Git hosting story that scales well. So your alternate choices seem to be:

1) have automation clone from Git; make jobs tier-1 and incur a tree closure when Git host goes down
2) same as #1 except jobs are tier-2 and can be orange for long periods if Git host goes down
3) mirror Git repository to hg.mozilla.org and have automation consume it (there was precedence for this with FxOS foo and I believe we have existing software/infrastructure to throw at the problem)

Flags: needinfo?(gps)

Comment 30

•

8 years ago

Thank you gps! See bug 1285372 about the alternative approach of using the nodejs test262-harness and a separate test262 git checkout to run the tests.

Comment 31

•

8 years ago

I should also mention that this isn't the only project that may want to add tons of new files:

* vendoring servo will add ~10k files
* vendoring WPT CSS tests would add ~100k files (!!!)

mozilla-central is currently ~150k files. From what I've heard from Facebook's experience scaling their monorepo, mozilla-central is on the verge of running into some problems. With vendoring of servo being a sure thing, I really don't want to take the risk that the ~30k files in this bug jeopardize the servo vendoring. Once servo is vendored in and we have some more vcs syncing infrastructure in place around that, I'd feel much better about taking these JS test files and (eventually) the WPT CSS tests.

James Graham [:jgraham]

Comment 32

•

8 years ago

> * vendoring WPT CSS tests would add ~100k files (!!!)

I don't know where that number comes from, but I think the reality is closer to 35k files.

jgraham@luna:~/develop/csswg-test$ find . | wc -l
34660

FWIW I think that basing a stopgap solution on FxOS precedent would be a bad idea; that infrastructure was fragile and no one was sad to see the back of it. If you need a quick hack, don't want two-way sync and plan to update the tests relatively infrequently, uploading a zip of the testsuite to tooltool could work. It's not a great solution, but nothing other than "vendor the files in tree" is.

Comment 33

•

8 years ago

Attached patch bug977849-part2.patch (obsolete) — Details — Splinter Review

Updated part 2 to apply cleanly on inbound. Carrying r+ from evilpie.

Attachment #8809985 - Attachment is obsolete: true

Attachment #8816707 - Flags: review+

Comment 34

•

8 years ago

Attached patch bug977849-part4.patch (obsolete) — Details — Splinter Review

Updated part 4 to enable more test262 tests. Carrying r+.

Attachment #8810561 - Attachment is obsolete: true

Attachment #8816708 - Flags: review+

Comment 35

•

7 years ago

Attached patch bug977849-part1.patch — Details — Splinter Review

test262-harness.diff is no longer required, updated part 1 accordingly.

Attachment #8810555 - Attachment is obsolete: true

Attachment #8824405 - Flags: review+

Comment 36

•

7 years ago

Attached patch bug977849-part2.patch (obsolete) — Details — Splinter Review

Updated part 2 to apply cleanly on inbound.

Attachment #8816707 - Attachment is obsolete: true

Attachment #8824406 - Flags: review+

Comment 37

•

7 years ago

Attached file bug977849-part3.zip (obsolete) — Details

Update part 3 to use newer checkout from test262.

Attachment #8810556 - Attachment is obsolete: true

Attachment #8824407 - Flags: review+

Comment 38

•

7 years ago

Attached patch bug977849-part4.patch (obsolete) — Details — Splinter Review

Update exclusion list to match current test262 status.

Attachment #8816708 - Attachment is obsolete: true

Attachment #8824408 - Flags: review+

Comment 39

•

7 years ago

Attached patch bug977849-part2.patch — Details — Splinter Review

Attachment #8824406 - Attachment is obsolete: true

Attachment #8827947 - Flags: review+

Comment 40

•

7 years ago

Attached file bug977849-part3.zip — Details

Attachment #8824407 - Attachment is obsolete: true

Attachment #8827948 - Flags: review+

Comment 41

•

7 years ago

Attached patch bug977849-part4.patch — Details — Splinter Review

Attachment #8824408 - Attachment is obsolete: true

Attachment #8827949 - Flags: review+

Assignee

Comment 42

•

7 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=74ca7afe6a081d20bec23557946ab67a4570acba is a try push with SM-tc(262) jobs for running these tests. But the patches in this bug do not reflect the bug title; these patches add the files into the gecko repo directly. I think I heard that gps has changed his opinion since comments 29 and 31? If so, it would be good to document that here. And if not, I can look into changing the job to checking out from github and storing a revision id file in the gecko tree instead.

Flags: needinfo?(gps)

Comment 43

•

7 years ago

Servo vendoring will add ~2k files instead of ~10k. So I'm less worried about file count and am tentatively OK with vendoring these test-262 files.

Before we commit to vendoring these files in mozilla-central, I think we should prove out the value and feasibility of running these tests in automation. I think we should stand up tier-2 or tier-3 automation that runs these tests after cloning from GitHub. Once those are running for a few weeks or months and all the major issues are ironed out (sheriffing, intermittent failures, etc), we can vendor the test files in mozilla-central, remove the GitHub dependency, and make these a tier-1 test job.

My concern with jumping straight in is that we vendor all these files (an irreversible decision) and find a showstopper or other similar issue preventing or significantly delaying the running of the tests in automation. While I assume the risk is low and we'll eventually run test-262 from mozilla-central, it is a risk. And the cost to initially running from GitHub feels low compared to the cost of having all the extra data in mozilla-central forever.

Also, the VCS syncing infrastructure I'm building for Servo in bug 1322769 can be leveraged to automatically synchronize/vendor the external Git[Hub] repo into a subdirectory of mozilla-central.

Flags: needinfo?(gps)

Comment 44

•

7 years ago

Oh, if you really want tier-1 automation status, we can mirror the Git repo to hg.mozilla.org and have automation clone from there instead of GitHub. We could live in that state indefinitely and vendor the files in mozilla-central at your leisure. We'd need to pin commit hashes inside mozilla-central so behavior in automation is deterministic, of course.

Assignee

Comment 45

•

7 years ago

Darn it, my bugmail somehow ate your reply. Need to track that down.

(In reply to Gregory Szorc [:gps] from comment #44)
> Oh, if you really want tier-1 automation status, we can mirror the Git repo
> to hg.mozilla.org and have automation clone from there instead of GitHub. We
> could live in that state indefinitely and vendor the files in
> mozilla-central at your leisure. We'd need to pin commit hashes inside
> mozilla-central so behavior in automation is deterministic, of course.

Ok, that was what I originally expected your desired outcome to be.

But now I'm thinking that maybe vendoring is a better idea. When I asked on dev-tech-js-internals whether people wanted to separate out these test262 tests so they wouldn't slow down regular test runs during development, the consensus seemed to be that these test262 tests should really be replacing many of our existing in-tree tests, and that we should be relying on the new tests to catch problems during development. And unless we expect people to start pulling down an external repo for testing, that seems to argue for vendoring if it doesn't cause too many other problems.

Also, I looked at the test job time increase. It wasn't as bad as expected -- the worst one is the debug job, which goes from 25 minutes to 40 minutes. For a CI job, that really doesn't seem too awful.

I get what you mean about this being irreversible, though. I'll look more closely at what anba has done here to see how hard it would be to try things out first with a github pull.

Comment 46

•

7 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #45)
> Also, I looked at the test job time increase. It wasn't as bad as expected
> -- the worst one is the debug job, which goes from 25 minutes to 40 minutes.
> For a CI job, that really doesn't seem too awful.

Running the tests on Android could be more troublesome, I got timeouts for debug and opt builds. And we may need more than one chunk for test262 to avoid the "Output exceeded 52428800 bytes, remaining output has been truncated" errors.

Try run from comment #24: https://treeherder.mozilla.org/#/jobs?repo=try&revision=cad4b1851e6d9f237e84eee688317a7d58bbc1d7

Assignee

Comment 47

•

7 years ago

(In reply to André Bargull from comment #46)
> (In reply to Steve Fink [:sfink] [:s:] from comment #45)
> > Also, I looked at the test job time increase. It wasn't as bad as expected
> > -- the worst one is the debug job, which goes from 25 minutes to 40 minutes.
> > For a CI job, that really doesn't seem too awful.
> 
> Running the tests on Android could be more troublesome, I got timeouts for
> debug and opt builds.

Ok. So we should either add more chunks, or skip the slowest tests.

> And we may need more than one chunk for test262 to
> avoid the "Output exceeded 52428800 bytes, remaining output has been
> truncated" errors.

What is producing so much output during the run? It looks like the logs from your try push have expired, so I couldn't look and see. But 50MB of output seems excessive.

Comment 48

•

7 years ago

(In reply to André Bargull from comment #46)
> (In reply to Steve Fink [:sfink] [:s:] from comment #45)
> > Also, I looked at the test job time increase. It wasn't as bad as expected
> > -- the worst one is the debug job, which goes from 25 minutes to 40 minutes.
> > For a CI job, that really doesn't seem too awful.
> 
> Running the tests on Android could be more troublesome, I got timeouts for
> debug and opt builds. And we may need more than one chunk for test262 to
> avoid the "Output exceeded 52428800 bytes, remaining output has been
> truncated" errors.
> 
> Try run from comment #24:
> https://treeherder.mozilla.org/#/
> jobs?repo=try&revision=cad4b1851e6d9f237e84eee688317a7d58bbc1d7

I have a try push where I doubled the chunk count for the J reftest tasks across all platforms. Here's the Android stuff: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1051189078e7494e9723778c3fccc669ff1b9482&filter-searchStr=j%20android

That's 12 chunks on opt and 40 chunks on debug. The oranges were actual failures, not timeouts. The raw data:

    // All times in minutes
    var optTimes = [90, 96, 89, 93, 79, 79, 87, 88, 82, 85, 94, 89];
    var debugTimes = [99, 81, 95, 105, 101, 86, 81, 84, 86, 77,
                      90, 88, 80, 88, /* J15 had infra problem */, 83, 88, 89, 82, 87,
                      79, 80, 91, 80, 82, 98, 83, 78, 77, 78,
                      83, 81, 83, 85, 82, 75, 82, 77, 80, 82];

The timeout for these tasks is 120.

The opt times have breathing room and could probably go down to 10 chunks with average duration of 105 minutes. Problem is the durations aren't evenly distributed. J2 and J11 took longer.

While most of the debug chunks have plenty of breathing room, two are pretty close to 120. Again, problem is distribution. I don't know how to control the chunking itself.

:gps, what are the cost tradeoffs of spinning up less tasks that take more time or more tasks that take less time, do you know?

Flags: needinfo?(gps)

Comment 49

•

7 years ago

I mean cost literally above, as in money.

Comment 50

•

7 years ago

(In reply to Steve Fink [:sfink] [:s:] from comment #47)
> > And we may need more than one chunk for test262 to
> > avoid the "Output exceeded 52428800 bytes, remaining output has been
> > truncated" errors.
> 
> What is producing so much output during the run? It looks like the logs from
> your try push have expired, so I couldn't look and see. But 50MB of output
> seems excessive.

Maybe it was just the usual noisiness of debug browser builds? Because if each test produces a bit more than 2KB of output, the 50MB limit is easily hit.

Comment 51

•

7 years ago

I have some cost estimates from :garndt for the try push in comment 48 compared against a recent m-i push [1]:

14:02 < garndt> shu: so the costs for just the android 4.3 platform is: try: opt - 1.32 debug - 3.82....mozilla-inbound: opt - 1.00
debug 2.66.....the total cost for each push is not completely reflective of total cost because I'm only grabbing data
for taskcluster jobs, not buildbot....for just the task cluster jobs on the try push, it was $14.38 and the
mozilla-inbound

So roughly, doubling the chunks of the Android J job is 32% more expensive for opt, and 43% more expensive for debug. That's not great.

sfink, Waldo, and I chatted about simply not running Android J. My intuition is Android J would rarely catch anything Linux/Win/Mac J wouldn't. Waldo is against omitting Android J on principle.

philor also mentioned there's also this thing called SETA, which, with enough historical data, avoids running a particular platform+task combo on every push, if there's never been a backout due to an orange for that particular platform+task combo. SETA is turned on, for example, for macOS opt J.

Given that, I think we should eat the increased cost to the Android J jobs and significantly increase the chunk count, with the intention of subjecting it to SETA sooner than later. I'll probe some more for a reasonable number.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=fbe43964eb8d1595bb67fd679da69053de3a2015

James Graham [:jgraham]

Comment 52

•

7 years ago

:jamher can help you out with seta at the appropriate time.

James Graham [:jgraham]

Comment 53

•

7 years ago

* :jmaher

Sorry for the noise.

Updated

•

7 years ago

Flags: needinfo?(gps)

Assignee

Comment 54

•

7 years ago

Eurgh. I was implementing a version that checked test262 out from github (which turns out to be trivially easy, since anba fixed everything up nicely). But then I realized it would be specific to a particular job type. As in, it'd be easy to make an SM(262) job that runs the full suite. But it doesn't prove that this is landable, because it's  the in-browser jsreftests that need greening up, plus any jobs that could theoretically fail because the checkout is bigger/slower. But those can only be tested by pushing the actual tests, not by pulling from github.

I pushed my initial attempt anyway, to <https://treeherder.mozilla.org/#/jobs?repo=try&revision=0a1789aafeefdb59b27931b76bece13ba2454d3b>. It will probably fail. It's also a little bogus, in that it already has the tests in-tree, but updates (with the same git rev) over the top. That's just to avoid patch stack juggling.

Comment 55

•

7 years ago

Attached patch Sanitize uses of "name" and "length" when running in browser. (obsolete) — Details — Splinter Review

I ran into test failures for the following tests:

language/expressions/object/method-definition/generator-name-prop-symbol.js
language/expressions/object/method-definition/name-name-prop-symbol.js

These tests have "var name = Symbol(...)" at the top level and triggers the
window's name setter.

Updated

•

7 years ago

Attachment #8832312 - Flags: review?(andrebargull)

Comment 56

•

7 years ago

(In reply to Shu-yu Guo [:shu] from comment #55)
> Created attachment 8832312 [details] [diff] [review]
> Sanitize uses of "name" and "length" when running in browser.
> 
> I ran into test failures for the following tests:
> 
> language/expressions/object/method-definition/generator-name-prop-symbol.js
> language/expressions/object/method-definition/name-name-prop-symbol.js
> 
> These tests have "var name = Symbol(...)" at the top level and triggers the
> window's name setter.

I think I'd prefer to fix this upstream and in the meantime simply add the failing tests to js/src/tests/jstests.list, similar to the other tests which are already skipped when running in the browser (cf. patch part 4).

Comment 57

•

7 years ago

(In reply to André Bargull from comment #56)
> (In reply to Shu-yu Guo [:shu] from comment #55)
> > Created attachment 8832312 [details] [diff] [review]
> > Sanitize uses of "name" and "length" when running in browser.
> > 
> > I ran into test failures for the following tests:
> > 
> > language/expressions/object/method-definition/generator-name-prop-symbol.js
> > language/expressions/object/method-definition/name-name-prop-symbol.js
> > 
> > These tests have "var name = Symbol(...)" at the top level and triggers the
> > window's name setter.
> 
> I think I'd prefer to fix this upstream and in the meantime simply add the
> failing tests to js/src/tests/jstests.list, similar to the other tests which
> are already skipped when running in the browser (cf. patch part 4).

Okay, that's fine, I'll blacklist them.

Updated

•

7 years ago

Attachment #8832312 - Attachment is obsolete: true

Attachment #8832312 - Flags: review?(andrebargull)

Comment 58

•

7 years ago

In the shell we have 3 intl402 tests that should be disabled pending normative updates to the spec (see bug 1332604).

test262/intl402/DateTimeFormat/prototype/12.3_a.js
test262/intl402/Collator/prototype/10.3_a.js
test262/intl402/NumberFormat/prototype/11.3_a.js

There is one failure that I don't understand:

test262/intl402/NumberFormat/11.1.1_20_c.js

The above test fails with "Test262Error: Didn't get correct minimumFractionDigits for currency BYR; expected 0, got 2." So ISO lists the Belarusian Ruble as BYN per [1]. But I guess ICU uses BYR? What's up with that?

[1] https://www.currency-iso.org/dam/downloads/lists/list_one.xml

Comment 59

•

7 years ago

(In reply to Shu-yu Guo [:shu] from comment #58)
> test262/intl402/NumberFormat/11.1.1_20_c.js

I found out this is already fixed in test262 upstream, so we don't have to worry about it.

Assignee

Comment 60

•

7 years ago

Attached patch Create test262 jobs — Details — Splinter Review

In case it's useful for something, here's my patch that creates SM(262) jobs that pull from github. The other SM jobs are not modified; they continue to use whatever is checked into the tree.

In reality, we'd probably put the git rev in a separate file, and use it for jsreftests etc.

Comment on attachment 8833139 [details] [diff] [review]
Update test skip list for Date tests that moved from annexB.

Review of attachment 8833139 [details] [diff] [review]:
-----------------------------------------------------------------

r=me for trivial move.

Attachment #8833139 - Flags: review+

Updated

•

7 years ago

Attachment #8833140 - Flags: review?(evilpies)

Comment 69

•

7 years ago

Comment on attachment 8833440 [details] [diff] [review]
Increase number of chunks for jsreftest to account for test262.

Review of attachment 8833440 [details] [diff] [review]:
-----------------------------------------------------------------

These numbers were purely arrived at by trial and error to ensure that 1) log size doesn't exceed truncation limit and 2) machines don't time out.

Attachment #8833440 - Flags: review?(dustin)

Updated

•

7 years ago

Depends on: 1336553

Comment 70

•

7 years ago

Comment on attachment 8833137 [details] [diff] [review]
Disable test262 tests that use "name" in the browser.

Review of attachment 8833137 [details] [diff] [review]:
-----------------------------------------------------------------

Please file a test262 github issue.

Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d14cac631ecc
Initial import of test262. (r=evilpie)
https://hg.mozilla.org/integration/autoland/rev/2da4446dd8f6
Increase number of chunks for jsreftest to account for test262. (r=dustin)
https://hg.mozilla.org/integration/autoland/rev/6fdb946f7b62
Disable test262 tests that use "name" in the browser. (r=evilpie)
https://hg.mozilla.org/integration/autoland/rev/a02efde2019f
Disable test262 tests pending a normative change to ecma402 (PR 122). (r=Waldo)
https://hg.mozilla.org/integration/autoland/rev/a783591e183a
Update test skip list for Date tests that moved from annexB. (r=me)
https://hg.mozilla.org/integration/autoland/rev/3261604e2320
Sync test262. (r=evilpie)

Phil Ringnalda (:philor)

Comment 73

•

7 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/d14cac631ecc
https://hg.mozilla.org/mozilla-central/rev/2da4446dd8f6
https://hg.mozilla.org/mozilla-central/rev/6fdb946f7b62
https://hg.mozilla.org/mozilla-central/rev/a02efde2019f
https://hg.mozilla.org/mozilla-central/rev/a783591e183a
https://hg.mozilla.org/mozilla-central/rev/3261604e2320

Status: ASSIGNED → RESOLVED

Closed: 7 years ago

status-firefox54: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla54

Updated

•

7 years ago

Blocks: 1338882