Closed Bug 1342963 Opened 7 years ago Closed 7 years ago

Severe performance regression in Android jsreftest jobs

Categories

(Firefox for Android Graveyard :: Testing, defect, P1)

defect

Tracking

(firefox54 fixed)

RESOLVED FIXED
Firefox 54
Tracking Status
firefox54 --- fixed

People

(Reporter: gbrown, Assigned: gbrown)

References

Details

Attachments

(2 files, 1 obsolete file)

On mozilla-central, until Feb 23, each chunk of the Android opt jsreftests completed in 25 to 35 minutes; since then, each chunk is completing in about 110 minutes.

There is a similar regression in Android debug jsreftests.
Priority: -- → P1
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=85faf95c2400d9931cb0b98f6c089bf80c24d4f3&filter-searchStr=android+jsreftest and all later Android jsreftest jobs run much slower than earlier pushes.

At least part of the issue is the addition of previously skipped tests. Earlier runs of Android opt J1 have something like:

REFTEST SUITE-START | Running 3326 tests
REFTEST INFO | Running chunk 1 out of 10 chunks.  tests 1-355/355

Later runs have:

REFTEST SUITE-START | Running 26420 tests
REFTEST INFO | Running chunk 1 out of 10 chunks.  tests 1-2809/2809

I see bug 977849 also increased Android jsreftest chunks in an earlier push...but just barely enough.

If we really want to run all these tests, we'll need to use more chunks to avoid the intermittent timeouts we are seeing (duplicate bugs and bug 1204281).

Also, we usually try to keep test job run-times under 30 minutes. We'll need at least 30 Android opt chunks (20 more chunks) to meet that goal, and perhaps 90 chunks (55 more chunks) for Android debug.

:shu -- do you really want to run all those tests on Android?
:dustin -- is it okay to add that many chunks?
Blocks: 977849
Flags: needinfo?(shu)
Flags: needinfo?(dustin)
Summary: Severe performance regression in Android jsreftests → Severe performance regression in Android jsreftest jobs
So that's about 75 more 30-minute chunks per push, about 37.5 compute-hours, at US$0.04/hr that's about $1.50 per push just for compute, plus more to store all those logs.  I think -- Greg's better at these calculations than I am!
Flags: needinfo?(dustin) → needinfo?(garndt)
(In reply to Geoff Brown [:gbrown] from comment #6) 
> :shu -- do you really want to run all those tests on Android?

Yes, but not all the time. The previous plan was to aggressively SETA (I still don't know what that's an abbreviation for, but the thing where we don't run the tests on every push) them. They should also only be run when js/src is touched by the push. Can we make those things happen and up the chunk count and timeout times at the same time to cut down on costs and intermittents?
Flags: needinfo?(shu)
I think the win here is for when js/src is touched.  When SETA sees new jobs (or even chunks), it will always run those for 2 weeks to ensure they are stable and we get some basic data before skipping it.  SETA will determine which jobs have found regressions in the past (high value) and only run high value jobs for pushes 1-4, then on push 5, it will run every possibly test; repeat.

:dustin, do you see any faults in selecting running tests when certain files are changed.  I assume this is using the 'when' clause which I have bugged you about in the past many times.  One fault I see is that on try server these will only ever be run if/when we match the files- that could be confusing if there is a failure caused by something else.
Flags: needinfo?(dustin)
(In reply to Shu-yu Guo [:shu] from comment #8)
> (In reply to Geoff Brown [:gbrown] from comment #6) 
> > :shu -- do you really want to run all those tests on Android?
> 
> Yes, but not all the time. The previous plan was to aggressively SETA (I
> still don't know what that's an abbreviation for, but the thing where we
> don't run the tests on every push) them. They should also only be run when
> js/src is touched by the push. Can we make those things happen and up the
> chunk count and timeout times at the same time to cut down on costs and
> intermittents?

Thanks :shu. Only running when js/src is touched would be an interesting optimization. Would you want to see that only for Android, or desktop as well? Would you like to see that apply to all jsreftests, or just the test262 ones?
(In reply to Dustin J. Mitchell [:dustin] from comment #7)
> So that's about 75 more 30-minute chunks per push, about 37.5 compute-hours,
> at US$0.04/hr that's about $1.50 per push just for compute, plus more to
> store all those logs.  I think -- Greg's better at these calculations than I
> am!

Correct, we average $0.04/hr for that worker type.

Looking at the last 7 days, I recorded 1234 pushes resulting in automation for 'android-4-3-armv7-api15 debug' or 'android-4-3-armv7-api15 opt'.  Assuming $1.50/push [1] without optimization by SETA or using the 'when' that's $1851 per week in additional automation.  SETA and running tests only when files changes would definitely help.  

Is it possible to run tests only when file changes on certain branches, but always run them on try?  Try in this case was about a third of our pushes that contained those platforms (451 out of 1234)

[1] Assumed that the additional 75 chunks will run 30 minutes each and never be retried.
Flags: needinfo?(garndt)
(In reply to Geoff Brown [:gbrown] from comment #10)
> (In reply to Shu-yu Guo [:shu] from comment #8)
> > (In reply to Geoff Brown [:gbrown] from comment #6) 
> > > :shu -- do you really want to run all those tests on Android?
> > 
> > Yes, but not all the time. The previous plan was to aggressively SETA (I
> > still don't know what that's an abbreviation for, but the thing where we
> > don't run the tests on every push) them. They should also only be run when
> > js/src is touched by the push. Can we make those things happen and up the
> > chunk count and timeout times at the same time to cut down on costs and
> > intermittents?
> 
> Thanks :shu. Only running when js/src is touched would be an interesting
> optimization. Would you want to see that only for Android, or desktop as
> well? Would you like to see that apply to all jsreftests, or just the
> test262 ones?

It's fine (and preferred, even) for all jsreftests, on all platforms, to be only run when js/src is touched.
(In reply to Joel Maher ( :jmaher) from comment #9)
> :dustin, do you see any faults in selecting running tests when certain files
> are changed.  I assume this is using the 'when' clause which I have bugged
> you about in the past many times.  One fault I see is that on try server
> these will only ever be run if/when we match the files- that could be
> confusing if there is a failure caused by something else.

Yeah, that's always a risk -- if that "something else" is intermittent, then that's one thing (and something SETA should help with).  If that "something else" is a source-code change that wasn't captured by the `when` clause, then that highlights the difficulty of fine-tuning `when` clauses to actually catch every possible related change.

Still, it seems like it's probably worth the risk here!
Flags: needinfo?(dustin)
I have an example of this working:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=839fa01da7aaa1ff23db2b73778cb2bddf6d28eb&group_state=expanded

There are some issues with the 'when' clause, but this is doable without too much hacking.
the problem with the 'when' clause is that we automatically append a bunch of taskcluster related file patterns to all 'when' clauses, for example:
['js/src/**']

becomes:
 ['js/src/**', u'taskcluster/ci\\test/**', u'taskcluster/taskgraph/**', u'taskcluster/docker/desktop1604-test/**']

while I can see some value in ensuring that all tasks are run properly, we do change taskcluster/ci/test/* often, that will cause us to run the tests very often.

Is there a way for 'tests' that we can avoid adding in these?
Flags: needinfo?(dustin)
On the other hand, a change to taskcluster/ci/test/* does have the capacity to affect these jobs, so *not* running them when that directory changes has the capacity to introduce bustage that doesn't show up on a push.

The idea is this:
  When anything changes that will cause a task to fail, run the task

At the extreme, is the halting problem - in the case described here, it would involve looking at the diff to taskcluster/ci/test/* and figuring out whether those changes could impact this particular test.  We would have even more difficulty with taskcluster/taskgraph/** since it's all Python.

So, we need to make an approximation.  If we under-estimate - fail to run tasks when they would fail - we get hidden bustage and waste sheriff and developer energy chasing it down.  If we over-estimate - run tasks when a detailed analysis can prove they will not fail - then we waste machine resources.  The former is far worse than the latter, so I am a strong advocate of over-estimating, which is what is going on here.

If you see a means of doing a finer analysis that does not cross into under-estimation, I'm open to that.

Note that :gps has done a bunch of thinking on this topic.
Flags: needinfo?(dustin)
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c6ba9e62d8fa45f1cd26617eb3272f22335e2c75

Increase Android opt jsreftest from 10 to 40 chunks.
Increase Android debug jsreftest from 35 to 100 chunks.

This returns run times for most jobs to the 30 - 45 minute range.

I intend not to check-in until the run-only-on-change work is ready.
Attachment #8842487 - Flags: review?(dustin)
:gbrown, do you have any problems with this being run more often than just js/src/* but much less frequent then ALL ?
Flags: needinfo?(gbrown)
No, I think that's fine.
Flags: needinfo?(gbrown)
ok, lets make it happen
Attachment #8842515 - Flags: review?(dustin)
Comment on attachment 8842515 [details] [diff] [review]
add 'when' clause to tests and js/src/** for jsreftests

Review of attachment 8842515 [details] [diff] [review]:
-----------------------------------------------------------------

::: taskcluster/ci/test/tests.yml
@@ +319,5 @@
>                  extra-options:
>                      - --reftest-suite=jsreftest
> +    when:
> +        files-changed:
> +            - "js/src/**"

Please add in js/public/**

cf http://searchfox.org/mozilla-central/source/taskcluster/ci/spidermonkey/kind.yml#30
One could argue for other things like mfbt/** as well. Or the full set at http://searchfox.org/mozilla-central/source/taskcluster/ci/spidermonkey/kind.yml#45

But "one" != "me", because I think of the Android variant of these as low-value, high-cost tests.
thanks :sfink, updated
Attachment #8842515 - Attachment is obsolete: true
Attachment #8842515 - Flags: review?(dustin)
Attachment #8842542 - Flags: review?(dustin)
this would account for jsreftests on all platforms, should we do mfbt/** ?
Attachment #8842542 - Flags: review?(dustin) → review+
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/52a406607871
only run jsreftests when js/src/* changes. r=dustin
Keywords: leave-open
Attachment #8842487 - Flags: review?(dustin) → review+
Keywords: leave-open
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5ae59bdcd2d4
Add many more test chunks for Android jsreftests; r=dustin
https://hg.mozilla.org/mozilla-central/rev/5ae59bdcd2d4
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 54
Blocks: 1411358
No longer blocks: 1411358
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.