Closed Bug 1156425 Opened 5 years ago Closed 3 years ago

Add more Android mochitest/reftest chunks

Categories

(Release Engineering :: General, defect)

ARM
Android
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Unassigned)


Attachments

(5 files)

We're hitting widespread "application ran for longer than allowed maximum time" failures on mochitests and reftests. Time for more chunks.

Geoff, I assume the story is the same for 4.3 as 2.3?
I think 4.3 is currently better than 2.3: I don't see "application ran for longer than allowed maximum time" on 4.3 Opt on try. But maybe that is because we are not running as many 4.3 jobs (i.e., only on try), or maybe we are just close to the threshold and luckily avoiding it.

Regardless, 4.3 and 2.3 times are generally quite similar in my experience, so I think it is best to increase the chunks on 4.3 the same as 2.3.
X2/X3 are also typically running 80+ minutes.
And robocop is in the 60-70 minute range. Maybe we should adjust all the chunks at once.
Today, 4.3 R12 is often exceeding the 60 minute timeout.
Assignee: nobody → gbrown
Let's try this, applied exactly the same on 2.3 and 4.3:

suite       | total-chunks (before) | total-chunks (after)
----------------------------------------------------------
xpcshell    |           3           |           6
mochitest   |          16           |          20
robocop     |           4           |           8
reftest     |          16           |          20
jsreftest   |           6           |           8

I am tempted to increase reftest and mochitest to 24 chunks, but am hesitant to introduce *too* many new jobs all at once.
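To put rough numbers on this proposal, here is a small Python sketch that uses only the chunk counts from the table. It assumes each suite's total runtime divides evenly across its chunks and ignores fixed per-job overhead (device setup, installs), so the per-chunk figures are optimistic estimates, not measurements. The names `CHUNKS`, `added_jobs` and `scaled_runtime` are hypothetical and are not part of buildbot-configs or mozharness.

```python
# Chunk counts per Android platform, copied from the table:
# suite -> (total-chunks before, total-chunks after).
CHUNKS = {
    "xpcshell": (3, 6),
    "mochitest": (16, 20),
    "robocop": (4, 8),
    "reftest": (16, 20),
    "jsreftest": (6, 8),
}


def added_jobs(chunks):
    """Extra test jobs per Android platform (2.3 or 4.3) after re-chunking."""
    return sum(after - before for before, after in chunks.values())


def scaled_runtime(minutes, before, after):
    """Estimate a chunk's runtime after changing the chunk count.

    Assumes the suite's total runtime splits evenly across chunks and
    ignores fixed per-job overhead, so real jobs will take somewhat longer.
    """
    return minutes * before / after


if __name__ == "__main__":
    per_platform = added_jobs(CHUNKS)
    print("new jobs per platform:", per_platform)       # 17
    print("new jobs for 2.3 + 4.3:", 2 * per_platform)  # 34
    # A 60-minute reftest chunk at 16 chunks, re-split into 20 or 24:
    print(scaled_runtime(60, 16, 20))  # 48.0
    print(scaled_runtime(60, 16, 24))  # 40.0
    # An 80-minute chunk of a 3-chunk suite, re-split into 6:
    print(scaled_runtime(80, 3, 6))    # 40.0
```

Under the even-split assumption, going to 24 chunks instead of 20 would cut a 60-minute reftest chunk to about 40 minutes rather than 48, at the cost of 8 more reftest and mochitest jobs per platform.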
:jmaher -- Are changes to --total-chunks a concern to SETA? Will it adjust "automatically"?
Flags: needinfo?(jmaher)
gbrown, it doesn't adjust automatically, but going forward it will still find the chunk that contains a failure when one of them fails. Overall it does reduce the effectiveness of SETA, but in practice it shouldn't cause any problems!
Flags: needinfo?(jmaher)
Attachment #8598315 - Attachment description: wip - buildbot → Update Android test chunks - buildbot changes
Attachment #8598315 - Flags: review?(kmoir)
Attachment #8598318 - Attachment description: wip - mozharness → Update Android test chunks - mozharness changes
Attachment #8598318 - Flags: review?(kmoir)
Comment on attachment 8598319
Update Android test chunks - mozilla-central changes

I notice that --total-chunks is defined in both mozharness and m-c configs. I will look into removing the redundancy in another bug.

:kmoir -- Can you help land these? I think we want to get the buildbot changes into production first, then get the mozharness and m-c changes landed on all trees ASAP.
Attachment #8598319 - Attachment description: wip - mozilla-central → Update Android test chunks - mozilla-central changes
Attachment #8598319 - Flags: review?(kmoir)
Attachment #8598315 - Flags: review?(kmoir) → review+
Attachment #8598318 - Flags: review?(kmoir) → review+
Attachment #8598319 - Flags: review?(kmoir) → review+
Blocks: 1159947
:ryanvm -- These patches are ready for check-in, but the landing is tricky. The buildbot patch needs a reconfig. After the reconfig, new jobs like mochitest-17 will fail until the mozharness patch is checked in, mozharness is bumped, and the mozilla-central patch is checked in.

Please advise...or make it happen!
Flags: needinfo?(ryanvm)
I think rail is looking to do a reconfig soon. If you get them landed ASAP, I can take care of the mozharness side.
Flags: needinfo?(ryanvm)
Attachment #8598315 - Flags: checked-in+
Attachment #8598318 - Flags: checked-in+
Backed out. The buildbot-configs change broke: "AssertionError: tst-linux64-ec2-018 has 4368 builders; limit is 4084"

http://hg.mozilla.org/build/buildbot-configs/rev/5fd4cf62d3b3
http://hg.mozilla.org/build/mozharness/rev/8ded7a95435e
Blocks: 1159493
Here's a temporary measure to avoid test failures while we wait for a more permanent solution to job capacity: increase the maximum job time (timeout) for Android tests from 60 minutes to 75 minutes.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=59c2c12fcd07 - notice the green 4.3 R12s.
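Raising the timeout amounts to raising a wall-clock limit on each test job. The Python sketch below illustrates that general idea with the standard library's `subprocess.run(timeout=...)`. It is not how buildbot or mozharness actually enforce their limits, and `MAX_JOB_SECONDS` and `run_with_max_time` are hypothetical names.

```python
import subprocess
import sys

# Hypothetical per-job limit mirroring the 60 -> 75 minute change above.
MAX_JOB_SECONDS = 75 * 60


def run_with_max_time(cmd, max_seconds=MAX_JOB_SECONDS):
    """Run cmd under a wall-clock limit.

    Returns (status, returncode) where status is "ok", "failed" or "timeout".
    On timeout, subprocess.run kills the child and raises TimeoutExpired,
    which we report as "timeout" with no return code.
    """
    try:
        proc = subprocess.run(cmd, timeout=max_seconds)
    except subprocess.TimeoutExpired:
        return "timeout", None
    return ("ok" if proc.returncode == 0 else "failed"), proc.returncode


if __name__ == "__main__":
    # Demo with a tiny limit: a child that sleeps 2 s against a 1 s cap.
    print(run_with_max_time([sys.executable, "-c", "import time; time.sleep(2)"], 1))
```

A larger limit only hides slow jobs rather than shortening them, which is why the thread treats it as a stopgap until chunks can be increased.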
Attachment #8600311 - Flags: review?(ryanvm)
Comment on attachment 8600311
increase android test job timeout to 75 minutes

Review of attachment 8600311:
-----------------------------------------------------------------

*sigh*
Attachment #8600311 - Flags: review?(ryanvm) → review+
Keywords: leave-open
Increasing the Android job timeout to 75 minutes seems effective.

We need to come back to this in the future to increase chunks and back out 75514dac86e9.
Assignee: gbrown → nobody
Blocks: 1160010
Recall that we previously increased the default Android test job timeout from 60 minutes to 75 minutes, since we could not increase the number of chunks.

We are now hitting frequent 75-minute timeouts for Android 2.3 mochitest-9 (bug 1160010). I am still reluctant to try increasing chunks until efforts like Android debug reftests are sorted out.

So again, as a temporary measure, I propose the least-worst solution I see: Increase the timeout from 75 minutes to 90 minutes.
Attachment #8653012 - Flags: review?(ryanvm)
Comment on attachment 8653012
increase default Android timeout from 75 minutes to 90 minutes

Review of attachment 8653012:
-----------------------------------------------------------------

*sigh*
Attachment #8653012 - Flags: review?(ryanvm) → review+
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: General Automation → General