Closed Bug 956842 Opened 10 years ago Closed 10 years ago

Increase b2g desktop mochitest chunk size to 2 due to oversized log

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: ahal, Assigned: ahal)

Attachments

(1 file)

The b2g desktop mochitest jobs are running so many tests that we are more or less at the 50mb log limit. We should spend some time looking into ways to make the log smaller, but in the meantime we are starting to see several intermittents. We should chunk the job to get around this in the short term.
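As a rough illustration of what chunking means here: the test list is divided across several jobs, so each job runs fewer tests and writes a smaller log. The sketch below is only an assumption about the general idea, not the mochitest harness's actual algorithm; `split_into_chunks` and the sample test names are hypothetical.

```python
def split_into_chunks(tests, total_chunks):
    """Split an ordered test list into contiguous, near-equal chunks.

    Hypothetical helper for illustration; the real harness may group
    tests differently (for example, by directory).
    """
    if total_chunks < 1:
        raise ValueError("total_chunks must be at least 1")
    # Every chunk gets `base` tests; the first `extra` chunks get one more.
    base, extra = divmod(len(tests), total_chunks)
    chunks = []
    start = 0
    for index in range(total_chunks):
        size = base + (1 if index < extra else 0)
        chunks.append(tests[start:start + size])
        start += size
    return chunks


if __name__ == "__main__":
    # Made-up test names, split across 2 chunks as proposed in this bug.
    sample = ["test_a.html", "test_b.html", "test_c.html",
              "test_d.html", "test_e.html"]
    for number, chunk in enumerate(split_into_chunks(sample, 2), start=1):
        print(number, chunk)
```

With 2 chunks, each job's log should shrink to roughly half, assuming output is spread fairly evenly across tests.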
Comment on attachment 8356223 [details] [diff] [review]
Patch 1.0 - increase b2g desktop mochitest chunks to 2

lgtm
Attachment #8356223 - Flags: review?(aki) → review+
In production (not CCed).
Sadly the following failures are now being seen after we chunk:
https://tbpl.mozilla.org/php/getParsedLog.php?id=32641891&tree=Mozilla-Inbound

{
9479 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/tests/mochitest/general/test_domWindowUtils.html | Test timed out.
13027 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_after_paint_pref.html | Test timed out.
13253 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug423523.html | Caret should be at offset 1
13724 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug558663.html | before and middle should not be the same snapshots compare incorrectly. snapshot 1: <snip>
13725 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug558663.html | before and middle2 should not be the same snapshots compare incorrectly. snapshot 1: <snip>
13726 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug558663.html | middle and end should not be the same snapshots compare incorrectly. snapshot 1: <snip>
13727 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug558663.html | middle2 and end should not be the same snapshots compare incorrectly. snapshot 1: <snip>
14122 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug582771.html | Should have got mousemove - got 0, expected 1
14123 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug582771.html | Should have got mousemove - got 0, expected 2
14124 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug582771.html | Shouldn't have got mousemove - got 0, expected 2
14125 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug582771.html | Should have got mousemove - got 0, expected 1
14126 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug582771.html | Should have got mousemove - got 0, expected 3
14127 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug582771.html | Shouldn't have got mousemove - got 0, expected 1
14128 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug582771.html | Should have got mousemove - got 0, expected 4
14191 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug677878.html | Test timed out.
14237 ERROR TEST-UNEXPECTED-FAIL | /tests/layout/base/tests/test_bug761572.html | Test timed out.
14238 ERROR TEST-UNEXPECTED-FAIL | (SimpleTest/TestRunner.js) | 4 test timeouts, giving up.
}
I backed out the patch for now, but we still need to merge it to production.. I'd rather let releng handle that:
http://hg.mozilla.org/build/buildbot-configs/rev/d08f0b276cdc
(In reply to Andrew Halberstadt [:ahal] from comment #6)
> I backed out the patch for now, but we still need to merge it to
> production.. I'd rather let releng handle that:
> http://hg.mozilla.org/build/buildbot-configs/rev/d08f0b276cdc

The reconfig will be happening soon.
Andrew, don't suppose you've had any luck figuring out the test failures? Bug 951771 is occurring extremely frequently now (more so than the in-bug TBPLbot comments would imply), so would be good to get this chunked soon if we can :-)
(In reply to Ed Morley [:edmorley UTC+0] from comment #8)
> Andrew, don't suppose you've had any luck figuring out the test failures?
> Bug 951771 is occurring extremely frequently now (more so than the in-bug
> TBPLbot comments would imply), so would be good to get this chunked soon if
> we can :-)

I decided to go the route of hiding subtests instead of this in bug 957768. That bug should go live when the next reconfig happens, though I still haven't seen it working in the wild (waiting for some jobs to finish on Cypress), so I may need some follow-up fixes.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Great, thank you :-)
Going to reopen this to see if it improves the situation in bug 967588 at all.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Blocks: 967588
No longer blocks: 951771
Oh, apparently they were already unhidden, the other bug just hadn't gotten resolved. We may still want to revisit this in the future, but re-resolving for now.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Nope and nope - we haven't permanently switched all slaves back to m1, and the switch to m1 didn't actually entirely cure this - e.g. https://tbpl.mozilla.org/php/getParsedLog.php?id=34483029&tree=Mozilla-Inbound on an m1 instance. Somebody would have to carefully count, counting the percentage run on -spot- versus -ec2- and the percentage failing on each to see whether a permanent switch back to m1 would make them meet the letter of the law with a 9.9% failure rate, but they still fail by hanging in random tests.
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
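The count described above (share of runs on each instance kind, and the share failing on each) could be tallied along these lines. This is a minimal sketch under assumptions: `failure_rates` is a hypothetical helper, and the job records are made-up placeholders, not data from TBPL.

```python
from collections import Counter


def failure_rates(jobs):
    """Return {instance_kind: fraction of runs that failed}.

    `jobs` is an iterable of (instance_kind, failed) pairs, where
    instance_kind is a label such as "spot" or "ec2" (hypothetical format).
    """
    runs = Counter()
    failures = Counter()
    for kind, failed in jobs:
        runs[kind] += 1
        if failed:
            failures[kind] += 1
    return {kind: failures[kind] / runs[kind] for kind in runs}


if __name__ == "__main__":
    # Placeholder records for illustration only.
    sample_jobs = [
        ("spot", False), ("spot", False),
        ("ec2", True), ("ec2", False),
    ]
    for kind, rate in failure_rates(sample_jobs).items():
        print(f"{kind}: {rate:.1%} of runs failed")
```

Comparing the per-kind rates would show whether the failures cluster on one instance kind.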
(In reply to Phil Ringnalda (:philor) from comment #13)
> Nope and nope - we haven't permanently switched all slaves back to m1, and the
> switch to m1 didn't actually entirely cure this - e.g.
> https://tbpl.mozilla.org/php/getParsedLog.php?id=34483029&tree=Mozilla-Inbound
> on an m1 instance. Somebody would have to carefully count, counting
> the percentage run on -spot- versus -ec2- and the percentage failing on each
> to see whether a permanent switch back to m1 would make them meet the letter
> of the law with a 9.9% failure rate, but they still fail by hanging in
> random tests.

On b2g-inbound, there have been 0 failures of this type on Linux64 on spot instances; all the failures (other than the one that resulted in a backout) have been on on-demand instances.  I think switching the on-demand instance type to m1.medium will allow us to unhide these.
(In reply to Jonathan Griffin (:jgriffin) from comment #14)
> 
> On b2g-inbound, there have been 0 failures of this type on Linux64 on spot
> instances; all the failures (other than the one that resulted in a backout)
> have been on on-demand instances.  I think switching the on-demand instance
> type to m1.medium will allow us to unhide these.

This is since yesterday morning, I should add.
Right. It's more like a couple a day on mozilla-inbound (plus the hangs in a non-random test), but we've got pretty good momentum going for starring things as just "b" and not retriggering.
Status: REOPENED → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
Component: General Automation → General