Closed Bug 932350 Opened 7 years ago Closed 5 years ago

Re-enable dom-level* tests on opt builds when they aren't timeout-prone

Categories

(Core :: DOM: Core & HTML, defect)

x86_64
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox28 --- disabled
firefox29 --- disabled
firefox30 --- disabled
firefox48 --- fixed
b2g-v1.3 --- disabled
b2g-v1.4 --- disabled

People

(Reporter: RyanVM, Assigned: RyanVM)

References

(Blocks 2 open bugs)

Details

(In reply to Joel Maher (:jmaher) from bug 921635 comment #62)
> I propose:
> * disabling the tests as they were
> * filing a bug to enable them (treat them as new tests)
> * file a bug to get valid crash stacks

The dom-level1 and dom-level2 test suites were disabled in bug 921635. This bug tracks re-enabling them once the cause of the problem is found and fixed.

Bug 932349 tracks getting usable crash stacks from the timeouts so we can figure out where the hang is occurring.
See Also: → 910229
These are timeout-prone on every other platform (and, I believe, OOM/fragmentation-prone). We also believe that the fallout from these tests makes the remaining tests that run less reliable as well. We haven't gotten anywhere with getting these tests fixed (they've been disabled on Android and OSX for a while now), and they continue to be a constant source of problems anywhere they run. I've disabled them on all platforms.

Johnny, can you help find someone to own getting these re-enabled?

https://hg.mozilla.org/integration/mozilla-inbound/rev/8b8ecfe02a90
Blocks: 941443, 910229
Flags: needinfo?(jst)
Summary: Re-enable dom-level1 and dom-level2 tests on OSX when they don't cause frequent timeouts → Re-enable dom-level* tests on all platforms when they don't cause frequent timeouts
Whiteboard: [leave open]
Was this a test-disabling triggered by what was fixed in bug 969590, or was it actually problematic enough to disable prior to that?
This has been problematic on all platforms for a long time. See comment 1 and the dependencies.
Per a side-channel discussion with bz, re-enabling these on debug since the timeouts appear to only affect opt builds.
https://hg.mozilla.org/integration/mozilla-inbound/rev/a01884c4c877

Note that these will remain disabled on all Android builds via android.json.
Summary: Re-enable dom-level* tests on all platforms when they don't cause frequent timeouts → Re-enable dom-level* tests on opt builds when they aren't timeout-prone
Whiteboard: [leave open] → [tests disabled on debug][leave open]
Curious to see how things look w/ GGC these days:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c8bbd84e5e25
Flags: needinfo?(jst)
Whiteboard: [tests disabled on debug][leave open] → [tests disabled on opt, Android, and B2G]
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #10)
> Fun, Android has a perma-fail.
> https://treeherder.mozilla.org/ui/logviewer.html#?job_id=2106707&repo=try

Affects B2G as well. I've left that disabled and filed bug 1075071 for it.
Based on these Try runs:
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=c8bbd84e5e25
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=8f3bf1e16c97
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=7d065ad7f5ff

The Good:
* Desktop opt looks solidly green with copious M3 retriggers to back it up.
* Android's pretty green across the board as well (by Android standards). Same for B2G opt (again, by B2G standards). (As much as can be when Android/B2G have been dominating the OrangeFactor top 10 for some time now without much sign of improvement on the immediate horizon)

The Bad:
* Bug 1075071 tracks Android/B2G/Mulet perma-fail in test_innerWidthHeight_script.html. Initial IRC analysis by wesj and bz is that it appears there may be a legitimate bug being caught in these failures. We can disable it for now on the affected platforms to get around it, but someone may want to try to find time to investigate.
* Some of the dom-level tests run under M4 on Mulet, which is currently hidden by default due to permafail. At this point, I'm unaware of anyone actively working to change that. So we'll get partial coverage there, which is better than none I guess.
* With these tests enabled, ASAN mochitest-3 takes about 2hr to run, which is less than stellar. Without them, it currently takes ~15min.

The Ugly:
* Thanks to the fact that Android and B2G still use our old-school chunking algorithm, enabling these tests on those platforms has a significant impact on the distribution of tests across chunks. The worst example of this is B2G debug mochitest-4, which now perma-times-out (all of content/media being in there certainly isn't helping). I haven't analyzed the runtimes for other chunks in depth, but I fully expect this has had some real impact on average chunk runtimes (the ASAN numbers above back that up). We can play with the number of chunks to try to smooth things out, but that's a pretty painstaking process and is sure to leave things in a fragile state. It may also be time to consider increasing the number of chunks on desktop.
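
To illustrate why re-enabling a batch of slow tests can skew chunk runtimes, here is a minimal sketch. It assumes the old-school algorithm splits an ordered test list into chunks of roughly equal test count without looking at runtime; that is an assumption for illustration, not a description of the actual harness code. The function names and the runtimes are hypothetical.

```python
from typing import List


def chunk_by_count(runtimes: List[float], n_chunks: int) -> List[List[float]]:
    """Split tests, in order, into chunks of near-equal test COUNT.

    Assumed stand-in for a count-based chunker; runtime is ignored.
    """
    size, extra = divmod(len(runtimes), n_chunks)
    chunks, start = [], 0
    for i in range(n_chunks):
        end = start + size + (1 if i < extra else 0)
        chunks.append(runtimes[start:end])
        start = end
    return chunks


def chunk_by_runtime(runtimes: List[float], n_chunks: int) -> List[List[float]]:
    """Greedy balancing: place the longest remaining test in the lightest chunk."""
    chunks: List[List[float]] = [[] for _ in range(n_chunks)]
    totals = [0.0] * n_chunks
    for t in sorted(runtimes, reverse=True):
        i = totals.index(min(totals))  # first chunk with the smallest total
        chunks[i].append(t)
        totals[i] += t
    return chunks


def chunk_totals(chunks: List[List[float]]) -> List[float]:
    """Total runtime per chunk."""
    return [sum(c) for c in chunks]


# Hypothetical runtimes (minutes): twelve quick tests, then four slow ones
# that sit next to each other in the ordering, as a re-enabled directory would.
runtimes = [1] * 12 + [10] * 4

print(chunk_totals(chunk_by_count(runtimes, 4)))    # [4, 4, 4, 40]
print(chunk_totals(chunk_by_runtime(runtimes, 4)))  # [13.0, 13.0, 13.0, 13.0] in effect
```

With count-based splitting, all four slow tests land in the last chunk, which is the kind of imbalance described above. A runtime-aware split spreads them out, but it needs per-test runtime data, which this sketch simply takes as given.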

The "Eh?":
* I haven't retriggered any mochitest-plain chunks other than the ones where dom-level* are actually running. It's quite conceivable that moving tests from one chunk to another will affect the failure rate of other tests as well (as evidenced by the B2G debug M4 example above). Given where we are in the current release cycle and my complete lack of confidence in any such fallout being addressed in a timely fashion (I mean come on, this bug has sat unprioritized for how long?), I'm inclined to wait until after the next uplift to land any re-enabling patches. At least then we'll have a full 6 weeks to deal with any fallout before having to worry about it hitting other branches as well.

Fun observation, but not necessarily directly related:
* I also happened to notice that Linux64 mochitest-3 (both opt and debug) appears to have a bimodal runtime distribution of either ~1hr or ~2hr. Don't suppose anybody's eager to investigate that, are they?
Re-enabled on desktop builds. Going to hold off on Android/B2G until we can better address the chunking issues they present.
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8bb9368beb1
Assignee: nobody → ryanvm
Whiteboard: [tests disabled on opt, Android, and B2G] → [tests disabled on Android and B2G][leave open]
As a quick update, this bug is still on my radar, but Android/B2G mochitests continue to have significant issues with respect to runtime and buildbot chunking limits. Both platforms are being transitioned to Taskcluster, which makes adding more chunks significantly easier. I expect that we'll be able to re-enable these tests once that transition takes place.
Not too bad on Android these days. We could probably up the chunks from 16 to 20 and re-enable these now. I'm explicitly not worrying about B2G anymore.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=76172ba894f4&group_state=expanded
Depends on: 1261641
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Whiteboard: [tests disabled on Android and B2G][leave open]
Target Milestone: --- → mozilla48
Component: DOM → DOM: Core & HTML