Closed Bug 1168222 Opened 5 years ago Closed 2 years ago

b2g mochitests running style/ tests mostly time out

Categories

(Core :: CSS Parsing and Computation, defect)

ARM
Gonk (Firefox OS)
defect
Not set

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox41 --- affected

People

(Reporter: philor, Unassigned)

References

Details

Attachments

(2 obsolete files)

They've been hidden for months, because no matter what happened in what test the suites were taking more than two hours to run. Bug 1099195 "fixed" that by giving them four hours to run.

Now debug mochitest-20 nearly always dies after 1000 seconds without output in test_extra_inherit_initial.html, https://treeherder.mozilla.org/logviewer.html#?job_id=1552679&repo=mozilla-central, though there are rare intermittent greens like http://ftp.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/mozilla-central-emulator-debug/1432426221/mozilla-central_ubuntu64_vm-b2g-emulator-debug_test-mochitest-debug-20-bm113-tests1-linux64-build52.txt.gz where it sneaks through in 910724ms. If you skip it, https://treeherder.mozilla.org/#/jobs?repo=try&revision=ac06d9ec0734&exclusion_profile=false, then some combination of test_property_syntax_errors.html and test_logical_properties.html time out too often.

Opt mochitest-9, lacking the skip-ifs that debug got in the initial triage round, mostly times out in test_value_cloning.html and test_value_computation.html, https://treeherder.mozilla.org/logviewer.html#?job_id=1552414&repo=mozilla-central, though it will sometimes in addition hit 1000 seconds without output in test_value_storage.html, https://treeherder.mozilla.org/logviewer.html#?job_id=1545782&repo=mozilla-central
Blocks: 1168224
See Also: → 1186219
Quoting myself from bug 1186224 comment 3 (I initially posted there when I probably should've posted here):
============
I think we may just need to periodically yield to allow for GC. Otherwise, the garbage that we create as we process each property just piles up.  (This isn't necessarily bad; we do probably save some time by plowing through all the properties & doing a single mega-GC at the end of the test. But if we're hitting memory limits on resource-constrained test platforms, then we should allow for periodic GCs.)

I've got some patches locally which restructure test_extra_inherit_initial.html to allow for this. I'm running them through Try and I'll probably end up posting them here.
This refactors the first test here to use Promises for iteration across the properties, with the help of Array.map().

(This is matching a pattern for Promise-driven iteration that I found online. Basically, each call to the map function receives a promise that represents the previous batch of work.  So, each property can set up its work in a new Promise that's chained to the previous property's work. See comments in test for more details.)

This patch doesn't make anything asynchronous yet -- this is just restructuring the test, while leaving it synchronous. I'll make it asynchronous (occasionally) to allow for GC in the next patch here.
(Sorry, when I said "map" in previous comment, I really meant to say "reduce".)
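For readers unfamiliar with the pattern, here is a minimal sketch of Promise-driven iteration with Array.prototype.reduce(). It is not the code from the attached patch: testProperty, testAllProperties, and the log array are hypothetical names used only for illustration.

```javascript
// Hypothetical stand-in for the per-property work done by the real test.
function testProperty(property, log) {
  log.push(property);
}

// Chain one Promise per property. The accumulator passed to each reduce()
// callback is the Promise representing all of the previous properties' work.
function testAllProperties(properties, log) {
  return properties.reduce(
    (previousWork, property) =>
      previousWork.then(() => testProperty(property, log)),
    Promise.resolve()
  );
}
```

Each then() callback here runs as a microtask, so the chain works through every property in order without returning to the event loop in between.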
Here's part 2, which makes us handle properties in batches, and makes the "test next property" call asynchronous at the end of each batch so that we can GC.
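A rough sketch of the batching idea, assuming a setTimeout(…, 0) yield at the end of each batch. The actual patch may yield differently; yieldToEventLoop, testInBatches, and yieldCount are hypothetical names.

```javascript
// Counts how many times we yielded; only here so the sketch is observable.
let yieldCount = 0;

// Assumption: returning to the event loop via setTimeout is what gives
// the engine an opportunity to collect garbage.
function yieldToEventLoop() {
  yieldCount++;
  return new Promise((resolve) => setTimeout(resolve, 0));
}

// Process properties in order; after every batchSize-th property, make the
// next link in the Promise chain wait for a trip through the event loop.
function testInBatches(properties, batchSize, testProperty) {
  return properties.reduce(
    (previousWork, property, index) =>
      previousWork.then(() => {
        testProperty(property);
        const endOfBatch = (index + 1) % batchSize === 0;
        return endOfBatch ? yieldToEventLoop() : undefined;
      }),
    Promise.resolve()
  );
}
```

With a batchSize of 1, every property is followed by a yield, which is the "extreme" case described next.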

Taken to the extreme, with a batch-size of 1 (which should allow for maximal GC opportunities), this does reduce peak memory usage locally (from ~3.2% of my 16GB of RAM to ~2.5%). However, it doesn't help mochitest-debug-25 like I'd hoped it would -- we still end up with "timed out after 1000 seconds of no output" in this test in 7 out of 8 test runs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d13881fecce5&exclusion_profile=false&filter-ref_data_name=mochitest-debug-25

So either the timeout is non-memory related, or the GC opportunities that this refactoring adds aren't sufficient, I guess.
See Also: → 1186224
Looks like the flexbox tests can cause similar memory usage spikes as well.
Attachment #8636940 - Attachment description: part 2: make test → part 2: make test periodically yield to allow for GC
(In reply to Daniel Holbert [:dholbert] from comment #4)
> However, it doesn't help mochitest-debug-25 like
> I'd hoped it would -- we still end up with "timed out after 1000 seconds of
> no output" in this test in 7 out of 8 test runs:

Hmm, so those results were from a Try run with an additional patch that allowed for extremely-frequent GCs (after each property). In a different Try run with *just* the attached patches (yielding after every 50 properties), things actually look pretty good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f65f73d61709&exclusion_profile=false&filter-ref_data_name=mochitest-debug-25

So there may be hope in this strategy after all.  Though I'm not sure why the try run from comment 4 would be more failure-prone.
Depends on: 1187038
(In reply to Phil Ringnalda (:philor) from comment #0)
> Now debug mochitest-20 nearly always dies after 1000 seconds without output
> in test_extra_inherit_initial.html, [...] though there are rare intermittent greens like
> [...] where it sneaks through in 910724ms.

So, it turns out the problem here is kinda silly, and has nothing to do with memory usage. It's a product of these factors:
 (1) This test just *takes a long time* on B2G debug (on the order of 1000 sec).
 (2) Mochitest TEST-PASS test noise doesn't get logged anymore.
 (3) So, from mozharness's perspective, we appear to be silent while this test is running.
 (4) ...and it kills us after 1000 seconds because it assumes something must've gone horribly wrong.
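
The four factors above can be illustrated with a toy model of a no-output watchdog. This is not mozharness's implementation; findKillTime and its parameters are hypothetical, and the only value taken from this bug is the 1000-second limit.

```javascript
// 1000 seconds without output, per the "timed out after 1000 seconds of
// no output" message quoted in this bug.
const OUTPUT_TIMEOUT_MS = 1000 * 1000;

// outputTimesMs: sorted times (ms since job start) at which output was logged.
// jobEndMs: time at which the job would finish if left alone.
// Returns the time at which the watchdog kills the job, or null if it survives.
function findKillTime(outputTimesMs, jobEndMs) {
  let lastOutput = 0;
  for (const t of [...outputTimesMs, jobEndMs]) {
    if (t - lastOutput > OUTPUT_TIMEOUT_MS) {
      return lastOutput + OUTPUT_TIMEOUT_MS;
    }
    lastOutput = t;
  }
  return null;
}
```

In this model, a silent test that needs 910724 ms survives, a silent test that needs slightly more than 1000 seconds is killed, and the same slow test survives if it logs anything at all every few minutes.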

I filed bug 1187038 on this general problem.
(FWIW, my mysteriously-successful Try run in comment 6 managed to avoid dying in test_extra_inherit_initial.html simply because of some periodic "dump()" statements that I added there, which pacified mozharness.)
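A sketch of that kind of periodic progress output, assuming a rate limit so the log isn't flooded. makeProgressReporter and its parameters are hypothetical; in a Gecko test the emit callback could simply be dump.

```javascript
// Returns a function that forwards a message to emit() at most once per
// intervalMs. The clock is injectable so the helper can be exercised
// without real waiting.
function makeProgressReporter(intervalMs, emit, now = Date.now) {
  let lastEmit = now();
  return function reportProgress(message) {
    const t = now();
    if (t - lastEmit >= intervalMs) {
      emit(message + "\n");
      lastEmit = t;
      return true;
    }
    return false;
  };
}
```

Calling the returned function once per property keeps some output flowing during a long-running test, as long as intervalMs is comfortably below the harness's no-output limit.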
Attachment #8636861 - Attachment is obsolete: true
Attachment #8636940 - Attachment is obsolete: true
So as far as I know, the "memory usage spikes" mentioned in comment 5 aren't actually problematic after all (or at least, they're not making us hang for 1000 seconds or anything like that).
(Also FWIW, I've filed bug 1187110 on removing some redundant work in test_extra_inherit_initial.html, which should make that test no longer run up against the 1000s timeout.)
Depends on: 1187110
Mass closing as we are no longer working on b2g/firefox os.
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → WONTFIX