High frequency C1 failures on Fennec with C++ APZ

RESOLVED FIXED in Firefox 45

Status

()

Firefox for Android
Toolbar
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: kats, Assigned: kats)

Tracking

unspecified
Firefox 45
All
Android
Points:
---

Firefox Tracking Flags

(firefox45 fixed)

Details

Attachments

(1 attachment)

With APZ enabled on Fennec, the C1 crashtest chunk fails quite frequently, with 321077-2.xul timing out. I did a try push with more logging [1] and the log [2] has this:

20:56:59     INFO -  REFTEST INFO | [CONTENT] AfterPaintListener in data:text/html,elsewhere
20:56:59     INFO -  REFTEST INFO | [CONTENT] AfterPaintListener in data:text/html,elsewhere
20:56:59     INFO -  REFTEST INFO | [CONTENT] MakeProgress: apz-repaints-flushed fired
20:56:59     INFO -  REFTEST INFO | [CONTENT] MakeProgress: STATE_WAITING_TO_FINISH
20:56:59     INFO -  REFTEST INFO | [CONTENT] OnDocumentLoad fired for previous document
20:56:59     INFO -  REFTEST INFO | [CONTENT] AfterPaintListener in data:text/html,elsewhere
20:56:59     INFO -  REFTEST INFO | [CONTENT] AfterPaintListener in data:text/html,elsewhere

while the logcat [3] has this:

11-20 20:56:40.350   731   751 E GeckoConsole: [JavaScript Error: "TypeError: can't access dead object" {file: "chrome://reftest/content/reftest-content.js" line: 348}]

What I suspect is happening is that the test redirects to data:text/html,elsewhere and it sits there longer than it does without APZ because it's waiting for an APZ flush. When the APZ flush finishes, the old page has been GC'd already, resulting in the "can't access dead object" error which aborts the progress in reftest-content.js.

We see the "can't access dead object" in other tests as well, but I think the key difference is that in this test, we end up at a synthetic page (the data: url) which doesn't trigger any further events. Therefore none of the listeners in reftest-content.js run to continue the progress, so the entire harness just stalls. I'm not 100% sure on this though. There appears to be other weirdness in the run, in particular to do with the 783041-2.html test where the number of AfterPaintListeners increases dramatically; I think the harness isn't properly dealing with the page reloading itself a bunch of times.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=989eb16aff9e
[2] http://archive.mozilla.org/pub/mobile/try-builds/kgupta@mozilla.com-989eb16aff9e57a98a0525ec18e921c9a933bc74/try-android-api-11/try_ubuntu64_vm_armv7_large_test-crashtest-1-bm53-tests1-linux64-build565.txt.gz
[3] http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/2b7f9decee267f6e1928dc58c36b9d1aea6db0db16bbe729bf51a0ade86271f0e66ce1cf13eae0475731081d169d2e63a83882ff6c2d447ed24a8f7d40a768f8
Created attachment 8691108 [details] [diff] [review]
Patch

After poring over the C1 verbose logs with both APZ enabled [1] and disabled [2] I came to the conclusion that the crashtests are fundamentally different from regular reftests in that they do different things. In particular, they can do things like reload themselves [3] or redirect to invalid URLs [4]. Since the intent of the APZ flush is to flush out any pending repaints before the page is snapshotted, and crashtests don't get snapshotted, the APZ flush is unnecessary here. In fact it's problematic because it can cause the harness to wait unnecessarily for the page to "finish painting" when in fact the page has no intention of doing anything of the sort.

So my solution is to simply not do the APZ flush for tests that aren't going to get snapshotted. Try pushes at [5] (for Fennec, with rbarker's patches) and [6] (on vanilla m-c). Will request review if everything looks green.

[1] http://archive.mozilla.org/pub/mobile/try-builds/kgupta@mozilla.com-989eb16aff9e57a98a0525ec18e921c9a933bc74/try-android-api-11/try_ubuntu64_vm_armv7_large_test-crashtest-1-bm53-tests1-linux64-build565.txt.gz
[2] http://archive.mozilla.org/pub/mobile/try-builds/kgupta@mozilla.com-6584261e85ea72d3149e1bc98eb28e8ecd730838/try-android-api-11/try_ubuntu64_vm_armv7_large_test-crashtest-1-bm120-tests1-linux64-build337.txt.gz
[3] http://hg.mozilla.org/mozilla-central/file/d3d286102ba7/gfx/tests/crashtests/783041-2.html#l57
[4] http://hg.mozilla.org/mozilla-central/file/d3d286102ba7/layout/base/crashtests/321077-2.xul#l15
[5] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c05b0682d846
[6] https://treeherder.mozilla.org/#/jobs?repo=try&revision=cc03be950e6a
Comment on attachment 8691108 [details] [diff] [review]
Patch

Review of attachment 8691108 [details] [diff] [review]:
-----------------------------------------------------------------

Try looks about as green as it usually gets these days.
Attachment #8691108 - Flags: review?(dbaron)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)
> can do things like reload themselves [3] or redirect to invalid URLs [4].

Well, they can't -- any attempts to do things like this massively confuse the harness.  They just sometimes manage not to cause orange.  We should file bugs on such things... and maybe even modify the harness to make those things errors.
Comment on attachment 8691108 [details] [diff] [review]
Patch

That said, this patch seems fine.
Attachment #8691108 - Flags: review?(dbaron) → review+

Comment 5

2 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/cbb7d4c40875

Comment 6

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/cbb7d4c40875
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years ago
status-firefox45: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 45
You need to log in before you can comment on or make changes to this bug.