Closed Bug 1059990 Opened 11 years ago Closed 8 years ago

Improve hang reporting for Android tests

Categories

(Firefox for Android Graveyard :: Testing, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gbrown, Assigned: gbrown)

Details

Attachments

(1 file)

In theory, when the browser hangs during an Android test, the test harness will detect that there has been no output for N seconds, report an error, and shut down the browser first with kill -3 (to produce an ANR report), then with kill -6 (to produce a crash report), and finally with kill -9 (to ensure the process is killed). See http://hg.mozilla.org/mozilla-central/annotate/47c9418fbc28/build/mobile/remoteautomation.py#l366.

In bug 1054292, there are now hundreds of Android 4.0 test hang reports, but none of them seem very effective. The kill -3/-6/-9 procedure appears to be happening, but:
- ANR reports are usually (always?) not generated;
- .dmp files are usually created but are often corrupt, so no crash report is generated;
- when a crash report is generated, the stack for the known problem (bug 1059797) is not reported.

Can we improve the existing mechanism? Can we leverage Fennec's telemetry hang reporting?
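A minimal Python sketch of the hang handling described above: flag a hang once no output has arrived for a timeout, then send signals 3, 6 and 9 in turn. This is not the code in remoteautomation.py; `OutputWatchdog` and `escalate_kill` are hypothetical names, and the `send_signal`/`is_alive` callbacks stand in for whatever device call the harness actually uses (for example `adb shell kill`).

```python
import time
from typing import Callable, List

# Signal numbers used by the kill -3/-6/-9 escalation (Linux/Android values).
SIGQUIT, SIGABRT, SIGKILL = 3, 6, 9


class OutputWatchdog:
    """Hypothetical helper: reports a hang when no output arrived for `timeout` seconds."""

    def __init__(self, timeout: float, clock: Callable[[], float] = time.monotonic):
        self.timeout = timeout
        self.clock = clock
        self.last_output = clock()

    def saw_output(self) -> None:
        # Call whenever the test process writes a line of log output.
        self.last_output = self.clock()

    def is_hung(self) -> bool:
        return self.clock() - self.last_output > self.timeout


def escalate_kill(
    send_signal: Callable[[int], None],
    is_alive: Callable[[], bool],
    grace: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> List[int]:
    """Hypothetical helper: send SIGQUIT, SIGABRT, then SIGKILL, stopping early
    once the process has exited. Returns the signals actually sent."""
    sent = []
    for sig in (SIGQUIT, SIGABRT, SIGKILL):
        if not is_alive():
            break
        send_signal(sig)
        sent.append(sig)
        # Give the process time to write its ANR trace or minidump.
        sleep(grace)
    return sent
```

Injecting the clock, sleep and device callbacks keeps the sketch testable without a device. Unlike the procedure described above, this version stops escalating once the process has exited. SIGQUIT normally makes the Android runtime dump thread traces without terminating the process, so after kill -3 the process is usually still alive and the escalation continues.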
The existing system seems to work better on Android 2.3. Consider https://tbpl.mozilla.org/php/getParsedLog.php?id=46969350&tree=Mozilla-Inbound, which produced both an ANR report and a crash report.
Maybe this is not as bad as I thought. Consider https://tbpl.mozilla.org/?tree=Try&rev=bc4330803c2d, a try push that produced a different hang. These failures seem to produce ANR reports and crash reports more often than in bug 1054292. Also, investigation of bug 1059797 now points to the compositor thread as the real problem; you can see the problematic stack in many of the crash reports in bug 1054292.
One of the drawbacks of Gecko hang monitoring is that it needs the Gecko thread running in order to output data through telemetry. Maybe we can add an asynchronous mechanism to it so it can output data on its own. The ANR reporter works on its own. I don't know if its output will be useful, but you can find it at /data/data/PACKAGE/files/mozilla/PROFILE/saved_telemetry_pings/. Also, I think it would be great if we could somehow attach GDB to a hanging process, dump all the thread stacks, and quit. For native hangs, GDB traces will be a lot better than what we get through ANR logs.
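A sketch of the GDB idea, assuming a gdb binary that runs where the hung process runs (for example one pushed to a rooted device or emulator): run gdb in batch mode, attach with `-p`, print a backtrace for every thread, detach and exit. The more common gdbserver-plus-port-forward setup would need different commands and is not shown; `gdb_dump_threads_cmd` and `dump_threads` are hypothetical names.

```python
import subprocess
from typing import List


def gdb_dump_threads_cmd(pid: int, gdb: str = "gdb") -> List[str]:
    """Build a gdb command line that attaches to `pid`, prints every
    thread's backtrace, detaches, and exits."""
    return [
        gdb,
        "-batch",                      # run the -ex commands, then exit
        "-p", str(pid),                # attach to the running process
        "-ex", "thread apply all bt",  # backtrace for every thread
        "-ex", "detach",               # leave the process running for the kill -3/-6/-9 steps
    ]


def dump_threads(pid: int, gdb: str = "gdb", timeout: float = 120.0) -> str:
    """Run the command and return gdb's stdout (the thread stacks)."""
    cmd = gdb_dump_threads_cmd(pid, gdb)
    result = subprocess.run(cmd, capture_output=True, text=True, timeout=timeout)
    return result.stdout
```

Detaching explicitly leaves the process in place, so the existing kill sequence can still produce its ANR and crash reports afterwards.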
(In reply to Jim Chen [:jchen :nchen] from comment #3)
> /data/data/PACKAGE/files/mozilla/PROFILE/saved_telemetry_pings/.

Oops -- seems like it is actually saved-telemetry-pings! (- vs _)
(In reply to Geoff Brown [:gbrown] (PTO Sept 15 - Oct 7) from comment #4)
> (In reply to Jim Chen [:jchen :nchen] from comment #3)
> > /data/data/PACKAGE/files/mozilla/PROFILE/saved_telemetry_pings/.
>
> Oops -- seems like it is actually saved-telemetry-pings! (- vs _)

Ah, you're right! Sorry! See [1] for the format of the JSON files.

I'm still not sure if we do generate ANR reports during test hangs. I hope we do!

[1] http://mxr.mozilla.org/mozilla-central/source/mobile/android/base/ANRReporter.java#259
Attached patch: work in progress
Here's my work-in-progress patch. You can see it running at https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=4b74c73bc4bc. I have reproduced the "CreateShader" hang a couple of times, but never found an ANR report. It needs to be tested more.

mochitest-1 and some robocop runs often create a file in saved-telemetry-pings, but these have "reason":"saved-session". For example, http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/26b78c624ee5cc4c28472ec5579a41a7442d759d3a0403a07386f1849dfc8b57792a085efc31b9b9cbfafbb617d6d3c3c263d41a2578e65544b188f33793d445

I suppose we should pull any file found in saved-telemetry-pings, then open the file, check the reason, and discard anything that is not "android-anr-report".

I won't get back to this for several weeks -- feel free to take this bug.
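The filtering step proposed above could look like this minimal sketch. It assumes the contents of saved-telemetry-pings have already been copied to a host directory (for example with `adb pull`) and that each file is a JSON object with a top-level "reason" key, as in the "saved-session" pings mentioned above; `collect_anr_reports` is a hypothetical name.

```python
import json
from pathlib import Path
from typing import Any, Dict, List

ANR_REASON = "android-anr-report"


def collect_anr_reports(ping_dir: str) -> List[Dict[str, Any]]:
    """Return the parsed pings in `ping_dir` whose "reason" is an ANR report.

    Files that cannot be read or parsed as JSON, or that carry any other
    reason (e.g. "saved-session"), are skipped.
    """
    reports = []
    for path in sorted(Path(ping_dir).iterdir()):
        if not path.is_file():
            continue
        try:
            ping = json.loads(path.read_text(encoding="utf-8"))
        except (OSError, UnicodeDecodeError, json.JSONDecodeError):
            continue
        if isinstance(ping, dict) and ping.get("reason") == ANR_REASON:
            reports.append(ping)
    return reports
```

Skipping unreadable files rather than failing keeps a single corrupt ping from hiding any ANR reports that were written correctly.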
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 8 years ago
Component: General → Testing
Product: Testing → Firefox for Android
Resolution: --- → WONTFIX
Product: Firefox for Android → Firefox for Android Graveyard