Open Bug 1229549 Opened 4 years ago Updated 4 years ago

a try run of a change that crashes on startup doesn't produce useful diagnostics

Categories

(Testing :: General, defect, major)


Tracking

(firefox45 affected)


People

(Reporter: dbaron, Unassigned)

Details

(Keywords: meta)

So I just did a try run of a change that makes Firefox crash during startup.  (I did this intentionally, because I wanted to see the crash report for *how* it crashed for a particular change, on the machines in automation.)

In particular, the change I pushed was adding:
  const_cast<uint8_t*>(mFd->mFileData)[offset] = 0;
right before the return at the end of nsZipArchive::GetData in modules/libjar/nsZipArchive.cpp.

When doing this, I would expect that the diagnostics for the failed test runs should indicate that we crashed, and how.  I would expect a "PROCESS-CRASH" with a stack signature to show up in every single build's details, visible in treeherder.

However, of our many test harness results:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a7878dcf8553
only a single one (reftest, including crashtest and jsreftest) produces the correct result:

TEST-UNEXPECTED-FAIL | reftest | application terminated with exit code 1
PROCESS-CRASH | reftest | application crashed [@ nsZipArchive::GetData(nsZipItem *)] 
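For reference, a harness-side check for this expected diagnostic could look like the following minimal Python sketch. It assumes the line format shown in the reftest output above; the helpers `crash_signatures` and `has_crash_diagnostic` are hypothetical and not part of any existing harness.

```python
import re
from typing import List

# Matches lines like the reftest output above:
#   PROCESS-CRASH | reftest | application crashed [@ nsZipArchive::GetData(nsZipItem *)]
# Other harnesses may format this line differently (an assumption).
PROCESS_CRASH_RE = re.compile(
    r"^PROCESS-CRASH \| (?P<suite>[^|]+?) \| application crashed \[@ (?P<signature>.*)\]$"
)


def crash_signatures(log_lines: List[str]) -> List[str]:
    """Return the stack signature from every PROCESS-CRASH line in a log."""
    signatures = []
    for line in log_lines:
        # Strip surrounding whitespace; the quoted log line has a trailing space.
        match = PROCESS_CRASH_RE.match(line.strip())
        if match:
            signatures.append(match.group("signature"))
    return signatures


def has_crash_diagnostic(log_lines: List[str]) -> bool:
    """True if the log says that we crashed and where."""
    return bool(crash_signatures(log_lines))
```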

All the rest fail in different ways, many of which look like common intermittent failures that we typically can't do anything about because they come with no useful diagnostics.

Mochitests report:
 TEST-UNEXPECTED-FAIL | runtests.py | Timed out while waiting for server startup. 

Web platform tests report:
 Test runner failed to initialise correctly; shutting down 

Cpp unit tests report:
 TEST-UNEXPECTED-TIMEOUT | TestAudioEventTimeline.exe | timed out after 900 seconds 

xpcshell tests report:
 TEST-UNEXPECTED-FAIL | dom/base/test/unit/test_error_codes.js | xpcshell return code: 1 

The builder even reported a cryptic failure and turned purple, despite submitting a build to be tested:
 command timed out: 10800 seconds without output running ['c:/mozilla-build/python27/python', '-u', 'scripts/scripts/fx_desktop_build.py', '--config', 'builds/releng_base_windows_32_builds.py', '--config', 'balrog/production.py', '--branch', 'try', '--build-pool', 'production'], attempting to kill 

Marionette reports:
 AssertionError: Timed out waiting for port! 

Talos reports:
 TalosError: browser failed to close after being initialized 

b-m (VideoPuppeteer, whatever that is) reports:
1:00.71 LOG: MainThread ERROR Failure during execution of the playback test.
AssertionError: Timed out waiting for port! 

So it seems that we need to test that failures such as a simple startup crash produce a correct diagnostic in automation, and to fix the cases where they don't.
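One way such a test could work, sketched in Python: given the logs from a push that is known to crash on startup, list the jobs whose logs never report PROCESS-CRASH. It assumes each job's log is available as a list of lines; the job names and the `jobs_missing_crash_diagnostic` helper are hypothetical.

```python
from typing import Dict, List


def jobs_missing_crash_diagnostic(job_logs: Dict[str, List[str]]) -> List[str]:
    """For a push known to crash at startup, return (sorted) the names of the
    jobs whose logs contain no PROCESS-CRASH line at all."""
    missing = []
    for job, lines in sorted(job_logs.items()):
        if not any(line.lstrip().startswith("PROCESS-CRASH |") for line in lines):
            missing.append(job)
    return missing
```

Run against this push, every job except reftest would be listed, which is exactly the set of harnesses that need fixing.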

This bug probably needs dependent bugs filed to fix the actual problems.

For the mochitests, I suspect that xpcshell is crashing too:

11:55:26     INFO -  MochitestServer : launching [u'C:\\slave\\test\\build\\tests\\bin\\xpcshell.exe', '-g', 'C:\\slave\\test\\build\\application\\firefox', '-v', '170', '-f', 'C:\\slave\\test\\build\\tests\\bin\\components\\httpd.js', '-e', "const _PROFILE_PATH = 'c:\\\\users\\\\cltbld\\\\appdata\\\\local\\\\temp\\\\tmpj0bw6x.mozrunner'; const _SERVER_PORT = '8888'; const _SERVER_ADDR = '127.0.0.1'; const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', 'C:\\slave\\test\\build\\tests\\mochitest\\server.js']
11:55:26     INFO -  runtests.py | Server pid: 1200
11:55:26     INFO -  runtests.py | Websocket server pid: 3312
11:55:26     INFO -  runtests.py | SSL tunnel pid: 1960
11:56:56  WARNING -  TEST-UNEXPECTED-FAIL | runtests.py | Timed out while waiting for server startup.

You introduced a browser crash, but you also introduced a crash in the web server (which runs in xpcshell), and because xpcshell won't start, the harness never gets around to running the browser.

It would be nice if server crashes had more diagnostics (a full crash report?).
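The mochitest log above suggests that the harness waits for the server port without noticing that the server process has already died. A minimal Python sketch of a wait loop that does check, assuming the harness holds a `subprocess.Popen` handle for the server; `wait_for_server` is a hypothetical name, and the 90-second default only mirrors the gap between launch (11:55:26) and timeout (11:56:56) in the log above.

```python
import socket
import subprocess
import time


def wait_for_server(proc: subprocess.Popen, host: str, port: int,
                    timeout: float = 90.0, interval: float = 0.5) -> None:
    """Wait until something accepts connections on host:port.

    Checks on every iteration whether the server process has exited, so a
    crash is reported with its exit code instead of as a generic timeout.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        returncode = proc.poll()
        if returncode is not None:
            raise RuntimeError(
                f"server process {proc.pid} exited with code {returncode} "
                "before it started listening (possible crash)")
        try:
            with socket.create_connection((host, port), timeout=interval):
                return  # the server is accepting connections
        except OSError:
            time.sleep(interval)  # not listening yet; try again
    raise TimeoutError(f"timed out after {timeout}s waiting for {host}:{port}")
```

Reporting the exit code is the minimum; a real fix would also go looking for the server's minidump.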

Thanks for doing this; this is good information.

Good point about xpcshell, Geoff. We should be able to get crash stacks out of it (we already do for xpcshell tests, so why not here?). Maybe we just need to set MOZ_CRASHREPORTER=1 and MOZ_CRASHREPORTER_NO_REPORT=1 in its environment.
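A minimal Python sketch of that suggestion, assuming the server is started with `subprocess.Popen`: copy the environment and set both variables before launching xpcshell. The helpers `crash_reporting_env` and `launch_with_crash_reporting` are hypothetical, and whether these two variables alone are sufficient for the mochitest server is not verified here.

```python
import os
import subprocess
from typing import Dict, List, Mapping, Optional


def crash_reporting_env(base: Optional[Mapping[str, str]] = None) -> Dict[str, str]:
    """Return a copy of `base` (default: os.environ) with crash reporting on."""
    env = dict(os.environ if base is None else base)
    # Enable the crash reporter so that a crash writes a minidump.
    env["MOZ_CRASHREPORTER"] = "1"
    # Don't bring up the crash report client/submission in automation
    # (our understanding: the minidump is still written to disk).
    env["MOZ_CRASHREPORTER_NO_REPORT"] = "1"
    return env


def launch_with_crash_reporting(cmd: List[str]) -> subprocess.Popen:
    """Start a process (e.g. the xpcshell server command) with that environment."""
    return subprocess.Popen(cmd, env=crash_reporting_env())
```

Copying the environment rather than mutating `os.environ` keeps the setting scoped to the server process.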

Depends on: 1229765

Yeah, we can get crash reports out of xpcshell without any problem:
http://hg.mozilla.org/mozilla-central/annotate/f6ac392322b3/testing/xpcshell/runxpcshelltests.py#l899 is how the xpcshell test harness does it.

Note that the xpcshell harness runs some JS to set the minidump path to its temp dir:
http://hg.mozilla.org/mozilla-central/annotate/f6ac392322b3/testing/xpcshell/head.js#l108

A crash early enough in startup could happen before this code runs, in which case the minidump would wind up in the system temp dir (this is where the crash reporter writes minidumps before we have a profile). The xpcshell harness doesn't handle that case properly right now.
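Given that, one possible approach is to snapshot the `.dmp` files in both the harness's minidump directory and the system temp directory before launching, then diff afterwards. This is a Python sketch; the helper names are hypothetical, and it assumes the pre-profile temp dir is the one `tempfile.gettempdir()` returns for the harness process.

```python
import glob
import os
import tempfile
from typing import Iterable, List, Set


def candidate_dump_dirs(harness_dump_dir: str) -> List[str]:
    """Directories to search for minidumps after a run (hypothetical helper).

    The system temp dir is included because a crash that happens before the
    harness sets the minidump path may leave its dump there (assumed to be
    tempfile.gettempdir() for the launched process).
    """
    return [harness_dump_dir, tempfile.gettempdir()]


def list_minidumps(dirs: Iterable[str]) -> Set[str]:
    """Absolute paths of all *.dmp files directly inside the given directories."""
    found: Set[str] = set()
    for d in dirs:
        pattern = os.path.join(d, "*.dmp")
        found.update(os.path.abspath(p) for p in glob.glob(pattern))
    return found


# Usage: snapshot before launching, then diff after the process exits, so
# that only dumps produced by this run are reported.
#   dirs = candidate_dump_dirs(dump_dir)
#   before = list_minidumps(dirs)
#   ...launch xpcshell and wait for it...
#   new_dumps = list_minidumps(dirs) - before
```

Diffing against a snapshot matters for the temp dir in particular, since it may already hold unrelated dumps from earlier runs on the same machine.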