Closed Bug 1440719 Opened 2 years ago Closed 2 years ago

"application timed out after 370 seconds with no output" sometimes caused by "Gah. Your tab just crashed." (MOZ_CRASHREPORTER_SHUTDOWN not set)

Categories

(Testing :: General, enhancement, P2)

Version 3
enhancement

Tracking

(firefox61 fixed)

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: gbrown, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1290856 and others collect test failure reports for "application timed out after 370 seconds with no output" -- when the harness notices that a test has apparently hung.

Many current failures in bug 1290856 have a screenshot showing a tab crash "Gah. Your tab just crashed." Tests hang because the harness is waiting for that tab crash dialog.

:whimboo points out that we can likely handle this much better by setting MOZ_CRASHREPORTER_SHUTDOWN: 

https://bugzilla.mozilla.org/show_bug.cgi?id=1339568#c71
See Also: → 1388249
Please note that mozcrash should usually also pick up the minidumps from those content crashes.
Setting MOZ_CRASHREPORTER_SHUTDOWN produces good crash reports and avoids the timeouts in linux mochitest-media (bug 1290856 and others):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e776f29b7fda88dd744b54df192e6b8a3a01d749
...but triggers a variety of failures in mochitest-bc tests:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=35291e99bdb825c190127be5e905dc35378468b5
See Also: → 1378743
Each of those cases has actually two assertions logged before that leak warning/error. So it might have revealed an underlying problem.
Priority: -- → P2
There are mochitest-browser-chrome tests that actually test content process crashes, so I don't know that we can use it there. I'd say enable it for other suites and we can file a followup to do something smarter for browser-chrome (like create an API for those tests to disable the behavior for the duration of the test).
(In reply to Henrik Skupin (:whimboo) from comment #1)
> Please note that mozcrash should usually also pick up the minidumps from
> those content crashes.

This does seem to be working as designed. Looking at a random log from the latest orangefactor link in that bug:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1290856&startday=2018-02-26&endday=2018-03-04&tree=all

shows:
https://treeherder.mozilla.org/logviewer.html#?repo=autoland&job_id=164256106&lineNumber=5262

[task 2018-02-26T02:25:32.321Z] 02:25:32     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::NrUdpSocketIpc::create]
Blocks: 1443327
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #5)
> I'd say enable it for
> other suites and we can file a followup to do something smarter for
> browser-chrome

That seems very sensible -- thanks! Follow-up bug 1443327 filed and referenced by this patch.

Seems okay on try now:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=561f1ec2d0914f8b9ae32a14c5f613cef8fe99d8
Attachment #8956250 - Flags: review?(ted)
Attachment #8956250 - Flags: review?(ted) → review+
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1383d36fdabf
Set MOZ_CRASHREPORTER_SHUTDOWN during tests, other than browser-chrome; r=ted
Summary: "application timed out after 370 seconds with no output" sometimes caused by "Gah. Your tab just crashed." → "application timed out after 370 seconds with no output" sometimes caused by "Gah. Your tab just crashed." (MOZ_CRASHREPORTER_SHUTDOWN not set)
Thanks. The failure is a plain mochitest, dom/ipc/tests/test_CrashService_crash.html and the log contains:

  A content process crashed and MOZ_CRASHREPORTER_SHUTDOWN is set, shutting down

I guess that makes sense. :(
Flags: needinfo?(gbrown)
Excluding both plain and browser-chrome mochitests from MOZ_CRASHREPORTER_SHUTDOWN seems a bit much. Perhaps:

 - move this test to browser-chrome?
 - move this test and browser-chrome crash-service tests to a separate subsuite?
With this patch, I get an error every time I try to run mochitests locally:

The details of the failure are as follows:

KeyError: 'MOZ_CRASHREPORTER_SHUTDOWN'

  File "/home/kris/code/mozilla-central/testing/mochitest/mach_commands.py", line 422, in run_mochitest_general
    **harness_args)
  File "/home/kris/code/mozilla-central/testing/mochitest/mach_commands.py", line 152, in run_desktop_test
    result = mochitest.run_test_harness(parser, options)
  File "/home/kris/code/mozilla-central/obj-package/_tests/testing/mochitest/runtests.py", line 3074, in run_test_harness
    result = runner.runTests(options)
  File "/home/kris/code/mozilla-central/obj-package/_tests/testing/mochitest/runtests.py", line 2605, in runTests
    res = self.runMochitests(options, tests_in_manifest)
  File "/home/kris/code/mozilla-central/obj-package/_tests/testing/mochitest/runtests.py", line 2400, in runMochitests
    result = self.doTests(options, testsToRun)
  File "/home/kris/code/mozilla-central/obj-package/_tests/testing/mochitest/runtests.py", line 2705, in doTests
    debuggerInfo is not None)
  File "/home/kris/code/mozilla-central/obj-package/_tests/testing/mochitest/runtests.py", line 1655, in buildBrowserEnv
    del browserEnv["MOZ_CRASHREPORTER_SHUTDOWN"]
Command exited with status 1
Thanks :kmag. I'll be sure to fix that before pushing again.
We should move that test to browser-chrome, preferably. Plain Mochitests shouldn't be trying to test things like that. It looks like we'll also have problems with at least a few mochitest-chrome tests:
https://dxr.mozilla.org/mozilla-central/rev/bccdc684210431c233622650a91454c09f6af9eb/dom/plugins/test/mochitest/chrome.ini
Honestly I think this is important enough to fix that I would sign off on disabling the tests that break with this enabled for now, re-landing this patch, and moving the tests to browser-chrome or whatever afterwards (as long as someone signs up to do the work and we don't wind up with the tests permanently disabled).
The only plain-mochitest that fails with MOZ_CRASHREPORTER_SHUTDOWN is dom/ipc/tests/test_CrashService_crash.html. This test only runs on non-e10s. These days we only run non-e10s plain-mochitest on linux32/debug. I cannot move the test to browser-chrome (where we have other crash service tests) because it wouldn't run at all: we do not run non-e10s browser-chrome anywhere.

:adw -- It has been a few years, but it looks like you wrote test_CrashService_crash.html. 
:jimm -- Checking in with you as you are the triage owner for DOM Content Processes.
:jmaher -- fyi, in case you have an interest or a bright idea.

Can anyone suggest what we should do with this test? Since it is non-e10s only, I find myself wondering if test_CrashService_crash.html could be retired (disabled or deleted). If it still needs to run, on non-e10s, I think it needs to be moved to a new mochitest-plain subsuite, run without MOZ_CRASHREPORTER_SHUTDOWN, only on non-e10s; I'm pretty sure it would be the only such test in the new subsuite.
Flags: needinfo?(jmathies)
Flags: needinfo?(jmaher)
Flags: needinfo?(adw)
Shouldn't MOZ_CRASHREPORTER_SHUTDOWN only have an effect for content processes? We use it only here:

https://dxr.mozilla.org/mozilla-central/source/browser/modules/ContentCrashHandlers.jsm#152

So how could a non-e10s test job fail when no content process is around? Or do I miss something?
Wait, surely you're reading that wrong and it's disabled for *non*-e10s, right?
https://hg.mozilla.org/mozilla-central/annotate/d6957f004e9cc3d7408ac3a8f2b49ff97556e27f/dom/ipc/tests/mochitest.ini#l18

That would make way more sense, given that it's testing content crashes. Also, this should definitely be a chrome or browser-chrome test, it's doing all sorts of chrome things.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #18)
> Wait, surely you're reading that wrong and it's disabled for *non*-e10s,
> right?
> https://hg.mozilla.org/mozilla-central/annotate/
> d6957f004e9cc3d7408ac3a8f2b49ff97556e27f/dom/ipc/tests/mochitest.ini#l18

It's not easy to parse:

  skip-if: !(crashreporter && !e10s && (toolkit == 'gtk3' || toolkit == 'cocoa' || toolkit == 'windows'))

but I read that as:

  run-if: crashreporter && non-e10s && any-desktop-platform


I checked a sampling of test logs and found it is skipped on e10s test tasks. For example:

https://taskcluster-artifacts.net/JsK3lE2pQ4uAoj8DMVVxig/0/public/logs/live_backing.log

  03:46:27     INFO - TEST-START | dom/ipc/tests/test_CrashService_crash.html
  03:46:27     INFO - TEST-SKIP | dom/ipc/tests/test_CrashService_crash.html | took 0ms


Also, the failure that caused the backout - comment 9 - was linux32/debug non-e10s mochitest-5...and I think that was the only (non-intermittent) failure on that push.
You are correct and that conditional is extremely hard to parse. It looks like this became confusing when we removed `run-if`--this was originally a `run-if` condition that got rewritten to `skip-if`:
https://hg.mozilla.org/mozilla-central/diff/451dda813637/dom/ipc/tests/mochitest.ini

I guess that makes sense--it's a plain mochitest that explicitly spawns a content process, so that wouldn't work in e10s where the test itself would be running in a content process.

It should be possible to rewrite this as a browser-chrome test though, since we have helpers to run things in tabs etc there, like:
https://dxr.mozilla.org/mozilla-central/rev/d6957f004e9cc3d7408ac3a8f2b49ff97556e27f/browser/components/sessionstore/test/browser_crashedTabs.js#142

Running it in e10s mode in browser-chrome is fine, since it actually does want a content process.
I could get on board with converting this to browser-chrome.
Flags: needinfo?(jmaher)
(In reply to Geoff Brown [:gbrown] from comment #16)
> :adw -- It has been a few years, but it looks like you wrote

Yes, IIRC Ted is right:

(In reply to Ted Mielczarek [:ted.mielczarek] from comment #20)
> I guess that makes sense--it's a plain mochitest that explicitly spawns a
> content process, so that wouldn't work in e10s where the test itself would
> be running in a content process.
Flags: needinfo?(adw)
I'm going to push ahead with the non-problematic cases: everything except browser-chrome and non-e10s. That's good enough for now.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=17bfa3ab85b2c6e9df5fa9280f0cfa5c1883eace
Flags: needinfo?(jmathies)
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1681780e0715
Set MOZ_CRASHREPORTER_SHUTDOWN during tests, other than browser-chrome and non-e10s; r=ted
https://hg.mozilla.org/mozilla-central/rev/1681780e0715
Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
You need to log in before you can comment on or make changes to this bug.