Closed Bug 1765785 Opened 3 years ago Closed 2 months ago

Intermittent dom/ipc/tests/test_process_error.xhtml | single tracking bug

Categories

(Core :: DOM: Content Processes, defect, P3)

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: intermittent-failure, intermittent-testcase, Whiteboard: [retriggered])

Attachments

(1 obsolete file)

No description provided.

Hi :jmaher, is this supposed to be solved by whatever fix bug 1544147 will get?

Flags: needinfo?(jmaher)

Yes, we are adjusting our bugs to remove the raw error from the bug title so that we can automate our sheriffing process more easily.

Here is how to find out which errors are related to this bug:

./mach test-info failure-report --bugid 1765785
22 errors with:
08:01:16     INFO - TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_process_error.xhtml | Test timed out. -
08:01:17     INFO - TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_process_error.xhtml | [SimpleTest.finish()] No checks actually run. (You need to call ok(), is(), or similar functions at least once.  Make sure you use SimpleTest.waitForExplicitFinish() if you need it.)
08:01:18    ERROR - TEST-UNEXPECTED-FAIL | chrome://mochitests/content/chrome/dom/ipc/tests/test_process_error.xhtml logged result after SimpleTest.finish(): [SimpleTest.finish()] No checks actually run. (You need to call ok(), is(), or similar functions at least once.  Make sure you use SimpleTest.waitForExplicitFinish() if you need it.)

  macosx1015-64-qr/debug-mochitest-chrome: 1
  macosx1015-64-qr/opt-mochitest-chrome: 3
  macosx1015-64-qr/debug-mochitest-chrome-spi-nw: 11
  macosx1015-64-shippable-qr/opt-mochitest-chrome-spi-nw: 2
  macosx1015-64-shippable-qr/opt-mochitest-chrome: 2
  macosx1015-64-qr/opt-mochitest-chrome-spi-nw: 3

In this report there is one error block, which occurs on the configurations listed. If there were a second type of error, it would show up as a separate block of lines.
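As a rough illustration of how to read the per-configuration lines, here is a small Python sketch that tallies them. It is not part of mach; the `parse_config_counts` helper and the assumed line shape (`<platform>/<build-and-suite>: <count>`) are mine, and the input is simply the block copied from the report above.

```python
import re

# Per-configuration lines copied from the failure report quoted above.
REPORT = """\
  macosx1015-64-qr/debug-mochitest-chrome: 1
  macosx1015-64-qr/opt-mochitest-chrome: 3
  macosx1015-64-qr/debug-mochitest-chrome-spi-nw: 11
  macosx1015-64-shippable-qr/opt-mochitest-chrome-spi-nw: 2
  macosx1015-64-shippable-qr/opt-mochitest-chrome: 2
  macosx1015-64-qr/opt-mochitest-chrome-spi-nw: 3
"""

# Assumed line shape: "<platform>/<build-and-suite>: <count>".
CONFIG_LINE = re.compile(r"^\s*(?P<config>\S+/\S+):\s+(?P<count>\d+)\s*$")


def parse_config_counts(report_text):
    """Return {configuration: failure count} for one error block (hypothetical helper)."""
    counts = {}
    for line in report_text.splitlines():
        match = CONFIG_LINE.match(line)
        if match:
            counts[match.group("config")] = int(match.group("count"))
    return counts


counts = parse_config_counts(REPORT)
total = sum(counts.values())            # should agree with the "N errors with:" header
worst = max(counts, key=counts.get)     # configuration with the most failures
print(total, worst)
```

The per-configuration counts add up to the 22 errors in the header, and half of them come from the debug spi-nw configuration.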

Flags: needinfo?(jmaher)

There seem to be two different issues at work here and in bug 1544147, though the root cause might be the same. In some cases the test hits the point where the child process should crash, but no minidump appears to be generated and the parent process receives an ipc:content-shutdown message with an empty dump ID. We also find this line in the output:

JavaScript error: chrome://mochitests/content/chrome/dom/ipc/tests/process_error.xhtml, line 38: NS_ERROR_NOT_AVAILABLE: Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsIPropertyBag2.getPropertyAsBool]

The missing property indicates that CrashReporterHost::GenerateCrashReport() failed. That function has several failure modes, but the most common is that minidump generation failed and we couldn't find a dump. Indeed, no dump was left behind, which suggests this was the actual failure mode. More importantly, this matches what we see in errors like this one or the macOS ones such as this one. In both cases a minidump was generated, but the test harness only saw it after timing out. Unfortunately the task neither retained the minidump nor processed it, so we can't be sure, but it's likely to be the minidump we were expecting.

Putting both these failures together points to the likely root cause: the minidump generator being too slow or failing outright. While the latter scenario is possible, and we've seen it happen, it's not particularly worrisome because we could debug it. The former scenario, however (the generator being too slow), is really odd, because in theory ContentParent::ActorDestroy() shouldn't be able to race past it and call CrashReporterHost::GenerateCrashReport() before the minidump is generated. The minidump generator keeps the crashed process frozen until it's finished, thus preventing the IPC channel from being torn down, and holds a lock while doing so. So even if CrashReporterHost::GenerateCrashReport() somehow races past, it should still block on the lock.
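To make the locking argument concrete, here is a toy Python model of the expected ordering. It is not Gecko code: the `simulate` function, the thread bodies and the event strings are all hypothetical stand-ins, and the only assumption taken from the description above is that the generator holds a lock for the whole time it writes the dump.

```python
import threading
import time


def simulate(dump_seconds=0.05):
    """Toy model: a slow generator holds a lock; the report step must wait for it."""
    dump_lock = threading.Lock()
    generator_has_lock = threading.Event()
    events = []

    def minidump_generator():
        # Stand-in for the crash generator thread: hold the lock while "writing".
        with dump_lock:
            generator_has_lock.set()
            time.sleep(dump_seconds)  # simulate a slow minidump write
            events.append("minidump written")

    def generate_crash_report():
        # Stand-in for the main-thread path: it starts once the generator is
        # busy, then blocks on the same lock until the dump is finished.
        generator_has_lock.wait()
        with dump_lock:
            events.append("crash report generated")

    threads = [
        threading.Thread(target=generate_crash_report),
        threading.Thread(target=minidump_generator),
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return events


print(simulate())
```

In this model the report step can never run before the dump is written, however slow the generator is. That is why a slow generator alone should not explain the missing dump ID, unless the real code has a path that skips the lock.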

In the end the only two possible explanations are a race between the main thread and the crash generator thread in the main process, or an outright failure of the crash generator. The only way to tell is to add two bits of functionality:

  • Warnings across the failure paths of the crash generator to figure out exactly where we're failing
  • A mechanism to scoop up the leftover minidumps so that we can verify whether they're what we expect them to be (or whether they're corrupt, partially written, etc.)

I'll file bugs for both.
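For the second item, a first-pass check on scooped-up dumps could look something like the Python sketch below. This is only an assumption about what such a mechanism might do, not what the harness implements: the function names and the `*.dmp` glob are hypothetical, and the check is deliberately shallow. It only looks at the file size and the `MDMP` signature at the start of the 32-byte minidump header.

```python
from pathlib import Path

MINIDUMP_SIGNATURE = b"MDMP"   # first four bytes of a minidump header
MINIDUMP_HEADER_SIZE = 32      # size in bytes of the fixed minidump header


def classify_minidump_bytes(data):
    """Shallow sanity check on the leading bytes of a dump (hypothetical helper)."""
    if not data:
        return "empty"
    if len(data) < MINIDUMP_HEADER_SIZE:
        return "truncated"
    if data[:4] != MINIDUMP_SIGNATURE:
        return "bad-signature"
    return "plausible"


def scan_leftover_minidumps(directory):
    """Classify every *.dmp file found directly inside the given directory."""
    results = {}
    for path in sorted(Path(directory).glob("*.dmp")):
        with path.open("rb") as handle:
            header = handle.read(MINIDUMP_HEADER_SIZE)
        results[path.name] = classify_minidump_bytes(header)
    return results
```

A "plausible" result only means the header looks right. Telling whether the dump is the one the test expected would still require processing it properly.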

Depends on: 1769948
Severity: -- → S3
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Attachment #9386065 - Attachment is obsolete: true
Status: REOPENED → RESOLVED
Closed: 1 year ago → 5 months ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Status: REOPENED → RESOLVED
Closed: 5 months ago → 2 months ago
Resolution: --- → INCOMPLETE