Closed Bug 1539449 Opened 5 months ago Closed 3 months ago

Mochitest shows green on try even with a crash in content process

Categories

(Testing :: Mochitest, defect, P2)

Version 3
defect

Tracking

(firefox69 fixed)

RESOLVED FIXED
mozilla69
Tracking Status
firefox69 --- fixed

People

(Reporter: mjf, Assigned: ahal)

References

(Depends on 1 open bug)

Details

Attachments

(3 files)

I was investigating a crash in the RDD process decoding av1 with the DAV1D decoder. On a Windows asan build, it was turning the test orange. After lots of debugging around why it was failing in RDD, I tried decoding in content but the test stayed green.

Then I looked at the logs and found that the decoder was crashing the content process, but not turning the test orange. Try run here[1]. Log for content crash here[2].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=a3ec9590a68febee64fb38ad4ed4f86cad1ede88&selectedJob=236277480
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=236277480&repo=try&lineNumber=2551

See Also: → 1538474

:ahal, any thoughts on this- maybe something to pick up next month? I marked this as a P3, feel free to adjust!

Flags: needinfo?(ahal)
Priority: -- → P3

Crashes not turning jobs orange is very serious, and there are other bugs recently filed related to content crashes not producing stack traces (e.g bug 1545856). So there might be something deeper afoot.

Unfortunately the logs here have expired. Michael, was this reproducible and do you think you could craft a STR?

Flags: needinfo?(ahal) → needinfo?(mfroman)
Priority: P3 → P2
See Also: → 1545856

The bug that was causing this crash has been fixed, but prior to the fix it was 100% reproducible. I will talk with :achronop to see if we can provide a patch that will cause the crash again.

The bug that fixed the crash was Bug 1535631.

Andrew,

I re-pushed to try with something close to my original patch applied (decoding av1 on content process instead of RDD process), and it reproduced the crash, but the test stays green. I will bundle the patches I applied and attach them here later this evening. Note, that it is necessary to build/run these patches off an older base of moz-central (5edbe9b1b822), before the DAV1D decoder was fixed.

Here is the try run[1]. Here is pointer into the log[2].

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=c355dadb8ce58fd0424a5531b57a12e478dabe80&selectedJob=246689416
[2] https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=246689416&repo=try&lineNumber=1460

Flags: needinfo?(mfroman) → needinfo?(ahal)

Here is the patch to show a DAV1D decoder crash on the content process that does not turn the mochitest orange.

Apply onto 5edbe9b1b822.

Assignee: nobody → mfroman

Thanks! This looks pretty similar to bug 1545856. In there Gabriel's theory was that the crash happened before the crashreporter was able to set itself up. We can test this theory here by adding a sleep to the test harness and seeing if that causes us to get a traceback.

I think there are two issues that needs to be fixed here:

  1. Make sure that content crashes turn the task orange even if there is no stack trace
  2. Make sure there are stack traces

Problem 1) is unique to this bug (in the other one, reftest happens to error out a different way).

Flags: needinfo?(ahal)

If you want to run a test locally with my patch applied, you can do:
./mach mochitest dom/media/test/test_playback.html

Thanks, I can reproduce on try. This will be very helpful. Here were my STR:

$ hg up 5edbe9b1b822
$ curl https://bug1539449.bmoattachments.org/attachment.cgi?id=9065272 | hg import -
$ ./mach try fuzzy -q "win64asan mochimedia !spi"
Assignee: mfroman → nobody
See Also: → 1494799

There's a failure case where content processes are crashing but no stack trace
is being printed. Test harnesses often rely on the presence of a minidump to
determine whether or not there was a crash, and so in this failure mode report
success (so tasks are staying green).

This adds a new string to mozharness' error logs to make sure the task turns
orange. Note: it does not fix the lack of stack traces.

Not sure I'll be able to fix the missing stack traces, but this patch at least solves problem 1 (which is the more serious of the two). We can't land it yet though because there's a chance this might reveal pre-existing content process crashes that weren't previously detected.

If one of these exists, we'll need to figure out a landing strategy to get this into the tree ASAP without being backed out.

Assignee: nobody → ahal
Status: NEW → ASSIGNED

This patch breaks the Mn tasks because they have a unit test that purposefully triggers a content process crash (and therefore A content process has crashed gets dumped to the log and parsed by mozharness). I'll need to either hide this test output from the log or else skip checking for this substr in Mn.

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f9397b497e49
[mozharness] Ensure content crashes turn task orange even if no stack was printed, r=jmaher
https://hg.mozilla.org/integration/autoland/rev/cfc637a0af12
[marionette] Suppress Gecko output in test_crash.py to prevent mozharness failure, r=whimboo
Status: ASSIGNED → RESOLVED
Closed: 3 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69
Depends on: 1555123
Regressions: 1559902
You need to log in before you can comment on or make changes to this bug.