Closed
Bug 1091322
Opened 10 years ago
Closed 9 years ago
Intermittent test_zmedia_cleanup.html | application crashed [@ mozilla::layers::Compositor::AssertOnCompositorThread()] after Assertion failure: CompositorParent::CompositorLoop() == MessageLoop::current() (Can only call this from the compositor thread!),
Categories
(Core :: Graphics: Layers, defect)
Tracking
()
RESOLVED
WORKSFORME
Flag | Tracking | Status |
---|---|---|
e10s | + | --- |
People
(Reporter: KWierso, Unassigned)
References
(Blocks 1 open bug)
Details
(Keywords: intermittent-failure)
16:45:46 INFO - nsStringStats
16:45:46 INFO - => mAllocCount: 2923512
16:45:46 INFO - => mReallocCount: 296938
16:45:46 INFO - => mFreeCount: 2923476 -- LEAKED 36 !!!
16:45:46 INFO - => mShareCount: 4927300
16:45:46 INFO - => mAdoptCount: 145709
16:45:46 INFO - => mAdoptFreeCount: 145709
16:45:46 INFO - => Process ID: 1875, Thread ID: 140698723338688
16:45:46 INFO - TEST-INFO | Main app process: killed by SIGSEGV
16:45:46 INFO - 1815 INFO TEST-START | Shutdown
16:45:46 INFO - 1816 INFO Passed: 214309
16:45:46 INFO - 1817 INFO Failed: 0
16:45:46 INFO - 1818 INFO Todo: 24138
16:45:46 INFO - 1819 INFO Slowest: 230736ms - /tests/dom/imptests/editing/conformancetest/test_runtest.html
16:45:46 INFO - 1820 INFO SimpleTest FINISHED
16:45:46 INFO - 1821 INFO TEST-INFO | Ran 1 Loops
16:45:46 INFO - 1822 INFO SimpleTest FINISHED
16:45:46 INFO - 1823 ERROR TEST-UNEXPECTED-FAIL | /tests/dom/media/tests/mochitest/test_zmedia_cleanup.html | application terminated with exit code 11
16:45:46 INFO - runtests.py | Application ran for: 0:44:05.438840
16:45:46 INFO - zombiecheck | Reading PID log: /tmp/tmpZE5q1Cpidlog
16:45:46 INFO - ==> process 1837 launched child process 1875
16:45:46 INFO - ==> process 1875 launched child process 5172
16:45:46 INFO - zombiecheck | Checking for orphan process with PID: 1875
16:45:46 INFO - zombiecheck | Checking for orphan process with PID: 5172
16:45:55 INFO - mozcrash Saved minidump as /builds/slave/test/build/blobber_upload_dir/7a73cf6d-1b7c-84d1-3f07168f-7f967c13.dmp
16:45:55 INFO - mozcrash Saved app info as /builds/slave/test/build/blobber_upload_dir/7a73cf6d-1b7c-84d1-3f07168f-7f967c13.extra
16:45:55 WARNING - PROCESS-CRASH | /tests/dom/media/tests/mochitest/test_zmedia_cleanup.html | application crashed [@ mozilla::layers::Compositor::AssertOnCompositorThread()]
16:45:55 INFO - Crash dump filename: /tmp/tmpx0nahD.mozrunner/minidumps/7a73cf6d-1b7c-84d1-3f07168f-7f967c13.dmp
16:45:55 INFO - Operating system: Linux
16:45:55 INFO - 0.0.0 Linux 3.2.0-23-generic #36-Ubuntu SMP Tue Apr 10 20:39:51 UTC 2012 x86_64
16:45:55 INFO - CPU: amd64
16:45:55 INFO - family 6 model 62 stepping 4
16:45:55 INFO - 1 CPU
16:45:55 INFO - Crash reason: SIGSEGV
16:45:55 INFO - Crash address: 0x0
16:45:55 INFO - Thread 17 (crashed)
16:45:55 INFO - 0 libxul.so!mozilla::layers::Compositor::AssertOnCompositorThread() [Compositor.cpp:0c50940fd16d : 46 + 0x18]
Comment 12•10 years ago
Without any gfx people CCed, this bug is *going* places.
Comment 37•10 years ago
The compositor shutdown goes like this:
// [some stuff]
sCompositorThreadHolder = nullptr;
while (!sFinishedCompositorShutDown) {
  NS_ProcessNextEvent(nullptr, true); // the main thread waits here while the compositor thread crashes
}
After sCompositorThreadHolder is nulled, AssertOnCompositorThread will always return false because what it does is check that the current MessageLoop is the one stored in the thread kept by sCompositorThreadHolder.
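To make the ordering problem concrete, here is a minimal, self-contained C++ model of the check described above. It is a sketch under simplifying assumptions, not Gecko code: `ThreadHolder`, `gHolder`, `IsOnCompositorThread` and `CheckFromWorker` are hypothetical names, and the sketch compares `std::thread::id` values where the real code compares the current MessageLoop with the one owned by the holder's thread.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Hypothetical, simplified holder: records which thread is "the compositor".
struct ThreadHolder {
  std::thread::id compositorThreadId;
};

// Stands in for sCompositorThreadHolder.
std::unique_ptr<ThreadHolder> gHolder;

// True only if a holder exists and the caller runs on the thread it records.
// Once gHolder is reset, this returns false on every thread, including the
// thread that really is the compositor thread.
bool IsOnCompositorThread() {
  return gHolder && gHolder->compositorThreadId == std::this_thread::get_id();
}

// Starts a worker that plays the compositor thread, records its id in the
// holder, optionally nulls the holder (as shutdown does before its wait
// loop), then lets the worker run the check and returns the worker's result.
bool CheckFromWorker(bool nullHolderBeforeCheck) {
  std::atomic<bool> go{false};
  bool onCompositorThread = false;
  std::thread worker([&] {
    while (!go.load()) {
      std::this_thread::yield();
    }
    onCompositorThread = IsOnCompositorThread();
  });
  gHolder = std::make_unique<ThreadHolder>(ThreadHolder{worker.get_id()});
  if (nullHolderBeforeCheck) {
    gHolder = nullptr;  // mirrors "sCompositorThreadHolder = nullptr;"
  }
  go.store(true);  // publishes the gHolder writes to the worker
  worker.join();   // makes onCompositorThread visible to this thread
  gHolder = nullptr;
  return onCompositorThread;
}
```

`CheckFromWorker(false)` returns true, while `CheckFromWorker(true)` returns false: the worker is the thread recorded as the compositor thread in both cases, but once the holder has been nulled the check fails anyway, which matches the assertion firing during shutdown.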
Benoit, can we null out the pointer after the while loop to avoid this, or is there a dependency on the compositor thread holder being nulled before the loop?
Flags: needinfo?(bjacob)
Comment 97•10 years ago
Can we please find an active owner for this frequent e10s crash? It's contributing to an entire test suite being hidden at the moment.
tracking-e10s:
--- → ?
Flags: needinfo?(milan)
Comment 102•10 years ago
Let's see if Benoit can take a look after we deal with 33.* issues.
Assignee: nobody → bjacob
Flags: needinfo?(milan)
Comment 103•10 years ago
I'll see if I can find a regressing changeset to back out in the meantime.
Comment 104•10 years ago
Can't help but notice bug 1088898 landing around the time that this started.
Updated•10 years ago
Blocks: e10s-tests
Comment 109•10 years ago
FYI the test test_zmedia_cleanup.html does not draw anything. It is supposed to turn off the WiFi network interface on the B2G emulator, so it only really does something on B2G.
That being said, the network-related tests on the B2G emulator are currently turned off anyhow, so in theory we could simply turn this test off, as it should not be needed. Obviously that could also simply hide the real problem here.
Comment 110•10 years ago
https://hg.mozilla.org/mozilla-central/pushloghtml?changeset=8345ae427a3f appears to be when it started.
Comment 111•10 years ago
Retriggers are strongly pointing to this push as when it started.
https://treeherder.mozilla.org/ui/#/jobs?repo=b2g-inbound&revision=edf60abe62a5
Blocks: 998872
Comment 112•10 years ago
Sean, I'm sorry to put you in this situation, but this failure is currently a major contributor to a test suite being hidden by default on Treeherder due to how often it fails. Unfortunately, the gfx team is tied up with OMTC firefighting at the moment, so I'm afraid that backing out bug 998872 is our only realistic short-term option for getting this resolved.
Flags: needinfo?(selin)
Comment 120•10 years ago
Ryan, another short-term solution could be to disable test_zmedia_cleanup.html right now. test_zmedia_cleanup.html is not really a test by itself, so we are not losing anything by disabling it. In that case we should open another bug to investigate and clean this up once the gfx team has time again. I should also note that if we want to re-enable the WebRTC tests on the B2G emulator, the disabled test_zmedia_cleanup.html would become a blocker.
Comment 121•10 years ago
Try run of bug 998872 (and some deps that landed on top of it) backed out:
https://tbpl.mozilla.org/?tree=Try&rev=f573c2e79394
Comment 122•10 years ago
(In reply to Nils Ohlmeier [:drno] from comment #120)
> Ryan, another short term solution could be to disable
> test_zmedia_cleanup.html right now. test_zmedia_cleanup.html is not really a
> test by itself, so we are not losing anything by disabling it. In this case
> we should open another bug report to investigate and clean this up once the
> gfx team has time again. And I should note that if we would want to
> re-enable the WebRTC tests on B2G emulator then the disabled
> test_zmedia_cleanup.html would become a blocker.
These failures are happening on desktop Firefox w/ e10s enabled, not B2G.
Comment 123•10 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #122)
> These failures are happening on desktop Firefox w/ e10s enabled, not B2G.
I know, which makes it even stranger.
Ideally test_zmedia_cleanup.html would only get executed on B2G, but unfortunately our manifests don't allow specifying just an 'if' condition. So the test itself contains code to figure out whether it runs on the B2G emulator.
Comment 124•10 years ago
Ah, I understand what you're saying now. Here's a Try run of test_zmedia_cleanup.html disabled:
https://tbpl.mozilla.org/?tree=Try&rev=a637782a99a2
Comment 126•10 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #112)
> Sean, I'm sorry to put you in this situation, but this failure is currently
> a major contributor to a test suite being hidden by default on Treeherder
> due to how often it fails. Unfortunately, the gfx team is tied up with OMTC
> firefighting at the moment, so I'm afraid that backing out bug 998872 is our
> only realistic short-term option here to getting this resolved.
Hi Ryan,
Actually I'm more inclined toward Nils' solution. Disabling test_zmedia_cleanup.html, which appears relevant only to WebRTC on B2G, seems more acceptable to me as a short-term fix.
Flags: needinfo?(selin)
Comment 127•10 years ago
...if it works.
Comment 129•10 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #121)
> Try run of bug 998872 (and some deps that landed on top of it) backed out:
> https://tbpl.mozilla.org/?tree=Try&rev=f573c2e79394
The backout Try push looks green, so that at least confirms it to be a valid way of fixing this. We'll see what the test disabling run looks like (my concern being that it'll just move the crash to another test since test_zmedia_cleanup.html should be a no-op on desktop anyway per drno's comments).
Comment 130•10 years ago
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #124)
> Ah, I understand what you're saying now. Here's a Try run of
> test_zmedia_cleanup.html disabled:
> https://tbpl.mozilla.org/?tree=Try&rev=a637782a99a2
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #129)
> (my concern being that it'll just move the crash to another test since
> test_zmedia_cleanup.html should be a no-op on desktop anyway per drno's
> comments).
And sure enough, that's exactly what happens. The crash just moves to test_peerConnection_toJSON.html instead.
https://tbpl.mozilla.org/php/getParsedLog.php?id=51852297&tree=Try
So yeah, backing out is the only way to get to green in the short term unless the gfx team comes up with the manpower to solve this on their end.
Flags: needinfo?(selin)
Comment 131•10 years ago
I'm thinking of disabling all TV tests for e10s on Linux debug builds as a short-term workaround. Here's the try run, and it looks good to me.
https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=4313f438e8df
Flags: needinfo?(selin)
Comment 136•10 years ago
(In reply to Sean Lin [:seanlin] from comment #131)
> I'm thinking of disabling all TV tests for e10s on Linux debug builds as a
> short-term workaround. Here's the try run and it looks good to me.
> https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=4313f438e8df
Looks great, thanks for doing that!
Comment 137•10 years ago
TV tests disabled.
https://hg.mozilla.org/mozilla-central/rev/9ae56a13f079
Comment 139•10 years ago
(In reply to Nicolas Silva [:nical] from comment #37)
> The shutdown of compositor goes like this:
>
> // [some stuff]
> sCompositorThreadHolder = nullptr;
>
> while (!sFinishedCompositorShutDown) {
> NS_ProcessNextEvent(nullptr, true); // the main thread is waiting here
> while the compositor thread crashes
> }
>
>
> After sCompositorThreadHolder is nulled, AssertOnCompositorThread will
> always return false because what it does is check that the current
> MessageLoop is the one stored in the thread kept by sCompositorThreadHolder.
>
> Benoit, can we null out the pointer after the while loop to avoid this or is
> there a dependency on the compositor thread holder being nulled before?
Sorry for not getting to this earlier.
sFinishedCompositorShutDown is only set to true here:
/* static */ void
CompositorThreadHolder::DestroyCompositorThread(Thread* aCompositorThread)
{
  MOZ_ASSERT(NS_IsMainThread());
  MOZ_ASSERT(!sCompositorThreadHolder, "We shouldn't be destroying the compositor thread yet.");
  DestroyCompositorMap();
  delete aCompositorThread;
  sFinishedCompositorShutDown = true;
}
(in the same file).
This means that it will never be set to true as long as sCompositorThreadHolder remains a strong reference held on the CompositorThreadHolder singleton. So if you move the 'sCompositorThreadHolder = nullptr' line after this while loop, the while loop will never terminate.
Just try it :-)
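The hang described above can be modelled in a few lines of single-threaded C++. This is a hypothetical sketch, not the real shutdown code: the deque stands in for the main thread's event queue drained by NS_ProcessNextEvent, a plain counter stands in for the CompositorThreadHolder's refcounting, and we assume for illustration that the teardown which sets the finished flag runs as a main-thread event once the last reference is released. The spin limit exists only so the demo terminates; the real loop would block forever.

```cpp
#include <cassert>
#include <deque>
#include <functional>

// Hypothetical stand-in for the main thread's event queue.
std::deque<std::function<void()>> gMainThreadEvents;
// Stands in for sFinishedCompositorShutDown.
bool gFinishedShutdown = false;

// Stands in for the ref-counted CompositorThreadHolder singleton.
struct Holder {
  int refCount = 1;  // the static strong reference (sCompositorThreadHolder)
};

// Drops one reference and nulls the caller's pointer. When the count reaches
// zero, the teardown (DestroyCompositorThread in this model) is queued on the
// main thread; only that teardown sets the finished flag.
void Release(Holder*& holder) {
  if (--holder->refCount == 0) {
    delete holder;
    gMainThreadEvents.push_back([] { gFinishedShutdown = true; });
  }
  holder = nullptr;
}

// Runs the shutdown sequence and reports whether the wait loop finished
// within maxSpins iterations.
bool RunShutdown(bool releaseBeforeLoop, int maxSpins) {
  gMainThreadEvents.clear();
  gFinishedShutdown = false;
  Holder* sHolder = new Holder;
  if (releaseBeforeLoop) {
    Release(sHolder);  // current order: null the static reference first
  }
  for (int spins = 0; !gFinishedShutdown && spins < maxSpins; ++spins) {
    // Roughly like NS_ProcessNextEvent, but without blocking on an empty queue.
    if (!gMainThreadEvents.empty()) {
      std::function<void()> event = gMainThreadEvents.front();
      gMainThreadEvents.pop_front();
      event();
    }
  }
  const bool finished = gFinishedShutdown;
  if (sHolder) {
    Release(sHolder);  // proposed order: release only after the loop
  }
  gMainThreadEvents.clear();
  return finished;
}
```

`RunShutdown(true, 100)` finishes, whereas `RunShutdown(false, 100)` hits the spin limit: the only event that could end the loop is queued by the release, which the reordered code only performs after the loop.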
Flags: needinfo?(bjacob)
Comment 140•10 years ago
Ok, then I suppose we should have AssertOnCompositorThread only do the assertion while sCompositorThreadHolder is not null, or have some other state marking that the assertion should do nothing after a certain point in the shutdown.
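One possible shape of that proposal, sketched in C++ with hypothetical names (`gHolderAlive`, `gCompositorThreadId`, `CompositorThreadCheckPasses`) standing in for sCompositorThreadHolder and the thread it records. It only illustrates the idea in this comment and is not taken from any landed patch; the globals are unsynchronized, so a real version would need atomics or a lock.

```cpp
#include <cassert>
#include <thread>

// Hypothetical stand-ins: gHolderAlive plays "sCompositorThreadHolder is not
// null" and gCompositorThreadId plays the thread the holder records.
// Unsynchronized globals: acceptable for this single-threaded sketch only.
bool gHolderAlive = false;
std::thread::id gCompositorThreadId;

// While the holder is alive, pass only on the compositor thread.
// After it is released, deliberately pass everywhere so late shutdown
// cannot trip the check.
bool CompositorThreadCheckPasses() {
  if (!gHolderAlive) {
    return true;
  }
  return std::this_thread::get_id() == gCompositorThreadId;
}

// Debug-only assertion built on the check above (compiled out with NDEBUG).
void AssertOnCompositorThread() {
  assert(CompositorThreadCheckPasses() &&
         "Can only call this from the compositor thread!");
}
```

The trade-off is that late in shutdown the assertion no longer catches genuine off-thread calls, so it silences the crash rather than proving the callers are correct.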
Comment 144•10 years ago
Did disabling those TV tests change the chunking, and that's why it worked? These last few were the (non-intermittent) result of bug 1117650 moving some tests from alphabetically before media to after media, and thus pulling the webrtc tests from being at the start of debug e10s m3 to being at the end of debug e10s m2, where they have to deal with shutdown, which they fail to deal with. I backed it out, to get the tree green, but I don't feel good about that, and since your tests apparently simply aren't capable of running without having some other tests to run after them as a buffer, I really should have disabled all of your tests instead.
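The chunking effect described above can be illustrated with a small C++ sketch. It assumes a simple count-based split of the alphabetically sorted test list into contiguous, near-equal chunks; the real harness may chunk differently (for example by directory), and the function name and test names are made up. Moving one test from before `media` to after it shifts every media test one slot earlier, which is enough to pull the first media test from the start of chunk 3 to the end of chunk 2, right before shutdown.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical chunker: returns chunk `thisChunk` (1-based) out of
// `totalChunks` contiguous chunks of an already sorted test list.
// Boundaries are floor(n * k / totalChunks), so sizes differ by at most one.
std::vector<std::string> Chunk(const std::vector<std::string>& sortedTests,
                               int thisChunk, int totalChunks) {
  const std::size_t n = sortedTests.size();
  const std::size_t total = static_cast<std::size_t>(totalChunks);
  const std::size_t begin = n * static_cast<std::size_t>(thisChunk - 1) / total;
  const std::size_t end = n * static_cast<std::size_t>(thisChunk) / total;
  return std::vector<std::string>(
      sortedTests.begin() + static_cast<std::ptrdiff_t>(begin),
      sortedTests.begin() + static_cast<std::ptrdiff_t>(end));
}
```

With nine tests in three chunks, `{"a1", ..., "a6", "media1", "media2", "z1"}` puts `media1` first in chunk 3; renaming `a6` to `zz` (so it sorts after media) makes `media1` the last test of chunk 2.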
Comment 145•10 years ago
Further evidence that WebRTC, rather than gfx, is causing the trouble: that change also moved the WebRTC tests in the ASAN test chunking, and running near shutdown without other tests afterwards to act as a buffer and give WebRTC time to fizzle out results in https://treeherder.mozilla.org/logviewer.html#?job_id=5696825&repo=mozilla-inbound as well.
Comment 146•10 years ago
Maire, do you have anybody who can look into this soonish? This is now actively going to be blocking other devs from landing seemingly-unrelated patches due to chunking changes on B2G.
Assignee: jacob.benoit.1 → nobody
Flags: needinfo?(mreavy)
Comment 147•10 years ago
I can try adding code that stops all media playing in the WebRTC tests, to make life easier for Gfx when it gets to cleanup...
Comment 148•10 years ago
(In reply to Nils Ohlmeier [:drno] from comment #147)
> I can try to add stopping all media playing in the WebRTC tests to make
> the life of Gfx easier when it gets to cleanup...
Nils -- Please try that and let's then assess how much that helps (if it's sufficient or if more is needed).
Flags: needinfo?(mreavy)
Comment 149•10 years ago
Reproduced the problem on try:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=599e6e3705bb
But even my change, which stops/pauses all media at the end of each test, does not prevent the problem:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=b8670b05b212
So I'm out of ideas about what we can do on the WebRTC test side to avoid this problem. If someone from the Gfx side could explain what is actually causing the problem here, I'll be happy to adjust the WebRTC tests.
Comment 150•10 years ago
Milan, can you suggest someone who might be able to help? This bug is blocking other patches from landing, which is not so cool :(
Flags: needinfo?(milan)
Comment 152•10 years ago
(In reply to Milan Sreckovic [:milan] from comment #151)
> Does the patch from bug 1122722 help?
Flags: needinfo?(drno)
Comment 153•10 years ago
I started a try run with the patches from bug 1117650 and bug 1122722... leaving the needinfo set to remind myself to check the result later.
Flags: needinfo?(drno)
Comment 154•10 years ago
And obviously I wanted to remove the check mark before saving...
Flags: needinfo?(drno)
Comment 155•10 years ago
So it looks pretty green with that patch included: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cabfcd658a3
I verified that mochitest chunk 2 executes test_zmedia_cleanup, and I retriggered it another 10 times. If those runs do not fail, I think we have a fix for the problem.
Comment 156•10 years ago
So after quite a few re-runs of the test, only one other intermittent problem showed up. So I'm fairly confident that bug 1122722 fixes this issue.
Depends on: 1122722
Flags: needinfo?(drno)
Comment 157•10 years ago
(In reply to Nils Ohlmeier [:drno] from comment #156)
> So after quite a few re-runs of the test only one other intermittent
> problem showed up. So I'm fairly confident that bug 1122722 fixes this issue.
OK, I'll get bug 1122722 through reviews and do a try run.
Comment 158•9 years ago
Inactive; closing (see bug 1180138).
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME