Closed Bug 1346126 Opened 3 years ago Closed 2 years ago
ASSERTION: Child has wrong manager: 'Error', file /home/worker/workspace/build/src/gfx/layers/Layers
.cpp, line 1003
This was a weird failure on the graphics branch.  The assertion repeats and fills up the whole log.  19:10:43 INFO - [Parent 2027] ###!!! ASSERTION: Child has wrong manager: 'Error', file /home/worker/workspace/build/src/gfx/layers/Layers.cpp, line 1003 19:11:24 INFO - #01: mozilla::layers::ContainerLayerComposite::~ContainerLayerComposite() [gfx/layers/composite/ContainerLayerComposite.cpp:678] 19:11:24 INFO - 19:11:24 INFO - #02: mozilla::layers::Layer::Release() [gfx/layers/Layers.h:785] 19:11:24 INFO - 19:11:24 INFO - #03: mozilla::layers::LayerTransactionParent::RecvReleaseLayer(mozilla::layers::LayerHandle const&) [mfbt/Maybe.h:447] 19:11:24 INFO - 19:11:24 INFO - #04: mozilla::layers::PLayerTransactionParent::OnMessageReceived(IPC::Message const&) [obj-firefox/ipc/ipdl/PLayerTransactionParent.cpp:268] 19:11:24 INFO - 19:11:24 INFO - #05: mozilla::layers::PCompositorBridgeParent::OnMessageReceived(IPC::Message const&) [obj-firefox/ipc/ipdl/PCompositorBridgeParent.cpp:562] 19:11:24 INFO - 19:11:24 INFO - #06: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) [ipc/glue/MessageChannel.h:535] 19:11:24 INFO - 19:11:24 INFO - #07: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message&&) [ipc/glue/MessageChannel.cpp:1730] 19:11:24 INFO - 19:11:24 INFO - #08: mozilla::ipc::MessageChannel::RunMessage(mozilla::ipc::MessageChannel::MessageTask&) [ipc/glue/MessageChannel.cpp:1604] 19:11:24 INFO - 19:11:24 INFO - #09: mozilla::ipc::MessageChannel::MessageTask::Run() [xpcom/threads/Monitor.h:36] 19:11:24 INFO - 19:11:24 INFO - #10: MessageLoop::RunTask(already_AddRefed<mozilla::Runnable>) [mfbt/RefPtr.h:62] 19:11:24 INFO - 19:11:24 INFO - #11: MessageLoop::DoWork() [mfbt/AlreadyAddRefed.h:101] 19:11:24 INFO - 19:11:24 INFO - #12: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_default.cc:36] 19:11:24 INFO - 19:11:24 INFO - #13: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:502] 19:11:24 INFO - 19:11:24 INFO - #14: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:182] 19:11:24 INFO - 19:11:24 INFO - #15: ThreadFunc(void*) [ipc/chromium/src/base/platform_thread_posix.cc:39] 19:11:24 INFO - 19:11:24 INFO - #16: libsystem_pthread.dylib + 0x405a 19:11:24 INFO -  https://treeherder.mozilla.org/#/jobs?repo=graphics&selectedJob=82900636  https://treeherder.mozilla.org/logviewer.html#?job_id=82900636&repo=graphics&lineNumber=756688
This is indeed a weird error. Even though it happened on the graphics branch it happened on a non-WR build so I don't think it's related to webrender (at least, not directly). If we look I bet we would find instances of this on other trees as well. It's just not obvious because the TH summary just says the log filled up. The only other large-ish changes to the graphics code that might caused this that I know of are the work dvander's been doing to remove compositors and refactor stuff. /cc dvander and mattwoodrow.
2 years ago
Priority: -- → P3
I'm seeing these reported in bug 1411358: The assertion message floods the test log and eventually the job times out: https://treeherder.mozilla.org/logviewer.html#?job_id=176630915&repo=mozilla-inbound&lineNumber=1861540 https://treeherder.mozilla.org/logviewer.html#?job_id=176379965&repo=mozilla-inbound&lineNumber=1643628 https://treeherder.mozilla.org/logviewer.html#?job_id=176043776&repo=mozilla-central&lineNumber=1488851
This also floods bug 1443043 with OS X failures and the logs grow to 500+ MB which even makes the log viewer slow.
:davidb -- Can we bump up the priority / find someone to have a look at this soon? It seems this condition is implicated in a variety of bugs, causing a fair amount of triage confusion. If we are going to ignore the assertion, perhaps it could simply be removed?
Any thoughts on who could investigate here? (I think Matt reviewed bjacob's code in this area)
I still do not have any idea why this is happening. Best guess is something to do with layer tree adoption. I see we don't use RemoveAllChildren here, like we do in ClientLayerContainer. RemoveAllChildren doesn't use this assert because it knows that it's removing it's own children. Let's use that here and see if it either fixes this issue or makes it more obvious.
Assignee: nobody → rhunt
Attachment #8982569 - Flags: review?(matt.woodrow) → review+
Pushed by firstname.lastname@example.org: https://hg.mozilla.org/integration/mozilla-inbound/rev/b9814173e9a8 Use RemoveAllChildren in ~ContainerLayerComposite. r=mattwoodrow
Can this land on Beta too? Pretty sure the same issue is present there also.
Yeah this can land on beta, it's a very simple patch.
Comment on attachment 8982569 [details] [diff] [review] remove-all-children.patch Approval Request Comment [Feature/Bug causing the regression]: Unknown situation triggering an assertion when removing a layer and causing an infinite loop [User impact if declined]: Unsure if this affects users, definitely affects our automated tests [Is this code covered by automated tests?]: Yes, intermittently [Has the fix been verified in Nightly?]: Yes, it doesn't crash more [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: No [Why is the change risky/not risky?]: It slightly changes the method we remove child layers to use a more efficient one that one loop infinitely in this case. [String changes made/needed]: None
Attachment #8982569 - Flags: approval-mozilla-beta?
Comment on attachment 8982569 [details] [diff] [review] remove-all-children.patch Low-risk fix to avoid an ugly logbloat situation in CI. Approved for 61.0b12.
Attachment #8982569 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.