Closed
Bug 1335745
Opened 7 years ago
Closed 7 years ago
Intermittent linux-qr TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/dom/xbl/crashtests/336960-1.html | application terminated with exit code 11 | application crashed [@ nsPresContext::NotifyDidPaintForSubtree]
Categories
(Core :: Graphics: WebRender, defect, P3)
RESOLVED
FIXED
mozilla55
Tracking | Status | |
---|---|---|
firefox52 | --- | unaffected |
firefox-esr52 | --- | unaffected |
firefox53 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | fixed |
People
(Reporter: kats, Assigned: kats)
References
Details
(Keywords: intermittent-failure, Whiteboard: [gfx-noted])
Attachments
(1 file)
An intermittent failure in QR crashtests that's only shown up once so far. But the crash stack contains QR-specific code, so it's quite plausible that this is a bug in the QR code that needs fixing.

https://treeherder.mozilla.org/logviewer.html#?job_id=73556375&repo=graphics&lineNumber=5348

Top of the crash stack looks like this:

0 libxul.so!nsPresContext::NotifyDidPaintForSubtree [nsPresContext.cpp:84b84e7610ee : 2591 + 0x0]
1 libxul.so!nsView::DidCompositeWindow [nsView.cpp:84b84e7610ee : 1087 + 0x13]
2 libxul.so!mozilla::layers::WebRenderLayerManager::DidComposite [WebRenderLayerManager.cpp:84b84e7610ee : 417 + 0x16]
3 libxul.so!mozilla::layers::CompositorBridgeChild::RecvDidComposite [CompositorBridgeChild.cpp:84b84e7610ee : 584 + 0x13]
4 libxul.so!mozilla::layers::PCompositorBridgeChild::OnMessageReceived [PCompositorBridgeChild.cpp:84b84e7610ee : 1537 + 0x21]
5 libxul.so!mozilla::ipc::MessageChannel::DispatchAsyncMessage [MessageChannel.cpp:84b84e7610ee : 1781 + 0x6]
6 libxul.so!mozilla::ipc::MessageChannel::DispatchMessage [MessageChannel.cpp:84b84e7610ee : 1716 + 0xb]
7 libxul.so!mozilla::ipc::MessageChannel::RunMessage [MessageChannel.cpp:84b84e7610ee : 1589 + 0xb]
8 libxul.so!mozilla::ipc::MessageChannel::MessageTask::Run [MessageChannel.cpp:84b84e7610ee : 1622 + 0xc]
9 libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:84b84e7610ee : 1261 + 0x6]
Updated•7 years ago
Assignee: nobody → sotaro.ikeda.g
Assignee
Comment 4•7 years ago
I talked to Timothy about this and he did some investigation. His notes:

===
I did a few more try pushes. Looks like the test that is failing is 336744-1.html. In it we have a xul popup. In which case it makes sense that we would get a DidComposite msg for painting to the widget of the xul popup, and that DidComposite would go to a view in the content document that contains the xul popup. And in this case that document is not a root document.

The actual assert/crash usually happens several tests after 336744-1.html is finished, so I think the test is still alive in the bf cache, but it got disconnected from its root prescontext (which is normally what happens to bf cached documents).

So the question is why are we getting a DidComposite msg for a popup that is no longer on screen? Possibilities:
1) the popup didn't actually close because of some bug and so it's still painting
2) the DidComposite msg just took a little long to arrive, it was from a composite when the popup was open
3) something else?

And then why is this happening with webrender but not without webrender? That makes it a little suspicious. If there is no other bug here then just bailing if we can't get a rootprescontext seems reasonable.
===

This all makes perfect sense to me. I think the answer to the question is (2), because when webrender is enabled there's an extra thread involved. There is a "render thread" in addition to the regular compositor thread in the parent/GPU process. So after doing the composite, the code at [1] (running on the render thread) schedules a message to the compositor thread, which eventually runs and sends the composite notification back to content. (Although I note that this crash seems to only ever happen on non-e10s crashtests, so that means "content" is still living in the parent process). Anyhow, the extra thread indirection could certainly account for the extra latency, and would explain why it only happens with webrender.

I'll write a patch to guard against a missing rootprescontext.

[1] http://searchfox.org/mozilla-central/rev/0079c7adf3b329bff579d3bbe6ac7ba2f6218a19/gfx/webrender_bindings/RenderThread.cpp#191
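For readers following along, the scenario described in this comment can be sketched in a few lines of C++. This is only an illustration and not the landed patch: `PresContext`, `GetRoot`, `NotifyDidPaint` and `LateNotificationIsDropped` are hypothetical stand-ins, and the real logic lives in `nsPresContext::NotifyDidPaintForSubtree`. The sketch assumes a notification is queued while the document is still attached, and is delivered only after the document has lost its link to the root.

```cpp
#include <functional>
#include <queue>

// Hypothetical stand-in for a pres context; not Gecko's nsPresContext.
struct PresContext {
  PresContext* parent = nullptr;  // link toward the root; cleared on detach
  bool isRoot = false;            // true only for a root pres context
  int paintNotifications = 0;     // how many did-paint notifications arrived

  // Walk up the parent chain. Returns nullptr when the chain does not end
  // at a root, which models a document that was detached (e.g. bfcached).
  PresContext* GetRoot() {
    PresContext* pc = this;
    while (pc->parent) {
      pc = pc->parent;
    }
    return pc->isRoot ? pc : nullptr;
  }
};

// Guarded notification: bail out instead of dereferencing a missing root.
// Returns true if the notification was delivered, false if it was dropped.
bool NotifyDidPaint(PresContext& pc) {
  PresContext* root = pc.GetRoot();
  if (!root) {
    return false;  // late notification for a detached document
  }
  ++root->paintNotifications;
  return true;
}

// Models possibility (2): the composite happens while the popup is open,
// the notification sits in a queue (the extra thread hop), the document is
// detached, and only then is the notification delivered.
bool LateNotificationIsDropped() {
  PresContext root;
  root.isRoot = true;
  PresContext popupDoc;
  popupDoc.parent = &root;

  std::queue<std::function<bool()>> pending;
  pending.push([&popupDoc] { return NotifyDidPaint(popupDoc); });

  popupDoc.parent = nullptr;  // document moves to the bfcache

  bool delivered = pending.front()();
  pending.pop();
  return !delivered && root.paintNotifications == 0;
}
```

With the guard in place, the late notification is simply ignored. Without the early return, the increment would dereference a null pointer, which is the kind of crash seen in the stack above.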
Assignee: sotaro.ikeda.g → bugmail
Assignee
Comment 5•7 years ago
Try push to confirm the patch is good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4a3885d968e6f17d708efaa30609d792ed544f9
Assignee
Comment 8•7 years ago
I accidentally left in a debugging MOZ_ASSERT. Updated patch has that removed, and here's a try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=988db5ad6ae2ce129778b3fa0066b5bf5f5f97bb
Comment 9•7 years ago
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> (Although I note that this crash seems to only ever happen on non-e10s
> crashtests, so that means "content" is still living in the parent process).

The reason for this is that the root content document is also a root document in e10s. So even when the content document is in the bf cache it can still get a root prescontext (i.e. its own prescontext).
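The difference between the two configurations can be illustrated with a small model. This is a sketch under assumptions, not Gecko code: `Ctx` and `RootOf` are hypothetical names, and the model reduces "being a root document" to a single flag.

```cpp
// Hypothetical model of a document's pres context; not Gecko's classes.
struct Ctx {
  const Ctx* parent;    // nullptr once detached (e.g. while in the bfcache)
  bool isRootDocument;  // true if this document is itself a root
};

// Returns the root reachable from ctx, or nullptr if the chain ends at a
// context that is not a root document.
const Ctx* RootOf(const Ctx* ctx) {
  while (ctx->parent) {
    ctx = ctx->parent;
  }
  return ctx->isRootDocument ? ctx : nullptr;
}
```

In this model a non-e10s content document hangs off a chrome root, so clearing its `parent` leaves it with no root. An e10s content document has `isRootDocument` set and always resolves to itself, which matches the observation that the crash only appears on non-e10s crashtests.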
Comment 10•7 years ago
mozreview-review
Comment on attachment 8850516 [details] Bug 1335745 - Guard against a null rootPresContext. https://reviewboard.mozilla.org/r/123112/#review125624
Attachment #8850516 - Flags: review?(tnikkel) → review+
Comment 11•7 years ago
Pushed by kgupta@mozilla.com: https://hg.mozilla.org/projects/graphics/rev/e299338a1e4f Guard against a null rootPresContext. r=tnikkel
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 13•7 years ago
This was still failing often on central yesterday; see e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=86468435&repo=mozilla-central
Flags: needinfo?(bugmail)
Assignee
Comment 14•7 years ago
Yeah, the fix hasn't merged to central yet. We need to merge the graphics branch to central for that to happen. I want to wait until the next webrender update though before doing that.
Flags: needinfo?(bugmail)
Assignee
Comment 16•7 years ago
https://hg.mozilla.org/mozilla-central/rev/e299338a1e4f
status-firefox52: --- → unaffected
status-firefox53: --- → unaffected
status-firefox54: --- → unaffected
status-firefox55: --- → fixed
status-firefox-esr52: --- → unaffected
Target Milestone: --- → mozilla55