Closed Bug 1335745 Opened 3 years ago Closed 3 years ago

Intermittent linux-qr TEST-UNEXPECTED-FAIL | file:///home/worker/workspace/build/tests/reftest/tests/dom/xbl/crashtests/336960-1.html | application terminated with exit code 11 | application crashed [@ nsPresContext::NotifyDidPaintForSubtree]

Categories

(Core :: Graphics: WebRender, defect, P3)

Other Branch
x86_64
Linux
defect

RESOLVED FIXED
mozilla55
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: kats, Assigned: kats)

References

Details

(Keywords: intermittent-failure, Whiteboard: [gfx-noted])

Attachments

(1 file)

This is an intermittent failure in the QR crashtests that has only shown up once so far. However, the crash stack contains QR-specific code, so it's quite plausible that this is a bug in the QR code that needs fixing.

https://treeherder.mozilla.org/logviewer.html#?job_id=73556375&repo=graphics&lineNumber=5348

Top of the crash stack looks like this:

0  libxul.so!nsPresContext::NotifyDidPaintForSubtree [nsPresContext.cpp:84b84e7610ee : 2591 + 0x0]
1  libxul.so!nsView::DidCompositeWindow [nsView.cpp:84b84e7610ee : 1087 + 0x13]
2  libxul.so!mozilla::layers::WebRenderLayerManager::DidComposite [WebRenderLayerManager.cpp:84b84e7610ee : 417 + 0x16]
3  libxul.so!mozilla::layers::CompositorBridgeChild::RecvDidComposite [CompositorBridgeChild.cpp:84b84e7610ee : 584 + 0x13]
4  libxul.so!mozilla::layers::PCompositorBridgeChild::OnMessageReceived [PCompositorBridgeChild.cpp:84b84e7610ee : 1537 + 0x21]
5  libxul.so!mozilla::ipc::MessageChannel::DispatchAsyncMessage [MessageChannel.cpp:84b84e7610ee : 1781 + 0x6]
6  libxul.so!mozilla::ipc::MessageChannel::DispatchMessage [MessageChannel.cpp:84b84e7610ee : 1716 + 0xb]
7  libxul.so!mozilla::ipc::MessageChannel::RunMessage [MessageChannel.cpp:84b84e7610ee : 1589 + 0xb]
8  libxul.so!mozilla::ipc::MessageChannel::MessageTask::Run [MessageChannel.cpp:84b84e7610ee : 1622 + 0xc]
9  libxul.so!nsThread::ProcessNextEvent [nsThread.cpp:84b84e7610ee : 1261 + 0x6]
Assignee: nobody → sotaro.ikeda.g
Depends on: 1346143
Duplicate of this bug: 1345471
I talked to Timothy about this and he did some investigation. His notes:

===
I did a few more try pushes. It looks like the test that is failing is 336744-1.html, which contains a XUL popup. Given that, it makes sense that we would get a DidComposite message for painting to the widget of the XUL popup, and that the DidComposite would go to a view in the content document that contains the XUL popup. In this case that document is not a root document.

The actual assert/crash usually happens several tests after 336744-1.html has finished, so I think the test document is still alive in the bf cache but has been disconnected from its root prescontext (which is what normally happens to bf cached documents).

So the question is: why are we getting a DidComposite message for a popup that is no longer on screen? Possibilities:
1) the popup didn't actually close because of some bug, so it's still painting
2) the DidComposite message just took a little long to arrive; it came from a composite done while the popup was open
3) something else?

And then why is this happening with webrender but not without webrender? That makes it a little suspicious.

If there is no other bug here then just bailing if we can't get a rootprescontext seems reasonable.
===

This all makes perfect sense to me. I think the answer to the question is (2), because when webrender is enabled there's an extra thread involved: a "render thread" in addition to the regular compositor thread in the parent/GPU process. After doing the composite, the code at [1] (running on the render thread) schedules a message to the compositor thread, which eventually runs and sends the composite notification back to content. (I note that this crash seems to only ever happen on non-e10s crashtests, which means "content" is still living in the parent process.) Anyhow, the extra thread indirection could certainly account for the extra latency, and would explain why it only happens with webrender.
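
To make the ordering in (2) concrete, here is a minimal single-threaded sketch of the extra hop. It is not Gecko code: `EventQueue`, `SimulateLateDidComposite`, and the log strings are all hypothetical names made up for illustration, and the three "threads" are just FIFO queues drained by hand in one plausible order.

```cpp
#include <cassert>
#include <deque>
#include <functional>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for a thread's event queue; tasks run in FIFO order.
struct EventQueue {
  std::deque<std::function<void()>> tasks;

  void Post(std::function<void()> task) { tasks.push_back(std::move(task)); }

  // Runs one pending task, if any. Returns false when the queue is empty.
  bool RunOne() {
    if (tasks.empty()) return false;
    std::function<void()> task = std::move(tasks.front());
    tasks.pop_front();
    task();
    return true;
  }
};

// Replays one possible interleaving and returns the observed event log.
std::vector<std::string> SimulateLateDidComposite() {
  std::vector<std::string> log;
  EventQueue renderThread, compositorThread, mainThread;
  bool popupOpen = true;

  // The render thread composites while the popup is open, then hops to the
  // compositor thread, which in turn notifies the main thread.
  renderThread.Post([&] {
    log.push_back("render: composited popup");
    compositorThread.Post([&] {
      log.push_back("compositor: forwarding DidComposite");
      mainThread.Post([&] {
        log.push_back(popupOpen ? "main: DidComposite (popup open)"
                                : "main: DidComposite (popup already closed)");
      });
    });
  });

  // Meanwhile the main thread already has a task queued that closes the popup.
  mainThread.Post([&] {
    popupOpen = false;
    log.push_back("main: popup closed");
  });

  renderThread.RunOne();      // composite happens
  mainThread.RunOne();        // popup closes before the notification exists
  compositorThread.RunOne();  // extra hop forwards the notification
  mainThread.RunOne();        // notification arrives late
  return log;
}
```

Calling `SimulateLateDidComposite()` yields a log in which the close is recorded before the notification is delivered, which is the situation described above. Without the extra hop the notification would be queued on the main thread one step earlier, so the window for this reordering is smaller.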

I'll write a patch to guard against a missing rootprescontext.

[1] http://searchfox.org/mozilla-central/rev/0079c7adf3b329bff579d3bbe6ac7ba2f6218a19/gfx/webrender_bindings/RenderThread.cpp#191
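
For illustration only, the shape of such a guard might look like the sketch below. This is not the patch attached to this bug: `PresContext`, `isRoot`, `didPaintCount`, and `NotifyDidPaint` are hypothetical, simplified stand-ins, and the assumption that the lookup returns null when the top of the parent chain is not a root is a modelling choice based on the discussion above.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a presentation context.
struct PresContext {
  PresContext* parent = nullptr;  // null at the top of the chain or when disconnected
  bool isRoot = false;            // true only for a real root context
  int didPaintCount = 0;          // number of paint notifications recorded on a root

  // Walks to the top of the parent chain and returns it only if it is a root.
  // A document disconnected from its parent (e.g. in the bf cache) gets nullptr.
  PresContext* GetRootPresContext() {
    PresContext* pc = this;
    while (pc->parent) {
      pc = pc->parent;
    }
    return pc->isRoot ? pc : nullptr;
  }

  // Records a paint notification on the root. Returns false and does nothing
  // if no root can be found, instead of dereferencing a null pointer.
  bool NotifyDidPaint() {
    PresContext* root = GetRootPresContext();
    if (!root) {
      return false;  // late notification for a disconnected document: bail out
    }
    ++root->didPaintCount;
    return true;
  }
};
```

In this model, a child whose `parent` points at a root gets its notification recorded; once `parent` is reset to null the same call returns false and leaves the root untouched. A context that is itself marked as a root always finds a root, namely itself.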
Assignee: sotaro.ikeda.g → bugmail
I accidentally left in a debugging MOZ_ASSERT. Updated patch has that removed, and here's a try push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=988db5ad6ae2ce129778b3fa0066b5bf5f5f97bb
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #4)
> (Although I note that this crash seems to only ever happen on non-e10s
> crashtests, so that means "content" is still living in the parent process).

The reason for this is that in e10s the root content document is also a root document. So even when the content document is in the bf cache, it can still get a root prescontext (i.e., its own prescontext).
Comment on attachment 8850516 [details]
Bug 1335745 - Guard against a null rootPresContext.

https://reviewboard.mozilla.org/r/123112/#review125624
Attachment #8850516 - Flags: review?(tnikkel) → review+
Pushed by kgupta@mozilla.com:
https://hg.mozilla.org/projects/graphics/rev/e299338a1e4f
Guard against a null rootPresContext. r=tnikkel
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
No longer blocks: 1338995
Duplicate of this bug: 1338995
Yeah, the fix hasn't merged to central yet. We need to merge the graphics branch to central for that to happen. I want to wait until the next webrender update though before doing that.
Flags: needinfo?(bugmail)