Closed Bug 1674142 Opened 4 years ago Closed 3 years ago

Crash in [@ mozilla::widget::InProcessCompositorWidget::PreRender]

Categories

(Core :: Graphics: WebRender, defect)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
88 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox83 --- wontfix
firefox84 --- wontfix
firefox85 --- wontfix
firefox86 + wontfix
firefox87 + wontfix
firefox88 + fixed

People

(Reporter: sefeng, Assigned: bradwerth)

References

(Blocks 1 open bug)

Details

(4 keywords, Whiteboard: [sec-survey][post-critsmash-triage][adv-main88+r])

Crash Data

Attachments

(3 files)

Crash report: https://crash-stats.mozilla.org/report/index/2c0175c1-b779-4442-97ba-5bea50201028

Reason: EXC_BAD_ACCESS / EXC_I386_GPFLT

Top 10 frames of crashing thread:

0 XUL mozilla::widget::InProcessCompositorWidget::PreRender widget/InProcessCompositorWidget.cpp:40
1 XUL mozilla::wr::RendererOGL::UpdateAndRender gfx/webrender_bindings/RendererOGL.cpp:160
2 XUL mozilla::wr::RenderThread::UpdateAndRender gfx/webrender_bindings/RenderThread.cpp:488
3 XUL mozilla::wr::RenderThread::HandleFrameOneDoc gfx/webrender_bindings/RenderThread.cpp:325
4 XUL mozilla::detail::RunnableMethodImpl<mozilla::wr::RenderThread*, void  xpcom/threads/nsThreadUtils.h:1240
5 XUL MessageLoop::DoWork ipc/chromium/src/base/message_loop.cc:548
6 XUL base::MessagePumpDefault::Run ipc/chromium/src/base/message_pump_default.cc:35
7 XUL MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309
8 XUL base::Thread::ThreadMain ipc/chromium/src/base/thread.cc:191
9 XUL ThreadFunc ipc/chromium/src/base/platform_thread_posix.cc:40

Honestly I don't know if this belongs to WebRender. Low frequent crashes. MacOS only.
Crash address has always been at 0x0.

The severity field is not set for this bug.
:jimm, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmathies)

MacOS related crash that's showing up in 83 now that we've shipped wr.

Blocks: gfx-triage
Flags: needinfo?(jmathies)
Flags: needinfo?(bwerth)

This is a weird one. The crash signature implies that mWidget is being NULL-dereferenced, but there appear to be adequate ASSERT protections to prevent mWidget from being assigned a NULL value. This could be prevented by doing a release NULL-check in PreRender, which the InProcessCompositorWidget class is designed to avoid. I'll supply the patch and we'll work out whether or not it's a good idea in review.

Flags: needinfo?(bwerth)
Assignee: nobody → bwerth
Attachment #9191753 - Attachment description: Bug 1674142: Add a runtime NULL dereference check to InProcessCompositorWidget::PreRender. → Bug 1674142: Add a runtime assert to InProcessCompositorWidget constructor.
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2991f7c13357
Add a runtime assert to InProcessCompositorWidget constructor. r=jrmuizel
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch
No longer blocks: gfx-triage

Are there further plans to investigate the issue? It's third-most frequent bug on macOS 10.16/11. Crash reports.

Flags: needinfo?(bwerth)

The newly-added release assert isn't being hit, which indicates that the failure is in some caller of CompositorWidget::CreateLocal. I'll add a new patch which will push the crash to there, so we'll have a chance of finding a repeatable testcase. It will also change the crash signature.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Pushed by bwerth@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cd39bfc72703
Part 2: Upgrade to a release assert in CompositorWidget::CreateLocal. r=jrmuizel
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: 85 Branch → 87 Branch

Change the status for beta to have the same as nightly and release.
For more information, please visit auto_nag documentation.

Flags: needinfo?(bwerth)

Should the crash signature continued to be tracked by this bug (and it getting reopened) or shall a new bug be created?

Flags: needinfo?(bwerth)

[Tracking Requested - why for this release]: Top crasher for macOS 11.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 87 Branch → ---

Now that the release assert is landed, the crash report should be more useful. Can you link to a recent crash report?

Flags: needinfo?(bwerth) → needinfo?(sefeng)

Hi Brad, you can just click the signature to check all crashes belong to this signature, which you can sort them by date.

One of the latest one is https://crash-stats.mozilla.org/report/index/464ada42-28d5-44ae-8a9d-5ec4d0210226 from build 20210223185702. It doesn't look like the runtime assertion was hit?

Flags: needinfo?(sefeng)

(In reply to Sean Feng [:sefeng] from comment #16)

Hi Brad, you can just click the signature to check all crashes belong to this signature, which you can sort them by date.

Thank you; I didn't know about that. I'll try to find a fix.

There are decreasingly-plausible explanations for how mWidget becomes NULL in
the crash reports we are seeing. This patch closes one of the possible -- but
still unlikely -- reasons by checking for a NULL pointer after a static_cast.
The contract for static_cast should not allow it to return NULL with non-NULL
input, but it would trigger the crashes we are seeing.

Now that we're digging so deep to find this crash, and the crash is still
happening, it seems useful now to prevent PreRender from dereferencing a NULL
pointer, even if it might lead to an unrendered widget.

Sorry, had an unrelated try push here.

Markus, is it possible there's a use-after-free of InProcessCompositorWidget happening here? What's the cross thread safety situation here?

Flags: needinfo?(mstange.moz)

This reminds me of bug 1679368 comment 3.

Flags: needinfo?(mstange.moz)
Depends on: 1679368
Attachment #9205798 - Attachment description: Bug 1674142: Prevent a crash in InProcessCompositorWidget::PreRender. → Bug 1674142: Add additional checks to find a crash in InProcessCompositorWidget::PreRender.

Most of the crashes where the crashing address is 0x0 show the UAF value in the registers (rax)
bp-23bdc693-6d52-4de0-a3ea-095170210228

Not always: bp-6031a5af-b54c-4cd6-b9a1-bfac00210227 has rax: 0x1000005320000032 (it's rare though)

The crashes with "small" crash addresses (around 0x550 is common) seem to have 0x0 or 0x1 in rax

UAFs are usually rated sec-high, but this is happening in the parent process where it would be harder to groom memory for an exploit (no user scripts) so it's not as severe as a similar bug in a child process.

I wonder if there was a case that CompositorBridgeParent was destroyed without calling CompositorBridgeParent::ActorDestroy() nor WebRenderBridgeParent::Destroy(). When CompositorBridgeParent is alive, InProcessCompositorWidget::mWidget should also be alive. From it, IPC might not work as expected on some cases.

Current implementation expects that CompositorBridgeParent is alive when WebRenderBridgeParent is alive. When CompositorBridgeParent is destroyed, CompositorBridgeParent::StopAndClearResources() should be called.

And it seems not safe to call DestroyCompositor() from nsChildView::~nsChildView()

Bug 1597854 and Bug 1690752 also seemed to be caused by Comment 23. They seems not happen when CompositorBridgeParent is alive.

I wonder if we need to handle also ActorDealloc() in addition to ActorDestroy().

Comment on attachment 9205798 [details]
Bug 1674142: Add additional checks to find a crash in InProcessCompositorWidget::PreRender.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: With difficulty, since we don't know how to trigger the bug yet ourselves.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Will not be difficult to backport. Since this is a crash-hunting patch, it may be best to wait until we identify the underlying cause of the crash.
  • How likely is this patch to cause regressions; how much testing does it need?: This patch will introduce a new crash signature to some operation that is leading to this crash.
Attachment #9205798 - Flags: sec-approval?
  • Which older supported branches are affected by this flaw?: all
  • If not all supported branches, which bug introduced the flaw?: None

Do we have any idea why there are no crashes on older builds, then? No crashes on ESR-78. For 81 I only see crashes on beta, and in 82 only in Nightly and Beta but not release. We don't start seeing Release crashes until 83. Seems to imply the bug is in, or at least tickled by, something that was "dev only" for a bit and finally turned on in 83. [ok, there's one Thunderbird-68 crash with this signature, too--bp-14c6bfea-7c28-4a7f-8e2f-a7b4c0201010--so maybe the "tickled by" theory is a better one]

  • If not, how different, hard to create, and risky will they be?: Will not be difficult to backport. Since this is a crash-hunting patch, it may be best to wait until we identify the underlying cause of the crash.

fair enough

Comment on attachment 9205798 [details]
Bug 1674142: Add additional checks to find a crash in InProcessCompositorWidget::PreRender.

sec-approval=dveditz

Attachment #9205798 - Flags: sec-approval? → sec-approval+

webrender is disabled on 78esr (bug 1637691), and was disabled on mac on release until bug 1654271 (and didn't actually ship until 83 because of bug 1670961), so I think that corresponds to the observations in comment 27.

(In reply to Sotaro Ikeda [:sotaro] from comment #23)

I wonder if there was a case that CompositorBridgeParent was destroyed without calling CompositorBridgeParent::ActorDestroy() nor WebRenderBridgeParent::Destroy(). When CompositorBridgeParent is alive, InProcessCompositorWidget::mWidget should also be alive. From it, IPC might not work as expected on some cases.

I opened Bug 1697579 to address these concerns.

Group: dom-core-security → core-security-release
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 88 Branch

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(bwerth)
Whiteboard: [sec-survey]
Flags: needinfo?(bwerth)
Flags: qe-verify+
Whiteboard: [sec-survey] → [sec-survey][post-critsmash-triage]

I wonder if Bug 1700848 might be related to the crash.

Depends on: 1700848

These crashes still happen in 88, and the patch that landed here was "crash-hunting", so should this bug be reopened?

Flags: needinfo?(bwerth)

(In reply to Julien Cristau [:jcristau] from comment #34)

These crashes still happen in 88, and the patch that landed here was "crash-hunting", so should this bug be reopened?

The crash signatures don't show the code from https://phabricator.services.mozilla.com/D106672 that landed. That code will change the crash signature. This bug can stay closed.

Flags: needinfo?(bwerth)
Whiteboard: [sec-survey][post-critsmash-triage] → [sec-survey][post-critsmash-triage][adv-main88+]
Whiteboard: [sec-survey][post-critsmash-triage][adv-main88+] → [sec-survey][post-critsmash-triage][adv-main88+r]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: