Closed Bug 1827024 Opened 2 years ago Closed 2 years ago

Crash in [@ mozilla::layers::ImageBridgeChild::FlushAllImagesSync] on poison values

Categories

(Core :: Graphics, defect)

Unspecified
Android
defect

Tracking

()

RESOLVED FIXED
114 Branch
Tracking Status
firefox-esr102 113+ fixed
firefox112 --- wontfix
firefox113 + fixed
firefox114 + fixed

People

(Reporter: mccr8, Assigned: aosmond)

References

Details

(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main113+r][adv-ESR102.11+r])

Crash Data

Attachments

(2 files, 1 obsolete file)

Crash report: https://crash-stats.mozilla.org/report/index/3a6d293c-2cf0-49fd-aff3-542da0230405

Reason: SIGSEGV / SEGV_MAPERR

Top 10 frames of crashing thread:

0  libxul.so  mozilla::layers::ImageBridgeChild::FlushAllImagesSync  gfx/layers/ipc/ImageBridgeChild.cpp:383
1  libxul.so  mozilla::detail::runnable_args_base<  dom/media/webrtc/transport/runnable_utils.h:41
2  libxul.so  nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:1219
2  libxul.so  NS_ProcessNextEvent  xpcom/threads/nsThreadUtils.cpp:477
3  libxul.so  mozilla::ipc::MessagePumpForNonMainThreads::Run  ipc/glue/MessagePump.cpp:300
4  libxul.so  MessageLoop::RunInternal  ipc/chromium/src/base/message_loop.cc:381
4  libxul.so  MessageLoop::RunHandler  ipc/chromium/src/base/message_loop.cc:374
4  libxul.so  MessageLoop::Run  ipc/chromium/src/base/message_loop.cc:356
5  libxul.so  nsThread::ThreadFunc  xpcom/threads/nsThread.cpp:384
6  libnss3.so  _pt_root  nsprpub/pr/src/pthreads/ptthread.c:201

17 crashes on this signature, on Android and Windows, in the last 6 months. None on 110, and those on 111 are all Android, for whatever that means. All of them are on poison values.

The crashes are on the line aClient->FlushAllImages();. aClient is passed in as a raw pointer. The caller, ImageBridgeChild::FlushAllImages, uses what I guess is a sync runnable and says "RefPtrs on arguments are not needed since this dispatches synchronously.", and looking at all threads you can see that ImageBridgeChild::FlushAllImages is still around on another thread in a few crashes I looked at. In fact, FlushAllImages seemed to be happening on a DOM worker thread, being called from OffscreenCanvasDisplayHelper::CommitFrameToCompositor in the crashes I looked at, so maybe this is offscreen canvas related?

I'm not sure how we get from one to another, but from SearchFox it looks like the caller of FlushAllImages is ImageContainer::ClearAllImages(), where the callsite looks like this:
imageBridge->FlushAllImages(mImageClient, this);

This means that aClient from above is mImageClient. I don't know if it is possible, but if anything happens to overwrite mImageClient between here and when we do all of our sync runnable shenanigans then I think the image client can get destroyed. Maybe it should get rooted here? It also looks like the two callers of ClearAllImages are on fields, so this in ClearAllImages() might have similar issues, so maybe those should get rooted on the stack instead of the heap.

FWIW, I looked at all 17 of the crashes from the last 6 months, and for all of them the caller of ImageContainer::ClearAllImages() on the other thread was OffscreenCanvasDisplayHelper::CommitFrameToCompositor.

I don't fully understand how the lifetimes are failing here, but I'll take a deeper look since I broke it.

Flags: needinfo?(aosmond)

mImageClient is not the safest pointer. We could example, replace it in ImageContainer::EnsureImageClient:
https://searchfox.org/mozilla-central/rev/23690c9281759b41eedf730d3dcb9ae04ccaddf8/gfx/layers/ImageContainer.cpp#166

which is called by ImageContainer::GetAsyncContainerHandle during display list building:
https://searchfox.org/mozilla-central/rev/23690c9281759b41eedf730d3dcb9ae04ccaddf8/gfx/layers/wr/WebRenderUserData.cpp#214

while we are accessing it on the DOM worker thread. In theory this could happen because the GPU process crashed?

Assignee: nobody → aosmond
Status: NEW → ASSIGNED
Flags: needinfo?(aosmond)
Attached file Bug 1827024.

Comment on attachment 9327681 [details]
Bug 1827024.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Most of the patch is adding thread annotations, and it will be obvious there are some gaps, but I believe it would require some study to see which ones have issues and how to trigger them.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?:
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Relatively easy, there are some things that got added since the ESR.
  • How likely is this patch to cause regressions; how much testing does it need?: It is possible there may be some additional deadlocks, there are a lot of locks involved here, but the main lock is recursive at least.
  • Is Android affected?: Yes
Attachment #9327681 - Flags: sec-approval?

Comment on attachment 9327681 [details]
Bug 1827024.

Approved to uplift and land

Attachment #9327681 - Flags: sec-approval? → sec-approval+

Andrew, can you please land and request Beta approval on this patch? And it sounds like we'll need a different patch for ESR also?

Flags: needinfo?(aosmond)

Comment on attachment 9327681 [details]
Bug 1827024.

Beta/Release Uplift Approval Request

  • User impact if declined: Rare intermittent crash, sec lifetime issue
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Most of the changes are just annotations. The functional change is holding a lock a little longer to guarantee a lifetime in a multithreaded context.
  • String changes made/needed:
  • Is Android affected?: Yes
Flags: needinfo?(aosmond)
Attachment #9327681 - Flags: approval-mozilla-beta?

It is in the autoland queue right now.

Group: gfx-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 114 Branch

Comment on attachment 9327681 [details]
Bug 1827024.

Approved for 113.0b9.

Attachment #9327681 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

https://hg.mozilla.org/releases/mozilla-beta/rev/9e036a4dadc2

This will need a rebased patch for ESR uplift.

Flags: needinfo?(aosmond)
Attached file Rebased for mozilla-esr, v1 (obsolete) —

ESR Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Low volume crash / sec issue
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: Most of the change is adding annotations to ensure we hold onto a lock, which is just a compile time change. The only user facing change is to extend the hold time on a lock to ensure an object doesn't get freed before we return a reference to it.
[String changes made/needed]: N/A

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Flags: needinfo?(aosmond)
Attachment #9330970 - Flags: approval-mozilla-esr102?

Wrong file.

Attachment #9330970 - Attachment is obsolete: true
Attachment #9330970 - Flags: approval-mozilla-esr102?

Comment on attachment 9330971 [details] [diff] [review]
Rebased for mozilla-esr, v2

ESR Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Low volume crash / sec issue
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: Most of the change is adding annotations to ensure we hold onto a lock, which is just a compile time change. The only user facing change is to extend the hold time on a lock to ensure an object doesn't get freed before we return a reference to it.
[String changes made/needed]: N/A

See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.

Attachment #9330971 - Flags: approval-mozilla-esr102?

Comment on attachment 9330971 [details] [diff] [review]
Rebased for mozilla-esr, v2

Approved for 102.11esr.

Attachment #9330971 - Attachment is patch: true
Attachment #9330971 - Flags: approval-mozilla-esr102? → approval-mozilla-esr102+
Whiteboard: [adv-main113+r]
Whiteboard: [adv-main113+r] → [adv-main113+r][adv-ESR102.11+]
Whiteboard: [adv-main113+r][adv-ESR102.11+] → [adv-main113+r][adv-ESR102.11+r]
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-
Regressions: 1831099
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: