Crash in [@ mozilla::layers::ImageBridgeChild::FlushAllImagesSync] on poison values
Categories
(Core :: Graphics, defect)
Tracking
()
People
(Reporter: mccr8, Assigned: aosmond)
References
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main113+r][adv-ESR102.11+r])
Crash Data
Attachments
(2 files, 1 obsolete file)
|
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
tjr
:
sec-approval+
|
Details | Review |
|
16.42 KB,
patch
|
RyanVM
:
approval-mozilla-esr102+
|
Details | Diff | Splinter Review |
Crash report: https://crash-stats.mozilla.org/report/index/3a6d293c-2cf0-49fd-aff3-542da0230405
Reason: SIGSEGV / SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so mozilla::layers::ImageBridgeChild::FlushAllImagesSync gfx/layers/ipc/ImageBridgeChild.cpp:383
1 libxul.so mozilla::detail::runnable_args_base< dom/media/webrtc/transport/runnable_utils.h:41
2 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1219
2 libxul.so NS_ProcessNextEvent xpcom/threads/nsThreadUtils.cpp:477
3 libxul.so mozilla::ipc::MessagePumpForNonMainThreads::Run ipc/glue/MessagePump.cpp:300
4 libxul.so MessageLoop::RunInternal ipc/chromium/src/base/message_loop.cc:381
4 libxul.so MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:374
4 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:356
5 libxul.so nsThread::ThreadFunc xpcom/threads/nsThread.cpp:384
6 libnss3.so _pt_root nsprpub/pr/src/pthreads/ptthread.c:201
17 crashes on this signature, on Android and Windows, in the last 6 months. None on 110, and those on 111 are all Android, for whatever that means. All of them are on poison values.
The crashes are on the line aClient->FlushAllImages();. aClient is passed in as a raw pointer. The caller, ImageBridgeChild::FlushAllImages, uses what I guess is a sync runnable and says "RefPtrs on arguments are not needed since this dispatches synchronously.", and looking at all threads you can see that ImageBridgeChild::FlushAllImages is still around on another thread in a few crashes I looked at. In fact, FlushAllImages seemed to be happening on a DOM worker thread, being called from OffscreenCanvasDisplayHelper::CommitFrameToCompositor in the crashes I looked at, so maybe this is offscreen canvas related?
I'm not sure how we get from one to another, but from SearchFox it looks like the caller of FlushAllImages is ImageContainer::ClearAllImages(), where the callsite looks like this:
imageBridge->FlushAllImages(mImageClient, this);
This means that aClient from above is mImageClient. I don't know if it is possible, but if anything happens to overwrite mImageClient between here and when we do all of our sync runnable shenanigans then I think the image client can get destroyed. Maybe it should get rooted here? It also looks like the two callers of ClearAllImages are on fields, so this in ClearAllImages() might have similar issues, so maybe those should get rooted on the stack instead of the heap.
| Reporter | ||
Comment 1•2 years ago
|
||
FWIW, I looked at all 17 of the crashes from the last 6 months, and for all of them the caller of ImageContainer::ClearAllImages() on the other thread was OffscreenCanvasDisplayHelper::CommitFrameToCompositor.
| Assignee | ||
Comment 2•2 years ago
|
||
I don't fully understand how the lifetimes are failing here, but I'll take a deeper look since I broke it.
| Assignee | ||
Comment 3•2 years ago
|
||
mImageClient is not the safest pointer. We could example, replace it in ImageContainer::EnsureImageClient:
https://searchfox.org/mozilla-central/rev/23690c9281759b41eedf730d3dcb9ae04ccaddf8/gfx/layers/ImageContainer.cpp#166
which is called by ImageContainer::GetAsyncContainerHandle during display list building:
https://searchfox.org/mozilla-central/rev/23690c9281759b41eedf730d3dcb9ae04ccaddf8/gfx/layers/wr/WebRenderUserData.cpp#214
while we are accessing it on the DOM worker thread. In theory this could happen because the GPU process crashed?
| Assignee | ||
Updated•2 years ago
|
| Assignee | ||
Comment 4•2 years ago
|
||
| Assignee | ||
Comment 5•2 years ago
|
||
Comment on attachment 9327681 [details]
Bug 1827024.
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Most of the patch is adding thread annotations, and it will be obvious there are some gaps, but I believe it would require some study to see which ones have issues and how to trigger them.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?:
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Relatively easy, there are some things that got added since the ESR.
- How likely is this patch to cause regressions; how much testing does it need?: It is possible there may be some additional deadlocks, there are a lot of locks involved here, but the main lock is recursive at least.
- Is Android affected?: Yes
Comment 6•2 years ago
|
||
Comment on attachment 9327681 [details]
Bug 1827024.
Approved to uplift and land
Comment 7•2 years ago
|
||
Andrew, can you please land and request Beta approval on this patch? And it sounds like we'll need a different patch for ESR also?
Updated•2 years ago
|
| Assignee | ||
Comment 8•2 years ago
|
||
Comment on attachment 9327681 [details]
Bug 1827024.
Beta/Release Uplift Approval Request
- User impact if declined: Rare intermittent crash, sec lifetime issue
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Most of the changes are just annotations. The functional change is holding a lock a little longer to guarantee a lifetime in a multithreaded context.
- String changes made/needed:
- Is Android affected?: Yes
| Assignee | ||
Comment 9•2 years ago
|
||
It is in the autoland queue right now.
Comment 10•2 years ago
|
||
r=gfx-reviewers,lsalzman
https://hg.mozilla.org/integration/autoland/rev/9a8764df77e0d2824daa87ce4147eb7e6671fd84
https://hg.mozilla.org/mozilla-central/rev/9a8764df77e0
Comment 11•2 years ago
|
||
Comment on attachment 9327681 [details]
Bug 1827024.
Approved for 113.0b9.
Comment 12•2 years ago
|
||
| uplift | ||
https://hg.mozilla.org/releases/mozilla-beta/rev/9e036a4dadc2
This will need a rebased patch for ESR uplift.
| Assignee | ||
Comment 13•2 years ago
|
||
ESR Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Low volume crash / sec issue
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: Most of the change is adding annotations to ensure we hold onto a lock, which is just a compile time change. The only user facing change is to extend the hold time on a lock to ensure an object doesn't get freed before we return a reference to it.
[String changes made/needed]: N/A
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
| Comment hidden (obsolete) |
| Assignee | ||
Comment 15•2 years ago
|
||
Wrong file.
| Assignee | ||
Comment 16•2 years ago
|
||
Comment on attachment 9330971 [details] [diff] [review]
Rebased for mozilla-esr, v2
ESR Approval Request Comment
[Feature/Bug causing the regression]: N/A
[User impact if declined]: Low volume crash / sec issue
[Is this code covered by automated tests?]: Yes
[Has the fix been verified in Nightly?]: Yes
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: N/A
[Is the change risky?]: No
[Why is the change risky/not risky?]: Most of the change is adding annotations to ensure we hold onto a lock, which is just a compile time change. The only user facing change is to extend the hold time on a lock to ensure an object doesn't get freed before we return a reference to it.
[String changes made/needed]: N/A
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Comment 17•2 years ago
|
||
Comment on attachment 9330971 [details] [diff] [review]
Rebased for mozilla-esr, v2
Approved for 102.11esr.
Comment 18•2 years ago
|
||
| uplift | ||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Description
•