Crash in [@ nouveau_fence_trigger_work] on poison values
Categories
(Core :: Graphics, defect, P1)
Tracking
()
People
(Reporter: mccr8, Assigned: lsalzman)
References
(Blocks 1 open bug)
Details
(Keywords: crash, csectype-uaf, sec-high, Whiteboard: [adv-main111+r])
Crash Data
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
dmeehan
:
approval-mozilla-beta+
dmeehan
:
approval-mozilla-release-
tjr
:
sec-approval+
|
Details | Review |
Crash report: https://crash-stats.mozilla.org/report/index/65b272d8-e521-4366-9cce-b71a20230215
Reason: SIGSEGV / SI_KERNEL
Top 10 frames of crashing thread:
0 libgallium_dri.so nouveau_fence_trigger_work src/gallium/drivers/nouveau/nouveau_fence.c:53
1 libgallium_dri.so nouveau_fence_update src/gallium/drivers/nouveau/nouveau_fence.c:128
2 libgallium_dri.so nouveau_fence_wait src/gallium/drivers/nouveau/nouveau_fence.c:219
3 libgallium_dri.so st_finish src/mesa/state_tracker/st_cb_flush.c:76
4 libgallium_dri.so st_glFinish src/mesa/state_tracker/st_cb_flush.c:115
5 libgallium_dri.so _mesa_get_fallback_texture src/mesa/main/texobj.c:1070
6 libgallium_dri.so _mesa_update_texture_state src/mesa/main/texstate.c:910
7 libgallium_dri.so _mesa_update_state_locked src/mesa/main/state.c:476
8 libgallium_dri.so _mesa_update_state src/mesa/main/state.c:504
9 libgallium_dri.so _mesa_ClearBufferfv src/mesa/main/clear.c:663
The above crash looks like it is in canvas WebGL.
In this one, there's some CanvasRenderingContext2D stuff on the main thread:
bp-a6af659d-5054-4d66-9c3e-203f20230215
Lots of these crashes are on the jemalloc poison value. These stacks look pretty deep inside the graphics drivers so I'm not sure if we can do anything.
Reporter | ||
Updated•1 year ago
|
Comment 1•1 year ago
|
||
These crashes are happening with relatively recent versions of Mesa, in particular 22.2.5 is still supported so we might file a bug on their tracker. I did it in the past for similar issues and they were always responsive.
Comment 2•1 year ago
|
||
I have filed a confidential issue on Mesa's bug tracker: https://gitlab.freedesktop.org/mesa/mesa/-/issues/8309
Since this is affecting both Ubuntu and Debian users I don't think it's an issue with either distribution's packages, it's more likely to be an upstream issue.
Reporter | ||
Updated•1 year ago
|
Reporter | ||
Comment 3•1 year ago
|
||
I'll mark this as sec-high for now, but maybe it should be sec-vector instead.
Updated•1 year ago
|
Comment 4•1 year ago
|
||
The severity field for this bug is set to S3. However, the bug is flagged with the sec-high
keyword.
:bhood, could you consider increasing the severity of this security bug?
For more information, please visit auto_nag documentation.
Updated•1 year ago
|
Updated•1 year ago
|
Comment 5•1 year ago
|
||
Upstream thinks this is a thread safety issue. We can consider blocking nouveau < 22.3
Comment 6•1 year ago
|
||
It looks like this is happening because of accelerated canvas in the parent causing us to do multi-threaded GL on nouveau which doesn't work. Lee can we disable accelerated canvas in the parent for now?
Updated•1 year ago
|
Assignee | ||
Comment 7•1 year ago
|
||
Updated•1 year ago
|
Assignee | ||
Comment 8•1 year ago
•
|
||
In discussions with Jeff, I believe the fallout from this is limited to Linux + Nouveau configurations, so a fraction of a fraction of the release population.
I don't know how exploitable this is, but I would guess the most common problem will simply be instability and crashiness.
Given all that, sec-high might be a bit much.
Assignee | ||
Comment 9•1 year ago
•
|
||
Comment on attachment 9319572 [details]
Bug 1817336 - Only create accelerated Canvas2D contexts in content process. r?jrmuizel
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Unknown, as this will mostly just lead to races inside the Mesa driver. This only affects the Linux + Nouveau population, and the most common symptom will be frequent crashes when using browser features that rely on any form of Canvas2D-based screenshotting or otherwise utilize Canvas2D in the parent process. However, given the crash frequency compared to the size of this population (only 3.5% of Linux users), the crash frequency is a bit worrying.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
- Which older supported branches are affected by this flaw?: 110
- If not all supported branches, which bug introduced the flaw?: Bug 110
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?:
- How likely is this patch to cause regressions; how much testing does it need?: This ensures accelerated Canvas2D can't be used in anything other than the content process, which was the case prior to 110 anyway. We generally should never be creating accelerated Canvas2D contexts in any other process than a content process anyway, since these are not typically long-running use-cases that benefit from heavy acceleration.
- Is Android affected?: No
Assignee | ||
Comment 10•1 year ago
•
|
||
A bit more detail - normally WebGL remoting from the content process to the parent/GPU process will ensure that such WebGL commands are executed on either the Renderer or CanvasRenderer thread in the parent/GPU process depending on whether the THREADSAFE_GL feature is present (which it is present on all platforms save Linux/Nouveau).
However, if something in the parent process tries to do its own Canvas2D usage, and potentially creating an accelerated WebGL context, it will bypass the remoting normally used from the content process and directly execute those commands in a thread other than the Renderer or CanvasRenderer threads. WebRender will then be running on the Renderer thread in the same process, although on a separate GL context. However, Nouveau is not safe to use in this way (hence no THREADSAFE_GL feature), and will result in the crashes we are seeing.
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 11•1 year ago
•
|
||
Andrew, can we get a quick sec-approval on this? I am not terribly worried about the sec implications, more that this is causing a worryingly high crash rate among a very small subset of our release population. I would like to get this into nightly and tested ASAP, in case this needs a dot-release ride-along or something like that.
Assignee | ||
Comment 12•1 year ago
|
||
[Tracking Requested - why for this release]:
High crash rate among a small subset (Linux + Mesa/Nouveau) of our release population.
Comment 13•1 year ago
|
||
Comment on attachment 9319572 [details]
Bug 1817336 - Only create accelerated Canvas2D contexts in content process. r?jrmuizel
Approved to land. This seems obtuse enough I'm not that concerned about ensuring an uplift and it seems we may want to bake it for a bit anyway.
Reporter | ||
Updated•1 year ago
|
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 14•1 year ago
•
|
||
Comment on attachment 9319572 [details]
Bug 1817336 - Only create accelerated Canvas2D contexts in content process. r?jrmuizel
Beta/Release Uplift Approval Request
- User impact if declined: High crash rate for users of Linux + Mesa/Nouveau.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This disables acceleration in a scenario that it was never expected to be used in the first place. So in the worst case scenario that this didn't actually fix the bug, it doesn't break anything because it only returns us to the status quo prior to 110. On the flip side, we still want to ensure Canvas2D acceleration is never used outside of content processes anyway.
Best case scenario, this will just magically fix all these crashes for our users without incident. This bug is hard to test for locally, so without pushing it into the wild it is really hard to guess if this is working, since the incidence in nightly is fairly low, while jetting up significantly in beta and release. While it might not justify a dot-release on its own, a ride-along with a dot-release I would strongly advocate for.
- String changes made/needed:
- Is Android affected?: No
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Updated•1 year ago
|
Assignee | ||
Comment 15•1 year ago
|
||
Bug 1777849 seems like we ran into a kinda similar WebGL thread-safety issue from an entirely different source - libX11 is also not thread-safe in older versions, which caused it to crash... Worth nothing, even though it has a completely different cause, because the end result is that we're running multiple WebGL threads in a scenario where that is not good.
Comment 16•1 year ago
|
||
Assignee | ||
Comment 17•1 year ago
|
||
It seems like in coordination with bug 1777849, this can aggravate the libX11 race condition as well by originating more GLX calls from the main thread, which can induce the symptoms in bug 1777849 independent of Nouveau. Rather, that causes the Xlib crash as opposed to what we see here with Nouveau. The fix here works for both cases regardless.
Updated•1 year ago
|
Comment 18•1 year ago
|
||
Comment on attachment 9319572 [details]
Bug 1817336 - Only create accelerated Canvas2D contexts in content process. r?jrmuizel
Approved for 111.0b7
Comment 19•1 year ago
|
||
uplift |
Updated•1 year ago
|
Updated•1 year ago
|
Comment 20•1 year ago
|
||
Comment on attachment 9319572 [details]
Bug 1817336 - Only create accelerated Canvas2D contexts in content process. r?jrmuizel
Rejecting release uplift approval, we are in RC week for Fx111.
There is no Fx110 dot release planned.
Updated•1 year ago
|
Comment 22•1 year ago
|
||
Copying crash signatures from duplicate bugs.
Updated•1 year ago
|
Updated•6 months ago
|
Updated•3 months ago
|
Assignee | ||
Comment 24•3 months ago
|
||
This fix landed in 112. It's already in 115 ESR?
Comment 25•3 months ago
|
||
Sounds like we should get a new bug filed for any lingering crashes with this signature.
Description
•