Implementation bug UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH with pokerogue.net, possibly resulting in OOB reads
Categories
(Core :: Graphics: CanvasWebGL, defect, P2)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox-esr115 | --- | unaffected |
| firefox-esr128 | --- | unaffected |
| firefox138 | --- | wontfix |
| firefox139 | + | fixed |
| firefox140 | + | fixed |
| firefox141 | + | fixed |
People
(Reporter: yannis, Assigned: lsalzman)
References
(Blocks 1 open bug, Regression, )
Details
(Keywords: csectype-bounds, regression, sec-high, Whiteboard: [adv-main139.0.4+])
Crash Data
Attachments
(2 files)
|
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release+
tjr
:
sec-approval+
|
Details | Review |
|
5.24 KB,
text/plain
|
Details |
https://pokerogue.net/ is a recurrent URL found in the crash signatures attached to bug 1957421 (Windows) and bug 1963920 (macOS). This website hosts an open source browser game (source code). During my investigation I ran this website in a debug build with the goal to exercise some code paths and gather some hints.
Just loading the login screen of the game often results in the following issue: webgl::ExplicitPixelPackingState::ForUseWith returns Err("UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH.") with the following call stack:
00 xul!mozilla::webgl::ExplicitPixelPackingState::ForUseWith+0x315 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLContext.cpp @ 2780]
01 xul!mozilla::webgl::TexUnpackBlobDesc::ExplicitUnpacking+0x38 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLTypes.h @ 1199]
02 xul!mozilla::webgl::TexUnpackSurface::TexOrSubImage+0x643 [C:\mozilla-source\mozilla-unified\dom\canvas\TexUnpackBlob.cpp @ 1095]
03 xul!mozilla::WebGLTexture::TexImage+0x904 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLTextureUpload.cpp @ 1110]
04 xul!mozilla::WebGLContext::TexImage+0x153 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLContextTextures.cpp @ 207]
05 xul!mozilla::HostWebGLContext::TexImage+0x53 [C:\mozilla-source\mozilla-unified\dom\canvas\HostWebGLContext.h @ 574]
06 xul!mozilla::dom::WebGLParent::RecvTexImage+0x73 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLParent.cpp @ 109]
07 xul!mozilla::dom::PWebGLParent::OnMessageReceived+0xcc7 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PWebGLParent.cpp @ 471]
08 xul!mozilla::gfx::PCanvasManagerParent::OnMessageReceived+0x2cc [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PCanvasManagerParent.cpp @ 261]
When I look at the comparison that triggers the error, there is always a gap of 114 between UNPACK_SKIP_PIXELS + width and UNPACK_ROW_LENGTH (e.g. 576 vs. 462, 561 vs 447, 594 vs. 480). In debug builds, this error causes a GPU process crash right off the bat, with the following call stack:
00 xul!AnnotateMozCrashReason+0x11 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\Assertions.h @ 59]
01 xul!mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::WriteLog+0xa3 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\gfx\Logging.h @ 833]
02 xul!mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::Flush+0x96 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\gfx\Logging.h @ 276]
03 xul!mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::~Log+0xab [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\gfx\Logging.h @ 269]
04 xul!mozilla::webgl::TexUnpackSurface::TexOrSubImage+0xdac [C:\mozilla-source\mozilla-unified\dom\canvas\TexUnpackBlob.cpp @ 1106]
05 xul!mozilla::WebGLTexture::TexImage+0x904 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLTextureUpload.cpp @ 1110]
06 xul!mozilla::WebGLContext::TexImage+0x153 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLContextTextures.cpp @ 207]
07 xul!mozilla::HostWebGLContext::TexImage+0x53 [C:\mozilla-source\mozilla-unified\dom\canvas\HostWebGLContext.h @ 574]
08 xul!mozilla::dom::WebGLParent::RecvTexImage+0x73 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLParent.cpp @ 109]
09 xul!mozilla::dom::PWebGLParent::OnMessageReceived+0xcc7 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PWebGLParent.cpp @ 471]
0a xul!mozilla::gfx::PCanvasManagerParent::OnMessageReceived+0x2cc [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PCanvasManagerParent.cpp @ 261]
The DEBUG crash occurs because we go through the following lines from dom/canvas/TexUnpackBlob.cpp:
if (!dstUnpackingRes.isOk()) {
// This will force a crash in DEBUG, the error is UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH.
gfxCriticalError() << dstUnpackingRes.inspectErr();
// This would also force a crash in DEBUG if we did not already crash at the line before.
webgl->ErrorImplementationBug("ExplicitUnpacking failed: %s",
dstUnpackingRes.inspectErr().c_str());
return false;
}
In release, we don't crash but return false and continue. This makes me wonder if the crashes in bug 1957421 (Windows) and bug 1963920 could be a later consequence of the same original issue? They look like out-of-bound reads where we pass a larger size value to memcpy than the actual size of the source buffer, which seems like it could be a consequence of the thing we catch in DEBUG.
Updated•11 months ago
|
Comment 1•11 months ago
|
||
I suspect you're right, :yannis, that the UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH overflows cause the crashes in bug 1963920, bug 1959138 and elsewhere.
We don't yet know how, exactly, they happen on non-debug builds. But I suspect VM_PROT_NONE is involved. See bug 1963920 comment #22.
| Reporter | ||
Comment 2•11 months ago
•
|
||
If I look for Nightly crashes between 2025-01-01 to 2025-03-01 with TexUnpackSurface::TexOrSubImage in the proto signature (which seems to be a common denominator for the crashes we are looking at), we see a single crash for all nightly builds from January, compared to 81 for the February builds. The first February crash is with build ID 20250206094455, and then we almost always get at least one crash each day.
This makes the patches from bug 1943241 highly suspicious to me, as they are the only modifications to dom/canvas/TexUnpackBlob.cpp that landed within that time frame (this file is seldom touched), and they landed on 2025-02-05. :lsalzman, as the author of these patches, could you take a look?
I mentioned that these could potentially be out-of-bounds read crashes because when you do OOB reads, either you end up on a mapped address and read dummy data and continue, or you crash because you end up on an unmapped address. To explore this possibility we would need to double check if and how in the lines below, dstBegin can end up being of unsufficient size:
*out_error =
DoTexOrSubImage(isSubImage, gl, mDesc.imageTarget, level, dui, xOffset,
yOffset, zOffset, size.x, size.y, size.z, dstBegin);
Since the crashes end up using fTexImage2D, I think we can restrict ourselves to the case where isSubImage is false, so that would mean investigating how the size of the buffer behind dstBegin may end up being too small given size.x, size.y, dui->unpackFormat and dui->unpackType.
I'm not sure if or how this relates to UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH. For example this string appears in the graphics critical error field of 51 crash reports out of the 81 mentioned above. So it is a strong correlation, but sometimes it isn't present. It could just be a non-causal correlation, where pokerogue.net happens to be a good reproducer for both the crashes and the UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH thing.
| Reporter | ||
Comment 3•11 months ago
•
|
||
Also this github issue seems very related: https://github.com/pagefaultgames/pokerogue/issues/5649
Somebody seems able to reproduce the crash, I'll ask them if they can help us debug.
Edit: Also these Reddit threads:
https://www.reddit.com/r/pokerogue/comments/1k1q71e/game_keeps_crashing_firefox/
https://www.reddit.com/r/pokerogue/comments/1js88ti/game_isnt_working_on_firefox/
https://www.reddit.com/r/pokerogue/comments/1jrvpnn/crashing/
These suggest that it's pretty easy to reproduce by playing, so I'll try that as well.
| Reporter | ||
Comment 4•11 months ago
•
|
||
I was able to reproduce the crash from bug 1957421 myself by playing the game. I now have a time-travel trace of the crash happening in my GPU process, so I should be able to extract relevant information from it. I'll update this bug accordingly.
| Reporter | ||
Comment 5•11 months ago
•
|
||
The time-travel trace indeed points to an OOB read problem, due to a mismatch between mDesc->size and surf->GetSize().
mozilla::webgl::TexUnpackSurface::TexOrSubImage gets called with mDesc->size.x set to 1274 and mDesc->size.y set to 137. These values come from the TexUnpackBlobDesc structure sent across the process boundary over IPDL by PWebGLChild::SendTexImage. Within mozilla::webgl::TexUnpackSurface::TexOrSubImage, we get a surface through the gfx::CanvasManagerParent::GetCanvasSurface path added by bug 1943241. Then we reach the call to ConvertIfNeeded, where we pass a surf->GetSize().width of 1310 and a surf->GetSize().height of 69. Consequently it will do a calloc of 1310 * 69 * 4, i.e. 361560. However, what we pass to DoTexOrSubImage is the values received over IPDL. For the call to fTexImage2D to be valid, the pixels buffer should be of size 1274 * 137 * 4, i.e. 698152, but it is only of size 361560 because of the mismatch, resulting in an OOB read and a crash.
I'm marking bug 1943241 as a regressor based on this information. :lsalzman, let me know if there's anything more you'd want me to check with the time-travel trace.
Values at the ConvertIfNeeded call site:
this = 0x00000191`f80c8060
webgl = 0x00000191`7d962800
rowLength = 0x51e
rowCount = 0x45
srcFormat = BGRA8 (0n27)
srcBegin = 0x00000191`f2971000 ""
srcStride = 0n5240
dstFormat = RGBA8 (0n20)
dstStride = 0n5240
out_begin = 0x00000033`667fe8a8
out_anchoredBuffer = 0x00000033`667fe828
unpacking = <value unavailable>
srcIsPremult = <value unavailable>
dstIsPremult = <value unavailable>
fnHasPremultMismatch = <value unavailable>
srcOrigin = <value unavailable>
dstOrigin = BottomLeft (0n1)
srcColorSpace = Srgb (0n0)
dstColorSpace = <value unavailable>
dstTotalBytes = <value unavailable>
dstBuffer = class mozilla::UniqueBuffer
dstBegin = <value unavailable>
wasTrivial = <value unavailable>
Values at the mozilla::gl::GLContext::fTexImage2D call site:
gl = 0x00000191`09aea800
target = class StrongGLenum<TexImageTargetDetails>
level = 0n0
dui = 0x00000191`7def6f04
width = 0n1274
height = 0n137
depth = 0n1
data = 0x00000192`05599000
errorScope = class mozilla::gl::GLContext::LocalErrorScope
border = 0n0
Comment 6•11 months ago
|
||
Set release status flags based on info from the regressing bug 1943241
Comment 7•11 months ago
|
||
I'll mark this sec-high. It sounds a bit sandbox-escape-y if this is happening in the parent process in response to a child process message?
Comment 8•11 months ago
|
||
This looks pretty similar to bug 1966083.
Comment 9•11 months ago
•
|
||
(In reply to Jason Kratzer [:jkratzer] from comment #8)
This looks pretty similar to bug 1966083.
Per discussion with Tyson, I think Jason meant bug 1963341.
Comment 10•11 months ago
|
||
Oops - my apologies. I copied the wrong bug ID.
Updated•10 months ago
|
Comment 11•10 months ago
|
||
Regression from some work Lee did. sec-high.
Comment 12•10 months ago
•
|
||
I've figured out how to use a HookCase hook library to emulate these crashes on macOS. See bug 1963920 comment #23 and following.
If my emulation is accurate, we should be able to get rid of these crashes simply by ensuring that the width and height values passed to mozilla::gl::GLContext::raw_fTexImage2D() are correct.
To make things a bit clearer: I suspect that the call to calloc() (with size 361560) in comment #5 is correct, and that it's the width (1274) and height (137) parameters passed to mozilla::gl::GLContext::fTexImage2D() that are incorrect.
Comment 13•10 months ago
|
||
Set release status flags based on info from the regressing bug 1943241
Updated•10 months ago
|
| Assignee | ||
Comment 14•10 months ago
|
||
I've tried reproducing this issue with pokerogue.net, but I can't manage to get any disparity between the actual surface size and the IPDL sizes when I go to the login screen there. Is there any more reliable way to reproduce this?
| Assignee | ||
Comment 15•10 months ago
|
||
| Assignee | ||
Comment 16•10 months ago
|
||
After a lot of eyeballing the code, what I believe is happening is that SourceSurfaceCanvasRecording is getting allocated at the same address, so that they end up aliasing where we mistake an earlier version of the surface for the one we're supposed to be waiting for.
To work around this, I have made a patch that uses unique sequential "export" IDs for the surfaces instead of relying on their memory addresses, so that they can't alias in this scenario anymore. As a fail-safe, I have added some code to also adjust the surface size if for some reason we look up a surface, but it's not the expected size. This allows conversions to proceed even in the aliased case, even though I believe this should not actually happen anymore, but it's better than crashing in case I am wrong.
| Assignee | ||
Comment 17•10 months ago
|
||
Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Not easily.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: 137+
- If not all supported branches, which bug introduced the flaw?: Bug 1943241
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Not risky.
- How likely is this patch to cause regressions; how much testing does it need?: Unlikely.
- Is the patch ready to land after security approval is given?: Yes
- Is Android affected?: Yes
Comment 18•10 months ago
|
||
Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond
Approved to land and request uplift
| Assignee | ||
Comment 19•10 months ago
|
||
Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond
Beta/Release Uplift Approval Request
- User impact if declined/Reason for urgency: Potential heap buffer overrun and crashing when using Canvas2D + WebGL.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky):
- String changes made/needed:
- Is Android affected?: Yes
Comment 20•10 months ago
|
||
Comment 21•10 months ago
|
||
Updated•10 months ago
|
Comment 22•10 months ago
•
|
||
Here's how to search for this bug's crashes on mozilla-central, beta and release builds that contain this bug's patch:
Comment 23•10 months ago
|
||
Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond
Approved for 140.0b5.
Updated•10 months ago
|
Comment 24•10 months ago
|
||
| uplift | ||
Updated•10 months ago
|
Updated•10 months ago
|
| Reporter | ||
Comment 25•10 months ago
|
||
It can take a while for a crash to trigger (maybe 20 minutes in my case). I don't think the login screen is enough. Thanks for the patch!
Comment 26•10 months ago
•
|
||
(In reply to Yannis Juglaret [:yannis] from comment #25)
It can take a while for a crash to trigger (maybe 20 minutes in my case). I don't think the login screen is enough. Thanks for the patch!
So here's the obvious question: Can you still reproduce the crash with builds containing the patch?
Using my search from comment #22, I still haven't found any crashes in builds containing the patch. But the crashes happen infrequently enough that we can't be sure of a fix until the absence of crashes has lasted about a week (since the patch landed).
I also reproduced the duplication of the old (pre-patch) surfaceIds (when they were still pointers). This doesn't happen with the new (sequential) surfaceIds. I did this with a revised version of my HookCase hook library (which logs a bunch of things, including the lifecycle of canvas surfaces (type SurfaceDescriptor::TSurfaceDescriptorCanvasSurface)).
Comment 27•10 months ago
|
||
For what it's worth, here's a log (made with my current hook library) of two different canvas surfaces, with different sizes, sent one after the other with the same surfaceId. This was made using the last mozilla-central nightly not to contain this bug's patch (https://archive.mozilla.org/pub/firefox/nightly/2025/06/2025-06-02-21-11-30-mozilla-central/firefox-141.0a1.en-US.mac.dmg).
Comment 28•10 months ago
|
||
Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond
Approved for 139.0.4
Updated•10 months ago
|
Comment 29•10 months ago
|
||
| uplift | ||
Updated•10 months ago
|
Comment 30•10 months ago
|
||
Using my search from comment #22, I still haven't found any crashes in builds with this bug's patch. It's been a week since it landed on mozilla-central, and six days since it landed on beta. In the week prior to its landing there were 15 crashes. I think we can say that the patch has been verified.
| Reporter | ||
Comment 31•10 months ago
•
|
||
Thanks :smichaud! I agree this should be fixed now. I've also asked for feedback in the github issue linked to the bug.
Comment 33•10 months ago
|
||
Copying crash signatures from duplicate bugs.
| Reporter | ||
Updated•10 months ago
|
Comment 36•10 months ago
|
||
Copying crash signatures from duplicate bugs.
Comment 37•10 months ago
|
||
Copying crash signatures from duplicate bugs.
Comment 39•10 months ago
|
||
Copying crash signatures from duplicate bugs.
Comment 40•9 months ago
•
|
||
For the record, there are a few crashes on builds containing this bug's patch, which have signatures (and stacks) very similar to this bug's crashes, but which don't contain the UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH critical graphics error:
Unless the number of these increases dramatically, I don't think we have anything to worry about.
| Assignee | ||
Comment 41•9 months ago
|
||
(In reply to Steven Michaud [:smichaud] (Retired) from comment #40)
For the record, there are a few crashes on builds containing this bug's patch, which have signatures (and stacks) very similar to this bug's crashes, but which don't contain the
UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTHcritical graphics error:Unless the number of these increases dramatically, I don't think we have anything to worry about.
All of those signatures combined cast such a wide net that it is certain you will log some crash reports with one of those. The data still shows a dramatic fall off in crashes since the patch landed.
But I don't believe those exceptions represent the issue highlighted in this bug so far, and I would regard them as separate bugs. It would be better to deal with them separately in follow-ups rather than further encumber this particular bug.
Updated•6 months ago
|
Comment 42•6 months ago
•
|
||
(In reply to Lee Salzman [:lsalzman] from comment #41)
But I don't believe those exceptions represent the issue highlighted in this bug so far, and I would regard them as separate bugs. It would be better to deal with them separately in follow-ups rather than further encumber this particular bug.
I agree they're separate, intriguingly the majority of them are on theguardian website as indicated by several user comments, so they may be reproducible and worth exploring as another bug.
Updated•3 months ago
|
Comment 45•2 months ago
|
||
Copying crash signatures from duplicate bugs.
Comment 46•2 months ago
|
||
Copying crash signatures from duplicate bugs.
Description
•