Closed Bug 1966083 (CVE-2025-49709) Opened 11 months ago Closed 10 months ago

Implementation bug UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH with pokerogue.net, possibly resulting in OOB reads

Categories

(Core :: Graphics: CanvasWebGL, defect, P2)

defect

Tracking

()

VERIFIED FIXED
141 Branch
Tracking Status
firefox-esr115 --- unaffected
firefox-esr128 --- unaffected
firefox138 --- wontfix
firefox139 + fixed
firefox140 + fixed
firefox141 + fixed

People

(Reporter: yannis, Assigned: lsalzman)

References

(Blocks 1 open bug, Regression, )

Details

(Keywords: csectype-bounds, regression, sec-high, Whiteboard: [adv-main139.0.4+])

Crash Data

Attachments

(2 files)

https://pokerogue.net/ is a recurrent URL found in the crash signatures attached to bug 1957421 (Windows) and bug 1963920 (macOS). This website hosts an open source browser game (source code). During my investigation I ran this website in a debug build with the goal to exercise some code paths and gather some hints.

Just loading the login screen of the game often results in the following issue: webgl::ExplicitPixelPackingState::ForUseWith returns Err("UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH.") with the following call stack:

00 xul!mozilla::webgl::ExplicitPixelPackingState::ForUseWith+0x315 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLContext.cpp @ 2780] 
01 xul!mozilla::webgl::TexUnpackBlobDesc::ExplicitUnpacking+0x38 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLTypes.h @ 1199] 
02 xul!mozilla::webgl::TexUnpackSurface::TexOrSubImage+0x643 [C:\mozilla-source\mozilla-unified\dom\canvas\TexUnpackBlob.cpp @ 1095] 
03 xul!mozilla::WebGLTexture::TexImage+0x904 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLTextureUpload.cpp @ 1110] 
04 xul!mozilla::WebGLContext::TexImage+0x153 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLContextTextures.cpp @ 207] 
05 xul!mozilla::HostWebGLContext::TexImage+0x53 [C:\mozilla-source\mozilla-unified\dom\canvas\HostWebGLContext.h @ 574] 
06 xul!mozilla::dom::WebGLParent::RecvTexImage+0x73 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLParent.cpp @ 109] 
07 xul!mozilla::dom::PWebGLParent::OnMessageReceived+0xcc7 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PWebGLParent.cpp @ 471] 
08 xul!mozilla::gfx::PCanvasManagerParent::OnMessageReceived+0x2cc [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PCanvasManagerParent.cpp @ 261]

When I look at the comparison that triggers the error, there is always a gap of 114 between UNPACK_SKIP_PIXELS + width and UNPACK_ROW_LENGTH (e.g. 576 vs. 462, 561 vs 447, 594 vs. 480). In debug builds, this error causes a GPU process crash right off the bat, with the following call stack:

00 xul!AnnotateMozCrashReason+0x11 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\Assertions.h @ 59] 
01 xul!mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::WriteLog+0xa3 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\gfx\Logging.h @ 833] 
02 xul!mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::Flush+0x96 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\gfx\Logging.h @ 276] 
03 xul!mozilla::gfx::Log<1,mozilla::gfx::CriticalLogger>::~Log+0xab [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\dist\include\mozilla\gfx\Logging.h @ 269] 
04 xul!mozilla::webgl::TexUnpackSurface::TexOrSubImage+0xdac [C:\mozilla-source\mozilla-unified\dom\canvas\TexUnpackBlob.cpp @ 1106] 
05 xul!mozilla::WebGLTexture::TexImage+0x904 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLTextureUpload.cpp @ 1110] 
06 xul!mozilla::WebGLContext::TexImage+0x153 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLContextTextures.cpp @ 207] 
07 xul!mozilla::HostWebGLContext::TexImage+0x53 [C:\mozilla-source\mozilla-unified\dom\canvas\HostWebGLContext.h @ 574] 
08 xul!mozilla::dom::WebGLParent::RecvTexImage+0x73 [C:\mozilla-source\mozilla-unified\dom\canvas\WebGLParent.cpp @ 109] 
09 xul!mozilla::dom::PWebGLParent::OnMessageReceived+0xcc7 [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PWebGLParent.cpp @ 471] 
0a xul!mozilla::gfx::PCanvasManagerParent::OnMessageReceived+0x2cc [C:\mozilla-source\mozilla-unified\obj-x86_64-pc-windows-msvc\ipc\ipdl\PCanvasManagerParent.cpp @ 261]

The DEBUG crash occurs because we go through the following lines from dom/canvas/TexUnpackBlob.cpp:

  if (!dstUnpackingRes.isOk()) {
    // This will force a crash in DEBUG, the error is UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH.
    gfxCriticalError() << dstUnpackingRes.inspectErr();
    // This would also force a crash in DEBUG if we did not already crash at the line before.
    webgl->ErrorImplementationBug("ExplicitUnpacking failed: %s",
                                  dstUnpackingRes.inspectErr().c_str());
    return false;
  }

In release, we don't crash but return false and continue. This makes me wonder if the crashes in bug 1957421 (Windows) and bug 1963920 could be a later consequence of the same original issue? They look like out-of-bound reads where we pass a larger size value to memcpy than the actual size of the source buffer, which seems like it could be a consequence of the thing we catch in DEBUG.

Group: core-security → gfx-core-security

I suspect you're right, :yannis, that the UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH overflows cause the crashes in bug 1963920, bug 1959138 and elsewhere.

We don't yet know how, exactly, they happen on non-debug builds. But I suspect VM_PROT_NONE is involved. See bug 1963920 comment #22.

If I look for Nightly crashes between 2025-01-01 to 2025-03-01 with TexUnpackSurface::TexOrSubImage in the proto signature (which seems to be a common denominator for the crashes we are looking at), we see a single crash for all nightly builds from January, compared to 81 for the February builds. The first February crash is with build ID 20250206094455, and then we almost always get at least one crash each day.

This makes the patches from bug 1943241 highly suspicious to me, as they are the only modifications to dom/canvas/TexUnpackBlob.cpp that landed within that time frame (this file is seldom touched), and they landed on 2025-02-05. :lsalzman, as the author of these patches, could you take a look?

I mentioned that these could potentially be out-of-bounds read crashes because when you do OOB reads, either you end up on a mapped address and read dummy data and continue, or you crash because you end up on an unmapped address. To explore this possibility we would need to double check if and how in the lines below, dstBegin can end up being of unsufficient size:

  *out_error =
      DoTexOrSubImage(isSubImage, gl, mDesc.imageTarget, level, dui, xOffset,
                      yOffset, zOffset, size.x, size.y, size.z, dstBegin);

Since the crashes end up using fTexImage2D, I think we can restrict ourselves to the case where isSubImage is false, so that would mean investigating how the size of the buffer behind dstBegin may end up being too small given size.x, size.y, dui->unpackFormat and dui->unpackType.

I'm not sure if or how this relates to UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH. For example this string appears in the graphics critical error field of 51 crash reports out of the 81 mentioned above. So it is a strong correlation, but sometimes it isn't present. It could just be a non-causal correlation, where pokerogue.net happens to be a good reproducer for both the crashes and the UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH thing.

Flags: needinfo?(lsalzman)

Also this github issue seems very related: https://github.com/pagefaultgames/pokerogue/issues/5649

Somebody seems able to reproduce the crash, I'll ask them if they can help us debug.

Edit: Also these Reddit threads:

https://www.reddit.com/r/pokerogue/comments/1k1q71e/game_keeps_crashing_firefox/
https://www.reddit.com/r/pokerogue/comments/1js88ti/game_isnt_working_on_firefox/
https://www.reddit.com/r/pokerogue/comments/1jrvpnn/crashing/

These suggest that it's pretty easy to reproduce by playing, so I'll try that as well.

I was able to reproduce the crash from bug 1957421 myself by playing the game. I now have a time-travel trace of the crash happening in my GPU process, so I should be able to extract relevant information from it. I'll update this bug accordingly.

The time-travel trace indeed points to an OOB read problem, due to a mismatch between mDesc->size and surf->GetSize().

mozilla::webgl::TexUnpackSurface::TexOrSubImage gets called with mDesc->size.x set to 1274 and mDesc->size.y set to 137. These values come from the TexUnpackBlobDesc structure sent across the process boundary over IPDL by PWebGLChild::SendTexImage. Within mozilla::webgl::TexUnpackSurface::TexOrSubImage, we get a surface through the gfx::CanvasManagerParent::GetCanvasSurface path added by bug 1943241. Then we reach the call to ConvertIfNeeded, where we pass a surf->GetSize().width of 1310 and a surf->GetSize().height of 69. Consequently it will do a calloc of 1310 * 69 * 4, i.e. 361560. However, what we pass to DoTexOrSubImage is the values received over IPDL. For the call to fTexImage2D to be valid, the pixels buffer should be of size 1274 * 137 * 4, i.e. 698152, but it is only of size 361560 because of the mismatch, resulting in an OOB read and a crash.

I'm marking bug 1943241 as a regressor based on this information. :lsalzman, let me know if there's anything more you'd want me to check with the time-travel trace.

Values at the ConvertIfNeeded call site:

                this = 0x00000191`f80c8060
               webgl = 0x00000191`7d962800
           rowLength = 0x51e
            rowCount = 0x45
           srcFormat = BGRA8 (0n27)
            srcBegin = 0x00000191`f2971000 ""
           srcStride = 0n5240
           dstFormat = RGBA8 (0n20)
           dstStride = 0n5240
           out_begin = 0x00000033`667fe8a8
  out_anchoredBuffer = 0x00000033`667fe828
           unpacking = <value unavailable>
        srcIsPremult = <value unavailable>
        dstIsPremult = <value unavailable>
fnHasPremultMismatch = <value unavailable>
           srcOrigin = <value unavailable>
           dstOrigin = BottomLeft (0n1)
       srcColorSpace = Srgb (0n0)
       dstColorSpace = <value unavailable>
       dstTotalBytes = <value unavailable>
           dstBuffer = class mozilla::UniqueBuffer
            dstBegin = <value unavailable>
          wasTrivial = <value unavailable>

Values at the mozilla::gl::GLContext::fTexImage2D call site:

             gl = 0x00000191`09aea800
         target = class StrongGLenum<TexImageTargetDetails>
          level = 0n0
            dui = 0x00000191`7def6f04
          width = 0n1274
         height = 0n137
          depth = 0n1
           data = 0x00000192`05599000
     errorScope = class mozilla::gl::GLContext::LocalErrorScope
         border = 0n0
Keywords: regression
Regressed by: 1943241

Set release status flags based on info from the regressing bug 1943241

I'll mark this sec-high. It sounds a bit sandbox-escape-y if this is happening in the parent process in response to a child process message?

This looks pretty similar to bug 1966083.

(In reply to Jason Kratzer [:jkratzer] from comment #8)

This looks pretty similar to bug 1966083.

Per discussion with Tyson, I think Jason meant bug 1963341.

See Also: → 1963341

Oops - my apologies. I copied the wrong bug ID.

Regression from some work Lee did. sec-high.

Blocks: gfx-triage
Severity: -- → S2
Priority: -- → P2

I've figured out how to use a HookCase hook library to emulate these crashes on macOS. See bug 1963920 comment #23 and following.

If my emulation is accurate, we should be able to get rid of these crashes simply by ensuring that the width and height values passed to mozilla::gl::GLContext::raw_fTexImage2D() are correct.

To make things a bit clearer: I suspect that the call to calloc() (with size 361560) in comment #5 is correct, and that it's the width (1274) and height (137) parameters passed to mozilla::gl::GLContext::fTexImage2D() that are incorrect.

Set release status flags based on info from the regressing bug 1943241

Assignee: nobody → lsalzman

I've tried reproducing this issue with pokerogue.net, but I can't manage to get any disparity between the actual surface size and the IPDL sizes when I go to the login screen there. Is there any more reliable way to reproduce this?

Flags: needinfo?(lsalzman) → needinfo?(yjuglaret)

After a lot of eyeballing the code, what I believe is happening is that SourceSurfaceCanvasRecording is getting allocated at the same address, so that they end up aliasing where we mistake an earlier version of the surface for the one we're supposed to be waiting for.

To work around this, I have made a patch that uses unique sequential "export" IDs for the surfaces instead of relying on their memory addresses, so that they can't alias in this scenario anymore. As a fail-safe, I have added some code to also adjust the surface size if for some reason we look up a surface, but it's not the expected size. This allows conversions to proceed even in the aliased case, even though I believe this should not actually happen anymore, but it's better than crashing in case I am wrong.

Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Not easily.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which branches (beta, release, and/or ESR) are affected by this flaw, and do the release status flags reflect this affected/unaffected state correctly?: 137+
  • If not all supported branches, which bug introduced the flaw?: Bug 1943241
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Not risky.
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely.
  • Is the patch ready to land after security approval is given?: Yes
  • Is Android affected?: Yes
Attachment #9491850 - Flags: sec-approval?

Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond

Approved to land and request uplift

Attachment #9491850 - Flags: sec-approval? → sec-approval+

Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond

Beta/Release Uplift Approval Request

  • User impact if declined/Reason for urgency: Potential heap buffer overrun and crashing when using Canvas2D + WebGL.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String changes made/needed:
  • Is Android affected?: Yes
Attachment #9491850 - Flags: approval-mozilla-release?
Attachment #9491850 - Flags: approval-mozilla-beta?
Group: gfx-core-security → core-security
Status: NEW → RESOLVED
Closed: 10 months ago
Resolution: --- → FIXED
Group: core-security → core-security-release

Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond

Approved for 140.0b5.

Attachment #9491850 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
QA Whiteboard: [sec] [qa-triage-done-c141/b140]
Flags: qe-verify-

It can take a while for a crash to trigger (maybe 20 minutes in my case). I don't think the login screen is enough. Thanks for the patch!

Flags: needinfo?(yjuglaret)

(In reply to Yannis Juglaret [:yannis] from comment #25)

It can take a while for a crash to trigger (maybe 20 minutes in my case). I don't think the login screen is enough. Thanks for the patch!

So here's the obvious question: Can you still reproduce the crash with builds containing the patch?

Using my search from comment #22, I still haven't found any crashes in builds containing the patch. But the crashes happen infrequently enough that we can't be sure of a fix until the absence of crashes has lasted about a week (since the patch landed).

I also reproduced the duplication of the old (pre-patch) surfaceIds (when they were still pointers). This doesn't happen with the new (sequential) surfaceIds. I did this with a revised version of my HookCase hook library (which logs a bunch of things, including the lifecycle of canvas surfaces (type SurfaceDescriptor::TSurfaceDescriptorCanvasSurface)).

For what it's worth, here's a log (made with my current hook library) of two different canvas surfaces, with different sizes, sent one after the other with the same surfaceId. This was made using the last mozilla-central nightly not to contain this bug's patch (https://archive.mozilla.org/pub/firefox/nightly/2025/06/2025-06-02-21-11-30-mozilla-central/firefox-141.0a1.en-US.mac.dmg).

Comment on attachment 9491850 [details]
Bug 1966083 - Support canvas export surfaces. r?aosmond

Approved for 139.0.4

Attachment #9491850 - Flags: approval-mozilla-release? → approval-mozilla-release+
Target Milestone: --- → 141 Branch
Alias: CVE-2025-49709

Using my search from comment #22, I still haven't found any crashes in builds with this bug's patch. It's been a week since it landed on mozilla-central, and six days since it landed on beta. In the week prior to its landing there were 15 crashes. I think we can say that the patch has been verified.

Status: RESOLVED → VERIFIED

Thanks :smichaud! I agree this should be fixed now. I've also asked for feedback in the github issue linked to the bug.

See Also: → 1961479
Duplicate of this bug: 1961479

Copying crash signatures from duplicate bugs.

Crash Signature: [@ 70b25bcc-1a24-7019-05f9-973de10c450e]
Crash Signature: [@ 70b25bcc-1a24-7019-05f9-973de10c450e]
Duplicate of this bug: 1957421
Duplicate of this bug: 1963920

Copying crash signatures from duplicate bugs.

Crash Signature: [@ glgCopyRowsWithMemCopy]

Copying crash signatures from duplicate bugs.

Crash Signature: [@ glgCopyRowsWithMemCopy] → [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>]
Duplicate of this bug: 1959138

Copying crash signatures from duplicate bugs.

Crash Signature: [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] → [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] [@ glrWriteTextureData]
Crash Signature: [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] [@ glrWriteTextureData] → [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] [@ glrWriteTextureData]

(In reply to Steven Michaud [:smichaud] (Retired) from comment #40)

For the record, there are a few crashes on builds containing this bug's patch, which have signatures (and stacks) very similar to this bug's crashes, but which don't contain the UNPACK_SKIP_PIXELS + width > UNPACK_ROW_LENGTH critical graphics error:

https://crash-stats.mozilla.org/search/?proto_signature=~raw_fTexImage2D&proto_signature=~fTexSubImage2D&build_id=%3E%3D20250603093015&graphics_critical_error=%21~UNPACK_SKIP_PIXELS%20%2B%20width%20%3E%20UNPACK_ROW_LENGTH&version=141.0a1&version=140.0b&version=139.0.4&date=%3E%3D2025-05-17T15%3A54%3A00.000Z&date=%3C2025-06-17T15%3A54%3A00.000Z&_facets=signature&_facets=platform_version&_facets=proto_signature&_facets=address&_facets=reason&_facets=platform&_sort=-date&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform_version&_columns=platform#facet-signature

Unless the number of these increases dramatically, I don't think we have anything to worry about.

All of those signatures combined cast such a wide net that it is certain you will log some crash reports with one of those. The data still shows a dramatic fall off in crashes since the patch landed.

But I don't believe those exceptions represent the issue highlighted in this bug so far, and I would regard them as separate bugs. It would be better to deal with them separately in follow-ups rather than further encumber this particular bug.

See Also: → 1979590
Whiteboard: [adv-main139.0.4+]

(In reply to Lee Salzman [:lsalzman] from comment #41)

But I don't believe those exceptions represent the issue highlighted in this bug so far, and I would regard them as separate bugs. It would be better to deal with them separately in follow-ups rather than further encumber this particular bug.

I agree they're separate, intriguingly the majority of them are on theguardian website as indicated by several user comments, so they may be reproducible and worth exploring as another bug.

See Also: → 1989317
Group: core-security-release
Duplicate of this bug: 1979590
Duplicate of this bug: 1989317

Copying crash signatures from duplicate bugs.

Crash Signature: [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] [@ glrWriteTextureData] → [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] [@ glrWriteTextureData] [@ mozilla::WebGLTexelConversions::unpack<T>]

Copying crash signatures from duplicate bugs.

Crash Signature: [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] [@ glrWriteTextureData] [@ mozilla::WebGLTexelConversions::unpack<T>] → [@ glgCopyRowsWithMemCopy] [@ memcpy | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_Intel | angle::LoadToNative<T>] [@ memcpy_avx_ermsb_amd | angle::LoadToNative<T>] [@ glrWriteTextureData] [@ mozilla::WebGLTexelConversions::unpack<T>] [@ mozilla::WebG…
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: