Open Bug 1989317 Opened 2 months ago Updated 1 month ago

Crash in [@ mozilla::WebGLTexelConversions::unpack<T>]

Categories

(Core :: Graphics: CanvasWebGL, defect, P3)

Tracking Status
firefox-esr115 --- unaffected
firefox-esr140 --- unaffected
firefox143 --- wontfix
firefox144 --- affected
firefox145 --- affected

People

(Reporter: aryx, Assigned: bradwerth)

Details

(Keywords: crash)

Crash Data

28 crash reports for Firefox 143 branch, none for v142 branch.

Crash report: https://crash-stats.mozilla.org/report/index/420f5cd0-fedf-4fee-bd40-430450250918

Reason:

SIGSEGV / SEGV_ACCERR

Top 10 frames:

0  libxul.so  mozilla::WebGLTexelConversions::unpack<(mozilla::WebGLTexelFormat)27, unsigne...  /build/firefox/parts/firefox/build/dom/canvas/WebGLTexelConversions.h:635
0  libxul.so  mozilla::(anonymous namespace)::WebGLImageConverter::run<(mozilla::WebGLTexel...  /build/firefox/parts/firefox/build/dom/canvas/WebGLTexelConversions.cpp:222
0  libxul.so  mozilla::(anonymous namespace)::WebGLImageConverter::run<(mozilla::WebGLTexel...  /build/firefox/parts/firefox/build/dom/canvas/WebGLTexelConversions.cpp:264
0  libxul.so  mozilla::(anonymous namespace)::WebGLImageConverter::run<(mozilla::WebGLTexel...  /build/firefox/parts/firefox/build/dom/canvas/WebGLTexelConversions.cpp:284
0  libxul.so  mozilla::(anonymous namespace)::WebGLImageConverter::run<(mozilla::WebGLTexel...  /build/firefox/parts/firefox/build/dom/canvas/WebGLTexelConversions.cpp:304
1  libxul.so  mozilla::ConvertImage(unsigned long, unsigned long, void const*, unsigned lon...  /build/firefox/parts/firefox/build/dom/canvas/WebGLTexelConversions.cpp:486
2  libxul.so  mozilla::webgl::TexUnpackBlob::ConvertIfNeeded(mozilla::WebGLContext const*, ...  /build/firefox/parts/firefox/build/dom/canvas/TexUnpackBlob.cpp:463
3  libxul.so  mozilla::webgl::TexUnpackSurface::TexOrSubImage(bool, bool, mozilla::WebGLTex...  /build/firefox/parts/firefox/build/dom/canvas/TexUnpackBlob.cpp:1203
4  libxul.so  mozilla::WebGLTexture::TexImage(unsigned int, unsigned int, mozilla::avec3<un...  /build/firefox/parts/firefox/build/dom/canvas/WebGLTextureUpload.cpp:1110
5  libxul.so  mozilla::WebGLContext::TexImage(unsigned int, unsigned int, mozilla::avec3<un...  /build/firefox/parts/firefox/build/dom/canvas/WebGLContextTextures.cpp:200
Flags: needinfo?(bwerth)

Hmm... all the crashes are in the BGRA8 unpack. The error is an access error, and I think it is a read error. That line reads the first element of the src array (src[0]), but other accesses have already been successful.

I'll see if I can improve things here. For now, I'm marking it with a severity appropriate to a low-volume Linux crash on a specific chipset.

Assignee: nobody → bwerth
Severity: -- → S3
Flags: needinfo?(bwerth)
Priority: -- → P3

We have a variety of segfaults here, including SEGV_ACCERR and KERN_PROTECTION_FAILURE. I'm going to assume that we are dealing with a garbage source pointer, set to some arbitrary value.
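
If the source pointer is suspect, one cheap thing to rule out is a mismatch between the image geometry and the buffer behind the pointer. The following is a minimal sketch only, not code from the Firefox tree: `RequiredSourceBytes` and `SourceLooksReadable` are hypothetical helpers, and the assumption that the caller knows the buffer size (`availableBytes`) and a non-negative byte stride is mine. A check like this cannot detect a wild pointer; it only excludes the null-pointer and too-small-buffer cases.

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

// Hypothetical helper: number of bytes a row-by-row conversion would read
// from the source, or std::nullopt if the geometry is inconsistent or the
// computation would overflow. Assumes a non-negative stride in bytes.
inline std::optional<std::size_t> RequiredSourceBytes(
    std::size_t width, std::size_t height, std::size_t strideBytes,
    std::size_t bytesPerTexel) {
  if (width == 0 || height == 0) return std::size_t{0};  // nothing is read
  if (bytesPerTexel == 0) return std::nullopt;
  if (width > SIZE_MAX / bytesPerTexel) return std::nullopt;
  const std::size_t rowBytes = width * bytesPerTexel;
  if (strideBytes < rowBytes) return std::nullopt;  // rows would overlap
  // Every row but the last advances by a full stride; the last row only
  // needs rowBytes.
  const std::size_t fullRows = height - 1;
  if (fullRows > (SIZE_MAX - rowBytes) / strideBytes) return std::nullopt;
  return fullRows * strideBytes + rowBytes;
}

// Hypothetical pre-flight check a caller could run before converting.
inline bool SourceLooksReadable(const void* src, std::size_t availableBytes,
                                std::size_t width, std::size_t height,
                                std::size_t strideBytes,
                                std::size_t bytesPerTexel) {
  const auto needed =
      RequiredSourceBytes(width, height, strideBytes, bytesPerTexel);
  if (!needed) return false;
  if (*needed == 0) return true;
  return src != nullptr && *needed <= availableBytes;
}
```

For a 4x2 BGRA8 image with a 32-byte stride, the sketch requires 32 + 16 = 48 readable bytes.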

See Also: → 1979590

Bug 1979590 seems to be the Windows version of this. One thing to note is that a lot of the reports show bit flips, but not all of them. If these were truly bit flips, we would not expect them all to happen at the same site, would we? That seems quite unlikely.
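
For readers unfamiliar with the term: a "bit flip" here means a pointer or address that differs from a plausible value in exactly one bit, as a memory fault might produce. How the crash reporter actually classifies these is not described in this bug, so the following is only an illustrative sketch of the idea, and `CountSetBits` and `DiffersByOneBit` are hypothetical names.

```cpp
#include <cstdint>

// Count the set bits in v by clearing the lowest set bit until none remain.
constexpr int CountSetBits(std::uint64_t v) {
  int n = 0;
  while (v != 0) {
    v &= v - 1;
    ++n;
  }
  return n;
}

// True if `observed` can be turned into `expected` by flipping exactly one
// bit, i.e. their XOR has a single bit set.
constexpr bool DiffersByOneBit(std::uint64_t observed,
                               std::uint64_t expected) {
  return CountSetBits(observed ^ expected) == 1;
}

static_assert(DiffersByOneBit(0x7f1234567000ULL,
                              0x7f1234567000ULL ^ (1ULL << 40)),
              "a single flipped bit is detected");
static_assert(!DiffersByOneBit(0x0ULL, 0x3ULL),
              "two differing bits are not a single flip");
```

A random single-bit fault would be expected to corrupt arbitrary values at arbitrary sites, which is why a cluster at one call site argues against that explanation.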

(In reply to Brad Werth [:bradwerth] from comment #1)

> but other accesses have already been successful

Are you referring to the following (where the crash location is indicated by an arrow)?

  template <>
  MOZ_ALWAYS_INLINE void unpack<WebGLTexelFormat::BGRA8, uint8_t, uint8_t>(
      const uint8_t* __restrict src, uint8_t* __restrict dst) {
    dst[0] = src[2];
    dst[1] = src[1];
->  dst[2] = src[0];
    dst[3] = src[3];
  }

On AMD64 machines (on macOS at least), this is implemented as follows. r14 is srcRowStart (== mSrcStart). xmm0 holds the shuffle mask 0x03000102.

movd       xmm1, dword [r14 + rcx]
pshufb     xmm1, xmm0

The entire copy operation happens at once, followed by the byte shuffling. The copy operation is the first attempt to dereference the invalid pointer in r14. So in fact there have been no prior accesses.
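
To make the equivalence concrete, here is a minimal portable sketch of the two forms: the byte-by-byte source-level version, and a "wide" version that does one 4-byte load, one shuffle and one 4-byte store. `ShuffleLow4` is a hypothetical helper that models only the low four bytes of `pshufb` for mask bytes in the range 0..3, and the wide version assumes a little-endian target (as AMD64 and ARM64 are here). It is an illustration, not the code the compiler emits.

```cpp
#include <cstdint>
#include <cstring>

// Byte-by-byte version, mirroring the source-level specialization.
inline void UnpackBGRA8Bytewise(const std::uint8_t* src, std::uint8_t* dst) {
  dst[0] = src[2];
  dst[1] = src[1];
  dst[2] = src[0];
  dst[3] = src[3];
}

// Simplified model of pshufb on the low four bytes: output byte i takes the
// input byte selected by byte i of the mask (mask bytes must be 0..3).
inline std::uint32_t ShuffleLow4(std::uint32_t value, std::uint32_t mask) {
  std::uint32_t out = 0;
  for (int i = 0; i < 4; ++i) {
    const std::uint32_t index = (mask >> (8 * i)) & 0x3u;
    const std::uint32_t byte = (value >> (8 * index)) & 0xFFu;
    out |= byte << (8 * i);
  }
  return out;
}

// Wide version: the memcpy load is the only read of src, so an invalid src
// faults here before any byte has been produced. Little-endian only.
inline void UnpackBGRA8Wide(const std::uint8_t* src, std::uint8_t* dst) {
  std::uint32_t texel;
  std::memcpy(&texel, src, sizeof(texel));
  texel = ShuffleLow4(texel, 0x03000102u);
  std::memcpy(dst, &texel, sizeof(texel));
}
```

Read from the least significant byte upward, the mask 0x03000102 selects source bytes 2, 1, 0, 3, which is the same BGRA to RGBA swap as the four assignments.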

On ARM64 machines (on macOS at least) there are four copy operations, each for a single byte. But for some reason these crashes don't happen on ARM64 machines, on macOS or Windows. They do happen on Linux, but I haven't been able to look at the Linux XUL (ARM64 or AMD64) in a disassembler. I suspect the copy operation is also unitary there.

On ARM64 machines (on macOS at least) there is also a single copy operation for four bytes at once, followed by four instructions that do the byte shuffling. For some reason this bug's crashes don't happen on ARM64 machines, on macOS or Windows. So it took me a while to find the right machine code. Here x27 is srcRowStart (== mSrcStart).

ldr        s0, [x27, x9]
ushll      v0.8h, v0.8b, #0x0
rev32      v0.4h, v0.4h
ext        v0.8b, v0.8b, v0.8b, #0x6
uzp1       v0.8b, v0.8b, v0.8b

For reference:

The following code snippet is where this bug's crashes happen. Their location is indicated by an arrow.

  while (srcPtr != srcRowEnd) {
    // convert a single texel. We proceed in 4 steps: unpack the source
    // texel to the corresponding interchange format (e.g. unpack RGB565 to
    // texel so the corresponding interchange format (e.g. unpack RGB565 to
    // RGBA8), do colorSpace conversion if necessary, convert the resulting
    // data type to the destination type (e.g. convert from RGBA8 to
    // RGBA32F), and finally pack the destination texel (e.g. pack RGBA32F
    // to RGB32F).
    IntermediateSrcType unpackedSrc[MaxElementsPerTexel];
    IntermediateDstType unpackedDst[MaxElementsPerTexel];

    // unpack a src texel to corresponding intermediate src format.
    // for example, unpack RGB565 to RGBA8
->  unpack<SrcFormat>(srcPtr, unpackedSrc);

    if (!sameColorSpace) {
      // do colorSpace conversion, which leaves alpha untouched
      float srcAsFloat[MaxElementsPerTexel];
      convertType(unpackedSrc, srcAsFloat);
      auto inTexelVec =
          color::vec3({srcAsFloat[0], srcAsFloat[1], srcAsFloat[2]});
      auto outTexelVec = conversion.DstFromSrc(inTexelVec);
      srcAsFloat[0] = outTexelVec[0];
      srcAsFloat[1] = outTexelVec[1];
      srcAsFloat[2] = outTexelVec[2];
      convertType(srcAsFloat, unpackedSrc);
    }

    // convert the data type to the destination type, if needed.
    // for example, convert RGBA8 to RGBA32F
    convertType(unpackedSrc, unpackedDst);
    // pack the destination texel.
    // for example, pack RGBA32F to RGB32F
    pack<DstFormat, PremultiplicationOp>(unpackedDst, dstPtr);

    srcPtr += NumElementsPerSrcTexel;
    dstPtr += NumElementsPerDstTexel;
  }
  srcRowStart += srcStrideInElements;
  dstRowStart += dstStrideInElements;
}

In today's macOS trunk nightly this is implemented on AMD64 machines as follows:

   000000000405a347         xor        eax, eax
   000000000405a349         movd       xmm0, dword [float_value_minus_32639_5+500] ; 0x862e890
   
                        loc_405a351:
   000000000405a351         mov        rdx, qword [rsi]
   000000000405a354         shl        rdx, 0x2
   000000000405a358         test       rdx, rdx
   000000000405a35b         je         loc_405a37d
   
   000000000405a35d         xor        ecx, ecx
   
                        loc_405a35f:
-> 000000000405a35f         movd       xmm1, dword [r14+rcx]
   000000000405a365         pshufb     xmm1, xmm0
   000000000405a36a         movd       dword [r15+rcx], xmm1
   000000000405a370         add        rcx, 0x4
   000000000405a374         cmp        rdx, rcx
   000000000405a377         jne        loc_405a35f
   
   000000000405a379         mov        rcx, qword [rsi+8]
   
                        loc_405a37d:
   000000000405a37d         add        r14, r12
   000000000405a380         add        r15, r13
   000000000405a383         inc        rax
   000000000405a386         cmp        rax, rcx
   000000000405a389         jb         loc_405a351

On ARM64 machines it's implemented as follows:

00000000039b7e8c         mov        x8, #0x0

                     loc_39b7e90:
00000000039b7e90         ldr        x10, [x19]
00000000039b7e94         lsl        x10, x10, #0x2
00000000039b7e98         cbz        x10, loc_39b7ec8

00000000039b7e9c         mov        x9, #0x0

                     loc_39b7ea0:
00000000039b7ea0         ldr        s0, [x27, x9]
00000000039b7ea4         ushll      v0.8h, v0.8b, #0x0
00000000039b7ea8         rev32      v0.4h, v0.4h
00000000039b7eac         ext        v0.8b, v0.8b, v0.8b, #0x6
00000000039b7eb0         uzp1       v0.8b, v0.8b, v0.8b
00000000039b7eb4         str        s0, [x28, x9]
00000000039b7eb8         add        x9, x9, #0x4
00000000039b7ebc         cmp        x10, x9
00000000039b7ec0         b.ne       loc_39b7ea0

00000000039b7ec4         ldr        x9, [x19, #0x8]

                     loc_39b7ec8:
00000000039b7ec8         add        x27, x27, x25
00000000039b7ecc         add        x28, x28, x26
00000000039b7ed0         add        x8, x8, #0x1
00000000039b7ed4         cmp        x8, x9
00000000039b7ed8         b.lo       loc_39b7e90
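
Both listings have the same shape, which can be modeled in a few lines. This is a simplified sketch under my own reading of the disassembly, not the Firefox source: it assumes the BGRA8 to RGBA8 case with no color-space conversion or premultiplication, and the function name, parameter names and register mapping in the comments are illustrative rather than confirmed.

```cpp
#include <cstddef>
#include <cstdint>

// Simplified model of the compiled loops for BGRA8 -> RGBA8.
// One reading of the listings (an assumption, apart from srcRowStart):
//   srcRowStart ~ r14 / x27     dstRowStart ~ r15 / x28
//   srcStride   ~ r12 / x25     dstStride   ~ r13 / x26
//   offset      ~ rcx / x9      row         ~ rax / x8
inline void ConvertBGRA8ToRGBA8(const std::uint8_t* srcRowStart,
                                std::uint8_t* dstRowStart, std::size_t width,
                                std::size_t height,
                                std::ptrdiff_t srcStrideBytes,
                                std::ptrdiff_t dstStrideBytes) {
  for (std::size_t row = 0; row < height; ++row) {
    const std::size_t rowBytes = width * 4;  // the shl/lsl by 2
    for (std::size_t offset = 0; offset != rowBytes; offset += 4) {
      // The first read of each texel is where an invalid source row
      // pointer would fault (the arrowed movd / the ldr).
      const std::uint8_t* src = srcRowStart + offset;
      std::uint8_t* dst = dstRowStart + offset;
      const std::uint8_t b = src[0], g = src[1], r = src[2], a = src[3];
      dst[0] = r;
      dst[1] = g;
      dst[2] = b;
      dst[3] = a;
    }
    // Advance to the next row by the stride, which may include padding.
    srcRowStart += srcStrideBytes;
    dstRowStart += dstStrideBytes;
  }
}
```

Because the row pointers advance by the stride rather than by the row length, a wrong stride or height would walk the source pointer off the end of the buffer in later rows, while a garbage starting pointer would fault on the very first load.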

Firefox 144 was released on 2025-10-14, but has experienced none of this bug's crashes. There've been a bunch on the 144.0a1 branch, and one on the 145.0a1 branch (on Windows 10). So it's possible these crashes have been "fixed" in the 144 branch (and later) releases, though not yet on the trunk.
