Closed Bug 1787959 Opened 9 months ago Closed 9 months ago

crash in [@ DispatchCommand]

Categories

(Core :: Graphics: CanvasWebGL, defect)

defect

Tracking

()

RESOLVED FIXED
106 Branch
Tracking Status
firefox-esr91 --- disabled
firefox-esr102 --- disabled
firefox104 --- disabled
firefox105 --- disabled
firefox106 --- fixed

People

(Reporter: tsmith, Assigned: lsalzman)

References

(Blocks 1 open bug, Regression)

Details

(4 keywords, Whiteboard: [bugmon:bisected,confirmed][post-critsmash-triage])

Attachments

(4 files, 2 obsolete files)

Attached file testcase.html (obsolete) —

Found while fuzzing m-c 20220829-ad01d1ce5556 (--enable-debug --enable-fuzzing)

To reproduce via Grizzly Replay:

$ pip install fuzzfetch grizzly-framework
$ python -m fuzzfetch -d --fuzzing -n firefox
$ python -m grizzly.replay ./firefox/firefox testcase.html --xvfb

I have only managed to reproduce this crash on a fuzzing debug build. I tried with a fuzzing debug build with all optimization disabled to get a rr trace but I was unable to trigger the issue.

The stack trace output by the UBSan does not contain any entries

==15373==ERROR: UndefinedBehaviorSanitizer: SEGV on unknown address 0x7f58ff000000 (pc 0x7f58ff000000 bp 0x7f586c0b5d60 sp 0x7f586c0b5ce0 T15455)
==15373==The signal is caused by a READ memory access.

While reducing the crash I did see

==161106==ERROR: UndefinedBehaviorSanitizer: ILL on unknown address 0x7f4cff00006b (pc 0x7f4cff00006b bp 0x7f4cac08ad60 sp 0x7f4cac08ace0 T161153)
    #0 0x7f4cff00006b  (/home/user/workspace/browsers/m-c-20220829094551-fuzzing-debug/libxul.so+0xeb4406b) (BuildId: 60b5d7c9cd5810ce8ca90721bfc74fce76e23c30)

The attached stack is from the Pernosco from a rr trace I collected with a optimized debug build.

The Pernosco session is available here: https://pernos.co/debug/vJiv1NU7XCucv93B0tXcJw/index.html

start_thread () at pthread_create.c:463
_pt_root () at ptthread.c:201
ThreadFunc () at nsThread.cpp:384
Run () at message_loop.cc:356
RunHandler () at message_loop.cc:374
RunInternal () at message_loop.cc:381
Run () at MessagePump.cpp:300
NS_ProcessNextEvent () at nsThreadUtils.cpp:465
ProcessNextEvent () at nsThread.cpp:1199
Run () at MessageChannel.cpp:1578
RunMessage () at MessageChannel.cpp:1480
DispatchMessage () at MessageChannel.cpp:1680
DispatchAsyncMessage () at MessageChannel.cpp:1755
OnMessageReceived () at PCanvasManagerParent.cpp:214
OnMessageReceived () at PWebGLParent.cpp:243
RecvDispatchCommands () at WebGLParent.cpp:64
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
DispatchCommand<mozilla::HostWebGLContext> () at WebGLCommandQueue.h:251
0x3fbd4273cc80+3163304832
Flags: in-testsuite?

Bugmon Analysis
Verified bug as reproducible on mozilla-central 20220829214526-dfe1912f7a46.
The bug appears to have been introduced in the following build range:

Start: 1e32960788ea12bea041aaab3d46c7866f8fe87c (20220712093327)
End: 26d422c0f19648d56e3310bb1aedb8ed1bbe6c09 (20220712074830)
Pushlog: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=1e32960788ea12bea041aaab3d46c7866f8fe87c&tochange=26d422c0f19648d56e3310bb1aedb8ed1bbe6c09

Keywords: regression
Whiteboard: [bugmon:bisected,confirmed]

Lee, could this be a regression from one of your patches in the regression range in comment 1? Thanks.

Flags: needinfo?(lsalzman)

The stack trace, even in the pernosco session, seems pretty bare here and doesn't tell me much. Is there any way to get a better trace, Tyson?

Flags: needinfo?(lsalzman) → needinfo?(twsmith)

My patches only were focused on accelerated canvas2d. The attached testcase seems to be specifically using WebGL and not Canvas2D at all, so I would guess whatever crash is here predates my patches.

Superficially it seems like the crash is happening in the DoReadPixels path in WebGL (just by inspection of the testcase), but the stack trace doesn't confirm this at all because there's not enough debug info in it. My patches definitely don't touch that at all.

(In reply to Lee Salzman [:lsalzman] from comment #3)

The stack trace, even in the pernosco session, seems pretty bare here and doesn't tell me much. Is there any way to get a better trace, Tyson?

Sure I'll see what I can do. Seems like by the time we crash things are pretty messed up. I'm not sure why ASan does not detect this.

Is there extra logging I can enable that might also help point us in the right direction?

Attached file sanitizer_stack.txt (obsolete) —
Attached file stderr.txt

I was able to get a sanitizer stack by setting UBSAN_OPTIONS=fast_unwind_on_fatal=true in the environment but it doesn't look much better. I also added the stderr output.

I'm not sure if this is helpful but the test case does not trigger the issue if pref webgl.out-of-process=false is set.

Without better traceback information this bug isn't really actionable. At least the circumstantial evidence (with respect to 'readPixels') casts doubt on bugmon's regression range. This bug doesn't seem to repro outside of that debug build and the pernosco session won't let me inspect anything. I'm not having any luck reproducing this in my local builds to try to work around that, so unless I run into some luck with a couple more builds, I think we're going to be stalled on this one without a more reliable testcase.

Flags: needinfo?(twsmith)

Initially it seemed like this might be thread stack size related in the CanvasRenderer thread, and I tried to reproduce it in local builds by modifying the stack size up or down, but I had no luck with that route of exploration. So I am basically stumped on how to reproduce this locally at all. Given that this doesn't effect release/opt builds seemingly, or even my local debug builds either, I am going to put this on the backburner until better evidence comes along.

Sotaro, is there any chance this crash might be related to the move to the CanvasRenderer thread?

Flags: needinfo?(sotaro.ikeda.g)
Attached file sanitizer_stack.txt
Attachment #9292356 - Attachment is obsolete: true
Attached file testcase.html

jkratzer suggested I try changing the "y" parameter in the readPixels() call. This lead to a better stack trace and I was able to reproduce the issue in a no-opt build.

I will have an updated Pernosco trace shortly.

Attachment #9292144 - Attachment is obsolete: true

A Pernosco session is available here: https://pernos.co/debug/HObA9_WX-iSaoN5cuky3HQ/index.html

Sorry for the delay there was a hiccup on the Pernosco side.

It seems like this DEBUG-only code here is the culprit: https://searchfox.org/mozilla-central/source/dom/canvas/WebGLContext.cpp#941

The GL packing state leaks in, causing it to write to the last offsets specified by the packing state. Since the buffer it was writing to was on the stack, when it messed up, it overwrote the stack and screwed up all the stack traces we were seeing. For now I have a tentative fix that just resets the packing state before the call to fReadPixels.

This debug code was introduced in bug 1427668.

Release and opt builds are unaffected though since this code is debug-only, so it shouldn't represent a threat to normal users, which is really good news.

Assignee: nobody → lsalzman
Status: NEW → ASSIGNED
Keywords: sec-other

Comment on attachment 9292763 [details]
Bug 1787959 - Reset packing state before reading pixels. r?jgilbert

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: It seems feasible one could at least make an exploit that writes up to 4 bytes of junk (but only 4 bytes) near the top of the stack. However, this code is only enabled in debug builds, so there is no way to write an exploit for release or optimized builds at all.
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: Unknown
  • Which older supported branches are affected by this flaw?: 91+
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: Yes
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?: Unlikely to cause regressions.
  • Is Android affected?: Unknown
Attachment #9292763 - Flags: sec-approval?

Comment on attachment 9292763 [details]
Bug 1787959 - Reset packing state before reading pixels. r?jgilbert

sec-other bugs don't need sec-approval. You can go ahead and land it whenever.

Attachment #9292763 - Flags: sec-approval?

(sec-moderate also doesn't need sec-approval)

Flags: needinfo?(sotaro.ikeda.g)
Group: gfx-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch

Bugmon Analysis
Unable to reproduce bug 1787959 using build mozilla-central 20220829094551-ad01d1ce5556. Without a baseline, bugmon is unable to analyze this bug.
Removing bugmon keyword as no further action possible. Please review the bug and re-add the keyword for further analysis.

Keywords: bugmon
Flags: qe-verify-
Whiteboard: [bugmon:bisected,confirmed] → [bugmon:bisected,confirmed][post-critsmash-triage]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.