Closed Bug 1837686 (CVE-2023-4046) Opened 2 years ago Closed 2 years ago

Call instruction in try is ignored by alias analysis

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

Firefox 114
defect

Tracking

()

VERIFIED FIXED
117 Branch
Tracking Status
firefox-esr102 116+ verified
firefox-esr115 116+ verified
firefox114 --- wontfix
firefox115 --- wontfix
firefox116 + verified
firefox117 + verified

People

(Reporter: caiiiycuk, Assigned: jseward, NeedInfo)

References

(Blocks 1 open bug)

Details

(Keywords: csectype-jit, reporter-external, sec-high, Whiteboard: [adv-main116+] [adv-ESR115.1+] [adv-ESR102.14+])

Attachments

(6 files, 1 obsolete file)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0

Steps to reproduce:

  1. Open https://dos.zone/digger-may-06-1999/
  2. Select emulation backend dosboxX
  3. Press Play button

Actual results:

The game stuck at the beginning, with exception:
RuntimeError: unreachable executed

Expected results:

The game should work fine.

--
I am the creator of the js-dos project, and version 8 is almost ready to be released. However, Firefox is the only browser that does not work properly with the dosbox-x backend. The dosbox-x backend utilizes a new exception-handling feature of WebAssembly.

I cannot delay the release for an extended period of time. I can provide any additional information that you may need. It would be unfortunate if, in the end, I am forced to state that Firefox is not supported.

The Bugbug bot thinks this bug should belong to the 'Core::Audio/Video: Playback' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core
Component: Audio/Video: Playback → JavaScript: WebAssembly
Severity: -- → S2
Priority: -- → P3

Using D124177, I bisected and found the failing functions "1502 3600 5567 5953 6956 7281 8254". The program fails only in the Ion, works in the Baseline.

I cannot find any wasm exceptions called/used. The mentioned above functions have bunch of nested try (with catch, catch_all and delegate). If exceptions are not invoked, looks like something breaks in Ion when bunch of nested try's are used.

(In reply to Yury Delendik (:yury) from comment #5)

I cannot find any wasm exceptions called/used. The mentioned above functions have bunch of nested try (with catch, catch_all and delegate). If exceptions are not invoked, looks like something breaks in Ion when bunch of nested try's are used.

Yeah, exceptions are typically not thrown for a digger. They are mainly used when emulating Win 95/98.

If you'd like, I can provide a build with full function names.

(In reply to caiiiycuk from comment #7)

If you'd like, I can provide a build with full function names.

It might help with finding exact place in the source code. Rebuilding the code might change the function indices (that is the reason I attached a wat file to lookup/study them)

Looking at the machine code produced by Ion, the global.get value is optimized, and spilled into local stack slot during the $f0 call, after the call, the value from stack slot is used (instead of reading the global again).

(In reply to Yury Delendik (:yury) from comment #10)

Looking at the machine code produced by Ion, the global.get value is
optimized, and spilled into local stack slot during the $f0 call, after
the call, the value from stack slot is used (instead of reading the global again).

Before/after of MIR optimisation of the test case. Two MWasmLoadInstanceDataField
nodes have been GVN'd together (3 and 14), despite a call being between them.

Marking this security-sensitive because (after discussing with Julian on Matrix) this is looking like an alias analysis bug.

Group: javascript-core-security

After some investigation and with considerable hints from Jan, the following
seems to be the case:

  • MWasmLoadInstanceDataField::congruent is correct, and its alias set
    annotations are correct too. Also, MWasmCall{Catchable,Uncatchable} has
    (by default) correct alias set annotations.

  • The problem appears to be caused by an instruction-processing loop [1] in
    the middle of the alias analyser.

    It skips the last insn in every block -- presumably to save time, on the
    basis that no control insn will access memory. But the introduction of
    MWasmCallCatchable as a control insn breaks that assumption. For the test
    case listing in comment 10, this causes it to ignore

    11 = None.wasmcallcatchable 2
    

    hence not invalidating the first MWasmLoadInstanceDataField, and so
    causing it to be incorrectly used as a replacement for the second one.

  • Allowing the loop to visit the last insn at least fixes the comment 10
    test case. Tentative patch to follow; not suitable for landing and needs
    more testing.

[1] https://searchfox.org/mozilla-central/source/js/src/jit/AliasAnalysis.cpp#190-192

A possible fix, as described in comment 13.

Assignee: nobody → jseward
Assignee: jseward → nobody

The patch fixes the game in comment 0

(In reply to Yury Delendik (:yury) from comment #15)

The patch fixes the game in comment 0

Can you also quick check if it fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1814841

Attachment #9339655 - Attachment is obsolete: true

Fixing this is a bit tricky. Here's a summary:

  • The alias analyser ignores the last instruction in every block:
    https://searchfox.org/mozilla-central/source/js/src/jit/AliasAnalysis.cpp#191

  • Until now, this has been OK because no block-ending instruction reads or
    writes memory (except for MTableSwitch's access to its jump table, but that
    is private readonly data, so it doesn't count.)

  • However, wasm exceptions introduced MWasmCallCatchable as a block terminator
    instruction. This (because it calls elsewhere) can modify memory. However,
    the alias analyser ignores it, leading to this bug and also bug 1814841.

  • An obvious fix is to make the alias analyser also process the last insn in a
    block. This fixes both bugs.

  • However, it leads to the potential danger of regressing performance, by
    causing more GVN/LICM invalidations than would previously have happened.
    The only way that would not happen is that all block-terminating
    instructions except MWasmCallCatchable have an empty alias set.

  • But that's not the case. By default, instructions have a writes-everything
    alias set:
    https://searchfox.org/mozilla-central/source/js/src/jit/MIR.h#880
    and this is not overridden for MWasmReturnVoid, MWasmReturn and MTableSwitch.

  • Hence, only modifying the alias analyser loop might cause a perf regression.
    In order to avoid that, we need to also change MWasmReturnVoid, MWasmReturn
    and MTableSwitch to return an empty alias set.

  • I am thinking to create two patches, in the usual way for sec bugs:

    • a minimal one that modifies the loop and changes MWasmReturnVoid,
      MWasmReturn and MTableSwitch to return an empty alias set. This is the
      minimal change that fixes the problem and doesn't cause perf regressions.

    • a later one, with test case, comments, and assertions.

Comments? Does this seem correct? Is there a less risky way to fix this?

Considering that the 116 train is due to leave on 3 July, and that this is a
somewhat important bug, we don't have much time.

Flags: needinfo?(rhunt)
Flags: needinfo?(jdemooij)

(In reply to Julian Seward [:jseward] from comment #17)

Comments? Does this seem correct? Is there a less risky way to fix this?

Thanks for the analysis. This looks correct to me.

I was thinking a slightly simpler fix would be to check isWasmCallCatchable() and only process those control instructions, and then in a follow-up bug handle the other control instructions too. However I don't think that's actually simpler because (1) you already did this analysis and identified the set of control instructions that we want to change the alias set of, and (2) to improve test coverage it would probably be better to process all control instructions to help us find issues there.

Flags: needinfo?(jdemooij)
Flags: needinfo?(rhunt)
Summary: WebAssembly with does not work as expected → Call instruction in try is ignored by alias analysis

I agree with Jan here, it seems like you've done the right analysis and fixing it this way seems correct to me.

Assignee: nobody → jseward
Priority: P3 → P1

This patch adds

  • a test case (valiantly extracted by :yury from a much larger test case)

  • assertions and discussion, just prior to the offending loop, in the alias
    analyser

  • a comment about the alias set marking for MTableSwitch

  • a getExtras clause for MWasmLoadInstanceDataField, to improve debug printing
    of such nodes

Depends on D182324

Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: Note, following recommended procedure, the fix is presented as two patches: a minimal, no-comment fix patch, and a for-later followup patch that contains a test case, comments and assertions. The following answers relate to the minimal fix only.

Creating an exploit based on the patch would be, at a minimum, difficult. I can't see how to do it. Readers of the patch would be able to infer "has something to do with alias analysis and probably with wasm", but that's about all.

Even if adversaries knew what problem was fixed and had the test case, it's hard to see how to exploit that. The bug causes miscompilation of wasm under obscure circumstances, by causing an older value of a global variable in a routine to possibly be mistakenly used instead of the up to date value. JS is unaffected by the bug. There's no possible type confusion (at the level of the wasm type system) and I don't see how it could cause a wasm sandbox escape. More likely is that the wasm code either fails to work properly (we have two reports of this) and/or perhaps segfaults [the content process].

  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: 114, 115esr
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: No
  • If not, how different, hard to create, and risky will they be?: Should be easy and low-risk; the fix itself is nearly a one-liner, and applies to code that hasn't changed much in a long time.
  • How likely is this patch to cause regressions; how much testing does it need?: If the patch is incorrect, in the worst case it could cause a slowdown for JS, but not miscompilation. However it has been carefully analysed and tested, both for stability and perf regression, and no problems have been found.
  • Is Android affected?: Yes
Attachment #9341479 - Flags: sec-approval?
Attachment #9341480 - Flags: sec-approval?

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Status: NEW → ASSIGNED

Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.

Approved to land and uplift

Attachment #9341479 - Flags: sec-approval? → sec-approval+

Comment on attachment 9341480 [details]
Bug 1837686 - followup test case, assertions and comments. r=jandem,rhunt.

Clearing flag on test

Attachment #9341480 - Flags: sec-approval?
Whiteboard: [reminder-test 2023-09-12]

Just want to double check on ESR 102

From discussing with Julian Seward, it sounds like you were requesting this land in Nightly around Beta 8 for 116 (July 20th). Do I have that right? Would we be able to uplift it to 116 at that point?

Flags: needinfo?(fbraun)

Tom, earlier conversation suggested that we prefer to land more closely to the actual release.
Did you consider this worthy of landing earlier for any specific reason?

I suppose the risk is low given the patch (which seems potentially obscure enough - especially in comparison to a typical "raw pointer to smart pointer" diff that makes it obvious to every reader).

Flags: needinfo?(fbraun) → needinfo?(tom)

sec-approval is permission to land, no one has to land it immediately after requesting it, they're just not supposed to land it before - and I don't want to leave unanswered requests sitting in the queue either.

Flags: needinfo?(tom)

(In reply to Frederik Braun [:freddy] from comment #28)

earlier conversation suggested that we prefer to land more closely to the actual release.

I see and understand the reasons for that. From our side, I think it's fair to say we would welcome the opportunity to land it slightly earlier, if that was agreeable to all parties, on the following basis:

  • this fix is right in the middle of IonMonkey's optimisation pipeline, and while I think it is OK, there is a certain nervousness about it on our side. Clearly, breaking JS in some subtle way would be catastrophic. Hence the desire for more rather than less "baking time".

  • and on the other hand, per comment 22, it is hard to see how exposure of the patch could lead to construction of an exploit.

Can we please confirm the status of ESR102 too?

Yes, 102.13.0esr is also affected. I tested it on Windows x64.

Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 117 Branch

The patch landed in nightly and beta is affected.
:jseward, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox116 to wontfix.

For more information, please visit BugBot documentation.

Flags: needinfo?(jseward)

Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.

Beta/Release Uplift Approval Request

  • User impact if declined: Potential incorrect compilation and execution of wasm code that uses the wasm "exceptions" feature. That may possible include Adobe PhotoShop (the wasm version of it).
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): In principle it is a bit risky because the fix also affects optimised compilation of JS in general. In practice I'd say it is low risk because:
  • the patch is very small and we believe we understand the consequences of it
  • it has been on nightly about 3 days and no regressions reported yet

re testing, the existing test suite does not cover the failing case. There is a second patch in this bug which adds a test case, but per normal sec- protocol we will hold off landing it for some months.

  • String changes made/needed: none
  • Is Android affected?: Yes
Flags: needinfo?(jseward)
Attachment #9341479 - Flags: approval-mozilla-beta?

Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's sec:high.
  • User impact if declined: Potential incorrect compilation and execution of wasm code that uses the wasm "exceptions" feature. That may possible include Adobe PhotoShop (the wasm version of it).
  • Fix Landed on Version: 117
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): In principle it is a bit risky because the fix also affects optimised compilation of JS in general. In practice I'd say it is low risk because:
  • the patch is very small and we believe we understand the consequences of it
  • it has been on nightly about 3 days and no regressions reported yet
Attachment #9341479 - Flags: approval-mozilla-esr115?

Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.

Approved for 116.0b4

Attachment #9341479 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.

Approved for 115.1esr and 102.14esr.

Attachment #9341479 - Flags: approval-mozilla-esr115?
Attachment #9341479 - Flags: approval-mozilla-esr115+
Attachment #9341479 - Flags: approval-mozilla-esr102+

Reproduced this issue using an old Nightly build from 2023-07-01, verified that the game can be played with emulation backend dosboxX on Firefox 102esr, 115esr, 116.0b5 and Latest Nightly 117.0a1 across platforms (Windows 10, macOS 13 and Ubuntu 22.04).

Whiteboard: [reminder-test 2023-09-12] → [reminder-test 2023-09-12] [adv-main116+] [adv-ESR115.1+] [adv-ESR102.14+]

2 months ago, tjr placed a reminder on the bug using the whiteboard tag [reminder-test 2023-09-12] .

jseward, please refer to the original comment to better understand the reason for the reminder.

Flags: needinfo?(jseward)
Whiteboard: [reminder-test 2023-09-12] [adv-main116+] [adv-ESR115.1+] [adv-ESR102.14+] → [adv-main116+] [adv-ESR115.1+] [adv-ESR102.14+]
Group: core-security-release
Alias: CVE-2023-4046
Pushed by jseward@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/e46d56c4dc3a followup test case, assertions and comments. r=jandem,rhunt.

Sorry for the burst of bugspam: filter on tinkling-glitter-filtrate
Adding reporter-external keyword to security bugs found by non-employees for accounting reasons

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: