Call instruction in try is ignored by alias analysis
Categories
(Core :: JavaScript: WebAssembly, defect, P1)
Tracking
()
People
(Reporter: caiiiycuk, Assigned: jseward, NeedInfo)
References
(Blocks 1 open bug)
Details
(Keywords: csectype-jit, reporter-external, sec-high, Whiteboard: [adv-main116+] [adv-ESR115.1+] [adv-ESR102.14+])
Attachments
(6 files, 1 obsolete file)
763 bytes,
application/x-javascript
|
Details | |
4.04 KB,
text/plain
|
Details | |
2.69 KB,
patch
|
Details | Diff | Splinter Review | |
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr102+
RyanVM
:
approval-mozilla-esr115+
tjr
:
sec-approval+
|
Details | Review |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
267 bytes,
text/plain
|
Details |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:109.0) Gecko/20100101 Firefox/114.0
Steps to reproduce:
- Open https://dos.zone/digger-may-06-1999/
- Select emulation backend dosboxX
- Press Play button
Actual results:
The game stuck at the beginning, with exception:
RuntimeError: unreachable executed
Expected results:
The game should work fine.
--
I am the creator of the js-dos project, and version 8 is almost ready to be released. However, Firefox is the only browser that does not work properly with the dosbox-x backend. The dosbox-x backend utilizes a new exception-handling feature of WebAssembly.
I cannot delay the release for an extended period of time. I can provide any additional information that you may need. It would be unfortunate if, in the end, I am forced to state that Firefox is not supported.
Comment 1•2 years ago
|
||
The Bugbug bot thinks this bug should belong to the 'Core::Audio/Video: Playback' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.
Possible duplicate ot this https://bugzilla.mozilla.org/show_bug.cgi?id=1814841.
Updated•2 years ago
|
Comment 3•2 years ago
|
||
Comment 4•2 years ago
|
||
Using D124177, I bisected and found the failing functions "1502 3600 5567 5953 6956 7281 8254". The program fails only in the Ion, works in the Baseline.
Comment 5•2 years ago
|
||
I cannot find any wasm exceptions called/used. The mentioned above functions have bunch of nested try (with catch
, catch_all
and delegate
). If exceptions are not invoked, looks like something breaks in Ion when bunch of nested try's are used.
(In reply to Yury Delendik (:yury) from comment #5)
I cannot find any wasm exceptions called/used. The mentioned above functions have bunch of nested try (with
catch
,catch_all
anddelegate
). If exceptions are not invoked, looks like something breaks in Ion when bunch of nested try's are used.
Yeah, exceptions are typically not thrown for a digger. They are mainly used when emulating Win 95/98.
If you'd like, I can provide a build with full function names.
Comment 8•2 years ago
|
||
(In reply to caiiiycuk from comment #7)
If you'd like, I can provide a build with full function names.
It might help with finding exact place in the source code. Rebuilding the code might change the function indices (that is the reason I attached a wat file to lookup/study them)
Comment 9•2 years ago
|
||
Comment 10•2 years ago
•
|
||
Looking at the machine code produced by Ion, the global.get
value is optimized, and spilled into local stack slot during the $f0
call, after the call, the value from stack slot is used (instead of reading the global again).
Assignee | ||
Comment 11•2 years ago
|
||
(In reply to Yury Delendik (:yury) from comment #10)
Looking at the machine code produced by Ion, the
global.get
value is
optimized, and spilled into local stack slot during the$f0
call, after
the call, the value from stack slot is used (instead of reading the global again).
Before/after of MIR optimisation of the test case. Two MWasmLoadInstanceDataField
nodes have been GVN'd together (3 and 14), despite a call being between them.
Comment 12•2 years ago
|
||
Marking this security-sensitive because (after discussing with Julian on Matrix) this is looking like an alias analysis bug.
Assignee | ||
Comment 13•2 years ago
•
|
||
After some investigation and with considerable hints from Jan, the following
seems to be the case:
-
MWasmLoadInstanceDataField::congruent is correct, and its alias set
annotations are correct too. Also, MWasmCall{Catchable,Uncatchable} has
(by default) correct alias set annotations. -
The problem appears to be caused by an instruction-processing loop [1] in
the middle of the alias analyser.It skips the last insn in every block -- presumably to save time, on the
basis that no control insn will access memory. But the introduction of
MWasmCallCatchable as a control insn breaks that assumption. For the test
case listing in comment 10, this causes it to ignore11 = None.wasmcallcatchable 2
hence not invalidating the first MWasmLoadInstanceDataField, and so
causing it to be incorrectly used as a replacement for the second one. -
Allowing the loop to visit the last insn at least fixes the comment 10
test case. Tentative patch to follow; not suitable for landing and needs
more testing.
[1] https://searchfox.org/mozilla-central/source/js/src/jit/AliasAnalysis.cpp#190-192
Assignee | ||
Comment 14•2 years ago
|
||
A possible fix, as described in comment 13.
Assignee | ||
Updated•2 years ago
|
Comment 15•2 years ago
|
||
The patch fixes the game in comment 0
Reporter | ||
Comment 16•2 years ago
|
||
(In reply to Yury Delendik (:yury) from comment #15)
The patch fixes the game in comment 0
Can you also quick check if it fixes https://bugzilla.mozilla.org/show_bug.cgi?id=1814841
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 17•2 years ago
|
||
Fixing this is a bit tricky. Here's a summary:
-
The alias analyser ignores the last instruction in every block:
https://searchfox.org/mozilla-central/source/js/src/jit/AliasAnalysis.cpp#191 -
Until now, this has been OK because no block-ending instruction reads or
writes memory (except for MTableSwitch's access to its jump table, but that
is private readonly data, so it doesn't count.) -
However, wasm exceptions introduced MWasmCallCatchable as a block terminator
instruction. This (because it calls elsewhere) can modify memory. However,
the alias analyser ignores it, leading to this bug and also bug 1814841. -
An obvious fix is to make the alias analyser also process the last insn in a
block. This fixes both bugs. -
However, it leads to the potential danger of regressing performance, by
causing more GVN/LICM invalidations than would previously have happened.
The only way that would not happen is that all block-terminating
instructions except MWasmCallCatchable have an empty alias set. -
But that's not the case. By default, instructions have a writes-everything
alias set:
https://searchfox.org/mozilla-central/source/js/src/jit/MIR.h#880
and this is not overridden for MWasmReturnVoid, MWasmReturn and MTableSwitch. -
Hence, only modifying the alias analyser loop might cause a perf regression.
In order to avoid that, we need to also change MWasmReturnVoid, MWasmReturn
and MTableSwitch to return an empty alias set. -
I am thinking to create two patches, in the usual way for sec bugs:
-
a minimal one that modifies the loop and changes MWasmReturnVoid,
MWasmReturn and MTableSwitch to return an empty alias set. This is the
minimal change that fixes the problem and doesn't cause perf regressions. -
a later one, with test case, comments, and assertions.
-
Comments? Does this seem correct? Is there a less risky way to fix this?
Considering that the 116 train is due to leave on 3 July, and that this is a
somewhat important bug, we don't have much time.
Comment 18•2 years ago
|
||
(In reply to Julian Seward [:jseward] from comment #17)
Comments? Does this seem correct? Is there a less risky way to fix this?
Thanks for the analysis. This looks correct to me.
I was thinking a slightly simpler fix would be to check isWasmCallCatchable()
and only process those control instructions, and then in a follow-up bug handle the other control instructions too. However I don't think that's actually simpler because (1) you already did this analysis and identified the set of control instructions that we want to change the alias set of, and (2) to improve test coverage it would probably be better to process all control instructions to help us find issues there.
Updated•2 years ago
|
Comment 19•2 years ago
|
||
I agree with Jan here, it seems like you've done the right analysis and fixing it this way seems correct to me.
Assignee | ||
Updated•2 years ago
|
Updated•2 years ago
|
Assignee | ||
Comment 20•2 years ago
|
||
Assignee | ||
Comment 21•2 years ago
|
||
This patch adds
-
a test case (valiantly extracted by :yury from a much larger test case)
-
assertions and discussion, just prior to the offending loop, in the alias
analyser -
a comment about the alias set marking for MTableSwitch
-
a getExtras clause for MWasmLoadInstanceDataField, to improve debug printing
of such nodes
Depends on D182324
Assignee | ||
Comment 22•2 years ago
|
||
Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.
Security Approval Request
- How easily could an exploit be constructed based on the patch?: Note, following recommended procedure, the fix is presented as two patches: a minimal, no-comment fix patch, and a for-later followup patch that contains a test case, comments and assertions. The following answers relate to the minimal fix only.
Creating an exploit based on the patch would be, at a minimum, difficult. I can't see how to do it. Readers of the patch would be able to infer "has something to do with alias analysis and probably with wasm", but that's about all.
Even if adversaries knew what problem was fixed and had the test case, it's hard to see how to exploit that. The bug causes miscompilation of wasm under obscure circumstances, by causing an older value of a global variable in a routine to possibly be mistakenly used instead of the up to date value. JS is unaffected by the bug. There's no possible type confusion (at the level of the wasm type system) and I don't see how it could cause a wasm sandbox escape. More likely is that the wasm code either fails to work properly (we have two reports of this) and/or perhaps segfaults [the content process].
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: 114, 115esr
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: No
- If not, how different, hard to create, and risky will they be?: Should be easy and low-risk; the fix itself is nearly a one-liner, and applies to code that hasn't changed much in a long time.
- How likely is this patch to cause regressions; how much testing does it need?: If the patch is incorrect, in the worst case it could cause a slowdown for JS, but not miscompilation. However it has been carefully analysed and tested, both for stability and perf regression, and no problems have been found.
- Is Android affected?: Yes
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Updated•2 years ago
|
Comment 23•2 years ago
|
||
The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.
Updated•2 years ago
|
Comment 24•2 years ago
|
||
Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.
Approved to land and uplift
Comment 25•2 years ago
|
||
Comment on attachment 9341480 [details]
Bug 1837686 - followup test case, assertions and comments. r=jandem,rhunt.
Clearing flag on test
Updated•2 years ago
|
Comment 26•2 years ago
|
||
Just want to double check on ESR 102
Comment 27•2 years ago
|
||
From discussing with Julian Seward, it sounds like you were requesting this land in Nightly around Beta 8 for 116 (July 20th). Do I have that right? Would we be able to uplift it to 116 at that point?
Updated•2 years ago
|
Comment 28•2 years ago
|
||
Tom, earlier conversation suggested that we prefer to land more closely to the actual release.
Did you consider this worthy of landing earlier for any specific reason?
I suppose the risk is low given the patch (which seems potentially obscure enough - especially in comparison to a typical "raw pointer to smart pointer" diff that makes it obvious to every reader).
Comment 29•2 years ago
|
||
sec-approval is permission to land, no one has to land it immediately after requesting it, they're just not supposed to land it before - and I don't want to leave unanswered requests sitting in the queue either.
Assignee | ||
Comment 30•2 years ago
|
||
(In reply to Frederik Braun [:freddy] from comment #28)
earlier conversation suggested that we prefer to land more closely to the actual release.
I see and understand the reasons for that. From our side, I think it's fair to say we would welcome the opportunity to land it slightly earlier, if that was agreeable to all parties, on the following basis:
-
this fix is right in the middle of IonMonkey's optimisation pipeline, and while I think it is OK, there is a certain nervousness about it on our side. Clearly, breaking JS in some subtle way would be catastrophic. Hence the desire for more rather than less "baking time".
-
and on the other hand, per comment 22, it is hard to see how exposure of the patch could lead to construction of an exploit.
Comment 31•2 years ago
|
||
Can we please confirm the status of ESR102 too?
Assignee | ||
Comment 32•2 years ago
|
||
Yes, 102.13.0esr is also affected. I tested it on Windows x64.
Updated•2 years ago
|
Comment 33•2 years ago
|
||
Comment 34•2 years ago
|
||
Comment 35•2 years ago
|
||
The patch landed in nightly and beta is affected.
:jseward, is this bug important enough to require an uplift?
- If yes, please nominate the patch for beta approval.
- If no, please set
status-firefox116
towontfix
.
For more information, please visit BugBot documentation.
Assignee | ||
Comment 36•2 years ago
|
||
Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.
Beta/Release Uplift Approval Request
- User impact if declined: Potential incorrect compilation and execution of wasm code that uses the wasm "exceptions" feature. That may possible include Adobe PhotoShop (the wasm version of it).
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): In principle it is a bit risky because the fix also affects optimised compilation of JS in general. In practice I'd say it is low risk because:
- the patch is very small and we believe we understand the consequences of it
- it has been on nightly about 3 days and no regressions reported yet
re testing, the existing test suite does not cover the failing case. There is a second patch in this bug which adds a test case, but per normal sec- protocol we will hold off landing it for some months.
- String changes made/needed: none
- Is Android affected?: Yes
Assignee | ||
Comment 37•2 years ago
|
||
Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: It's sec:high.
- User impact if declined: Potential incorrect compilation and execution of wasm code that uses the wasm "exceptions" feature. That may possible include Adobe PhotoShop (the wasm version of it).
- Fix Landed on Version: 117
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): In principle it is a bit risky because the fix also affects optimised compilation of JS in general. In practice I'd say it is low risk because:
- the patch is very small and we believe we understand the consequences of it
- it has been on nightly about 3 days and no regressions reported yet
Comment 38•2 years ago
|
||
Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.
Approved for 116.0b4
Comment 39•2 years ago
|
||
Updated•2 years ago
|
Updated•2 years ago
|
Updated•2 years ago
|
Comment 40•2 years ago
|
||
Comment on attachment 9341479 [details]
Bug 1837686. r=jandem,rhunt.
Approved for 115.1esr and 102.14esr.
Comment 41•2 years ago
|
||
uplift |
Comment 42•2 years ago
|
||
uplift |
Updated•2 years ago
|
Comment 43•2 years ago
|
||
Reproduced this issue using an old Nightly build from 2023-07-01, verified that the game can be played with emulation backend dosboxX on Firefox 102esr, 115esr, 116.0b5 and Latest Nightly 117.0a1 across platforms (Windows 10, macOS 13 and Ubuntu 22.04).
Updated•2 years ago
|
Comment 44•2 years ago
|
||
Comment 45•2 years ago
|
||
2 months ago, tjr placed a reminder on the bug using the whiteboard tag [reminder-test 2023-09-12]
.
jseward, please refer to the original comment to better understand the reason for the reminder.
Updated•2 years ago
|
Updated•2 years ago
|
Comment 46•2 years ago
|
||
Comment 47•2 years ago
|
||
bugherder |
Comment 48•1 year ago
|
||
Sorry for the burst of bugspam: filter on tinkling-glitter-filtrate
Adding reporter-external
keyword to security bugs found by non-employees for accounting reasons
Description
•