[ARM64 M1] Frequent tab crashes with zoom video
Categories
(Core :: JavaScript: WebAssembly, defect, P1)
Tracking
()
People
(Reporter: lth, Assigned: jseward)
References
Details
(Keywords: sec-high, Whiteboard: [sec-survey][adv-main92+r][adv-esr91.1+r])
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr91+
tjr
:
sec-approval+
|
Details | Review |
Spun off from bug 1720650 with same steps to reproduce, basically, zoom in the browser will crash the tab with high frequency - sometimes within seconds, sometimes within minutes. For more information and a crash dump, see bug 1720650 comment 14. Mostly these crashes do not leave crash dumps, but the times when I see backtraces in the console they look like the one reported there.
(Point of note, it is not possible to build Firefox in debug-noopt mode on the M1, the build dies with a linker error because the distance between a jump and its target is too great. This makes debugging this problem more complex.)
Reporter | ||
Comment 1•3 years ago
|
||
Triage (will be updated):
One thing I've noticed is that this tends to crash quickly after I start talking... even if the mic is off. But without sound in the room it can stay up for a while. But for example, if the computer running zoom in the browser makes a noise in some way, the tab will often crash.
Julian makes the point that we don't know if this is ARM64 specific.
Assignee | ||
Comment 2•3 years ago
|
||
I played around with this a bit. Causing the crash is difficult; it happens at
seemingly random times after connection to a meeting. It has the feel of
a threading or at least timing-related bug. I did establish however that
-
x86_64-linux is unaffected; at least, I can't get an x86_64 build to crash.
-
On arm64-linux, it feels like it crashes more often in an assertion-enabled
build, although that may be an illusion. -
On arm64-linux, it requires Ion to crash. That is,
javascript.options.wasm_optimizingjit must be true in order to get a crash.
Reporter | ||
Comment 3•3 years ago
|
||
FWIW, a number of the times when I've run into this it's because an assertion has fired, so it would make sense if assertion-enabled builds feel more crashy. The one crash dump I have (https://crash-stats.mozilla.org/report/index/ddd562f0-993f-4291-af1b-528030210805) has such a crash, but stack traces printed in the console have effectively been equivalent to the one in the dump.
Assignee | ||
Comment 4•3 years ago
|
||
More initial triage results. To get it to crash requires all 3 of the
following:
- arm64 -- x86_64 will not crash
- ion -- javascript.options.wasm_optimizingjit=true, .wasm_baselinejit=false
- simd enabled -- javascript.options.wasm_simd must be true
It will then crash (segfault) after a few seconds or minutes in
js::wasm::GetNearestEffectiveTls. Adding debug printing shows there are two
different ways it can crash:
-
fp
points at unmapped memory. Hencefp->callerIsExitOrJitEntryFP()
segfaults. -
fp
points at something accessible, but for whatever reason, the assertion
!LookupCode(fp->returnAddress())
fails, meaning thatLookupCode
returned non-null when it should have returned null.
This would explain why it fails in both debug and non-debug builds, but with
the debug builds more easily.
- is particularly interesting, because the failing
fp
values always have a
particular form, eg:
fp = 0xa500a600a700a8
fp = 0xc700cc00cf00cc
fp = 0x10001200130016
fp = 0xb500ba00be00ba
This might lead one to wonder whether a saved FP value (presumably in the
previous frame) has been overwritten by a SIMD value.
It seems also to me that, once the tab has crashed, it will crash again much
more quickly if one clicks on "Restore this tab!".
Reporter | ||
Comment 5•3 years ago
|
||
Here's a bug that should be fixed regardless, though I doubt it matters for this problem:
diff --git a/js/src/jit/arm64/MoveEmitter-arm64.cpp b/js/src/jit/arm64/MoveEmitter-arm64.cpp
--- a/js/src/jit/arm64/MoveEmitter-arm64.cpp
+++ b/js/src/jit/arm64/MoveEmitter-arm64.cpp
@@ -134,7 +134,7 @@ void MoveEmitterARM64::emitSimd128Move(c
}
vixl::UseScratchRegisterScope temps(&masm.asVIXL());
- const ARMFPRegister scratch = temps.AcquireD();
+ const ARMFPRegister scratch = temps.AcquireQ();
masm.Ldr(scratch, toMemOperand(from));
masm.Str(scratch, toMemOperand(to));
}
Reporter | ||
Comment 6•3 years ago
|
||
At least one important bug: MoveResolver (in MoveEmitterARM64::emit) misallocates space for SIMD spill slot and scribbles on the frame pointer (not the diff shown above, though that should be fixed too). Plausibly this can be used to trick carefully controlled code into forging the frame pointer, which can be used for arbitrary memory reads and/or to redirect the code pointer. This is going to need an uplift to 92 and ESR91.
Reporter | ||
Updated•3 years ago
|
Reporter | ||
Comment 7•3 years ago
|
||
Presumably (thinking out loud), the Architecture file on each platform ought to define the spill slot size, this should not be hidden down in the move emitter like it is now. This would sensibly be the max of sizeof(Registers::RegisterContent) and sizeof(FloatRegisters::RegisterContent).
Assignee | ||
Comment 8•3 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #6)
At least one important bug: MoveResolver (in MoveEmitterARM64::emit) misallocates
space for SIMD spill slot [..]
FTR, the smoking gun there, that we found, is
;; What's with q4/q3 here? 16-byte store/load but PSP only moves by 8
0x00002fd60bc94404: sub x28, x28, #0x8
0x00002fd60bc94408: mov sp, x28
0x00002fd60bc9440c: str q4, [x28]
0x00002fd60bc94410: mov v4.16b, v6.16b
0x00002fd60bc94414: mov v6.16b, v2.16b
0x00002fd60bc94418: mov v2.16b, v1.16b
0x00002fd60bc9441c: mov v1.16b, v3.16b
0x00002fd60bc94420: ldr q3, [x28]
0x00002fd60bc94424: add x28, x28, #0x8
0x00002fd60bc94428: mov sp, x28
Assignee | ||
Comment 9•3 years ago
|
||
Assignee | ||
Comment 10•3 years ago
|
||
Attached file Bug 1725638 - Fix Zoom problems. r=lth.
This fixes two problems relating to translation of MoveGroups on AArch64:
-
MoveEmitterARM64::emitSimd128Move
: in the case where both source and
target are in memory, we requested a D-class (64-bit) register when in fact
a Q-class (128-bit) register should have been requested. It's unclear
whether this in fact causes a problem -- it may be that the emission of the
associated instruction ignores this. But if it doesn't, then we really do
need to explicitly request a Q-class reg. -
MoveEmitterARM64::toMemOperand
: this reservessizeof(void*)
(8) bytes
of stack space for a move that needs to go via memory. This is obviously
wrong for a 128-bit SIMD move, and will cause the bottom 8 bytes of the
stack to be overwritten, per comments above. The patch changes that size
toSimd128DataSize
(16), which is correct and avoids the overwrite.
Assignee | ||
Comment 11•3 years ago
|
||
Comment on attachment 9237026 [details]
Bug 1725638. r=lth.
Security Approval Request
- How easily could an exploit be constructed based on the patch?: In order to understand the significance and exploitability of the flaw, an
attacker would have to know quite a lot about register allocation in
optimising compilers. Once they had got that, however, it would seem to us
plausible to develop an exploit that (per comment 6) could overwrite the
stored frame pointer, then do arbitrary reads and/or redirect execution
elsewhere.
But only on aarch64 platforms, including aarch64-Android. Not on x86/x64.
- Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
- Which older supported branches are affected by this flaw?: firefox90, firefox91, firefox92(=beta), esr91.
- If not all supported branches, which bug introduced the flaw?: None
- Do you have backports for the affected branches?: Yes
- If not, how different, hard to create, and risky will they be?:
- How likely is this patch to cause regressions; how much testing does it need?: It's very simple (a 2-liner); it affects only wasm on AArch64; it has been
tested fairly extensively by hand. I think it is unlikely to cause
regression.
Assignee | ||
Comment 12•3 years ago
|
||
Comment on attachment 9237026 [details]
Bug 1725638. r=lth.
Beta/Release Uplift Approval Request
- User impact if declined: Worst case -- (assuming skilled attacker etc): On AArch64 targets (MacOS,
Android) -- remote code execution as a result of loading a malicious
WebAssembly payload.
For definite -- crashing of tab for Zoom's in-browser (wasm-based) client,
making it unusable.
Potential -- crashing of tab for any application that uses wasm SIMD on
AArch64.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: Start a Zoom meeting using some other device/client, and get hold of the
meeting ID.
With an AArch64/MacOS build of Fx, check you have
javascript.options.wasm_optimizingjit = true.
Go to https://www.zoom.us/join and enter the meeting ID. But then don't click
on "Launch Meeting"; instead scroll down and click on "Join from Your
Browser". Once the browser connects to the meeting, the tab will crash at
some random point, but almost certainly within about 5 minutes of connecting.
A debug (assertions-enabled) build will crash more readily.
- List of other uplifts needed: none
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It's very simple (a 2-liner); it affects only wasm on AArch64; it has been
tested fairly extensively by hand. I think it is unlikely to cause
regression. - String changes made/needed: none
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 13•3 years ago
|
||
Comment on attachment 9237026 [details]
Bug 1725638. r=lth.
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Is sec:high
- User impact if declined: Worst case -- (assuming skilled attacker etc): On AArch64 targets (MacOS,
Android) -- remote code execution as a result of loading a malicious
WebAssembly payload.
For definite -- crashing of tab for Zoom's in-browser (wasm-based) client,
making it unusable.
Potential -- crashing of tab for any application that uses wasm SIMD on
AArch64.
- Fix Landed on Version: 93
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It's very simple (a 2-liner); it affects only wasm on AArch64; it has been
tested fairly extensively by hand. I think it is unlikely to cause
regression. - String or UUID changes made by this patch: none
Updated•3 years ago
|
Updated•3 years ago
|
Comment 14•3 years ago
•
|
||
Comment on attachment 9237026 [details]
Bug 1725638. r=lth.
Approved to land.
[No security hat] I don't think the commit message is very good - Zoom (the video conferencing site) is something that triggers the issue but it's not the real problem. I assumed for a bit that this was a fullscreen/APZ-style zoom issue. I think even a blank commit message would be better; and it has the benefit of not implicating the reproducing site.
Reporter | ||
Comment 15•3 years ago
•
|
||
(In reply to Tom Ritter [:tjr] (Intermittently available) from comment #14)
I think even a blank commit message would be better; and it has the benefit of not implicating the reproducing site.
I agree.
Assignee | ||
Comment 16•3 years ago
|
||
Good point; will redo patch with a blank commit message.
Updated•3 years ago
|
Comment 17•3 years ago
|
||
Comment on attachment 9237026 [details]
Bug 1725638. r=lth.
Approved for 92.0b8.
Comment 18•3 years ago
|
||
uplift |
Comment 19•3 years ago
|
||
Updated•3 years ago
|
Comment 20•3 years ago
|
||
As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.
Please visit this google form to reply.
Comment 21•3 years ago
|
||
Comment on attachment 9237026 [details]
Bug 1725638. r=lth.
Approved for 91.1esr.
Comment 22•3 years ago
|
||
uplift |
Comment 23•3 years ago
|
||
(In reply to Lars T Hansen [:lth] from comment #7)
Presumably (thinking out loud), the Architecture file on each platform ought to define the spill slot size, this should not be hidden down in the move emitter like it is now. This would sensibly be the max of sizeof(Registers::RegisterContent) and sizeof(FloatRegisters::RegisterContent).
Lars, did we fix this within the patch here?
It seems not. Can you please file a follow-up if appropriate.
Assignee | ||
Comment 24•3 years ago
|
||
(In reply to Frederik Braun [:freddy] from comment #23)
Can you please file a follow-up if appropriate.
Done, bug 1727698.
Comment 25•3 years ago
|
||
Hello,
I tried to reproduce this issue on my MacBook Air M1 2020 on Fx 93.0a1 (BuildID: 20210813092746) and Fx 92.0a1 (Build ID: 20210715094037) (the build from the original bug) and I cant reproduce this issue.
@Lars maybe you can take a look and confirm this issue is fixed on Firefox 92.0b8 and 93.0a1 and 91.1esr?
Comment 26•3 years ago
|
||
Hello,
With help from @jseward we managed the reproduce the bug on Fx 93.0a1 (BuildID: 20210813092746).
I can confirm that the issue is fixed on Fx 93.01 (Latest Nightly) and Fx 92.0b8. I couldn't test on 91.1esr treeherder build because mac OS says its broken.
Comment 27•3 years ago
|
||
Hello,
I managed to verify this issue on Fx. 91.1 esr using a treeherder build from https://treeherder.mozilla.org/jobs?repo=mozilla-esr91&revision=9e69a6210f885d2931fa98af4bd50e422d276af4&selectedTaskRun=PF1KM_yiT1yO8l1i2lIUFA.0
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Updated•2 years ago
|
Description
•