1725638 - [ARM64 M1] Frequent tab crashes with zoom video

Reporter

Description

•

3 years ago

Spun off from bug 1720650 with same steps to reproduce, basically, zoom in the browser will crash the tab with high frequency - sometimes within seconds, sometimes within minutes. For more information and a crash dump, see bug 1720650 comment 14. Mostly these crashes do not leave crash dumps, but the times when I see backtraces in the console they look like the one reported there.

(Point of note, it is not possible to build Firefox in debug-noopt mode on the M1, the build dies with a linker error because the distance between a jump and its target is too great. This makes debugging this problem more complex.)

Lars T Hansen [:lth]

Reporter

Comment 1

•

3 years ago

Triage (will be updated):

One thing I've noticed is that this tends to crash quickly after I start talking... even if the mic is off. But without sound in the room it can stay up for a while. But for example, if the computer running zoom in the browser makes a noise in some way, the tab will often crash.

Julian makes the point that we don't know if this is ARM64 specific.

Julian Seward [:jseward]

Assignee

Comment 2

•

3 years ago

I played around with this a bit. Causing the crash is difficult; it happens at
seemingly random times after connection to a meeting. It has the feel of
a threading or at least timing-related bug. I did establish however that

x86_64-linux is unaffected; at least, I can't get an x86_64 build to crash.
On arm64-linux, it feels like it crashes more often in an assertion-enabled
build, although that may be an illusion.
On arm64-linux, it requires Ion to crash. That is,
javascript.options.wasm_optimizingjit must be true in order to get a crash.

Lars T Hansen [:lth]

Reporter

Comment 3

•

3 years ago

FWIW, a number of the times when I've run into this it's because an assertion has fired, so it would make sense if assertion-enabled builds feel more crashy. The one crash dump I have (https://crash-stats.mozilla.org/report/index/ddd562f0-993f-4291-af1b-528030210805) has such a crash, but stack traces printed in the console have effectively been equivalent to the one in the dump.

Julian Seward [:jseward]

Assignee

Comment 4

•

3 years ago

More initial triage results. To get it to crash requires all 3 of the
following:

arm64 -- x86_64 will not crash
ion -- javascript.options.wasm_optimizingjit=true, .wasm_baselinejit=false
simd enabled -- javascript.options.wasm_simd must be true

It will then crash (segfault) after a few seconds or minutes in
js::wasm::GetNearestEffectiveTls. Adding debug printing shows there are two
different ways it can crash:

fp points at unmapped memory. Hence fp->callerIsExitOrJitEntryFP()
segfaults.
fp points at something accessible, but for whatever reason, the assertion
!LookupCode(fp->returnAddress()) fails, meaning that LookupCode
returned non-null when it should have returned null.

This would explain why it fails in both debug and non-debug builds, but with
the debug builds more easily.

is particularly interesting, because the failing fp values always have a
particular form, eg:

fp = 0xa500a600a700a8
fp = 0xc700cc00cf00cc
fp = 0x10001200130016
fp = 0xb500ba00be00ba

This might lead one to wonder whether a saved FP value (presumably in the
previous frame) has been overwritten by a SIMD value.

It seems also to me that, once the tab has crashed, it will crash again much
more quickly if one clicks on "Restore this tab!".

Lars T Hansen [:lth]

Reporter

Comment 5

•

3 years ago

Here's a bug that should be fixed regardless, though I doubt it matters for this problem:

diff --git a/js/src/jit/arm64/MoveEmitter-arm64.cpp b/js/src/jit/arm64/MoveEmitter-arm64.cpp
--- a/js/src/jit/arm64/MoveEmitter-arm64.cpp
+++ b/js/src/jit/arm64/MoveEmitter-arm64.cpp
@@ -134,7 +134,7 @@ void MoveEmitterARM64::emitSimd128Move(c
   }
 
   vixl::UseScratchRegisterScope temps(&masm.asVIXL());
-  const ARMFPRegister scratch = temps.AcquireD();
+  const ARMFPRegister scratch = temps.AcquireQ();
   masm.Ldr(scratch, toMemOperand(from));
   masm.Str(scratch, toMemOperand(to));
 }

Lars T Hansen [:lth]

Reporter

Updated

•

3 years ago

Group: core-security

Keywords: sec-high

Lars T Hansen [:lth]

Reporter

Comment 6

•

3 years ago

At least one important bug: MoveResolver (in MoveEmitterARM64::emit) misallocates space for SIMD spill slot and scribbles on the frame pointer (not the diff shown above, though that should be fixed too). Plausibly this can be used to trick carefully controlled code into forging the frame pointer, which can be used for arbitrary memory reads and/or to redirect the code pointer. This is going to need an uplift to 92 and ESR91.

Lars T Hansen [:lth]

Reporter

Updated

•

3 years ago

Assignee: lhansen → jseward

Lars T Hansen [:lth]

Reporter

Comment 7

•

3 years ago

Presumably (thinking out loud), the Architecture file on each platform ought to define the spill slot size, this should not be hidden down in the move emitter like it is now. This would sensibly be the max of sizeof(Registers::RegisterContent) and sizeof(FloatRegisters::RegisterContent).

Julian Seward [:jseward]

Assignee

Comment 8

•

3 years ago

(In reply to Lars T Hansen [:lth] from comment #6)

At least one important bug: MoveResolver (in MoveEmitterARM64::emit) misallocates
space for SIMD spill slot [..]

FTR, the smoking gun there, that we found, is

   ;; What's with q4/q3 here?  16-byte store/load but PSP only moves by 8
   0x00002fd60bc94404:  sub     x28, x28, #0x8
   0x00002fd60bc94408:  mov     sp, x28
   0x00002fd60bc9440c:  str     q4, [x28]
   0x00002fd60bc94410:  mov     v4.16b, v6.16b
   0x00002fd60bc94414:  mov     v6.16b, v2.16b
   0x00002fd60bc94418:  mov     v2.16b, v1.16b
   0x00002fd60bc9441c:  mov     v1.16b, v3.16b
   0x00002fd60bc94420:  ldr     q3, [x28]
   0x00002fd60bc94424:  add     x28, x28, #0x8
   0x00002fd60bc94428:  mov     sp, x28

Julian Seward [:jseward]

Assignee

Comment 9

•

3 years ago

Attached file Bug 1725638. r=lth. — Details

Julian Seward [:jseward]

Assignee

Comment 10

•

3 years ago

Attached file Bug 1725638 - Fix Zoom problems. r=lth.

This fixes two problems relating to translation of MoveGroups on AArch64:

MoveEmitterARM64::emitSimd128Move: in the case where both source and
target are in memory, we requested a D-class (64-bit) register when in fact
a Q-class (128-bit) register should have been requested. It's unclear
whether this in fact causes a problem -- it may be that the emission of the
associated instruction ignores this. But if it doesn't, then we really do
need to explicitly request a Q-class reg.
MoveEmitterARM64::toMemOperand: this reserves sizeof(void*) (8) bytes
of stack space for a move that needs to go via memory. This is obviously
wrong for a 128-bit SIMD move, and will cause the bottom 8 bytes of the
stack to be overwritten, per comments above. The patch changes that size
to Simd128DataSize (16), which is correct and avoids the overwrite.

Julian Seward [:jseward]

Assignee

Comment 11

•

3 years ago

Comment on attachment 9237026 [details]
Bug 1725638. r=lth.

Security Approval Request

How easily could an exploit be constructed based on the patch?: In order to understand the significance and exploitability of the flaw, an
attacker would have to know quite a lot about register allocation in
optimising compilers. Once they had got that, however, it would seem to us
plausible to develop an exploit that (per comment 6) could overwrite the
stored frame pointer, then do arbitrary reads and/or redirect execution
elsewhere.

But only on aarch64 platforms, including aarch64-Android. Not on x86/x64.

Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
Which older supported branches are affected by this flaw?: firefox90, firefox91, firefox92(=beta), esr91.
If not all supported branches, which bug introduced the flaw?: None
Do you have backports for the affected branches?: Yes
If not, how different, hard to create, and risky will they be?:
How likely is this patch to cause regressions; how much testing does it need?: It's very simple (a 2-liner); it affects only wasm on AArch64; it has been
tested fairly extensively by hand. I think it is unlikely to cause
regression.

Attachment #9237026 - Flags: sec-approval?

Julian Seward [:jseward]

Assignee

Comment 12

•

3 years ago

Comment on attachment 9237026 [details]
Bug 1725638. r=lth.

Beta/Release Uplift Approval Request

User impact if declined: Worst case -- (assuming skilled attacker etc): On AArch64 targets (MacOS,
Android) -- remote code execution as a result of loading a malicious
WebAssembly payload.

For definite -- crashing of tab for Zoom's in-browser (wasm-based) client,
making it unusable.

Potential -- crashing of tab for any application that uses wasm SIMD on
AArch64.

Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: Yes
If yes, steps to reproduce: Start a Zoom meeting using some other device/client, and get hold of the
meeting ID.

With an AArch64/MacOS build of Fx, check you have
javascript.options.wasm_optimizingjit = true.

Go to https://www.zoom.us/join and enter the meeting ID. But then don't click
on "Launch Meeting"; instead scroll down and click on "Join from Your
Browser". Once the browser connects to the meeting, the tab will crash at
some random point, but almost certainly within about 5 minutes of connecting.
A debug (assertions-enabled) build will crash more readily.

List of other uplifts needed: none
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): It's very simple (a 2-liner); it affects only wasm on AArch64; it has been
tested fairly extensively by hand. I think it is unlikely to cause
regression.
String changes made/needed: none

Attachment #9237026 - Flags: approval-mozilla-beta?

Julian Seward [:jseward]

Assignee

Updated

•

3 years ago

Flags: qe-verify+

Julian Seward [:jseward]

Assignee

Comment 13

•

3 years ago

Comment on attachment 9237026 [details]
Bug 1725638. r=lth.

ESR Uplift Approval Request

If this is not a sec:{high,crit} bug, please state case for ESR consideration: Is sec:high
User impact if declined: Worst case -- (assuming skilled attacker etc): On AArch64 targets (MacOS,
Android) -- remote code execution as a result of loading a malicious
WebAssembly payload.

For definite -- crashing of tab for Zoom's in-browser (wasm-based) client,
making it unusable.

Potential -- crashing of tab for any application that uses wasm SIMD on
AArch64.

Fix Landed on Version: 93
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): It's very simple (a 2-liner); it affects only wasm on AArch64; it has been
tested fairly extensively by hand. I think it is unlikely to cause
regression.
String or UUID changes made by this patch: none

Attachment #9237026 - Flags: approval-mozilla-esr91?

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

Group: core-security → javascript-core-security

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

status-firefox91: --- → wontfix

status-firefox92: --- → affected

status-firefox93: --- → affected

status-firefox-esr78: --- → unaffected

status-firefox-esr91: --- → affected

tracking-firefox92: --- → +

tracking-firefox93: --- → +

tracking-firefox-esr91: --- → 92+

Tom Ritter [:tjr] (OOTO until 4/30 at least)

Comment 14

•

3 years ago

•

Edited

Comment on attachment 9237026 [details]
Bug 1725638. r=lth.

Approved to land.

[No security hat] I don't think the commit message is very good - Zoom (the video conferencing site) is something that triggers the issue but it's not the real problem. I assumed for a bit that this was a fullscreen/APZ-style zoom issue. I think even a blank commit message would be better; and it has the benefit of not implicating the reproducing site.

Attachment #9237026 - Flags: sec-approval? → sec-approval+

Lars T Hansen [:lth]

Reporter

Comment 15

•

3 years ago

•

Edited

(In reply to Tom Ritter [:tjr] (Intermittently available) from comment #14)

I think even a blank commit message would be better; and it has the benefit of not implicating the reproducing site.

I agree.

Julian Seward [:jseward]

Assignee

Comment 16

•

3 years ago

Good point; will redo patch with a blank commit message.

Phabricator Automation

Updated

•

3 years ago

Attachment #9237026 - Attachment description: Bug 1725638 - Fix Zoom problems. r=lth. → Bug 1725638. r=lth.

Ryan VanderMeulen [:RyanVM]

Comment 17

•

3 years ago

Comment on attachment 9237026 [details]
Bug 1725638. r=lth.

Approved for 92.0b8.

Attachment #9237026 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Ryan VanderMeulen [:RyanVM]

Comment 18

•

3 years ago

uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/7d9f1f86175a

status-firefox92: affected → fixed

Ryan VanderMeulen [:RyanVM]

Comment 19

•

3 years ago

https://hg.mozilla.org/mozilla-central/rev/340109ff1e24

Group: javascript-core-security → core-security-release

Status: ASSIGNED → RESOLVED

Closed: 3 years ago

status-firefox93: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → 93 Branch

Cornel Ionce [:noni] [Hubs QA]

Updated

•

3 years ago

QA Whiteboard: [qa-triaged]

BugBot [:suhaib / :marco/ :calixte]

Comment 20

•

3 years ago

As part of a security bug pattern analysis, we are requesting your help with a high level analysis of this bug. It is our hope to develop static analysis (or potentially runtime/dynamic analysis) in the future to identify classes of bugs.

Please visit this google form to reply.

Flags: needinfo?(jseward)

Whiteboard: [sec-survey]

Ryan VanderMeulen [:RyanVM]

Comment 21

•

3 years ago

Comment on attachment 9237026 [details]
Bug 1725638. r=lth.

Approved for 91.1esr.

Attachment #9237026 - Flags: approval-mozilla-esr91? → approval-mozilla-esr91+

Ryan VanderMeulen [:RyanVM]

Comment 22

•

3 years ago

uplift

https://hg.mozilla.org/releases/mozilla-esr91/rev/6a6f716a03ba

status-firefox-esr91: affected → fixed

Frederik Braun [:freddy]

Comment 23

•

3 years ago

(In reply to Lars T Hansen [:lth] from comment #7)

Presumably (thinking out loud), the Architecture file on each platform ought to define the spill slot size, this should not be hidden down in the move emitter like it is now. This would sensibly be the max of sizeof(Registers::RegisterContent) and sizeof(FloatRegisters::RegisterContent).

Lars, did we fix this within the patch here?
It seems not. Can you please file a follow-up if appropriate.

Flags: needinfo?(lhansen)

Julian Seward [:jseward]

Assignee

Comment 24

•

3 years ago

(In reply to Frederik Braun [:freddy] from comment #23)

Can you please file a follow-up if appropriate.

Done, bug 1727698.

Flags: needinfo?(lhansen)

Daniel Cicas [:dcicas], Release QA

Comment 25

•

3 years ago

Hello,

I tried to reproduce this issue on my MacBook Air M1 2020 on Fx 93.0a1 (BuildID: 20210813092746) and Fx 92.0a1 (Build ID: 20210715094037) (the build from the original bug) and I cant reproduce this issue.

@Lars maybe you can take a look and confirm this issue is fixed on Firefox 92.0b8 and 93.0a1 and 91.1esr?

Flags: needinfo?(lhansen)

Daniel Cicas [:dcicas], Release QA

Comment 26

•

3 years ago

Hello,

With help from @jseward we managed the reproduce the bug on Fx 93.0a1 (BuildID: 20210813092746).

I can confirm that the issue is fixed on Fx 93.01 (Latest Nightly) and Fx 92.0b8. I couldn't test on 91.1esr treeherder build because mac OS says its broken.

Status: RESOLVED → VERIFIED

status-firefox92: fixed → verified

status-firefox93: fixed → verified

Flags: qe-verify+

Flags: needinfo?(lhansen)

Daniel Cicas [:dcicas], Release QA

Comment 27

•

3 years ago

Hello,

I managed to verify this issue on Fx. 91.1 esr using a treeherder build from https://treeherder.mozilla.org/jobs?repo=mozilla-esr91&revision=9e69a6210f885d2931fa98af4bd50e422d276af4&selectedTaskRun=PF1KM_yiT1yO8l1i2lIUFA.0

status-firefox-esr91: fixed → verified

Frederik Braun [:freddy]

Updated

•

3 years ago

Whiteboard: [sec-survey] → [sec-survey][adv-main92+r]

Frederik Braun [:freddy]

Updated

•

3 years ago

Whiteboard: [sec-survey][adv-main92+r] → [sec-survey][adv-main92+r][adv-esr91.0.1+r]

Julien Cristau [:jcristau]

Updated

•

3 years ago

Whiteboard: [sec-survey][adv-main92+r][adv-esr91.0.1+r] → [sec-survey][adv-main92+r][adv-esr91.1+r]

Julian Seward [:jseward]

Assignee

Updated

•

3 years ago

Flags: needinfo?(jseward)

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Group: core-security-release