Closed Bug 1499198 Opened 2 years ago Closed 2 years ago

Crash in js::wasm::Instance::callExport


(Core :: Javascript: WebAssembly, defect)

62 Branch
Not set



Tracking Status
firefox-esr60 63+ verified
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- verified


(Reporter: philipp, Assigned: bbouvier)



(Keywords: crash, csectype-wildptr, sec-low, Whiteboard: [adv-esr60.3+][adv-main64+])

Crash Data


(1 file)

This bug was filed from the Socorro interface and is
report bp-47e3e173-b5d4-4df0-b108-8cee80181014.

crash reports with this signature a re recently spiking up in volume on release to ~250 recorded reports a day - most of the crashes happen in the content process.

a lot of user comments are referring to the facebook game as the source of the crash.
Component: JavaScript Engine: JIT → Javascript: Web Assembly
Unfortunately, this is the equivalent of the most generic entry into the JIT: wasm::Instance::callExport is the C++ function that's called when entering wasm code. So there could be one or several different causes to this bug. In the worst case, there could be a call from wasm to the JIT, and the JIT function is the one which crashes, so this could also be an issue in JIT code (baseline or ion).

From a quick look at the first 5 reports, the crashes reasons seem to ~all consist of EXCEPTION_ACCESS_VIOLATION_READ, on Windows, on x86 (not sure if Socorro displays something different for x64, so this might be 32 or 64 bits; the code addresses look like 32 bits though). The stacks always look a bit weird:

- one raw address
- N times the same address
- then wasm::Instance::callExport on top of this
- then IonBuilder::makeCall??? Probably we can't trust this.

(One remark: if the raw address that's shown is the value of FP, then the address that's repeated N times is marked with the low bit, meaning it's a recursive call in a wasm function; the top stack address isn't marked, so this would then be a JIT function or exit. But I see a weird pattern in the repeated address (e.g. 0xea7afff), which resemble a JS::Value layout. So this may be a red herring.)

I'll try to see if there's anything else useful in the crash reports. Playing with the Facebook app might be a start.
It'd be useful to know if the crash dumps contain the surrounding assembly code at the point of the crash.  With wasm, this can sometimes be especially indicative of (baseline vs. ion) and which operation.
Wow, yeah, with 99% x86 (compared to ~28% of overall population), that suggests something 32-bit-specific; maybe regarding the use of guard pages and bounds checks.  It'd be useful to run an instrumented build that print()ed on traps (esp. out-of-bounds) to see if that's happening in normal runs.
If there's something we can land to get better diagnostics from the crashes, I could take it in 60.3esr if I had it today still. We might be able to get it in 63rc2 as well.
Flags: needinfo?(luke)
Flags: needinfo?(bbouvier)
Ok, with help from Jan, I think I know what the bug is.  So Jan extracted this assembly for the crash dump:
and other crash dumps crash at the same punpckldq instruction.

Notice right before we're moving a constant address into a register and loading it; that's a red flag in wasm code because we serialize module code and so process-relative addresses are a no-no, so this would have to be in a wasm non-module stub (or JS JIT code).

Jan found a stub that seems to emit this pattern:
and we can see it baking in a static variable address!  (If ImmPtr() was used instead of ImmWord(), this would have asserted because IsCompilingWasm().)

Lars, seems like there could be a quick fix?

It'd be great to have a fix in the next releases.  Alternatively, if it's too risky, we can just disable asm.js caching.
Flags: needinfo?(luke)
Flags: needinfo?(lhansen)
Flags: needinfo?(bbouvier)
Will look at this first thing in the morning.
Actually, Lars: is punpckldq only emitted for wasm?

If so, then the fact that the crashes are only for FF62 and wasm serialization was removed in FF63 (bug 1469395) and not yet re-added (bug 1487113) might mean that this crash is implicitly fixed in FF63 without any patches.
Note that ESR60 is also affected.
(In reply to Luke Wagner [:luke] from comment #7)
> Actually, Lars: is punpckldq only emitted for wasm?

If you meant to ask "only wasm and not asm.js": yes. Otherwise, it's also used in regular JS for the MRandom node.

> If so, then the fact that the crashes are only for FF62 and wasm
> serialization was removed in FF63 (bug 1469395) and not yet re-added (bug
> 1487113) might mean that this crash is implicitly fixed in FF63 without any
> patches.

Agreed; we'd need a stop gap for ESR60 (and maybe others) in this case.
Attached patch fix.patchSplinter Review
This is silly, we have masm features to embed the constant values within the code for 128-bit values (because SIMD!). This is equivalent and just a bit slower (by adding one load per conversion), but it doesn't use an absolute reference.
Assignee: nobody → bbouvier
Attachment #9017918 - Flags: review?(lhansen)
Regarding the sec-rating:
- in wasm:
  - in the best case, the address we're trying to read from is incorrect and we'll segfault
  - in the worst case, we read from an incorrect address, and the computation will be incorrect. Then the double will flow through wasm operators, and at the JS boundaries, be converted either into a regular double if it's valid, or a canonical NaN; so no risk for wrong type interpretation.
- under js (since this function is used by MRandom code gen): the problem is non-existent, because jit code isn't cached.

Luke agreed with me. So I think this is sec-low.

I'll be away until tomorrow (end of day here), so feel free to tweak/land the patch. For what it's worth, I've tried it locally and it passes all the wasm testing in --tbpl mode.
Flags: needinfo?(lhansen)
Comment on attachment 9017918 [details] [diff] [review]

Review of attachment 9017918 [details] [diff] [review]:

(Stealing, so we can get a fix in ESR) Looks great to me!
Attachment #9017918 - Flags: review?(lhansen) → review+
Comment on attachment 9017918 [details] [diff] [review]

[ESR Uplift Approval Request]

If this is not a sec:{high,crit} bug, please state case for ESR consideration: It prevents a crash with decent volume

User impact if declined: Continued crashes (for a long time)

Fix Landed on Version: Nightly

Risk to taking this patch: Low

Why is the change risky/not risky? (and alternatives if risky): Simple patch.

String or UUID changes made by this patch:
Attachment #9017918 - Flags: approval-mozilla-esr60?

Per IRC discussion with Luke and Al, I'm going to take this fix for 60.3esr because it's a topcrash there, but we're going to let it ride the 64 train to release since 63 isn't hitting the crash reported here and it'd be needless late-cycle churn.
Comment on attachment 9017918 [details] [diff] [review]

Fixes a topcrash, approved for 60.3esr.
Attachment #9017918 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Whiteboard: [adv-esr60.3+]
See Also: → 1500016
Group: javascript-core-security → core-security-release
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
Flags: qe-verify+
Was able to reproduce the crash using the url from comment 0 after multiple refreshes on the page: (Fx 62.0.3, Windows 7 32bit).

Did not encounter the crash on new builds after the fix 64.0b5 and 60.3.0esr under Windows 7 32bit. Checking Socorro I do still see 3 crashes in the last 7 days affecting 64.0b4. Is it safe we call this verified fixed based on these results?
Flags: needinfo?(bbouvier)
Yes, I think so (and that's in line with comment 14).
Flags: needinfo?(bbouvier)
(In reply to Benjamin Bouvier [:bbouvier] from comment #19)
> Yes, I think so (and that's in line with comment 14).

Thanks, marking the bug accordingly.
Flags: qe-verify+
Whiteboard: [adv-esr60.3+] → [adv-esr60.3+][adv-main64+]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.