Closed Bug 1893915 Opened 11 months ago Closed 11 months ago

Loading site triggers SIGILL

Categories

(Core :: JavaScript: WebAssembly, defect, P1)

Tracking

RESOLVED FIXED
127 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox125 --- wontfix
firefox126 --- wontfix
firefox127 --- fixed

People

(Reporter: tsmith, Assigned: jseward)

References

(Blocks 1 open bug)

Details

(Keywords: crash, pernosco, sec-moderate, Whiteboard: [adv-main127+r])

Attachments

(8 files)

Found with m-c 20240423-4630feab4425 (--enable-debug)

This was found by visiting a live website with a debug build.

STR:

  • Launch browser and visit site

This issue was triggered by visiting http://interactive.index.hr/.

A Pernosco session is available here: https://pernos.co/debug/Dboj3Qti_SpayKs1wCrQHA/index.html#f{m[AcwJ,AbT5_,t[AcI,KZE_,f{e[Acl1,jgy2_,s{acNH/AAAA,bAbs,uLgy95w,oLiPC7g___/

It seems like the Pernosco session might be broken - I don't see a SIGILL, and opening the GDB panel actually produces an internal error, at which point the whole Pernosco UI stops working properly. Could you look into it and possibly send us another trace?

Flags: needinfo?(twsmith)

To see the signal in the Pernosco session: open the toolbox -> signals received.

I will also contact the Pernosco team about the Pernosco UI.

Flags: needinfo?(twsmith)

Some random observations (may be irrelevant):

I can reproduce on x86_64-linux with m-c 737006:650dda918743 (unoptimised
debug build), using the test case in comment 3 and
--no-threads --wasm-compiler=baseline.

With --wasm-compiler=ion, it does not reproduce.

The two x.exports.f() lines are both necessary; with only one, it does not
reproduce.
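For reference, here is a minimal sketch of the shape described above. This is
NOT the attached comment-3 testcase (its contents are not reproduced here);
the function names ($f1, $f2, the export "f") and the presence of an imported
JS function come from the comments in this bug, while the call structure and
the import's body are assumptions made purely for illustration. It relies on
the SpiderMonkey shell's wasmTextToBinary() helper.

  // Illustrative sketch only -- not the real testcase from comment 3.
  const bytes = wasmTextToBinary(`(module
    (import "m" "imp" (func $imp))
    (func $f1
      call $imp)
    (func $f2
      call $f1)
    (func (export "f")
      call $f2))`);
  const mod = new WebAssembly.Module(bytes);
  const x = new WebAssembly.Instance(mod, { m: { imp: () => {} } });

  // Both calls matter: as noted above, with only one x.exports.f() the
  // original failure does not reproduce.
  x.exports.f();
  x.exports.f();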

With both valgrind and GDB I see the pattern of two SIGILLs followed by a
SIGTRAP. The two SIGILLs have the same stack:

valgrind: Unrecognised instruction at address 0x215e1447c10e.
   at 0x215E1447C10E: ???
   by 0x215E1447C16A: ???
   by 0x215E1447C2AA: ???
   by 0x215E1447C4B3: ???
   by 0x3C77A0F: js::wasm::Instance::callExport(JSContext*, unsigned int, JS::CallArgs const&, js::wasm::CoercionLevel) (src/wasm/WasmInstance.cpp:3268)

and the SIGTRAP has the same two frames at the bottom ('A0F, 'C4B3) but a
different tail:

Process terminating with default action of signal 5 (SIGTRAP)
   at 0x215E1447C3AE: ???
   by 0x215E1447C2E0: ???
   by 0x215E1447C4B3: ???
   by 0x3C77A0F: js::wasm::Instance::callExport(JSContext*, unsigned int, JS::CallArgs const&, js::wasm::CoercionLevel) (src/wasm/WasmInstance.cpp:3268)

It might be that the routine at 0x215E1447C2AA in the SIGILL stacks runs a few
instructions further (to 0x215E1447C2E0), makes another call, and that call
traps (at 0x215E1447C3AE).

With --wasm-compiler=ion, I see the two SIGILLs but no SIGTRAP.

Using the attached patch, and

./dist/bin/js --no-ion --no-threads --wasm-compiler=baseline \
   ../../bug1893915-testcase-from-comment4.js

it is possible to control individually which compiler (Ion vs Baseline) is
used for each wasm function (see hacks in ExecuteCompileTask).

From this it is clear that it doesn't matter which compiler compiles
func $f1 and func (export "f"). It is func $f2 that matters: with
Baseline we get the SIGTRAP, with Ion we don't. So it seems that Baseline's
code generation for $f2 is somehow wrong.

Of course that is hardly a surprise -- of the 3 functions this is by far the
most complex.
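As an aside, a quick way to sanity-check which compiler the shell is using
globally (independent of the per-function hacks in the attached patch) is the
shell's wasmCompileMode() builtin, assuming this build exposes it:

  // Prints the wasm compile mode currently in effect, e.g. "baseline" or
  // "ion" (the exact strings vary between builds and versions).
  print(wasmCompileMode());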

Assignee: nobody → jseward
Attached file spew-baseline

Per comment 6, Baseline code for function $f2.

Attachment #9399303 - Attachment mime type: application/octet-stream → text/plain
Attached file spew-ion

Per comment 6, Ion code for function $f2. Perhaps we can deduce what the
problem is by comparing these two.

Attached file spew-baseline-tidied

Cleaned up Baseline code for function $f2 (with redundant labels removed)

Attached file spew-ion-tidied

Cleaned up Ion code for function $f2

Attachment #9399327 - Attachment mime type: application/octet-stream → text/plain
Attachment #9399328 - Attachment mime type: application/octet-stream → text/plain

We ran this in Ion for comparison, since Ion does not crash. Pernosco session: https://pernos.co/debug/T0cREOVqbf1RT4mYE86UGw/index.html#f{m[A+s,1UY_,t[AQ,jtU_,f{e[A+s,0/o_,s{aYP6a/sAA,bBQ,uBU63Fg,oBVNjHQ___/

The difference with Ion seems to be that some other instance calls run, which end up setting and clearing packedExitFP_. Therefore, by the time we call the imported JS function, packedExitFP_ is null again and the assert does not fire.

Severity: -- → S3
Priority: -- → P1

This bug probably does not need a very high security rating. We trip a debug assert, but the exit FP information in JIT activations is often stale anyway and is reset whenever it actually matters (e.g. when starting a trap). It's not clear to any of us how this could actually be exploited. Still, we might as well keep the bug secure, just with a low or moderate severity.

Keywords: sec-moderate
Pushed by bvisness@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a414439df5c0
Always clear exit FP when handling wasm exceptions. r=jandem
Group: javascript-core-security → core-security-release
Status: NEW → RESOLVED
Closed: 11 months ago
Resolution: --- → FIXED
Target Milestone: --- → 127 Branch

(In reply to Acasandrei Beatrice (needinfo me) from comment #17)

> (In reply to Pulsebot from comment #15)
> > Pushed by bvisness@mozilla.com:
> > https://hg.mozilla.org/integration/autoland/rev/a414439df5c0
> > Always clear exit FP when handling wasm exceptions. r=jandem
>
> Perfherder has detected a talos performance change from push
> a414439df5c0f6563443f055e6f53e4fec5f3b94.

Not sure it is us. I see pdf updates in the same pushlog:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=91cfcfc5935b5ac51c36c1a08a064c33a2d96fd5&tochange=a414439df5c0f6563443f055e6f53e4fec5f3b94
Bug 1894727 or bug 1894705?

Flags: needinfo?(bacasandrei)
Flags: needinfo?(bacasandrei)
QA Whiteboard: [post-critsmash-triage]
Flags: qe-verify-

Should we backport this to ESR as well? We'll need a rebased patch and an uplift request if yes.

Flags: needinfo?(jseward)

(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)

> Should we backport this to ESR as well? We'll need a rebased patch and an
> uplift request if yes.

We decided "no" (but thanks for the query):

(1) it seems like the fix only fixes an assertion failure.
(2) we don't have a good way to know that we won't break something else in esr115.
(3) in any case esr115 is late in cycle and esr128, with this fix already in, will be along soon.
(4) we don't have any indication of this crashing in the wild; the bug came from tsmith on the fuzzing team.
(5) if esr115 does start crashing we can always request uplift later.

Flags: needinfo?(jseward)
Whiteboard: [adv-main127+r]
Group: core-security-release