Loading site triggers SIGILL
Categories
(Core :: JavaScript: WebAssembly, defect, P1)
Tracking
()
People
(Reporter: tsmith, Assigned: jseward)
References
(Blocks 1 open bug, )
Details
(Keywords: crash, pernosco, sec-moderate, Whiteboard: [adv-main127+r])
Attachments
(8 files)
852 bytes,
application/x-javascript
|
Details | |
4.27 KB,
patch
|
Details | Diff | Splinter Review | |
6.08 KB,
text/plain
|
Details | |
10.82 KB,
text/plain
|
Details | |
3.67 KB,
text/plain
|
Details | |
7.35 KB,
text/plain
|
Details | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
Found with m-c 20240423-4630feab4425 (--enable-debug)
This was found by visiting a live website with a debug build.
STR:
- Launch browser and visit site
This issue was triggered by visiting http://interactive.index.hr/
.
A Pernosco session is available here: https://pernos.co/debug/Dboj3Qti_SpayKs1wCrQHA/index.html#f{m[AcwJ,AbT5_,t[AcI,KZE_,f{e[Acl1,jgy2_,s{acNH/AAAA,bAbs,uLgy95w,oLiPC7g___/
Comment 1•11 months ago
|
||
It seems like the Pernosco session might be broken - I don't see a SIGILL, and opening the GDB panel actually produces an internal error, at which point the whole Pernosco UI stops working properly. Could you look into it and possibly send us another trace?
Reporter | ||
Comment 2•11 months ago
|
||
Open toolbox -> signals received.
I will also contact the Pernosco team about the Pernosco UI.
Comment 3•11 months ago
|
||
Comment 4•11 months ago
|
||
Comment on attachment 9399235 [details]
jsshell -wasm-compilier=baseline test case
Pernosco recording https://pernos.co/debug/qoPS6ZHr1SYudsji8u0jsQ/index.html#f{m[A7s,AR/u_,t[AQ,NFGz_,f{e[A7c,BFOR_,s{aVdmgPqAA,bBQ,uBVZEjA,oBWVL7w___/
Assignee | ||
Comment 5•11 months ago
|
||
Some random observations (may be irrelevant):
I can reproduce on x86_64-linux with m-c 737006:650dda918743, unoptimised
debug build, with the test case in comment 3, with
--no-threads --wasm-compiler=baseline
.
With --wasm-compiler=ion
, it does not reproduce.
The two x.exports.f()
lines are both necessary; with only one, it does not
reproduce.
With valgrind and also GDB I see the pattern of two SIGILLs followed by a
SIGTRAP. The two SIGILLs have the same stack ..
valgrind: Unrecognised instruction at address 0x215e1447c10e.
at 0x215E1447C10E: ???
by 0x215E1447C16A: ???
by 0x215E1447C2AA: ???
by 0x215E1447C4B3: ???
by 0x3C77A0F: js::wasm::Instance::callExport(JSContext*, unsigned int, JS::CallArgs const&, js::wasm::CoercionLevel) (src/wasm/WasmInstance.cpp:3268)
and the SIGTRAP has the same two frames at the bottom ('AOF, 'C4B3) but a
different tail:
Process terminating with default action of signal 5 (SIGTRAP)
at 0x215E1447C3AE: ???
by 0x215E1447C2E0: ???
by 0x215E1447C4B3: ???
by 0x3C77A0F: js::wasm::Instance::callExport(JSContext*, unsigned int, JS::CallArgs const&, js::wasm::CoercionLevel) (src/wasm/WasmInstance.cpp:3268)
It might be that the routine 0x215E1447C2AA in the SIGILLs runs a few
instructions further (to 0x215E1447C2E0), makes another call, and that traps
(at 0x215E1447C3AE).
With --wasm-compiler=ion
, I see the two SIGILLs but no SIGTRAP.
Assignee | ||
Comment 6•11 months ago
|
||
Using the attached patch, and
./dist/bin/js --no-ion --no-threads --wasm-compiler=baseline \
../../bug1893915-testcase-from-comment4.js
it is possible to control individually which compiler (Ion vs Baseline) is
used for each wasm function (see hacks in ExecuteCompileTask).
From this it is clear that it doesn't matter which compiler compiles
func $f1
and func (export "f")
. It is func $f2
that matters; with
Baseline we get SIGTRAP, with Ion there's no SIGTRAP. So it seems like it's
the code generation of $f2
on Baseline that is somehow wrong.
Of course that is hardly a surprise -- of the 3 functions this is by far the
most complex.
Assignee | ||
Updated•11 months ago
|
Assignee | ||
Comment 8•11 months ago
|
||
Per comment 6, Ion code for function $f2
. Perhaps we can deduce what the
problem is by comparing these two.
Assignee | ||
Comment 9•11 months ago
|
||
Cleaned up Baseline code for function $f2
(with redundant labels removed)
Assignee | ||
Comment 10•11 months ago
|
||
Cleaned up Ion code for function $f2
Assignee | ||
Updated•11 months ago
|
Assignee | ||
Updated•11 months ago
|
Comment 11•11 months ago
|
||
Comment 12•11 months ago
|
||
We ran this in Ion for comparison, since Ion does not crash. Pernosco session: https://pernos.co/debug/T0cREOVqbf1RT4mYE86UGw/index.html#f{m[A+s,1UY_,t[AQ,jtU_,f{e[A+s,0/o_,s{aYP6a/sAA,bBQ,uBU63Fg,oBVNjHQ___/
The difference with Ion seems to be that some other instance calls run, which end up setting and clearing packedExitFP_. Therefore, by the time we call the imported JS function, packedExitFP_ is null again and the assert does not fire.
Updated•11 months ago
|
Comment 13•11 months ago
|
||
Depends on D209039
Comment 14•11 months ago
•
|
||
This bug probably does not need a very high security rating. We trip a debug assert, but the exit FP information in JIT activations is often stale and regularly reset to other values whenever it would be important (e.g. when starting a trap). It's not clear to any of us how this could actually be exploited. Still, we might as well keep the bug secure, just with a low or moderate severity.
Updated•11 months ago
|
Comment 15•11 months ago
|
||
Comment 16•11 months ago
|
||
Updated•11 months ago
|
Comment hidden (obsolete) |
Comment 18•11 months ago
|
||
(In reply to Acasandrei Beatrice (needinfo me) from comment #17)
(In reply to Pulsebot from comment #15)
Pushed by bvisness@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a414439df5c0
Always clear exit FP when handling wasm exceptions. r=jandemPerfherder has detected a talos performance change from push a414439df5c0f6563443f055e6f53e4fec5f3b94.
Not sure it is us. I see pdf updates in the same pushlog https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=91cfcfc5935b5ac51c36c1a08a064c33a2d96fd5&tochange=a414439df5c0f6563443f055e6f53e4fec5f3b94 . Bug 1894727 or bug 1894705 ?
Updated•11 months ago
|
Updated•11 months ago
|
Updated•10 months ago
|
Comment 19•10 months ago
|
||
Should we backport this ESR also? We'll need a rebased patch and an uplift request if yes.
Assignee | ||
Comment 20•10 months ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #19)
Should we backport this ESR also? We'll need a rebased patch and an uplift request if yes.
We decided "no" (but thanks for the query):
(1) it seems like the fix only fixes an assertion failure.
(2) we don't have a good way to know that we won't break something else in esr115.
(3) in any case esr115 is late in cycle and esr128, with this fix already in, will be along soon.
(4) we don't have any indication of this crashing in the wild; the bug came from tsmith on the fuzzing team.
(5) if esr115 does start crashing we can always request uplift later.
Updated•10 months ago
|
Updated•10 months ago
|
Updated•17 days ago
|
Description
•