Closed Bug 1519037 Opened 9 months ago Closed 8 months ago

Assertion failure: !used(), at js/src/jit/Label.h:85 with --dump-bytecode

Categories

(Core :: JavaScript Engine, defect, critical)

All
Linux
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla67
Tracking Status
firefox-esr60 --- unaffected
firefox65 --- unaffected
firefox66 --- fixed
firefox67 --- fixed

People

(Reporter: gkw, Assigned: arai)

References

(Blocks 2 open bugs)

Details

(5 keywords, Whiteboard: [jsbugmon:])

Attachments

(13 files)

24.19 KB, text/plain
Details
7.44 MB, application/x-zip-compressed
Details
3.72 MB, application/octet-stream
Details
28.40 KB, text/plain
Details
1.77 KB, patch
Details | Diff | Splinter Review
1012 bytes, patch
Details | Diff | Splinter Review
20.92 KB, text/plain
Details
5.32 KB, text/plain
Details
20.93 KB, text/plain
Details
1.40 KB, application/x-zip-compressed
Details
2.91 KB, patch
Details | Diff | Splinter Review
47 bytes, text/x-phabricator-request
Details | Review
9.25 KB, patch
arai
: review+
Details | Diff | Splinter Review

The following testcase crashes on mozilla-central revision 9d40c98b9efb (build with --enable-debug --build-with-clang, run with --fuzzing-safe --wasm-gc --test-wasm-await-tier2 --no-asmjs --ion-instruction-reordering=on --ion-eager --no-streams --ion-offthread-compile=off --gc-zeal=9,453 --no-native-regexp --dump-bytecode w97-out.wrapper w97-out.wasm):

See attachment.

Backtrace:

#0  js::jit::Label::~Label (this=0x55a000000f89) at js/src/jit/Label.h:85
#1  GenerateImportJitExit (masm=..., fi=..., offsets=0x7f3000006e0, throwLabel=<optimized out>) at js/src/wasm/WasmStubs.cpp:1476
#2  js::wasm::GenerateStubs (env=..., imports=..., exports=..., code=0x55a09778bd48) at js/src/wasm/WasmStubs.cpp:1959
#3  0x000055a092fdf6dd in js::wasm::ModuleGenerator::finishCodeTier (this=0x7ffc90426b50) at js/src/wasm/WasmGenerator.cpp:1004
#4  0x000055a092fe02cb in js::wasm::ModuleGenerator::finishModule (this=0x7ffc90426b50, bytecode=..., maybeTier2Listener=0x0, maybeLinkData=0x0) at js/src/wasm/WasmGenerator.cpp:1112
#5  0x000055a092f6ed92 in js::wasm::CompileBuffer (args=..., bytecode=..., error=0x7ffc90427c90, warnings=<optimized out>, maybeLinkData=0x0) at js/src/wasm/WasmCompile.cpp:551
#6  0x000055a092fffac4 in js::WasmModuleObject::construct (cx=0x55a093bb4010, argc=<optimized out>, vp=<optimized out>) at js/src/wasm/WasmJS.cpp:1088
/snip

For detailed crash information, see attachment.

This is a hard-to-reproduce one. It might be related to OOM issues, but locking s-s just-in-case. I'm unable to get a regression window due to its intermittency.

Setting needinfo? from Lars (who was looking at a lot of wasm fuzzbugs recently), as a start.

Flags: needinfo?(lhansen)
Summary: Assertion failure: !used(), at js/src/jit/Label.h:85 with wasm → Assertion failure: !used(), at js/src/jit/Label.h:85
Attached file wasm_and_wrapper.zip
Attached file seed.7z

Seed

This shouldn't be possible, and if it were possible it should be trivial to reproduce it. It is probably indicative of something else bad going on, like the stack being scribbled on. Needs serious attention.

Whiteboard: [jsbugmon:update] → [jsbugmon:]
JSBugMon: Cannot process bug: Error: Failed to isolate test from comment

Given the logic of setting and checking Label::used in this very short region, I don't see OOM as a realistic reason why used should not be set at the end of it. Some wild pointer scribbling on the stack is more likely. (I've not tried to repro yet, just looked at the code.)

Flags: needinfo?(lhansen)

Elevating to [fuzzblocker], this happens fairly often enough.

Whiteboard: [jsbugmon:] → [fuzzblocker][jsbugmon:]

Are you able to repro without --dump-bytecode?

Flags: needinfo?(nth10sd)

I'm not sure. They are hard-to-reproduce but you may be right, most results that I checked have --dump-bytecode.

Flags: needinfo?(nth10sd)

(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #10)

I'm not sure. They are hard-to-reproduce but you may be right, most results that I checked have --dump-bytecode.

I don't see that flag listed in fuzz-flags.txt either.

I'll wait for more results.

A week later, looking through about 50 more reports, all of them show --dump-bytecode, coupled with that not being on fuzz-flags, it'll be removed from testing.

Flags: needinfo?(lhansen)
Whiteboard: [fuzzblocker][jsbugmon:] → [jsbugmon:]

OK. In that case it's more likely than not that this is some sort of buffer overrun in bytecode dumping - at least that's consistent with symptoms. I skimmed that code but nothing jumped out at me, as expected. I probably won't devote any time to this bug for the next while, there are too many other things to do.

Flags: needinfo?(lhansen)

if what Lars says is the case would that be caught by running --dump-bytecode in an ASAN build?

Flags: needinfo?(nth10sd)

That can be tried but it will be lower on the radar for now.

I spent some extra time today but couldn't get it to reproduce in an ASAN build.

Flags: needinfo?(nth10sd)
Keywords: sec-audit
Attached file stack for ARM assert

I tried turning --dump-bytecode on for 1-2 weeks again - ARM builds also crash at [@ js::jit::Simulator::decodeSpecialCondition] or [@ js::jit::SimulatorProcess::checkICacheLocked] or [@ js::jit::Simulator::readW] or even Assertion failure: callee_saved_value == get_register(r4), at /home/ubuntu/trees/mozilla-central/js/src/jit/arm/Simulator-arm.cpp:4815

Not sure if this helps, but I guess I can take the flag off the radar again. Please let me know in a day or so if there's any more information I can provide.

Arai-san, I see CompileUtf8::InflateToUtf16 in part of the stack in comment 18.

Would that possibly be related to --dump-bytecode? (I'm speculating that it might be we're dumping a lot of bytecode due to a large wasm + wrapper file, that CompileUtf8::InflateToUtf16 isn't dealing properly with, hence we are crashing/asserting all over the place)

Flags: needinfo?(arai.unmht)

bytecode dumping itself happens at the end of shell,

if it crashes before that part (so, while running the code), what --dump-bytecode affects is JSRuntime::profilingScripts (which I don't know much about)
https://searchfox.org/mozilla-central/rev/152993fa346c8fd9296e4cd6622234a664f53341/js/src/shell/js.cpp#10385-10386

Yeah, I don't think the crash is UTF-8 related, or at least not related to the calls appearing in those stacks.

what's --build-with-clang option?
I cannot find it

Flags: needinfo?(nth10sd)

It's a funfuzz flag which is not needed. (I forgot to remove it at the top here)

Flags: needinfo?(nth10sd)

as discussed in IRC, prepared 2 patches.

--dump-bytecode option has 2 feature, one is enabling profilingScripts part,
and other for actually dumping bytecode at the end of the shell.

this one disables profilingScripts part

Assignee: nobody → arai.unmht

and this one disables the bytecode dumping part

Also, let's open this up, since it seems to only involve --dump-bytecode.

Group: javascript-core-security
Summary: Assertion failure: !used(), at js/src/jit/Label.h:85 → Assertion failure: !used(), at js/src/jit/Label.h:85 with --dump-bytecode

Let me know if you still need testing B patch results. (I hadn't started it yet since we have the testing A patch results now)

Attachment #9041641 - Attachment is patch: false

gkw says its ARM and x86_64

Hardware: x86_64 → All

Run this with a shell compiled with :arai's comment 24 patch:

--fuzzing-safe --no-threads --ion-eager --gc-zeal=19,251 --dump-bytecode w282-out.wrapper w282-out.wasm

Crashes at:

00963: 63 retrval
{}
--- END SCRIPT w282-out.wrapper:1 ---

Thread 1 "js-dbg-64-linux" received signal SIGSEGV, Segmentation fault.
js::NativeObject::getReservedSlot (this=0xfffe4b4b4b4b4b4b, index=0) at /home/fuzz3/trees/mozilla-central/js/src/vm/NativeObject.h:1072
1072 MOZ_ASSERT(index < JSSLOT_FREE(getClass()));
(gdb) bt 5
#0 js::NativeObject::getReservedSlot (this=0xfffe4b4b4b4b4b4b, index=0) at /home/fuzz3/trees/mozilla-central/js/src/vm/NativeObject.h:1072
#1 0x000055555684c1cf in js::ScriptSourceObject::source (this=0xfffe4b4b4b4b4b4b) at /home/fuzz3/trees/mozilla-central/js/src/vm/JSScript.h:1176
#2 JSScript::scriptSource (this=<optimized out>) at /home/fuzz3/trees/mozilla-central/js/src/vm/JSScript.cpp:1002
#3 0x00005555566f0207 in JSScript::filename (this=0xfffe4b4b4b4b4b4b) at /home/fuzz3/trees/mozilla-central/js/src/vm/JSScript.h:2381
#4 js::DumpRealmPCCounts (cx=0x7ffff5517000) at /home/fuzz3/trees/mozilla-central/js/src/vm/BytecodeUtil.cpp:202
(More stack frames follow...)
(gdb)

Flags: needinfo?(pbone)

Full configuration command with needed environment variables is:
AR=ar sh /home/fuzz3/trees/mozilla-central/js/src/configure --enable-debug --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests --disable-cranelift

m-c rev 476293c6700f was used with :arai's comment 24 patch.

So, turns out that ZoneCellIter can return a script that is about to be finalized.
Added filtering, and also JS::ExposeScriptToActiveJS call (not sure if necessary).
:gkw, can you test with this patch? (to be clear, without patch A, to see if this fixes the original situation)

Flags: needinfo?(arai.unmht) → needinfo?(nth10sd)
Status: NEW → ASSIGNED

I've spawned some instances with the comment 34 patch. If there is any immediate bug, I'll reply, else, will let it run for a few days.

Flags: needinfo?(nth10sd)

(In reply to Tooru Fujisawa [:arai] from comment #34)

So, turns out that ZoneCellIter can return a script that is about to be finalized.

Yes, it returns all cells in the zone, even if they are about the be finalized by incremental sweeping. ZoneCellIter is a pretty blunt instrument.

Added filtering, and also JS::ExposeScriptToActiveJS call (not sure if necessary).

You don't need this as it already happens here: https://searchfox.org/mozilla-central/source/js/src/gc/GC-inl.h#163

thanks!

I'll check other non-GC-internal consumers as well, and post the fixed patch shortly.

Component: Javascript: Web Assembly → JavaScript Engine
Flags: needinfo?(pbone)

to be clear, I posted a patch, but I still want to see the automated-testing result of the patch in comment #34 (and also patch in comment #38 too), to see if it fixes the original issue, given the issue that I've confirmed the fix is in the different place than the original one.

(In reply to Tooru Fujisawa [:arai] from comment #39)

to be clear, I posted a patch, but I still want to see the automated-testing result of the patch in comment #34 (and also patch in comment #38 too), to see if it fixes the original issue, given the issue that I've confirmed the fix is in the different place than the original one.

Understood.

I've spawned some instances with the comment 34 patch. If there is any immediate bug, I'll reply, else, will let it run for a few days.

(Initially I didn't test the patch with --dump-bytecode here, I've reinstated the flag specially for this now. If nothing special shows up in a day or two, I'll switch over to testing the one in comment 38 and report my results)

Comment on attachment 9041698 [details] [diff] [review]
Filter out scripts that is about to be finalized in bytecode dumping and profiling.

I still see this assertion with no stack:

Assertion failure: !used(), at /home/ubuntu/trees/mozilla-central/js/src/jit/Label.h:85

Tested on m-c rev 476293c6700f with patch from comment 34.

Should I test the comment 38 patch next or wait for another one?
Attachment #9041698 - Flags: feedback-

patch in comment #38 added similar filtering in BaselineJIT related to profiling (which I think is enabled by the --dump-bytecode option)
it would be nice to test with it.

if it doesn't fix, I'll file another bug for the patch (given the patch itself should be valid).

Hi sfink,

Can our hazard analysis be taught that ZoneCellIter can iterate over about-to-be-finalised cells and you shouldn't keep pointers to them if you're going to call something that may GC. Actually I think this would be super complex and probably not feasable. It's not just the scope of ZoneCellIter because in this case the scripts were put into a vector that was live while a GC occured.

In other words. No I don't know if Arai has covered all the uses of ZoneCellIter.

Flags: needinfo?(sphink)

Arai's patch fixes the easilly reproducable case for me.

Comment on attachment 9041811 [details]
Bug 1519037 - Filter out objects that is about to be finalized when iterating for non-GC purpose. r?pbone

I think this patch fixes most, if not all, of these issues. I haven't seen anything pop up immediately after almost a day of dedicated testing, even with --dump-bytecode ramped up.

Flags: needinfo?(arai.unmht)
Attachment #9041811 - Flags: feedback+

In lieu of this GC bug, do we conclude that it may (or may not) be worth adding this --dump-bytecode flag to fuzz-flags.txt ?

Pushed by arai_a@mac.com:
https://hg.mozilla.org/integration/autoland/rev/45fe8012bb58
Filter out objects that is about to be finalized when iterating for non-GC purpose. r=pbone

(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #46)

In lieu of this GC bug, do we conclude that it may (or may not) be worth adding this --dump-bytecode flag to fuzz-flags.txt ?

For the purpose of finding bugs that may affect Firefox, --code-coverage option would be more suitable, that enables profiling part that --dump-bytecode also enables.
But at least for now --dump-bytecode covers everything --code-coverage does, so I think either would work.

Anyway, adding the flag to test those parts will be beneficial.

Flags: needinfo?(arai.unmht)
Status: ASSIGNED → RESOLVED
Closed: 8 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla67
See Also: → 1526234
See Also: → 1526257

Is this still a fuzzblocker for you on Beta or can we leave this on Nightly?

Flags: needinfo?(nth10sd)

It's preferable that we backport to Beta, please.

Flags: needinfo?(nth10sd) → needinfo?(arai.unmht)

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

several places across tree

User impact if declined

Not likely for Firefox users at this point, given it's in debugging code, or not-yet-enabled part.
but if there's any case, it's crash by dereferencing poisoned memory.

This is mostly for fuzzing team testing those code on beta channel.

Is this code covered by automated tests?

No

Has the fix been verified in Nightly?

Yes

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

Mostly not-used code in Firefox

String changes made/needed

none

Flags: needinfo?(arai.unmht)
Attachment #9042677 - Flags: review+
Attachment #9042677 - Flags: approval-mozilla-beta?
Comment on attachment 9042677 [details] [diff] [review]
(mozilla-beta) Filter out objects that is about to be finalized when iterating for non-GC purpose. r=pbone

Fix for assertions for beta fuzzing, OK to uplift.
Attachment #9042677 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

There are conflicts while merging js/src/vm/BytecodeUtil.cpp
Seems to be caused by https://hg.mozilla.org/mozilla-central/log/45fe8012bb58/js/src/vm/BytecodeUtil.cpp?patch=&linerange=2555:2568

:arai , can you please provide a patch for beta?

Flags: needinfo?(arai.unmht)

(In reply to Narcis Beleuzu [:NarcisB] from comment #54)

:arai , can you please provide a patch for beta?

doesn't attachment 9042677 [details] [diff] [review] apply?

Flags: needinfo?(arai.unmht)

(In reply to Paul Bone [:pbone] from comment #43)

Hi sfink,

Can our hazard analysis be taught that ZoneCellIter can iterate over about-to-be-finalised cells and you shouldn't keep pointers to them if you're going to call something that may GC. Actually I think this would be super complex and probably not feasable. It's not just the scope of ZoneCellIter because in this case the scripts were put into a vector that was live while a GC occured.

Maybe?

The "keeping pointers to them" is pretty hard to detect (though it sort of is what we do with the stylo analysis, fwiw.)

I was thinking maybe you could do some trick by saying that pointers retrieved from a ZoneAllCellIter are "unrootable" -- it doesn't matter if you put them in a Rooted, a GC can invalidate them anyway. So it would still require some data flow analysis to find all the variables that might end up storing an unrootable. That's the tricky part; how should it know that myarray.append(cell) is storing 'cell' in 'myarray' and therefore 'myarray' should not be considered to be rooted even if its type is Rooted<GCVector<T>> or whatever? But it might work, just by saying that any method called on a variable is assumed to store any unrootable arguments. (You really shouldn't be doing much with ZoneAllCellIter results in the first place, so this isn't that harsh a restriction.)

I'm not sure if it's worth it, but I do think it's feasible. If the same pattern were useful for other checks, the benefit/cost ratio would be higher. Can you think of other situations where it might apply?

Flags: needinfo?(sphink)
Flags: qe-verify+
QA Whiteboard: [qa-triaged]

Hi Gary or anyone, Can you help us out with some steps on how to reproduce this issue, also would we need Fuzzin builds in order to check this issue or regular Beta and Release builds are the ones that should be tested?

Flags: needinfo?(nth10sd)
Flags: needinfo?(pbone)

Hi Rares,

Arai is probably the best person to answer, I'll let them correct me if I get anything wrong.

  1. This is hard to reproduce, I guess the fuzzers had to run it very often to get it to crash.
  2. It's the JS shell, comment 1 has the build options for the shell.
  3. I don't remember if it was particular versions that had the problem, you might need to use a very specific code revision.

Good luck.

Flags: needinfo?(pbone)

The reproduction steps are following:

  1. Download https://bugzilla.mozilla.org/attachment.cgi?id=9035558 and extract it somewhere (WARNING, the extracted files are VERY LARGE, ~2GB)
  2. build JS shell (m-c 9d40c98b9efb) with --enable-debug, or grab debug build of JS shell
  3. run JS shell with the following command:

./js --fuzzing-safe --wasm-gc --test-wasm-await-tier2 --no-asmjs --ion-instruction-reordering=on --ion-eager --no-streams --ion-offthread-compile=off --gc-zeal=9,453 --no-native-regexp --dump-bytecode w97-out.wrapper w97-out.wasm

or

./js --fuzzing-safe --arm-asm-nop-fill=1 --ion-edgecase-analysis=off --ion-eager --more-compartments --nursery-strings=off --ion-offthread-compile=off --no-cgc --gc-zeal=10,166 --baseline-eager --dump-bytecode w97-out.wrapper w07-out.wasm

(w97-out.wrapper and w97-out.wasm are files extracted above)

  1. run it until it crashes

from my experience, applying https://bugzilla.mozilla.org/attachment.cgi?id=9041436 can make the crash more frequent (or, maybe just that it takes less time for each run).

fwiw, bug 1526257 fixes the underlying issue in cleaner way.

I verify that this issue reproduces on m-c rev 6886f5504885 compiled with patch in comment 24:

/snip
00957:  61  nop
                  {}
00958:  61  jumptarget (ic: 136)
                  {"interp": 1}
00963:  63  retrval
                  {}
--- END SCRIPT w282-out.wrapper:1 ---
Segmentation fault (core dumped)

but no longer reproduces on m-c rev 45fe8012bb58 compiled with patch in comment 24:

/snip
00046:  19  strict-setprop "loadPath"
                  {}
00051:  19  pop
                  {}
00052:  20  retrval
                  {}
--- END SCRIPT shell/ModuleLoader.js:17 ---

https://hg.mozilla.org/mozilla-central/rev/45fe8012bb58 is the fix for this bug in comment 49.

-> VERIFIED FIXED

Status: RESOLVED → VERIFIED
Flags: needinfo?(nth10sd)

Gary was wondering about the rating of this as sec-audit. It sounds like this found a bunch of issues with how we iterate over scripts. It seems like this is mostly debugging kinds of code, but maybe not all of it. So I'll change the rating to sec-moderate.

Keywords: sec-auditsec-moderate

Hi, based on comment 62 this issue was verified by Gary Kwong, we can remove the qeverify+ flag.

QA Whiteboard: [qa-triaged]
Flags: qe-verify+
You need to log in before you can comment on or make changes to this bug.