Assertion failure: oom::maxAllocations == (18446744073709551615UL), at dist/include/js/Utility.h:355 with evalInWorker and oomTest/stackTest racing

RESOLVED FIXED in Firefox 64

Status

()

defect
P1
critical
RESOLVED FIXED
8 months ago
7 months ago

People

(Reporter: decoder, Assigned: nbp)

Tracking

(Blocks 2 bugs, {assertion, jsbugmon, testcase})

Trunk
mozilla64
x86_64
Linux
Points:
---
Dependency tree / graph
Bug Flags:
in-testsuite +

Firefox Tracking Flags

(firefox-esr60 wontfix, firefox62 wontfix, firefox63 wontfix, firefox64 fixed)

Details

(Whiteboard: [jsbugmon:update,bisect][fuzzblocker])

Attachments

(1 attachment, 1 obsolete attachment)

Reporter

Description

8 months ago
The following testcase crashes on mozilla-central revision 2d2dee08739f (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe --ion-offthread-compile=off --ion-eager):

evalInWorker(`
  oomTest(() => parseModule('import v from "mod";'));
`);
src = `
  function SameValue(v1, v2) {
    if (!SameValue(a1[i], a2[i]))
    return true;
  }
`;
src += src;
src += src;
src += src;
stackTest(function() {
  eval(src);
});
evalInWorker("")();


Backtrace:

received signal SIGSEGV, Segmentation fault.
#0  0x0000555555821768 in js::AutoEnterOOMUnsafeRegion::~AutoEnterOOMUnsafeRegion (this=<optimized out>, __in_chrg=<optimized out>) at dist/include/js/Utility.h:355
#1  0x00005555558117d6 in EvalInWorker (cx=0x7ffff5f18000, argc=<optimized out>, vp=<optimized out>) at js/src/shell/js.cpp:4001
#2  0x0000555555969ed5 in CallJSNative (cx=0x7ffff5f18000, native=0x5555558114c0 <EvalInWorker(JSContext*, unsigned int, JS::Value*)>, args=...) at js/src/vm/Interpreter.cpp:461
#3  0x000055555595c517 in js::InternalCallOrConstruct (cx=<optimized out>, cx@entry=0x7ffff5f18000, args=..., construct=construct@entry=js::NO_CONSTRUCT) at js/src/vm/Interpreter.cpp:553
#4  0x000055555595cb3d in InternalCall (cx=0x7ffff5f18000, args=...) at js/src/vm/Interpreter.cpp:607
#5  0x000055555595cc8a in js::CallFromStack (cx=<optimized out>, args=...) at js/src/vm/Interpreter.cpp:613
#6  0x0000555555ab4b33 in js::jit::DoCallFallback (cx=<optimized out>, frame=0x7fffffffcb58, stub_=<optimized out>, argc=<optimized out>, vp=0x7fffffffcb08, res=...) at js/src/jit/BaselineIC.cpp:3810
#7  0x000008f6f0c00253 in ?? ()
[...]
#29 0x0000000000000000 in ?? ()
rax	0x0	0
rbx	0x7ffff5f18000	140737319632896
rcx	0x7ffff6c1c2dd	140737333281501
rdx	0x0	0
rsi	0x7ffff6eeb770	140737336227696
rdi	0x7ffff6eea540	140737336223040
rbp	0x7fffffffc3b0	140737488339888
rsp	0x7fffffffc3a0	140737488339872
r8	0x7ffff6eeb770	140737336227696
r9	0x7ffff7fe6cc0	140737354034368
r10	0x58	88
r11	0x7ffff6b927a0	140737332717472
r12	0x7fffffffcb18	140737488341784
r13	0x7ffff5fca920	140737320364320
r14	0x1	1
r15	0x7fffffffc3e0	140737488339936
rip	0x555555821768 <js::AutoEnterOOMUnsafeRegion::~AutoEnterOOMUnsafeRegion()+216>
=> 0x555555821768 <js::AutoEnterOOMUnsafeRegion::~AutoEnterOOMUnsafeRegion()+216>:	movl   $0x0,0x0
   0x555555821773 <js::AutoEnterOOMUnsafeRegion::~AutoEnterOOMUnsafeRegion()+227>:	ud2



This is not a recent regression but it took me quite a while to find a useful testcase. Also, this is causing a fair amount of crashes, marking fuzzblocker for OOM testing.
Reporter

Comment 1

8 months ago
:nbp, can you help with this or forward to someone else? Thanks!
Flags: needinfo?(nicolas.b.pierron)
Reporter

Comment 2

8 months ago
This can also assert with "Assertion failure: maxAllocations >= 0 (alloc count + oom limit exceeds range, your oom limit is probably too large), at dist/include/js/Utility.h:356".
Assignee

Comment 3

8 months ago
Looking.
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Flags: needinfo?(nicolas.b.pierron)
Assignee

Updated

8 months ago
Priority: -- → P1
Assignee

Comment 4

8 months ago
Ok, I managed to reproduce this issue but failed to reproduce it under rr.
Assignee

Comment 5

8 months ago
Ok, just a blind guess: oomTest and stackTest are running concurrently and overwritting each others state variable and global variables.

I wonder, should we either make these a thread-local-storage variables, in which case we might miss some off-thread OOM failures?! Or should we create a global lock for all?
(In reply to Nicolas B. Pierron [:nbp] from comment #5)
Yes that sounds likely.  

How about moving the JSContext::runningOOMTest flag to JSRuntime and using the version on the parent runtime for worker threads?  I think that would prevent this situation.
Assignee

Comment 7

7 months ago
Made the code by moving the runningOOMTest to the JSRuntime as an atomic to guard the globals.

However, when the worker thread is running the oomTest function, I noticed that the LifoAlloc of the main thread can trigger some of these simulated OOMs.  I suspect js::oom::IsThreadSimulatingOOM triggers for all threads.
Assignee

Comment 8

7 months ago
This patch prevent stackTest from starting at the same time as an already
running oomTest in a worker thread.

Note: A follow-up is required to remove the while(true) from the test case, as
the worker thread oomTest function call causes the main thread to report OOMs.
Attachment #9016697 - Flags: review?(jcoppeard)
(In reply to Nicolas B. Pierron [:nbp] from comment #7)
> However, when the worker thread is running the oomTest function, I noticed
> that the LifoAlloc of the main thread can trigger some of these simulated
> OOMs.  I suspect js::oom::IsThreadSimulatingOOM triggers for all threads.

Yes you are right, and this is not good.  Maybe we should bad oom tests in workers?  The other solution is to use thread local variables for these things but I think it's going to be a performance hit to do this on every memory allocation.  What do you think?
Assignee

Comment 10

7 months ago
This patch add a new fake thread type, which is used to identify the current thread.

With this changes, Worker threads will only trigger the current thread type
instead of checking all threads as these checking/fuzzing functions would do on
the main thread. The problem was that worker thread would be checking the main
thread instead of the worker threads, and that worker threads would be checking
each others.

Also, THREAD_TYPE_WASM_TIER2 was not checked before and I changed the loop
condition to check it as well, and as such removed the +1 on thread types.

This patch also extend some testing infra put in place for checking shared array
buffers, and uses it as a dummy way of synchronizing threads.
Attachment #9017184 - Flags: review?(jcoppeard)
Assignee

Updated

7 months ago
Attachment #9016697 - Attachment is obsolete: true
Attachment #9016697 - Flags: review?(jcoppeard)
Comment on attachment 9017184 [details] [diff] [review]
Add THREAD_TYPE_CURRENT to simulate errors on a single worker thread.

Review of attachment 9017184 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the updated patch.  This looks good to me.

The dummy thread synchronisation is a bit scary, but it's test code so I think this is OK.
Attachment #9017184 - Flags: review?(jcoppeard) → review+

Comment 12

7 months ago
Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/64e46a7cc29f
Add THREAD_TYPE_CURRENT to simulate errors on a single worker thread. r=jonco

Comment 13

7 months ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/64e46a7cc29f
Status: ASSIGNED → RESOLVED
Last Resolved: 7 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
You need to log in before you can comment on or make changes to this bug.