Closed Bug 1347539 Opened 6 years ago Closed 5 years ago

Crash [@ js::TenuringTracer::traverse] or Assertion failure: ownerContext().context() == nullptr, at js/src/gc/ZoneGroup.cpp:55 or various crashes with use-after-free involving evalInCooperativeThread

Categories

(Core :: JavaScript Engine: JIT, defect, P1)

x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
Tracking Status
firefox-esr45 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- verified

People

(Reporter: decoder, Assigned: bhackett1024)

References

Details

(7 keywords, Whiteboard: [fuzzblocker] [jsbugmon:update,ignore])

Crash Data

Attachments

(4 files)

The following testcase crashes on mozilla-central revision 8dd496fd015a (build with --enable-posix-nspr-emulation --enable-valgrind --enable-gczeal --disable-tests --enable-stdcxx-compat --disable-profiling --enable-debug --enable-optimize, run with --fuzzing-safe --thread-count=2 --ion-eager --baseline-eager):

See attachment.


Backtrace:

 received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff68ff700 (LWP 31691)]
0x0000000000e01118 in js::ZoneGroup::enter (this=this@entry=0x7ffff6986180) at js/src/gc/ZoneGroup.cpp:55
#0  0x0000000000e01118 in js::ZoneGroup::enter (this=this@entry=0x7ffff6986180) at js/src/gc/ZoneGroup.cpp:55
#1  0x00000000006f53d6 in JSContext::enterZoneGroup (group=0x7ffff6986180, this=0x7ffff383b000) at js/src/jscntxtinlines.h:523
#2  JSContext::enterCompartment (maybeLock=0x0, c=0x7ffff6938800, this=0x7ffff383b000) at js/src/jscntxtinlines.h:452
#3  JSContext::enterCompartmentOf<JS::Rooted<JSScript*> > (target=..., this=0x7ffff383b000) at js/src/jscntxtinlines.h:463
#4  js::AutoCompartment::AutoCompartment<JS::Rooted<JSScript*> > (target=..., cx=0x7ffff383b000, this=<synthetic pointer>) at js/src/jscompartmentinlines.h:44
#5  js::jit::AttachFinishedCompilations (cx=cx@entry=0x7ffff383b000) at js/src/jit/Ion.cpp:2114
#6  0x0000000000b9097e in InvokeInterruptCallback (cx=0x7ffff383b000) at js/src/vm/Runtime.cpp:499
#7  0x00001bb75bc644de in ?? ()
[...]
#25 0x0000000000000000 in ?? ()
rax	0x0	0
rbx	0x7ffff6986180	140737330569600
rcx	0x7ffff6c28a2d	140737333332525
rdx	0x0	0
rsi	0x7ffff6ef7770	140737336276848
rdi	0x7ffff6ef6540	140737336272192
rbp	0x7ffff68fde90	140737330011792
rsp	0x7ffff68fde80	140737330011776
r8	0x7ffff6ef7770	140737336276848
r9	0x7ffff68ff700	140737330018048
r10	0x0	0
r11	0x0	0
r12	0x7ffff383b000	140737278881792
r13	0x7ffff6938800	140737330251776
r14	0x7ffff383b000	140737278881792
r15	0x7ffff68fdf10	140737330011920
rip	0xe01118 <js::ZoneGroup::enter()+312>
=> 0xe01118 <js::ZoneGroup::enter()+312>:	movl   $0x0,0x0
   0xe01123 <js::ZoneGroup::enter()+323>:	ud2    

The attached testcase is entirely unreduced because it stops reproducing when attempting to reduce it.

The issue is very likely shell-only because of evalInCooperativeThread. Marking this s-s and fuzzblocker until triaged though to be sure. Even if this is shell-only, it lowers our ability to see other use-after-free and GC issues.
Attached file Testcase
Whiteboard: [jsbugmon:update,bisect][fuzzblocker] → [fuzzblocker] [jsbugmon:update]
JSBugMon: Bisection requested, result:
autoBisect shows this is probably related to the following changeset:

The first bad revision is:
changeset:   https://hg.mozilla.org/mozilla-central/rev/fad2e60d7843
user:        Brian Hackett
date:        Fri Feb 17 05:13:11 2017 -0700
summary:     Bug 1337968 - Add API and shell harness for cooperative multithreading, r=jandem.

This iteration took 1.624 seconds to run.
Assignee: nobody → jcoppeard
Priority: -- → P1
Whiteboard: [fuzzblocker] [jsbugmon:update] → [fuzzblocker] [jsbugmon:update,ignore]
JSBugMon: The testcase found in this bug no longer reproduces (tried revision ff04d410e74b).
I'm going to mark this sec-other because the browser does not take advantage of this yet.
Keywords: sec-other
Is this fallout from the cooperative multithreading patches?
Flags: needinfo?(bhackett1024)
Attached file another testcase
Here's another testcase, that is fairly reproducible but stops reproducing as reliably when reduced.

Configuration command:
CXX="g++ -m32 -msse2 -mfpmath=sse" CC="gcc -m32 -msse2 -mfpmath=sse" AR=ar PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig sh ./configure --target=i686-pc-linux --with-ccache --enable-gczeal --enable-debug-symbols --disable-tests

$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.4) 5.4.0 20160609
Copyright (C) 2015 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.2 LTS"

Check out the line near count=1174, it seems to involve evalInCooperativeThread as well.

Reproduces on m-c rev 0e0eb96528a1 and --fuzzing-safe --thread-count=2 --ion-eager.

Hannes, would you like to double check if this is the case, or whether this is reproducible for you?
Flags: needinfo?(hv1989)
(gdb) x/i $pc
=> 0x872e49e <js::TenuringTracer::traverse<JSObject>(JSObject**)+30>:	cmpl   $0x1,(%edx)
(gdb) x/b $edx
0xfffffff0:	Cannot access memory at address 0xfffffff0
(gdb)

I'll leave this to the devs to rate.
$ ./js-32-linux-0e0eb96528a1 
js> getBuildConfiguration()
({'rooting-analysis':false, 'exact-rooting':true, 'trace-jscalls-api':false, 'incremental-gc':true, 'generational-gc':true, debug:false, release_or_beta:false, 'has-ctypes':false, x86:true, x64:false, 'arm-simulator':false, 'arm64-simulator':false, asan:false, tsan:false, 'has-gczeal':true, 'more-deterministic':false, profiling:true, dtrace:false, valgrind:false, 'oom-backtraces':false, 'binary-data':true, 'intl-api':true, 'mapped-array-buffer':true, 'moz-memory':true, 'pointer-byte-size':4})
js>
Crash Signature: [@ js::TenuringTracer::traverse]
Summary: Assertion failure: ownerContext().context() == nullptr, at js/src/gc/ZoneGroup.cpp:55 or various crashes with use-after-free involving evalInCooperativeThread → Crash [@ js::TenuringTracer::traverse] or Assertion failure: ownerContext().context() == nullptr, at js/src/gc/ZoneGroup.cpp:55 or various crashes with use-after-free involving evalInCooperativeThread
(In reply to Gary Kwong [:gkw] [:nth10sd] from comment #6)
> Created attachment 8852277 [details]
> another testcase
>
> Hannes, would you like to double check if this is the case, or whether this
> is reproducible for you?

I can reproduce this one and managed to pintpoint the issue.

The issue is in "AttachFinishedCompilations":
http://searchfox.org/mozilla-central/source/js/src/jit/Ion.cpp#2118

When we have too many compilation pending to get linked, we just link them instead of doing lazy linking.
This wasn't a problem because we had full access to the runtime and could link for any JSContext.

Now with the multiple JSContexts/threads in a runtime we don't have this unique access anymore. As a result AutoCompartment doesn't work correctly anymore. This was non-obvious and we should (next to fixing this bug) also make sure AutoCompartment cannot be used incorrectly. I.e. make sure we don't enter a compartment in another zonegroup ...

Brian is already needinfo. He is the best person to move this forward.
Component: JavaScript Engine → JavaScript Engine: JIT
Flags: needinfo?(hv1989)
I think for this one firefox54 is not affected, since this isn't enabled by default and we are still in this one thread land currently.
Blocks: 1335095
Assignee: jcoppeard → bhackett1024
Attached patch patchSplinter Review
This patch moves the lazy link list from the runtime to the zone group so that any eager linking will be done in the same zone group that the context is currently in.

There is an assertion to check that AutoCompartments are not entering zone groups in use by other threads, but it is DEBUG-only.  This patch changes that assertion to be release mode (it is only checked when changing the owner context for a zone group, so shouldn't have an effect on perf).
Flags: needinfo?(bhackett1024)
Attachment #8856603 - Flags: review?(hv1989)
Duplicate of this bug: 1345458
Duplicate of this bug: 1345423
Comment on attachment 8856603 [details] [diff] [review]
patch

Review of attachment 8856603 [details] [diff] [review]:
-----------------------------------------------------------------

Lgtm. Thanks for taking this first upon being back!

wrt to "This patch changes that assertion to be release mode". I didn't see this change in this patch? Or did I overlook?
Attachment #8856603 - Flags: review?(hv1989) → review+
(In reply to Hannes Verschore [:h4writer] from comment #14)
> Comment on attachment 8856603 [details] [diff] [review]
> patch
> 
> Review of attachment 8856603 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Lgtm. Thanks for taking this first upon being back!
> 
> wrt to "This patch changes that assertion to be release mode". I didn't see
> this change in this patch? Or did I overlook?

This is the MOZ_RELEASE_ASSERT(ownerContext().context() == nullptr); change in ZoneGroup::enter.
Please look at the crash-stats link; this signature (js::TenuringTracer::traverse<T>) has been getting ~500 crashes per day for a Long Time (I see crashes in crashstats at least back to FF40).

Perhaps the older crashes (pre-55) were safe.  If so, please state so explicitly (and why) and mark.  Overall, the crash addresses in pre-55 don't seem much different at a glance.

If there are multiple bugs, please split off other bugs via clone.
Flags: needinfo?(bhackett1024)
(In reply to Randell Jesup [:jesup] from comment #18)
> Please look at the crash-stats link; this signature
> (js::TenuringTracer::traverse<T>) has been getting ~500 crashes per day for
> a Long Time (I see crashes in crashstats at least back to FF40).
> 
> Perhaps the older crashes (pre-55) were safe.  If so, please state so
> explicitly (and why) and mark.  Overall, the crash addresses in pre-55 don't
> seem much different at a glance.
> 
> If there are multiple bugs, please split off other bugs via clone.

I'm not sure what you're asking for.  This fixes a bug in functionality that is currently shell-only and does not affect any version of the browser.
Flags: needinfo?(bhackett1024)
Brian, can you help me understand why we're getting crashes with this stack (going back to 45?) if the browser isnt using this functionality? Thanks!
Flags: needinfo?(bhackett1024)
There may be two separate bugs here. The fuzzbug was generated with evalInCooperativeThread so is restricted to bhackett's new code that is currently shell only. The older EXCEPTION_ACCESS_VIOLATION_READ in  js::TenuringTracer::traverse<JSObject>(JSObject**) may be a more general catch-all point for other bad ptr issues. 

Jon any thoughts on this?
Flags: needinfo?(jcoppeard)
Naveed answered the question for Brian. Traverse<> is a very generic GC signature that can be caused by many underlying issues.
Flags: needinfo?(bhackett1024)
> Naveed answered the question for Brian. Traverse<> is a very generic GC
> signature that can be caused by many underlying issues.

Andrew: who should look at the crashes tied to this signature and file clones for the different underlying issues? O(500 sec crashes per day) seems worth further investigation(In reply to Andrew McCreight [:mccr8] from comment #24)
Flags: needinfo?(continuation)
(In reply to Randell Jesup [:jesup] from comment #25)
I've filed a separate bug (bug 1358073) for those crashes since it's not the same issue as this bug.
Flags: needinfo?(jcoppeard)
Flags: needinfo?(continuation)
Marking fixed per comment 19.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Group: javascript-core-security → core-security-release
Status: RESOLVED → VERIFIED
JSBugMon: This bug has been automatically verified fixed.
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.