Closed Bug 1175761 Opened 4 years ago Closed 4 years ago

crash in IsMarkedInternalCommon<T> when visiting ReadWriteWeb article, with Gecko Profiler Add-on installed & e10s disabled

Categories

(Core :: JavaScript Engine, defect, critical)

Version: Unspecified
Platform: Linux
Type: defect
Priority: Not set
Severity: critical

Tracking

()

VERIFIED FIXED
mozilla41
Tracking Status
firefox41 + verified
firefox47 --- affected
firefox48 --- fix-optional
firefox49 --- fix-optional

People

(Reporter: dholbert, Assigned: bhackett)

Details

(Keywords: crash, regression)

Attachments

(2 files)

This bug was filed from the Socorro interface and is report bp-38a51da1-74ca-4026-868e-119192150618.
=============================================================

STR for me with my normal browsing profile:
 1. Visit http://readwrite.com/2015/06/15/microsoft-skype-for-the-web
 2. Wait a few seconds.

ACTUAL RESULTS:
Crash.

I've hit this 3 out of 3 times in my normal browsing profile.

Can't reproduce in a fresh profile, or in a fresh profile with NoScript installed [which is what I initially suspected might be involved; I use that & it's occasionally involved with triggering strange JS-engine crashes].

I'll try to figure out what's special about my normal profile that's helping to trigger this crash.

Crash reports:
bp-af077b02-2ad6-4056-832d-0f71c2150618
bp-38a51da1-74ca-4026-868e-119192150618
bp-40bb5a12-b8d2-45c8-83f8-18f942150618
OK, I can reproduce this in a fresh profile if I just do two things:
 (1) Install Gecko Profiler Add-on -- linked here, at the end of "Getting the Profiler Add-on":
 https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Profiling_with_the_Built-in_Profiler
 (2) Disable e10s in Firefox Preferences.

Then (after restarting to complete those changes), I can reliably crash by visiting the readwrite article linked in comment 0.
Summary: crash in IsMarkedInternalCommon<T> when visiting ReadWriteWeb article → crash in IsMarkedInternalCommon<T> when visiting ReadWriteWeb article, with Gecko Profiler Add-on installed & e10s disabled
CC bhackett due to type inference being in backtrace.

Here are the top 14 frames (0 through 13) of the backtrace, from one of the crash reports:
{
0 	libxul.so 	IsMarkedInternalCommon<JSObject*> 	js/src/vm/Runtime.h
1 	libxul.so 	js::TypeSet::IsTypeMarked(js::TypeSet::Type*) 	js/src/vm/TypeInference.cpp
2 	libxul.so 	js::jit::JitcodeGlobalEntry::IonEntry::markIfUnmarked(JSTracer*) 	js/src/jit/JitcodeMap.cpp
3 	libxul.so 	js::jit::JitcodeGlobalTable::markIteratively(JSTracer*) 	js/src/jit/JitcodeMap.h
4 	libxul.so 	void js::gc::GCRuntime::markWeakReferences<js::CompartmentsIterT<js::gc::GCZoneGroupIter> >(js::gcstats::Phase) 	js/src/jsgc.cpp
5 	libxul.so 	js::gc::GCRuntime::endMarkingZoneGroup() 	js/src/jsgc.cpp
6 	libxul.so 	js::gc::GCRuntime::beginSweepPhase(bool) 	js/src/jsgc.cpp
7 	libxul.so 	js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget&, JS::gcreason::Reason) 	js/src/jsgc.cpp
8 	libxul.so 	js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) 	js/src/jsgc.cpp
9 	libxul.so 	js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) 	js/src/jsgc.cpp
10 	libxul.so 	js::gc::GCRuntime::gcSlice(JS::gcreason::Reason, long) 	js/src/jsgc.cpp
11 	libxul.so 	js::gc::GCRuntime::notifyDidPaint() 	js/src/jsgc.cpp
12 	libxul.so 	nsXPConnect::NotifyDidPaint() 	js/xpconnect/src/nsXPConnect.cpp
13 	libxul.so 	nsRefreshDriver::Tick(long, mozilla::TimeStamp) 	layout/base/nsRefreshDriver.cpp
}
mozregression gives me this regression range:
 https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=c223b8844264&tochange=3c26bef95d54

In that range, bug 1162986 looks like the most likely candidate.
I tried loading this in a debug build (with the same profile, with e10s disabled and the Gecko Profiler Add-on installed).

The first time I tried to load the article, it loaded fine. The second time, I hit this fatal assertion:
> Assertion failure: !IsInsideNursery(*thingp), at js/src/gc/Marking.cpp:2077

The backtrace of the assertion-failure looks similar to comment 2 (in that we're falling over in IsMarkedInternalCommon, and there's some typeinference code up the stack a bit).
When the assertion fails, at level 3 of my just-attached backtrace (in TypeSet::IsTypeMarked), on this line...
> TypeSet::IsTypeMarked(TypeSet::Type* v)
> {
>     bool rv;
>     if (v->isSingletonUnchecked()) {
>         JSObject* obj = v->singletonNoBarrier();
>>>>      rv = IsMarkedUnbarriered(&obj);

...the gdb command 'p obj' gives me the following:
> (JSObject *) 0x7fab08aec850 Cannot access memory at address 0xfffc2b2b2b2b2b2b
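The assertion and the unreadable object above are consistent with a stored pointer that was not updated when the nursery was collected. The following is a minimal toy model of that failure mode, not SpiderMonkey code: `ToyHeap`, `ToyRef`, `recordEdge`, and `minorCollect` are hypothetical names, and references are modeled as indices so the stale case can be checked safely.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Toy model only; these names are hypothetical and are not SpiderMonkey APIs.
struct ToyRef {
    bool inNursery;     // which space the reference points into
    std::size_t index;  // slot within that space
};

class ToyHeap {
public:
    // New objects start out in the nursery.
    ToyRef allocate(int payload) {
        nursery_.push_back(payload);
        return ToyRef{true, nursery_.size() - 1};
    }

    // Stand-in for a post barrier: remember a location that stores a
    // nursery reference so a minor collection can find and update it.
    void recordEdge(ToyRef* slot) { remembered_.push_back(slot); }

    // Minor collection: copy each object named by a remembered edge into
    // the tenured space, rewrite that edge, then discard the nursery.
    // (No forwarding pointers: each remembered edge gets its own copy.)
    void minorCollect() {
        for (ToyRef* slot : remembered_) {
            if (slot->inNursery) {
                tenured_.push_back(nursery_[slot->index]);
                *slot = ToyRef{false, tenured_.size() - 1};
            }
        }
        remembered_.clear();
        nursery_.clear();
    }

    // A reference is usable only if it still names a live slot.
    bool isValid(const ToyRef& ref) const {
        const std::vector<int>& space = ref.inNursery ? nursery_ : tenured_;
        return ref.index < space.size();
    }

    int read(const ToyRef& ref) const {
        assert(isValid(ref));
        return (ref.inNursery ? nursery_ : tenured_)[ref.index];
    }

private:
    std::vector<int> nursery_;
    std::vector<int> tenured_;
    std::vector<ToyRef*> remembered_;
};
```

In this model, a reference that was passed to `recordEdge` survives `minorCollect` and points into the tenured space afterwards, while a plain copy of the same reference that was never recorded still names a nursery slot that no longer exists.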
[Tracking Requested - why for this release]: Recent regression (in past few days), causing crash shortly after loading popular tech news site.
Attached patch: stopgap
The optimization tracking code keeps TypeSet::Type values on the heap with no associated post barriers, and as of bug 1162986 Types can now be in the nursery. I don't see an easy way to add a post barrier: it's not clear to me what the lifetime of an IonTrackedTypeVector is, and adding code for tracing strong references from all of JitcodeGlobalTable looks kind of complicated. So this patch just watches for nursery types we are trying to track and substitutes an unknown type instead.
Assignee: nobody → bhackett1024
Flags: needinfo?(bhackett1024)
Attachment #8624300 - Flags: review?(shu)
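The idea behind the stopgap can be sketched as follows. This is an illustrative model only, not the contents of the attached patch: `ToyObject`, `ToyType`, and `ToyTrackedTypes` are hypothetical stand-ins, and the real TypeSet::Type and IonTrackedTypeVector interfaces are not reproduced here.

```cpp
#include <cassert>
#include <vector>

// Toy model only; hypothetical names, not the code from the attached patch.
struct ToyObject {
    bool inNursery;  // true while the object has not been tenured
};

struct ToyType {
    enum class Kind { Unknown, Singleton };
    Kind kind;
    const ToyObject* object;  // null when kind is Unknown

    static ToyType unknown() { return ToyType{Kind::Unknown, nullptr}; }
    static ToyType singleton(const ToyObject* obj) {
        assert(obj != nullptr);
        return ToyType{Kind::Singleton, obj};
    }
    bool isUnknown() const { return kind == Kind::Unknown; }
};

// Long-lived storage that is neither traced nor post-barriered, standing
// in for the optimization tracking data.
class ToyTrackedTypes {
public:
    // Record a type, but never store a pointer into the nursery:
    // such types are replaced by the unknown type.
    void track(const ToyType& type) {
        if (!type.isUnknown() && type.object->inNursery) {
            types_.push_back(ToyType::unknown());
            return;
        }
        types_.push_back(type);
    }

    const std::vector<ToyType>& types() const { return types_; }

    // Invariant check: no stored type refers to a nursery object.
    bool holdsNurseryPointer() const {
        for (const ToyType& type : types_) {
            if (!type.isUnknown() && type.object->inNursery) {
                return true;
            }
        }
        return false;
    }

private:
    std::vector<ToyType> types_;
};
```

Under these assumptions, the tracked data loses some precision (a nursery-allocated singleton is recorded as unknown), but the storage never holds a pointer that a minor collection would need to update.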
Comment on attachment 8624300
stopgap

Review of attachment 8624300:
-----------------------------------------------------------------

:( Thanks for the bandaid.
Attachment #8624300 - Flags: review?(shu) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/a371be23c51f
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Adding a tracking flag for FF41 and a qe-verify flag so the QE team can verify that the fix works and test related scenarios to ensure there aren't any unexpected regressions.
Flags: qe-verify+
Reproduced the crash using an old Nightly build from 2015-06-17 [1], and verified that the crash no longer reproduces using the latest Nightly 43.0a1 and Firefox 41 beta 5 (though I could not follow the steps exactly, without e10s). I installed geckoprofiler, restarted the browser and visited heavily loaded websites. Marking as verified fixed based on my testing.

[1] bp-24ae7b81-245f-4007-9a6d-300082150828
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Crash volume for signature 'IsMarkedInternalCommon<T>':
 - nightly (version 50): 0 crash from 2016-06-06.
 - aurora  (version 49): 1 crash from 2016-06-07.
 - beta    (version 48): 25 crashes from 2016-06-06.
 - release (version 47): 152 crashes from 2016-05-31.
 - esr     (version 45): 0 crash from 2016-04-07.

Crash volume on the last weeks:
             Week N-1   Week N-2   Week N-3   Week N-4   Week N-5   Week N-6   Week N-7
 - nightly          0          0          0          0          0          0          0
 - aurora           0          0          1          0          0          0          0
 - beta             5          6          5          4          0          3          0
 - release         27         24         27         20         22         20          5
 - esr              0          0          0          0          0          0          0

Affected platforms: Windows, Linux
This is a very low volume crash. I don't think we need to pursue it or track it as a carryover regression. It's coming up for me in triage now because of the bot in comment 13 marking 49 beta as affected.