Closed
Bug 1194430
Opened 10 years ago
Closed 10 years ago
Intermittent browser_storage_dynamic_windows.js | application crashed [@ js::jit::JitcodeGlobalEntry::IonCacheEntry::sweep(JSRuntime*)]
Categories
(DevTools :: Performance Tools (Profiler/Timeline), defect)
DevTools
Performance Tools (Profiler/Timeline)
Tracking
(firefox41 unaffected, firefox42 fixed, firefox43 fixed, firefox-esr38 unaffected)
RESOLVED
FIXED
Firefox 43
Tracking | Status | |
---|---|---|
firefox41 | --- | unaffected |
firefox42 | --- | fixed |
firefox43 | --- | fixed |
firefox-esr38 | --- | unaffected |
People
(Reporter: RyanVM, Assigned: shu)
References
Details
(Keywords: crash, intermittent-failure)
Attachments
(1 file)
2.27 KB,
patch
|
djvj
:
review+
Sylvestre
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
Jordan, these are showing up in lots of places in the devtools tests lately. You said you were looking into them?
Flags: needinfo?(jsantell)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Updated•10 years ago
|
Component: Developer Tools → Developer Tools: Performance Tools (Profiler/Timeline)
Flags: needinfo?(jsantell) → needinfo?(shu)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 4•10 years ago
|
||
I pushed a try build with printfs, let's see if anything fails: https://treeherder.mozilla.org/#/jobs?repo=try&revision=86350c9d940e
Flags: needinfo?(shu)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 8•10 years ago
|
||
So I have confirmed that the segfault at nullptr is due to not being able to find the rejoin entry of an IonCacheEntry.
My first hunch was that due to OSI, we could somehow end up with a not-marked mainline IonScript but a marked IC. I pushed to try with an assertion that at the time of adding an IonCacheEntry to to the global table, those IC entries' rejoin entries exist. That assertion never tripped, and some of the tests still crashed during sweep.
That is to say, I haven't gotten very far. I still don't understand how an IC JitCode could be marked but its rejoin entry wasn't.
I also pushed to try with a patch that removes IonCachEntries from the table if their rejoin entries can't be found: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de57859e095c
Guess I'll wait to see if anything crashes above.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 18•10 years ago
|
||
(In reply to Shu-yu Guo [:shu] from comment #8)
> So I have confirmed that the segfault at nullptr is due to not being able to
> find the rejoin entry of an IonCacheEntry.
>
> My first hunch was that due to OSI, we could somehow end up with a
> not-marked mainline IonScript but a marked IC. I pushed to try with an
> assertion that at the time of adding an IonCacheEntry to to the global
> table, those IC entries' rejoin entries exist. That assertion never tripped,
> and some of the tests still crashed during sweep.
>
> That is to say, I haven't gotten very far. I still don't understand how an
> IC JitCode could be marked but its rejoin entry wasn't.
>
> I also pushed to try with a patch that removes IonCachEntries from the table
> if their rejoin entries can't be found:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=de57859e095c
>
> Guess I'll wait to see if anything crashes above.
I wonder if a blind fix for this might be to keep a refcount on IonEntries, counting the number of IonCacheEntries that refer to them. A non-zero refcount implicitly keeps the IonEntry marked and live.
Flags: needinfo?(kvijayan)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 34•10 years ago
|
||
I'm making some progress, but still haven't found the root cause.
I have confirmed that when sweeping the IonCacheEntry with no rejoin entry, SPS is *off*, and IonCacheEntry's JitCode was not marked by the table itself.
This means that the IonCacheEntry's JitCode was marked by something else, while the rejoin entry's JitCode was never marked. With sfink's help I've pushed another diagnostic patch to try that should dump the entire heap when the JitcodeGlobalTable detects this situation.
Hopefully that'll tell us what's marking the stub JitCode.
Assignee | ||
Comment 35•10 years ago
|
||
:( One failure's heapdump.txt didn't upload. The one that did upload doesn't even show the JitCode pointer in question in the dump. It doesn't seem like heap corruption, so I don't know why a supposedly marked TenuredCell isn't showing up in the heap dump.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 37•10 years ago
|
||
I was able to get a failure where the IC stub JitCode in question appears in the heap dump [1]. It only appears a single time and isn't reachable from anything else. I suppose that makes sense, because the only time it's ever marked is when it's pushed on stack during GC. Still doesn't tell us where its mainline JitCode went though.
[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=ae82ca8c287f
Comment 38•10 years ago
|
||
Some background, this didn't come up until bug 1172180 landed (the real performance actor) -- and looks like it's only on e10s. Some of these tests that cause it it seems are currently disabled, but can be reenabled on try runs via reverting https://hg.mozilla.org/integration/mozilla-inbound/rev/9c35a8c75f64
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 42•10 years ago
|
||
I've debugged this for 2 day now, and the only sure conclusion I've been able
to drawn is that this assertion happens during major GCs where an IC stub
jitcode is marked but its rejoin jitcode is not marked.
Without being able to reproduce locally and use rr, I have no tractable way to
find out why an IC stub's mainline jitcode isn't getting marked.
Always marking the table during major GCs would keep the rejoin code alive, so
I'm just going to do this.
The try push looked pretty green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ee7210f97cb
Comment 43•10 years ago
|
||
I think the test causing this is still disabled in your push
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Assignee | ||
Comment 48•10 years ago
|
||
Comment on attachment 8648871 [details] [diff] [review]
Always mark the global jitcode table during major GCs.
Review of attachment 8648871 [details] [diff] [review]:
-----------------------------------------------------------------
sfink pointed out there's such a thing as "delayed marking", which could result in a situation where an IC stub JitCode is marked but its mainline JitCode isn't.
I tried to confirm but couldn't get the crash again across ~100 retriggers. :/
I'm just gonna go with this patch.
Attachment #8648871 -
Flags: review?(kvijayan)
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 50•10 years ago
|
||
Comment on attachment 8648871 [details] [diff] [review]
Always mark the global jitcode table during major GCs.
Review of attachment 8648871 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit/JitcodeMap.cpp
@@ +812,5 @@
> AutoSuppressProfilerSampling suppressSampling(trc->runtime());
> uint32_t gen = trc->runtime()->profilerSampleBufferGen();
> uint32_t lapCount = trc->runtime()->profilerSampleBufferLapCount();
>
> + if (!trc->runtime()->spsProfiler.enabled())
A small comment for why this is done would be good.
Attachment #8648871 -
Flags: review?(kvijayan) → review+
Comment 51•10 years ago
|
||
Nice job tracking this down btw. Crazy finnicky crash.
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment hidden (Legacy TBPL/Treeherder Robot) |
Comment 56•10 years ago
|
||
Comment hidden (Legacy TBPL/Treeherder Robot) |
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox43:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 43
Comment hidden (Legacy TBPL/Treeherder Robot) |
Reporter | ||
Comment 60•10 years ago
|
||
Shu, does this issue affect older branches as well? If so, please nominate for approval as-necessary :)
Assignee | ||
Comment 62•10 years ago
|
||
Comment on attachment 8648871 [details] [diff] [review]
Always mark the global jitcode table during major GCs.
Approval Request Comment
[Feature/regressing bug #]: 1182730
[User impact if declined]: crashes when profiling
[Describe test coverage new/current, TreeHerder]: on central on TH
[Risks and why]: low, bugfix only
[String/UUID change made/needed]: none
Attachment #8648871 -
Flags: approval-mozilla-aurora?
Reporter | ||
Updated•10 years ago
|
Comment 63•10 years ago
|
||
Comment on attachment 8648871 [details] [diff] [review]
Always mark the global jitcode table during major GCs.
Fix an intermittent, taking it.
Attachment #8648871 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Reporter | ||
Comment 64•10 years ago
|
||
Updated•7 years ago
|
Product: Firefox → DevTools
You need to log in
before you can comment on or make changes to this bug.
Description
•