1194430 - Intermittent browser_storage_dynamic_windows.js | application crashed [@ js::jit::JitcodeGlobalEntry::IonCacheEntry::sweep(JSRuntime*)]

So I have confirmed that the segfault at nullptr is due to not being able to find the rejoin entry of an IonCacheEntry.

My first hunch was that due to OSI, we could somehow end up with a not-marked mainline IonScript but a marked IC. I pushed to try with an assertion that at the time of adding an IonCacheEntry to the global table, those IC entries' rejoin entries exist. That assertion never tripped, and some of the tests still crashed during sweep.

That is to say, I haven't gotten very far. I still don't understand how an IC JitCode could be marked but its rejoin entry wasn't.

I also pushed to try with a patch that removes IonCacheEntries from the table if their rejoin entries can't be found: https://treeherder.mozilla.org/#/jobs?repo=try&revision=de57859e095c

Guess I'll wait to see if anything crashes above.
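For readers unfamiliar with the table, the two diagnostics described in this comment can be sketched roughly as follows. This is a minimal toy model, not the SpiderMonkey code: `ToyEntry`, `ToyJitcodeTable`, and all of their members are hypothetical names, and the real table is not a `std::map`. It only illustrates the shape of "assert at add time" and "drop the cache entry at sweep time instead of dereferencing a missing rejoin entry".

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <map>

// Hypothetical, simplified stand-ins; not the real SpiderMonkey types.
struct ToyEntry {
    enum Kind { Ion, IonCache } kind;
    uintptr_t rejoinAddr;  // only meaningful for IonCache entries
    bool marked;           // whether the entry's code survived marking
};

class ToyJitcodeTable {
    std::map<uintptr_t, ToyEntry> entries_;  // keyed by native code address

  public:
    // Returns nullptr when no entry covers addr (the condition behind the crash).
    const ToyEntry* lookup(uintptr_t addr) const {
        auto it = entries_.find(addr);
        return it == entries_.end() ? nullptr : &it->second;
    }

    void addIon(uintptr_t addr) {
        entries_[addr] = ToyEntry{ToyEntry::Ion, 0, false};
    }

    // Diagnostic 1: the rejoin entry must already exist when the IC is added.
    void addIonCache(uintptr_t addr, uintptr_t rejoinAddr) {
        assert(lookup(rejoinAddr) && "rejoin entry must exist at add time");
        entries_[addr] = ToyEntry{ToyEntry::IonCache, rejoinAddr, false};
    }

    void mark(uintptr_t addr) { entries_.at(addr).marked = true; }

    std::size_t size() const { return entries_.size(); }

    // Diagnostic 2: drop unmarked entries, then drop cache entries whose
    // rejoin entry is gone. Returns how many orphaned cache entries were removed.
    std::size_t sweep() {
        for (auto it = entries_.begin(); it != entries_.end();) {
            if (it->second.marked) {
                ++it;
            } else {
                it = entries_.erase(it);
            }
        }
        std::size_t orphans = 0;
        for (auto it = entries_.begin(); it != entries_.end();) {
            ToyEntry& e = it->second;
            if (e.kind == ToyEntry::IonCache && !lookup(e.rejoinAddr)) {
                it = entries_.erase(it);
                ++orphans;
            } else {
                e.marked = false;  // reset for the next collection
                ++it;
            }
        }
        return orphans;
    }
};
```

In this model, marking only the IC entry and then sweeping reproduces the "marked IC, missing rejoin entry" state, and `sweep()` reports it instead of crashing.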

Comment hidden (Legacy TBPL/Treeherder Robot)

log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=12872109
repository: mozilla-inbound
start_time: 2015-08-14T00:19:08
who: rvandermeulen[at]mozilla[dot]com
machine: tst-linux64-spot-1779
buildname: Ubuntu VM 12.04 x64 mozilla-inbound pgo test mochitest-e10s-devtools-chrome
revision: 7bd651ae54dd
4624 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf-recording-selected-03.js | Test timed out - expected PASS
4627 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf-recording-selected-03.js | Found a tab after previous test timed out: http://example.com/browser/browser/devtools/performance/test/doc_simple-test.html - expected PASS
4630 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf-recording-selected-04.js | A promise chain failed to handle a rejection: - at resource://gre/modules/commonjs/toolkit/loader.js -> resource://gre/modules/devtools/server/protocol.js:1125 - Error: Connection closed, pending request to server1.conn78.child1/performanceActor21, type startRecording failed
FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1934,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObse
[Parent 3632] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Parent 3632] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Child 3836] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-l64-pgo-00000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
[Child 3836] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-l64-pgo-00000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application terminated with exit code 11
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ mozalloc_abort(char const*)]
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ js::jit::JitcodeGlobalEntry::IonCacheEntry::sweep(JSRuntime*)]

Kannan Vijayan [:djvj]

Comment 18

•

10 years ago

(In reply to Shu-yu Guo [:shu] from comment #8)
> So I have confirmed that the segfault at nullptr is due to not being able to
> find the rejoin entry of an IonCacheEntry.
>
> My first hunch was that due to OSI, we could somehow end up with a
> not-marked mainline IonScript but a marked IC. I pushed to try with an
> assertion that at the time of adding an IonCacheEntry to the global
> table, those IC entries' rejoin entries exist. That assertion never tripped,
> and some of the tests still crashed during sweep.
>
> That is to say, I haven't gotten very far. I still don't understand how an
> IC JitCode could be marked but its rejoin entry wasn't.
>
> I also pushed to try with a patch that removes IonCacheEntries from the table
> if their rejoin entries can't be found:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=de57859e095c
>
> Guess I'll wait to see if anything crashes above.

I wonder if a blind fix for this might be to keep a refcount on IonEntries, counting the number of IonCacheEntries that refer to them. A non-zero refcount implicitly keeps the IonEntry marked and live.
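The refcount idea can be sketched in a few lines. This is only an illustration of the suggestion, under the assumption that each cache entry holds a pointer to exactly one Ion entry; `ToyIonEntry`, `ToyIonCacheEntry`, `cacheRefs`, and `isLive` are hypothetical names and do not exist in the tree.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for an IonEntry in the global jitcode table.
struct ToyIonEntry {
    bool marked = false;     // set by the GC when the entry's code is marked
    uint32_t cacheRefs = 0;  // number of cache entries that rejoin into this entry

    // A non-zero refcount keeps the entry alive even if it was not marked.
    bool isLive() const { return marked || cacheRefs > 0; }
};

// Hypothetical stand-in for an IonCacheEntry. It bumps the refcount of its
// rejoin entry for as long as it exists.
class ToyIonCacheEntry {
    ToyIonEntry* rejoin_;

  public:
    explicit ToyIonCacheEntry(ToyIonEntry& rejoin) : rejoin_(&rejoin) {
        ++rejoin_->cacheRefs;
    }

    ~ToyIonCacheEntry() {
        assert(rejoin_->cacheRefs > 0);
        --rejoin_->cacheRefs;
    }

    // Copying would double-count the reference, so it is disabled in this sketch.
    ToyIonCacheEntry(const ToyIonCacheEntry&) = delete;
    ToyIonCacheEntry& operator=(const ToyIonCacheEntry&) = delete;
};
```

The trade-off in this sketch is that liveness of the Ion entry no longer depends on the marking order at all, at the cost of keeping the mainline code around for as long as any stub refers to it.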

Flags: needinfo?(kvijayan)

Comment hidden (Legacy TBPL/Treeherder Robot)

log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=12895468
repository: mozilla-inbound
start_time: 2015-08-14T11:47:19
who: rvandermeulen[at]mozilla[dot]com
machine: tst-linux64-spot-1121
buildname: Ubuntu VM 12.04 x64 mozilla-inbound opt test mochitest-e10s-devtools-chrome
revision: b4a066523682
4677 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | Test timed out - expected PASS
4680 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | Found a tab after previous test timed out: http://example.com/browser/browser/devtools/performance/test/doc_simple-test.html - expected PASS
FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1934,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObse
[Parent 3600] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Parent 3600] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Child 3819] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-l64-000000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
[Child 3819] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-l64-000000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application terminated with exit code 11
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ js::jit::JitcodeGlobalEntry::IonCacheEntry::sweep(JSRuntime*)]
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ mozalloc_abort(char const*)]

Comment hidden (Legacy TBPL/Treeherder Robot)

log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=12895684
repository: mozilla-inbound
start_time: 2015-08-14T11:55:12
who: rvandermeulen[at]mozilla[dot]com
machine: tst-linux32-spot-1068
buildname: Ubuntu VM 12.04 mozilla-inbound opt test mochitest-e10s-devtools-chrome
revision: 5f2d07e6f367
4669 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf-recording-selected-03.js | This test exceeded the timeout threshold. It should be rewritten or split up. If that's not possible, use requestLongerTimeout(N), but only as a last resort. - expected PASS
4770 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf-theme-toggle-01.js | This test exceeded the timeout threshold. It should be rewritten or split up. If that's not possible, use requestLongerTimeout(N), but only as a last resort. - expected PASS
4782 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf-ui-recording.js | Test timed out - expected PASS
4785 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf-ui-recording.js | Found a tab after previous test timed out: http://example.com/browser/browser/devtools/performance/test/doc_simple-test.html - expected PASS
FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1934,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObse
[Parent 3556] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Parent 3556] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Child 3806] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-lx-0000000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
[Child 3806] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-in-lx-0000000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application terminated with exit code 11
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ js::jit::JitcodeGlobalEntry::IonCacheEntry::sweep(JSRuntime*)]
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ mozalloc_abort(char const*)]

Comment hidden (Legacy TBPL/Treeherder Robot)

log: https://treeherder.mozilla.org/logviewer.html#?repo=fx-team&job_id=4258628
repository: fx-team
start_time: 2015-08-14T12:50:05
who: rvandermeulen[at]mozilla[dot]com
machine: tst-linux64-spot-589
buildname: Ubuntu VM 12.04 x64 fx-team pgo test mochitest-e10s-devtools-chrome
revision: 5c906eb9b8d9
4641 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf_recordings-io-05.js | Test timed out - expected PASS
4644 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf_recordings-io-05.js | Found a tab after previous test timed out: http://example.com/browser/browser/devtools/performance/test/doc_simple-test.html - expected PASS
FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1934,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObse
[Parent 3587] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Parent 3587] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Child 3803] ###!!! ABORT: Aborting on channel error.: file /builds/slave/fx-team-l64-pgo-00000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
[Child 3803] ###!!! ABORT: Aborting on channel error.: file /builds/slave/fx-team-l64-pgo-00000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application terminated with exit code 11
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ js::jit::JitcodeGlobalEntry::IonCacheEntry::sweep(JSRuntime*)]
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ mozalloc_abort(char const*)]

Comment hidden (Legacy TBPL/Treeherder Robot)

log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=2018939
repository: mozilla-central
start_time: 2015-08-14T13:13:13
who: rvandermeulen[at]mozilla[dot]com
machine: tst-linux64-spot-2017
buildname: Ubuntu VM 12.04 x64 mozilla-central opt test mochitest-e10s-devtools-chrome
revision: 276ee420ed31
4631 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf_recordings-io-01.js | Test timed out - expected PASS
4634 INFO TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_perf_recordings-io-01.js | Found a tab after previous test timed out: http://example.com/browser/browser/devtools/performance/test/doc_simple-test.html - expected PASS
FATAL ERROR: AsyncShutdown timeout in ShutdownLeaks: Wait for cleanup to be finished before checking for leaks Conditions: [{"name":"DevTools: Wait until toolbox is destroyed","state":"(none)","filename":"resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js","lineNumber":1934,"stack":["resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js:Toolbox.prototype.destroy/leakCheckObse
[Parent 3598] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Parent 3598] ###!!! ABORT: file resource://gre/modules/commonjs/toolkit/loader.js -> resource:///modules/devtools/framework/toolbox.js, line 1934
[Child 3817] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-cen-l64-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
[Child 3817] ###!!! ABORT: Aborting on channel error.: file /builds/slave/m-cen-l64-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1762
TEST-UNEXPECTED-FAIL | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application terminated with exit code 11
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ mozalloc_abort(char const*)]
PROCESS-CRASH | browser/devtools/performance/test/browser_timeline-waterfall-generic.js | application crashed [@ js::jit::JitcodeGlobalEntry::IonCacheEntry::sweep(JSRuntime*)]

Shu-yu Guo [:shu]

Assignee

Comment 34

•

10 years ago

I'm making some progress, but still haven't found the root cause. I have confirmed that when sweeping the IonCacheEntry with no rejoin entry, SPS is *off*, and IonCacheEntry's JitCode was not marked by the table itself. This means that the IonCacheEntry's JitCode was marked by something else, while the rejoin entry's JitCode was never marked. With sfink's help I've pushed another diagnostic patch to try that should dump the entire heap when the JitcodeGlobalTable detects this situation. Hopefully that'll tell us what's marking the stub JitCode.

Shu-yu Guo [:shu]

Assignee

Comment 35

•

10 years ago

:( One failure's heapdump.txt didn't upload. The one that did upload doesn't even show the JitCode pointer in question in the dump. It doesn't seem like heap corruption, so I don't know why a supposedly marked TenuredCell isn't showing up in the heap dump.

Shu-yu Guo [:shu]

Assignee

Comment 37

•

10 years ago

I was able to get a failure where the IC stub JitCode in question appears in the heap dump [1]. It only appears a single time and isn't reachable from anything else. I suppose that makes sense, because the only time it's ever marked is when it's pushed on stack during GC. Still doesn't tell us where its mainline JitCode went though.

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=ae82ca8c287f

Jordan Santell [:jsantell] [@jsantell]

Comment 38

•

10 years ago

Some background: this didn't come up until bug 1172180 (the real performance actor) landed, and it looks like it only happens on e10s. Some of the tests that cause it seem to be currently disabled, but they can be re-enabled on try runs by reverting https://hg.mozilla.org/integration/mozilla-inbound/rev/9c35a8c75f64

Shu-yu Guo [:shu]

Assignee

Comment 42

•

10 years ago

Attached patch: Always mark the global jitcode table during major GCs.

I've debugged this for 2 days now, and the only sure conclusion I've been able to draw is that this assertion happens during major GCs where an IC stub jitcode is marked but its rejoin jitcode is not marked. Without being able to reproduce locally and use rr, I have no tractable way to find out why an IC stub's mainline jitcode isn't getting marked. Always marking the table during major GCs would keep the rejoin code alive, so I'm just going to do this. The try push looked pretty green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ee7210f97cb
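To illustrate why unconditional marking sidesteps the problem, here is a minimal sketch. Everything in it is hypothetical: `ToyCode`, `ToyTableEntry`, `markTable`, and the `sampledByProfiler` flag are stand-ins, and the assumption that the table would otherwise mark only entries seen in profiler samples is a simplification rather than a description of the real code.

```cpp
#include <cassert>
#include <vector>

// Hypothetical stand-in for a JitCode cell with a mark bit.
struct ToyCode {
    bool marked = false;
};

// Hypothetical table entry. sampledByProfiler models "the table has a reason
// of its own to keep this entry alive", which is an assumption of this sketch.
struct ToyTableEntry {
    ToyCode* code;
    bool sampledByProfiler;
};

// Marks code referenced by the table during a (toy) major GC.
// With alwaysMark == false, only sampled entries are marked by the table.
// With alwaysMark == true, every entry is marked, so a rejoin entry can never
// be swept while an IC entry that points at it survives.
inline void markTable(std::vector<ToyTableEntry>& table, bool alwaysMark) {
    for (ToyTableEntry& e : table) {
        assert(e.code != nullptr);
        if (alwaysMark || e.sampledByProfiler) {
            e.code->marked = true;
        }
    }
}
```

The cost of this choice in the sketch is that nothing in the table is collected by the pass that marks it, which is the conservative behavior the patch title describes.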

Jordan Santell [:jsantell] [@jsantell]

Comment 43

•

10 years ago

I think the test causing this is still disabled in your push

Shu-yu Guo [:shu]

Assignee

Comment 48

•

10 years ago

Comment on attachment 8648871: Always mark the global jitcode table during major GCs.

Review of attachment 8648871:
-----------------------------------------------------------------

sfink pointed out there's such a thing as "delayed marking", which could result in a situation where an IC stub JitCode is marked but its mainline JitCode isn't. I tried to confirm but couldn't get the crash again across ~100 retriggers. :/ I'm just gonna go with this patch.
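As a rough picture of what "delayed marking" means in general, here is a toy marker with a bounded mark stack. This is a generic sketch of the technique, not SpiderMonkey's marker: `ToyCell`, `ToyMarker`, and the capacity parameter are hypothetical, and whether this is what actually happened in the crash was never confirmed in this bug. The point is only that a cell can have its mark bit set while the things it references have not been traced yet.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical GC cell with a mark bit and outgoing edges.
struct ToyCell {
    bool marked = false;
    std::vector<ToyCell*> children;
};

class ToyMarker {
    std::vector<ToyCell*> stack_;    // cells whose children are traced soon
    std::vector<ToyCell*> delayed_;  // cells whose children are traced later
    std::size_t capacity_;           // maximum size of the mark stack

  public:
    explicit ToyMarker(std::size_t capacity) : capacity_(capacity) {}

    // Sets the mark bit immediately. Tracing of the cell's children is queued
    // on the stack, or deferred to the delayed list if the stack is full.
    void mark(ToyCell* cell) {
        assert(cell != nullptr);
        if (cell->marked) {
            return;
        }
        cell->marked = true;
        if (stack_.size() < capacity_) {
            stack_.push_back(cell);
        } else {
            delayed_.push_back(cell);
        }
    }

    // Traces children of everything on the stack. Delayed cells are untouched.
    void drainStack() {
        while (!stack_.empty()) {
            ToyCell* cell = stack_.back();
            stack_.pop_back();
            for (ToyCell* child : cell->children) {
                mark(child);
            }
        }
    }

    // Finishes marking by tracing the children of delayed cells.
    void drainDelayed() {
        while (!delayed_.empty()) {
            ToyCell* cell = delayed_.back();
            delayed_.pop_back();
            for (ToyCell* child : cell->children) {
                mark(child);
            }
            drainStack();
        }
    }
};
```

Between `drainStack()` and `drainDelayed()`, a delayed parent is marked while its child is not, which is the kind of intermediate state a sweep or lookup must not observe.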

Attachment #8648871 - Flags: review?(kvijayan)

Kannan Vijayan [:djvj]

Comment 50

•

10 years ago

Comment on attachment 8648871: Always mark the global jitcode table during major GCs.

Review of attachment 8648871:
-----------------------------------------------------------------

::: js/src/jit/JitcodeMap.cpp
@@ +812,5 @@
> AutoSuppressProfilerSampling suppressSampling(trc->runtime());
> uint32_t gen = trc->runtime()->profilerSampleBufferGen();
> uint32_t lapCount = trc->runtime()->profilerSampleBufferLapCount();
>
> + if (!trc->runtime()->spsProfiler.enabled())

A small comment for why this is done would be good.

Attachment #8648871 - Flags: review?(kvijayan) → review+

Kannan Vijayan [:djvj]

Comment 51

•

10 years ago

Nice job tracking this down btw. Crazy finicky crash.

Pulsebot

Comment 56

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/938701d3cf0c

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 58

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/938701d3cf0c

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox43: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 43

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 60

•

10 years ago

Shu, does this issue affect older branches as well? If so, please nominate for approval as necessary :)

Assignee: nobody → shu

status-firefox41: --- → ?

status-firefox42: --- → ?

Flags: needinfo?(shu)

Shu-yu Guo [:shu]

Assignee

Comment 61

•

10 years ago

This affects Aurora but not Beta.

Flags: needinfo?(shu)

Shu-yu Guo [:shu]

Assignee

Comment 62

•

10 years ago

Comment on attachment 8648871: Always mark the global jitcode table during major GCs.

Approval Request Comment
[Feature/regressing bug #]: 1182730
[User impact if declined]: crashes when profiling
[Describe test coverage new/current, TreeHerder]: on central on TH
[Risks and why]: low, bugfix only
[String/UUID change made/needed]: none

Attachment #8648871 - Flags: approval-mozilla-aurora?

Ryan VanderMeulen [:RyanVM]

Reporter

Updated

•

10 years ago

status-firefox41: ? → unaffected

status-firefox42: ? → affected

status-firefox-esr38: --- → unaffected

Sylvestre Ledru [:Sylvestre]

Comment 63

•

10 years ago

Comment on attachment 8648871: Always mark the global jitcode table during major GCs.

Fix an intermittent, taking it.

Attachment #8648871 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Reporter

Comment 64

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/545da8632d21

status-firefox42: affected → fixed

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Updated

•

10 years ago

Depends on: 1203791

BMO Automation

Updated

•

7 years ago

Product: Firefox → DevTools