Closed Bug 1171430 Opened 5 years ago Closed 5 years ago

Intermittent e10s test_jQuery.html | application timed out after 330 seconds with no output | application crashed [@ js::NativeObject& JSObject::as<js::NativeObject>()]

Categories

(Core :: JavaScript: GC, defect)

x86_64
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
e10s - ---
firefox39 --- unaffected
firefox40 --- unaffected
firefox41 --- fixed
firefox-esr31 --- unaffected
firefox-esr38 --- unaffected

People

(Reporter: RyanVM, Assigned: terrence)

References

Details

(Keywords: crash, intermittent-failure)

Attachments

(2 files)

00:27:20 WARNING - PROCESS-CRASH | dom/tests/mochitest/ajax/jquery/test_jQuery.html | application crashed [@ js::NativeObject& JSObject::as<js::NativeObject>()]
00:27:20 INFO - Crash dump filename: /tmp/tmpFR4esL.mozrunner/minidumps/31f16a0f-3f60-48d9-341dff2c-1145724b.dmp
00:27:20 INFO - Operating system: Linux
00:27:20 INFO - 0.0.0 Linux 3.2.0-76-generic #111-Ubuntu SMP Tue Jan 13 22:16:09 UTC 2015 x86_64
00:27:20 INFO - CPU: amd64
00:27:20 INFO - family 6 model 62 stepping 4
00:27:20 INFO - 1 CPU
00:27:20 INFO - Crash reason: SIGSEGV
00:27:20 INFO - Crash address: 0x0
00:27:20 INFO - Thread 0 (crashed)
00:27:20 INFO - 0 libxul.so!js::NativeObject& JSObject::as<js::NativeObject>() [CTypes.cpp:2ec6ba38c4c6 : 8501 + 0x3]
00:27:20 INFO - rbx = 0x00007f7999d25b78 r12 = 0x00007f7999d25000
00:27:20 INFO - r13 = 0x00007f79c2696780 r14 = 0x00007fff2a3300f0
00:27:20 INFO - r15 = 0x00007f7999d25b78 rip = 0x00007f79c0d2aae3
00:27:20 INFO - rsp = 0x00007fff2a32fd78 rbp = 0x00007fff2a32fd90
00:27:20 INFO - Found by: given as instruction pointer in context
00:27:20 INFO - 1 libxul.so!js::Shape::fixupDictionaryShapeAfterMovingGC() [jspropertytree.cpp:2ec6ba38c4c6 : 255 + 0x4]
00:27:20 INFO - rbx = 0x00007f7999d25b78 r12 = 0x00007f7999d25000
00:27:20 INFO - r13 = 0x00007f79c2696780 r14 = 0x00007fff2a3300f0
00:27:20 INFO - r15 = 0x00007f7999d25b78 rip = 0x00007f79c1356480
00:27:20 INFO - rsp = 0x00007fff2a32fd80 rbp = 0x00007fff2a32fd90
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 2 libxul.so!js::gc::UpdateCellPointersTask::updateArenas() [jspropertytree.cpp:2ec6ba38c4c6 : 307 + 0x4]
00:27:20 INFO - rbx = 0x0000000000000004 r12 = 0x00007f7999d25000
00:27:20 INFO - r13 = 0x00007f79c2696780 r14 = 0x00007fff2a3300f0
00:27:20 INFO - r15 = 0x00007f7999d25b78 rip = 0x00007f79c1387eed
00:27:20 INFO - rsp = 0x00007fff2a32fda0 rbp = 0x00007fff2a330020
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 3 libxul.so!js::gc::UpdateCellPointersTask::run() [jsgc.cpp:2ec6ba38c4c6 : 2512 + 0x7]
00:27:20 INFO - rbx = 0x00007fff2a3300f0 r12 = 0x000517abf1f84cf2
00:27:20 INFO - r13 = 0x00007fff2a3300d0 r14 = 0x00007f79b0008350
00:27:20 INFO - r15 = 0x0000000000000002 rip = 0x00007f79c1387f98
00:27:20 INFO - rsp = 0x00007fff2a330030 rbp = 0x00007fff2a330050
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 4 libxul.so!js::GCParallelTask::runFromMainThread(JSRuntime*) [HelperThreads.cpp:2ec6ba38c4c6 : 830 + 0x7]
00:27:20 INFO - rbx = 0x00007fff2a3300f0 r12 = 0x000517abf1f84cf2
00:27:20 INFO - r13 = 0x00007fff2a3300d0 r14 = 0x00007f79b0008350
00:27:20 INFO - r15 = 0x0000000000000002 rip = 0x00007f79c0e9dac6
00:27:20 INFO - rsp = 0x00007fff2a330060 rbp = 0x00007fff2a330070
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 5 libxul.so!js::gc::GCRuntime::updateAllCellPointersParallel(js::gc::MovingTracer*, JS::Zone*) [jsgc.cpp:2ec6ba38c4c6 : 2551 + 0xe]
00:27:20 INFO - rbx = 0x0000000000000002 r12 = 0x00007fff2a330168
00:27:20 INFO - r13 = 0x00007fff2a3300d0 r14 = 0x00007f79b0008350
00:27:20 INFO - r15 = 0x0000000000000002 rip = 0x00007f79c133e72b
00:27:20 INFO - rsp = 0x00007fff2a330080 rbp = 0x00007fff2a330320
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 6 libxul.so!js::gc::GCRuntime::updatePointersToRelocatedCells(JS::Zone*) [jsgc.cpp:2ec6ba38c4c6 : 2604 + 0x4]
00:27:20 INFO - rbx = 0x00007f79a9e02800 r12 = 0x00007f798ea18868
00:27:20 INFO - r13 = 0x00007f79b0008350 r14 = 0x00007fff2a330370
00:27:20 INFO - r15 = 0x00007f79b000e6d0 rip = 0x00007f79c136838c
00:27:20 INFO - rsp = 0x00007fff2a330330 rbp = 0x00007fff2a330410
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 7 libxul.so!js::gc::GCRuntime::compactPhase(JS::gcreason::Reason, js::SliceBudget&) [jsgc.cpp:2ec6ba38c4c6 : 5430 + 0x4]
00:27:20 INFO - rbx = 0x00007f79a9e02800 r12 = 0x00007f79b00104b0
00:27:20 INFO - r13 = 0x00007fff2a330710 r14 = 0x00007f79b0008350
00:27:20 INFO - r15 = 0x0000000000000025 rip = 0x00007f79c13bf6a6
00:27:20 INFO - rsp = 0x00007fff2a330420 rbp = 0x00007fff2a330470
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 8 libxul.so!js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget&, JS::gcreason::Reason) [jsgc.cpp:2ec6ba38c4c6 : 5873 + 0xd]
00:27:20 INFO - rbx = 0x00007fff2a330710 r12 = 0x0000000000000002
00:27:20 INFO - r13 = 0x0000000000000025 r14 = 0x0000000000000000
00:27:20 INFO - r15 = 0x00007f79b0008350 rip = 0x00007f79c13bfb36
00:27:20 INFO - rsp = 0x00007fff2a330480 rbp = 0x00007fff2a3304e0
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 9 libxul.so!js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) [jsgc.cpp:2ec6ba38c4c6 : 6039 + 0xd]
00:27:20 INFO - rbx = 0x00007f79b0008350 r12 = 0x0000000000000025
00:27:20 INFO - r13 = 0x00007fff2a330710 r14 = 0x00007f79b00084a0
00:27:20 INFO - r15 = 0x0000000000000001 rip = 0x00007f79c13c0bee
00:27:20 INFO - rsp = 0x00007fff2a3304f0 rbp = 0x00007fff2a330590
00:27:20 INFO - Found by: call frame info
00:27:20 INFO - 10 libxul.so!js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) [jsgc.cpp:2ec6ba38c4c6 : 6151 + 0xe]
00:27:20 INFO - rbx = 0x00007f79b0008350 r12 = 0x0000000000000001
00:27:20 INFO - r13 = 0x0000000000000025 r14 = 0x0000000000000001
00:27:20 INFO - r15 = 0x00007f79b000e6d0 rip = 0x00007f79c13c0f42
00:27:20 INFO - rsp = 0x00007fff2a3305a0 rbp = 0x00007fff2a330700
00:27:20 INFO - Found by: call frame info
Nominating, this seems to be more just intermittent, at the moment.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #47)
> Retriggers are clearly pointing at bug 1170550 as the culprit.
> https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&filter-
> searchStr=Linux%20x64%20debug%20Mochitest%20e10s%20Mochitest%20e10s%20M-
> e10s%283%29&fromchange=780ec82ee92a&tochange=959c2bd7e15e

OK, but I don't know what to do about that.  Please get someone from the JS team to look at this.  I bet this is what's causing it: <http://mxr.mozilla.org/mozilla-central/source/dom/workers/test/serviceworkers/strict_mode_error.js>
Flags: needinfo?(ehsan)
I'm not sure why you're expecting me to do the legwork for finding a JS person to help with this. The regression was caused by your push and the onus is on your to help get it fixed or backed out.
Flags: needinfo?(ehsan)
I don't know anything about the JS engine, sorry.
Flags: needinfo?(ehsan) → needinfo?(nihsanullah)
k, I'll plan to backout tomorrow if we haven't heard from the JS team in the mean time. The frequency is too high to leave this sitting unowned.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #53)
> k, I'll plan to backout tomorrow if we haven't heard from the JS team in the
> mean time. The frequency is too high to leave this sitting unowned.

Please disable the test if you want, the patch should not be backed out.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #55)
> Try run of the test disabled:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b8580f653ef

This didn't work. Neither did a Try push of bug 1170550 backed out. Looks like you're off the hook and I need to curl up in the fetal position somewhere.
OK, I've got a saner-looking potential culprit now. Going retrigger-happy to hopefully avoid false positives this time.
Blocks: 1153382
No longer blocks: 1170550
100 green runs on the push prior and a green Try run of the backout confirms that this is from bug 1153382. Terrence, given the frequency, can you please fix or backout soon?
Flags: needinfo?(nihsanullah) → needinfo?(terrence)
Glad to see you narrowed it down to bug 1153382.  I was pretty worried about my patch having caused this...
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #60)
> 100 green runs on the push prior and a green Try run of the backout confirms
> that this is from bug 1153382. Terrence, given the frequency, can you please
> fix or backout soon?

The patch in question added extra poisoning to suss out existing invalid users. Seems we found one. It's impossible to tell where the culprit is from just the backtraces, unfortunately, but maybe I'll get lucky and be able to repro under rr.
Flags: needinfo?(terrence)
We determined yesterday that we're crashing because we access a poisoned nursery pointer.  It occurred to me afterwards that we shouldn't be accessing any nursery pointers after the IsInsideNursery() check earlier in the function.  This check works by checking a flag bit in the chunk trailer - something that is overwritten by the JS_CRASH_DIAGNOSTICS poisoning in Nursery::sweep().

The bit used is ChunkLocationBitNursery, which is the bottom bit.  The poisoning change in bug 1153382 changed the bit pattern used, and in fact made the low bit go from a one to a zero (0x...2b2b to 0x5656).  So this was already wrong, but our poisoning coincidentally made IsInsideNursery() return true for pointers into poisoned nursery chunks.

We need to explicitly set the chunk location flags to something for poisoned nursery chunks, and setting them to report as nursery chunks should fix this bug.  An alternative might be to mark them as poisoned chunks but then we have to find another way to check whether cell is a pointer into dead nursery space in Shape::fixupDictionaryShapeAfterMovingGC().
Attachment #8617872 - Flags: review?(terrence)
Comment on attachment 8617872 [details] [diff] [review]
bug1171430-poisoned-nursery

Review of attachment 8617872 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Nursery.cpp
@@ +655,2 @@
>              chunk(i).trailer.runtime = runtime();
> +        }

Wow, great find!
Attachment #8617872 - Flags: review?(terrence) → review+
And here is the fix for the poisoning itself.
Assignee: nobody → terrence
Status: NEW → ASSIGNED
Attachment #8620379 - Flags: review?(jcoppeard)
Attachment #8620379 - Flags: review?(jcoppeard) → review+
I love when a good plan comes together! Thanks for fixing this, Jon and Terrence :)
You need to log in before you can comment on or make changes to this bug.