Closed Bug 1103957 Opened 5 years ago Closed 5 years ago

Intermittent test_imestate.html | application crashed [@ js::gcstats::Statistics::beginPhase(js::gcstats::Phase)]

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla37
Tracking Status
firefox35 --- unaffected
firefox36 --- fixed
firefox37 --- fixed
firefox-esr31 --- unaffected

People

(Reporter: cbook, Assigned: sfink)

References

()

Details

(Keywords: crash, intermittent-failure)

Attachments

(4 files)

Ubuntu VM 12.04 mozilla-inbound debug test mochitest-other

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=4150071&repo=mozilla-inbound

03:52:28 WARNING - PROCESS-CRASH | chrome://mochitests/content/chrome/widget/tests/test_imestate.html | application crashed [@ js::gcstats::Statistics::beginPhase(js::gcstats::Phase)]
03:52:28 INFO - Crash dump filename: /tmp/tmpIG3RnZ.mozrunner/minidumps/6c075e0e-0791-205e-05ecb027-1329006b.dmp
03:52:28 INFO - Operating system: Linux
03:52:28 INFO - 0.0.0 Linux 3.2.0-23-generic-pae #36-Ubuntu SMP Tue Apr 10 22:19:09 UTC 2012 i686
03:52:28 INFO - CPU: x86
03:52:28 INFO - GenuineIntel family 6 model 44 stepping 2
03:52:28 INFO - 1 CPU
03:52:28 INFO - Crash reason: SIGSEGV
03:52:28 INFO - Crash address: 0x0
03:52:28 INFO - Thread 0 (crashed)
03:52:28 INFO - 0 libxul.so!js::gcstats::Statistics::beginPhase(js::gcstats::Phase) [Statistics.cpp:ced4220ad4df : 873 + 0x18]
03:52:28 INFO - eip = 0xb4df0a49 esp = 0xbfa9ccb0 ebp = 0xbfa9ccf8 ebx = 0xb6e30270
03:52:28 INFO - esi = 0xad8e0468 edi = 0x00000001 eax = 0x00000000 ecx = 0xb76578ac
03:52:28 INFO - edx = 0x00000000 efl = 0x00010286
03:52:28 INFO - Found by: given as instruction pointer in context
03:52:28 INFO - 1 libxul.so!js::gc::GCRuntime::collect(bool, js::SliceBudget&, js::JSGCInvocationKind, JS::gcreason::Reason) [Statistics.h:ced4220ad4df : 262 + 0x4]
03:52:28 INFO - eip = 0xb4aee0d5 esp = 0xbfa9cd00 ebp = 0xbfa9cd98 ebx = 0xb6e30270
03:52:28 INFO - esi = 0xbfa9cd4c edi = 0xad8da1b4
03:52:28 INFO - Found by: call frame info
03:52:28 INFO - 2 libxul.so!js::gc::GCRuntime::gcSlice(js::JSGCInvocationKind, JS::gcreason::Reason, long long) [jsgc.cpp:ced4220ad4df : 6305 + 0x12]
03:52:28 INFO - eip = 0xb4aee5a5 esp = 0xbfa9cda0 ebp = 0xbfa9cde8 ebx = 0xb6e30270
03:52:28 INFO - esi = 0x00000001 edi = 0xad8da1b4
03:52:28 INFO - Found by: call frame info
03:52:28 INFO - 3 libxul.so!js::gc::GCRuntime::maybePeriodicFullGC() [jsgc.cpp:ced4220ad4df : 3351 + 0x25]
03:52:28 INFO - eip = 0xb4aee7fb esp = 0xbfa9cdf0 ebp = 0xbfa9ce28 ebx = 0xb6e30270
03:52:28 INFO - esi = 0xad8da1b4 edi = 0x9f59eac0
03:52:28 INFO - Found by: call frame info
03:52:28 INFO - 4 libxul.so!mozilla::dom::AutoEntryScript::~AutoEntryScript() [ScriptSettings.cpp:ced4220ad4df : 554 + 0x10]
03:52:28 INFO - eip = 0xb2ffa425 esp = 0xbfa9ce30 ebp = 0xbfa9ce48 ebx = 0xb6e30270
03:52:28 INFO - esi = 0xbfa9ce94 edi = 0x9f59eac0
03:52:28 INFO - Found by: call frame info
03:52:28 INFO - 5 libxul.so!nsXPCWrappedJSClass::DelegatedQueryInterface(nsXPCWrappedJS*, nsID const&, void**) [XPCWrappedJSClass.cpp:ced4220ad4df : 560 + 0x30]
03:52:28 INFO - eip = 0xb2bfbe34 esp = 0xbfa9ce50 ebp = 0xbfa9cf68 ebx = 0xb6e30270
03:52:28 INFO - esi = 0x00000000 edi = 0x9f59eac0
So this is crashing because we're running a GC from the CC while it's running in the GC callback.
Assignee: nobody → sphink
Comment on attachment 8537242 [details] [diff] [review]
prevent phase nesting

Review of attachment 8537242 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm
Attachment #8537242 - Flags: review?(terrence) → review+
https://hg.mozilla.org/mozilla-central/rev/df0850b9023b
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Please request Aurora approval on this when you get a chance.
Flags: needinfo?(sphink)
Comment on attachment 8537242 [details] [diff] [review]
prevent phase nesting

Approval Request Comment
[Feature/regressing bug #]: bug 1088831
[User impact if declined]: intermittent oranges in debug builds
[Describe test coverage new/current, TBPL]: seems to be doing ok on inbound and central for the last day
[Risks and why]: possibility of new intermittent oranges in debug builds (but like I said, so far it looks good)
[String/UUID change made/needed]: none
Flags: needinfo?(sphink)
Attachment #8537242 - Flags: approval-mozilla-aurora?
(In reply to TBPL Robot from comment #37)

BOOOO!!! :(
Status: RESOLVED → REOPENED
Flags: needinfo?(sphink)
Resolution: FIXED → ---
Target Milestone: mozilla37 → ---
Comment on attachment 8537242 [details] [diff] [review]
prevent phase nesting

not really fixed, so, no uplift :/
Attachment #8537242 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
Ok, I can't get this to trigger on try: https://treeherder.mozilla.org/ui/#/jobs?repo=try&revision=ab35d4c9b11b

But the links in this bug do have a good stack. The problem is being triggered by a GC_BEGIN calling back to the cycle collector calling Unlink calling the GC. Only that should be fine now, as it's exactly what the patch in this bug fixed.

So here's a diagnostics patch to dump out some more information.
Attachment #8539032 - Flags: review?(terrence)
Comment on attachment 8539032 [details] [diff] [review]
Temporary diagnostics

Review of attachment 8539032 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/gc/Statistics.cpp
@@ +888,5 @@
> +        fprintf(stderr, "phase %d already has start time of %ld; recursive entry detected!\n",
> +                phase, phaseStartTimes[phase]);
> +        for (int i = 0; i < phaseNestingDepth; i++)
> +            fprintf(stderr, "  stack[%d]: %d\n", i, phaseNesting[i]);
> +        MOZ_CRASH("My horse for a stack trace!\n");

Have you been shouting at treeherder again?
Attachment #8539032 - Flags: review?(terrence) → review+
(In reply to Terrence Cole [:terrence] from comment #44)
> Have you been shouting at treeherder again?

Yes, but it turns out it was unjust. TH *did* have the correct stack trace too, just obscured with a bogus one.

Sadly, I still need this printf output to diagnose the problem.
Flags: needinfo?(sphink)
Depends on: 1114444
https://hg.mozilla.org/mozilla-central/rev/ac25357cd4c1
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla37 → ---
Aha! My fix to the nesting handled one level of nesting, but not two.

If you have a PHASE_GC_BEGIN1 that is interrupted by a PHASE_GC_BEGIN2, then PHASE_GC_BEGIN1 is properly suspended and resumed between the PHASE2 phases. But if PHASE_GC_BEGIN2 is interrupted by a PHASE_GC_BEGIN3, then during beginPhase(PHASE_GC_BEGIN3) it does endPhase(PHASE_GC_BEGIN2), which checks the suspend stack (?!) and sees PHASE_GC_BEGIN1 on it. That is not what it should be doing.

Anyway, it does a beginPhase(PHASE_GC_BEGIN1) with a now-empty suspend stack, then continues the body of the outer beginPhase(PHASE_GC_BEGIN3), which sees that we're already inside of a PHASE_GC_BEGIN. Kaboom.

The part about resuming PHASE_GC_BEGIN1 is wrong. The endPhase triggered by beginPhase should not be auto-resuming. It should be force-ending the phase. Or rather, the public-facing endPhase should not be used for internal phase management. This removes a PHASE_MUTATOR special case too, which is good. (I explicitly handled the exact same problem for PHASE_MUTATOR, but didn't realize it also applied to the other suspendable phases.)
Attachment #8540402 - Flags: review?(terrence)
Attachment #8540402 - Flags: review?(terrence) → review+
Depends on: 1114871
No longer depends on: 1114871
https://hg.mozilla.org/mozilla-central/rev/9280b99e3475
https://hg.mozilla.org/mozilla-central/rev/f22bb012bf58
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla37
Victory! I can haz approval request? :)
Combined patch for aurora uplift.
Comment on attachment 8542204 [details] [diff] [review]
For aurora uplift: prevent phase nesting

Combined patch, carrying over r=terrence.

Approval Request Comment
[Feature/regressing bug #]: bug 1088831
[User impact if declined]: intermittent oranges in debug builds
[Describe test coverage new/current, TBPL]: RyanVM says it's good :-)
[Risks and why]: possibility of new intermittent oranges in debug builds (but like I said, so far it looks good)
[String/UUID change made/needed]: none
Attachment #8542204 - Attachment description: prevent phase nesting, → For aurora uplift: prevent phase nesting
Attachment #8542204 - Flags: review+
Attachment #8542204 - Flags: approval-mozilla-aurora?
Attachment #8542204 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.