Closed Bug 822041 Opened 12 years ago Closed 11 years ago

Intermittent browser_profiler_run.js, browser_profiler_controller.js, browser_profiler_bug_830664_multiple_profiles.js | application crashed [@ js::SPSProfiler::exit] (Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130)

Categories

(DevTools :: Performance Tools (Profiler/Timeline), defect)

defect
Not set
normal

Tracking

(firefox20 wontfix, firefox21 affected, firefox22 fixed, firefox23 fixed)

RESOLVED FIXED
Tracking Status
firefox20 --- wontfix
firefox21 --- affected
firefox22 --- fixed
firefox23 --- fixed

People

(Reporter: RyanVM, Assigned: sfink)

References

Details

(Keywords: assertion, crash, intermittent-failure)

Crash Data

Attachments

(3 files, 3 obsolete files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=17977183&tree=Mozilla-Inbound

Rev3 Fedora 12x64 mozilla-inbound debug test mochitest-browser-chrome on 2012-12-15 09:23:33 PST for push 82077de3f9bc
slave: talos-r3-fed64-068

TEST-INFO | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | Console message: [JavaScript Warning: "ReferenceError: assignment to undeclared variable gJavaScriptOnly" {file: "chrome://browser/content/devtools/profiler/cleopatra/js/devtools.js" line: 69}]
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | Profiler panel exists
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | Active profile exists
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | Start button exists
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | Stop button exists
Profiler: Sampler Started
Profiler: Request signal
Profiler: Signal installed
Profiler: Profiler thread started
TEST-PASS | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | Profiler is running
MOZ_EVENT_TRACE sample 1355594587349 22
JavaScript strict warning: chrome://browser/content/devtools/profiler/cleopatra/js/ui.js, line 1291: assignment to undeclared variable gVideoCapture
Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130
WARNING: shutting down early because of crash!: file ../../../../dom/plugins/ipc/PluginModuleChild.cpp, line 704
WARNING: shutting down early because of crash!: file ../../../../dom/plugins/ipc/PluginModuleChild.cpp, line 704
WARNING: plugin process _exit()ing: file ../../../../dom/plugins/ipc/PluginModuleChild.cpp, line 669
WARNING: plugin process _exit()ing: file ../../../../dom/plugins/ipc/PluginModuleChild.cpp, line 669
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | Exited with code 11 during test run
INFO | automation.py | Application ran for: 0:38:37.496456
INFO | automation.py | Reading PID log: /tmp/tmpWIXyLipidlog
==> process 2259 launched child process 2322
==> process 2259 launched child process 2336
INFO | automation.py | Checking for orphan process with PID: 2322
INFO | automation.py | Checking for orphan process with PID: 2336
PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_run.js | application crashed [@ js::SPSProfiler::exit(JSContext*, JSScript*, JSFunction*)]
Crash dump filename: /tmp/tmpJgPKH5/minidumps/48c6402b-0146-d5c0-3b04128e-0e4ec12c.dmp
Operating system: Linux
                  0.0.0 Linux 2.6.31.5-127.fc12.x86_64 #1 SMP Sat Nov 7 21:11:14 EST 2009 x86_64
CPU: amd64
     family 6 model 23 stepping 10
     2 CPUs

Crash reason:  SIGSEGV
Crash address: 0x0

Thread 0 (crashed)
 0  libxul.so!js::SPSProfiler::exit(JSContext*, JSScript*, JSFunction*) [SPSProfiler.cpp : 130 + 0x18]
    rbx = 0x00000000020b7278   r12 = 0x00007fcec82f4c40
    r13 = 0x0000000001d16360   r14 = 0x00007fcec82f9200
    r15 = 0x00007fff7c334620   rip = 0x00007fcedbd37777
    rsp = 0x00007fff7c333e60   rbp = 0x00007fff7c333e90
    Found by: given as instruction pointer in context
 1  libxul.so!js::Probes::exitScript(JSContext*, js::Unrooted<JSScript*>, JSFunction*, js::StackFrame*) [jsprobes.h : 368 + 0x16]
    rbx = 0x00007fcec9dfd450   r12 = 0x00007fcec82f4c40
    r13 = 0x000000000213e0c0   r14 = 0x00007fcec82f9200
    r15 = 0x00007fff7c334620   rip = 0x00007fcedbb850ca
    rsp = 0x00007fff7c333ea0   rbp = 0x00007fff7c333ea0
    Found by: call frame info
 2  libxul.so!js::StackFrame::epilogue(JSContext*) [Stack.cpp : 330 + 0x11]
    rbx = 0x00007fcec9dfd450   r12 = 0x00007fcec82f4c40
    r13 = 0x000000000213e0c0   r14 = 0x00007fcec82f9200
    r15 = 0x00007fff7c334620   rip = 0x00007fcedbce2f31
    rsp = 0x00007fff7c333eb0   rbp = 0x00007fff7c333f30
    Found by: call frame info
 3  libxul.so!js::Interpret(JSContext*, js::StackFrame*, js::InterpMode) [jsinterp.cpp : 3805 + 0xa]
    rbx = 0x000000000213e0c0   r12 = 0x0000000000000001
    r13 = 0x0000000000000001   r14 = 0x00007fcec9dfd450
    r15 = 0x00007fff7c334620   rip = 0x00007fcedbb94a7b
    rsp = 0x00007fff7c333f40   rbp = 0x00007fff7c3347c0
    Found by: call frame info
 4  libxul.so!js::RunScript(JSContext*, JS::Handle<JSScript*>, js::StackFrame*) [jsinterp.cpp : 346 + 0xc]
    rbx = 0x000000000213e0c0   r12 = 0x00007fff7c3348c0
    r13 = 0x00007fcec9dfd450   r14 = 0x00007fff7c3347f0
    r15 = 0x00007fcec82f4c40   rip = 0x00007fcedbb998a5
    rsp = 0x00007fff7c3347d0   rbp = 0x00007fff7c334850
    Found by: call frame info
 5  libxul.so!SendToGenerator [jsiter.cpp : 1579 + 0x7]
    rbx = 0x000000001234d340   r12 = 0x000000000213e0c0
    r13 = 0x0000000000000001   r14 = 0x00007fff7c334870
    r15 = 0x00007fff7c3348c0   rip = 0x00007fcedbb9d31c
    rsp = 0x00007fff7c334860   rbp = 0x00007fff7c334910
    Found by: call frame info
 6  libxul.so!generator_send_impl(JSContext*, JS::CallArgs) [jsiter.cpp : 1657 + 0x4]
    rbx = 0x0000000000000000   r12 = 0x00007fcec9dfd438
    r13 = 0x000000001234d340   r14 = 0x0000000000000001
Blocks: 795268
https://tbpl.mozilla.org/php/getParsedLog.php?id=18115478&tree=Mozilla-Inbound
Summary: Intermittent browser_profiler_run.js | Exited with code 11 during test run | application crashed [@ js::SPSProfiler::exit(JSContext*, JSScript*, JSFunction*)] (Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130) → Intermittent browser_profiler_run.js, browser_profiler_controller.js | application crashed [@ js::SPSProfiler::exit(JSContext*, JSScript*, JSFunction*)] (Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130)
Component: Developer Tools → Developer Tools: Profiler
Anton, could you take a look at this top orange please?
Flags: needinfo?(anton)
This may be related to bug 822831.
(In reply to Benoit Girard (:BenWa) from comment #35)
> This may be related to bug 822831.

Doesn't look like it.
sfink you looked at bug 822041. Any chance someone could look into this assertion failure as well? It's a top orange and is very likely that users will run into this when using the profiler from the devtools.
Changing requestee according to comment 41.
Flags: needinfo?(anton) → needinfo?(sphink)
Ugh. This looks like it's probably a problem where in some edge situations, the JS function enter and leave instrumentation gets mismatched. I'm not sure what makes it intermittent, though -- GC timing? I ran it locally 120 times and had zero failures. I haven't been involved in that instrumentation since well before Ion existed. I guess I'll start diffing good/bad logs or something.
Flags: needinfo?(sphink)
Thank you for taking a look :-)
Just in case anyone is looking and is as blind as me: comment 0 clearly shows we're re-entering a generator, which is always a little tricky.
I'm not sure this will be enough to get anything actionable, but it's worth a shot. I have a local patch that adds a JS shell option to enable the profiler, so it's at least tracking enters/exits and would assert if they were unbalanced. I ran it with a generator, but it found nothing. I'll probably want to augment this patch with an indication of what mode each enter/exit happens in (interpreter, JM, IM). But I can add that in next.
Attachment #699570 - Flags: review?(jorendorff)
Assignee: nobody → sphink
robcee: You mentioned this bug on #devtools. Is this blocking you in some way? I am trying to figure out where to prioritize it.
Flags: needinfo?(rcampbell)
(In reply to Steve Fink [:sfink] from comment #85)
> robcee: You mentioned this bug on #devtools. Is this blocking you in some
> way? I am trying to figure out where to prioritize it.

not blocking. It's just a pretty frequent crasher triggered by the newly-landed profiler integration. More of a concern than a blocker.

Also, it seems like the frequency just went up in the last day or so. Any idea why it would start occurring more frequently all of a sudden?
Flags: needinfo?(rcampbell)
(In reply to Rob Campbell [:rc] (:robcee) from comment #99)
> Also, it seems like the frequency just went up in the last day or so. Any
> idea why it would start occurring more frequently all of a sudden?

It could be related to the increased number of pushes due to the post-uplift landing rush and this week's b2g sprint.
Attachment #699570 - Flags: review?(jorendorff) → review+
Whiteboard: [leave open]
this crash is still pretty frequent. Steve, are you gathering data on this?
https://tbpl.mozilla.org/php/getParsedLog.php?id=19570664&tree=Firefox

Assertion failure: stack_[*size_].js(), at e:/builds/moz2_slave/m-cen-w32-dbg/build/js/src/vm/SPSProfiler.cpp:145
TEST-UNEXPECTED-FAIL | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_bug_830664_multiple_profiles.js | Exited with code -2147483645 during test run
PROCESS-CRASH | chrome://mochitests/content/browser/browser/devtools/profiler/test/browser_profiler_bug_830664_multiple_profiles.js | application crashed [@ js::SPSProfiler::exit(JSContext *,js::Unrooted<JSScript *>,js::Unrooted<JSFunction *>)]
(In reply to Rob Campbell [:rc] (:robcee) from comment #155)
> this crash is still pretty frequent. Steve, are you gathering data on this?
Flags: needinfo?(sphink)
Crash Signature: [@ js::SPSProfiler::exit(JSContext*, JSScript*, JSFunction*)] → [@ js::SPSProfiler::exit(JSContext*, JSScript*, JSFunction*)] [@ js::SPSProfiler::exit(JSContext *,js::Unrooted<JSScript *>,js::Unrooted<JSFunction *>)]
Summary: Intermittent browser_profiler_run.js, browser_profiler_controller.js | application crashed [@ js::SPSProfiler::exit(JSContext*, JSScript*, JSFunction*)] (Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130) → Intermittent browser_profiler_run.js, browser_profiler_controller.js | application crashed [@ js::SPSProfiler::exit(JSContext *,js::Unrooted<JSScript *>,js::Unrooted<JSFunction *>)] (Assertion failure: stack_[*size_].js(), at SPSProfiler.cpp:130)
Blocks: 839525
Summary: Intermittent browser_profiler_run.js, browser_profiler_controller.js | application crashed [@ js::SPSProfiler::exit(JSContext *,js::Unrooted<JSScript *>,js::Unrooted<JSFunction *>)] (Assertion failure: stack_[*size_].js(), at SPSProfiler.cpp:130) → Intermittent browser_profiler_run.js, browser_profiler_controller.js | application crashed [@ js::SPSProfiler::exit] (Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130)
(In reply to Ed Morley [:edmorley UTC+0] from comment #215)
> (In reply to Rob Campbell [:rc] (:robcee) from comment #155)
> > this crash is still pretty frequent. Steve, are you gathering data on this?

Steve, any progress?
Please can you find an owner for this intermittent-failure - the current overall tree intermittent failure rate is spiralling out of control & the majority of bugs are unowned (see dev.platform thread).
Flags: needinfo?(vporof)
Steve is the owner for this bug, AFAIK.
Merging bug 839525 into this one.
Summary: Intermittent browser_profiler_run.js, browser_profiler_controller.js | application crashed [@ js::SPSProfiler::exit] (Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130) → Intermittent browser_profiler_run.js, browser_profiler_controller.js, browser_profiler_bug_830664_multiple_profiles.js | application crashed [@ js::SPSProfiler::exit] (Assertion failure: stack_[*size_].js(), at ../../../js/src/vm/SPSProfiler.cpp:130)
orangefactor says this is hitting 5% of the time, but I saw it in 0/100 runs on a borrowed slave.

But the instrumentation I added says something very odd: the profiler doesn't see *any* JS frames on the stack when it crashes, but the stack trace shows several RunScripts and similar that should have pushed frames.

I'll stop it under gdb in a similar situation, if I can get to one, and see if I can tell what's up.

I'm also going to look into cross-checking the profiler against a StackIter, if I can find a good place to put the check.

I've already enabled the profiler data collection in the JS shell, which should hit the same assertion even though nothing can access its data, and nothing in the test suite triggered it for me.
Flags: needinfo?(sphink)
And yes, I'm working on this bug. Clearing needinfo.
Flags: needinfo?(vporof)
Ugh. I copied in the exact build package and test package from one of these intermittents onto the slave. So far it's passed 5495 out of 5495 runs (with full setup/teardown for each.)

My latest hypothesis is that it's not a simple mismatch between enters and exits. The tests seem to start up profiling and then shut it down (both by clicking on buttons in the UI). Exiting a frame after the profiler has been disabled is supposed to still update the profiler's notion of the stack. It does this by recording in a frame whether we pushed it onto the profiler's stack on entry, so that it removes it on exit whether or not the profiler is enabled. Perhaps something is going wrong with this mechanism? (I have no particular evidence for this hypothesis, other than it's timing-related, and this is an intermittent.)
I appreciate the work everyone has put into trying to fix this so far, however 4 months & 360 failures is too much - browser_profiler_run.js, browser_profiler_controller.js, browser_profiler_bug_830664_multiple_profiles.js are going to have to sit in the naughty corner for now, sorry:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a8e42871854b
Whiteboard: [leave open] → [test disabled][leave open]
Latest status: I'm having a heck of a time reproducing this. I've seen it once in the last few weeks of trying. Much of that time has been wasted, however, because it seems like some of the test stuff comes from the tests/mochitest/browser/... subtree, and some comes from a tests.jar file. For example, I deleted the intermittent failures that were happening far more frequently than this bug, and seemingly prevented this bug from happening when they did fail -- and when I came back to check on it, it was still failing in the deleted tests. Apparently it was pulling them from the jar. Given that a single test run is 42 minutes, and I need at least 30 runs to have a decent chance of seeing this failure, it's easy to forget about this for extended periods.

Anyway, I'm still slowly trying to find a way to whittle down the test suite to something that can run faster, but I also have an unfinished patch to add some flight recording to record a bunch more information "in the field" (as in, in the tests that were just disabled. Either by re-enabling them, or running them in a mode where they record what I need and magically report the handful of failures to me. The first sounds easier.)
Sounds like it's been a nightmare - thank you for attempting to tackle this! :-)
I'm at a loss for what's going wrong. Here's a flight recorderish diagnostic patch to land temporarily, because I can't seem to reproduce on my test slave, and yet it happens about 6 times a day. I'll back this out once I get a single failure log. (It will also log on success, so I can compare and see the difference.)
Attachment #730904 - Flags: review?(jimb)
Comment on attachment 730904 [details] [diff] [review]
Diagnostics for trying to track down bug 822041

Review of attachment 730904 [details] [diff] [review]:
-----------------------------------------------------------------

Other than the one glitch, this looks great.

::: js/src/vm/SPSProfiler.cpp
@@ +471,5 @@
> +    if (nevents > size) {
> +        start = nevents - size;
> +        fprintf(stderr, "HISTORY: %lu expired events\n", nevents - size);
> +    }
> +    for (size_t i = start; (i % size) != bufpos; ++i) {

If the buffer ever overflows, this loop condition will be false on the first iteration, and you won't dump anything.
Attachment #730904 - Flags: review?(jimb) → review+
Attachment #730931 - Flags: review?(jimb)
Attachment #730904 - Attachment is obsolete: true
Ok, this time for real. I cut down the buffer size quite a bit because it seemed a little extreme as I was watching the output scroll by.

I think JM does stub calls, though I didn't look that closely.

IM does things inline, so I added the logging code there. It's clunky and could probably be done much better, but it seems to work.
Attachment #731020 - Flags: review?(jimb)
Attachment #730931 - Attachment is obsolete: true
Attachment #730931 - Flags: review?(jimb)
Comment on attachment 731020 [details] [diff] [review]
Diagnostics for trying to track down

Review of attachment 731020 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/ion/IonMacroAssembler.h
@@ +680,4 @@
>      {
> +        // p->history_.buffer[p->history_.nevents % size].event = ION_PUSH|ION_POP
> +        movePtr(ImmWord(&p->history_.nevents), temp);
> +        loadPtr(Address(temp, 0), temp);

It's a pity the js::ion::Address constructor doesn't default its second argument to zero, the way JSC::AbstractMacroAssembler::Address does.

@@ +681,5 @@
> +        // p->history_.buffer[p->history_.nevents % size].event = ION_PUSH|ION_POP
> +        movePtr(ImmWord(&p->history_.nevents), temp);
> +        loadPtr(Address(temp, 0), temp);
> +        JS_STATIC_ASSERT((p->history_.size & (p->history_.size - 1)) == 0);
> +        and32(Imm32(p->history_.size - 1), temp);

I don't know the IM assembler too well, but is there a reason you're doing an 'and32' here, instead of an andPtr? Mixing word sizes is asking for trouble...

::: js/src/vm/SPSProfiler.cpp
@@ +471,5 @@
> +                                        "OVERPOP",
> +                                        "ION_PUSH",
> +                                        "ION_POP" };
> +
> +    fprintf(stderr, "HISTORY dump begin\n");

It might be nice to put "Bug 822041" somewhere in this message, so people looking at the logs will have some idea what the heck this is, and can find out that it's temporary.
Attachment #731020 - Flags: review?(jimb) → review+
Added in a JM version. Mostly cut & paste.
Attachment #731258 - Flags: review?(jimb)
Attachment #731020 - Attachment is obsolete: true
Comment on attachment 731258 [details] [diff] [review]
Diagnostics for trying to track down bug 822041

Review of attachment 731258 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/methodjit/BaseAssembler.h
@@ +1482,5 @@
> +            JS_NOT_REACHED("unknown sps history entry size");
> +        addPtr(ImmPtr(&p->history_.buffer[0]), reg);
> +
> +        // offset=-1 for pop, offset=0 for push (the increment will happen later)
> +        store32(Imm32(offset < 0 ? SPSHistory::ION_POP : SPSHistory::ION_PUSH), Address(reg, 0));

Might be nice to assert that sizeof(p->history_.buffer[0].event) is 4. Maybe that's too fussy.

@@ +1483,5 @@
> +        addPtr(ImmPtr(&p->history_.buffer[0]), reg);
> +
> +        // offset=-1 for pop, offset=0 for push (the increment will happen later)
> +        store32(Imm32(offset < 0 ? SPSHistory::ION_POP : SPSHistory::ION_PUSH), Address(reg, 0));
> +        storePtr(ImmPtr(script), Address(reg, sizeof(void*)));

Doesn't this assume that sizeof(p->history_.buffer[0].event) is sizeof(void*)? It would be better to use an offsetof macro here, I think...
Attachment #731258 - Flags: review?(jimb) → review+
This fixes a bug, which is very probably what causes the symptoms in this bug.

The problem is that when a generator frame is reactivated, it is copied from the heap to the stack. That copy includes the hasPushedSPSFrame flag. So if the profiler was active when the generator was first created, it will always think it needs to pop exited generator frames. If the profiler is exited in between, that will cause a mismatched pop().

Note that the reverse is not a problem -- if the profiler was disabled when a generator is created, then the frames will be copied out unpushed, but they'll still get the bit set correctly when the frame is actually pushed. (And if it didn't, it still wouldn't matter, as long as it didn't push onto the profiler stack without setting the bit.)
Attachment #732139 - Flags: review?(luke)
Comment on attachment 732139 [details] [diff] [review]
Do not copy hasPushedSPSFrame() from heap generator frame

nice
Attachment #732139 - Flags: review?(luke) → review+
Attachment #732139 - Flags: checkin+
Whoops, don't need [leave open] any more.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Whiteboard: [test disabled][leave open]
Comment on attachment 732139 [details] [diff] [review]
Do not copy hasPushedSPSFrame() from heap generator frame

[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 761261

User impact if declined: incorrect JS stacks in somewhat unusual situations

Sheriff impact if declined: annoyance leading to high blood pressure leading to heart disease leading to eventual death and the necessity of hiring new sheriffs

Testing completed (on m-c, etc.): on m-c for a week or so, intermittents are now gone

Risk to taking this patch (and alternatives if risky): low risk. It affects bookkeeping when the profiler is active and generators are being used

String or IDL/UUID changes made by this patch: none
Attachment #732139 - Flags: approval-mozilla-aurora?
Note that only the one patch is necessary.
Comment on attachment 732139 [details] [diff] [review]
Do not copy hasPushedSPSFrame() from heap generator frame

Approving in favor of Sheriff impact if declined ;) haha

Low risk enough, fixes intermittent and baked on m-c , Sheriff's are happy :) lets get it on !
Attachment #732139 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
My wife and child thank you!
Product: Firefox → DevTools
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: