Closed Bug 1163991 Opened 5 years ago Closed 11 months ago

Asm.js call stack is not shown when aborting JavaScript execution on slow script dialog.

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

RESOLVED FIXED
Tracking Status
firefox41 --- affected

People

(Reporter: jujjyl, Assigned: bbouvier)

References

Details

(Whiteboard: [gaming-tools])

Attachments

(3 files, 2 obsolete files)

Attached file infiniteloop.html
STR:

1. Open infiniteloop.html in browser.
2. Wait until the slow script notification appears: "A web page is causing Nightly to run slowly. What would you like to do?"
3. Click Options -> Stop script.
4. Observe that the web page console prints an incorrect partial callstack:

Error: Script terminated by timeout at:
@file:///Users/jjylanki/infloop/infiniteloop.html:37:12

5. Go to about:config and disable asm.js by setting the pref "javascript.options.asmjs;false".
6. Repeat steps 1.-3.
7. Observe that the web page console now reads the correct full callstack:

Error: Script terminated by timeout at:
logSum@file:///Users/jjylanki/infloop/infiniteloop.html:18:1
geometricMean@file:///Users/jjylanki/infloop/infiniteloop.html:29:18
@file:///Users/jjylanki/infloop/infiniteloop.html:37:12

Comparing the output from 4. and 7., it looks like when asm.js is enabled, stopping script execution from slow script dialog does not print frames of functions inside the asm.js module.
Hmm.  The error report there uses js::ComputeStackString.  That ends up calling into SavedStacks::saveCurrentStack which ends up using a FrameIter.

I'm not sure whether this is an issue in the SavedStacks code, in FrameIter, or somewhere else.  There are definitely comments in the SavedStacks code about asm.js frames...
Summary: Asm.js call stack is not shown when aborting JavaScript execution on show script dialog. → Asm.js call stack is not shown when aborting JavaScript execution on slow script dialog.
OK, stepping through this the FrameIter is only seeing one frame.   So it's not a problem in the SavedStacks code per se.

Luke, are asmjs activations lazily reified or something?  In particular, when inside js::FrameIter::settleOnActivation we do see an asmjs activation, go to do the AsmJSFrameIterator thing, but it ends up with a null fp_.  And in particular, AsmJSActivation::fp is returning null.
Flags: needinfo?(luke)
Summary: Asm.js call stack is not shown when aborting JavaScript execution on slow script dialog. → Asm.js call stack is not shown when aborting JavaScript execution on show script dialog.
Summary: Asm.js call stack is not shown when aborting JavaScript execution on show script dialog. → Asm.js call stack is not shown when aborting JavaScript execution on slow script dialog.
In non-profiling mode, AsmJSActivation::fp is only updated when calling out of asm.js.  The slow-script dialog interrupts executing asm.js so there is no call out and thus fp isn't set.  fp is not directly derivable from pc so it's not easy to set in the async interrupt handler either.  When profiling is enabled, we expect to be interrupted at any time so fp is updated on every call/return.  Thus, a hacky workaround would be to enable profiling (I think the Gecko profiler addon has a mode where profiling is always enabled), then run your demo, and then when you Stop Script, you'll see asm.js in the callstack b/c fp is non-null.
Flags: needinfo?(luke)
Hmm, I still think there's something that we should do here.

I can understand that due to a limitation, we are unable to show the callstack by default, and I think that's fine. However I think the core of the problem is that people don't sometimes know that they are not getting the full callstack, and they don't know that they would even have an option that they can set to uncover it.

I wonder, when we print the callstack after stopping at slow script dialog, would it be possible to detect that we were not able to print the full callstack since there were asm.js frames, and if so, we'd print a message "Unable to unwind full callstack inside asm.js code. Please set pref javascript.options.xxxyyyzz = true and run again to retain full callstack information.", where the pref would be something specific for this? The problem with asking users to run with profiling enabled is that sometimes the slow script dialog occurs immediately at page startup, so browser is not responsive that users could go and start profiling, and requiring users to install a separate gecko profiler add-on for this sounds like overkill (and brittle, gecko profiler has a track record of breaking every month or so, not necessarily due to gecko profiler, but external causes, like add-on signature verification becoming enforced, or backend servers going down, etc).
To wit, you can enable profiling from the builtin profiler and it also works to start profiling when on about:blank and then navigate to the page to have profiling enabled during statup.  (There is currently a glitch with e10s+devtools where when you make a new tab it's still in the parent process, so when you navigate and the child process is created, you lose the profiler.  The workaround is to first visit a "real" page (so a child process is created), navigate to about:blank, then start profiling, then navigate).

But you make a good point that it would be pretty easy to have a mode where the frame pointer was always being maintained (that didn't involve profiling and so had very low overhead).  Perhaps we could simply toggle this mode whenever the devconsole was opened: by design, enabling frame pointer maintenance doesn't require recompiling the code, just doing a <100ms patching operation.  Nick: I know this is about to change from what you were saying today but, is there a simple signal (now or soon) we can listen for in JS that tells us when the devconsole is open?
Flags: needinfo?(nfitzgerald)
(In reply to Luke Wagner [:luke] from comment #5)
> Perhaps we could simply toggle this mode
> whenever the devconsole was opened

I'm not sure if I like this, since I've enjoyed to ridicule IE11 and Safari due to their lame behavior of changing how a page runs functionally when the console is open, which people have always reported as counterintuitive. We've seen IE11 and Safari bugs where the bug would disappear or appear when the web console window was open. We don't do anything like this currently afaik(?), and it would be nice to keep that.
You're right, and that was on my mind.  However, unlike IE/Safari which I expect completely turn off their beefy JITs when the console is open, in this case we'd only be enabling a feature which has a minimal (I'd guess 5%) perf hit.
Would patching require the developer to open the web console before running the page? If developer opens the web console while the page is already hung, would the callstack show up, or does it require that the page yields once back from user JS code to browser before the patching can be performed?

In either case, would it still be possible to add a message to when the callstack is printed if we detect that asm.js frames were present and those could not be obtained? Or is that difficult in general to know? I'd like that users are guided so there is no danger of missing out by accident - e.g. printing "Asm.js call stack was not captured. Please rerun the page with web console window open." or something similar?
(In reply to Jukka Jylänki from comment #8)
> Would patching require the developer to open the web console before running
> the page?

Nope, patching happens at the next turn of the event loop (when no asm.js code is on the stack) after the console is opened.  That's why you can start profiling at any time (whether you started with the console open or not) and it basically just works (unless, as you said, you have a hung stack when you open the profiler).

> In either case, would it still be possible to add a message to when the
> callstack is printed if we detect that asm.js frames were present and those
> could not be obtained?

Yes, we can always do that: we always know when asm.js is on the stack and when we can't iterate it.
(In reply to Luke Wagner [:luke] from comment #9)
> > In either case, would it still be possible to add a message to when the
> > callstack is printed if we detect that asm.js frames were present and those
> > could not be obtained?
> 
> Yes, we can always do that: we always know when asm.js is on the stack and
> when we can't iterate it.

Ok, then that sounds perfect - having a message that guides the developer to open the web console, and then to try again is great.
(In reply to Luke Wagner [:luke] from comment #5)
> But you make a good point that it would be pretty easy to have a mode where
> the frame pointer was always being maintained (that didn't involve profiling
> and so had very low overhead).  Perhaps we could simply toggle this mode
> whenever the devconsole was opened: by design, enabling frame pointer
> maintenance doesn't require recompiling the code, just doing a <100ms
> patching operation.  Nick: I know this is about to change from what you were
> saying today but, is there a simple signal (now or soon) we can listen for
> in JS that tells us when the devconsole is open?

Sorry for the delay! I wrote a reply, never submitted it, and then garbage collected the tab, only to realize my error now :(

I suspect it would be easier to have the devtools/console manage turning this mode on and off than to try and listen for the devtools/console opening and closing from C++, since the devtools are largely implemented in JS. If we really wanted to, we could probably add an observer topic or something, but it really seems easier to me if we add a method to Debugger for enabling and disabling asm.js frame pointers and let the console orchestrate.

This problem won't really change soon: what we talked about the other day is that the debugger is going to be enabled whenever devtools are open. People were confused when "debugger;" statements weren't pausing despite the debugger never having been activated. However, we will be using Debugger.prototype.allowUnobservedAsmJS = true so that we don't disable all asm.js whenever devtools are open.

Hopefully between

(a) the lazy bailouts for debugging (only bailout for a function if you set a BP on it or request observing all frame entries/exits) and 

(b) the allowUnobservedAsmJS flag 

we can alleviate the conditions that lead to the type of heisenbugs that people were describing earlier.
Flags: needinfo?(nfitzgerald)
Thanks for the info Nick!  So would a quick proxy be that, when debug mode is enabled, to enable profiling in all asm.js code?
(In reply to Luke Wagner [:luke] from comment #12)
> Thanks for the info Nick!  So would a quick proxy be that, when debug mode
> is enabled, to enable profiling in all asm.js code?

Yes, once bug 1132501 is fixed.
Whiteboard: [gaming-tools]
Priority: -- → P2
A partner developer is hitting this pretty hard on a large codebase that needs Unity exception handling. In particular they lost quite a bit of time scratching their heads since they did not understand they were not seeing the full call stack. Having the info message that would print that not all stack frames were available would be very useful to have, perhaps it could show the asm.js functions as "<unknown>" or with some other placeholder in the print, and then after the callstack, put in an explaining message to enable profiling or debugging to get the full call stack?
Yes, we should do this.  I can add it to my todo list; but Benjamin feel free to steal!

Thinking about this again, though I realized that, when we interrupt a running script, we save the interrupted PC in the WasmActivation::resumePC (in case the dialog doesn't stop script and we need to resume), so we could lookup the CodeRange for that and then at least give the innermost frame (which is probably what people really want anyway).

Even given that, it still seems like a good idea to do the other two things:
 - enable profiling when the dev console is open (I think that corresponds to debug-mode being enabled (but not necessarily debuggerObservesAsmJS).
 - print a "frames may be missing" message (under the innermost frame) when profiling is not enabled and a WasmActivation is skipped.
Attached patch wip.patch (obsolete) — Splinter Review
So something like that?

I've modified testTimeout1.js so that it prints the stack trace when it times out, which exerces the interrupt check path:

- with asm.js disabled, before the patch:
@/code/mozilla-inbound/js/src/jit-test/tests/asm.js/testTimeout1.js:8:12
g@/code/mozilla-inbound/js/src/jit-test/lib/asm.js line 20 > Function:1:48
@/code/mozilla-inbound/js/src/jit-test/tests/asm.js/testTimeout1.js:10:1

- with asm.js enabled, before the patch:
@/code/mozilla-inbound/js/src/jit-test/tests/asm.js/testTimeout1.js:8:12
@/code/mozilla-inbound/js/src/jit-test/tests/asm.js/testTimeout1.js:10:1

So "g" is missing.

- with asm.js enabled, after the patch:
@/code/mozilla-inbound/js/src/jit-test/tests/asm.js/testTimeout1.js:8:12
g (Note that in non-profiling mode, some asm.js/wasm frames may not appear. You can have them show up in the stack trace by enabling the profiler mode in the developer tools.)@/code/mozilla-inbound/js/src/jit-test/lib/asm.js line 20 > Function:0:1
@/code/mozilla-inbound/js/src/jit-test/tests/asm.js/testTimeout1.js:10:1

So "g" appears now \o/ with a lengthy message explaining to open the profiler in the dev tools.

I've opted for a message directly after the function's name, so that the code handling the particular case of wasm with no FP doesn't leak into vm/Stack.cpp. It has the drawback to be a bit ugly, though.

I'll look into the debugger check tomorrow. I'd appreciate any leads, as I have no idea where to start looking :-) Thanks!
Assignee: nobody → bbouvier
Status: NEW → ASSIGNED
Attachment #8750882 - Flags: feedback?(luke)
Sweet work!

I think I would prefer not needing to open the profiler devtool to enable this, since the built-in performance profiler has problems in it that make it infeasible to use for more than a few seconds at a time since it locks up the browser after a while:
 - https://bugzilla.mozilla.org/show_bug.cgi?id=1259268
 - https://bugzilla.mozilla.org/show_bug.cgi?id=1160330

Also needing to start a profiling session with the devtool before each page load makes it a bit tedious to use. Because of this, I think it would probably be better to have a pref to set, since that can be set on once and persist. Perhaps when the pref is on, we could bring a console warning message after asm.js compilation succeeds, saying "asm.js performance warning: running in call stack tracing mode" or similar, to remind that the mode is active?

Geckoprofiler add-on works better for long running samples and generally is ok to keep running with smaller performance overhead, and it's ok for my workflow and for some partners to whom it's easy to recommend to install the add-on. Although, when I run the test case from c0 in default Nightly, asm.js enabled and no profiling active, I get

Error: Script terminated by timeout at:
@file:///Users/jjylanki/Downloads/infiniteloop.html:37:12

If I enable profiling via geckoprofiler, while keeping asm.js enabled, I get

Error: Script terminated by timeout at:
geometricMean@file:///Users/jjylanki/Downloads/infiniteloop.html:29:1
@file:///Users/jjylanki/Downloads/infiniteloop.html:37:12

and after disabling asm.js via javascript.options.asmjs;false, I get

Error: Script terminated by timeout at:
logSum@file:///Users/jjylanki/Downloads/infiniteloop.html:18:41
geometricMean@file:///Users/jjylanki/Downloads/infiniteloop.html:29:18
@file:///Users/jjylanki/Downloads/infiniteloop.html:37:12

I wonder if the above patch will relate to this - I see that I don't get a full callstack just with profiling enabled, at least when "only" geckoprofiler is being used. Is geckoprofiler somehow worse in power than the devtool profiler? (I presume it should be equal)
(In reply to Benjamin Bouvier [:bbouvier] from comment #16)
Sweet, yes!  The only tricky bit I was realizing is that we need to worry about trash values of resumePC (left behind from previous interrupts).  Can you walk through the interrupt path and ensure that it gets cleared once an interrupt resumes?

(In reply to Jukka Jylänki from comment #17)
This "show the innermost frame" change would Just Work w/o any profiler open.  The other changed suggested is to enable profiling (and thus full callstacks) any time the *web console* is open; which seems like far less trouble.  But that's not in the current patch and perhaps could be a follow-up.
Attachment #8750882 - Flags: feedback?(luke) → feedback+
(In reply to Luke Wagner [:luke] from comment #18)
> (In reply to Benjamin Bouvier [:bbouvier] from comment #16)
> Sweet, yes!  The only tricky bit I was realizing is that we need to worry
> about trash values of resumePC (left behind from previous interrupts).  Can
> you walk through the interrupt path and ensure that it gets cleared once an
> interrupt resumes?

So I've given it some thought, and tried to make test cases where we would read a trash value of resumePC. I could not, which does *not* mean that such a test case couldn't be forged.

Let's try a proof: either there is a WasmActivation on the stack, or there is not. If there aren't any, we don't have to worry about trash values of resumePC.

If there is a WasmActivation on the stack, either it is the innermost frame (if the stack goes down, that means the most recent call), or it is not. If it is the innermost frame, then we will trigger the mechanism that sets resumePC in the interrupt redirect handler [0], so we're fine. If it is not, it means we've gone back to JS from wasm, i.e. we've called an FFI. In which case, as the path to the FFI uses a profiling exit [1], we have a good value of fp and we won't read resumePC. So, either fp is set, or we have a meaningful resumePC value.

If there are more than one WasmActivation on the stack, either they concern different modules (thus are independent and the reasoning above can be applied to each module), or they relate to the same module; but this means one of them called back to JS, so we can recursively apply the reasoning above to get to the same conclusion as the previous paragraph.

[0] https://dxr.mozilla.org/mozilla-central/source/js/src/asmjs/WasmSignalHandlers.cpp#1208
[1] https://dxr.mozilla.org/mozilla-central/source/js/src/asmjs/WasmFrameIterator.cpp#380

Is this reasoning correct, or am I missing another way to escape? Or were you worried about something else? Or is it just safer to clear the resumePC in any case?

I think clearing the resumePC could be done in assembly in the interrupt exit otherwise. (if not for ARM which reuses the value *after* the call to HandleWasmInterrupt, we could even clear it in HandleWasmInterrupt)
Flags: needinfo?(luke)
(In reply to Benjamin Bouvier [:bbouvier] from comment #19)
> I think clearing the resumePC could be done in assembly in the interrupt
> exit otherwise. (if not for ARM which reuses the value *after* the call to
> HandleWasmInterrupt, we could even clear it in HandleWasmInterrupt)

I actually misread GenerateInterruptStub; we *can* clear resumePC in HandleWasmInterrupt. As it's cheap and brings us certainty, I'll make a patch that uses this; that being said, curious to hear what you think about the previous comment.
Yes, I was mostly just worrying about *after* we've taken the interrupt, how resumePC goes back to null; I couldn't remember whether HandleWasmInterrupt was executed on all paths but, if it is, then yes it makes sense to clear it.
Flags: needinfo?(luke)
Attached patch stackframe.patch (obsolete) — Splinter Review
Attachment #8750882 - Attachment is obsolete: true
Attachment #8758722 - Flags: review?(luke)
Comment on attachment 8758722 [details] [diff] [review]
stackframe.patch

Review of attachment 8758722 [details] [diff] [review]:
-----------------------------------------------------------------

Looking very good, just a few tweaks:

::: js/src/asmjs/WasmFrameIterator.cpp
@@ +65,5 @@
>  {
>      if (fp_)
>          settle();
> +    else
> +        trySynthesizingInnermostFrame(activation.resumePC());

Since there is only one caller, I'd inline so that way we can read the code in the context of the constructor which helps it make more sense.

@@ +84,5 @@
> +    synthesizedInnermostFrame_ = true;
> +    MOZ_ASSERT(!done());
> +
> +    if (codeRange->kind() == CodeRange::Function)
> +        codeRange_ = codeRange;

What if we use the state (!fp_ && codeRange_) to represent this new state where we're looking at a CodeRange found by resumePC.  Then FrameIterator::lineOrBytecode() could return codeRange_->funcLineOrBytecode() instead of 0.

Then we keep but rename synthesizedInnermostFrame_ to missingFramesMessage_ and the idea is when we're in the state (!fp_ && codeRange_), missingFrameMessage_ = false, but when we do operator++, we set codeRange_ = null and synthesizedInnermostFrame_ = true and the functionDisplayAtom() is the static string "frames may be missing; enable the profiler before running to see all frames".

Lastly, in the special case where resumePC is not in a CodeRange::Function, we could go straight to the state missingFrameMessage_ = true.

Then done() = fp_ || codeRange_ || missingFrameMessage_.

@@ +150,5 @@
> +    if (synthesizedInnermostFrame_) {
> +        chars = JS_smprintf("/!\\ Note that in non-profiling mode, some asm.js/wasm frames "
> +                "may not appear and line numbers/offsets won't be correct. Enable profiler mode in"
> +                " the devtools to get all the frames in the stack trace. "
> +                "Innermost frame: %s", chars);

This leaks the memory allocated by JS_smprintf.

::: js/src/asmjs/WasmTypes.cpp
@@ +56,5 @@
>  WasmHandleExecutionInterrupt()
>  {
> +    WasmActivation* activation = JSRuntime::innermostWasmActivation();
> +    bool success = CheckForInterrupt(activation->cx());
> +    activation->setResumePC(nullptr);

Perhaps add a comment that the calling stub has already saved a copy of resumePC and we clear it here to preserve the invariant that resumePC != null implies we're in the process of handling an interrupt.
Attached patch stackframe.patchSplinter Review
Here we go:

Script runs for too long, terminating.
@/tmp/a.js:46:11
asm_outer@/tmp/a.js:19:1
asm.js/wasm frames may be missing; enable the profiler before running to see allframes@/tmp/a.js:0:1
@/tmp/a.js:50:1

Also moved done() to the CPP file as it's closely coupled to operator++ and all the assertions calling done().
Attachment #8758722 - Attachment is obsolete: true
Attachment #8758722 - Flags: review?(luke)
Attachment #8758775 - Flags: review?(luke)
Comment on attachment 8758775 [details] [diff] [review]
stackframe.patch

Review of attachment 8758775 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks!

::: js/src/asmjs/WasmFrameIterator.cpp
@@ +71,5 @@
> +    void* pc = activation.resumePC();
> +    if (!pc)
> +        return;
> +
> +    MOZ_ASSERT(module_->containsCodePC(pc));

I think this assert is more conservative than the MOZ_ASSERT(codeRange) below, so I'd be fine removing.

@@ +81,5 @@
> +        codeRange_ = codeRange;
> +    else
> +        missingFrameMessage_ = true;
> +
> +    MOZ_ASSERT(!codeRange_ == done());

I don't see how this assert holds; it seems like you'd just have MOZ_ASSERT(!done()).

@@ +170,5 @@
>  unsigned
>  FrameIterator::lineOrBytecode() const
>  {
>      MOZ_ASSERT(!done());
> +    return codeRange_ ? codeRange_->funcLineOrBytecode() : 0;

The codeRange's line is different and not as good as the callsite's line, so I think you need
  callsite_ ? callsite_->lineOrBytecode() : codeRange_ ? codeRange_->funcLineOrBytecode() : 0
Attachment #8758775 - Flags: review?(luke) → review+
Pushed by bbouvier@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/d59a4523909b
Show a best effort stack frame in non-profiling mode for wasm; r=luke
Thanks! Updated the patch and pushed this part; marking leave-open for the second part.
Keywords: leave-open
Per policy at https://wiki.mozilla.org/Bug_Triage/Projects/Bug_Handling/Bug_Husbandry#Inactive_Bugs. If this bug is not an enhancement request or a bug not present in a supported release of Firefox, then it may be reopened.
Status: ASSIGNED → RESOLVED
Closed: Last year
Resolution: --- → INACTIVE
Status: RESOLVED → REOPENED
Resolution: INACTIVE → ---
The leave-open keyword is there and there is no activity for 6 months.
:bbouvier, maybe it's time to close this bug?
Flags: needinfo?(bbouvier)
Yeah, let's close it: i'm pretty sure that now that we precisely track FP and use software traps for slow script detection, we'd get a full stack. And asm.js isn't a priority anymore; probably we'd need to check this for wasm though: adding to my todo list.
Status: REOPENED → RESOLVED
Closed: Last year11 months ago
Flags: needinfo?(bbouvier)
Keywords: leave-open
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.