Last Comment Bug 626830 - jsd.asyncOn fails
: jsd.asyncOn fails
Status: RESOLVED FIXED
[firebug-p1][hardblocker][fixed-in-tr...
:
Product: Core
Classification: Components
Component: XPConnect (show other bugs)
: Trunk
: x86 Windows 7
: -- normal (vote)
: ---
Assigned To: Steve Fink [:sfink] [:s:] (PTO Sep23-28)
:
Mentors:
Depends on: 626743 627758 632343
Blocks:
  Show dependency treegraph
 
Reported: 2011-01-18 15:15 PST by John J. Barton
Modified: 2013-12-27 14:29 PST (History)
7 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
betaN+


Attachments
Take note of forced GC cancels (1.64 KB, patch)
2011-01-20 22:18 PST, Steve Fink [:sfink] [:s:] (PTO Sep23-28)
no flags Details | Diff | Splinter Review
Log of failure to turn on (6.54 KB, text/plain)
2011-01-21 09:22 PST, Steve Fink [:sfink] [:s:] (PTO Sep23-28)
no flags Details
Allow debugging to turn on when dormant contexts are on the XPCJSContextStack (1.37 KB, patch)
2011-01-21 16:40 PST, Steve Fink [:sfink] [:s:] (PTO Sep23-28)
dmandelin: review+
Details | Diff | Splinter Review
Part 2: jsd.off() should take effect immediately (1.33 KB, patch)
2011-01-24 13:28 PST, Steve Fink [:sfink] [:s:] (PTO Sep23-28)
dmandelin: review+
Details | Diff | Splinter Review
Allow debugging to turn on when dormant contexts are on the XPCJSContextStack (1.81 KB, patch)
2011-02-03 11:13 PST, Steve Fink [:sfink] [:s:] (PTO Sep23-28)
dmandelin: review+
Details | Diff | Splinter Review

Description John J. Barton 2011-01-18 15:15:36 PST
Although our actual use of Firebug with jsd.asyncOn() seems to be successful in a few cases so far, I am not able to get our FBtest case turning jsd off and on to work.

My current code looks like:

            if (!jsd.isOn)
            {
                var startAsyncOn = new Date().getTime();
                if (FBTrace.DBG_ACTIVATION)
                    FBTrace.sysout("firebug-service.activation begin jsd.asyncOn "+startAsyncOn);
                jsd.asyncOn(  // turn on jsd for the next event
                {
                    onDebuggerActivated: function doDebuggerActivated()
                    {
                        // now we are in the next event and jsd is on.
                        var nowAsyncOn = new Date().getTime();
                        if (FBTrace.DBG_ACTIVATION)
                            FBTrace.sysout("firebug-service.activation now we are in the next event and JSD is ON "+nowAsyncOn+" delta: "+(nowAsyncOn - startAsyncOn)+"ms" );
                        fbs.onDebuggerActivated();
                        fbs.onJSDebuggingActive();
                    }
                });
            }

After returning from this code I run a setTimeout with a 5000ms delay. My tracing messages show the begin jsd.asyncOn, the pause for 5000ms, then the "now we are in the next event" message with a delta of 228ms. 

I can't explain this result.

However this bizarre result is not the only result. Sometimes the "now we are in" message never comes out.  Sometimes I get an XPCOM exception (NS_ERROR_NOT_AVAILABLE) [jsdIDebuggerService.off, sometimes I get
(NS_ERROR_NOT_INITIALIZED) [jsdIDebuggerService.pause]. I think eventually my code stops trying to start jsd because it thinks it is started.

I also see the XPCOM exceptions at other times, outside of the test.

I created a new test just with the script panel. This is *not* a reduced test case, but it is what I have:

1. install firebug svn (to get the same tracing messages)
2, install fbtest svn
3. run fbtest case Script section script/3918/AsyncOn.js

If you open Firebug tracing you can see the time stamps.  If you apply the steps manually, the test fails.  

I don't understand why it fails here and not on other cases.
Comment 1 John J. Barton 2011-01-18 15:17:46 PST
See also Bug 595243 Expose debugMode to JSD
Comment 2 John J. Barton 2011-01-18 15:24:13 PST
Firebug JS debug cannot be reliable without a fix or workaround here.
Comment 3 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-19 11:03:56 PST
It is possible that this is due to early breakage from bug 626743; perhaps the compartment is getting reset in some scenarios. I'll deal with that first.
Comment 4 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-20 13:14:41 PST
Nope, my WIP patch for bug 626743 does not appear to help here.

(In reply to comment #0)
> My current code looks like:
> ...
> After returning from this code I run a setTimeout with a 5000ms delay. My
> tracing messages show the begin jsd.asyncOn, the pause for 5000ms, then the
> "now we are in the next event" message with a delta of 228ms. 
> 
> I can't explain this result.

Which part?

You wait 5000ms between them because jsd.asyncOn just sets things up so that debugging will be turned on when the next event is processed, but it doesn't do anything to queue up an event itself. I assume if you were to move the mouse or set up a timeout or something, that it would happen sooner.

As for the human-experienced time being 5sec and the delta reporting being far less, I'd say either the browser was traveling near the speed of light relative to you, or there's something very odd going on. When I was doing the test manually, though, I didn't see that -- my delta was getting reported as very large.

Does the current fbtest have the 5sec delay in it? Doesn't seem to. I guess I could try putting one in myself.

> However this bizarre result is not the only result. Sometimes the "now we are
> in" message never comes out.  Sometimes I get an XPCOM exception
> (NS_ERROR_NOT_AVAILABLE) [jsdIDebuggerService.off, sometimes I get
> (NS_ERROR_NOT_INITIALIZED) [jsdIDebuggerService.pause]. I think eventually my
> code stops trying to start jsd because it thinks it is started.

I'm seeing these too, though it's hard to untangle what's going on and when.

It makes sense, though, if the debugger never managed to activate, but you started attempting to use it anyway. That'd result in the NS_ERROR_NOT_INITIALIZED. NS_ERROR_NOT_AVAILABLE looks like it happens if you had previously turned it off and it hasn't done a GC run to clean up dead scripts. Or something like that. Looks tricky; I'll need to talk to dmandelin about it.

> I also see the XPCOM exceptions at other times, outside of the test.
> 
> I created a new test just with the script panel. This is *not* a reduced test
> case, but it is what I have:
> 
> 1. install firebug svn (to get the same tracing messages)
> 2, install fbtest svn
> 3. run fbtest case Script section script/3918/AsyncOn.js
> 
> If you open Firebug tracing you can see the time stamps.  If you apply the
> steps manually, the test fails.
> 
> I don't understand why it fails here and not on other cases.

It does seem very erratic. If I run the test through the test console multiple times, it fails most of the time but occasionally succeeds.

Will look further...
Comment 5 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-20 16:44:49 PST
I've found one thing that legitimately seems wrong: the jsd.off() frequently fails because it has some dead scripts queued up and we're supposedly in the middle of a GC pass. We shouldn't really be able to run the JS code that calls jsd.off() in that case, so there's something wrong with the interlocking setup. I'm looking into it, but this does make me realize that this test is dependent on jsd.off() working. The comments in bug 595243 didn't sound too confident that it'd work. I'm wondering if the priority of jsd.off() is a little lower than eg bug 626909, and could potentially be postponed past beta10?

How reliant is the test suite on jsd.off() working?

I don't know if that ends up causing the other errors mentioned here.

The jsd.off() error will manifest itself as the NS_ERROR_NOT_AVAILABLE return value.
Comment 6 John J. Barton 2011-01-20 16:55:14 PST
(In reply to comment #5)
> How reliant is the test suite on jsd.off() working?
> 

The real question could be: how reliant is Firebug on jsd.off() working? Answer: a lot ;-). That is how we stop debugging every page the users looks at.
Comment 7 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-20 22:18:19 PST
Created attachment 505696 [details] [diff] [review]
Take note of forced GC cancels

The problem where jsd.off() returning NS_ERROR_NOT_AVAILABLE happens when you have some dead scripts, but GC callback cancels the GC run for some evil reason. JSD only remembers that it got a JSGC_BEGIN callback, and assumes that GC is still happening, so when jsdService::Off() is invoked, it barfs because you're not supposed to call it mid-GC (although it doesn't really matter if you have no dead scripts lying around, so it lets you).

This fixes the JSD bookkeeping. An alternative, broader fix would be to have GC invoke the JSGC_END callback when a JSGC_BEGIN callback returns false, cancelling the GC. That might fix other callers with similar assumptions, though it's more of a behavior change than this patch.
Comment 8 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-21 09:22:12 PST
Created attachment 505830 [details]
Log of failure to turn on

Unfortunately, that patch does very little to help the test case.

What seems strange to me about the test case is that when it fails (or perhaps all the time?), I never see a call to jsdService::Off(). I don't know whether that's on the Firebug side or the JSD side.

Ah. Looks like it fails to turn off because it's already off; it failed to turn on first.

Ok, I'm attaching a log showing an example session where it failed to turn on. I added two bits of debugging output: first, I added a 0ms timer just after calling asyncOn in Firebug. Before its callback runs, JSD should have been turned on. If it isn't, it screams "PROBLEM SHOULD HAVE ACTIVATED!!!" You'll see that at the end of the log.

Second, I added a printout in nsXPConnect::Push that displays the "count", which I called depth, of the context stack before the push. It also says whether this is the main thread or not, because only the main thread counts. Debug mode will only be turned on when count reaches zero.

The log shows that count is not reaching zero. Could a Firebug person look at the portion of the log after the "PROBLEM" lines and before the "Push depth 0 main=1" line and tell me what was happening?

In brief, this appears to show that JSD is not turning on because the context stack doesn't drain, even though it was empty at the time asyncOn was called. Did Firebug somehow get stuff onto the stack after that? I don't know how that's possible, other than spinning a nested event loop, and that seems unlikely.

It's a little tricky to investigate, because it seems timing dependent: if I hang out on gdb breakpoints for any length of time, then it works. It's usually reproducible if I just add debugging printouts, though. I guess I should figure out how to dump out the stack or whatever it is that's being counted here.

(argh. forgot to click "submit" last night.)
Comment 9 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-21 09:30:23 PST
Comment on attachment 505696 [details] [diff] [review]
Take note of forced GC cancels

Transferring this to a new bug 627758 because the main jsd.asyncOn flakiness is separate.
Comment 10 John J. Barton 2011-01-21 10:14:05 PST
(In reply to comment #8)
...
> The log shows that count is not reaching zero. Could a Firebug person look at
> the portion of the log after the "PROBLEM" lines and before the "Push depth 0
> main=1" line and tell me what was happening?

I guess you are tracing with just ACTIVATION. We can know more with more tracing flags set.  Please set TESTCASE so we can see where you are.  Also if you open the tracing console you can click on the tracing messages and see that part of the JavaScript stack that Firefox chooses to show us, typically up to the event handler, sometimes across xpcom boundaries.

PanelActivation.showPanel; script
   We are just about to reveal the script panel 
fbs.showEvents true fbs.showEvals true
   A preference changed we are observing it.
shouldCreateContext hasAnnotation true for http://getfirebug.com/tests/content/script/3918/AsyncJSDPage.html 
   A new web page has arrived, we have determined that Firebug should be active on the page.


> 
> In brief, this appears to show that JSD is not turning on because the context
> stack doesn't drain, even though it was empty at the time asyncOn was called.

You must mean something else here. At the time jsd.asyncOn() is called, the stack can't be empty.

> Did Firebug somehow get stuff onto the stack after that? I don't know how
> that's possible, other than spinning a nested event loop, and that seems
> unlikely.

Firebug (JavaScript) has no direct way to add stuff to a stack that is empty.

One thing I thought of: maybe the tracing is trying to use jsd to get the call stack. But it seems we use Component.utils.stack.
Comment 11 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-21 10:39:49 PST
(In reply to comment #10)
> (In reply to comment #8)
> > In brief, this appears to show that JSD is not turning on because the context
> > stack doesn't drain, even though it was empty at the time asyncOn was called.
> 
> You must mean something else here. At the time jsd.asyncOn() is called, the
> stack can't be empty.

Sorry, I was being imprecise. I meant that just before an event was processed for which JS was invoked to call jsd.asyncOn(), the stack was empty. So after it returns, it should be empty again, unless it doesn't return for some reason -- eg, if it spins a nested event loop. But I doubt that it does.

> > Did Firebug somehow get stuff onto the stack after that? I don't know how
> > that's possible, other than spinning a nested event loop, and that seems
> > unlikely.
> 
> Firebug (JavaScript) has no direct way to add stuff to a stack that is empty.

I'm not talking about direct ways. I mean calling functions inside of which a timer expiration could be handled. (Again, a nested event loop is the obvious example, but I don't think it's that simple.)

> One thing I thought of: maybe the tracing is trying to use jsd to get the call
> stack. But it seems we use Component.utils.stack.

Hm.

Well, that was my state when I stopped. I threw the questions out hoping someone might have an obvious (to them) but miraculous (to me) insight. I'm not at all blocked; I'll start digging what's actually on the stack now. It's looking unlikely that I'll manage to fix this for beta10, though.
Comment 12 John J. Barton 2011-01-21 10:49:32 PST
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #8)
> > > In brief, this appears to show that JSD is not turning on because the context
> > > stack doesn't drain, even though it was empty at the time asyncOn was called.
> > 
> > You must mean something else here. At the time jsd.asyncOn() is called, the
> > stack can't be empty.
> 
> Sorry, I was being imprecise. I meant that just before an event was processed
> for which JS was invoked to call jsd.asyncOn(), the stack was empty. So after
> it returns, it should be empty again, unless it doesn't return for some reason
> -- eg, if it spins a nested event loop. But I doubt that it does.

So if I understand you, the function onDebuggerActivated() is called on an empty stack and when it returns that same stack is magically not empty?

Or is there another stack involved here (we don't what you 'context' thing is)?
Comment 13 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-21 13:41:44 PST
jjb: ignore me for now, looks like I was getting confused by multiple threads.

The stack I'm referring to is actually a stack of contexts (no, I won't go into exactly what those are right now). And it turns out the stack wasn't at all empty when entering the JS that calls asyncOn() -- the empty stack turned out to be on a different thread.

So I have this funky XPCJSContextStack of contexts that is nonempty, and that's blocking debugging from getting turned on. I need to understand why, and what it is in the first place.
Comment 14 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-21 16:40:00 PST
Created attachment 506004 [details] [diff] [review]
Allow debugging to turn on when dormant contexts are on the XPCJSContextStack

I have weakened the condition required to activate debugging: instead of requiring the XPCJSContextStack to be completely empty, I allow it to contain contexts that are either NULL or dormant.

This fixes the asyncJSD test case for me.
Comment 15 David Mandelin [:dmandelin] 2011-01-21 16:50:03 PST
Comment on attachment 506004 [details] [diff] [review]
Allow debugging to turn on when dormant contexts are on the XPCJSContextStack

Nit: |notRunningJS| seems a more informative name to me than |checkDebug|. Up to you, though.
Comment 16 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-22 15:59:26 PST
You're right, notRunningJS is much better.

Unfortunately, this fix is not the end of things. Here's another failure mode:

 - debugging is turned on fully
 - jsd.off() is called. This disables all the callbacks and things, but only calls XPConnect's SetDebugModeWhenPossible, which sets a flag for processing when no JS is running (gDesiredDebugMode=false, but gDebugMode is still true)
 - several ::Push()es go by with running JS, so the off request stored in gDesiredDebugMode cannot be satisified.
 - debugging is requested to be activated again via jsdService::AsyncOn(). This will set gDesiredDebugMode=true.

Now, from XPConnect's standpoint, gDesiredDebugMode=gDebugMode=true, so there's nothing to be done. That means that CheckDebugMode will not be called, and it will never call ActivateDebugger(). Debugging ends up being half-on (mostly on, really, but jsd.isOn will return false.)

The problem is that jsd_xpc.cpp says turning debugging on is asynchronous, but turning it off is synchronous. nsXPConnect.cpp says both are asynchronous.

This is fixable by making XPConnect turn off debugging immediately. This will skip some scripts, because they're currently live, but any debugging hooks firing in those scripts will be ignored.

And yet... even after this fix, tests like firebug/openDisableEnableReload.js are still not working all the time. It's possible that the test is invalid (because it's keeping running JS alive across disable/enable/reload, which would never happen in actual usage), but I haven't looked closely enough at it to be sure. And it would be nice if it could be made to work anyway. Being testable is good.
Comment 17 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-24 13:28:30 PST
Created attachment 506500 [details] [diff] [review]
Part 2: jsd.off() should take effect immediately

Make XPConnect turn off debugging immediately, without waiting for a quiescent stack.

If any scripts are live when you turn off debugging, then they will remain in debugging mode until the next time they are recompiled. But no debugger functionality will be used, so it's just a temporary performance hit.

This fixes, somewhat inelegantly, the problem where turning debugging off then back on will end up with debugging only partly on if the stack does not quiesce in between.
Comment 18 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-24 13:33:50 PST
Honza -- I still get a failure in firebug/openEnableDisableReload.js even after these patches. I don't really understand the failure mode, though. The failure is in

"The script [panel?] should be marked on the Firebug Statusbar Icon."

which makes it seem like debugging is not getting turned back on properly still, but I notice that soon before that it checks

    FBTest.compare("false", FBTestFirebug.isPanelTabDisabled(panelName), "The "+panelName+" panel should be enabled");

and that passes. So the script panel is enabled, but it is not marked on the Firebug Statusbar Icon, whatever that means? Is that referring to the light gray/dark gray color scheme on the "Console/HTML/CSS/Script/DOM/Net" panel? Why is that an "Icon"?

Sorry, I'm not trying to be nitpicky, I'm just having trouble performing the test steps manually and checking the result to see if this is a difference between automated test mode and manual operation.
Comment 19 John J. Barton 2011-01-24 13:46:48 PST
(In reply to comment #18)
>  So the script panel is enabled, but it is not marked on the
> Firebug Statusbar Icon, whatever that means? Is that referring to the light
> gray/dark gray color scheme on the "Console/HTML/CSS/Script/DOM/Net" panel? Why
> is that an "Icon"?

The "Firebug Statusbar Icon" is in the bar at the bottom (formerly status bar). Its a square UI element with a image of a bug: thus "Icon".

The Icon has two visual states: colored and gray. Gray means "all panels that cause big overhead are off"; colored means "at least one is one". The tooltip breaks it down.

The state of the icon is driven by our event onJSDActivate which is called by fbs.onJSDebuggingActive() in our callback onDebuggerActivated.

So the "enable" UI event is a request that fires jsd.asyncOn(); The Icon state is a response, driven the the onDebuggerActivated event. 

If the icon does not say "On" for Script panel, we did not get the event (or we have a bug). I had to change this code for asyncOn, so be sure you are on SVN head and if you are still puzzled I'll take a look. (Tracing  ACTIVATION should show the events).
Comment 21 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-24 15:20:10 PST
Backed out due to Windows template compilation failure.

http://hg.mozilla.org/tracemonkey/rev/8f09ff778d69
http://hg.mozilla.org/tracemonkey/rev/c52e5d68a747
Comment 22 Chris Leary [:cdleary] (not checking bugmail) 2011-01-24 22:10:22 PST
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/b7e91bafdcd9
http://hg.mozilla.org/mozilla-central/rev/d54eac606dd7
http://hg.mozilla.org/mozilla-central/rev/c52e5d68a747 (backout)
http://hg.mozilla.org/mozilla-central/rev/8f09ff778d69 (backout)
Note: not marking as fixed because last changeset is a backout.
Comment 23 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-01-25 11:23:30 PST
The patch to make jsd.off() synchronous also triggers a crash on my test_callhooks mochitest, only when methodjit is enabled.

Seems like something is now getting recompiled when it shouldn't be. Either of the two patches on this bug could be at fault for that.

I really should get around to doing the jit support for gdb. A howling Jägermonkey ate my stack.
Comment 24 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-02-03 10:55:26 PST
(In reply to comment #18)
> Honza -- I still get a failure in firebug/openEnableDisableReload.js even after
> these patches. I don't really understand the failure mode, though.

(Replying to myself) The test had not been updated for the AsyncOn change. I submitted a change to FBTest which has been applied, and the test now succeeds. I think there are probably a number of other tests with the same problem.
Comment 25 Steve Fink [:sfink] [:s:] (PTO Sep23-28) 2011-02-03 11:13:10 PST
Created attachment 509496 [details] [diff] [review]
Allow debugging to turn on when dormant contexts are on the XPCJSContextStack

Sorry for the patch churn, but I have to update this patch to do the main thread detection, matching what I just did in bug 626743.

The other patch here on this bug does not change.
Comment 27 Chris Leary [:cdleary] (not checking bugmail) 2011-02-06 16:56:05 PST
cdleary-bot mozilla-central merge info:
http://hg.mozilla.org/mozilla-central/rev/b647ff21bfed
Comment 28 John J. Barton 2011-02-08 08:53:24 PST
Shortly after this landed our testbot began crashing, Bug 632343. Of course I don't know if these are related.

Note You need to log in before you can comment on or make changes to this bug.