Closed Bug 626830 Opened 13 years ago Closed 13 years ago

jsd.asyncOn fails

Categories

(Core :: XPConnect, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: johnjbarton, Assigned: sfink)

References

Details

(Whiteboard: [firebug-p1][hardblocker][fixed-in-tracemonkey][has patch])

Attachments

(3 files, 2 obsolete files)

Although our actual use of Firebug with jsd.asyncOn() seems to be successful in a few cases so far, I am not able to get our FBtest case turning jsd off and on to work.

My current code looks like:

            if (!jsd.isOn)
            {
                var startAsyncOn = new Date().getTime();
                if (FBTrace.DBG_ACTIVATION)
                    FBTrace.sysout("firebug-service.activation begin jsd.asyncOn "+startAsyncOn);
                jsd.asyncOn(  // turn on jsd for the next event
                {
                    onDebuggerActivated: function doDebuggerActivated()
                    {
                        // now we are in the next event and jsd is on.
                        var nowAsyncOn = new Date().getTime();
                        if (FBTrace.DBG_ACTIVATION)
                            FBTrace.sysout("firebug-service.activation now we are in the next event and JSD is ON "+nowAsyncOn+" delta: "+(nowAsyncOn - startAsyncOn)+"ms" );
                        fbs.onDebuggerActivated();
                        fbs.onJSDebuggingActive();
                    }
                });
            }

After returning from this code I run a setTimeout with a 5000ms delay. My tracing messages show the begin jsd.asyncOn, the pause for 5000ms, then the "now we are in the next event" message with a delta of 228ms. 

I can't explain this result.

However this bizarre result is not the only result. Sometimes the "now we are in" message never comes out.  Sometimes I get an XPCOM exception (NS_ERROR_NOT_AVAILABLE) [jsdIDebuggerService.off, sometimes I get
(NS_ERROR_NOT_INITIALIZED) [jsdIDebuggerService.pause]. I think eventually my code stops trying to start jsd because it thinks it is started.

I also see the XPCOM exceptions at other times, outside of the test.

I created a new test just with the script panel. This is *not* a reduced test case, but it is what I have:

1. install firebug svn (to get the same tracing messages)
2, install fbtest svn
3. run fbtest case Script section script/3918/AsyncOn.js

If you open Firebug tracing you can see the time stamps.  If you apply the steps manually, the test fails.  

I don't understand why it fails here and not on other cases.
See also Bug 595243 Expose debugMode to JSD
Whiteboard: [firebug-p1]
Firebug JS debug cannot be reliable without a fix or workaround here.
blocking2.0: --- → ?
It is possible that this is due to early breakage from bug 626743; perhaps the compartment is getting reset in some scenarios. I'll deal with that first.
blocking2.0: ? → betaN+
Depends on: 626743
Whiteboard: [firebug-p1] → [firebug-p1][hardblocker]
Assignee: nobody → sphink
Nope, my WIP patch for bug 626743 does not appear to help here.

(In reply to comment #0)
> My current code looks like:
> ...
> After returning from this code I run a setTimeout with a 5000ms delay. My
> tracing messages show the begin jsd.asyncOn, the pause for 5000ms, then the
> "now we are in the next event" message with a delta of 228ms. 
> 
> I can't explain this result.

Which part?

You wait 5000ms between them because jsd.asyncOn just sets things up so that debugging will be turned on when the next event is processed, but it doesn't do anything to queue up an event itself. I assume if you were to move the mouse or set up a timeout or something, that it would happen sooner.

As for the human-experienced time being 5sec and the delta reporting being far less, I'd say either the browser was traveling near the speed of light relative to you, or there's something very odd going on. When I was doing the test manually, though, I didn't see that -- my delta was getting reported as very large.

Does the current fbtest have the 5sec delay in it? Doesn't seem to. I guess I could try putting one in myself.

> However this bizarre result is not the only result. Sometimes the "now we are
> in" message never comes out.  Sometimes I get an XPCOM exception
> (NS_ERROR_NOT_AVAILABLE) [jsdIDebuggerService.off, sometimes I get
> (NS_ERROR_NOT_INITIALIZED) [jsdIDebuggerService.pause]. I think eventually my
> code stops trying to start jsd because it thinks it is started.

I'm seeing these too, though it's hard to untangle what's going on and when.

It makes sense, though, if the debugger never managed to activate, but you started attempting to use it anyway. That'd result in the NS_ERROR_NOT_INITIALIZED. NS_ERROR_NOT_AVAILABLE looks like it happens if you had previously turned it off and it hasn't done a GC run to clean up dead scripts. Or something like that. Looks tricky; I'll need to talk to dmandelin about it.

> I also see the XPCOM exceptions at other times, outside of the test.
> 
> I created a new test just with the script panel. This is *not* a reduced test
> case, but it is what I have:
> 
> 1. install firebug svn (to get the same tracing messages)
> 2, install fbtest svn
> 3. run fbtest case Script section script/3918/AsyncOn.js
> 
> If you open Firebug tracing you can see the time stamps.  If you apply the
> steps manually, the test fails.
> 
> I don't understand why it fails here and not on other cases.

It does seem very erratic. If I run the test through the test console multiple times, it fails most of the time but occasionally succeeds.

Will look further...
I've found one thing that legitimately seems wrong: the jsd.off() frequently fails because it has some dead scripts queued up and we're supposedly in the middle of a GC pass. We shouldn't really be able to run the JS code that calls jsd.off() in that case, so there's something wrong with the interlocking setup. I'm looking into it, but this does make me realize that this test is dependent on jsd.off() working. The comments in bug 595243 didn't sound too confident that it'd work. I'm wondering if the priority of jsd.off() is a little lower than eg bug 626909, and could potentially be postponed past beta10?

How reliant is the test suite on jsd.off() working?

I don't know if that ends up causing the other errors mentioned here.

The jsd.off() error will manifest itself as the NS_ERROR_NOT_AVAILABLE return value.
(In reply to comment #5)
> How reliant is the test suite on jsd.off() working?
> 

The real question could be: how reliant is Firebug on jsd.off() working? Answer: a lot ;-). That is how we stop debugging every page the users looks at.
Attached patch Take note of forced GC cancels (obsolete) — Splinter Review
The problem where jsd.off() returning NS_ERROR_NOT_AVAILABLE happens when you have some dead scripts, but GC callback cancels the GC run for some evil reason. JSD only remembers that it got a JSGC_BEGIN callback, and assumes that GC is still happening, so when jsdService::Off() is invoked, it barfs because you're not supposed to call it mid-GC (although it doesn't really matter if you have no dead scripts lying around, so it lets you).

This fixes the JSD bookkeeping. An alternative, broader fix would be to have GC invoke the JSGC_END callback when a JSGC_BEGIN callback returns false, cancelling the GC. That might fix other callers with similar assumptions, though it's more of a behavior change than this patch.
Attachment #505696 - Flags: review?(dmandelin)
Unfortunately, that patch does very little to help the test case.

What seems strange to me about the test case is that when it fails (or perhaps all the time?), I never see a call to jsdService::Off(). I don't know whether that's on the Firebug side or the JSD side.

Ah. Looks like it fails to turn off because it's already off; it failed to turn on first.

Ok, I'm attaching a log showing an example session where it failed to turn on. I added two bits of debugging output: first, I added a 0ms timer just after calling asyncOn in Firebug. Before its callback runs, JSD should have been turned on. If it isn't, it screams "PROBLEM SHOULD HAVE ACTIVATED!!!" You'll see that at the end of the log.

Second, I added a printout in nsXPConnect::Push that displays the "count", which I called depth, of the context stack before the push. It also says whether this is the main thread or not, because only the main thread counts. Debug mode will only be turned on when count reaches zero.

The log shows that count is not reaching zero. Could a Firebug person look at the portion of the log after the "PROBLEM" lines and before the "Push depth 0 main=1" line and tell me what was happening?

In brief, this appears to show that JSD is not turning on because the context stack doesn't drain, even though it was empty at the time asyncOn was called. Did Firebug somehow get stuff onto the stack after that? I don't know how that's possible, other than spinning a nested event loop, and that seems unlikely.

It's a little tricky to investigate, because it seems timing dependent: if I hang out on gdb breakpoints for any length of time, then it works. It's usually reproducible if I just add debugging printouts, though. I guess I should figure out how to dump out the stack or whatever it is that's being counted here.

(argh. forgot to click "submit" last night.)
Depends on: 627758
Comment on attachment 505696 [details] [diff] [review]
Take note of forced GC cancels

Transferring this to a new bug 627758 because the main jsd.asyncOn flakiness is separate.
Attachment #505696 - Attachment is obsolete: true
Attachment #505696 - Flags: review?(dmandelin)
(In reply to comment #8)
...
> The log shows that count is not reaching zero. Could a Firebug person look at
> the portion of the log after the "PROBLEM" lines and before the "Push depth 0
> main=1" line and tell me what was happening?

I guess you are tracing with just ACTIVATION. We can know more with more tracing flags set.  Please set TESTCASE so we can see where you are.  Also if you open the tracing console you can click on the tracing messages and see that part of the JavaScript stack that Firefox chooses to show us, typically up to the event handler, sometimes across xpcom boundaries.

PanelActivation.showPanel; script
   We are just about to reveal the script panel 
fbs.showEvents true fbs.showEvals true
   A preference changed we are observing it.
shouldCreateContext hasAnnotation true for http://getfirebug.com/tests/content/script/3918/AsyncJSDPage.html 
   A new web page has arrived, we have determined that Firebug should be active on the page.


> 
> In brief, this appears to show that JSD is not turning on because the context
> stack doesn't drain, even though it was empty at the time asyncOn was called.

You must mean something else here. At the time jsd.asyncOn() is called, the stack can't be empty.

> Did Firebug somehow get stuff onto the stack after that? I don't know how
> that's possible, other than spinning a nested event loop, and that seems
> unlikely.

Firebug (JavaScript) has no direct way to add stuff to a stack that is empty.

One thing I thought of: maybe the tracing is trying to use jsd to get the call stack. But it seems we use Component.utils.stack.
(In reply to comment #10)
> (In reply to comment #8)
> > In brief, this appears to show that JSD is not turning on because the context
> > stack doesn't drain, even though it was empty at the time asyncOn was called.
> 
> You must mean something else here. At the time jsd.asyncOn() is called, the
> stack can't be empty.

Sorry, I was being imprecise. I meant that just before an event was processed for which JS was invoked to call jsd.asyncOn(), the stack was empty. So after it returns, it should be empty again, unless it doesn't return for some reason -- eg, if it spins a nested event loop. But I doubt that it does.

> > Did Firebug somehow get stuff onto the stack after that? I don't know how
> > that's possible, other than spinning a nested event loop, and that seems
> > unlikely.
> 
> Firebug (JavaScript) has no direct way to add stuff to a stack that is empty.

I'm not talking about direct ways. I mean calling functions inside of which a timer expiration could be handled. (Again, a nested event loop is the obvious example, but I don't think it's that simple.)

> One thing I thought of: maybe the tracing is trying to use jsd to get the call
> stack. But it seems we use Component.utils.stack.

Hm.

Well, that was my state when I stopped. I threw the questions out hoping someone might have an obvious (to them) but miraculous (to me) insight. I'm not at all blocked; I'll start digging what's actually on the stack now. It's looking unlikely that I'll manage to fix this for beta10, though.
(In reply to comment #11)
> (In reply to comment #10)
> > (In reply to comment #8)
> > > In brief, this appears to show that JSD is not turning on because the context
> > > stack doesn't drain, even though it was empty at the time asyncOn was called.
> > 
> > You must mean something else here. At the time jsd.asyncOn() is called, the
> > stack can't be empty.
> 
> Sorry, I was being imprecise. I meant that just before an event was processed
> for which JS was invoked to call jsd.asyncOn(), the stack was empty. So after
> it returns, it should be empty again, unless it doesn't return for some reason
> -- eg, if it spins a nested event loop. But I doubt that it does.

So if I understand you, the function onDebuggerActivated() is called on an empty stack and when it returns that same stack is magically not empty?

Or is there another stack involved here (we don't what you 'context' thing is)?
jjb: ignore me for now, looks like I was getting confused by multiple threads.

The stack I'm referring to is actually a stack of contexts (no, I won't go into exactly what those are right now). And it turns out the stack wasn't at all empty when entering the JS that calls asyncOn() -- the empty stack turned out to be on a different thread.

So I have this funky XPCJSContextStack of contexts that is nonempty, and that's blocking debugging from getting turned on. I need to understand why, and what it is in the first place.
I have weakened the condition required to activate debugging: instead of requiring the XPCJSContextStack to be completely empty, I allow it to contain contexts that are either NULL or dormant.

This fixes the asyncJSD test case for me.
Attachment #506004 - Flags: review?(dmandelin)
Comment on attachment 506004 [details] [diff] [review]
Allow debugging to turn on when dormant contexts are on the XPCJSContextStack

Nit: |notRunningJS| seems a more informative name to me than |checkDebug|. Up to you, though.
Attachment #506004 - Flags: review?(dmandelin) → review+
You're right, notRunningJS is much better.

Unfortunately, this fix is not the end of things. Here's another failure mode:

 - debugging is turned on fully
 - jsd.off() is called. This disables all the callbacks and things, but only calls XPConnect's SetDebugModeWhenPossible, which sets a flag for processing when no JS is running (gDesiredDebugMode=false, but gDebugMode is still true)
 - several ::Push()es go by with running JS, so the off request stored in gDesiredDebugMode cannot be satisified.
 - debugging is requested to be activated again via jsdService::AsyncOn(). This will set gDesiredDebugMode=true.

Now, from XPConnect's standpoint, gDesiredDebugMode=gDebugMode=true, so there's nothing to be done. That means that CheckDebugMode will not be called, and it will never call ActivateDebugger(). Debugging ends up being half-on (mostly on, really, but jsd.isOn will return false.)

The problem is that jsd_xpc.cpp says turning debugging on is asynchronous, but turning it off is synchronous. nsXPConnect.cpp says both are asynchronous.

This is fixable by making XPConnect turn off debugging immediately. This will skip some scripts, because they're currently live, but any debugging hooks firing in those scripts will be ignored.

And yet... even after this fix, tests like firebug/openDisableEnableReload.js are still not working all the time. It's possible that the test is invalid (because it's keeping running JS alive across disable/enable/reload, which would never happen in actual usage), but I haven't looked closely enough at it to be sure. And it would be nice if it could be made to work anyway. Being testable is good.
Make XPConnect turn off debugging immediately, without waiting for a quiescent stack.

If any scripts are live when you turn off debugging, then they will remain in debugging mode until the next time they are recompiled. But no debugger functionality will be used, so it's just a temporary performance hit.

This fixes, somewhat inelegantly, the problem where turning debugging off then back on will end up with debugging only partly on if the stack does not quiesce in between.
Attachment #506500 - Flags: review?(dmandelin)
Honza -- I still get a failure in firebug/openEnableDisableReload.js even after these patches. I don't really understand the failure mode, though. The failure is in

"The script [panel?] should be marked on the Firebug Statusbar Icon."

which makes it seem like debugging is not getting turned back on properly still, but I notice that soon before that it checks

    FBTest.compare("false", FBTestFirebug.isPanelTabDisabled(panelName), "The "+panelName+" panel should be enabled");

and that passes. So the script panel is enabled, but it is not marked on the Firebug Statusbar Icon, whatever that means? Is that referring to the light gray/dark gray color scheme on the "Console/HTML/CSS/Script/DOM/Net" panel? Why is that an "Icon"?

Sorry, I'm not trying to be nitpicky, I'm just having trouble performing the test steps manually and checking the result to see if this is a difference between automated test mode and manual operation.
Attachment #506500 - Flags: review?(dmandelin) → review+
(In reply to comment #18)
>  So the script panel is enabled, but it is not marked on the
> Firebug Statusbar Icon, whatever that means? Is that referring to the light
> gray/dark gray color scheme on the "Console/HTML/CSS/Script/DOM/Net" panel? Why
> is that an "Icon"?

The "Firebug Statusbar Icon" is in the bar at the bottom (formerly status bar). Its a square UI element with a image of a bug: thus "Icon".

The Icon has two visual states: colored and gray. Gray means "all panels that cause big overhead are off"; colored means "at least one is one". The tooltip breaks it down.

The state of the icon is driven by our event onJSDActivate which is called by fbs.onJSDebuggingActive() in our callback onDebuggerActivated.

So the "enable" UI event is a request that fires jsd.asyncOn(); The Icon state is a response, driven the the onDebuggerActivated event. 

If the icon does not say "On" for Script panel, we did not get the event (or we have a bug). I had to change this code for asyncOn, so be sure you are on SVN head and if you are still puzzled I'll take a look. (Tracing  ACTIVATION should show the events).
http://hg.mozilla.org/tracemonkey/rev/b7e91bafdcd9
http://hg.mozilla.org/tracemonkey/rev/d54eac606dd7
Status: NEW → ASSIGNED
Whiteboard: [firebug-p1][hardblocker] → [firebug-p1][hardblocker][fixed-in-tracemonkey]
Backed out due to Windows template compilation failure.

http://hg.mozilla.org/tracemonkey/rev/8f09ff778d69
http://hg.mozilla.org/tracemonkey/rev/c52e5d68a747
Whiteboard: [firebug-p1][hardblocker][fixed-in-tracemonkey] → [firebug-p1][hardblocker]
The patch to make jsd.off() synchronous also triggers a crash on my test_callhooks mochitest, only when methodjit is enabled.

Seems like something is now getting recompiled when it shouldn't be. Either of the two patches on this bug could be at fault for that.

I really should get around to doing the jit support for gdb. A howling Jägermonkey ate my stack.
Component: JavaScript Debugging APIs → XPConnect
QA Contact: jsd → xpconnect
(In reply to comment #18)
> Honza -- I still get a failure in firebug/openEnableDisableReload.js even after
> these patches. I don't really understand the failure mode, though.

(Replying to myself) The test had not been updated for the AsyncOn change. I submitted a change to FBTest which has been applied, and the test now succeeds. I think there are probably a number of other tests with the same problem.
Sorry for the patch churn, but I have to update this patch to do the main thread detection, matching what I just did in bug 626743.

The other patch here on this bug does not change.
Attachment #506004 - Attachment is obsolete: true
Attachment #509496 - Flags: review?(dmandelin)
Attachment #509496 - Flags: review?(dmandelin) → review+
http://hg.mozilla.org/tracemonkey/rev/fa29bac3f858
http://hg.mozilla.org/tracemonkey/rev/b647ff21bfed
Whiteboard: [firebug-p1][hardblocker] → [firebug-p1][hardblocker][fixed-in-tracemonkey]
Whiteboard: [firebug-p1][hardblocker][fixed-in-tracemonkey] → [firebug-p1][hardblocker][fixed-in-tracemonkey][has patch]
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Shortly after this landed our testbot began crashing, Bug 632343. Of course I don't know if these are related.
Depends on: 632343
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: