Extremely frequent Linux32 debug mochitest-browser-chrome JS crashes during debugger tests

RESOLVED FIXED in mozilla20

Status

()

defect
--
blocker
RESOLVED FIXED
7 years ago
7 years ago

People

(Reporter: emorley, Assigned: past)

Tracking

Trunk
mozilla20
x86
Linux
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 6 obsolete attachments)

These have kept the tree closed most of the day (between this and bug 821685 it's been a bit of a waste-of-time day unfortunately).

These crashes seem to occur primarily on Linux32 debug, but are on some other platforms occasionally. Unfortunately this is our most coalesced (and resource-constrained) platform, so it has taken hours of retriggers to narrow this down. It is also green on Linux32 a small percentage of the time, which has screwed with finding the regression range :-(

https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=Rev3%20Fedora%2012%20mozilla-inbound%20debug%20test%20mochitest-browser-chrome&rev=b80437be0e70
(and press down once or twice)

Once per window private browsing become a candidate, looking at m-c shows that perhaps we got a few lucky greens there (otherwise the cause would have been much more clear):
https://tbpl.mozilla.org/?rev=b11065872128&jobname=Rev3%20Fedora%2012%20mozilla-central%20debug%20test%20mochitest-browser-chrome

I've backed out bug 818732 for now, but pending the m-c retriggers, we may also need to back out the 7 bug 769288 changesets too.

What's also pretty frustrating is that these crashes occurred on Birch too, but these changes were landed on mozilla-central regardless.

eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=17919454&tree=Birch
and also https://tbpl.mozilla.org/php/getParsedLog.php?id=17914544&tree=Birch (which was mis-starred as bug 821483 comment 1).
(In reply to Ed Morley [UTC+0; email:edmorley@moco] from comment #0)
> pending the m-c retriggers, we may
> also need to back out the 7 bug 769288 changesets too.

Retriggers taking an age, can't justify keeping mozilla-central and mozilla-inbound closed any longer, so backing out bug 769288 as well for now. If the retriggers come back green, it can just be relanded.

https://hg.mozilla.org/integration/mozilla-inbound/rev/f255406bb4aa
Waiting for a bit more green on both trees and will then reopen.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Fixed by the backout
Assignee: general → edmorley.bz
Technically I just disabled the test, didn't fix it.  :)

https://hg.mozilla.org/mozilla-central/rev/72cc10ffa8e2
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I was treating this more as a "tracking progress getting the tree reopened" (both for something to point at in tree status, and for keeping track time not spent on Q4 goals), so in that it's fixed. (We have umpteen other bugs filed on that test). But I'm not too fussed either way.
Assignee: edmorley.bz → general
For my own information, the backout was
https://hg.mozilla.org/mozilla-central/rev/72cc10ffa8e2

Reverting this should make the crashes reproduce.
Assignee: general → wmccloskey
It turns out these are all timeouts, not crashes. It only looks like a crash because when the test times out we intentionally crash to get a stack trace in the tinderbox long.

I looked at the test a little bit to see why it's taking so long. Here's a snippet of the JS code that's getting called:

  addDebuggee: function TA_addDebuggee(aGlobal) {
    try {
      this.dbg.addDebuggee(aGlobal);
    } catch (e) {
      // Ignore attempts to add the debugger's compartment as a debuggee.
      dumpn("Ignoring request to add the debugger's compartment as a debuggee");
    }
  },
  // ...
    findGlobals: function CDA_findGlobals() {
      // Fetch the list of globals from the debugger.
      for (let g of this.dbg.findAllGlobals()) {
        this.addDebuggee(g);
      }
    },

Someone is calling findGlobals, and it is calling addDebuggee on every global. Each call triggers a compartment GC, since we're transitioning the compartment to debug mode. When I ran the test, there were 169 compartments, so that's 169 compartment GCs. What makes this worse is that I added some DEBUG code a while back that looks for compartment mismatches on every GC. It traverses the entire heap to do this, even if it's only a compartment GC. So this one findGlobals call causes us to iterate over the entire Firefox heap 169 times.

It may be that this is not the only reason for the timeout, but it's definitely not helping. I think the right fix is to have a single call like addDebuggeesForAllGlobals. It would add them all in one go and then do a single non-compartmental GC.

Debugger dudes, what do you think?
I took a stab at implementing addDebuggeesForAllGlobals and I think this should have worked, but unfortunately it asserts shortly after the call returns:

Assertion failure: !cx->isExceptionPending(), at /Users/past/src/fx-team/js/src/jscntxtinlines.h:375

Someone who understands this code better than me may be able to fix it. Note, that this call is only used in the Browser Debugger, so to test the patch one just needs to start it from the Web Developer menu. Alternatively, running the chrome-debugging mochitest works, too.
We probably need to skip over the debugger's compartment itself. That might be why it's throwing an exception.
Doh, I thought about that, but I mistakenly assumed that addDebuggeeGlobal would just return false in that case. Adding a check for the debugger's compartment fixes the assertion and chrome debugging now works as before, but the test now fails with bug 707429.

More importantly though, from reading the code some more I think this patch doesn't actually do a single GC as you recommended, since each call to Debugger::addDebuggeeGlobal would call JSCompartment::addDebuggee and that eventually calls JSCompartment::updateForDebugMode, which schedules a GC.
Attachment #692760 - Attachment is obsolete: true
I figured out how to make it work and the browser debugger now attaches instantly. If we do the same optimization when removing all debuggees, the test should become pretty snappy.
Attachment #693291 - Attachment is obsolete: true
Attachment #693386 - Flags: review?(wmccloskey)
Attachment #693386 - Flags: review?(jimb)
I thought this would work for removeAllDebuggees, but sadly it asserts:

Assertion failure: !!script_->compartment()->debugMode() == !!originalDebugMode_, at /Users/past/src/fx-team/js/src/jsanalyze.cpp:2132
Comment on attachment 693386 [details] [diff] [review]
Introduce a new addDebuggeesForAllGlobals call v3

Review of attachment 693386 [details] [diff] [review]:
-----------------------------------------------------------------

It would be better to use js::AutoDebugModeGC instead of the nogc flag. The basic idea of this object is that you queue up compartments to GC during its lifetime, and then when its destructed the GC actually happens to those compartments. So you should be able to add an AutoDebugModeGC argument to addDebuggee and addDebuggeeGlobal and I think it should work.

Also, PrepareForFullGC doesn't actually do a GC. However, you shouldn't need that code at all if you use AutoDebugModeGC.
Attachment #693386 - Flags: review?(wmccloskey)
(In reply to Panos Astithas [:past] from comment #12)
> Created attachment 693472 [details] [diff] [review]
> Do a single GC in removeAllDebuggees
> 
> I thought this would work for removeAllDebuggees, but sadly it asserts:
> 
> Assertion failure: !!script_->compartment()->debugMode() ==
> !!originalDebugMode_, at /Users/past/src/fx-team/js/src/jsanalyze.cpp:2132

This assertion might be happening because, as in the previous patch, no GC is actually being done. I think the same AutoDebugModeGC stuff should work here too.
Assignee: wmccloskey → past
I think the real fix for this problem is to provide a way to identify globals, for use by chrome debugging, that isn't based on explicitly naming the globals that are debuggees. If instead, we could put off considering whether a global is a debuggee until some event of interest occurs in that global's scope, then there'd be no need to scan the heap creating possibly new references to globals, the way findAllGlobals does. Instead, we'd just mark globals that need special treatment as we create them, and let Debugger instances talk about the marks of the globals they want to debug.

This would replace the findAllGlobals and onNewGlobalObject facilities.
Thank you Bill, your comments were very helpful for me to understand how this code works. With your suggested changes, both adding all globals as debuggees and removing all debuggees are almost instantaneous, which gets the chrome-debugging test to finish in about 30 seconds in my laptop. I combined both patches into one and removed the increased timeout from the test, since it now runs blazingly fast. If I haven't done something dumb like skip GC altogether again, then this change makes a huge difference.

Jim, I don't think I'm qualified to implement the extensive changes that you described, but how do you feel about something like this for now?
Attachment #693386 - Attachment is obsolete: true
Attachment #693472 - Attachment is obsolete: true
Attachment #693386 - Flags: review?(jimb)
Attachment #694030 - Flags: review?(wmccloskey)
Attachment #694030 - Flags: review?(jimb)
Comment on attachment 694030 [details] [diff] [review]
Introduce a new addDebuggeesForAllGlobals call v4

Review of attachment 694030 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jscompartment.cpp
@@ +892,5 @@
>      debugModeBits |= DebugFromJS;
>      if (!wasEnabled) {
> +        if (dmgc == NULL) {
> +            AutoDebugModeGC newDmgc(cx->runtime);
> +            dmgc = &newDmgc;

Unfortunately this is unsafe in C++. newDmgc will be destroyed at the end of this block, but dmgc will continue to point to it. Any use of dmgc after that will cause undefined behavior (possibly a crash or a security issue).

I would recommend creating two versions of addDebuggee. One of them would be passed an |AutoDebugModeGC &| parameter and it would look pretty much like this, minus the |dmgc == NULL| check. The other one would look like this:

bool
JSCompartment::addDebuggee(JSContext *cx, js::GlobalObject *global)
{
    AutoDebugModeGC dmgc(cx->runtime);
    addDebuggee(cx, global, dmgc);
}
Attachment #694030 - Flags: review?(wmccloskey)
Added twin versions for addDebuggee, removeDebuggee and friends as you suggested. Still seems to work fine (and fast) in all my testing.
Attachment #694030 - Attachment is obsolete: true
Attachment #694030 - Flags: review?(jimb)
Attachment #694396 - Flags: review?(wmccloskey)
Attachment #694396 - Flags: review?(jimb)
Comment on attachment 694396 [details] [diff] [review]
Introduce a new addDebuggeesForAllGlobals call v5

Review of attachment 694396 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good; just some minor changes I'd like to see.

::: js/src/jit-test/tests/debug/Debugger-debuggees-20.js
@@ +10,5 @@
> +
> +var g1w = dbg.addDebuggee(g1);
> +assertEq(dbg.addDebuggeesForAllGlobals(), undefined);
> +
> +var a = dbg.getDebuggees();

We're about to remove getDebuggees, for a bunch of reasons. Could you rewrite this test in terms of 'hasDebuggee'? (I think it will actually be shorter.)

::: js/src/vm/Debugger.cpp
@@ +2136,5 @@
> +void
> +Debugger::removeDebuggeeGlobal(FreeOp *fop, GlobalObject *global,
> +                               GlobalObjectSet::Enum *compartmentEnum,
> +                               GlobalObjectSet::Enum *debugEnum,
> +                               AutoDebugModeGC &dmgc)

The other functions consistently place the AutoDebugModeGC argument after the global being added/removed, and before any "clear-me" Enum pointers. Could we do the same here?

@@ +2577,5 @@
>  };
>  
>  JSFunctionSpec Debugger::methods[] = {
>      JS_FN("addDebuggee", Debugger::addDebuggee, 1, 0),
> +    JS_FN("addDebuggeesForAllGlobals", Debugger::addDebuggeesForAllGlobals, 0, 0),

I understand that this name is meant to be parsed as:

  'addDebuggee' for all globals

but I always parse it as:

  add debuggees for all globals

which is weird because it makes it seem like a 'debuggee' is something associated with a global, not something that a global can be.

Could we rename it to 'addAllGlobalsAsDebuggees'?
Attachment #694396 - Flags: review?(jimb) → review+
Blocks: 825960
I think this means we have no need for findAllGlobals; filed as follow-up bug 825960.
Updated patch per review comments. I'll push to try after the tree is open again.
Attachment #694396 - Attachment is obsolete: true
Attachment #694396 - Flags: review?(wmccloskey)
Try was green. Landed in fx-team:
https://hg.mozilla.org/integration/fx-team/rev/2fce7807dc84
Whiteboard: [fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/2fce7807dc84
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Whiteboard: [fixed-in-fx-team]
Target Milestone: --- → mozilla20
You need to log in before you can comment on or make changes to this bug.