Closed Bug 642801 Opened 14 years ago Closed 3 years ago

Firefox 4.0 Crash Report [@ SelectorMatches ] with Firebug

Categories

(Core :: JavaScript Engine, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: Honza, Unassigned)

References

Details

(Whiteboard: [firebug-p1])

Firebug Test bot is experiencing crash when running on Darwin OSX. See: http://getfirebug.com/testresults Starting since March 18: http://getfirebug.com/testresults?headerid=94e840e10d631000fb206658b8650b8d STR: 1) Install Firefox RC1, Firebug 1.7b3, http://getfirebug.com/releases/firebug/1.7X/firebug-1.7X.0b3.xpi 2) Install Firebug test harness, FBTest 1.7b14, http://getfirebug.com/releases/fbtest/1.7/fbTest-1.7b14.xpi 3) Run Firefox, open Firebug test console (Firebug menu -> Open Test Console) 4) Press "Run All" and wait for crash. Works on Windows Vista. Here is a crash report (Mac OSX Server 10.6.2, VMware Player): http://getfirebug.com/testresults?headerid=94e840e10d631000fb206658b8650b8d Honza
Whiteboard: [firebug-p1]
Why is this in the HTML: Parser component?
I found https://bugzilla.mozilla.org/show_bug.cgi?id=568460, which seems to be about the same crash and is in HTML: Parser component. Honza
(In reply to comment #2) > I found https://bugzilla.mozilla.org/show_bug.cgi?id=568460, which seems to be > about the same crash and is in HTML: Parser component. That one is about the style system running out of stack space with deep trees. The stack space has been at least historically more limited on Windows that on Mac, so crashing on Mac but not on Vista suggests that this bug has a different cause. It would help to have a stack trace and to know what the test case attempts to do.
Component: HTML: Parser → Style System (CSS)
QA Contact: parser → style-system
Sorry I pasted a wrong link in comment #0: Here is the crash report (Mac OSX Server 10.6.2, VMware Player): https://crash-stats.mozilla.com/report/index/bp-5a95d82c-687a-4e63-959c-fdbb52110318 Honza
That last shows us crashing at address 0x40 on this line: 1671 if ((kNameSpaceID_Unknown != aSelector->mNameSpace && 0x40 == 64 is the offset of mNameSpace in nsCSSSelector on a 64-bit system. Which means that aSelector is null. Which means that the mSelector of a RuleValue is null.... Which is very odd.
What's the regression range for this starting?
It should be pretty new (couple of days max, I guess) Could anybody please find the regression range? (use, str from comment #0) I have only VWware in Windows, which is pretty slow. Honza
So far I'm crashing in m-c nightly builds back to March 1.
Oh, and the Firebug tests die very very early on with fatal asserts in the JS engine in a debug build; way before they get to this crash.
OK, if I work around the js assert thing, I get this assertion twice before the crash: ###!!! ASSERTION: Want to fire mutation events, but it's not safe: '(aNode->IsNodeOfType(nsINode::eCONTENT) && static_cast<nsIContent*>(aNode)-> IsInNativeAnonymousSubtree()) || sScriptBlockerCount == sRemovableScriptBlockerCount', file /Users/bzbarsky/mozilla/vanilla/mozilla/content/base/src/nsContentUtils.cpp, line 3631 For me the aSelector in SelectorMatches just points to garbage. In fact, everything in |value| in ContentEnumFunc is garbage: the address mRule points to has a PermanentAtomImpl vtable at it, and mIndex is 563009488
That mutation event assert happens with this stack: #1 0x00623f8a in nsContentUtils::HasMutationListeners (aNode=0x215d40c0, aType=32, aTargetForSubtreeModified=0x215d40c0) at /Users/bzbarsky/mozilla/vanilla/mozilla/content/base/src/nsContentUtils.cpp:3627 .... #5 0x00ccb7d1 in nsXULElement::SetAttribute (this=0x215d40c0, name=@0xbfff88ec, value=@0xbfff88dc) at nsXULElement.h:561 ... #8 0x04493fe3 in js::Interpret () at /Users/bzbarsky/mozilla/vanilla/mozilla/js/src/jsinterp.cpp:4799 ... #22 0x01291513 in jsdService::ActivateDebugger (this=0x2159da10, rt=0x50f2c00) at /Users/bzbarsky/mozilla/vanilla/mozilla/js/jsd/jsd_xpc.cpp:2620 #23 0x00f3e1ac in nsXPConnect::CheckForDebugMode () at /Users/bzbarsky/mozilla/vanilla/mozilla/js/src/xpconnect/src/nsXPConnect.cpp:2591 #24 0x00f34b02 in nsXPConnect::Push (this=0x4f2ca10, cx=0x0) at /Users/bzbarsky/mozilla/vanilla/mozilla/js/src/xpconnect/src/nsXPConnect.cpp:2651 #25 0x00625e7b in nsCxPusher::DoPush (this=0xbfffb0d8, cx=0x0) at /Users/bzbarsky/mozilla/vanilla/mozilla/content/base/src/nsContentUtils.cpp:3001 #26 0x00625ee5 in nsCxPusher::PushNull (this=0xbfffb0d8) at /Users/bzbarsky/mozilla/vanilla/mozilla/content/base/src/nsContentUtils.cpp:3017 #27 0x007e27c3 in nsTextEditorState::GetValue (this=0x21a3c3b0, aValue=@0xbfffb144, aIgnoreWrap=1) at /Users/bzbarsky/mozilla/vanilla/mozilla/content/html/content/src/nsTextEditorState.cpp:1710 #28 0x00819c76 in nsHTMLInputElement::IntrinsicState (this=0x21a3c230) at /Users/bzbarsky/mozilla/vanilla/mozilla/content/html/content/src/nsHTMLInputElement.cpp:3375 ... #31 0x004c8da6 in RuleProcessorData::IsLink (this=0xbfffb5ac) at /Users/bzbarsky/mozilla/vanilla/mozilla/layout/style/nsCSSRuleProcessor.cpp:1263 So the upshot is that we're in style resolution, we try to get the intrinsic state of the element, it's got an editor so we do the PushNull thing, that calls nsXPConnect::Push, which goes ahead and starts jsd, since gDesiredDebugMode is 1 while gDebugMode is 0. And that calls the onDebuggerActivated callback, which looks like it goes and mutates the DOM, and then you lose. Calling out into arbitrary JS from inside a js context stack push seems like a _really_ bad thing. Especially if null is being pushed. None of the callers expect that to happen. Note that none of the Gecko side code here has changed since January or so.
Component: Style System (CSS) → JavaScript Debugging/Profiling APIs
QA Contact: style-system → jsd
Firebug may be able to work around this by doing nothing inside onDebuggerActivated other than a setTimeout call to do the real work...
The fundamental change that added the "start the debugger when Push() is called" stuff that needs to not happen imo is bug 595243 back in October.
Blocks: 595243
I wonder if another way of papering over this issue be to check for an empty stack after Pop(). It used to, but bug 595243 comment 40 recommended only doing it in Push() because Push/Pop always happen consecutively. That's true for every case except the initial one, which is the Push() that caused debugging to be requested. So if that Push()'s Pop() checked the stack after it popped, we *might* be able to turn on debugging immediately in enough cases to hide this problem. It's ugly and it may not work, but we're still hoping that the type inference branch will be able to make debugger activation synchronous again. Alternatively, could we do bz's comment 12 automatically? As in, do the gDesiredDebugMode != gDebugMode test, and if it passes, queue up an event to do the actual activation? I guess it would need to jump the queue (as in, be inserted at the beginning rather than the end) to prevent it from getting processed in somebody's nested event loop with new JS on the stack. It seems like for right now putting the setTimeout workaround into Firebug is the most workable.
What I don't understand is what this code is doing. _Why_ is it starting jsd when a JSContext is pushed on the stack? Especially when _null_ is pushed?
Turning on debug mode requires recompiling all JM-compiled scripts, and you can't do that if there are frames on the JS stack that will be returned into. So it finds a time when there is no JS running, and then does the switch by throwing out all JM-compiled scripts so they'll get recompiled into debug mode when they are next invoked. So the point is not that something getting pushed onto the stack is a good time, it's that just *before* something gets pushed onto the stack is a good time to check whether the stack is empty.
OK, I see. 1) That is NOT a good time to call into arbitrary debugger code! Can that call happen async or something? 2) Why are we doing this when null is being pushed?
(In reply to comment #17) > OK, I see. > > 1) That is NOT a good time to call into arbitrary debugger code! Can that call > happen async or something? From the debugger's point of view, this is already async, so making it "more" async won't break anything. You'd probably know how to do this better than I would, though. Is there any issue with putting it onto the event queue from XPConnect? And is there a way to put it at the head of the queue, to minimize cases where the event gets handled by a nested event loop? Or is that even a concern? > 2) Why are we doing this when null is being pushed? Probably because none of us who touched this code really understands the XPConnect stack. What does it mean for a null to be pushed, and how is that different from something else being pushed?
> Is there any issue with putting it onto the event queue from XPConnect? No. By the way, note that as far as I can tell the current code is broken in another way. Push() can happen on any thread, and it looks like it will call into the debugger code on that same thread. Presumably you want to just post an event to the main thread here to do at least the onDebuggerActivated call. As things stand, onDebuggerActivated can happen on arbitrary threads, which is _really_ bad news given what Firebug appears to do in that callback. It's probably even bad news for just calling setTimeout, since that involves messing with a Window's internal state... > is there a way to put it at the head of the queue Not really. I'm not sure whether this is a concern. The debugger part of this code I don't really understand; if you do I'd love either a pointer to documentation about it or an explanation of what it's trying to do. > Probably because none of us who touched this code really understands the > XPConnect stack. This is what we have code review for! And yes, the fact that no one who understands this code was involved in the reviews is a problem. :( When null is being pushed, that means some C++ code is about to do some operation that it wants to not be subject to security checks, so it wants to make it look like there is no JS currently running, no matter whether some is actually running or not. The callstack above, for example, is getting the value of a form control for internal use, so it pushes null to make sure it can get the value. There are very definitely callsites that push null that can't handle arbitrary JS running (the one above is one of them, but there are plenty of others). When something non-null is being pushed what that means ... depends. Sometimes it's done to make sure that a particular principal will show up as the "subject principal" for security checks. Sometimes it's done because we're about to run some script on the context being pushed. If stuff happens synchronously here, we'd need to audit the callers to see whether they can handle script running. But it sounds like we should be able to make this async.
(In reply to comment #19) > By the way, note that as far as I can tell the current code is broken in > another way. Push() can happen on any thread, and it looks like it will call > into the debugger code on that same thread. Presumably you want to just post > an event to the main thread here to do at least the onDebuggerActivated call. It'll only happen on the main thread: if (gDebugMode != gDesiredDebugMode && NS_IsMainThread()) { ^^^^^^^^^^^^^^^^^ > > is there a way to put it at the head of the queue > > Not really. I'm not sure whether this is a concern. The debugger part of this > code I don't really understand; if you do I'd love either a pointer to > documentation about it or an explanation of what it's trying to do. Which part? The firebug part (the main but not quite only user of this interface)? Or the JSD part? Firebug sets up some filters to control what it'll get callbacks for, and registers some debug hook callbacks, and fires off some of its own (internal) activation events that probably do all kinds of stuff. Probably including modifying the DOM to include the Firebug UI. jjb or Honza could tell you more, but I'm guessing that if there's something dangerous that could be done in a JS callback, it probably does it. The JSD part I am familiar with, but I'm not sure what you want to know. Debugging works by setting some debug hooks in the JS engine that get invoked when various events happen, eg a breakpoint is reached. Integrated debuggers like Firebug spin a nested event loop inside of that callback to allow the UI to work while you're sitting on a breakpoint with the triggering code on the JS stack. Resuming from a breakpoint means terminating the nested event loop, then returning from the callback to the code where the callback was hit. (The event loop itself is done within the C++ JSD code for jsdIDebuggerService.enterNestedEventLoop(), checking on every iteration to see if an "all done" flag was set by jsdIDebuggerService.exitNestedLoop().) > When null is being pushed, that means some C++ code is about to do some > operation that it wants to not be subject to security checks, so it wants to > make it look like there is no JS currently running, no matter whether some is > actually running or not. The debugger activation check will find any frames that have been set aside (with JS_SaveFrameChain), so this "hide the running JS" part shouldn't be a problem from the limited standpoint of "are there any compiled frames we might return to?" > The callstack above, for example, is getting the > value of a form control for internal use, so it pushes null to make sure it can > get the value. There are very definitely callsites that push null that can't > handle arbitrary JS running (the one above is one of them, but there are plenty > of others). So it seems to me like the logic for detecting running JS code is working fine. It's just that "no JS running" is insufficient for testing whether executing arbitrary JS is safe. Theoretically, we could split the debugger activation from the activation callback, doing the former immediately and posting the latter to the event queue. No JS running => safe to activate the debugger. Processing an event => safe to invoke the callback. But other than the specter of nested event loops, postponing both together is simpler. (If you split them up, then you'd end up doing things in a state when debugging is activated but the callback has not yet been invoked, which probably isn't horrible but seems messy/unsafe.) Note that processing an event does NOT imply that there is no JS running, because the debugger uses nested event loops quite a bit. Thinking about it right now, it seems like you'd never be in a debugger's nested event loop and then want to activate debugging again, but I think I remember hitting that case or something close to it when trying all this out...? > When something non-null is being pushed what that means ... depends. Sometimes > it's done to make sure that a particular principal will show up as the "subject > principal" for security checks. Sometimes it's done because we're about to run > some script on the context being pushed. If stuff happens synchronously here, > we'd need to audit the callers to see whether they can handle script running. > But it sounds like we should be able to make this async. Ok, this is making more sense to me now.
> if (gDebugMode != gDesiredDebugMode && NS_IsMainThread()) { Ah, good. That'll teach me to comment from memory! > The JSD part I am familiar with, but I'm not sure what you want to know. Well, one thing I want to know is when jsdService::ActivateDebugger needs to be called, exactly. Clearly we don't do it on every Push(), right? What do those booleans this code is testing really mean? > so this "hide the running JS" part shouldn't be a problem from the limited > standpoint of "are there any compiled frames we might return to?" Yes. If null is being pushed, there are no plans to run JS until another context is pushed. > It's just that "no JS running" is insufficient for testing whether executing > arbitrary JS is safe. Very much so. In fact, it's sort of the other way around... If JS is already running (or is about to run), then running JS is probably OK. If not, then you have to worry. > Note that processing an event does NOT imply that there is no JS running, Yep. In fact, you don't even have to be in the debugger; you could be in a sync XHR or under an alert() call or whatever.
Could we call CheckForDebugMode using a script runner?
According to some comments, there are also crashes on Windows with Firebug. It is #101 top crasher in 4.0 over the last 3 days.
OS: Mac OS X → All
Summary: Firefox 4.0 Crash Report [@ SelectorMatches ] (Mac OSX) → Firefox 4.0 Crash Report [@ SelectorMatches ] with Firebug
Component: JavaScript Debugging/Profiling APIs → JavaScript Engine

Firebug is no longer supported

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.