Closed Bug 417245 Opened 16 years ago Closed 6 years ago

HTML Validator extension called DOM from the wrong thread [@ JS_RestoreFrameChain - XPCJSContextStack::Pop] crash

Categories

(Firefox :: Extension Compatibility, defect, P2)

x86
Windows XP
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: philip.chee, Assigned: mgueury)

Details

(Keywords: crash)

Crash Data

** This might be a duplicate of Bug 398076 but I'm filing this separately for the time being.

Usually happens when I middle click on a link to open it in a new tab.

http://crash-stats.mozilla.com/report/index/a356c0cf-d4af-11dc-8cdb-001a4bd43ef6?da
http://crash-stats.mozilla.com/report/index/a8d4c9ee-cc06-11dc-b2c7-001a4bd46e84?dat
http://crash-stats.mozilla.com/report/index/f2ff5d54-c9ae-11dc-8213-001a4bd43ef6?date=2008-01-23-12
http://crash-stats.mozilla.com/report/index/ae6d3f75-c9ae-11dc-9698-001a4bd43e5c?date=2008-01-23-12

Crash thread:

9|0|js3250.dll|JS_RestoreFrameChain|||0xc
9|1|xpc3250.dll|XPCJSContextStack::Pop(JSContext * *)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\js\src\xpconnect\src\xpcthreadcontext.cpp|113|0x7
9|2|xpc3250.dll|nsXPCThreadJSContextStackImpl::Pop(JSContext * *)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\js\src\xpconnect\src\xpcthreadcontext.cpp|380|0x9
9|3|gklayout.dll|nsCxPusher::Pop()|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\base\src\nscontentutils.cpp|2604|0xa
9|4|gklayout.dll|nsCxPusher::~nsCxPusher()|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\base\src\nscontentutils.cpp|2492|0x4
9|5|gklayout.dll|nsEventListenerManager::HandleEventSubType(nsListenerStruct *,nsIDOMEventListener *,nsIDOMEvent *,nsISupports *,unsigned int)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\events\src\nseventlistenermanager.cpp|1084|0x7
9|6|gklayout.dll|nsEventListenerManager::HandleEvent(nsPresContext *,nsEvent *,nsIDOMEvent * *,nsISupports *,unsigned int,nsEventStatus *)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\events\src\nseventlistenermanager.cpp|1183|0x11
9|7|gklayout.dll|nsEventTargetChainItem::HandleEvent(nsEventChainPostVisitor &,unsigned int)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\events\src\nseventdispatcher.cpp|206|0x1c
9|8|gklayout.dll|nsEventTargetChainItem::HandleEventTargetChain(nsEventChainPostVisitor &,unsigned int,nsDispatchingCallback *)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\events\src\nseventdispatcher.cpp|287|0x10
9|9|gklayout.dll|nsEventDispatcher::Dispatch(nsISupports *,nsPresContext *,nsEvent *,nsIDOMEvent *,nsEventStatus *,nsDispatchingCallback *)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\events\src\nseventdispatcher.cpp|479|0x10
9|10|gklayout.dll|nsGenericElement::SetAttrAndNotify(int,nsIAtom *,nsIAtom *,nsAString_internal const &,nsAttrValue &,int,int,int)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\base\src\nsgenericelement.cpp|3723|0x10
9|11|gklayout.dll|nsGenericElement::SetAttr(int,nsIAtom *,nsIAtom *,nsAString_internal const &,int)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\base\src\nsgenericelement.cpp|3629|0x1e
9|12|gklayout.dll|nsGenericElement::SetAttribute(nsAString_internal const &,nsAString_internal const &)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\base\src\nsgenericelement.cpp|1525|0x35
9|13|xpcom_core.dll|NS_InvokeByIndex_P|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\xpcom\reflect\xptcall\src\md\win32\xptcinvoke.cpp|101|0x2
9|14|xpc3250.dll|AutoJSSuspendRequest::SuspendRequest()|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\js\src\xpconnect\src\xpcprivate.h|3479|0x8
9|15|xpcom_core.dll|PL_DHashClearEntryStub|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\objdir\xpcom\build\pldhash.c|150|0x11
9|16|caps.dll|nsScriptSecurityManager::doGetObjectPrincipal(JSContext *,JSObject *)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\caps\src\nsscriptsecuritymanager.cpp|2411|0x8
Flags: blocking1.9?
This is also a Thunderbird crashes - the frequency of crashes significantly increased since 2008020700 - making this a topcrash for thunderbird trunk.  

TB windows and mac crashes (Bug 398076) were almost nonexistent prior to 2008012100 (the approximate date when the SM crashes begin).  And only one prior to that date (in the 3 month period) is bp-1e2449ab-c2e5-11dc-bbb0-001a4bd46e84
Severity: major → critical
Keywords: topcrash
Summary: crash JS_RestoreFrameChain (called from XPCJSContextStack::Pop) → crash [@ JS_RestoreFrameChain - XPCJSContextStack::Pop]
Flags: tracking1.9? → blocking1.9?
Whiteboard: [Thunderbird topcrash]
Not sure how this can happen, but w/o a reproducable testcase or more data on when and how this happens it could be hard to figure out why this is happening :(

Blocking on at least collecting more data and studying more code around here to see if anything sticks out.
Assignee: nobody → jonas
Flags: blocking1.9? → blocking1.9+
Priority: -- → P1
bp-9912c4ab-e772-11dc-8113-001a4bd43ef6 has similar stack up to chee's frame 9 (but missing frame 2). The crash's comment is "Tried to connect to a (known) IMAP server whose certifcate is expired IMPAR = www.grappa.univ-lille3.fr"


Only one other crash (well, two, but same person) in the last 3 weeks with comments is bp-2dfccf4a-e7a8-11dc-92ed-001a4bd43e5c with comments "Came back to TB after a while and selected an imap message"
Using  Beta4 RC1.
After open some sites FF has been crashed with no errors, nothing at all, like disappeared. Then i started with session restore and it happens again after a while, and again and again. I closed http://www.cyberstyle.ru/filter/tags/Mozilla  and it crashed again. Than I closed google search and now it`s ok. I don`t know where is a problem, or sites or FF, maybe next researches do some help.
Priority: P1 → P2
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
sorry. your crash is caused by tidy.

the first url has:
PlainOldFavorites.dll  	 	 	
esent.dll 	5.1.2600.2780 	D5EC998E24EF4BFF8D837851E47028B82 	esent.pdb
nstidy.dll 	0.0.0.0 		nstidy.dll

the other three only have nstidy.
Status: RESOLVED → REOPENED
Component: XPConnect → Extension Compatibility
Flags: blocking1.9+
Product: Core → Firefox
Resolution: DUPLICATE → ---
Summary: crash [@ JS_RestoreFrameChain - XPCJSContextStack::Pop] → nstidy called dom from the wrong thread [@ JS_RestoreFrameChain - XPCJSContextStack::Pop]
Whiteboard: [Thunderbird topcrash]
hey, I'm not quite sure what you're doing (and if I can avoid reading your code, I'd like to), but basically you're touching a DOM object from some thread that isn't the main thread. that's unfortunately quite illegal.

you can try using a timer (DOM or XPCOM), if the DOM timer is setup from the main thread (the only legal place), it'll call you back there. if you use an XPCOM timer, you may be able to ask it to call back on the main thread. for passing data, you can use XPCOM components of most flavors, just don't try to *sync* proxy JS components at this time (you'll crash w/ a different unhelpful stack).
Assignee: jonas → mgueury
Status: REOPENED → NEW
QA Contact: xpconnect → extension.compatibility
Hi,

I am not sure I understand the problem. The bug reports issue in Thunderbird, while the Html Validator extension works only in Firefox or in the browser of Seamonkey. I see that all crash reports are happening in Seamonkey 2.0a1pre. I do not follow Seamonkey, but I suppose it is a new alpha version.

You got a question about thread. Here is how it works.
------------------------------------------------------
There is not start of external thread in the extension. (by default, even if there is a hidden preference, that is nowhere documented that allows it, it was done for testing and I am sure nobody uses it. Except if somebody spotted it in the code).

It works like this:
After a page is loaded, there is an event that is fired by Firefox, the name of the trigger has changed with the time and the Firefox versions. 

This trigger does this. 
1) it sets a flag (semaphore)
2) calls the extension
3) reset the flags

If another thread calls the trigger before the 1rst call has finished. The trigger does this
1) put the event info in a queue.

When the 1rst trigger has finished, the queue is not empty, it treats the other
events that appeared. The goal was to avoid concurrency in the extension. 

So, in short, I do not create threads. I am not aware of recent crash caused by the extension. It was not reported since about 2 years. The only issue I know os a bug about hanging present in Firefox 2 that was solved in Firefox 3 beta 3. 

But it is maybe a new issue, it can happen. If I can do something else to help, please tell.

Thanks,

Marc
> The bug reports issue in Thunderbird,
> while the Html Validator extension works only in Firefox or in the browser of
> Seamonkey.

The comments here referencing Thunderbird are a red herring and belong to some other bug.
sadly i can't think of any way to get from this stack trace to the culprit most likely it's dying as soon as it tries to call any method on any object that came from your chrome.

philip: i don't suppose you'd be interested in learning about less pretty side of chrome-js / xpconnect and searching to see if you can spot the item that's called from the wrong thread?

typically the callback is a dnslookup response. but there can be others, the easiest thing to do is catalog all transitions from nstidy into gecko and figure out what their contract says about threading.
Sadly I am in the middle of packing and flying off on a holiday tomorrow.

Also I looked at the threads in the raw stack dump and (besides not knowing what all those fields mean) I can't see what and where tidy is in relation to those threads at all. Is there a Dummies guide somewhere on MDC?
nope, but I'll write the howto hunt here if you promise to move it to mdc :)

I'm not in a hurry (it's not my problem).

as to the first bit (why tidy):
the logic is:
logic-a:
   most gecko code doesn't break the following rule:
 rule-b:
   "don't call dom from any thread other than the main thread(0)"
logic-c
   if there's code that isn't from gecko and a crash that
   clearly breaks rule-b, then it's probably that code's fault
logic-d
   the module list lets you find such bits
logic-e
   code known to be buggy is more likely to have bugs i care about than other uninspected code (this doesn't mean that the other code can't be buggy, it just means that examining it first makes more sense)

anyway, if you want to figure out what those other libraries are from comment 6, that'd be great. however I'm going to assume we have the right culprit. it doesn't matter to me. if you do manage to clear nstidy [which can also be done by disabling nstidy and triggering the same crash, of course], then please review the other bugs to verify that you didn't manage to hit one of the other known instances, and then try disabling things until you find a single extension which when enabled does trigger it [standard approach to finding culprits], once you've done this, everything else described below will work equally well [replace sources for nstidy with sources for identified culprit].

tools recommended for this hunt:
http://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg
at least the js code that nstidy uses and either the interfaces it uses or xpt_dump (binary not shipped, it should be built by most configurations, but at least --enable-debug) of the xpt files it ships.

sources for nstidy (not strictly necessary, but they'll make your life much easier)

approaches available:
live crash analysis (uses windbg):
crash again (make a .dump - something like .dump /maipwd /u /ba /c "comment" file - just to make sure you can get back later) and then jump to the deepest frame containing a JSContext* on the crashing thread, if you were using the crash from comment 0, that'd be:
9|16|caps.dll|nsScriptSecurityManager::doGetObjectPrincipal(JSContext
*,JSObject *)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\caps\src\nsscriptsecuritymanager.cpp|2411|0x8

you're going to want to look through
cx->fp->script
        script->fileName
        script->lineNumber
    fp->down
        down->script
        down->down

fileName should give you enough, you may have to chase down links first. lineNumber will only be the lineNumber for a function, you have to use
    fp->pc
        script->main
and some math plus jsshell (cd "mozilla/js/src" && cvs up -d config && make -f Makefile.ref && WINNT5.1_DBG.OBJ\js)
in jsshell you'll need to load the script that crashed (create stubs if necessary, a relatively good start is http://webwizardry.net/~timeless/windowStubs.js ):

js> load("windowStubs.js")
js> load("evil.js")
js> dissrc(Factory.prototype.createInstance)
;------------------------- 1726:   if (outer)
00000:1726  getarg 0
00003:1726  ifeq 16 (13)
;------------------------- 1727:    throw Components.results.NS_ERROR_NO_AGGREGATION;
...
;------------------------- 1728:    return (new this.component.constructor).QueryInterface(iid);
00016:1728  this

the numbers you're interested in are the numbers before the colons (0000,0003) and they should be vaguely compatible w/ (pc) - (main) *[you'd have to check w/ your local js expert to find out if main is really the right thing]. The numbers after the colons are line numbers, in my case to evil.js.

 createInstance: function Factory_createInstance(outer, iid) {
  if (outer) /* line 1726 */
   throw Components.results.NS_ERROR_NO_AGGREGATION;
   return (new this.component.constructor).QueryInterface(iid);
 }, 

note that
0000:getarg is [outer]
0003:ifeq is |if (outer == 0)|
the 16/13 should be absolute and relative values from the script/pc (0/3) respectively

js> quit()

the other approach is to read through the chrome js code and find each place where it takes a js object/function and passes it to an xpcom interface, then read each xpcom interface to figure out what thread will call the object.

an example of known bugs is
nsIDNSService.asyncResolve

  61      * @param aListenerTarget
  62      *        optional parameter (may be null).  if non-null, this parameter
  63      *        specifies the nsIEventTarget of the thread on which the
  64      *        listener's onLookupComplete should be called.  however, if this
  65      *        parameter is null, then onLookupComplete will be called on an
  66      *        unspecified thread (possibly recursively).

http://mxr.mozilla.org/mozilla/source/netwerk/dns/public/nsIDNSService.idl

Note the gem on line 66 "unspecified thread". Obviously such a method is *not* safe to be called with that flavor from chrome js. (You can review the bugs with this signature where onLookupComplete is fingered to see how often this does happen.) To our visitors joining us late, this block is of course very relevant, but please read the whole bug + your bug + relevant interfaces.

fwiw, interesting stuff can be done w/ a debugger and the following frame:
9|12|gklayout.dll|nsGenericElement::SetAttribute(nsAString_internal const
&,nsAString_internal const
&)|d:\builds\tinderbox\seamonkeytrunk\winnt_5.2_depend\mozilla\content\base\src\nsgenericelement.cpp|1525|0x35

specifically you could find the name of the attribute being set and its value, the name when you're lucky will appear in a chrome js file and could save you a lot of time hunting (especially if it's relatively unique)

the other place that can at times be useful is the caller for NS_InvokeByIndex_P (not in this case), typical callers, e.g. XPCWrappedNative::CallMethod keep around useful bits the variables of interest are things like methodInfo and ifaceInfo which with a debugger and a lot of clicking (+) you can use to find the name of an interface and the method being called (this can be useful at times). Sometimes you can use those bits to search your suspect code for places where they cI/gS/QI to the interface, or when you're unlucky for a method that returns an object of that interface type (this variant obviously requires more work, but searching for the interface using mxr w/ a find limit of "idl$" should be helpful).

this is less useful for our case since we know it's calling nsGenericElement::SetAttribute, but for everyone else it can be useful to have ideas about how to chase things
(In reply to comment #0)
> ** This might be a duplicate of Bug 398076 but I'm filing this separately for
> the time being.

that bug was ultimately fixed in early 2009 in bug 422178
Summary: nstidy called dom from the wrong thread [@ JS_RestoreFrameChain - XPCJSContextStack::Pop] → nstidy called dom from the wrong thread [@ JS_RestoreFrameChain - XPCJSContextStack::Pop] crash
Summary: nstidy called dom from the wrong thread [@ JS_RestoreFrameChain - XPCJSContextStack::Pop] crash → HTML Validator extension called DOM from the wrong thread [@ JS_RestoreFrameChain - XPCJSContextStack::Pop] crash
Crash Signature: [@ JS_RestoreFrameChain - XPCJSContextStack::Pop]
Crash Signature: [@ JS_RestoreFrameChain - XPCJSContextStack::Pop] → [@ JS_RestoreFrameChain ]
The volume here is pretty low. Removing the top crash keyword.
Keywords: topcrash
Mass-closing old Extension Compatibility bugs that relate to legacy add-ons or NPAPI plug-ins. If you think this bug is still valid, please reopen or comment.

Sorry for the bug spam, and happy Friday!
Status: NEW → RESOLVED
Closed: 16 years ago6 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.