Assertion "XPConnect is being called on a scope without a 'Components' property!" in test_reactivate.html

RESOLVED WORKSFORME

Status

defect
10 years ago
6 years ago

People

(Reporter: cpearce, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

I caught a hang in test_reactivate.html. Luckily I was recording the run with Replay Debugging (Note to self: "Recording 25"). :)

We're dispatching 'loadeddata' event, and we're hitting this line:
http://mxr.mozilla.org/mozilla-central/source/js/src/xpconnect/src/xpcwrappednativescope.cpp#786

Which makes me think we're calling into a JS callback from native code after a media element has been put into the bfcache or otherwise "gone away". I note the media element has its mPausedForInactiveDocument==1.

Relevant stacks attached.

Roc: This is the hang I mentioned to you yesterday. I assume it's a regression from or a bug introduced by the patches in bug 518659?

This may show up as a random failure on tinderbox, though I don't think it's ever been reported.
It looks like the actual hang is in stack-walking code, which is weird.
We should probably just guard our DispatchEvent calls with a check for ownerDoc->IsActive().
Reporter

Comment 4

10 years ago
I can actually reproduce this on my physical Windows machine: it hits an in-code breakpoint in NS_ERROR(), dumps a call stack to stdout, and continues. I guess my setup in the VM is different, causing the in-code breakpoint to wait instead.

(In reply to comment #2)
> We should probably just guard our DispatchEvent calls with a check for
> ownerDoc->IsActive().

That doesn't work, unfortunately: the handler for an event we're in the process of dispatching from an active document can make the owner doc inactive, and we'll still hit this assertion/breakpoint when we continue to run the handler. This happens in test_reactivate, where the loadeddata handler removes the subframe which owns the media element. Should the handler just not do that?
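To illustrate why a check at dispatch time is not enough, here is a minimal sketch in plain JavaScript. It is a toy model, not Gecko code: `guardedDispatch`, `selfRemovingHandler`, and the `active` flag are hypothetical stand-ins for the native dispatch path, the loadeddata handler, and ownerDoc->IsActive().

```javascript
// Toy model only: none of these names are real Gecko or DOM APIs.

// Dispatch only if the owner document is active when dispatch starts,
// which is what guarding DispatchEvent with IsActive() would do.
function guardedDispatch(doc, handler, log) {
  if (!doc.active) {
    log.push("skipped: document inactive");
    return;
  }
  handler(doc, log);
}

// A handler that deactivates its own document partway through, like the
// loadeddata handler that removes the subframe owning the media element.
function selfRemovingHandler(doc, log) {
  log.push("handler start, active=" + doc.active);
  doc.active = false; // stands in for removing the <iframe>
  // The handler is still running here, now in an inactive document.
  log.push("handler continues, active=" + doc.active);
}

const dispatchLog = [];
const ownerDoc = { active: true };
guardedDispatch(ownerDoc, selfRemovingHandler, dispatchLog); // guard passes
guardedDispatch(ownerDoc, selfRemovingHandler, dispatchLog); // guard blocks
```

The guard stops the second dispatch, but it cannot help with the first one: the document was active when the check ran and only became inactive while the handler was executing.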
I think we can (and should) work around this in test_reactivate.html by having loadedAll do a setTimeout(0) to do its actual work; that will let the event handler in the <iframe> return before the <iframe> goes away. I'll take a patch for that.
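A rough sketch of that setTimeout(0) deferral follows. The helper `deferWork` and its `schedule` parameter are hypothetical and not part of test_reactivate.html; `schedule` exists only so the sketch can be exercised without a real timer.

```javascript
// Hypothetical helper: wraps `doWork` so the event handler returns
// before the work runs. By default the work is queued with setTimeout(fn, 0).
function deferWork(doWork, schedule = (fn) => setTimeout(fn, 0)) {
  return function handler(event) {
    // Queue the real work; the handler itself returns immediately.
    schedule(() => doWork(event));
  };
}

// Assumed usage in the test (the wiring below is an illustration only):
//   video.addEventListener("loadeddata", deferWork(loadedAll));
```

The point of the design is ordering: the handler running inside the <iframe> has already unwound by the time the deferred work removes the <iframe>.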

But presumably Web content can still trigger this NS_ERROR no matter what we would do to try to work around it in the video element. CCing people more knowledgeable than I for advice.

To recap: an element in an <iframe> dispatches an event, and an event handler in the <iframe> runs and removes the <iframe> from its parent document. When that DOM operation returns to the event handler, we trigger the NS_ERROR in DEBUG_CheckForComponentsInScope. Is this a known issue?
NS_StackWalk isn't me, you want dbaron.
(In reply to comment #1)
> It looks like the actual hang is in stack-walking code, which is weird.

It looks like it's the symbol lookup part of it, though.  Hangs in symbol resolution can result from doing things in static initialization that acquire locks, which is something that you should *never* do.  This is because static initialization happens while the shared library loader is holding a lock; depending on OS you might need to acquire that lock to load another shared library or to do address->symbol mapping.  This puts you at risk for AB-BA deadlock.

What are the stacks of all threads?  Is there some other thread on which we're currently loading a shared library?
We can get all the stacks (yay replay!), but I don't think that's the cause of the hang here. The main thread is stuck in kernel32.dll!_CreateFileW@28(), which surely isn't blocked on the shared library loader lock.

I don't think the hang has anything to do with stack walking, actually, so sorry for wasting your and Ted's time...
Reporter

Comment 9

10 years ago
(In reply to comment #5)
> I think we can (and should) work around this in test_reactivate.html by having
> loadedAll do a setTimeout(0) to do its actual work, that will let the event
> handler in the <iframe> return before the <iframe> goes away.

If I do all the work of the handler in a timeout, we still hit that NS_ERROR when we try to play() the media elements which are bound to the now dead iframe, but which we still have references to.

If I reorder the handler, so that we kill the iframe *after* we play() the media elements, this error goes away, and the test still works. We may race the ended event, so we'd need to check that the media element is not already ended in reviveElements() before we add the "ended" listener to be safe.

Roc: would that change be acceptable?
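The "already ended" check mentioned above could look roughly like the sketch below. `listenForEnded` is a hypothetical helper, not code from reviveElements(), and calling the callback directly when the element has already ended is an assumption about the desired behavior.

```javascript
// Hypothetical helper: make sure `onEnded` runs once, even if playback
// finished before the listener could be added.
function listenForEnded(media, onEnded) {
  if (media.ended) {
    // Assumption: if we lost the race with the "ended" event,
    // treat it as if the event had just fired.
    onEnded();
    return;
  }
  media.addEventListener("ended", onEnded, { once: true });
}
```

Here `media.ended` is the standard HTMLMediaElement property, and `{ once: true }` removes the listener after its first call.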
No, I really do want to test calling play() on media elements in dead iframes.

So, let's morph this bug into the XPConnect bug that I described in comment #5: Web content can trigger this assertion in XPConnect.

The fact that that assertion causes a hang in our recording VMware environment is a separate issue that we probably just need to figure out locally.
Component: Video/Audio → XPConnect
QA Contact: video.audio → xpconnect
Summary: Hang in test_reactivate.html → Assertion "XPConnect is being called on a scope without a 'Components' property!" in test_reactivate.html
Blocks: 528703

Comment 11

6 years ago
Bug 747434 removed the assertion.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WORKSFORME