Closed Bug 1021240 Opened 10 years ago Closed 10 years ago

Calling alert from 'mozMediaStatisticsShowing' property getter freezes Firefox

Categories

(Core :: DOM: Core & HTML, defect)

x86
macOS
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox33 --- verified
firefox-esr24 --- wontfix
firefox-esr31 --- wontfix

People

(Reporter: catalinb, Assigned: smichaud)

References

Details

(Keywords: sec-other, Whiteboard: [adv-main33-])

Attachments

(5 files, 2 obsolete files)

Attached file Test case
1. Create a video element with controls = true
2. Set the getter for the 'mozMediaStatisticsShowing' property to a function that calls alert
3. Add the video element to the dom tree.
4. Firefox freeze
This is OSX specific, right?
Hrm... We're spinning the event loop in openTabPrompt in nsPrompter.js but for some reason not responding to clicks on the alert....
So is this a variant of bug 953435?
Uh.... wait.  WAIT.  Why is videocontrols (and the browser context menu) doing:

  XPCNativeWrapper.unwrap(this.video).mozMediaStatisticsShowing

?  Is this a carryover from when we had no separate XBL scope that fields lived in or something?

Or put another way, why are we exposing this thing to the web page?
Group: core-security
Flags: needinfo?(bobbyholley)
> So is this a variant of bug 953435?

No (as best I can tell).  That was fixed by my patch for bug 996848.
(In reply to Boris Zbarsky [:bz] from comment #4) 
> ?  Is this a carryover from when we had no separate XBL scope that fields
> lived in or something?

Fields don't work with XBL scopes - they have to live on the underlying object, which means that the binding can't see them.

> Or put another way, why are we exposing this thing to the web page?

Because that's the only place where this can live. Both XBL and chrome code need to munge this property, so it can't live as an expando on either Xray, since they're not same-origin.

http://mxr.mozilla.org/mozilla-central/source/browser/base/content/nsContextMenu.js#448

If it's a security problem for content to see this property, then yes, this is a problem. :-(
Flags: needinfo?(bobbyholley)
This was made with today's m-c nightly on OS X 10.8.5.  Note that the native event loop is not being spun a second time.
The hangs don't happen (with today's m-c nightly) on Windows 7.
They also don't happen in FF 29.0.1 (on the Mac).  I'll look for a regression range.
The hang doesn't happen on Linux.
These start happening with the following m-c nightly:

firefox-2014-04-17-03-02-03-mozilla-nightly

This implies the following regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=dd50745d7f35&tochange=e71ed4135461
I can't figure out which patch triggered these hangs.

No, it's not my bandaid patch for bug 959281 (which is long gone from the trunk -- having been replaced by my patch for bug 996848).  And I've already checked that these hangs happen with or without my patch for bug 996848.
Could it possibly be this?

http://hg.mozilla.org/mozilla-central/rev/9f9528ccd44b

author	Fabrice Desré <fabrice@mozilla.com>
	Wed Apr 16 10:48:01 2014 -0700 (at Wed Apr 16 10:48:01 2014 -0700)
changeset 178889	9f9528ccd44b

I'll check when I get the chance.
Another possible candidate?

  Bug 959787 - Handlify JS_DefineProperty; r=Waldo, r=bz
  http://hg.mozilla.org/mozilla-central/rev/1b7eef53c08a
(In reply to Steven Michaud from comment #13)
> Could it possibly be this?
> 
> http://hg.mozilla.org/mozilla-central/rev/9f9528ccd44b
> 
> author	Fabrice Desré <fabrice@mozilla.com>
> 	Wed Apr 16 10:48:01 2014 -0700 (at Wed Apr 16 10:48:01 2014 -0700)
> changeset 178889	9f9528ccd44b
> 
> I'll check when I get the chance.

I've checked this revision, the hang doesn't occur.
I'm having trouble finding the patch that triggered these hangs.  Reproducing this bug may require a universal build -- which I haven't been using since making them takes so long.

I'll try again tomorrow.
(In reply to Steven Michaud from comment #12)
> I can't figure out which patch triggered these hangs.
> 
> No, it's not my bandaid patch for bug 959281 (which is long gone from the
> trunk -- having been replaced by my patch for bug 996848).  And I've already
> checked that these hangs happen with or without my patch for bug 996848.

I've ran a git bisect on the range you mentioned and the bug was introduced on:
http://hg.mozilla.org/mozilla-central/rev/c1a8921153d9

author	Steven Michaud <smichaud@pobox.com>
	Wed Apr 16 09:59:15 2014 -0500 (at Wed Apr 16 09:59:15 2014 -0500)
changeset 178822	c1a8921153d9
	
Bug 959281 - http://paperjs.org/examples/voronoi/ drawing performance has regressed badly. r=spohl
I'm *very* skeptical.  Were you testing with universal builds, Cătălin?

In any case I'll continue my own tests tomorrow.
> If it's a security problem for content to see this property, then yes, this is a
> problem. 

I'm not sure it's a problem for content to see it, at first glance.  What I don't know is whether it's a problem for content to _spoof_ it.

But also, this doesn't seem to be a field.  It's just a property we set.  Can we just move it to an actual IDL property exposed only to chrome and XBL scopes?

Note that the current setup was added in bug 845555 looks like.
Flags: needinfo?(gijskruitbosch+bugs)
(In reply to Steven Michaud from comment #18)
> I'm *very* skeptical.  Were you testing with universal builds, Cătălin?
> 
> In any case I'll continue my own tests tomorrow.

No, just 64-bit builds. Maybe the bug was fixed and reintroduced?
(In reply to Boris Zbarsky [:bz] from comment #19)
> But also, this doesn't seem to be a field.  It's just a property we set. 

There's not really any difference.

> Can we just move it to an actual IDL property exposed only to chrome and XBL
> scopes?

SGTM.

> Note that the current setup was added in bug 845555 looks like.

Yeah. See in particular https://bugzilla.mozilla.org/show_bug.cgi?id=845555#c7.
(In reply to Boris Zbarsky [:bz] from comment #19)
> > If it's a security problem for content to see this property, then yes, this is a
> > problem. 
> 
> I'm not sure it's a problem for content to see it, at first glance.  What I
> don't know is whether it's a problem for content to _spoof_ it.
> 
> But also, this doesn't seem to be a field.  It's just a property we set. 
> Can we just move it to an actual IDL property exposed only to chrome and XBL
> scopes?
> 
> Note that the current setup was added in bug 845555 looks like.

See https://bugzilla.mozilla.org/show_bug.cgi?id=845555#c24 . I'm guessing that yes, we could theoretically move it to an actual IDL property, but as Bobby noted that he thought this was a design flaw, I'd rather do something better - I'm just not sure what that is. :-)

As for content spoofing the value: as far as I can tell, we basically treat it as a boolean throughout, and so content spoofing it would only mean that you wouldn't be able to show the statistics for a video (ie they'd set it to true constantly and you wouldn't be able to toggle it on) or would constantly show statistics (ie they set it to false when they notice you toggle them on, so you wouldn't be able to toggle it off)... which might be annoying but not life-threatening.
Flags: needinfo?(gijskruitbosch+bugs)
Attached patch , (obsolete) — Splinter Review
So I presume that the suggestion is to do something like this, but this breaks the video controls code as it no longer gets access to the property in question. I've been poking at webidl docs and source but I can't find a way to give XBL access to the webidl chromeonly property.
Attachment #8435815 - Flags: feedback?(bzbarsky)
Assignee: cbadea → gijskruitbosch+bugs
Status: NEW → ASSIGNED
Comment on attachment 8435815 [details] [diff] [review]
,

You need to set the member to false in the HTMLMediaElement constructor, presumably.

>+  [ChromeOnly] attribute boolean mozMediaStatisticsShowing;

s/ChromeOnly/Func="IsChromeOrXBL"/ should make this do what you want.

Also, please put it on the "Mozilla extensions" partial interface further in this file, not on the standards-stuff interface.
Attachment #8435815 - Flags: feedback?(bzbarsky) → feedback+
Attached patch ,Splinter Review
So this works, AFAICT. Thanks to Boris for dealing with my newbie questions and mistakes. :-) However, I'm concerned that (a) Bobby previously said this might not be the right approach, and (b) there are other bits of code in videocontrols.xml and in other parts of the tree (social, browser-element, the devtools actors, the TCPSocket implementation, the tps extension, ...) that use XPCNW.unwrap. I don't know how much of this is caused by .unwrap, and how much by other factors like alert-ing in the middle of showing a context menu, but I guess we should be sure that we're not painting too much of a bullseye on other code using XPCNW.unwrap if we think it's also (potentially) vulnerable in one way or another.
Attachment #8435888 - Flags: review?(jaws)
Attachment #8435888 - Flags: review?(bobbyholley)
Attachment #8435815 - Attachment is obsolete: true
I think we should fundamentally assume that any use of .wrappedJSObject or XPCNW.unwrap is a security bug unless proven otherwise, and we've known that (and have been telling people that) for years...

For this bug, there are two separate issues:

1)  The freezing.  That's an issue with alert running when it runs in this testcase, and needs fixing no matter what.

2)  The fact that we let content spoof this property.  That's what your patch fixes.
Comment on attachment 8435888 [details] [diff] [review]
,

Review of attachment 8435888 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM

(In reply to :Gijs Kruitbosch from comment #25)
> However, I'm concerned that (a) Bobby previously said this
> might not be the right approach

What specifically are you referring to?
Attachment #8435888 - Flags: review?(bobbyholley) → review+
(In reply to Bobby Holley (:bholley) from comment #27)
> Comment on attachment 8435888 [details] [diff] [review]
> ,
> 
> Review of attachment 8435888 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> LGTM
> 
> (In reply to :Gijs Kruitbosch from comment #25)
> > However, I'm concerned that (a) Bobby previously said this
> > might not be the right approach
> 
> What specifically are you referring to?

https://bugzilla.mozilla.org/show_bug.cgi?id=845555#c24
Could you file a new sec bug for the spoofing issue, assuming that you aren't also planning on fixing the hang issue?  Thanks.
Attachment #8435888 - Flags: review?(jaws) → review+
My bandaid patch for bug 959281 *does* trigger these hangs, as Cătălin reported in comment #17.

I was confused by the fact that "later" patches in the regression range *didn't* include that patch in the tree they were based upon.

The hangs aren't going to be easy to fix.  Fortunately they don't seem to happen often.

Once we decide which bug follows the hangs, I'll assign it to myself.  If need be I'll open a new bug.
(In reply to :Gijs Kruitbosch from comment #28)
> > What specifically are you referring to?
> 
> https://bugzilla.mozilla.org/show_bug.cgi?id=845555#c24

That was about Xray expandos, not WebIDL properties with visibility restrictions, which is what this new patch does.
Blocks: 1021969
(In reply to Andrew McCreight [:mccr8] from comment #29)
> Could you file a new sec bug for the spoofing issue, assuming that you
> aren't also planning on fixing the hang issue?  Thanks.

Done.
Assignee: gijskruitbosch+bugs → nobody
Status: ASSIGNED → NEW
(note that the patch here, for bug 1021969 does 'fix' this issue by making us ignore the content-defined getter, but not the underlying issue of the hang)
(In reply to Boris Zbarsky [:bz] from comment #2)
> Hrm... We're spinning the event loop in openTabPrompt in nsPrompter.js but
> for some reason not responding to clicks on the alert....

FWIW, when I run this testcase pre-patch with tab-modal dialogs turned off, I get an app-modal rather than tab-modal dialog (as expected), and it *does* show up, but I can't interact with it, and switching to a different app and back with cmd-tab doesn't reshow the browser window - in other words, it also hangs.
Steven, this bug is about the hang.
Assignee: nobody → smichaud
(In reply to comment #34)

What you say makes it more likely that these hangs are a bug uncovered by my patches (the bandaid patch for bug 959281 and the patch for bug 996848).  I sounds like nsAppShell::ProcessNextNativeEvent() is being called with aMayWait == false while the alert (a modal dialog) is open (aMayWait should be true in that case).

If so, this bug won't be so hard to fix.  But it may be a week or two before I can get back to it -- I've got other, more urgent stuff to deal with.
This is really intense for such a simple thing.  My aim here was to try to use this as part of the two way bridge you have to build to clone or manipulate any anonymous(XBL) content.  I do know that it can cause a hang on linux too if used in combination with multiple modal dialogs being opened.  BTW its right that a simple grep led me this and other instances of XPCNativeWrapper.unwrap being used.  You could do a quick search on mxr.mozilla.org and have a nice set of potential code to try to target.  I also have my eyes on breaking the XPCNativeWrapper captive out of its cross domain sandbox prison!
Keywords: sec-other
I think I finally understand why these hangs are happening, and (at
least in part) why only on OS X.

As I discovered working on bug 996848, we want to do as little as
possible when nsAppShell::ProcessNextNativeEvent() is called with
!aMayWait.  (For more information see bug 996848 comment #0, and
particularly the two comments it refers to from bug 959281.)

aMayWait is normally true in nsAppShell::ProcessNextNativeEvent() when
a modal dialog is open (and it's processing events).  In this case
it's normally called from here:

https://hg.mozilla.org/mozilla-central/annotate/3519e987aa3b/widget/xpwidgets/nsBaseAppShell.cpp#l298

But this call doesn't happen if there are events remaining to be
processed in the Gecko event queue.  And in this bug's case, a
pathological condition prevents the Gecko event queue from ever being
empty at that point.

To decipher this bug I backed out the patches for bug 1021969 (so that
this bug's testcase would continue to "work"), and added some debug
logging (which I'll post in a later comment).  With this patch I saw
the Gecko event queue continuously being refilled with the following
two nsIRunnable events:

  20nsRunnableMethodImplIM16nsBindingManagerFvvEvLb1EE
    (nsRunnableMethodImpl<void (nsBindingManager::*)(), void, true>)
  20nsUnblockOnloadEvent
    (nsUnblockOnloadEvent)

The "refilling" takes place here:

https://hg.mozilla.org/mozilla-central/annotate/3519e987aa3b/dom/xbl/nsBindingManager.cpp#l387

This code fights back against "someone [evil] doing event processing
from inside a constructor".  I assume "constructor" means an
nsCSSFrameConstructor, invoked from
nsBindingManager::ProcessAttachedQueue().

If you look at the stack trace from comment #7, you'll see calls to
nsThread::ProcessNextEvent() being made while
nsBindingManager::ProcessAttachedQueue() is on the stack.  I assume
this is the "evil" that nsBindingManager::DoProcessAttachedQueue() is
fighting back against.

I'm not sure what we can or should do about this.  Ultimately I think
we need to disallow modal dialogs while
nsBindingManager::DoProcessAttachedQueue() is on the stack.  But in
the meantime I suspect we can get away with doing nothing.  I doubt
very much that this situation comes up often.

I intend to mark this bug WONTFIX.  Please comment if you don't think
we should do that.
Comment on attachment 8446812 [details] [diff] [review]
Debug logging patch (not a fix)

I forgot to mention that you need to add the following to your mozconfig to use this patch:

ac_add_options --enable-cpp-rtti
> I assume "constructor" means an nsCSSFrameConstructor,

No, it means an XBL <constructor> (so basically some JS that ProcessAttachedQueue causes to run).  In this case, that's exactly what we have: the <binding id="videoControls"> in videocontrols.xml has a <constructor> that in the attached testcase is calling some web page script that does alert(), which spins the event loop.

> I assume this is the "evil" that nsBindingManager::DoProcessAttachedQueue() is
> fighting back against.

That's the theory.

So basically, we end up in a situation where we're asked to fire XBL constructors while we're already inside an XBL constructor.  The binding manager tries to deal with that situation by deferring the firing until that other constructor finishes.

Would it work to post a 0ms (or longer?) timer instead of directly posting an event to the event queue?  That might at least give the main-thread event queue a chance to clear and process some OS events, right?
main thread event loop should always process OS level events occasionally while it is processing
nsIRunnables. (And in case of non-performance-mode, it should even prioritize OS level events.)
http://mxr.mozilla.org/mozilla-central/source/widget/xpwidgets/nsBaseAppShell.cpp#274

But I don't really understand the main event loop handling on OSX. It seems to be all different
to other platforms.
Do we in this case end up having mBlockNativeEvent == true, and then pass mayWait == false ?
But even then we have starvation limit
http://mxr.mozilla.org/mozilla-central/source/widget/xpwidgets/nsBaseAppShell.cpp#98
(In reply to comment #42)

As of my bandaid patch for bug 959281 and my patch for bug 996848 that superseded it, we do very little processing of native events (on OS X) in nsAppShell::ProcessNextNativeEvent() when aMayWait is false.  And we don't process any "user events" (e.g. mouse events and keyboard events) at all.  For complicated reasons alluded to in the second paragraph of comment #38 above, this can't be changed.  (When we refuse to process a native event in nsAppShell::ProcessNextNativeEvent(), we leave it in the event queue and it eventually gets processed under [NSApp run], called from nsAppShell::Run().)

(In reply to comment #41)

I'll try what you suggest and see if I can get it working.
Here's what I see as the most significant difference in how events are handled between Windows and Linux on the one hand, and OS X on the other:

Windows and Linux have a native event queue for each window.  So it isn't too difficult to add a Gecko event queue (for processing nsIRunnable events) on top of that.

OS X has just one native event queue, for the entire app.  So it's much more complicated to add a Gecko event queue on top of it, and to avoid bad interactions between them.  Over time we've found that it's best to avoid "interleaving" them (to have multiple instances of each kind of event loop on the stack at the same time).  This is unavoidable with modal dialogs, but we still want to keep the "interleaving" as simple as possible (to have as few instances of event loops on the stack as possible on the stack at the same time).
(Following up comment #43)

> (When we refuse to process a native event in
> nsAppShell::ProcessNextNativeEvent(), we leave it in the event queue
> and it eventually gets processed under [NSApp run], called from
> nsAppShell::Run().)

Unless a modal dialog is running, as in this bug's case.
Attached patch Fix (obsolete) — Splinter Review
Thanks, Boris.  Your suggestion worked like a charm!

A delay of 0 didn't work -- the Gecko event queue still stayed full.  Next I tried a delay of 100 (1/10th of a second), which does work in my tests.  Not wanting to cut things too finely, I didn't test values in between.  A polling interval of 1/10th of a second doesn't seem unreasonable to me -- neither too long nor too short.  And I made sure that we won't leave dangling pointers.
Attachment #8447496 - Flags: review?(bzbarsky)
Comment on attachment 8447496 [details] [diff] [review]
Fix

I worry a bit about this introducing ordering issues.  I'd much prefer a 10ms timer here, not a 100ms one.

>+  nsRefPtr<nsBindingManager> mgr = static_cast<nsBindingManager*>(aClosure);

How about:

  nsRefPtr<nsBindingManager> mgr = 
    already_AddRefed<nsBindingManager>(static_cast<nsBindingManager*>(aClosure));

and then you don't need the manual NS_RELEASE.

You can just forward-declare nsITimer in the header, and include nsITimer.h just in the cpp.

r=me with that
Attachment #8447496 - Flags: review?(bzbarsky) → review+
> I'd much prefer a 10ms timer here, not a 100ms one.

I'll test with one next week.

I'll incorporate your other comments into my next patch.
I don't quite understand comment 44.
For example on gtk we end up calling
https://developer.gnome.org/glib/2.32/glib-The-Main-Event-Loop.html#g-main-context-iteration 
between nsIRunnable handling. g_main_context_iteration isn't per window, like our appshell isn't per window.
I don't know a whole lot about Linux event handling.  But I thought X, like Windows, has separate event loops for each window.

> like our appshell isn't per window

Correct.  The Gecko event loop is per-app.  But I was talking about *native* event loops.
And AppShell on linux calls g_main_context_iteration, which spins gdk event loop. Nothing window specific there.
Attached patch Fix rev1Splinter Review
> I'd much prefer a 10ms timer here, not a 100ms one.

A 10ms timer doesn't work.

For some reason, faster machines require longer timer delays.  On my fastest machine (a two-year old Retina MacBook Pro), testing in 5ms increments, I discovered that the shortest workable timer delay is 60ms.  But if we want this patch to work on all machines, we shouldn't cut things so close.  So I've stuck with my original 100ms delay.

This patch follows your other recommendations.
Attachment #8447496 - Attachment is obsolete: true
Attachment #8448846 - Flags: review?(bzbarsky)
Comment on attachment 8448846 [details] [diff] [review]
Fix rev1

r=me, but we should expect extension/UI regressions....
Attachment #8448846 - Flags: review?(bzbarsky) → review+
Thanks Boris.  I'll land my patch on trunk.

> but we should expect extension/UI regressions....

If we get them, I'd say we should back out my patch and go back to my original plan of WONTFIXing this bug.  And we should then consider disallowing modal dialogs while nsBindingManager::DoProcessAttachedQueue() is on the stack.
https://hg.mozilla.org/mozilla-central/rev/e07896399710
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: in-testsuite?
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
Whiteboard: [adv-main33-]
Reproduced the original issue using the following build:
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/06/2014-06-05-03-02-02-mozilla-central/
- used the poc from comment #0 and clicked in the window, m-c completely froze

Verified using the following builds:
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014/07/2014-07-05-03-02-03-mozilla-central/
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-10-07-03-02-02-mozilla-central/
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-10-07-00-40-03-mozilla-aurora/
- http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/33.0b9/mac/en-US/

For each of the above builds, I opened the poc in several windows/tabs (including Private/e10s) and ensured the issue wasn't occurring.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Group: core-security → core-security-release
Group: core-security-release
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: