780770 - Content can prevent chrome buttons from working

Reporter

Description

•

13 years ago

STR: 1) Load https://metrics.mozilla.com/data/content/pentaho-cdf-dd/Render?solution=metrics2&path=%2Ftelemetry&file=telemetryHistogram.wcdf 2) Login using Mozilla Persona with any email address. 3) While the loading spinners are shown for a histogram, try to click the new tab button (may need to have tab overflow or customize the toolbar for it to be shown). Alternatively, try clicking buttons on the error console while the in-content throbbers are shown. Some histograms take longer to load than others and make it easier to reproduce. Actual results: Browser is still responsive but various UI buttons don't work (e.g. new tab button or error console buttons). The buttons show the depressed UI but their actions are not taken (and don't seem to be queued). Closing the tab still works which puts the browser back to a usable state. Expected results: UI buttons should work like they normally do on other pages. Last good nightly: 2012-01-15 First bad nightly: 2012-01-16 Pushlog: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=21c84409902e&tochange=047c8ba7d2e4 Marking as a security bug for now since security UI is also affected such as the "Get me out of here" button on the reported attack site notification bar. I haven't yet figured out what the page is doing or made a reduced test-case. Reproduced on OS X 10.7 and Windows 7. I'm going to narrow down the regression range to the specific commit now.

Matthew N. [:MattN]

Reporter

Comment 1

•

13 years ago

Regression caused by bug 716167 changeset a0e3fb36fb1a: changeset: 84460:a0e3fb36fb1a user: Bobby Holley <bobbyholley@gmail.com> date: Sat Jan 14 10:31:16 2012 -0800 summary: Bug 716167 - Only push null contexts in XPConnect for main thread events, and remove infrastructure from bug 326777. r=bz

Daniel Veditz [:dveditz]

Comment 2

•

13 years ago

On the surface looks like a Denial of Service vuln, but given the regressing bug might be worse. Need Bobby to take a look.

Component: General → XPConnect

Bobby Holley (:bholley)

Comment 3

•

13 years ago

I just poked at this a little bit. I see a couple of interesting things in the debug spew: WARNING: NS_ENSURE_SUCCESS(rv, false) failed with result 0x8000FFFF: file /files/mozilla/repos/i/content/base/src/nsContentUtils.cpp, line 2929 WARNING: NS_ENSURE_TRUE(pusher.Push(aBoundElement)) failed: file /files/mozilla/repos/i/content/xbl/src/nsXBLProtoImplMethod.cpp, line 329 but they don't seem to be reliable or consistent. This stuff isn't really my area of expertise, and my spidey senses tell me I'm probably not the best person to investigate it. Probably better to have someone with more front-end expertise figure out why exactly the button isn't working.

Matthew N. [:MattN]

Reporter

Comment 4

•

13 years ago

Attached file bt + JS Stack using the telemetry dashboard — Details

I took a look again yesterday with GDB and it seems like sync XHR is how it's triggered with pentaho but Felipe hit this today with the Java plugin on the stack instead. (with nsAppShell::ProcessNextNativeEvent followed by the HandleEvent calls). Something interesting to me (for someone who doesn't know this code) was nsEventStatus_eIgnore but perhaps that is normal? (gdb) bt 1 #0 HandleEvent (aEvent=0x7fff5fbf6350) at /Users/matthew/mozilla-central/view/src/nsView.cpp:130 (gdb) p result $31 = nsEventStatus_eIgnore

:Felipe Gomes (needinfo for replies!)

Comment 5

•

13 years ago

I just hit this bug again, and the stack involved a plugin running its message handler and spinning the event loop. Matthew was able to reproduce with his STR and the stack involved does not have plugins, but has synchronous XHR which also spins the event loop. So it seems that is a good lead. It looks like some kind of situation ends up not being able to unwind and decrease the event loop depth. Here is a profile I got after the browser was inside this situation: http://people.mozilla.com/~bgirard/cleopatra/?report=7f0a49d5aec2f78a9235c31877dacac02a3841df (choose `Invert callstack` and then keep expanding the nsDOMEvent::GetTargetForFrame stack) The stack is not 100% meaningful because the browser was not hanging, and nsHTMLButtonElement::PostHandleEvent is (from my best guess) me clicking on the profiler's "Start/Stop/Analyze" buttons. But it seems to imply that the whole browser is running inside the nested loop. FWIW in my case, closing all tabs and killing plugin-container did not resolve the situation. Firefox was also unable to quit by itself (the window disappeared but the process never went way) so I had to kill it. There were also some ghost compartments left from tabs I closed. But all of this is probably just symptons of the bug (instead of causes).

Al Billings [:abillings - ex-MoCo]

Comment 6

•

13 years ago

Regression in January and no movement in two months. Can we propose a security rating and get this assigned to someone to fix?

Bobby Holley (:bholley)

Comment 7

•

13 years ago

Yeah, if content can cause us to get stuck inside a nested event loop, that's not super-great. On the surface it could be used to DoS, but other things might go wrong to, so it's really hard to say. I'm going to propose sec-moderate, but I think we should actively get someone working on it.

Keywords: sec-moderate

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 8

•

13 years ago

I've seen this too. Could we back out Bug 716167?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 9

•

13 years ago

Bug 716167 makes use call push and pop in an odd way. First we push and run then other thread observers, run the runnable, then we pop and run then other thread observers. Before Bug 716167 we always first pushed null, then run thread observers, run the runnable run thread observers and then pop.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Updated

•

13 years ago

Blocks: 716167

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 10

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #9) > Bug 716167 makes use call push and pop in an odd way. s/use/us/

Bobby Holley (:bholley)

Comment 11

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #8) > I've seen this too. Could we back out Bug 716167? Nope. XPCPerThreadData is long gone (bug 755255), so touching XPConnect off-main-thread definitely won't work anymore. Also, bz said the old behavior (doing this stuff off-main-thread) was a major bug that he never intended. If there's something wrong with the way the new infrastructure pushes/pops, we should just fix that.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 12

•

13 years ago

There is a major problem with push/pop, as far as I see. No one can rely on right context (null) to be on the stack. I think that can easily lead to sg;crit problems.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 13

•

13 years ago

(In reply to Bobby Holley (:bholley) from comment #11) > (In reply to Olli Pettay [:smaug] from comment #8) > > I've seen this too. Could we back out Bug 716167? > > Nope. XPCPerThreadData is long gone (bug 755255), so touching XPConnect > off-main-thread definitely won't work anymore. Also, bz said the old > behavior (doing this stuff off-main-thread) was a major bug that he never > intended. Bug 716167 is not really about touching non-main thread objects. It is about pushing null context, especially about pushing null context in main thread.

Bobby Holley (:bholley)

Comment 14

•

13 years ago

(In reply to Olli Pettay [:smaug] from comment #12) > There is a major problem with push/pop, as far as I see. > No one can rely on right context (null) to be on the stack. > I think that can easily lead to sg;crit problems. I totally support fixing this however necessary. I'm just saying that it will have to be a new patch, because backing out bug 716167 isn't an option. (In reply to Olli Pettay [:smaug] from comment #13) > Bug 716167 is not really about touching non-main thread objects. > It is about pushing null context, especially about pushing null context in > main thread. It is. There are two patches in that bug, the first (what you're referring to) being a prerequisite for the second (which is the threading part). Part 1 of that bag removed code that caused us to touch XPConnect off the main thread, which now results in a MOZ_CRASH. This is why we can't just back that patch out.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 15

•

13 years ago

Attached patch patch — Details — Splinter Review

Assignee: nobody → bugs

Attachment #673840 - Flags: review?(bzbarsky)

Attachment #673840 - Flags: review?(bobbyholley+bmo)

Andrew McCreight [:mccr8]

Updated

•

13 years ago

status-firefox14: affected → wontfix

status-firefox15: affected → wontfix

status-firefox18: --- → affected

status-firefox19: --- → affected

Boris Zbarsky [:bzbarsky]

Comment 16

•

13 years ago

So the problem we're solving is that thread observers other than nsXPConnect might get called with a non-null cx on the stack? And that causes problems for some of them?

Boris Zbarsky [:bzbarsky]

Comment 17

•

13 years ago

Comment on attachment 673840 [details] [diff] [review] patch r=me

Attachment #673840 - Flags: review?(bzbarsky) → review+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 18

•

13 years ago

(In reply to Boris Zbarsky (:bz) from comment #16) > So the problem we're solving is that thread observers other than nsXPConnect > might get called with a non-null cx on the stack? And that causes problems > for some of them? Yes. I could debug this some more if you want to know exactly which observer is causing the problem. But the patch would be the same. There is now an obvious bug in cx pushing/popping.

Boris Zbarsky [:bzbarsky]

Comment 19

•

13 years ago

Nah, no need to debug to figure out the relevant observer. Just a bit mystified as to what observers we have around at all... ;)

Bobby Holley (:bholley)

Comment 20

•

13 years ago

Comment on attachment 673840 [details] [diff] [review] patch Looks good. Thanks for taking care of this. r=bholley Were you able to reproduce this bug? Did this fix it?

Attachment #673840 - Flags: review?(bobbyholley+bmo) → review+

Bobby Holley (:bholley)

Comment 21

•

13 years ago

Oh, also - can you add a comment explaining why we're doing this?

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 22

•

13 years ago

(In reply to Bobby Holley (:bholley) from comment #20) > Were you able to reproduce this bug? Did this fix it? Yes, this happens all the time with https://metrics.mozilla.com/data/ And no, I don't want to add any comments before we have this released.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 23

•

13 years ago

Comment on attachment 673840 [details] [diff] [review] patch [Security approval request comment] How easily can the security issue be deduced from the patch? I'm not even 100% sure there is real security bug. We end up calling stuff with content cx on stack so things which should work, don't. Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem? No Which older supported branches are affected by this flaw? Not ESR10 If not all supported branches, which bug introduced the flaw? Bug 716167 Do you have backports for the affected branches? The patch applies (with some fuzz) to all the branches. How likely is this patch to cause regressions; how much testing does it need? The patch brings back the old behavior which was implemented in Bug 326777

Attachment #673840 - Flags: sec-approval?

Andrew McCreight [:mccr8]

Updated

•

13 years ago

status-firefox-esr10: --- → unaffected

Al Billings [:abillings - ex-MoCo]

Comment 24

•

13 years ago

Comment on attachment 673840 [details] [diff] [review] patch This appears safe enough to check in right now. We should make patches for applicable branches and nominate them.

Attachment #673840 - Flags: sec-approval? → sec-approval+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 25

•

13 years ago

https://hg.mozilla.org/mozilla-central/rev/d3de9d8e2b5b

Status: NEW → RESOLVED

Closed: 13 years ago

Resolution: --- → FIXED

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 26

•

13 years ago

Comment on attachment 673840 [details] [diff] [review] patch [Approval Request Comment] Bug caused by (feature/regressing bug #): bug 716167 User impact if declined: content page may block chrome UI. Possibly security bugs Testing completed (on m-c, etc.): landed m-c Risk to taking this patch (and alternatives if risky): The patch effectively backs out bug 716167 String or UUID changes made by this patch: NA

Attachment #673840 - Flags: approval-mozilla-beta?

Attachment #673840 - Flags: approval-mozilla-aurora?

Alex Keybl [:akeybl]

Comment 27

•

13 years ago

Comment on attachment 673840 [details] [diff] [review] patch [Triage Comment] Let's get this on Aurora given where we are in the cycle, but no need to uplift to Beta since if it is exploited, the user can just stop going to the affected website to avoid DoS. Please re-nominate for approval-mozilla-beta if we got that wrong.

Attachment #673840 - Flags: approval-mozilla-beta?

Attachment #673840 - Flags: approval-mozilla-beta-

Attachment #673840 - Flags: approval-mozilla-aurora?

Attachment #673840 - Flags: approval-mozilla-aurora+

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 28

•

13 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/33c53c4dbf42

status-firefox18: affected → fixed

status-firefox19: affected → fixed

Jesse Ruderman

Updated

•

13 years ago

Blocks: 806026

Alex Keybl [:akeybl]

Updated

•

13 years ago

status-firefox-esr17: --- → wontfix

Al Billings [:abillings - ex-MoCo]

Updated

•

13 years ago

Whiteboard: [adv-main18-]

u279076

Updated

•

13 years ago

Keywords: verifyme

Matt Wobensmith [:mwobensmith][:matt:]

Comment 29

•

13 years ago

I've been trying to verify the fix for this bug. For the broken build, I've used trunk builds from 2012-01-16 to 2012-08-06, on Mac/Win/Linux. I don't see the behavior described here; performance is acceptable and I'm able to use the UI, spawn new tabs, etc., while a histogram is being built. I suppose something could have changed on the web site itself. I have no way of knowing. I'd love advice from Olli or anyone else w/r/t reproducing this. Also, I'd like to ask the original bug filer - and anyone else who can repro - if they'd like to go back and verify that a) the bug still repros on known bad builds/configs, and b) the latest builds no longer hang. Thank you.

Olli Pettay [:smaug][bugs@pettay.fi]

Assignee

Comment 30

•

13 years ago

It is possible that the site has changed. But I don't see much reason to verify this one. The patch fixed an obvious problem.

Matthew N. [:MattN]

Reporter

Comment 31

•

13 years ago

I verified the fix using my STR from comment 0 shortly after it landed. I would guess the page has since switched to async XHR[1] which may not cause the problem. [1] http://pedroalves-bi.blogspot.com/2012/11/making-cdf-calls-asynchronous.html

Status: RESOLVED → VERIFIED

Keywords: verifyme

u279076

Comment 32

•

13 years ago

(In reply to Matthew N. [:MattN] from comment #31) > I verified the fix using my STR from comment 0 shortly after it landed. I > would guess the page has since switched to async XHR[1] which may not cause > the problem. Does that mean this is not verifiable in Firefox 18 and 19?

Whiteboard: [adv-main18-] → [adv-main18-][qa?]

Matthew N. [:MattN]

Reporter

Comment 33

•

13 years ago

(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #32) > (In reply to Matthew N. [:MattN] from comment #31) > > I verified the fix using my STR from comment 0 shortly after it landed. I > > would guess the page has since switched to async XHR[1] which may not cause > > the problem. > > Does that mean this is not verifiable in Firefox 18 and 19? It is verifiable if someone makes or finds a new testcase but I don't know the exact requirements for it. FWIW, I did verify when 19 was on m-c. Olli seems to be fine without further verification so this is probably fine as-is.

Target Milestone: --- → mozilla19

u279076

Comment 34

•

13 years ago

Okay, thanks Matthew. I'm marking this verified for Firefox 19 based on your testing.

status-firefox19: fixed → verified

Keywords: testcase-wanted

Whiteboard: [adv-main18-][qa?] → [adv-main18-][qa-]

Daniel Veditz [:dveditz]

Updated

•

12 years ago

Group: core-security

Ryan VanderMeulen [:RyanVM]

Updated

•

10 years ago

Keywords: testcase-wanted

bt + JS Stack using the telemetry dashboard 13 years ago Matthew N. [:MattN] 24.08 KB, text/plain		Details
patch 13 years ago Olli Pettay [:smaug][bugs@pettay.fi] 5.84 KB, patch	bzbarsky : review+ bholley : review+ akeybl : approval-mozilla-aurora+ akeybl : approval-mozilla-beta- abillings : sec-approval+	Details \| Diff \| Splinter Review