Closed Bug 449129 Opened 16 years ago Closed 14 years ago
"ASSERTION: constructing frames in the middle of a paint" with flash slow script dialog [OOPP version]
9.85 KB, text/plain
5.66 KB, patch
|Details | Diff | Splinter Review|
6.33 KB, patch
|Details | Diff | Splinter Review|
7.03 KB, patch
|Details | Diff | Splinter Review|
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1a2pre) Gecko/2008080416 Minefield/3.1a2pre ###!!! ASSERTION: constructing frames in the middle of a paint: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_Paint] == 0', file ../../dist/include/layout/nsPresContext.h, line 998 I don't have a reduced testcase like for bug 434894. Hopefully the stack trace makes it clear what the bug is.
Um, fun. We're telling flash to pain, and it decides to open a modal dialog while painting, which in turn lets us process other events so we run a JS timeout and who knows what'll happen. I don't know that there's all that much we can do about this. Anyone have thoughts?
Should we treat this is a bug in Flash? Could we just refuse to process some events if there's a paint in progress?
It's bad, probably exploitable. Maybe we could add something to block nsAppShell::ProcessGeckoEvents from handling Gecko events. Maybe it could just be a feature of the script-blocker, since any time we enter the Gecko event loop there is the possibility of running script.
Jesse, is there a test case for this?
No. It might be possible to construct one using the Flash in bug 434894, but if I didn't do it in comment 0 that probably means I thought it would be hard.
But how did you get the stack trace?
Some kind of fuzz-testing + after-the-fact use of fix-macosx-stack.pl
What should we do with this bug? If we're not going to fix it, let's WONTFIX it and make sure Adobe's in the loop. If we're going to play with event suppression or whatever, let's get that plan in here. (I recommend the former, since we're going to have this much more rational once we get OOP for Flash.)
This sound very timing related when the slow script dialog is encountered. Difficult to time this and to reproduce if we even could. I am open to suggestions.
You can't figure out the bug based on the stack trace? :(
Maybe you could look at bug 434894? That might be the same bug and actually has a testcase.
Charles, were you able to take a look at bug 434894 to determine if it's the same bug (or related)?
Sorry I have not had time yet.
Benjamin, here's another plugin pain re-entrancy bug.
In the OOPP case, I think we can just kill off the plugin if it starts to spin the event loop during painting. To fix this in-process, we could set a global "don't process events" flag, and simply throw away all Windows events which are dispatched during painting activities. Or we could abort/crash, which might be just as good here.
I bet this assertion fires due to the same problem that's described in bug 535651.
With our current tools I would need to know what version of the Player was running at the time the stack track was made and what the base address is. Its the only way I can stitch the symbols back. Michelle any thoughts on this bug?
Bug 434894 seems to not happen during painting so it's harder to blame the plugin there.
I'm not doing this kind of fuzzing any more, so I'm not sure I can reproduce with a known version of Flash :( But it was probably the version of Flash that Apple was shipping with Mac OS X 10.5 when I reported the bug.
I'll post this to another bug for checkin.
Comment on attachment 438603 [details] [diff] [review] proposal Based on the stack trace I don't see how this helps with anything. Flash opens some dialog (native?) at evil time and that spins gecko event loop. nsAutoScriptBlocker doesn't prevent that.
Attachment #438603 - Flags: feedback?(Olli.Pettay) → feedback-
my hope is that it would block the js from nsGlobalWindow::RunTimeout. It obviously fails if we go around and paint the window anyway. What we really need to do is have a paintblocker that works like the script blocker, but...
(In reply to comment #23) > my hope is that it would block the js from nsGlobalWindow::RunTimeout. Well, what if there is a synchronous XHR in the stack? nsAutoScriptBlocker wouldn't really help with handling it.
i filed bug 558938 against flash with what i think is a relatively correct list of things we don't want flash to do.
Per critsmash meeting today, I'm going to implement comment #16: starting with OOPP, we'll abort the plugin process if a plugin tries to spin an event loop during painting. If this goes smoothly, we will also abort the Firefox process if the plugin spins an event loop during painting. This will be a second step because we're concerned to figure out how common this problem is before actually deploying it. We'll get the OOPP crash reports first, which will help identify the frequency of the problem.
The concern here is more than spinning the event loop right? I.e. plugins that call back to the page to get properties which can potentially be implemented using web-page-js-getters. Can we abort there too?
That's a different bug, and can be solved more simply by simply making npruntime calls fail during painting. Once the plugin spins an event loop, however, there's little hope of recovery.
Whiteboard: [sg:critical?] → [sg:critical?][ETA 7-May]
Is this a Mac-only problem (i.e., not Windows or Linux)? Also, could someone please grant me access to bug 535651, if you think that is still related? Thanks.
bug 535651 is windows but we can't reproduce that one now.
Whiteboard: [sg:critical?][ETA 7-May] → [sg:critical?][ETA 7-May][critsmash:investigating]
Whiteboard: [sg:critical?][ETA 7-May][critsmash:investigating] → [sg:critical?][ETA 7-May][critsmash:investigating][plan in comment 26]
I can start analyzing this from the Flash Player perspective in a month or two. Sorry for the delay.
Can we get an update here?
I will get to this bug once there are no more 3.6.4 blockers.
Benjamin, can you update the ETA (we're way past May 9th now)?
In looking through the Flash Player codebase, I've found at least one place where we are running script during a paint event. We are looking into plugging this hole.
script can do things like change the position of the plugin or cause it to be destroyed. This in turn changes the painting structures which are on the stack for the paint.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
This is the patch as checked in, I had to fix some includes and build goop.
Argh, my carelessness shows through: http://hg.mozilla.org/mozilla-central/rev/e3787b57b3ab This is pretty hard to test, but I'm working on it.
This is the more-scary bit: I'd like to go ahead and land this on trunk for 4.0 so we can get crash data from the betas, but probably not land this on branches for a while because of the scariness.
Attachment #453403 - Flags: review?(roc)
Attachment #453403 - Flags: review?(roc) → review+
I think we'll wait on this for after 3.6.6/3.5.11
This bug was resolved FIXED in comment 40, but then sprouted a new reviewed patch that you said you wanted in 4.0 in comment 43. Has that been forgotten because this bug is resolved? Should it be moved to a new bug?
Also, how does the crash data look on trunk? Do we even need in-process changes as this was written against flash and it is oop on trunk and the branches?
The reviewed patch doesn't pass tryserver on WinXP, and I didn't bother to figure out why (something about synchronous WM_GETICON messages in the event loop).
Is the extra patch _necessary_? Are we stuck with an incomplete fix, or is the extra patch something that can be put into a different bug?
That patch is necessary if we're running any plugins in-process, which we will on Mac 10.5 for its supported life.
Comment on attachment 453374 [details] [diff] [review] Patch with corrected build-goop, rev. 1.1 We can't take this on the branches until it passes try and the other patch is done. Clearing approval.
We should take the OOPP patch as soon as possible. It fixes Flash and Silverlight on Windows, which significantly reduces our attack surface. The other patch is necessary for in-process, but they don't have to land together.
With this fix, what will happen when a plug-in calls script during a paint event? Thanks.
It will be terminated and appear to the user as a plugin crash.
Note, however, that this only affects the case where the plugin either shows the slow-script dialog or spins a nested event loop. It doesn't cover the case where it just uses npruntime scripting and returns quickly.
Created bug 591409 to track the in-process patch from comment 43 so it doesn't get lost due to this bug being RESOLVED FIXED.
blocking1.9.1: --- → needed
blocking1.9.2: --- → needed
Comment on attachment 453374 [details] [diff] [review] Patch with corrected build-goop, rev. 1.1 Clearing old 22.214.171.124 approval. Unsure about .11 -- does landing this fix give away the problem of bug 591409, which we don't seem to be ready to land? Are we seeing these aborts in FF4 betas at all? Should be a trackable signature right?
Comment on attachment 453374 [details] [diff] [review] Patch with corrected build-goop, rev. 1.1 I'm not up for taking this in 126.96.36.199, it'll have to wait for a future update.
Attachment #453374 - Flags: approval188.8.131.52? → approval184.108.40.206-
Whiteboard: [sg:critical?][ETA 7-May][critsmash:investigating][plan in comment 26] → [sg:critical?][need answer from bsmedberg to 2 questions in comment 56]
Comment on attachment 453374 [details] [diff] [review] Patch with corrected build-goop, rev. 1.1 I don't know the answer to either of these questions. Aborts should show up in the crash stats, but I don't know what signature, and we can't search by the entire stack (RPCChannel::SpinInternalEventLoop would be near the top of the stack). And I certainly can't guess whether we'd be "giving away" anything that we haven't already given away on trunk. I guess the branch commits are watched more closely, though...
Attachment #453374 - Flags: approval220.127.116.11?
I'm cobbling together a hacky way to search though a sample of stacks of the top 300 signatures of any given release. I'm looking for RPCChannel::SpinInternalEventLoop in 4.0b6 crashes. Nothing found yet. The closest to anything related to RPCChannel is Signature: StopRequest mozilla::ipc::RPCChannel::DispatchIncall mozilla::ipc::RPCChannel::Incall mozilla::ipc::RPCChannel::OnMaybeDequeueOne like in http://crash-stats.mozilla.com/report/index/84a3ad08-6078-4faa-a6cf-6b3ed2101011 and Signature: js::DeepBail(JSContext*) mozilla::ipc::RPCChannel::DispatchIncall(IPC::Message const mozilla::ipc::RPCChannel::Incall(IPC::Message const mozilla::ipc::RPCChannel::Call(IPC::Message *,IPC::Message *) like in http://crash-stats.mozilla.com/report/index/3b8e972a-b656-49e6-8ec6-7dc132101012 the search across 4.0b6 stacks is still running and I'll run against a sample from 3.6.10 and update if I find anything.
not much more in the search against 3.6.10 stacks In signature: mozilla::plugins::PPluginInstanceParent::CallPBrowserStreamConstructor(mozilla::plugins::PBrowserStreamParent*, nsCString const&, unsigned int const&, unsigned int const&, mozilla::plugins::PStreamNotifyParent*, nsCString const&, nsCString const&, bool... I found mozilla::ipc::RPCChannel::Call like in the hang http://crash-stats.mozilla.com/report/index/1b5383f7-d49b-419b-9d24-d347a2101013 and http://crash-stats.mozilla.com/report/index/27d520ca-8273-4163-bbfc-c2cc12101012 and # 129 Signature: mozilla::ipc::RPCChannel::EnteredCxxStack() mozilla::ipc::RPCChannel::EnteredCxxStack mozilla::ipc::RPCChannel::CxxStackFrame::CxxStackFrame mozilla::ipc::RPCChannel::Call maybe something in variation of crash [@ rasadhlp.dll@0x341f ] but I haven't found the particular report where I happend to stumble onto 0|5|xul.dll|mozilla::ipc::RPCChannel::WaitForNotify() 0|6|xul.dll|mozilla::ipc::RPCChannel::Call(IPC::Message *,IPC::Message *) 238.._PR_MD_RECV 0|5|xul.dll|mozilla::ipc::RPCChannel::WaitForNotify() 0|6|xul.dll|mozilla::ipc::RPCChannel::Call(IPC::Message *,IPC::Message *) many variations of the stack for this signature... need to change the script to to keep track of the reports that I'm using
I'm more interested in 4.0 crashes as this fix is there. I'm concerned that we will take this fix and introduce oopsies where previously it continued on (insecurely) but to the end-user appeared to work (not sure if that's even possible though). If there are really no crashes in 4.0 after this landed, does it mean that basically no one is hitting the insecure code to trigger and oopsie, and as such this is only really fuzzer-generated at this point and doesn't happen in the real world (yet)?
I'm really not concerned about introducing a new crash in 3.6.x... the user would *already* be experiencing the flash slow-script dialog, which makes the page already pretty useless and will probably crash very quickly anyway.
Is it correct that this bug fix is only needed in the case where there is no asynchronous painting? Which Firefox release will have that change?
Asynchronous painting works around the problem, yes. Asynchronous painting is currently being worked on and has landed on Linux for Firefox 4, but we're not sure when mac and windows will be ready.
Comment on attachment 453374 [details] [diff] [review] Patch with corrected build-goop, rev. 1.1 a=LegNeato for 18.104.22.168. Please land this on mozilla-1.9.2 default.
Attachment #453374 - Flags: approval22.214.171.124? → approval126.96.36.199+
err, I mean 188.8.131.52.
blocking1.9.2: needed → .12+
Whiteboard: [sg:critical?][need answer from bsmedberg to 2 questions in comment 56] → [sg:critical?]
Whiteboard: [sg:critical?] → [sg:critical?] don't un-hide until bug 591409 is fixed (non-OOPP part)
If bug 591409 covers the in-process version of this bug then 1.9.1 is "unaffected" by the OOPP version, at least in a bookkeeping sense.
You need to log in before you can comment on or make changes to this bug.