Closed Bug 552512 Opened 14 years ago Closed 7 years ago

Windowless plugins can blow up the world during painting

Categories

(Core Graveyard :: Plug-ins, defect)

defect
Not set
normal

Tracking

(blocking2.0 -)

RESOLVED INCOMPLETE
Tracking Status
blocking2.0 --- -

People

(Reporter: roc, Unassigned)

References

Details

Attachments

(2 files, 1 obsolete file)

During painting of windowless plugins, we call synchronously into plugin code. Plugins can respond to this by calling NPN_Evaluate to run arbitrary JS in the page. This will often cause us to crash in exciting ways. I don't have any actual examples of pages that do this, but I've heard they exist (via Webkit folks) and I suspect they might explain some of the crashes in painting we've been seeing.

Asynchronous painting would solve this, because we would not call into plugin code synchronously while painting, so we would not need to process NPN_Evaluate requests during painting.

Another option would be semi-synchronous painting. The idea is that we would paint synchronously but capture the rendered plugin contents into a buffer. In fact I guess we already are doing that with OOPP, since native graphics contexts can't be shared across processes, right? OK, then if during painting something bad happens, like the plugin does an NPN_Evaluate, we simply tell the browser that painting is done and give it the last plugin image buffer to paint. Then in the plugin process, we wait for the browser to asynchronously process the NPN_Evaluate, finish "painting", and tell the browser that now we're really ready to paint. The browser invalidates the plugin area and either we do it all over again or we arrange for the browser to just use the last buffer this time.

Maybe asynchronous painting is actually just as easy as semi-synchronous, though.
Actually what would be very helpful as a first step would be to add code to detect NPN_Evaluate during painting and log it with the URL so that it gets recorded in Breakpad if there's a subsequent crash (URL not for the public server of course). Ted, can we do that?
We can do that, but need socorro changes to make it private which might take a bit.

I'll prepare a patch to completely disable npruntime for testing purposes.
As a first step we could just say that there was an NPN_Evaluate during painting. That seems harmless. Where does the breakpad API for this live?
Hmm, could I just set the URL field?
Perhaps, yeah. It would be overwritten by session-restore at the next navigation, but if we're crashing basically immediately in painting the URL would at least be more accurate.
can I call annotateCrashReport to add a key/value pair, or does that require Socorro changes?

Also, I realized that because Silverlight likes to call into JS on every paint, asynchronous painting is really the only thing that's going to work with Silverlight if we want to prevent it from calling Evaluate during painting.
New key/value pairs won't be saved by Socorro without changes. You can use the "URL" key and it will go into the existing field, which is already non-public. (As bsmedberg says, the existing hooks will overwrite it, but that's probably ok.)
Assignee: nobody → roc
Attachment #432700 - Flags: review?(matspal)
Attachment #432700 - Flags: feedback?(ted.mielczarek)
Hmm. How hard will it be to get the new field saved by Socorro? Should I use appendAppNotesToCrashReport instead?
Attached patch test (obsolete) — Splinter Review
This adds an automated test that changes the DOM during plugin painting. We obviously can't check this in until we have a fix for this bug.
Attachment #432701 - Flags: review?(joshmoz)
(In reply to comment #10)
> Hmm. How hard will it be to get the new field saved by Socorro? Should I use
> appendAppNotesToCrashReport instead?

It's a bit of a pain, it requires DB schema changes, which are kind of complicated by the partitioning of the DB. If you use appendAppNotes, it will show up in the "Notes" field, but every time you call that it will append the data to whatever data we already have in that field. (So if you call it more than once, you'll get repeated data.)
Yes, repeated data would be bad. I suppose I could use a global variable to ensure that the data is only written once. Should I do that?

Is there any chance we can make changes so that adding data like this is easier? Maybe some kind of scratch field that would give us a clue to the cause of a crash that happens "soon"?
Comment on attachment 432700 [details] [diff] [review]
Add DOMChangeDuringPluginPaint annotation to crash reports

In nsCSSFrameConstructor.cpp:
>  nsIXBLService * nsCSSFrameConstructor::gXBLService = nsnull;
> +PRUint32 nsCSSFrameConstructor::gDOMGeneration = 0;

FYI, static members have "static storage duration" and are
zero-intialized before any intializer runs, and thus the
above initializers are redundant.


In nsObjectFrame.cpp:
> +#include "nsICrashReporter.h"

Wrap this include with #ifdef MOZ_CRASHREPORTER
 
> +#ifdef MOZ_CRASHREPORTER
> +    if (mDOMGeneration != nsCSSFrameConstructor::GetDOMGeneration())

Would it be useful to have an assertion for that?
(also when MOZ_CRASHREPORTER is undefined)

>  nsObjectFrame::PaintPlugin(...
>  {
> +  AutoDOMChangeObserver observer(this);

Running this destructor on exit from PaintPlugin will make this code
more crash prone, is that your intention?
(specifically accessing 'mFrame' which might be destroyed)

If so, I think we need to do it differently since what you have
looks like an unsafe crash.
Attachment #432700 - Flags: review?(matspal) → review-
(In reply to comment #14)
> (From update of attachment 432700 [details] [diff] [review])
> In nsCSSFrameConstructor.cpp:
> >  nsIXBLService * nsCSSFrameConstructor::gXBLService = nsnull;
> > +PRUint32 nsCSSFrameConstructor::gDOMGeneration = 0;
> 
> FYI, static members have "static storage duration" and are
> zero-intialized before any intializer runs, and thus the
> above initializers are redundant.

Yes, but I think it's easier to read that way.

> In nsObjectFrame.cpp:
> > +#include "nsICrashReporter.h"
> 
> Wrap this include with #ifdef MOZ_CRASHREPORTER

OK

> > +#ifdef MOZ_CRASHREPORTER
> > +    if (mDOMGeneration != nsCSSFrameConstructor::GetDOMGeneration())
> 
> Would it be useful to have an assertion for that?
> (also when MOZ_CRASHREPORTER is undefined)

Yes, I suppose so.

> >  nsObjectFrame::PaintPlugin(...
> >  {
> > +  AutoDOMChangeObserver observer(this);
> 
> Running this destructor on exit from PaintPlugin will make this code
> more crash prone, is that your intention?
> (specifically accessing 'mFrame' which might be destroyed)
> 
> If so, I think we need to do it differently since what you have
> looks like an unsafe crash.

Good catch. I think I'll stuff the URI in an nsCOMPtr in the constructor.
(In reply to comment #13)
> Is there any chance we can make changes so that adding data like this is
> easier? Maybe some kind of scratch field that would give us a clue to the cause
> of a crash that happens "soon"?

I'm open to suggestions, what would you like? We could have an auto stack class that annotates a value, and removes it from the destructor. Would that work?
That wouldn't work here since the crash will occur "later" when the relevant stuff is not on the stack.

What I'm thinking of is some kind of "Danger" field. When something really bad happens, we just store a note in the Danger field. If that's the last really bad thing to happen before the crash, then we'll know and be able to study those correlations.
Seems like that doesn't do you much good unless you have more than one dangerous place. Even then, you'd wind up with plenty of crashes that went through a dangerous area and crashed in some other way, but with the "Danger" annotation.

My only other suggestion would be to annotate your "Danger" field + a timestamp, and then during processing we could subtract the time of the crash, so you would get "number of seconds since we added the Danger annotation".
Attachment #432700 - Flags: feedback?(ted.mielczarek) → feedback-
Comment on attachment 432700 [details] [diff] [review]
Add DOMChangeDuringPluginPaint annotation to crash reports

I don't think we want the DOMChangeDuringPluginPaint annotation as discussed above. The URL annotation seems fine, it's certainly not going to make things worse.
(In reply to comment #18)
> Seems like that doesn't do you much good unless you have more than one
> dangerous place.

Yes, the idea is to provide a field that's an extension point since Socorro is apparently hard to extend. I think this could be generally useful, there are lots of places (many of the places we have assertions, for example!) where such a note could be a useful guide.

> My only other suggestion would be to annotate your "Danger" field + a
> timestamp, and then during processing we could subtract the time of the crash,
> so you would get "number of seconds since we added the Danger annotation".

That would work.

> The URL annotation seems fine, it's certainly not going to make things worse.

Yeah, but the note about DOM change during paint is what I really want here.

If I produce a patch that uses the Danger field plus a timestamp, will that be OK?
Comment on attachment 432701 [details] [diff] [review]
test

Are you sure you want your test to modify the DOM after the paint? Might it be more evil to do it right before actually painting?
The paintscript script is run *during* painting. That's maximally evil.
OK, I see what you mean now thanks to IRC discussion. Yes, we should move the notifyPaint calls to before the actual platform painting code in the test plugin.
Attachment #432701 - Flags: review?(joshmoz)
Attached patch testcase #1Splinter Review
With that change
Attachment #432701 - Attachment is obsolete: true
Attachment #433380 - Flags: review?(joshmoz)
Attachment #433380 - Flags: review?(joshmoz) → review+
Blocks: 588591
Depends on: 556487
sounds like this is the fix for #2 top crash on 4.0b4. nominating
blocking2.0: --- → ?
the #2-3 crash in 4.0b4 has the signature: 
  nsIFrame::GetOffsetToCrossDoc(nsIFrame const*) 
and is tracked in Bug 588591

http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-09-01%2008%3A00%3A00&signature=nsIFrame%3A%3AGetOffsetToCrossDoc%28nsIFrame%20const*%29&version=Firefox%3A4.0b4

That's slightly different than the signature/stack for the #2-3 top crashe we have been seeing in 3.6.x
 nsIFrame::GetOffsetTo(nsIFrame const*) 

http://crash-stats.mozilla.com/report/list?range_value=2&range_unit=weeks&date=2010-09-01%2008%3A00%3A00&signature=nsIFrame%3A%3AGetOffsetTo%28nsIFrame%20const*%29&version=Firefox%3A3.6.9

same bug with code paths changed around in 4.0bx, or is this regression?
#5 top crash in b5
Bug 593301 - Crash [@ nsPluginInstanceOwner::Paint(tagRECT const&, HDC__*) ] 

is this planned to land on b6?  needs to be blocking final or before.
No, async plugin painting will not land for Windows for beta6.
Blocking, but last I heard bsmedberg was working on this, should this be reassigned, or is the ongoing work being tracked elsewhere (should this be duped)?
blocking2.0: ? → betaN+
Bugs 596451 and 598425 track the work to make OOP windowless plugin painting asynchronous on Windows and Mac respectively. Some extra work would be required to make in-process windowless plugin painting asynchronous.
Assignee: roc → nobody
Depends on: 596451, 598425
Not a regression, but better on Windows with async, and will be better on mac soon.
blocking2.0: betaN+ → -
No longer blocks: 588591
Blocks: 588591
Resolving old bugs which are likely not relevant any more, since NPAPI plugins are deprecated.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: