Closed Bug 638241 Opened 9 years ago Closed 9 years ago

DoDrawImageSecurityCheck is 10% of the profile in FishIE Tank

Categories

(Core :: Canvas: 2D, defect)

x86
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla5
Tracking Status
blocking2.0 --- -

People

(Reporter: roc, Assigned: roc)

References

()

Details

(Whiteboard: [fx4-rc-ridealong])

Attachments

(3 files, 2 obsolete files)

My xperf profile is a bit messy, perhaps due to JM stacks, but it appears to show 10% of our main thread in CanvasUtils::DoDrawImageSecurityCheck. That sounds dumb.
Possibly, yes.  Security checks are slow, sadly.

We could try to have some sort of cache, or could try to rewrite security checks to be faster, I guess...  We can't just get rid of the check.
In fact I'm seeing even more time spent in nsLayoutUtils::SurfaceFromElement; some in nsXPConnect::Push (needed due to bug 604262, which can be fixed in another way), plus a bunch of overhead doing QIs and examining images.

So, I'm thinking of adding to nsHTMLCanvasElement a caching map from image elements to their gfxASurfaces. The cache only contains elements that have been drawn to the canvas, so if we find a hit in the cache, we don't need to do another security check --- if the canvas needs to be write-only, it already will be. We can make this a time-based cache. Also, the cache only contains nsHTMLImageElements --- the other elements are likely to change, which makes caching harder and less useful. We do need to make sure that the image-request for the image element hasn't changed since the last time we put it in the cache.
> In fact I'm seeing even more time spent in nsLayoutUtils::SurfaceFromElement;

Yes; I've seen that come up quite a bit in profiles...

The cache thing will help for cases when the same image is banged into the canvas over and over again, yeah.  Apart from silly things like fishtank, is this common?  I guess for game stuff it might be.

The other thing that would help with the security check but not SurfaceFromElement is to store in the image node itself a lazily computed boolean that indicates whether it's same-origin with its image data.  Then we could check that boolean and do a pointer compare on the two NodePrincipal()s.  But hten cross-frame access would still be slow.
(In reply to comment #3)
> > In fact I'm seeing even more time spent in nsLayoutUtils::SurfaceFromElement;
> 
> Yes; I've seen that come up quite a bit in profiles...
> 
> The cache thing will help for cases when the same image is banged into the
> canvas over and over again, yeah.  Apart from silly things like fishtank, is
> this common?  I guess for game stuff it might be.

I think any canvas usage that a) draws images and b) is dynamic will end up redrawing the same image many times to the same canvas. Games certainly, and they're important, but anything involving images and animation or interaction.

> The other thing that would help with the security check but not
> SurfaceFromElement is to store in the image node itself a lazily computed
> boolean that indicates whether it's same-origin with its image data.  Then we
> could check that boolean and do a pointer compare on the two NodePrincipal()s. 
> But hten cross-frame access would still be slow.

That's a good idea, but hopefully my cache will subsume it.
Yeah.  We should add noscript notxpcom getters on images to get the imgIRequest quickly; then we can store a pointer to that in the cache and do a fast compare.
Attached patch Part 2: cache (obsolete) — Splinter Review
Attachment #516497 - Flags: review?(bzbarsky)
With these patches, I go from 40fps to 60fps on FishIE tank with 1000 fish. Yeehaw!
Guys, check this out!

Boy I'd like to take these patches for Firefox 4!
Comment on attachment 516496 [details] [diff] [review]
Part 1: remove useless QI

Please don't lose the curlies, and r=me
Attachment #516496 - Flags: review?(bzbarsky) → review+
Comment on attachment 516497 [details] [diff] [review]
Part 2: cache

>+++ b/content/canvas/public/CanvasImageCache.h

I assume this is in public/ and exported just so you can include it in the layoutmodule stuff?  I'd rather just add content/canvas/src to the LOCAL_INCLUDES for layoutmodule if it's not there already and not export this header.

>+++ b/content/canvas/src/CanvasImageCache.cpp
>+  nsRefPtr<nsIDOMElement> mImage;

nsCOMPtr?  Not like it really matters much....

The rest of this looks good.  Can you file a followup (on me if desired) about having a non-COM version of GetRequest?
Attachment #516497 - Flags: review?(bzbarsky) → review+
The profile is still messy. After taking those patches, identifiable hotspots under drawImage are
-- 4% in nsCanvasRenderingContext2D::Redraw. The first 100 updates trigger Invalidate() calls. We can probably make this go away by tweaking our policy: if there more than 100 redraws between paints in the last cycle, Invalidate the whole canvas at the first next redraw instead of waiting for 100 redraws.
-- 2.5% in CanvasImageCache::Lookup. Of that, 2% is in the QI to nsIImageLoadingContent to check the current request!
-- Lots of random cairo and D2D stuff
-- Lots of overhead in various quickstub stuff, around calls to drawImage/save/restore/translate/scale/transform
-- 8% in actual window painting. We could probably reduce that to about 1% if we make canvas participate in empty transactions. Worth doing.

So I reckon we could knock off another 10% fairly easily, but beyond that is going to require improving quickstubs or cairo/D2D.
For the record, with these patches both FF4 and IE9 RC1 hit 60fps for 1000 fish. Hacking the test to support 2000 fish, I get about 34fps for FF4 and 38fps for IE9. Awfully tempting to try to close the gap there!

(I tried Chrome 11 but enabling acceleration didn't seem to work.)
BTW I think what happened here is that we used to be about as fast on FishIE as IE9 beta, then IE9 RC1 seems to have gotten faster and we regressed due to the fix for bug 604262 and maybe other things.
With various simple fixes I'm up to 37fps (2000 fish). Empty transactions for canvas would put us over the top.

I note the test has a useless ctx.save/restore pair.

I also note that in my current profiles we spend 10% of the profile in xpc_qsUnwrap*. Despite the fact that our save/restore is pretty inefficient (we save and restore the cairo context *and* maintain our own state stack in nsCanvasRenderingContext2D), for save(), restore(), translate(), scale() and transform(), qs overhead is about the same as the cost of the actual functionality.

I also note that we spend 10% of the profile in "?!? <itself>"...
(In reply to comment #19)
> I also note that we spend 10% of the profile in "?!? <itself>"...

Presumably this is JIT generated code?
> Of that, 2% is in the QI to nsIImageLoadingContent to check the current request!

Yeah, that part sort of sucks.  We should think about a better way to deal with it...

> -- Lots of overhead in various quickstub stuff, around calls to
> drawImage/save/restore/translate/scale/transform

We're hoping to make that better when we redo the DOM bindings.  Having a testcase just banging on that stuff would be pretty useful for that work.

> I also note that in my current profiles we spend 10% of the profile in
> xpc_qsUnwrap*

How much of that is arg and how much is this?
I can confirm the performance boost in FishIETank.

FishIETank:
Fx4 beta 12: 32fps
try server build: 42fps
IE9: 46fps

Speed Reading:
Fx4 beta 12: 48 seconds
try server build: 24 seconds!!! 2x improvement!
(In reply to comment #22)
> I can confirm the performance boost in FishIETank.
> 
> FishIETank:
> Fx4 beta 12: 32fps
> try server build: 42fps
> IE9: 46fps
> 
> Speed Reading:
> Fx4 beta 12: 48 seconds
> try server build: 24 seconds!!! 2x improvement!

There should be a patch on the speedreading parity bug somewhere from me that gives Speedreading an additional big boost.
Put in on trunk after FF4 branched and lets make FF5 rock.
(In reply to comment #24)
> Put in on trunk after FF4 branched and lets make FF5 rock.

Not so fast. Is .x out? How about picking up the fix on a respun RC? This is a competitive issue. Where's therisk analysis?

/be
This changes security checks in canvas. I am entirely uncomfortable with landing this in a .x release. It should just wait for 5.
Comment on attachment 516510 [details] [diff] [review]
Part 1 updated

Sorry, we're closed down hard for 2.0.
Attachment #516510 - Flags: approval2.0? → approval2.0-
Comment on attachment 516513 [details] [diff] [review]
Part 2 updated

Asking approval for these patches to land as a ridealong in any respin, or failing that, the next minor update. I do not want to respin just for these patches.

The patches give a large performance improvement in drawImage-heavy canvas benchmarks (see numbers above) --- benchmarks that Microsoft IE9 marketing is heavily focused on, as are tech bloggers.

The risk is relatively low. The new code is isolated and in a well-understood area.
Attachment #516513 - Flags: approval2.0?
Oops, didn't see the comments here.
Normally I would agree with comments #24 and #26 but right now and in the next few weeks as FF4 and IE9 roll out we are going to be looking at stuff like this:
http://downloadsquad.switched.com/2011/03/01/opera-enables-hardware-acceleration-matches-ie9-beats-firefox/
Plus a lot more focused marketing from Microsoft itself. I think these patches will have a significant impact on how FF4 is perceived.
blocking2.0: --- → ?
Sure, we can catch up and pass IE9 in a dot release or FF5 but if the story is already "IE9 wins" then it's really hard to change that perception.
Asking for .x in addition/alternative to the approval. I think this is an important marketing piece, at the same time we should absolutely not do anything that has any risk what so ever, except if we have to respin anyway.
(In reply to comment #26)
> This changes security checks in canvas. I am entirely uncomfortable with
> landing this in a .x release. It should just wait for 5.

The security checks only protect against cross-origin image data information leaks. If they fail, they're not exploitable in the usual sense.

Also, if this patch introduces a failure mode for canvas tainting then it's probably a very subtle one and we might well not find it in a normal release cycle anyway. More effective to have more eyes review it --- perhaps someone familiar with imagelib, like you :-).
Are both parts needed?
In order for me to even consider thinking about changing my mind, we will have to have significant testing (and by testing I mean "trying to break this") on cross-origin drawImage. If there aren't automated tests for that, they need to be written as part of any patch that gets landed for this.

(And that's the case regardless of what branch this lands on.)
I am not too worried about the security implications here, I am much more worried about a lot of new code. Part 1 looks like a ride-along if there is a respin, part 2 scares me. I would love to have this though.
(In reply to comment #34)
> Are both parts needed?

The first part is less significant, but it's a true no-brainer. We're QIing to nsINode, which is a static superclass of the type we're QIing from!
(In reply to comment #35)
> In order for me to even consider thinking about changing my mind, we will have
> to have significant testing (and by testing I mean "trying to break this") on
> cross-origin drawImage. If there aren't automated tests for that, they need to
> be written as part of any patch that gets landed for this.

test_bug397524 in test_canvas.html tests that basic cross-origin loads and loads redirected cross-origin clear the origin-clean flag.
Whats the perf impact of part1 and part2?
Comment on attachment 516510 [details] [diff] [review]
Part 1 updated

This is really, unbelievably safe.
Attachment #516510 - Flags: review?(joe)
Attachment #516510 - Flags: approval2.0?
Attachment #516510 - Flags: approval2.0-
Almost all the perf impact is in part2. part1 is actually bypassed by the cache in part2. We can drop part1 if it's just confusing things (it is, however, extraordinarily safe).
(In reply to comment #35)
> In order for me to even consider thinking about changing my mind, we will have
> to have significant testing (and by testing I mean "trying to break this")

I've already thought about how to break this and haven't come up with anything (obviously, or I would have done things differently :-) ). So it'd be best to have other people trying to break it.

I will add a mochitest to verify that if we draw a same-origin <img>, then change the source to be cross-origin and draw the same <img> again, we clear the origin-clean flag (and draw the new image, not the old image). The patch handles that by storing the element's imgIRequest (with a strong ref!) and checking that it hasn't changed before we use that entry during cache lookup.

I'm also going to investigate if changes to document.domain might affect any of this.
Comment on attachment 516513 [details] [diff] [review]
Part 2 updated

a lot of it is whitespace/indent changes for the cache-miss case.

some indentation trivia in the Makefiles

cache logic looks sound, I presume it's been tested to not leak all canvases or whatever

canvas access is only on the main thread, so no worries about cache-access races.

does this affect mobile? do they still use canvases in their scrolling whatzit?

r+, but I'm not a peer, so take it for what it's worth.
Attachment #516513 - Flags: review?(shaver) → review+
OK, so this testcase shows that my patch changes behavior with respect to document.domain changes:
http://people.mozilla.org/~roc/taint-after-setting-domain.html
After changing the domain to mozilla.org, it can still draw the image from people.mozilla.org without tainting the canvas, which is wrong.

This isn't a security bug, since it doesn't let the page do anything it couldn't already do, but I'll fix it anyway.
(In reply to comment #43)
> some indentation trivia in the Makefiles

Will fix.

> cache logic looks sound, I presume it's been tested to not leak all canvases or
> whatever

Yes.

> does this affect mobile? do they still use canvases in their scrolling whatzit?

No.
(In reply to comment #44)
> OK, so this testcase shows that my patch changes behavior with respect to
> document.domain changes:
> http://people.mozilla.org/~roc/taint-after-setting-domain.html
> After changing the domain to mozilla.org, it can still draw the image from
> people.mozilla.org without tainting the canvas, which is wrong.

Scrub that, the new behavior is fine. http://www.w3.org/TR/html5/the-canvas-element.html says that the origin-clean check is based on the document's origin,  but setting document.domain only changes the effective script origin, NOT the document origin. IE9, Chrome and Opera all behave the same as with my patch.

Nothing to see here, move along.
(I guess the trunk behavior of that testcase is a bug that we need to fix by introducing a distinction between 'origin' and 'effective script origin' in our code.)
Comment on attachment 516510 [details] [diff] [review]
Part 1 updated

Obviously correct.
Attachment #516510 - Flags: review?(joe) → review+
In fact, trunk currently allows a page that changes document.domain to get the image pixels of an image in the new domain, which the spec does not allow, although I don't know if that's really bad.

So the logic for why the patch is correct for the origin-clean flag goes like this:

1) A given imgIRequest will consistently return the same principal when we call it after have decoded the first frame of the image. (I haven't actually looked at code to verify this, but I can't think of a situation where it wouldn't be true.)

2) If CanvasImageCache returns a miss we do the canvas security check just as on trunk, so we'll set the origin-clean flag if necessary.

3) Otherwise, CanvasImageCache returns a hit.

3.1) If the canvas origin-clean flag is currently clear, then it doesn't matter whether we clear it again or not.

3.2) Otherwise the canvas origin-clean flag is currently set. Since the origin-clean flag can never be explicitly set, it must never have been cleared on the element. Since the CanvasImageCache contains an entry for this image request and canvas element, the first frame of the image for this image request must have been previously painted to the canvas, without clearing the origin-clean flag. Therefore, at that time the principal for the image is "the same" as the canvas element's principal.

3.3) Per #1, the image request principal for the current paint is equal to the image request principal for the previous paint. Therefore the image request principal for the current paint is "the same" as what the canvas element's principal *used to be*.

3.4) Now per HTML5 the canvas element's origin can't change so we would be guaranteed that the image request principal for the current paint is "the same" as the current canvas element principal. But we don't do that. HOWEVER, we *do* know that the image request's current principal *could have been* rendered to a canvas by the page without setting the origin-clean flag, since it happened before. So the page could have already obtained the pixel data before setting document.domain (or changing its principal in some other unknown way). So even if we don't clear the origin flag when we should, we're not giving the page the ability to do anything it couldn't have done another way.
Comment on attachment 516513 [details] [diff] [review]
Part 2 updated

This looks correct to me.

Things I considered:
 - Image principals start out as null and change to the correct principal in  OnStartRequest. However, we simply ignore images that haven't finished loading, so this doesn't hurt us.
 - Image principals stay otherwise unchanged from within imagelib, though script or other code can change the principal on any given Image js element. I don't know enough about how & when principals change to say if that can cause problems.
 - nsLayoutUtils is not exported from libxul, so no external addons can depend on it and we can change its return values (eg SurfaceFromElementResult) willy-nilly.

>diff --git a/layout/build/Makefile.in b/layout/build/Makefile.in
>--- a/layout/build/Makefile.in
>+++ b/layout/build/Makefile.in
>@@ -317,16 +317,17 @@ include $(topsrcdir)/config/rules.mk
> LOCAL_INCLUDES	+= -I$(srcdir)/../base \
> 		   -I$(srcdir)/../generic \
> 		   -I$(srcdir)/../forms \
> 		   -I$(srcdir)/../tables \
> 		   -I$(srcdir)/../style \
> 		   -I$(srcdir)/../xul/content/src \
> 		   -I$(srcdir)/../xul/base/src \
> 		   -I$(topsrcdir)/content/base/src \
>+           -I$(topsrcdir)/content/canvas/src \
> 		   -I$(topsrcdir)/content/html/content/src \
> 		   -I$(topsrcdir)/content/html/document/src \
> 		   -I$(topsrcdir)/content/html/style/src \
> 		   -I$(topsrcdir)/content/xslt/src/base \
> 		   -I$(topsrcdir)/content/xslt/src/xml \
> 		   -I$(topsrcdir)/content/xslt/src/xpath \
> 		   -I$(topsrcdir)/content/xslt/src/xslt \
> 		   -I$(topsrcdir)/content/xul/content/src \

Can you fix the indentation there?
Attachment #516513 - Flags: review?(joe) → review+
There's also a much simpler argument:

When CanvasImageCache returns a hit and the canvas element is currently origin-clean, it returns a gfxASurface containing data that was previously painted into the canvas without clearing the origin-clean flag. Therefore the page already could have obtained the pixel data in gfxASurface. Therefore painting the same data from the same surface again without clearing the origin-clean flag is not going to give the page access to anything it couldn't already have.

This argument relies on the contents of the gfxASurface that we obtained for the first frame of the image never changing. I'm pretty sure that's true but not 100% sure.

This argument implies that the GetImageRequest check is not needed for security. It's still needed for correctness though since we don't want to paint the old image if we've changed the <img src> between drawImages.
blocking2.0: ? → .x+
Whiteboard: [fx4-rc-ridealong]
Comment on attachment 516510 [details] [diff] [review]
Part 1 updated

Removing approval requests; we'll consider this if we do a second RC.
Attachment #516510 - Flags: approval2.0? → approval2.0-
Attachment #516513 - Flags: approval2.0? → approval2.0-
Comment on attachment 516513 [details] [diff] [review]
Part 2 updated

Do we have tests for using drawImage with animated images?
(Seems like CanvasImageCache::Lookup does handle that case)
I'm going to write such a test tomorrow. Thanks Olli!
Actually test_canvas.html already has a test to verify that the first frame of an animated GIF is drawn.
This adds a test to verify that painting an <img>, changing its src attribute and then painting the <img> again paints the new image.
(In reply to comment #21)
> We're hoping to make that better when we redo the DOM bindings.  Having a
> testcase just banging on that stuff would be pretty useful for that work.

Do you have any more information on the work happening here?
Bug 622298 is the tracker.  Past that, things are still in a design stage; we hope to have some code soonish, though.
Comment on attachment 516513 [details] [diff] [review]
Part 2 updated

In SurfaceFromElement you could remove the QI to nsINode, remove the null-checks and use |canvas| instead in the one place that it's actually used (node->NodePrincipal()).
Actually, I guess you can't since we use that check to make sure we've got a real element, not a fake JS object.
Comment on attachment 516510 [details] [diff] [review]
Part 1 updated

Re-requesting approval to land these performance patches on mozilla-central. I expect them to be relatively less risky for mobile.
Attachment #516510 - Flags: approval2.0- → approval2.0?
Comment on attachment 516513 [details] [diff] [review]
Part 2 updated

Re-requesting approval to land these performance patches on mozilla-central. I expect them to be relatively less risky for mobile.
Attachment #516513 - Flags: approval2.0- → approval2.0?
Attachment #516513 - Flags: review?(jmuizelaar) → review+
Is this ready to go?  Also, which patches should land here?
Firefox 4 seems to be doing fine without this, and Firefox 5 will be that much better. release-drivers aren't seeing a compelling case for landing this in the mozilla2.0 repo so not blocking. Please make the case and renominate if we've missed something that tips the scales.
blocking2.0: .x+ → -
Attachment #516510 - Flags: approval2.0? → approval2.0-
Attachment #516513 - Flags: approval2.0? → approval2.0-
You need to log in before you can comment on or make changes to this bug.