User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8b4) Gecko/20050908 Firefox/1.4

When displaying new SVG items via scripting, an observable performance degradation is experienced if objects with the same opacity are defined in different group elements (<g>) as opposed to the same. For example, displaying

<g opacity="0.3"><circle.../><circle.../></g>

is much faster than

<g opacity="0.3"><circle.../></g>
<g opacity="0.3"><circle.../></g>

However, both are quite equivalent.

Reproducible: Always

Steps to Reproduce:
1. Script mouse events to display two areas. Both areas contain circles
2. In one area, define each circle in its own <g>, specifying a certain opacity on each <g>
3. In the other area, define all circles in one <g>, specifying the same opacity on the <g>

Actual Results: The area with one <g> reacts much faster than the one where each circle is in its own <g>

Expected Results: Both areas, being equivalent, should perform similarly.
Created attachment 197195 [details] Test case In this test case, two areas are defined with 9 circles each. In one area (left), each circle is defined in its own <g> element, which specifies opacity="0.3" In the other area (on the right), only one <g> element is used to define all circles. Mousing over each area displays and hides the circles. The one on the right is markedly faster.
Created attachment 197202 [details] Test Case 2 In this test case, I have included a third area where the opacity is set directly on the elements themselves (far right). This case does not suffer the performance degradation of the area on the left.
I experience more of a delay in having the dots pop up in the first box than the other two on "Test Case 2". Same with first box on "Test case". Am running Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en-US; rv:1.8b4) Gecko/20050915 Firefox/1.4 so recommend a platform/OS change to all. Can't do it myself.
(In reply to comment #0) > For example, displaying > > <g opacity="0.3"><circle.../><circle.../></g> > > is much faster than > > <g opacity="0.3"><circle.../></g> > <g opacity="0.3"><circle.../></g> > > However, both are quite equivalent. I believe you meant to say this the other way around. To clarify, setting the opacity on a group of objects is much slower than setting the same opacity on all those objects individually or, in fact, on the same number of groups containing one object each.
> Both areas, being equivalent, should perform similarly.

They're not equivalent. The visual result may happen to be the same, but that's *only* because your circles happen to not overlap. If your circles did overlap, the visual result would (should) be different. In the single <g> case, you would *not* see any of the lower circles showing through the higher circles. This is because all the circles in the group are painted as a group - one on top of the other without any opacity - before the whole group is painted with the opacity applied to the group as a whole. For the multiple <g> case the opacity is applied to each circle as it is painted, so the lower circles *would* show through the higher circles.

The reason the multiple <g> case is more expensive is that you need separate buffers for each <g> that has an 'opacity' attribute. The rule is, the more 'opacity' attributes you have in your document, the more it's going to hurt. You should think carefully before using this attribute, and make absolutely sure that you can't achieve the same effect using the fill-opacity and stroke-opacity attributes.

Note that the following are also not equivalent:

<circle opacity=".3" .../>
<circle fill-opacity=".3" stroke-opacity=".3" .../>

(Actually they are in Mozilla, but that's a bug. See ASV for the correct rendering.) For the former case the fill shouldn't show through half the stroke (because the fill and stroke are painted as a group before opacity is applied), whereas for the latter case it should (because the stroke is painted with opacity).

This should probably be closed as INVALID, but I'll leave it open for a bit to see if tor wants to comment.
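To make the difference described above concrete, here is a minimal Python sketch of what happens to a single pixel where two circles overlap. It is only an illustration of source-over blending, not Mozilla's or cairo's code: the colours (a red circle below a blue one on a white background) and the helper names `over`, `per_element` and `grouped` are assumptions made up for this example.

```python
# Illustrative only: one pixel in the overlap of two circles, both at opacity 0.3.
# Colours are (r, g, b) tuples in the range 0.0-1.0; the background is opaque.

WHITE = (1.0, 1.0, 1.0)
RED = (1.0, 0.0, 0.0)    # hypothetical colour of the lower circle
BLUE = (0.0, 0.0, 1.0)   # hypothetical colour of the upper circle
OPACITY = 0.3


def over(src, alpha, dst):
    """Source-over blend of colour `src` with coverage `alpha` onto opaque `dst`."""
    return tuple(alpha * s + (1.0 - alpha) * d for s, d in zip(src, dst))


def per_element(background):
    """Each circle sits in its own <g opacity="0.3">: blend them one by one."""
    pixel = over(RED, OPACITY, background)
    return over(BLUE, OPACITY, pixel)


def grouped(background):
    """Both circles share one <g opacity="0.3">: paint opaquely, then blend once."""
    buffer_pixel = over(BLUE, 1.0, RED)  # inside the offscreen buffer blue hides red
    return over(buffer_pixel, OPACITY, background)


print("per element:", tuple(round(c, 2) for c in per_element(WHITE)))
print("grouped:    ", tuple(round(c, 2) for c in grouped(WHITE)))
```

With these assumed colours the per-element case gives roughly (0.7, 0.49, 0.79), so some red shows through, while the grouped case gives roughly (0.7, 0.7, 1.0), a plain tint of blue. Outside the overlap the two cases agree, which is why the original test case looked identical.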
Need to retest in a cairo build once bug 331130 is fixed. Note that I agree that the multiple-opacity case should be a little slower, but it's well over an order of magnitude slower here -- it takes 3-4 seconds to paint while the other one is fractions of a second (all in a non-cairo build due to bug 331130). It's worth profiling this just to figure out why it's so slow. ;)
Jonathan, I stand corrected. Your interpretation that the display is different and not equivalent is accurate. Thank you for the explanation. However, I am mostly concerned about the large difference in performance. I am sorry if the text of the summary is misleading, but I wish this report would stay open until the performance issue is dealt with.
Sure, tor says we are actually doing a bit more work than we should be, so this should stay open for that.
Tested with Firefox 2.0. Performance does not appear different between all the test cases. Must have been fixed somehow.
I'll assume that this was fixed by bug 331130... (If not, reopen and resolve as WORKSFORME instead, unless what actually fixed this is known.)
Bug 331130 wasn't committed against the Firefox 2 branch, so that's not likely it.
tor, can we reopen this bug? On OS X, Opera and Safari load the new attachment in at most 2 seconds. A nightly trunk build of Minefield or Camino took over 25 seconds, with the disturbing whizzing coloured circle of strain present. The attachment has the ids deleted; each section of the code was then copied and pasted until Mozilla was straining, and then the script was deleted.
I'm going to reopen this based on comment 13, since I can reproduce the heavy slowness on trunk. I can also confirm that Safari is really fast. Profiling from the moment the <svg> open tag is parsed to window.onload firing gives me the following top-down tree (percentages are percent total time spent under the function): 81% nsSVGUtils::PaintChildWithEffects 74.4% _moz_cairo_paint_with_alpha 5.9% _moz_cairo_push_group_with_content The paint_with_alpha code ends up in _cairo_quartz_surface_paint calling CGContextDrawImage. Looking from bottom up, the time is split up as: 38.7% CoreGraphics argb32_image_mark 35.4% CoreGraphics argb32_image_argb32 With the caller of argb32_image_argb32 being argb32_image_mark. At first blush, it looks like we're not optimizing the painting here properly or something. We really shouldn't be painting things outside the dirty rect, which is what we're doing now (and relying on the clip rect to maybe help, but it's not helping the _performance_).
Renamed attachment as hang, not near-crash. My OS X system now displays after some minutes, but hangs. Strangely, when viewed through the editor it loads within about 2 seconds.
#15 https://bugzilla.mozilla.org/attachment.cgi?id=279711&action=edit "details" for no-script hang test. also fails to hang.... html wrapper?
Created attachment 293277 [details] hangs on third square
#15,16,17 hang appears dependent on displaying third square: "opacity defined directly on the item" removed other cases
+'ing/P2. Please re-prioritize as needed.
So the only difference between the third and fourth attachment is the addition of the width and height attributes on the root <svg> tag (well, other than some line ending changes). The purpose of the three rects in the test seems to be to compare the difference in performance for the following three cases: * shapes with fill and stroke, each inside their own <g> with a group opacity * shapes with fill and stroke, all in the same <g> with a group opacity * shapes with fill and stroke, each with a group opacity Having the tests in a single file makes it impossible to know which group contributes what and which performs worst. Even worse, the first case contains 73 shapes, the second 168 shapes, the third 762 shapes. In other words this test is no use for comparison. I'll attach my own tests shortly.
Created attachment 304883 [details] 200 shapes with fill and stroke, each inside their own <g> with a group opacity
Created attachment 304884 [details] 200 shapes with fill and stroke, all in the same <g> with a group opacity
So for me the testcase with each shape inside its own <g> with a group opacity and the testcase with each shape having the group opacity set directly both perform the same - poorly. We need to create a separate offscreen buffer for each group/shape and composite all 200 of these buffers, so that's expected. The testcase where the shapes are grouped inside one group performs much better by comparison. We only need one offscreen buffer in this case, so no surprise.
One thing that is noticeable is that the performance is better when the window is smaller. I guess this is because we are creating offscreen buffers the size of the clip area, and therefore we have more compositing to do when this area is large. We could perhaps try to figure out some way to reduce the size of the offscreen buffer, at least for the opacity-on-the-shape case.
What about comment 14?
You mean what you said about dirty rect and clip rect? Well we're in the initial paint here, i.e. under the NS_PAINT case in nsViewManager::DispatchEvent where the dirty rect comes from event->region. This seems to be the entire content area, so that's why the dirty rect for the initial paint doesn't help. As for clip rect, when the group opacity is set on the <g> then the only way we can know the area to clip to is to calculate the bounding box for all descendants. Hence, currently, we again have an offscreen buffer the size of the content area. For the case of the group opacity being set directly on the shape, we could certainly cheaply find a smaller clip rect.
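As a rough sketch of the clip-rect calculation described above: union the bounding boxes of all descendants, then intersect the result with the dirty rect. The Python below is an assumption-laden illustration, not the actual Gecko code. The names `Rect`, `union`, `intersect` and `offscreen_rect` are hypothetical, and it ignores transforms, stroke widths, filters and markers, all of which a real implementation would have to account for.

```python
from typing import Iterable, NamedTuple, Optional


class Rect(NamedTuple):
    """Axis-aligned rectangle in device pixels (hypothetical helper type)."""
    x: float
    y: float
    width: float
    height: float


def union(a: Rect, b: Rect) -> Rect:
    """Smallest rectangle containing both a and b."""
    left = min(a.x, b.x)
    top = min(a.y, b.y)
    right = max(a.x + a.width, b.x + b.width)
    bottom = max(a.y + a.height, b.y + b.height)
    return Rect(left, top, right - left, bottom - top)


def intersect(a: Rect, b: Rect) -> Optional[Rect]:
    """Overlap of a and b, or None when they do not overlap."""
    left = max(a.x, b.x)
    top = max(a.y, b.y)
    right = min(a.x + a.width, b.x + b.width)
    bottom = min(a.y + a.height, b.y + b.height)
    if right <= left or bottom <= top:
        return None
    return Rect(left, top, right - left, bottom - top)


def offscreen_rect(child_boxes: Iterable[Rect], dirty: Rect) -> Optional[Rect]:
    """Area an offscreen buffer would need for a group, or None to skip it."""
    bounds: Optional[Rect] = None
    for box in child_boxes:
        bounds = box if bounds is None else union(bounds, box)
    if bounds is None:
        return None  # empty group: nothing to paint
    return intersect(bounds, dirty)


content_area = Rect(0, 0, 600, 600)
print(offscreen_rect([Rect(10, 10, 6, 6), Rect(30, 40, 6, 6)], content_area))
print(offscreen_rect([Rect(700, 700, 6, 6)], content_area))
```

For a single shape with opacity the loop runs once, which is the cheap case mentioned above. A `None` result means the whole push-group and composite step could be skipped.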
> This seems to be the entire content area Yes, but in my testing, as of when I made that comment, removing content that was _outside_ the content area (which is most of the content in the testcase I was looking at) sped things up significantly.
I see. For the group cases, the offscreen buffer will be created even if all its descendants are outside the clip rect. I bet cairo isn't smart enough to notice that nothing was painted and skip the composite-with-alpha though. For the opacity directly on shape case it's within our control to skip the whole draw operation, but I'm fairly sure we don't. It's 6am right now so I'm a little tired. This might have to wait until after FOSDEM. What interests me is that Opera and Safari are so much faster even when all the shapes are within the clip rect. In the opacity-on-shape case Safari is broken and doesn't do the offscreen buffer thing, but that's beside the point since it's still fast on all the other cases.
Changing blocking1.9+ to wanted1.9+ since, although the perf sucks, I don't think this is sufficiently serious to block the entire Firefox release.
Can someone clarify if the Firefox performance problem with the following file: http://vertex.corpsmoderne.net/main.php?g2_view=core.DownloadItem&g2_itemId=145 is a result of this bug? Performance in Firefox is really bad compared to Opera and Chrome. By manually removing all opacity attributes, performance improves drastically so that led me to this bug - even though the opacity attributes are not being applied to any group elements.
It looks like bug 455984 broke our optimization. :-( I'll file a new bug with a patch for that shortly.
Should add it to Tsvg too, I think, so that we can track it.
I've filed bug 523481 to fix the regression introduced by bug 455984. The performance still isn't great due to some paths having both a fill and a stroke both referencing gradients while at the same time having the 'opacity' attribute set (the other elements now paint a lot faster). This remaining issue is covered by this bug I guess. Jeff: "group opacity" refers to the use of the 'opacity' property (as opposed to the 'fill-opacity' or 'stroke-opacity' properties), regardless of whether it's on a group element of some sort. Mike: yes, I'll be sure to do that.
(In reply to comment #28) > Yes, but in my testing, as of when I made that comment, removing content that > was _outside_ the content area (which is most of the content in the testcase I > was looking at) sped things up significantly. Boris, I'm not sure which test you were using, because the three tests that were attached to the bug as of comment 14 do not seem to have content that lies outside the content area (at least not unless you size your window to be small enough). From my testing trunk builds seem to do a reasonable job of minimizing work for offscreen elements, as does the build from 2007-12-14-04-trunk in fact, so I'm not sure what you were seeing. If you disagree, can you open a new bug for the issue, or point me to the relevant testcase and I will.
Created attachment 407769 [details] profile screenshot I did a little profiling with Shark and confirmed what Boris found. In this screenshot take the 60% to actually be 100% (the PaintFrameWithEffects call is repeated under a different stack elsewhere in the profile taking up another 40% or so with the same ratios under it). So 25% or so of the time is spent under PushGroup, mainly filling new offscreens (with transparent black I assume). 75% is spent under gfxContext::Paint painting with group opacity (which means we're painting through a mask).
I believe as of when I made comment 14 my browser window at the time only showed the <g> commented as "first box" in the source of attachment (and _maybe_ part of the "second box") and that editing the source to remove the elements that were out of view anyway significantly improved the performance. I can try to find time to see whether I can still reproduce that, but not likely for at least several weeks.
The reason the PushGroup and Paint calls are hurting us so badly is because we're creating temporary surfaces that are much bigger than they need to be. We simply create a surface of the same size as the parent surface rather than trying to minimize it to the size of the output of the child tree that's being painted with opacity. Testcases that demonstrate that coming up.
Created attachment 407770 [details] group opacity testcase with 2500 small rects - WATCH OUT! HANGS FOR 30 SECONDS! This testcase has 2500 small, roughly non-overlapping 6x6 squares painted with group opacity. It hangs Firefox for roughly 30 seconds, but Opera and Safari handle it fine. Looking in the debugger, we create offscreen surfaces the size of the SVG document 600x600 rather than the size of the rects 6x6.
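For a sense of scale, the figures in the comment above (2500 rects, 600x600 surfaces versus 6x6 rects) can be turned into a back-of-the-envelope pixel count. The Python below is only arithmetic on those numbers. The 4 bytes per pixel is an assumption based on the argb32 functions seen in the earlier profile, and `offscreen_pixels` is a hypothetical helper. It counts pixels in the temporary surfaces only and says nothing about how many times each pixel is actually touched.

```python
BYTES_PER_PIXEL = 4  # assumption: 32-bit ARGB offscreen surfaces


def offscreen_pixels(shape_count: int, surface_width: int, surface_height: int) -> int:
    """Total pixels across all temporary surfaces, one surface per shape."""
    return shape_count * surface_width * surface_height


# One 600x600 surface per rect (what the debugger showed) ...
full_size = offscreen_pixels(2500, 600, 600)
# ... versus one 6x6 surface per rect (the size of the rect itself).
tight = offscreen_pixels(2500, 6, 6)

print("full-size surfaces:", full_size, "pixels,", full_size * BYTES_PER_PIXEL, "bytes")
print("tight surfaces:    ", tight, "pixels,", tight * BYTES_PER_PIXEL, "bytes")
print("ratio:", full_size // tight)
```

Under these assumptions the full-size approach allocates and composites 900,000,000 pixels over the course of the paint, against 90,000 for tightly fitted surfaces, a factor of 10,000.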
Created attachment 407774 [details] group opacity testcase with 2500 big rects - WATCH OUT! HANGS FOR 30 SECONDS! This testcase also has 2500 rects painted with group opacity, but this time larger ones the size of the SVG canvas to force Opera and Safari to create large offscreen surfaces like Firefox does for the small rect case. On this testcase Firefox, Opera and Safari now all hang for about the same time - 30 seconds or so. Thus it seems like the only thing that's causing us to hurt so badly on group opacity compared to other implementations is the fact that we're not optimizing our offscreen surface size.
(In reply to comment #38) > I believe as of when I made comment 14 my browser window at the time only > showed the <g> commented as "first box" in the source of attachment (and > _maybe_ part of the "second box") and that editing the source to remove the > elements that were out of view anyway significantly improved the performance. > I can try to find time to see whether I can still reproduce that, but not > likely for at least several weeks. Thanks for the info. I can't reproduce a noticeable speedup by using DOM Inspector to delete the second and third boxes when they're offscreen either in current trunk or 2007-12-14-04-trunk, although the perf is poorer in the latter even at the smaller window size. Please file if you do get to testing and can reproduce. Thanks Boris!
After I landed bug 734082, I was able to fix this in bug 766429. We now perform equivalently to Opera on both the "group opacity testcase with 2500 small rects" and "group opacity testcase with 2500 large rects" testcases, the former being vastly sped up.
Oh, and both those tests are being run under Talos already.