Run the Full Render benchmark on this site: http://nontroppo.org/timer/progressive_raytracer.html Be prepared for Firefox to more or less freeze for up to 10 mins on a good CPU. The latest snapshot of Opera runs it about 10x faster (12x faster when I'm not using a clean profile for Firefox). Something is clearly going very wrong with Firefox rendering. I don't think it matters if other browsers are running a little faster, but this is clearly indicative of something that needs fixing.
Component: General → General
Product: Firefox → Core
QA Contact: general → general
Assignee: nobody → general
QA Contact: general → general
It was discussed on IRC that this might be due to the DOM tree rebuilding every time it does a pass.
Also discussed it might be the way Firefox renders in the first place, using Continuations as discussed by ROC: http://weblogs.mozillazine.org/roc/archives/2007/10/if_i_did_it.html
Extremely unlikely to be a JS engine issue here, from reading the code and light experimentation. Basic render shows the effects clearly enough:
Basic render: 2.74 sec
Make drawBlock a no-op: 0.3 sec
Just remove appendChild from drawBlock: 0.97 sec
Make javaSphereColour always return 'rgb(0,0,0)': 2.43 sec
So about 90% of the time is spent in the DOM, 65% in node creation and the property setting. If I make drawBlock a no-op, a full render completes in 1.2 seconds on today's trunk, Vista, 2.mumble GHz CPU. So I think the thesis of the page is incorrect, to the extent that the test actually demonstrates it. :) (If you switch to a canvas rather than using the <div>s, and have "javaSphereColour" return [r,g,b] instead of the string, I suspect you will find that a Firefox nightly or even beta4 is pretty competitive with Opera. Maybe even faster enough that you'll want to file a bug on them. :) )
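For anyone wanting to check the percentages, here is a small sketch of the arithmetic using only the four Basic render timings quoted above. The helper name `shareRemoved` is made up for illustration, and reading the 65% figure as the drop seen when appendChild is removed is my interpretation of the numbers, not something the comment states.

```javascript
// Timings quoted in the comment above (seconds, Basic render).
const timings = {
  baseline: 2.74,       // unmodified Basic render
  drawBlockNoOp: 0.3,   // drawBlock does nothing
  noAppendChild: 0.97,  // node is created and styled but never appended
  constantColour: 2.43, // javaSphereColour always returns 'rgb(0,0,0)'
};

// Hypothetical helper: fraction of the baseline that disappears
// when one piece of work is taken out.
function shareRemoved(baseline, reduced) {
  return (baseline - reduced) / baseline;
}

const domShare = shareRemoved(timings.baseline, timings.drawBlockNoOp);
const appendShare = shareRemoved(timings.baseline, timings.noAppendChild);
const colourShare = shareRemoved(timings.baseline, timings.constantColour);

console.log(`drawBlock as a whole: ${(domShare * 100).toFixed(1)}%`);
console.log(`appendChild alone:    ${(appendShare * 100).toFixed(1)}%`);
console.log(`colour computation:   ${(colourShare * 100).toFixed(1)}%`);
```

The colour computation, which is the only part that is pure JS arithmetic, accounts for the smallest share of the three.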
Assignee: general → nobody
QA Contact: general → general
Created attachment 312760 [details] "Raytracer" using canvas I didn't know there was a bug on this. I rewrote this a while ago using canvas, and performance on FF and Safari at least (Opera doesn't work(?)) is about equal. I did note that if you force FF to resize the canvas, it gets significantly slower (but nowhere near the DOM-version slowness). This has some small rendering errors, but they're due to the implementation. I got rid of them once, but I forget how.
The other bug had more useful information, but this bug was already confirmed, so duping to this one seemed the right choice. I'll reverse the duplicate if anyone strongly disagrees.
Please see attachment 280318 [details] and attachment 280327 [details] for the test and the Jprof Profile Report, respectively.
Also quoting bug 395635 comment 2:
> jprof
>
> Flat Profile
>
> Total hit count: 13621
> Count %Total Function Name
> 3146 23.1 nsLineBox::LastChild() const
> 3085 22.6 nsLayoutUtils::GetLastSibling(nsIFrame*)
> 3031 22.3 nsLineBox::RFindLineContaining(nsIFrame*, nsLineList_iterator const&, nsLineList_iterator&, int*)
> 3021 22.2 nsFrameList::AppendFrames(nsIFrame*, nsIFrame*)
> ...
And bug 395635 comment 3:
> nsLineBox::LastChild() is mentioned by BZ in bug 40988 comment #62 and also in bug 237735 comment #4, but there isn't probably any bug for it.
>
> nsLayoutUtils::GetLastSibling is probably bug 233463 - adding it to dependencies
>
> nsLineBox::RFindLineContaining is mentioned by BZ in bug 304598 comment #19
The equivalent WebKit bug was https://bugs.webkit.org/show_bug.cgi?id=15148 which turned out to be n^2 behavior due to checking the list of floating objects for duplicates on each append. Switching to a data structure with O(1) membership testing eliminated that. Not sure if the situation is similar for Gecko, but hopefully that's useful information. Sorry for the bugspam if not.
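To make the pattern described above concrete, here is a generic sketch of a duplicate check on every append, first with a linear scan and then with a Set alongside the list. This is not WebKit's code; the function names and the object shapes are invented for illustration.

```javascript
// Append with a linear duplicate scan. Each call is O(n) in the list
// length, so n appends cost O(n^2) comparisons overall.
function appendUniqueScan(list, item) {
  let comparisons = 0;
  for (const existing of list) {
    comparisons++;
    if (existing === item) return comparisons; // already present
  }
  list.push(item);
  return comparisons;
}

// Append with a Set kept next to the list. Membership testing is
// O(1) on average, so n appends cost O(n) overall.
function appendUniqueSet(list, seen, item) {
  if (seen.has(item)) return false;
  seen.add(item);
  list.push(item);
  return true;
}

// Count how many comparisons the scanning version needs for n distinct items.
function countScanComparisons(n) {
  const list = [];
  let total = 0;
  for (let i = 0; i < n; i++) total += appendUniqueScan(list, { id: i });
  return total;
}

console.log(countScanComparisons(1000), countScanComparisons(2000));
```

Doubling the number of appended items roughly quadruples the comparison count in the scanning version, which is the signature to look for in a profile.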
Thanks David. Asa, I'm not sure it deserves blocking, maybe wanted, and while it is an obscure benchmark I can't help but think improving it will affect real world performance somewhere.
I thought I'd do another benchmark run given there have been no updated figures since the Firefox 3.0 betas. I used Mozilla/5.0 (Windows; U; Windows NT 5.2; en-US; rv:1.9.1b1pre) Gecko/20080907032646 Minefield/3.1b1pre ID:20080907032646 on an AMD Athlon 3800+ X2 (2GHz). A clean profile was used for every browser.
So, in order of fastest to slowest on the browsers I could test (I couldn't get the WebKit nightlies to work):
Chrome - 29.69 seconds
Opera 9.6RC - 31.609 seconds
Safari 3.1.2 - 38.734 seconds
Firefox Nightly Trace Enabled - 537.907 seconds
Firefox Nightly - 538.344 seconds
IE8 Beta 2 - CRASH - Pass 84/120 - 2269.468 seconds
Given the increasing time of each pass, I would surmise that if IE8 hadn't crashed it would have taken about 3 hours. It also had a significant memory increase on each pass: it was at about 900 MB when it crashed, and it probably would have increased to about 2 - 3 GB. But the other figures seem to show that finishing in under 1 minute is more than reasonable.
Not blocking on this, but giving this to bent to get a profile for what's actually slow here.
Assignee: nobody → bent.mozilla
Priority: -- → P1
Shark results:
27.5% nsLayoutUtils::GetLastSibling(nsIFrame*)
23.9% nsFrameList::LastChild() const
17.0% nsLineBox::LastChild() const
15.5% nsLineBox::IndexOf(nsIFrame*) const
Seems like we're spending all of our time walking through linked lists. Over to layout.
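As an illustration of why walking a linked list on every append shows up like this in a profile, here is a toy sibling list with two append strategies. It is only a sketch: the class and method names are hypothetical and it does not mirror the real nsFrameList or nsLineBox code.

```javascript
// Toy singly linked sibling list, loosely modelled on "frames with a
// next-sibling pointer". All names here are hypothetical.
class SiblingListSketch {
  constructor() {
    this.first = null;
    this.last = null;
    this.steps = 0; // number of next-pointer hops performed while appending
  }

  // Find the last sibling by walking from the first one: O(n) per append.
  appendByWalking(node) {
    if (!this.first) {
      this.first = node;
    } else {
      let cur = this.first;
      while (cur.next) {
        cur = cur.next;
        this.steps++;
      }
      cur.next = node;
    }
    this.last = node;
  }

  // Use a cached tail pointer instead: O(1) per append.
  appendWithTail(node) {
    if (!this.first) {
      this.first = node;
    } else {
      this.last.next = node;
    }
    this.last = node;
  }
}

function countWalkSteps(n) {
  const list = new SiblingListSketch();
  for (let i = 0; i < n; i++) list.appendByWalking({ id: i, next: null });
  return list.steps;
}

function countTailSteps(n) {
  const list = new SiblingListSketch();
  for (let i = 0; i < n; i++) list.appendWithTail({ id: i, next: null });
  return list.steps;
}

console.log(countWalkSteps(1000), countTailSteps(1000));
```

With thousands of absolutely positioned blocks appended one at a time, the walking variant grows quadratically while the tail-pointer variant stays flat.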
Assignee: bent.mozilla → nobody
Component: DOM → Layout: Misc Code
QA Contact: general → layout.misc-code
OK. So the obvious thing there is of course bug 233463. But as comment 7 says, this situation is similar to that in bug 304598... Except that I thought we fast-pathed appends in the abs pos processing in frame constructor. Is that not working? Or are things just slow even with it working?
I was discussing this benchmark on the IE blog and multiple other people managed to pass the whole test in less than a minute on IE 8 Beta 2, though I'm still not able to replicate this result on any computer (looking into it). But assuming it's common, it puts Firefox squarely 8-12x slower than any other browser.
OK, to answer my own question from comment 11, here's where time is spent "late" in the benchmark:
26% nsFrameList::LastChild() called from FindAppendPrevSibling
23% in nsLineBox::RFindLineContaining called from nsBlockFrame::InsertFrames
Those are presumably for the placeholder.
25% nsLayoutUtils::GetLastSibling called from nsFrameConstructorState::ProcessFrameInsertions.
23% nsFrameList::AppendFrames called from nsAbsoluteContainingBlock::AppendFrames.
Those are for the abs pos frame.
So yes, this is bug 233463 in spades. That said, doing lazy frame construction might help somewhat. I filed bug 502937 on that. And it also looks like that AppendFrames call from ProcessFrameInsertions is uncalled-for. Filed bug 502941 on that.
Created attachment 393744 [details] Raytracer in canvas [fast] I hope the Author doesn't mind. I'm re-uploading his raytracing tests so they're both on bugzilla. I've fixed his canvas demo to remove the visual artifacts and make it work in Opera.
Attachment #312760 - Attachment is obsolete: true
Whiteboard: Full Render on DOM Raytracer freezes Firefox for a while, be prepared!!
But I thought Firefox was supposed to be fairly good with DOM manipulation. Is it the way this thing keeps updating? For example, is it possible to generate the same image as a full array of objects, and render the full list of <div>s all at once for comparison? Would Firefox be faster than the current score if it did not individually render each DOM update, but only had to make one pass at the end?
With the fixes for bug 512471, bug 512336, bug 512470, bug 233463 I see Firefox render this testcase in about 63 seconds. That's about 2x slower than Safari 4 on the same hardware; 1.2x slower than Opera 10. That said, with those patches the frametree bottlenecks seem to be gone: 80% of that time is painting. I'll reprofile once the patches land. As far as comment 18 goes, the answer is yes. If you generate the DOM with a display:none parent and then show it all at once, Firefox without the above patches would be a lot faster than it is on the repeated "live" DOM updates the testcase does.
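A minimal sketch of the "display:none parent, then show it all at once" approach mentioned above. The function name and the shape of the `blocks` array are assumptions made for illustration; `doc` and `container` are expected to behave like a browser `document` and an element, and are passed in as parameters so the function does not depend on globals.

```javascript
// Build all the blocks under a hidden wrapper, then reveal the wrapper once.
// `blocks` is assumed to be an array of { x, y, size, colour } objects.
function renderBlocksHidden(doc, container, blocks) {
  const wrapper = doc.createElement('div');
  wrapper.style.display = 'none'; // hidden while the subtree is being built
  container.appendChild(wrapper);

  for (const b of blocks) {
    const div = doc.createElement('div');
    div.style.position = 'absolute';
    div.style.left = b.x + 'px';
    div.style.top = b.y + 'px';
    div.style.width = b.size + 'px';
    div.style.height = b.size + 'px';
    div.style.backgroundColor = b.colour;
    wrapper.appendChild(div);
  }

  wrapper.style.display = ''; // show everything in one step
  return wrapper;
}
```

In a page this would be called as `renderBlocksHidden(document, document.body, blocks)`. Note that this changes what the benchmark measures, since the progressive "live" updates are the point of the original testcase.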
Yeah, that's because I'm still cleaning them up and try-servering them and running local tests in a debug build and such minor things before attaching them.
s/bug 512470/bug 516742/ in the above. With those patches applied, bug 516732 and bug 516740 cover possible issues in the painting. The second is particularly interesting: we're 2x slower than Safari 4 on the attached testcase, but 2x faster (we get 4x faster, their performance doesn't change) if the innerHTML update is taken out.
Would it matter that WebKit uses CGContextFillRect whereas we use CGContextFillPath? I have a hunch that the former is much faster. I'll see if I can switch Cairo to CGContextFillRect and time it.
I don't see WebKit filling paths at all, e.g.
WebKit:
1) WebCore::RenderBox::paintFillLayer() -> CGContextFillRect
0.0% 42.6% WebCore WebCore::RenderBox::paintFillLayer(WebCore::RenderObject::PaintInfo const&, WebCore::Color const&, WebCore::FillLayer const*, int, int, int, int, WebCore::CompositeOperator)
1.0% 42.4% WebCore WebCore::RenderBoxModelObject::paintFillLayerExtended(WebCore::RenderObject::PaintInfo const&, WebCore::Color const&, WebCore::FillLayer const*, int, int, int, int, WebCore::InlineFlowBox*, WebCore::CompositeOperator)
0.3% 40.1% WebCore WebCore::GraphicsContext::fillRect(WebCore::FloatRect const&, WebCore::Color const&)
0.2% 27.9% CoreGraphics CGContextFillRect
Firefox:
1) nsCSSRendering::PaintBackground() results in a CGContextFillPath
0.0% 16.5% XUL nsCSSRendering::PaintBackground(nsPresContext*, nsIRenderingContext&, nsIFrame*, nsRect const&, nsRect const&, unsigned int, nsRect*)
0.6% 16.0% XUL nsCSSRendering::PaintBackgroundWithSC(nsPresContext*, nsIRenderingContext&, nsIFrame*, nsRect const&, nsRect const&, nsStyleBackground const&, nsStyleBorder const&, unsigned int, nsRect*)
0.0% 12.6% XUL _moz_cairo_fill_preserve
0.1% 12.6% XUL _cairo_gstate_fill
0.0% 12.4% XUL _cairo_surface_fill
0.1% 12.1% XUL _cairo_quartz_surface_fill
0.0% 5.8% CoreGraphics CGContextFillPath
2) PaintBackgroundLayer seems to trigger a path fill as well.
0.0% 0.0% XUL PaintBackgroundLayer(nsPresContext*, nsIRenderingContext&, nsIFrame*, unsigned int, nsRect const&, nsRect const&, nsRect const&, nsStyleBackground const&, nsStyleBackground::Layer const&)
0.0% 0.0% XUL _cairo_path_fixed_fini
0.0% 0.0% XUL gfxContext::Rectangle(gfxRect const&, int)
0.0% 0.0% XUL gfxContext::NewPath()
0.0% 0.0% XUL gfxContext::Fill()
Filed bug 516931 for the fill rect vs fill path issue.
With the patches for bug 516740 and bug 516924 applied in addition to the ones in comment 20 and comment 22, we're about 1.5x faster than WebKit. With those patches, the approximate time breakdown is:
Painting: 42% (25% is building the display list, marking all those abs pos frames with properties, sorting it, etc).
Reflow: 29% (about 2/3 of this is reflowing the placeholders!)
Actually running the JS (setting style, creating nodes, etc, etc): 25%
Will dig into the JS part.
roc, do we want a bug on the display list stuff? Is there something we can do to avoid doing quite so much work for reflowing placeholders? That part is still ending up O(N^2) overall, since we're reflowing every placeholder each time through...
This is a super-bad case for display lists, we have thousands of visible display items, whereas most pages have on the order of a hundred display items visible at any one time. They're not killing us here, so I don't want to do a lot of work to restructure things. However, if we can find some simple local optimizations that help significantly, I guess that'd be worth doing, and we can have a bug on that. But we should wait for bug 513082 to land because it will change this code significantly.
One other thing to note here is that the script renders via a series of setTimeout(0)s, 3 rows of blocks at a time. I believe that adds 10ms of latency per 3 rows, guaranteed. I believe trunk Webkit (or at least Chrome) only adds 4ms there. At the moment we're probably covering up that latency because we're reflowing and painting during that time, but if we get fast enough, we'll stop getting faster because the setTimeout latency will dominate. This would also be affected by compositor-phase-2. Once we start reflowing and painting each 3-rows-of-blocks in less than 1/60s (or thereabouts), compositor-phase-2 would help by coalescing paints, and possibly reflows, to limit the rate to 60Hz or whatever your screen refresh rate is. (Mmmm, refresh-rate-dependent performance will make for fun benchmarking times!)
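The latency argument above can be put into rough numbers. This sketch assumes the full render issues 120 timeouts, one per pass, which I am inferring from the "Pass 84/120" progress figure quoted earlier in this bug; the 10 ms and 4 ms minimum delays are the ones believed above. The function name is made up.

```javascript
// Assumed number of setTimeout(0) callbacks in a full render
// (inferred from the "Pass 84/120" progress readout; an assumption).
const PASSES = 120;

// Lower bound on total run time imposed purely by timer clamping.
function timerFloorMs(passes, minDelayMs) {
  return passes * minDelayMs;
}

const floorAt10ms = timerFloorMs(PASSES, 10);
const floorAt4ms = timerFloorMs(PASSES, 4);

// Time available per pass before paints would be coalesced at 60Hz.
const frameBudgetMs = 1000 / 60;

console.log(`timer floor with 10ms clamp: ${floorAt10ms} ms`);
console.log(`timer floor with 4ms clamp:  ${floorAt4ms} ms`);
console.log(`60Hz frame budget per pass:  ${frameBudgetMs.toFixed(2)} ms`);
```

Under these assumptions the timer floor is on the order of a second either way, so it only starts to matter once the per-pass reflow and paint cost drops well below where it is now.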
> But we should wait for bug 513082 to land
OK.
> I believe trunk Webkit (or at least Chrome) only adds 4ms there.
Just chrome, not webkit. The latency the timeouts introduce into this testcase is 1200ms for us. We'd have to get about 20x faster than we are with all the patches we have in flight here before we run up against that limit. ;)
> Once we start reflowing and painting each 3-rows-of-blocks in less than 1/60s
> (or thereabouts)
So in 16ms. Right now we're closer to averaging 160ms per line, more toward the end. So no danger of that biting us any time too soon.
Another surge of progress, I see. Great to see how many effects are seen here, and how completely this is being explored. I'm about to receive my first netbook, the highest-end ASUS 1005HA (N280/1GB/XP-SP3), and will be checking some of this on such low-end hardware. Many of these small, individual issues uncovered by such an extreme test case are probably related to the numerous claims that Firefox is too "heavy" for Atom processors. I think there has been too much testing on full-power multicore systems with plenty of memory. Unfortunately netbooks are popular, and doubly unfortunately they are MUCH weaker than the hardware that Mozilla has come to expect. This situation is anticipated to improve once Windows 7 ships and multiprocessor Atom-based netbooks come out, yet seriously addressing these small hits to performance may substantially future-proof the Firefox core. But I digress. When my netbook arrives, I'll come back with what I'm seeing. Unless anyone else has already tried it and not said anything...?
I see some weird behaviour on this. I tried running the basic render consecutively a few times, and the rendering times are increasing for each run. These are the times I got: 5.867 s 21.741 s 36.955 s 50.264 s 63.921 s 77.695 s so it seems the times are linearly increasing for each consecutive run. Chrome and Safari seem to be exhibiting similar behaviour, though, so maybe it's nothing to worry about.
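One quick way to check the "linearly increasing" impression is to look at the differences between consecutive runs. The sketch below uses only the six times listed above; the variable names are arbitrary.

```javascript
// Basic render times from the comment above, in seconds, in run order.
const runTimes = [5.867, 21.741, 36.955, 50.264, 63.921, 77.695];

// Increase from each run to the next.
const deltas = runTimes.slice(1).map((t, i) => t - runTimes[i]);

// Average increase per additional run.
const meanDelta = deltas.reduce((sum, d) => sum + d, 0) / deltas.length;

console.log(deltas.map((d) => d.toFixed(3)).join(', '));
console.log(`mean increase per run: ${meanDelta.toFixed(3)} s`);
```

If the increments stay in a narrow band, as they appear to here, each run is costing a roughly constant amount more than the one before it.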
On my old hardware (2 GHz AMD 64 X2 + 2 GB of RAM) on XP x64 I now get these results for full render:
Firefox Trunk: 25.992 Seconds
Opera 10: 29.359 Seconds
Chrome Dev Build: 37.673 Seconds
I think the scope of this bug is well and truly fixed; in every test I've done, Firefox narrowly outperforms the competition. Anyone feel free to re-open if you think this bug has more life to it. But I'd imagine it'd be better to create a new bug on benchmarking specific DOM performance cases or other related issues.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → FIXED
There seems to have been a MASSIVE performance regression on this. Using much faster hardware than in my previous tests I now see: Firefox (Trunk): 286.257 Seconds Chrome 5: 12.754 Seconds Opera 10.6: 7.774 seconds I guess I'll try and narrow down a regression window later and file a new bug.
Result of regression window testing on Mozilla Central:
Gecko/20100422 Minefield/3.7a5pre: 14.062 Seconds
Gecko/20100619 Minefield/3.7a6pre: 13.954 Seconds
Gecko/20100702 Minefield/4.0b2pre: 14.207 Seconds
Gecko/20100710 Minefield/4.0b2pre: 14.196 Seconds
Gecko/20100714 Minefield/4.0b2pre: 14.363 Seconds
Gecko/20100715 Minefield/4.0b2pre: 15.408 Seconds
Gecko/20100716 Minefield/4.0b2pre: 358.548 Seconds
Gecko/20100717 Minefield/4.0b2pre: > 100 Seconds (crashed)
This gives this window: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=5fda39cd703c&tochange=96de199027d7
I would hazard a guess that it was one of roc's many check-ins that caused this performance regression.
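For reference, the window can be picked out of the measurements mechanically. This is only a sketch: the function name and the "more than 2x slower than the previous build" threshold are arbitrary choices of mine, and the 20100717 build is left out because it crashed without a final time.

```javascript
// Full render times per nightly build date, from the list above.
const nightlyRuns = [
  { build: '20100422', seconds: 14.062 },
  { build: '20100619', seconds: 13.954 },
  { build: '20100702', seconds: 14.207 },
  { build: '20100710', seconds: 14.196 },
  { build: '20100714', seconds: 14.363 },
  { build: '20100715', seconds: 15.408 },
  { build: '20100716', seconds: 358.548 },
];

// Return the first adjacent pair where the later build is more than
// `factor` times slower than the earlier one, or null if there is none.
function findRegressionWindow(runs, factor) {
  for (let i = 1; i < runs.length; i++) {
    if (runs[i].seconds > runs[i - 1].seconds * factor) {
      return { lastGood: runs[i - 1].build, firstBad: runs[i].build };
    }
  }
  return null;
}

const regressionWindow = findRegressionWindow(nightlyRuns, 2);
console.log(regressionWindow);
```

The changesets between the two builds it reports are what the pushlog range needs to cover.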
Other possibility that stands out to me: http://hg.mozilla.org/mozilla-central/rev/d5bc811bad0a
Damian, I filed bug 585258 on the obvious issue a profile shows. We should remeasure once that's fixed.
Thanks for the info! Been out of the Mozilla/bug loop for a long while.
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core