585258 - Performance regression on the testcase in bug 424715

Reporter

Description

•

15 years ago

We've got to add this to some regression suite... I profiled the testcase in bug 424715 after bug 424715 comment 35, and a large part of what seems to be happening is that we're doing a LOT of nsRegion::Sub operations in nsDisplayItem::RecomputeVisibility. I would assume that we in fact do one Sub() call for each div, so each painting pass ends up O(N^2) in number of divs. Is this a region we can sanely simplify outward if it gets too complicated or something? Note that this code was introduced in "Bug 564991. Part 9: Retain layer trees. r=tnikkel,sr=mats"

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

15 years ago

Blocks: 467263

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

15 years ago

blocking2.0: --- → ?

Keywords: perf, regression

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 1

•

15 years ago

Yes, we should simplify there.

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

15 years ago

URL: http://nontroppo.org/timer/progressiv...

Martijn Wargers (dead)

Comment 2

•

15 years ago

Attached file zipped up testcase — Details

I made a scrolling test out of this. These are the results: Fx3.6.8: 46154ms 2010-07-14: 54926ms 2010-07-16: 35751ms 2010-08-30: 43296ms Seamonkey 1.9.1 build of 2010-07-01: 8376ms Opera: 10703ms Between 2010-07-14 and 2010-07-16 is when bug 564991 landed. Btw, notice how 1.9.1 builds seem to be that much quicker. Is this test useful for this bug? Is this useful for a regression testsuite? What regression suite was commented about in comment 0?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 3

•

15 years ago

> Is this test useful for this bug? No, but it's a useful test.... > What regression suite was commented about in comment 0? We don't have one like that yet. That's the problem. roc, this should probably block a beta, right?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

15 years ago

Attachment #471538 - Attachment mime type: application/zip → application/java-archive

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 4

•

15 years ago

I don't think it needs to block beta, since all we need to do here is back off some analysis when it's getting expensive and repaint the entire window. But it definitely blocks! Martijn's test should probably be added to Tscroll. But we don't need to use a huge file, we can document.write() our way to victory.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 5

•

15 years ago

(In reply to comment #4) > I don't think it needs to block beta, since all we need to do here is back off > some analysis when it's getting expensive and repaint the entire window. In fact we won't even need to repaint the window.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

15 years ago

blocking2.0: ? → final+

Damian Shaw [Quan]

Comment 6

•

15 years ago

I know this is "blocking final+" but my understanding is there are a large number of blocking bugs and not very much time to ship and cuts must come. Can anyone really fix this fairly large performance regression (in terms of effort people made the test case performant in the first place) or should it really be considered a more long term gaol?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 7

•

15 years ago

I think we can fix it fairly easily.

Damian Shaw [Quan]

Comment 8

•

14 years ago

Doing a quick test of the original benchmark on my fairly fast system the current browsers look like this to me: Firefox nightly build: 160 seconds at about 110/120 frames and then crash http://crash-stats.mozilla.com/report/index/bp-3f3378ae-890e-480d-b5a5-7e13a2101219 IE 9 Beta 2: 46.774 seconds Chrome latest Canary Build: 10.877 seconds Opera 11: 8.17 seconds

Boris Zbarsky [:bzbarsky]

Reporter

Comment 9

•

14 years ago

Yes, there's no point retesting until this bug is fixed. :(

Damian Shaw [Quan]

Comment 10

•

14 years ago

Sorry for the bugspam, I get a little excited sometimes as I don't have much spare time to do Firefox testing these days. Good luck for Fx4 release.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 11

•

14 years ago

Link to the testcase: jar:https://bug585258.bugzilla.mozilla.org/attachment.cgi?id=471538!/progressive_raytracer.html

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 12

•

14 years ago

My time is 53692ms in my debug build.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 13

•

14 years ago

Actually that was 53s in an opt build.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 14

•

14 years ago

Attached patch Part 1: Don't make visible region arbitrarily complex in RecomputeVisibility — Details — Splinter Review

This reduces it to 37s for me.

Attachment #499230 - Flags: review?(dbaron)

Boris Zbarsky [:bzbarsky]

Reporter

Comment 15

•

14 years ago

OK, so I applied that patch. Things are still Really Slow, though. Compared to bug 424715 comment 26, I see something like 87% of the time in painting (used to be 42%). 20% is spent building the display list (14%) and computing visibility (6%). But painting itself is 65%: 10% building layers and calling WillEndTransaction on the layer builder, and 55% painting backgrounds... Quartz debug (insofar as I can trust it) claims we repaint a box bounded by the text in the upper right corner and the most-recent line of the raytrace, so we would be repainting all the previously-painted things. I bet we didn't use to do that before...

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 16

•

14 years ago

On this hardware, FF 3.6 is 70s, Chrome is 13s. I'll see what else we can do, but this is a rare sort of testcase; we hardly ever see real pages with such an absurd number of element visible at the same time.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 17

•

14 years ago

Oh, I'm testing the scroll link from the testcase in comment #2. Are you testing pageload in the original test?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 18

•

14 years ago

At least during scrolling, we're repainting exactly the areas you'd expect.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 19

•

14 years ago

3.6 is before we started fixing the deps of bug 424715, as I recall.... I just tried modifying the testcase by removing the innerHTML set for the thing at the top, and time I see dropped from 61s (with the attached patch) to about 37s. Safari 5 and Chrome are both at about 11s (with the innerHTML set). We still have the fix for bug 516740 in, so that's not it. :(

Boris Zbarsky [:bzbarsky]

Reporter

Comment 20

•

14 years ago

> Oh, I'm testing the scroll link from the testcase in comment #2. Oh. See comment 3! I'm testing the original test. Load http://nontroppo.org/timer/progressive_raytracer.html and click "Full Render".

jcblake

Comment 21

•

14 years ago

Just FYI... Early betas of FF 3.7 had this test case down to 11 seconds on my Core2Duo, so this appears to have been more or less completely resolved at one point.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 22

•

14 years ago

Yes, we know... See comment 0. ;)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 23

•

14 years ago

OK thanks for clarifying that. Now I think that the problem is in FrameLayerBuilder::InvalidateThebesLayerContents. The invalid region we accumulate gets complicated because the rects we invalidate aren't simply in top-to-bottom, left-to-right order; we get some holes. When it gets complicated, nsRegion::SimplifyOutward fluffs out to the bounding box, which includes the text at the top-right, so we invalidate a big chunk of the layer. I think we need to make nsRegion::SimplifyOutward smarter.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 24

•

14 years ago

Attached patch Part 2: Make SimplifyOutward smarter — Details — Splinter Review

This takes the "full render" from 45s to 30s on my machine. We repaint a constant number of display items in every pass --- 732, which seems high. I'll look into that next.

Attachment #499257 - Flags: review?(bzbarsky)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 25

•

14 years ago

Oh, 732 is about right actually. 3 rows, each 240 pixels (elements) wide.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 26

•

14 years ago

Hmm, xperf is still not nearly as good as Shark for this work, if I'm using it right...

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 27

•

14 years ago

Maybe --enable-profiling would help...

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 28

•

14 years ago

Indeed, xperf looks quite good now. Painting time still dominates (74% of total profile in nsLayoutUtils::PaintFrame). Of the painting time, roughly 33% is display-list construction (BuildDisplayListForStackingContext), 28% is ComputeVisibilityForRoot, 34% is layer construction (under FrameLayerBuilder::BuildContainerLayerFor and FrameLayerBuilder::WillEndTransaction), and 5% is actual painting (under LayerManagerD3D9::EndTransaction).

Boris Zbarsky [:bzbarsky]

Reporter

Comment 29

•

14 years ago

Hmm. So I tried applying the second patch, but it doesn't seem to help anything here... In fact, even if I make nsRegion::SimplifyOutward a no-op that changes nothing. I still see over 50% of the time spent in actual painting, and over 87% spent in painting-related stuff (including painting, layer construction, etc)... Maybe we're simplifying the region somewhere else on Mac too or something? Or maybe our painting really has gotten that much slower? I'm seeing this crap land inside CoreGraphics color management stuff, for one thing.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 30

•

14 years ago

Comment on attachment 499257 [details] [diff] [review] Part 2: Make SimplifyOutward smarter >+ pRect->YMost () >= pRect->next->y) Drop the space char after "YMost" here? >+ pRect->UnionRect(*pRect, *pRect->next); >+ delete Remove (pRect->next); And here after "Remove"? r=me with those nits.

Attachment #499257 - Flags: review?(bzbarsky) → review+

Boris Zbarsky [:bzbarsky]

Reporter

Comment 31

•

14 years ago

But again, part 2 seems to do nothing for me on Mac.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 32

•

14 years ago

(In reply to comment #30) > Comment on attachment 499257 [details] [diff] [review] > Part 2: Make SimplifyOutward smarter > > >+ pRect->YMost () >= pRect->next->y) > > Drop the space char after "YMost" here? > > >+ pRect->UnionRect(*pRect, *pRect->next); > >+ delete Remove (pRect->next); > > And here after "Remove"? > > r=me with those nits. Those are consistent with the rest of nsRegion.cpp.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 33

•

14 years ago

If you count the number of times we reach here: http://mxr.mozilla.org/mozilla-central/source/layout/base/FrameLayerBuilder.cpp#1672 in each call to DrawThebesLayer, you should see that number growing linearly with each paint before part 2, and not growing with each paint after part 2. At least I did. (In reply to comment #28) > Painting time still dominates (74% of total profile in > nsLayoutUtils::PaintFrame). Of the painting time, roughly 33% is display-list > construction (BuildDisplayListForStackingContext), 28% is > ComputeVisibilityForRoot, 34% is layer construction (under > FrameLayerBuilder::BuildContainerLayerFor and > FrameLayerBuilder::WillEndTransaction), and 5% is actual painting (under > LayerManagerD3D9::EndTransaction). Of those, the layer construction time (34%) is new. The rest should be pretty much unchanged since 3.6. I guess it's possible that Quartz painting has slowed down ... we've switched from drawing to a CGBitmapContext to a CGLayer, and maybe that adds overhead that shows up when you're drawing 59,000 1x1 pixel rects. I'm willing to do small tweaks to make this testcase faster but not major restructuring. This testcase has 100x more visible elements than any page you'll ever see that's not doing some kind of one-DIV-per-pixel hack ... and <canvas> is the right API for this use-case.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 34

•

14 years ago

> Those are consistent with the rest of nsRegion.cpp. Ah, ok. I agree we shouldn't spend too much time on this except insofar as real pages might be affected. I'll take a look at the counts I see.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 35

•

14 years ago

If the counts keep growing, you'd want to look at calls to ThebesLayerOGL::InvalidateRegion to see which region(s) are being invalidated at each paint. It should be something reasonable, a rectangle for the text and one or more rectangles for the strip of DIVs.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 36

•

14 years ago

Over here (on Mac), unless I'm screwing something up, I see the number of times we reach that line growing linearly (adding 729 per paint) no matter what I do with nsRegion::SimplifyOutward (including just putting a return at the very beginning of the function)... Lemme take a look at those regions.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 37

•

14 years ago

Yeah, those look broken to me. I see two InvalidateRegion calls per paint. The first is passed an empty region. The second is passed a region with three rects. Typical rects for that second call: x=502 y=-1 width=745 height=231 x=502 y=230 width=243 height=2 x=745 y=230 width=502 height=1 On each call the y values for rects 2 and 3 and the height for rect 1 increase by 3. So I guess the question is where that first rect comes from.

Karl Tomlinson (:karlt)

Comment 38

•

14 years ago

Comment on attachment 499257 [details] [diff] [review] Part 2: Make SimplifyOutward smarter >+ // Combine with the following rectangle if they have the same YMost >+ // or if they overlap vertically. This ensures that all overlapping >+ // rectangles are merged, preserving the invariant that rectangles >+ // don't overlap. "... don't intersect" would distinguish from other kinds of overlapping.

Damian Shaw [Quan]

Comment 39

•

14 years ago

Real page load times are always most important, but it would be cool if there existed a suite of 'chaos' benchmarks that reacted very badly to unintended performance regressions in very important functions, that way they could be treated as less important but give developers of an idea of potential unintended regressions. Also it's a bit of a shame between Fx 3.6 and 4.0 this benchmark has had Fx as the slowest modern browser, to the fastest, to the slowest again. Much appreciate your attention roc!

Boris Zbarsky [:bzbarsky]

Reporter

Comment 40

•

14 years ago

I _think_ it's already present in the invalidThebesContent region we get in FrameLayerBuilder::BuildContainerLayerFor. Though that region has 5 rects, and the region that's returned from ToOutsidePixels only has 3...

David Baron :dbaron: (⌚️UTC-5, no longer working on Mozilla)

Comment 41

•

14 years ago

Comment on attachment 499230 [details] [diff] [review] Part 1: Don't make visible region arbitrarily complex in RecomputeVisibility r=dbaron

Attachment #499230 - Flags: review?(dbaron) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

14 years ago

Keywords: checkin-needed

Whiteboard: [needsd landing]

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 42

•

14 years ago

(In reply to comment #40) > I _think_ it's already present in the invalidThebesContent region we get in > FrameLayerBuilder::BuildContainerLayerFor. Though that region has 5 rects, and > the region that's returned from ToOutsidePixels only has 3... That region is added to in FrameLayerBuilder::InvalidateThebesLayerContents. If you log the incoming aRect and the resulting invalidThebesContent region, you should be able to understand how the region ends up with that oversized first rectangle. The SimplifyOutward(20) call there is what was causing trouble for me, which patch 2 fixed. Everything you're saying makes it sound like patch 2 was not applied ...

Boris Zbarsky [:bzbarsky]

Reporter

Comment 43

•

14 years ago

> Everything you're saying makes it sound like patch 2 was not applied ... It really truly is! I did this with printfs just now instead of a debugger, and in an opt build to avoid whatever noise a debug build was introducing. I can no longer reproduce comment 37 on, but I do still see comment 36. Here's what a log looks like: Our region is: Rect count: 2 [(52261,5130) 22559x930] [(30120,26976) 14580x180] And will become: Rect count: 2 [(871,85) 376x16] [(502,449) 243x4] Invaliding region: Rect count: 2 [(871,-1) 376x16] [(502,363) 243x4] Resulting valid region: Rect count: 6 [(0,0) 871x15] [(0,15) 502x409] [(502,15) 745x348] [(745,363) 502x4] [(502,367) 745x57] [(0,424) 1247x420] Painted item count: 45293 The logging for regions logs the rect count and then logs [(x,y) width x height] for each rect. Those first two regions above are before and after the ToOutsidePixels call in FrameLayerBuilder::BuildContainerLayerFor. The next two are from ThebesLayerOGL::InvalidateRegion. The "painted item count" is the number of times in FrameLayerBuilder::DrawThebesLayer that we get past the cdi->mItem->GetVisibleRect().IsEmpty() check... So that looks like the valid region we end up with in ThebesLayerOGL::InvalidateRegion is correct (I drew those 6 rects out, and it's the whole viewport except a 376px wide by 16px tall strip at top right and a 243px wide by 4px tall strip in the middle). But we're still painting too much in DrawThebesLayer for some reason.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 44

•

14 years ago

OK. Looks like the aRegionToDraw passed to FrameLayerBuilder::DrawThebesLayer is too big. Typical values: region to draw: Rect count: 1 [(502,0) 745x199] region to draw: Rect count: 1 [(502,0) 745x202] region to draw: Rect count: 1 [(502,0) 745x205] etc

Boris Zbarsky [:bzbarsky]

Reporter

Comment 45

•

14 years ago

Ah, I found it! BasicTextureImage::BeginUpdate in GLContext.cpp always fluffs out aRegion to its bounding rect, with this comment: // the basic impl can only upload updates to rectangles So in BasicBufferOGL::BeginPaint we call BeginUpdate, and that sets result.mRegionToDraw to the bounding rect of the two-rect region we started with. With GL disabled, I see about 38 seconds for this testcase over here after applying the patches in this bug; with GL enabled it's closer to 60 seconds. Do we need a separate bug on this? If I disable opengl layers, then the number of items I'm seeing painted per call to DrawThebesLayer stops growing and stabilizes at 1137 (no idea why; since we're invalidating 4px strips for some reason, I would have expected 4 * 243 = 972 plus maybe a few for the text, so 1137 is still a bit high, but it's less than 5*243...).

Boris Zbarsky [:bzbarsky]

Reporter

Comment 46

•

14 years ago

Attached patch Printf debugging patch, in case I need it again — Details — Splinter Review

Boris Zbarsky [:bzbarsky]

Reporter

Comment 47

•

14 years ago

And with gl disabled my "paint-related" time is about 75% instead of 87%; most of this is display list management and such; only 4% is actual painting. There seem to be no obvious hotspots in the sense that all the display list stuff is just linear in the number of all things on the page, not just the things in the dirty area.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 48

•

14 years ago

(In reply to comment #45) > So in BasicBufferOGL::BeginPaint we call BeginUpdate, and that sets > result.mRegionToDraw to the bounding rect of the two-rect region we started > with. With GL disabled, I see about 38 seconds for this testcase over here > after applying the patches in this bug; with GL enabled it's closer to 60 > seconds. Do we need a separate bug on this? Yes, thanks. Fixing that would bring Mac in line with Windows here, and after these patches are applied, the problem boils down to "processing display items for all 59,000 visible rectangles in the window is slow".

Boris Zbarsky [:bzbarsky]

Reporter

Comment 49

•

14 years ago

File bug 621778.

Phil Ringnalda (:philor)

Comment 50

•

14 years ago

Attached patch Part 1: either unrotted or broken (obsolete) — Details — Splinter Review

This compiles, which is nice, but sensible people like a little more than "philor blindly unrotted it with absolutely no understanding of the patch or the change that bitrotted it" for things that are going to land under their name.

Attachment #499230 - Attachment is obsolete: true

Attachment #500629 - Flags: feedback?(roc)

Phil Ringnalda (:philor)

Updated

•

14 years ago

Keywords: checkin-needed

Whiteboard: [needsd landing] → [needs landing]

Timothy Nikkel (:tnikkel)

Comment 51

•

14 years ago

The reason the original didn't compile is not because it is bitrotted, but because it depends on other changes in roc's queue.

Timothy Nikkel (:tnikkel)

Comment 52

•

14 years ago

(specifically bug 602757)

Phil Ringnalda (:philor)

Updated

•

14 years ago

Attachment #500629 - Attachment is obsolete: true

Attachment #500629 - Flags: feedback?(roc)

Phil Ringnalda (:philor)

Updated

•

14 years ago

Attachment #499230 - Attachment is obsolete: false

Phil Ringnalda (:philor)

Updated

•

14 years ago

Depends on: 602757

Whiteboard: [needs landing] → [needs landing][part 1 applies on top of attachment 491151 from bug 602757]

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 53

•

14 years ago

Yes, sorry, I added checkin-need prematurely. Very sorry!

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 54

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/35013af94ec7 http://hg.mozilla.org/mozilla-central/rev/c3825c079c00

Status: NEW → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Whiteboard: [needs landing][part 1 applies on top of attachment 491151 from bug 602757]

Damian Shaw [Quan]

Comment 55

•

14 years ago

Does this resolve the actual reported problem or just get it back to Firefox 3.6 levels? Which bug 424715 orriginal solved by making Firefox about 20x faster at this testcase than Firefox 3.6 was. Just need to know if I need to open another bug on the issue.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 56

•

14 years ago

Gets us back to Firefox 3.6 levels I guess. We're building a complete display list for the entire window on every paint to simplify the construction and update of our layer tree. That works pretty well when the window has hundreds of visible elements, not well when the window has 60,000 visible elements like here. I don't think we should be designing around this rare case.

Damian Shaw [Quan]

Comment 57

•

14 years ago

I concur with the reasoning and only hopped the absurdity of the testcase would help find underlying problems when firefox has to deal with a large number of visible elements. And as other browsers still do thisin excess 10x faster I think it is still valid in that sense, will make another low priority bug for this.

Tom Levels

Comment 58

•

14 years ago

When I was testing this, I found out that on my machine, each next test takes longer (first 3.5 seconds, second about 6 seconds, third 9 seconds, etc). Anyone can reproduce this? Or is this already a known bug?

Damian Shaw [Quan]

Comment 59

•

14 years ago

All browsers will do this as it is related to the way the code is written.

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

14 years ago

Depends on: dlbi

Damian Shaw [Quan]

Updated

•

13 years ago

Depends on: 705561

Nobody; OK to take it and work on it

Updated

•

7 years ago

Component: Layout: View Rendering → Layout: Web Painting

zipped up testcase 15 years ago Martijn Wargers (dead) 46.96 KB, application/java-archive		Details
Part 1: Don't make visible region arbitrarily complex in RecomputeVisibility 14 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 970 bytes, patch	dbaron : review+	Details \| Diff \| Splinter Review
Part 2: Make SimplifyOutward smarter 14 years ago Robert O'Callahan (:roc) (email my personal email if necessary) 1.67 KB, patch	bzbarsky : review+	Details \| Diff \| Splinter Review
Printf debugging patch, in case I need it again 14 years ago Boris Zbarsky [:bzbarsky] 10.62 KB, patch		Details \| Diff \| Splinter Review
Part 1: either unrotted or broken 14 years ago Phil Ringnalda (:philor) 761 bytes, patch		Details \| Diff \| Splinter Review