Closed Bug 364221 Opened 16 years ago Closed 15 years ago

[cairo] bad page scrolling performance with large background images


(Core :: Graphics, defect, P2)






(Reporter: phiw2, Assigned: vlad)




(Keywords: perf, regression)


(1 file, 2 obsolete files)

test case:
(testcase from bug 328380, which was about the same issue on Windows)

Some numbers:
Minefield (cairo build 20061217): 31609ms
Minefield (non-cairo build 20061112): 11218ms
Camino (cairo 2006121502 1.2+): 30827ms
Camino (non-cairo build 1.2+): 8769ms

Firefox 2.0 (build 2006101022): 7613ms

Opera 9.10: 5563ms
Safari (Webkit build r18270): 10789ms

All tests performed on 10.4.8, ppc (Powerbook G4 1.5Ghz. 1.5Gb Ram)
testcase loaded from localhost.
No longer blocks: 363757
Blocks: 334729
Some more numbers:

Camino (Home-grown Intel-only, 20070102): 29666-29692ms.

WebKit (rev. 18516): 9879-9889ms.

Range of 5 page loads from localhost. MBP 2.0GHz CoreDuo, 2GB RAM.
Flags: blocking1.9?
Assignee: nobody → vladimir
Flags: blocking1.9? → blocking1.9+
Is this what's causing the extremly slow scrolling at the top of ?
(In reply to comment #2)
> Is this what's causing the extremly slow scrolling at the top of ?
Almost certainly.
That site use a 2400X600px background-image, just below the header
Target Milestone: --- → mozilla1.9beta1
same here i guess:
the site has a 2400px × 3000px background image
bug 383960 (Cairo upgrade) should improve performance massively - the 20070610-18 Minefield build (when Cairo 1.4.8 was first checked in) is more than twice as fast on the testcase.
from an average of 31000ms to an average of 13500ms here.
After the landing of bug 383960 - using 10.4.10, ppc:
Minefield (build 2007080222): 17787ms
Camino (build 2007080217): 18499 ms
Average from 5 runs of the testcase in comment 0.

The improvement is slightly less significant than what I had mentioned earlier in comment 5.
The one difference to note is that comment 5 was done with builds made with GCC3.3/OSX10.3SDK, whereas the latest test is done with GCC4/OSX10.4uSDK (the default now for PPC builds).
I suspect  there is something in the OSX10.4uSDK that makes Cairo perform slower on PowerPC Macs under that build configuration.
Target Milestone: mozilla1.9 M8 → ---
Some notes:

In fx2, this falls into argb32_image/argb32_image_mark_RGB32 -- in fx3, we end up in argb32_image/sseCGSBlendXXXX8888.  I can't figure out how to get us to take the faster argb32_image_mark_RGB32 path, but I'd guess that it has something to do with fx2 using QD contexts and the like for drawing everywhere; perhaps the backing buffer for the window is RGB24 and not RGB32 like it would be under Cocoa.

However, when the testcase is run without chrome, in an 800x900 window (to avoid tiling), I get 5900ms in fx2, and 6800ms in minefield.  With chrome, again in an 800x900 window, I get 6400ms in fx2, and *9400*ms in minefield.  That's a huge hit from having the chrome enabled in minefield; 5900->6800ms is explainable with widget changes, and possibly due to using the (potentially more expensive) SSE blend function instead of mark_RGB32, but 6400->9400 is ridiculous.  I have no idea where those 3000ms are going.
The gif image in the test case has a 'tpixel' set (index=255), but the image itself doesn't have transparancy. So nsGIFDecoder2.cpp will set format to RGB_A1 because of the tpixel, which causes gfxImageFrame/nsThebesImage to uses ImageFormatARGB32 for this image. 
So, fixing the gif decoder to detect for such cases (a defined but not used tpixel) may help some.

A quick (and ugly) test showed improvement from ±10 seconds to ±4 seconds!!

The patch for the GIF image decoder is simple, but just doing SetHasNoAlpha() after the image has been created is not enough, as the format change is not passed to the Cairo level, so Cairo will still assume ARGB32...

So, we really need to a good way change the format from ARGB32 to RGB24 of an existing gfxImageSurface, including telling Cairo that the format has changed.
For this cairo-image-surface.c needs to provide a function for this effect.

And note, this also means that SetHasNoAlpha also doesn't have the desired effect for other images (PNG's particulary).
This is only the detection of the no alpha case in GIF images.
We still need to make SetHasNoAlpha more effective (and rename to more sensible?)
Hmm, you're right -- I must've broken things when I "cleaned up" the patch, ugh.  Ok, I'll fix things up and get it working correctly again.
Attached patch potential fix (obsolete) — Splinter Review
Okay, here's a fix.  It adds the missing transparency checks in the GIF decoder, and makes the right things happen on Optimize when SetHasNoAlpha is called.
Attachment #287725 - Flags: review?(pavlov)
Some more digging results: For windows, the Optimize() in nsThebesImage only
works for images < 1024*1024, and the image in the test case is 1280x960.
So, that image is not optimized, even if SetHasNoAlpa is set. So, even with
this patch, scrolling with large (>1000x1000) background images will still be
slow, but for other platforms it is probably faster.

As far as I can see in bug 355548 the limit of 4*1024*1024 per image has been
selected rather empirical, so bigger values could be acceptable?
(In reply to comment #11)
> Created an attachment (id=287725) [details]
> potential fix

I tried to apply this patch (OS X, PowerPC), but failed:
> patching file modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp
> Hunk #2 succeeded at 393 (offset 9 lines).
> Hunk #3 FAILED at 454.
> 1 out of 3 hunks FAILED -- saving rejects to file modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp.rej

A related slowdown when running the testcase here:
hiding the scrollbar (by moving the window partly offscreen) improves the speed quite a bit: on my (aging...) PowerBook G4/OS X 10.4.10, that makes for a 3 seconds difference, going from ~17,000ms to ~14,000ms (with scrollbar offscreen).
(there is bug 398137 on similar issues)

Yeah, patch broke after the 8-bit GIF decoder stuff landed.
Attached patch V2: Fix bitrotSplinter Review
With this patch the result is 5.2sec in Windows as opposed to 10s (on my Thinkpad T41).

Some notes: 
1. The 'mSawTransparency' is only needed for the first frame, as that frame is the only frame that is optimized, and the following frames are handled differently anyway. 
2. The mSawTransparency detection loop can just check for pixel values equal to zero, as all transparent pixels in GIF frames are always set to 0. And countdown is faster and countup in the loop, as the loop then doesn't need to check for a class member variable for the end check.
Attachment #287659 - Attachment is obsolete: true
Attachment #287725 - Attachment is obsolete: true
Attachment #287810 - Flags: review?(pavlov)
Attachment #287725 - Flags: review?(pavlov)
With current MicroB engine (based on GTK2 backend) on N800 this testcase URL takes about 6344ms...

On Trunk mozilla based MicroB on N800 it takes ~44816ms....

(In reply to comment #15)
> Created an attachment (id=287810) [details]
> V2: Fix bitrot

Hmm. That didn't have the desired or expected results on my PowerBook (OS X 10.4.10 PowerPC). A patched build was actually *slower* than an unpatched build, going from an average of 17000ms to an average of 20000ms, over 5 runs.
Both builds are otherwise identical.
Oh, after fixing testcase s/200/300/ it shows on N800 60641ms(Cairo) vs 6433(Gtk)...

For me too it looks like with  patch it little bit slower... 
If it's a win on x86 we'll probably still take it, we can always comment out the format-change optimization on PPC or others (just the but in nsThebesImage::SetHasNoAlpha).  I'm guessing that there might be something in there with byte order, depending on how 10.4 stores images in memory on PPC; for the maemo issues, we'll have to look at that, but I guess there are/used to be known issues for doing RGB24->ARGB32 operations due to a bug in X and/or pixman.
ups, previous measurement was wrong...

16340ms(Cairo) vs 6433(Gtk)...

so it went from 44816ms (without patch) to 16340ms (with patch)?
No, I was experimenting with xlib/image surfaces and did some broken build which was working very slow and shows 44816ms....

it went from 44816ms to 16340ms after removing that experimental hacks, sorry...


Can you provide timings before/after patch on the current trunk, with out any other changes? I am interesting whether this patch by itself does improve anything. 
Attachment #287810 - Flags: review?(pavlov) → review+
FWIW, this patch maybe is no as bad as I thought based on my first tests above in PowerPC Macs.

I made two identical Camino trunk builds (Checkout 20071112~15:30PDT) and ran the test above with
G4 - PowerBook 1.5Ghz, 1.5Gb ram
G5 - iMac 1.9Ghz, 1.5Gb ram

      nopatch      patch
G5    ~ 13,790     ~ 13,503.6
G4    ~ 17,112.2   ~ 17,096.6
average over 5 runs, empty cache, default profile
Attachment #287810 - Flags: superreview?(tor)
So, for Windows this is a big improvement (from 10 to 5.2 seconds).
For Mac, this is not much an improvement (but, not worse either).
For the N800, we need more tests as it only compares FF2 with FF3+patch, and not the FF3-nopatch versus FF3+patch... 
So, what can people report from the Linux front?
Attachment #287810 - Flags: superreview?(tor) → superreview+
Who can do the checkin for me?
Keywords: checkin-needed, perf
vlad has commit access, so he can take care of this himself. If he wants somebody else to land it for him, he can personally add the checkin-needed keyword.
Keywords: checkin-needed
Oh, I thought vlad wrote the final patch. Ignore my last comment. Sure, I'll check this in for you.
Reassigning to Alfred since two of the three patches are his (majority rules!). Feel free to correct this.

Landing the patch now.
Assignee: vladimir → alfredkayser
OS: Mac OS X → All
Hardware: Macintosh → All
Target Milestone: --- → mozilla1.9 M10
Checking in gfx/src/thebes/nsThebesImage.cpp;
/cvsroot/mozilla/gfx/src/thebes/nsThebesImage.cpp,v  <--  nsThebesImage.cpp
new revision: 1.52; previous revision: 1.51
Checking in gfx/src/thebes/nsThebesImage.h;
/cvsroot/mozilla/gfx/src/thebes/nsThebesImage.h,v  <--  nsThebesImage.h
new revision: 1.17; previous revision: 1.16
Checking in gfx/thebes/public/gfxPlatform.h;
/cvsroot/mozilla/gfx/thebes/public/gfxPlatform.h,v  <--  gfxPlatform.h
new revision: 1.26; previous revision: 1.25
Checking in gfx/thebes/src/gfxPlatform.cpp;
/cvsroot/mozilla/gfx/thebes/src/gfxPlatform.cpp,v  <--  gfxPlatform.cpp
new revision: 1.28; previous revision: 1.27
Checking in modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp,v  <--  nsGIFDecoder2.cpp
new revision: 1.87; previous revision: 1.86
Checking in modules/libpr0n/decoders/gif/nsGIFDecoder2.h;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.h,v  <--  nsGIFDecoder2.h
new revision: 1.32; previous revision: 1.31
Closed: 15 years ago
Resolution: --- → FIXED
I wouldn't be so fast in closing this bug. I originally opened it for Mac OS X issues.
Apparently the patch fixes issues on Windows, maybe on Intel Mac (can't test those), but definitely *not* on Mac PowerPC.
A post patch Minefield build just runs as slow as ever on my side (see also my comments above (comment 17, comment 24).

Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O 10.4; en-US; rv:1.9b2pre) Gecko/2007112702 Minefield/3.0b2pre
Resolution: FIXED → ---
At least the patch above makes it so that non-transparent images are really detected as such. 

This needs further investigation whether the following line does result in the desired effect (that the most optimal imageSurface is created for the image):
302         mOptSurface = gfxPlatform::GetPlatform()->OptimizeImage(mImageSurface, mFormat);

I don't have access to a Mac, so won't able to able with Mac specific issues here.
Assignee: alfredkayser → nobody
I'm going to close this bug as the bulk of the work here is fixed -- please file a new bug for Mac PPC issues, however.
(In reply to comment #33)
> I'm going to close this bug as the bulk of the work here is fixed -- please
> file a new bug for Mac PPC issues, however.

(sigh). OK, I filed bug 405894 for the Mac PowerPC issues.
Please transfer blocking flags, thanks.

I'll mark this one as fixed then.
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
On intel mac:

Build 2007112904: 18783ms
20010: 5754ms

So showing a little over 3x worse for Trunk vs branch - re-opening bug...
Resolution: FIXED → ---
(In reply to comment #35)
> On intel mac:
> Build 2007112904: 18783ms
> 20010: 5754ms

Your Intel Mac is even slower than my Powerbook G4, ppc ??

For reference, I repost the latest numbers I have (posted in bug 405894#c0)
> Gecko 1.9     Gecko 1.8     Safari 3.04
> 17220         7417          10089
> 17280         7347          10003
> 17070         7475          9906
> 17056         7311          9907
> 17236         7533          9895
> -----         -----         -----
> 17,172.4      7,424.6       9,960 (average over 5 runs)
> Gecko 1.9: 2007112816 Minefield/3.0b2pre
> Gecko 1.8: 20071128 BonEcho/
(In reply to comment #36)
> (In reply to comment #35)
> > On intel mac:
> > 
> > Build 2007112904: 18783ms
> > 20010: 5754ms
> Mike,
> Your Intel Mac is even slower than my Powerbook G4, ppc ??

Looks that way - is an Intel Mac Mini.

Vlad/Pav you want this in this bug or in bug 405894
Assignee: nobody → vladimir
(In reply to comment #37)
> (In reply to comment #36)
> > (In reply to comment #35)
> > > On intel mac:
> > > 
> > > Build 2007112904: 18783ms
> > > 20010: 5754ms
> > 
> > Mike,
> > Your Intel Mac is even slower than my Powerbook G4, ppc ??
> > 
> Looks that way - is an Intel Mac Mini.
> Vlad/Pav you want this in this bug or in bug 405894

Vlad swung by - looks like I had the window size large enough to cause the image to tile.  If I re-size the window my 1.8/1.9 numbers are much much closer.

I have a patch that uses a 10.5-only API if it's available to improve the tiled case significantly as well; gonna get it into upstream cairo and try to get it into our code before b2.
Pretty sure this is fixed one way or another at this point, especially on the Mac.
Closed: 15 years ago15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.