Closed Bug 371867 Opened 17 years ago Closed 16 years ago

Severe performance regression on FAA flight status page


(Core :: Graphics, defect, P2)






(Reporter: bzbarsky, Assigned: vlad)




(Keywords: perf, regression)


(3 files, 2 obsolete files)

BUILD: Current trunk build (Seamonkey and Firefox have the same behavior; I just don't have old Firefox builds).

1)  Load
2)  Click on the "Select a Major Airport" combobox on the top right
3)  Try to use the keyboard to scroll.

ACTUAL RESULTS: Combobox takes 3-5 seconds to show up.  Scrolling takes 2-4 seconds per item.  The page is basically unusable, in addition to which it makes other apps unusable too (see below).

EXPECTED RESULTS:  Do all that much faster.

I see the expected results in a 2007-01-31-01 build, and the regression in a 2007-02-01-01 build.  In between those, Seamonkey flipped on cairo.  Given that the CPU usage seems to be in X, not in Gecko (so jprof shows nothing useful, even with -sync), this seems more likely than any of the actual checkins to cause this problem...

Note that this is likely to be Linux-only and depend on the exact X server version.  If there's any more info I can provide, I would be happy to do that.

Also note that it's clearly _possible_ to be a lot faster here.  We were on Jan 31...
Flags: blocking1.9?
Attached file testcase
It seems that absolutely positioned transparent images are the cause of the slowdown.  The original page has 43 (this testcase has only 6).

With a cairo build on linux, the page tends to take a few seconds to respond to most user actions (selecting the dropdown, scrolling), although once I start scrolling, it scrolls quickly.
Which X server version and driver?
I'm running FC6's Xorg 1.1.1 and ATI Radeon driver 6.6.3
Andrew, could you possibly install the debuginfo RPMs for X and try sysprof on these testcases?

And thank you for confirming that it's not just me!  ;)
Sysprof indicates that ~90% of the time is spent in fbCompositeSrc_8888x8888mmx (with my testcase).
With what callstack?
/me figures out how sysprof works

The full stack (back to Xorg's main) where all the fun happens is:
Hmm...  If you run Mozilla with -sync, do you get a stack back into mozilla code?
I guess we're recompositing all those images on top of each other... but I wonder why that ends up being so slow on the X side.  6 images is not exactly a lot.
> Hmm...  If you run Mozilla with -sync, do you get a stack back into
> mozilla code?

No, it's always back to Xorg.
Attached file seamonkey stack
Actually, the Mozilla side is pretty easy.  I just started with --sync and broke in with gdb while it was repainting.
Flags: blocking1.9? → blocking1.9+
Target Milestone: --- → mozilla1.9beta1
Try adding

  Option "XAANoOffscreenPixmaps" "true"

to the Device section in xorg.conf.
For what it's worth, vlad's suggestion improved performance here a good bit for me (less than 0.5 seconds to change combobox items, instead of 3-4 seconds).

Still slower than pre-cairo, but at least it's usable....
not sure what else we can really do here.  this sounds like a cairo/X server bug more than anything.
Priority: -- → P5
I'll buy that it's a cairo bug, perhaps.  In which case we should try to get them to fix it, if at all possible.  It's really nice to be able to get flight status and all.

In general, I seem to recall us working around library or OS bugs that led to serious issues for our users.  Not sure how feasible that is in this case.
Target Milestone: mozilla1.9 M8 → ---
this is a little slow on my fc6 under parallels but certainly usable.
How does it compare to 1.8 performance?  Note that your hardware is likely faster than mine (dunno how it compares to our typical user)...  and that I bet scrolling to the bottom of the list is still a lot more painful than on 1.8, even on your machine.
Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9a9pre) Gecko/2007110604 Minefield/3.0a9pre ID:2007110604

Can confirm that trunk perf is noticbly worse than branch on Mac (build above)
Copying the higher priority from the dup bug.  Though it's not quite the same steps to reproduce, so it's remotely possible they're separate issues.  At first blush, I doubt they are.
Priority: P5 → P2
Note that the duplicate bug has a regression date that appears NOT to be when cairo was turned on.  (Though before when cairo was turned on for SeaMonkey nightlies.)
Er, never mind, it does match when cairo was enabled (although about:buildconfig seems to be the only way to tell).
Assignee: nobody → vladimir
Okay; the way that this is structured is a map image, and then about 2 dozen of the same image layered on top of it, which are just used for imagemaps.  That image, you might notice, is -- which is a 692x455 fully transparent gif image.  The reason why this worked quickly before is that we were doing 1 bit mask transparency rendering before for GIFs.  That's one piece of functionality that we lost (there's no notion of 1-bit transparency in the new world), so we end up doing full 8-bit alpha blending for each image.  On linux especially, this has the potential to be really expensive.

I can fix this and similar cases by extending our image detection stuff -- we already detect when ARGB32 images should be kept as RGB24 (in the JPG case); I can extend this to also detect when the images are a constant color, and kick in an optimization in that case.  Does that seem useful?  It would fix this particular testcase, but I don't know how general that approach is on the web, and it wouldn't kick in if, say, you had just 1 pixel that was different than the others on an image.

Then again, given imagemaps have to go on images, fully transparent images are probably not that uncommon.
Fully transparent spacer GIFs sized to a pretty large size are not that uncommon.
Patch.  Gonna do some perf tests to make sure the effect isn't noticeable on normal images, but I don't think it will be -- we should abort pretty early on out of that loop on real images.
Comment on attachment 299957 [details] [diff] [review]
apply same-pixel optimization to all images, not just 1x1

given that the image data is guaranteed-aligned you could just loop through the whole image rather than going line by line....  you could also do this in parallel with some mmoy love.
Attached patch updated (obsolete) — Splinter Review
Updated to do single loop; I'm not sure if using MMX/SSE here is worth it, because I think this type of thing should already be very fast -- and to get it faster we'd have to complicate this code a good chunk (at least two paths).  This patch gives me a very visible win on the testcase in this bug.
Attachment #299957 - Attachment is obsolete: true
Ok, updated, now also correctly frees old surfaces so this could be a good memory win as well.
Attachment #299966 - Attachment is obsolete: true
Attachment #300523 - Flags: review?(pavlov)
Attachment #300523 - Flags: review?(pavlov) → review+
Checked in.  Reopen please if this is still bad, but I expect it to be significantly better.
Closed: 16 years ago
Resolution: --- → FIXED
Backed this out (twice or maybe even trice) to figure out whether it's the cause for the orange, please re-land this once the tree is green again.
Resolution: FIXED → ---
Fixed; reftest failures were unexpected passes on win32, see my comments in bug 371316 comment 8.
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
That patch helps a ton.  It's quite usable now.  Thank you!
Depends on: 429915
You need to log in before you can comment on or make changes to this bug.