Closed Bug 277762 Opened 20 years ago Closed 19 years ago

DHTML performance regression because of the patch from bug 228399

Categories

(Core Graveyard :: GFX: Win32, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: martijn.martijn, Assigned: emaijala+moz)

References

()

Details

(Keywords: perf, regression, testcase)

Attachments

(6 files)

This is a spin-off from bug 277044.
See the table with times in the testcase from bug 277044:

Transparent image:
Build	            relative	absolute fixed	  static
2004-11-26 7:42am   24285ms	8032ms	 16043ms  8032ms
2004-11-27 7:48am   39937ms	36222ms	 36463ms  8032ms

Non-transparent image:
Build	            relative	absolute fixed	  static
2004-11-26 7:42am   19949ms	8022ms	 10044ms  8032ms
2004-11-27 7:48am   10035ms	8031ms	 8052ms	  8032ms

The build I used for this test was a Firefox trunk build.

Also I build a non-debug build from a 2004-11-23 dated source of Mozilla, and
tested with and without the patch from bug 228399, using the automatic testcase
from this url:
http://martijn.heelveel.info/test/mozilla/slowdhtml/slow_relative4.html
Those test results are here:
http://martijn.heelveel.info/test/mozilla/slowdhtml/results_patch228399.htm
moz_20050109 = the build without the patch from bug 228399
moz_20050110 = the build with the patch from bug 228399

In the positioned fixed/transparent image case, the red block begins to move
especially slow in the build, when it is moving at the end, out of the
transparent image.
Depends on: 277766
No longer depends on: 277766
Note: this regression seems Windows-only (I don't see this effect on Linux...)
Assignee: win32 → emaijala
Flags: blocking1.8b?
Could you test a build from 2004-11-03? There were two patches in bug 228399.
The first one was checked in 2004-11-04 and fixed the actual issue. The second
one was checked in 2004-11-27 and reverted the system partially back to the old
behavior because of other performance problems (possibly the same you can see
with the non-transparent image here) the first change caused.
Ok, here are my test results. For completeness I also added the results for
Mozilla at 2004-11-05

Results from         Mozilla 2004-11-03   Mozilla 2004-11-05
relative: transpare: 8031                 8031
absolute: transpare: 8021                 8031
fixed   : transpare: 9243                 8382
static  : transpare: 8032                 8032                
relative: not_trans: 8022                 8032
absolute: not_trans: 8032                 8032                 
fixed   : not_trans: 8052                 8132                
static  : not_trans: 8032                 8032                 
relative: div_block: 8032                 8032
absolute: div_block: 8022                 8032
fixed   : div_block: 8032                 8142
static  : div_block: 8032                 8032
I took my few days old SeaMonkey trunk build and removed both patches of bug
228399. I did the test, added added the first patch, tested again, added the
supplementary patch and tested again. Here are the results:

Original:

relative: transpare: 10766
absolute: transpare: 8032
fixed   : transpare: 8022
static  : transpare: 8032
relative: not_trans: 10245
absolute: not_trans: 8022
fixed   : not_trans: 8022
static  : not_trans: 8032
relative: div_block: 8022
absolute: div_block: 8022
fixed   : div_block: 8022
static  : div_block: 8032


With patch 1:

Results:
relative: transpare: 17044
absolute: transpare: 10595
fixed   : transpare: 10956
static  : transpare: 8031
relative: not_trans: 14861
absolute: not_trans: 8713
fixed   : not_trans: 8833
static  : not_trans: 8022
relative: div_block: 8983
absolute: div_block: 8021
fixed   : div_block: 8031
static  : div_block: 8031


With patch 2 (supplementary optimization patch):

Results:
relative: transpare: 10815
absolute: transpare: 8011
fixed   : transpare: 8021
static  : transpare: 8021
relative: not_trans: 10044
absolute: not_trans: 8021
fixed   : not_trans: 8021
static  : not_trans: 8021
relative: div_block: 8031
absolute: div_block: 8031
fixed   : div_block: 8011
static  : div_block: 8021

So, the primary patch slowed the rendering down as I noticed, but the secondary
patch pushed it back to the original speed. So, for some reason my results don't
support the others. My tests were run with a debug build on Athlon XP 2700+ with
Matrox Millennium G550 at 1280x1024 32bit.
So... a few things:

1)  Looking at performance on a debug build is usually not useful
2)  The number is not all there is to this test.  The number will be at _least_
    8000-some no matter what due to how we process timeouts (at least 10ms per
    timeout).  The question is also how much CPU is being used in the process...
    on slower machines that may increase the times, but on a modern fast machine
    chances are we can still get all the processing into that 10ms timeslot with
    the simple testcase....
I've found debug build adequate for testing performance delta. 

In the test case the timeout was 20 milliseconds. I changed it to 0 (=10ms) and
the results didn't halve, so I'm now jumping to the assumption the results now
reflect the real maximum speed better. Here are the results again:

Original:

relative: transpare: 10935
absolute: transpare: 5778
fixed   : transpare: 5989
static  : transpare: 5207
relative: not_trans: 10205
absolute: not_trans: 5457
fixed   : not_trans: 5608
static  : not_trans: 5148
relative: div_block: 7681
absolute: div_block: 4947
fixed   : div_block: 4977
static  : div_block: 5087


Patch 1:

relative: transpare: 17064
absolute: transpare: 10605
fixed   : transpare: 10766
static  : transpare: 5158
relative: not_trans: 15002
absolute: not_trans: 8893
fixed   : not_trans: 8823
static  : not_trans: 5158
relative: div_block: 8993
absolute: div_block: 4927
fixed   : div_block: 5027
static  : div_block: 5017


Patch 2:

relative: transpare: 10926
absolute: transpare: 5809
fixed   : transpare: 5949
static  : transpare: 5218
relative: not_trans: 10145
absolute: not_trans: 5448
fixed   : not_trans: 5628
static  : not_trans: 5157
relative: div_block: 7791
absolute: div_block: 4937
fixed   : div_block: 5007
static  : div_block: 5058

Anyway, even if these are completely meaningless, I can't actually see why the
latter patch would make the situation worse than it was unpatched.
Could this depend on graphics card or Windows version?
Fwiw, these are the results on my WinXP computer, Duron 600MHz, NVidia GeForce2
MX 100/200 32MB, screen setting 1024*768 16bit color depth.

Results  build:      2004-11-26 2004-11-28
relative: transpare: 24676      40738
absolute: transpare: 8032       38174
fixed   : transpare: 16484      38295
static  : transpare: 8032       8032 
relative: not_trans: 16574      8042
absolute: not_trans: 8032       8032
fixed   : not_trans: 10035      8032
static  : not_trans: 8022       8032
relative: div_block: 11487      8032
absolute: div_block: 8022       8032
fixed   : div_block: 8112       8032
static  : div_block: 8032       8032
Attached file movement testcase
This is probably not the best Bug to place this comment, since it does not
relate directly to the bugs mentioned here. Since Bug #245131 Cpu utilization
has dropped dramatically. But at least since 10/14/04 some scripts are not
aethetically inferior to the branch builds. The testcase should move the dots
smoothly and with constant velocity, and thats true on the branch builds. Viewed
with trunk builds of thunderbird or firefox the movement is jerky and erratic,
compared to the branch.
If this testcase/comment should be posted elsewhere, sorry for the spam.
I am using WinXP pro sp2, P3-s 1.4 g, 512m Ram, ATI radeon 7200 64m DDR
Yes, that testcase should be placed elsewhere.  Specifically, in a bug of its
own, with more information on when you see the performance degradation start. 
Please cc me on said bug.  _This_ bug covers a specific performance regression
that happened on a specific day due to a specific patch.  It's got enough
complications with failures to reproduce that adding random unrelated comments
is very much NOT appreciated.  Please don't ever do that again.  Especially
since you KNEW that your issue was unrelated to this bug, as your comment shows.
 I've seen your name around Bugzilla enough that you should know the simple "one
bug per issue" rule by now....
Attached file What I changed
Ok, this is what I changed/removed in nsDrawingSurfaceWin.cpp (I then rebuilt
the gfx directory). These are the results in a non-debug build:
Results 	     unchanged changed:
relative: transpare: 36232     16464
absolute: transpare: 36061     16033
fixed	: transpare: 68128     24045
static	: transpare: 8022      8032
relative: not_trans: 8082      12188
absolute: not_trans: 8042      8182
fixed	: not_trans: 11957     12148
static	: not_trans: 8031      8032
relative: div_block: 8031      8032
absolute: div_block: 8031      8022
fixed	: div_block: 8031      8202
static	: div_block: 8031      8032
Interesting.  That's just backing out the optimization patch from bug 228399...
 I wonder why that patch makes things faster on some systems and slower on others?
What it comes down to is video card drivers.  Most drivers handle 24 bit
CreateDIBSection fast in any video mode.  A few, do not. 
CreateCompatibleBitmap, for the most part, is slower than CreateDIBSection, but
not always.

That's been my experience.  The really odd thing is that sometimes notching down
the graphics accelleration sometimes speeds things up.  Older nVidea cards were
especially bad for this.  Personally, I stay away from CreateCompatibleBitmap,
but then again, I currently have a nVidea card.

I played with this sort of stuff a few years ago, when I had a matrox and an
nVidea card.  It becomes really frustrating after a while.  I'll dig back and
see if I have documented anything from back then, but most of my work/testing is
probably long gone (and forgotten by me)
Well, as I said, the optimization patch reverts the functionality partially back
to what it was before the first patch as the first patch caused a quite bad
slowdown in a specific case at 16 bit and I thought it would be safer to do as
we previously did for the cases where blending isn't needed. I know that many
things are faster without the "optimization" patch, but it seems that with the
current architecture we'll always have a compromise. 

I wonder if it would be worth a shot trying to always use a dib section if the
color depth is >=24.
Btw, it's not that CreateDIBSection would be slow, it's that GetDIBits is slow.
It would be so much easier if we had the data around and wouldn't need GetDIBits.
Comment 15 sounds like a reasonable approach given comment 14.  Want to post a
patch that Martijn could take for a spin?
Attached patch Test patchSplinter Review
My results were not too encouraging. The test case slowed down considerably
with this patch, but it would be nice to see Martijn's results. 

I think we might do way better if we'd be able to keep a backbuffer around all
the time so there wouldn't be need to get the data from the device.
These are my results before and after I applied the test patch:
Results              before: after:
relative: transpare: 36463   36522
absolute: transpare: 36062   36062
fixed   : transpare: 68549   68458
static  : transpare: 8031    8031
relative: not_trans: 8092    8081
absolute: not_trans: 8042    8041
fixed   : not_trans: 12037   12047
static  : not_trans: 8031    8032
relative: div_block: 8031    8032
absolute: div_block: 8031    8022
fixed   : div_block: 8031    8032
static  : div_block: 8031    8032
It doesn't seem to make any difference.
My server is down for quite some time now, that's why I'm attaching the
testcase to this bug.
Attached file Testcase
Right, you're running @ 16 bits color depth, right? Then it doesn't make a
difference. Is it possible for you to test 24 bits mode?
Well, I can run in 32 bits mode. That seems also covered by your test patch.
These are the results:
Results:             Before:  After:
relative: transpare: 72855    22592
absolute: transpare: 72114    20179
fixed   : transpare: 141664   36062
static  : transpare: 8032     8032
relative: not_trans: 12098    17565
absolute: not_trans: 8112     16133
fixed   : not_trans: 16203    28051
static  : not_trans: 8031     8032
relative: div_block: 8061     8022
absolute: div_block: 8031     8022
fixed   : div_block: 8021     8042
static  : div_block: 8021     8032
So in that case, it seems improve things quite a bit.
If it only didn't make the non-transparents so much worse..
too late for 1.8b1, transferring request to 1.8b2
Flags: blocking1.8b?
Flags: blocking1.8b2?
Flags: blocking1.8b-
*** Bug 255648 has been marked as a duplicate of this bug. ***
There's been several image drawing bugs created over the last little while
related to slow drawing on 1 bit transparent images on nVidia cards.  I dup'd
them all to one bug, then dup'd that one here.

The patch fixes the problem.  However, from the looks of it, drawing ontop
anything that needs blending will be slow for nVidia cards.  But I suppose in
the future if ::AlphaBlend can be incorperated (as mentioned in the other bug),
then we will be cooking.

Just for the record, I'll paste my comment from Bug 255648:
It appears using CreateDIBSection to create the surface instead of
CreateCompatibleBitmap slows down nVidia's (on card?) optimization routines
(namely StretchDIBits).
Blocks: 255648
Sorry, that should read:
It appears using CreateCompatibleBitmap to create the surface instead of
CreateDIBSection slows down nVidia's (on card?) optimization routines
(namely StretchDIBits).

In other words:
CreateDIBSection + nVidia StretchDIBits() = ok
CreateCompatibleBitmap + nVidia StretchDIBits() = slow painting
nVidia Card; Win2k; I removed all the div_block results since they hovered
around 8031

a) Without "Test Patch" (Uses CreateCompatibleBitmap):
relative: transpare: 53593
absolute: transpare: 52094
fixed   : transpare: 52187
static  : transpare: 8031
relative: not_trans: 8032
absolute: not_trans: 8031
fixed   : not_trans: 8031
static  : not_trans: 8031

b) With "Test Patch" (Uses CreateDIBSection)
relative: transpare: 14750
absolute: transpare: 13437
fixed   : transpare: 13391
static  : transpare: 8047
relative: not_trans: 13750
absolute: not_trans: 13359
fixed   : not_trans: 13328
static  : not_trans: 8031

c) Without "Test Patch", but with last patch of Bug 205893 backed out:
relative: transpare: 8031
absolute: transpare: 8031
fixed   : transpare: 8031
static  : transpare: 8032
relative: not_trans: 8031
absolute: not_trans: 8031
fixed   : not_trans: 8031
static  : not_trans: 8032

d) With "Test Patch", last patch of 205893 backedout:
Results:
relative: transpare: 12891
absolute: transpare: 12516
fixed   : transpare: 12516
static  : transpare: 8031
relative: not_trans: 12750
absolute: not_trans: 8015
fixed   : not_trans: 8188
static  : not_trans: 8015


I bet you like c ;)  The last patch in Bug 205893 changed image optimization
from CreateDIBitmap (which create a DDB) to CreateDIBSection (which creates a
DIB).  The premise of the patch in Bug 205893 was that DIB sections are just as
fast as DDBs.

Just backing out the last patch of Bug 205893 will not solve everything. 
Currently we have a 0x20000 minimum size limit before optimizing an image.  When
an image is smaller than that, we'll get results like we do in a), which is
horribly slow, or with test patch, b) which is moderately slow.

The best solution seems to be always have an image of any size optimized, and
use CreateCompatibleBitmap (no "test patch").  The best solution is also the
most complex, as in order to have all images optimized, we need some sort of
HBITMAP caching mechanism.  See
http://lxr.mozilla.org/seamonkey/source/gfx/src/windows/nsImageWin.cpp#1443

To update my statement on Comment #29:
Surface               | Image  | Result
---------------------------------------------
CompatibleBitmap(DDB) | DIB    | VERY Slow
CompatibleBitmap(DDB) | DDB    | Fast
CompatibleBitmap(DDB) | None   | VERY Slow
DIBSection            | DIB    | Slow
DIBSection            | DDB    | Not as Slow
DIBSection            | None   | Slow
Martijn, Ere, others, could you run the test with the latest nightly?
Depends on: 284716
Ok, results with a 2005-3-4 and a 2005-3-6 trunk build. On a 600MHz Duron,
Geforce 2MX 100/200 at 16bits color depth.
Results              03-04:  03-06:
relative: transpare: 41860   8031
absolute: transpare: 36492   8031    
fixed   : transpare: 37303   8031
static  : transpare: 8031    8031
relative: not_trans: 8312    8031
absolute: not_trans: 8032    8031
fixed   : not_trans: 8032    8031
static  : not_trans: 8032    8031
relative: div_block: 8032    8021
absolute: div_block: 8022    8031
fixed   : div_block: 8032    8031
static  : div_block: 8032    8031
The results in 32 color depth are the same for the 3-6 build (the 4-3 build is
even slower than in 16b color depth). Also, I don't see any cpu strain anymore
in the 3-6 build.

Thank you!!! I would call this bug fixed, not? :)

Minor nit, when dynamically changing the color depth and then running the test,
the test becomes very slow. I have to restart the browser to make it fast again.
Fixed by checkin for bug 284716
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Isn't that a WFM instead of fix?
wfm is when it magically works and you don't know why.  We _do_ know the exact
code change that fixed this bug, though.
Flags: blocking1.8b2?
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: