Closed Bug 628582 Opened 13 years ago Closed 7 years ago

IE Test Drive Blizzard benchmark is slow

Categories

(Core :: Graphics, defect)

x86
Windows 7
defect
Not set
normal

RESOLVED WORKSFORME

People

(Reporter: kael, Unassigned)

References

(Blocks 1 open bug, )

Details

(Whiteboard: ietestdrive)

The Blizzard canvas demo from the IE9 site ( http://ie.microsoft.com/testdrive/HTML5/Blizzard/Default.html ) is extremely slow in FF4 compared to IE9.

Rough snowflake scores from the work thinkpad:
IE9: ~3000 snowflakes
FF4: ~500 snowflakes

Home desktop:
IE9: ~3300 snowflakes
FF4: ~300 snowflakes*

* Note: I think something strange is going on here, probably bad ATI drivers: the framerate is terrible unless I shrink the FF4 window down to a certain size, at which point it suddenly speeds up.

Some basic investigation shows that this benchmark renders a few bitmaps and rotates them around to draw the snowman, and the snowflakes are all blitted from a source texture atlas using globalAlpha to fade them in/out.

DirectX call traces show that IE9 aggressively batches the rendering operations for this benchmark, never sending more than 60 draw calls to the GPU. It is hard to tell whether they achieve this through some built-in Direct2D functionality or not.

In FF4, call traces show that each individual snowflake is drawn as a separate D3D primitive rendering operation, and may in fact be handled using a scratch surface of some sort. I can't be certain of that, because the D3D call trace tools tend to fall over under the strain of recording the 3000+ primitive draws we perform every frame when running this benchmark with hardware acceleration enabled.

So far my best guess is that something about the way they're rendering prevents Direct2D from batching our draw calls efficiently, because other simple drawImage-based canvas demos are intelligently batched into a few large primitives by Direct2D.

My prime suspect is globalAlpha, but I wasn't able to confirm this by reading through the d2d backend for cairo, so I plan to do some debugging to try and confirm it.

It may also be the fact that they do a state save/restore on the canvas context around each snowflake - that could be forcing a D2D state change and preventing batching from occurring.
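To make the batching hypothesis concrete, here is a toy model, not a description of how Direct2D actually works. It assumes a renderer can merge consecutive draws only while a "state key" stays the same; `countBatches` and the shape of the draw records are made up for illustration.

```javascript
// Toy model only: NOT how Direct2D batches. A "batch" here is a run of
// consecutive draws that share the same state key (texture + alpha).
function countBatches(draws) {
  let batches = 0;
  let previousKey = null;
  for (const draw of draws) {
    const key = `${draw.texture}|${draw.alpha}`;
    if (key !== previousKey) {
      batches += 1; // state changed, so a new batch has to start
      previousKey = key;
    }
  }
  return batches;
}

// 3000 snowflakes from one atlas, all at full opacity.
const opaque = Array.from({ length: 3000 }, () => ({ texture: "atlas", alpha: 1 }));

// The same draws, but with a different alpha on each consecutive flake,
// roughly what setting globalAlpha per snowflake would produce.
const faded = Array.from({ length: 3000 }, (_, i) => ({
  texture: "atlas",
  alpha: (i % 100) / 100,
}));

console.log(countBatches(opaque)); // 1
console.log(countBatches(faded)); // 3000
```

Under this simplified assumption, a per-snowflake state change is enough to turn one large batch into one draw per snowflake, which is the pattern the FF4 call traces show.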
Minor note: Determined why it was slower on my home desktop; for some reason FF4 was set to use D3D9 acceleration instead of D3D10 (I think I swapped it over to deal with a crash in an older video driver). Switched it to D3D10 and the performance went up to a more reasonable value - 450 snowflakes at full desktop size.
The core of this "benchmark" is:

  context.save();
  context.globalAlpha = this.currentAlpha;
  context.drawImage(imgSnowflake,
     (20 * this.snowflakeIndex), 0, 20, 20,
     this.currentX, this.currentY, this.currentWidth, this.currentHeight);
  context.restore();

Specifically, it's a subrect source blit with a scale (currentWidth and currentHeight aren't 20) and a non-integer translate. This is similar to bug 600410, which is about SpeedReading, though it uses a slightly different drawImage form.

DrawImage creates an EXTEND_NONE pattern; in SpeedReading, a partial fix that got us most of the perf back was to make 1:1 integer-aligned blits go through the GPU memcpy path, because that's largely what that benchmark did.  This one doesn't.  (Note the trend here -- this is literally a benchmark of *one* function, with lots of window dressing, and one we happen to be slower at, just like SpeedReading was.)
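As a rough illustration of why the SpeedReading fast path does not help here, the check for a "1:1 integer-aligned blit" can be sketched as a predicate over the nine-argument drawImage form. `isUnscaledIntegerBlit` is a hypothetical helper, not Gecko code; it assumes an identity canvas transform, and the argument values below are invented.

```javascript
// Hypothetical helper for illustration; not part of Gecko or the benchmark.
// Returns true when drawImage(img, sx, sy, sw, sh, dx, dy, dw, dh) copies
// pixels 1:1 onto whole-pixel coordinates (identity transform assumed).
function isUnscaledIntegerBlit(sx, sy, sw, sh, dx, dy, dw, dh) {
  const unscaled = sw === dw && sh === dh;
  const aligned = [sx, sy, sw, sh, dx, dy].every((v) => Number.isInteger(v));
  return unscaled && aligned;
}

// Same-size copy at a whole-pixel position: eligible for a memcpy-style path.
console.log(isUnscaledIntegerBlit(40, 0, 20, 20, 100, 200, 20, 20)); // true

// Blizzard-style draw: scaled and placed at a fractional position.
console.log(isUnscaledIntegerBlit(40, 0, 20, 20, 100.5, 200.25, 13.7, 13.7)); // false
```

Every snowflake draw in this benchmark falls into the second case, so it needs the general scaled-pattern path.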

On my laptop, the base benchmark gets around 450 snowflakes.  If I comment out the globalAlpha setting, I go up to 1250.

  450 - base
 1250 - no globalAlpha setting (leave it at the default 1.0)
 1400 - no globalAlpha, no save/restore (note frequent GC pauses at this speed)
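
For anyone who wants to reproduce the three configurations above, this is a minimal sketch of the per-snowflake draw with the two suspects made switchable. The option names, the flake fields, and the recording mock context are my own assumptions; in a browser you would pass a real 2D context and the benchmark's snowflake image instead.

```javascript
// Mock 2D context that only records calls, so the sketch runs without a canvas.
function createRecordingContext() {
  const calls = [];
  return {
    calls,
    globalAlpha: 1,
    save() { calls.push("save"); },
    restore() { calls.push("restore"); },
    drawImage() { calls.push("drawImage"); },
  };
}

// Draws every flake from a 20x20-per-cell atlas. The two flags correspond to
// the configurations measured above (hypothetical names).
function drawFlakes(context, atlas, flakes, { useAlpha = true, useSaveRestore = true } = {}) {
  for (const f of flakes) {
    if (useSaveRestore) context.save();
    if (useAlpha) context.globalAlpha = f.alpha;
    context.drawImage(atlas, 20 * f.index, 0, 20, 20, f.x, f.y, f.width, f.height);
    if (useSaveRestore) context.restore();
  }
}

const flakes = [
  { index: 0, alpha: 0.5, x: 10.5, y: 20.25, width: 12, height: 12 },
  { index: 3, alpha: 0.8, x: 33.1, y: 7.75, width: 17, height: 17 },
];

const base = createRecordingContext();
drawFlakes(base, {}, flakes); // base configuration
console.log(base.calls.length); // 6 (save, drawImage, restore per flake)

const lean = createRecordingContext();
drawFlakes(lean, {}, flakes, { useAlpha: false, useSaveRestore: false });
console.log(lean.calls.length); // 2 (drawImage only)
```

Note that turning off globalAlpha changes what is drawn (the flakes no longer fade), so those configurations are only useful for locating the cost, not as a fix.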

I can't easily get rid of the EXTEND_NONE, but it might be interesting to hack DrawImage to set EXTEND_PAD and see whether that brings back the rest of the perf.
No longer blocks: 626277
My results:

IE9 ~1600 snowflakes, music and "different" font on the footnote
Chrome 24 ~1600 snowflakes, music and "regular" font
Nightly 21 ~2100 snowflakes, no music and "regular" font

So the benchmark speed is fixed on my configuration (Win 7 D2D + D3D10)!
18 months later:

IE 11: 2600 snowflakes
Chrome 36: 700 snowflakes
Nightly 34 (OMTC ON): 1700 snowflakes
Nightly 34 (OMTC OFF): 2100 snowflakes
Firefox 31 (release): 2100 snowflakes
Firefox 32 (beta): 2050 snowflakes
I get the same result on Chrome 59, IE 11, Firefox 54 and Nightly. I think they all get capped at 60 fps (draw time = 16.6 ms).
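The draw time quoted here is just the per-frame budget at a 60 fps cap. A quick sketch of the arithmetic (the function name is made up):

```javascript
// Time available per frame, in milliseconds, at a given frame rate.
function frameBudgetMs(framesPerSecond) {
  return 1000 / framesPerSecond;
}

console.log(frameBudgetMs(60).toFixed(2)); // 16.67
```

Once every browser finishes a frame inside that budget, the reported draw time stops distinguishing between them.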
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME