Closed Bug 637874 Opened 13 years ago Closed 12 years ago

Chrome does 2.5x more mutations per second on image evolution

Categories

(Core :: Graphics, defect)

x86
All
defect
Not set
normal

RESOLVED INVALID

People

(Reporter: dmandelin, Assigned: BenWa)

Attachments

(1 file)

See the linked URL. Firefox does 60 mutations per second, Chrome 155. I strongly suspect this is because of the 10ms min on setTimeout delays (that corresponds to 60 Hz, and Chrome uses 4ms, which corresponds to 150Hz).
I see 117 mutations per second in Chrome 11, and 78 in my current m-c opt build.

If I change the "dom.min_timeout_value" preference to 4, so that our clamp on timeouts is only 4ms, I see about 80 mutations per second.  So not much better....

Also note that while the 10ms thing will definitely keep us from going over 100Hz, the fact that we aren't even reaching 100Hz tells us we're CPU-bound.
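The CPU-bound argument can be checked with a little arithmetic. The sketch below is a simplified model, not a description of how either browser schedules timers: it assumes a repeating timer whose period is whichever is larger, the clamp or the per-iteration work time. `maxRatePerSecond` and the 12.5 ms work figure are hypothetical, the latter chosen only because it reproduces the roughly 80 mutations per second reported above.

```javascript
// Simplified model (an assumption): one iteration per timer tick, and a tick
// cannot fire sooner than the clamp or before the previous work has finished.
// clampMs: minimum timer delay enforced by the browser.
// workMs:  CPU time spent per iteration (hypothetical figure).
function maxRatePerSecond(clampMs, workMs) {
  return 1000 / Math.max(clampMs, workMs);
}

// With negligible work the clamp sets the ceiling.
console.log(maxRatePerSecond(10, 0)); // 100
console.log(maxRatePerSecond(4, 0));  // 250

// With 12.5 ms of work per iteration the clamp no longer matters.
console.log(maxRatePerSecond(10, 12.5)); // 80
console.log(maxRatePerSecond(4, 12.5));  // 80
```

Under this model, lowering the clamp from 10 ms to 4 ms changes nothing once the work per iteration exceeds 10 ms, which is consistent with the small difference measured above.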
Profile on Mac says:

  19% painting
   8% mac event loop and paint stuff overhead
  17% tracejit-generated code
  46% under fill() on a canvas
   2% getImageData on a canvas
   3% setting style on canvas
   1% js::mjit::stubs::add (adding non-strings to strings?)
   1% mjit-generated code
   2% fillRect, lineTo, moveTo, beginPath, closePath on canvas

There's no DOM anywhere here, really.  Graphics and JS all the way, mostly graphics.  ;)
Component: DOM: Events → Graphics
QA Contact: events → thebes
Note that the numbers above may look very different on Windows with d2d...
On Windows with D2D, Chrome is just over twice as fast.
Paging Bas!
More data: 

On my Win 7 machine with D2D Chrome is actually 3.2x faster than Firefox. On my Win XP machine without D2D Chrome is 2.1x faster. So, D2D doesn't seem to help at all here.
(In reply to comment #6)
> More data: 
> 
> On my Win 7 machine with D2D Chrome is actually 3.2x faster than Firefox. On
> my Win XP machine without D2D Chrome is 2.1x faster. So, D2D doesn't seem to
> help at all here.

I'm pretty sure bz has most of it in comment 1: Chrome has a smaller minimum timeout.
No.  It doesn't.  Our minimum timeout matches theirs at this point, changing the timeout to match theirs back then didn't change our numbers at all, and our number was way lower than if it were gated on the minimum timeout.  This is a pure performance "we're using too much CPU" bug.
The problem here is most likely getImageData. GetImageData is particularly hard for any hardware accelerated system because it will have to block on the GPU completing any commands in the pipeline in order to readback. Doing this many times a second causes serious performance issues.

There's no 'easy' solution here. One possibility is trying to detect situations where getImageData is used a lot and fall back to software there, but such heuristics are tricky. On larger images we should be doing a lot better, since the situation becomes more fillrate-bound, which is where the GPU actually gives us a win. Part of the problem here is that, because the image is tiny, the GPU simply isn't doing a lot for us in this situation.
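As a rough illustration of what such a heuristic might look like, here is a minimal sketch that counts readbacks in a sliding time window and flags a canvas as a candidate for software rendering once the count crosses a threshold. Everything in it is hypothetical: `ReadbackTracker`, the one-second window, and the threshold of 30 are made up for illustration and do not describe what any browser actually does.

```javascript
// Hypothetical heuristic: track recent getImageData-style readbacks and
// suggest software rendering when they become frequent.
class ReadbackTracker {
  constructor(windowMs = 1000, threshold = 30) {
    this.windowMs = windowMs;   // length of the sliding window
    this.threshold = threshold; // readbacks per window that count as "a lot"
    this.timestamps = [];       // times of readbacks still inside the window
  }

  // Record one readback at time nowMs and forget those that fell out of the window.
  recordReadback(nowMs) {
    this.timestamps.push(nowMs);
    while (this.timestamps.length > 0 &&
           this.timestamps[0] <= nowMs - this.windowMs) {
      this.timestamps.shift();
    }
  }

  // True when the canvas has been read back often enough recently.
  shouldUseSoftware() {
    return this.timestamps.length >= this.threshold;
  }
}

// A page that reads back every 10 ms trips the default threshold quickly.
const tracker = new ReadbackTracker();
for (let t = 0; t < 500; t += 10) tracker.recordReadback(t);
console.log(tracker.shouldUseSoftware()); // true (50 readbacks within one second)
```

The tricky part mentioned above is picking the window and threshold: too eager and large canvases lose the GPU win, too cautious and small, readback-heavy pages like this one keep stalling the pipeline.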
Oh, I should note IE9 does a lot worse than us. But the IE10 platform preview does slightly better. I'm unsure at this point what IE10 does for getImageData, but it is possible they implement such heuristics as described, or found another way to improve the situation.

If I were to solve this in C++, I'd create one thread that fires the mutations to the GPU and another that waits for the data readback to finish on each frame and performs the analysis. If the GPU gets too far ahead of the analysis thread, it waits, but the analysis thread would generally not have to wait after receiving its first readback.

Of course, I don't know how such an asynchronous readback technique could be applied to this demo.
Would the readback situation apply to Mac too?  Or is that something else?  Generally getImageData has been fast on Mac...
I get 80 on Aurora 9.0a2 vs. 150 on Chrome on Mac.
OS: Windows 7 → All
On Windows XP I get 45 (with HWA) and 50 (without HWA), and on Chrome I get about 60.
On Mac, I get 135 on Nightly vs. 200 on Chrome on Lion, on the same hardware as BenWa (who's on Snow Leopard).

On Win 7, I get 68 on Nightly vs. 20 on Chrome.
I get 180 mutations when viewed on my external monitor and 60 when viewed on my laptop monitor. We might be hitting some form of vsync here, at least in some cases.
I made a variant of the page above that works with a display-none canvas and updates the results every 800 ms or when a new best match is found (you can change the timeout in the JavaScript).

This decouples the execution speed from our frame rate.

With this simple change I go from 60 to 200 iterations on Firefox, and I get 180 on Chrome Canary (run with --allow-file-access-from-files to allow cross-origin canvas readback).
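The decoupling described above can be sketched roughly as follows. This is not the code from the attached test: `makeDisplayGate` and `shouldRepaint` are hypothetical names, and the simulation at the end (5 ms per mutation, an improvement every 300th step) uses made-up numbers. Only the 800 ms interval and the "repaint on a new best match" rule come from the description.

```javascript
// Decide when to refresh the visible results, independently of how fast
// mutations run on the hidden canvas. Hypothetical sketch.
function makeDisplayGate(intervalMs = 800) {
  let lastShown = -Infinity; // time of the last repaint
  return function shouldRepaint(nowMs, foundBetterMatch) {
    if (foundBetterMatch || nowMs - lastShown >= intervalMs) {
      lastShown = nowMs;
      return true;
    }
    return false;
  };
}

// Simulate 1000 mutations, 5 ms apart, with an improvement every 300th step.
const gate = makeDisplayGate(800);
let repaints = 0;
for (let i = 0; i < 1000; i++) {
  const improved = i > 0 && i % 300 === 0;
  if (gate(i * 5, improved)) repaints++;
}
console.log(`${repaints} repaints for 1000 mutations`);
```

With a gate like this, the mutation loop only touches the display occasionally, so its speed is no longer tied to the paint rate.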

I think we should close this as INVALID. Can someone verify this on Windows with my test?
Assignee: nobody → bgirard
Status: NEW → ASSIGNED
With some optimization of setInterval to push mutations past 220 per second, we become limited by garbage collection.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → INVALID