Closed Bug 646299 Opened 13 years ago Closed 9 years ago

Regression SVG NoChrome increase 44.5% on Nokia n900 mobile, reported 4:40PM 3/29/2011

Categories: Core :: General
Type: defect
Hardware/OS: ARM / Maemo
Priority: Not set
Severity: normal
Status: RESOLVED WONTFIX
Reporter: dholbert
Assignee: Unassigned
Attachments: 1 file

Quoting a post to dev.tree-management:

On 03/29/2011 04:40 PM, nobody@cruncher.build.sjc1.mozilla.com wrote:
> Regression :( SVG NoChrome increase 44.5% on Nokia n900 mobile
> --------------------------------------------------------------
>      Previous: avg 5750.005 stddev 18.072 of 30 runs up to revision d402e973b4c9
>      New     : avg 8309.736 stddev 10.484 of 5 runs since revision d402e973b4c9
>      Change  : +2559.731 (44.5% / z=141.638)
>      Graph   : http://mzl.la/fPYxv7
> 
> Changeset range: http://hg.mozilla.org/mobile-browser/pushloghtml?fromchange=d402e973b4c9&tochange=d402e973b4c9

That changeset range is empty (& hence useless), because apparently our automated-changeset-range-builder only includes csets from mobile-browser, whereas this regression was really (presumably) caused by a change in mozilla-central.
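
For reference, the "Change" line in the quoted report is straightforward arithmetic over the two sets of runs; here is a minimal sketch (assuming z is simply (new_avg - old_avg) / old_stddev, which comes out very close to the quoted figure):

#include <stdio.h>

/* Reproduce the "Change" line from the regression mail: absolute delta,
 * percent change relative to the old average, and (assumed) z-score of the
 * new average against the old standard deviation. */
int main(void)
{
    double old_avg = 5750.005, old_stddev = 18.072; /* 30 runs up to d402e973b4c9 */
    double new_avg = 8309.736;                      /* 5 runs since d402e973b4c9  */

    double delta   = new_avg - old_avg;             /* +2559.731 */
    double percent = 100.0 * delta / old_avg;       /* 44.5%     */
    double z       = delta / old_stddev;            /* ~141.6 (the report says 141.638;
                                                       the grapher may use a slightly
                                                       different variance estimate) */

    printf("Change: %+.3f (%.1f%% / z=%.3f)\n", delta, percent, z);
    return 0;
}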

From looking at the graph linked in the message, the values were all pretty consistently in the ~5700 range, and then at some point late today, they spiked up to ~8300.  In particular...
* The "last good" value is 5739.23, which (from cross-referencing on the Mobile tinderbox) was built from http://hg.mozilla.org/mozilla-central/rev/af61c4752e53

* The "first bad" value is 8309.82, which (from cross-referencing on the Mobile tinderbox) was built from http://hg.mozilla.org/mozilla-central/rev/4d7a0a6dd613

So that gives us this pushlog for the regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=af61c4752e53&tochange=4d7a0a6dd613

That includes zero SVG changes, but a few significant platform changes -- in particular: a pixman update, libjpeg-turbo landing, and a 220-cset tracemonkey merge.
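
(For anyone re-deriving a range like this: the pushlog URL is just the last-good and first-bad revisions plugged into the fromchange/tochange query parameters. A tiny illustrative sketch, using the revisions identified above:)

#include <stdio.h>

/* Build the mozilla-central pushlog URL for a regression range from the
 * last-good and first-bad revisions (illustrative only). */
int main(void)
{
    const char *last_good = "af61c4752e53";
    const char *first_bad = "4d7a0a6dd613";
    char url[256];

    snprintf(url, sizeof url,
             "http://hg.mozilla.org/mozilla-central/pushloghtml"
             "?fromchange=%s&tochange=%s",
             last_good, first_bad);
    printf("%s\n", url);
    return 0;
}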
Probably the pixman update. I'll take a look tomorrow.
Did this show up on try?
OS: Linux → Maemo
Hardware: x86 → ARM
(In reply to comment #2)
> Did this show up on try?

Maybe; try results for Maemo don't show up in TBPL or in the try chooser, so I forgot to look.
Here's some detail on what went bad: http://people.mozilla.org/~jmuizelaar/graph-server/detail.html?new=6836410&old=6837446

Siarhei, do you have any idea what could have caused this?
(In reply to comment #4)
> Here's some detail on what went bad:
> http://people.mozilla.org/~jmuizelaar/graph-server/detail.html?new=6836410&old=6837446
Linux and Linux64 saw similar regressions on *some* of the tests:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=af61c4752e53&newRev=820512b8223e&tests=tsvg&submit=true (click details).  Overall it looks like it'd be within noise, however.
Is there a single repeatable testcase which reproduces this problem, so that I can run it with oprofile?
(In reply to comment #4)
> Here's some detail on what went bad:
> http://people.mozilla.org/~jmuizelaar/graph-server/detail.html?new=6836410&old=6837446
> 
> Siarhei, do you have any idea what could have caused this?

Most likely it is related to bug 634110
(In reply to comment #7)
> Most likely it is related to bug 634110
That bug is about existing code before this regression was landed, right?
(In reply to comment #6)
> is there are single repeatable testcase which is reproducing this problem? so I
> can run it with oprofile...

Here's a link to the test case that seems to have regressed the most:

http://hg.mozilla.org/build/talos/raw-file/cc0c3f9fb9fa/page_load_test/svg/hixie-007.xml
Hmm... trying to load it on the device in Fennec, I only get "Test not yet started".
(In reply to comment #8)
> (In reply to comment #7)
> > Most likely it is related to bug 634110
> That bug is about existing code before this regression was landed, right?

Both the old and new pixman code drops have a single common problem: gradients are not SIMD-optimized and are very slow. Radial gradients are particularly bad in this respect. That was an old benchmark and a lot could have changed since, but the only two tests where the 'gl' backend could outperform the 'image' backend (pixman) used to be the gradient-heavy ones: http://anholt.livejournal.com/42146.html

Anyway, if somebody wants to really solve the problem and get something like 10x performance improvement for gradients in pixman, then a major update for the gradients code is needed and introducing SIMD optimizations is the way to go. I'm not even specifically talking about NEON just because getting NEON optimizations would also provide SSE2 almost for free.

But if you only care about regressions and some minor performance tweaks in the ballpark of a few tens of percent, then just reverting back to an older gradients implementation might help. You can look at the recent commits from Andrea Canciani; they are the likely culprits. For example, the following commit might have caused some slowdown: http://cgit.freedesktop.org/pixman/commit/?id=29439bd7724031504e965ffe5b366baaeeae07d8
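
To make the discussion concrete, here is a hypothetical scalar linear-gradient span fill (not pixman's actual code, just an illustration of the per-pixel work involved); this kind of inner loop is what NEON/SSE2 fast paths would vectorize:

#include <stdint.h>

/* Hypothetical scalar linear-gradient scanline fill, for illustration only.
 * Each pixel interpolates between two ARGB32 stops channel by channel; a
 * SIMD fast path would instead process several pixels per iteration. */
static uint32_t lerp_argb(uint32_t c0, uint32_t c1, uint32_t t /* 0..256 */)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t a = (c0 >> shift) & 0xff;
        uint32_t b = (c1 >> shift) & 0xff;
        out |= (((a * (256 - t) + b * t) >> 8) & 0xff) << shift;
    }
    return out;
}

void fill_linear_gradient_span(uint32_t *dst, int width,
                               uint32_t stop0, uint32_t stop1)
{
    for (int x = 0; x < width; x++) {
        /* gradient position for this pixel, scaled to 0..256 */
        uint32_t t = (uint32_t)(((long)x * 256) / (width > 1 ? width - 1 : 1));
        dst[x] = lerp_argb(stop0, stop1, t);
    }
}

Even this naive version does considerably more per-pixel work than a plain copy, which is why missing SIMD fast paths show up so clearly on gradient-heavy tests.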
(In reply to comment #11)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > Most likely it is related to bug 634110
> > That bug is about existing code before this regression was landed, right?
> 
> Anyway, if somebody wants to really solve the problem and get something like
> 10x performance improvement for gradients in pixman, then a major update for
> the gradients code is needed and introducing SIMD optimizations is the way to
> go. I'm not even specifically talking about NEON just because getting NEON
> optimizations would also provide SSE2 almost for free.

The test case that regressed the most has no gradients, so I think there must be something else going on here.
Interesting... this testcase runs faster on Maemo than in Firefox on my 4-core/Nvidia desktop...
Also, results vary from 5.5 sec to ~6.5 sec on the device.
I did some profiling yesterday (admittedly on Cortex-A9 hardware, not on an N900), but pixman usage was quite low when loading this particular web page (a lot more time was spent in cairo). And most of the time spent in pixman was just r5g6b5 -> r5g6b5 copies.
I think we need to start backing things out to figure out what caused this.  This regression has sat in the tree for too long.
(In reply to comment #16)
> I think we need to start backing things out to figure out what caused this. 
> This regression has sat in the tree for too long.

I'm pretty sure that this was the pixman update. More importantly, I think we need to decide if this regression matters at all.
(In reply to comment #17)
> I'm pretty sure that this was the pixman update. More importantly, I think we
> need to decide if this regression matters at all.
I presume that is up to the mobile team.  Stuart, Mark?
(In reply to comment #0)
> * The "last good" value is 5739.23, which (from cross-referencing on the Mobile
> tinderbox) was built from
> http://hg.mozilla.org/mozilla-central/rev/af61c4752e53
> 
> * The "first bad" value is 8309.82, which (from cross-referencing on the Mobile
> tinderbox) was built from
> http://hg.mozilla.org/mozilla-central/rev/4d7a0a6dd613

I can't confirm any performance difference between these two mozilla-central revisions on ARM Cortex-A8 when loading the http://hg.mozilla.org/build/talos/raw-file/cc0c3f9fb9fa/page_load_test/svg/hixie-007.xml page.
For profiling purposes I compiled the latest Fennec with system cairo and pixman, in order to more easily see the percentage of time spent in these libraries. Looking at the profiling results, I don't see how pixman could affect performance on this particular test page, simply because pixman accounts for less than 10% of overall CPU time here.

Can somebody else try to reproduce this regression and provide profiling logs?
Can we mark this as WONTFIX? It seems to be old and the browser has changed significantly, not to mention there's no simple way to test this and no support for the N900 anymore.
Sure.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX