Closed Bug 646299 Opened 13 years ago Closed 9 years ago

Regression SVG NoChrome increase 44.5% on Nokia n900 mobile, reported 4:40PM 3/29/2011

Categories: Core :: General
Type: defect
Hardware/OS: ARM / Maemo
Priority: Not set
Severity: normal
Status: RESOLVED WONTFIX
Reporter: dholbert
Assignee: Unassigned
Attachments: 1 file

Quoting a post to dev.tree-management:

On 03/29/2011 04:40 PM, nobody@cruncher.build.sjc1.mozilla.com wrote:
> Regression :( SVG NoChrome increase 44.5% on Nokia n900 mobile
> --------------------------------------------------------------
>      Previous: avg 5750.005 stddev 18.072 of 30 runs up to revision d402e973b4c9
>      New     : avg 8309.736 stddev 10.484 of 5 runs since revision d402e973b4c9
>      Change  : +2559.731 (44.5% / z=141.638)
>      Graph   : http://mzl.la/fPYxv7
> 
> Changeset range: http://hg.mozilla.org/mobile-browser/pushloghtml?fromchange=d402e973b4c9&tochange=d402e973b4c9

That changeset range is empty (& hence useless), because apparently our automated-changeset-range-builder only includes csets from mobile-browser, whereas this regression was really (presumably) caused by a change in mozilla-central.
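
For reference, the "Change" line in the quoted report is straightforward arithmetic over the two sets of runs; here is a minimal sketch (assuming z is simply (new_avg - old_avg) / old_stddev, which comes out very close to the quoted figure):

#include <stdio.h>

/* Reproduce the "Change" line from the regression mail: absolute delta,
 * percent change relative to the old average, and (assumed) z-score of the
 * new average against the old standard deviation. */
int main(void)
{
    double old_avg = 5750.005, old_stddev = 18.072; /* 30 runs up to d402e973b4c9 */
    double new_avg = 8309.736;                      /* 5 runs since d402e973b4c9  */

    double delta   = new_avg - old_avg;             /* +2559.731 */
    double percent = 100.0 * delta / old_avg;       /* 44.5%     */
    double z       = delta / old_stddev;            /* ~141.6 (the report says 141.638;
                                                       the grapher may use a slightly
                                                       different variance estimate) */

    printf("Change: %+.3f (%.1f%% / z=%.3f)\n", delta, percent, z);
    return 0;
}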

From looking at the graph linked in the message, the values were all pretty consistently in the ~5700 range, and then at some point late today, they spiked up to ~8300.  In particular...
* The "last good" value is 5739.23, which (from cross-referencing on the Mobile tinderbox) was built from http://hg.mozilla.org/mozilla-central/rev/af61c4752e53

* The "first bad" value is 8309.82, which (from cross-referencing on the Mobile tinderbox) was built from http://hg.mozilla.org/mozilla-central/rev/4d7a0a6dd613

So that gives us this pushlog for the regression range:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=af61c4752e53&tochange=4d7a0a6dd613

That includes zero SVG changes, but a few significant platform changes -- in particular: a pixman update, libjpeg-turbo landing, and a 220-cset tracemonkey merge.
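
(For anyone re-deriving a range like this: the pushlog URL is just the last-good and first-bad revisions plugged into the fromchange/tochange query parameters. A tiny illustrative sketch, using the revisions identified above:)

#include <stdio.h>

/* Build the mozilla-central pushlog URL for a regression range from the
 * last-good and first-bad revisions (illustrative only). */
int main(void)
{
    const char *last_good = "af61c4752e53";
    const char *first_bad = "4d7a0a6dd613";
    char url[256];

    snprintf(url, sizeof url,
             "http://hg.mozilla.org/mozilla-central/pushloghtml"
             "?fromchange=%s&tochange=%s",
             last_good, first_bad);
    printf("%s\n", url);
    return 0;
}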
Probably the pixman update. I'll take a look tomorrow.
Did this show up on try?
OS: Linux → Maemo
Hardware: x86 → ARM
(In reply to comment #2)
> Did this show up on try?

Maybe; try results for Maemo don't show up in TBPL or in the try chooser, so I forgot to look.
Here's some detail on what went bad: http://people.mozilla.org/~jmuizelaar/graph-server/detail.html?new=6836410&old=6837446

Siarhei, do you have any idea what could have caused this?
(In reply to comment #4)
> Here's some detail on what went bad:
> http://people.mozilla.org/~jmuizelaar/graph-server/detail.html?new=6836410&old=6837446
Linux and Linux64 saw similar regressions on *some* of the tests:
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=af61c4752e53&newRev=820512b8223e&tests=tsvg&submit=true (click details).  Overall it looks like it'd be within noise, however.
Is there a single repeatable testcase which reproduces this problem, so that I can run it with oprofile?
(In reply to comment #4)
> Here's some detail on what went bad:
> http://people.mozilla.org/~jmuizelaar/graph-server/detail.html?new=6836410&old=6837446
> 
> Siarhei, do you have any idea what could have caused this?

Most likely it is related to bug 634110
(In reply to comment #7)
> Most likely it is related to bug 634110
That bug is about existing code before this regression was landed, right?
(In reply to comment #6)
> is there are single repeatable testcase which is reproducing this problem? so I
> can run it with oprofile...

Here's a link to the test case that seems to have regressed the most:

http://hg.mozilla.org/build/talos/raw-file/cc0c3f9fb9fa/page_load_test/svg/hixie-007.xml
Hmm... trying to load it on the device in Fennec, I only get "Test not yet started".
(In reply to comment #8)
> (In reply to comment #7)
> > Most likely it is related to bug 634110
> That bug is about existing code before this regression was landed, right?

Both the old and new pixman code drops have a single common problem: gradients are not SIMD-optimized and are very slow. Radial gradients are particularly bad in this respect. That was an old benchmark and a lot could have changed since, but the only two tests where the 'gl' backend could outperform the 'image' backend (pixman) used to be the gradient-heavy ones: http://anholt.livejournal.com/42146.html

Anyway, if somebody wants to really solve the problem and get something like 10x performance improvement for gradients in pixman, then a major update for the gradients code is needed and introducing SIMD optimizations is the way to go. I'm not even specifically talking about NEON just because getting NEON optimizations would also provide SSE2 almost for free.

But if you only care about regressions and some minor performance tweaks in the ballpark of a few tens of percent, then just reverting back to an older gradients implementation might help. You can look at the recent commits from Andrea Canciani; they are the likely culprits. For example, the following commit might have caused some slowdown: http://cgit.freedesktop.org/pixman/commit/?id=29439bd7724031504e965ffe5b366baaeeae07d8
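
To make the discussion concrete, here is a hypothetical scalar linear-gradient span fill (not pixman's actual code, just an illustration of the per-pixel work involved); this kind of inner loop is what NEON/SSE2 fast paths would vectorize:

#include <stdint.h>

/* Hypothetical scalar linear-gradient scanline fill, for illustration only.
 * Each pixel interpolates between two ARGB32 stops channel by channel; a
 * SIMD fast path would instead process several pixels per iteration. */
static uint32_t lerp_argb(uint32_t c0, uint32_t c1, uint32_t t /* 0..256 */)
{
    uint32_t out = 0;
    for (int shift = 0; shift < 32; shift += 8) {
        uint32_t a = (c0 >> shift) & 0xff;
        uint32_t b = (c1 >> shift) & 0xff;
        out |= (((a * (256 - t) + b * t) >> 8) & 0xff) << shift;
    }
    return out;
}

void fill_linear_gradient_span(uint32_t *dst, int width,
                               uint32_t stop0, uint32_t stop1)
{
    for (int x = 0; x < width; x++) {
        /* gradient position for this pixel, scaled to 0..256 */
        uint32_t t = (uint32_t)(((long)x * 256) / (width > 1 ? width - 1 : 1));
        dst[x] = lerp_argb(stop0, stop1, t);
    }
}

Even this naive version does considerably more per-pixel work than a plain copy, which is why missing SIMD fast paths show up so clearly on gradient-heavy tests.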
(In reply to comment #11)
> (In reply to comment #8)
> > (In reply to comment #7)
> > > Most likely it is related to bug 634110
> > That bug is about existing code before this regression was landed, right?
> 
> Anyway, if somebody wants to really solve the problem and get something like
> 10x performance improvement for gradients in pixman, then a major update for
> the gradients code is needed and introducing SIMD optimizations is the way to
> go. I'm not even specifically talking about NEON just because getting NEON
> optimizations would also provide SSE2 almost for free.

The test case that regressed the most has no gradients, so I think there must be something else going on here.
Interesting... this testcase runs faster on Maemo than in Firefox on my 4-core/Nvidia desktop...
Also, results vary from 5.5 sec to ~6.5 sec on the device.
I did some profiling yesterday (admittedly on Cortex-A9 hardware, not on an N900), but pixman usage was quite low when loading this particular web page (a lot more time was spent in cairo). And most of the time spent in pixman was just r5g6b5 -> r5g6b5 copies.
I think we need to start backing things out to figure out what caused this.  This regression has sat in the tree for too long.
(In reply to comment #16)
> I think we need to start backing things out to figure out what caused this. 
> This regression has sat in the tree for too long.

I'm pretty sure that this was the pixman update. More importantly, I think we need to decide if this regression matters at all.
(In reply to comment #17)
> I'm pretty sure that this was the pixman update. More importantly, I think we
> need to decide if this regression matters at all.
I presume that is up to the mobile team.  Stuart, Mark?
(In reply to comment #0)
> * The "last good" value is 5739.23, which (from cross-referencing on the Mobile
> tinderbox) was built from
> http://hg.mozilla.org/mozilla-central/rev/af61c4752e53
> 
> * The "first bad" value is 8309.82, which (from cross-referencing on the Mobile
> tinderbox) was built from
> http://hg.mozilla.org/mozilla-central/rev/4d7a0a6dd613

I can't confirm any performance difference between these two mozilla-central revisions on ARM Cortex-A8 when loading the http://hg.mozilla.org/build/talos/raw-file/cc0c3f9fb9fa/page_load_test/svg/hixie-007.xml page.
For profiling purposes I compiled the latest Fennec with system cairo and pixman, in order to more easily see the percentage of time spent in these libraries. Looking at the profiling results, I don't see how pixman could affect performance on this particular test page, simply because pixman accounts for less than 10% of overall CPU time here.

Can somebody else try to reproduce this regression and provide profiling logs?
Can we mark this as WONTFIX? It seems to be old and the browser has changed significantly, not to mention there's no simple way to test this and no support for the N900 anymore.
Sure.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX