Last Comment Bug 854746 - Talos Regression tsvg_nochrome 7.6% on Android March 14
: Talos Regression tsvg_nochrome 7.6% on Android March 14
Status: NEW
: perf, regression
Product: Firefox for Android
Classification: Client Software
Component: General (show other bugs)
: unspecified
: x86 Android
: P5 normal with 1 vote (vote)
: ---
Assigned To: Avi Halachmi (:avih)
:
Mentors:
Depends on: 875770 888899
Blocks: 897054
  Show dependency treegraph
 
Reported: 2013-03-25 18:19 PDT by Geoff Brown [:gbrown]
Modified: 2015-12-03 12:10 PST (History)
11 users (show)
avihpit: needinfo? (jmuizelaar)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---
unaffected
affected
affected
+


Attachments
Change tsvg iterations to requestAnimationFrame (7.55 KB, patch)
2013-05-28 01:34 PDT, Avi Halachmi (:avih)
no flags Details | Diff | Splinter Review
V2 - make sure the initial SVG renders as well (7.74 KB, patch)
2013-05-28 01:49 PDT, Avi Halachmi (:avih)
jwatt: review+
Details | Diff | Splinter Review
v3 - Add descriptions, remove 'moz' prefix, remove delay vars (20.40 KB, patch)
2013-06-04 01:53 PDT, Avi Halachmi (:avih)
avihpit: review+
Details | Diff | Splinter Review
v4 - make sure tsvg doesn't hang, add missing image (34.03 KB, patch)
2013-06-05 18:51 PDT, Avi Halachmi (:avih)
no flags Details | Diff | Splinter Review

Description Geoff Brown [:gbrown] 2013-03-25 18:19:45 PDT
From dev-tree-management Digest, Vol 51, Issue 32:

Message: 1
Date: Sun, 17 Mar 2013 00:41:25 -0000
From: nobody@cruncher.build.mozilla.org
To: dev-tree-management@lists.mozilla.org
Subject: <Regression> mobile - SVG NoChrome - Android 4.0.4 - 7.64%
Message-ID:
        <20130317004125.A31111044F8@cruncher.srv.releng.scl3.mozilla.com>
Content-Type: text/plain; charset="us-ascii"

Regression: mobile - SVG NoChrome - Android 4.0.4 - 7.64% increase
------------------------------------------------------------------
    Previous: avg 4154.294 stddev 30.853 of 30 runs up to revision 0f7261e288f2
    New     : avg 4471.854 stddev 12.021 of 5 runs since revision 8f5b1f9f5804
    Change  : +317.560 (7.64% / z=10.293)
    Graph   : http://mzl.la/143yPbv

Changeset range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0f7261e288f2&tochange=8f5b1f9f5804
Comment 1 Geoff Brown [:gbrown] 2013-03-25 18:22:58 PDT
This seems to be a regression from bug 590422; there is significant discussion in that bug about expected talos regressions. 

Note that the same patch caused a slight improvement (with noise) in this same test - SVG NoChrome - on Android 2.2.
Comment 2 Avi Halachmi (:avih) 2013-05-23 05:05:51 PDT
Assuming I'm looking at the correct tests (talos/page_load_test/svg), it appears that on most of those tests, most of the time is spent within setTimeout.

Here's how the overall duration (the value reported to talos) looks before and after I change the loop from setTimeout to requestAnimationFrame, and set layout.frame_rate to 1000 (to make it redraw as fast as possible and make sure it gets drawn on each iteration). Note that the number of iterations per test is calculated such that the commutative timeouts is some specified number of seconds:

hixie 1,2: 15.0X s ->  0.7 s       (5% in svg)   , iterations -> 15s
hixie 3,4:  5.0X s ->  0.15/0.25 s (3%/5% in svg), iterations ->  5s
hixie 5,6:  5.0X s ->  1.3X s      (26% in svg)  , iterations ->  5s
hixie 7  :  1.13 s ->  0.60 s      (55% in svg)  , iterations ->  1s

As you can see, on 4 of 7 tests, 5% or less of the reported test result value is actual SVG drawing. The rest is setTimeout, and on the other tests still a considerable amount of measured time is spent in setTimeout, and timeouts are known to be wildly inaccurate.

I'd argue that most of those tests measure the consistency of setTimeout more than anything else.

And the consistency of setTimeouts certainly changed in bug 590422 (arguably for the better).

I could imagine that these timeouts were inserted in order to let GC/CC kick in without interrupting the drawing, however, my modified timing show that the new (quicker) results are extremely consistent, so I'd argue that at least for those tests on my system, there's no need for the timeouts, and regardless, with so much time spent in timeouts, except for hixie 7 test, the tested svg represents just too little percentage of the result.

[Slightly off topic] Generally, however, GC/CC could add some noise into regression tests of drawing/animations. For this reason, I want to investigate an option to disable GC/CC during some tests. My rough general idea was to run each test twice: without and with GC/CC. This way, as long as we don't run out of RAM, the first test would be a much better representation of what we want to test, and the second run (with GC/CC enabled) would give us a better understanding of how GC/CC affects stuff, especially when compared to the first run. But that's already for another bug.
Comment 3 Joel Maher ( :jmaher) 2013-05-23 07:54:46 PDT
This is good information here.  The SVG tests take a long time and I assume they have been measuring the right stuff.  

If we can gain useful information by not using setTimeout (both for android and desktop), then I vote for doing that.  If you post a patch for this, we could look at the numbers.
Comment 4 Jonathan Watt [:jwatt] (back in October - email directly if necessary) 2013-05-23 10:23:57 PDT
The hixie* tests appear to come from http://www.hixie.ch/tests/adhoc/svg/perf/

The setTimeout(..., 200) in the hixie* tests makes them very poor perf tests, as you note. I think the reason for using setTimeout with a 200 ms delay was because back in 2005/2006 when these tests were written the browser would start to lock up if you made changes off a setTimeout that was being triggered faster than the browser could paint. That isn't true any more, so setTimeout(..., 0) would be a whole lot better.

Better yet, replace the setTimeouts with requestAnimationFrame...hmm, actually maybe not. I suppose if we use requestAnimationFrame then we may artificially limit the number of doTest() calls that we can run in a second. Specifically, the refresh driver makes sure that we limit painting to 60 fps (or it should), but it's possible that we can run doTest() so fast that we can have more than one doTest() call per frame. So maybe setTimeout(..., 0) is better.

The way that |var start = new Date()| is at the top of the document and is evaluated as soon as that <script> tag loads means that these tests conflate how long it takes to load the test, and how well it animates. The start should be delayed until after we've had our initial paint (using whatever the equivalent of the MozReftestInvalidate event is in talos land).

The use of getElementsByTagNameNS in the tests is confusing. This should be replaced by appropriate use of document.documentElement and document.getElementById.

001.xml: add something like this as a comment to the top:

This test attempts to test how fast an SVG implementation can repaint as a circle that only has a fill animates over complex SVG via changes to its "cx" and "cy" attributes.

002.xml:

This test attempts to test how fast an SVG implementation can repaint as a circle that has a fill and fill-opacity animates over complex SVG via changes to its "cx" and "cy" attributes.

(Note the "fill" vs "fill and fill-opacity" difference.)

003-006 look like they might be a bit less trivial to fix up, since they have:

  var maxPerBlock = Xs.length / (idealTime / delay);

If you get rid of delay, then I'm not sure what maxPerBlock should be offhand.

003.xml suggested comment:

This test attempts to test how fast batches of individually transformed 'text' elements can be created, added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

004.xml:

This test attempts to test how fast batches of individually transformed 'image' elements can be created, added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

005.xml:

This test attempts to test how fast batches of individually transformed 'rect' elements with a linearGradient fill can be created, added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

006.xml:

This test attempts to test how fast batches of individually transformed 'rect' elements with a radialGradient fill can created, be added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

007.xml:

This test attempts to test how fast an implementation can repaint for changes to the "width" attributes of a root-&lt;svg&gt; with a viewBox and that contains many 'path' elements with fill and stroke.

Hope the above helps.
Comment 5 Avi Halachmi (:avih) 2013-05-23 11:35:18 PDT
(In reply to Jonathan Watt [:jwatt] from comment #4)
> Better yet, replace the setTimeouts with requestAnimationFrame...hmm,
> actually maybe not. I suppose if we use requestAnimationFrame then we may
> artificially limit the number of doTest() calls that we can run in a second.
> Specifically, the refresh driver makes sure that we limit painting to 60 fps
> (or it should), but it's possible that we can run doTest() so fast that we
> can have more than one doTest() call per frame. So maybe setTimeout(..., 0)
> is better.

The problem with setTimeout(0) is that it may iterate faster than the refresh driver, the result being that not all SVG iterations would actually trigger a layout flush and render, so the reported result value would be biased (quicker than it should be).

Which is why I suggested to use rAF and increase the refresh rate to 1000Hz via a pref, such that we both make sure every iteration is rendered (rAF), and also iterate as fast as possible (assuming 1000hz is faster than we could iterate those SVGs, which appears to be the case).

While in my tests of tsvg I didn't notice GC/CC, they still might affect the result under some conditions. This could always be a problem when we stress the browser and iterate stuff as fast as possible.
Comment 6 Jonathan Watt [:jwatt] (back in October - email directly if necessary) 2013-05-23 17:40:30 PDT
(In reply to Avi Halachmi (:avih) from comment #5)
> Which is why I suggested to use rAF and increase the refresh rate to 1000Hz
> via a pref...

Sounds great - I didn't know talos had access to such a pref, but if that works, perfect.
Comment 7 Avi Halachmi (:avih) 2013-05-28 01:34:26 PDT
Created attachment 754712 [details] [diff] [review]
Change tsvg iterations to requestAnimationFrame

The patch changes all tsvg iterations to rAF, and changes the initial triggers to direct calls (originally both were setTimeout on all tests).

This will make sure all iterations are rendered, and each iteration/render is of a new SVG output. This should hopefully make the results correlate as good as we can to actual SVG performance.

It also takes into account the initial render which could possibly be slower than the rest, but this is a value which we want to measure as well, since it's part of handling SVGs, therefore it should stay as part of the measured duration/result.

@jmaher: we also need the pref layout.frame_rate=10000 (that's 10k) to make sure rAF iterates as fast as possible. Since bug 875770 hasn't yet landed, I'm leaving it up to you to make sure the pref is set before running each of these tests.
Comment 8 Avi Halachmi (:avih) 2013-05-28 01:49:56 PDT
Created attachment 754716 [details] [diff] [review]
V2 - make sure the initial SVG renders as well

Changed the initial trigger from a direct call (at patch v1) to rAF such that the initial SVG image is rendered before starting the iterations.

The rest of comment 7 still applies, and especially this:

(In reply to Avi Halachmi (:avih) from comment #7)
> @jmaher: we also need the pref layout.frame_rate=10000 (that's 10k) to make
> sure rAF iterates as fast as possible. Since bug 875770 hasn't yet landed,
> I'm leaving it up to you to make sure the pref is set before running each of
> these tests.
Comment 9 Joel Maher ( :jmaher) 2013-05-28 04:50:31 PDT
would this pref be needed for tsvg and tsvg_opacity?
Comment 10 Avi Halachmi (:avih) 2013-05-28 05:33:19 PDT
(In reply to Joel Maher (:jmaher) from comment #9)
> would this pref be needed for tsvg and tsvg_opacity?

tsvg (talos/page_load_test/svg/*): need this pref.

tsvg_opavity (talos/page_load_test/svg_opacity/*): doesn't need - no animation there as far as I can tell.

dhtml (talos/page_load_test/dhtml/*): I briefly reviewed all 18 tests: they could be improved by using this perf and changing the iterations method. Most of these tests iterate using setTimeout(0), and few iterate directly (i.e. synchronously without ever leaving the loop). if we make those changes (in a new bug) then the results would correlate better IMO to the dhtml perf measured by each test. Before this change, they only render some of the animation frames - it still correlates to we what want to measure, but not as good as it could.

a11y (talos/page_load_test/a11y/): only dhtml.html does iterations, and in a straight loop. However, it appears to measure the effectiveness of lazy layout flushes, and IMO does it well, so no need for any change and also no need for this perf.

I haven't looked at dormaeo, kraken and v8_7.
Comment 11 Joel Maher ( :jmaher) 2013-05-28 07:10:11 PDT
we don't run dhtml anymore as we found it was outdated and nobody wanted to own it.  I will get the pref for tsvg on desktop and mobile set.
Comment 12 Avi Halachmi (:avih) 2013-05-29 05:38:28 PDT
Quick update: I discussed this approach with vlad and roc (rAF at very high frequency to measure performance of rendering iterations), and they both approves it as valid and better than the setTimeout approach.

One exception regarding OMTC by roc:
> I just have one concern, which is that with OMTC, we might need
> special-case code to drive the compositor at the requested rate.
Comment 13 Jonathan Watt [:jwatt] (back in October - email directly if necessary) 2013-05-29 12:33:53 PDT
Comment on attachment 754716 [details] [diff] [review]
V2 - make sure the initial SVG renders as well

Review of attachment 754716 [details] [diff] [review]:
-----------------------------------------------------------------

::: talos/page_load_test/svg/hixie-001.xml
@@ +269,5 @@
>     c.setAttribute('cx', cx);
>     c.setAttribute('cy', cy);
>     ++count;
>     if (count < max) {
> +     window.mozRequestAnimationFrame(bounce);

You don't need the "moz" prefix. Drop that here and elsewhere please.
Comment 14 Jonathan Watt [:jwatt] (back in October - email directly if necessary) 2013-05-29 12:54:24 PDT
Please also remove the 'idealTime' and 'delay' variables (just hardcode the number where we have |idealTime / delay|).
Comment 15 Jonathan Watt [:jwatt] (back in October - email directly if necessary) 2013-05-29 12:56:00 PDT
Avi, can you create separate patches for the other issues I raised in comment 4? Specifically:

 * the |var start = new Date()| thing
 * the getElementsByTagNameNS thing
 * the addition of the descriptive comments

If you don't want to do that, can you file a follow-up bug to do that work?
Comment 16 Avi Halachmi (:avih) 2013-06-04 01:53:22 PDT
Created attachment 757850 [details] [diff] [review]
v3 - Add descriptions, remove 'moz' prefix, remove delay vars

I didn't change the getElementsByTagNameNS since it's a valid use, and I don't think it's confusing in this narrow context, though I don't mind at all if it gets changed later.

I also intentionally kept the |var start = new Date()| where it was such that it also includes the initial rendering. We're measuring SVG, and the initial rendering represents SVG performance as well. I even made sure it's rendered before starting the animation. See comment 7.

jwatt is away for 3 weeks, so asking jrmuizel for feedback on these exceptions.
Comment 17 Avi Halachmi (:avih) 2013-06-05 07:10:43 PDT
Few issues we've bumped into after jmaher tried to deploy patch v3:

1. tsvg test timed out on all platforms except android.

After some research, apparently with rAF @10khz, a single test - hixie4 - finishes too quickly such that it tries to report the result before the talos PageLoader addon injects the report function into the window. This caused talos to never get the report, therefore it eventually timed out and the test failed.

This was easily fixed by reporting off a small timeout.


2. hixie 4 tries to load an image at the hixie website which doesn't exist at the talos repository (unrelated to 1 above). This causes this test to render nothing to screen and so it doesn't really measure anything useful. This is not a new thing, and probably was broken since the hixie tests were deployed at talos. The missing resource is |/resources/images/smallcats| and it's a gif which exists at the website even today, but not anywhere within talos.

Should be easily fixable. (Note that even when making this resource available within talos, 1 above still times out the same).


3. When running within talos with rAF @10khz, most of the svg animations don't actually animate on screen, and only the final animation frame is displayed.

This doesn't happen when running the same firefox binary stand-alone with the same rAF@10khz. After some research, it appears that when using the talos profile in a stand-alone run of firefox, it still updates the screen lazily and only shows the last frame. Further research shows that when disabling the PageLoader addon with the talos profile stand alone, firefox does animate the svg as expected.

So right now it seems that the PageLoader addon somehow causes lazy screen updates of svg animation. We're still researching this one.
Comment 18 Joel Maher ( :jmaher) 2013-06-05 08:10:55 PDT
after further investigation we see this to be an issue with a single tab which talos loads, not seen normally when running in a browser locally.
Comment 19 Avi Halachmi (:avih) 2013-06-05 18:51:29 PDT
Created attachment 758923 [details] [diff] [review]
v4 - make sure tsvg doesn't hang, add missing image

- Start the animation on load event - to prevent hanged tsvg tests because they try to report before toRecordTime is injected to the content (happens on load event).

Note: overall measured result still explicitly includes the load time of the initial SVG, as it represents an important performance measure as well.

- Added the missing image as |smallcats.gif| at the svg/images dir.

- There's still issue 3 from comment 17: the animation frames are iterated (and flushed) but the screen isn't updated until the animation ends, and we can't bypass this behavior easily. This is due to bug 880036, and results could get slightly worse when that bug is fixed - since rendering to screen requires some further processing.

Note that even without rendering the iterations to screen, rAF forces to flush SVG layout changes, so much/most of the SVG related performance is measured, and in any case it's MUCH better than the "old" tsvg, since it actually correlates to SVG performance.

Once this lands:
1. tsvg will show very different results than before - mostly much lower values.
2. tsvg results could get slightly worse when bug 880036 (or related) lands.
3. roc's reservation from comment 12 still stands: we'll need to revisit this for OMTC.
Comment 20 Avi Halachmi (:avih) 2013-06-05 19:17:56 PDT
Also, v4 checks both requestAnimationFrame and mozRequestAnimationFrame since current Firefox 21 only supports the latter.
Comment 21 Joel Maher ( :jmaher) 2013-06-07 08:04:42 PDT
jet- is there another svg person that could weigh in on the questions here?  jwatt is on pto and it would be nice to have this figured out and deployed this month.
Comment 22 Kevin Brosnan [:kbrosnan] 2013-08-01 10:47:19 PDT
23+ ship has sailed. Need to re-triage this.
Comment 23 Avi Halachmi (:avih) 2013-08-01 11:36:42 PDT
(In reply to Kevin Brosnan [:kbrosnan] from comment #22)
> 23+ ship has sailed. Need to re-triage this.

We determined that the problem wasn't really a performance regression, but rather that the SVG test was very susceptible to timing changes (such as those from bug 590422).

Since then, this bug has been used to create a new test (svgx), which should replace the old test soon (bug 897054).

As such, there's probably no need to track this for 23.
Comment 24 Jonathan Watt [:jwatt] (back in October - email directly if necessary) 2013-08-01 14:43:02 PDT
See also the notice to m.d.platform:

https://groups.google.com/forum/#!topic/mozilla.dev.platform/RICw5SJhNMo
Comment 25 Brad Lassey [:blassey] (use needinfo?) 2014-10-20 08:25:48 PDT
filter on [mass-p5]
Comment 26 Joel Maher ( :jmaher) 2015-12-03 12:10:07 PST
can we mark this as wontfix?  seems to be old and the browser has changed significantly, let alone no simple way to test this.

Note You need to log in before you can comment on or make changes to this bug.