Closed Bug 854746 Opened 11 years ago Closed 7 years ago

Talos Regression tsvg_nochrome 7.6% on Android March 14

Categories

(Firefox for Android Graveyard :: Testing, defect, P5)

x86
Android
defect

Tracking

(firefox21 unaffected, firefox22 affected, firefox23 affected, fennec+)

RESOLVED WONTFIX
Tracking Status
firefox21 --- unaffected
firefox22 --- affected
firefox23 --- affected
fennec + ---

People

(Reporter: gbrown, Assigned: avih, NeedInfo)

References

Details

(Keywords: perf, regression)

Attachments

(1 file, 3 obsolete files)

From dev-tree-management Digest, Vol 51, Issue 32:

Message: 1
Date: Sun, 17 Mar 2013 00:41:25 -0000
From: nobody@cruncher.build.mozilla.org
To: dev-tree-management@lists.mozilla.org
Subject: <Regression> mobile - SVG NoChrome - Android 4.0.4 - 7.64%
Message-ID:
        <20130317004125.A31111044F8@cruncher.srv.releng.scl3.mozilla.com>
Content-Type: text/plain; charset="us-ascii"

Regression: mobile - SVG NoChrome - Android 4.0.4 - 7.64% increase
------------------------------------------------------------------
    Previous: avg 4154.294 stddev 30.853 of 30 runs up to revision 0f7261e288f2
    New     : avg 4471.854 stddev 12.021 of 5 runs since revision 8f5b1f9f5804
    Change  : +317.560 (7.64% / z=10.293)
    Graph   : http://mzl.la/143yPbv

Changeset range: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=0f7261e288f2&tochange=8f5b1f9f5804
This seems to be a regression from bug 590422; there is significant discussion in that bug about expected talos regressions. 

Note that the same patch caused a slight improvement (with noise) in this same test - SVG NoChrome - on Android 2.2.
Keywords: perf, regression
See Also: → 845943
Assignee: nobody → avihpit
tracking-fennec: ? → 23+
Assuming I'm looking at the correct tests (talos/page_load_test/svg), it appears that on most of those tests, most of the time is spent within setTimeout.

Here's how the overall duration (the value reported to talos) looks before and after I change the loop from setTimeout to requestAnimationFrame, and set layout.frame_rate to 1000 (to make it redraw as fast as possible and make sure it gets drawn on each iteration). Note that the number of iterations per test is calculated such that the commutative timeouts is some specified number of seconds:

hixie 1,2: 15.0X s ->  0.7 s       (5% in svg)   , iterations -> 15s
hixie 3,4:  5.0X s ->  0.15/0.25 s (3%/5% in svg), iterations ->  5s
hixie 5,6:  5.0X s ->  1.3X s      (26% in svg)  , iterations ->  5s
hixie 7  :  1.13 s ->  0.60 s      (55% in svg)  , iterations ->  1s

As you can see, on 4 of 7 tests, 5% or less of the reported test result value is actual SVG drawing. The rest is setTimeout, and on the other tests still a considerable amount of measured time is spent in setTimeout, and timeouts are known to be wildly inaccurate.

I'd argue that most of those tests measure the consistency of setTimeout more than anything else.

And the consistency of setTimeouts certainly changed in bug 590422 (arguably for the better).

I could imagine that these timeouts were inserted in order to let GC/CC kick in without interrupting the drawing, however, my modified timing show that the new (quicker) results are extremely consistent, so I'd argue that at least for those tests on my system, there's no need for the timeouts, and regardless, with so much time spent in timeouts, except for hixie 7 test, the tested svg represents just too little percentage of the result.

[Slightly off topic] Generally, however, GC/CC could add some noise into regression tests of drawing/animations. For this reason, I want to investigate an option to disable GC/CC during some tests. My rough general idea was to run each test twice: without and with GC/CC. This way, as long as we don't run out of RAM, the first test would be a much better representation of what we want to test, and the second run (with GC/CC enabled) would give us a better understanding of how GC/CC affects stuff, especially when compared to the first run. But that's already for another bug.
This is good information here.  The SVG tests take a long time and I assume they have been measuring the right stuff.  

If we can gain useful information by not using setTimeout (both for android and desktop), then I vote for doing that.  If you post a patch for this, we could look at the numbers.
The hixie* tests appear to come from http://www.hixie.ch/tests/adhoc/svg/perf/

The setTimeout(..., 200) in the hixie* tests makes them very poor perf tests, as you note. I think the reason for using setTimeout with a 200 ms delay was because back in 2005/2006 when these tests were written the browser would start to lock up if you made changes off a setTimeout that was being triggered faster than the browser could paint. That isn't true any more, so setTimeout(..., 0) would be a whole lot better.

Better yet, replace the setTimeouts with requestAnimationFrame...hmm, actually maybe not. I suppose if we use requestAnimationFrame then we may artificially limit the number of doTest() calls that we can run in a second. Specifically, the refresh driver makes sure that we limit painting to 60 fps (or it should), but it's possible that we can run doTest() so fast that we can have more than one doTest() call per frame. So maybe setTimeout(..., 0) is better.

The way that |var start = new Date()| is at the top of the document and is evaluated as soon as that <script> tag loads means that these tests conflate how long it takes to load the test, and how well it animates. The start should be delayed until after we've had our initial paint (using whatever the equivalent of the MozReftestInvalidate event is in talos land).

The use of getElementsByTagNameNS in the tests is confusing. This should be replaced by appropriate use of document.documentElement and document.getElementById.

001.xml: add something like this as a comment to the top:

This test attempts to test how fast an SVG implementation can repaint as a circle that only has a fill animates over complex SVG via changes to its "cx" and "cy" attributes.

002.xml:

This test attempts to test how fast an SVG implementation can repaint as a circle that has a fill and fill-opacity animates over complex SVG via changes to its "cx" and "cy" attributes.

(Note the "fill" vs "fill and fill-opacity" difference.)

003-006 look like they might be a bit less trivial to fix up, since they have:

  var maxPerBlock = Xs.length / (idealTime / delay);

If you get rid of delay, then I'm not sure what maxPerBlock should be offhand.

003.xml suggested comment:

This test attempts to test how fast batches of individually transformed 'text' elements can be created, added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

004.xml:

This test attempts to test how fast batches of individually transformed 'image' elements can be created, added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

005.xml:

This test attempts to test how fast batches of individually transformed 'rect' elements with a linearGradient fill can be created, added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

006.xml:

This test attempts to test how fast batches of individually transformed 'rect' elements with a radialGradient fill can created, be added to the document and repainted. The time required for each new batch is expected to increase since the previous batches are not removed from the document.

007.xml:

This test attempts to test how fast an implementation can repaint for changes to the "width" attributes of a root-&lt;svg&gt; with a viewBox and that contains many 'path' elements with fill and stroke.

Hope the above helps.
(In reply to Jonathan Watt [:jwatt] from comment #4)
> Better yet, replace the setTimeouts with requestAnimationFrame...hmm,
> actually maybe not. I suppose if we use requestAnimationFrame then we may
> artificially limit the number of doTest() calls that we can run in a second.
> Specifically, the refresh driver makes sure that we limit painting to 60 fps
> (or it should), but it's possible that we can run doTest() so fast that we
> can have more than one doTest() call per frame. So maybe setTimeout(..., 0)
> is better.

The problem with setTimeout(0) is that it may iterate faster than the refresh driver, the result being that not all SVG iterations would actually trigger a layout flush and render, so the reported result value would be biased (quicker than it should be).

Which is why I suggested to use rAF and increase the refresh rate to 1000Hz via a pref, such that we both make sure every iteration is rendered (rAF), and also iterate as fast as possible (assuming 1000hz is faster than we could iterate those SVGs, which appears to be the case).

While in my tests of tsvg I didn't notice GC/CC, they still might affect the result under some conditions. This could always be a problem when we stress the browser and iterate stuff as fast as possible.
(In reply to Avi Halachmi (:avih) from comment #5)
> Which is why I suggested to use rAF and increase the refresh rate to 1000Hz
> via a pref...

Sounds great - I didn't know talos had access to such a pref, but if that works, perfect.
The patch changes all tsvg iterations to rAF, and changes the initial triggers to direct calls (originally both were setTimeout on all tests).

This will make sure all iterations are rendered, and each iteration/render is of a new SVG output. This should hopefully make the results correlate as good as we can to actual SVG performance.

It also takes into account the initial render which could possibly be slower than the rest, but this is a value which we want to measure as well, since it's part of handling SVGs, therefore it should stay as part of the measured duration/result.

@jmaher: we also need the pref layout.frame_rate=10000 (that's 10k) to make sure rAF iterates as fast as possible. Since bug 875770 hasn't yet landed, I'm leaving it up to you to make sure the pref is set before running each of these tests.
Attachment #754712 - Flags: review?(jwatt)
Depends on: 875770
Changed the initial trigger from a direct call (at patch v1) to rAF such that the initial SVG image is rendered before starting the iterations.

The rest of comment 7 still applies, and especially this:

(In reply to Avi Halachmi (:avih) from comment #7)
> @jmaher: we also need the pref layout.frame_rate=10000 (that's 10k) to make
> sure rAF iterates as fast as possible. Since bug 875770 hasn't yet landed,
> I'm leaving it up to you to make sure the pref is set before running each of
> these tests.
Attachment #754712 - Attachment is obsolete: true
Attachment #754712 - Flags: review?(jwatt)
Attachment #754716 - Flags: review?(jwatt)
would this pref be needed for tsvg and tsvg_opacity?
(In reply to Joel Maher (:jmaher) from comment #9)
> would this pref be needed for tsvg and tsvg_opacity?

tsvg (talos/page_load_test/svg/*): need this pref.

tsvg_opavity (talos/page_load_test/svg_opacity/*): doesn't need - no animation there as far as I can tell.

dhtml (talos/page_load_test/dhtml/*): I briefly reviewed all 18 tests: they could be improved by using this perf and changing the iterations method. Most of these tests iterate using setTimeout(0), and few iterate directly (i.e. synchronously without ever leaving the loop). if we make those changes (in a new bug) then the results would correlate better IMO to the dhtml perf measured by each test. Before this change, they only render some of the animation frames - it still correlates to we what want to measure, but not as good as it could.

a11y (talos/page_load_test/a11y/): only dhtml.html does iterations, and in a straight loop. However, it appears to measure the effectiveness of lazy layout flushes, and IMO does it well, so no need for any change and also no need for this perf.

I haven't looked at dormaeo, kraken and v8_7.
we don't run dhtml anymore as we found it was outdated and nobody wanted to own it.  I will get the pref for tsvg on desktop and mobile set.
Quick update: I discussed this approach with vlad and roc (rAF at very high frequency to measure performance of rendering iterations), and they both approves it as valid and better than the setTimeout approach.

One exception regarding OMTC by roc:
> I just have one concern, which is that with OMTC, we might need
> special-case code to drive the compositor at the requested rate.
Comment on attachment 754716 [details] [diff] [review]
V2 - make sure the initial SVG renders as well

Review of attachment 754716 [details] [diff] [review]:
-----------------------------------------------------------------

::: talos/page_load_test/svg/hixie-001.xml
@@ +269,5 @@
>     c.setAttribute('cx', cx);
>     c.setAttribute('cy', cy);
>     ++count;
>     if (count < max) {
> +     window.mozRequestAnimationFrame(bounce);

You don't need the "moz" prefix. Drop that here and elsewhere please.
Attachment #754716 - Flags: review?(jwatt) → review+
Please also remove the 'idealTime' and 'delay' variables (just hardcode the number where we have |idealTime / delay|).
Avi, can you create separate patches for the other issues I raised in comment 4? Specifically:

 * the |var start = new Date()| thing
 * the getElementsByTagNameNS thing
 * the addition of the descriptive comments

If you don't want to do that, can you file a follow-up bug to do that work?
I didn't change the getElementsByTagNameNS since it's a valid use, and I don't think it's confusing in this narrow context, though I don't mind at all if it gets changed later.

I also intentionally kept the |var start = new Date()| where it was such that it also includes the initial rendering. We're measuring SVG, and the initial rendering represents SVG performance as well. I even made sure it's rendered before starting the animation. See comment 7.

jwatt is away for 3 weeks, so asking jrmuizel for feedback on these exceptions.
Attachment #754716 - Attachment is obsolete: true
Attachment #757850 - Flags: review+
Flags: needinfo?(jmuizelaar)
Few issues we've bumped into after jmaher tried to deploy patch v3:

1. tsvg test timed out on all platforms except android.

After some research, apparently with rAF @10khz, a single test - hixie4 - finishes too quickly such that it tries to report the result before the talos PageLoader addon injects the report function into the window. This caused talos to never get the report, therefore it eventually timed out and the test failed.

This was easily fixed by reporting off a small timeout.


2. hixie 4 tries to load an image at the hixie website which doesn't exist at the talos repository (unrelated to 1 above). This causes this test to render nothing to screen and so it doesn't really measure anything useful. This is not a new thing, and probably was broken since the hixie tests were deployed at talos. The missing resource is |/resources/images/smallcats| and it's a gif which exists at the website even today, but not anywhere within talos.

Should be easily fixable. (Note that even when making this resource available within talos, 1 above still times out the same).


3. When running within talos with rAF @10khz, most of the svg animations don't actually animate on screen, and only the final animation frame is displayed.

This doesn't happen when running the same firefox binary stand-alone with the same rAF@10khz. After some research, it appears that when using the talos profile in a stand-alone run of firefox, it still updates the screen lazily and only shows the last frame. Further research shows that when disabling the PageLoader addon with the talos profile stand alone, firefox does animate the svg as expected.

So right now it seems that the PageLoader addon somehow causes lazy screen updates of svg animation. We're still researching this one.
after further investigation we see this to be an issue with a single tab which talos loads, not seen normally when running in a browser locally.
- Start the animation on load event - to prevent hanged tsvg tests because they try to report before toRecordTime is injected to the content (happens on load event).

Note: overall measured result still explicitly includes the load time of the initial SVG, as it represents an important performance measure as well.

- Added the missing image as |smallcats.gif| at the svg/images dir.

- There's still issue 3 from comment 17: the animation frames are iterated (and flushed) but the screen isn't updated until the animation ends, and we can't bypass this behavior easily. This is due to bug 880036, and results could get slightly worse when that bug is fixed - since rendering to screen requires some further processing.

Note that even without rendering the iterations to screen, rAF forces to flush SVG layout changes, so much/most of the SVG related performance is measured, and in any case it's MUCH better than the "old" tsvg, since it actually correlates to SVG performance.

Once this lands:
1. tsvg will show very different results than before - mostly much lower values.
2. tsvg results could get slightly worse when bug 880036 (or related) lands.
3. roc's reservation from comment 12 still stands: we'll need to revisit this for OMTC.
Attachment #757850 - Attachment is obsolete: true
Also, v4 checks both requestAnimationFrame and mozRequestAnimationFrame since current Firefox 21 only supports the latter.
jet- is there another svg person that could weigh in on the questions here?  jwatt is on pto and it would be nice to have this figured out and deployed this month.
Flags: needinfo?(bugs)
Depends on: 888899
Flags: needinfo?(bugs) → needinfo?(jwatt)
23+ ship has sailed. Need to re-triage this.
tracking-fennec: 23+ → ?
(In reply to Kevin Brosnan [:kbrosnan] from comment #22)
> 23+ ship has sailed. Need to re-triage this.

We determined that the problem wasn't really a performance regression, but rather that the SVG test was very susceptible to timing changes (such as those from bug 590422).

Since then, this bug has been used to create a new test (svgx), which should replace the old test soon (bug 897054).

As such, there's probably no need to track this for 23.
See also the notice to m.d.platform:

https://groups.google.com/forum/#!topic/mozilla.dev.platform/RICw5SJhNMo
Flags: needinfo?(jwatt)
Blocks: 897054
tracking-fennec: ? → +
filter on [mass-p5]
Priority: -- → P5
can we mark this as wontfix?  seems to be old and the browser has changed significantly, let alone no simple way to test this.
Status: NEW → RESOLVED
Closed: 7 years ago
Component: General → Testing
Resolution: --- → WONTFIX
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: