Bug 902024 (australis-tart)

Investigate Australis tab animation performance compared to m-c

RESOLVED FIXED

Status

5 years ago
4 years ago

People

(Reporter: avih, Unassigned)

Tracking

(Depends on: 8 bugs, Blocks: 1 bug, {meta})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(11 attachments, 3 obsolete attachments)

(Reporter)

Description

5 years ago
Summary of previous events:

Australis tab animation performance was improved greatly in bug 837885, up to a level which seemed quite close to m-c on a slow Atom system (IIRC within 10%, and sometimes even slightly better than m-c).

Later, a talos tab animation regression test was developed (TART - bug 848358). This test provides higher resolution results than the instrumentation which was used previously.

Preliminary comparison using TART shows that on pure animation throughput, Australis is worse by 20-40% compared to m-c. These results are after testing on a single slow Win7 AMD E-350 system with blacklisted D2D and D3D HW.

The main differences of TART compared to the previous instrumentation are:

1. TART is running in ASAP mode (i.e. unlimited frame rate), which exposes perf differences even if under normal conditions both contenders would do 60fps.

2. TART reports the overall intervals average, but also provides another result: the average interval over the 2nd half of the animation. The main difference between those is that the latter is mostly unaffected by overheads which are unrelated to Australis (tab initialization etc., which typically affect the first few frame intervals negatively, regardless of Australis).


The estimation of 20-40% regression is based on this "half duration" throughput on the test system.
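
As a rough illustration of the two results described in item 2, here is a minimal sketch. It is not TART's actual code: `summarizeIntervals` and the sample intervals are made up, and splitting the animation by elapsed time (rather than by frame count) is an assumption.

```javascript
// Hypothetical helper, not TART's implementation: summarize recorded
// frame intervals (in ms) into an overall average and a second-half average.
function summarizeIntervals(intervals) {
  if (intervals.length === 0) {
    throw new Error("no frame intervals recorded");
  }
  const mean = (xs) => xs.reduce((sum, x) => sum + x, 0) / xs.length;

  // Assumption: the "second half" is everything recorded after half of the
  // total elapsed time, not the last half of the frames.
  const total = intervals.reduce((sum, x) => sum + x, 0);
  let elapsed = 0;
  const secondHalf = [];
  for (const interval of intervals) {
    elapsed += interval;
    if (elapsed > total / 2) {
      secondHalf.push(interval);
    }
  }
  return { all: mean(intervals), half: mean(secondHalf) };
}

// Invented sample: two slow start-up frames, then steady 10 ms frames.
const sample = [40, 30, 10, 10, 10, 10, 10, 10, 10, 10];
const summary = summarizeIntervals(sample);
console.log(summary.all.toFixed(1), summary.half.toFixed(1)); // 15.0 10.0
```

With this sample, the slow first frames pull the overall average up to 15 ms while the second-half average stays at 10 ms, which is why the "half" value is the less noisy one to compare.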

Note that under normal conditions, the performance difference between Australis and m-c is much less acute. E.g. in absolute frame interval values, Australis still averaged better than 60 FPS throughput during the second half of the animation.

This bug is about investigating Australis regressions on more than one test system, maybe trying to find causes for these differences, and hopefully providing enough info to allow a more informed decision on how far we want to go with improving Australis performance.


CC'ing everyone who was on the mail thread. I won't be offended if anyone un-CCs herself ;)
Depends on: 902678
(Reporter)

Comment 1

5 years ago
Created attachment 787255 [details]
TART-bench-2013-08-06-data.txt

I compared tab animation performance of m-c vs UX (both 26.a1 2013-08-06) using TART on 4 systems.

TL;DR:

- On the fast windows system and on OS X (MBA), UX performance is roughly similar to m-c (give or take on specific subtests) and has potential for well above 60fps.

- On both of the slow windows systems, UX performed generally worse than m-c by ~20%-100% depending on context and noise (100% = twice as long intervals = half the throughput). FPS ranges from around 60hz to ~30hz with noticeable occasional jank (on UX more than on m-c, on Atom much more than on the AMD system).

- DPI scaling noticeably regresses performance on both UX and m-c, but UX appears to suffer from it more.


Overall: Australis is great on fast systems, and not terrible IMO on slow systems, but there's definitely room for improvement on the slower systems, where m-c does better even if also not perfect.
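
To make the percentage convention used above concrete, here is a small sketch. The helper names are hypothetical and the inputs are round illustrative numbers, not measurements.

```javascript
// Convert an average frame interval (ms) to throughput (frames per second).
function intervalToFps(intervalMs) {
  return 1000 / intervalMs;
}

// Regression of a candidate interval relative to a baseline interval, in %.
// 100% means intervals are twice as long, i.e. half the throughput.
function regressionPercent(baselineMs, candidateMs) {
  return (candidateMs / baselineMs - 1) * 100;
}

console.log(regressionPercent(10, 20));               // 100
console.log(intervalToFps(10), intervalToFps(20));    // 100 50
console.log(intervalToFps(16.7).toFixed(1));          // 59.9
```

So any average interval above roughly 16.7ms means the animation can't sustain 60fps.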


---- The long version ----

- The TART XPI which I used is from bug 848358 comment 15 (v1.3 WIP) (navigate to chrome://tart/content/tart.html once the addon is installed).

TART default configuration was used except:
- Configured to run 5 times in a row (so it also provides average and stddev of the runs).
- Only the icon-DPI1 and icon-DPI2 tests were selected.

I ran each test both in default refresh rate, and in ASAP mode (layout.refresh-rate=0, but on windows 10000 will work as well).

I ran each configuration a few times to make sure the results are as consistent as they can get (they're noisier on the slower systems).

I also watched each run and can confirm that the results correlate reasonably well to my subjective smoothness assessment.

I also triggered some random cases manually (not using TART) and can also confirm that I didn't notice any notable subjective differences between the performance within TART and with the manual triggers.

As for the results of ASAP mode vs default refresh rate, it's fairly expected, i.e:
- If it performs well above 60fps in ASAP, then it does pretty stable 60hz at default rate.
- If it performs worse than 60hz in ASAP, then that's also roughly the performance at the default rate.
- If it performs around or marginally better than 60hz in ASAP, then typically it performs somewhat worse than 60hz in default rate, but not by a lot.

You can examine the full results (averages) of the runs from each config at the attached file, but here I'll focus on ASAP mode only.

Also, I'm comparing the "half" results only - which are the average frame intervals over the last 125ms of the animation, when it's hopefully already stable.


Here are the numbers:

--------------------
Fast system:
(Win7 laptop, i7-3630qm+iGPU, cores: 4+HT, ram: 16G, win-accel: D3D10, d2d: yes, sunspider 1.0: 120ms)

m-c, DPI 1.0: 5ms/iteration on tab open/close.
m-c, DPI 2.0: 10ms/iteration on tab open, 8ms on tab close.

ux, DPI 1.0: 5-6ms/iteration on open/close.
ux, DPI 2.0: 7ms on open, 10ms on close.

----------------
Slow System 1:
(Win7 laptop, AMD E350+iGPU, cores: 2, ram: 4G, win-accel: no, d2d: no, sunspider 1.0: 450ms)

m-c, DPI 1: 7ms open, 6ms close.
m-c, DPI 2: 9ms open, 6ms close.

ux, DPI 1: 11ms open, 14ms close.
ux, DPI 2: 18ms open, 26ms close.

Forcing hw composition using layers.acceleration.force-enabled=true didn't change the results too meaningfully.

--------------------
Slow system 2:
(Win8 tablet, Atom z2760+iGPU, Cores: 2+HT, ram: 2G, win-accel: D3D9, d2d: no, sunspider 1.0: 620ms)

m-c, DPI 1: 12ms open, 15ms close
m-c, DPI 2: 25ms open, 19ms close

ux, DPI 1: 16ms open, 21ms close.
ux, DPI 2: 21ms open, 28ms close

--------------------
OS X, medium speed (OMTC disabled because it currently breaks intervals recording):
(MBA 13" late 2010 2.13GHz, M.Lion 10.8.4, cores: 2, ram: 4G, sunspider1.0: 230ms)

m-c, DPI 1: 13ms open, 10ms close.
m-c, DPI 2: 13ms open, 11ms close.

ux, DPI 1: 11ms open, 9ms close.
ux, DPI 2: 14ms open, 8ms close.

--------------------

The conclusion is at the TL;DR section above.
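
To put the slow-system ASAP numbers above in relative terms, the following sketch computes how much longer the UX intervals are compared to m-c. The values are copied from the two slow Windows systems listed above; the `percentSlower` helper is hypothetical.

```javascript
// ASAP-mode ".half" frame intervals in ms, copied from the results above.
const results = {
  "AMD E350": {
    "DPI 1 open": { mc: 7, ux: 11 },
    "DPI 1 close": { mc: 6, ux: 14 },
    "DPI 2 open": { mc: 9, ux: 18 },
    "DPI 2 close": { mc: 6, ux: 26 },
  },
  "Atom z2760": {
    "DPI 1 open": { mc: 12, ux: 16 },
    "DPI 1 close": { mc: 15, ux: 21 },
    "DPI 2 open": { mc: 25, ux: 21 },
    "DPI 2 close": { mc: 19, ux: 28 },
  },
};

// Positive result: UX intervals are longer (slower) than m-c, in percent.
function percentSlower(mc, ux) {
  return Math.round((ux / mc - 1) * 100);
}

for (const [system, tests] of Object.entries(results)) {
  for (const [test, { mc, ux }] of Object.entries(tests)) {
    const delta = percentSlower(mc, ux);
    const sign = delta >= 0 ? "+" : "";
    console.log(`${system}, ${test}: UX ${sign}${delta}% vs m-c`);
  }
}
```

Keep in mind that these are rounded averages from noisy runs, so individual percentages should be read as rough indications only.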

Please note that TART is still young, and it possibly makes incorrect assumptions/setups/etc. To help improve TART, followup at bug 848358.

However, overall I feel that the results are indicative enough of the rough state of affairs wrt tab animation performance in general, and also wrt the differences between m-c and ux.

Comment 2

5 years ago
I am confused by the results. Some questions:

(In reply to Avi Halachmi (:avih) from comment #1)
> TART default configuration was used except:
> - Configured to run 5 times in a row (so it also provides average and stddev
> of the runs).

What was the stddev for these runs?

> --------------------
> Fast system:
> (Win7 laptop, i7-3630qm+iGPU, cores: 4+HT, ram: 16G, win-accel: D3D10, d2d:
> yes, sunspider 1.0: 120ms)
> 
> m-c, DPI 1.0: 5ms/iteration on tab open/close.
> m-c, DPI 2.0: 10ms/iteration on tab open, 8ms on tab close.
> 
> ux, DPI 1.0: 5-6ms/iteration on open/close.
> ux, DPI 2.0: 7ms on open, 10ms on close.
> 
> ----------------
> Slow System 1:
> (Win7 laptop, AMD E350+iGPU, cores: 2, ram: 4G, win-accel: no, d2d: no,
> sunspider 1.0: 450ms)
> 
> m-c, DPI 1: 7ms open, 6ms close.
> m-c, DPI 2: 9ms open, 6ms close.
> 
> ux, DPI 1: 11ms open, 14ms close.
> ux, DPI 2: 18ms open, 26ms close.
> 
> Forcing hw composition using layers.acceleration.force-enabled=true didn't
> change the results too meaningfully.
> 
> --------------------
> Slow system 2:
> (Win8 tablet, Atom z2760+iGPU, Cores: 2+HT, ram: 2G, win-accel: D3D9, d2d:
> no, sunspider 1.0: 620ms)
> 
> m-c, DPI 1: 12ms open, 15ms close
> m-c, DPI 2: 25ms open, 19ms close
> 
> ux, DPI 1: 16ms open, 21ms close.
> ux, DPI 2: 21ms open, 28ms close
> 
> --------------------
> OS X, medium speed (OMTC disabled because it currently breaks intervals
> recording):
> (MBA 13" late 2010 2.13GHz, M.Lion 10.8.4, cores: 2, ram: 4G, sunspider1.0:
> 230ms)
> 
> m-c, DPI 1: 13ms open, 10ms close.
> m-c, DPI 2: 13ms open, 11ms close.
> 
> ux, DPI 1: 11ms open, 9ms close.
> ux, DPI 2: 14ms open, 8ms close.
> 
> --------------------

This is surprising. On OS X, we're faster than m-c everywhere but in double-DPI tabopening (+1ms). The effect is most pronounced when looking at double-DPI tabclosing, where m-c is 38% slower (3ms).

The opposite happens on the fast Windows machine, where we're faster in tabopening but slower at tabclosing.

The slow Windows machines also have some strange results. For one, spec-wise, it makes no sense that we're performing roughly identically on the two slower devices when it comes to double-DPI tabclose, but m-c performs 3 times as well on the AMD device (and 4 times as fast as UX on that machine, too).

For another, we're slower on all tests on both slow machines *except* on double-DPI tab-opening on the Atom, where m-c is ~20% slower.

I'd really like to see the spread on these numbers both between runs and between frames. Did you control the CPU stepping/speed for the slower devices? As they're somewhat newer laptops/tablets, they probably do highly magical things in determining how much CPU you're getting at one point, and that could influence the results rather dramatically...
(Reporter)

Comment 3

5 years ago
(In reply to :Gijs Kruitbosch (PTO Aug 8-Aug 18) from comment #2)
> What was the stddev for these runs?

Stddev of the _averages_ (i.e. of several <name>.half values - and not of specific frame intervals) can be found at the attachment of comment 1. Note that it's over 5 runs only, so consider it possibly noisy. I haven't saved the individual frame intervals from each run, though TART displays them once it completes its run.

If you think some of the results are questionable, I can repeat these runs and post data including individual intervals from each run (a lot of data). Or you can run TART yourself and collect data on platforms with "unexpected" results.


> This is surprising. On OS X, we're faster than m-c everywhere but in
> double-DPI tabopening (+1ms). The effect is most pronounced when looking at
> double-DPI tabclosing, where m-c is 38% slower (3ms).
> ...

True about the differences. Here are the exact numbers (from the attachment above) on OS X:

Nightly ASAP:
TART.icon-open-DPI1.half     Average (5): 12.67 stddev: 3.84
TART.icon-close-DPI1.half    Average (5): 10.13 stddev: 4.57

TART.icon-open-DPI2.half     Average (5): 12.85 stddev: 2.99
TART.icon-close-DPI2.half    Average (5): 11.13 stddev: 3.42

UX ASAP:
TART.icon-open-DPI1.half     Average (5): 10.89 stddev: 1.22
TART.icon-close-DPI1.half    Average (5): 8.63  stddev: 4.29

TART.icon-open-DPI2.half     Average (5): 13.80 stddev: 5.35
TART.icon-close-DPI2.half    Average (5): 7.83  stddev: 2.56

Note that stddev here is not negligible. 
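
A crude way to see how noisy these are is to check whether the difference between the two averages is larger than the bigger of the two stddevs. This is only a sketch and not a proper significance test; the `withinNoise` helper is hypothetical, and the numbers are the OS X ASAP values listed above.

```javascript
// OS X ASAP ".half" results (average and stddev over 5 runs), from above.
const nightly = {
  "icon-open-DPI1": { mean: 12.67, stddev: 3.84 },
  "icon-close-DPI1": { mean: 10.13, stddev: 4.57 },
  "icon-open-DPI2": { mean: 12.85, stddev: 2.99 },
  "icon-close-DPI2": { mean: 11.13, stddev: 3.42 },
};
const ux = {
  "icon-open-DPI1": { mean: 10.89, stddev: 1.22 },
  "icon-close-DPI1": { mean: 8.63, stddev: 4.29 },
  "icon-open-DPI2": { mean: 13.80, stddev: 5.35 },
  "icon-close-DPI2": { mean: 7.83, stddev: 2.56 },
};

// Heuristic only: treat a difference as "within noise" when it is no larger
// than the bigger of the two standard deviations.
function withinNoise(a, b) {
  return Math.abs(a.mean - b.mean) <= Math.max(a.stddev, b.stddev);
}

for (const name of Object.keys(nightly)) {
  const diff = (ux[name].mean - nightly[name].mean).toFixed(2);
  console.log(`${name}: UX - m-c = ${diff}ms, within noise: ${withinNoise(nightly[name], ux[name])}`);
}
```

By this heuristic, all four OS X differences fall within the noise of 5 runs.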


> The slow Windows machines also have some strange results...

We have a lot of performance differences between different HW. For instance, the AMD system which I tested is roughly 50% faster per core than the (terribly slow) Atom system. The AMD's iGPU is also vastly superior to the Atom's (it actually runs Half-Life 2 at > 30fps at 720p, where the Atom struggles to run HL1 without jerking terribly). However, the AMD machine also has GPU composition blacklisted in Firefox, while the Atom machine has it enabled by default (D3D9).

So considering the many different code paths, optimizations, advantages and disadvantages of each system wrt different computational and graphics performance, I'd find it hard to define results as "strange". They're just different.


> Did you control the CPU stepping/speed for the slower
> devices?... that could influence the results rather dramatically...

I didn't touch stepping/turbo, etc. However, both machines step up to the highest performance level as soon as some load starts, and stay at that level until the load is gone (I do have a habit of keeping various CPU monitors running). Neither system throttles (the AMD never throttles, and the Atom is passively cooled, yet I never saw it throttling. It's 3W TDP).

I don't think throttling is a factor in my results.
(Reporter)

Comment 4

5 years ago
Also, in general, unless there's a specific perf difference in the numbers I've posted that really doesn't make sense, and it's important to understand whether that difference is real, I think that looking at specific values misses the big picture.

First of all, the results were noisy, and on the slower systems they're considerably noisier and jankier. Individual "wrong" numbers might creep in.

But if you take a step back and consider all the results, I think that the trends are relatively clear.

When there's a specific difference or trend which we think is important, by all means it warrants running it more times and looking at the collected values with more scrutiny and in higher resolution.

Again, I'm willing to re-run whatever tests we need, under any special setup which we think is important, and collect as much data as deemed required. And also anyone else could do this on his/her machine. Using TART is pretty straightforward.
No longer depends on: 902924

Comment 5

Hey Bas,

I did Moz2D recordings and SPS profiles of the basic TART test on both m-c and UX nightlies on a Windows 7 netbook. Could you help analyze them to identify what makes the tab animations slower on UX?

TART Results without recording/profiles on Nightly 20130820, Aero Glass, restored window, ASAP, HWA blocked (Half / All for 20 iterations)
m-c: open: 11.34 / 17.50  close:  8.79 / 12.13
UX:  open: 15.15 / 25.65  close: 18.61 / 22.64

Profiles for ~8 iterations:
m-c: http://people.mozilla.com/~bgirard/cleopatra/#report=721799032cf4e6ea1d86e61dfbeb50f449e8d203
ux:  http://people.mozilla.com/~bgirard/cleopatra/#report=21510942e438e7bbcb09b757a08a6dda2dfc0d60

Moz2D Recordings:
m-c: https://people.mozilla.com/~mnoorenberghe/gfx_recordings/mc-nightly-tart-basic-20130820.aer
UX:  https://people.mozilla.com/~mnoorenberghe/gfx_recordings/ux-nightly-tart-basic-20130820.aer
Flags: needinfo?(bas)
Keywords: meta
Hardware: x86_64 → All
(Reporter)

Comment 6

5 years ago
(In reply to Matthew N. [:MattN] from comment #5)
> 
> I did Moz2D recordings and SPS profiles of the basic TART test ...

Next time, when you profile, uncheck "Accurate first recorded frame" at TART.

If you keep it checked, then TART will change the opacity of the Back button several times before starting the tab animation. This is quite an ugly workaround to make sure that the (TART) recording of the first interval is accurate. However, if you profile, then it adds irrelevant noise before the animations which are measured by TART. When it's unchecked, one should regard the first frame interval which TART measures/reports as inaccurate (it measures longer than it actually is).

I'll add a note to the TART UI on this.

@bas, when you look at these profiles, select a single animation range (several frames) at the profiler to exclude this initial noise. The intervals are typically consistently longer on UX compared to m-c, so theoretically, selecting a range of just a few frames should be enough to notice the difference in performance.

Comment 7

(In reply to Avi Halachmi (:avih) from comment #6)
> (In reply to Matthew N. [:MattN] from comment #5)
> > 
> > I did Moz2D recordings and SPS profiles of the basic TART test ...
> 
> Next time, when you profile, uncheck "Accurate first recorded frame" at TART.

I did that because I thought of the issue you mentioned.

The recordings didn't seem to work anyways. It only recorded the content area for a short duration. I'm not sure of the cause yet but I wonder if it has to do with HWA being blocked or the new e10s stuff.

Comment 8

(In reply to Matthew N. [:MattN] from comment #7)
> The recordings didn't seem to work anyways. 

To clarify, I mean the Moz2d recordings didn't work, everything else is fine.
Depends on: 907546

Comment 9

(In reply to Matthew N. [:MattN] from comment #8)
> (In reply to Matthew N. [:MattN] from comment #7)
> > The recordings didn't seem to work anyways. 
> 
> To clarify, I mean the Moz2d recordings didn't work, everything else is fine.

I replaced the Moz2D recording files with working ones having D2D enabled. Enabling D2D on this Atom netbook made UX faster than m-c:

TART Results without recording/profiles on Nightly 20130820, Aero Glass, restored window, ASAP, HWA on (direct2d) (Half / All for 20 iterations)
m-c: open: 38.91 / 54.51  close: 32.91 / 41.46
UX:  open: 21.57 / 36.36  close: 28.33 / 36.93

Comment 10

(In reply to Matthew N. [:MattN] from comment #5)
> Hey Bas,
> 
> I did Moz2D recordings and SPS profiles of the basic TART test on both m-c
> and UX nightlies on a Windows 7 netbook. Could you help analyze them to
> identify what makes the tab animations slower on UX?
> 
> TART Results without recording/profiles on Nightly 20130820, Aero Glass,
> restored window, ASAP, HWA blocked (Half / All for 20 iterations)
> m-c: open: 11.34 / 17.50  close:  8.79 / 12.13
> UX:  open: 15.15 / 25.65  close: 18.61 / 22.64
> 
> Profiles for ~8 iterations:
> m-c:
> http://people.mozilla.com/~bgirard/cleopatra/
> #report=721799032cf4e6ea1d86e61dfbeb50f449e8d203
> ux: 
> http://people.mozilla.com/~bgirard/cleopatra/
> #report=21510942e438e7bbcb09b757a08a6dda2dfc0d60
> 
> Moz2D Recordings:
> m-c:
> https://people.mozilla.com/~mnoorenberghe/gfx_recordings/mc-nightly-tart-
> basic-20130820.aer
> UX: 
> https://people.mozilla.com/~mnoorenberghe/gfx_recordings/ux-nightly-tart-
> basic-20130820.aer

I'll focus on the graphics portions of this first.

So it looks like neither one of these profiles is really dominated by graphics. It's about 8.8% for m-c vs 14.5% for UX, which might account for some of the difference but not all, I suppose.

We seem to be spending about twice as much time in ContainerState::ProcessDisplayItems, which seems to be a result of simply having more displayitems/layers. However, this only accounts for about 2% of the difference. We seem to be spending 10% vs 6% on EndTransaction, which is the actual 'gfx part' of this. Some non-insignificant portion of that is spent doing background colors, probably a little bit more than in the m-c case because of having more displayitems (and possibly layers, but without a Moz2D recording or a layer tree dump it's hard to be sure), but by far the largest contribution is box shadow calculations (at ~3%). And that's also the main difference I find in the profiles for gfx related things. So shadows seem to be contributing most significantly to the difference.

Now, aside from graphics.

What's interesting is we're -also- spending 2.5% more of the profile for ux waiting for messages though. This generally suggests that we're -less- busy, so it's kind of strange.

What's also interesting is that for a more significant contribution to this profile, we're only spending 1.1% less (out of a ~18% of the total profile) in dom::FrameRequestCallback doing JS-ish stuff. This is interesting since we're drawing more than 5% (1% of 20%). Since we drew just as many iterations but in -more- CPU time, this suggests the actual CPU time spent here is also higher 'per iteration'.
Flags: needinfo?(bas)

Comment 11

Thanks Bas!

I have an idea where the box-shadow work is coming from but I'm wondering if you have ideas to reduce some of the other costs you've found (or whether they're covered in existing dependencies). The layering seems to cause seemingly unnecessary work when I skim the recordings but I don't know why or what can be done about it. That may also be a layout issue instead. Hopefully the Moz2D recordings will provide more info.

(In reply to Bas Schouten (:bas.schouten) from comment #10)
> by far the
> largest contribution is box shadow calculations (at ~3%). And that's also
> the main difference I find in the profiles for gfx related things. So
> shadows seem to be contributing most significantly to the difference.

There is a new large box-shadow added for an Aero glass fog on Windows Vista & 7 which spans the width of the window. I've filed bug 908067 for this investigation.

==

I'm adding some of the other bugs that were filed based on profiles as dependencies so they are easier to track. If we find out that they are not useful for Australis, feel free to remove them.
Alias: australis-tart
Depends on: 902637, 902639

Comment 12

(In reply to Matthew N. [:MattN] from comment #5)
> TART Results without recording/profiles on Nightly 20130820, Aero Glass,
> restored window, ASAP, HWA on (direct2d) (Half / All for 20 iterations)
> m-c: open: 38.91 / 54.51  close: 32.91 / 41.46
> UX:  open: 21.57 / 36.36  close: 28.33 / 36.93

As requested by Bas, here are profiles for this same netbook now that D2D is enabled (after a driver update):

TART simple run with 8 iterations:
m-c: http://people.mozilla.com/~bgirard/cleopatra/?report=877f2f44019952502d6ff9dafeff27699d89659b
UX:  http://people.mozilla.com/~bgirard/cleopatra/?report=c31195d76a9ed57a0866b4eead4757c0364a0470

Comment 13

(In reply to Matthew N. [:MattN] from comment #11)
> Thanks Bas!
> 
> I have an idea where the box-shadow work is coming from but I'm wondering if
> you have ideas to reduce some of the other costs you've found (or whether
> they're covered in existing dependencies). The layering seems to cause
> seemingly unnecessary work when I skim the recordings but I don't know why
> or what can be done about it. That may also be a layout issue instead.
> Hopefully the Moz2D recordings will provide more info.
> 
> (In reply to Bas Schouten (:bas.schouten) from comment #10)
> > by far the
> > largest contribution is box shadow calculations (at ~3%). And that's also
> > the main difference I find in the profiles for gfx related things. So
> > shadows seem to be contributing most significantly to the difference.
> 
> There is a new large box-shadow added for an Aero glass fog on Windows Vista
> & 7 which spans the width of the window. I've filed bug 908067 for this
> investigation.
> 
> ==
> 
> I'm adding some of the other bugs that were filed based on profiles as
> dependencies so they are easier to track. If we find out that they are not
> useful for Australis, feel free to remove them.

Looking at this profile I honestly don't think there's a significant amount of gain to be had in the graphics department beyond the box shadow. There's just too much more time spent in other things, the mysterious waiting I don't fully understand. I'll have a look at D2D.

Comment 14

(In reply to Matthew N. [:MattN] from comment #12)
> (In reply to Matthew N. [:MattN] from comment #5)
> > TART Results without recording/profiles on Nightly 20130820, Aero Glass,
> > restored window, ASAP, HWA on (direct2d) (Half / All for 20 iterations)
> > m-c: open: 38.91 / 54.51  close: 32.91 / 41.46
> > UX:  open: 21.57 / 36.36  close: 28.33 / 36.93
> 
> As requested by Bas, here are profiles for this same netbook now that D2D is
> enabled (after a driver update):
> 
> TART simple run with 8 iterations:
> m-c:
> http://people.mozilla.com/~bgirard/cleopatra/
> ?report=877f2f44019952502d6ff9dafeff27699d89659b
> UX: 
> http://people.mozilla.com/~bgirard/cleopatra/
> ?report=c31195d76a9ed57a0866b4eead4757c0364a0470

Hrm, so here border images are a big culprit for m-c, presumably the new stuff no longer works with those and that'll make things a lot different. This profile also spends a -ton- of time in CCTimerFired though, more than all the graphics time spent combined, I wonder why that would be different for D2D, or it's some kind of corruption in the profile, very mysterious.
Whiteboard: [Australis:P1][Australis:M9]
Depends on: 908796

Comment 15

We've got our first bits of TART data coming in. Here's the first few datapoints from jmaher's try push (so non-PGO):

https://datazilla.mozilla.org/?start=1376686808&stop=1377291608&product=Firefox&repository=Try-Non-PGO&arch=x86_64&test=tart&page=icon-close-DPI1.all.TART&project=talos

TART is a multifaceted test, so the average number that tbpl displays for the test is kinda useless (it gives a super general indication of tab performance. I guess if it spikes like crazy some day, that's useful - but beyond that, it's pretty low-resolution). So we've gotta use Datazilla to browse each facet of the test.

We should get PGO data soon, as soon as bug 908853 lands on m-c.

I've pushed the patch to try off of UX to give us a little bit of data too:

https://tbpl.mozilla.org/?tree=Try&rev=e7809bd72b08

Once bug 908853 lands, I'll merge it, and we'll be off to the races.

Comment 16

Here's what we've got running on at least one of our XP Talos machines (though I'd hope they're uniform):

Graphics Adapter: {
    "numTotalWindows": 1,
    "numAcceleratedWindows": 1,
    "windowLayerManagerType": "Direct3D 9",
    "windowLayerManagerRemote": false,
    "adapterDescription": "NVIDIA GeForce GT 610",
    "adapterVendorID": "0x10de",
    "adapterDeviceID": "0x104a",
    "adapterRAM": "Unknown",
    "adapterDrivers": "nv4_disp",
    "driverVersion": "6.14.13.1407",
    "driverDate": "2-9-2013",
    "adapterDescription2": "",
    "adapterVendorID2": "",
    "adapterDeviceID2": "",
    "adapterRAM2": "",
    "adapterDrivers2": "",
    "driverVersion2": "",
    "driverDate2": "",
    "isGPU2Active": false,
    "direct2DEnabled": false,
    "directWriteEnabled": false,
    "directWriteVersion": "0.0.0.0",
    "direct2DEnabledMessage": [""],
    "webglRenderer": "Google Inc. -- ANGLE (NVIDIA GeForce GT 610 Direct3D9 vs_3_0 ps_3_0)",
    "info": {
        "AzureCanvasBackend": "skia",
        "AzureSkiaAccelerated": 0,
        "AzureFallbackCanvasBackend": "cairo",
        "AzureContentBackend": "none"
    }
}

Comment 17

Here's a differential profile for Windows 7 (the pseudostack intermingling of XP makes profiles there kinda useless):

http://tests.themasta.com/cleopatra/?report=bbb56b850ac68270ebf4635b1dda20d997a004c5

I tweaked the UX push so that the box-shadow of the fog was gone, since we'd already identified that as a source of regression on 7.

Comment 18

And here's a profile I gathered on my Windows 7 box where hardware acceleration was disabled, and I was in Windows Classic (to best emulate Windows XP):

http://people.mozilla.com/~bgirard/cleopatra/#report=8abcf9a371c96773c49cb7b52668e0d7fcea6a3c

Comment 19

Created attachment 800782 [details]
mc-merged-profile-tart-winxp-af36680248ef.txt.zip

Ok, using mstange's reflow profiling patch from bug 902857, I was able to capture a reflow profile for m-c and UX during 1 iteration of the TART test (more than that caused us to run out of memory when trying to compress the profile data).

You'll need to decompress this file locally, and view it on http://tests.themasta.com/cleopatra/. No uploading - these things are wayyyyyy too big.

Here's m-c.

Comment 20

Created attachment 800783 [details]
ux-merged-profile-tart-winxp-c7f2f6ee6139.txt.zip

And here's UX.

Comment 21

Comment on attachment 800783 [details]
ux-merged-profile-tart-winxp-c7f2f6ee6139.txt.zip

So I accidentally included a patch in this push that collapsed the urlbar-container, which is highly unrealistic and renders this profile rather useless.

Regenerating.
Attachment #800783 - Attachment is obsolete: true

Comment 22

Created attachment 800836 [details]
ux-merged-profile-tart-winxp-355582f77d71.txt.zip

Ok, this UX reflow profile should be more useful.

Comment 23

So I decided to test the assumption that the tabs are what is causing this performance regression. I found the changeset for when bug 738491 landed, and pushed that to Try - tweaked to run the most recent talos suite.  I did the same for the m-c changeset that bug 738491 landed on.

Here are my try pushes:

m-c: https://tbpl.mozilla.org/?tree=Try&rev=c46e8c26db51
UX: https://tbpl.mozilla.org/?tree=Try&rev=bd3482cc8410

And here's the compare-talos:

http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29240939&newTestIds=29240281&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

Holy smokes, the tabs aren't the culprit. Or at least, they weren't initially. I'm going to keep bisecting and try to determine where the regression was introduced.

Comment 24

Next bisecting step:

m-c: https://tbpl.mozilla.org/?tree=Try&rev=7f8d17888991
UX: https://tbpl.mozilla.org/?tree=Try&rev=ce25fc1a6fcb

And here's the compare-talos:

http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29241293&newTestIds=29241251&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

If you discount the error values, I don't think the regression was introduced here (between UX changesets b13d07fc8417 and cba03cce9531).

Continuing bisection...
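
For anyone following along, the bisection being done here by hand is a binary search over the UX changesets. A minimal sketch follows; the changeset names and the `isBad` predicate are invented for illustration. In practice each `isBad` check is a try push plus a compare-talos look at the TART numbers, and the search assumes the regression persists once introduced.

```javascript
// Find the first "bad" changeset. Assumes changesets are ordered oldest to
// newest, the first one is known good, the last one is known bad, and every
// changeset after the first bad one is also bad.
function bisect(changesets, isBad) {
  let good = 0;
  let bad = changesets.length - 1;
  let steps = 0;
  while (bad - good > 1) {
    const mid = Math.floor((good + bad) / 2);
    if (isBad(changesets[mid])) {
      bad = mid;
    } else {
      good = mid;
    }
    steps++;
  }
  return { firstBad: changesets[bad], steps };
}

// Invented example: 128 changesets, regression introduced at "rev90".
const changesets = Array.from({ length: 128 }, (_, i) => `rev${i}`);
const result = bisect(changesets, (rev) => Number(rev.slice(3)) >= 90);
console.log(result.firstBad, result.steps); // rev90 7
```

The number of steps grows with log2 of the range size, so even a long range of changesets needs only a handful of try pushes.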
(Reporter)

Comment 25

5 years ago
Agreed. However, while I wouldn't yet consider the specific regressions in (some of) the .error values as a shipping blocker, they're not completely negligible either (they were negligible in the comparison from comment 23). So just keep an eye on those.

Comment 26

Next bisecting step:

m-c: https://tbpl.mozilla.org/?tree=Try&rev=cb8a15d39c98
UX: https://hg.mozilla.org/try/rev/4797bd492208

compare-talos breakdown: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29241755&newTestIds=29241713&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

Really horrific performance on the error measurements here. I agree, Avi, that these are not negligible. I'm going to continue bisecting as if this changeset is "good", as I'm looking for regressions in the non-error measurements, but we should revisit this and investigate what happened between this changeset and landing the curvy tabs.

So not introduced yet. Next bisecting step...
(Reporter)

Comment 27

5 years ago
(In reply to Mike Conley (:mconley) from comment #26)
> Really horrific performance on the error measurements here. I agree, Avi,
> that these are not negligible.

Actually, these (.error regressions on comment 26) are practically negligible to me (they're there, but they don't mean much with these specific values).

The key here is to look at the absolute values and remember what the .error means: the ms difference between the actual duration and the duration defined by the transition-duration css.

So the last ones are at most ~3.5ms regression: over a 230ms animation, on ux it took an extra 3.5ms to complete the animation. Considering that on real-world systems (where we don't have unlimited frame rate as we do in TART) the minimum frame interval is 16.7ms, these extra few ms don't mean much, and are certainly not perceivable.

On comment 24 the regressions were around 10ms. So while it still doesn't mean too much on real-world systems (hence not a blocker), it could indicate that more work is creeping into that specific action.

Here's a (meta/pseudo/imaginary) example of how the .error value is introduced.

[0ms] tart.startRecordingIntervals();

BrowserOpenTab();
--> {
       gStart = Date.now();

[1ms]  allocateSomeStuff1();
[15ms] prepareSomeOtherStuff2();
       tab = createTabInGbrowser();
[40ms] tab.setAttribute("fadein", "true"); // <-- the transition starts 40ms after OpenTab() was triggered.
}

function onTabTransitionEndWidth(e) {
  actuallyTook = Date.now() - gStart; // <-- if transition-duration was 100, then actuallyTook==140.
}

tart.onTabTransitionEnd = function () {
  intervals = stopRecordingAndGetIntervals();
  munchIntervalsInto_all_half_error_andReport(); // <-- First recorded frame will be more than 40ms, and .error would be ~40ms.
};
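
The same arithmetic as a small runnable sketch. The `transitionError` helper is hypothetical and the timestamps are the imaginary ones from the example above; this is not how TART itself is written.

```javascript
// Minimum frame interval on a 60Hz display, roughly 16.7ms.
const FRAME_INTERVAL_MS = 1000 / 60;

// Hypothetical helper: .error is how much longer the whole action took than
// the transition-duration defined in css. All values are in ms.
function transitionError(openTriggeredAt, transitionEndedAt, transitionDurationMs) {
  const actuallyTook = transitionEndedAt - openTriggeredAt;
  return actuallyTook - transitionDurationMs;
}

// Imaginary timeline from above: open triggered at 0ms, the transition
// starts 40ms later and runs for 100ms, so it ends at 140ms.
const error = transitionError(0, 140, 100);
console.log(error);                                    // 40
console.log((error / FRAME_INTERVAL_MS).toFixed(1));   // 2.4 (frames at 60Hz)

// For comparison, a 3.5ms .error is only a fraction of one 60Hz frame.
console.log((3.5 / FRAME_INTERVAL_MS).toFixed(1));     // 0.2
```

Expressing .error in units of a 60Hz frame makes it easier to judge whether a regression could be visible at the default refresh rate.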

Comment 29

5 years ago
(In reply to Mike Conley (:mconley) from comment #26)
> UX: https://hg.mozilla.org/try/rev/4797bd492208
> So not introduced yet. Next bisecting step...

(In reply to Mike Conley (:mconley) from comment #28)
> UX: https://tbpl.mozilla.org/?tree=Try&rev=38931a8ff66e
> The regression is definitely present here. Down to ~7 steps...

I looked at this range this morning, and there's at least one obvious thing in there which could be affecting us: the new bookmark/star widget. I'll see if I have time to do a try run with that backed out later today.

Comment 30

5 years ago
Next bisecting step:

m-c: https://tbpl.mozilla.org/?tree=Try&rev=500f45accbdf
UX: https://tbpl.mozilla.org/?tree=Try&rev=d9604b6b8b0e

compare-talos breakdown: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29245595&newTestIds=29245559&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

It looks like there *may* be some regression here, but it's not a lot (certainly not like what we saw in comment 28). I think I'm going to mark this changeset as "good" and move on, but come back to this range later to see if we can pin down what caused this.

Comment 31

5 years ago
(In reply to Mike Conley (:mconley) from comment #30)
> Next bisecting step:
> 
> m-c: https://tbpl.mozilla.org/?tree=Try&rev=500f45accbdf
> UX: https://tbpl.mozilla.org/?tree=Try&rev=d9604b6b8b0e
> 
> compare-talos breakdown:
> http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29245595&newTestIds=29245559&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org
> 
> It looks like there *may* be some regression here, but it's not a lot
> (certainly not like what we saw in comment 28). I think I'm going to mark
> this changeset as "good" and move on, but come back to this range later to
> see if we can pin down what caused this.

I opened the compare-talos link for comment 28 and this one next to each other, and the numbers are wildly different. What's causing that?

Comment 28: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29243503&newTestIds=29243495&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org
Comment 30: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29245595&newTestIds=29245559&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

In the former, the half/all numbers for both m-c and UX seem to be around 2-3. For the latter, they are around 16-17. Wat?

Comment 32

5 years ago
(In reply to :Gijs Kruitbosch from comment #31)
> (In reply to Mike Conley (:mconley) from comment #30)
> > Next bisecting step:
> > 
> > m-c: https://tbpl.mozilla.org/?tree=Try&rev=500f45accbdf
> > UX: https://tbpl.mozilla.org/?tree=Try&rev=d9604b6b8b0e
> > 
> > compare-talos breakdown:
> > http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29245595&newTestIds=29245559&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org
> > 
> > It looks like there *may* be some regression here, but it's not a lot
> > (certainly not like what we saw in comment 28). I think I'm going to mark
> > this changeset as "good" and move on, but come back to this range later to
> > see if we can pin down what caused this.
> 
> I opened the compare-talos link for comment 28 and this one next to each
> other, and the numbers are wildly different. What's causing that?
> 
> Comment 28: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29243503&newTestIds=29243495&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org
> Comment 30: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29245595&newTestIds=29245559&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org
> 
> In the former, the half/all numbers for both m-c and UX seem to be around
> 2-3. For the latter, they are around 16-17. Wat?

Whoa, you're right. Something really fishy happened in between those changesets that affected both m-c and UX...

Comment 33

5 years ago
(In reply to :Gijs Kruitbosch from comment #31)
> In the former, the half/all numbers for both m-c and UX seem to be around
> 2-3. For the latter, they are around 16-17. Wat?

All the other numbers also seem to be around the 16-17 mark. However, the current m-c/UX results in DataZilla seem to all be around the 2-3 mark.

It may also be useful to note that the 'big' regression shown in comment 28 (where numbers are around 2-3) is actually, in some cases, similar or even smaller in absolute size to the 'small' regression shown in comment 30 (e.g. icon-close-DPI2.half regressed by 0.4 in the comment 30 case, but by 0.12 in the comment 28 one). (Are all these numbers ms? I think so, but I'm not sure)
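
A quick way to compare the two cases is to look at the relative size of each regression. The Python sketch below is only illustrative: the deltas are the icon-close-DPI2.half figures quoted above, but the baselines are assumed midpoints of "around 2-3" and "around 16-17", not measured per-subtest values.

```python
def relative_regression(delta_ms, baseline_ms):
    """Regression as a percentage of the baseline interval."""
    return 100 * delta_ms / baseline_ms


# Baselines are assumptions (range midpoints), not values from the runs.
comment_28 = relative_regression(0.12, 2.5)
comment_30 = relative_regression(0.4, 16.5)
print(f"comment 28: {comment_28:.1f}%  comment 30: {comment_30:.1f}%")
```

Under these assumed baselines, the smaller absolute delta from comment 28 is the larger one in relative terms.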
So maybe comment 30's rev is also regressed... or maybe not. Would it make sense to rerun some of the tsvg runs to get more stable numbers?

Comment 34

5 years ago
(In reply to :Gijs Kruitbosch from comment #33)
> So maybe comment 30's rev is also regressed... or maybe not. Would it make
> sense to rerun some of the tsvg runs to get more stable numbers?

I triggered the tsvg tests a few more times on those builds, and the compare-talos breakdown is more or less the same:

http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29245595,29250743,29250767,29250799,29250815,29250823&newTestIds=29245559,29250751,29250759,29250775,29250783,29250791,29250807&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

My next bisect steps also completed:

m-c: https://tbpl.mozilla.org/?tree=Try&rev=7ed8bee98794
UX: https://tbpl.mozilla.org/?tree=Try&rev=d0b6e8e47502

compare-talos breakdown: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29250555&newTestIds=29250543&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

Like before - it appears that maybe a slight regression has already slipped in here... as before, I think we should come back at some point, and mark this changeset "bad" to see what introduced this regression - but we're not yet at the same magnitude of regression that we're currently seeing on m-c and UX. So I'm going to mark this good and keep moving.

Comment 35

5 years ago
So Gijs brought up a good point in IRC - the TART test that I've been running these bisections on is using layout.frame_rate = 0 to get us into ASAP mode, which TART needs to test on.

However, ASAP mode landed *after* the curvy tabs stuff landed for Windows (it landed in, I believe, bug 888899) which renders some of these TART results kinda useless.

But, good news everyone - there's a way we can get ASAP mode going pre-888899. We simply need to set layout.frame_rate to 10000, and (I believe) that'll simulate ASAP just fine. Avi can advise me if I'm totally off my rocker.

Anyhow, I've started the bisections from scratch, and I've again confirmed that the curvy tabs *did not introduce the regression for XP*:

m-c (25c2aaee8acc): https://tbpl.mozilla.org/?tree=Try&rev=2b0ef439008f
UX (cba03cce9531): https://tbpl.mozilla.org/?tree=Try&rev=97491232b8b5

compare-talos breakdown: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29253477&newTestIds=29253493&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org


But I've been able to show the regression here:

m-c (3d40d270c031): https://tbpl.mozilla.org/?tree=Try&rev=359b8d0115ff
UX (ae7aaa96be25): https://tbpl.mozilla.org/?tree=Try&rev=5d7f7d2886b2

compare-talos breakdown: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29253371&newTestIds=29253229&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org


So I don't think this is too much of a setback. I've pushed more bisection changesets to try, and we'll see how it shakes out. You probably want to ignore any of my previous data about the TART results from the older bisections.

Comment 36

5 years ago
Ok, I see the regression at UX revision b13d07fc8417:

m-c (0acda90a6f6a): https://tbpl.mozilla.org/?tree=Try&rev=fd61e66756f0
UX (b13d07fc8417): https://tbpl.mozilla.org/?tree=Try&rev=5afc57cc1e07

compare-talos: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29254133&newTestIds=29254297&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

So the regression was introduced between UX changesets cba03cce9531 and b13d07fc8417.

Continuing bisection...

Comment 37

5 years ago
I see some regression at UX revision 045da4704a6a:

m-c (cb242a1cccb2): https://tbpl.mozilla.org/?tree=Try&rev=0f3a1e5b1a60
UX (045da4704a6a): https://tbpl.mozilla.org/?tree=Try&rev=77f6845f1024

compare-talos: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29256017&newTestIds=29256009&testName=tart&osName=Windows%20XP&server=graphs.mozilla.org

I think more was introduced *after* 045da4704a6a, and that's worth pursuing, but I guess I'm going to try to attack the first-cause here, and then come back for the second one (assuming it's actually there).

So now I'm searching for the regression between cba03cce9531 and 045da4704a6a.
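
For reference, the bisection being done by hand here can be sketched in Python as a plain binary search. This is an illustration only: `find_first_bad` and the `is_regressed` predicate are hypothetical names, and in practice each predicate call stands for a try push plus a TART comparison.

```python
def find_first_bad(changesets, is_regressed):
    """Binary-search an ordered changeset list for the first regressed one.

    Assumes changesets[0] is known good, changesets[-1] is known bad, and
    that the regression, once introduced, stays present (a single
    good-to-bad transition). With several regressions this finds only one.
    """
    good, bad = 0, len(changesets) - 1
    steps = 0
    while bad - good > 1:
        mid = (good + bad) // 2
        steps += 1
        if is_regressed(changesets[mid]):
            bad = mid
        else:
            good = mid
    return changesets[bad], steps


# Hypothetical history of 100 pushes where push 63 introduced the regression.
history = [f"rev{i}" for i in range(100)]
first_bad, steps = find_first_bad(history, lambda rev: int(rev[3:]) >= 63)
print(first_bad, steps)
```

Marking a slightly regressed changeset as "good" (as done above) effectively changes the predicate to "regressed by at least the magnitude we care about", so the search converges on the first major regression rather than the first one of any size.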
(Reporter)

Comment 38

5 years ago
(In reply to Mike Conley (:mconley) from comment #35)
> But, good news everyone - there's a way we can get ASAP mode going
> pre-888899. We simply need to set layout.frame_rate to 10000, and (I
> believe) that'll simulate ASAP just fine. Avi can advise me if I'm totally
> off my rocker.

TL;DR:
- For Windows and Linux we can use 10000 for ASAP on any revision. For OS X it's more complex.
- ASAP mode requires bug 884955 (landed 2013-06-20) or else TART results may be invalid (0ms intervals).

Overall, I suggest using layout.frame_rate=10000 (in talos/test.py), looking only at Windows and Linux results, and applying the patch from bug 884955 to any revision which doesn't already include it.

See bug 908741 for an explanation of ASAP mode.

On OS X:
- 10000 should (haven't tested) enable ASAP before 2012-05-24 (bug 748816).
- After that, ASAP mode could not be enabled before 2013-07-11 (bug 888899 comment 5 - part 1).
- Between 888899 part 1 and until part 2 (2013-08-05), only 10000 will work.
- After part 2 landed, only 0 will work.
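
The OS X timeline above can be written down as a small lookup. This is a Python sketch with a hypothetical helper name, not something in talos. Treating each landing date as the first day of the new behaviour is an assumption, since nightly build dates can lag a landing.

```python
from datetime import date

# Landing dates quoted in the list above.
BUG_748816 = date(2012, 5, 24)
BUG_888899_PART1 = date(2013, 7, 11)
BUG_888899_PART2 = date(2013, 8, 5)


def osx_asap_frame_rate(build_date):
    """Return the layout.frame_rate value expected to enable ASAP on OS X,
    or None when ASAP cannot be enabled for that build."""
    if build_date < BUG_748816:
        return 10000  # untested, per the list above
    if build_date < BUG_888899_PART1:
        return None   # ASAP could not be enabled in this range
    if build_date < BUG_888899_PART2:
        return 10000
    return 0
```

For example, `osx_asap_frame_rate(date(2013, 6, 1))` returns `None`, which is why most of the bisection range can't be tested in ASAP mode on OS X.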

Besides that, ASAP mode also tries to avoid paint starvation (AKA favor-performance mode: a prehistoric system which may prevent paints and other OS events while prioritizing internal events for 2000ms right after page load; see bug 880036, bug 822096, bug 906811).

However, avoiding paint starvation (using the pref docshell.event_starvation_delay_hint=1) is only possible since bug 884955 (2013-06-20) which introduced this pref.

Not using this pref will most likely result in TART reporting intervals of 0 (tested recent Firefox builds on linux and hi/low-end windows systems - without this pref results are indeed 0).
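
Putting the Windows/Linux part together as a Python sketch. `asap_prefs` is a hypothetical helper and does not reflect how talos/test.py actually stores prefs; the pref names and the 2013-06-20 date are the ones given above.

```python
from datetime import date

# Date quoted above for bug 884955, which introduced the starvation pref.
STARVATION_PREF_LANDED = date(2013, 6, 20)


def asap_prefs(build_date, patched_with_884955=False):
    """Prefs for an ASAP TART run on Windows/Linux, plus a validity warning."""
    prefs = {"layout.frame_rate": 10000}
    has_pref = patched_with_884955 or build_date >= STARVATION_PREF_LANDED
    if has_pref:
        prefs["docshell.event_starvation_delay_hint"] = 1
    warning = None if has_pref else "paint starvation not prevented; intervals may be 0"
    return prefs, warning
```

A non-None warning means the run should be checked for 0ms intervals before its numbers are compared with anything.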

(In reply to Mike Conley (:mconley) from comment #36)
> Ok, I see the regression at UX revision b13d07fc8417:
>  ...
> So the regression was introduced between UX changesets cba03cce9531 and
> b13d07fc8417.

You seem to treat it as if there was only a single regression. That's not necessarily the case.
(Reporter)

Comment 39

5 years ago
(In reply to Avi Halachmi (:avih) from comment #38)
> Overall, I suggest using layout.frame_rate=10000 (in talos/test.py), looking
> only at Windows and Linux results, and applying the patch from bug 884955 to
> any revision which doesn't already include it.

If applying the patch from bug 884955 proves very uncomfortable for older revisions, there's an alternative:

1. Set USE_RECORDING_API:false at talos/page_load_test/tart/addon/content/tart.js .

2. Don't prevent paint starvation (comment out only the docshell.event_starvation_delay_hint=1 part at talos/test.py).

Option 1 uses a less accurate frame-interval recording system which can measure intervals even during paint starvation. However, it's less capable of exposing regressions, especially GFX-related ones: without painting, the graphics part (Present) is not invoked, so it's not represented in the results, and the results will appear better than they are.

Option 2 leaves paint starvation in place on every revision (preventing it only works from bug 884955, 2013-06-20, onwards anyway). This is required because the recording from option 1 produces different results depending on whether or not paints are starved, so starvation has to behave the same way on all bisected revisions.

I think the suggestion from comment 38 is better overall, but if it's hard to apply the patch to older revisions during the bisection, then this alternative might still be able to expose some regressions, and hopefully the important ones.
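
The choice between the two setups, as a Python sketch. `bisection_setup` is a hypothetical helper; the flag and pref names are the ones mentioned above, and `USE_RECORDING_API` being true by default is an assumption inferred from the instruction to set it to false.

```python
def bisection_setup(can_apply_884955_patch):
    """Pick one of the two measurement setups described above."""
    if can_apply_884955_patch:
        # Preferred (comment 38): accurate recording, starvation prevented.
        return {
            "USE_RECORDING_API": True,
            "prefs": {
                "layout.frame_rate": 10000,
                "docshell.event_starvation_delay_hint": 1,
            },
        }
    # Fallback (comment 39): less accurate recording, starvation left alone.
    return {
        "USE_RECORDING_API": False,
        "prefs": {"layout.frame_rate": 10000},
    }
```

Whichever setup is picked, it presumably has to be used for both the m-c and the UX build of a comparison, otherwise the two results aren't measured the same way.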
(Reporter)

Comment 40

5 years ago
(In reply to Mike Conley (:mconley) from comment #35)
> ...
> Anyhow, I've started the bisections from scratch, and I've again confirmed
> that the curvy tabs *did not introduce the regression for XP*:

Am I missing the XP* comment?

Also, in your next talos-compare links, could you please post the links to the "overview" comparison page instead of the "Details"?


Here's a validity analysis of all the previous bisection comparisons, to make sure our theory stands and that nothing falls between the cracks.

Comment 23 - unreliable/bad regression detection (16.7ms wall)
m-c 2013-04-17
ux  2013-03-08

Comment 24 - unreliable/bad regression detection (16.7ms wall)
m-c 2013-06-11
ux  2013-06-12

Comment 26 - unreliable/bad regression detection (16.7ms wall)
m-c 2013-07-21
ux  2013-07-22

Comment 28 - Good regression detection (intervals 2-3 ms)
m-c 2013-08-12
ux  2013-08-10

Comment 30 - unreliable/bad regression detection (16.7ms wall)
m-c 2013-07-29
ux  2013-07-20

Comment 34 - unreliable/bad regression detection (16.7ms wall)
m-c 2013-08-02
ux  2013-08-02

Comment 35 - start using 10000: Seems good (intervals 2-3 ms)
First comparison (paint starvation prevention not supported):
m-c 2013-04-17
ux  2013-03-08

Second comparison: Valid.
m-c 2013-07-29
ux  2013-07-30

Important: in the first comparison on Win XP, neither build has paint starvation prevention (supported via the pref since 2013-06-20). I'd have expected those to show intervals of 0. It's possible that this system behaves differently (i.e. doesn't starve) on Windows XP.

Interestingly, also on the first comparison, if looking at the win7/8 comparison (if I got it right: http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29253453&newTestIds=29253501&testName=tart&osName=Windows%207&server=graphs.mozilla.org ), both show ~25% improvement across the board. If I hadn't seen the build dates, I could have guessed that both revisions use the "less accurate recording" from comment 39, and that the "old revision" does prevent paint starvation and the "new revision" doesn't prevent it, which could explain the seemingly better results.
  
But since AFAIK both should behave similarly in this regard, and unless there was some unrelated change between these revisions which improved everything, I'm unable to explain the results on win7/8 in 2 ways:
- I'd have expected all the results to be 0.
- If not 0, I wouldn't expect the comparison to show such "improvement".

This is not good.
  
That being said, by looking at the absolute values of the XP comparison from comment 35, I'd say this specific comparison (the first) is valid.

Comment 36 - Seems good regression detection (same reservation as with comment 35)
m-c 2013-06-11
ux  2013-06-12

Comment 37 - Seems good regression detection (same reservation as with comment 35)
m-c 2013-05-16
ux  2013-05-16

Bottom line, starting with comment 35, the comparisons _look_ valid when judging by absolute result values, but I'm not yet able to understand why they look valid on builds prior to 2013-06-20.

I'd have expected them to not work in some way. If the win7/8 runs had 0ms intervals, it would mean that XP is affected differently by paint starvation and it would explain everything. But as long as the win7/8 show some non-0 results and also non 16.67, I still need to investigate this and the validity of the comparisons.
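
The three kinds of results discussed in this analysis (0ms, the 16.7ms wall, and 2-3ms) can be told apart with a rough heuristic. The Python sketch below is an illustration only: `classify_run` and both thresholds are assumptions, not something TART or compare-talos provides.

```python
FRAME_INTERVAL_MS = 1000 / 60  # the "16.7ms wall" on a 60Hz display


def classify_run(avg_interval_ms, tolerance_ms=1.0):
    """Rough validity check for a TART average interval (a heuristic only)."""
    if avg_interval_ms < 0.1:
        return "starved"   # ~0ms intervals: paints were starved
    if abs(avg_interval_ms - FRAME_INTERVAL_MS) <= tolerance_ms:
        return "capped"    # ASAP mode not in effect, limited to ~60fps
    return "plausible"     # e.g. the 2-3ms intervals seen in comment 28
```

A "plausible" result still needs a look at the logs, as the win7/8 case above shows, but "starved" and "capped" runs can be discarded right away.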
(Reporter)

Comment 41

5 years ago
(In reply to Avi Halachmi (:avih) from comment #40)
> ...
> Interestingly, also on the first comparison, if looking at the win7/8
> comparison (if I got it right:
> http://compare-talos.mattn.ca/breakdown.html?oldTestIds=29253453&newTestIds=29253501&testName=tart&osName=Windows%207&server=graphs.mozilla.org
> ), both show ~25% improvement across the board.

I think I got this part (the link) wrong. It would really help to see win7/8 results of those builds.

> I'd have expected them to not work in some way. If the win7/8 runs had 0ms
> intervals, it would mean that XP is affected differently by paint starvation
> and it would explain everything.

This still holds, and I really hope this is what we'll get on win7/8.

Comment 42

5 years ago
Wow, a lot of traffic in here lately.

First off, I think my posting each step of my bisection is probably less useful than I thought, and is adding confusion since we re-started the bisection process.

So I've started to track the bisections in this spreadsheet:

https://docs.google.com/a/mozilla.com/spreadsheet/ccc?key=0Asj8iLTl0K0UdGJzVnVYYkJTSDREbndRWWFTLWNyS1E#gid=1

I will not be posting my bisection steps until I find the first major culprit.

Avi - we discussed a few things late in the night last night, including whether or not we're being affected by some weird paint starvation behaviour. My spreadsheet includes the links to the compare-talos page (not the breakdown) that displays data for Win 7 and Win 8. Does that give you any clues as to whether paint starvation is a factor here?

(In reply to Avi Halachmi (:avih) from comment #38)

> You seem to treat it as if there was only a single regression. That's not necessarily the case.

I am completely aware that there might be more than 1 regression here. :) What I'm looking for is the first major regression.

(In reply to Avi Halachmi (:avih) from comment #40)
> (In reply to Mike Conley (:mconley) from comment #35)
> > ...
> > Anyhow, I've started the bisections from scratch, and I've again confirmed
> > that the curvy tabs *did not introduce the regression for XP*:
> 
> Am I missing the XP* comment?

Nope, you aren't. Basically, I restarted the bisection using frame_rate = 10000 (via a self-rolled talos build), and re-confirmed that the regression was not caused by the initial landing of the curvy tabs for Windows.

> 
> Also, in your next talos-compare links, could you please post the links to
> the "overview" comparison page instead of the "Details"?

Sure - it's in the spreadsheet now.
(Reporter)

Comment 43

5 years ago
So, got some explanation. TL;DR: so far we're good:

1. Paint starvation affects older builds much less than recent builds.
2. The UX and m-c builds, on both win7 and XP, are affected in the same way.
3. The grand improvement on win7 seems valid (locally I reproduced a UX win as well with the 2013-03-08 build, but with lesser magnitude).

On recent builds, if we don't prevent starvation explicitly, the whole TART run is a slideshow and all the results are 0.

On older builds (I tested locally and also looked at most of the bisected builds tart logs between 2013-03-08 and 2013-05-20), it only affects the first 2 animations (simple-open and simple-close) which report intervals of 0ms, and only on the first-of-25 talos run, and it does so both with UX and with m-c, so the comparisons so far are valid.

The only result which appears noisy with these older builds so far is icon-open-DPI1 (.half/all/error), so I suggest to ignore it for now.

So it seems that somewhere between 2013-05-20 and today, the effect of paint starvation on TART changed dramatically.

If this change happened after 06-20, then we're in the clear, because from that date on starvation is avoided: the pref we use is respected.

If it happened earlier than that, then expect different results before/after that date.

However, regardless of when it happens, results before/after 06-20 are expected to be at least slightly different, since before this date the builds are at least slightly affected by starvation, and after this date they're not.

Comment 44

5 years ago
Avi requested the graphics hardware profiles for our Talos slaves, so here they are:


Windows XP:

{
  "numTotalWindows": 1,
  "numAcceleratedWindows": 1,
  "windowLayerManagerType": "Direct3D 9",
  "adapterDescription": "NVIDIA GeForce GT 610",
  "adapterVendorID": "0x10de",
  "adapterDeviceID": "0x104a",
  "adapterRAM": "Unknown",
  "adapterDrivers": "nv4_disp",
  "driverVersion": "6.14.13.1407",
  "driverDate": "2-9-2013",
  "adapterDescription2": "",
  "adapterVendorID2": "",
  "adapterDeviceID2": "",
  "adapterRAM2": "",
  "adapterDrivers2": "",
  "driverVersion2": "",
  "driverDate2": "",
  "isGPU2Active": false,
  "direct2DEnabled": false,
  "directWriteEnabled": false,
  "directWriteVersion": "0.0.0.0",
  "direct2DEnabledMessage": [
    ""
  ],
  "webglRenderer": "Google Inc. -- ANGLE (NVIDIA GeForce GT 610)",
  "info": {
    "AzureCanvasBackend": "skia",
    "AzureFallbackCanvasBackend": "cairo",
    "AzureContentBackend": "none"
  }
}

Windows 7:

{
  "numTotalWindows": 1,
  "numAcceleratedWindows": 1,
  "windowLayerManagerType": "Direct3D 10",
  "adapterDescription": "NVIDIA GeForce GT 610",
  "adapterVendorID": "0x10de",
  "adapterDeviceID": "0x104a",
  "adapterRAM": "1023",
  "adapterDrivers": "nvd3dum nvwgf2um,nvwgf2um",
  "driverVersion": "9.18.13.1407",
  "driverDate": "2-9-2013",
  "adapterDescription2": "",
  "adapterVendorID2": "",
  "adapterDeviceID2": "",
  "adapterRAM2": "",
  "adapterDrivers2": "",
  "driverVersion2": "",
  "driverDate2": "",
  "isGPU2Active": false,
  "direct2DEnabled": true,
  "directWriteEnabled": true,
  "directWriteVersion": "6.1.7601.17514",
  "webglRenderer": "Google Inc. -- ANGLE (NVIDIA GeForce GT 610)",
  "info": {
    "AzureCanvasBackend": "direct2d",
    "AzureFallbackCanvasBackend": "cairo",
    "AzureContentBackend": "direct2d"
  }
}

Windows 8:

{
  "numTotalWindows": 1,
  "numAcceleratedWindows": 1,
  "windowLayerManagerType": "Direct3D 10",
  "adapterDescription": "NVIDIA GeForce GT 610",
  "adapterVendorID": "0x10de",
  "adapterDeviceID": "0x104a",
  "adapterRAM": "1024",
  "adapterDrivers": "nvd3dumx,nvwgf2umx,nvwgf2umx nvd3dum,nvwgf2um,nvwgf2um",
  "driverVersion": "9.18.13.1090",
  "driverDate": "12-29-2012",
  "adapterDescription2": "",
  "adapterVendorID2": "",
  "adapterDeviceID2": "",
  "adapterRAM2": "",
  "adapterDrivers2": "",
  "driverVersion2": "",
  "driverDate2": "",
  "isGPU2Active": false,
  "direct2DEnabled": true,
  "directWriteEnabled": true,
  "directWriteVersion": "6.2.9200.16384",
  "webglRenderer": "Google Inc. -- ANGLE (NVIDIA GeForce GT 610)",
  "info": {
    "AzureCanvasBackend": "direct2d",
    "AzureFallbackCanvasBackend": "cairo",
    "AzureContentBackend": "direct2d"
  }
}
Depends on: 915352

Comment 45

5 years ago
Historical TART runs on nightlies are currently running on the borrowed talos slave(s). Results are being submitted to my graph server and logs will be uploaded to my people account when the runs are done. I started at the first nightly after the UX branch was reset to a clean state ready to be used for integration purposes as prior UX nightlies are built on revisions that are no longer in public mercurial repos. It will take about 24 hours for each branch to complete.

Graph link: http://graphs.mattn.ca/graph-local.html#tests=[[293,59,11],[293,1,11]]&sel=none&displayrange=365&datatype=running
Log/result output: https://people.mozilla.org/~mnoorenberghe/talos-tart-results/

Automation Setup:
* modified mozregression: https://github.com/mnoorenberghe/mozregression/tree/just_download
* script: https://hg.mozilla.org/users/mozilla_noorenberghe.ca/talos-tart/file/tip/tart-nightlies.sh
* talos-tart patch: https://hg.mozilla.org/users/mozilla_noorenberghe.ca/talos-tart/file/tip/nightlies.patch

Some more notes are in the australis-perf-standup etherpad[1] for 2013-09-12.

Let me know if you have any questions.

[1] https://etherpad.mozilla.org/australis-perf-standup

Comment 46

5 years ago
This is great! Thanks so much, Matt.
(Reporter)

Comment 47

5 years ago
(In reply to Matthew N. [:MattN] (away Sep. 16 - 22+) from comment #45)
> Graph link:
> http://graphs.mattn.ca/graph-local.html#tests=[[293,59,11],[293,1,11]]&sel=none&displayrange=365&datatype=running

I should still verify that the results are valid by looking at the logs when they become available, but if we trust the results, and despite their noise, I think we can say a few things already:

1. Until May 2nd: UX was worse than m-c.
2. From May 3rd until May 17th: UX was better than m-c.
3. From May 18th to mid July (we don't currently have more m-c data): UX gradually improved until it got roughly on par with m-c.

From this, we could derive:

1. Understand what happened between May 2nd and May 3rd (make sure we carefully match those dates to the relevant hg pushes), and learn from this what kind of change could improve UX performance.

2. Find out what happened between May 17th and 18th, and try to fix this regression.

3. Pointing at specific commits regressions from May 18th to mid July is going to be hard and should probably not be our first priority.

Looking forward to the rest of the m-c results coming in.
(Reporter)

Comment 48

5 years ago
(In reply to Avi Halachmi (:avih) from comment #47)
> 3. Pointing at specific commits regressions from May 18th to mid July is
> going to be hard and should probably not be our first priority.

Ermm.. this is an improvement range. So s/regressions/improvements/
Depends on: 916859
Depends on: 916946
Depends on: 917795
No longer depends on: 907546

Comment 49

5 years ago
Here's a recent profile from latest UX tip (e5e735235d91)

http://people.mozilla.org/~bgirard/cleopatra/#report=6a24f6de52bbc88d6f1b21e314b7c37dcd5750a0

This is directly from one of our XP talos slaves, and side-steps the pseudostack issue (bug 900524).

Comment 50

5 years ago
(In reply to Avi Halachmi (:avih) from comment #47)
> 1. Until May 2nd: UX was worse than m-c.
> 2. From May 3rd until May 17th: UX was better than m-c.
> 3. From May 18th to mid July (we don't currently have more m-c data): UX
> gradually improved until it got roughly on par with m-c.

Erm, I think there's a word wrong somewhere in here -- until May 2, UX performance was worse than m-c perf.  Between May 3 and May 17, the above says that UX performance was better than m-c; after May 18 it says that UX performance got better until it was the same as m-c.

But ux was already better than m-c according to the above.. so how could it improve to match?
(Reporter)

Comment 51

5 years ago
(In reply to Vladimir Vukicevic [:vlad] [:vladv] from comment #50)
> Erm, I think there's a word wrong somewhere in here ..

Sort of. The missing bit is that from May 17th to 18th UX regressed to slightly worse than m-c, and then gradually improved roughly to the same level as m-c at mid-July.

Since then, they've been roughly the same, with both improving meaningfully on Sep 6th.

All this info is by looking at the graph produced by running TART on existing historic m-c/ux nightlies, as described in comment 45:

> http://graphs.mattn.ca/graph-local.html#tests=[[293,59,11],[293,1,11]]&sel=1365305565548.1204,1379503308368.197,2.802507488193797,4.077134353865438&displayrange=365&datatype=running


> But ux was already better than m-c according to the above.. so how could it
> improve to match?

Not better, but indeed seems on par. However, this graph shows only the average from all TART's subtests. Further examination of individual subtests shows that UX is better than m-c in some areas, but worse in others.

Specifically, UX has better pure animation throughput, but OTOH, it does more things (unrelated to animation but still affecting it) on tab close, and as a result, tab close animation on UX is visibly glitchier than on m-c.

This was first identified in bug 911431, and then tracked down to a specific revision in bug 916946.
Depends on: 919541
Created attachment 809437 [details]
ux-profile-tart-winxp-351d311e107c.txt.zip

A more up-to-date reflow profile.
Attachment #800836 - Attachment is obsolete: true
Created attachment 809439 [details]
mc-profile-tart-winxp-579bfedd91f6.txt.zip

And one for m-c as well.
Attachment #800782 - Attachment is obsolete: true
Depends on: 920589
Depends on: 921038
Depends on: 921051
Depends on: 922207
We're starting to gather data on our OS X regressions, and developing a gameplan on how to deal with them. Reflow profiles coming up soon.
OS: Windows 7 → All
Created attachment 813306 [details]
mc-merged-profile-tart-snowleopard-e9b79f481864.txt.zip

mozilla-central reflow profile for Snow Leopard
Created attachment 813308 [details]
ux-merged-profile-tart-snowleopard-34cbdba2e0de.txt.zip

UX reflow profile for Snow Leopard
Created attachment 813368 [details]
ux-merged-profile-tart-lion-34cbdba2e0de.txt.zip
Created attachment 813370 [details]
mc-merged-profile-tart-lion-e9b79f481864.txt.zip
Created attachment 813371 [details]
ux-merged-profile-tart-mountainlion-34cbdba2e0de.txt.zip
Created attachment 813411 [details]
mc-merged-profile-tart-mountainlion-e9b79f481864.txt.zip
Created attachment 814170 [details]
m-c vs UX reflow comparison profile - Lion
Depends on: 924181
Depends on: 924182
Depends on: 924201
Depends on: 924415
Created attachment 814557 [details]
compare-snowleopard-03c32421e063-4a4ea81339a7.txt.zip

A comparison profile of more recent m-c and UX baseline profiles.
Depends on: 925413
Depends on: 925415

Updated

5 years ago
No longer depends on: 925415
The investigations have been completed, and the last TART blocker has been lifted.
Whiteboard: [Australis:P1][Australis:M9]
(Reporter)

Updated

5 years ago
Depends on: 936469
Depends on: 938742
Depends on: 938754
(Reporter)

Comment 65

4 years ago
We can close this. Australis has been the default for a long time now, it performs very well, and comparison with pre-Australis performance is no longer relevant.

Thanks Mike!
Status: NEW → RESOLVED
Last Resolved: 4 years ago
Resolution: --- → FIXED