Bimodal results in WinXP Talos tscrollx (tscroll-ASAP MozAfterPaint)

RESOLVED WONTFIX

Status

RESOLVED WONTFIX
6 years ago
a year ago

People

(Reporter: mbrubeck, Unassigned)

Tracking

Trunk
x86_64
Windows 8

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [SfN])

(Reporter)

Description

6 years ago
The tscrollx test is producing bimodal results on Win7 (WINNT 6.1 (ix)) and Win8 (WINNT 6.2 x64), but not on XP/Mac/Linux:

http://graphs.mozilla.org/graph.html#tests=[[279,131,25],[287,1,33],[287,1,37],[287,1,22],[287,1,21],[287,1,31]]&sel=1374701984336,1377293984336&displayrange=30&datatype=running

This is causing false alarms from analyze_talos.py:
https://groups.google.com/d/msg/mozilla.dev.tree-management/hMQnCuKKFTU/IjhIn4QBf7oJ
(Reporter)

Updated

6 years ago
Blocks: 883150
Just repeating what I've said on IRC: I think we have bimodal results on many tests for some weeks now, and not only on tscrollx.

But I'll take a closer look at this.
(Reporter)

Comment 2

6 years ago
Yes, we have four other tests with known bimodal results (bug 859571, bug 751975, bug 860785, bug 901713).  Bug 859571 might be related because it is limited to the same two platforms (Win7 and Win8).
Adding some links by mbrubeck:

Some bimodal results in tscroll-ASAP mozafterpaint on few platforms:
http://graphs.mozilla.org/graph.html#tests=[[279,131,25],[279,1,37],[279,1,22],[279,1,33],[279,1,21],[279,1,31]]&sel=none&displayrange=7&datatype=running

Some bimodal results of tscroll-ASAP on few platforms:
http://graphs.mozilla.org/graph.html#tests=[[287,1,31],[287,1,37],[287,1,33],[287,1,22],[287,1,21]]&sel=1374701984336,1377293984336&displayrange=30&datatype=running

Possibly related: bug 908700 (make sure mozAfterPaint is used correctly).

jmaher, why do we have tscrollx both with and without mozAfterPaint?
Flags: needinfo?(jmaher)
(Reporter)

Comment 4

6 years ago
(In reply to Avi Halachmi (:avih) from comment #3)
> jmaher, why do we have tscrollx both with and without mozAfterPaint?

It looks like we switched from non-MozAfterPaint to MozAfterPaint a few days ago, with the latest Talos deployment.  We haven't been running both tests at the same time.
Flags: needinfo?(jmaher)
(In reply to Matt Brubeck (:mbrubeck) from comment #2)
> Yes, we have four other tests with known bimodal results (bug 859571, bug
> 751975, bug 860785, bug 901713).  Bug 859571 might be related because it is
> limited to the same two platforms (Win7 and Win8).

Is there a meta bug for those (and this) bug? if they all start around the same time, then looking at specific tests might be the wrong places to look.

(In reply to Matt Brubeck (:mbrubeck) from comment #4)
> It looks like we switched from non-MozAfterPaint to MozAfterPaint a few days
> ago, with the latest Talos deployment.  We haven't been running both tests
> at the same time.

Ah, thanks. Let's handle this on bug 908700 (I think tscrollx doesn't need mozAfterPaint).
(Reporter)

Comment 6

6 years ago
(In reply to Avi Halachmi (:avih) from comment #5)
> Is there a meta bug for those (and this) bug? if they all start around the
> same time, then looking at specific tests might be the wrong places to look.

I haven't created a meta bug since they mostly don't seem to have a common cause or common regression date.  (Keyword search http://bugzil.la/:Talos+bimodal finds them easily enough, though.)
Looking at this graph:
http://graphs.mozilla.org/graph.html#tests=[[287,94,31]]&sel=none&displayrange=30&datatype=running

It appears to me that the bimodality started on Aug 1st, 19:45. IIRC tscrollx was not modified around that time.

Joel, do you recall otherwise?

Also, I tried to look at a few tscrollx bimodal results in datazilla. It seems to me that, except for rare and random single outliers, all the results per run (25 values) are quite consistent. However, different runs (on different machines) could have different consistent values, which typically fall into one of the two ranges of this bimodality.

Unrelated to the bimodality, the regression on Aug 6th which magnifies the bimodality is due to an improvement of scrollx (and tsvgx) which made them take more stuff into account (by preventing paint starvation). See bug 908741.


I could use some help with datazilla or otherwise with trying to correlate these 2 sets of results with machine IDs.
Flags: needinfo?(mbrubeck)
Flags: needinfo?(jmaher)
From looking at this, there was a talos update at the same time (https://tbpl.mozilla.org/?rev=a0dd80f800e2), and in that talos update was:
http://hg.mozilla.org/build/talos/rev/4063ef2a221e

Is it possible that these updated prefs are causing more trouble than we realize?
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #8)
> From looking at this, there was a talos update at the same time ...

The Aug 6th changes at the graph are the ASAP prefs updates: added prevention of paint starvation with the pref docshell.event_starvation_delay_hint = 1. layout.frame_rate was already set to 0/10000 a while before that date.

Joel also says that starting Aug 1st (up to Aug 21st IIRC), the graphs confused PGO with non-PGO results, which could be why I mistook the extra noise there for the introduction of bimodality. In that case, it's possible that the bimodality indeed started on Aug 6th, caused by the paint starvation prevention.

These are 2 tscrollx runs, each with 25 tppagecycles, representing the 2 bimodal result buckets on win7 opt:

Low results:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=aef6bbfe9db3

11:30:02     INFO -  |0;tiled.html;2.51;2.65;2.48;2.58;2.63;2.61;2.47;2.59;2.52;2.53;2.54;2.62;2.56;2.63;2.61;2.62;2.58;2.63;2.58;2.47;2.60;2.58;2.64;2.61;2.61

11:30:02     INFO -  |1;tiled-fixed.html;3.18;3.14;3.19;3.16;3.15;3.14;3.17;3.17;3.15;3.18;3.12;3.16;3.50;3.14;3.17;3.20;3.19;3.13;3.16;3.18;3.16;3.18;3.18;3.16;3.19

11:30:02     INFO -  |2;tiled-downscale.html;2.60;2.5;2.45;2.58;2.58;2.59;2.44;2.46;2.54;2.50;2.53;2.47;2.56;2.59;2.58;2.58;2.57;2.59;2.78;3.00;2.45;2.59;3.02;3.12;2.57

11:30:02     INFO -  |3;tiled-fixed-downscale.html;3.27;3.63;3.31;3.27;3.56;3.32;4.53;3.47;3.27;3.49;3.28;3.28;3.77;3.32;3.28;3.51;3.71;3.29;3.49;3.31;3.26;3.84;3.27;3.31;3.27

11:30:02     INFO -  |4;iframe.svg;5.93;5.81;5.89;5.87;5.81;5.86;5.87;5.89;5.85;5.87;5.86;5.89;5.82;5.81;5.80;5.87;5.86;5.85;5.86;5.86;5.85;5.96;5.97;5.84;5.86

11:30:02     INFO -  |5;reader.htm;10.16;5.09;5.09;5.09;5.07;5.10;5.07;5.07;5.06;5.04;5.07;5.10;5.07;5.05;5.07;5.07;5.10;5.08;5.03;5.05;5.10;5.07;5.13;5.10;5.10


High results:
https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=6eb4d1008076

09:10:03     INFO -  |0;tiled.html;3.72;3.63;3.63;3.61;3.65;3.66;3.64;3.64;3.62;3.64;3.65;3.66;3.65;3.65;3.67;3.66;3.66;3.64;3.66;3.62;3.62;3.64;3.63;3.66;3.65

09:10:03     INFO -  |1;tiled-fixed.html;4.52;4.48;4.50;4.43;4.51;4.41;4.52;4.73;4.45;4.48;4.51;4.50;4.51;4.50;4.51;4.52;4.51;4.52;4.49;4.44;4.51;4.48;4.51;4.51;4.44

09:10:03     INFO -  |2;tiled-downscale.html;3.95;3.93;3.91;3.91;3.92;3.92;3.92;3.92;3.93;3.91;3.93;3.93;3.90;3.93;3.91;3.90;3.92;3.92;3.92;3.92;4.72;4.15;3.92;4.41;3.93

09:10:03     INFO -  |3;tiled-fixed-downscale.html;5.23;5.27;5.25;5.20;5.26;5.21;5.05;5.07;4.98;5.04;5.11;5.12;5.62;5.93;5.13;5.23;5.14;5.09;5.23;5.05;5.03;4.84;4.90;4.95;4.94

09:10:03     INFO -  |4;iframe.svg;5.96;5.83;5.89;5.86;5.92;5.87;18.50;5.86;5.86;5.85;5.85;5.84;5.84;5.83;5.82;5.92;5.84;5.85;5.84;5.87;5.83;5.85;5.84;5.87;5.81

09:10:03     INFO -  |5;reader.htm;6.65;6.90;6.88;6.89;6.91;6.87;6.87;6.93;6.87;6.87;6.89;6.91;6.89;6.91;6.87;6.90;6.89;6.88;6.88;6.93;6.95;6.90;6.86;6.92;6.88


As you can see, other than very rare outliers, each subtest is quite stable, and the entire run (25-cycles x all-subtests) is either high or low results. It's like Firefox is either in or out of "fast mode", which doesn't change throughout the whole run.

Also, Joel managed to find 2 runs on the same machine where each run went clearly into a different bucket of this bimodality. I don't have the run IDs, but I think that for now we can rule out the machine-dependent bucket hypothesis.

As for paint starvation prevention and/or ASAP mode in general (see bug 908741), while both are new and don't have a long history of usage, I _think_ they shouldn't cause bimodality. However, it seems that currently the evidence is pointing specifically at the pref setup of docshell.event_starvation_delay_hint = 1.

smaug, tn, what say you?
Flags: needinfo?(mbrubeck) → needinfo?(bugs)
Just noticed: in comment 9 it appears that all subtests suffer from being bimodal, except for iframe.svg. The iframe.svg results are around 5.9 on both runs.

All the other tests are higher by about 50% on the "high results" run.
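
A minimal Python sketch of the comparison made in comments 9 and 10, assuming the Talos log format shown there (`|index;name;value;...`). The helper names are hypothetical and not part of Talos; the data lines are copied from the tiled.html and iframe.svg rows of the two runs above. The median is used so that a single outlier such as 18.50 does not skew the per-run summary.

```python
from statistics import median

# Log lines copied from the "low" and "high" runs quoted in comment 9.
LOW_RUN = [
    "11:30:02     INFO -  |0;tiled.html;2.51;2.65;2.48;2.58;2.63;2.61;2.47;2.59;2.52;2.53;2.54;2.62;2.56;2.63;2.61;2.62;2.58;2.63;2.58;2.47;2.60;2.58;2.64;2.61;2.61",
    "11:30:02     INFO -  |4;iframe.svg;5.93;5.81;5.89;5.87;5.81;5.86;5.87;5.89;5.85;5.87;5.86;5.89;5.82;5.81;5.80;5.87;5.86;5.85;5.86;5.86;5.85;5.96;5.97;5.84;5.86",
]
HIGH_RUN = [
    "09:10:03     INFO -  |0;tiled.html;3.72;3.63;3.63;3.61;3.65;3.66;3.64;3.64;3.62;3.64;3.65;3.66;3.65;3.65;3.67;3.66;3.66;3.64;3.66;3.62;3.62;3.64;3.63;3.66;3.65",
    "09:10:03     INFO -  |4;iframe.svg;5.96;5.83;5.89;5.86;5.92;5.87;18.50;5.86;5.86;5.85;5.85;5.84;5.84;5.83;5.82;5.92;5.84;5.85;5.84;5.87;5.83;5.85;5.84;5.87;5.81",
]


def parse_subtest_line(line):
    """Split one log line into (subtest index, page name, list of cycle values)."""
    payload = line.split("|", 1)[1]          # drop the timestamp/INFO prefix
    index, name, *values = payload.strip().split(";")
    return int(index), name, [float(v) for v in values]


def medians_by_subtest(lines):
    """Map each page name to the median of its cycle values for one run."""
    return {name: median(values)
            for _, name, values in map(parse_subtest_line, lines)}


low = medians_by_subtest(LOW_RUN)
high = medians_by_subtest(HIGH_RUN)
for name in low:
    ratio = high[name] / low[name]
    print(f"{name}: low={low[name]:.2f} high={high[name]:.2f} ratio={ratio:.2f}")
```

On this data, tiled.html separates clearly between the two runs, while iframe.svg stays nearly identical.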
This is datazilla for tscrollx on winnt 6.1 with all the subtests: https://datazilla.mozilla.org/?start=1377437211&stop=1378042011&product=Firefox&repository=Mozilla-Inbound&os=win&os_version=6.1.7601&arch=x86_64&test=tscrollx&graph_search=85c661620e26,0b408255c923,35d97694d436&tr_id=2655035&graph=iframe.svg&project=talos

iframe.svg is also bimodal, but:
1. Bimodal behavior is with lower relative magnitude.
2. While all other tests show very high correlations in their high/low buckets, iframe.svg has several cases where the other tests are high, while iframe.svg is either at the "low" bucket or slightly higher than low, but still not at iframe.svg's high bucket.

Also, iframe.svg is the only tscrollx test which uses an iframe and SVG.

It might be useful to compare tscrollx with tscroll, but I couldn't find tscroll (or tscrollr) on datazilla's UI, and it also didn't work when I tried to modify the URL directly.

Joel, can we get old tscroll data on datazilla?
Flags: needinfo?(jmaher)
There's some tscrollr datazilla data on m-c (not on inbound). tscrollr pretty much flatlines at 16.67, so it can't help us. Its only subtest which isn't strictly flat is tiled-fixed.html, but that appears to be noise rather than bimodality.

https://datazilla.mozilla.org/?start=1377437211&stop=1378042011&product=Firefox&repository=Firefox&os=win&os_version=6.1.7601&arch=x86_64&test=tscrollr&graph_search=85c661620e26,0b408255c923,35d97694d436,01576441bdc6,60c382ba1773,4887845b1142,14b1e8c2957e&tr_id=2614833&graph=tiled-fixed.html&project=talos
Flags: needinfo?(jmaher)
Does the bimodality come from test runs where we maybe don't get any pixels pushed to the screen for some reason during testing?
(In reply to Timothy Nikkel (:tn) from comment #14)
> Does the bimodality come from test runs where we maybe don't get any pixels
> pushed to the screen for some reason during testing?

I guess it's possible. How can we test this hypothesis?
Can we use the new missed frames api?
If you're talking about bug 900785, then it hasn't landed yet since it proved to be not 100% accurate. Part 1 and 2 which did land are only refactoring in preparation. Part 3 is the actual new missed frames API, and it hasn't landed yet.

We might, however, be able to use the frames recording API which uses PostPresent to capture frames timestamps, so it might indicate non-presented frames (last enhanced on bug 826383, though this enhancement doesn't affect intervals recording - only adds paint durations).

It should be noted that the recording API is currently broken with OMTC (see bug 826383 comment 15), but since we only notice this bimodal behavior on windows, we could add recorded frame intervals to the log of tscrollx, and it might give us some indication on what's going on.

I'll push this change, thanks for the suggestion.
(Reporter)

Comment 18

6 years ago
At least in Windows PGO builds, there seems to be some bimodal behavior at build time that leads to bimodal results in several different tests.  That is, a given build will have either "good" performance across all tests, or "bad" performance across all tests.  For example, see how the same builds that perform worse at tscroll also perform worse at tresize:

graphs.mozilla.org/graph.html#tests=[[287,63,31],[254,63,31]]&sel=1378247879750,1378741306378
Very interesting, and a great find!

Since the difference between the buckets can be fairly big in some tests, do we want to consider that the "high" buckets are possibly non-PGO'ed? How far are the high buckets from the non-PGO results?
Hmm.. the bimodal behavior exists in non-PGO as well, and the PGO results seem reasonably better than the non-PGO ones.

I'd say it's not a PGO related issue.

http://graphs.mozilla.org/graph.html#tests=[[287,63,31],[254,63,31],[287,131,31],[254,131,31]]&sel=1378247879750,1378741306378&displayrange=7&datatype=running
Hmm.. maybe too early to declare, but a recent 13.9% improvement of tresize on win8 ( https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/WPOFdhh58p4 ) possibly also reduced/eliminated the bimodal behavior of tscrollx, tresize on win7 and of tscrollx, tresize, tart on win8 : http://graphs.mozilla.org/graph.html#tests=[[254,63,31],[293,63,31],[287,63,31],[287,63,25],[254,63,25]]&sel=1377349395677,1379941395677&displayrange=30&datatype=running
A promising early trend.  Great find.  Give this a full week of data and we might be able to call this fixed.
Yeah, the problem is that I can't see anything obvious among the bugs/pushes which could have affected those:

Bug 912321 - JS shell should be able to synchronize on off-thread compilations
Bug 913282 - IonMonkey: moar Float32 specializations
Bug 831285 - DLL block request: beid35cardlayer.dll 3.5.6.6968 and below
Bug 870406 - Move CSRCS to moz.build
Bug 501739 - String match and replace methods do not update global regexp lastIndex per ES3&5
Bug 918882 - nsMemoryReporterManager.cpp:1019:14: warning: unused variable 'rv' [-Wunused-variable]
Bug 832045 - [B2G] get_attribute('disabled') returns false regardless of disabled attribute
Bug 918869 - "Error populating virtualenv" building with objdir on a different drive
Bug 918519 - Add macro versions of GetMainComponent/GetCrossComponent, to avoid unnecessarily evaluating expensive function calls
Bug 918815 - Use xorps for float32 zero constants on x86
Bug 918645 - crash in java.lang.SecurityException: WifiService: Neither user 10061 nor current process has android.permission.ACCESS_WIFI_STATE. at android.os.Parcel.readException(Parcel.java)
(In reply to Avi Halachmi (:avih) from comment #22)
> Hmm.. maybe too early to declare, but a recent 13.9% improvement ...

Too early indeed. Results are bimodal again: http://graphs.mozilla.org/graph.html#tests=[[287,63,31],[287,63,25],[293,63,31],[254,63,25],[254,63,31],[279,63,31],[279,63,25]]&sel=1377681750148,1380273750148&displayrange=30&datatype=running

Interestingly, this is not the first time it has happened. Apparently, during weekends the bimodal behavior goes away (on that graph, also Sep 15, 8, 1).

My only hypothesis on this so far is that the bimodal behavior is not evenly split between the two buckets (for instance, it's maybe 80/20 towards one of the buckets), and since fewer tests are run during weekends, they're just less likely to hit the less frequent bucket.

However, if the bucket frequencies are indeed different enough to make the less frequent bucket disappear during weekends, this is not something I could tell by glancing at the graph above.

Any thoughts on this?
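
A rough way to put numbers on this hypothesis, assuming each run lands in the less frequent bucket independently with probability p (the 80/20 split gives p = 0.2, which is itself only a guess) and that n runs happen over a weekend:

$$
P(\text{no run in the less frequent bucket}) = (1 - p)^{n}
$$

$$
p = 0.2:\quad n = 5 \Rightarrow 0.8^{5} \approx 0.33,\qquad n = 10 \Rightarrow 0.8^{10} \approx 0.11,\qquad n = 20 \Rightarrow 0.8^{20} \approx 0.012
$$

Under these assumptions, a weekend with only a handful of runs could plausibly show a single bucket, while a weekend with 20 or more runs would rarely do so.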
Very interesting observation about the weekends. I see it on the graph as well.  This sounds like an interesting route to take in investigating test noise (including bimodal).  I see a couple possibilities:

1) Not enough data points to show the noise, although between two weekend days one would think we would get at least one data point - more investigation into this is needed

2) The machines are used 100% during the week and on weekends they are not at 100% capacity.  Maybe the hardware/os needs more downtime?

3) The datacenter is peaked during the week and we have power fluctuations or intermittent network conditions causing odd data.  I really don't think this is a problem, but we did see power issues under load for the panda boards.


I would like to look at this on datazilla and see which specific test files are doing this and if that is a pattern or random.  It would be nice to know the frequency with which we hit bi-modal data, i.e. 1 out of 10 or 1 out of 50.  If that is something we can quantify, then we can see if the weekend runs generate enough traffic, or maybe queue up a bunch of patches for the weekend and see what happens.
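
One way to judge whether weekend traffic is enough, sketched in Python under the assumption that runs are independent and hit the rarer bucket with a fixed probability. `runs_needed` is a hypothetical helper name; the 1-in-10 and 1-in-50 rates are the examples from the comment above, and the 95% confidence level is an arbitrary choice.

```python
import math


def runs_needed(p_minority, confidence=0.95):
    """Smallest number of runs for which at least one run lands in the
    rarer bucket with the given confidence, assuming independent runs."""
    if not 0 < p_minority < 1:
        raise ValueError("p_minority must be strictly between 0 and 1")
    if not 0 < confidence < 1:
        raise ValueError("confidence must be strictly between 0 and 1")
    # Solve (1 - p)^n <= 1 - confidence for the smallest integer n.
    return math.ceil(math.log(1 - confidence) / math.log(1 - p_minority))


for rate in (1 / 10, 1 / 50):
    print(f"rarer bucket rate {rate:.2f}: {runs_needed(rate)} runs")
```

If a weekend produces fewer runs than this per platform and test, the absence of the rarer bucket would not say much on its own.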
(In reply to Joel Maher (:jmaher) from comment #26)
> Very interesting observation about the weekends...

Actually, it was vladan's observation. I just put it into words and came up with one hypothesis.
(Reporter)

Comment 28

6 years ago
TResize on Ubuntu also has slow runs that appear only on weekdays:

http://graphs.mozilla.org/graph.html#tests=[[254,131,33]]&sel=none&displayrange=90&datatype=running

Another random theory: Could some of our racks be heating up so much under peak load that some slaves are experiencing thermal throttling of CPU frequencies?
Summary: Bimodal results in Talos tscrollx (tscroll-ASAP MozAfterPaint) on Windows causing false regression alarms → Bimodal results in Talos tscrollx (tscroll-ASAP MozAfterPaint) on Windows causing false regression alarms, on weekdays only
Can we check if the bimodal behavior happens only on some datacenters but not on others? In that case, can we test increasing cooling (air conditioning) during weekdays for those which show bimodal/noisy results? (Ubuntu tresize shows noise rather than bimodality during weekdays: http://graphs.mozilla.org/graph.html#tests=[[254,131,33]]&sel=1378557373666.692,1380471218547.7673,15.426866218225278,24.919403531658116&displayrange=90&datatype=running )
And/or specific racks or even specific locations within racks (e.g. the very bottom, etc). Maybe some of them just don't get enough cooling.

Also, one of the earlier hypotheses was that it was machine-dependent, and IIRC Joel found at least one machine which showed both buckets (on different runs), so this theory got dropped. However, with this new data showing that it's not bimodal during weekends, it could still be machine-dependent during weekdays.

Can we somehow select the ranges where results are bimodal, and on those, correlate machines to buckets (or to >10% regression on tresize in the graph above)?
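
A minimal Python sketch of such a correlation, assuming we can export (machine, result) pairs for the bimodal date range. The machine names, values, and the 3.0 threshold below are made up for illustration and are not real Talos data; the function names are hypothetical as well.

```python
from collections import Counter, defaultdict

# Hypothetical (machine, run result) pairs; real values would come from datazilla.
RUNS = [
    ("machine-a", 2.6), ("machine-a", 3.7), ("machine-a", 2.5),
    ("machine-b", 3.6), ("machine-b", 3.7),
    ("machine-c", 2.6), ("machine-c", 2.5),
]


def bucket_counts(runs, threshold):
    """Count, per machine, how many runs fall below ("low") or at/above
    ("high") the threshold that separates the two buckets."""
    counts = defaultdict(Counter)
    for machine, value in runs:
        counts[machine]["high" if value >= threshold else "low"] += 1
    return counts


def machines_in_both_buckets(counts):
    """Machines that produced results in both buckets."""
    return sorted(m for m, c in counts.items() if c["low"] and c["high"])


counts = bucket_counts(RUNS, threshold=3.0)
for machine in sorted(counts):
    print(machine, dict(counts[machine]))
print("in both buckets:", machines_in_both_buckets(counts))
```

A machine that shows up in both buckets argues against a purely machine-dependent cause; machines that stay in one bucket would be worth mapping to racks or datacenters.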
(In reply to Joel Maher (:jmaher) from comment #26)
> Very interesting observation about the weekends. 
...
> 2) The machines are used 100% during the week and on weekends they are not
> at 100% capacity.  Maybe the hardware/os needs more downtime?
> 
> 3) The datacenter is peaked during the week and we have power fluctuations
> or intermittent network conditions causing odd data.  I really don't think
> this is a problem, but we did see power issues under load for the panda
> boards.

Correction: the datacenters (all 4 of them) are peaked mid-day of weekdays. However weekday morning/evening are medium load, while weekday night and weekends are very low load. Further data at http://oduinn.com/blog/2013/10/04/infrastructure-load-for-september-2013/

1) Are we also seeing this different runtime during week nights? Or is it only weekends?

2) For specific timings, are we talking about the duration differences in the test suite itself, or are these differences in the setup/teardown section of the job?
There is no way to tell from the graphs and data if it slows down at night. Look at comment 29 and let me know if you can figure out whether jobs during the night are less noisy.

The timing differences are the numbers reported from the talos tests, I have no idea about how long the jobs took or how long the setup/cleanup took.  All of that can be queried.

In the link to infrastructure load it does show 3 hours of low load and the rest is either medium or a lot (pushes/hour).  When that is averaged out with weekends, it might be misleading; it would be nice to look at the breakdown per day vs. averaged out over the entire month.  Another useful metric would be the backlog and how that looks over time. I would think that anytime we have >2 hours of no backlog (or machines idle, not getting ready to take the next job off the queue) we would probably see less noise.

I am open to other suggestions or interpretations of the data.
(In reply to Avi Halachmi (:avih) from comment #9)
> (In reply to Joel Maher (:jmaher) from comment #8)
> > From looking at this, there was a talos update at the same time ...
> 
> The Aug 6th changes at the graph are the ASAP prefs updates: added
> prevention of paint starvation with the pref
> docshell.event_starvation_delay_hint = 1. layout.frame_rate was already set
> to 0/10000 a while before that date.
Note,  contentsink may still affect to favor perf hint :/
Flags: needinfo?(bugs)
(In reply to Avi Halachmi (:avih) from comment #22)
> ... possibly also reduced/eliminated the bimodal behavior of
> tscrollx, tresize on win7 and of tscrollx, tresize, tart on win8 :
> http://graphs.mozilla.org/graph.html#tests=[[254,63,31],[293,63,31],[287,63,31],[287,63,25],[254,63,25]]&sel=1377349395677,1379941395677&displayrange=30&datatype=running

and http://graphs.mozilla.org/graph.html#tests=[[254,63,31],[293,63,31],[287,63,31]]&sel=1380512321714.7993,1385369306202.907,2.1850749912546625,18.065672006180037&displayrange=90&datatype=running

It looks as if Oct 22nd was the last bimodal result on Win8 for tscrollx, tresize and TART, and it's been quite consistent since.

Do we still have other bimodal results left which we could associate with this bug? If not, shall we close this bug?

Would still be nice to know what happened around Oct 22nd that fixed this... Did we have some IT changes around that time?

Since this only happened on windows machines, did we have some windows gfx change which might explain the improvement in results consistency?
Flags: needinfo?(mbrubeck)
Flags: needinfo?(matt.woodrow)
I would bet on bug 929473.  Lets close this bug.
(In reply to Joel Maher (:jmaher) from comment #35)
> I would bet on bug 929473...

The symptoms don't match as far as I can tell. Bug 929473 speaks of timeouts, while this bug is about bimodal results, where typically the entire run falls either into one bucket or into the other.

It might still be bug 929473, but I can't yet see any evidence or similarities other than the date.
(Reporter)

Comment 37

5 years ago
(In reply to Avi Halachmi (:avih) from comment #34)
> It looks as if Oct 22nd was the last bimodal result on Win8 for tscrollx,
> tresize and TART, and it's been quite consistent since.

That's also the same date we stopped getting bimodal results from Ts Paint on Win7/8 (bug 859571).  So far we haven't figured out why.
Flags: needinfo?(mbrubeck)
Nothing stands out to me in that range.
Flags: needinfo?(matt.woodrow)
Depends on: 943950
(In reply to Joel Maher (:jmaher) from comment #35)
> I would bet on bug 929473.

So to test if it's that bug, we decided to try to revert it (re-enable screen saver) for 2 days and see if bimodal returns. Screen saver should be turned on now for about an hour already, and we'll turn it off again before the weekend. See bug 943950.

So far (though quite preliminary) it seems bimodal results indeed returned, but with a caveat: when bimodal stopped on Oct 22, we were left exclusively with the "high bucket" (the worse results), so when bimodal returns, we expect a new lower (better) bucket to be introduced, but it appears that's not the case so far - in practice, a higher bucket appeared.

This could be due to bug 899785 (turn OMTC on by default for windows if supported), which could have introduced a real regression, together with the screen saver which introduces a lower bucket (possibly with similar value to a day or two ago), making it appear as if the screen saver (or OMTC on) introduced a higher bucket. We'll know for sure after we turn screen saver off again if we're left with the higher or lower of the two.

As for a hypothesis about the screen saver and better results (the lower bucket): despite bug 929473's symptoms of timeouts, we could maybe have had other symptoms where the tests would not time out. If that indeed happened, then while the screen saver kicks in, Windows doesn't need to actually update the screen during the tests (since it's either off or displaying the screen saver), so it can perform better, hence the lower results bucket.

http://graphs.mozilla.org/graph.html#tests=[[287,63,31],[293,63,31],[254,63,31],[287,63,25],[293,63,25],[254,63,25]]&sel=none&displayrange=90&datatype=running
Well, I don't think it was the screen saver. The behavior we've seen should be attributed to OMTC IMO, and it disappeared once OMTC on for Windows was backed out. Since then, I can't see even a hint of bimodal behavior...

http://graphs.mozilla.org/graph.html#tests=[[293,131,31],[293,131,25],[287,131,25],[287,131,31],[254,131,25],[254,131,31]]&sel=none&displayrange=7&datatype=running
agreed, we have enough data points to prove it.
from the 6 tests in comment 40, there are 2 tests which appear to be bi-modal and another 2 which are just noisy:
http://graphs.mozilla.org/graph.html#tests=[[254,131,31],[287,131,25],[287,131,31],[293,131,31]]&sel=1400589846533,1403181846533&displayrange=30&datatype=running

the bi-modal ones are win8 tscroll-asap, and tresize.
The original bug was filed for tscrollx on win8 on 2013-08, and at the time it had about 35% bimodal range (5/7).

Since November it changed into "normal" noise which doesn't hide normal sustained perf changes too much, so from this perspective we're good. It also wasn't affected (noise wise) by turning OMTC on on May 20.

As for comment 42, slight bimodality or noise is unavoidable. The important aspect to consider is whether or not this noise/bimodality magnitude is such that it hides "normal" day to day improvements/regressions.

If we look at the bimodal ones which you noted (win8 tresize, tscrollx) over the past 90 days: http://graphs.mozilla.org/graph.html#tests=[[287,131,31],[293,131,31],[287,131,25],[254,131,31]]&sel=none&displayrange=90&datatype=running

We can clearly see that occasional "normal" changes are very apparent, which means the noise level is low enough that it shouldn't bother us.

So, the 4 tests you mentioned in comment 42 have low enough bimodal/noise value that it shouldn't bother us.

The original bimodal of tscrollx/win8 is also fine since November.

I did notice, however, that tscrollx on Windows XP had low bimodal range of ~5% (3.55/3.73) since 2013-08, and that since 2014-May-20 (OMTC turned on) it grew considerably to 25% bimodal range (2.4/3.2) - though still with lower absolute numbers than before May 20.

Your call if you want to file a new bug for it or keep this one open for Windows XP since OMTC, and possibly before it as well.
I will leave this open for now, with an edit to the subject.  

In the last 5 months (data I have easy access to), there have been 4 WinXP tscrollx alerts, all of them related to the landing of OMTC:
+------+---------------------+----------------+-------------------------+--------------+---------+
| id   | date                | platform       | branch                  | test         | percent |
+------+---------------------+----------------+-------------------------+--------------+---------+
| 2426 | 2014-05-20 13:04:08 | WINNT 5.1 (ix) | Mozilla-Inbound-Non-PGO | tscroll-ASAP | +21.4%  |
| 2464 | 2014-05-21 12:23:18 | WINNT 5.1 (ix) | Mozilla-Inbound         | tscroll-ASAP | +22.3%  |
| 2514 | 2014-05-22 04:46:32 | WINNT 5.1 (ix) | Firefox-Non-PGO         | tscroll-ASAP | +20.6%  |
| 2989 | 2014-06-12 20:23:15 | WINNT 5.1 (ix) | Mozilla-Aurora          | tscroll-ASAP | +24.8%  |
+------+---------------------+----------------+-------------------------+--------------+---------+

you can see it landed on inbound, then a pgo build, merge to firefox (m-c), and finally uplifted to Aurora.

Given this data, do you think we should worry about it?
Summary: Bimodal results in Talos tscrollx (tscroll-ASAP MozAfterPaint) on Windows causing false regression alarms, on weekdays only → Bimodal results in QinXP Talos tscrollx (tscroll-ASAP MozAfterPaint)
(In reply to Joel Maher (:jmaher) from comment #44)
> I will leave this open for now, with an edit to the subject.  
> 
> In the last 5 months (data I have easy access to), there have been 4 WinXP
> tscrollx alerts, all of them related to the landing of OMTC:

By looking at this graph: http://graphs.mozilla.org/graph.html#tests=[[279,63,37],[287,63,37]]&sel=none&displayrange=90&datatype=running

There has been one XP tscrollx regression around May 22nd or 23rd, and these 4 alerts you mention are probably the same one on different branches.

> Given this data, do you think we should worry about it?

I don't think this bug relates to this data at all. The regression(s) is an increase in value, while this bug is about bimodal behavior - which didn't manifest as alerts AFAIK.

The only relevant data here is that:

1. As far as I can tell, all the tests for which this bug was filed are not bimodal anymore.

2. tscrollx on windows XP was bimodal since this bug was filed (not sure if it was mentioned throughout this bug or not), and much more bimodal after 2014 may 20th.

So we should probably keep this bug open for tscrollx/xp. But I don't have time to investigate it right now...
tscrollx is still bimodal on pretty much all windows platforms, with and without e10s.

https://treeherder.mozilla.org/perf.html#/graphs?timerange=604800&series=[mozilla-inbound,e5c782b789ec02e8add7dae8c97ca3f42ed46442,0]&series=[mozilla-inbound,c3eb69f0719aacca596bca0626205e4b30953034,0]&series=[mozilla-inbound,3a3ad5b5e3ff3577c96a133e09dce5d7a1eab55c,0]&series=[mozilla-inbound,7a8e5a11cfddd7a822b10605b1f0e9a045b93dad,1]&series=[mozilla-inbound,c33eba52ac3ad7d976bd2a972930490395f20b46,0]&series=[mozilla-inbound,3e4c0b1bffd401a58b2e625168176cffeb90c758,0]

(The graph for winXP e10s is not obviously bimodal when zoomed out, but if you zoom in it does still appear bimodal).
Summary: Bimodal results in QinXP Talos tscrollx (tscroll-ASAP MozAfterPaint) → Bimodal results in WinXP Talos tscrollx (tscroll-ASAP MozAfterPaint)
we don't run on winxp anymore.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → WONTFIX