Closed Bug 967766 Opened 10 years ago Closed 10 years ago

ts_paint has regressed a lot on fx-team between january 22 and february 4

Categories

(Testing :: Talos, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox29-, firefox30-)

RESOLVED WONTFIX
Tracking Status
firefox29 - ---
firefox30 - ---

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression][Australis:P-])

I am still investigating this, but right now I see this trend.

fx-team-non-pgo:
http://graphs.mozilla.org/graph.html#tests=[[83,132,37]]&sel=none&displayrange=30&datatype=running

fx-team (pgo):
http://graphs.mozilla.org/graph.html#tests=[[83,64,37]]&sel=none&displayrange=30&datatype=running

an alert on mozilla.dev.tree-management got me hot on the trail:
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/1dO2Jro6mTs

For the PGO one, I have done a couple of PGO builds to see what shakes out. This will require some time to get the builds and tests going.

Right now based on the graphs, I am seeing 2 possible regressions:
1) ~jan 22
2) ~feb 2
Alert for the Jan 22 regression (also includes a large tpaint regression):
https://groups.google.com/d/topic/mozilla.dev.tree-management/IiglnLLt-C0/discussion
Changeset range (bug 944947):
http://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=b5bd8c4bd163&tochange=228214210aa5

Alert for the Feb 2 regression:
https://groups.google.com/d/topic/mozilla.dev.tree-management/1dO2Jro6mTs/discussion
Changeset range (bug 966694, bug 964217, bug 947586, bug 966913):
http://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=a91b8e6b8dcc&tochange=bf25e29dc677
Blocks: 944947
Summary: ts_paint has regressed a lot on fx-team in the last 2 weeks → [Australis] ts_paint has regressed a lot on fx-team in the last 2 weeks
(In reply to Matt Brubeck (:mbrubeck) from comment #1)
> Alert for the Jan 22 regression (also includes a large tpaint regression):
> https://groups.google.com/d/topic/mozilla.dev.tree-management/IiglnLLt-C0/
> discussion
> Changeset range (bug 944947):
> http://hg.mozilla.org/integration/fx-team/
> pushloghtml?fromchange=b5bd8c4bd163&tochange=228214210aa5

The tpaint regression was meant to be fixed by bug 963105, which landed very recently. Jmaher said he checked and the tpaint regression appeared fixed (see also: bug 967691). I assumed that would also fix the ts_paint regression. Did it not?

> Alert for the Feb 2 regression:
> https://groups.google.com/d/topic/mozilla.dev.tree-management/1dO2Jro6mTs/
> discussion
> Changeset range (bug 966694, bug 964217, bug 947586, bug 966913):
> http://hg.mozilla.org/integration/fx-team/
> pushloghtml?fromchange=a91b8e6b8dcc&tochange=bf25e29dc677

The problem with this is that it's really not clear to me how any of these csets would regress ts_paint. Most of them apply to all platforms (and surely a single tooltiptext attribute wouldn't regress ts_paint that badly?). Instead, Mac shows a ts_paint regression that seems to be caused by http://hg.mozilla.org/integration/fx-team/rev/75303a3ddc0c . Looking at the combined picture:

http://graphs.mozilla.org/graph.html#tests=[[83,132,35],[83,64,22],[83,64,24],[83,64,21],[83,132,37]]&sel=1391254793324.4275,1391554412374,531.9580208513787,1094.8741917545697&displayrange=30&datatype=running

I don't understand how the changesets listed in that pushlog were singled out.
err, WinXP t(o) (Talos "other") runs to get the difference
The regression is now on Aurora. Marking as P1 since we can't ship with this big of a regression without understanding it better.

http://graphs.mozilla.org/graph.html#tests=[[83,64,37],[83,52,37]]&sel=1387409790468.7766,1391570411828,318.76929649939893,744.1539118840146&displayrange=90&datatype=running
Whiteboard: [talos_regression] → [talos_regression][Australis:P1]
Gijs, bug 944947 had regressions in both tpaint and ts_paint; we have fixed tpaint, but the ts_paint regression still exists.
(In reply to Joel Maher (:jmaher) from comment #6)
> Gijs, the bug 944947 had regressions in tpaint and ts_paint, we have fixed
> tpaint, but ts_paint still exists.

Right. The landing that was meant to fix this was in the middle of another ts_paint regression. Are we sure that fix isn't just being masked by a bigger regression caused by:

https://tbpl.mozilla.org/?tree=Fx-Team&startdate=2014-02-02&enddate=2014-02-04&jobname=talos%20other&rev=75303a3ddc0c

?

Otherwise I'll go a-try-bisecting to figure out what's causing this.

I only got ts_paint regression emails for Windows. Is that accurate, i.e. was there no ts_paint regression on the other platforms?
it is hard to tell if the Feb 2nd regression is masking any fixes to this.  I would have expected to see a slight drop on the 3rd to indicate a fix.  It might be worth a try run with the bulk of the code from bug 944947 removed to see if there is a fix for ts_paint.
(In reply to Joel Maher (:jmaher) from comment #8)
> it is hard to tell if the Feb 2nd regression is masking any fixes to this. 
> I would have expected to see a slight drop on the 3rd to indicate a fix.  It
> might be worth a try run with the bulk of the code from bug 944947 removed
> to see if there is a fix for ts_paint.

Baseline (against current fx-team tip):   https://tbpl.mozilla.org/?tree=Try&rev=9b0262dc2bfd

Everything backed out:   https://tbpl.mozilla.org/?tree=Try&rev=fd33538acbe8
Blocks: australis
(In reply to Joel Maher (:jmaher) from comment #8)
> it is hard to tell if the Feb 2nd regression is masking any fixes to this. 
> I would have expected to see a slight drop on the 3rd to indicate a fix.  It
> might be worth a try run with the bulk of the code from bug 944947 removed
> to see if there is a fix for ts_paint.

Is there a separate bug tracking the ts_paint issue for the feb 2nd regression?
Oh, and another try push to check if removing the binding whenever the node is invisible has any effect:

https://tbpl.mozilla.org/?tree=Try&rev=f3c7d07753c4
(In reply to :Gijs Kruitbosch from comment #9)
> (In reply to Joel Maher (:jmaher) from comment #8)
> > it is hard to tell if the Feb 2nd regression is masking any fixes to this. 
> > I would have expected to see a slight drop on the 3rd to indicate a fix.  It
> > might be worth a try run with the bulk of the code from bug 944947 removed
> > to see if there is a fix for ts_paint.
> 
> Baseline (against current fx-team tip):  
> https://tbpl.mozilla.org/?tree=Try&rev=9b0262dc2bfd
> 
> Everything backed out:   https://tbpl.mozilla.org/?tree=Try&rev=fd33538acbe8

Backing things out:

http://compare-talos.mattn.ca/?oldRevs=9b0262dc2bfd&newRev=fd33538acbe8&server=graphs.mozilla.org&submit=true

(this seems to just make ts_paint worse?)

Disabling the binding:

http://compare-talos.mattn.ca/?oldRevs=9b0262dc2bfd&newRev=f3c7d07753c4&server=graphs.mozilla.org&submit=true

(this is /slightly/ better but doesn't come anywhere close to addressing the regression)

This looks to me like my supposition in comment #7 is correct, but I realize I'm a biased observer...
Steven, is it possible that the second regression (on Feb 2/3) was due to the sessionstore changes you made in that timeframe?
Flags: needinfo?(smacleod)
(In reply to :Gijs Kruitbosch from comment #13)
> Steven, is it possible that the second regression (on Feb 2/3) was due to
> the sessionstore changes you made in that timeframe?

We could back it out and push to try, but I doubt it. Talos profiles don't have a sessionstore.js AFAIK, and Steven's patch only changes how we load the file once during early startup. But we all know that Talos can be very surprising, more often than we would like it to be...
(In reply to Tim Taubert [:ttaubert] from comment #14)
> (In reply to :Gijs Kruitbosch from comment #13)
> > Steven, is it possible that the second regression (on Feb 2/3) was due to
> > the sessionstore changes you made in that timeframe?
> 
> We could back it out and push to try but I doubt it. Talos profiles don't
> have a sessionstore.js afaik and Steven's patch only changes how we load the
> file once at the very startup. But we all now that Talos can be very
> surprising more often than we would like it to...

Doesn't ts_paint restart with the same profile 20 times? (Plus a first-run that isn't counted)
That should be affected by session store changes...
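
For illustration, here is a rough sketch of how a ts_paint-style number could be summarized from those restarts. This is not the actual Talos code; dropping the uncounted first run and taking the median are assumptions based on the description above.

// Illustrative only: summarize startup times from repeated restarts.
function summarizeStartupTimes(runsMs) {
  const counted = runsMs.slice(1);          // drop the uncounted first run
  const sorted = [...counted].sort((a, b) => a - b);
  const mid = Math.floor(sorted.length / 2);
  return sorted.length % 2
    ? sorted[mid]
    : (sorted[mid - 1] + sorted[mid]) / 2;  // median of the counted runs
}

// Example: 21 restarts of the same profile; the slow first run is ignored.
const runs = [910, 602, 598, 611, 596, 604, 600, 599, 607, 603,
              601, 605, 598, 602, 600, 606, 599, 604, 601, 603, 597];
console.log(summarizeStartupTimes(runs)); // 601.5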
(In reply to :Gijs Kruitbosch from comment #15)
> (In reply to Tim Taubert [:ttaubert] from comment #14)
> > (In reply to :Gijs Kruitbosch from comment #13)
> > > Steven, is it possible that the second regression (on Feb 2/3) was due to
> > > the sessionstore changes you made in that timeframe?
> > 
> > We could back it out and push to try but I doubt it. Talos profiles don't
> > have a sessionstore.js afaik and Steven's patch only changes how we load the
> > file once at the very startup. But we all now that Talos can be very
> > surprising more often than we would like it to...
> 
> Doesn't ts_paint restart with the same profile 20 times? (Plus a first-run
> that isn't counted)
> That should be affected by session store changes...

Knowledge is power, so:

Baseline:

remote:   https://tbpl.mozilla.org/?tree=Try&rev=31d2f35a659d

backed out bug 959130:

remote:   https://tbpl.mozilla.org/?tree=Try&rev=4f3d2724ab17
(In reply to :Gijs Kruitbosch from comment #16)
> (In reply to :Gijs Kruitbosch from comment #15)
> > (In reply to Tim Taubert [:ttaubert] from comment #14)
> > > (In reply to :Gijs Kruitbosch from comment #13)
> > > > Steven, is it possible that the second regression (on Feb 2/3) was due to
> > > > the sessionstore changes you made in that timeframe?
> > > 
> > > We could back it out and push to try but I doubt it. Talos profiles don't
> > > have a sessionstore.js afaik and Steven's patch only changes how we load the
> > > file once at the very startup. But we all now that Talos can be very
> > > surprising more often than we would like it to...
> > 
> > Doesn't ts_paint restart with the same profile 20 times? (Plus a first-run
> > that isn't counted)
> > That should be affected by session store changes...
> 
> Knowledge is power, so:
> 
> Baseline:
> 
> remote:   https://tbpl.mozilla.org/?tree=Try&rev=31d2f35a659d
> 
> backed out bug 959130:
> 
> remote:   https://tbpl.mozilla.org/?tree=Try&rev=4f3d2724ab17

The backout here seems to have a decent impact, even going just by OS X 10.6, which was retriggered by someone before I got to it... I just retriggered XP some more (Windows 7 and 8 seem to be badly backlogged on try; I'll check tomorrow morning).
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=31d2f35a659d&newRev=4f3d2724ab17&submit=true

So the backout here seems to be 3-10% better than pre-backout (which is 20-100ms depending on the platform). I'm marking this as blocking bug 959130 because it seems clear that it has had a significant impact here. I haven't checked whether this accounts for all of the regression, though.

Either way, it seems like this should be investigated by people closer to that bug. Tim/Steven/Yoric?
Blocks: 959130
What bug 959130 does is replace the input used by CrashMonitor (the OS.File worker) and Session Restore (the SessionWorker) with something that doesn't take up to 6 seconds to return a value.

I suspect that we move from the following situation:
1. CrashMonitor and SessionRestore request some input;
2. firstPaint;
3. CrashMonitor and SessionRestore receive their input;
4. do stuff with the input;
5. session restored.

To the following situation:
1. CrashMonitor and SessionRestore request some input;
2. input arrives faster, CrashMonitor and SessionRestore receive their input;
3. do stuff with the input;
4. firstPaint;
5. session restored.

In which case, there is nothing to worry about.
Steven, I believe that your code could test whether the input callbacks are called before or after firstPaint. This would help us settle the issue.
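
A minimal sketch of that check in Firefox chrome JS: the onSessionFileRead hook below is hypothetical, standing in for wherever the SessionWorker/CrashMonitor reply actually arrives, while Services.startup.getStartupInfo() is the existing API exposing the recorded firstPaint timestamp.

const { Services } = Components.utils.import(
  "resource://gre/modules/Services.jsm", {});

// Hypothetical hook: call this from wherever the worker's reply callback runs.
function onSessionFileRead() {
  const receivedAt = Date.now();
  // getStartupInfo() returns Date objects for recorded startup milestones;
  // firstPaint is absent if it hasn't been recorded yet.
  const { firstPaint } = Services.startup.getStartupInfo();
  if (!firstPaint) {
    Services.console.logStringMessage("session input arrived before firstPaint");
  } else {
    Services.console.logStringMessage(
      "session input arrived " + (receivedAt - firstPaint.getTime()) +
      "ms after firstPaint");
  }
}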
(In reply to Joel Maher (:jmaher) from comment #0)
> I am still investigating this, but right now I see this trend.
> 
> fx-team-non-pgo:
> http://graphs.mozilla.org/graph.html#tests=[[83,132,
> 37]]&sel=none&displayrange=30&datatype=running
> 
> fx-team (pgo):
> http://graphs.mozilla.org/graph.html#tests=[[83,64,
> 37]]&sel=none&displayrange=30&datatype=running
> 
> an alert on mozilla.dev-planning got me hot on the trail:
> https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/
> 1dO2Jro6mTs
> 
> the pgo one, I have done a couple of pgo builds to see what shakes out. 
> This will require some time to get the builds and tests going.
> 
> Right now based on the graphs, I am seeing 2 possible regressions:
> 1) ~jan 22
> 2) ~feb 2

Also, it looks like there was a small regression on jan 20 as well. :-(
So... to get a better idea of what the regression is composed of, I've pushed the following to try:

1) baseline of http://hg.mozilla.org/integration/fx-team/rev/059ed9908fda . This was before either suspected bug landed, and before the Jan 20 and/or 22 and/or Feb 2/3 regressions showed up.

remote:   https://tbpl.mozilla.org/?tree=Try&rev=3073d97e978d

2) baseline + bug 944947 (all 4 patches on that bug) + http://hg.mozilla.org/mozilla-central/rev/7a85fd2440e8 minus the CSS change, because the JS changes conflicted so much (this is somewhat sad, but I didn't have much of a choice)

3) (2) + bug 963105

remote:   https://tbpl.mozilla.org/?tree=Try&rev=51bab5125243

4) (3) + bug 959130

remote:   https://tbpl.mozilla.org/?tree=Try&rev=853fcb630790

5) http://hg.mozilla.org/integration/fx-team/rev/a2a5e5e5eb69, which is after all the regressions.

remote:   https://tbpl.mozilla.org/?tree=Try&rev=b2a2426c38b5


-----

All of this to answer two questions:
1) did bug 963105 fix the regression caused by bug 944947 (if so, 1 and 3 should be equal)
2) does all of the remaining regression belong to the sessionstore issue (if so, 4 and 5 should be equal)

(huh, so I guess in hindsight I could have dropped push 2... well, it's done now)
(In reply to :Gijs Kruitbosch from comment #22)
> So... to get a better idea of what the regression is composed of, I've
> pushed the following to try:
> 
> 1) baseline of http://hg.mozilla.org/integration/fx-team/rev/059ed9908fda .
> This was before either suspected bugs landed, and before the Jan 20 and/or
> 22 and/or feb 2/3 regression showed up.
> 
> remote:   https://tbpl.mozilla.org/?tree=Try&rev=3073d97e978d
> 
> 2) baseline + bug 944947 (all 4 patches on that bug) +
> http://hg.mozilla.org/mozilla-central/rev/7a85fd2440e8 minus the CSS change,
> because the JS changes conflicted so much (this is somewhat sad, but I
> didn't have much of a choice)

Forgot to paste this in the comment: https://tbpl.mozilla.org/?tree=Try&rev=4d2510915582

> 
> 3) (2) + bug 963105
> 
> remote:   https://tbpl.mozilla.org/?tree=Try&rev=51bab5125243
> 
> 4) (3) + bug 959130
> 
> remote:   https://tbpl.mozilla.org/?tree=Try&rev=853fcb630790
> 
> 5) http://hg.mozilla.org/integration/fx-team/rev/a2a5e5e5eb69, which is
> after all the regressions.
> 
> remote:   https://tbpl.mozilla.org/?tree=Try&rev=b2a2426c38b5
> 
> 
> -----
> 
> All of this to answer two questions:
> 1) did bug 963105 fix the regression caused by bug 944947 (if so, 1 and 3
> should be equal)
> 2) does all of the remaining regression belong to the sessionstore issue (if
> so, 4 and 5 should be equal)
> 
> (huh, so I guess in hindsight I could have dropped push 2... well, it's done
> now)

Waiting for retriggers to go through. Might be a while considering the try backlog.
Assignee: nobody → gijskruitbosch+bugs
So a lot of retriggers are still pending. However,

http://compare-talos.mattn.ca/?oldRevs=853fcb630790&newRev=b2a2426c38b5&server=graphs.mozilla.org&submit=true

says that my baseline (from ~ Jan 18 or so) + bug 944947 + bug 963105 + bug 959130 isn't doing nearly as badly as fx-team was on Feb 2. That is, something else (or several somethings) caused the majority of the problem. Unless of course there are somehow other prerequisites that mean those bugs aren't having the effect they had when they landed... but the simpler explanation is that something else, either around Feb 2/3 or earlier, badly regressed things.
(In reply to :Gijs Kruitbosch from comment #24)
> So a lot of retriggers are still pending. However,
> 
> http://compare-talos.mattn.ca/
> ?oldRevs=853fcb630790&newRev=b2a2426c38b5&server=graphs.mozilla.
> org&submit=true
> 
> says that my baseline (from ~ jan 18 or something) + bug 944947 + bug 963150
> + bug 959130 aren't doing nearly as badly as fx-team was on feb 2.

Err, feb 3. Sorry.
And this:

http://compare-talos.mattn.ca/?oldRevs=3073d97e978d&newRev=4d2510915582&server=graphs.mozilla.org&submit=true

shows clearly that I caused a regression with bug 944947, whereas this:

http://compare-talos.mattn.ca/?oldRevs=3073d97e978d&newRev=51bab5125243&server=graphs.mozilla.org&submit=true

shows that we did nullify both the tpaint and ts_paint regression with the fix from bug 963105.


This makes sense because all these patches are on Aurora and we're seeing the same regression there.

I'm not really sure where exactly to go from here. I think our best bet would be bisecting/dissecting the changes around Feb 2/3 and figuring out how/when we regressed if it wasn't any of these patches. Joel, do you have a better idea?

(meanwhile, the sessionstore change does still seem to be responsible for part of the regression, but not all of it - not even most of it, by the looks of it - so it'd still be valuable to figure out if that part at least is real or a measurement artifact, as per Yoric's earlier comments)
So, backing up a bit, because neither of the suspect bugs are helping to make sense of the magnitude of the regression:

looking at http://graphs.mozilla.org/graph.html#tests=[[83,64,37]]&sel=1390041042850.7788,1391553235535.8813,459.03255728788156,559.2186037995094&displayrange=30&datatype=running (which is a zoomed in bit of the graph from comment 0), I see the following regressions:

jan 19/20 (I suspect a merge from m-c/inbound, but I need to verify that)

jan 22/23 (bug 944947, which should have been fixed, but doesn't look it - so we need to figure out why, when the fix landed, that didn't seem to impact the numbers - need to look further into this)

feb 1/2 (ttaubert and I suspect the fxa-enabled-by-default change caused this one, but bug 966823 was meant to fix that (on Feb 2) and it doesn't look like it has - need to look further into this)

feb 3/4 (there might be several here, the noise makes it hard to tell - the sessionrestore issue is in this lot)

I'm going to open sub-bugs for these as and when I narrow down ranges/culprits.

The other thing that bemuses me is that Aurora is ~20ms faster than Nightly, even though all the suspect bugs and regression ranges should have made it into Aurora. No explanation for that yet, either.
Summary: [Australis] ts_paint has regressed a lot on fx-team in the last 2 weeks → [Australis] ts_paint has regressed a lot on fx-team between january 18 and february 4
(In reply to :Gijs Kruitbosch from comment #27)
> feb 3/4 (there might be several here, the noise makes it hard to tell - the
> sessionrestore issue is in this lot)

So,

1) it seems bug 959130 owns about 23ms of the non-PGO regression, which accounts for most of the jump in the graph between Feb 2 and Feb 4.

I'm going to file a followup bug specifically about this regression and we can figure out the sessionstore portion there.

However, there was a clear PGO regression on/before feb 1/2, which is harder to spot on the non-PGO graph (where it almost seems like the numbers gradually rose from ~600 to ~615 between Jan 28 and Feb 2). I'll keep investigating that.

2) bug 963105, the fix for bug 944947's regression, landed exactly before bug 959130, and improved non-pgo by about 5ms. That's in comparison to an original regression of 15-20ms, so that's weird, because on try, the improvement gained by bug 963105 was actually larger than the regression (that is, the net result of bug 944947 + bug 963105 was lower ts_paint and tpaint numbers). I have no idea why the fix didn't have the impact it originally had, and will keep investigating this, too.
Depends on: 970043
Clearing needinfo here, further discussion about the sessionstore issue can happen in bug 970043.
No longer blocks: 959130
Flags: needinfo?(smacleod)
Depends on: 970049
No longer blocks: 944947
Depends on: 970114
The Jan 19/20 change came from inbound; I filed bug 970114 for that. Updating the summary to reflect which bits were actually from fx-team landings.
Summary: [Australis] ts_paint has regressed a lot on fx-team between january 18 and february 4 → [Australis] ts_paint has regressed a lot on fx-team between january 22 and february 4
Open questions for this bug:

1) did something else regress at the same time as bug 944947 (there are discrepancies between the amount of the regression on try and the jump in the fx-team graph)

The graph says that it would have been in this window: https://tbpl.mozilla.org/?tree=Fx-Team&fromchange=a7ec92db9f0b&tochange=0451fb80e3a2

To help figure this out, I've pushed:

baseline: https://tbpl.mozilla.org/?tree=Try&rev=32ea447b1e40 (http://hg.mozilla.org/integration/fx-team/rev/a7ec92db9f0b)
end of regression range on the graph: https://tbpl.mozilla.org/?tree=Try&rev=1d2904d6ef1c (http://hg.mozilla.org/integration/fx-team/rev/0451fb80e3a2)
that with bug 944947 backed out: https://tbpl.mozilla.org/?tree=Try&rev=69a1dec10d71 (previous rev + backouts)


2) what regressed between jan 28 and feb 2:
some of the pushes from bug 970049 can help here:

jan 28: https://tbpl.mozilla.org/?tree=Try&rev=b2b1a281737c (592.04)
jan 30: https://tbpl.mozilla.org/?tree=Try&rev=bd702400217e (596.84, +  4.80 from jan 28)
jan 31: https://tbpl.mozilla.org/?tree=Try&rev=3e5e37577a24 (602.66, +  5.82 from jan 30)
feb 02: https://tbpl.mozilla.org/?tree=Try&rev=70121bdf46b0 (618.43, + 15.77 from jan 31)

Which isn't particularly helpful, except that Jan 31 to Feb 02 was significantly worse than the other intervals. I'll try to figure out what happened there (but I should probably look at the other intervals as well, seeing as it all adds up to a significant regression...), and extend further out.

So:

jan-31 to feb 02:

early feb 02: https://hg.mozilla.org/integration/fx-team/rev/463bae14bef3 : https://tbpl.mozilla.org/?tree=Try&rev=e87a8c6b7f35
late jan 31: https://hg.mozilla.org/integration/fx-team/rev/c21d44250e0c : https://tbpl.mozilla.org/?tree=Try&rev=a192f996f0fb
m-c merge earlier on jan 31: https://hg.mozilla.org/integration/fx-team/rev/d02efae4db3a : https://tbpl.mozilla.org/?tree=Try&rev=c93b439da2ec
australis fix still earlier on jan 31 (which was busy, as opposed to feb 01): https://hg.mozilla.org/integration/fx-team/rev/a36e188dbc74 :  https://tbpl.mozilla.org/?tree=Try&rev=0030bacebe7d

extending to feb 03 and feb 04 is below

3) did something else regress on feb 3/4 along with what bug 970043 is tracking?

The graph seems to indicate the major regression range was:

https://tbpl.mozilla.org/?tree=Fx-Team&fromchange=bf25e29dc677&tochange=5fb066c6d76c

baseline: https://tbpl.mozilla.org/?tree=Try&rev=692abed8b78f (http://hg.mozilla.org/integration/fx-team/rev/bf25e29dc677)
regression: https://tbpl.mozilla.org/?tree=Try&rev=0b9430d66dad (http://hg.mozilla.org/integration/fx-team/rev/5fb066c6d76c)
that with bug 959130 backed out: https://tbpl.mozilla.org/?tree=Try&rev=a72e49ffe3a3
that with bug 963105 backed out also (which should have rendered an improvement): https://tbpl.mozilla.org/?tree=Try&rev=d7123add40b3

But before bf25e29dc677, there was also mayhem. So I've pushed some revisions from before the baseline listed above:

feb 2, middle of the day, before an m-c merge: https://tbpl.mozilla.org/?tree=Try&rev=87109379ade5
 (https://hg.mozilla.org/integration/fx-team/rev/b6cc3c35d419)

feb 2, very early, before another merge: https://tbpl.mozilla.org/?tree=Try&rev=64a30014824c (https://hg.mozilla.org/integration/fx-team/rev/463bae14bef3)


(note that dates are a little weird... I've been trying to use TBPL dates (pacific time) here, but it seems like the graph server either uses different ones or uses the end times of builds or something, so they seem to be later... which gets confusing)
(In reply to :Gijs Kruitbosch from comment #31)
> Open questions for this bug:
> 
> 1) did something else regress at the same time as bug 944947 (there are
> discrepancies between the amount of the regression on try and the jump in
> the fx-team graph)

"Don't know / try is unhelpful":

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=32ea447b1e40&newRev=1d2904d6ef1c&submit=true
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=32ea447b1e40&newRev=69a1dec10d71&submit=true

The first of those is baseline -> post regression (6.89ms, with variance at 20ms).
The second is baseline -> post regression + backout (2.58ms, with variance at 20ms).

The noise here is much too high, and the difference caused by this range is much smaller than the one seen on the fx-team graph. I don't understand why.
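
For a sense of why a ~7ms shift against ~20ms of run-to-run spread is inconclusive, here is a quick Welch's t-statistic sketch. The per-run values are made up for illustration; they are not the actual numbers from these try pushes.

// Sketch: is a mean difference distinguishable from run-to-run noise?
function mean(xs) {
  return xs.reduce((a, b) => a + b, 0) / xs.length;
}
function variance(xs) {
  const m = mean(xs);
  return xs.reduce((a, b) => a + Math.pow(b - m, 2), 0) / (xs.length - 1);
}
function welchT(a, b) {
  const se = Math.sqrt(variance(a) / a.length + variance(b) / b.length);
  return (mean(b) - mean(a)) / se;
}

// Hypothetical per-run ts_paint means for two try pushes (10 runs each).
const baseline  = [598, 612, 590, 605, 621, 599, 608, 594, 616, 602];
const candidate = [604, 619, 597, 611, 628, 605, 616, 600, 623, 609];
console.log(welchT(baseline, candidate).toFixed(2));
// ~1.5 here: under the usual ~2 threshold, so a ~7ms shift is
// not distinguishable from noise at this number of retriggers.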
(In reply to :Gijs Kruitbosch from comment #31)
> 2) what regressed between jan 28 and feb 2:
> some of the pushes from bug 970049 can help here:
> 
> jan 28: https://tbpl.mozilla.org/?tree=Try&rev=b2b1a281737c (592.04)
> jan 30: https://tbpl.mozilla.org/?tree=Try&rev=bd702400217e (596.84, +  4.80
> from jan 28)
> jan 31: https://tbpl.mozilla.org/?tree=Try&rev=3e5e37577a24 (602.66, +  5.82
> from jan 30)
> feb 02: https://tbpl.mozilla.org/?tree=Try&rev=70121bdf46b0 (618.43, + 15.77
> from jan 31)
> 
> Which isn't particularly helpful, except that jan-31 to feb-02 was
> significantly worse than the other intervals. I'll try and figure out what
> happened there (but I should probably look at the other ones as well, seeing
> as it's all combined up to a significant regression...), and extend further
> out.
> 
> So:
> 
> jan-31 to feb 02:

http://perf.snarkfest.net/compare-talos/index.html?oldRevs=e87a8c6b7f35&newRev=a192f996f0fb&submit=true
http://perf.snarkfest.net/compare-talos/index.html?oldRevs=c93b439da2ec&newRev=0030bacebe7d&submit=true

The numbers here seem to have gone up/down gradually, by about 1% (6ms) every interval. The noise here is even worse. I'm not sure to what extent it makes sense trying to dig deeper here.
(In reply to :Gijs Kruitbosch from comment #31)
> Open questions for this bug:
> 
> 3) did something else regress on feb 3/4 along with what bug 970043 is
> tracking?
> 
> The graph seems to indicate the major regression range was:
> 
> https://tbpl.mozilla.org/?tree=Fx-
> Team&fromchange=bf25e29dc677&tochange=5fb066c6d76c

Answer: no.

http://compare-talos.mattn.ca/?oldRevs=692abed8b78f&newRev=0b9430d66dad&server=graphs.mozilla.org&submit=true

shows the original regression.

http://compare-talos.mattn.ca/?oldRevs=692abed8b78f&newRev=a72e49ffe3a3&server=graphs.mozilla.org&submit=true

shows the regression disappearing when 'just' backing out the sessionstore changes, and even a noticeable improvement

http://compare-talos.mattn.ca/?oldRevs=692abed8b78f&newRev=d7123add40b3&server=graphs.mozilla.org&submit=true

shows that improvement being nullified by backing out bug 963105. I'm assuming the 0.5% change of 3ms isn't real, considering the noise.
(In reply to :Gijs Kruitbosch from comment #31)
> Open questions for this bug:

4) Was there something that regressed shortly before the sessionstore changes?

> just before sessionstore: https://tbpl.mozilla.org/?tree=Try&rev=692abed8b78f
> (http://hg.mozilla.org/integration/fx-team/rev/bf25e29dc677)

<snip>

> But before bf25e29dc677, there was also mayhem. So I've pushed some ones
> before the baseline listed above:
> 
> feb 2, middle of the day, before an m-c merge:
> https://tbpl.mozilla.org/?tree=Try&rev=87109379ade5
>  (https://hg.mozilla.org/integration/fx-team/rev/b6cc3c35d419)
> 
> feb 2, very early, before another merge:
> https://tbpl.mozilla.org/?tree=Try&rev=64a30014824c
> (https://hg.mozilla.org/integration/fx-team/rev/463bae14bef3)


Yes/maybe:

http://compare-talos.mattn.ca/?oldRevs=64a30014824c&newRev=87109379ade5&server=graphs.mozilla.org&submit=true

This regression is bigger than the 6ms ones seen between jan 31 and feb 02, but not by that much, and the noise is still crazy. Anyway.

Regression range:

http://hg.mozilla.org/integration/fx-team/pushloghtml?fromchange=463bae14bef3&tochange=b6cc3c35d419

Considering the other csets are string/css-only changes with no realistic performance impact, and a metro change, I suspect the m-c to fx-team merge.

Try push for the merge: https://tbpl.mozilla.org/?tree=Try&rev=4733b02b1b97

prospective graph link:

http://compare-talos.mattn.ca/?oldRevs=64a30014824c&newRev=4733b02b1b97&server=graphs.mozilla.org&submit=true
Given the outcome of comment #35, I'd like to unassign myself.

I think our current system of ts_paint results isn't a good enough way of hunting out regressions of the magnitudes we're worried about here (apart from those already split off into separate bugs). Here's why:

in the confusion over where these successive regressions originated, I pushed the same cset twice (my mistake, and my apologies for the resulting overuse of resources):

(In reply to :Gijs Kruitbosch from comment #31)
> early feb 02: https://hg.mozilla.org/integration/fx-team/rev/463bae14bef3 :
> https://tbpl.mozilla.org/?tree=Try&rev=e87a8c6b7f35

<snip>

> feb 2, very early, before another merge:
> https://tbpl.mozilla.org/?tree=Try&rev=64a30014824c
> (https://hg.mozilla.org/integration/fx-team/rev/463bae14bef3)

Both of these were retriggered 10 times, which means we're comparing times over roughly 200 startups per push, on the same bits. This is what compare-talos says:

http://compare-talos.mattn.ca/?oldRevs=e87a8c6b7f35&newRev=64a30014824c&server=graphs.mozilla.org&submit=true

Considering that this is within about 0.3ms of some of the regressions 'identified' in my earlier comments, I don't think this is a workable way of diagnosing what the issue(s) were/are here.
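
One way to read that same-bits comparison is as an empirical noise floor: any apparent regression smaller than the difference between two pushes of identical code can't be trusted. A sketch, with made-up per-platform deltas:

// Sketch: use the identical-push comparison as a minimum detectable effect.
function noiseFloorMs(sameBitsDeltasMs) {
  return Math.max(...sameBitsDeltasMs.map(Math.abs));
}

// Hypothetical per-platform ts_paint deltas between the two identical pushes.
const sameBitsDeltas = [2.1, -3.4, 1.8, -2.7, 3.1];
const floor = noiseFloorMs(sameBitsDeltas);

const suspectedRegressionMs = 3.0; // e.g. one of the ~3ms changes above
console.log(suspectedRegressionMs > floor
  ? "possibly real; retrigger more to confirm"
  : "indistinguishable from noise at this sample size");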
Depends on: 973258
Filed bug 973258 for the regression caused by the m-c merge per comment #35.

Joel, I don't think there's anything else worth pursuing in this bug itself. Do you concur?

(I'm clearing the whiteboard because none of the culprits identified so far are Australis-related)
Assignee: gijskruitbosch+bugs → nobody
Flags: needinfo?(jmaher)
Whiteboard: [talos_regression][Australis:P1] → [talos_regression]
With bug 970123 fixed, can you please verify that this regression has gone away?  Thanks!
I don't believe this regression is fixed, but it isn't Australis-based; thanks to :Gijs for tracking this down and filing bug 973258. We still have a noticeable regression in ts_paint:
http://graphs.mozilla.org/graph.html#tests=[[83,131,37],[83,131,35],[83,131,33],[83,131,25]]&sel=1390149232102,1392741232102&displayrange=30&datatype=running

We need more data points on this, since this hit inbound today.
Untracking from Australis work per last few comments.
Summary: [Australis] ts_paint has regressed a lot on fx-team between january 22 and february 4 → ts_paint has regressed a lot on fx-team between january 22 and february 4
Whiteboard: [talos_regression] → [talos_regression][Australis:P-]
Wait, what evidence do we have to implicate bug 970123 in the first place?  Tim added that dependency to the bug...
Flags: needinfo?(ttaubert)
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #41)
> Wait, what evidence do we have to implicate bug 970123 in the first place? 
> Tim added that dependency to the bug...

did you read bug 970123 comment #0?
(In reply to :Gijs Kruitbosch from comment #42)
> (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> emailapocalypse) from comment #41)
> > Wait, what evidence do we have to implicate bug 970123 in the first place? 
> > Tim added that dependency to the bug...
> 
> did you read bug 970123 comment #0?

Yes, I worked on that bug.
(In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness, emailapocalypse) from comment #43)
> (In reply to :Gijs Kruitbosch from comment #42)
> > (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> > emailapocalypse) from comment #41)
> > > Wait, what evidence do we have to implicate bug 970123 in the first place? 
> > > Tim added that dependency to the bug...
> > 
> > did you read bug 970123 comment #0?
> 
> Yes, I worked on that bug.

My point was, there is data in that comment. Can you elaborate on how that doesn't answer the 'what data do we have' question?

For more context, you can read the conversations Tim and I had on IRC at the time: http://logbot.glob.com.au/?c=mozilla%23fx-team&s=9+Feb+2014&e=9+Feb+2014 .
Flags: needinfo?(ttaubert)
(In reply to :Gijs Kruitbosch from comment #44)
> (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> emailapocalypse) from comment #43)
> > (In reply to :Gijs Kruitbosch from comment #42)
> > > (In reply to :Ehsan Akhgari (needinfo? me!) (slow responsiveness,
> > > emailapocalypse) from comment #41)
> > > > Wait, what evidence do we have to implicate bug 970123 in the first place? 
> > > > Tim added that dependency to the bug...
> > > 
> > > did you read bug 970123 comment #0?
> > 
> > Yes, I worked on that bug.
> 
> My point was, there is data in that comment.

It describes what bug 970123 is about.  That bug is well understood and fixed now.

> Can you elaborate on how that
> doesn't answer the 'what data do we have' question?

Sure, what I meant was, do we have any data to show that fixing bug 970123 should fix this bug as well?  I asked the same question on all of the dependencies of that bug, and my goal is to see if there is something related to the same root cause that my patch doesn't fix so that I can help with fixing it.

> For more context, you can read the conversations Tim and I had on IRC at the
> time:
> http://logbot.glob.com.au/?c=mozilla%23fx-team&s=9+Feb+2014&e=9+Feb+2014 .

I set the ni? to Tim since he set the dependency.  Since you cleared it and didn't tell me there is more work to do here, I will just assume no and move on.
Now that we have more data: we have fixed the ts_paint regression on most platforms, but Windows XP still has some issues:
http://graphs.mozilla.org/graph.html#tests=[[83,131,37],[83,131,25],[83,131,35],[83,131,33]]&sel=1390229764074,1392821764074&displayrange=30&datatype=running

Shall we refocus this bug on Windows XP? In 2 months Microsoft will stop supporting Windows XP, and I am not sure whether Mozilla will focus a lot of energy on that specific platform going forward.
Flags: needinfo?(jmaher)
(In reply to Joel Maher (:jmaher) from comment #46)
> now that we have more data, we have fixed the ts_paint regression on most
> platforms, windows xp  still has some issues:
> http://graphs.mozilla.org/graph.html#tests=[[83,131,37],[83,131,25],[83,131,
> 35],[83,131,33]]&sel=1390229764074,
> 1392821764074&displayrange=30&datatype=running
> 
> Shall we refocus this bug on windows xp?  In 2 months Microsoft will stop
> supporting windows xp, I am not sure if Mozilla will focus a lot of energy
> on that specific platform going forward.

As I stated in comment #37, I don't think ts_paint noise levels make it fruitful to pursue this further. However, there's no reason someone else couldn't if they thought it merited more investigation.
Taking into account Gijs' comment #47 (nice investigation work, btw), I am going to untrack this bug for 29 & 30.
No longer depends on: 970114
Sounds like this isn't "a lot" anymore, and should be WONTFIX?
I am fine with a WONTFIX. Note that we still show a noticeable uptick in the ts results:
http://graphs.mozilla.org/graph.html#tests=[[83,132,37]]&sel=none&displayrange=90&datatype=running

The good news is that we have been stable, or only taken a slight hit, at least for WinXP.

Other platforms are not that lucky; Win7 is pretty much going up:
http://graphs.mozilla.org/graph.html#tests=[[83,132,37],[83,132,33],[83,132,25],[83,132,35],[83,132,31]]&sel=1387888768304,1395664768304&displayrange=90&datatype=running

In general, the numbers went up by about 10% between January and March on just about all platforms, with the majority of that concentrated in the time range covered by this bug.

This seems to be mostly caused by whatever went in at the beginning of the firefox 30 cycle.  The only other ts_paint regression I see is bug 973258.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX