Closed Bug 1239834 Opened 8 years ago Closed 8 years ago

2.45 - 15.71% tcanvasmark / tp5o / tp5o % Processor Time / tp5o Main_RSS / tp5o Private Bytes (windows7-32, windows8-64, windowsxp) regression on push 6a44d73430aa (Thu Jan 14 2016)

Categories

(Core :: JavaScript Engine, defect)

Version: 46 Branch
Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla47

People

(Reporter: wlach, Assigned: yury)

References

(Blocks 1 open bug)

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(1 file)

Talos has detected a Firefox performance regression from push 6a44d73430aa. Since you are the author of one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push: https://treeherder.allizom.org/perf.html#/alerts?id=1773

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see:

https://wiki.mozilla.org/Buildbot/Talos/Tests#CanvasMark
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5

Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax:

try: -b o -p win64,win32 -u none -t chromez[Windows 8,Windows 7,Windows XP],tp5o[Windows 8,Windows 7,Windows XP] --rebuild 5 # add "mozharness: --spsProfile" to generate profile data 

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:

https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:

talos --develop -e <path>/firefox -a tcanvasmark:tp5o

(add --e10s to run tests in e10s mode)

Making a decision: we need your feedback, as the patch author, to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:

https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

--

[wlach's note: there were several commits in the tested push, but it looks like it was :yury's patch in bug 1236104 that caused the problem based on the irc chatter]
Flags: needinfo?(ydelendik)
I was talking to Kannan about the issue. Increasing the size limit below which scripts are not sent for compilation on a different thread might help.

With bug 1236104 applied: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e25bb7982c84

Without: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dac3c3533078
Here's a compare view of those:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dac3c3533078&newProject=try&newRevision=e25bb7982c84&framework=1

I did some retriggers on win7 tp in the try pushes to be more confident in the difference (we might want to retrigger some other suites as well, depending on results)
Comment on attachment 8708081 [details] [diff] [review]
Increases tiny script limit for off-thread compilation.

Review of attachment 8708081 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like increasing the limit helps. I checked Talos JS files: 60% of them less than 5K, and 33% less than 25K. So 25K is a good number to reduce the effect on the Talos files. I also checked m-c JS files; most of the files with "debug" in their names are below this size as well.
Attachment #8708081 - Flags: review?(kvijayan)
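For context, a rough standalone sketch of the kind of size tally described above (illustrative only, not part of the attached patch; it just walks a directory and counts .js files above the old 5K and proposed 25K limits):

// Illustrative sketch only: tally .js files above the 5K and 25K limits.
#include <cstdio>
#include <filesystem>

int main(int argc, char** argv) {
    namespace fs = std::filesystem;
    if (argc < 2) {
        std::fprintf(stderr, "usage: %s <directory>\n", argv[0]);
        return 1;
    }
    size_t total = 0, over5k = 0, over25k = 0;
    for (const auto& entry : fs::recursive_directory_iterator(argv[1])) {
        if (!entry.is_regular_file() || entry.path().extension() != ".js")
            continue;
        const auto size = entry.file_size();
        ++total;
        if (size > 5 * 1000) ++over5k;
        if (size > 25 * 1000) ++over25k;
    }
    std::printf("%zu .js files: %zu over 5K, %zu over 25K\n", total, over5k, over25k);
    return 0;
}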
> Looks like increasing the limit helps. I checked Talos JS files: 60% of them
> less than 5K, and 33% less than 25K.

eh, by "less" I meant greater. :)
Flags: needinfo?(ydelendik)
Hi Yury, it looks like we're still seeing substantial regressions with the new patch:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dac3c3533078&newProject=try&newRevision=e25bb7982c84&framework=1

If you dig into the tp5o results, you can see that some pages are now noticeably slower to load than they used to be:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=dac3c3533078&newProject=try&newRevision=e25bb7982c84&originalSignature=ca94ff5540bd7a90cd41306de55b1b69962aca1e&newSignature=ca94ff5540bd7a90cd41306de55b1b69962aca1e

Note that `sohu.com` in particular is 50% slower to load.

I think we should define what an "acceptable" performance hit is from this patch. It's fine to trade off some loading time for jank reduction, but I don't know what the right trade-off is. CC'ing :vladan and :avih to see if they have ideas.
Attachment #8708081 - Flags: review?(kvijayan) → review+
(In reply to William Lachance (:wlach) from comment #5)
> I think we should define what an "acceptable" performance hit is from this
> patch. It's fine to trade off some loading time for jank reduction, but I
> don't know what the right trade-off is. CC'ing :vladan and :avih to see if
> they have ideas.

Let's land this patch for now. I'll keep this bug open to hear if the trade-off is right.
here is a general list of what regressed:
a compare chooser view:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=ab3c6ba63c89&newProject=mozilla-inbound&newRevision=e6f00eab30ae&framework=1

the ones of concern are:
ts_paint e10s: win7, win8
tp5o xres e10s: linux64
tp5o: all platforms
tp5o e10s: all platforms
tp5o private bytes: winxp, win7, linux64
tp5o private bytes e10s: winxp, win7, linux64
tp5o main_rss: all platforms
tp5o % processor time: winxp, win7
tcanvasmark e10s: win*
tcanvasmark: winxp
damp e10s: win7, win8, osx 10
damp: win7, win8, osx10, linux64
(In reply to Yury Delendik (:yury) from comment #3)
> Looks like increasing the limit helps. I checked Talos JS files: 60% of them
> less than 5K, and 33% less than 25K. So 25K is good number to reduce the
> affect of the Talos files.

Improving the Talos numbers is not a goal in itself, so we shouldn't fine-tune our tradeoff based on Talos numbers alone.

Talos is a tool that gives us some indication when something changes, but our ultimate goal is to make things as good as we can for our users. Of course, if we think the Talos numbers also represent the average behavior of our users well enough, then tuning for Talos is simply a useful means of optimizing for our users.

Also, if there's a good reason for the recorded "regression", e.g. if we know this specific thing regressed in order to make some other thing (which doesn't happen to be represented in Talos) better, then it should be OK too.

Ultimately people decide which tradeoff is better, and the developers who own the code are usually the ones who understand the factors of the tradeoff best.

So there's no strict threshold for what is acceptable and what isn't. I can help with that decision but I don't quite understand the tradeoff.

Could someone please put it into words such that it's easier to understand and therefore decide?
Avi, the "regression" is caused by moving JavaScript bytecode compilation off the main thread to a helper thread. In this particular case (bug 1236104), we moved larger scripts that are already pre-cached/loaded/ready to execute to off-thread compilation. Depending on the JS runtime state, a "larger script" is defined as above 5K or above 100K (see [1] for the decision). The patch attached to this bug raises these numbers to 25K and 100K.

The main use case we are looking at is https://docs.google.com/. As demonstrated in bug 1236104, we can almost entirely remove JavaScript bytecode compilation of larger scripts from the main thread. The expected regression is the memory usage needed for the helper threads, and if our heuristics for detecting script size and JS runtime state are wrong, we might see some increase in CPU usage as well.


  [1] https://dxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#4104
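For reference, a rough standalone paraphrase of that heuristic (illustrative only: the real check lives in js/src/jsapi.cpp at [1], the names here are made up, and the "runtime state" condition is assumed to be whether an off-thread parse task would have to wait for GC):

// Illustrative sketch of the size heuristic, not the actual jsapi.cpp code.
#include <cstddef>

static const size_t kTinyScript = 25 * 1000;   // raised from 5 * 1000 by the attached patch
static const size_t kHugeScript = 100 * 1000;  // unchanged

// parseMustWaitForGC: assumed stand-in for the "JS runtime state" check,
// i.e. an off-thread parse task would have to wait for an ongoing GC.
static bool ShouldCompileOffThread(size_t srcLength, bool parseMustWaitForGC)
{
    // Tiny scripts: the overhead of setting up an off-thread task outweighs
    // simply compiling them on the main thread.
    if (srcLength < kTinyScript)
        return false;

    // If the task would block on GC anyway, only huge scripts are still
    // worth moving off the main thread.
    if (parseMustWaitForGC && srcLength < kHugeScript)
        return false;

    return true;
}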
(In reply to Yury Delendik (:yury) from comment #10)
> The main use case we looking at is https://docs.google.com/.

Script-heavy Google pages are definitely a good thing to improve and unblock, but seeing how most of the tests regressed meaningfully (including tp5, which covers ~50 top sites), would you say this is the kind of tradeoff you had in mind, and do you consider it reasonable?

TBH, to me it's a bit hard to swallow, i.e. the price to pay here seems to me quite high...
(In reply to Avi Halachmi (:avih) from comment #11)
> (In reply to Yury Delendik (:yury) from comment #10)
> would you say this is the kind of tradeoff you
> had in mind, and did/are you considering it reasonable?

Not really. I'll look at the sohu.com and reuters.com regressions; that will probably shed some light. But based on the comment 5 results, the precision of the heuristic above matters.
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
This was pushed to inbound, but with bug 1236104 in the commit message.
Looking at this, many of the regressions are mostly fixed; a few don't look touched at all (like the damp ones).

there are 30 different regressions:
http://alertmanager.allizom.org:8080/alerts.html?rev=e6f00eab30ae7f2611a13d22b512fd7a39df09e7&showAll=1

It is really hard to tell exactly what percentage we are at; if we are done working on this bug for now, I can summarize where we are with each regression.
here is a list of the improvements:
http://alertmanager.allizom.org:8080/alerts.html?rev=9f02ddf848139a33ca5ed38197e04f0ed7e2cb5a&showAll=1&testIndex=0&platIndex=0

note, there are a couple regressions (~5%) with this fix:
damp: linux64-e10s, win7-e10s

while this landed with 3 other patches:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b31310af26069026c2c4f86d834cf2280d4395af&tochange=9f02ddf848139a33ca5ed38197e04f0ed7e2cb5a

I did some try runs to bisect those, and it came down to the fix that landed for this bug:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=7979375482ea&tochange=de788240cdf0

you can see a compare view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=b31310af2606&newProject=mozilla-inbound&newRevision=9f02ddf84813&framework=1&filter=damp

Non-e10s looks like an improvement (a bit of noise, but some improvements) and e10s is worse. The subtests for linux64 are here:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=b31310af2606&newProject=mozilla-inbound&newRevision=9f02ddf84813&originalSignature=13dd9b8ab5c9d40a946e35a9dc085970caebce48&newSignature=13dd9b8ab5c9d40a946e35a9dc085970caebce48

The regressions seem to be in the .close methods and in the performance and netmonitor suites.

:yury, any thoughts on this?  To learn more about damp, the wiki is here:
https://wiki.mozilla.org/Buildbot/Talos/Tests#DAMP

and you can ask in #devtools, :bgrins is a great resource as well as many others.
Flags: needinfo?(ydelendik)
Joel, is this in addition to the tp5o sohu.com regression, or was that one fixed?
It doesn't look like sohu is fixed, but we would need to be specific about the comparison, as other tests could influence the results. We had a ~50% regression on that single test and a 3% improvement, but tp5o is about the aggregate.

Do you have thoughts on the damp regressions?
(In reply to Joel Maher (:jmaher) from comment #17)
> it doesn't look like sohu is fixed- but we would need to be specific about
> the comparison as other tests could influence results.  We had a ~50%
> regression on that single test, and a 3% improvement- but tp5o is about the
> aggregate.
> 
> Do you have thoughts on the damp regressions?

I'm experimenting with the number of threads we allocate for off-thread parsing (currently it is set to 1). With bug 1236104 we add more off-thread tasks, and with the 25K-limit patch we reduce that a little. I'm trying to allocate more threads in https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c878218b4fe
Collecting more data and getting a solid set for your base on m-c:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=6764bc656c1d&newProject=try&newRevision=6c878218b4fe&framework=1

If you always push with the same base, then we can easily compare experiments. We should also push to try with "--rebuild 5" at the end of the try syntax; I can do the retriggers when the build is finished. In a few hours we should have results to compare.
Increasing the number of threads, with (and even without) the bug 1236104 patches, does not improve the situation.

I also tried removing the bug 1236104 patches (https://treeherder.mozilla.org/#/jobs?repo=try&revision=90dc47a5016c): as expected, DAMP e10s and a few other tests improve by 4-5%.

I also checked performance with tiny-script off-thread processing removed, so that only 100K+ scripts are compiled off-thread: https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf15fc68ffa5 (compare: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=914dae86b192&newProject=try&newRevision=bf15fc68ffa5&framework=1). With that, cart opt e10s took a 20% regression.
Flags: needinfo?(ydelendik)
So what are the next steps? This is a large hit on our primary platform (Windows) across a few tests. Is this change needed? Do we have time to investigate this more? Do we need to consider accepting this as the new normal?
Flags: needinfo?(ydelendik)
(In reply to Joel Maher (:jmaher) from comment #21)
> so what are the next steps?  This is a large hit on our primary platform
> (windows) across a few tests.  Is this change needed?  Do we have time to
> investigate this more?  Do we need to consider accepting this as the new
> normal?

I measured the time required to schedule the off-thread parse tasks (from the StartOffThreadParseScript call to the CompileScript call). While running Talos, in the majority of cases this measured 0ms, but in rare cases it reached values of up to 800ms. The ParseTask might wait on GC, so there is a good chance that this is what is killing the performance. I'm looking into what exactly causes this delay.
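A minimal standalone sketch of that kind of measurement (illustrative names only; the real ParseTask lives under js/src/vm/HelperThreads.* and the actual instrumentation may differ):

// Illustrative sketch only: stamp a parse task when it is queued
// (StartOffThreadParseScript) and again when compilation actually starts,
// so that delays such as waiting on GC become visible.
#include <chrono>

struct TimedParseTask {
    std::chrono::steady_clock::time_point queued;
    std::chrono::steady_clock::time_point started;

    void OnQueued()  { queued = std::chrono::steady_clock::now(); }
    void OnStarted() { started = std::chrono::steady_clock::now(); }

    // Time the task spent waiting before compilation began, in milliseconds.
    double QueueDelayMs() const {
        return std::chrono::duration<double, std::milli>(started - queued).count();
    }
};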

I reverted patches in bug 1236104, so we can close this one now.
Flags: needinfo?(ydelendik)
It is great to see the regressions fixed. I have asked in bug 1236104 to get this reverted/backed out on mozilla-aurora (v46).
Requested Aurora uplift. Closing this bug as resolved by the backout.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Assignee: nobody → ydelendik
Version: unspecified → 46 Branch