2.45 - 15.71% tcanvasmark / tp5o / tp5o % Processor Time / tp5o Main_RSS / tp5o Private Bytes (windows7-32, windows8-64, windowsxp) regression on push 6a44d73430aa (Thu Jan 14 2016)

RESOLVED FIXED in mozilla47

Status

()

Core
JavaScript Engine
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: wlach, Assigned: yury)

Tracking

(Blocks: 1 bug, {perf, regression})

46 Branch
mozilla47
perf, regression
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [talos_regression])

Attachments

(1 attachment)

Talos has detected a Firefox performance regression from push 6a44d73430aa. As author of one of the patches included in that push, we need your help to address this regression.

This is a list of all known regressions and improvements related to the push: https://treeherder.allizom.org/perf.html#/alerts?id=1773

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see:

https://wiki.mozilla.org/Buildbot/Talos/Tests#CanvasMark
https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5

Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax:

try: -b o -p win64,win32 -u none -t chromez[Windows 8,Windows 7,Windows XP],tp5o[Windows 8,Windows 7,Windows XP] --rebuild 5 # add "mozharness: --spsProfile" to generate profile data 

(we suggest --rebuild 5 to be more confident in the results)

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:

https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code 

Then run the following command from the directory where you set up Talos:

talos --develop -e <path>/firefox -a tcanvasmark:tp5o

(add --e10s to run tests in e10s mode)

Making a decision: As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations:

https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling

--

[wlach's note: there were several commits in the tested push, but it looks like it was :yury's patch in bug 1236104 that caused the problem based on the irc chatter]
Flags: needinfo?(ydelendik)
(Assignee)

Comment 1

2 years ago
Created attachment 8708081 [details] [diff] [review]
Increases tiny script limit for off-thread compilation.

I was talking to Kannan about the issue. Increasing the limit for scripts we don't want to send to compile on different thread might help.

With bug 1236104 applied: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e25bb7982c84

Without: https://treeherder.mozilla.org/#/jobs?repo=try&revision=dac3c3533078
Here's a compare view of those:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dac3c3533078&newProject=try&newRevision=e25bb7982c84&framework=1

I did some retriggers on win7 tp in the try pushes to be more confident in the difference (we might want to retrigger some other suites as well, depending on results)
(Assignee)

Comment 3

2 years ago
Comment on attachment 8708081 [details] [diff] [review]
Increases tiny script limit for off-thread compilation.

Review of attachment 8708081 [details] [diff] [review]:
-----------------------------------------------------------------

Looks like increasing the limit helps. I checked Talos JS files: 60% of them less than 5K, and 33% less than 25K. So 25K is good number to reduce the affect of the Talos files. Also checked m-c JS files, most of files with "debug" term in their names are below this size as well.
Attachment #8708081 - Flags: review?(kvijayan)
(Assignee)

Comment 4

2 years ago
> Looks like increasing the limit helps. I checked Talos JS files: 60% of them
> less than 5K, and 33% less than 25K.

eh, "less" I meant greater. :)
Flags: needinfo?(ydelendik)
Hi Yury, it looks like we're still seeing substantial regressions with the new patch:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=dac3c3533078&newProject=try&newRevision=e25bb7982c84&framework=1

If you dig into the tp5o results, you see that there are some pages which are particularly slower to load now than they used to be:

https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=dac3c3533078&newProject=try&newRevision=e25bb7982c84&originalSignature=ca94ff5540bd7a90cd41306de55b1b69962aca1e&newSignature=ca94ff5540bd7a90cd41306de55b1b69962aca1e

Note that `sohu.com` in particular is 50% slower to load.

I think we should define what an "acceptable" performance hit is from this patch. It's fine to trade off some loading time for jank reduction, but I don't know what the right trade-off is. CC'ing :vladan and :avih to see if they have ideas.

Updated

2 years ago
Attachment #8708081 - Flags: review?(kvijayan) → review+
(Assignee)

Comment 6

2 years ago
(In reply to William Lachance (:wlach) from comment #5)
> I think we should define what an "acceptable" performance hit is from this
> patch. It's fine to trade off some loading time for jank reduction, but I
> don't know what the right trade-off is. CC'ing :vladan and :avih to see if
> they have ideas.

Let's land this patch for now. I'll keep this bug open to hear if the trade-off is right.
Keywords: checkin-needed, leave-open
Duplicate of this bug: 1240099
here is a general list of what regressed:
a compare chooser view:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=ab3c6ba63c89&newProject=mozilla-inbound&newRevision=e6f00eab30ae&framework=1

the ones of concern are:
ts_paint e10s: win7, win8
tp5o xres e10s: linux64
tp5o: all platforms
tp5o e10s: all platforms
tp5o private bytes: winxp, win7, linux64
tp5o private bytes e10s: winxp, win7, linux64
tp5o main_rss: all platforms
tp5o % processor time: winxp, win7
tcanvasmark e10s: win*
tcanvasmark: winxp
damp e10s: win7, win8, osx 10
damp: win7, win8, osx10, linux64

Comment 9

2 years ago
(In reply to Yury Delendik (:yury) from comment #3)
> Looks like increasing the limit helps. I checked Talos JS files: 60% of them
> less than 5K, and 33% less than 25K. So 25K is good number to reduce the
> affect of the Talos files.

Improving the talos numbers is not a goal in itself, so we shouldn't fine tune our tradeoff based on talos numbers alone.

Talos is a tool which gives us some indication when something changes, but our ultimate goal is to make it as best as we can for the users. Of course, if we think that the talos numbers also represent the average behavior for our users well enough, then tuning for talos is just a useful means to optimize it for our users.

Also, if there's a good reason for the recorded "regression", e.g. if we know this specific thing regressed but in order to make some other thing (which doesn't happen to be represented in talos) better, then it should be OK too.

Ultimately people decide which tradeoff is better, and the developers who own the code are usually the best ones to understand the factors of the tradeoff best.

So there's no strict threshold for what is acceptable and what isn't. I can help with that decision but I don't quite understand the tradeoff.

Could someone please put it into words such that it's easier to understand and therefore decide?
(Assignee)

Comment 10

2 years ago
Avi, the "regression" caused by moving JavaScript bytecode compilation off the main thread to the helper thread. In this particular case (bug 1236104), we moved larger scripts that are already pre-cached/loaded/ready-execute to compile off-thread. Depending on JS runtime state, "larger script" defined above 5K or 100K (see [1] for decision). The attached to this bug patch raises these numbers to 25K and 100K.

The main use case we looking at is https://docs.google.com/. As demonstrated at bug 1236104, we can remove almost entirely JavaScript bytecode compilation from main thread for larger scripts. Expected regression is memory usage needed for helper threads. And if we got wrong our heuristics to detect script size and JS runtime state, we might see some increase in CPU usage as well. 


  [1] https://dxr.mozilla.org/mozilla-central/source/js/src/jsapi.cpp#4104
(In reply to Yury Delendik (:yury) from comment #10)
> The main use case we looking at is https://docs.google.com/.

Script heavy google pages are definitely a good thing to improve and unblock, but seeing how most of the tests regressed meaningfully (including tp5 which is ~50 top sites), would you say this is the kind of tradeoff you had in mind, and did/are you considering it reasonable?

TBH, to me it's a bit hard to swallow, i.e. the price to pay here seems to me quite high...
(Assignee)

Comment 12

2 years ago
(In reply to Avi Halachmi (:avih) from comment #11)
> (In reply to Yury Delendik (:yury) from comment #10)
> would you say this is the kind of tradeoff you
> had in mind, and did/are you considering it reasonable?

Not really, I'll look at sohu.com and reuters.com regressions, probably it will shine some lights. But based on comment 5 results, the precision of the heuristic above matter.
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
This was pushed to inbound, but with bug 1236104 in the commit message.
Keywords: checkin-needed, leave-open
looking at this, many of the regressions are mostly fixed, a few don't look touched at all (like the damp ones).

there are 30 different regressions:
http://alertmanager.allizom.org:8080/alerts.html?rev=e6f00eab30ae7f2611a13d22b512fd7a39df09e7&showAll=1

It is really hard to tell exactly what percentage we are at- if we are done working on this bug for now, I can summarize where we are with each regression.
here is a list of the improvements:
http://alertmanager.allizom.org:8080/alerts.html?rev=9f02ddf848139a33ca5ed38197e04f0ed7e2cb5a&showAll=1&testIndex=0&platIndex=0

note, there are a couple regressions (~5%) with this fix:
damp: linux64-e10s, win7-e10s

while this landed with 3 other patches:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=b31310af26069026c2c4f86d834cf2280d4395af&tochange=9f02ddf848139a33ca5ed38197e04f0ed7e2cb5a

I did some try runs to bisect those and it fell on the fix landed for this:
https://treeherder.mozilla.org/#/jobs?repo=try&author=jmaher@mozilla.com&fromchange=7979375482ea&tochange=de788240cdf0

you can see a compare view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=b31310af2606&newProject=mozilla-inbound&newRevision=9f02ddf84813&framework=1&filter=damp

non e10s looks like improvements (a bit of noise, but some improvements) and e10s is worse.  The subtests for linux64 are here:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=mozilla-inbound&originalRevision=b31310af2606&newProject=mozilla-inbound&newRevision=9f02ddf84813&originalSignature=13dd9b8ab5c9d40a946e35a9dc085970caebce48&newSignature=13dd9b8ab5c9d40a946e35a9dc085970caebce48

the regressions seem to be in the .close methods, the performance and netmonitor suites.

:yury, any thoughts on this?  To learn more about damp, the wiki is here:
https://wiki.mozilla.org/Buildbot/Talos/Tests#DAMP

and you can ask in #devtools, :bgrins is a great resource as well as many others.
Flags: needinfo?(ydelendik)
Blocks: 1233209
(Assignee)

Comment 16

2 years ago
Joel, is it in addition to tp5o sohu.com regression or this one was fixed?
it doesn't look like sohu is fixed- but we would need to be specific about the comparison as other tests could influence results.  We had a ~50% regression on that single test, and a 3% improvement- but tp5o is about the aggregate.

Do you have thoughts on the damp regressions?
(Assignee)

Comment 18

2 years ago
(In reply to Joel Maher (:jmaher) from comment #17)
> it doesn't look like sohu is fixed- but we would need to be specific about
> the comparison as other tests could influence results.  We had a ~50%
> regression on that single test, and a 3% improvement- but tp5o is about the
> aggregate.
> 
> Do you have thoughts on the damp regressions?

I'm experimenting with amount of threads we allocate for off-thread parsing (currently it is set to 1). With bug 1236104 we add more off-thread tasks and with 25k-limit patch we reduce that a little. Trying to allocate more threads in https://treeherder.mozilla.org/#/jobs?repo=try&revision=6c878218b4fe
collecting more data- getting a solid set for your base on m-c:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=6764bc656c1d&newProject=try&newRevision=6c878218b4fe&framework=1

if you always push with the same base, then we can easily compare experiments.  We should also push to try with "--rebuild 5" at the end of the try syntax- I can do the retriggers when the build is finished.  A few hours we should have results to compare.
(Assignee)

Comment 20

2 years ago
Increased amount of threads with (and even without) bug 1236104 patches does not improve the situation.

Also tried removing bug 1236104 patches at (https://treeherder.mozilla.org/#/jobs?repo=try&revision=90dc47a5016c) -- improvement on DAMP e10s and few other tests 4-5% as expected.

Checked performance by removing tiny script processing off-thread (only 100K scripts are compiled) https://treeherder.mozilla.org/#/jobs?repo=try&revision=bf15fc68ffa5 (https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&originalRevision=914dae86b192&newProject=try&newRevision=bf15fc68ffa5&framework=1) -- cart opt e10s took 20% regression.
Flags: needinfo?(ydelendik)
so what are the next steps?  This is a large hit on our primary platform (windows) across a few tests.  Is this change needed?  Do we have time to investigate this more?  Do we need to consider accepting this as the new normal?
Flags: needinfo?(ydelendik)
(Assignee)

Comment 22

2 years ago
(In reply to Joel Maher (:jmaher) from comment #21)
> so what are the next steps?  This is a large hit on our primary platform
> (windows) across a few tests.  Is this change needed?  Do we have time to
> investigate this more?  Do we need to consider accepting this as the new
> normal?

I measured time required to schedule the off-thread parse tasks (from StartOffThreadParseScript call to the CompileScript call). While running talos, in majority cases this measured as 0ms. But in very seldom cases in might have up to 800ms values. The ParseTask might wait on GC, so there is a huge chance that this might be killing the performance. I'm looking into what does exactly cause this delay.

I reverted patches in bug 1236104, so we can close this one now.
Flags: needinfo?(ydelendik)
this is great to see the regressions fixed.  I have asked in bug 1236104 to get this reverted/backed out on mozilla-aurora (v.46)
(Assignee)

Comment 24

2 years ago
Requested aurora uplift. Closing this issue as resolved by back out.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
(Assignee)

Updated

2 years ago
Assignee: nobody → ydelendik
Version: unspecified → 46 Branch
You need to log in before you can comment on or make changes to this bug.