18.75 - 25.73% tp6_facebook (windows10-64) regression on push 467d285e001c (Tue Jan 23 2018)

RESOLVED FIXED in Firefox 60

Status

Type: defect
Priority: P3
Severity: normal
Status: RESOLVED FIXED
Reported: 2 years ago
Last modified: 2 years ago

People

(Reporter: igoldan, Assigned: jonco)

Tracking

(Blocks 1 bug, {perf, regression, talos-regression})

Version: unspecified
Target Milestone: mozilla60
Hardware: Unspecified
OS: Windows 10
Points: ---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox58 unaffected, firefox59 unaffected, firefox60 fixed)

Details

Talos has detected a Firefox performance regression from push:

https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=95b09e6e9613435bd1934a274cbeeefe23e3bab1&tochange=467d285e001c568039af5e3e067cff11c7ac43cf

You are the author of one of the patches included in that push, and we need your help to address this regression.

Regressions:

 26%  tp6_facebook windows10-64 pgo e10s           283.31 -> 356.21
 19%  tp6_facebook windows10-64 pgo 1_thread e10s  276.38 -> 328.21


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=11260

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Component: Untriaged → JavaScript Engine
Product: Firefox → Core
:jonco This regression is pretty visible on PGO builds. On OPT builds it's hard to notice, though a very small increase still seems to be there.

It's possible this is caused by the PGO build process itself. If that's the case, there's a chance this regression will go away on its own.

Please investigate bug 1431353 and let us know whether any performance penalties are expected.
Flags: needinfo?(jcoppeard)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #1)
This was expected to improve performance, if anything, by using more JS helper threads for parsing.

It's possible, however, that this is affecting the scheduling of other tasks run on JS helper threads.  It might be that parsing is now blocking GC tasks and holding things up.  Maybe we should allow higher-priority tasks to interrupt lower-priority ones.
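For context, the kind of priority-aware dispatch being suggested could look roughly like this. This is a hypothetical sketch, not SpiderMonkey's actual helper-thread code; the task struct, priority values, and function names are invented for illustration:

```cpp
#include <cassert>
#include <queue>
#include <vector>

// Hypothetical helper-thread task: a higher priority value means more urgent
// (e.g. GC tasks would outrank parse tasks).
struct HelperTask {
    int priority;
    int id;
};

struct ByPriority {
    bool operator()(const HelperTask& a, const HelperTask& b) const {
        return a.priority < b.priority;  // makes the queue a max-heap on priority
    }
};

using TaskQueue =
    std::priority_queue<HelperTask, std::vector<HelperTask>, ByPriority>;

// A helper thread always picks the highest-priority pending task, so a
// queued GC task is never stuck behind a backlog of parse tasks.
HelperTask pickNextTask(TaskQueue& pending) {
    HelperTask next = pending.top();
    pending.pop();
    return next;
}
```

This only reorders dispatch; letting a high-priority task actually interrupt an already-running lower-priority one (as suggested above) would additionally need a yield point in the running task.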
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
I've reproduced this on try where I get a 300 -> 342 change for Windows 10 PGO builds.

I've tried a few approaches to mitigating this but haven't succeeded so far.  I'm continuing to investigate.
(In reply to Jon Coppeard (:jonco) from comment #3)
I didn't realise that talos ignores the first 5 results and then takes the median of the rest.  My test results actually show 259 -> 365, which is closer to the reported numbers.
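The summarisation behaviour described above can be sketched as follows. This mirrors the description in this comment (drop the first five replicates as warm-up, then take the median of the rest); it is not the actual talos source, and the function name is invented:

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Talos-style summary as described above: ignore the first five replicates,
// then take the median of the remainder. If five or fewer replicates exist,
// this sketch just takes the median of all of them.
double talosSummary(std::vector<double> replicates) {
    if (replicates.size() > 5)
        replicates.erase(replicates.begin(), replicates.begin() + 5);
    std::sort(replicates.begin(), replicates.end());
    size_t n = replicates.size();
    return n % 2 ? replicates[n / 2]
                 : (replicates[n / 2 - 1] + replicates[n / 2]) / 2.0;
}
```

This explains why a raw average of all replicates (which includes slow warm-up runs) can differ noticeably from the reported score.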

The source of the problem is contention on the 'exclusive access lock' used to protect the atoms table.  Parsing creates a lot of atoms, and even though we have a per-zone cache (hit rate 70% for this test) concurrent parse tasks spend a lot of time waiting on this lock.

The real fix for this is to allow concurrent modification of the atoms table, possibly by splitting it into many sub-tables each with their own lock (like Java's ConcurrentHashMap).  This would also have the side effect of enabling parallel sweeping of this table.  This is quite a large amount of work however.

There are several smaller things we can do to improve the current situation:
 - reduce the concurrency for off-thread parsing to a more acceptable level
 - reduce the scope where the lock is held while atomizing
 - split the exclusive access lock into several smaller locks

I'll experiment with these first.
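The ConcurrentHashMap-style split described above could be sketched like this: one logical table divided into N sub-tables, each guarded by its own lock, so concurrent parse tasks atomizing different strings rarely contend. All names here are invented for illustration; this is not SpiderMonkey's atoms-table code:

```cpp
#include <cassert>
#include <functional>
#include <mutex>
#include <string>
#include <unordered_set>

// Striped (sharded) atom table: each shard has its own lock, replacing the
// single global 'exclusive access lock'. Two threads only contend when their
// strings hash to the same shard.
template <size_t NumShards = 16>
class StripedAtomTable {
    struct Shard {
        std::mutex lock;
        std::unordered_set<std::string> atoms;
    };
    Shard shards_[NumShards];

    Shard& shardFor(const std::string& s) {
        return shards_[std::hash<std::string>{}(s) % NumShards];
    }

  public:
    // Returns true if the string was newly atomized, false if it was
    // already present.
    bool atomize(const std::string& s) {
        Shard& shard = shardFor(s);
        std::lock_guard<std::mutex> guard(shard.lock);  // per-shard, not global
        return shard.atoms.insert(s).second;
    }
};
```

A per-shard layout would also enable the parallel sweeping mentioned above, since each shard can be swept independently under its own lock.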
The patch in bug 1432794 shows a large improvement on try.  My theory is that although it doesn't remove the contention, it stops it from affecting the main thread, by removing the prototype/constructor initialisation that happened on the main thread and atomized a load of strings.

I'm going to land that patch and see if that improves things.
(In reply to Jon Coppeard (:jonco) from comment #5)
This seems to me to have been fixed by the patch in bug 1432794.  What do you think?
Flags: needinfo?(igoldan)
(In reply to Jon Coppeard (:jonco) from comment #6)
> (In reply to Jon Coppeard (:jonco) from comment #5)
> This seems to me to have been fixed by the patch in bug 1432794.  What do
> you think?

I agree. The regressions on PGO builds are definitely fixed.
Still, there's something wrong with the OPT builds from [1] and [2]: the tests became noisier, as the min-max value range increased. Do you know why that might happen?
I will do some retriggers on the OPT builds.

[1] https://treeherder.mozilla.org/perf.html#/graphs?timerange=5184000&series=mozilla-inbound,1535542,1,1&series=mozilla-inbound,1535578,1,1
[2] https://treeherder.mozilla.org/perf.html#/graphs?timerange=7776000&series=mozilla-inbound,1552299,1,1&series=mozilla-inbound,1550394,1,1
Flags: needinfo?(igoldan)
(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #7)
Interesting, I hadn't seen that.  I don't know why that would be.
Priority: -- → P3
:jmaher I want to close this as FIXED, unless you're worried that the OPT tests are now too noisy.
Flags: needinfo?(jmaher)
this sounds good to me.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(jmaher)
Resolution: --- → FIXED
Depends on: 1432794
Target Milestone: --- → mozilla60