Closed Bug 1233214 Opened 9 years ago Closed 8 years ago

2-5% Linux 64/Win* tp5o Main RSS on Mozilla-Inbound on Dec 16, 2015 from push 3e08a6a2299b

Categories

(Firefox Build System :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(firefox47 wontfix, firefox48 affected, firefox49 affected, firefox50 affected)

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Talos has detected a Firefox performance regression from your commit 3e08a6a2299b242ac920c928ac57965f2be9f559 in bug 1231379.  We need you to address this regression.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=3e08a6a2299b242ac920c928ac57965f2be9f559&showAll=1

On the page above you can see a Talos alert for each affected platform, as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#RSS_.28Resident_Set_Size.29

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64,win64,win32 -u none -t tp5o  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tp5o

Making a decision:
Since you are the patch author, we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch will be backed out! ***

Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
This is actually a net perf win, but I wanted to document the Main RSS regression we see.

Here is the compare view:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=84c8783b8518&newProject=mozilla-inbound&newRevision=3e08a6a2299b&framework=1&showOnlyConfident=1

It clearly shows the Main RSS regressions and the ts_paint wins!

Based on the type of patch this is, I assume we were expecting this. Please confirm!
Flags: needinfo?(catlee)
I wasn't expecting it, but it definitely makes sense. omni.ja itself is larger, so it would consume more memory, assuming Firefox is keeping it in memory.
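
For what it's worth, one way to sanity-check that hypothesis is to look at whether the entries in omni.ja are now stored rather than deflated, and how much bigger the file got. A rough Python sketch of my own (not part of the patch); the path is hypothetical, and note that omni.ja uses Mozilla's optimized jar layout, so standard zip readers may warn about extra bytes:

# Sketch: summarize compression of omni.ja entries.
import zipfile

OMNI_JA = "/path/to/firefox/browser/omni.ja"  # hypothetical path, adjust for your build

with zipfile.ZipFile(OMNI_JA) as jar:
    infos = jar.infolist()
    stored = sum(1 for i in infos if i.compress_type == zipfile.ZIP_STORED)
    on_disk = sum(i.compress_size for i in infos)
    unpacked = sum(i.file_size for i in infos)
    print("entries: %d (stored uncompressed: %d)" % (len(infos), stored))
    print("bytes in jar: %d, bytes when unpacked: %d" % (on_disk, unpacked))

If most entries are ZIP_STORED and Firefox maps or reads the whole file, extra RSS roughly tracking the size difference would line up with what talos is reporting.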

The bigger question is, is this an acceptable increase?
Flags: needinfo?(catlee)
Heh, I am not the person to drive that. This is explainable. I assume this is desktop only?

:vladan, any thoughts on how to determine if this is acceptable to close as wontfix, or should we push back on this?
Flags: needinfo?(vladan.bugzilla)
The corollary is that it makes the installed directory larger too.
This is an experiment running on the Nightly channel only. We can't weigh all the pros & cons until we've analyzed the startup data from Telemetry. It's fine to leave it on Nightly for now.
Flags: needinfo?(vladan.bugzilla)
Great, let's leave it alone.
Do we plan to leave this on for Nightly and then uplift to Aurora? I only ask because when we look at regressions that make it to Aurora, this could make Main RSS regressions harder to track. Luckily we don't have too many Main RSS regressions, so this might be fine.
The change is behind a 'NIGHTLY_BUILD' guard, which means it should be restricted to nightly only until we decide to allow it to ride the trains or to back it out.
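
To illustrate the mechanism (this is a hypothetical, moz.build-style sketch, not the actual change from bug 1231379, and the define name is made up):

# Hypothetical illustration only; the real patch may express this differently.
# The idea: only Nightly builds package omni.ja without per-entry compression.
if CONFIG["NIGHTLY_BUILD"]:
    DEFINES["MOZ_OMNIJAR_UNCOMPRESSED"] = True  # made-up flag name

Because such a guard is evaluated at build time, the behavior switches off automatically once the code reaches Aurora, unless we deliberately let it ride the trains.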
Looking at telemetry, I don't see any impact on first paint times, which is what we were primarily concerned about: http://mzl.la/1mSFSPG
Avi, can you please weigh in about the trade-offs here?
Flags: needinfo?(avihpit)
I'll first summarize what we know (correct me if I'm wrong):

1. Bug 1231379 disables omni.ja compression, with the main goal of improving the overall compression ratio of the shipped packages and hence reducing update/download sizes, which improves speed and network load/cost. I'm assuming this goal was achieved.

2. Turns out talos reports this also speeds up ts_paint (first paint, measured once) somewhat: ~10ms with e10s and ~50ms (~4%) without e10s. Not negligible, but not much to write home about either.

3. Turns out talos also detects that Firefox now uses roughly 8 MB (~4%) more memory.

4. Telemetry data doesn't confirm the reduced first paint durations.

5. We should keep in mind that the reported memory increase, as a percentage, is measured around startup, and the absolute overhead likely doesn't grow throughout the session, so these ~8 MB of additional memory will be a lower percentage once the session is in "full throttle" with several tabs etc. (rough arithmetic after this list).

6. While not negligible, the wins here (~10% faster download/update) and possibly a few ms faster startup are not experienced during the actual session. Personally, I consider those much less important to users than performance over the duration of the session.
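
Back-of-the-envelope arithmetic for point 5 (only the ~8 MB delta comes from the talos numbers above; the session sizes are assumptions for illustration):

# The ~200 MB startup figure is implied by 8 MB being ~4%;
# the 800 MB multi-tab figure is purely an assumed example.
overhead_mb = 8.0
print(overhead_mb / 200.0)  # ~0.04 -> ~4% of RSS around startup
print(overhead_mb / 800.0)  # ~0.01 -> ~1% of RSS in a loaded session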


Let's set aside the possible ts_paint improvement, since we can't really confirm it right now.

So we have a known win in download/update sizes, and it costs us an extra ~8 MB at runtime. These are the main things we should weigh, IMO.

Personally, I think the cost is higher than I'd be comfortable with. But OTOH I don't know, for instance, how much it saves Mozilla in CDN costs etc. As a user, I don't care about and won't notice a few ms improvement in startup time or a few seconds saved during updates.

IMO, we should first confirm we understand how not compressing omni.ja ends up using more memory at runtime.

Once we do, we should consider whether there is some low-hanging fruit there. For instance, maybe the memory is dropped later at some stage and there's no need for any fix? Maybe it's cached inside Firefox when we should/can leave it to the OS to manage the FS cache? Etc.
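
One rough way to start answering that on Linux (my own sketch for investigation, not an existing tool) is to look at /proc/<pid>/smaps and see how much of the omni.ja mappings is actually resident; if those are clean, file-backed pages, the OS can drop them under pressure and the RSS number overstates the real cost:

# Sketch: sum the resident kB of every omni.ja mapping in a process.
import re, sys

def omnija_rss_kb(pid):
    rss = {}
    current = None
    with open("/proc/%s/smaps" % pid) as f:
        for line in f:
            if "omni.ja" in line:
                current = line.split()[-1]            # path of the mapping
                rss.setdefault(current, 0)
            elif re.match(r"^[0-9a-f]+-[0-9a-f]+ ", line):
                current = None                        # a new mapping that isn't omni.ja
            elif current and line.startswith("Rss:"):
                rss[current] += int(line.split()[1])  # value is in kB
    return rss

if __name__ == "__main__":
    for path, kb in omnija_rss_kb(sys.argv[1]).items():
        print("%s: %d kB resident" % (path, kb))

Run it against a running build, e.g. "python omnija_rss.py <firefox pid>" (the script name and invocation are just examples).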

Lastly, we should probably hear opinions from a few people other than myself about how to weigh these trade-offs. It's not a black-and-white thing, so a few more opinions here would help, IMO.
Flags: needinfo?(avihpit)
It sounds like the trade-offs involved may not be worth it. Chris, do you want to investigate this further and see if there are mitigations we can take to make this more palatable? Or did you want second opinions from anyone? Or did you want to just go ahead and revert?
Flags: needinfo?(catlee)
I'd like to understand the impact on memory usage in more depth, e.g. is this just at startup, or does it persist over time?

I estimate the cost savings to be > $100k / yr, which is significant (IMO).
Flags: needinfo?(catlee)
If this is a tp5o regression, the test records memory over the duration of the entire run and takes the median value. That means a 5+ minute browser session yields a couple hundred memory data points. Given that definition, I think this is not just startup memory but overall browser RSS.
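
As a rough standalone illustration of what that metric measures (this is not the actual talos code; psutil and the interval/duration are my own assumptions), you can sample a process's RSS for a few minutes and take the median, which is why a delta that persists for the whole session shows up here while a startup-only blip mostly would not:

# Sketch: median RSS of a process over a tp5o-length window.
import statistics, sys, time
import psutil  # third-party, assumed available

def median_rss_mb(pid, duration_s=300, interval_s=2):
    proc = psutil.Process(pid)
    samples = []
    deadline = time.time() + duration_s
    while time.time() < deadline:
        samples.append(proc.memory_info().rss)
        time.sleep(interval_s)
    return statistics.median(samples) / (1024.0 * 1024.0)

if __name__ == "__main__":
    print("median RSS: %.1f MB" % median_rss_mb(int(sys.argv[1])))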

I am curious how this could save money; it's not directly related to this bug, but it would help me understand more of the impact of this patch!
Is this a wontfix at this point?
Version: unspecified → Trunk
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Component: Build Config → General
Product: Firefox → Firefox Build System