2.04 - 2.71% tp5o Main_RSS / tp5o Private Bytes (windows7-32, windowsxp) regression on push 6861f02dadfa1863af35086da6133ebc1ceaf845 (Thu Oct 20 2016)

RESOLVED FIXED in Firefox 52

Status

()

defect
P2
normal
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: ashiue, Assigned: heycam)

Tracking

({perf, regression, talos-regression})

52 Branch
mozilla52
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox49 unaffected, firefox50 unaffected, firefox51 unaffected, firefox52 fixed)

Details

Attachments

(1 attachment)

Talos has detected a Firefox performance regression from push 6861f02dadfa1863af35086da6133ebc1ceaf845. As author of one of the patches included in that push, we need your help to address this regression.

Regressions:

  3%  tp5o Main_RSS windows7-32 opt          174067772.61 -> 178784009.84
  2%  tp5o Main_RSS windows7-32 pgo          172722669.81 -> 176792432.35
  2%  tp5o Main_RSS windowsxp pgo            215522455.67 -> 220149350.72
  2%  tp5o Private Bytes windowsxp pgo       199309986.19 -> 203401948.8
  2%  tp5o Private Bytes windows7-32 opt     207135699.25 -> 211356189.02


You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=3748

On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests

For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running

*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***

Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
After doing some retriggers, this issue might be caused by some of the following patches:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=99bf3401b43f4f621c91facce91f729688941317&tochange=6861f02dadfa1863af35086da6133ebc1ceaf845

Hi Cameron, as you are the patch author, can you take a look at this and determine what is the root cause? Thanks!
Blocks: 1302124, 1288302
Flags: needinfo?(cam)
It is pretty likely a regression from bug 1288302 because that adds an extra level for all computed image values, which takes about 5 words or so. Probably we can use some technique to reduce that overhead... Either that or we should remove this extra level and store things in imgRequestProxy directly?
As you say, an nsStyleImageRequest takes 5 words.  But if that is the only overhead, then we would need to have ~200,000 nsStyleImageRequest objects alive at once to make up the difference in memory usage (4 MB) mentioned in comment 0.  That seems very high.  Could we be holding on to other objects too long?
Flags: needinfo?(cam)
Maybe you can try releasing the urls etc once resolved?

I guess the object itself could take more than 5 words from the allocator, though.
It looks like the imgRequestProxy objects are leaking.  I don't know why our existing object leak logging didn't pick this up. :(
Assignee: nobody → cam
Status: NEW → ASSIGNED
Component: Untriaged → CSS Parsing and Computation
Product: Firefox → Core
Priority: -- → P2
The objects aren't leaking; it's just that a bunch of them are only freed during shutdown.

One thing I didn't realize while working on bug 1288302 is that imgRequestProxy objects can provide a thread-safe object for URL comparisons (an ImageURL), which means we don't need to hold on to a css::ImageValue object.  Changing to compare the ImageURL rather than holding on to the ImageValue seems to solve the talos regression, although I am not entirely sure why.  (It might be something to do with ending up deleting imgRequestProxy objects later than we currently do.)

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=49a5fc1d1d13&newProject=try&newRevision=8bf4b7be86cf&framework=1&showOnlyImportant=0
Of course, we only have an ImageURL once we have an imgRequestProxy, so we still need to store something different to handle equality comparisons done before we resolve the nsStyleImageRequest.  So this patch grabs out the base nsIURI and the nsStringBuffer for the relative URL from the css::ImageValue, and stores them rather than the ImageValue.
Comment on attachment 8805441 [details]
Bug 1311921 - Store base and relative URIs explicitly in nsStyleImageRequests for comparison purposes, rather than use css::ImageValues.

https://reviewboard.mozilla.org/r/89190/#review88604
Attachment #8805441 - Flags: review?(bobbyholley) → review+
Pushed by cmccormack@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/288d92c34790
Store base and relative URIs explicitly in nsStyleImageRequests for comparison purposes, rather than use css::ImageValues. r=bholley
https://hg.mozilla.org/mozilla-central/rev/288d92c34790
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
unfortunately none of the tests affected here changed.  Possibly we just have to live with this?
Reopening this, given the regression is not fixed.

Even if we have to live with this, we may want to backout the patch of this bug given it doesn't seem to be effective.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
FWIW, https://areweslimyet.com/ also shows a huge regression, if you look at "Images: After TP5 [+30s]" on the last graph. It jumped from 23 MB to 65 MB.

Pushlog is https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=313dee79ed92&tochange=6861f02dadfa
Flags: needinfo?(cam)
OK, and the last data point on that graph seems to include the patch from this bug.  I'm going to back out this and bug 1288302.
Flags: needinfo?(cam)
:heycam, I wanted to followup on this bug, not sure if we are going to do more work here or accept these perf regressions as wontfix.
Flags: needinfo?(cam)
The original changes in bug 1288302 got backed out, then re-landed with a fix.  The attempted fix in this bug also got backed out, though I guess the merge to mozilla-central didn't get mentioned here automatically for some reason:

https://hg.mozilla.org/mozilla-central/rev/625e625af80e
Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Flags: needinfo?(cam)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.