Closed Bug 762215 Opened 8 years ago Closed 7 years ago

[cpg] Very large memory regression between Firefox 14 and 15 in endurance test runs

Categories

(Core :: JavaScript Engine, defect)

15 Branch
defect
Not set

RESOLVED DUPLICATE of bug 754267
Tracking Status
firefox14 --- unaffected
firefox15 - wontfix
firefox16 - ---

People

(Reporter: ashughes, Unassigned)

Details

(Keywords: regression, Whiteboard: [js:inv:p2])

We're seeing a very large memory regression between Firefox 14 and 15. I noticed it today when comparing the last Firefox 14 Aurora endurance testruns to the first Firefox 15 Aurora testruns.

The regression is lowest on Mac 10.7 (21% increase) and highest on Vista (61% increase). You can see an overview of the results here:
https://wiki.mozilla.org/Releases/Firefox_15/Test_Plan#Endurance_Tests

Looking back at the Firefox 15 Nightly testruns, this appears to have regressed somewhere between May 4th and May 7th:
http://mozmill-ci.blargon7.com/#/endurance/charts?branch=15.0&platform=All&from=2012-05-03&to=2012-05-10
Just testing Nightly first-run footprint locally on win32:

2012-05-04: ~60MB
2012-05-07: ~60MB

I'll now try to do a local endurance testrun on both builds to compare.
CPG landed on 5/4, so that seems like the most likely culprit.
(In reply to Andrew McCreight [:mccr8] from comment #2)
> CPG landed on 5/4, so that seems like the most likely culprit.

Would that be bug 650353? If so, should we morph this bug into a Firefox bug (ie. not a tests bug)?
(In reply to Anthony Hughes, Mozilla QA (irc: ashughes) from comment #3)
> Would that be bug 650353?
Yes, that's the bug.
> If so, should we morph this bug into a Firefox bug (ie. not a tests bug)?
That makes sense.  It is certainly plausible that it is a browser thing.
Assignee: nobody → general
Component: Mozmill Tests → JavaScript Engine
Product: Mozilla QA → Core
QA Contact: mozmill-tests → general
Blocks: 605353
Summary: Very large memory regression between Firefox 14 and 15 → [cpg] Very large memory regression between Firefox 14 and 15
Blocks: cpg
No longer blocks: 605353
Yeah, this is almost certainly CPG.  We saw a 42% regression on Trace Malloc MaxHeap due to it.
Who's going to follow up on this from engineering? I'd like to get this assigned since it's now in the tracked queue. Luke/Bobby?

Thanks!
Keywords: regression
Luke and I are guessing that our testing is at fault here, but this isn't our area of expertise. Also, we're both swamped. Can we get someone from the memshrink team to own this? Maybe njn or johns?
We saw a big 40%+ spike in an old version of the areweslimyet tests, though I believe we concluded that was due to GC scheduling changes rather than CPG.

The issue is that the GC no longer runs regularly during rapid page opening, resulting in a large max resident size and a good deal more fragmentation even after the GC has been allowed to run. The endurance tests could be hitting this unrealistic edge case and showing a regression that doesn't arise in normal usage. The current AWSY test has delays between page loads, specifically to simulate a more realistic usage pattern, so it would not trigger the same GC issues.
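
To make the argument concrete, here is a toy model in Python. It is only a sketch and not how Firefox actually schedules GC: it assumes each page load leaves one unit of garbage behind, and that a full GC runs only when the pause before the next load reaches some idle threshold. The function name, the parameters, and the 1000ms threshold are all made up for illustration.

```python
def peak_garbage_pages(num_pages, delay_ms, gc_idle_threshold_ms=1000):
    """Toy model of idle-triggered GC during back-to-back page loads.

    Assumptions (hypothetical, not Firefox's real heuristics):
    - every page load leaves one unit of garbage behind
    - a GC that frees all garbage runs only if the pause after a load
      is at least gc_idle_threshold_ms
    Returns the largest amount of garbage outstanding at any point.
    """
    garbage = 0
    peak = 0
    for _ in range(num_pages):
        garbage += 1                      # the page just loaded adds garbage
        peak = max(peak, garbage)
        if delay_ms >= gc_idle_threshold_ms:
            garbage = 0                   # pause was long enough for a GC
    return peak


# Rapid loading never gives the GC a chance, so garbage piles up.
rapid = peak_garbage_pages(num_pages=50, delay_ms=100)
# Paced loading lets the GC run after every page.
paced = peak_garbage_pages(num_pages=50, delay_ms=2000)
```

Under these assumptions the rapid run peaks at one unit of garbage per page loaded, while the paced run never holds more than one, which is the shape of the difference described above between the endurance tests and the current AWSY test.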
Hah, I just mid-aired with John, but I guessed the same thing. It is also possible that our GC scheduling is causing compartments that immediately become garbage to not be collected for long enough for them to pile up and increase max usage. Note that we are doing a lot more per-compartment GC now, so it's possible that before CPG, continued activity in the same compartment (i.e., the same domain, if all the pages we load are in the same domain) caused us to keep GC'ing the same compartment, whereas now we keep jumping to new compartments.

Perhaps someone who understands browser GC scheduling could log when GCs are getting scheduled and for what compartment to see if we are getting this big pile up?
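
If someone does collect such a log, a small script could summarize it. The sketch below assumes a made-up log format of one GC per line, "timestamp_ms compartment_name"; the browser does not emit this format as far as this bug shows, so both the format and the function name are hypothetical. It counts GCs per compartment and reports the longest gap between consecutive GCs, which is where a pile-up would show.

```python
from collections import Counter


def summarize_gc_log(lines):
    """Summarize a hypothetical GC log.

    Each non-empty line is assumed to look like:
        "<timestamp_ms> <compartment_name>"
    Returns (gc_count_per_compartment, longest_gap_ms_between_gcs).
    """
    counts = Counter()
    timestamps = []
    for line in lines:
        line = line.strip()
        if not line:
            continue                      # skip blank lines
        ts, compartment = line.split(maxsplit=1)
        counts[compartment] += 1
        timestamps.append(int(ts))

    timestamps.sort()
    gaps = [later - earlier for earlier, later in zip(timestamps, timestamps[1:])]
    longest_gap = max(gaps) if gaps else 0
    return counts, longest_gap


# Made-up sample data, only to show the call shape.
sample = [
    "1000 example.com",
    "1500 example.com",
    "9000 example.org",
]
counts, longest_gap = summarize_gc_log(sample)
```

Many compartments with a single GC each, combined with long gaps, would support the theory that we keep jumping to new compartments instead of collecting the ones that just became garbage.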
I forgot about the previous IGC memory blowup.  It sounds like this (and the Talos test) could just be the same thing we saw on AWSY, where we are bad on memory usage when you rapidly open a billion tabs in a row.  jlebar doesn't see a regression this huge on telemetry, so maybe it is the test after all that needs to be tweaked.
Severity: critical → normal
Summary: [cpg] Very large memory regression between Firefox 14 and 15 → [cpg] Very large memory regression between Firefox 14 and 15 in endurance test runs
Whiteboard: [MemShrink]
Version: unspecified → 15 Branch
Dave Hunt, what do you think could be causing this in the Endurance tests?
Whiteboard: [js:inv:p2]
Isn't this a duplicate of bug 754267? If so, we decided to use the new baseline after CPG landed. Otherwise, it is possible to increase the delay between each iteration/entity in endurance tests, which defaults to 100ms.
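
For anyone unfamiliar with how that delay fits in, here is a rough Python sketch of an endurance-style loop. It is not the Mozmill endurance API (those tests are written in JavaScript); the function and parameter names are hypothetical, and only the 100ms default comes from this bug.

```python
import time


def run_endurance(action, iterations, entities, delay_s=0.1, sleep=time.sleep):
    """Run action(iteration, entity) for every iteration/entity pair,
    pausing delay_s seconds after each one.

    delay_s defaults to 0.1 (100ms). The sleep argument is injectable so
    the loop can be exercised without actually waiting.
    """
    for iteration in range(iterations):
        for entity in range(entities):
            action(iteration, entity)
            sleep(delay_s)                # give the browser time to settle


# Example: record what would run, using a fake sleep that only logs.
calls = []
pauses = []
run_endurance(lambda i, e: calls.append((i, e)), iterations=2, entities=3,
              sleep=pauses.append)
```

Raising delay_s is the knob being suggested: a longer pause after each entity gives the GC more of a chance to run between page loads.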
(In reply to Dave Hunt (:davehunt) from comment #12)
> Isn't this a duplicate of bug 754267? If so, we decided to use the new
> baseline after CPG landed. Otherwise, it is possible to increase the delay
> between each iteration/entity in endurance tests, which defaults to 100ms.

Seems right.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 754267
[Triage Comment]
Removing tracking since this is a duped bug.