Memory spike in endurance tests introduced by landing of testTabbedBrowsing_OpenNewWindow

RESOLVED INVALID

Status

RESOLVED INVALID
Priority: P2
Severity: normal
Reported: 6 years ago
Modified: 6 years ago

People

(Reporter: vladmaniac, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [mozmill-endurance], URL)

(Reporter)

Description

6 years ago
Up to 369MB of resident memory is used when opening new windows; this is a real concern worth raising.

Report: http://mozmill-ci.blargon7.com/#/endurance/report/d87d47fd1034f072b9bece6ee68a2e06

This reproduces consistently regardless of platform.
Data from the report:

Arguments: Delay (s): 0.1 / Iterations: 10 / Entities: 10 / Restart: Yes
Number of tests: 	13
Number of checkpoints: 	2270 (175 per test)
Explicit memory (MB): 	Minimum: 44 / Maximum: 341 / Average: 98
Resident memory (MB): 	Minimum: 80 / Maximum: 369 / Average: 128
(Reporter)

Updated

6 years ago
Whiteboard: [mozmill-endurance]
CC'ing some peeps from the memshrink team who will definitely be interested in this.
Whiteboard: [mozmill-endurance] → [mozmill-endurance][MemShrink]
This isn't a regression, right?  It's just a new test?

What is this test doing, exactly?  How many new windows is it opening?
Note that memory usage seems to go down to its original amount (that's the vertical fall-off in the graph), which makes me think that we're just loading a ton of windows and observing that those windows use memory, which we reclaim once the windows are closed.
(In reply to Justin Lebar [:jlebar] from comment #2)
> This isn't a regression, right?  It's just a new test?
> 
> What is this test doing, exactly?  How many new windows is it opening?

That's correct, this is not a regression, but rather a large spike in the endurance testrun caused by the new test. It was felt the spike was high enough to investigate whether there is an issue with excessive memory usage when opening new windows. 10 windows are opened and then closed, and that is repeated 10 times.

(In reply to Justin Lebar [:jlebar] from comment #3)
> Note that memory usage seems to go down to its original amount (that's the
> vertical fall-off in the graph), which makes me think that we're just
> loading a ton of windows and observing that those windows use memory, which
> we reclaim once the windows are closed.

Also correct. After the open window test, Firefox is restarted and we expect memory usage to drop again. The question here is whether the memory used by opening 10 windows is within expectations.

Whenever the 10 windows are closed, you can see a slight drop in the memory usage, however memory does not really drop back to 'normal' levels until after the test completes and Firefox is restarted.
> Whenever the 10 windows are closed, you can see a slight drop in the memory usage, 
> however memory does not really drop back to 'normal' levels until after the test 
> completes and Firefox is restarted.

So the cliff in [1] is a browser restart?  Are there any other browser restarts in the graph?

I think it's pretty misleading to display benchmark results as a continuous line if they contain browser restarts.  The line implies that there's some relation between adjacent points, but if there's a browser restart in there, that completely breaks the relation, and you really have two separate graphs.

In general, this kind of test is hard to do right.  If you open and close windows very quickly, the GC never has a chance to run, and we get bogus results.  AWSY [2] inserts delays between pageloads to avoid this problem, and also measures memory usage at a number of points (immediately after the test, after waiting 30s, after forcing a GC).  You can read more about the exact methodology on the AWSY FAQ.

It's hard to say whether this is a legitimate platform bug without the kind of detailed breakdown that AWSY gives.

Is there some reason you think that we're using an unreasonable amount of memory here?  How do the results change, for example, if you do the same test but load the pages into new tabs instead of new windows?

[1] http://mozmill-ci.blargon7.com/#/endurance/report/d87d47fd1034f072b9bece6ee68a2e06
[2] http://areweslimyet.com
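The AWSY-style methodology described above (measure immediately after the test, again after a wait, and again after a forced GC) can be sketched roughly as follows. This is a hedged illustration, not actual AWSY code: `measure`, `forceGC`, and `sleep` are placeholder functions supplied by the caller.

```javascript
// Hedged sketch of an AWSY-style measurement schedule. The 30 s wait and
// the three checkpoints mirror the description above; all functions are
// illustrative placeholders, not real AWSY APIs.
async function takeCheckpoints(measure, forceGC, sleep) {
  const results = {};
  results.afterTest = measure();     // immediately after the test
  await sleep(30000);                // give the GC a chance to run
  results.afterWait = measure();     // after waiting 30 s
  forceGC();                         // explicitly trigger a collection
  results.afterForcedGC = measure(); // after forcing a GC
  return results;
}
```

Comparing the three checkpoints is what separates "memory legitimately in use" from "memory the GC simply has not reclaimed yet", which is the distinction this bug hinges on.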
(In reply to Justin Lebar [:jlebar] from comment #5)
> > Whenever the 10 windows are closed, you can see a slight drop in the memory usage, 
> > however memory does not really drop back to 'normal' levels until after the test 
> > completes and Firefox is restarted.
> 
> So the cliff in [1] is a browser restart?  Are there any other browser
> restarts in the graph?
> 
> I think it's pretty misleading to display benchmark results as a continuous
> line if they contain browser restarts.  The line implies that there's some
> relation between adjacent points, but if there's a browser restart in there,
> that completely breaks the relation, and you really have two separate graphs.

Yes, the endurance tests restart after every test. This wasn't the original design, but several people felt this would be beneficial. I believe one of the reasons was so that test results could be inspected individually, but as you point out this is difficult to do in a single chart.

> In general, this kind of test is hard to do right.  If you open and close
> windows very quickly, the GC never has a chance to run, and we get bogus
> results.  AWSY [2] inserts delays between pageloads to avoid this problem,
> and also measures memory usage at a number of points (immediately after the
> test, after waiting 30s, after forcing a GC).  You can read more about the
> exact methodology on the AWSY FAQ.

We have a configurable delay between each action, which defaults to 100ms. Vlad: could you experiment with increasing this to see if it allows Firefox to GC and prevents the continuous memory increase throughout the test?

> It's hard to say whether this is a legitimate platform bug without the kind
> of detailed breakdown that AWSY gives.
> 
> Is there some reason you think that we're using an unreasonable amount of
> memory here?

Not really, it's just a lot relative to the other tests. This is the only test that opens new windows though. It does mean that a spike in other tests will now be much less noticeable.

> How do the results change, for example, if you do the same
> test but load the pages into new tabs instead of new windows?

The breakdown of the open new tabs test appears immediately before the new window test in the report [1]. Repeated here:

testOpenNewTab
Average explicit memory: 84MB
Average resident memory: 111MB

testOpenAndCloseMultipleWindows
Average explicit memory: 227MB
Average resident memory: 256MB
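The configurable per-action delay mentioned above (defaulting to 100ms) could look roughly like this in the test harness. This is a sketch only; `runIterations`, `sleep`, and the constant name are illustrative, not actual Mozmill endurance-module API.

```javascript
// Hedged sketch of an endurance-test action loop with a configurable
// delay between actions; names and structure are illustrative only.
const DEFAULT_DELAY_MS = 100; // the 0.1 s default discussed above

function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}

async function runIterations(action, iterations, delayMs = DEFAULT_DELAY_MS) {
  for (let i = 0; i < iterations; i++) {
    action(i);
    // Pausing here gives the GC a chance to run before the next memory
    // checkpoint, which is the point of experimenting with larger delays.
    await sleep(delayMs);
  }
}
```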
> I believe one of the reasons was so that test results could be inspected individually, 
> but as you point out this is difficult to do in a single chart.

If it has to be on a single chart, maybe we could insert a break into the line between each test, so it's clear where the border is between two tests.

Although even if we did that, the chart would be pretty misleading because it would imply that the order of the tests was significant.  But if we restart the browser after each test, presumably the order of tests doesn't matter at all.
Assigning Vlad for further investigation as Dave mentioned.
Assignee: nobody → vlad.mozbugs
Status: NEW → ASSIGNED
Priority: -- → P2
(Reporter)

Comment 9

6 years ago
(In reply to Henrik Skupin (:whimboo) from comment #8)
> Assigning Vlad for further investigation as Dave mentioned.

Thanks Henrik.

I've increased the default delay to 2 sec but this is causing our test to disconnect. 
http://mozmill-crowd.blargon7.com/#/endurance/report/d87d47fd1034f072b9bece6ee6a2dbb1

Can anyone suggest an appropriate delay value larger than the default 100ms, so we can indeed wait for GC?
A disconnect should not happen. Please file this as a separate bug. It sounds important to fix, because we do not know which delay people are using.

Just play around and figure it out. But try with 500ms or 1s. Does that help?
I believe we use a delay of around 30s between page loads in AWSY.
That reminded me that we could even trigger a CC/GC for testing purposes on our own, as we do in MemChaser:

https://github.com/whimboo/memchaser/blob/master/extension/lib/garbage-collector.js#L90
Blocks: 711360
(Reporter)

Comment 13

6 years ago
I think I owe you guys some follow-up on this one, but comment 11 really frightens me. I don't think we can afford a 30s delay; that is a lot of time, IMO.
We should at least be able to test with larger delays to see if this prevents the memory spike. You mentioned a disconnect issue when using a delay of 2 seconds. Have you raised this? It should probably block this bug.
Also see my comment 12 on how to force a GC from within our code. I don't see a reason why we would have to experiment with larger delays.
We don't want to force a GC when we're testing Firefox in this way. If Firefox fails to GC appropriately then that's a regression, no?
(In reply to Dave Hunt (:davehunt) from comment #16)
> We don't want to force a GC when we're testing Firefox in this way. If
> Firefox fails to GC appropriately then that's a regression, no?

This was meant for testing only!
Thanks for clarifying that Henrik.
(In reply to Dave Hunt (:davehunt) from comment #16)
> We don't want to force a GC when we're testing Firefox in this way. If
> Firefox fails to GC appropriately then that's a regression, no?

It's not a regression unless you can show that it worked properly at some time before now.  It may be a bug, but even that isn't necessarily clear: If Firefox fails to GC when put through a completely un-human testcase -- where we open and close many tabs within just a few seconds -- that doesn't worry me too much.

Like I said, we discovered that we got noisy, strange, unrepresentative results out of AWSY before we added in pauses between pages.  If you guys aren't able to make this test representative of human behavior, and if we know that this kind of test has issues with noise and GC timing, then we should simply drop the test.
We are certainly able to add a delay. Vlad: Can you please take a look at this, or unassign the issue so we can assign it to someone else?
(Reporter)

Comment 21

6 years ago
(In reply to Dave Hunt (:davehunt) from comment #20)
> We are certainly able to add a delay. Vlad: Can you please take a look at
> this, or unassign the issue so we can assign it to someone else?

Dave - it seems like the disconnect I was seeing was due to heavy loading of one of the Mac minis in house. I am running the tests all over again starting now.
With the default 0.1 second delay I see the following averages:
Explicit: 271MB
Resident: 788MB
Report: http://mozmill-crowd.blargon7.com/#/endurance/report/671677a5d9d5ca25f3cf5ae1c40ccd56

With a 1 second delay these drop to:
Explicit: 175MB
Resident: 552MB
Report: http://mozmill-crowd.blargon7.com/#/endurance/report/671677a5d9d5ca25f3cf5ae1c40cd3f2

It's still climbing... I will try adding a longer delay.
With a 10 second delay this has dropped to:
Explicit: 133MB
Resident: 326MB
Report: http://mozmill-crowd.blargon7.com/#/endurance/report/671677a5d9d5ca25f3cf5ae1c40cfa46

So I would conclude that there is no issue here with Firefox, but that we do perhaps need to tweak our endurance tests. Perhaps we can set a per-test minimum delay, or we should increase the delay across all tests.
Whiteboard: [mozmill-endurance][MemShrink] → [mozmill-endurance]
Or we always add a pre-defined number of milliseconds to the user-defined delay, so we always have a minimum of 1s or so. It would be good to know what 5s, or values in between, buy us.
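The two policies under discussion (enforcing a per-test minimum delay, or always adding a fixed offset on top of the user-defined delay) could be sketched as follows. The constants are assumed values for illustration, not anything the harness actually uses.

```javascript
// Hedged sketch of the two delay policies discussed above; the constant
// values and function names are illustrative only.
const MIN_DELAY_MS = 1000;  // assumed per-test minimum (~1 s)
const BASE_DELAY_MS = 1000; // assumed fixed offset added to the user delay

// Option A: enforce a floor on the configured delay.
function delayWithMinimum(userDelayMs) {
  return Math.max(userDelayMs, MIN_DELAY_MS);
}

// Option B: always add a pre-defined offset to the configured delay.
function delayWithOffset(userDelayMs) {
  return userDelayMs + BASE_DELAY_MS;
}
```

Option A preserves large user-chosen delays untouched, while Option B shifts every configuration uniformly; either would guarantee the GC gets some breathing room regardless of what delay people configure.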
Closing as invalid. I have raised bug 788531 to decide what to do with the endurance test delay.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → INVALID
(Reporter)

Updated

6 years ago
Assignee: vlad.mozbugs → nobody