Closed Bug 1177786 Opened 5 years ago Closed 3 months ago

Reduce the peak number of windows in browser chrome and devtools mochitests

Categories

(Testing :: Mochitest, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: RyanVM, Assigned: mccr8)

References

Details

Attachments

(1 file)

Today in our intermittent orange discussion, we discussed forcing a GC/CC between every test run in order to possibly improve test isolation and reduce timeouts caused by infrequent garbage collection.
Reminds me of bug 949607.
For the record, this came up in conversation as we've noticed mochitest-bc runs with 500+ active DOMWINDOWs even with run-by-dir enabled. Strongly suspect this is a contributing factor to the frequent timeouts we see on the single-core AWS instances we use.
We already poke the collectors and do some work on them in between every test:

http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/tests/SimpleTest/iframe-between-tests.html?force=1#14

kPokesBetweenExpensiveCollectorTriggers could be decreased so we collect more, though this will cause tests to be slower.

(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #2)
> For the record, this came up in conversation as we've noticed mochitest-bc
> runs with 500+ active DOMWINDOWs even with run-by-dir enabled. Strongly
> suspect this is a contributing factor to the frequent timeouts we see on the
> single-core AWS instances we use.

That sounds like a problem with those tests that should probably be fixed in those tests. Which ones are they?
It is not okay to just force a GC and CC because then we're not actually testing incremental GC or CC on TreeHerder.
Yes, if there's a problem here, we need to fix the GC/CC heuristics.
(In reply to Andrew McCreight [:mccr8] from comment #3)
> (In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #2)
> > For the record, this came up in conversation as we've noticed mochitest-bc
> > runs with 500+ active DOMWINDOWs even with run-by-dir enabled. Strongly
> > suspect this is a contributing factor to the frequent timeouts we see on the
> > single-core AWS instances we use.
> 
> That sounds like a problem with those tests that should probably be fixed in
> those tests. Which ones are they?

I don't think we have tests that open that many windows, but I expect we do have cases where the windows stick around for a "long" time and don't get GC'd (for whatever reason) causing the buildup.
Routinely runs in the 200+ range, peaking at ~290.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux-debug/1435708545/mozilla-central_ubuntu32_vm-debug_test-mochitest-browser-chrome-1-bm03-tests1-linux32-build66.txt.gz
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux-debug/1435708545/mozilla-central_ubuntu32_vm-debug_test-mochitest-browser-chrome-2-bm01-tests1-linux32-build63.txt.gz

Peaks in the 300+ range.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-debug/1435708545/mozilla-central_ubuntu64_vm-debug_test-mochitest-browser-chrome-1-bm116-tests1-linux64-build5.txt.gz
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-debug/1435708545/mozilla-central_ubuntu64_vm-debug_test-mochitest-browser-chrome-3-bm123-tests1-linux64-build1.txt.gz
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64-debug/1435708545/mozilla-central_snowleopard-debug_test-mochitest-browser-chrome-2-bm108-tests1-macosx-build199.txt.gz

Over 480.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win64-debug/1435708545/mozilla-central_win8_64-debug_test-mochitest-browser-chrome-1-bm111-tests1-windows-build60.txt.gz

Over 400.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win64-debug/1435708545/mozilla-central_win8_64-debug_test-mochitest-browser-chrome-2-bm111-tests1-windows-build78.txt.gz
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win64-debug/1435708545/mozilla-central_win8_64-debug_test-mochitest-browser-chrome-3-bm119-tests1-windows-build121.txt.gz
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64-debug/1435708545/mozilla-central_snowleopard-debug_test-mochitest-browser-chrome-3-bm107-tests1-macosx-build225.txt.gz

Almost 400.
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-macosx64-debug/1435708545/mozilla-central_snowleopard-debug_test-mochitest-browser-chrome-1-bm107-tests1-macosx-build208.txt.gz

Sessionstore tests tend to be particularly bad actors, FWIW. Sorry I couldn't find a 500+ instance offhand, but I only spent 5-10min going through logs.
Then it occurred to me that I hadn't looked at any devtools logs. Literally the first one I opened:
http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-linux64-debug/1435708545/mozilla-central_ubuntu64_vm-debug_test-mochitest-devtools-chrome-1-bm51-tests1-linux64-build5.txt.gz

18:27:41     INFO -  ++DOMWINDOW == 785 (0x7f5bbb3d9000) [pid = 2273] [serial = 1495] [outer = 0x7f5bb4aed800]
An issue is of course that GC/CC triggering heuristics need to be tuned for real world usage - so don't run all the time, but while running tests we do unusual stuff, like open tons of new pages and windows all the time.
(In reply to Olli Pettay [:smaug] from comment #9)
> An issue is of course that GC/CC triggering heuristics need to be tuned for
> real world usage - so don't run all the time, but while running tests we do
> unusual stuff, like open tons of new pages and windows all the time.

How does this jive with comments 4 and 5? If tests aren't a real-world scenario, then isn't trying to test incremental GC/CC nebulous?
I think we need to trigger GC/CC explicitly if there are tons of Window objects alive while testing, and in other cases rely on the incremental GC/CC.

mccr8, do you recall if we have some easy way in mochitest framework to know how many windows are alive?
Flags: needinfo?(continuation)
If we don't currently have that I'm sure we could figure out a way to expose nsGlobalWindow's gRefCnt via nsIDebug2.
(In reply to Olli Pettay [:smaug] from comment #11)
> mccr8, do you recall if we have some easy way in mochitest framework to know
> how many windows are alive?

As Ted said, this is already tracked in the DOM. A heuristic based on the number of windows that are created, rather than that are alive, might work okay without doing weird stuff to tab hoarders.
Flags: needinfo?(continuation)
I was reminded of this bug today while looking at a log with >520 active DOMWINDOWs. Can we proceed with comment 13 if everyone is satisfied with that approach?

Jonathan, given comment 12, it sounds like the platform already exposes the necessary info to the harness. Is there anybody available to look into creating such a heuristic?
Flags: needinfo?(jgriffin)
Since this heuristic would affect shipping code, I'm not sure anyone on the a-team would be in the best position to do this.
Flags: needinfo?(jgriffin)
Maybe I misunderstood comment 13. It sounded to me like the harness was going to be monitoring the number of active DOMWINDOWs and triggering a GC/CC if it went over a certain threshold.
Since comment #13 mentions tab hoarders, I thought we were talking about changing the GC/CC heuristic in general.  Andrew, can you clarify?
Flags: needinfo?(continuation)
Sorry I haven't replied here yet. I have been looking into this off and on. I wrote some scripts that let you look at the number of windows alive at each point in handy second-by-second chart form, or figure out which tests have the most windows opened at once, or open the most windows.

Something is definitely wacky with some of these tests. Something seems to be going horribly wrong during the markup view devtools tests. This is where we hit a peak of 700 windows. If you look at the number of windows that are open, it mostly just keeps going up and up. It does drop a small amount a few times, but mostly the windows stick around until (I think) we shut down.

Here's one example.

We start the test:
09:32:09     INFO -  2921 INFO TEST-START | browser/devtools/markupview/test/browser_markupview_node_not_displayed_01.js
A few seconds later we create this window:
09:32:11     INFO -  ++DOMWINDOW == 183 (0x7fe2db5ba400) [pid = 25884] [serial = 1773] [outer = (nil)]
A few seconds later we finish the test:
09:32:13     INFO -  2942 INFO TEST-OK | browser/devtools/markupview/test/browser_markupview_node_not_displayed_01.js | took 4648ms
Then, _four minutes later_, when we're shutting down, we finally destroy that window:
09:36:17     INFO -  --DOMWINDOW == 415 (0x7fe2db5ba400) [pid = 25884] [serial = 1773] [outer = (nil)] [url = chrome://browser/content/devtools/markup-view.xhtml]

I'll try to figure out if this is because we're doing a really bad job at scheduling the GC and CC or if this is some kind of leak-until-shutdown.
Assignee: nobody → continuation
Summary: Force a CC/GC between all test runs during mochitests → Reduce the peak number of windows in browser chrome and devtools mochitests
Here's a link to my mochitest log analysis script:
https://github.com/amccreight/mochitest-logs/blob/master/bloatfinder.py

It is fairly simple and you have to edit the script to get it to produce different output, but it is pretty good for getting a basic handle on where the problem areas in a run are.
Flags: needinfo?(continuation)
Here's some example output, though it isn't super useful without the original log. Each asterisk represents 10 windows open at a given second.
I tried running with AllTraces (which slows down the CC a lot) to figure out why the windows were alive, and that resulted in the number of live windows at the start of browser_markupview_events_jquery_1.6.js dropping from 543 to 119, which suggests that the problem is that we're not doing enough collection work. (Collection work is largely timer-based, so if tests run slower, we'll do more collection work per test.)
It looks like we're not calling into the runNextCollectorTimer() function I added in bug 949607 in browser Mochitests. I added a call to it for every time we produce a test result (just doing it at the end of a test didn't help too much) and that reduced the number of live windows on some test from around 500 to around 280. I'll see what that looks like in a full run. Hopefully there's not too much slowdown from the additional collecting.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d666188021d0
That didn't seem to reduce the number of windows on try. Maybe I should just add some manual forced collections in the few directories that are a problem.
Blocks: 1239884

This is probably still an issue, but this bug never went anywhere.

Status: NEW → RESOLVED
Closed: 3 months ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.