Closed Bug 864085 Opened 11 years ago Closed 11 years ago

Investigate and fix the enormous increase in time to run Windows debug browser-chrome

Categories

(Testing :: Mochitest, defect)

Platform: x86 Windows 7
Priority: Not set
Severity: critical

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: mozilla23

People

(Reporter: philor, Assigned: terrence)

References

Details

(Whiteboard: [buildfaster:?][capacity])

Attachments

(4 files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=22025306&tree=Mozilla-Central was a Windows 7 debug browser-chrome run which was killed by buildbot because it exceeded the 120 minute time limit it places on any buildstep.

Typical Win7 debug b-c times:

esr17: 60 minutes
b2g18: 75 minutes
m-r19: 85 minutes
m-r20: 95 minutes
m-b21: 103 minutes
m-a22: 101 minutes
m-c23: 113 minutes
m-c24: gonna be killed by buildbot at 120 minutes

Neither the time increases on other platforms (10.6 debug went from 51 minutes on esr17 to 78 minutes on trunk, Win7 opt went from 29 minutes to 30 minutes) nor the increase in number of tests (around 1.2 times as many on trunk as on esr17) fully explain that jump.

My first thought while staring at that log was a massive increase in assertions, but removing the "Shouldn't be trying to restyle non-elements directly" assertion (which we hit a billion times on trunk and zero times on esr17) and pushing to try gave me a 113-minute Win7 debug run, not even a minute below typical.

It's hard to extrapolate given that odd decrease for 22, but the smart bet is that we have less than six weeks to fix this (or do something gross and unworthy, like splitting browser-chrome into two chunks, even though that would mean two wasteful 15-minute chunks on most opt platforms, or increasing the timeout so that we can take *more than two hours* to run them).
(Those times are the complete run time, since that's easier to get from tbpl without opening logs, while the timeout is just for the runtests step, but setup/teardown time is nothing - the "114 minute" run I'm looking at took 113 minutes 39 seconds in the runtests step, since mozharness pushes almost everything into that step.)
Whiteboard: [buildfaster:?][capacity]
https://tbpl.mozilla.org/php/getParsedLog.php?id=22145910&tree=Mozilla-Inbound

Looks like the apocalypse will be about... now.
It's not an actual fix, but we have the capability of setting a different max time limit for this test, if that helps.
Yeah, I even thought of an excuse to partly make ourselves not feel as bad about doing so, since part of the 17 -> 18 increase is because mozharness moved some setup (maybe 5 minutes worth) into the same buildstep as running the tests.
I think a prerequisite to figuring anything out here would be to scrape buildbot data for the test run times and try to figure out when it regressed.
https://tbpl.mozilla.org/php/getParsedLog.php?id=22209163&tree=Mozilla-Inbound (so close, printed the first line of the summary so it probably had under 30 seconds to go)
Depends on: 865549
This will apply to both opt and debug; is that ok?
The debug-only patch will be considerably larger.
Attachment #741972 - Flags: review?(philringnalda)
Comment on attachment 741972 [details] [diff] [review]
increase browser-chrome script_maxtime to 9000 seconds

Over 9000?! :)

Not being debug-only worries me less than not being Windows-only - that'll make the currently not-too-frequent bug 857127 cost us another half hour of slave time when it hits. Personally, I'd just edge it up by 5 minutes (it's possible the first instance of hitting this was more like 6 minutes from finishing, but all the rest have been within seconds of being done), but if you'd rather not have to keep tweaking it as we get worse, until we either fix some of the slowdown or switch to the new slaves, that's fine too.
Attachment #741972 - Flags: review?(philringnalda) → review+
Comment on attachment 741972 [details] [diff] [review]
increase browser-chrome script_maxtime to 9000 seconds

http://hg.mozilla.org/build/buildbot-configs/rev/7af192410922

Yeah, it's purely a stopgap workaround.  If the hung script problem gets worse we can lower this again.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6)
> I think a prerequisite to figuring anything out here would be to scrape
> buildbot data for the test run times and try to figure out when it regressed.

Here is some data for the historical build time of the mozilla-central_xp-debug_test-mochitest-browser-chrome builder. Make of it what you will.
I was curious about philor's comment, so I looked at which tests were the most expensive on Win debug.  This attachment is that list.

TL;DR: we should try figuring out why the devtools/debugger, devtools/webconsole, and mozapps/extensions tests are so extraordinarily expensive.  Those test directories account for about 40% of the test time with about 20% of the test scripts.

I don't necessarily think it's something inherent to the tests (e.g. the most expensive test takes about 1% of the time in an opt build vs. the time it takes in a debug build); there might be some really expensive debug checks that the tests are triggering.
I checked to see if this is the same thing as bug 846890, but it would appear not to be: I disabled the file checking stuff, and it was just as awfully slow, and that code is only hit about half a dozen times.
Running the tests in browser/devtools/debugger/ under perf on a Linux box turns up the following profile:

+   2.98%  firefox-bin  libxul.so                    [.] js::gc::Cell::isTenured() const
+   2.47%  firefox-bin  libxul.so                    [.] js::gc::Cell::tenuredZone() const
+   2.10%  firefox-bin  libplds4.so                  [.] PL_HashString
+   1.98%  firefox-bin  libxul.so                    [.] js::GCMarker::processMarkStackTop(js::SliceBudget&)
+   1.93%  firefox-bin  libpthread-2.11.3.so         [.] __pthread_mutex_unlock_usercnt
+   1.92%  firefox-bin  libpthread-2.11.3.so         [.] pthread_mutex_lock
+   1.71%  firefox-bin  libplds4.so                  [.] PL_HashTableRawLookup
+   1.63%  firefox-bin  libpthread-2.11.3.so         [.] pthread_mutex_trylock
+   1.47%  firefox-bin  libc-2.11.3.so               [.] __strcmp_sse42
+   1.41%  firefox-bin  libxul.so                    [.] js::Interpret(JSContext*, js::StackFrame*, js::InterpMode, bool)
+   1.26%  firefox-bin  libxul.so                    [.] js::gc::Cell::markIfUnmarked(unsigned int) const
+   1.20%  firefox-bin  libxul.so                    [.] void MarkInternal<JSAtom>(JSTracer*, JSAtom**)
+   1.19%  firefox-bin  libxul.so                    [.] JSCompartment::sweepBreakpoints(js::FreeOp*)
+   1.04%  firefox-bin  libpthread-2.11.3.so         [.] pthread_getspecific
+   0.98%  firefox-bin  libxul.so                    [.] GetBloatEntry(char const*, unsigned int)
+   0.95%  firefox-bin  libnspr4.so                  [.] PR_Unlock
+   0.95%  firefox-bin  firefox-bin                  [.] arena_malloc
+   0.92%  firefox-bin  libxul.so                    [.] js::ReadBarrieredValue::get() const
+   0.91%  firefox-bin  firefox-bin                  [.] arena_dalloc
+   0.87%  firefox-bin  libxul.so                    [.] js::ObjectImpl::lastProperty() const
+   0.82%  firefox-bin  libxul.so                    [.] js::gc::Cell::arenaHeader() const
+   0.80%  firefox-bin  libxul.so                    [.] js::UncheckedUnwrap(JSObject*, bool, unsigned int*)
+   0.68%  firefox-bin  libxul.so                    [.] SearchTable(PLDHashTable*, void const*, unsigned int, PLDHashOperator)
+   0.66%  firefox-bin  libnspr4.so                  [.] PR_GetCurrentThread
+   0.63%  firefox-bin  libxul.so                    [.] void MarkInternal<JSObject>(JSTracer*, JSObject**)
+   0.63%  firefox-bin  libxul.so                    [.] js::gc::Cell::isMarked(unsigned int) const
+   0.62%  firefox-bin  libxul.so                    [.] js::gc::FreeSpan::checkSpan() const

Around 25-30% of the runtime for all these tests is being consumed in GC; the #ifdef DEBUG checks are apparently killing us here.  Bill, is that expected?
Flags: needinfo?(wmccloskey)
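(For readers who haven't poked at the GC internals: a minimal, self-contained sketch of the kind of #ifdef DEBUG check being discussed here, using made-up types rather than the real js::gc code. The point is that each check is trivially cheap on its own, but it runs once per cell visited by the marker, so in a debug build its cost scales with heap size and shows up at the top of a profile like the one above.)

    #include <cassert>
    #include <vector>

    // Hypothetical stand-ins for a GC cell and its containing arena.
    struct ArenaHeader { bool allocated = true; };

    struct Cell {
        ArenaHeader* header = nullptr;
        bool marked = false;

        // In an opt build this body compiles away; in a debug build it runs
        // for every cell the marker visits, so its cost scales with the
        // number of live cells rather than with the "real" marking work.
        void checkCell() const {
    #ifdef DEBUG
            assert(header != nullptr);   // every tenured cell lives in an arena
            assert(header->allocated);   // and that arena must still be live
    #endif
        }

        void mark() {
            checkCell();                 // debug-only sanity check
            marked = true;               // the actual (very cheap) work
        }
    };

    // Toy mark loop: with checks like these enabled, the per-cell validation
    // dominates a debug-build profile, much like the perf output above.
    void markAll(std::vector<Cell>& heap) {
        for (Cell& cell : heap)
            cell.mark();
    }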
Terrence might also have some idea.
The isTenured stuff is fairly recent, so that can't be the whole story.

At one point, the debugger would trigger various nasty GCs, but I don't recall if that's still the case.
(In reply to Andrew McCreight [:mccr8] from comment #17)
> The isTenured stuff is fairly recent, so that can't be the whole story.
> 
> At one point, the debugger would trigger various nasty GCs, but I don't
> recall if that's still the case.

I hate perf's interface for viewing things, but if I'm understanding it correctly, it looks like the GCs are getting triggered under js::Debugger::addDebuggee.  The callstacks for isTenured look something like:

js::Debugger::addDebuggee
IncrementalCollectSlice
js::GCMarker::drainMarkStack
js::GCMarker::processMarkStack

and then various bits under that.
(In reply to Andrew McCreight [:mccr8] from comment #17)
> The isTenured stuff is fairly recent, so that can't be the whole story.

IsTenured is particularly bad because the call is not inlinable in a path that is otherwise fully inlined. It would be nice to know which exact usage is the costly one here. Regarding inlining, we recently changed the semantics of JS_ALWAYS_INLINE; this may also have had an impact.

> At one point, the debugger would trigger various nasty GCs, but I don't
> recall if that's still the case.

I'm not aware of anything changing there recently.

(In reply to Nathan Froyd (:froydnj) from comment #18)
> I hate perf's interface for viewing things, 

Likewise.

> but if I'm understanding it
> correctly, it looks like the GCs are getting triggered under
> js::Debugger::addDebuggee.  The callstacks for isTenured look something like:
> 
> js::Debugger::addDebuggee
> IncrementalCollectSlice
> js::GCMarker::drainMarkStack
> js::GCMarker::processMarkStack
> 
> and then various bits under that.

Well, the isTenured assertion should fire any time that we have to access the arena header: only tenured objects have one, so this is pretty important. This access typically happens at the leaves, however: can we get a full stack to see which particular callsite(s) are making the assertion?
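(Purely to illustrate the inlining point above, with hypothetical names rather than the real js::gc::Cell code: a one-line check that is defined out of line costs a genuine call at every use in an otherwise fully inlined path, whereas a forced-inline definition in the header, which is roughly what JS_ALWAYS_INLINE / MOZ_ALWAYS_INLINE provide, lets the compiler fold the check into its caller.)

    #include <cstdint>

    #if defined(_MSC_VER)
    #  define FORCE_INLINE __forceinline
    #else
    #  define FORCE_INLINE inline __attribute__((always_inline))
    #endif

    struct Cell {
        // Out-of-line flavor: imagine this defined in Cell.cpp.  Every caller
        // in the hot marking path then pays a call (plus a possible I-cache
        // miss) for a one-line check.
        bool isTenuredOutOfLine() const;

        // Header-inline flavor: the check is folded directly into the
        // already-inlined caller, so there is no call at all.
        FORCE_INLINE bool isTenuredInline() const {
            // Toy stand-in for "does this cell live in the tenured heap?"
            return (reinterpret_cast<uintptr_t>(this) & 0x1) == 0;
        }
    };

    // The definition the out-of-line flavor would need, in some .cpp file:
    // bool Cell::isTenuredOutOfLine() const { return isTenuredInline(); }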
I doubt this has anything to do with isTenured. It just happens to show up in the profile because it's not inlined.

I'll look into this later today. Do I read comment 13 right: the debugger tests run 100x slower in debug builds than in opt builds?
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #20)
> I'll look into this later today. Do I read comment 13 right: the debugger
> tests run 100x slower in debug builds than in opt builds?

They do run slower.  But I see now I scrambled my numbers when I was doing the calculations for how much slower they ran.  100x would be a pretty big deal!

It looks like individual tests are around 8x slower in debug builds than in opt builds, just from glancing at the top several offenders in the list I attached.
I gathered some data to look at the GC issue in more detail. During a normal Win7 debug run (one that had a few weird errors that hopefully don't affect these results), we spent 2041 seconds in GC out of a total of 7205 seconds. That's 28%. I also tried turning off two forms of debug checks. These are the numbers:

                               In GC (s)  Total (s)  Pct in GC
normal build                      2041      7205       28%
no incremental validation         1644      6630       25%
no DEBUG checks during marking     981      5751       17%

(Removing the DEBUG checks also speeds up allocation, if you're wondering.)

The try builds are here if anyone is interested:
normal (with printfs): https://tbpl.mozilla.org/?tree=Try&rev=0e8e57c87159
no validation: https://tbpl.mozilla.org/?tree=Try&rev=cce0664b5015
no DEBUG checks: https://tbpl.mozilla.org/?tree=Try&rev=0568f161e182

The cost of the incremental validation doesn't seem too extreme to me. The DEBUG checks are more expensive than I would have expected. I'm going to do a few more builds to try to isolate the cost of each kind of check.

I also looked at why we're doing so much garbage collection. Here's a breakdown of the GC reasons for non-incremental GCs, and how often a GC is invoked for that reason:

{'LAST_CONTEXT': 52, 'TOO_MUCH_MALLOC': 2, 'MEM_PRESSURE': 1, 'DOM_WORKER': 539, 'SHUTDOWN_CC': 2, 'DESTROY_CONTEXT': 52, 'DOM_UTILS': 51, 'COMPONENT_UTILS': 1707, 'API': 1, 'SET_NEW_DOCUMENT': 1, 'ALLOC_TRIGGER': 1, 'DEBUG_MODE_GC': 415}

I also looked at the reason why we initiate incremental GCs:

{'CC_WAITING': 61, 'FULL_GC_TIMER': 15, 'PAGE_HIDE': 251, 'MAYBEGC': 1, 'SET_NEW_DOCUMENT': 180}

The incremental ones are pretty typical, and there aren't very many of them. The outliers are these:

COMPONENT_UTILS: 1707
DOM_WORKER: 539
DEBUG_MODE_GC: 415

The DOM_WORKER ones happen on worker threads, so they may not affect the total running time very much. The DEBUG_MODE_GCs happen whenever we turn the debugger on or off, so I guess we should expect a lot of them. The COMPONENT_UTILS GCs seem pretty extreme. I'm going to try to find out if we can eliminate some or all of them. They account for 51% of all GCs.
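(For context, a breakdown like the dictionaries above can be produced from one greppable log line per collection; the sketch below shows roughly what that kind of instrumentation could look like. The names and the GC-TIMING format are hypothetical, not the actual printf patch from the try push above.)

    #include <chrono>
    #include <cstdio>

    // Hypothetical reason tags mirroring the names that appear in the log.
    enum class GCReason { COMPONENT_UTILS, DOM_WORKER, DEBUG_MODE_GC, PAGE_HIDE, CC_WAITING };

    static const char* reasonName(GCReason r) {
        switch (r) {
            case GCReason::COMPONENT_UTILS: return "COMPONENT_UTILS";
            case GCReason::DOM_WORKER:      return "DOM_WORKER";
            case GCReason::DEBUG_MODE_GC:   return "DEBUG_MODE_GC";
            case GCReason::PAGE_HIDE:       return "PAGE_HIDE";
            case GCReason::CC_WAITING:      return "CC_WAITING";
        }
        return "UNKNOWN";
    }

    // Wrap each collection in a timer and emit one line per GC; summing the
    // lines per reason afterwards yields both the counts and the total time
    // spent in GC over the whole test run.
    template <typename CollectFn>
    void timedCollect(GCReason reason, bool incremental, CollectFn collect) {
        auto start = std::chrono::steady_clock::now();
        collect();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::fprintf(stderr, "GC-TIMING reason=%s incremental=%d ms=%lld\n",
                     reasonName(reason), incremental ? 1 : 0,
                     static_cast<long long>(ms));
    }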
That's pretty weird.  I guess people just sprinkle forceGC calls all over their tests?  I know there have been a few intermittent oranges that people try to force into a particular test by forcing GC.
Ah, it looks like checkForLeakedGlobalWindows schedules a precise GC, so maybe this is a regression from the work in bug 658738 to detect leaking windows?
There aren't many forceGC calls in browser-chrome tests, at first glance...
On the other hand, our HTTP server does a forceGC every time a connection closes.  :(
It looks like bz is right. The COMPONENT_UTILS GCs are from the HTTP server, which is in another process. They just happen to get printed to the log with everything else. It looks like they get run after the connection is closed, so they probably don't affect the total running time of the tests much (assuming the test slaves have multiple cores). Strangely, the relevant bug (bug 508125) is inaccessible to me. I hope I'm not about to get fired or something.
The real bug is bug 508128, the commit message just had a typo.
(In reply to Andrew McCreight [:mccr8] from comment #24)
> Ah, it looks like checkForLeakedGlobalWindows schedules a precise GC, so
> maybe this is a regression from the work in bug 658738 to detect leaking
> windows?

It would certainly be interesting to compare test run times without the leak detection.
How often does the leak detection run? I thought it was only at the end of the test run.

From bug 508128 (thanks Gavin), it sounds like we didn't use to have a GC trigger based on the number of bytes allocated. We do now. It's possible we could just remove that GC. I'll test. I'm still not sure if that would help, though.
It runs after each individual test, because it wants to blame individual tests for leaking things. So on the order of 1000+ times per "bc" run.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #31)
> It runs after each individual test, because it wants to blame individual
> tests for leaking things. So on the order of 1000+ times per "bc" run.

It looks like it only does the leak checking when the done flag is set:
  http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#310
I think that only happens at the end?
(In reply to Bill McCloskey (:billm) from comment #32)
> I think that only happens at the end?

Yeah, looks like my recollection of how this works was wrong.
OK, I was wrong. It looks like the isTenured thing is a big problem. If I eliminate that check, we go from ~120 mins to ~105 mins. None of the other changes I tried had any effect. Terrence, can you try inlining that path?
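(If anyone wants a rough local sanity check of what a per-cell call costs before waiting on a try run, here is a standalone micro-benchmark sketch. Toy types, GCC/Clang attributes, and made-up function names; the real answer of course has to come from the actual marking path.)

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Cell { unsigned bits = 0; };

    // Simulates today's out-of-line check: a genuine call per cell.
    __attribute__((noinline)) static bool isTenuredOutOfLine(const Cell* c) {
        return (c->bits & 1u) == 0;
    }

    // Simulates the proposed change: the check folds into the loop body.
    static inline bool isTenuredInline(const Cell* c) {
        return (c->bits & 1u) == 0;
    }

    template <typename Check>
    static void walkHeap(const char* label, const std::vector<Cell>& heap, Check check) {
        auto start = std::chrono::steady_clock::now();
        std::size_t tenured = 0;
        for (const Cell& c : heap)
            if (check(&c))
                ++tenured;
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%s: %zu tenured cells, %lld us\n",
                    label, tenured, static_cast<long long>(us));
    }

    int main() {
        std::vector<Cell> heap(20 * 1000 * 1000);  // a big toy "heap"
        walkHeap("out-of-line", heap, [](const Cell* c) { return isTenuredOutOfLine(c); });
        walkHeap("inlined",     heap, [](const Cell* c) { return isTenuredInline(c); });
        return 0;
    }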
Also, I tried removing the forceGC call from httpd.js. Doing so didn't cause the test to go orange, but it also didn't improve the running time at all. We should probably leave it alone.
Do we have any theories as to why this
a) is worse on Windows
b) regressed significantly in recent history?
Well, (b) at least is my fault. I'll work on fixing that.
Assignee: nobody → terrence
Attached patch v0 (Splinter Review)
Attachment #744852 - Flags: review?(wmccloskey)
I just noticed that the running time for this test dropped a lot recently, from ~120 mins to ~100 mins. It seems to have happened between these checkins:

120 mins: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=ba3ce7dde51f
100 mins: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=e13bb42d2872

Ted, is there anything in your checkin that we would expect to make builds go faster? Only Windows debug builds seem to be affected. Other tests seemed to get faster too.
Flags: needinfo?(ted)
Attachment #744852 - Flags: review?(wmccloskey) → review+
This is completely unexpected. I talked to billm on IRC; the only thing I can imagine is that the patch that adds -RELEASE to LDFLAGS does something. He's going to try backing it out on Try and see what happens. My only theory is that maybe it somehow makes loading symbols faster when we get assertion stacks? I know WinDBG complains about the missing checksum when it loads symbols, so maybe that causes a slowdown somehow?
Flags: needinfo?(ted)
Oh, I forgot to say, Ted's patch in bug 431048 is what caused the running time to go down so much. It will be interesting to see if Terrence's patch gets us down even further.
(In reply to Terrence Cole [:terrence] from comment #41)
> Green try at:
> https://tbpl.mozilla.org/?tree=Try&rev=2090e7142df4
> 
> Pushed at:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/75699d465ba0

From spot-checking TBPL, it looks like this helped us save about 10 minutes on XP and Windows 7, and 5 minutes on Win8. Nice job. :-)
https://hg.mozilla.org/mozilla-central/rev/75699d465ba0
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Component: BrowserTest → Mochitest