Closed Bug 864085 Opened 11 years ago Closed 11 years ago

Investigate and fix the enormous increase in time to run Windows debug browser-chrome

Categories

(Testing :: Mochitest, defect)

Platform: x86 Windows 7
Priority: Not set
Severity: critical

Tracking

(Not tracked)

Status: RESOLVED FIXED
Target Milestone: mozilla23

People

(Reporter: philor, Assigned: terrence)

References

Details

(Whiteboard: [buildfaster:?][capacity])

Attachments

(4 files)

https://tbpl.mozilla.org/php/getParsedLog.php?id=22025306&tree=Mozilla-Central was a Windows 7 debug browser-chrome run which was killed by buildbot because it exceeded the 120 minute time limit it places on any buildstep.

Typical Win7 debug b-c times:

esr17: 60 minutes
b2g18: 75 minutes
m-r19: 85 minutes
m-r20: 95 minutes
m-b21: 103 minutes
m-a22: 101 minutes
m-c23: 113 minutes
m-c24: gonna be killed by buildbot at 120 minutes

Neither the time increases on other platforms (10.6 debug went from 51 minutes on esr17 to 78 minutes on trunk, Win7 opt went from 29 minutes to 30 minutes) nor the increase in number of tests (around 1.2 times as many on trunk as on esr17) fully explain that jump.

My first thought while staring at that log was a massive increase in assertions, but removing the "Shouldn't be trying to restyle non-elements directly" assertion (which we hit a billion times on trunk and zero times on esr17) and pushing to try gave me a 113-minute Win7 debug run, not even a minute below typical.

It's hard to extrapolate given that odd decrease for 22, but the smart bet is that we have less than six weeks to fix this (or do something gross and unworthy, like splitting browser-chrome into two chunks, even though that would mean two wasteful 15-minute chunks on most opt platforms, or increasing the timeout so that we can take *more than two hours* to run them).
(Those times are the complete run time, since that's easier to get from tbpl without opening logs, while the timeout is just for the runtests step, but setup/teardown time is nothing - the "114 minute" run I'm looking at took 113 minutes 39 seconds in the runtests step, since mozharness pushes almost everything into that step.)
Whiteboard: [buildfaster:?][capacity]
https://tbpl.mozilla.org/php/getParsedLog.php?id=22145910&tree=Mozilla-Inbound

Looks like the apocalypse will be about... now.
It's not an actual fix, but we have the capability of setting a different max time limit for this test, if that helps.
Yeah, I even thought of an excuse to partly make ourselves not feel as bad about doing so, since part of the 17 -> 18 increase is because mozharness moved some setup (maybe 5 minutes worth) into the same buildstep as running the tests.
I think a prerequisite to figuring anything out here would be to scrape buildbot data for the test run times and try to figure out when it regressed.
https://tbpl.mozilla.org/php/getParsedLog.php?id=22209163&tree=Mozilla-Inbound (so close, printed the first line of the summary so it probably had under 30 seconds to go)
Depends on: 865549
This will apply to both opt and debug; is that ok?
The debug-only patch will be considerably larger.
Attachment #741972 - Flags: review?(philringnalda)
Comment on attachment 741972 [details] [diff] [review]
increase browser-chrome script_maxtime to 9000 seconds

Over 9000?! :)

Not being debug-only worries me less than not being Windows-only - that'll make the currently not-too-frequent bug 857127 cost us another half hour of slave time when it hits. Personally, I'd just edge it up by 5 minutes (it's possible the first instance of hitting this was more like 6 minutes from finishing, but all the rest have been within seconds of being done), but if you'd rather not have to keep tweaking it as we get worse, until we either fix some of the slowdown or switch to the new slaves, that's fine too.
Attachment #741972 - Flags: review?(philringnalda) → review+
Comment on attachment 741972 [details] [diff] [review]
increase browser-chrome script_maxtime to 9000 seconds

http://hg.mozilla.org/build/buildbot-configs/rev/7af192410922

Yeah, it's purely a stopgap workaround.  If the hung script problem gets worse we can lower this again.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #6)
> I think a prerequisite to figuring anything out here would be to scrape
> buildbot data for the test run times and try to figure out when it regressed.

Here is some data for the historical build time of the mozilla-central_xp-debug_test-mochitest-browser-chrome builder. Make of it what you will.
I was curious about philor's comment, so I looked at which tests were the most expensive on Win debug.  This attachment is that list.

TL;DR: we should try figuring out why the devtools/debugger, devtools/webconsole, and mozapps/extensions tests are so extraordinarily expensive.  Those test directories account for about 40% of the test time with about 20% of the test scripts.

I don't necessarily think it's something inherent to the tests (e.g. the most expensive test takes about 1% of the time in an opt build vs. the time it takes in a debug build); there might be some really expensive debug checks that the tests are triggering.
I checked to see if this is the same thing as bug 846890, but it would appear not to be: I disabled the file checking stuff, and it was just as awfully slow, and that code is only hit about half a dozen times.
Running the tests in browser/devtools/debugger/ under perf on a Linux box turns up the following profile:

+   2.98%  firefox-bin  libxul.so                    [.] js::gc::Cell::isTenured() const
+   2.47%  firefox-bin  libxul.so                    [.] js::gc::Cell::tenuredZone() const
+   2.10%  firefox-bin  libplds4.so                  [.] PL_HashString
+   1.98%  firefox-bin  libxul.so                    [.] js::GCMarker::processMarkStackTop(js::SliceBudget&)
+   1.93%  firefox-bin  libpthread-2.11.3.so         [.] __pthread_mutex_unlock_usercnt
+   1.92%  firefox-bin  libpthread-2.11.3.so         [.] pthread_mutex_lock
+   1.71%  firefox-bin  libplds4.so                  [.] PL_HashTableRawLookup
+   1.63%  firefox-bin  libpthread-2.11.3.so         [.] pthread_mutex_trylock
+   1.47%  firefox-bin  libc-2.11.3.so               [.] __strcmp_sse42
+   1.41%  firefox-bin  libxul.so                    [.] js::Interpret(JSContext*, js::StackFrame*, js::InterpMode, bool)
+   1.26%  firefox-bin  libxul.so                    [.] js::gc::Cell::markIfUnmarked(unsigned int) const
+   1.20%  firefox-bin  libxul.so                    [.] void MarkInternal<JSAtom>(JSTracer*, JSAtom**)
+   1.19%  firefox-bin  libxul.so                    [.] JSCompartment::sweepBreakpoints(js::FreeOp*)
+   1.04%  firefox-bin  libpthread-2.11.3.so         [.] pthread_getspecific
+   0.98%  firefox-bin  libxul.so                    [.] GetBloatEntry(char const*, unsigned int)
+   0.95%  firefox-bin  libnspr4.so                  [.] PR_Unlock
+   0.95%  firefox-bin  firefox-bin                  [.] arena_malloc
+   0.92%  firefox-bin  libxul.so                    [.] js::ReadBarrieredValue::get() const
+   0.91%  firefox-bin  firefox-bin                  [.] arena_dalloc
+   0.87%  firefox-bin  libxul.so                    [.] js::ObjectImpl::lastProperty() const
+   0.82%  firefox-bin  libxul.so                    [.] js::gc::Cell::arenaHeader() const
+   0.80%  firefox-bin  libxul.so                    [.] js::UncheckedUnwrap(JSObject*, bool, unsigned int*)
+   0.68%  firefox-bin  libxul.so                    [.] SearchTable(PLDHashTable*, void const*, unsigned int, PLDHashOperator)
+   0.66%  firefox-bin  libnspr4.so                  [.] PR_GetCurrentThread
+   0.63%  firefox-bin  libxul.so                    [.] void MarkInternal<JSObject>(JSTracer*, JSObject**)
+   0.63%  firefox-bin  libxul.so                    [.] js::gc::Cell::isMarked(unsigned int) const
+   0.62%  firefox-bin  libxul.so                    [.] js::gc::FreeSpan::checkSpan() const

Around 25-30% of the runtime for all these tests is being consumed in GC; the #ifdef DEBUG checks are apparently killing us here.  Bill, is that expected?
Flags: needinfo?(wmccloskey)
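(For readers who haven't poked at the GC internals: a minimal, self-contained sketch of the kind of #ifdef DEBUG check being discussed here, using made-up types rather than the real js::gc code. The point is that each check is trivially cheap on its own, but it runs once per cell visited by the marker, so in a debug build its cost scales with heap size and shows up at the top of a profile like the one above.)

    #include <cassert>
    #include <vector>

    // Hypothetical stand-ins for a GC cell and its containing arena.
    struct ArenaHeader { bool allocated = true; };

    struct Cell {
        ArenaHeader* header = nullptr;
        bool marked = false;

        // In an opt build this body compiles away; in a debug build it runs
        // for every cell the marker visits, so its cost scales with the
        // number of live cells rather than with the "real" marking work.
        void checkCell() const {
    #ifdef DEBUG
            assert(header != nullptr);   // every tenured cell lives in an arena
            assert(header->allocated);   // and that arena must still be live
    #endif
        }

        void mark() {
            checkCell();                 // debug-only sanity check
            marked = true;               // the actual (very cheap) work
        }
    };

    // Toy mark loop: with checks like these enabled, the per-cell validation
    // dominates a debug-build profile, much like the perf output above.
    void markAll(std::vector<Cell>& heap) {
        for (Cell& cell : heap)
            cell.mark();
    }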
Terrence might also have some idea.
The isTenured stuff is fairly recent, so that can't be the whole story.

At one point, the debugger would trigger various nasty GCs, but I don't recall if that's still the case.
(In reply to Andrew McCreight [:mccr8] from comment #17)
> The isTenured stuff is fairly recent, so that can't be the whole story.
> 
> At one point, the debugger would trigger various nasty GCs, but I don't
> recall if that's still the case.

I hate perf's interface for viewing things, but if I'm understanding it correctly, it looks like the GCs are getting triggered under js::Debugger::addDebuggee.  The callstacks for isTenured look something like:

js::Debugger::addDebuggee
IncrementalCollectSlice
js::GCMarker::drainMarkStack
js::GCMarker::processMarkStack

and then various bits under that.
(In reply to Andrew McCreight [:mccr8] from comment #17)
> The isTenured stuff is fairly recent, so that can't be the whole story.

IsTenured is particularly bad because the call is not inlinable in a path that is otherwise fully inlined. It would be nice to know which exact usage is the costly one here. Regarding inlining, we recently changed the semantics of JS_ALWAYS_INLINE; this may also have had an impact.

> At one point, the debugger would trigger various nasty GCs, but I don't
> recall if that's still the case.

I'm not aware of anything changing there recently.

(In reply to Nathan Froyd (:froydnj) from comment #18)
> I hate perf's interface for viewing things, 

Likewise.

> but if I'm understanding it
> correctly, it looks like the GCs are getting triggered under
> js::Debugger::addDebuggee.  The callstacks for isTenured look something like:
> 
> js::Debugger::addDebuggee
> IncrementalCollectSlice
> js::GCMarker::drainMarkStack
> js::GCMarker::processMarkStack
> 
> and then various bits under that.

Well, the isTenured assertion should fire any time that we have to access the arena header: only tenured objects have one, so this is pretty important. This access typically happens at the leaves, however: can we get a full stack to see which particular callsite(s) are making the assertion?
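(Purely to illustrate the inlining point above, with hypothetical names rather than the real js::gc::Cell code: a one-line check that is defined out of line costs a genuine call at every use in an otherwise fully inlined path, whereas a forced-inline definition in the header, which is roughly what JS_ALWAYS_INLINE / MOZ_ALWAYS_INLINE provide, lets the compiler fold the check into its caller.)

    #include <cstdint>

    #if defined(_MSC_VER)
    #  define FORCE_INLINE __forceinline
    #else
    #  define FORCE_INLINE inline __attribute__((always_inline))
    #endif

    struct Cell {
        // Out-of-line flavor: imagine this defined in Cell.cpp.  Every caller
        // in the hot marking path then pays a call (plus a possible I-cache
        // miss) for a one-line check.
        bool isTenuredOutOfLine() const;

        // Header-inline flavor: the check is folded directly into the
        // already-inlined caller, so there is no call at all.
        FORCE_INLINE bool isTenuredInline() const {
            // Toy stand-in for "does this cell live in the tenured heap?"
            return (reinterpret_cast<uintptr_t>(this) & 0x1) == 0;
        }
    };

    // The definition the out-of-line flavor would need, in some .cpp file:
    // bool Cell::isTenuredOutOfLine() const { return isTenuredInline(); }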
I doubt this has anything to do with isTenured. It just happens to show up in the profile because it's not inlined.

I'll look into this later today. Do I read comment 13 right: the debugger tests run 100x slower in debug builds than in opt builds?
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #20)
> I'll look into this later today. Do I read comment 13 right: the debugger
> tests run 100x slower in debug builds than in opt builds?

They do run slower.  But I see now I scrambled my numbers when I was doing the calculations for how much slower they ran.  100x would be a pretty big deal!

It looks like individual tests are around 8x slower in debug builds than in opt builds, just from glancing at the top several offenders in the list I attached.
I gathered some data to look at the GC issue in more detail. During a normal Win7 debug run (one that had a few weird errors that hopefully don't affect these results), we spent 2041 seconds in GC out of a total of 7205 seconds. That's 28%. I also tried turning off two forms of debug checks. These are the numbers:

                               In GC (s)  Total (s)  Pct in GC
normal build                      2041      7205       28%
no incremental validation         1644      6630       25%
no DEBUG checks during marking     981      5751       17%

(Removing the DEBUG checks also speeds up allocation, if you're wondering.)

The try builds are here if anyone is interested:
normal (with printfs): https://tbpl.mozilla.org/?tree=Try&rev=0e8e57c87159
no validation: https://tbpl.mozilla.org/?tree=Try&rev=cce0664b5015
no DEBUG checks: https://tbpl.mozilla.org/?tree=Try&rev=0568f161e182

The cost of the incremental validation doesn't seem too extreme to me. The DEBUG checks are more expensive than I would have expected. I'm going to do a few more builds to try to isolate the cost of each kind of check.

I also looked at why we're doing so much garbage collection. Here's a breakdown of the GC reasons for non-incremental GCs, and how often a GC is invoked for that reason:

{'LAST_CONTEXT': 52, 'TOO_MUCH_MALLOC': 2, 'MEM_PRESSURE': 1, 'DOM_WORKER': 539, 'SHUTDOWN_CC': 2, 'DESTROY_CONTEXT': 52, 'DOM_UTILS': 51, 'COMPONENT_UTILS': 1707, 'API': 1, 'SET_NEW_DOCUMENT': 1, 'ALLOC_TRIGGER': 1, 'DEBUG_MODE_GC': 415}

I also looked at the reason why we initiate incremental GCs:

{'CC_WAITING': 61, 'FULL_GC_TIMER': 15, 'PAGE_HIDE': 251, 'MAYBEGC': 1, 'SET_NEW_DOCUMENT': 180}

The incremental ones are pretty typical, and there aren't very many of them. The outliers are these:

COMPONENT_UTILS: 1707
DOM_WORKER: 539
DEBUG_MODE_GC: 415

The DOM_WORKER ones happen on worker threads, so they may not affect the total running time very much. The DEBUG_MODE_GCs happen whenever we turn the debugger on or off, so I guess we should expect a lot of them. The COMPONENT_UTILS GCs seem pretty extreme. I'm going to try to find out if we can eliminate some or all of them. They account for 51% of all GCs.
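(For context, a breakdown like the dictionaries above can be produced from one greppable log line per collection; the sketch below shows roughly what that kind of instrumentation could look like. The names and the GC-TIMING format are hypothetical, not the actual printf patch from the try push above.)

    #include <chrono>
    #include <cstdio>

    // Hypothetical reason tags mirroring the names that appear in the log.
    enum class GCReason { COMPONENT_UTILS, DOM_WORKER, DEBUG_MODE_GC, PAGE_HIDE, CC_WAITING };

    static const char* reasonName(GCReason r) {
        switch (r) {
            case GCReason::COMPONENT_UTILS: return "COMPONENT_UTILS";
            case GCReason::DOM_WORKER:      return "DOM_WORKER";
            case GCReason::DEBUG_MODE_GC:   return "DEBUG_MODE_GC";
            case GCReason::PAGE_HIDE:       return "PAGE_HIDE";
            case GCReason::CC_WAITING:      return "CC_WAITING";
        }
        return "UNKNOWN";
    }

    // Wrap each collection in a timer and emit one line per GC; summing the
    // lines per reason afterwards yields both the counts and the total time
    // spent in GC over the whole test run.
    template <typename CollectFn>
    void timedCollect(GCReason reason, bool incremental, CollectFn collect) {
        auto start = std::chrono::steady_clock::now();
        collect();
        auto ms = std::chrono::duration_cast<std::chrono::milliseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::fprintf(stderr, "GC-TIMING reason=%s incremental=%d ms=%lld\n",
                     reasonName(reason), incremental ? 1 : 0,
                     static_cast<long long>(ms));
    }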
That's pretty weird.  I guess people just sprinkle forceGC calls all over their tests?  I know there have been a few intermittent oranges that people try to force into a particular test by forcing GC.
Ah, it looks like checkForLeakedGlobalWindows schedules a precise GC, so maybe this is a regression from the work in bug 658738 to detect leaking windows?
There aren't many forceGC calls in browser-chrome tests, at first glance...
On the other hand, our HTTP server does a forceGC every time a connection closes.  :(
It looks like bz is right. The COMPONENT_UTILS GCs are from the HTTP server, which is in another process. They just happen to get printed to the log with everything else. It looks like they get run after the connection is closed, so they probably don't affect the total running time of the tests much (assuming the test slaves have multiple cores). Strangely, the relevant bug (bug 508125) is inaccessible to me. I hope I'm not about to get fired or something.
The real bug is bug 508128, the commit message just had a typo.
(In reply to Andrew McCreight [:mccr8] from comment #24)
> Ah, it looks like checkForLeakedGlobalWindows schedules a precise GC, so
> maybe this is a regression from the work in bug 658738 to detect leaking
> windows?

It would certainly be interesting to compare test run times without the leak detection.
How often does the leak detection run? I thought it was only at the end of the test run.

From bug 508128 (thanks Gavin), it sounds like we didn't use to have a GC trigger based on the number of bytes allocated. We do now. It's possible we could just remove that GC. I'll test. I'm still not sure if that would help, though.
It runs after each individual test, because it wants to blame individual tests for leaking things. So on the order of 1000+ times per "bc" run.
(In reply to :Gavin Sharp (use gavin@gavinsharp.com for email) from comment #31)
> It runs after each individual test, because it wants to blame individual
> tests for leaking things. So on the order of 1000+ times per "bc" run.

It looks like it only does the leak checking when the done flag is set:
  http://mxr.mozilla.org/mozilla-central/source/testing/mochitest/browser-test.js#310
I think that only happens at the end?
(In reply to Bill McCloskey (:billm) from comment #32)
> I think that only happens at the end?

Yeah, looks like my recollection of how this works was wrong.
OK, I was wrong. It looks like the isTenured thing is a big problem. If I eliminate that check, we go from ~120 mins to ~105 mins. None of the other changes I tried had any effect. Terrence, can you try inlining that path?
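(If anyone wants a rough local sanity check of what a per-cell call costs before waiting on a try run, here is a standalone micro-benchmark sketch. Toy types, GCC/Clang attributes, and made-up function names; the real answer of course has to come from the actual marking path.)

    #include <chrono>
    #include <cstddef>
    #include <cstdio>
    #include <vector>

    struct Cell { unsigned bits = 0; };

    // Simulates today's out-of-line check: a genuine call per cell.
    __attribute__((noinline)) static bool isTenuredOutOfLine(const Cell* c) {
        return (c->bits & 1u) == 0;
    }

    // Simulates the proposed change: the check folds into the loop body.
    static inline bool isTenuredInline(const Cell* c) {
        return (c->bits & 1u) == 0;
    }

    template <typename Check>
    static void walkHeap(const char* label, const std::vector<Cell>& heap, Check check) {
        auto start = std::chrono::steady_clock::now();
        std::size_t tenured = 0;
        for (const Cell& c : heap)
            if (check(&c))
                ++tenured;
        auto us = std::chrono::duration_cast<std::chrono::microseconds>(
                      std::chrono::steady_clock::now() - start).count();
        std::printf("%s: %zu tenured cells, %lld us\n",
                    label, tenured, static_cast<long long>(us));
    }

    int main() {
        std::vector<Cell> heap(20 * 1000 * 1000);  // a big toy "heap"
        walkHeap("out-of-line", heap, [](const Cell* c) { return isTenuredOutOfLine(c); });
        walkHeap("inlined",     heap, [](const Cell* c) { return isTenuredInline(c); });
        return 0;
    }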
Also, I tried removing the forceGC call from httpd.js. Doing so didn't cause the test to go orange, but it also didn't improve the running time at all. We should probably leave it alone.
Do we have any theories as to why this
a) is worse on Windows
b) regressed significantly in recent history?
Well, (b) at least is my fault. I'll work on fixing that.
Assignee: nobody → terrence
Attached patch v0 (Splinter Review)
Attachment #744852 - Flags: review?(wmccloskey)
I just noticed that the running time for this test dropped a lot recently, from ~120 mins to ~100 mins. It seems to have happened between these checkins:

120 mins: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=ba3ce7dde51f
100 mins: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=e13bb42d2872

Ted, is there anything in your checkin that we would expect to make builds go faster? Only Windows debug builds seem to be affected. Other tests seemed to get faster too.
Flags: needinfo?(ted)
Attachment #744852 - Flags: review?(wmccloskey) → review+
This is completely unexpected. I talked to billm on IRC; the only thing I can imagine is that the patch that adds -RELEASE to LDFLAGS does something. He's going to try backing it out on Try and see what happens. My only theory is that maybe it somehow makes loading symbols faster when we get assertion stacks? I know WinDBG complains about the missing checksum when it loads symbols, so maybe that causes a slowdown somehow?
Flags: needinfo?(ted)
Oh, I forgot to say, Ted's patch in bug 431048 is what caused the running time to go down so much. It will be interesting to see if Terrence's patch gets us down even further.
(In reply to Terrence Cole [:terrence] from comment #41)
> Green try at:
> https://tbpl.mozilla.org/?tree=Try&rev=2090e7142df4
> 
> Pushed at:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/75699d465ba0

From spot-checking TBPL, it looks like this helped us save about 10 minutes on XP and Windows 7, and 5 minutes on Win8. Nice job. :-)
https://hg.mozilla.org/mozilla-central/rev/75699d465ba0
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla23
Component: BrowserTest → Mochitest