Reduce buildbot timeout on unit test machines?

RESOLVED FIXED

Status

defect
P2
normal
RESOLVED FIXED
11 years ago
6 years ago

People

(Reporter: Gavin, Assigned: lsblakk)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

The buildbot timeout is currently set to 2400 seconds (40 minutes) for all of the unit test machines, which seems quite high. Based on the timeout message (e.g. "command timed out: 2400 seconds without output, killing pid 20669"), it appears that the timeout is based on "time without output", and it seems crazy that there would be legitimate cases where we go for 40 minutes without outputting anything at all.

Is this just a default, or have we actually needed it that high? It makes it painful to try and debug hanging failures because it makes the minimum cycle time 40 minutes (not including building, running other tests, etc.). It would be nice to lower this if possible.

http://hg.mozilla.org/build/index.cgi/buildbot-configs/file/tip/mozilla2-unittest/master.cfg
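
For illustration, a "time without output" timeout of the kind the log message describes can be sketched as below. This is not Buildbot's implementation; the function name `run_with_idle_timeout` and its return shape are made up for this example. The point is only that the clock resets on every line of output, so a step that keeps printing is never killed, while a silent hang is killed after `idle_timeout` seconds.

```python
import subprocess
import threading
import time


def run_with_idle_timeout(cmd, idle_timeout):
    """Run cmd and kill it if it prints nothing for idle_timeout seconds.

    Hypothetical helper for illustration only (not Buildbot code).
    Returns (returncode, timed_out, lines).
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    last_output = [time.monotonic()]  # one-element list so the thread can update it
    lines = []

    def reader():
        # Every line of output resets the idle clock.
        for line in proc.stdout:
            last_output[0] = time.monotonic()
            lines.append(line.rstrip("\n"))

    thread = threading.Thread(target=reader, daemon=True)
    thread.start()

    timed_out = False
    while proc.poll() is None:
        if time.monotonic() - last_output[0] > idle_timeout:
            # No output for too long: treat the command as hung.
            timed_out = True
            proc.kill()
            break
        time.sleep(0.05)

    proc.wait()
    thread.join(timeout=1)
    return proc.returncode, timed_out, lines
```

Calling this with `idle_timeout=2400` would mirror the current setting: a hung test costs the full 40 minutes before anything is reported, no matter how quickly the earlier steps ran.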
Need to investigate. If true, then yes, waiting 40 minutes to detect that there's no output is silly. However, we do not want to reduce this down right now, in case we go too far and accidentally start causing more intermittent unittest failures.

Triaging to Future for now, but will revisit after unittests are more stable, and dependent bug closed.
Component: Release Engineering → Release Engineering: Future
Depends on: 438871
Priority: -- → P3
(In reply to comment #1)
> However, we do not want to reduce this down right now, in
> case we go too far and accidentally start causing more intermittent unittest
> failures. 

If that happens, the cause of the failures will be obvious from the log message and you can restore or increase the timeout. I'm pretty confident that even 20 minutes without output is an exceptional circumstance. 

> Triaging to Future for now, but will revisit after unittests are more stable,
> and dependent bug closed.

I noticed this precisely because it was making my life trying to debug an unstable test harder, so I'm not sure why we would wait to investigate until after unit tests are more stable. I don't see why this has anything to do with bug 438871.
Turns out we've been down this road before in bug 438324; some of the changes there came unstuck. Once that stuff gets back into production, let's see where we are with timeouts. As a head start, I'm testing shorter timeouts on the staging setup for mozilla-central, http://staging-master.build.mozilla.org:2010/waterfall
20 mins for win32 "make -f client.mk build", 5 mins for everything else.
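
As a rough sketch of the staging values above, the per-step choice could be written like this. The helper `step_timeout` and the constants are hypothetical and are not taken from the actual master.cfg; in a Buildbot config the resulting number would typically be passed as the `timeout` argument of a shell step, which is measured in seconds without output.

```python
# Hypothetical helper for illustration; not from mozilla2-unittest/master.cfg.
BUILD_COMMAND = ["make", "-f", "client.mk", "build"]

WIN32_BUILD_TIMEOUT = 20 * 60  # 20 minutes without output
DEFAULT_TIMEOUT = 5 * 60       # 5 minutes without output


def step_timeout(platform, command):
    """Return the no-output timeout in seconds for one build or test step."""
    if platform == "win32" and list(command) == BUILD_COMMAND:
        return WIN32_BUILD_TIMEOUT
    return DEFAULT_TIMEOUT
```

With these numbers a hung test step is reported after 5 minutes instead of 40.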
Component: Release Engineering: Future → Release Engineering
OS: Mac OS X → All
Priority: P3 → P2
Yeah, gavin commented in bug 438324, mentioning this bug. :) Did those timeouts not stick because they were causing problems, or were they accidentally removed?
I hope the changes were accidentally reverted when moving buildbot masters around; the config setup is a bit messy. That said, the timeouts aren't reduced in attachment 328736 (mozilla-central) as they are in attachment 328158 (1.9.0), so we'll need to do that somewhere.
I probably did this in the moving of things, so I'll go through and make sure they are all reduced as per comment #3.
Assignee: nobody → lukasblakk
I think the timeouts are right in the various buildbot-configs in CVS; the masters are just not running with the most recent version of the file.
Attachment #341672 - Flags: review?(lukasblakk) → review+
Comment on attachment 341672
[checked in] mozilla2-unittest: add timeouts to all Compile steps (40min), and reduce test step timeouts to 5min

changeset:   421:9c2c11dae135
Attachment #341672 - Attachment description: mozilla2-unittest: add timeouts to all Compile steps (40min), and reduce test step timeouts to 5min → [checked in] mozilla2-unittest: add timeouts to all Compile steps (40min), and reduce test step timeouts to 5min
The masters are now running with the most recent versions of the file - so I believe we can close this.

Any reason to keep it open?
(In reply to comment #10)
> Any reason to keep it open?

Nope.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering