Closed Bug 510552 Opened 15 years ago Closed 15 years ago

buildbot should detect hangs better

Categories

(Release Engineering :: General, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: joduinn, Assigned: lsblakk)

Currently, buildbot can be configured to time out and kill jobs that produce no output for a set amount of time.

However, when a job is hung but continues to produce a little output, buildbot is fooled into thinking the job is still running and doesn't kill it. Slaves end up 100% occupied with never-ending jobs. We need a way to kill jobs after a set time, regardless of whether output is being generated. This requires an upstream fix in buildbot.

This is a blocker on running unittests on debug builds, which is a Q3 goal.
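
To make the difference between the two limits concrete, here is a minimal standalone sketch in Python. It is not buildbot code, and the names `run_with_limits`, `idle_timeout`, and `max_time` are hypothetical, chosen only for illustration. It kills a child process either when it has been silent for too long or when it exceeds a hard wall-clock limit, whichever comes first.

```python
import queue
import subprocess
import sys
import threading
import time


def run_with_limits(cmd, idle_timeout, max_time):
    """Run cmd and kill it on silence or on total run time (illustrative only).

    Returns (reason, lines), where reason is "finished", "idle_timeout",
    or "max_time", and lines is the output collected so far.
    """
    proc = subprocess.Popen(
        cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT, text=True
    )
    lines_q = queue.Queue()

    def pump():
        # Forward each output line to the main thread; None marks end of output.
        for line in proc.stdout:
            lines_q.put(line)
        lines_q.put(None)

    threading.Thread(target=pump, daemon=True).start()

    start = last_output = time.monotonic()
    lines = []
    reason = "finished"
    while True:
        now = time.monotonic()
        if now - start >= max_time:
            reason = "max_time"  # hard limit: applies even if output keeps coming
            break
        if now - last_output >= idle_timeout:
            reason = "idle_timeout"  # the existing behaviour: no output at all
            break
        # Sleep until the nearer of the two deadlines, or until output arrives.
        wait = min(max_time - (now - start), idle_timeout - (now - last_output))
        try:
            line = lines_q.get(timeout=wait)
        except queue.Empty:
            continue
        if line is None:
            break  # child closed its output, so it has exited or is exiting
        lines.append(line)
        last_output = time.monotonic()

    if reason != "finished":
        proc.kill()
    proc.wait()
    return reason, lines


# A "hung but chatty" job: it never exits, but prints a line every 0.1 seconds.
CHATTY = [
    sys.executable, "-u", "-c",
    "import time\nwhile True:\n    print('tick', flush=True)\n    time.sleep(0.1)",
]
```

With only an idle timeout, the `CHATTY` command above would run forever, because every new line resets the idle clock. The `max_time` check is what catches it.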
Summary: buildbot should handle timeouts better → buildbot should detect hangs better
Moving to Future until dependent bugs are fixed.
Component: Release Engineering → Release Engineering: Future
What?  This bug isn't marked as depending on anything.
Component: Release Engineering: Future → Release Engineering
On https://bugzilla.mozilla.org/show_bug.cgi?id=372581#c73

catlee says:
> (In reply to comment #71)
> > So mochitest-browser-chrome worked for me on 1.9.1 on a Linux debug build.
> > 
> > That said, from the log, it looks like you hit a known random orange:  bug
> > 498339.  But since you were in a debug build, the infinite loop in question was
> > producing output.  Is there a way we could make buildbot detect the process as
> > hung if |timeout| seconds of output don't contain the string "TEST-", rather
> > than checking for no output at all?
> 
> Yeah, I've got an upstream patch to buildbot checked in that sets a maximum run
> time for these shell commands, regardless of if output is being generated. 
> We'll either get this when we upgrade to buildbot 0.7.13, or if we decide to
> cherry-pick those changes earlier.
> 
> Having a maximum on the log size sounds like a great idea as well.
#### end of what catlee said

We will have to wait to see if that patch is already in our buildbot repos.
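
For reference, the two other ideas raised in the quoted exchange (treating a job as hung when no line containing "TEST-" has appeared for |timeout| seconds, and capping the log size) could be sketched as follows. This is an illustration under stated assumptions, not the upstream buildbot patch: the class `ProgressWatchdog` and its parameters are hypothetical, and timestamps are passed in explicitly so the logic can be checked without running a real job.

```python
class ProgressWatchdog:
    """Track progress markers and log size for one job (hypothetical sketch)."""

    def __init__(self, marker, timeout, max_log_bytes, start=0.0):
        self.marker = marker                # e.g. "TEST-"
        self.timeout = timeout              # seconds allowed without a marker line
        self.max_log_bytes = max_log_bytes  # cap on total output size
        self.last_progress = start          # time of the last marker line
        self.log_bytes = 0

    def feed(self, line, now):
        """Record one line of output seen at time `now` (in seconds)."""
        self.log_bytes += len(line.encode("utf-8"))
        # Only lines containing the marker count as progress; other output
        # (for example assertion spam from a debug build) does not.
        if self.marker in line:
            self.last_progress = now

    def verdict(self, now):
        """Return "ok", "hung", or "log_too_large" as of time `now`."""
        if self.log_bytes > self.max_log_bytes:
            return "log_too_large"
        if now - self.last_progress >= self.timeout:
            return "hung"
        return "ok"
```

A caller would feed each output line to `feed()` and kill the job as soon as `verdict()` returns anything other than "ok".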
(In reply to comment #3)

bhearsum: can you let us know if this patch is in our repo? It'd be great if we could try this out in staging. Otherwise, I guess we'd need to do a buildbot upgrade?
(In reply to comment #4)
> bhearsum: can you let us know if this patch is in our repo? It'd be great if we
> can try this out in staging... Otherwise, I guess we'd need to do a buildbot
> upgrade ??

It's not in our repo yet. We can cherry-pick it though; we don't have to do a full upgrade.
User repo with the cherry-picked patch in it:

http://hg.mozilla.org/users/bhearsum_mozilla.com/buildbot/
Assignee: nobody → lsblakk
Status: NEW → ASSIGNED
This landed as part of bug 514242.
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering