
Add the build step or else process name to buildbot's generic command timed out failure strings

RESOLVED FIXED

Status

Release Engineering
General Automation
3 years ago
3 years ago

People

(Reporter: emorley, Assigned: emorley)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {sheriffing-P1})

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

Bug 778688 comment 38 covers a number of intermittent failures where we have the generic log output:
"command timed out: N seconds without output, attempting to kill"

Whilst we've added this to the TBPL regexes so that we can use TBPL's bug suggestion feature, the messages are generic, so many suggestions are shown, e.g.:

https://bugzilla.mozilla.org/buglist.cgi?quicksearch=command%20timed%20out%20keywords%3Aintermittent

Whilst I'd prefer the worst of these failure modes to be handled by the mozharness/test harness/... itself, we're always going to have edge cases where timeouts occur and it's not worth adding TBPL-compatible failure messages to that script.

As such, I was thinking we should prefix the timeout messages with the build step name or else the process name (former preferred) here:

http://hg.mozilla.org/build/buildbot/file/6d8bc44b5874/slave/buildslave/runprocess.py#l657
    def doTimeout(self):
        self.timer = None
        msg = "command timed out: %d seconds without output" % self.timeout
        self.kill(msg)

    def doMaxTimeout(self):
        self.maxTimer = None
        msg = "command timed out: %d seconds elapsed" % self.maxTime
        self.kill(msg)
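
The proposal above could be sketched roughly as follows. This is only an illustration of prefixing the generic message with a step name: `format_timeout` and `step_name` are hypothetical names, not part of buildbot, and the exact message layout is an assumption.

```python
def format_timeout(step_name, seconds, kind="without output"):
    """Build a timeout message prefixed with the (hypothetical) step name.

    `kind` is "without output" for the idle timeout or "elapsed" for the
    overall maximum-time timeout, mirroring the two messages quoted above.
    """
    return "%s: command timed out: %d seconds %s" % (step_name, seconds, kind)


if __name__ == "__main__":
    # The step name "run_script" is made up for this example.
    print(format_timeout("run_script", 2400))
    print(format_timeout("run_script", 7200, kind="elapsed"))
```

With a distinguishing prefix like this, a bug search keyed on the whole string would no longer return every generic "command timed out" bug.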

Now, I know buildbot patches are generally a bit more awkward, so I don't know whether you think we would need to upstream first, or even whether they'd take the change?

Dustin, what do you think? :-)
Blocks: 778688
I'd like to see that upstream, sure.

Shipping a change to non-Windows systems is pretty easy - it's done with Puppet.  Windows is still hard.
Upstream PR:
https://github.com/buildbot/buildbot/pull/1130
Assignee: nobody → emorley
Status: NEW → ASSIGNED
Duplicate of this bug: 778690
Created attachment 8408096 [details] [diff] [review]
For timeouts include the command being run in the failure string

Backport of upstream commit:
https://github.com/buildbot/buildbot/commit/b66cc92ee5a7ec0e923c7ab2055a38d07ac3515c

I've checked that we won't break any of the current regexes:
https://mxr.mozilla.org/build/search?string=command%2Btimed%2Bout
Attachment #8408096 - Flags: review?(dustin)
Comment on attachment 8408096 [details] [diff] [review]
For timeouts include the command being run in the failure string

Assuming you're confident that fake_command works the same way in 0.8.2, this looks just like the patch I merged :)
Attachment #8408096 - Flags: review?(dustin) → review+
Landed on default & transplanted to production-0.8, since there are buildbot master changes that require a restart and that we did not want to merge across just yet.

https://hg.mozilla.org/build/buildbot/rev/6ac947fa5721
https://hg.mozilla.org/build/buildbot/rev/8a9e33843c3f
Depends on: 1009584
Whiteboard: [waiting on bug 1009584]
This is still waiting for bug 1009584 to actually be deployed, but closing this so it stops appearing in bugzilla-todos.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
Whiteboard: [waiting on bug 1009584] → [waiting on bug 1009584 for deployment]
This is deployed on !Windows; bug 1042597 will take care of Windows.

@Sheriffs: Note this bug changes "command timed out: 2400 seconds without output" to "command timed out: 2400 seconds without output running <cmd...>"
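
For anyone adjusting log parsing, here is a minimal sketch of a pattern that accepts both the old and the new form and pulls out the command when present. The regex is written for this note and is not one of the actual TBPL regexes; the sample command and the assumption that ", attempting to kill" may still trail the message are illustrative only.

```python
import re

# Illustrative pattern: matches the old generic message and the new one that
# appends " running <cmd>". The trailing ", attempting to kill" is optional.
TIMEOUT_RE = re.compile(
    r"command timed out: (?P<seconds>\d+) seconds "
    r"(?P<kind>without output|elapsed)"
    r"(?: running (?P<cmd>.+?))?"
    r"(?:, attempting to kill)?$"
)


def parse_timeout(line):
    """Return (seconds, kind, cmd) for a timeout line, or None if no match.

    `cmd` is None for old-style messages that do not name the command.
    """
    m = TIMEOUT_RE.search(line)
    if m is None:
        return None
    return int(m.group("seconds")), m.group("kind"), m.group("cmd")


if __name__ == "__main__":
    old = "command timed out: 2400 seconds without output"
    # The command below is a made-up example.
    new = "command timed out: 2400 seconds without output running ['python', 'foo.py']"
    print(parse_timeout(old))
    print(parse_timeout(new))
```

A pattern that only looks for the "command timed out: N seconds without output" prefix keeps matching after the change, since the command is appended rather than inserted.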
Depends on: 1042597
Whiteboard: [waiting on bug 1009584 for deployment]