Add timeouts to 'make check' and alive tests

RESOLVED FIXED

Status

Product: Release Engineering
Component: General
Priority: P2
Severity: normal
Status: RESOLVED FIXED
Opened: 6 years ago
Updated: a month ago

People

(Reporter: kmoir, Assigned: catlee)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [buildduty][simple])

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
Ran into some OS X 10.7 64-bit try leak test builds today that were hung for several days.  Not sure why they aren't timing out given that they have a timeout of 1200 seconds configured.
Sounds like Buildbot might not be killing the processes correctly. Can you point me at a hung job?
Huh. I don't see anything in that log that indicates it tried to kill it after N seconds:
************************************************************
WARNING: 1 sort operation has occurred for the SQL statement '0x11f98c080'.  See https://developer.mozilla.org/En/Storage/Warnings details.: file /builds/slave/try-osx64-dbg/build/storage/src/mozStoragePrivateHelpers.cpp, line 110

command interrupted, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=32876.270932

So it doesn't seem like Buildbot is failing to kill it (which is what I suspected), but rather that it's not trying to kill it at all, like you say.
(Assignee)

Comment 4

6 years ago
Which step is that in?

Looks like buildbotcustom.steps.unittest.MozillaCheck needs a maxTime and/or timeout set.
I'm pretty sure that was during an alive test, but I'm not 100% sure.
(Reporter)

Comment 6

6 years ago
Yes, it was during an alive test.
(Assignee)

Comment 7

6 years ago
Many of the AliveTest steps also have no timeout/maxTime set.
Priority: -- → P2
Summary: OS X 10.7 64-bit try leak test builds don't timeout → Add timeouts to 'make check' and alive tests
I just killed 13 debug Linux build jobs on the Larch twig, which had been "running" for up to 6891 minutes.

What does 6891 minutes of ec2 slave time cost?
Blocks: 864088

Updated

5 years ago
Whiteboard: [buildduty]
(Assignee)

Comment 9

5 years ago
(In reply to Phil Ringnalda (:philor) from comment #8)
> I just killed 13 debug Linux build jobs on the Larch twig, which had been
> "running" for up to 6891 minutes.
> 
> What does 6891 minutes of ec2 slave time cost?

about $60
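
For a rough sense of scale, the hourly rate implied by those two figures can be worked out as below. This is only a back-of-the-envelope sketch: the variable names are made up for illustration, and "about $60" is an approximation, so the resulting rate is too.

```python
# Figures quoted in this thread; the cost is approximate ("about $60").
minutes = 6891
approx_cost_usd = 60

hours = minutes / 60
implied_rate = approx_cost_usd / hours  # implied USD per slave-hour

print(f"{hours:.2f} hours, roughly ${implied_rate:.2f} per hour")
```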
Assignee: nobody → catlee
Whiteboard: [buildduty] → [buildduty][simple]
(Assignee)

Comment 10

5 years ago
Created attachment 740488 [details] [diff] [review]
reduce check/alive times

removed some unused imports too

This sets the default timeout for the alive tests to 5 minutes, and maxTime to 10 minutes. I think this is more than enough for regular operations?

This also sets the default timeout for ShellCommandReportTimeout to 2 hours, with a maxTime of 4 hours. This base class is used by the make check step and various test steps. According to the dump_masters diff, the only impact is on 'make check', mobile mochitests, and mobile reftests. In all 3 cases this adds maxTime = 4h.
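
To make the difference between the two settings concrete: in Buildbot's ShellCommand, timeout is the number of seconds a command may go without producing output, while maxTime caps the total run time regardless of output. The sketch below imitates that behaviour with the standard library only. It is not the buildbotcustom code from the attachment; run_with_timeouts and its return values are hypothetical names used for illustration, and the defaults simply mirror the 5 minute / 10 minute alive-test values mentioned above.

```python
import queue
import subprocess
import sys
import threading
import time


def run_with_timeouts(cmd, timeout=5 * 60, max_time=10 * 60):
    """Run cmd and kill it after `timeout` seconds without output or
    `max_time` seconds in total. Returns (reason, elapsed_seconds).

    Hypothetical helper; it only imitates the timeout/maxTime semantics.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = queue.Queue()

    def pump():
        # Forward each output line to the main loop, then signal end of output.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    reader = threading.Thread(target=pump, daemon=True)
    reader.start()

    start = time.monotonic()
    last_output = start
    reason = "finished"
    while True:
        now = time.monotonic()
        if now - start >= max_time:
            reason = "maxTime exceeded"
            break
        if now - last_output >= timeout:
            reason = "timeout (no output)"
            break
        # Sleep until the nearer of the two deadlines, or until output arrives.
        wait = min(max_time - (now - start), timeout - (now - last_output))
        try:
            line = lines.get(timeout=wait)
        except queue.Empty:
            continue
        if line is None:  # the process closed its output
            break
        last_output = time.monotonic()  # any output resets the idle timer

    if reason != "finished":
        proc.kill()
    proc.wait()
    reader.join(timeout=1)
    proc.stdout.close()
    return reason, time.monotonic() - start


if __name__ == "__main__":
    # A silent child process trips the idle timeout well before max_time.
    quiet = [sys.executable, "-c", "import time; time.sleep(30)"]
    print(run_with_timeouts(quiet, timeout=1, max_time=3))
```

A job that hangs silently is caught by timeout, while one that keeps printing (such as a test stuck in a loop emitting warnings) is only caught by maxTime, which is why setting both matters for the steps discussed here.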
Attachment #740488 - Flags: review?(philringnalda)
Attachment #740488 - Flags: review?(bhearsum)
Comment on attachment 740488 [details] [diff] [review]
reduce check/alive times

Review of attachment 740488 [details] [diff] [review]:
-----------------------------------------------------------------

The Windows trace malloc alive tests used to take a very very long time, but it looks like they're much quicker now (< 1min). Should be fine.
Attachment #740488 - Flags: review?(bhearsum) → review+
Comment on attachment 740488 [details] [diff] [review]
reduce check/alive times

lgtm
Attachment #740488 - Flags: review?(philringnalda) → review+
(Assignee)

Updated

5 years ago
Attachment #740488 - Flags: checked-in+
Whee, https://tbpl.mozilla.org/php/getParsedLog.php?id=22173156&tree=Larch got clubbed right between the eyes after 600 seconds, just like it should have :)

Oh, also, "this is in production."
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
Product: Release Engineering → Release Engineering