Closed
Bug 791335
Opened 12 years ago
Closed 11 years ago
Add timeouts to 'make check' and alive tests
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kmoir, Assigned: catlee)
References
Details
(Whiteboard: [buildduty][simple])
Attachments
(1 file)
4.57 KB, patch | bhearsum: review+ | philor: review+ | catlee: checked-in+
Ran into some OS X 10.7 64-bit try leak test builds today that were hung for several days. Not sure why they aren't timing out given that they have a timeout of 1200 seconds configured.
Comment 1•12 years ago
Sounds like Buildbot might not be killing the processes correctly. Can you point me at a hung job?
Comment 2•12 years ago (Reporter)
Here's one of the jobs I killed: http://buildbot-master33.srv.releng.scl3.mozilla.com:8101/builders/OS%20X%2010.7%2064-bit%20try%20leak%20test%20build/builds/1711
Comment 3•12 years ago
Huh. I don't see anything in that log that indicates it tried to kill it after N seconds:
************************************************************
WARNING: 1 sort operation has occurred for the SQL statement '0x11f98c080'. See https://developer.mozilla.org/En/Storage/Warnings details.: file /builds/slave/try-osx64-dbg/build/storage/src/mozStoragePrivateHelpers.cpp, line 110
command interrupted, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=32876.270932
So it doesn't seem like Buildbot is failing to kill it (which is what I suspected), but rather that it's not trying to kill it at all, like you say.
Comment 4•12 years ago (Assignee)
Which step is that in? Looks like buildbotcustom.steps.unittest.MozillaCheck needs a maxTime and/or timeout set.
Comment 5•12 years ago
I'm pretty sure that was during an alive test, but I'm not 100% sure.
Comment 6•12 years ago (Reporter)
Yes, it was during an alive test.
Comment 7•12 years ago (Assignee)
Many of the AliveTest steps also have no timeout/maxTime set.
Priority: -- → P2
Summary: OS X 10.7 64-bit try leak test builds don't timeout → Add timeouts to 'make check' and alive tests
Comment 8•11 years ago
I just killed 13 debug Linux build jobs on the Larch twig, which had been "running" for up to 6891 minutes. What does 6891 minutes of ec2 slave time cost?
Updated•11 years ago
Whiteboard: [buildduty]
Comment 9•11 years ago (Assignee)
(In reply to Phil Ringnalda (:philor) from comment #8)
> I just killed 13 debug Linux build jobs on the Larch twig, which had been
> "running" for up to 6891 minutes.
>
> What does 6891 minutes of ec2 slave time cost?
about $60
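As a rough check on that figure: 6891 minutes is just under 115 hours, so "about $60" works out to roughly half a dollar per instance-hour. The hourly rate below is back-derived from the two numbers in this thread and is an assumption, not a quoted EC2 price.

```python
minutes = 6891        # longest-running hung job, from comment 8
quoted_cost = 60.0    # "about $60", from the reply above

hours = minutes / 60
# Back-derived from the thread's own numbers; not an official price.
implied_hourly_rate = quoted_cost / hours

print(f"{hours:.2f} hours, ~${implied_hourly_rate:.2f}/hour implied")
```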
Assignee: nobody → catlee
Whiteboard: [buildduty] → [buildduty][simple]
Comment 10•11 years ago (Assignee)
Removed some unused imports too.
This sets the default timeout for the alive tests to 5 minutes, and maxTime to 10 minutes. I think this is more than enough for regular operations?
Also set the default timeout for ShellCommandReportTimeout to 2 hours, with a maxTime of 4 hours. This base class is used by the make check step and various test steps. According to the dump_masters diff, the only impact is on 'make check', mobile mochitests and mobile reftests. In all 3 cases this adds maxTime = 4h.
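To make the difference between the two limits concrete, here is a minimal standalone sketch, assuming the usual meaning of the two settings: timeout is the longest allowed gap without output, and maxTime is the longest allowed total run time. It is not Buildbot's implementation and not the attached patch; `run_with_timeouts` and its return values are hypothetical names used only for illustration. It shows why an idle timeout alone cannot bound a job that keeps printing something now and then.

```python
import queue
import subprocess
import sys
import threading
import time


def run_with_timeouts(cmd, timeout, max_time):
    """Run cmd; kill it after `timeout` s of silence or `max_time` s in total.

    Hypothetical helper. Returns (reason, output_lines) where reason is
    "finished", "timeout" (no output for too long) or "maxTime".
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE,
                            stderr=subprocess.STDOUT, text=True)
    lines = queue.Queue()

    def pump():
        # Forward each output line to the main thread; None marks EOF.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)

    threading.Thread(target=pump, daemon=True).start()

    start = time.monotonic()
    output = []
    while True:
        remaining = max_time - (time.monotonic() - start)
        if remaining <= 0:
            reason = "maxTime"
            break
        try:
            # Wait for the next line, but never past the overall deadline.
            line = lines.get(timeout=min(timeout, remaining))
        except queue.Empty:
            # Whichever limit was the shorter wait is the one that expired.
            reason = "timeout" if timeout <= remaining else "maxTime"
            break
        if line is None:
            proc.wait()
            return "finished", output
        output.append(line)

    proc.kill()
    proc.wait()
    return reason, output
```

With the alive-test defaults described above, a call would look like `run_with_timeouts(cmd, timeout=5 * 60, max_time=10 * 60)`: a silent hang is killed after 5 minutes, and a process that keeps producing output is still killed at the 10 minute mark.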
Attachment #740488 - Flags: review?(philringnalda)
Attachment #740488 - Flags: review?(bhearsum)
Comment 11•11 years ago
Comment on attachment 740488: reduce check/alive times
Review of attachment 740488:
The Windows trace malloc alive tests used to take a very very long time, but it looks like they're much quicker now (< 1min). Should be fine.
Attachment #740488 - Flags: review?(bhearsum) → review+
Comment 12•11 years ago
Comment on attachment 740488: reduce check/alive times
lgtm
Attachment #740488 - Flags: review?(philringnalda) → review+
Updated•11 years ago (Assignee)
Attachment #740488 - Flags: checked-in+
Comment 13•11 years ago
Whee, https://tbpl.mozilla.org/php/getParsedLog.php?id=22173156&tree=Larch got clubbed right between the eyes after 600 seconds, just like it should have :) Oh, also, "this is in production."
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•6 years ago
Component: General Automation → General