Closed Bug 793782 Opened 12 years ago Closed 12 years ago

Stop appending redundant "buildbot.slave.commands.TimeoutError" string

Categories

(Release Engineering :: General, defect, P3)

defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

References

Details

Attachments

(1 file)

Normally buildbot timeouts occur as: { 23490 INFO TEST-PASS | /tests/content/base/test/test_bug473162-2.html | Unexpected first node - [object HTMLSpanElement] should equal [object HTMLSpanElement] 23491 INFO TEST-PASS | /tests/content/base/test/test_bug473162-2.html | Unexpected second node - [object HTMLSpanElement] shou command timed out: 2400 seconds without output, killing pid 20841 process killed by signal 9 program finished with exit code -1 elapsedTime=2698.630792 TinderboxPrint: mochitest-plain<br/><em class="testfail">T-FAIL</em> buildbot.slave.commands.TimeoutError: command timed out: 2400 seconds without output, killing pid 20841 TinderboxPrint: mochitest-plain<br/><em class="testfail">timeout</em> ========= Finished 'python mochitest/runtestsremote.py ...' warnings (results: 1, elapsed: 44 mins, 59 secs) (at 2012-09-23 20:11:31.546325) ========= } (https://tbpl.mozilla.org/php/getParsedLog.php?id=15470170&tree=Firefox#error0) However, with mock (so I presume due to mozharness?), we get just: { localhost.localdomain - - [24/Sep/2012 08:04:57] "GET /safebrowsing-dummy/newkey HTTP/1.1" 404 - --DOMWINDOW == 15 (0x39c3630) [serial = 15] [outer = 0x398c1e0] [url = about:blank] --DOMWINDOW == 14 (0x2c36a80) [serial = 9] [outer = 0x2aab5e0] [url = about:blank] --DOMWINDOW == 13 (0x3074f30) [serial = 12] [outer = 0x2aab5e0] [url = about:blank] --DOMWINDOW == 12 (0x106c5d0) [serial = 2] [outer = 0x13c9700] [url = about:blank] localhost.localdomain - - [24/Sep/2012 08:05:01] "GET /sample.html HTTP/1.1" 200 - ++DOMWINDOW == 13 (0x1358910) [serial = 17] [outer = 0x398c1e0] command timed out: 3600 seconds without output, attempting to kill program finished with exit code 247 elapsedTime=3609.427740 ========= Finished alive test failed (results: 2, elapsed: 1 hrs, 9 secs) (at 2012-09-24 06:05:01.341018) ========= } (this example from https://tbpl.mozilla.org/php/getParsedLog.php?id=15479030&tree=Firefox) TBPL's parser only recognises "buildbot.slave.commands.TimeoutError:", so misses the latter case.
Blocks: 793016
The former case looks like mozharness (with the timestamp + INFO on every line); the latter looks like non-mozharness.
n/m, neither is mozharness.
Also: https://tbpl.mozilla.org/php/getParsedLog.php?id=15846492&tree=Thunderbird-Trunk { WARNING: NS_ENSURE_TRUE(wrapper) failed: file ../../../../mozilla/dom/base/nsJSUtils.cpp, line 81 WARNING: NS_ENSURE_TRUE(wrapper) failed: file ../../../../mozilla/dom/base/nsJSUtils.cpp, line 81 command timed out: 1200 seconds without output, attempting to kill program finished with exit code 247 elapsedTime=1227.483098 ========= Finished 'mock_mozilla -v ...' failed (results: 2, elapsed: 20 mins, 27 secs) (at 2012-10-05 08:38:49.536995) ========= } The mock changes have regressed TBPL visibility for these failures; please may someone take a look? (I have no idea where to start with mock).
Keywords: regression
Can we update TBPL to look for the new error message? I don't think updating mock to print out error messages that look like they come from buildbot is a good idea.
Priority: -- → P3
(In reply to Chris AtLee [:catlee] from comment #4) > Can we update TBPL to look for the new error message? I don't think updating > mock to print out error messages that look like they come from buildbot is a > good idea. We could, but it would mean we match twice against the buildbot logs, due to: https://hg.mozilla.org/build/buildbotcustom/file/tip/steps/unittest.py#l373 ...and I believe we won't be able to add "^command timed out:" to TBPL (to avoid the double match), since once we switch to mozharness, it prefixes the lines. However, I can't see anything else that uses the buildbot prefixed line - so would you be ok with removing: 373 'buildbot.slave.commands.TimeoutError: ' + 374 line + ...and then we can just get TBPL to match against "command timed out: " rather than "buildbot.slave.commands.TimeoutError".
Assignee: nobody → bmo
Status: NEW → ASSIGNED
Flags: needinfo?(catlee)
catlee, ping on the question in comment 5.
Ok, presuming the answer to comment 5 is yes... Bug 813039 will make TBPL match against "comment timed out:" directly, which will cover both the buildbot and mock cases. We can then cleanup: https://hg.mozilla.org/build/buildbotcustom/file/tip/steps/unittest.py#l373
No longer blocks: 778688
Depends on: 813039
Summary: mock's "command timed out: 3600 seconds without output" failures not accompanied by "buildbot.slave.commands.TimeoutError" → Stop appending redundant "buildbot.slave.commands.TimeoutError" string
Attached patch Patch v1Splinter Review
(Line was originally added by bug 438324)
Attachment #683070 - Flags: review?(catlee)
Attachment #683070 - Flags: review?(catlee) → review?(bhearsum)
Flags: needinfo?(catlee)
Whiteboard: [sheriff-want]
Comment on attachment 683070 [details] [diff] [review] Patch v1 Review of attachment 683070 [details] [diff] [review]: ----------------------------------------------------------------- Shouldn't we take out the TinderboxPrint at the same time?
Attachment #683070 - Flags: review?(bhearsum) → review+
Phil, do we hit the "TinderboxPrint: <testsuite-name> \n timeout" case any more, or just the s/timeout/T-FAIL/ one elsewhere?
(In reply to Ben Hearsum [:bhearsum] from comment #9) > Shouldn't we take out the TinderboxPrint at the same time? Philor says that we are still using this. I trust him more than me on this! hehe :-) https://hg.mozilla.org/build/buildbotcustom/rev/afb1205961f3
Keywords: regression
In production.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: