Closed Bug 758988 Opened 12 years ago Closed 12 years ago

Peptest jobs don't appear to have a timeout

Categories

(Release Engineering :: General, defect)

x86
macOS
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: mozilla)

References

Details

Attachments

(1 file)

Last night, I killed several Mac Peptest runs through self-serve that claimed to have been running for one or two days, expecting that they were zombies left over from last week's db issues. However, killing them appeared to work, which mostly seems to mean that the jobs were actually still live.

Today, there's another one which claims to have been running for 6 hours so far, in https://tbpl.mozilla.org/?noignore=1&tree=Mozilla-Inbound&rev=4ce695c26fc5&jobname=10.5.8%20mozilla-inbound%20opt%20test%20peptest which might mean that it really has been tying up a slave for 6 hours, or might mean it died some time ago and didn't tell anyone, or, dunno, maybe it never really happened at all.
The job is genuinely still running, the tail of the log looks like this:

06:14:29     INFO - #####
06:14:29     INFO - ##### Running run-peptest step.
06:14:29     INFO - #####
06:14:29     INFO - Running command: ['/Users/cltbld/talos-slave/test/build/venv/bin/python', '-u', '/Users/cltbld/talos-slave/test/build/tests/peptest/peptest/runpeptests.py', '--app', 'firefox', '--binary', '/Users/cltbld/talos-slave/test/build/application/FirefoxNightly.app/Contents/MacOS/firefox', '--test-path', '/Users/cltbld/talos-slave/test/build/tests/peptest/tests/firefox/firefox_all.ini', '--timeout', '60', '--tracer-threshold', '50', '--tracer-interval', '10', '--proxy', '/Users/cltbld/talos-slave/test/build/tests/peptest/tests/firefox/server-locations.txt', '--proxy-host-dirs', '--server-path', '/Users/cltbld/talos-slave/test/build/tests/peptest/tests/firefox/server', '--log-level', 'INFO', '--iterations', '10']
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 224 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | chrome_navigation | unresponsive time: 183 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | chrome_navigation | unresponsive time: 77 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 89.594
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 4565 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44  WARNING -  PEP WARNING    | test_openBlankTab.js | open_blank_tab | unresponsive time: 78 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBlankTab.js | fail threshold: 0.0 < metric: 6.084
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 454 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44  WARNING -  PEP WARNING    | test_openBookmarksMenu.js | scroll_bookmarks | unresponsive time: 74 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBookmarksMenu.js | fail threshold: 0.0 < metric: 5.476
06:15:44     INFO -  PEP TEST-END   | test_openBookmarksMenu.js | finished in: 763 ms
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 182 ms
06:15:44     INFO -  PEP TEST-START | test_resizeWindow.js
06:15:44    ERROR -  PEP ERROR      | test_resizeWindow.js | ReferenceError: test is not defined
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | resize_to | unresponsive time: 125 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 125 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | chrome_navigation | unresponsive time: 85 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 38.475
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 2331 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44     INFO -  PEP TEST-PASS  | test_openBlankTab.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 505 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44  WARNING -  PEP WARNING    | test_openBookmarksMenu.js | show_all_bookmarks | unresponsive time: 138 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBookmarksMenu.js | fail threshold: 0.0 < metric: 19.044
06:15:44     INFO -  PEP TEST-END   | test_openBookmarksMenu.js | finished in: 700 ms
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 217 ms
06:15:44     INFO -  PEP TEST-START | test_resizeWindow.js
06:15:44    ERROR -  PEP ERROR      | test_resizeWindow.js | TypeError: window is null
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 130 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 16.9
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 2346 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44     INFO -  PEP TEST-PASS  | test_openBlankTab.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 506 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44  WARNING -  PEP WARNING    | test_openBookmarksMenu.js | show_all_bookmarks | unresponsive time: 128 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBookmarksMenu.js | fail threshold: 0.0 < metric: 16.384
06:15:44     INFO -  PEP TEST-END   | test_openBookmarksMenu.js | finished in: 682 ms
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 219 ms
06:15:44     INFO -  PEP TEST-START | test_resizeWindow.js
06:15:44    ERROR -  PEP ERROR      | test_resizeWindow.js | TypeError: window is null
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 242 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_scroll | unresponsive time: 50 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 61.064
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 2312 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44     INFO -  PEP TEST-PASS  | test_openBlankTab.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 32 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44    ERROR -  PEP ERROR      | test_openBookmarksMenu.js | TimeoutError: controller.waitForPageLoad(): Timeout waiting for page loaded.
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 234 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TypeError: can't access dead object\" {file: \"resource://mozmill/stdlib/utils.js\" line: 102}]"}
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TypeError: can't access dead object\" {file: \"resource://mozmill/stdlib/utils.js\" line: 102}]"}

then the last line repeats many many times.
Depends on: 759000
Filed bug 759000 on "don't do that," but, shouldn't buildbot be timing out the job at some point?
Summary: Mac Peptest jobs intermittently either hang for hours or days, or die without admitting they died → Peptest jobs don't appear to have a timeout
That particular step has 
  timeout: 1200
which the repeated error messages will avoid, and
  maxTime: None
so there's no upper bound. The builder is based on the ScriptFactory class, so a script_maxtime argument can be set, but I don't see the necessary plumbing to get that from a config to the factory and will defer to someone who knows this code better.
Killed four more, at 10, 11, 12 and 23 hours. If we run short of Mac test slaves, that'll be a good place to start looking for them.
Assignee: nobody → aki
Status: NEW → ASSIGNED
Attachment #627996 - Flags: review?(nrthomas)
Attachment #627996 - Flags: review?(nrthomas) → review+
Comment on attachment 627996 [details] [diff] [review]
set a default maxtime of 1200 seconds for mozharness test suites

1200 seconds might be aggressive; we may want to bump that up later.
However, currently good peptest runs seem to happen in a matter of a few minutes, so we're fine for the moment.
http://hg.mozilla.org/build/buildbotcustom/rev/5dcbdeb5d71f
Attachment #627996 - Flags: checked-in+
This should be live.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → General
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: