Peptest jobs don't appear to have a timeout

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
6 years ago
4 years ago

People

(Reporter: philor, Assigned: aki)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment)

(Reporter)

Description

6 years ago
Last night, I killed several Mac Peptest runs through self-serve that claimed to have been running for one or two days, expecting that they were zombies left over from last week's db issues. However, killing them appeared to work, which mostly seems to mean that the jobs were actually still live.

Today, there's another one which claims to have been running for 6 hours so far, in https://tbpl.mozilla.org/?noignore=1&tree=Mozilla-Inbound&rev=4ce695c26fc5&jobname=10.5.8%20mozilla-inbound%20opt%20test%20peptest which might mean that it really has been tying up a slave for 6 hours, or might mean it died some time ago and didn't tell anyone, or, dunno, maybe it never really happened at all.
The job is genuinely still running, the tail of the log looks like this:

06:14:29     INFO - #####
06:14:29     INFO - ##### Running run-peptest step.
06:14:29     INFO - #####
06:14:29     INFO - Running command: ['/Users/cltbld/talos-slave/test/build/venv/bin/python', '-u', '/Users/cltbld/talos-slave/test/build/tests/peptest/peptest/runpeptests.py', '--app', 'firefox', '--binary', '/Users/cltbld/talos-slave/test/build/application/FirefoxNightly.app/Contents/MacOS/firefox', '--test-path', '/Users/cltbld/talos-slave/test/build/tests/peptest/tests/firefox/firefox_all.ini', '--timeout', '60', '--tracer-threshold', '50', '--tracer-interval', '10', '--proxy', '/Users/cltbld/talos-slave/test/build/tests/peptest/tests/firefox/server-locations.txt', '--proxy-host-dirs', '--server-path', '/Users/cltbld/talos-slave/test/build/tests/peptest/tests/firefox/server', '--log-level', 'INFO', '--iterations', '10']
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 224 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | chrome_navigation | unresponsive time: 183 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | chrome_navigation | unresponsive time: 77 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 89.594
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 4565 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44  WARNING -  PEP WARNING    | test_openBlankTab.js | open_blank_tab | unresponsive time: 78 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBlankTab.js | fail threshold: 0.0 < metric: 6.084
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 454 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44  WARNING -  PEP WARNING    | test_openBookmarksMenu.js | scroll_bookmarks | unresponsive time: 74 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBookmarksMenu.js | fail threshold: 0.0 < metric: 5.476
06:15:44     INFO -  PEP TEST-END   | test_openBookmarksMenu.js | finished in: 763 ms
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 182 ms
06:15:44     INFO -  PEP TEST-START | test_resizeWindow.js
06:15:44    ERROR -  PEP ERROR      | test_resizeWindow.js | ReferenceError: test is not defined
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | resize_to | unresponsive time: 125 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 125 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | chrome_navigation | unresponsive time: 85 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 38.475
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 2331 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44     INFO -  PEP TEST-PASS  | test_openBlankTab.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 505 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44  WARNING -  PEP WARNING    | test_openBookmarksMenu.js | show_all_bookmarks | unresponsive time: 138 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBookmarksMenu.js | fail threshold: 0.0 < metric: 19.044
06:15:44     INFO -  PEP TEST-END   | test_openBookmarksMenu.js | finished in: 700 ms
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 217 ms
06:15:44     INFO -  PEP TEST-START | test_resizeWindow.js
06:15:44    ERROR -  PEP ERROR      | test_resizeWindow.js | TypeError: window is null
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 130 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 16.9
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 2346 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44     INFO -  PEP TEST-PASS  | test_openBlankTab.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 506 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44  WARNING -  PEP WARNING    | test_openBookmarksMenu.js | show_all_bookmarks | unresponsive time: 128 ms
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_openBookmarksMenu.js | fail threshold: 0.0 < metric: 16.384
06:15:44     INFO -  PEP TEST-END   | test_openBookmarksMenu.js | finished in: 682 ms
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 219 ms
06:15:44     INFO -  PEP TEST-START | test_resizeWindow.js
06:15:44    ERROR -  PEP ERROR      | test_resizeWindow.js | TypeError: window is null
06:15:44     INFO -  PEP TEST-START | test_contextMenu.js
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_reload | unresponsive time: 242 ms
06:15:44  WARNING -  PEP WARNING    | test_contextMenu.js | content_scroll | unresponsive time: 50 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TelemetryStopwatch: key \"FX_SESSION_RESTORE_COLLECT_DATA_MS\" was already initialized\" {file: \"resource://gre/modules/TelemetryStopwatch.jsm\" line: 53}]"}
06:15:44    ERROR -  PEP TEST-UNEXPECTED-FAIL | test_contextMenu.js | fail threshold: 0.0 < metric: 61.064
06:15:44     INFO -  PEP TEST-END   | test_contextMenu.js | finished in: 2312 ms
06:15:44     INFO -  PEP TEST-START | test_openBlankTab.js
06:15:44     INFO -  PEP TEST-PASS  | test_openBlankTab.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openBlankTab.js | finished in: 32 ms
06:15:44     INFO -  PEP TEST-START | test_openBookmarksMenu.js
06:15:44    ERROR -  PEP ERROR      | test_openBookmarksMenu.js | TimeoutError: controller.waitForPageLoad(): Timeout waiting for page loaded.
06:15:44     INFO -  PEP TEST-START | test_openWindow.js
06:15:44     INFO -  PEP TEST-PASS  | test_openWindow.js | fail threshold: 0.0 >= metric: 0
06:15:44     INFO -  PEP TEST-END   | test_openWindow.js | finished in: 234 ms
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TypeError: can't access dead object\" {file: \"resource://mozmill/stdlib/utils.js\" line: 102}]"}
06:15:44  WARNING -  PEP WARNING    | MOZMILL fail {"message":"[JavaScript Error: \"TypeError: can't access dead object\" {file: \"resource://mozmill/stdlib/utils.js\" line: 102}]"}

then the last line repeats many many times.
(Reporter)

Updated

6 years ago
Depends on: 759000
(Reporter)

Comment 2

6 years ago
Filed bug 759000 on "don't do that," but, shouldn't buildbot be timing out the job at some point?
Summary: Mac Peptest jobs intermittently either hang for hours or days, or die without admitting they died → Peptest jobs don't appear to have a timeout
That particular step has 
  timeout: 1200
which the repeated error messages will avoid, and
  maxTime: None
so there's no upper bound. The builder is based on the ScriptFactory class, so a script_maxtime argument can be set, but I don't see the necessary plumbing to get that from a config to the factory and will defer to someone who knows this code better.
(Reporter)

Comment 4

6 years ago
Killed four more, at 10, 11, 12 and 23 hours. If we run short of Mac test slaves, that'll be a good place to start looking for them.
(Assignee)

Comment 5

6 years ago
Created attachment 627996 [details] [diff] [review]
set a default maxtime of 1200 seconds for mozharness test suites
Assignee: nobody → aki
Status: NEW → ASSIGNED
Attachment #627996 - Flags: review?(nrthomas)
Attachment #627996 - Flags: review?(nrthomas) → review+
(Assignee)

Comment 6

6 years ago
Comment on attachment 627996 [details] [diff] [review]
set a default maxtime of 1200 seconds for mozharness test suites

1200 seconds might be aggressive; we may want to bump that up later.
However, currently good peptest runs seem to happen in a matter of a few minutes, so we're fine for the moment.
http://hg.mozilla.org/build/buildbotcustom/rev/5dcbdeb5d71f
Attachment #627996 - Flags: checked-in+
(Assignee)

Comment 7

6 years ago
This should be live.
Status: ASSIGNED → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
You need to log in before you can comment on or make changes to this bug.