Hardware for fxrecord startup performance harness in Toronto Office is unavailable
Categories
(Infrastructure & Operations :: RelOps: General, defect)
Tracking
(Not tracked)
People
(Reporter: davehunt, Unassigned)
References
Details
According to bug 1738369, no workers are available for gecko-t-fxrecorder, causing tasks to fail with deadline exceeded.
This Jira ticket indicates the following hardware is set up in the Toronto server room:
- fxrecorder01.corp.tor1.mozilla.com (10.242.24.97) - asset tag - 09390
- fxrunner01.corp.tor1.mozilla.com (10.242.24.96) - asset tag - 35944
Could someone please investigate the status of these boxes?
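A first pass at checking the status of these boxes could be a simple TCP reachability probe, sketched below. This is an illustration only and not part of any existing tooling. The hostnames come from the list above. The port (22, SSH) and the helper names `is_reachable` and `report` are assumptions. The hosts sit on an internal network, so it would only give meaningful results from a machine with access to it.

```python
import socket

# Hostnames are taken from the hardware list in this bug.
HOSTS = [
    "fxrecorder01.corp.tor1.mozilla.com",
    "fxrunner01.corp.tor1.mozilla.com",
]
# Assumption: the boxes accept SSH on port 22. Adjust if they do not.
ASSUMED_PORT = 22


def is_reachable(host, port=ASSUMED_PORT, timeout=3.0,
                 connect=socket.create_connection):
    """Return True if a TCP connection to (host, port) opens within timeout."""
    try:
        conn = connect((host, port), timeout)
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
    conn.close()
    return True


def report(hosts=HOSTS, check=is_reachable):
    """Map each hostname to 'up' or 'down' based on the reachability check."""
    return {host: ("up" if check(host) else "down") for host in hosts}


if __name__ == "__main__":
    for host, state in report().items():
        print(f"{host}: {state}")
```

A host reported as "up" only means the port answered. It says nothing about whether the worker on it is claiming tasks.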
:markco do you know about these machines for fxrecord startup? re: https://bugzilla.mozilla.org/show_bug.cgi?id=1554314#c8 :fubar had mentioned you, but then it looks like :barret stood up the machines
Comment 2•3 years ago
I think jlin has also had a hand in setting these up in the Toronto office.
Comment 3•3 years ago
Yes, I set these up in the Toronto server room, as indicated in the Jira ticket.
There is a power maintenance scheduled for this weekend at the Toronto office, and I will be on site on Monday, Nov 8 to power things back up. I can check these hosts once the office is back up, probably around end of day on Nov 8, to make sure they are back online.
Comment 4•3 years ago
(In reply to Dave House [:dhouse] from comment #1)
> :markco do you know about these machines for fxrecord startup? re: https://bugzilla.mozilla.org/show_bug.cgi?id=1554314#c8 :fubar had mentioned you, but then it looks like :barret stood up the machines
There was discussion on us taking over management of this, but we never did.
I powered on both fxrecorder01 and fxrunner01. Both seem to be going through some reboots (probably queued-up updates, processes, or tasks?). I don't know if they are working correctly, but I usually see the LED on the HD60S capture device light up during a recording, and so far it hasn't lit up. I'll come back to them in 30 minutes to 1 hour to see if they are done with their reboot cycles.
It looks like it is working.
I retriggered this task (from https://bugzilla.mozilla.org/show_bug.cgi?id=1738369#c0) and it is running now: https://firefox-ci-tc.services.mozilla.com/tasks/fzahkkV1TQijYvaPrN-Flw
Excellent, sounds like it's working. I haven't checked in on it since my last comment (comment 5), but if it's giving you successful runs, that should be good.
Comment 8•3 years ago
Hi, it looks like that machine is not working. Jonathan, any chance you could check it out again?
The job in TH that failed with deadline exceeded is here. Taskcluster link.
Comment 9•3 years ago
The machine will be down until the air conditioning in the room is fixed.
Comment 10•3 years ago
Bug 1743015 was filed to take the tests offline during the outage, but it didn't land on m-c quickly enough to stop a task from getting queued. Bug 1743026 will bring the tests back online once the Toronto office outage is over.
Comment 11•3 years ago
Re-resolving as FIXED because the tasks using these machines are not running so it shouldn't be causing further problems.
Comment 12•3 years ago
A few jobs timed out since the machines were turned back on. I just restarted the worker daemon on the fxrecorder01 machine and it seems to be accepting jobs again.