Closed Bug 1739397 Opened 3 years ago Closed 2 years ago

Hardware for fxrecord startup performance harness in Toronto Office is unavailable

Categories

(Infrastructure & Operations :: RelOps: General, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: davehunt, Unassigned)

According to bug 1738369, no workers are available for gecko-t-fxrecorder, causing tasks to fail with deadline exceeded.

This Jira ticket indicates the following hardware is set up in the Toronto server room:

  • fxrecorder01.corp.tor1.mozilla.com (10.242.24.97) - asset tag - 09390
  • fxrunner01.corp.tor1.mozilla.com (10.242.24.96) - asset tag - 35944

Could someone please investigate the status of these boxes?

Blocks: 1739403

:markco, do you know about these machines for fxrecord startup? Re: https://bugzilla.mozilla.org/show_bug.cgi?id=1554314#c8, :fubar had mentioned you, but it looks like :barret stood up the machines.

Flags: needinfo?(mcornmesser)

I think jlin has also had a hand in setting these up in the Toronto office.

Flags: needinfo?(jlin)

Yes, I set these up in the Toronto server room, as indicated in the Jira ticket.

There is a power maintenance scheduled at the Toronto office this weekend, and I will be on site on Monday, Nov 8 to do the power-up. Once the office is back up, I can check these hosts to make sure they are back online, probably around end of day on Nov 8.

Flags: needinfo?(jlin)

(In reply to Dave House [:dhouse] from comment #1)

:markco, do you know about these machines for fxrecord startup? Re: https://bugzilla.mozilla.org/show_bug.cgi?id=1554314#c8, :fubar had mentioned you, but it looks like :barret stood up the machines.

There was discussion about us taking over management of this, but we never did.

Flags: needinfo?(mcornmesser)

I powered on both fxrecorder01 and fxrunner01. Both seem to be going through some reboots (probably queued-up updates, processes, or tasks?). I don't know yet if they are working correctly: usually I see the LED on the HD60S capture device light up when it's recording, and so far it hasn't lit up. I'll come back to them in 30 minutes to 1 hour to see if they are done with their reboot cycles.

It looks like it is working.
I retriggered this task (from https://bugzilla.mozilla.org/show_bug.cgi?id=1738369#c0) and it is running now: https://firefox-ci-tc.services.mozilla.com/tasks/fzahkkV1TQijYvaPrN-Flw

Excellent, sounds like it's working. I haven't checked in on it since my last comment (comment 5), but if it's giving you successful runs, that should be good.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

Hi, looks like that machine is not working. Jonathan, any chance you could check it out again?
The job in TH that failed with deadline exceeded is here. Taskcluster link.

Flags: needinfo?(jlin)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

The machine will be down until the air conditioning in the room is fixed.

Bug 1743015 was filed to take the tests offline during the outage, but it didn't hit m-c quickly enough to stop a task from getting queued. Bug 1743026 will bring the tests back online once the Toronto office outage is over.

Flags: needinfo?(jlin)
See Also: → 1743015, 1743026

Re-resolving as FIXED because the tasks using these machines are not running, so this shouldn't be causing further problems.

Status: REOPENED → RESOLVED
Closed: 3 years ago → 2 years ago
Resolution: --- → FIXED

A few jobs timed out since the machines were turned back on. I just restarted the worker daemon on the fxrecorder01 machine and it seems to be accepting jobs again.

See Also: → 1754563