Closed Bug 1667441 Opened 4 years ago Closed 3 years ago

Test remove EDID from one MacOS PGO builder

Categories

(Infrastructure & Operations :: RelOps: Posix OS, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dhouse, Assigned: dhouse)

We plan to remove the EDID dongles from the Mac minis: the new Mac mini hardware (R8) will not have EDID dongles, and we plan to replace the current workers with R8s.

To prepare for removing the EDID dongles, we need to run PGO builds through 1 or more workers with the EDID removed.

  • The PGO builds likely do not require the display to be at a specific resolution, unlike UI tests.
  • The PGO builds are commit level-3 and cannot be triggered through try or manually.
Assignee: nobody → dhouse
Depends on: 1530732

do-it-in-production test plan:

  1. quarantine 1 of the mac PGO workers, t-mojave-r7-471 (https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-3-t-osx-1014)
  2. disconnect the HDMI cable from the back of the worker (Mac mini R7)
  3. reboot the worker
  4. un-quarantine the worker
  5. watch for failures in production tasks through the worker for 7 days
    5a. If there are failures on the worker, any admin can quarantine it to unblock CI/release.
    5b. If there are no failures, we can then decide whether to remove the EDID dongles from all of the PGO workers at once or to wait through a release cycle first.
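
The decision in step 5 can be sketched as a small check over the task results seen on the test worker. This is only an illustration: the `TaskResult` record, the `next_action` helper, and the state and action names are hypothetical and not part of any Taskcluster API. The 7-day window comes from the plan above.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical record of one task run on the test worker.
@dataclass
class TaskResult:
    task_id: str
    finished: datetime
    state: str  # assumed values: "completed" or "failed"

WATCH_WINDOW = timedelta(days=7)  # from step 5 of the plan

def next_action(results, started, now):
    """Return the plan's next step for the no-EDID test worker."""
    # Only consider tasks that finished after the worker was un-quarantined.
    observed = [r for r in results if r.finished >= started]
    if any(r.state == "failed" for r in observed):
        return "quarantine"        # step 5a
    if now - started < WATCH_WINDOW or not observed:
        return "keep-watching"     # window not over, or nothing ran yet
    return "decide-rollout"        # step 5b
```

A single failure short-circuits to quarantine, matching 5a. The sketch also treats a window with no tasks as inconclusive, which is an added assumption rather than something the plan states.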

:jmaher what do you think? Also, do you know who else needs to be aware of this test? (I looked at the CC's on bugs like bug 1530732 and I'm not sure who is still around or may be concerned)

Flags: needinfo?(jmaher)

This process makes sense. I was thinking of including someone from the build team, but this has been out of their hands for a while; if there are problems we can dig in and cc people as needed.

Timing-wise, I assume this will be next week; there should be no releases scheduled, and enough people will be around to help out if there are problems.

Flags: needinfo?(jmaher)

:dhouse was the plan to do this during this week?

Flags: needinfo?(dhouse)

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

> :dhouse was the plan to do this during this week?

Sure, I'll ask QTS to remove the EDID from t-mojave-r7-471. (I hadn't planned a date as I was expecting more CC's and input ;)

I quarantined the worker, https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-3-t-osx-1014/workers/mdc1/t-mojave-r7-471
I asked QTS to disconnect the video cable from the back of the machine.
When QTS reports it is disconnected, I'll check the machine and reboot it.
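
The worker URL above encodes the provisioner, worker type, worker group, and worker ID. A minimal sketch of pulling those apart, for example to double-check a worker ID before acting on it, follows. `parse_worker_url` is a hypothetical helper, and the path layout is assumed from the URL in this bug rather than taken from Taskcluster documentation.

```python
from urllib.parse import urlparse

def parse_worker_url(url):
    """Split a worker page URL into its identifying parts."""
    parts = urlparse(url).path.strip("/").split("/")
    # Assumed layout, as in the URL above:
    # provisioners/<provisioner>/worker-types/<type>/workers/<group>/<id>
    if (len(parts) != 7 or parts[0] != "provisioners"
            or parts[2] != "worker-types" or parts[4] != "workers"):
        raise ValueError(f"not a worker URL: {url}")
    return {
        "provisioner": parts[1],
        "worker_type": parts[3],
        "worker_group": parts[5],
        "worker_id": parts[6],
    }
```

For the URL above, this yields worker group `mdc1` and worker ID `t-mojave-r7-471`.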

when it is rebooted will it be quarantined?

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #6)

> when it is rebooted will it be quarantined?

Thanks for checking! Yes, it will stay quarantined when rebooted. The cable has now been removed and I rebooted the machine. I will un-quarantine it when I can watch 1-2 tasks to make sure they do not fail (there are no tasks in the queue right now, so I'll check it in the morning).

Perfect, let's synchronize on Matrix tomorrow and un-quarantine this. Hopefully all is well and we can be one step closer to no EDID.

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #8)

> Perfect, let's synchronize on Matrix tomorrow and un-quarantine this. Hopefully all is well and we can be one step closer to no EDID.

sounds good!

We un-quarantined 571 this morning and manually re-ran a few tasks from the last 24h so that the no-EDID worker, 571, would pick up a task.

The first task completed successfully and I've triggered a few more to run through before production tasks reach it.

571 ran 4 tasks successfully with no EDID dongle. I've left it running (not quarantined) so it will take production tasks through the day.

Flags: needinfo?(dhouse)

(In reply to Dave House [:dhouse] from comment #11)

> 571 ran 4 tasks successfully with no EDID dongle. I've left it running (not quarantined) so it will take production tasks through the day.

Correction: the worker is "471", not "571"; I entered the wrong number.

And it has not had failures yet: https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-3-t-osx-1014/workers/mdc1/t-mojave-r7-471

I think next week we should remove all the EDID cables from the remaining gecko-3-t-osx-1014 workers.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED