Closed Bug 1671033 Opened 5 years ago Closed 5 years ago

OCC terminates instances for missing task drives before ec2config has mapped task drives

Categories

(Infrastructure & Operations :: RelOps: OpenCloudConfig, task)

Production
x86_64
Windows

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: grenade, Assigned: grenade)

Attachments

(2 files, 1 obsolete file)

for some reason, on newly deployed win2012 amis the occ task drive check runs before ec2config has completed drive mapping. this is a race condition that commonly resolves badly: occ terminates the instance right before ec2config would have completed the mappings.

patch: https://github.com/mozilla-releng/OpenCloudConfig/commit/745e0f349126ffe37a838c9e6b1f5bcfae266044

on win2012 worker types:

  • wait 5 minutes for ec2config mappings
  • if mappings do not occur but mapping config exists, format D drive to change its label from '' (blank) to 'task' and reboot to force ec2config to try again
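
The two steps above can be sketched roughly as follows. This is an illustrative Python sketch only, not the code from the patch (OCC itself is not written in Python). The names `wait_for_mapping` and `next_action` and the 5 second poll interval are hypothetical. The "terminate" fallback for the case where no mapping config exists is an assumption based on the bug title, since the comment above does not describe that case.

```python
import time
from typing import Callable


def wait_for_mapping(
    is_mapped: Callable[[], bool],
    timeout_s: float = 300.0,  # 5 minutes, as in the comment above
    poll_s: float = 5.0,       # hypothetical poll interval
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> bool:
    """Poll until the drive mappings appear or the timeout elapses."""
    deadline = clock() + timeout_s
    while True:
        if is_mapped():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def next_action(mapped: bool, mapping_config_exists: bool) -> str:
    """Pick what to do once the wait is over."""
    if mapped:
        return "continue"
    if mapping_config_exists:
        # relabel D from '' to 'task', then reboot so ec2config retries
        return "relabel-and-reboot"
    # assumption: with no mapping config there is nothing to retry
    return "terminate"
```

Passing `sleep` and `clock` as parameters keeps the wait loop testable without actually waiting five minutes.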

new patch: https://github.com/mozilla-releng/OpenCloudConfig/commit/83ededbae1e83f1b03c47537c084750c6828c05a
image build: https://firefox-ci-tc.services.mozilla.com/tasks/ZIJ0hSRLT8ah6_zQc2pCbg (unfortunately no livelogs, since the image build worker was made l3)

my earlier patch neglected to check for a cache drive (y:) and only fixed the missing task drive (z:). i still don't understand why ec2config drive mappings are occurring later than we are used to, or why the 2012 configs only contain two drives instead of three, but i'm probably forgetting something that changed in the 6 months or so since we last updated these amis.
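
As a rough illustration of the difference between the two patches, a check that covers both drives might look like the sketch below. It is a hypothetical Python helper, not the code from either commit. Only the drive letters and their roles (y: cache, z: task) come from the comment above; `REQUIRED_DRIVES` and `missing_drives` are made-up names.

```python
# drive letter -> role, per the comment above
REQUIRED_DRIVES = {"y": "cache", "z": "task"}


def missing_drives(present):
    """Return the required drives absent from `present`.

    `present` is an iterable of drive names such as "C:", "z:" or "Y:\\".
    """
    letters = {d.rstrip(":\\").lower() for d in present}
    return {
        letter: role
        for letter, role in REQUIRED_DRIVES.items()
        if letter not in letters
    }
```

A check that only looks for z: would pass on an instance with C: and Z: mapped, while this one would still report the missing cache drive.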

we'll also need to figure out how to update ci-config to include the 3rd drive (c:, y:, z:). win 7 and/or 10 should have a config we can copy.

Aiui, status: gecko-3/b-win2012 should be all good. t-win10-64-source may still need some attention, but aiui those are all tier 2 tasks; relops will look at that tomorrow.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Pushed by asasaki@mozilla.com: https://hg.mozilla.org/ci/ci-configuration/rev/75b8482842f1 Follow-up: add drive to comm3/b-win2012 workers. r=aki

Mark, can you assist with the gecko-1/b-win2012 workers? It blocks testing current mozilla-central code as mozilla-beta (last successful build is from Tuesday) and that merge is on Monday.

Flags: needinfo?(mcornmesser)

gecko-1/b-win2012 image rebuilds in progress:

i'll follow up with a ci-config patch when we've seen working examples using the updated images.

Flags: needinfo?(rthijssen)
Flags: needinfo?(mcornmesser)
Attachment #9182057 - Attachment description: rebuilt 2012 amis with: - updated hg cert pin - patched task/cache drive mapping → rebuilt 2012 amis with: - updated hg cert pin - patched task/cache drive mapping
Status: REOPENED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
See Also: → 1695117
Attachment #9182057 - Attachment is obsolete: true