OCC terminates instances for missing task drives before ec2config has mapped task drives
Categories: Infrastructure & Operations :: RelOps: OpenCloudConfig, task
Tracking: Not tracked
People: Reporter: grenade, Assigned: grenade
Attachments: 2 files, 1 obsolete file
for some reason on newly deployed win2012 amis, the occ task drive check is happening before ec2config has completed drive mapping, causing a race condition that commonly resolves badly when occ terminates the instance right before ec2config would have completed the mappings.
Comment 1 (Assignee) • 5 years ago
patch: https://github.com/mozilla-releng/OpenCloudConfig/commit/745e0f349126ffe37a838c9e6b1f5bcfae266044
on win2012 worker types:
- wait 5 minutes for ec2config mappings
- if mappings do not occur but mapping config exists, format D drive to change its label from '' (blank) to 'task' and reboot to force ec2config to try again
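The two steps above can be sketched as a bounded wait followed by a decision. This is only an illustration in Python, not the code in the linked commit: `wait_for_mapping`, `next_action`, the 5 second polling interval and the returned action names are all hypothetical, and the thread does not say what happens when no mapping config exists.

```python
import time
from typing import Callable


def wait_for_mapping(
    is_mapped: Callable[[], bool],
    timeout_s: float = 300.0,  # the 5 minute wait described above
    poll_s: float = 5.0,       # assumed polling interval, not from the patch
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Poll is_mapped() until it returns True or timeout_s has elapsed."""
    deadline = clock() + timeout_s
    while True:
        if is_mapped():
            return True
        if clock() >= deadline:
            return False
        sleep(poll_s)


def next_action(mapped: bool, mapping_config_exists: bool) -> str:
    """Pick a follow-up step; the action names are hypothetical labels."""
    if mapped:
        return "continue"
    if mapping_config_exists:
        # Relabel D from '' to 'task', then reboot so ec2config retries.
        return "relabel-and-reboot"
    # Not covered in the thread; left open here on purpose.
    return "unspecified"
```

Passing `clock` and `sleep` as parameters keeps the wait testable without actually sleeping for five minutes.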
Comment 2 (Assignee) • 5 years ago
new patch: https://github.com/mozilla-releng/OpenCloudConfig/commit/83ededbae1e83f1b03c47537c084750c6828c05a
image build: https://firefox-ci-tc.services.mozilla.com/tasks/ZIJ0hSRLT8ah6_zQc2pCbg (unfortunately no livelogs, since the image build worker was made l3)
my earlier patch neglected to check for a cache drive (y:) and only fixed the missing task drive (z:). i still don't understand why ec2config drive mappings are occurring later than we are used to, or why the 2012 configs only contain two drives instead of three, but i'm probably forgetting something that changed in the 6 months or so since we last updated these amis.
we'll also need to figure out how to update ci-config to include the third drive (c:, y:, z:). win 7 and/or 10 should have a config we can copy.
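To illustrate the gap between the two patches, a check that covers every expected drive rather than only z: might look like the following. It is a hedged sketch: `REQUIRED_DRIVES` and `missing_drives` are hypothetical names, and only the drive letters and their roles come from the comment above.

```python
from typing import Iterable, List

# From the comment above: y: is the cache drive, z: is the task drive.
REQUIRED_DRIVES = ("y:", "z:")


def missing_drives(
    present: Iterable[str],
    required: Iterable[str] = REQUIRED_DRIVES,
) -> List[str]:
    """Return the required drive letters absent from `present`, in order."""
    seen = {drive.lower() for drive in present}
    return [drive for drive in required if drive.lower() not in seen]
```

With only c: and z: present, `missing_drives(["C:", "Z:"])` returns `["y:"]`, which is the case a check for z: alone would miss.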
Comment 3 (Assignee) • 5 years ago
Comment 5 • 5 years ago
Aiui, status: gecko-3/b-win2012 should be all good. t-win10-64-source may still need some attention, but aiui those are all tier 2 tasks; relops will look at that tomorrow.
Comment 6 • 5 years ago
Comment 8 • 5 years ago
This failure still hits gecko-1/b-win2012, e.g. https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=B4pTNkiQR3m3xbasQ-ej4w.0&revision=a61c4514299fced454b3bb7e7b86de2fd988c194
Comment 10 • 5 years ago
Mark, can you assist with the gecko-1/b-win2012 workers? It blocks testing current mozilla-central code as mozilla-beta (last successful build is from Tuesday) and that merge is on Monday.
Comment 11 (Assignee) • 5 years ago
gecko-1/b-win2012 image rebuilds in progress:
- https://github.com/mozilla-releng/OpenCloudConfig/commit/af1b426
- https://firefox-ci-tc.services.mozilla.com/tasks/aqcii1bfRG-CAwzZSkYTRg
i'll follow up with a ci-config patch when we've seen working examples using the updated images.
Comment 12 (Assignee) • 5 years ago