Closed Bug 1152624 Opened 9 years ago Closed 7 years ago

ami generation in use1 sometimes doesn't copy over to usw2

Categories

(Release Engineering :: General, defect)

Platform: x86
OS: macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jlund, Unassigned)

References

Details

in "Bug 1149580 - disable AMI generation" I had to manually create and publish new AMIs.

There were some problems doing so: the newly created emulator64 and linux64 usw2 AMIs had an old AMI name (they were copied from an old use1 AMI):

2015-04-08 14:11:11,325 - INFO - AMI spot-tst-linux64-2015-04-08-21-00 (ami-be3e02d6) is ready
2015-04-08 14:11:11,326 - WARNING - Terminating Instance:i-03f8b8d4
2015-04-08 14:11:12,112 - INFO - Copying ami-6ebd9506 (spot-tst-linux64-2015-03-25-08-59) to us-west-2
2015-04-08 14:11:12,113 - INFO - Copying Image:ami-6ebd9506 to us-west-2
2015-04-08 14:11:13,979 - INFO - AMI created
2015-04-08 14:11:13,979 - INFO - ID: ami-f56c47c5, name: None
2015-04-08 14:11:13,979 - INFO - New AMI created. AMI ID: ami-f56c47c5


apparently this is also happening outside of manual runs (via cron):
17:05:44 <•nthomas>  2015-03-20 02:01:47,732 - INFO - AMI spot-tst-linux64-2015-03-20-08-52 (ami-887c53e0) is ready
17:05:46 <•nthomas>  2015-03-20 02:01:48,517 - INFO - Copying ami-021f336a (spot-tst-linux64-2015-03-19-16-11) to us-west-2

To fix this, I manually copied the AMIs and then re-published just usw2:
>>> linux64_ami = get_ami('us-east-1', 'tst-linux64')
>>> linux64_ami.id
u'ami-be3e02d6'
>>> new_ami_linux64 = copy_ami(linux64_ami, 'us-west-2')
>>> # repeated the same for emulator64
$ aws_publish_amis -r us-west-2

I'm not sure exactly what is happening here. I did notice that AMIs took over an hour to go from pending to finished.

I think get_ami() sometimes doesn't see the newly created use1 AMI (and instead picks up the previous one to copy from), then passes that stale AMI to copy_ami(). At the very least, we should verify that the AMI returned by get_ami() is the same AMI that was just generated.
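
Something along these lines would catch that. This is only a rough sketch with plain boto, not the actual cloud-tools get_ami()/copy_ami() code; wait_for_ami and the hard-coded AMI ID are illustrative:

import time
import boto.ec2

def wait_for_ami(conn, ami_id, timeout=7200, poll=60):
    # AMIs can sit in 'pending' for over an hour, so poll until 'available'
    deadline = time.time() + timeout
    while time.time() < deadline:
        image = conn.get_image(ami_id)
        if image and image.state == "available":
            return image
        time.sleep(poll)
    raise RuntimeError("AMI %s never became available" % ami_id)

conn = boto.ec2.connect_to_region("us-east-1")
# ami-be3e02d6 stands in for the AMI we just generated
new_ami = wait_for_ami(conn, "ami-be3e02d6")

# if we still look the copy source up by name (like get_ami() does),
# make sure it is the image we just created and not a stale one
found = conn.get_all_images(filters={"name": new_ami.name})[0]
assert found.id == new_ami.id, "lookup returned a stale AMI"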
See Also: → 1149580
Also, it turns out you can't hack around this and run `aws_publish_amis -r us-west-2` alone, as that upsets use1. I got thousands of errors after doing that:
Apr 08 18:48:34 tst-emulator64-spot-195.test.releng.use1.mozilla.com running: post-task hook: /opt/runner/task_hook.py {"try_num": 5, "max_retries": 5, "task": "0-check_ami", "result": "RETRY"}

So I re-ran for both regions (omitting -r publishes to both):
aws_publish_amis
Summary: ami generation in use1 sometimes doesn't copy over to usw-2 → ami generation in use1 sometimes doesn't copy over to usw2
So I think the problem is that the AMI is created here:
https://github.com/mozilla/build-cloud-tools/blob/master/cloudtools/scripts/aws_create_instance.py#L187

but isn't returned from that function, so the code further down here:
https://github.com/mozilla/build-cloud-tools/blob/master/cloudtools/scripts/aws_create_instance.py#L294

has to query the list of AMIs to find the one that was just created. I suspect most of the time this works fine, but sometimes the new AMI isn't ready yet and you end up finding an older one instead.
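
If that's the diagnosis, one possible shape for the fix is to hand the new image back to the caller and copy by its ID instead of re-querying by name. This is just a sketch against boto, not a patch to aws_create_instance.py, and the helper names are made up:

import boto.ec2

def create_ami(conn, instance_id, name):
    # hypothetical helper: create the AMI and return the Image object
    # so the caller doesn't have to look it up again by name later
    ami_id = conn.create_image(instance_id, name)
    return conn.get_image(ami_id)

def copy_to_region(image, dest_region="us-west-2"):
    # copy the exact AMI we just created (and waited on until it was
    # 'available'), keyed by ID rather than by a name query that can
    # race with AMI availability
    dest = boto.ec2.connect_to_region(dest_region)
    copy = dest.copy_image(source_region="us-east-1",
                           source_image_id=image.id,
                           name=image.name)
    return copy.image_id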
Hit this again today.
At the post-mortem for the hg.m.o ssl cert change today, I agreed to find an owner for this.
Assignee: nobody → coop
Assignee: coop → nobody
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Component: Tools → General