Closed Bug 1046322 Opened 10 years ago Closed 10 years ago

All trees closed due to AWS builder lag

Categories

Product: Infrastructure & Operations Graveyard
Component: CIDuty
Type: task
Priority: Not set
Severity: blocker

Tracking

(Not tracked)
Status: RESOLVED FIXED

People

Reporter: RyanVM
Assignee: Unassigned

Details

Callek says we're having trouble starting new instances and builds are starting to fall way behind. All trees closed.

So our aws_watch_pending cron has been alerting for ~2 hours now.

It started between 1:47 pm and 1:49 pm ET

with the following traceback:

Traceback (most recent call last):
  File "aws_watch_pending.py", line 546, in <module>
    instance_type_changes=config.get("instance_type_changes", {})
  File "aws_watch_pending.py", line 400, in aws_watch_pending
    all_instances = aws_get_all_instances(regions)
  File "/builds/aws_manager/cloud-tools/cloudtools/aws/__init__.py", line 127, in aws_get_all_instances
    region_instances = conn.get_only_instances()
  File "/builds/aws_manager/lib/python2.7/site-packages/boto/ec2/connection.py", line 608, in get_only_instances
    max_results=max_results)
  File "/builds/aws_manager/lib/python2.7/site-packages/boto/ec2/connection.py", line 656, in get_all_reservations
    [('item', Reservation)], verb='POST')
  File "/builds/aws_manager/lib/python2.7/site-packages/boto/connection.py", line 1143, in get_list
    response = self.make_request(action, params, path, verb)
  File "/builds/aws_manager/lib/python2.7/site-packages/boto/connection.py", line 1089, in make_request
    return self._mexe(http_request)
  File "/builds/aws_manager/lib/python2.7/site-packages/boto/connection.py", line 1002, in _mexe
    raise BotoServerError(response.status, response.reason, body)
boto.exception.BotoServerError: BotoServerError: 503 Service Unavailable
<?xml version="1.0" encoding="UTF-8"?>
<Response><Errors><Error><Code>Unavailable</Code><Message>The service is unavailable. Please try again shortly.</Message></Error></Errors><RequestID>19e5b267-cac2-4a52-a18c-ec8626785fbc</RequestID></Response>
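The 503 body itself says to "try again shortly", yet the traceback shows one region's failure escaping `aws_get_all_instances()` and ending the whole cron run. A minimal sketch of a more tolerant collection loop follows. It assumes a hypothetical `fetch_instances(region)` callable (in the real script, something wrapping boto's `get_only_instances()`) and a hypothetical `ServiceUnavailable` exception standing in for boto's `BotoServerError` with status 503, so the example runs without boto installed. Each region is retried with exponential backoff, and a region that stays down is skipped instead of aborting the run.

```python
import logging
import time
from typing import Callable, Dict, Iterable, List


class ServiceUnavailable(Exception):
    """Hypothetical stand-in for boto's BotoServerError with status 503."""

    status = 503


def get_all_instances_tolerant(
    regions: Iterable[str],
    fetch_instances: Callable[[str], List[str]],
    max_attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, List[str]]:
    """Collect instances per region, retrying 503s and skipping regions that stay down."""
    results: Dict[str, List[str]] = {}
    for region in regions:
        for attempt in range(max_attempts):
            try:
                results[region] = fetch_instances(region)
                break
            except ServiceUnavailable:
                if attempt == max_attempts - 1:
                    # Give up on this region only; the remaining regions are still polled.
                    logging.warning(
                        "skipping %s: still unavailable after %d attempts",
                        region, max_attempts,
                    )
                else:
                    # Exponential backoff: 1 s, then 2 s with the default settings.
                    sleep(base_delay * 2 ** attempt)
    return results


# Usage: simulate us-west-2 returning 503 on every call (no real sleeping).
def fake_fetch(region: str) -> List[str]:
    if region == "us-west-2":
        raise ServiceUnavailable()
    return ["i-0001"]


print(get_all_instances_tolerant(["us-east-1", "us-west-2"], fake_fetch, sleep=lambda s: None))
# {'us-east-1': ['i-0001']}
```

Skipping a region leaves its instances out of that run's inventory, so a real scheduler would probably also need to avoid launching new instances in that region until it recovers. The retry count and delays here are illustrative, not tuned values.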

Just pushed a Puppet change to disable us-west-2 in the crons...

Off to force a Puppet run.

https://hg.mozilla.org/build/puppet/rev/8a97a1d8c6d2
https://hg.mozilla.org/build/puppet/rev/e904c52de01e
Info: Applying configuration version 'e904c52de01e'
...
Notice: Finished catalog run in 85.60 seconds

So the next run of watch_pending et al. should be fine and use us-east only.
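The workaround was a configuration change rather than a code change: take us-west-2 out of the cron jobs so the scripts only poll us-east. One way to make that kind of toggle a one-line edit is to filter the region list through a disabled set. This is a sketch only; the `regions` and `disabled_regions` config keys are hypothetical and are not what the Puppet change above actually used.

```python
from typing import Any, Dict, List


def active_regions(config: Dict[str, Any]) -> List[str]:
    """Return configured regions minus any temporarily disabled ones, preserving order."""
    # Hypothetical keys: "regions" lists every region the cron may poll,
    # "disabled_regions" lists regions to skip during an outage.
    disabled = set(config.get("disabled_regions", []))
    return [r for r in config.get("regions", []) if r not in disabled]


config = {"regions": ["us-east-1", "us-west-2"], "disabled_regions": ["us-west-2"]}
print(active_regions(config))  # ['us-east-1']
```

Re-enabling the region after the outage is then just emptying `disabled_regions` again.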
Summary: All trees due to AWS builder lag → All trees closed due to AWS builder lag
Builds are picking up again and looking green. Everything reopened at 14:20 PT.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard