Closed Bug 1051166 Opened 10 years ago Closed 10 years ago

Linux64 non-unified builds aren't starting on mozilla-central

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86_64
Linux
task
Not set
major

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: RyanVM, Unassigned)

The linux64 non-unified builds are pending going back to rev 1d6500527f66, which was triggered at 13:30 PDT. Doesn't seem to be affecting any other build types or trees AFAICT.

https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central/rev/1d6500527f66
https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central/rev/a6424bfa8f39
https://secure.pub.build.mozilla.org/buildapi/self-serve/mozilla-central/rev/83f519eb1a3a
This builder is in a jacuzzi:
{
  "machines": [
    "bld-linux64-spot-1060", 
    "bld-linux64-spot-360", 
    "bld-linux64-spot-488"
  ]
}

Excerpt from aws_watch_pending.log:
2014-08-10 15:55:02,765 - processing 3 pending jobs
2014-08-10 15:55:02,773 - getting slaves allocated to Linux x86-64 mozilla-central non-unified
2014-08-10 15:55:04,006 - Linux x86-64 mozilla-central non-unified instance type bld-linux64 slaveset frozenset([u'bld-linux64-spot-360', u'bld-linux64-spot-1060', u'bld-linux64-spot-488'])
2014-08-10 15:55:13,984 - 0 running for spot bld-linux64 frozenset([u'bld-linux64-spot-360', u'bld-linux64-spot-1060', u'bld-linux64-spot-488']) (0 fresh)
2014-08-10 15:55:13,984 - reducing required count for spot bld-linux64 frozenset([u'bld-linux64-spot-360', u'bld-linux64-spot-1060', u'bld-linux64-spot-488']) by 0 (0 running; need 1)
2014-08-10 15:55:13,984 - need 1 spot bld-linux64 for slaveset frozenset([u'bld-linux64-spot-360', u'bld-linux64-spot-1060', u'bld-linux64-spot-488'])
2014-08-10 15:55:13,985 - 62 bld-linux64 spot instances running globally

2014-08-10 15:55:16,707 - getting all spot requests for us-west-2
2014-08-10 15:55:20,203 - 84 active spot requests for us-west-2 bld-linux64
2014-08-10 15:55:20,204 - 25 real active spot requests for us-west-2 bld-linux64
2014-08-10 15:55:20,431 - getting all spot requests for us-east-1
2014-08-10 15:55:20,940 - 40 active spot requests for us-east-1 bld-linux64
2014-08-10 15:55:20,940 - 40 real active spot requests for us-east-1 bld-linux64

2014-08-10 15:55:21,226 - Sanity checking r3.xlarge in us-west-2b
2014-08-10 15:55:21,227 - getting filtered spot requests for us-west-2b (r3.xlarge)
2014-08-10 15:55:21,228 - No recent spot requests in last 15m
2014-08-10 15:55:21,229 - Need 1 of bld-linux64 in us-west-2b
2014-08-10 15:55:21,229 - Using r3.xlarge (us-west-2, us-west-2b) 0.0321 (value: 0.02675) < 0.18
2014-08-10 15:55:21,229 - Using cached slaves.json
2014-08-10 15:55:21,420 - No slave name available for us-west-2, bld-linux64, frozenset([u'bld-linux64-spot-360', u'bld-linux64-spot-1060', u'bld-linux64-spot-488'])

The last block then repeats for the other availability zones.

The 84 vs 25 difference between active and real active spot requests in us-west-2 may be relevant. That and the 'No slave name' message may be related to bug 1050281.
>>> from cloudtools.aws.spot import get_active_spot_requests
>>> slaveset = set([u'bld-linux64-spot-360', u'bld-linux64-spot-1060', u'bld-linux64-spot-488'])

>>> active_req = get_active_spot_requests('us-west-2')
>>> set(r.tags.get("Name") for r in active_req).intersection(slaveset)
{u'bld-linux64-spot-1060', u'bld-linux64-spot-360', u'bld-linux64-spot-488'}

All three names in the slaveset already have an active spot request in us-west-2, hence get_available_spot_slave_name() returns None and we get the 'No slave name available for ...' message.
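The failing lookup can be sketched roughly as follows. This is not the cloud-tools implementation: `pick_available_name` is a hypothetical helper, and it assumes a slave name is only usable when no active spot request is tagged with it.

```python
def pick_available_name(slaveset, active_request_names):
    """Return a name from slaveset that no active spot request holds, or None.

    Hypothetical stand-in for the real lookup; the selection order
    (alphabetical) is an assumption made only to keep the result stable.
    """
    free = sorted(set(slaveset) - set(active_request_names))
    return free[0] if free else None


slaveset = {
    "bld-linux64-spot-360",
    "bld-linux64-spot-1060",
    "bld-linux64-spot-488",
}

# Every jacuzzi name is held by an active spot request: nothing to hand out.
print(pick_available_name(slaveset, slaveset))

# Once the request holding spot-488 is gone, that name is free again.
held = slaveset - {"bld-linux64-spot-488"}
print(pick_available_name(slaveset, held))
```

With only three names in the jacuzzi, a few stuck spot requests are enough to block every new request for this builder.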
Bug 1050281 is definitely the problem here. I cancelled sir-03dvxf26, and aws_watch_pending.py then logged:
 2014-08-10 20:00:25,734 - Spot request for bld-linux64-spot-488.build.releng.usw2.mozilla.com (0.18)

Also needed a temporary hack to http://hg.mozilla.org/build/cloud-tools/file/default/scripts/aws_watch_pending.py#l70 to look back 2 days, so that it would see the pending job.
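Why a wider window matters can be shown with a small sketch. The function `pending_within`, the 4-hour window, and the job timestamps are all made up for illustration and are not taken from aws_watch_pending.py; the only assumption is that pending jobs submitted before a cutoff are ignored.

```python
from datetime import datetime, timedelta


def pending_within(jobs, now, lookback):
    """Keep the names of jobs submitted within `lookback` of `now`.

    `jobs` is a list of (name, submitted_at) tuples; this shape is
    hypothetical and chosen only for the example.
    """
    cutoff = now - lookback
    return [name for name, submitted in jobs if submitted >= cutoff]


now = datetime(2014, 8, 10, 20, 0)
# A job that has been pending for more than a day (illustrative timestamp).
jobs = [("Linux x86-64 mozilla-central non-unified", datetime(2014, 8, 9, 13, 30))]

# With a short window the old job is invisible, so no instance is requested.
print(pending_within(jobs, now, timedelta(hours=4)))

# Looking back 2 days picks it up again.
print(pending_within(jobs, now, timedelta(days=2)))
```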
Depends on: 1050281
Cancelled sir-03dv02qp and sir-03dwx3qc too, and three builds are now running. Three more builds are in the queue for when they're done.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Coop: FYI, see comment 3 and the related bug for possible buildduty things this week.
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard