Backlog of linux compile jobs (Amazon AWS instances not being launched)

RESOLVED FIXED

Status

Release Engineering
Buildduty
3 years ago
3 years ago

People

(Reporter: KWierso, Unassigned)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Reporter)

Description

3 years ago
AWS seems to be pretty backlogged. https://tbpl.mozilla.org/?tree=Mozilla-B2g30-v1.4&rev=728fb350f32e was pushed more than an hour ago and still has lots of pending builds.

Trunk trees are closed until the situation improves.

Created attachment 8442400
increase_west_build_limit.patch
Attachment #8442400 - Flags: review?(catlee)

Spot instance prices are too high for our bids on many instance types in us-west and us-east. There are other types we can still use, so this increases our overall instance limit to allow more breathing room for those other types.
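
For readers following along, here is a minimal Python sketch of why a higher overall limit helps when some types are priced out. It is a hypothetical model, not the cloud-tools code: the `InstanceType` fields, the type names, the prices and the round-robin spreading are all assumptions for illustration. Types whose spot price exceeds our bid are skipped, and the number of new requests is capped by the overall limit minus what is already running.

```python
from dataclasses import dataclass


@dataclass
class InstanceType:
    name: str          # hypothetical instance type label
    spot_price: float  # current spot market price, USD/hour (hypothetical)
    max_bid: float     # highest price we are willing to bid (hypothetical)


def affordable_types(types):
    """Return the instance types whose current spot price is within our bid."""
    return [t for t in types if t.spot_price <= t.max_bid]


def plan_launches(types, pending_jobs, running, instance_limit):
    """Decide how many new spot instances to request per affordable type.

    Requests are capped by the overall instance limit minus what is already
    running; if no type is affordable, nothing is launched at all.
    """
    eligible = affordable_types(types)
    if not eligible:
        return {}
    headroom = max(0, instance_limit - running)
    to_launch = min(pending_jobs, headroom)
    # Spread the launches round-robin over the types we can still afford.
    plan = {t.name: 0 for t in eligible}
    for i in range(to_launch):
        plan[eligible[i % len(eligible)].name] += 1
    return plan
```

With `type-a` priced out, `plan_launches(types, 150, 400, 450)` can only request 50 `type-b` instances; raising the limit to 600 lets all 150 pending jobs get one.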
Comment on attachment 8442400
increase_west_build_limit.patch

Let's increase us-east-1 too while we are at it.
Attachment #8442400 - Flags: review?(catlee) → review-

https://hg.mozilla.org/build/cloud-tools/rev/eae3f6598284 covers both, r+ from catlee on IRC.

Sorry, for those of us playing along at home: please don't use acronyms in bug summaries.

Created attachment 8442470
Add subnet-7091d358 from us-west-1c

Current theory: watch_pending really, really wants to use us-east-1c for pricing reasons, but subnet-7091d358 has no free IPs, so it gives up. This adds the other subnet for another 123 slots.
Attachment #8442470 - Flags: review?(catlee)

Comment on attachment 8442470
Add subnet-7091d358 from us-west-1c

https://hg.mozilla.org/build/cloud-tools/rev/ec66195a91fd
Attachment #8442470 - Flags: review?(catlee) → review+

> Current theory - watch_pending really really wants to use us-east-1c for
> pricing reasons, but subnet-7091d358 has no free IPs so it gives up. Adds in
> the other subnet for another 123 slots.

Make that '... but subnet-2da98346 has no free IPs ...'
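
To make the theory concrete, here is a minimal Python sketch of the failure mode. It is a hypothetical model, not watch_pending's actual code: the `pick_subnet` helper and the zone label `zone-c` are assumptions, and only the two subnet IDs and the 123 free addresses come from this bug. Zones are tried in price-preference order; if every configured subnet in them is out of IPs, the launch gives up, and configuring another subnet restores capacity.

```python
def pick_subnet(subnets_by_zone, preferred_zones):
    """Return (zone, subnet_id) for the first configured subnet with a free IP.

    Zones are tried in order of preference (e.g. cheapest spot price first).
    Returns None when every configured subnet in those zones is out of
    addresses, which corresponds to the launch giving up.
    """
    for zone in preferred_zones:
        for subnet_id, free_ips in subnets_by_zone.get(zone, {}).items():
            if free_ips > 0:
                return zone, subnet_id
    return None


# Hypothetical config: only the full subnet is configured for the preferred zone.
subnets = {"zone-c": {"subnet-2da98346": 0}}
print(pick_subnet(subnets, ["zone-c"]))  # None: no free IPs, nothing launches

# Adding the other subnet (123 free addresses) gives the launch somewhere to go.
subnets["zone-c"]["subnet-7091d358"] = 123
print(pick_subnet(subnets, ["zone-c"]))
```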

I tried once; I will try again. You have many contributors here who only see that the tree is closed, and the only explanation is "AWS backlog", with no explanation of what AWS is. Clicking on the link to the bug gives no more information. Just saying: for an open-source, open project we should explain these things instead of using acronyms that only a select few understand. Just my opinion; I could be wrong.

Oh, and to be fair, it is not just this bug. I have the same issue with many others, and it is usually ignored.

Using code words in bugs is anti-open.

To make this abundantly clear: if I am trying to land a patch on inbound, the fact that it is closed because of an "AWS backlog", with no explanation of what AWS is, is just like saying we closed it because we felt like it.

Bill, we are abundantly busy trying to fix the issue. Please ask the sheriff on IRC for any clarification you need.

We are setting the on-demand limit back to 100
  http://hg.mozilla.org/build/cloud-tools/rev/64155873112f
which reverses part of
  http://hg.mozilla.org/build/cloud-tools/rev/9f5ee86055e1
from a week ago.

And sorry to pick on this bug for a generic issue. Bug summaries should be understandable without jargon or acronyms that are not generally understood across the entire Mozilla project.

Updated

3 years ago
Summary: AWS backlog → Backlog of linux compile jobs (Amazon AWS instances not being launched)
(Reporter)

Comment 15

3 years ago
Reopened trunk trees at 2014-06-18T17:03:14 since it seems like things are better.

The crisis has subsided, so we are free to discuss some of the theories about what may have caused the infrastructure issue.

We have increased our AWS (Amazon Web Services) limits for both spot and on-demand instances.

Most likely this is just a band-aid over an underlying problem. In addition to the theory described and patched here [1], here is another:

Within the last week we have added more jacuzzis [2] and may have locked too many slave names to them, thus starving our non-jacuzzi builders. The solution here is to add more slave names.

From discussion in #releng:
[16:46:02] <catlee-away> | so we have 497 bld-linux64 spot slaves in slavealloc, and 416 of those are allocated to jacuzzis
[16:46:09] <catlee-away> | that doesn't leave many to handle non-jacuzzi's builders
[17:03:54] <jlund|build> | catlee-away: we were having large lists of these initially: https://pastebin.mozilla.org/5432246 and this is non-jacuzzi. There didn't seem to be many many 'no slave names' for the jacuzzis in the log
[17:04:46] <catlee-away> | yeah, ok
[17:04:51] <catlee-away> | so we need more names
[17:05:00] <catlee-away> | we added a bunch more jacuzzis late last week
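
The arithmetic behind "that doesn't leave many" can be sketched in a few lines of Python. This is a simplified, hypothetical model, not slavealloc's logic: the `split_pending` helper and the pending-job count of 200 are assumptions, while 497 and 416 come from the log above. Names locked to a jacuzzi only serve that jacuzzi's builders, so every other builder shares the unlocked remainder, 497 - 416 = 81 names.

```python
def split_pending(pending_jobs, total_names, jacuzzi_allocations):
    """Split pending non-jacuzzi jobs into (runnable, starved).

    Slave names locked to a jacuzzi can only serve that jacuzzi's builders,
    so only the unlocked remainder is available to all other builders.
    """
    locked = sum(jacuzzi_allocations.values())
    free_names = max(0, total_names - locked)
    runnable = min(pending_jobs, free_names)
    return runnable, pending_jobs - runnable


# 497 bld-linux64 spot names, 416 of them locked to jacuzzis (aggregated here),
# and a hypothetical 200 pending non-jacuzzi jobs.
print(split_pending(200, 497, {"all jacuzzis": 416}))  # (81, 119)
```

Adding more names raises `total_names` and therefore the unlocked remainder, which is the follow-up tracked in bug 1027437.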


[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1027308#c6 
[2] http://atlee.ca/blog/posts/initial-jacuzzi-results.html

Thank you for explaining to those who did not know what AWS is. I realize most of us knew that, but I hate it when a bug, especially a tree-closure bug, uses acronyms that are not universally known.

Depends on: 1027437

With trees open and stable again, closing this for now.

For follow-up on the fix, please see: Bug 1027437 - add more slave names to non jacuzzi builders
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED