AWS seems to be pretty backlogged. https://tbpl.mozilla.org/?tree=Mozilla-B2g30-v1.4&rev=728fb350f32e was pushed more than an hour ago and still has lots of pending builds. Trunk trees are closed until the situation improves.
Created attachment 8442400 [details] [diff] [review] increase_west_build_limit.patch
Spot instance prices are currently above our bids for many instance types in us-west and us-east. There are other types we can still use, so this increases our overall instance limit to give those other types more breathing room.
Comment on attachment 8442400 [details] [diff] [review] increase_west_build_limit.patch let's increase us-east-1 too while we are at it.
https://hg.mozilla.org/build/cloud-tools/rev/eae3f6598284 is both, r+ from catlee on IRC.
Sorry, for those of us playing along at home: please don't use acronyms in bug summaries.
Created attachment 8442470 [details] [diff] [review] Add subnet-7091d358 from us-west-1c Current theory - watch_pending really really wants to use us-east-1c for pricing reasons, but subnet-7091d358 has no free IPs so it gives up. Adds in the other subnet for another 123 slots.
Comment on attachment 8442470 [details] [diff] [review] Add subnet-7091d358 from us-west-1c https://hg.mozilla.org/build/cloud-tools/rev/ec66195a91fd
> Current theory - watch_pending really really wants to use us-east-1c for pricing reasons, but subnet-7091d358 has no free IPs so it gives up. Adds in the other subnet for another 123 slots.
Make that '... but subnet-2da98346 has no free IPs ...'
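The theory above (the preferred subnet is out of free IPs, so allocation gives up instead of falling back) can be sketched roughly as follows. This is only an illustration, not the actual watch_pending logic: `usable_ips`, `pick_subnet`, and the `10.0.0.0/25` CIDR are hypothetical. The one external fact relied on is that AWS reserves 5 addresses in every VPC subnet, so a /25 would yield exactly the 123 slots mentioned.

```python
import ipaddress

# AWS reserves 5 addresses in every VPC subnet.
AWS_RESERVED_PER_SUBNET = 5


def usable_ips(cidr):
    """Number of addresses instances can actually use in an AWS subnet."""
    return ipaddress.ip_network(cidr).num_addresses - AWS_RESERVED_PER_SUBNET


def pick_subnet(candidates):
    """Return the first preferred subnet that still has a free IP, else None.

    `candidates` is a list of (subnet_id, free_ip_count) tuples ordered by
    preference, for example cheapest availability zone first.
    """
    for subnet_id, free in candidates:
        if free > 0:
            return subnet_id
    return None


# Hypothetical CIDR: a /25 has 128 addresses, 123 of them usable.
print(usable_ips("10.0.0.0/25"))

# The exhausted subnet is skipped and the newly added one is used instead.
print(pick_subnet([("subnet-2da98346", 0), ("subnet-7091d358", 123)]))
```

With only the exhausted subnet configured, `pick_subnet` returns `None`, which matches the "gives up" behaviour described in the theory.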
I tried once; I will try again. You have many contributors here who only see that the tree is closed, and the only explanation is "AWS backlog" with no explanation of what AWS is. Clicking on the link to the bug gives no more information. Just saying, for an open source, open project we should explain these things instead of using acronyms that only a select few know the meaning of. Just my opinion; I could be wrong.
Oh, and to be fair, it is not just this bug. I have the same issue with many others, and it is usually ignored.
Using code words in bugs is anti-open
To make this more abundantly clear: if I am trying to land a patch on inbound, the fact that it is closed because of an AWS backlog, with no explanation of what AWS is, reads just like "we closed it because we felt like it".
Bill, we are abundantly busy trying to fix the issue. Please ask the sheriff on IRC for any clarification you need. We have set the on-demand limit back to 100 in http://hg.mozilla.org/build/cloud-tools/rev/64155873112f which reverses part of http://hg.mozilla.org/build/cloud-tools/rev/9f5ee86055e1 from a week ago.
And sorry to pick on this bug for a generic issue. Bug summaries should be understandable without jargon or acronyms that are not generally understood across the entire Mozilla project.
Reopened trunk trees at 2014-06-18T17:03:14 since it seems like things are better.
The crisis has subsided, so we are free to discuss some of the theories for what may have caused the infra issue. We have increased our AWS (Amazon Web Services) spot instance and on-demand instance limits. Most likely this is just a band-aid over an underlying problem. In addition to the theory described and patched here, here is another: within the last week we have added more jacuzzis and may have locked too many slave names to them, thus starving our non-jacuzzi'd builders. The solution here is to add more slave names. From discussion in #releng:
[16:46:02] <catlee-away> | so we have 497 bld-linux64 spot slaves in slavealloc, and 416 of those are allocated to jacuzzis
[16:46:09] <catlee-away> | that doesn't leave many to handle non-jacuzzi's builders
[17:03:54] <jlund|build> | catlee-away: we were having large lists of these initially: https://pastebin.mozilla.org/5432246 and this is non-jacuzzi. There didn't seem to be many many 'no slave names' for the jacuzzis in the log
[17:04:46] <catlee-away> | yeah, ok
[17:04:51] <catlee-away> | so we need more names
[17:05:00] <catlee-away> | we added a bunch more jacuzzis late last week
https://bugzilla.mozilla.org/show_bug.cgi?id=1027308#c6
http://atlee.ca/blog/posts/initial-jacuzzi-results.html
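To put numbers on the starvation theory, here is a back-of-the-envelope calculation using only the two figures quoted from #releng (497 bld-linux64 spot slave names, 416 of them allocated to jacuzzis). The function name `names_left_for_shared_pool` is made up for illustration and is not part of slavealloc or cloud-tools.

```python
def names_left_for_shared_pool(total_names, jacuzzi_names):
    """Slave names not locked to any jacuzzi, i.e. usable by all other builders."""
    return total_names - jacuzzi_names


total = 497    # bld-linux64 spot slave names in slavealloc (from the IRC log)
jacuzzi = 416  # names allocated to jacuzzis (from the IRC log)

free = names_left_for_shared_pool(total, jacuzzi)
print(free)                     # 81
print(f"{free / total:.0%}")    # 16%
```

So roughly one name in six was available to every non-jacuzzi builder combined, which fits the long lists of "no slave names" messages seen for the non-jacuzzi builders.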
Thank you for explaining to those who did not know what AWS is. I realize most of us knew that, but I hate it when a bug, especially a tree closure bug, uses acronyms that are not universally known.
With trees open and stable again, closing this for now. For follow-up on the fix, please see: Bug 1027437 - add more slave names to non jacuzzi builders