Closed Bug 1034055 Opened 10 years ago Closed 10 years ago

implement c3.xlarge slave class for Linux64 test spot instances

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kmoir, Assigned: kmoir)

Attachments

(12 files, 11 obsolete files)

6.69 KB, patch (rail: review+, kmoir: checked-in+)
9.44 KB, patch (rail: review+, kmoir: checked-in+)
1.71 KB, patch (rail: review+, kmoir: checked-in+)
20.19 KB, text/plain (rail: review+)
7.06 KB, patch (rail: review+, kmoir: checked-in+)
1.02 KB, patch (rail: review+, kmoir: checked-in+)
1.11 KB, patch (rail: review+, kmoir: checked-in+)
12.18 KB, patch (rail: review+)
2.03 KB, patch (rail: review+, kmoir: checked-in+)
12.92 KB, patch (kmoir: checked-in+)
800 bytes, patch (rail: review+, kmoir: checked-in+)
20.39 KB, text/plain

We are having capacity issues on ix because Android 2.3 Armv6 jobs are now enabled on trunk.  gbrown tested a c3.xlarge instance and the reftests et al. now run to completion on this instance type; they didn't before.  Thus we should enable a slave class in AWS for Linux64 test spot instances on this instance type so that we can migrate the 2.3 tests that are currently running on ix to AWS.
Assignee: nobody → kmoir
Attached patch bug1034055.patch (obsolete) — Splinter Review
Initial patch for cloud tools to enable c3.xlarge as a new slave class

I'll add the patches for the other repos later.  As an FYI, I plan to deploy these changes on ash first to ensure they don't break things.

I'm not sure how to match this platform in slavealloc.py here so it gets the correct slaves
http://hg.mozilla.org/build/cloud-tools/file/bfd92f8744a5/cloudtools/slavealloc.py#l70
The same applies to the regular expression for the builder types in watch_pending.cfg, since the tests will still be split across the existing instance types and the new c3.xlarge slaves

I'm not sure what the ami id should be in the configs for tst-emulator64.  Does this get generated automatically when puppet creates it?
Attachment #8450274 - Flags: feedback?(rail)
Comment on attachment 8450274 [details] [diff] [review]
bug1034055.patch

Review of attachment 8450274 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm!

::: configs/instance2ami.json
@@ +21,5 @@
> +     "regions": ["us-east-1", "us-west-2"]},
> +    {"ami-config": "ubuntu-12.04-x86_64-desktop",
> +     "instance-config": "tst-emulator64",
> +     "ssh-key": "aws-releng",
> +     "ssh-user": "ubuntu",

This code is not used anymore, but won't hurt.

::: configs/tst-emulator64
@@ +5,5 @@
> +        "domain": "test.releng.use1.mozilla.com",
> +        "ami": "ami-e48e1e8d",
> +        "subnet_ids": ["subnet-ae35ccc4", "subnet-8f32cbe5", "subnet-ff3542d7",
> +                       "subnet-b8643190", "subnet-fb97bc8f", "subnet-844b7ec2",
> +                       "subnet-ed35cc87", "subnet-5cd0d828", "subnet-7ca5f03a"],

I hope we have enough IP addresses! :)

@@ +7,5 @@
> +        "subnet_ids": ["subnet-ae35ccc4", "subnet-8f32cbe5", "subnet-ff3542d7",
> +                       "subnet-b8643190", "subnet-fb97bc8f", "subnet-844b7ec2",
> +                       "subnet-ed35cc87", "subnet-5cd0d828", "subnet-7ca5f03a"],
> +        "security_group_ids": ["sg-f0f1239f"],
> +        "instance_type": "c3.xlarge",

FTR. instance_type from this file is used only for on-demand instances.

@@ +11,5 @@
> +        "instance_type": "c3.xlarge",
> +        "distro": "ubuntu",
> +        "ssh_key": "aws-releng",
> +        "use_public_ip": true,
> +        "instance_profile_name": "tst-emulator64",

TODO: This will require adding a new profile in Amazon IAM. Copy paste of tst-linux64 IAM role will work.

@@ +17,5 @@
> +            "/dev/sda1": {
> +                "size": 20,
> +                "volume_type": "gp2",
> +                "instance_dev": "/dev/xvda1"
> +            }

Since this instance type comes with ephemeral storage, it's worth copying the /dev/sdb and /dev/sdc entries from http://hg.mozilla.org/build/cloud-tools/file/bfd92f8744a5/configs/bld-linux64#l21 (they are harmless).

@@ +23,5 @@
> +        "tags": {
> +            "moz-type": "tst-emulator64"
> +        }
> +    },
> +    "us-west-2": {

the same comments as above for usw2

::: configs/tst-emulator64.cloud-init
@@ +1,1 @@
> +#cloud-config

You can just symlink this file to tst-linux64 since they are identical

::: configs/watch_pending.cfg
@@ +23,5 @@
>          "^Ubuntu Mulet VM 12.04 x64.*": "tst-linux64",
>          "^Ubuntu ASAN VM 12.04 x64.*": "tst-linux64",
>          "^b2g_(emulator|ubuntu64)_vm": "tst-linux64",
>          "^Android 2.3( Armv6)? Emulator(?:(?!plain-reftest|crashtest|jsreftest).)*$": "tst-linux64",
> +        "^Android 2.3( Armv6)? Emulator(?:(?!plain-reftest|crashtest|jsreftest).)*$": "tst-emulator64",

This implies that no new tests are going to be run on this instance type. Only moving existing ones from m1.medium to c3.xlarge. Correct?

Everything except this line can (actually should) be landed in advance, so we generate AMIs and probably test them manually.
Attachment #8450274 - Flags: feedback?(rail) → feedback+
(In reply to Rail Aliiev [:rail] from comment #2)
> > +        "instance_profile_name": "tst-emulator64",
> 
> TODO: This will require adding a new profile in Amazon IAM. Copy paste of
> tst-linux64 IAM role will work.

Done.
Added the "golden" DNS entries (they will be used to puppetize the AMIs):

invtool A create --ip 10.134.50.244 --fqdn tst-emulator64-ec2-golden.test.releng.use1.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.134.50.244 --target tst-emulator64-ec2-golden.test.releng.use1.mozilla.com  --private --description "Golden AMI"

invtool A create --ip 10.132.49.72 --fqdn tst-emulator64-ec2-golden.test.releng.usw2.mozilla.com  --private  --description "Golden AMI"
invtool PTR create --ip 10.132.49.72 --target tst-emulator64-ec2-golden.test.releng.usw2.mozilla.com  --private --description "Golden AMI"
(In reply to Kim Moir [:kmoir] from comment #1)
> I'll add the patches for the other repos later.  As a fyi I plan to deploy
> these changes on ash first to ensure it doesn't break things.
> 
> I'm not sure how to match this platform in slavealloc.py here so it gets the
> correct slaves
> http://hg.mozilla.org/build/cloud-tools/file/bfd92f8744a5/cloudtools/
> slavealloc.py#l70

We may need to use something else to distinguish these... Maybe speed... The best approach would be adding a new column in the slavealloc DB table, I think.

> Same thing applies to the regular expression for the builder types in
> watch_pending.cfg since the tests will still be split across the existing
> instance types and the new c3.xlarge slaves

Are we going to leave existing ones on m1.medium and use c3.xlarge for plain-reftest|crashtest|jsreftest?

> 
> I'm not sure what the ami id should be in the configs for tst-emulator64. 

The current value is OK. This is a base AMI, no puppet applied. This AMI is used to bootstrap and puppetize instances.

> Does this get generated automatically when puppet creates it?

Once we puppetize the "golden" AMIs (daily) we tag and publish them. Then aws_watch_pending.py uses http://hg.mozilla.org/build/cloud-tools/file/bfd92f8744a5/cloudtools/aws/ami.py#l127 to figure out the latest usable AMI.
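The "latest usable AMI" step can be pictured roughly as follows. This is only an illustrative sketch, not the code in cloudtools/aws/ami.py: the image records, the `state` field, the `moz-created` tag and the `latest_usable_ami` helper are all assumptions made for the example. Only the `moz-type` tag comes from the configs in this bug.

```python
# Hypothetical sketch: pick the most recently created AMI for a moz-type.
# The record layout, the "state" field and the "moz-created" tag are
# assumptions, not the real cloud-tools data model.

def latest_usable_ami(images, moz_type):
    """Return the newest available image dict tagged with moz_type, or None."""
    candidates = [
        img for img in images
        if img["tags"].get("moz-type") == moz_type
        and img["state"] == "available"
    ]
    if not candidates:
        return None
    # Creation times are assumed to be sortable integers (epoch seconds).
    return max(candidates, key=lambda img: img["tags"]["moz-created"])


images = [
    {"id": "ami-old", "state": "available",
     "tags": {"moz-type": "tst-emulator64", "moz-created": 1404300000}},
    {"id": "ami-new", "state": "available",
     "tags": {"moz-type": "tst-emulator64", "moz-created": 1404390000}},
    {"id": "ami-pending", "state": "pending",
     "tags": {"moz-type": "tst-emulator64", "moz-created": 1404400000}},
    {"id": "ami-other", "state": "available",
     "tags": {"moz-type": "tst-linux64", "moz-created": 1404500000}},
]
```

With this data, `latest_usable_ami(images, "tst-emulator64")` picks `ami-new`: the newer image is skipped because it is not yet available, and the newest one overall belongs to a different moz-type.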
In regard to comment #5, yes, I plan to leave the existing tests that can run on m1.medium where they are, because it isn't worth using a more expensive instance type than we need.  Thanks for all the feedback, I'm working on new patches.
Attached patch bug1034055-2.patch (obsolete) — Splinter Review
Attachment #8450274 - Attachment is obsolete: true
Attachment #8450372 - Flags: feedback?(rail)
Comment on attachment 8450372 [details] [diff] [review]
bug1034055-2.patch

Review of attachment 8450372 [details] [diff] [review]:
-----------------------------------------------------------------

::: configs/tst-emulator64
@@ +17,5 @@
> +            "/dev/sda1": {
> +                "size": 20,
> +                "volume_type": "gp2",
> +                "instance_dev": "/dev/xvda1"
> +            }

^ JSON syntax error :) missing comma

@@ +53,5 @@
> +            "/dev/sda1": {
> +                "size": 20,
> +                "volume_type": "gp2",
> +                "instance_dev": "/dev/xvda1"
> +            }

the same here
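A quick way to catch this kind of error before review is to run the config through a JSON parser. The following is a minimal sketch under assumptions: the `device_map` key name and the `/dev/sdb` entry with its contents are hypothetical stand-ins, and only the `/dev/sda1` block is taken from the patch.

```python
import json

# Hypothetical fragment: the "device_map" key and the "/dev/sdb" entry are
# assumed for illustration; only the "/dev/sda1" block comes from the patch.
broken = '''
{
    "device_map": {
        "/dev/sda1": {
            "size": 20,
            "volume_type": "gp2",
            "instance_dev": "/dev/xvda1"
        }
        "/dev/sdb": {
            "ephemeral_name": "ephemeral0"
        }
    }
}
'''

# Same fragment with the comma restored after the "/dev/sda1" block.
fixed = broken.replace('}\n        "/dev/sdb"', '},\n        "/dev/sdb"')


def is_valid_json(text):
    """Return True if text parses as JSON, False otherwise."""
    try:
        json.loads(text)
    except ValueError:  # json.JSONDecodeError is a subclass of ValueError
        return False
    return True
```

Here `is_valid_json(broken)` is false and `is_valid_json(fixed)` is true. For a file on disk, `python -m json.tool configs/tst-emulator64` performs the same check from the shell.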

::: configs/watch_pending.cfg
@@ +23,5 @@
>          "^Ubuntu Mulet VM 12.04 x64.*": "tst-linux64",
>          "^Ubuntu ASAN VM 12.04 x64.*": "tst-linux64",
>          "^b2g_(emulator|ubuntu64)_vm": "tst-linux64",
>          "^Android 2.3( Armv6)? Emulator(?:(?!plain-reftest|crashtest|jsreftest).)*$": "tst-linux64",
> +        "^Android 2.3( Armv6)? Emulator(?:(?!plain-reftest|crashtest|jsreftest).)*$": "tst-emulator64",

I think you need:

"^Android 2.3( Armv6)? Emulator.* (plain-reftest|crashtest|jsreftest).*": "tst-emulator64",

(?!blah) means "not blah".

I'd test the regexp against the actual builder names... I know, this is bad :)

Once we switch to allthethings.json, this horrible block will be gone!
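The difference between the two patterns can be checked with a short script. This is a sketch, assuming the builder map is applied with `re.match` and that the first matching pattern wins; `slave_type_for` and `BUILDER_MAP` are names invented for the example. The first builder name is taken from a log URL in this bug, the others are made up.

```python
import re

# The two patterns discussed in the review.
NOT_REFTEST = (r"^Android 2.3( Armv6)? Emulator"
               r"(?:(?!plain-reftest|crashtest|jsreftest).)*$")
ONLY_REFTEST = (r"^Android 2.3( Armv6)? Emulator"
                r".* (plain-reftest|crashtest|jsreftest).*")

# Ordered (pattern, slave type) pairs; the first match wins in this sketch.
BUILDER_MAP = [
    (NOT_REFTEST, "tst-linux64"),
    (ONLY_REFTEST, "tst-emulator64"),
]


def slave_type_for(builder_name):
    """Return the slave type whose pattern matches builder_name, else None."""
    for pattern, slave_type in BUILDER_MAP:
        if re.match(pattern, builder_name):
            return slave_type
    return None


# The first name appears in a log URL in this bug; the second is made up.
reftest = "Android 2.3 Armv6 Emulator ash opt test plain-reftest-7"
mochitest = "Android 2.3 Emulator ash opt test mochitest-1"
```

The negative lookahead `(?!...)` is tested before every character after "Emulator", so the first pattern rejects any name containing one of the three suite names and accepts everything else. The second pattern accepts only names that contain one of them. The two patterns are therefore mutually exclusive, which is what splits the jobs between `tst-linux64` and `tst-emulator64`.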
Attachment #8450372 - Flags: feedback?(rail) → feedback+
Attachment #8450384 - Flags: review?(rail)
Attachment #8450384 - Flags: review?(rail) → review+
Attachment #8450384 - Flags: checked-in+
Attached patch bug1034055-3.patch (obsolete) — Splinter Review
I tested the regexp against builder names and it worked.  Like you said, I won't land the buildermap change in watch_pending.cfg until we have tested the AMIs
Attachment #8450372 - Attachment is obsolete: true
Attachment #8450470 - Flags: review?(rail)
Comment on attachment 8450470 [details] [diff] [review]
bug1034055-3.patch

Review of attachment 8450470 [details] [diff] [review]:
-----------------------------------------------------------------

It looks fine to me as a not final patch for 2 reasons:

* we still need to distinguish tst-linux64 and tst-emulator64 in http://hg.mozilla.org/build/cloud-tools/file/bfd92f8744a5/cloudtools/slavealloc.py#l45

::: configs/watch_pending.cfg
@@ +23,5 @@
>          "^Ubuntu Mulet VM 12.04 x64.*": "tst-linux64",
>          "^Ubuntu ASAN VM 12.04 x64.*": "tst-linux64",
>          "^b2g_(emulator|ubuntu64)_vm": "tst-linux64",
>          "^Android 2.3( Armv6)? Emulator(?:(?!plain-reftest|crashtest|jsreftest).)*$": "tst-linux64",
> +        "^Android 2.3( Armv6)? Emulator.* (plain-reftest|crashtest|jsreftest).*": "tst-emulator64",

This would immediately start trying to start new instance types before they are ready to go (AMIs).

Can you put this hunk in a separate patch?
Attachment #8450470 - Flags: review?(rail) → review-
Attachment #8450470 - Attachment is obsolete: true
Attachment #8450507 - Flags: review?(rail)
Attached patch bug1034055bb.patch (obsolete) — Splinter Review
buildbot patch, still need to test
Attached patch bug1034055wp.patch (obsolete) — Splinter Review
patch for watch pending after we test AMIs
Attachment #8450535 - Flags: review?(rail)
Comment on attachment 8450507 [details] [diff] [review]
bug1034055-4.patch

Review of attachment 8450507 [details] [diff] [review]:
-----------------------------------------------------------------

::: cloudtools/slavealloc.py
@@ +76,3 @@
>         slave.get("trustlevel") == "try":
>          return "tst-linux64"
> +    

Please kill the trailing space when you land it.

Kim, can you also file a bug to make this method more straightforward? Using speed here is fine as a temporary workaround, but it may bite us at some point in the future. Please assign it to me.
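For readers without the patch at hand, the speed-based workaround amounts to something like the sketch below. It is hypothetical and is not the code in cloudtools/slavealloc.py: the `purpose` field, the values stored in `speed`, and the `slave_moz_type` helper are assumptions chosen for illustration.

```python
# Hypothetical sketch of classifying slavealloc records by "speed".
# The "purpose" field and all field values are assumptions; this is not
# the code in cloudtools/slavealloc.py.

SPEED_TO_TYPE = {
    "m1.medium": "tst-linux64",
    "c3.xlarge": "tst-emulator64",
}


def slave_moz_type(slave):
    """Map a slavealloc record (a dict) to an instance config name, or None."""
    if slave.get("purpose") != "tests":
        return None
    # Unknown or missing speeds fall through to None instead of guessing.
    return SPEED_TO_TYPE.get(slave.get("speed"))
```

The fragility mentioned in the review is visible here: the mapping breaks as soon as two slave classes share an instance type, which is why a dedicated column in the slavealloc table was suggested as the better long-term approach.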
Attachment #8450507 - Flags: review?(rail) → review+
Attachment #8450535 - Flags: review?(rail) → review+
Kim and Rail, thanks for working so quickly through all this yesterday - fantastic job! I can't quite believe how much you got done yesterday - really great work. =)
Attachment #8450507 - Flags: checked-in+
regarding comment 15, I opened bug 1034674
So I couldn't get an ami to generate and I'm stuck so I think I'll wait until more people are back on Monday to help me debug.  

I tried to invoke the puppet script by hand but it failed because it couldn't create a c3.xlarge image in us-east-1b.

As an aside, I thought adding -r us-west-2 to the script parameters in cron.pp for this instance type would help, but when I invoke it I see no error messages or log output, and no AMI is created.  So I'm not sure what's going on.
I talked to rail this morning and the problem was the address assigned to tst-emulator64-ec2-golden.test.releng.use1.mozilla.com.  He said to modify the config to use these subnets:

  "subnet_ids": ["subnet-35a9835e", "subnet-8f21eeee", "subnet-8c20efed",
                        "subnet-173ff076", "subnet-0aa98361", "subnet-33a98358"],
        "security_group_ids": ["sg-f0f1239f"],

then find a free IP with python free_ips.py -c /tmp/tst-emulator64 -n 1 and update the A and PTR records for ec2-golden.test.releng.use1.mozilla.com to reflect the new address. Once this has propagated, I should be able to rerun the AMI generation script successfully.
See https://bugzilla.mozilla.org/show_bug.cgi?id=1034034#c11

Maybe we can still run these tests on IX after all?
Flags: needinfo?(kmoir)
Flags: needinfo?(coop)
(In reply to Pete Moore [:pete][:pmoore] from comment #20)
> See https://bugzilla.mozilla.org/show_bug.cgi?id=1034034#c11
> 
> Maybe we can still run these tests on IX after all?

The builders are configured differently from the testers, so we can't just bulk out the existing pools by re-imaging the builders.

However, since we're effectively creating a new pool of AWS machines for mobile testing, there's no reason why we couldn't use a new hardware pool instead, provided the build machines meet the other prerequisites for use as test machines. I'm specifically worried about the minimum graphics requirements here because I don't think the builders have graphics cards and, because of their design/configuration, may not be able to accept new cards if required.

I'm still a bigger fan of pushing work into AWS if possible, but I'll let Kim decide the best course of action here. If we want to try an experiment, we can pull a windows ix builder to test.
Flags: needinfo?(coop)
I'm going to proceed with testing on AWS right now, get my patches tested on my dev-master, and see if we can get that working on a branch, since we seem to be pretty close there and this would resolve some immediate capacity issues.

If we did decide to go with the builder ix machines we would have to get ateam to test them since they are a different hardware ref than our existing ix pool.
Flags: needinfo?(kmoir)
Depends on: 1035270
Attached patch bug1034055bb.patch wip (obsolete) — Splinter Review
patches to enable on ash. The builder diff looks good.  I just can't get my AWS loaner to talk to my dev-master due to networking issues, so I can't invoke sendchanges yet.  Will have to talk to people tomorrow morning and see how I can get them to connect.
Attachment #8450535 - Attachment is obsolete: true
Attachment #8450530 - Attachment is obsolete: true
Talked to nthomas and got my dev-master ports fixed. Invoked sendchanges and running some tests overnight with this slave.
The loaner script added the wrong instance type so the tests timed out last night.  I changed my loaner instance to c3.xlarge and now the tests are running green.  I'll attach patches to enable this configuration on ash.
Depends on: 1035863
Attached patch bug1034055bb-2.patch (obsolete) — Splinter Review
patches to enable on ash
Attachment #8452081 - Attachment is obsolete: true
We need two names for the slave classes, just like last time: one for armv6, one for 2.3.  I can delete the ix slave classes that I created earlier once this is in production.
It turns out the slave I used last night didn't have the correct AMI or instance type.  I had to recreate the loaner twice today because there were some problems in our configs.  I'm now working through issues where the tests don't complete. I don't think it's a problem with the instance, just the test setup; still investigating.

http://dev-master1.srv.releng.scl3.mozilla.com:8036/builders/Android%202.3%20Armv6%20Emulator%20ash%20opt%20test%20plain-reftest-7/builds/3/steps/run_script/logs/stdio
So it seems ash is in a bad state and this was the source of all the errors I saw.  When I changed my dev-master to run these tests on m-c and invoked the associated sendchanges, the testing is proceeding.
Blocks: 1031083
Comment on attachment 8452349 [details] [diff] [review]
bug1034055bb-2.patch

I won't land this until the slavealloc entries, which I will attach shortly, are in place, but I'm asking for review now since it's ready
Attachment #8452349 - Flags: review?(rail)
Comment on attachment 8452352 [details] [diff] [review]
bug1034055puppet-2.patch

ubuntu64_vm_large will still be used for Android 2.3 tests but I need a separate name for armv6 tests to avoid builder name contention
Attachment #8452352 - Flags: review?(rail)
Attachment #8452349 - Flags: review?(rail)
Attached patch bug1034055bb-3.patch (obsolete) — Splinter Review
Attachment #8452349 - Attachment is obsolete: true
Attachment #8453274 - Flags: review?(rail)
Attached file emulatorslave
List of slaves to add to slavealloc
Attachment #8453275 - Flags: review?(rail)
Attachment #8452352 - Flags: review?(rail) → review+
Attachment #8453275 - Flags: review?(rail) → review+
Blocks: 1036609
Attachment #8453274 - Attachment is obsolete: true
Attachment #8453274 - Flags: review?(rail)
Attachment #8453319 - Flags: review?(rail)
Comment on attachment 8453319 [details] [diff] [review]
bug10340550-4.patch

Review of attachment 8453319 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/production_config.py
@@ +69,5 @@
>  for i in range(1,100) + range(301,400):
>      SLAVES['ubuntu64_vm']['tst-linux64-ec2-%03i' % i] = {}
>  
> +for i in range(1,100) + range(301,400):
> +    SLAVES['ubuntu64_vm_large']['tst-emulator64-spot-%03i' % i] = {}  

Can you kill the trailing spaces above.
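As a side note on what this loop produces: `range(1,100) + range(301,400)` relies on Python 2, where `range` returns a list. The sketch below is a Python 3 equivalent, written only to show the resulting slave names; it initialises `SLAVES` itself, which the real production_config.py does elsewhere.

```python
from itertools import chain

# Stand-in for the SLAVES dict defined elsewhere in production_config.py.
SLAVES = {"ubuntu64_vm_large": {}}

# In Python 3, range objects cannot be concatenated with "+", so the two
# ranges are chained instead. The upper bounds are exclusive.
for i in chain(range(1, 100), range(301, 400)):
    SLAVES["ubuntu64_vm_large"]["tst-emulator64-spot-%03i" % i] = {}

names = sorted(SLAVES["ubuntu64_vm_large"])
```

This yields 198 names: tst-emulator64-spot-001 through -099 and tst-emulator64-spot-301 through -399, with nothing in between.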
Attachment #8453319 - Flags: review?(rail) → review+
Attachment #8450535 - Attachment is obsolete: false
Comment on attachment 8453275 [details]
emulatorslave

imported to slavealloc
Tests all ran green on m-c on my dev-master on this image. So I'll get ready to deploy on ash, then to other branches.
Attachment #8452352 - Flags: checked-in+
First I have to finish setting up two new masters in AWS to handle the additional slaves in bug 1035863
Attachment #8450535 - Attachment is obsolete: true
Attachment #8454517 - Flags: review?(rail)
Attachment #8454517 - Flags: review?(rail) → review+
Attachment #8453319 - Flags: checked-in+
Attachment #8454517 - Flags: checked-in+
In production and merged m-c to ash to see if the emulator images are spun up correctly for Android 2.3 Armv6 tests on this branch.
Once you deploy the changes to all branches, we should also adjust the slave_health regexes like in http://hg.mozilla.org/build/slave_health/rev/5a892e2e304f
Hi Rail, Kim,

I noticed we are getting emails like these, but maybe this is solved by comment 42?


On 14 Jul 2014, at 17:41, Cron Daemon <root@cruncher.srv.releng.scl3.mozilla.com> wrote:

Unknown slave_type for test: tst-emulator64-spot-059
Unknown slave_type for test: tst-emulator64-spot-058
Unknown slave_type for test: tst-emulator64-spot-325
Unknown slave_type for test: tst-emulator64-spot-324
Unknown slave_type for test: tst-emulator64-spot-323
Unknown slave_type for test: tst-emulator64-spot-322
Unknown slave_type for test: tst-emulator64-spot-321
Unknown slave_type for test: tst-emulator64-spot-320
Unknown slave_type for test: tst-emulator64-spot-051
Unknown slave_type for test: tst-emulator64-spot-050
Unknown slave_type for test: tst-emulator64-spot-053
Unknown slave_type for test: tst-emulator64-spot-052
.....
.....
Attached patch bug1034055allbranches.patch (obsolete) — Splinter Review
Patch to enable on relevant branches.  The builder diff just shows an ordering difference.

I'll attach a patch for the cloud-tools too to remove Ash from the regexp.  Also, I'll remove the slave classes from puppet after we roll this out because there will be jobs in progress when we reconfig.
Attachment #8455396 - Flags: review?(rail)
watch all branches not just ash
Attachment #8455406 - Flags: review?(rail)
Comment on attachment 8455406 [details] [diff] [review]
bug1034055wp-3.patch

I think "^Android 2.3( Armv6)? Emulator.*" is what you want, the rest is redundant.
Attachment #8455406 - Flags: review?(rail) → review+
Attached patch bug1034055-slavehealth.patch (obsolete) — Splinter Review
adjust slave health
Attachment #8455434 - Flags: review?(rail)
Comment on attachment 8455434 [details] [diff] [review]
bug1034055-slavehealth.patch

Review of attachment 8455434 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/slave_health.js
@@ +91,5 @@
>  	} else if (pending.match(/(Ubuntu HW 12.04 x64|b2g_ics_armv7a_gecko_emulator_hw|b2g_emulator_hw)/)) {
>  	    slavetype = "talos-linux64-ix";
>  	} else if (pending.match(/Android (?:4.2 )?x86/)) {
>  	    slavetype = "talos-linux64-ix";
>  	} else if (pending.match(/Android 2.3( Armv6)? Emulator(?:(?!plain-reftest|crashtest|jsreftest).)*$/)) {

you also need to change the regexp to match the same regexp in watch_pending.cfg

@@ +92,5 @@
>  	    slavetype = "talos-linux64-ix";
>  	} else if (pending.match(/Android (?:4.2 )?x86/)) {
>  	    slavetype = "talos-linux64-ix";
>  	} else if (pending.match(/Android 2.3( Armv6)? Emulator(?:(?!plain-reftest|crashtest|jsreftest).)*$/)) {
> +	    slavetype = "tst-emulator64-spot";

Even though the line above looks ok, it's not enough. grepping slave_health gives me a lot of entries for tst-linux64, probably you need more changes to define this type. Coop may know more.
Attachment #8455434 - Flags: review?(rail) → review-
(In reply to Pete Moore [:pete][:pmoore] from comment #43)
> Hi Rail, Kim,
> 
> I noticed we are getting emails like these, but maybe this is solved by
> comment 42?
> 
> 
> On 14 Jul 2014, at 17:41, Cron Daemon
> <root@cruncher.srv.releng.scl3.mozilla.com> wrote:
> 
> Unknown slave_type for test: tst-emulator64-spot-059

root@cruncher?!! /me goes to get rid of this.
(In reply to Rail Aliiev [:rail] from comment #49)
> (In reply to Pete Moore [:pete][:pmoore] from comment #43)
> > Hi Rail, Kim,
> > 
> > I noticed we are getting emails like these, but maybe this is solved by
> > comment 42?
> > 
> > 
> > On 14 Jul 2014, at 17:41, Cron Daemon
> > <root@cruncher.srv.releng.scl3.mozilla.com> wrote:
> > 
> > Unknown slave_type for test: tst-emulator64-spot-059
> 
> root@cruncher?!! /me goes to get rid of this.


I think this is slave_health. phew...
Attachment #8455396 - Attachment is obsolete: true
Attachment #8455396 - Flags: review?(rail)
Attachment #8455481 - Flags: review?(rail)
Comment on attachment 8455481 [details] [diff] [review]
bug1034055allbranches-2.patch

Review of attachment 8455481 [details] [diff] [review]:
-----------------------------------------------------------------

Can you also remove ubuntu64_hw_mobile from BuildSlaves.py.template when you land?

A separate patch to remove ubuntu64_hw_mobile from puppet is also appreciated.

::: mozilla-tests/mobile_config.py
@@ +1426,2 @@
>              'debug_unittest_suites': [],
>          },       

Can you also kill the trailing space above when you land this.

@@ +1599,5 @@
>          ANDROID_2_3_AWS_DICT['opt_unittest_suites'].append(suite)
>  
>  # enable android 2.3 tests to ride the trains bug 1004791
>  for name, branch in items_at_least(BRANCHES, 'gecko_version', 32):
>      # Loop removes it from any branch that gets beyond here

as a follow-up, can you file a bug to fix this loop using items_before()? It's much more convenient for merge duty patches.
Attachment #8455481 - Flags: review?(rail) → review+
remove old slave classes from puppet once buildbot config patches are landed and reconfiged
Attachment #8455508 - Flags: review?(rail)
Blocks: 1038320
better patch than comment 52 + bug filed for loop in bug 1038320
Comment on attachment 8455434 [details] [diff] [review]
bug1034055-slavehealth.patch

coop is going to add the stuff to slave health so I won't worry about writing a patch for it.  Thanks coop!
Attachment #8455434 - Attachment is obsolete: true
Attachment #8455508 - Flags: review?(rail) → review+
Tests are green on ash so I'll land my patches to enable on all branches and reconfig again first thing tomorrow morning.  Reconfigs are fast when people in California are still sleeping and the load is light on the masters :-)
Attachment #8455516 - Flags: checked-in+
Comment on attachment 8455406 [details] [diff] [review]
bug1034055wp-3.patch

like this as rail suggested
Android 2.3( Armv6)? Emulator.*
Attachment #8455406 - Flags: checked-in+
In production
We are seeing a few problems cloning hg on some spot instances, but this seems to be bug 1036176.

I can see in slave health that the entire pool of spot instances is up and running jobs.  However, there are still ~800 pending jobs so we'll have to wait and see if we need to expand the pool, especially given that there are b2g tests that want to run on this same instance type in bug 1031083.
more instances to reduce pending
Attachment #8456351 - Flags: review?(rail)
Attached file emulatorlist2.txt (obsolete) —
add new instances to slavealloc
Attachment #8456352 - Flags: review?(rail)
Attachment #8456351 - Flags: review?(rail) → review+
Comment on attachment 8456352 [details]
emulatorlist2.txt

conditional r+:
s/tst-linux64-spot/tst-emulator64-spot/

(it would fail inserting into the db)
Attachment #8456352 - Flags: review?(rail) → review+
Comment on attachment 8455508 [details] [diff] [review]
bug1034055puppetremove.patch

and merged
Attachment #8455508 - Flags: checked-in+
Attachment #8456351 - Flags: checked-in+
Attached file emulatorlist2.txt
actually the hostnames were wrong, this is what I added to the db
Attachment #8456352 - Attachment is obsolete: true
Blocks: 1038941
Blocks: 1039227
Comment on attachment 8456406 [details]
emulatorlist2.txt

added to and enabled in slavealloc
New slave pool in production
New instances are up but the pending count is still high (~1200).  Will watch it over the next 24 hours and see how it keeps up with load.
Pending count looks good.  Closing.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard