Move Android 2.3 reftests to ix slaves (ash only)

RESOLVED FIXED

Status

Release Engineering
General Automation
RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: gbrown, Assigned: kmoir)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(4 attachments, 7 obsolete attachments)

Android 2.3 reftests, js-reftests, and crashtests run very slowly on the aws ec2 instances we currently use -- too slowly to run them to completion for any reasonable number of jobs (see bug 980519 for real numbers and experiments on different aws instance types). My efforts to make them run faster don't seem to be succeeding. 

These tests run significantly faster on ix slaves. I'd like to demonstrate that on ash: Continue running mochitests, robocop, and xpcshell on aws, but move the reftest, js-reftest, and crashtest jobs to ix slaves.

I recently verified that reftests run well on an ix loaner: all jobs ran to completion with the current chunk scheme. Most tests passed, but I will need to make some final manifest changes to avoid a few failures.


Running multiple tests concurrently (in "sets", as we do for android x86) is an option, but I do not recommend it. Running 4 reftest jobs concurrently more than doubles run-times, and even running just 2 jobs concurrently produces a significant increase in run-time. (Also, :ryanvm recently expressed frustration with sets, and I agree that sets are harder to understand at a glance.)
(Assignee)

Updated

3 years ago
Assignee: nobody → kmoir
Blocks: 967913
(Assignee)

Comment 1

3 years ago
Created attachment 8404122 [details] [diff] [review]
bug992969.patch

will attach builder diff
Attachment #8404122 - Flags: review?(bugspam.Callek)
(Assignee)

Comment 2

3 years ago
Created attachment 8404125 [details] [diff] [review]
bug992969.diff

note diff shows changes in ordering only.  On my dev master the correct slave types are assigned to the different test suites.
Comment on attachment 8404122 [details] [diff] [review]
bug992969.patch

Review of attachment 8404122 [details] [diff] [review]:
-----------------------------------------------------------------

what about:

ANDROID_2_3_MOZHARNESS_DICT = ANDROID_2_3_MOZHARNESS_ENABLED_DICT + ANDROID_2_3_MOZHARNESS_DISABLED_DICT
ANDROID_2_3_MOZHARNESS_UNITTEST_DICT = {
    'opt_unittest_suites': ANDROID_2_3_MOZHARNESS_DICT,
    'debug_unittest_suites': [],
}
now?

also if this ever goes to another tree these ENABLED/DISABLED names stink for this.
Attachment #8404122 - Flags: review?(bugspam.Callek) → review+
(Assignee)

Comment 4

3 years ago
Created attachment 8404692 [details] [diff] [review]
bug992969-2.patch

So I removed the ANDROID_2_3_MOZHARNESS_DICT part when I checked it in.  Armen requested the disabled and enabled distinctions in bug 989462
Attachment #8404692 - Flags: checked-in+
Hi Kim,

This has caused a jenkins failure (http://10.134.48.37:8080/job/buildbot-configs_tests/1913/changes#detail) - ok for me to roll this back?

Pete
Flags: needinfo?(kmoir)
(Assignee)

Updated

3 years ago
Attachment #8404692 - Flags: checked-in+ → checked-in-
Flags: needinfo?(kmoir)
(Assignee)

Comment 6

3 years ago
Created attachment 8404777 [details] [diff] [review]
bug992969-3.patch

So this patch addresses the duplicate builder name on the scheduling masters that caused the breakage during the last reconfig. I had tested this on the test master, not a scheduling master this is why it didn't show up until later.  I don't really like this because it means there the test jobs look different that are spawned from the same build.  A simpler solution since there isn't much traffic on ash is to change all ash tests for Android 2.3 to run on ix machines for the duration of the test.  :gbrown what do you think?
Flags: needinfo?(gbrown)
That's fine with me. In fact, I am curious to see how mochitests, etc will run on ix.

Assuming that reftests-on-ix work out, I will want to start running reftests on ix on trunk trees, alongside the other tests running on aws. Is that going to be possible, in the longer term?
Flags: needinfo?(gbrown)
(Assignee)

Comment 8

3 years ago
Created attachment 8404852 [details] [diff] [review]
bug992969-4.patch

patch to run on android 2.3 tests on ix machines only for all tests on ash

re gbrown's question:

>>Assuming that reftests-on-ix work out, I will want to start running reftests >>on ix on trunk trees, alongside the other tests running on aws. Is that going >>to be possible, in the longer term?

I don't think so but we'll have to change the builder name to be unique on each platform
Attachment #8404852 - Flags: review?(bugspam.Callek)

Updated

3 years ago
Attachment #8404852 - Flags: review?(bugspam.Callek) → review+
(Assignee)

Updated

3 years ago
Attachment #8404852 - Flags: checked-in+
(Assignee)

Comment 9

3 years ago
Comment on attachment 8404852 [details] [diff] [review]
bug992969-4.patch

still scheduler failures, with non-unique builder names, investigating
Attachment #8404852 - Flags: checked-in+ → checked-in-
(Assignee)

Comment 10

3 years ago
Created attachment 8406908 [details] [diff] [review]
bug992969-5.patch

So I think the problem with the duplicate builder names is fixed with this patch.  I tested it on a scheduling master using the steps here 
https://wiki.mozilla.org/ReleaseEngineering:TestingTechniques#setup_one_master_and_output_the_steps_for_it

and it looked fine
Attachment #8406908 - Flags: review?(bugspam.Callek)
Comment on attachment 8406908 [details] [diff] [review]
bug992969-5.patch

Review of attachment 8406908 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/mobile_config.py
@@ +100,5 @@
>  PLATFORMS['android']['vm_android_2_3'] = {
>      'name': "Android 2.3 Emulator",
>  }
> +PLATFORMS['android']['ubuntu64_hw'] = {
> +    'name': "Android 2.3 Emulator on ix",

nit if it passes without dupe names can you rename this back to "Android 2.3 Emulator" we'll get the "on ix" for free based on slave itself.

If it doesn't pass without dupe names then I think we still might have a problem (e.g. double mochitest-foo one on ix one not, on the same branch)
Attachment #8406908 - Flags: review?(bugspam.Callek) → review+
(Assignee)

Comment 12

3 years ago
That didn't work either. Callek suggested 

Callek|Buildduty	: ahhhhh INFO - ValueError: Schedulers must have unique names, but 'tests-ash-ubuntu64_hw-opt-unittest' was a duplicate
	kmoir	huh
	Callek|Buildduty	kmoir: doing SLAVES['ubuntu64_hw_mobile'] = SLAVES['ubuntu64_hw'] and then using the former might solve your issue
(Assignee)

Comment 13

3 years ago
Created attachment 8407646 [details] [diff] [review]
bug992969puppet.patch
(Assignee)

Comment 14

3 years ago
Created attachment 8407648 [details] [diff] [review]
bug992969-6.patch
Attachment #8406908 - Attachment is obsolete: true
Attachment #8404122 - Attachment is obsolete: true
Attachment #8404692 - Attachment is obsolete: true
Attachment #8404777 - Attachment is obsolete: true
Attachment #8404852 - Attachment is obsolete: true
(Assignee)

Comment 15

3 years ago
Created attachment 8407651 [details] [diff] [review]
builder.diff
Attachment #8404125 - Attachment is obsolete: true
(Assignee)

Updated

3 years ago
Attachment #8407648 - Flags: review?(bugspam.Callek)
(Assignee)

Updated

3 years ago
Attachment #8407646 - Flags: review?(bugspam.Callek)
Comment on attachment 8407651 [details] [diff] [review]
builder.diff

not a fan of s/Emulator/Emulator on ix/

Also would be useful for a full dump-master diff (specifically of the scheduler masters)
Comment on attachment 8407648 [details] [diff] [review]
bug992969-6.patch

Review of attachment 8407648 [details] [diff] [review]:
-----------------------------------------------------------------

assuming the schedulers pass this time!

::: mozilla-tests/mobile_config.py
@@ +100,5 @@
>  PLATFORMS['android']['vm_android_2_3'] = {
>      'name': "Android 2.3 Emulator",
>  }
> +PLATFORMS['android']['ubuntu64_hw_mobile'] = {
> +    'name': "Android 2.3 Emulator on ix",

can we name this "Android 2.3 Emulator" instead?
Attachment #8407648 - Flags: review?(bugspam.Callek) → review+
Comment on attachment 8407646 [details] [diff] [review]
bug992969puppet.patch

Review of attachment 8407646 [details] [diff] [review]:
-----------------------------------------------------------------

we'll likely want a heads up to release@ about this change since dev masters will need manual update
Attachment #8407646 - Flags: review?(bugspam.Callek) → review+
(Assignee)

Comment 19

3 years ago
Created attachment 8407690 [details] [diff] [review]
dump_master.diff

diff of dump_master.py output on test scheduling master
(Assignee)

Comment 20

3 years ago
I'll remove the "on ix" part for the name when I land the bb change
Comment on attachment 8407690 [details] [diff] [review]
dump_master.diff

f+ with the above mentioned (in config patch) naming change please.
Attachment #8407690 - Flags: feedback+
(Assignee)

Comment 22

3 years ago
Created attachment 8407825 [details] [diff] [review]
bug992969-6b.patch

The problem with the previous patch was that it listed ubuntu64_vm_mobile in the BuildSlaves.py.template instead of ubuntu64_hw_mobile.  WHen I fixed this and ran test-masters.sh it was all green :-)
Attachment #8407648 - Attachment is obsolete: true
Attachment #8407825 - Flags: review?(bugspam.Callek)

Updated

3 years ago
Attachment #8407825 - Flags: review?(bugspam.Callek) → review+
(Assignee)

Comment 23

3 years ago
Comment on attachment 8407825 [details] [diff] [review]
bug992969-6b.patch

Landed this and the tests passed as green.  Thanks Callek for all your reviews.  I now know how to test for all the things....good for next time:-)
Attachment #8407825 - Flags: checked-in+
(Assignee)

Comment 24

3 years ago
This was released to production on a reconfig last Thursday, and I've verified that it's working as expected on tbpl.  gbrown, let me know how your tests go and what the next steps will be.
(Assignee)

Comment 25

3 years ago
I think I can close this given that bug 1004791 is now in production too.
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.