Closed Bug 1174746 Opened 9 years ago Closed 8 years ago

Android 4.3 opt doesn't appear to be adhering to SETA rules

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kmoir, Assigned: kmoir)

Details

Attachments

(2 files)

seems most jobs are running all the time
Assignee: nobody → kmoir
Summary: Android 4.3 optdoesn't appear to be adhering to SETA rules → Android 4.3 opt doesn't appear to be adhering to SETA rules
The problem is that the BRANCHES definition in buildbot-configs includes the SETA specifications; however, by the time it is parsed in buildbotcustom/misc.py, skipconfig is at 0,0. I think this may have to do with the fact that Android 4.3 runs on two different instance types and thus two slave definitions. Debugging :-)
So I looked at the logs for this today for both opt and debug. Currently this platform is configured to skip tests until the 7th commit or a timeout of 1 hour, whichever comes first. I think we are hitting the timeout a lot, and each time that schedules all the tests.
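To illustrate why a short timeout can defeat the skip count, here is a toy model of the "run every Nth push, or when the timeout expires" rule. It is only a sketch: SkipTracker, on_push and simulate are hypothetical names, the 15-minute push interval is an assumption, and this is not the buildbotcustom scheduler code.

```python
class SkipTracker:
    """Toy model: run on the Nth push, or once the timeout has elapsed."""

    def __init__(self, skip_count, skip_timeout, start=0.0):
        self.skip_count = skip_count      # e.g. 7 pushes
        self.skip_timeout = skip_timeout  # seconds, e.g. 3600
        self.seen = 0                     # pushes seen since the last run
        self.last_run = start             # time of the last scheduled run

    def on_push(self, now):
        """Return True if tests should be scheduled for this push."""
        self.seen += 1
        timed_out = (now - self.last_run) >= self.skip_timeout
        if self.seen >= self.skip_count or timed_out:
            self.seen = 0
            self.last_run = now
            return True
        return False


def simulate(skip_count, skip_timeout, interval, pushes):
    """Count how many of `pushes` evenly spaced pushes trigger a run."""
    tracker = SkipTracker(skip_count, skip_timeout)
    return sum(tracker.on_push(interval * i) for i in range(1, pushes + 1))
```

In this model, with one push every 900 seconds, a (7, 3600) config runs on every 4th push because the timeout fires before the count reaches 7, while a (7, 7200) config runs on every 7th push as intended.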

ubuntu64_vm_armv7_large is the platform many of our Android 4.3 jobs run on, although some run on smaller instance types. In any case, here is the aggregated data from the logs on how many jobs are being skipped. I would like to try changing the timeout parameter for Android 4.3 now that we have per-platform SETA configs implemented. I think we are just hitting the timeout for many of these tests, given how few are skipped at the later stages. Thoughts, gbrown and jmaher?


[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "1/7" | wc -l
1472
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "2/7" | wc -l
947
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "3/7" | wc -l
821
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "4/7" | wc -l
684
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "5/7" | wc -l
862
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "6/7" | wc -l
375
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "1/7" | wc -l
1469
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "2/7" | wc -l
955
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "3/7" | wc -l
765
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "4/7" | wc -l
874
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "5/7" | wc -l
622
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "6/7" | wc -l
425
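The same tally could be collected in one pass instead of twelve grep pipelines. This is a rough sketch: tally_skips is a hypothetical helper, and it assumes only what the grep commands above assume, namely that a relevant log line contains "skip", the platform name, and a counter such as "3/7".

```python
from collections import Counter


def tally_skips(lines, platform, total=7):
    """Count skip log lines per counter value (1/7 .. 6/7) for one platform."""
    counts = Counter()
    for line in lines:
        # Same filters as: grep skip | grep <platform>
        if "skip" not in line or platform not in line:
            continue
        for n in range(1, total):
            # Same filter as: grep "n/7"
            if "%d/%d" % (n, total) in line:
                counts[n] += 1
    return counts


# Usage against the real log (path taken from the commands above):
# with open("master/twistd.log") as f:
#     print(tally_skips(f, "ubuntu64_vm_armv7_large-debug"))
```

Counts that fall off sharply toward 6/7 would suggest the timeout is firing before the skip count is reached.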
Flags: needinfo?(jmaher)
Flags: needinfo?(gbrown)
Looking at mozilla-inbound on treeherder, SETA seems to work more effectively (run fewer tests) on Android 2.3; is the timeout different?
Flags: needinfo?(gbrown)
the android 2.3 tests take <50 minutes, the 4.3 ones (debug) take 70+ minutes.  I do wonder if the timeout of 60 minutes is counting completed jobs.
Flags: needinfo?(jmaher)
if we figure out the root cause, and determine that increasing the timeout to 120 minutes is more practical, then I would fully support it.
I'd like to test this and see if the results change
Attachment #8681402 - Flags: review?(jmaher)
Comment on attachment 8681402 [details] [diff] [review]
bug1174746-2.patch

Review of attachment 8681402 [details] [diff] [review]:
-----------------------------------------------------------------

nice and simple.

::: mozilla-tests/config_seta.py
@@ +32,5 @@
> +      if slave_sp in ["xp-ix"]:
> +          skipconfig_defaults_platform[slave_sp] = (14, 7200)
> +      elif slave_sp in ["ubuntu64_vm_armv7_mobile", "ubuntu64_vm_armv7_large"]:
> +          skipconfig_defaults_platform[slave_sp] = (7, 7200)
> +      else: 

nit: trailing whitespace after the final else
Attachment #8681402 - Flags: review?(jmaher) → review+
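For readers without the full patch, the reviewed hunk can be read as building a per-platform table of (skip count, timeout in seconds) pairs. The sketch below wraps that logic in a hypothetical function, build_skipconfig_defaults. The value used in the final else branch is not shown in the review, so DEFAULT_SKIPCONFIG here is an assumption, not the value from config_seta.py.

```python
# Assumption: the real default in the else branch is not shown in the review.
DEFAULT_SKIPCONFIG = (7, 3600)


def build_skipconfig_defaults(slave_platforms):
    """Map each slave platform to a (skip count, timeout in seconds) pair."""
    skipconfig_defaults_platform = {}
    for slave_sp in slave_platforms:
        if slave_sp in ["xp-ix"]:
            skipconfig_defaults_platform[slave_sp] = (14, 7200)
        elif slave_sp in ["ubuntu64_vm_armv7_mobile", "ubuntu64_vm_armv7_large"]:
            # 7200 seconds is the 120-minute timeout discussed above
            skipconfig_defaults_platform[slave_sp] = (7, 7200)
        else:
            skipconfig_defaults_platform[slave_sp] = DEFAULT_SKIPCONFIG
    return skipconfig_defaults_platform
```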
Comment on attachment 8681402 [details] [diff] [review]
bug1174746-2.patch

and fixed whitespace
Attachment #8681402 - Flags: checked-in+
Hmm, this doesn't seem to have fixed it, I'm investigating further.
I noticed:

http://hg.mozilla.org/build/buildbot-configs/annotate/71d454af8ab3/mozilla-tests/config_seta.py#l80

def sort_android_tests(platform, slave_platform, tests):
    """create a dictionary that maps slave platform to tests"""
    """initialize the dictionary of tests per platform"""
    tests_by_slave_platform = {}
    for s in slave_platform:
        tests_by_slave_platform[s] = []
    for t in tests:
        if t.split()[-1].startswith('plain-reftest'):
            tests_by_slave_platform[slave_platform[0]].append(t)
        elif t.split()[-1].startswith('crashtest'):
            tests_by_slave_platform[slave_platform[0]].append(t)
        elif t.split()[-1].startswith('jsreftest'):
            tests_by_slave_platform[slave_platform[0]].append(t)
        else:
            tests_by_slave_platform[slave_platform[1]].append(t)
    return tests_by_slave_platform

For Android 4.3, I think that associates plain/crash/js-reftests with ubuntu64_vm_armv7_mobile and all other test types with ubuntu64_vm_armv7_large; for 2.3, plain/crash/js-reftests with ubuntu64_vm_mobile and all other test types with ubuntu64_vm_large.

That seems backwards -- reftests should be on _large -- and too simplistic...aren't some mochitests now on _large?
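A quick way to see the mapping is to run a condensed equivalent of the function on a few builder names. This is only a sketch: the builder names and the order of the slave platform list (mobile first, large second) are assumptions for illustration, not values taken from the configs.

```python
def sort_android_tests(platform, slave_platform, tests):
    """Condensed equivalent of the function quoted above."""
    tests_by_slave_platform = {s: [] for s in slave_platform}
    for t in tests:
        suite = t.split()[-1]  # last whitespace-separated token of the name
        if suite.startswith(("plain-reftest", "crashtest", "jsreftest")):
            tests_by_slave_platform[slave_platform[0]].append(t)
        else:
            tests_by_slave_platform[slave_platform[1]].append(t)
    return tests_by_slave_platform


# Hypothetical inputs: list order and builder names are assumed.
slave_platforms = ["ubuntu64_vm_armv7_mobile", "ubuntu64_vm_armv7_large"]
tests = [
    "Android 4.3 opt test plain-reftest-1",
    "Android 4.3 opt test crashtest-2",
    "Android 4.3 opt test mochitest-3",
]
result = sort_android_tests("android-api-11", slave_platforms, tests)
for slave_sp in slave_platforms:
    print(slave_sp, result[slave_sp])
```

With that list order, the reftest and crashtest builders land under ubuntu64_vm_armv7_mobile and the mochitest builder under ubuntu64_vm_armv7_large, which is the association questioned above.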
Yes, this is it, thanks for looking. I'll fix it.
This is just a temporary fix so that we stop running so many Android 4.3 debug jobs, which is really bugging me. I looked at the code in our configs, and it's really ugly to parse which type of test machine is allocated to each test. So I'm going to use allthethings.json instead, which allows this mapping.
Attachment #8683833 - Flags: review?(gbrown)
Comment on attachment 8683833 [details] [diff] [review]
bug1174746-3.patch

Review of attachment 8683833 [details] [diff] [review]:
-----------------------------------------------------------------

I see -- a great start!
Attachment #8683833 - Flags: review?(gbrown) → review+
Attachment #8683833 - Flags: checked-in+
So this is working now, despite the painful scheduling woes caused in bug 1223042. The next step is to consume allthethings.json to set the instance-type rules that SETA consumes.
I think this can be closed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard