
Android 4.3 opt doesn't appear to be adhering to SETA rules

RESOLVED FIXED

Status

Release Engineering
Platform Support
RESOLVED FIXED
2 years ago
6 months ago

People

(Reporter: kmoir, Assigned: kmoir)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(2 attachments)

(Assignee)

Description

2 years ago
It seems most jobs are running all the time.
(Assignee)

Updated

2 years ago
Assignee: nobody → kmoir
(Assignee)

Updated

2 years ago
Summary: Android 4.3 optdoesn't appear to be adhering to SETA rules → Android 4.3 opt doesn't appear to be adhering to SETA rules
(Assignee)

Comment 1

2 years ago
The problem is that the BRANCHES definition in buildbot-configs includes the SETA specifications; however, by the time it is parsed in buildbotcustom/misc.py, skipconfig is at 0,0. I think this may have to do with the fact that Android 4.3 runs on two different instance types and thus two slave definitions. Debugging :-)
(Assignee)

Comment 2

a year ago
I looked at the logs for this today for both opt and debug. Currently this platform is configured to skip on every 7th commit or a timeout of 1 hour. I think we are hitting the timeout a lot, and that schedules all the tests.

ubuntu64_vm_armv7_large is the platform many of our Android 4.3 jobs run on, although some run on smaller instance types. In any case, here is the aggregated data from the logs on how many jobs are being skipped. I would like to try changing the timeout parameter for Android 4.3 now that we have per-platform SETA configs implemented. I think we are just hitting the timeout for many of these tests, given how few are skipped at the later stages. Thoughts, gbrown and jmaher?


[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "1/7" | wc -l
1472
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "2/7" | wc -l
947
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "3/7" | wc -l
821
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "4/7" | wc -l
684
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "5/7" | wc -l
862
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-debug | grep "6/7" | wc -l
375
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "1/7" | wc -l
1469
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "2/7" | wc -l
955
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "3/7" | wc -l
765
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "4/7" | wc -l
874
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "5/7" | wc -l
622
[kmoir@buildbot-master81.bb.releng.scl3.mozilla.com tests_scheduler]$ grep skip master/twistd.log | grep ubuntu64_vm_armv7_large-opt | grep "6/7" | wc -l
425
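The twelve grep pipelines above could be collapsed into a single pass over the log. The following is only a sketch: it assumes, as the greps do, that a relevant line contains the word "skip", a platform name, and a counter of the form "N/7" somewhere in it. The helper name count_skips is hypothetical, and the real twistd.log line layout is not shown in this bug.

```python
import re
from collections import Counter

# Assumption: platform name and "N/7" counter can appear anywhere on the line,
# in any order, which matches what the chained greps above test for.
PLATFORM_RE = re.compile(r"ubuntu64_vm_armv7_large-(?:opt|debug)")
COUNTER_RE = re.compile(r"\b([1-6])/7\b")


def count_skips(lines):
    """Count skip messages per (platform, skip counter) pair."""
    counts = Counter()
    for line in lines:
        if "skip" not in line:
            continue
        platform = PLATFORM_RE.search(line)
        counter = COUNTER_RE.search(line)
        if platform and counter:
            counts[(platform.group(0), int(counter.group(1)))] += 1
    return counts
```

To run it against a real log, something like count_skips(open("master/twistd.log")) would do, and the resulting Counter can be sorted and printed per platform.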
Flags: needinfo?(jmaher)
Flags: needinfo?(gbrown)
Looking at mozilla-inbound on treeherder, SETA seems to work more effectively (run fewer tests) on Android 2.3; is the timeout different?
Flags: needinfo?(gbrown)
The Android 2.3 tests take <50 minutes; the 4.3 ones (debug) take 70+ minutes. I do wonder if the timeout of 60 minutes is counting completed jobs.
Flags: needinfo?(jmaher)
If we figure out the root cause and determine that increasing the timeout to 120 minutes is more practical, then I would fully support it.
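To make the count-versus-timeout trade-off concrete, here is a toy model of the skip rule. It is a sketch under stated assumptions, not the scheduler code in buildbotcustom: it assumes a job runs when either the skip count is reached or the timeout has elapsed since the job was last scheduled, and that the timer starts at scheduling time (which is exactly the open question above). The function name simulate_scheduling and the push cadence are hypothetical.

```python
def simulate_scheduling(push_times, skipcount, skiptimeout):
    """Return how many pushes get the job scheduled under the assumed rule.

    push_times: ascending push timestamps in seconds.
    skipcount: run on every Nth push (e.g. 7).
    skiptimeout: run if this many seconds passed since the last scheduled run.
    """
    scheduled = 0
    skipped = 0
    last_run = None
    for t in push_times:
        timed_out = last_run is not None and t - last_run >= skiptimeout
        if last_run is None or skipped + 1 >= skipcount or timed_out:
            scheduled += 1
            skipped = 0
            last_run = t
        else:
            skipped += 1
    return scheduled


# Hypothetical cadence: one push every 15 minutes, 29 pushes in total.
pushes = range(0, 29 * 900, 900)
one_hour = simulate_scheduling(pushes, skipcount=7, skiptimeout=3600)
two_hours = simulate_scheduling(pushes, skipcount=7, skiptimeout=7200)
```

In this model, with a one-hour timeout the timer fires before the seventh push arrives, so the counter never gets past 3/7. With a two-hour timeout the skip count becomes the limiting rule.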
(Assignee)

Comment 6

a year ago
Created attachment 8681402 [details] [diff] [review]
bug1174746-2.patch

I'd like to test this and see if the results change
(Assignee)

Updated

a year ago
Attachment #8681402 - Flags: review?(jmaher)
Comment on attachment 8681402 [details] [diff] [review]
bug1174746-2.patch

Review of attachment 8681402 [details] [diff] [review]:
-----------------------------------------------------------------

nice and simple.

::: mozilla-tests/config_seta.py
@@ +32,5 @@
> +      if slave_sp in ["xp-ix"]:
> +          skipconfig_defaults_platform[slave_sp] = (14, 7200)
> +      elif slave_sp in ["ubuntu64_vm_armv7_mobile", "ubuntu64_vm_armv7_large"]:
> +          skipconfig_defaults_platform[slave_sp] = (7, 7200)
> +      else: 

nit: trailing whitespace after the final else
Attachment #8681402 - Flags: review?(jmaher) → review+
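For readers without the full patch, the per-platform override in the reviewed hunk can also be expressed as a lookup table. This is a sketch only: the three overrides come from the hunk above, while the function name build_skipconfig_defaults, the default tuple, and the platform name "other-platform" are hypothetical, since the body of the final else branch is not shown in the review.

```python
# Overrides quoted from the reviewed hunk: (skip count, timeout in seconds).
PLATFORM_OVERRIDES = {
    "xp-ix": (14, 7200),
    "ubuntu64_vm_armv7_mobile": (7, 7200),
    "ubuntu64_vm_armv7_large": (7, 7200),
}


def build_skipconfig_defaults(slave_platforms, default):
    """Map each slave platform to its (skipcount, skiptimeout) pair.

    `default` stands in for whatever the elided else branch assigns.
    """
    return {sp: PLATFORM_OVERRIDES.get(sp, default) for sp in slave_platforms}


# The default value here is a placeholder, not the value used in the patch.
skipconfig_defaults_platform = build_skipconfig_defaults(
    ["xp-ix", "ubuntu64_vm_armv7_large", "other-platform"],
    default=(7, 3600),
)
```

A table keeps the overrides in one place, which may be easier to extend than a growing if/elif chain as more platforms get their own settings.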
(Assignee)

Comment 8

a year ago
Comment on attachment 8681402 [details] [diff] [review]
bug1174746-2.patch

and fixed whitespace
Attachment #8681402 - Flags: checked-in+
(Assignee)

Comment 9

a year ago
Hmm, this doesn't seem to have fixed it. I'm investigating further.
I noticed:

http://hg.mozilla.org/build/buildbot-configs/annotate/71d454af8ab3/mozilla-tests/config_seta.py#l80

def sort_android_tests(platform, slave_platform, tests):
    """create a dictionary that maps slave platform to tests"""
    """initialize the dictionary of tests per platform"""
    tests_by_slave_platform = {}
    for s in slave_platform:
        tests_by_slave_platform[s] = []
    for t in tests:
        if t.split()[-1].startswith('plain-reftest'):
            tests_by_slave_platform[slave_platform[0]].append(t)
        elif t.split()[-1].startswith('crashtest'):
            tests_by_slave_platform[slave_platform[0]].append(t)
        elif t.split()[-1].startswith('jsreftest'):
            tests_by_slave_platform[slave_platform[0]].append(t)
        else:
            tests_by_slave_platform[slave_platform[1]].append(t)
    return tests_by_slave_platform

For Android 4.3, I think that associates plain/crash/js-reftests with ubuntu64_vm_armv7_mobile and all other test types with ubuntu64_vm_armv7_large; for 2.3, plain/crash/js-reftests with ubuntu64_vm_mobile and all other test types with ubuntu64_vm_large.

That seems backwards -- reftests should be on _large -- and too simplistic...aren't some mochitests now on _large?
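One way to avoid the positional slave_platform[0] / slave_platform[1] indexing is to name the two platforms explicitly and pass in the suite prefixes that belong on the large instance. This is a sketch, not the fix that landed: the function name and parameters are hypothetical, and the choice of which suites go on _large is an assumption based only on the remark above that reftests should be there.

```python
def sort_android_tests_by_name(large_platform, small_platform, tests, large_prefixes):
    """Map each slave platform name to the tests that should run on it.

    As in the quoted function, the suite name is taken to be the last
    whitespace-separated token of each test string.
    """
    tests_by_slave_platform = {large_platform: [], small_platform: []}
    for t in tests:
        suite = t.split()[-1]
        # str.startswith accepts a tuple of prefixes.
        if suite.startswith(tuple(large_prefixes)):
            tests_by_slave_platform[large_platform].append(t)
        else:
            tests_by_slave_platform[small_platform].append(t)
    return tests_by_slave_platform


# Illustrative builder names; the prefix list is an assumption.
example = sort_android_tests_by_name(
    "ubuntu64_vm_armv7_large",
    "ubuntu64_vm_armv7_mobile",
    ["Android 4.3 opt test plain-reftest-1", "Android 4.3 opt test mochitest-2"],
    large_prefixes=("plain-reftest", "crashtest", "jsreftest"),
)
```

Because the mapping no longer depends on list order, swapping the order of the slave platform definitions cannot silently reverse the assignment. It is still a hand-maintained prefix list, so it does not answer the question about mochitests that have moved to _large.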
(Assignee)

Comment 11

a year ago
Yes, this is it, thanks for looking. I'll fix it.
(Assignee)

Comment 12

a year ago
Created attachment 8683833 [details] [diff] [review]
bug1174746-3.patch

This is just a temporary fix so we stop running so many Android 4.3 debug jobs, which is really bugging me. I looked at the code in our configs, and it's really ugly to parse which type of test machine is allocated to each test, so I'm going to use allthethings.json instead, which allows this mapping.
Attachment #8683833 - Flags: review?(gbrown)
Comment on attachment 8683833 [details] [diff] [review]
bug1174746-3.patch

Review of attachment 8683833 [details] [diff] [review]:
-----------------------------------------------------------------

I see -- a great start!
Attachment #8683833 - Flags: review?(gbrown) → review+
(Assignee)

Updated

a year ago
Attachment #8683833 - Flags: checked-in+
(Assignee)

Comment 14

a year ago
So this is working now, despite the painful scheduling woes caused in bug 1223042. The next step is to consume allthethings.json for setting the rules re instance type that SETA consumes.
(Assignee)

Comment 15

6 months ago
I think this can be closed.
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED