Closed Bug 1201236 Opened 4 years ago Closed 4 years ago

run tests on Try to generate seta data for Android chunking in bug 1183877

Categories

(Infrastructure & Operations :: CIDuty, task)

task
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kmoir, Assigned: kmoir)

References

Details

Attachments

(8 files, 5 obsolete files)

6.93 KB, text/plain
27.64 KB, patch (jlund: review+, kmoir: checked-in+)
2.95 KB, patch (kmoir: review+, gbrown: checked-in+)
26.63 KB, patch (gbrown: review+, gbrown: checked-in+)
5.10 KB, text/plain
4.91 KB, patch (jlund: review+, kmoir: checked-in+)
2.53 KB, patch
1.42 KB, patch

In order to enable the tests in bug 1183877, we need to run the tests on try to generate SETA data that can be transplanted to run fewer tests on m-i and fxteam.  We simply don't have enough capacity to run 48 more jobs on Android and even if we did, it's not an efficient use of resources.

Given that we plan to use this approach for other platforms (e.g. the new e10s tests), this process should be repeatable.

What known changesets should we run the tests on?
How many times should we run the tests? (100 was mentioned in the last SETA meeting.)
Can we use mozci to run the tests?
Do they have to be run on the weekend?
jmaher: The questions above are for you :-)
Flags: needinfo?(jmaher)
kmoir, thanks for asking!  I don't know the perfect answer, but giving this some thought over the last week or two I have some ideas.

1) we should push to try and get 20 data points from each chunk.  This will tell us the stability of the chunks.  that can be done by default with --rebuild 20.  Ideally this would be a weekend push.

2) I think we should push to try for a series of 20 revisions (tip of mozilla-central).  While this isn't perfect since those are usually the most stable, it gives us a starting point of showing how stable these tests are over time.  This can be scripted fairly easily.

3) maybe a replacement for #2.  look at a similar platform and find all revisions where we had failures, run those revisions on try.  There are 198 test jobs that were real failures on android 4.0 debug plain-reftest-*.  This boils down to 102 revisions and in the last 30 days we have 10:
| 2015-08-06 15:25:41 | plain-reftest-8 | 94831690f525 |
| 2015-08-06 15:37:23 | plain-reftest-8 | 74ecf0f57a56 |
| 2015-08-06 16:40:19 | plain-reftest-8 | d9bb3467a3b2 |
| 2015-08-07 03:42:14 | plain-reftest-1 | dc3a755872e1 |
| 2015-08-25 20:59:19 | plain-reftest-3 | c47ccde36664 |
| 2015-08-31 13:40:53 | plain-reftest-2 | 2734a3110b4a |
| 2015-08-31 14:13:55 | plain-reftest-3 | 008983d8b30a |
| 2015-08-31 15:12:52 | plain-reftest-3 | 6782a855c243 |
| 2015-09-01 12:16:20 | plain-reftest-3 | cea380e8dc0d |
| 2015-09-01 13:06:23 | plain-reftest-3 | 6ddd2771e164 |

That puts the trend at about 10 per month.

Getting more creative, I have data for the last 7 weeks of the actual root cause of the failures:
| 2015-07-17 15:54:30 | plain-reftest-1 | b1b8616162b4 | http://10.26.130.17:30269/tests/layout/reftests/abs-pos//abs-pos-auto-margin-centered.html                                                                                                                                                                       |
| 2015-07-17 15:57:50 | plain-reftest-1 | ab61ece2428c | http://10.26.132.21:30487/tests/layout/reftests/abs-pos//abs-pos-auto-margin-centered.html                                                                                                                                                                       |
| 2015-07-17 16:06:25 | plain-reftest-1 | f6a5ad7edc09 | http://10.26.131.22:30420/tests/layout/reftests/abs-pos//abs-pos-auto-margin-centered.html                                                                                                                                                                       |
| 2015-08-06 15:25:41 | plain-reftest-8 | 94831690f525 | http://10.26.132.18:30448/tests/layout/reftests/writing-mode/abspos//1183431-orthogonal-modes-5a.html,http://10.26.132.18:30448/tests/layout/reftests/writing-mode/abspos//1183431-orthogonal-modes-7a.html,http://10.26.132.18:30448/tests/layout/reftests/writ |
| 2015-08-06 15:37:23 | plain-reftest-8 | 74ecf0f57a56 | http://10.26.132.21:30483/tests/layout/reftests/writing-mode/abspos//1183431-orthogonal-modes-5a.html                                                                                                                                                            |
| 2015-08-06 16:40:19 | plain-reftest-8 | d9bb3467a3b2 | http://10.26.133.18:30535/tests/layout/reftests/writing-mode/abspos//1183431-orthogonal-modes-5a.html                                                                                                                                                            |
| 2015-08-25 20:59:19 | plain-reftest-3 | c47ccde36664 | http://10.26.131.18:30365/tests/layout/reftests/css-gradients//large-gradient-3.html,http://10.26.131.18:30365/tests/layout/reftests/css-gradients//large-gradient-4.html                                                                                        |
| 2015-08-31 13:40:53 | plain-reftest-2 | 2734a3110b4a | http://10.26.132.22:30503/tests/layout/reftests/backgrounds//attachment-local-clipping-color-1.html,http://10.26.132.22:30503/tests/layout/reftests/backgrounds//attachment-local-clipping-color-2.html,http://10.26.132.22:30503/tests/layout/reftests/backgrou |


This should at least help us seed which chunks we care about and what failures to look for.  Since this is a *NEW* platform we cannot expect the same failures.

gbrown, what do you think?
Flags: needinfo?(jmaher) → needinfo?(gbrown)
Assignee: nobody → kmoir
I don't have strong opinions...we should make an effort to be sensible in our changeset selection, but as jmaher suggests, there's probably no "right" answer.

Regarding #3, Android 4.3 opt reftests are a possible alternative to Android 4.0 debug (I'm not sure if 4.0 vs 4.3 or opt vs debug is the more pertinent difference).
Flags: needinfo?(gbrown)
OK, then let's push the full set for these revisions:
94831690f525
74ecf0f57a56
d9bb3467a3b2
dc3a755872e1
c47ccde36664
2734a3110b4a
008983d8b30a
6782a855c243
cea380e8dc0d
6ddd2771e164

We will see if anything interesting comes from it. If not, we can maybe figure out which chunks the specific test failures mentioned in comment 2 fall in, and use those as the chunks we run for every push.
Okay I have some questions because I've never triggered builds like this before

So I was looking at how to run these via mozci, but I'm not sure how to do this with it.

I need to patch the chunking in tree so there are more debug chunks like this
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0cadc2b2dbc7&exclusion_profile=false

But I need to run against a specific revision as well.
So do I checkout one of the revisions in the comment above, patch the chunking, and then push to try with the new revision? Or is there a better way?
Flags: needinfo?(jmaher)
oh, great question- I am glad you asked rather than give yourself a headache.  The way you are thinking is actually correct.

if you are using patches, I would create a new patch:
hg qnew reftest_chunks.patch
<make all the edits>
hg qrefresh
hg qpop

try_syntax = "try: -b d -p android4.3 -u reftest_bomb" <- you get the picture
for rev in revisions:
    hg update rev
    hg qpush
    hg qrefresh -m "update rev: %s; %s" % (rev, try_syntax)
    hg push -f try
    hg qpop

In fact there is pre-existing code for this:
https://github.com/jmaher/alert_manager/blob/master/bisection_script.py#L70

That is intended to read a push range and then find the revisions, so edit the script to your liking or just copy pieces to meet your needs!
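
For anyone repeating this later, the loop above could be sketched in Python roughly as follows. This is a minimal sketch, not the linked bisection script: the helper names `try_push_commands` and `push_all` are made up for illustration, and it assumes the mq extension is enabled, the chunking patch is the next unapplied patch in the queue, and `try` is a configured path alias.

```python
import subprocess


def try_push_commands(rev, try_syntax):
    """Return the hg commands for one try push of the chunking patch on top of rev."""
    message = "update rev: %s; %s" % (rev, try_syntax)
    return [
        ["hg", "update", rev],              # check out the known revision
        ["hg", "qpush"],                    # apply the chunking patch (e.g. reftest_chunks.patch)
        ["hg", "qrefresh", "-m", message],  # the try syntax goes in the commit message
        ["hg", "push", "-f", "try"],        # assumes "try" is a configured path alias
        ["hg", "qpop"],                     # unapply the patch before the next revision
    ]


def push_all(revisions, try_syntax, run=subprocess.check_call):
    """Run the command sequence for every revision; `run` is injectable for dry runs."""
    for rev in revisions:
        for cmd in try_push_commands(rev, try_syntax):
            run(cmd)
```

Calling `push_all(revisions, try_syntax, run=print)` only prints the commands, which is a cheap way to check the sequence before pushing anything.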

When all is said and done, there will be a series of try pushes that we need to visually inspect.
Flags: needinfo?(jmaher)
Thanks Joel!
I ran some tests last night

The builds ran, but I'm trying to figure out why not all of the test jobs were invoked
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdf3068b5827&exclusion_profile=false

like they did here
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0cadc2b2dbc7&exclusion_profile=false

perhaps I have to specify each test chunk separately instead of just reftest, jsreftest, crashtest since the additional chunked builders are missing in buildbot configs
So I changed the try syntax to list all the test chunks explicitly

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b2540af91b4c&exclusion_profile=false

It still only runs the old number of chunks.  Not sure if this is because the builders are missing. But I don't want to enable the 48 new builders on try, because we need SETA data to reduce the load from this change.  Will talk to some folks in releng and see what I can do to test.
I think it is because the builders are missing.

I'm not sure how useful this will be, but see my hacks in the try pushes in https://bugzilla.mozilla.org/show_bug.cgi?id=1140471#c16. Basically I run C1,C2,J1..J6,R1..R16 in the first push. Then in the second push, I hack android_emulator_unittest.py so the C1 job actually runs C3, C2 runs C4, J1 runs J7, etc. Finally I use a similar hack on the 3rd push to pick up the remaining jobs.
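
To make the remapping concrete, here is a rough sketch of which chunk each existing job would run in each push. It is only an illustration of the idea, not the actual android_emulator_unittest.py hack; `chunk_remap` is a hypothetical helper, and the 16 and 48 in the usage note assume plain reftests going from the existing R1..R16 jobs to 48 chunks.

```python
def chunk_remap(base_chunks, total_chunks):
    """Map each existing job number to the chunk it should actually run, per push."""
    pushes = []
    for offset in range(0, total_chunks, base_chunks):
        mapping = {}
        for job in range(1, base_chunks + 1):
            chunk = job + offset
            if chunk <= total_chunks:  # the last push may be only partly filled
                mapping[job] = chunk
        pushes.append(mapping)
    return pushes
```

With `chunk_remap(16, 48)` the first push runs R1..R16 unchanged, the second has the R1 job run chunk 17, and the third picks up the remaining chunks, so three pushes cover everything without any new builders.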
Attached patch bug1183877mh.patch (obsolete) — Splinter Review
mh patches for cedar

I'm going to try running these on cedar before enabling on try to shake out any mh issues.  Testing them in staging is too painful; I would have to handcraft a bunch of test zips with mh changes.
Attached patch bug1201236bb.patch (obsolete) — Splinter Review
buildbot configs to enable on cedar

note for previous patch: I fixed a bunch of PEP 8 warnings as well
Attached file builder.diff
Attachment #8660114 - Attachment is obsolete: true
Attachment #8660091 - Flags: review?(jlund)
Attachment #8660226 - Flags: review?(jlund)
I will look at this tomorrow
Comment on attachment 8660226 [details] [diff] [review]
bug1201236bb.patch

Review of attachment 8660226 [details] [diff] [review]:
-----------------------------------------------------------------

wow, this really exposes the awkwardness required to have different chunking for debug vs opt :(

::: mozilla-tests/mobile_config.py
@@ +2176,5 @@
>       ),
>  ]
>  
> +ANDROID_4_3_MOZHARNESS_DEBUG_TRUNK = [
> +    ('jsreftest-1', {

do the names here mean anything? I can't recall but if they are just names that can be whatever, maybe we should name them with 'debug' in it? don't really mind. not a blocker. I see argument both ways.
Attachment #8660226 - Flags: review?(jlund) → review+
Comment on attachment 8660091 [details] [diff] [review]
bug1183877mh.patch

Review of attachment 8660091 [details] [diff] [review]:
-----------------------------------------------------------------

looks good. one part in question. see comments below.

r- for now. feel free to push back if I'm wrong

::: testing/mozharness/configs/android/androidarm_4_3.py
@@ +155,5 @@
> +                "--http-port=%(http_port)s",
> +                "--ssl-port=%(ssl_port)s",
> +                "--httpd-path", "%(modules_dir)s",
> +                "--symbols-path=%(symbols_path)s",
> +                "--total-chunks=16",

should this be 48 here?

actually, it looks like reftest and reftest-debug in this file already define --total-chunks in 'options'. e.g. https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/configs/android/androidarm_4_3.py#376

I wonder if we are duplicating --total-chunks args
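
A quick way to check for this kind of duplication is to count flag names in the assembled option list. The sketch below is an assumption-laden illustration, not part of mozharness: `duplicated_flags` is a hypothetical helper and it only looks at `--name=value` style entries.

```python
from collections import Counter


def duplicated_flags(options):
    """Return flag names (the text before '=') that appear more than once, sorted."""
    # Values passed as separate arguments (e.g. after "--httpd-path") are ignored.
    names = [opt.split("=", 1)[0] for opt in options if opt.startswith("--")]
    return sorted(name for name, count in Counter(names).items() if count > 1)
```

Run against the combined suite and `options` lists, a non-empty result such as `['--total-chunks']` would confirm the argument is being passed twice.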
Attachment #8660091 - Flags: review?(jlund) → review-
Attached patch bug1183877mh-2.patch (obsolete) — Splinter Review
thanks for catching that, updated patch
Attachment #8660091 - Attachment is obsolete: true
Attachment #8661858 - Flags: review?(jlund)
Attachment #8661858 - Flags: review?(jlund) → review+
Attachment #8660226 - Flags: checked-in+
So the builders are enabled on cedar but the build sendchange is not invoking debug tests for this platform - investigating.
Seems to be working now but all the newly chunked tests are failing like this

09:25:24    ERROR - Can't run command ['/builds/slave/test/build/venv/bin/python', '-u', '/builds/slave/test/build/tests/reftest/remotereftest.py', '--app=org.mozilla.fennec', '--ignore-window-size', '--dm_trans=adb', '--bootstrap', '--remote-webserver=10.0.2.2', '--xre-path=/builds/slave/test/build/hostutils/xre', '--utility-path=/builds/slave/test/build/hostutils/bin', '--http-port=8854', '--ssl-port=4454', '--httpd-path', '/builds/slave/test/build/tests/modules', '--symbols-path=/builds/slave/test/build/symbols', '--total-chunks=48', 'tests/layout/reftests/reftest.list', '--this-chunk=45'] in non-existent directory '/builds/slave/test/build/tests/reftest'!
09:25:24    ERROR - No tests run or test summary not found

http://buildbot-master51.bb.releng.use1.mozilla.com:8201/builders/Android%204.3%20armv7%20API%2011+%20cedar%20debug%20test%20plain-reftest-45/builds/0/steps/run_script/logs/stdio
The problem appears to be that '/builds/slave/test/build/tests/reftest' does not exist, even though /builds/slave/test certainly exists and 'mkdir: /builds/slave/test/build/tests' seemed to succeed. This has puzzled me, but I just noticed this:

09:52:09     INFO - Using the following test package requirements:
09:52:09     INFO - {u'common': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip'],
09:52:09     INFO -  u'cppunittest': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip',
09:52:09     INFO -                   u'fennec-43.0a1.en-US.android-arm.cppunittest.tests.zip'],
09:52:09     INFO -  u'jittest': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip',
09:52:09     INFO -               u'jsshell-android-arm.zip'],
09:52:09     INFO -  u'mochitest': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip',
09:52:09     INFO -                 u'fennec-43.0a1.en-US.android-arm.mochitest.tests.zip'],
09:52:09     INFO -  u'mozbase': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip'],
09:52:09     INFO -  u'reftest': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip',
09:52:09     INFO -               u'fennec-43.0a1.en-US.android-arm.reftest.tests.zip'],
09:52:09     INFO -  u'talos': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip',
09:52:09     INFO -             u'fennec-43.0a1.en-US.android-arm.talos.tests.zip'],
09:52:09     INFO -  u'web-platform': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip',
09:52:09     INFO -                    u'fennec-43.0a1.en-US.android-arm.web-platform.tests.zip'],
09:52:09     INFO -  u'webapprt': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip'],
09:52:09     INFO -  u'xpcshell': [u'fennec-43.0a1.en-US.android-arm.common.tests.zip',
09:52:09     INFO -                u'fennec-43.0a1.en-US.android-arm.xpcshell.tests.zip']}
09:52:09     INFO - Downloading packages: [u'fennec-43.0a1.en-US.android-arm.common.tests.zip'] for test suite category: reftest-debug

I think the "reftest-debug" is not found as a test suite, so only the common zip is downloaded and extracted.
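
The behaviour in the log can be sketched as a lookup with a fallback, plus an alias table as one possible fix. This is a guess at the shape of the logic based on the log output only, not the mozharness source; `SUITE_ALIASES` and `packages_for` are hypothetical names.

```python
# Hypothetical alias table: a debug suite category reuses the packages of its base suite.
SUITE_ALIASES = {"reftest-debug": "reftest"}


def packages_for(category, requirements, aliases=SUITE_ALIASES):
    """Return the test zips for a suite category, falling back to 'common'."""
    key = aliases.get(category, category)
    # An unknown category only gets the common zip, as seen in the log above.
    return requirements.get(key, requirements["common"])


requirements = {
    "common": ["fennec-43.0a1.en-US.android-arm.common.tests.zip"],
    "reftest": [
        "fennec-43.0a1.en-US.android-arm.common.tests.zip",
        "fennec-43.0a1.en-US.android-arm.reftest.tests.zip",
    ],
}
```

Without the alias, `packages_for("reftest-debug", requirements, aliases={})` returns just the common zip; with it, the reftest zip is included as well, which is what the job needs for tests/reftest to exist.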
Now tests are running to completion but the job is still failing.

http://ftp.mozilla.org/pub/mozilla.org/mobile/tinderbox-builds/cedar-android-api-11-debug/1444150175/cedar_ubuntu64_vm_armv7_large-debug_test-jsreftest-2-bm67-tests1-linux64-build0.txt.gz

11:59:20     INFO -  REFTEST INFO | Successful: 10499 (10499 pass, 0 load only)
11:59:20     INFO -  REFTEST INFO | Unexpected: 0 (0 unexpected fail, 0 unexpected pass, 0 unexpected asserts, 0 unexpected fixed asserts, 0 failed load, 0 exception)
11:59:20     INFO -  REFTEST INFO | Known problems: 1 (0 known fail, 0 known asserts, 0 random, 1 skipped, 0 slow)
11:59:20     INFO -  REFTEST INFO | Total canvas count = 0
11:59:20     INFO -  REFTEST TEST-START | Shutdown
11:59:31     INFO -  INFO | automation.py | Application ran for: 0:55:58.241006
...
11:59:35    ERROR - No tests run or test summary not found
Comment on attachment 8670406 [details] [diff] [review]
add test package aliases for Android debug reftests

Thanks for looking at this, I was heads down on another bug today. 

Thanks for fixing this, it looks like the tests ran but from treeherder it appears some timed out.  However, the logs look okay.
Attachment #8670406 - Flags: review?(kmoir) → review+
The log parser was still confused because it could not find a suite name like "reftest-debug". I updated my previous fix with similar aliases for the log parser suite category.
Attachment #8670406 - Attachment is obsolete: true
Attachment #8670840 - Flags: review?(kmoir)
This is kmoir's bug1183877mh-2.patch, simply updated for bitrot. I had to re-land this on cedar because the recent merge changed the android reftest config pretty significantly.
Comment on attachment 8670840 [details] [diff] [review]
use test suite category aliases for Android debug reftests

thanks!
Attachment #8670840 - Flags: review?(kmoir) → review+
Yay! Looks like this worked for everything but J7 which ran too long.
https://treeherder.mozilla.org/#/jobs?repo=cedar&revision=9029c344b38a

I'm thinking about how to enable these jobs on try, without having them run by default for everyone, so that we can generate the SETA data for the changesets in comment 4.  Right now you cannot set whether jobs run by default on try on a per-platform basis.
I was thinking about this during my run this morning.  

One strategy would be to enable the additional chunks on try in stages
i.e. new crashtests, new js-reftests, then plain-reftests 
run tests with the revisions above and gather data

Or enable all the new builders on try, run the tests on the weekend, then disable the builders.

I'll write patches for one of those strategies

I increased the size of the tst-emulator64 pool again this morning in bug 1204756
Whatever strategy you use, let's just keep the bug updated.  I will be truly offline Friday-Sunday for a 3-day weekend; on Monday I will be happy to analyze data and pre-seed SETA.
Attached patch bug1201236try.patch (obsolete) — Splinter Review
This patch is so I can generate SETA data on a weekend with the revs from comment 4.  I would not enable this patch in production during the week because we don't have enough capacity to handle it, the entire point of this bug :-)
builder diff for try builders
Attachment #8671517 - Flags: review?(jlund)
Comment on attachment 8671517 [details] [diff] [review]
bug1201236try.patch

Review of attachment 8671517 [details] [diff] [review]:
-----------------------------------------------------------------

I'm not sure about the high level logic but patch looks sane :)

::: mozilla-tests/mobile_config.py
@@ +3391,5 @@
>              'opt_unittest_suites': deepcopy(ANDROID_4_3_AWS_DICT['opt_unittest_suites']),
>              'debug_unittest_suites': deepcopy(ANDROID_4_3_AWS_DICT['debug_unittest_suites']),
>  }
>  
> +# bug 1201236 run tests on Try to generate seta data for debug Android chunking 

nit: whitespace
Attachment #8671517 - Flags: review?(jlund) → review+
Depends on: 1213661
actually I'm going to try to run them in sets on try so I don't add too many jobs at once. First I will try to run jsreftests and crashtests since that only adds 16 new jobs. Will add builder diff.
Attachment #8673253 - Flags: review?(jlund)
builder.diff
Joel, do you have some newer revisions that you would suggest, other than the ones in comment #4?  There have been a lot of mh etc. changes since then and the current patches don't apply.
Flags: needinfo?(jmaher)
(In reply to Kim Moir [:kmoir] from comment #35)
> Created attachment 8673253 [details] [diff] [review]
> bug1201236bb-4.patch
> 
> actually I'm going to try to run them in sets on try so I don't add too many
> jobs at once. First will try to run jsreftests and crashtests since that
> only adds 16 new jobs. Will add builder diff.

nice!
Comment on attachment 8673253 [details] [diff] [review]
bug1201236bb-4.patch

Review of attachment 8673253 [details] [diff] [review]:
-----------------------------------------------------------------

::: mozilla-tests/mobile_config.py
@@ +3399,5 @@
>              'opt_unittest_suites': deepcopy(ANDROID_4_3_AWS_DICT['opt_unittest_suites']),
>              'debug_unittest_suites': deepcopy(ANDROID_4_3_AWS_DICT['debug_unittest_suites']),
>  }
>  
> +# bug 1201236 run tests on Try to generate seta data for debug Android chunking 

nit: whitespace
Attachment #8673253 - Flags: review?(jlund) → review+
in the last month:
select testtype,revision from testjobs where platform='android-4-0-armv7-api11' and date>'2015-09-14' and testtype like 'plain-reftest-%' and buildtype='debug' and failure_classification=2 group by bugid;
+-----------------+--------------+
| testtype        | revision     |
+-----------------+--------------+
| plain-reftest-3 | 25a0211ff6a9 |
| plain-reftest-2 | 0a34ebc90b12 |
| plain-reftest-8 | a62c1724002a |
| plain-reftest-8 | 0573cd4aed27 |
| plain-reftest-1 | 41be7cabf48d |
| plain-reftest-8 | b2af2e6f96f5 |
| plain-reftest-3 | 5b5ebeeb2979 |
| plain-reftest-3 | d6793bb3e45b |
| plain-reftest-8 | b3937b455406 |
| plain-reftest-1 | 9020444fc6ea |
| plain-reftest-5 | c7cc8e32b62f |
| plain-reftest-7 | c93cd72f8d40 |
| plain-reftest-5 | efcfe0c08c24 |
| plain-reftest-2 | 9fa379806dd4 |
| plain-reftest-3 | 8b3a4fdebdfc |
| plain-reftest-3 | a8e146496aec |
| plain-reftest-7 | e768739ec812 |
| plain-reftest-3 | 40ddc543dd16 |
+-----------------+--------------+
18 rows in set (33.50 sec)


breaking this down by ones where we have failures, I see:
| plain-reftest-3 | 25a0211ff6a9 | http://10.26.128.17:30088/tests/dom/canvas/test/reftest//capturestream.html
| plain-reftest-2 | 0a34ebc90b12 | http://10.26.133.20:30562/tests/layout/reftests/bugs//400826-1.html
| plain-reftest-8 | a62c1724002a | http://10.26.128.18:30103/tests/layout/reftests/transform-3d//preserve3d-1a.html,http://10.26.128.18:30103/tests/layout/reftests/transform-3d//preserve3d-4a.html,http://10.26.128.18:30103/tests/layout/reftests/transform-3d//preserve3d-5a.html
| plain-reftest-8 | 0573cd4aed27 | http://10.26.130.22:30323/tests/layout/reftests/transform//compound-1a.html,http://10.26.130.22:30323/tests/layout/reftests/transform//rotate-2a.html,http://10.26.130.22:30323/tests/layout/reftests/transform//rotate-2b.html
| plain-reftest-1 | 41be7cabf48d | http://10.26.133.23:30599/tests/layout/reftests/async-scrolling//bg-fixed-child-mask.html
| plain-reftest-3 | d6793bb3e45b | http://10.26.133.17:30526/tests/layout/reftests/bugs//1114526-1.html
| plain-reftest-8 | b3937b455406 | http://10.26.132.17:30441/tests/layout/reftests/text//control-chars-04a.html,http://10.26.132.17:30441/tests/layout/reftests/text//control-chars-04b.html,http://10.26.132.17:30441/tests/layout/reftests/text//control-chars-04c.html
| plain-reftest-5 | c7cc8e32b62f | http://10.26.137.22:30908/tests/layout/reftests/layers//opacity-blending.html
| plain-reftest-5 | efcfe0c08c24 | http://10.26.129.17:30178/tests/layout/reftests/margin-collapsing//block-min-height-last-child-4b-dyn.html
| plain-reftest-2 | 9fa379806dd4 | http://10.26.129.21:30220/tests/layout/reftests/backgrounds//attachment-local-clipping-color-4.html,http://10.26.129.21:30220/tests/layout/reftests/backgrounds//attachment-local-clipping-color-5.html
| plain-reftest-3 | 8b3a4fdebdfc | http://10.26.129.22:30233/tests/dom/canvas/test/reftest//capturestream.html


This boils down to 11 rows:
25a0211ff6a9
0a34ebc90b12
a62c1724002a
0573cd4aed27
41be7cabf48d
d6793bb3e45b
b3937b455406
c7cc8e32b62f
efcfe0c08c24
9fa379806dd4
8b3a4fdebdfc


Let's go with those revisions and see where we end up!  Ideally we can see the same errors and determine which chunks they land in.
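
If this needs repeating for another platform, pulling the revision list out of the pasted rows can be scripted. A minimal sketch, assuming rows in the `| testtype | revision | failing URLs` shape shown above; `failing_revisions` is a hypothetical helper, not part of SETA or alert_manager.

```python
def failing_revisions(rows):
    """Return unique revisions, in first-seen order, from '| chunk | rev | urls' rows."""
    seen = []
    for row in rows:
        fields = [f.strip() for f in row.strip().strip("|").split("|")]
        # Keep only rows that actually list at least one failing test URL.
        if len(fields) >= 3 and fields[2] and fields[1] not in seen:
            seen.append(fields[1])
    return seen


rows = [
    "| plain-reftest-3 | 25a0211ff6a9 | http://10.26.128.17:30088/tests/dom/canvas/test/reftest//capturestream.html",
    "| plain-reftest-2 | 0a34ebc90b12 | http://10.26.133.20:30562/tests/layout/reftests/bugs//400826-1.html",
    "| plain-reftest-3 | 8b3a4fdebdfc | http://10.26.129.22:30233/tests/dom/canvas/test/reftest//capturestream.html",
]
revisions = failing_revisions(rows)
```

The resulting list can then be fed straight into whatever loop does the try pushes.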
Comment on attachment 8673253 [details] [diff] [review]
bug1201236bb-4.patch

fixed whitespace
Attachment #8673253 - Flags: checked-in+
Attachment #8671517 - Attachment is obsolete: true
Keywords: leave-open
Comment on attachment 8670844 [details] [diff] [review]
bug1183877mh-2.patch updated for bitrot

r=jlund carried
Attachment #8670844 - Flags: review+
Attachment #8661858 - Attachment is obsolete: true
Joel, I just noticed that this data is for reftests, and I have jsreftests and crashtests currently enabled on try (see comment #35). I assume the revision data is different for these tests. Or do we not need data for them, and care more about reftests because they are the bulk of the jobs being added?

As an aside, I currently have 
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1029f363a24f&exclusion_profile=false rev: 0a34ebc90b12
https://treeherder.mozilla.org/#/jobs?repo=try&revision=98f50850a318&exclusion_profile=false rev a62c1724002a   rev: 0a34ebc90b12
https://treeherder.mozilla.org/#/jobs?repo=try&revision=94887b64c4cc&exclusion_profile=false rev: 25a0211ff6a9

running, but again I think these changesets are specific to reftests.  I can disable the jsreftests and crashtests and enable the plain-reftests tomorrow on try.
Flags: needinfo?(jmaher)
Thanks Joel, try runs for jsreftest are running here
28e36be3f00b https://treeherder.mozilla.org/#/jobs?repo=try&revision=5088804cef86&exclusion_profile=false
b5a67fbf0805 https://treeherder.mozilla.org/#/jobs?repo=try&revision=b799da04abbb&exclusion_profile=false

After they are done, I'll enable the reftest builders on try and run with other revisions
run reftests only
Attachment #8670840 - Flags: checked-in+
Attachment #8670844 - Flags: checked-in+
good news, the two pushes to find jsreftest failures found them exactly and we have j5/j6 to worry about!  SETA is all preseeded with these jobs.
reftest jobs are enabled now and I have a try run for the first revision running.  If this works, I'll run them for the other revisions

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9a3c2914f432&exclusion_profile=false
try runs for reftests on the revisions from comment 40 are running now. jmaher will enter the data into SETA when finished, and then I have a patch in bug 1183877 to enable these on 4.3 and disable on 4.0.  Then we are all done with pandas except for talos which is being actively worked on in bug 1170685.
A lot of data to sift through, but I have some recommendations.  First off, there were some odd results: in most cases the original tests failed as originally stated, but on the new emulator chunks either everything passed or different tests failed.

In about half the cases we saw what we expected.  One quirky common failure is in R27:
http://ftp.mozilla.org/pub/mozilla.org/mobile/try-builds/kmoir@mozilla.com-5e22725fbb1e/try-android-api-11-debug/try_ubuntu64_vm_armv7_large-debug_test-plain-reftest-27-bm120-tests1-linux64-build0.txt.gz
REFTEST TEST-UNEXPECTED-FAIL | http://10.0.2.2:8854/tests/layout/reftests/image-rect/background-zoom-2.html | image comparison (==), max difference: 16, number of differing pixels: 11

this is a known intermittent in bug 1101424, so maybe we don't need to worry- I just saw it often.

With that said, here is the list of reftest jobs we should run:
R2, R5, R15, R16, R18, R27, R31, R37, R44, R45

That is 10 jobs, when we originally had 8, so more jobs, more runtime.  Actually R27 was a common failure and only on the list because of the intermittent mentioned above, so we can remove it and make the list 9 jobs (out of 48).

I will do that for now, :gbrown, can you take a look at the issue in bug 1101424 to see if we can fix it or disable it?
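
For the record, the job counts above work out as follows. This is just the arithmetic from this comment written down, assuming 48 reftest chunks in total and R27 as the only job dropped for the known intermittent.

```python
candidates = ["R2", "R5", "R15", "R16", "R18", "R27", "R31", "R37", "R44", "R45"]
known_intermittent = {"R27"}  # only on the list because of bug 1101424
total_chunks = 48

# Drop the job that was only listed because of the known intermittent failure.
selected = [job for job in candidates if job not in known_intermittent]
share = len(selected) / total_chunks  # fraction of reftest chunks run on every push
print("%d of %d reftest chunks" % (len(selected), total_chunks))
```

So a bit under a fifth of the new reftest chunks would run on every push, with the rest left to SETA's reduced schedule.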
Flags: needinfo?(gbrown)
These try runs finished on Friday and Joel enabled the SETA data.  I'm watching m-i and fx-team to see that the tests are being skipped as the builders were enabled this morning in bug 1213661.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
(In reply to Joel Maher (:jmaher) from comment #52)
> I will do that for now, :gbrown, can you take a look at the issue in bug
> 1101424 to see if we can fix it or disable it?

See https://bugzilla.mozilla.org/show_bug.cgi?id=1101424#c145.
Flags: needinfo?(gbrown)
If we want to, we can uplift changes to beta (Firefox 43); there's just one reftest that needs fuzzing.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fe049152a07a
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations