Closed Bug 1210395 Opened 4 years ago Closed 4 years ago
"Green up" tests on OS X 10
Added several t-yosemite-r7 slaves to slavealloc, created and configured a test master on dev-master2.bb.releng.use1.mozilla.com, and invoked sendchanges for several changesets from mozilla-central. Some jobs still end up with warnings; most of them are web-platform tests (opt and debug), but there are also some mochitest and marionette jobs.
Jobs that finish with warnings:
- both opt & debug:
  test web-platform-tests-2
  test web-platform-tests-3
  test web-platform-tests-reftests
  test mochitest-devtools-chrome-2
  test mochitest-gl
- opt:
  test web-platform-tests-4
- debug:
  test web-platform-tests-5
  test web-platform-tests-7
  test marionette
Talked to Coop about this and we suspect that the issue might be related to moving from one OS version to another (10.10.2 --> 10.10.5). :jmaher, we would appreciate it if you could take a look into this. Thanks!
can you send me a link to try or other branch where I can view these? Ideally try is the best so we can test fixes, retrigger to see patterns, etc. I won't be able to fix everything- this is usually a 100-200 hour project, if it is less, I will make a bigger dent.
We know that openssl no longer allows SSLv3 connections, and that the cert checking stuff in python 2.7.10 has been tightened up. I'm wondering if some of the failures are due to one/both of these.
We were comparing two jobs, the first on an r5 that passed:
http://buildbot-master107.bb.releng.scl3.mozilla.com:8201/builders/Rev5%20MacOSX%20Yosemite%2010.10%20mozilla-central%20opt%20test%20web-platform-tests-2/builds/403/steps/run_script/logs/stdio
The second on an r7 that didn't (same changeset, same software):
http://dev-master2.bb.releng.use1.mozilla.com:8095/builders/Rev5%20MacOSX%20Yosemite%2010.10%20mozilla-central%20opt%20test%20web-platform-tests-2/builds/2/steps/run_script/logs/stdio
In this case and one we looked at before, the r7 does a runner_teardown and gives a WARNING around line 1363, but the r5 does not. In all other instances, a runner_teardown only happens if the test times out or if the end of the test run is reached. I'm wondering if this is significant (why is it doing a teardown here in the middle of the suite, after a successful test?)
Keep in mind this could be an intermittent issue; I really don't see this on mozilla-inbound in the last 24 hours. If you look in the log there is a 'test-unexpected-fail'. This means that we have failures and need to address them unless they are intermittent. Looking at the failure, I see:
https://dxr.mozilla.org/mozilla-central/source/testing/web-platform/meta/fetch/nosniff/image.html.ini?offset=400
Note the specific hardcoding for osx 10.10.2. It sounds like we need to add 10.10.5 in there. There are 412 instances:
https://dxr.mozilla.org/mozilla-central/search?q=10.10.2&redirect=true&case=true&limit=51&offset=351
There is a tool to do this *automated* for web platform tests specifically. If we push to try for all platforms and run ALL the web platform tests, then we can:
* testing/web-platform/update/fetchlogs.py
* ./mach web-platform-tests-update /path/to/logs/*.log
* look at the diff of files, publish the diff for review, land, profit.
For mochitest/reftest/etc. it will be a different story. I think the key is that we get these on try so we can look for patterns and use other tools.
Right, with wpt it's important to remember that there is metadata that determines the expected result and that this metadata is os version specific. So when adding a new platform we need to add new metadata.
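To get a rough picture of how much metadata is pinned to the old OS version, a local checkout can be scanned for the version string. The following is only a minimal sketch: the function name `find_hardcoded_version` is hypothetical, it assumes the metadata lives in `.ini` files under a directory you pass in, and it does a plain substring match (so "10.10.2" would also match a string such as "10.10.20"). It does not replace the automated update tool mentioned above.

```python
import os


def find_hardcoded_version(root, version="10.10.2", suffix=".ini"):
    """Return {path: [line numbers]} for files under root that mention version.

    Hypothetical helper for illustration only; plain substring matching,
    no parsing of the metadata format.
    """
    hits = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(suffix):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="utf-8", errors="replace") as handle:
                # Line numbers are 1-based to match what an editor shows.
                lines = [n for n, line in enumerate(handle, 1) if version in line]
            if lines:
                hits[path] = lines
    return hits
```

Calling it as `find_hardcoded_version("testing/web-platform/meta")` from the top of a checkout (the directory is taken from the dxr link above) would list each metadata file and the lines that would need a matching 10.10.5 entry.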
(In reply to Amy Rich [:arr] [:arich] from comment #3)
> We know that openssl no longer allows SSLv3 connections, and that the cert
> checking stuff in python 2.7.10 has been tightened up. I'm wondering if some
> of the failures are due to one/both of these.
>
> We were comparing two jobs, the first on an r5 that passed:
>
> http://buildbot-master107.bb.releng.scl3.mozilla.com:8201/builders/Rev5%20MacOSX%20Yosemite%2010.10%20mozilla-central%20opt%20test%20web-platform-tests-2/builds/403/steps/run_script/logs/stdio
>
> The second on an r7 that didn't (same changeset, same software):
> http://dev-master2.bb.releng.use1.mozilla.com:8095/builders/Rev5%20MacOSX%20Yosemite%2010.10%20mozilla-central%20opt%20test%20web-platform-tests-2/builds/2/steps/run_script/logs/stdio
>
> In this case and one we looked at before, the r7 does a runner_teardown and
> gives a WARNING around line 1363, but the r5 does not. In all other
> instances, a runner_teardown only happens if the test times out or if the
> end of the test run is reached. I'm wondering if this is significant (why is
> it doing a teardown here in the middle of the suite, after a successful
> test?)
I'm not savvy on the way the test runners work. Maybe it uses different test runners for different test suites? I'm out of my depth on this one.
:kmoir pushed some try tests on yosemite-r7, more details:
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=a479174a8763
- https://treeherder.mozilla.org/#/jobs?repo=try&revision=0084ba8c2789
After the tests were completed, I made a comparison between them:
Failing jobs:
-> mozilla-central opt test gtest
-> mozilla-central debug test gtest
Warning jobs:
-> mozilla-central opt test mochitest-devtools-chrome-2
-> mozilla-central opt test mochitest-gl
-> mozilla-central opt test web-platform-tests-2
-> mozilla-central opt test web-platform-tests-3
-> mozilla-central opt test web-platform-tests-4
-> mozilla-central opt test web-platform-tests-reftests
-> mozilla-central debug test mochitest-2
-> mozilla-central debug test mochitest-devtools-chrome-3
-> mozilla-central debug test mochitest-gl
-> mozilla-central debug test web-platform-tests-2
-> mozilla-central debug test web-platform-tests-3
gtest is 100% failing on every push on every tree on every OS, so you don't have to worry about that.
So I just got mass failures on a try push because of this. I don't understand why we're pushing these machines out to try rather than creating a pool just for some twig and using that to green them up.
That would be bug 1203128 rather than this, but the answer is that it didn't expect that to happen, it thought that it was requiring that you use "-u web-platform-tests[Ubuntu,10.8,10.10.5,Windows XP,Windows 7,Windows 8]" to wind up with the 10.10.5 slaves, and thought https://treeherder.mozilla.org/#/jobs?repo=try&revision=33bc0551337f would still get the 10.10 slaves. ("10.8"? Did you mean 10.6?)
(In reply to Vlad Ciobancai [:vladC] from comment #7)
> :kmoir pushed some try tests on yosemite-r7, more details:
> - https://treeherder.mozilla.org/#/jobs?repo=try&revision=a479174a8763
> - https://treeherder.mozilla.org/#/jobs?repo=try&revision=0084ba8c2789
Trychooser syntax needs to be "try: -b do -p macosx64 -u all[10.10.5] -t all[10.10.5]" with the doubled 10.10.5 that the website won't add to it for you, in order to get talos.
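Since the website does not add the second restriction for you, one way to avoid forgetting it is to generate the line. This is a small sketch only: `build_try_syntax` is a hypothetical helper, not part of trychooser, and it does nothing beyond reproducing the string format quoted in this comment with the same OS restriction applied to both `-u` and `-t`.

```python
def build_try_syntax(platform, os_version, build="do"):
    """Build a try commit-message line restricting unit tests and talos
    to one OS version.

    Hypothetical helper; it only mirrors the string quoted in the comment
    and does not validate anything against the real try parser.
    """
    restriction = "all[{}]".format(os_version)
    # The restriction is deliberately repeated for -u (unit tests) and -t (talos).
    return "try: -b {} -p {} -u {} -t {}".format(
        build, platform, restriction, restriction
    )


print(build_try_syntax("macosx64", "10.10.5"))
# prints: try: -b do -p macosx64 -u all[10.10.5] -t all[10.10.5]
```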
Looks like there is a bug disabling 10.10.5 by default when 10.10 is explicitly specified. I've opened bug 1212887 for this.
from #ateam today:
jmaher jgraham: btw, did you get the osx 10.10.5 data from your try run?
jgraham jmaher: Yeah, but https://github.com/jgraham/treeherder_timeline was supposed to be a sea of green
jgraham Uh
jgraham https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c325655c676
jmaher jgraham: looks like it isn't all green- is there anything I can do to help?
jgraham jmaher: Make try runs faster?
jgraham But I will fix up some of the remaining issues and see what another cycle brings
jgraham But now I have to leave
kmoir jgraham: it looks like some of these tests ran on r5 machines instead of r7
jmaher oh, -u [10.10], not -u [10.10.5]
jgraham Wait, did we fix that bug?
jgraham Last time [10.10] gave me 10.10.5
kmoir I fixed that bug on Friday
kmoir please specify 10.10.5
jgraham Oh, OK
jmaher jgraham: could I hack on the tool to not require windows? maybe that would help speed things up
jgraham Thanks for fixing the bug
Today I ran a try push on the new yosemite-r7 slaves, more details: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6b4a05498515
jmaher, jgraham: did you manage to find a resolution for the web platform tests?
we are not there yet- the long turnaround on try (windows) means we context switch off and finally get around to it much later than optimal. I recall that James was updating some tests and had to fix/remove those before getting to green- but 10.10.2 and 10.10.5 are looking very similar on the latest try push.
Yeah, I think I have "solved" the 10.10.5 problem, but it's mixed in with a pull of those tests from upstream which has introduced some issues. So I am working through those now. https://treeherder.mozilla.org/#/jobs?repo=try&revision=d009e6c5e85f was my last try push, but note that Wr on the base commit was bad, so there is less real orange than it seems.
all done here.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
jmaher, is there a way to uplift the patches from bug 1216542, bug 1216549, bug 1216551, and bug 1223372 to non-trunk branches? We would like to totally decommission the r5 10.10 test machines.
I would be concerned about the patch in bug 1216542 as it touches the js source. It was only necessary for code on trunk, so we could uplift the others and give it a try- not sure how to really try it out. This could be done for Aurora, then the code could uplift to beta; we still have mozilla-release and esr, likewise old school b2g branches- I assume those all run osx jobs.
I mistyped the bug; it is bug 1223372 which was js specific and related to a patch on trunk. I have uplifted the other 3 patches, from bug 1216542, bug 1216549, and bug 1216551.
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations