Closed Bug 1403484 Opened 4 years ago Closed 4 years ago

enable mochitest-webgl tests on taskcluster windows 10 for opt builds

Categories

(Firefox Build System :: Task Configuration, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED
mozilla58

People

(Reporter: grenade, Assigned: grenade)

References

(Depends on 1 open bug)

Details

Attachments

(1 file)

the mochitest-webgl suite succeeds on taskcluster windows 10 opt builds (debug builds cause an infra issue, see bug 1366288). it would be good to have the builds that succeed running regularly.
Wow, that's great to hear. It should help us a lot in solving the issues we see in bug 1261598 and bug 1352671.

Joel, could you take this?
Assignee: relops → nobody
Blocks: 1261598, 1352671
Component: RelOps → Task Configuration
Flags: needinfo?(jmaher)
Product: Infrastructure & Operations → Taskcluster
QA Contact: arich
things are green, and looking at the log files in detail, we are doing the right thing:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=65928e2cc5bcb7a627a1c8db029815b6ee785718&filter-searchStr=webgl&group_state=expanded

there is a lot of blue, and those jobs take a while to return results. I would like blue jobs to be rare, not the normal case.
Flags: needinfo?(jmaher)
No longer blocks: 1366288
Depends on: 1366288
we have a problem with win 10 gpu instances on ec2 going into an impaired state. this shows up in treeherder as blue: the worker simply stops talking and taskcluster treats this as an exception.

i'm testing an idea that rebooting after each test (chunk) on win 10 gpu instances will help reduce the occurrences of the instances going into an impaired state. the necessary config changes have only just been made, so they won't propagate until the currently running instances die off.

will update when i know more.
the reboot appears to be working. we already use the reboot-after-each-task configuration on win 7 gpu, although i don't remember our reasons for that.
Assignee: nobody → rthijssen
Status: NEW → ASSIGNED
Comment on attachment 8912704 [details]
Bug 1403484 - enable mochitest-webgl on tc win 10 with chunks:8;

https://reviewboard.mozilla.org/r/184032/#review189240

++ on the reboot
Attachment #8912704 - Flags: review?(jmaher) → review+
Comment on attachment 8912704 [details]
Bug 1403484 - enable mochitest-webgl on tc win 10 with chunks:8;

https://reviewboard.mozilla.org/r/184032/#review189292

please test on win7 prior to landing.

::: taskcluster/ci/test/tests.yml:917
(Diff revisions 1 - 2)
> -            windows10-64.*/debug: buildbot-bridge/buildbot-bridge
> -            default: null
>      chunks:
>          by-test-platform:
>              android.*: 10
> +            windows.*: 8

this does this for all windows (7/10 opt/debug)
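
For illustration, here is a minimal sketch of why a `windows.*` key catches every Windows platform. It is not the taskgraph implementation: `resolve_by_test_platform` is a hypothetical helper, the platform strings and the `default: 3` entry are assumptions (3 being the previous chunk count mentioned in this bug), and keys are assumed to be regexes that must match the whole platform name.

```python
import re


def resolve_by_test_platform(value, test_platform):
    """Pick the entry of a by-test-platform mapping that applies to test_platform.

    Hypothetical helper for illustration only: an exact key wins, then a
    single full-match regex key, then the "default" entry.
    """
    if not (isinstance(value, dict) and "by-test-platform" in value):
        return value  # plain value, nothing to resolve

    alternatives = value["by-test-platform"]
    if test_platform in alternatives:
        return alternatives[test_platform]

    matches = [
        setting
        for pattern, setting in alternatives.items()
        if pattern != "default" and re.fullmatch(pattern, test_platform)
    ]
    if len(matches) > 1:
        raise ValueError("multiple keys match %r" % test_platform)
    if matches:
        return matches[0]
    if "default" in alternatives:
        return alternatives["default"]
    raise KeyError("no key matches %r" % test_platform)


# Broad key, as in the reviewed diff (the default of 3 is an assumption).
broad = {"by-test-platform": {"android.*": 10, "windows.*": 8, "default": 3}}

# Narrower key that only targets windows 10 opt (hypothetical alternative).
narrow = {"by-test-platform": {"android.*": 10, "windows10-64.*/opt": 8, "default": 3}}

for platform in ("windows10-64/opt", "windows10-64/debug", "windows7-32/debug"):
    print(
        platform,
        resolve_by_test_platform(broad, platform),
        resolve_by_test_platform(narrow, platform),
    )
```

With the broad key all three platforms resolve to 8 chunks; with the narrower key only `windows10-64/opt` does, and the other two fall back to the assumed default.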
Comment on attachment 8912704 [details]
Bug 1403484 - enable mochitest-webgl on tc win 10 with chunks:8;

https://reviewboard.mozilla.org/r/184032/#review189292

now tested (and validated green) on win 7
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2591ece80fdcee8b006fbb5b757267e93969c711&group_state=expanded&filter-searchStr=windows7

> this does this for all windows (7/10 opt/debug)

whoops! thanks for spotting and pointing that out.
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/48299b6e8832
enable mochitest-webgl on tc win 10 opt & set to 8 chunks; r=jmaher
Keywords: checkin-needed
oh, we will have to go back to 3 chunks for win10/debug, or figure out how to move debug at the same time.
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/42af702edce0
enable mochitest-webgl on tc win 10 with chunks:8; r=jmaher
Keywords: checkin-needed
Backed out for frequently failing mochitest-gl dom/canvas/test/webgl-mochitest/test_capture.html on Windows 7 debug:

https://hg.mozilla.org/integration/autoland/rev/a9a870e907d155dcff8a5308da0962e05e7ee10e

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=42af702edce0d7a382a6b4a9785d9ecb07840f7d&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=usercancel&filter-resultStatus=runnable
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=134353880&repo=autoland

09:44:59     INFO -  670 INFO None671 INFO TEST-START | dom/canvas/test/webgl-mochitest/test_capture.html
09:44:59     INFO -  GECKO(5196) | --DOMWINDOW == 31 (085F3400) [pid = 4144] [serial = 27] [outer = 00000000] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
09:44:59     INFO -  GECKO(5196) | --DOMWINDOW == 30 (08617800) [pid = 4144] [serial = 23] [outer = 00000000] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
09:44:59     INFO -  GECKO(5196) | --DOMWINDOW == 29 (0866BC00) [pid = 4144] [serial = 29] [outer = 00000000] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
09:44:59     INFO -  GECKO(5196) | --DOMWINDOW == 28 (086B4800) [pid = 4144] [serial = 25] [outer = 00000000] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
09:44:59     INFO -  GECKO(5196) | Hit MOZ_CRASH() at z:/build/build/src/memory/build/mozjemalloc.cpp:1445
09:50:29     INFO -  Buffered messages finished
09:50:29  WARNING -  TEST-UNEXPECTED-TIMEOUT | dom/canvas/test/webgl-mochitest/test_capture.html | application timed out after 330 seconds with no output
09:50:29     INFO -  672 ERROR Force-terminating active process(es).
Flags: needinfo?(rthijssen)
So the underlying problem here is bug 1397201. And I find it fantastic that this actually happens that frequently on a TC worker now!! Right now I have a Windows loaner from Buildbot but haven't had the time to fully set everything up. Given that a TC worker should make that process simpler, I might want to switch to one.
do we think the changes here (enable win 10 and change chunk size from 3 to 8 on both 7 & 10) are contributing to the failure? i had the impression that the win 7 problem has been going on since before this change...
Flags: needinfo?(rthijssen)
We already had those failures on Windows 10 Buildbot machines, just with a different signature before I fixed bug 1403616 at the end of last week.
i don't think we can attribute the timeout failures on win 7 to this patch. here, we're only enabling mochitest-webgl on windows 10 and changing the chunk size from 3 to 8. i wanted to keep the chunk size consistent on both win 7 and win 10 to make it easier to compare test results between the two platforms.

i ran a try push with the chunk size change applied to win 10 only (i.e. no change to win 7 chunk size) and win 7 had the same timeout errors.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=00d32defe354ddd24ece9473a7e5edff9ac7afa7&group_state=expanded

in any case, i can't see how enabling this suite on windows 10 can be causing failures on windows 7. i think the win 7 failures are a different issue entirely and shouldn't hold back this patch which only affects windows 10.
I would like to land this again, but mozreview doesn't allow me to land a patch that has already landed once. I will land on inbound manually.
:grenade, can you reopen the review inside of mozreview?
reopened...
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/c618c6c92b9c
enable mochitest-webgl on tc win 10 with chunks:8; r=jmaher
https://hg.mozilla.org/mozilla-central/rev/c618c6c92b9c
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
Product: TaskCluster → Firefox Build System