Closed Bug 1400071 Opened 7 years ago Closed 7 years ago

some windows 10 ix machines appear to be failing webgl tests

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jmaher, Assigned: aobreja)

References

Details

https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba49bf51ce76fb1e625752a04f2af25e5f4d0f7&filter-tier=1&filter-tier=2&filter-tier=3

Ignoring the failures with stars, the others are failing with error messages like this:

08:50:29 INFO - 25 INFO TEST-START | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html
08:50:29 INFO - GECKO(1544) | Unable to read VR Path Registry from C:\Users\cltbld\AppData\Local\openvr\openvrpaths.vrpath
08:50:29 INFO - GECKO(1544) | JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Refused to create WebGL2 context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
08:50:29 INFO - GECKO(1544) | JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Failed to create WebGL context: WebGL creation failed:
08:50:29 INFO - GECKO(1544) | * Refused to create WebGL2 context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
08:50:29 INFO - TEST-INFO | started process screenshot
08:50:29 INFO - TEST-INFO | screenshot: exit 0
08:50:29 INFO - Buffered messages logged at 08:50:29
08:50:29 INFO - 26 INFO TEST-PASS | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html | A valid string reason is expected
08:50:29 INFO - 27 INFO TEST-PASS | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html | Reason cannot be empty
08:50:29 INFO - Buffered messages finished
08:50:29 ERROR - 28 INFO TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html | Unable to fetch WebGL rendering context for Canvas

I suspect these are related to the fact that there is a need to do something in the BIOS for the video card which is unique to Windows 10. The range of machine numbers I see is 77-93; possibly it is larger than this, and knowing this pool may be a clue to a batch of machines that was converted.
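For triaging logs like the one above, a small sketch can pull the blacklist feature-failure reason out of raw mozharness log lines. This is a hypothetical helper, not part of the test harness; the regex is keyed to the exact Gecko warning text shown in the log:

```python
import re

# Matches the WebGL blacklist warning seen in the log above, e.g.
# "Refused to create WebGL2 context because of blacklist entry:
#  FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR"
BLACKLIST_RE = re.compile(
    r"Refused to create (WebGL2?) context because of blacklist entry: "
    r"(FEATURE_FAILURE_\w+)"
)

def blacklist_reasons(log_lines):
    """Return (context, feature-failure) pairs found in mozharness log lines."""
    return [m.groups()
            for line in log_lines
            for m in [BLACKLIST_RE.search(line)]
            if m]

sample = [
    "08:50:29 INFO - GECKO(1544) | ... Refused to create WebGL2 context "
    "because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR",
    "08:50:29 INFO - TEST-INFO | screenshot: exit 0",
]
print(blacklist_reasons(sample))
# [('WebGL2', 'FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR')]
```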
:arr, can you help get this into the right place to verify we have set up these machines properly?
Flags: needinfo?(arich)
Arr is currently on PTO for the next few days. @alin, @andrei: do you have any ideas?
Flags: needinfo?(aselagea)
Flags: needinfo?(aobreja)
:aobreja is looking at this today to see if there's anything we can assist with before arr's return. Removing :aselagea's NI.
Flags: needinfo?(aselagea)
> I suspect these are related to the fact that there is a need to do something
> in the bios for the video card which is unique to windows 10.

Did some investigation and came to the conclusion that this issue is not caused by the video card or some BIOS settings, since the e10s mochitest jobs finish successfully most of the time and only sometimes fail, even on the same machine. By checking [1] and filtering for "failed" jobs, "e10s mochitest", or the name of the failed job, we see that some failed jobs have green status on other machines with the same configuration. Also, checking [2] shows that lots of webgl tests are green; in fact very few have failed. I don't think this is a BIOS setting or a machine issue, since these tests run fine on the other machines; this sounds more like an intermittent test issue. I'm pretty sure that if we let these tests run for a few days, the same tests will pass on machines where they failed before.

[1] https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix?numbuilds=500
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba49bf51ce76fb1e625752a04f2af25e5f4d0f7&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=131158264
Flags: needinfo?(aobreja)
Thanks for looking into this. It could well be related to the tests; I just found it odd that the failures were in a small range of machine numbers. Let me collect more failures.

Looking at machine stats, I see consistent failures for gpu and webgl jobs:

075 has a green instance
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-076?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-077?numbuilds=500 (all 'gpu' jobs are red)
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-078?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-079?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-080?numbuilds=500
...
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-090?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-091?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-092?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-093?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-094?numbuilds=500
095 had no instances
096 had green instances

On machines outside of that range I don't see failures on gpu or gl- jobs:

https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-103?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-105?numbuilds=500

I really think this is something related to the machines. I have seen this before and the patterns are really odd, but I am doing many retriggers to see if there are any new patterns.
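The "contiguous block of red machines" pattern described above can be spot-checked mechanically. A hypothetical sketch that groups failing slave names by number to find contiguous ranges; the input data here is illustrative (modeled on the 077-094 block plus one invented stray machine), not pulled from buildapi:

```python
def failing_ranges(failing_slaves):
    """Group machine names like 't-w1064-ix-077' into contiguous number ranges."""
    nums = sorted(int(name.rsplit("-", 1)[1]) for name in failing_slaves)
    ranges, start, prev = [], None, None
    for n in nums:
        if start is None:
            start = prev = n
        elif n == prev + 1:
            prev = n
        else:
            ranges.append((start, prev))
            start = prev = n
    if start is not None:
        ranges.append((start, prev))
    return ranges

# Illustrative: the failing block from this bug plus a hypothetical stray machine.
failing = [f"t-w1064-ix-{n:03d}" for n in range(77, 95)] + ["t-w1064-ix-042"]
print(failing_ranges(failing))
# [(42, 42), (77, 94)]
```

A lone stray failure is noise; a tight contiguous range like (77, 94) points at a hardware or imaging batch, which is what happened here.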
this was also seen by :philor in reference to failures seen on mozilla-beta in bug 1400099. I believe this is hardware related and would like to see these machines reimaged.
Disabled t-w1064-ix-076 through t-w1064-ix-095.
Blocks: 1397225
Did some investigation into why only this range causes problems, and it seems that the machines in [076-095] had the onboard graphics enabled and were also running at a lower resolution. I fixed that by disabling the onboard graphics, verifying that all machines are at the right resolution, and re-enabling them. The problem should be fixed, since these machines are now configured the same way as the others that don't cause issues.
The problem is solved; I don't see any other failing webgl tests since the change. I'll mark this bug as resolved. If anything changes, please feel free to re-open the bug.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(arich)
Resolution: --- → FIXED
Moving this into our courtyard and giving credit to :aobreja for fixing this.
Assignee: nobody → aobreja
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard