some windows 10 ix machines appear to be failing webgl tests

RESOLVED FIXED

Status

Release Engineering
Buildduty
RESOLVED FIXED
2 months ago
2 months ago

People

(Reporter: jmaher, Assigned: aobreja)

Tracking

Firefox Tracking Flags

(Not tracked)

Details

(Reporter)

Description

2 months ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba49bf51ce76fb1e625752a04f2af25e5f4d0f7&filter-tier=1&filter-tier=2&filter-tier=3

ignoring the failures with stars, the others are failing with error messages like this:
08:50:29     INFO -  25 INFO TEST-START | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html
08:50:29     INFO -  GECKO(1544) | Unable to read VR Path Registry from C:\Users\cltbld\AppData\Local\openvr\openvrpaths.vrpath
08:50:29     INFO -  GECKO(1544) | JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Refused to create WebGL2 context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
08:50:29     INFO -  GECKO(1544) | JavaScript warning: http://mochi.test:8888/tests/dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js, line 1443: Error: WebGL warning: Failed to create WebGL context: WebGL creation failed:
08:50:29     INFO -  GECKO(1544) | * Refused to create WebGL2 context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR
08:50:29     INFO -  TEST-INFO | started process screenshot
08:50:29     INFO -  TEST-INFO | screenshot: exit 0
08:50:29     INFO -  Buffered messages logged at 08:50:29
08:50:29     INFO -  26 INFO TEST-PASS | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html | A valid string reason is expected
08:50:29     INFO -  27 INFO TEST-PASS | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html | Reason cannot be empty
08:50:29     INFO -  Buffered messages finished
08:50:29    ERROR -  28 INFO TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html | Unable to fetch WebGL rendering context for Canvas
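
For triaging many logs like the one above, a minimal sketch of a scanner is shown here. It only assumes the two line shapes visible in this excerpt (the "Refused to create ... because of blacklist entry" warning and the TEST-UNEXPECTED-FAIL line); the function name `summarize_log` and the regular expressions are illustrative, not part of any Mozilla tooling.

```python
import re

# Matches the blacklist refusal seen in the excerpt; group 2 is the reason code.
BLACKLIST_RE = re.compile(
    r"Refused to create (WebGL2?) context because of blacklist entry: (\w+)"
)
# Matches mochitest unexpected-failure lines: test path, then failure message.
FAIL_RE = re.compile(r"TEST-UNEXPECTED-FAIL \| (\S+) \| (.+)$")


def summarize_log(lines):
    """Return (set of blacklist reason codes, list of (test, message) failures)."""
    reasons = set()
    failures = []
    for line in lines:
        m = BLACKLIST_RE.search(line)
        if m:
            reasons.add(m.group(2))
        m = FAIL_RE.search(line)
        if m:
            failures.append((m.group(1), m.group(2).strip()))
    return reasons, failures


# Two lines copied from the log excerpt in this bug.
sample = [
    "08:50:29     INFO -  GECKO(1544) | * Refused to create WebGL2 context because of blacklist entry: FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR",
    "08:50:29    ERROR -  28 INFO TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_2_conformance2__attribs__gl-vertexattribipointer-offsets.html | Unable to fetch WebGL rendering context for Canvas",
]
reasons, failures = summarize_log(sample)
print(reasons)
print(failures)
```

A log that reports FEATURE_FAILURE_UNKNOWN_DEVICE_VENDOR alongside the "Unable to fetch WebGL rendering context" failure points at how the machine presents its GPU rather than at the individual test.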


I suspect these are related to the fact that there is a need to do something in the bios for the video card which is unique to windows 10.


the range I see is 77-93; it may be larger than this, and the fact that the failures cluster in this pool is a clue that they come from a batch of machines that was converted.
(Reporter)

Comment 1

2 months ago
:arr, can you help get this into the right place to verify we have set up these machines properly?
Flags: needinfo?(arich)
Arr is currently on PTO for the next few days.
@alin, @andrei: do you have any idea?
Flags: needinfo?(aselagea)
Flags: needinfo?(aobreja)
:aobreja is looking at this today to see if there's anything we can assist with before arr's return. Removing :aselagea's NI.
Flags: needinfo?(aselagea)
(Assignee)

Comment 4

2 months ago
> I suspect these are related to the fact that there is a need to do something
> in the bios for the video card which is unique to windows 10.

Did some investigation and came to the conclusion that this issue is not caused by the video card or by some BIOS setting, since the e10s mochitest jobs finish successfully most of the time and only sometimes fail, even on the same machine.
By checking [1] and filtering for "failed" jobs, "e10s mochitest", or the name of the failed job, we see that some of the failed jobs are green on other machines that have the same configuration. Also, by checking [2] you can see that lots of webgl tests are green. In fact, very few have failed.

I don't think this is a BIOS setting or a machine issue, since these tests run fine on the other machines; this sounds more like an occasional exception. I'm pretty sure that if we let these tests run for a few days, the same test will pass on a machine where it failed before.

[1] https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix?numbuilds=500
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=aba49bf51ce76fb1e625752a04f2af25e5f4d0f7&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=131158264
Flags: needinfo?(aobreja)
(Reporter)

Comment 5

2 months ago
thanks for looking into this. It could well be related to the tests; I just found it odd that the failures were in a small range of machine numbers. Let me collect more failures.

looking at machine stats, I see consistent failures for gpu and webgl jobs:
75 has a green instance
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-076?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-077?numbuilds=500 (all 'gpu' jobs are red)
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-078?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-079?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-080?numbuilds=500
...
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-090?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-091?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-092?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-093?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-094?numbuilds=500
95 had no instances
96 had green instances


and on machines outside of the range I don't see failures on gpu or gl- jobs:
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-103?numbuilds=500
https://secure.pub.build.mozilla.org/buildapi/recent/t-w1064-ix-105?numbuilds=500
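
The per-machine links above all follow one URL pattern, so the full list (including the machines elided by "...") can be generated rather than typed by hand. This is a rough sketch: the helper names are made up for illustration, and the range 076-094 plus the two control machines are simply the ones listed in this comment.

```python
# URL prefix copied from the links in this comment.
BASE = "https://secure.pub.build.mozilla.org/buildapi/recent/"


def machine_name(n):
    """Zero-pad the machine number to three digits, e.g. 77 -> t-w1064-ix-077."""
    return f"t-w1064-ix-{n:03d}"


def recent_jobs_url(n, numbuilds=500):
    """Build the 'recent jobs' URL for one machine."""
    return f"{BASE}{machine_name(n)}?numbuilds={numbuilds}"


suspect = range(76, 95)   # 076 through 094, the machines showing red gpu/webgl jobs
controls = [103, 105]     # machines outside the range, used for comparison

for n in list(suspect) + controls:
    print(recent_jobs_url(n))
```

Opening the suspect and control pages side by side makes it easier to see whether the red gpu and webgl jobs are confined to one block of machine numbers.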


I really think this is something related to the machines. I have seen this before and the patterns are really odd, but I am doing many retriggers to see if there are any new patterns.
(Reporter)

Comment 6

2 months ago
this was also seen by :philor in reference to failures seen on mozilla-beta in bug 1400099.  I believe this is hardware related and would like to see these machines reimaged.
Disabled t-w1064-ix-076 through t-w1064-ix-095.
Blocks: 1397225
Also caused reftest failures, bug 1400004 / https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1400004
Duplicate of this bug: 1400004
(Assignee)

Comment 10

2 months ago
Did some investigation into why only this range causes problems, and it seems that the machines in [076-095] had the onboard graphics enabled and were also running at a lower resolution.
I fixed that by disabling the onboard graphics, verifying that all machines are at the right resolution, and enabling them again. The problem should be fixed, since these machines are now configured the same way as the others that don't cause issues.
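
The check described here (compare each machine against a known-good one) can be sketched as a small drift report. Everything in this example is hypothetical: the `DisplayConfig` fields, the `find_drift` helper, and especially the resolution values, which the bug does not state. How the settings are actually collected from the machines is out of scope.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayConfig:
    onboard_graphics_enabled: bool
    resolution: tuple  # (width, height)


def find_drift(reference, machines):
    """Return {machine name: [fields that differ from the reference config]}."""
    drift = {}
    for name, cfg in sorted(machines.items()):
        diffs = [
            field
            for field in ("onboard_graphics_enabled", "resolution")
            if getattr(cfg, field) != getattr(reference, field)
        ]
        if diffs:
            drift[name] = diffs
    return drift


# Made-up values for illustration only; the real resolutions are not given in this bug.
reference = DisplayConfig(onboard_graphics_enabled=False, resolution=(1600, 1200))
machines = {
    "t-w1064-ix-077": DisplayConfig(True, (1024, 768)),
    "t-w1064-ix-103": DisplayConfig(False, (1600, 1200)),
}
print(find_drift(reference, machines))
```

Running a comparison like this across the whole pool after a reimage or conversion would surface a misconfigured batch before it shows up as test failures.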
(Assignee)

Comment 11

2 months ago
The problem is solved; I haven't seen any other failing webgl tests since the change.
I'll mark this bug as resolved. If anything changes, please feel free to re-open the bug.
Status: NEW → RESOLVED
Last Resolved: 2 months ago
Flags: needinfo?(arich)
Resolution: --- → FIXED
Moving this into our courtyard and giving credit to :aobreja for fixing this.
Assignee: nobody → aobreja
Component: Platform Support → Buildduty

Comment 13

2 months ago
3 failures in 943 pushes (0.003 failures/push) were associated with this bug in the last 7 days.    

Repository breakdown:
* try: 3

Platform breakdown:
* windows10-64: 3

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1400071&startday=2017-09-18&endday=2017-09-24&tree=all