Closed
Bug 1298285
Opened 8 years ago
Closed 7 years ago
Intermittent dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | getError expected: NO_ERROR. Was CONTEXT_LOST_WEBGL : Should be no errors
Categories
(Core :: Graphics: CanvasWebGL, defect, P5)
RESOLVED
INCOMPLETE
People
(Reporter: intermittent-bug-filer, Assigned: cleu)
References
Details
(Keywords: intermittent-failure, Whiteboard: [gfx-noted][stockwell disabled])
Attachments
(1 file, 2 obsolete files)
Filed by: philringnalda [at] gmail.com
https://treeherder.mozilla.org/logviewer.html#?job_id=34727958&repo=mozilla-inbound
https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win32/1472181618/mozilla-inbound_win7_ix_test-mochitest-gl-2-bm119-tests1-windows-build79.txt.gz
Comment 1•8 years ago
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Updated•8 years ago
Whiteboard: [gfx-noted]
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 5•7 years ago
This seems to fail mostly on win7-pgo. The recent spike is high and it appears to have started on May 31st. I don't see any errors in 12+ hours on trunk branches; possibly this is fixed or greatly reduced?
From this log: https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=104673572&lineNumber=7346
I see this error:
14:23:59 INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | Buffer was the correct size: 1680x1050
14:23:59 INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | context was created properly
14:23:59 INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | getError was expected value: NO_ERROR : Should be no errors
14:23:59 INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | Buffer was the correct size: 1680x1050
14:23:59 INFO - TEST-PASS | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | context was created properly
14:23:59 INFO - Buffered messages finished
14:23:59 INFO - TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | getError expected: NO_ERROR. Was CONTEXT_LOST_WEBGL : Should be no errors
14:23:59 INFO - reportResults@dom/canvas/test/webgl-conf/mochi-single.html?checkout/conformance/context/context-release-upon-reload.html:22:7
14:23:59 INFO - reportTestResultsToHarness@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:116:5
14:23:59 INFO - testFailed@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:246:5
14:23:59 INFO - glErrorShouldBeImpl@dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js:1590:5
14:23:59 INFO - glErrorShouldBe@dom/canvas/test/webgl-conf/checkout/js/webgl-test-utils.js:1564:3
14:23:59 INFO - testContext@dom/canvas/test/webgl-conf/checkout/conformance/context/context-release-upon-reload.html:66:3
14:23:59 INFO - @dom/canvas/test/webgl-conf/checkout/conformance/context/context-release-upon-reload.html:83:5
14:23:59 INFO - EventListener.handleEvent*@dom/canvas/test/webgl-conf/checkout/conformance/context/context-release-upon-reload.html:81:1
14:23:59 INFO - Not taking screenshot here: see the one that was previously logged
14:23:59 INFO - TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | Buffer was the wrong size: 0x0
14:23:59 INFO - reportResults@dom/canvas/test/webgl-conf/mochi-single.html?checkout/conformance/context/context-release-upon-reload.html:22:7
14:23:59 INFO - reportTestResultsToHarness@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:116:5
14:23:59 INFO - testFailed@dom/canvas/test/webgl-conf/checkout/js/js-test-pre.js:246:5
14:23:59 INFO - testContext@dom/canvas/test/webgl-conf/checkout/conformance/context/context-release-upon-reload.html:70:5
14:23:59 INFO - @dom/canvas/test/webgl-conf/checkout/conformance/context/context-release-upon-reload.html:83:5
14:23:59 INFO - EventListener.handleEvent*@dom/canvas/test/webgl-conf/checkout/conformance/context/context-release-upon-reload.html:81:1
14:29:09 INFO - Not taking screenshot here: see the one that was previously logged
14:29:09 INFO - TEST-UNEXPECTED-FAIL | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | Test timed out.
14:29:09 INFO - reportError@SimpleTest/TestRunner.js:121:7
14:29:09 INFO - TestRunner._checkForHangs@SimpleTest/TestRunner.js:142:7
14:29:09 INFO - setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
(same frame repeated)
14:29:09 INFO - TestRunner.runTests@SimpleTest/TestRunner.js:380:5
14:29:09 INFO - RunSet.runtests@SimpleTest/setup.js:194:3
14:29:09 INFO - RunSet.runall@SimpleTest/setup.js:173:5
14:29:09 INFO - hookupTests@SimpleTest/setup.js:266:5
14:29:09 INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
14:29:09 INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
14:29:09 INFO - EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
14:29:09 INFO - hookup@SimpleTest/setup.js:246:5
14:29:09 INFO - EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&hideResultsTable=1&manifestFile=tests.json&dumpOutputDirectory=c%3A%5Cusers%5Ccltbld%5Cappdata%5Clocal%5Ctemp&cleanupCrashes=true:11:1
14:29:10 INFO - GECKO(3668) | MEMORY STAT | vsize 1611MB | vsizeMaxContiguous 95MB | residentFast 135MB | heapAllocated 66MB
14:29:10 INFO - TEST-OK | dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html | took 311824ms
:milan, can you find someone to look at this in the next 2 weeks, as this seems to have increased?
Flags: needinfo?(milan)
Whiteboard: [gfx-noted] → [gfx-noted][stockwell needswork]
Did we change the hardware/drivers on the systems that run these tests?
Assignee | ||
Comment 7•7 years ago
I'll look into it.
Assignee | ||
Updated•7 years ago
Assignee: nobody → cleu
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 9•7 years ago
This failure seems to occur only in 32-bit non-e10s mode, which makes me suspect a memory issue. I tried to reproduce it on my local VM with the same configuration, but no luck yet. I also watched the MEMORY STAT output while running this mochitest: my local VM has a smaller vsize (about 700~900 MB) and a bigger vsizeMaxContiguous (about 200~400 MB), while tryserver has a bigger vsize (about 1600 MB) and a smaller vsizeMaxContiguous (about 150 MB). This may indicate more memory fragmentation on tryserver, which is a potential cause of this failure.
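The vsize and vsizeMaxContiguous comparison above can be automated when scanning many logs. The following is a minimal sketch, assuming the MEMORY STAT line format seen in the log quoted earlier in this bug; the function names and the 200 MB threshold are hypothetical and chosen only for illustration.

```python
import re

# Matches fields such as "| vsize 1611MB" in a mochitest MEMORY STAT line.
_FIELD = re.compile(r"\|\s*(\w+)\s+(\d+)MB")


def parse_memory_stat(line):
    """Return {field name: size in MB}, or None if this is not a MEMORY STAT line."""
    if "MEMORY STAT" not in line:
        return None
    return {name: int(value) for name, value in _FIELD.findall(line)}


def looks_fragmented(stats, min_contiguous_mb=200):
    """Flag a run whose largest contiguous free block is below a chosen threshold.

    The 200 MB default is an assumption for illustration, not a Gecko constant.
    """
    return stats["vsizeMaxContiguous"] < min_contiguous_mb


# Line taken from the failing log quoted in this bug.
line = ("GECKO(3668) | MEMORY STAT | vsize 1611MB | vsizeMaxContiguous 95MB "
        "| residentFast 135MB | heapAllocated 66MB")
stats = parse_memory_stat(line)
print(stats, looks_fragmented(stats))
```

Running this over the MEMORY STAT lines of passing and failing jobs would show whether low vsizeMaxContiguous values line up with the failures.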
Comment 10•7 years ago
We have one-click loaners available. If you click on a job inside of treeherder (try or integration branch), the job details that display include an option for a 'one click loaner'. There is a wizard once you get into the shell (via the browser) to set up and run a specific test job. We also have the ability to change the image if you feel there are things to do there. We share the linux64 image, but have :i386 libraries installed so that the 32-bit browser and tools run successfully.
Assignee | ||
Comment 11•7 years ago
This failure happens on win7 32-bit. Windows VMs are not supported by one-click loaners, AFAIK.
Comment 12•7 years ago
:lenzak, do you have any updates on this intermittent? It looks to be failing at the same rate.
Flags: needinfo?(cleu)
Assignee | ||
Comment 13•7 years ago
I am still investigating it. Since I cannot reproduce it on my Windows 7 32-bit VM, I can only add logging and push to the try server to gather information. I initially thought it was caused by a GL context being mistakenly discarded under our maximum-live-context policy, but that turned out to have nothing to do with this failure. I am now printing every context's memory address and comparing the event logs to find what force-discards the context and makes the test fail.
Flags: needinfo?(cleu)
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 15•7 years ago
I think the context is force-lost because of a swap failure in WebGLContext::PresentScreenBuffer. Now I will investigate why it fails. Since it only happens in the 32-bit non-e10s configuration, I suspect it's caused by OOM.
Assignee | ||
Comment 16•7 years ago
OK, now I can confirm it's an OOM issue. https://dxr.mozilla.org/mozilla-central/rev/95543bdc59bd038a3d5d084b85a4fec493c349ee/gfx/layers/client/CanvasClient.cpp#484 Aside from being unable to allocate a new back screen buffer, a warning about failing to allocate a TextureClient for the canvas, which is usually caused by memory pressure, always appears just before this test failure. This also explains why the failure only happens in the 32-bit non-e10s configuration. I think I cannot reproduce it on my local VM because the OOM condition only occurs when the VM is running multiple mochitest tasks, so maybe this issue will be fixed if we isolate this test or split the suite into even smaller chunks.
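To make the failure chain concrete, here is a toy model of the sequence described in this bug: a buffer allocation fails during present, the context is force-lost, and the test then sees CONTEXT_LOST_WEBGL and a 0x0 buffer. This is a sketch only and is not Gecko's WebGLContext; the class, the allocator callback, and the choice to report 0x0 for a lost context are assumptions made to match the failing log.

```python
NO_ERROR = 0
CONTEXT_LOST_WEBGL = 0x9242  # enum value from the WebGL specification


class ToyWebGLContext:
    """Simplified model of context loss; not Gecko's implementation."""

    def __init__(self, width, height, allocate_buffer):
        self._size = (width, height)
        self._allocate_buffer = allocate_buffer  # hypothetical callback: True on success
        self._lost = False
        self._loss_reported = False

    def present(self):
        """Try to swap in a new back buffer; a failed allocation loses the context."""
        if self._lost:
            return False
        if not self._allocate_buffer(*self._size):
            self._lost = True
            return False
        return True

    def get_error(self):
        # Per the WebGL spec, a lost context reports CONTEXT_LOST_WEBGL once.
        if self._lost and not self._loss_reported:
            self._loss_reported = True
            return CONTEXT_LOST_WEBGL
        return NO_ERROR

    def drawing_buffer_size(self):
        # Assumption: a lost context reports 0x0, matching the failing log.
        return (0, 0) if self._lost else self._size


def out_of_memory(width, height):
    """Stand-in allocator that always fails, as under memory pressure."""
    return False


ctx = ToyWebGLContext(1680, 1050, out_of_memory)
ctx.present()
```

With the failing allocator, the first `get_error()` call returns `CONTEXT_LOST_WEBGL` and `drawing_buffer_size()` returns `(0, 0)`, which mirrors the two TEST-UNEXPECTED-FAIL lines in the log.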
Comment 17•7 years ago
The machines we run on have 15 GB of memory available. Is it possible that we are at the memory limit most of the time and just happen to cross it 5-10% of the time? Is it possible that when this fails there is another condition causing us to use much more memory or not free up previously used memory? Typically we run tests per directory, which translates to per manifest. Could we split the manifest into two parts? In other directories we are able to run large volumes of tests in a single mochitest session. We always seem to fail on the same test: dom/canvas/test/webgl-conf/generated/test_conformance__context__context-release-upon-reload.html Maybe this indicates that this test or a previously run test is the root cause?
Assignee | ||
Comment 18•7 years ago
Yes, this VM has 16 GB of memory, but it runs 32-bit Win7, so only 3.2 GB is available. Moreover, a 32-bit Windows app usually runs into memory problems once a single process uses more than about 1.8 GB of memory. I think that also explains why this only happens in non-e10s mode.
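The 32-bit limits mentioned above can be checked with simple arithmetic. The sketch below relies on one fact not stated in this bug: by default, a 32-bit process on 32-bit Windows gets 2 GiB of user-mode virtual address space. The vsize figure comes from the MEMORY STAT line in the failing log, and treating its "MB" as MiB is an assumption.

```python
GIB = 2**30
MIB = 2**20

# Total bytes addressable with a 32-bit pointer.
address_space_bytes = 2**32

# Default user-mode virtual address space for one 32-bit process on 32-bit Windows.
user_space_bytes = 2 * GIB

# vsize reported in the MEMORY STAT line of the failing log (assumed to be MiB).
vsize_mb = 1611

# Address space left for the process before it hits the per-process limit.
headroom_mb = user_space_bytes // MIB - vsize_mb

print(address_space_bytes // GIB, "GiB addressable in total")
print(headroom_mb, "MB of address space left for the process")
```

Under these assumptions the process has only a few hundred MB of address space left, and the largest contiguous free block in the log (95 MB) is smaller still, so a large buffer allocation can fail even though the machine has plenty of physical memory.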
Comment 19•7 years ago
Good point about 32-bit, I overlooked that. Do we get value in testing non-e10s mode? In 7 weeks (Firefox 57 on trunk) we will disable all non-e10s tests when there is an e10s version running, so in this case we will disable the non-e10s webgl tests. We could do this earlier :)
Assignee | ||
Comment 20•7 years ago
Actually, there are several intermittent webgl test failures that happen only in win7 32-bit non-e10s mode, not just this one. To avoid more oranges, maybe we can disable mochitest-gl for all win32 non-e10s runs?
Comment 21•7 years ago
That is very easy to do; I would like to hear from :milan on that before making a quick decision.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 23•7 years ago
This patch adds some logging for the failures related to this test failure. I think it will help future diagnosis if similar intermittent failures happen.
Comment hidden (mozreview-request) |
Comment 25•7 years ago
mozreview-review |
Comment on attachment 8879493
Bug 1298285 - Add Logs to diagnose GL context lost caused by swap failure;
https://reviewboard.mozilla.org/r/150800/#review155930
These are normal, and shouldn't bring down a debug build normally.
Attachment #8879493 -
Flags: review?(jgilbert) → review-
Comment hidden (Intermittent Failures Robot) |
Comment 36•7 years ago
While we are waiting on the needinfo for the larger question of all mochitest-gl, let's skip this one test on win, non-e10s, since it fails so frequently.
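For readers unfamiliar with mochitest manifests, a skip like the one proposed here is usually expressed as a `skip-if` annotation on the test's manifest entry. The fragment below is a sketch only: the actual patch is not shown in this bug, and both the section name and the file it would live in are assumptions.

```ini
[generated/test_conformance__context__context-release-upon-reload.html]
# Hypothetical annotation: skip on Windows when e10s is off.
skip-if = (os == 'win' && !e10s)
```

A condition this narrow keeps the test running on every other platform and in e10s mode, so coverage is lost only for the configuration that hits the OOM.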
Attachment #8886620 -
Flags: review?(jmaher)
Comment 37•7 years ago
Sorry, attached empty patch earlier.
Attachment #8886620 -
Attachment is obsolete: true
Attachment #8886620 -
Flags: review?(jmaher)
Attachment #8886621 -
Flags: review?(jmaher)
Comment 38•7 years ago
Comment on attachment 8886621
skip on win, non-e10s
Tests are not running anymore on win7 non-e10s as per bug 1379868.
Attachment #8886621 -
Flags: review?(jmaher)
Updated•7 years ago
Attachment #8886621 -
Attachment is obsolete: true
Updated•7 years ago
Whiteboard: [gfx-noted][stockwell needswork] → [gfx-noted][stockwell disabled]
Comment hidden (Intermittent Failures Robot) |
Updated•7 years ago
Flags: needinfo?(milan)
Comment hidden (Intermittent Failures Robot) |
Comment 45•7 years ago
Bulk priority update of open intermittent test failure bugs. P3 => P5 https://bugzilla.mozilla.org/show_bug.cgi?id=1381960
Priority: P3 → P5
Comment 46•7 years ago
https://wiki.mozilla.org/Bugmasters#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
Comment hidden (Intermittent Failures Robot) |