Closed Bug 1373107 Opened 7 years ago Closed 7 years ago

Intermittent dom/crypto/test/test_WebCrypto_Workers.html | Test timed out.

Categories

(Core :: DOM: Security, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE

People

(Reporter: intermittent-bug-filer, Unassigned)

Details

(Keywords: bulk-close-intermittents, intermittent-failure, Whiteboard: [domsecurity-intermittent][stockwell unknown])

Priority: -- → P3
Whiteboard: [domsecurity-intermittent]
this is a linux debug failure- which started June 14th and on the 16th/17th the failure rate increased and has been fairly consistent:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1373107&startday=2017-06-12&endday=2017-06-23&tree=all

here is a recent linux64 log:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=109432872&lineNumber=4046

an related text:
[task 2017-06-23T02:08:04.344963Z] 02:08:04     INFO - TEST-START | dom/crypto/test/test_WebCrypto_Workers.html
[task 2017-06-23T02:08:04.609558Z] 02:08:04     INFO - GECKO(1636) | ++DOMWINDOW == 20 (0xd9069800) [pid = 1677] [serial = 36] [outer = 0xdaedc000]
[task 2017-06-23T02:08:06.786626Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 19 (0xd701f800) [pid = 1677] [serial = 30] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_RSA_PSS.html]
[task 2017-06-23T02:08:06.788955Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 18 (0xd7041c00) [pid = 1677] [serial = 24] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_Normalize.html]
[task 2017-06-23T02:08:06.790412Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 17 (0xd9ea3400) [pid = 1677] [serial = 22] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_JWK.html]
[task 2017-06-23T02:08:06.792926Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 16 (0xd7242000) [pid = 1677] [serial = 32] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_Reject_Generating_Keys_Without_Usages.html]
[task 2017-06-23T02:08:06.797010Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 15 (0xd701bc00) [pid = 1677] [serial = 28] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_RSA_OAEP.html]
[task 2017-06-23T02:08:06.801651Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 14 (0xd7026c00) [pid = 1677] [serial = 31] [outer = (nil)] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
[task 2017-06-23T02:08:06.809054Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 13 (0xd703c400) [pid = 1677] [serial = 20] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_Import_Multiple_Identical_Keys.html]
[task 2017-06-23T02:08:06.811307Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 12 (0xd9ed4400) [pid = 1677] [serial = 16] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_ECDSA.html]
[task 2017-06-23T02:08:06.818212Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 11 (0xd7249400) [pid = 1677] [serial = 26] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_PBKDF2.html]
[task 2017-06-23T02:08:06.822956Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 10 (0xd9e1d000) [pid = 1677] [serial = 14] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_ECDH.html]
[task 2017-06-23T02:08:06.825758Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 9 (0xe8f27c00) [pid = 1677] [serial = 8] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto.html]
[task 2017-06-23T02:08:06.829537Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 8 (0xd9ee8000) [pid = 1677] [serial = 12] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_DH.html]
[task 2017-06-23T02:08:06.831982Z] 02:08:06     INFO - GECKO(1636) | --DOMWINDOW == 7 (0xd9ea6c00) [pid = 1677] [serial = 18] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_HKDF.html]
[task 2017-06-23T02:08:08.043324Z] 02:08:08     INFO - GECKO(1636) | [Child 1677] WARNING: Unable to find interface object on global: file /home/worker/workspace/build/src/dom/base/nsDOMClassInfo.cpp, line 1807
[task 2017-06-23T02:08:09.817066Z] 02:08:09     INFO - GECKO(1636) | MEMORY STAT | vsize 707MB | residentFast 98MB | heapAllocated 19MB
[task 2017-06-23T02:08:10.359171Z] 02:08:10     INFO - TEST-OK | dom/crypto/test/test_WebCrypto_Workers.html | took 6005ms
[task 2017-06-23T02:08:12.457565Z] 02:08:12     INFO - GECKO(1636) | --DOMWINDOW == 6 (0xd7041800) [pid = 1677] [serial = 33] [outer = (nil)] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
[task 2017-06-23T02:08:21.316946Z] 02:08:21     INFO - GECKO(1636) | --DOMWINDOW == 5 (0xd724f000) [pid = 1677] [serial = 35] [outer = (nil)] [url = http://mochi.test:8888/tests/SimpleTest/iframe-between-tests.html]
[task 2017-06-23T02:08:21.321160Z] 02:08:21     INFO - GECKO(1636) | --DOMWINDOW == 4 (0xd7245c00) [pid = 1677] [serial = 34] [outer = (nil)] [url = http://mochi.test:8888/tests/dom/crypto/test/test_WebCrypto_Structured_Cloning.html]
[task 2017-06-23T02:12:59.271678Z] 02:12:59     INFO - GECKO(1636) | [Parent 1636] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file /home/worker/workspace/build/src/netwerk/base/nsChannelClassifier.cpp, line 316
[task 2017-06-23T02:16:59.326236Z] 02:16:59     INFO - GECKO(1636) | [Parent 1636] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file /home/worker/workspace/build/src/netwerk/base/nsChannelClassifier.cpp, line 316
[task 2017-06-23T02:16:59.345594Z] 02:16:59     INFO - GECKO(1636) | [Parent 1636] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80070057: file /home/worker/workspace/build/src/netwerk/base/nsChannelClassifier.cpp, line 316
[task 2017-06-23T02:18:24.996084Z] 02:18:24     INFO - TEST-INFO | started process screentopng
[task 2017-06-23T02:18:25.605053Z] 02:18:25     INFO - TEST-INFO | screentopng: exit 0
[task 2017-06-23T02:18:25.614986Z] 02:18:25     INFO - TEST-UNEXPECTED-FAIL | dom/crypto/test/test_WebCrypto_Workers.html | Test timed out. 
[task 2017-06-23T02:18:25.615368Z] 02:18:25     INFO -     reportError@SimpleTest/TestRunner.js:121:7
[task 2017-06-23T02:18:25.615678Z] 02:18:25     INFO -     TestRunner._checkForHangs@SimpleTest/TestRunner.js:142:7
[task 2017-06-23T02:18:25.617029Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.620088Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.623350Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.625330Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.627228Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.629397Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.631383Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.633353Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.637250Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.639222Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.641271Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.643185Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.645113Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.647033Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.649124Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.651079Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.653802Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.655888Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.657765Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.659601Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.661428Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.663228Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.665073Z] 02:18:25     INFO -     setTimeout handler*TestRunner._checkForHangs@SimpleTest/TestRunner.js:163:5
[task 2017-06-23T02:18:25.666913Z] 02:18:25     INFO -     TestRunner.runTests@SimpleTest/TestRunner.js:380:5
[task 2017-06-23T02:18:25.670526Z] 02:18:25     INFO -     RunSet.runtests@SimpleTest/setup.js:194:3
[task 2017-06-23T02:18:25.672827Z] 02:18:25     INFO -     RunSet.runall@SimpleTest/setup.js:173:5
[task 2017-06-23T02:18:25.674755Z] 02:18:25     INFO -     hookupTests@SimpleTest/setup.js:266:5
[task 2017-06-23T02:18:25.676572Z] 02:18:25     INFO - parseTestManifest@http://mochi.test:8888/manifestLibrary.js:36:5
[task 2017-06-23T02:18:25.678498Z] 02:18:25     INFO - getTestManifest/req.onload@http://mochi.test:8888/manifestLibrary.js:49:11
[task 2017-06-23T02:18:25.680365Z] 02:18:25     INFO - EventHandlerNonNull*getTestManifest@http://mochi.test:8888/manifestLibrary.js:45:3
[task 2017-06-23T02:18:25.682173Z] 02:18:25     INFO -     hookup@SimpleTest/setup.js:246:5
[task 2017-06-23T02:18:25.684134Z] 02:18:25     INFO - EventHandlerNonNull*@http://mochi.test:8888/tests?autorun=1&closeWhenDone=1&consoleLevel=INFO&manifestFile=tests.json&dumpOutputDirectory=%2Ftmp&cleanupCrashes=true:11:1
[task 2017-06-23T02:18:25.996258Z] 02:18:25    ERROR - [SimpleTest.finish()] this test already called finish!
[task 2017-06-23T02:18:26.016026Z] 02:18:26     INFO - TEST-UNEXPECTED-ERROR | dom/crypto/test/test_WebCrypto_Workers.html | called finish() multiple times
[task 2017-0


in fact we get a lot more info in the log, it appears we call finish many times.  This sounds like we are receiving events or stuck in a loop where we finsish but never clean up the handler.

:ckerschb, I see you are the triage owner for the dom:security component, could you find someone from the dom:security team to look into this in the next couple weeks?
Flags: needinfo?(ckerschb)
Whiteboard: [domsecurity-intermittent] → [domsecurity-intermittent][stockwell needswork]
as a note, this is run at a reduced frequency via SETA, so on high load every 5th push, overall I think we average out to every 4th push.  So 40 failures in 880 pushes, looks like 1 failure/22 pushes, but if we run infrequently this is more like 1 failure for every 5 or 6 pushes.
(In reply to Joel Maher ( :jmaher) from comment #2)
> :ckerschb, I see you are the triage owner for the dom:security component,
> could you find someone from the dom:security team to look into this in the
> next couple weeks?

Tim, any chance you could take that on?
Flags: needinfo?(ckerschb) → needinfo?(ttaubert)
This looks weird. It reports TEST-OK for test_WebCrypto_Workers.html, so that test has finished, but then the suite just hangs for 10 minutes.
Flags: needinfo?(ttaubert)
:ttaubert, any chance you are making progress on this bug?
Flags: needinfo?(ttaubert)
Looks like forced termination of workers could be a problem here. The last message printed before we time out is:

> TEST-PASS | dom/crypto/test/test_WebCrypto_Workers.html | Web crypto in terminating Worker

https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-beta&job_id=113339254&lineNumber=2804

That test was added in bug 1281874. Maybe Ben has an idea what's going wrong here? Maybe a deadlock of some sort?
Flags: needinfo?(ttaubert) → needinfo?(bkelly)
It looks like I can easily reproduce this locally by just running:

./mach mochitest dom/crypto/test/test_WebCrypto_Workers.html
Well, any of the dom/crypto tests seem to time out after completion. Not sure if that's an issue with the test runner/suite itself. The mochitest runner is killed after 330s of inactivity.
(In reply to Tim Taubert [:ttaubert] from comment #13)
> Looks like forced termination of workers could be a problem here.

The whole point of the test was to validate webcrypto handles worker shutdown correctly.  In that past it just caused UAF which was quite bad.

I'm sorry, but someone familiar with webcrypto needs to see where the test is getting stuck.  What async step is not firing, etc.  Then we can try to diagnose how this can happen in the worker shutdown code.

For example, during worker shutdown WorkerRunnable::Dispatch() can fail.  In cases where it needs to be infallible code should fall back to dispatching a control runnable.
Flags: needinfo?(bkelly)
(In reply to Tim Taubert [:ttaubert] from comment #15)
> Well, any of the dom/crypto tests seem to time out after completion. Not
> sure if that's an issue with the test runner/suite itself. The mochitest
> runner is killed after 330s of inactivity.

I mistakenly thought that the process would quit when running single tests, looks like it doesn't. I can't reproduce any of this on my machine, unfortunately.
Can you add debug to see if this fails in automation when the test times out?

http://searchfox.org/mozilla-central/source/dom/crypto/WebCryptoTask.cpp#422
Flags: needinfo?(ttaubert)
Also note, we should probably change this:

http://searchfox.org/mozilla-central/source/dom/crypto/WebCryptoTask.cpp#387

To use nsIGlobalObject::EventTargetFor() so the runnables are properly labeled on main thread.  This should work for Workers as well as of today's inbound.
(In reply to Ben Kelly [:bkelly] from comment #18)
> Can you add debug to see if this fails in automation when the test times out?
> 
> http://searchfox.org/mozilla-central/source/dom/crypto/WebCryptoTask.cpp#422

I unfortunately don't think I'll have time to do this in the next weeks, or probably until 57. I agree that the best way forward here might be to add a bunch of debug info and either just land them or push to try and retrigger until we hit it.
Flags: needinfo?(ttaubert)
I was actually hoping this would be fixed by bug 1340941, but it seems we've had a couple triggers since that landed.
Whiteboard: [domsecurity-intermittent][stockwell needswork] → [domsecurity-intermittent][stockwell unknown]
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.