Closed Bug 1031928 Opened 10 years ago Closed 4 years ago

Intermittent serviceworkers/test_installation_simple.html | Test timed out

Categories

(Core :: DOM: Workers, defect, P5)

defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox33 --- wontfix
firefox34 --- wontfix
firefox35 --- wontfix
firefox36 --- disabled
firefox37 --- disabled
firefox38 --- affected
firefox-esr31 --- unaffected

People

(Reporter: philor, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [rr])

https://tbpl.mozilla.org/php/getParsedLog.php?id=42669538&tree=Mozilla-Inbound Android 2.3 Armv6 Emulator mozilla-inbound opt test mochitest-9 on 2014-06-27 16:38:29 PDT for push 3f4fe642a893 slave: tst-linux64-spot-024 17:16:15 INFO - 954 INFO TEST-PASS | /tests/dom/workers/test/serviceworkers/test_installation_simple.html | Should fail with a SecurityError 17:16:15 INFO - 955 INFO TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/serviceworkers/test_installation_simple.html | Test timed out. 17:16:15 INFO - 956 INFO TEST-INFO | MEMORY STAT vsize after test: 447393792 17:16:15 INFO - 957 INFO TEST-INFO | MEMORY STAT residentFast after test: 173191168 17:16:15 INFO - 958 INFO TEST-INFO | MEMORY STAT heapAllocated after test: 28566284 17:16:15 INFO - 959 INFO TEST-END | /tests/dom/workers/test/serviceworkers/test_installation_simple.html | finished in 325100ms
https://tbpl.mozilla.org/php/getParsedLog.php?id=42677632&tree=Mozilla-Inbound Ubuntu VM 12.04 mozilla-inbound debug test mochitest-4 on 2014-06-27 19:22:18 PDT for push febfad44b731 https://tbpl.mozilla.org/php/getParsedLog.php?id=42681764&tree=Mozilla-Inbound Android 2.2 Armv6 Tegra mozilla-inbound opt test mochitest-7 on 2014-06-27 21:37:33 PDT for push 13ac876b0439 https://tbpl.mozilla.org/php/getParsedLog.php?id=42722372&tree=Mozilla-Inbound Ubuntu VM 12.04 x64 mozilla-inbound debug test mochitest-4 on 2014-06-29 10:13:24 PDT for push 5cc5f26b237e https://tbpl.mozilla.org/php/getParsedLog.php?id=42675853&tree=Mozilla-Central b2g_emulator_vm mozilla-central opt test mochitest-8 on 2014-06-27 18:44:59 PDT for push 286b6cb59c3e https://tbpl.mozilla.org/php/getParsedLog.php?id=42694170&tree=Mozilla-Central Android 2.3 Emulator mozilla-central opt test mochitest-9 on 2014-06-28 03:46:38 PDT for push 2f5df65e3662 https://tbpl.mozilla.org/php/getParsedLog.php?id=42696815&tree=Mozilla-Central Ubuntu VM 12.04 x64 mozilla-central debug test mochitest-4 on 2014-06-28 04:56:55 PDT for push 6ee5583b0fa6 https://tbpl.mozilla.org/php/getParsedLog.php?id=42705992&tree=Mozilla-Central Ubuntu VM 12.04 x64 mozilla-central debug test mochitest-4 on 2014-06-28 18:47:02 PDT for push afa67a2f7905 https://tbpl.mozilla.org/php/getParsedLog.php?id=42703123&tree=B2g-Inbound Ubuntu VM 12.04 b2g-inbound pgo test mochitest-4 on 2014-06-28 13:13:32 PDT for push 5a0a2d744c3e https://tbpl.mozilla.org/php/getParsedLog.php?id=42722620&tree=B2g-Inbound Ubuntu VM 12.04 b2g-inbound pgo test mochitest-4 on 2014-06-29 10:38:02 PDT for push 8789ff012731
Perhaps a clue, b2g managed to crash instead of just time out in https://tbpl.mozilla.org/php/getParsedLog.php?id=42723771&tree=Mozilla-Inbound
These tests are failing like crazy on Android/B2G. We need action ASAP or we'll need to resort to disabling instead.
Flags: needinfo?(nsm.nikhil)
Assignee: nobody → nsm.nikhil
Flags: needinfo?(nsm.nikhil)
This is likely a bug in the worker error handling. ReportErrorRunnable::WorkerRun returns early due to IsAcceptingEvents() returning false. http://mxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#1310 According to :khuey, we might still want to skip the call to the parent ReportError itself, but we want to do everything else in that function. Waiting for :bent to confirm
Depends on: 1040924
This test was disabled on all platforms except Windows in bug 1040924.
Ben, see comment 75?
Flags: needinfo?(bent.mozilla)
That should be bug 1042908.
Flags: needinfo?(bent.mozilla)
With rr and chaos mode, I was able to record what looks like this timeout. Is the underlying issue known to be bug 1042908, or would it help anyone to debug my recording?
Flags: needinfo?(bent.mozilla)
Whiteboard: [rr]
Fwiw, I don't see the early-return case that was changed in bug 1042908 being hit in my recorded timeout. Not sure if that fix was expected to take care of this too.
Chris, bug 1042908 was just our best guess I think. It's very possible that something else is going wrong here.
Flags: needinfo?(bent.mozilla)
Here's some output before the timeout that looks possibly relevant ---------- [rr 2162 69276]TEST-PASS | /tests/dom/workers/test/serviceworkers/test_installation_simple.html | URL should be of the worker [rr 2162 69355][2162] WARNING: Already had a UpdatePromise. Aborting that one!: file /home/cjones/rr/mozilla-central/dom/workers/ServiceWorkerManager.cpp, line 560 [rr 2162 69426]TEST-PASS | /tests/dom/workers/test/serviceworkers/test_installation_simple.html | First registration should fail with AbortError [rr 2162 70253]TEST-PASS | /tests/dom/workers/test/serviceworkers/test_installation_simple.html | Second registration should succeed [rr 2162 70834][2162] WARNING: Called close() before start()!: 'mStarted', file /home/cjones/rr/mozilla-central/dom/workers/MessagePort.cpp, line 214 ... [rr 2162 70914][2162] WARNING: A runnable was posted to a worker that is already shutting down!: file /home/cjones/rr/mozilla-central/dom/workers/WorkerPrivate.cpp, line 2220 (previous output repeats several more times) [rr 2162 71718][2162] WARNING: 'NS_FAILED(rv)', file /home/cjones/rr/mozilla-central/dom/workers/ServiceWorkerManager.cpp, line 1327 (that repeats several more times) ###!!! [Parent][OnMaybeDequeueOne] Error: Channel closing: too late to send/recv, messages will be lost (repeats several more times) [rr 2162 78555][2162] WARNING: Called close() before start()!: 'mStarted', file /home/cjones/rr/mozilla-central/dom/workers/MessagePort.cpp, line 214 [rr 2162 78555][2162] WARNING: Using a worker event target after the thread has alreadybeen released!: file /home/cjones/rr/mozilla-central/dom/workers/WorkerPrivate.cpp, line 2213 (repeats several more times) .... [rr 2162 117192]TEST-UNEXPECTED-FAIL | /tests/dom/workers/test/serviceworkers/test_installation_simple.html | Test timed out. - expected: PASS ---------- I don't know this code so that doesn't ring any bells for me. But if someone has a hypothesis they'd like me to test, feel free to ni? me. I have a recording of this failure that I can replay deterministically.
The timeout is happening for the |networkError404()| sub-test. I see |serviceWorker.register(...)| being called from the right script context, and then the registration is updated and new Promise created as expected. But then I never see my breakpoints on |UpdatePromise::ResolveAllPromises| and |UpdatePromise::RejectAllPromises| (both flavors) hit. So it seems that one of these things may be happening 1. no network timeout trying to load "404.js" (load never initiated?) 2. there's a network timeout, but somehow it's failing to trigger UpdatePromise::RejectAllPromises 3. the Promise is somehow cycle-collected before the network timeout, so there's nothing to notify I think I can test (3), but the others are probably too inefficient for me to debug with my limited background in this code.
> 3. the Promise is somehow cycle-collected before the network timeout, so there's nothing to notify The UpdatePromise destructor doesn't run between the time it's created and when the test times out. So this hypothesis seems unlikely (unless a holding object is CC'd and leaks its promise, but that also seems unlikely). bent, any guidance on how to proceed? Or suggestion for someone else who could look at this?
Flags: needinfo?(bent.mozilla)
Nikhil is your guy.
Flags: needinfo?(bent.mozilla) → needinfo?(nsm.nikhil)
Chris, are you able to reproduce this on a m-c from Aug 20 or so? Certain patches landed that may have fixed the issue. I'm unable to reproduce it on a Linux machine. Also, could you email the recording to my bugzilla email.
Flags: needinfo?(nsm.nikhil) → needinfo?(cjones.bugs)
I don't know. I won't have time to try for a few days at least. Which patches landed that may have fixed the issue? Unfortunately rr recordings currently can't be shared from machine to machine :/. However, it's quite easy to set up rr for yourself, I encourage you to do so :) https://github.com/mozilla/rr/wiki/Recording-Firefox
Flags: needinfo?(nsm.nikhil)
I'll take a look soon.
Flags: needinfo?(nsm.nikhil)
Running the service worker tests with --repeat 100 (opt build) consistently triggers the timeout after the fourth/fifth iteration.
This is back now that bug 1130065 has re-enabled the test.
I may have a fix, waiting on several try runs.
Flags: needinfo?(nsm.nikhil)
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
Priority: P3 → P5

The test is enabled on all platforms and we have no occurrences here for a long time and no other filed bugs for this test.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.