<a class="header-button" href="https://bugzilla.mozilla.org/home" title="Go to home page"> Bugzilla

Comment 9

•

9 years ago

Hmm, I got a failed trigger in try after landing this in comment 8. Seems there is more to fix.

Keywords: leave-open

Ryan VanderMeulen [:RyanVM]

Comment 10

•

9 years ago

Alexander, can you investigate what is going on here? My quick fix does not seem to have solved the issue. Seems to trigger most often on mac 10.6 debug dt6.

Flags: needinfo?(poirot.alex)

Comment hidden (Intermittent Failures Robot)

Comment 12

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/b2a337c51c4b

Comment hidden (Intermittent Failures Robot)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=d7dd35779b0e

Assignee

Comment 14

•

9 years ago

Comment 15

•

9 years ago

I'm not sure the try had the right syntax in comment 14. I added manually triggered the mac 10.6 dt6 job. Let's see if that helps.

Assignee

Comment 16

•

9 years ago

Oh! Yes, it looks like mochitest-dt only runs on 10.10 which need explicit flag :/ Oh! I didn't knew about this treeherder option, looks fantastic, if that works!!

Flags: needinfo?(poirot.alex)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=be985a6957f2

Assignee

Comment 17

•

9 years ago

Assignee

Comment 18

•

9 years ago

Attached patch Allow running browser_service_workers*.js more than once — Details — Splinter Review

It doesn't solve the intermittent, but ease running the tests multiple times.

Attachment #8709427 - Flags: review?(janx)

Comment 19

•

9 years ago

Comment on attachment 8709427 [details] [diff] [review] Allow running browser_service_workers*.js more than once Review of attachment 8709427 [details] [diff] [review]: ----------------------------------------------------------------- Works for me. One nit, also please submit it to try. ::: devtools/client/aboutdebugging/test/browser_service_workers.js @@ +45,1 @@ > // Use message manager to work with e10s Nit: The empty line before "// Finally" should probably before "// Use" instead.

Attachment #8709427 - Flags: review?(janx) → review+

Comment 20

•

9 years ago

Going to pass this to Alexander for now since he's actively investigating it.

Assignee: bkelly → poirot.alex

https://treeherder.mozilla.org/#/jobs?repo=try&revision=210389070af7

Assignee

Comment 21

•

9 years ago

Comment hidden (Intermittent Failures Robot)

Assignee

Comment 23

•

9 years ago

Ok, so I finally narrowed it down to WorkerDebugge.initialize. We first get this warning: [1074] WARNING: A debugger runnable was posted to a worker that is already shutting down!: file /builds/slave/try-m64-d-00000000000000000000/build/src/dom/workers/WorkerPrivate.cpp, line 2399 And then, connectToWorker throws on aDbg.initialize() with this exception: NS_ERROR_FAILURE - nsIWorkerDebugger.initialize resource://devtools/server/main.js :: DebuggerServer.connectToWorker/< :: line 785 Eddy, any random guess? Unfortunately, this exception happens intermitently on OSX debug builds on devtools/client/aboutdebugging/test/browser_service_workers_timeout.js Would you happen to be on OSX?

Flags: needinfo?(ejpbruel)

Assignee

Comment 24

•

9 years ago

Note that there is a bunch of warnings happening on the previous test: 07:32:23 INFO - [1074] WARNING: '!registration', file /builds/slave/try-m64-d-00000000000000000000/build/src/dom/workers/ServiceWorkerManager.cpp, line 999 07:32:23 INFO - [1074] WARNING: 'NS_FAILED(rv)', file /builds/slave/try-m64-d-00000000000000000000/build/src/dom/workers/ServiceWorkerManager.cpp, line 1205 07:32:23 INFO - [1074] WARNING: 'NS_FAILED(aStatus)', file /builds/slave/try-m64-d-00000000000000000000/build/src/dom/workers/ServiceWorkerManager.cpp, line 372 07:32:23 INFO - [1074] WARNING: ServiceWorkerJob failed with error: NS_ERROR_DOM_ABORT_ERR: file /builds/slave/try-m64-d-00000000000000000000/build/src/dom/workers/ServiceWorkerManager.cpp, line 378

Assignee

Comment 25

•

9 years ago

Also, in the previous test, there is very few logs, as if it breaks early, but it still reports the test succeed... Pushing a new try run to figure this out.

Comment 26

•

9 years ago

(In reply to Alexandre Poirot [:ochameau] from comment #23) > Ok, so I finally narrowed it down to WorkerDebugge.initialize. > We first get this warning: > [1074] WARNING: A debugger runnable was posted to a worker that is already > shutting down!: file > /builds/slave/try-m64-d-00000000000000000000/build/src/dom/workers/ > WorkerPrivate.cpp, line 2399 > And then, connectToWorker throws on aDbg.initialize() with this exception: > NS_ERROR_FAILURE - nsIWorkerDebugger.initialize > resource://devtools/server/main.js :: DebuggerServer.connectToWorker/< :: > line 785 > > Eddy, any random guess? > Unfortunately, this exception happens intermitently on OSX debug builds on > devtools/client/aboutdebugging/test/browser_service_workers_timeout.js > Would you happen to be on OSX? This looks like a race condition. Right before it shuts down, a worker unregisters itself with the WorkerDebuggerManager. The WorkerDebuggerManager then closes the WorkerDebugger for that worker. Once the WorkerDebugger is closed, it can no longer be used. Any attempts to do so will result in an NS_ERROR_UNEXPECTED being thrown. To make sure we don't attempt to use the WorkerDebugger after it is closed, the WorkerDebugger notifies its listeners when it closes, and also sets it isClosed property to true. Closing the WorkerDebugger, notifying the listeners, and setting the isClosed property all happen on the main thread, so these steps are not racy. I've double checked the implementation of WorkerActor, and it looks like we are correctly using the onClosed handler and isClosed property to make sure we don't access the WorkerDebugger after it is closed. However, there is a small window where the worker has started to to shut down, and should no longer be processing runnables, but has not unregistered itself with the WorkerDebuggerManager yet. We could maybe unregister the worker sooner (right before it starts to shut down), but that would not solve the problem, because unregistration happens on the parent thread. If the parent thread is not the main thread, we have to send a runnable to the main thread, because the WorkerDebuggerManager can only be accessed there. While this runnable is underway, we can still interact with the WorkerDebugger, such as trying to initialize it. This will not result in an NS_ERROR_UNEXPECTED, because the WorkerDebugger has not yet been closed. It will, however, result in an NS_ERROR_FAILURE, because the worker is already shutting down. We either need a way to make the onClosed event non-race prone, but that seems impossible at first glance. If we cannot do that, then the best we can do is add an explicit error check in the code that calls initialize. Hope that helps!

Updated

•

9 years ago

Flags: needinfo?(ejpbruel)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=460702f0de1f

Assignee

Comment 27

•

9 years ago

Assignee

Comment 28

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/ca345be438249e308318379eca7eec7d6013ae93 Bug 1232931 - Allow running browser_service_workers_timeout.js more than once. r=janx

Comment hidden (Intermittent Failures Robot)

https://treeherder.mozilla.org/#/jobs?repo=try&revision=451538b1899f

Assignee

Comment 30

•

9 years ago

Updated

•

9 years ago

Blocks: 1212797

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e41b207cc4d2

Assignee

Comment 31

•

9 years ago

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 32

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ca345be43824

Assignee

Comment 33

•

9 years ago

Last try seems to say it only happens when the two about:debugging test regarding service worker are run. browser_service_workers.js and browser_service_workers_timeout.js Both instanciates the same service worker, so I highly suspect the first test to not wait correctly for full service worker cleanup before running the second one. But with my patch, we are now waiting for registration.unregister() resolution before proceeding. (And before that patch, we were already waiting for nsIWorkerDebuggerManager's onUnregister callback to be called in the main process) I don't know what else we could wait for? This race happens when using e10s... which seems to easily introduce such race.

Comment 34

•

9 years ago

(In reply to Alexandre Poirot [:ochameau] from comment #33) > Last try seems to say it only happens when the two about:debugging test > regarding service worker are run. > browser_service_workers.js and browser_service_workers_timeout.js > Both instanciates the same service worker, so I highly suspect the first > test to not wait correctly for full service worker cleanup before running > the second one. > But with my patch, we are now waiting for registration.unregister() > resolution before proceeding. > (And before that patch, we were already waiting for > nsIWorkerDebuggerManager's onUnregister callback to be called in the main > process) > I don't know what else we could wait for? > This race happens when using e10s... which seems to easily introduce such > race. FWIW, The onUnregister callback on nsIWorkerDebuggerManager is called very late during the worker shutdown process: the RuntimeService will have already nulled out the worker's thread, and the WorkerPrivate is no longer safe to be touched. The RuntimeService then schedules the worker for deletion, which will cause a WorkerFinishedRunnable to be dispatched. This runnable will disable the debugger for the WorkerPrivate, which will cause the worker to be unregistered. Since the worker always has to be unregistered on the main thread (because that's where the WorkerDebuggerManager lives), we always have to dispatch *some* runnable to the main thread to unregister the worker after it has shut down. Because the WorkerDebugger is not closed until after the worker is unregistered, there will always be some window where the worker has already shut down, but not unregistered (since the runnable is still underway). Assuming that your problem is the one you mentioned in comment 26, i.e. that we're posting a runnable to a worker that has already shut down (I thought that was only supposed to be a warning, not an error), perhaps we should close the WorkerDebugger before the worker starts to shut down? We could at the very least set the isClosed flag without dispatching a runnable (using the WorkerDebugger's mutex to keep it thread safe). The only problem with that solution would be that it breaks the invariant that the isClosed flag will not be set until after the onClosed notification, which can cause the server to access a WorkerDebugger that has already been closed. Not sure how helpful any of this is, but there ya go.

Comment hidden (Intermittent Failures Robot)

Assignee

Comment 37

•

9 years ago

Attached patch Wait for service worker registration before attaching a toolbox to it — Details — Splinter Review

Looks like I finally got it: https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc2edbeb8b71&selectedJob=15864659 We can prevent all these platform exception and toolbox init failure by waiting for the SW registration promise (navigator.serviceWorker.register(...)) to finish before proceeding. I forgot about this horrible setTimeout I used. Waiting nsIWorkerDebuggerManager's onRegister call isn't enough. That's what we were waiting for in addition to this horrible timeout. Thanks Eddy for your help here!

Attachment #8711621 - Flags: review?(janx)

Comment 38

•

9 years ago

Comment on attachment 8711621 [details] [diff] [review] Wait for service worker registration before attaching a toolbox to it Review of attachment 8711621 [details] [diff] [review]: ----------------------------------------------------------------- Nice! Looks much cleaner this way, and the all-green try makes me happy. Thanks!

Attachment #8711621 - Flags: review?(janx) → review+

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Assignee

Comment 39

•

9 years ago

https://hg.mozilla.org/integration/fx-team/rev/48571bf11be832658eabe82402739b4f42428d98 Bug 1232931: Wait for service worker registration before attaching a toolbox to it. r=janx

Comment 40

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/48571bf11be8

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Comment 41

•

9 years ago

Does this change also resolve the unhandled promise rejection[1] in rows 7 - 16? https://docs.google.com/spreadsheets/d/1ZKXW9HqnxJNdeEVG7s33_TGSV94PdUJCsQsZe1GGqjY/edit#gid=0

Flags: needinfo?(poirot.alex)

Assignee

Comment 42

•

9 years ago

I don't know. I've never seen any of these exceptions so far, except the test timed out. But I tried on linux, with the promise patch and don't see any exception/failure. These two patches should have fixed the test timed out failure.

Flags: needinfo?(poirot.alex)

Assignee

Comment 43

•

9 years ago

Looks like this intermittent is gone \o/ https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232931&startday=2016-01-18&endday=2016-01-28&tree=all

Status: ASSIGNED → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

Phil Ringnalda (:philor)

Reporter

Comment 44

•

9 years ago

Apparently you shouldn't challenge it to fail again that way.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Comment hidden (Intermittent Failures Robot)

Assignee

Comment 46

•

9 years ago

Ok. So it changed again. Still failing but at another spot and differently. We call WorkerActor.onAttach which return {error: 'closed'} as dbg.isClosed is true. It is important to note that this is on non-e10s runs. I imagine it is still due to the previous test registering the exact same service worker. I imagine the previous registration, is being unregistered during previous test and it still in process of cleaning things up while the second test registers a new instance and fails.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=fbff2916677c

Assignee

Comment 47

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=74e7be8b7396

Assignee

Comment 48

•

9 years ago

Assignee

Comment 49

•

9 years ago

Attached patch Wait for service worker registration in browser_service_workers.js — Details — Splinter Review

I realized we were not waiting for navigator.register() promise in the first test. It may help...

Assignee

Comment 50

•

9 years ago

Attached patch Use distinct service workers for about:debugging tests — Details — Splinter Review

Using distinct workers is most likely going to help a lot!

https://treeherder.mozilla.org/#/jobs?repo=try&revision=a44ed582e8ed

Assignee

Comment 51

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=058e933dfbb1

Assignee

Comment 52

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=e3c50002efc5

Assignee

Comment 53

•

9 years ago

Assignee

Comment 54

•

9 years ago

Ok. So using two distinct service worker scripts doesn't help. This is surprising! Unfortunately, using some log statements in the C++ part seems to prevent the intermittent... I'll try to get in touch with a MacOS hardware to try to reproduce it locally. Pushing to try doesn't scale here. My last findings were that the worker is terminated, ServiceWorkerPrivate::TerminateWorker is being called. It see two WorkerDebugger constructor being called and one is immediately destroyed. When the test pass, we only get one WorkerDebugger constructor call. I now suspect that we get two workers being instianciated. One installing and one active, the first being destroyed immediately and used by about:debugging, while the other one stay alive but is ignored by about:debugging. about:debugging uses nsIWorkerDebuggerManager.getWorkerDebuggerEnumerator(). Does it returns distincts WorkerDebugger instances, one for each worker state: installing/active/... ?

https://treeherder.mozilla.org/#/jobs?repo=try&revision=aeef87323b30

Assignee

Comment 55

•

9 years ago

Comment hidden (Intermittent Failures Robot)

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

status-firefox47: --- → affected

Assignee

Comment 57

•

9 years ago

Wait. I'm having hard time to reproduce it on try. Since Feb. 1st, it only reproduce on some try run or aurora. Wouldn't we just need to uplift?? Or did we disabled osx 10.6 on this date?

Assignee

Comment 58

•

9 years ago

A try run with an empty commit on top of mozilla-central: https://treeherder.mozilla.org/#/jobs?repo=try&revision=21765759ab3d

Comment 59

•

9 years ago

(In reply to Alexandre Poirot [:ochameau] from comment #54) > about:debugging uses nsIWorkerDebuggerManager.getWorkerDebuggerEnumerator(). > Does it returns distincts WorkerDebugger instances, one for each worker > state: installing/active/... ? Yes it does. Each service worker registration can have up to three service workers associated with it at any given time: an installing, a waiting, and an active service worker. Each of those service workers runs in its own thread, and therefore gets it's own WorkerPrivate. And since a WorkerDebugger just represents each WorkerPrivate on the main thread, you get multiple WorkerDebuggers as well.

Comment hidden (Intermittent Failures Robot)

Assignee

Comment 61

•

9 years ago

Comment on attachment 8709427 [details] [diff] [review] Allow running browser_service_workers*.js more than once This intermittent doesn't reproduces on mozilla-central: https://treeherder.mozilla.org/#/jobs?repo=try&revision=21765759ab3d Whereas we still see occurances on aurora: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1232931&startday=2016-02-08&endday=2016-02-14&tree=all I think my two js patches really fixed it. Or at least very drastically reduced it! This is just test script tweaks.

Attachment #8709427 - Flags: approval-mozilla-aurora?

Liz Henry (:lizzard) (relman/hg->git project)

Assignee

Comment 62

•

9 years ago

Comment on attachment 8711621 [details] [diff] [review] Wait for service worker registration before attaching a toolbox to it Same.

Attachment #8711621 - Flags: approval-mozilla-aurora?

Comment 63

•

9 years ago

Comment on attachment 8709427 [details] [diff] [review] Allow running browser_service_workers*.js more than once Test fixes, ok to uplift to aurora. Sounds promising!

Attachment #8709427 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Liz Henry (:lizzard) (relman/hg->git project)

Comment 64

•

9 years ago

Comment on attachment 8711621 [details] [diff] [review] Wait for service worker registration before attaching a toolbox to it Another test only change, fine to uplift to aurora

Attachment #8711621 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Comment hidden (Intermittent Failures Robot)

Pulsebot

Comment 66

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/c292962512ff

Ryan VanderMeulen [:RyanVM]

Comment 67

•

9 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c292962512ff

Comment hidden (Intermittent Failures Robot)