Closed Bug 1179401 Opened 9 years ago Closed 9 years ago

Rewrite fetch-event-respond-with-stops-propagation.https.html to make it pass by not postMessaging after closing the window

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox42 --- affected
firefox43 --- fixed
firefox44 --- fixed

People

(Reporter: noemi, Assigned: ehsan.akhgari)

References

Details

Attachments

(3 files)

A Nightly crash occurs when executing "fetch-event-respond-with-stops-propagation.https.html" wpt test such as |./mach web-platform-tests _mozilla/service-workers/service-worker/etch-event-respond-with-stops-propagation.https.html| with today's (7/1) master build

The assertion failure shown is as follows:
"Assertion failure: workerPrivate, at /Users/noef/Documents/mozilla-central/dom/workers/ServiceWorker.cpp:93"

Please find attached the crash report corresponding to this
Assignee: nobody → ehsan
The interesting thing about this test is that it tries to call ServiceWorker::PostMessage after closing the window of the ServiceWorker object: <http://mxr.mozilla.org/mozilla-central/source/testing/web-platform/mozilla/tests/service-workers/service-worker/fetch-event-respond-with-stops-propagation.https.html?force=1#26>.  While debugging this, I found two independent issues, one of which I will file separately.

The first issue is that RuntimeService::CancelWorkersForWindow() calls Close() on the shared/serviceworker SharedWorker objects associated with the window.  That in turn calls SharedWorker::NoteDeadWorker() which sets its mWorkerPrivate to null.  Given that CancelWorkersForWindow is called from FreeInnerObjects during docshell destruction (which can be trigger when calling nsINode::Remove from script, for example!) it seems like clearing mWorkerPrivate at that point is premature.

Is this behavior intentional, Kyle?  It seems to me that the right fix is to remove the call to CloseSharedWorkersForWindow() from RuntimeService::CancelWorkersForWindow() <https://dxr.mozilla.org/mozilla-central/source/dom/workers/RuntimeService.cpp#2312>.
Forgot to needinfo Kyle!  Please see comment 1.
Flags: needinfo?(khuey)
(Note that the fix I suggested in comment 1 seems to pass our tests...)
SharedWorkers do not have postMessage, they only have a MessagePort.

Why would you expect methods on a DOM object to continue to work after the window it lives in is closed?
Flags: needinfo?(khuey)
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> SharedWorkers do not have postMessage, they only have a MessagePort.

Yes, but ServiceWorkers have postMessage(), and internally, ServiceWorker objects have an underlying SharedWorker.  See <https://dxr.allizom.org/mozilla-central/source/dom/workers/ServiceWorker.cpp?offset=0#88>.  The GetWorkerPrivate() call there is failing here.

> Why would you expect methods on a DOM object to continue to work after the
> window it lives in is closed?

Because in this case the DOM object represents a service worker that can easily survive the lifetime of the window.  I think that is quite reasonable.
Depends on: 1179567
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #5)
> (In reply to Kyle Huey [:khuey] (khuey@mozilla.com) from comment #4)
> > SharedWorkers do not have postMessage, they only have a MessagePort.
> > Why would you expect methods on a DOM object to continue to work after the
> > window it lives in is closed?
> 
> Because in this case the DOM object represents a service worker that can
> easily survive the lifetime of the window.  I think that is quite reasonable.

We're getting into edge cases of the web platform here, but I don't think that is a reasonable assumption.
Fwiw, behavior in different browsers differs radically for objects that outlive their window.  It can even differ radically for different sorts of objects in the same browser....

That said, for something new like SharedWorker I would hope the spec would define the behavior.
The spec does define the lifetime of SharedWorker.  I don't know that the ServiceWorker spec defines the lifetime of ServiceWorker ...
Blocks: 1179540
I can't find details on the exact lifetime of the ServiceWorker object in the spec, but I can't find the same for SharedWorker either.  Where is this in the HTML spec, Kyle?
Flags: needinfo?(khuey)
https://html.spec.whatwg.org/multipage/workers.html#the-worker%27s-lifetime

"A worker is said to be a permissible worker if its list of the worker's Documents is not empty, or if its list has been empty for no more than a short user-agent-defined timeout value, its WorkerGlobalScope is actually a SharedWorkerGlobalScope object (i.e. the worker is a shared worker), and the user agent has a browsing context whose Document is not complete loaded."

and

"Closing orphan workers: Start monitoring the worker such that no sooner than it stops being a protected worker, and no later than it stops being a permissible worker, worker global scope's closing flag is set to true."
Flags: needinfo?(khuey)
I think we are talking about different things.  I am not talking about the lifetime of the underlying worker, but the SharedWorker object that the script accesses.  I don't see anything in either specs that says the SharedWorker or the ServiceWorker _objects_ should stop working.

That being said, the service worker spec actually leaves the decision to when it should kill the worker to the UA.  But more interestingly, it says that when a postMessage() is being executed, we need to run the service worker <http://slightlyoff.github.io/ServiceWorker/spec/service_worker/index.html#service-worker-postmessage-method>.

Should we rerun the service worker in ServiceWorker::PostMessage()?  That seems to be a better fix here.  Needinfoing some people who might have opinions here.
Flags: needinfo?(nsm.nikhil)
Flags: needinfo?(bkelly)
Flags: needinfo?(amarchesini)
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #12)
> That being said, the service worker spec actually leaves the decision to
> when it should kill the worker to the UA.  But more interestingly, it says
> that when a postMessage() is being executed, we need to run the service
> worker
> <http://slightlyoff.github.io/ServiceWorker/spec/service_worker/index.
> html#service-worker-postmessage-method>.
> 
> Should we rerun the service worker in ServiceWorker::PostMessage()?  That
> seems to be a better fix here.  Needinfoing some people who might have
> opinions here.

Yes, we should try to re-run. This means ServiceWorker::PostMessage will have to request a new SharedWorker from the SWM.
Flags: needinfo?(nsm.nikhil)
I also agree that restarting the service worker is closer to the intent of the spec design.
Flags: needinfo?(bkelly)
Status: NEW → ASSIGNED
I tried this, but there is an issue that I'm not sure how to solve.  By the time that this code kicks in in the WPT test case, the nsGlobalWindow::mContext is nulled out.  You cannot create more than one context for the object because the second time that EnsureScriptEnvironment is called, this check prevents the creation of the global context: <https://dxr.mozilla.org/mozilla-central/source/dom/base/nsGlobalWindow.cpp#1938>.  Because of this, the following check fails <https://dxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#5061> and because of that, CreateServiceWorkerForWindow fails: <https://dxr.mozilla.org/mozilla-central/source/dom/workers/ServiceWorkerManager.cpp#2708>.
Can we give up on the idea that objects from a dead window should keep working yet?
Possibly related:

  https://github.com/slightlyoff/ServiceWorker/issues/649

Sounds like the spec may change to specify an external queue of events.

Not sure if that's in play here, but it seems related what to do when an event needs to be fired at a service worker that has been terminated.
I doubt comment 17 helps.  It seems like WONTFIXing this is the only viable option.  Perhaps we should reword the spec to clarify that this case shouldn't work.
Catalin, Nikhil mentioned that you have been looking at this, and it was also discussed in the F2F on Monday.  Do you have any updates for what we should do here?

My personal preference is to take my patch here (since it is mandated by the spec) but WONTFIX this bug about this specific test, and perhaps say something in the spec about ServiceWorker objects not being functional after the Window they have been created from is navigated away or some such...
Flags: needinfo?(catalin.badea392)
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #19)
> Catalin, Nikhil mentioned that you have been looking at this, and it was
> also discussed in the F2F on Monday.  Do you have any updates for what we
> should do here?
Yes, the conclusion we arrived at was that dom objects that outlive their global
shouldn't be usable. Jake opened a sw issue until the appropriate spec change
is decided: https://github.com/slightlyoff/ServiceWorker/issues/722.
 
> My personal preference is to take my patch here (since it is mandated by the
> spec) but WONTFIX this bug about this specific test, and perhaps say
> something in the spec about ServiceWorker objects not being functional after
> the Window they have been created from is navigated away or some such...
Yes, postMessage should spin up the service worker if it's not running. However, except for the orphan
dom object case, this can never happen in gecko (I think). This change would be justified once we have limited lifetime for service workers. It's up to you.

I think postMessage should throw when |GetParentObject()| is null. Also, we should remove the |mWindow| and |mdDocument| references from ServiceWorker.
Flags: needinfo?(catalin.badea392)
Hmm, how can we ensure that this situation cannot happen without the orphan object?
Flags: needinfo?(catalin.badea392)
(In reply to Ehsan Akhgari (not reviewing patches, not reading bugmail, needinfo? me!) from comment #21)
> Hmm, how can we ensure that this situation cannot happen without the orphan
> object?

I'm working on a patch that will stop service workers if they're idle for a given amount of time in bug 1188545. The first patch that's under review refactors event dispatching code and ensures the worker is restarted if needed.
Flags: needinfo?(catalin.badea392)
Depends on: 1188545
Depends on: 1193133
Flags: needinfo?(amarchesini)
Summary: Assert/Crash running "fetch-event-respond-with-stops-propagation.https.html" wpt service worker test in Nightly → Rewrite fetch-event-respond-with-stops-propagation.https.html to make it pass by not postMessaging after closing the window
No longer depends on: 1188545
Comment on attachment 8667680 [details] [diff] [review]
Call stopImmediatePropagation() on the Event object in respondWith()

Review of attachment 8667680 [details] [diff] [review]:
-----------------------------------------------------------------

::: testing/web-platform/mozilla/tests/service-workers/service-worker/fetch-event-respond-with-stops-propagation.https.html
@@ +26,2 @@
>            worker.postMessage({port: channel.port2}, [channel.port2]);
> +          frame.remove();

What's the reason for this change?
Attachment #8667680 - Flags: review?(josh) → review+
(In reply to Josh Matthews [:jdm] from comment #24)
> Comment on attachment 8667680 [details] [diff] [review]
> Call stopImmediatePropagation() on the Event object in respondWith()
> 
> Review of attachment 8667680 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> :::
> testing/web-platform/mozilla/tests/service-workers/service-worker/fetch-
> event-respond-with-stops-propagation.https.html
> @@ +26,2 @@
> >            worker.postMessage({port: channel.port2}, [channel.port2]);
> > +          frame.remove();
> 
> What's the reason for this change?

To bypass the crash in comment 0.
Comment on attachment 8667680 [details] [diff] [review]
Call stopImmediatePropagation() on the Event object in respondWith()

Approval Request Comment
[Feature/regressing bug #]: Service workers
[User impact if declined]: This is a SW blocker.
[Describe test coverage new/current, TreeHerder]: Has a test.
[Risks and why]: This change is limited to SW related code and has no risk towards other code.
[String/UUID change made/needed]: None.
Attachment #8667680 - Flags: approval-mozilla-aurora?
https://hg.mozilla.org/mozilla-central/rev/ece14c314ca0
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
Hi,

just checked on m-c (89732fcdb0ba revision) and the test successfully runs. Thanks for fixing it!

Summary

Harness status: OK

Found 1 tests
1 Pass
Details
Result	Test Name
Pass	respondWith() invokes stopImmediatePropagation()
Comment on attachment 8667680 [details] [diff] [review]
Call stopImmediatePropagation() on the Event object in respondWith()

Fixes a crash and perf issue, includes tests. Let's uplift this to aurora.
Attachment #8667680 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: