Closed Bug 1231974 Opened 9 years ago Closed 9 years ago

Intermittent TEST-UNEXPECTED-FAIL | unregister-then-register-new-script.https.html | Registering a new script URL while an unregistered registration is in use - assert_unreached: assert_equals the new worker should control a new document

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla46
Tracking Status
firefox46 --- fixed

People

(Reporter: bkelly, Assigned: bkelly)

References

(Blocks 1 open bug)

Details

Attachments

(6 files, 5 obsolete files)

6.12 KB, patch
baku
: review+
Details | Diff | Splinter Review
8.12 KB, patch
baku
: review+
Details | Diff | Splinter Review
9.47 KB, patch
baku
: review+
Details | Diff | Splinter Review
834 bytes, patch
baku
: review+
Details | Diff | Splinter Review
4.25 KB, patch
baku
: review+
Details | Diff | Splinter Review
17.29 KB, patch
bkelly
: review+
Details | Diff | Splinter Review
This test depends on GC timing do no longer control the previous frame.
Attachment #8697602 - Flags: review?(ehsan)
We need to examine this test case for correctness and determine what the spec should do here.  There is already bug 1227007 to do that.

I'm just going to disable this test for now.
Keywords: leave-open
See Also: → 1227007
Attachment #8697626 - Flags: review?(ehsan) → review+
Comment on attachment 8697602 [details] [diff] [review]
Fix wpt GC timing dependency by explicitly navigating a frame to stop being a controlled client. r=ehsan

A debug try to see what is going on here.  I don't think my GC theory above is accurate:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f35207eecb0
Attachment #8697602 - Attachment is obsolete: true
We're getting an update executed for the scope using a different registration object which then does not see the script URL has changed.  This then re-activates the old script URL.

Its still unclear why we are getting this new registration.  It could be related to a necko channel failure that appears to be happening in the ServiceWorkerScriptCache layer for some reason.
I think we're essentially hitting the same race described in:

  https://github.com/slightlyoff/ServiceWorker/issues/783

We should really look up the registrations in the job Start() instead of getting our references from outside of the queue synchronization mechanism.
Attachment #8697626 - Attachment is obsolete: true
Updated to fix a failure path that called Done() synchronously from a job Start().

https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca2df2f85baf
Attachment #8698165 - Attachment is obsolete: true
Attachment #8698165 - Flags: review?(ehsan)
Attachment #8698206 - Flags: review?(ehsan)
Blocks: 1232444
Try to fix that same failure case again.  This time with better logic.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=9eee01cc54bf
Attachment #8698206 - Attachment is obsolete: true
Attachment #8698206 - Flags: review?(ehsan)
Attachment #8698281 - Flags: review?(ehsan)
Keywords: leave-open
There are still a couple intermittents I want to investigate, but I think these patches should land.
So, there is still a race in the spec even after making these agreed changes:

  https://github.com/slightlyoff/ServiceWorker/issues/783#issuecomment-164813284

This patch allows us to abort the install job if it was triggered from an update and the newest worker script has changed on us.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b12f55e91dba
Attachment #8698477 - Flags: review?(ehsan)
Try is looking pretty good.  Large number of retriggers have completed with no service worker wpt failures.

This should be good to go once review completes.
Attachment #8698161 - Flags: review?(ehsan) → review?(amarchesini)
Attachment #8698162 - Flags: review?(ehsan) → review?(amarchesini)
Attachment #8698163 - Flags: review?(ehsan) → review?(amarchesini)
Attachment #8698166 - Flags: review?(ehsan) → review?(amarchesini)
Attachment #8698281 - Flags: review?(ehsan) → review?(amarchesini)
Attachment #8698477 - Flags: review?(ehsan) → review?(amarchesini)
Attachment #8698161 - Flags: review?(amarchesini) → review+
Attachment #8698162 - Flags: review?(amarchesini) → review+
Attachment #8698163 - Flags: review?(amarchesini) → review+
Attachment #8698166 - Flags: review?(amarchesini) → review+
Comment on attachment 8698281 [details] [diff] [review]
P4 Lazy load registration and verify it does not change in service worker jobs. r=ehsan

Review of attachment 8698281 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/workers/ServiceWorkerManager.cpp
@@ +970,5 @@
>  
> +  nsresult
> +  EnsureRegistration()
> +  {
> +    AssertIsOnMainThread();

what about if mRegistration already exist?

@@ +974,5 @@
> +    AssertIsOnMainThread();
> +
> +    RefPtr<ServiceWorkerManager> swm = ServiceWorkerManager::GetInstance();
> +    if (NS_WARN_IF(!swm)) {
> +      mRegistration = nullptr;

this should not be needed.

@@ +1164,5 @@
>  
>    void
>    ContinueAfterInstallEvent(bool aInstallEventSuccess)
>    {
> +    nsresult rv = EnsureRegistration();

are we sure we don't have a mRegistration here?

@@ +1340,5 @@
>                     const nsACString& aMaxScope) override
>    {
>      RefPtr<ServiceWorkerRegisterJob> kungFuDeathGrip = this;
> +
> +    nsresult rv = EnsureRegistration();

move it after mCanceled check

@@ +1492,5 @@
>    {
>      AssertIsOnMainThread();
>      RefPtr<ServiceWorkerRegisterJob> kungFuDeathGrip = this;
> +
> +    nsresult rv = EnsureRegistration();

move this after the if(mCanceled) check.
Attachment #8698281 - Flags: review?(amarchesini) → review+
Attachment #8698477 - Flags: review?(amarchesini) → review+
(In reply to Andrea Marchesini (:baku) from comment #20)
> > +  nsresult
> > +  EnsureRegistration()
> > +  {
> > +    AssertIsOnMainThread();
> 
> what about if mRegistration already exist?

This is allowed by design.  But we verify that the current registration for this scope is not a new object and clear mRegistration if it has changed.

> @@ +974,5 @@
> > +    AssertIsOnMainThread();
> > +
> > +    RefPtr<ServiceWorkerManager> swm = ServiceWorkerManager::GetInstance();
> > +    if (NS_WARN_IF(!swm)) {
> > +      mRegistration = nullptr;
> 
> this should not be needed.

Why?  I want to ensure as an invariant that if EnsureRegistration returns a failure code, then mRegistration is nullptr.  If shutdown happens in the middle of the job, then I think SWM can be nullptr, but still have a mRegistration set.

> >    void
> >    ContinueAfterInstallEvent(bool aInstallEventSuccess)
> >    {
> > +    nsresult rv = EnsureRegistration();
> 
> are we sure we don't have a mRegistration here?

We probably do.  This EnsureRegistration() call is verifying it has not changed on us.  (For example, something didn't clear the registration and create a new one.)
I'll change the name of the method to EnsureAndVerifyRegistration() to clarify its doing more than setting mRegistration.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: