Closed Bug 1230030 Opened 6 years ago Closed 6 years ago

Intermittent e10s TEST-UNEXPECTED-TIMEOUT | /_mozilla/service-workers/service-worker/update-after-oneday.https.html | Update should be triggered after a functional event when last update time is over 24 hours - Test timed out

Categories

(Testing :: web-platform-tests, defect)

defect
Not set
normal

Tracking

(firefox45 affected, firefox46 fixed)

RESOLVED FIXED
mozilla46
Tracking Status
firefox45 --- affected
firefox46 --- fixed

People

(Reporter: philor, Assigned: bkelly)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

NI myself to investigate this after I get back from PTO in January.
Flags: needinfo?(bkelly)
Assignee: nobody → bkelly
Status: NEW → ASSIGNED
Flags: needinfo?(bkelly)
Note sure if its the problem here, but it seems we are attempting to schedule two timers for each FetchEvent.  The first for the functional event firing and the second after the document load completes.  They should be coalesced, but maybe if the server is running particularly slow we can get an extra update in there.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c357dcbed721
It looks like we're not getting a fetch event at all for the sub-resource load.  In this case its an image.  I wonder if image cache is screwing things up somehow.

Here is a try that does cache busting on the image:

  https://treeherder.mozilla.org/#/jobs?repo=try&revision=75bdae5641b5
Cache busting did not help.

More debugging shows, however, that ServiceWorkerPrivate::TerminateWorker() can be called even if we have a KeepAliveToken supposedly holding the worker alive.  It seems the FetchEventRunnable is getting created and the work is being killed before it can be executed on the worker.

Lets try aborting TerminateWorker if mTokenCount is > 0:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=20b545139379
That was a bad patch in comment 9.  We need to be able to force terminate a worker even if it has an mTokenCount.

In this case, though, its happening because the ServiceWorkerInfo object is being destroyed.  This is quite odd because it should be held alive as the reg.active worker.  More try debug...
So what is happening is this:

1) First navigation update triggers a PropagateRegistration()
2) This async calls back to the child process with LoadRegistration()
3) LoadRegistration() checks to see if it needs to start a new active worker or not
4) Logic error in (3) causes us to replace our current active worker with a new active worker using the same script spec.
5) Replacing the current active worker kills the WorkerPrivate and its thread.
6) It just happens to do this between creating the FetchEventRunnable and executing it on the worker thread.

Here is a try build that attempts to fix the logic error LoadRegistration():

https://treeherder.mozilla.org/#/jobs?repo=try&revision=82eca92c300c
Comment on attachment 8705742 [details] [diff] [review]
Don't replace active worker unnecessarly after saving registration in e10s mode. r=ehsan

20 green try triggers.  I think this fixes the problem.
Attachment #8705742 - Flags: review?(ehsan)
Interesting.  I wonder if bug 1232555 is somehow related to this...
See Also: → 1232555
Attachment #8705742 - Flags: review?(ehsan) → review+
See Also: 1232555
Summary: Intermittent TEST-UNEXPECTED-TIMEOUT | /_mozilla/service-workers/service-worker/update-after-oneday.https.html | Update should be triggered after a functional event when last update time is over 24 hours - Test timed out → Intermittent e10s TEST-UNEXPECTED-TIMEOUT | /_mozilla/service-workers/service-worker/update-after-oneday.https.html | Update should be triggered after a functional event when last update time is over 24 hours - Test timed out
https://hg.mozilla.org/mozilla-central/rev/b82dc4ab197b
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
You need to log in before you can comment on or make changes to this bug.