Default value for dom.serviceWorker.idle_timeout too agressive

NEW
Unassigned

Status

()

Core
DOM: Service Workers
P2
normal
11 months ago
8 months ago

People

(Reporter: Richard Maher, Unassigned)

Tracking

(Blocks: 1 bug)

53 Branch
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

(Reporter)

Description

11 months ago
Created attachment 8883774 [details]
echo.js code for POC ServiceWorker. Uses Message Event to simulate up-coming Travel Event

User Agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.115 Safari/537.36

Steps to reproduce:

Send a message to ServiceWorker
Wait 30 seconds


Actual results:

ServiceWorker is zapped by 30sec idle timeout
My "Loiter/Standing" event waits 40 secs to detect a stationary user resulting in a new service worker just to report "Standing".

This is not a big deal but Chrome, Opera, and Android browser are all less agressive. 

(See attached)


Expected results:

The default idle_timeout should be increased to at least 1min. I also believe device load should be taken into account so if memory/cpu allows then let the SW live.

Either that or give us an option on the subscription to increase the timeout value.

Updated

11 months ago
Component: Untriaged → DOM: Service Workers

Updated

11 months ago
Attachment #8883774 - Attachment is patch: true
Attachment #8883774 - Attachment mime type: application/javascript → text/plain

Updated

11 months ago
Attachment #8883774 - Attachment is patch: false

Comment 1

11 months ago
So there is nothing in the spec which requires us to permit such a long running service worker.  I believe chrome may permit it if there is an open window controlled by the SW.  Eventually the spec may need to define this behavior in more detail.

I expect we will probably support this in the future, but not in the short term.  We have a number of refactoring efforts in flight that will touch this code.  Once we expose the possibility of self postMessage(), though, we will need to fix our lifetime management and this will probably get fixed at the same time.

In the meantime you can work around this issue from the window side by sending another message event to the service worker to extend the lifetime further.
Blocks: 1226983
See Also: → bug 1216170

Updated

11 months ago
Status: UNCONFIRMED → NEW
Ever confirmed: true
(Reporter)

Comment 2

11 months ago
Hi Ben, (Good to see you here! Didn't know what timezone you were in.)

I'm just having to stick a half-decent UX on my ServiceWorker Background-Geolocation POC and then you'll have a better idea but, regardless of your Timeout setting, the ServiceWorker architecture fits background geolocation like a glove! Whether driving or riding a single SW is all that is required unless you get stuck at the lights (which take forever in Perth. SERIOUSLY different)
When you take off you get a new SW that keeps telling the Fleet Manager where you are. It's a bit like those annoying new cars that turn the engine off at the lights and start up again when you hit the accelerator.

Anyway, more by the w/e. Again, just so you know Chrome, Opera, and Android Browser are less aggressive. 40secs to walk the min 50m before GPS update seems a good reason to me. Why would you call 1 minute "such a long Service Worker" but not have a problem with 30secs? What if it's recharging/not-recharging?

As I said "no big deal" but if we're being arbitrary then I say 1min sounds like a nice round figure :-) 

I've attached the TravelManager pollyfill but here's the POC registration code: - (did I use the e.waitUntil properal in the Message Event handler?)

'use strict';
/* identification division.
 * program-id.    RegisterServiceWorker.
 * author.        Richard Maher.
 */
 
function registerServiceWorker()
{	
// Make sure SW is there
	navigator.serviceWorker.register('echo.js')
	  .then(reg => {
    	console.log('SW Registered');	
	  }).catch(err => {
    	console.log('SW Registration failed with ' + err);
		reportError({header:"Could not register ServiceWorker",
					message:err});
  	  });
	  
// Register for Background Geolocation tracking. Take default for accuracy, max age.    
	navigator.serviceWorker.ready
		.then(reg => {
				reg.travelManager.subscribe({
											minSilence:     5, // Car trip value.
											maxSilence:   600, // Sanity check. Squelch off.
											minProgress:   50, // Indoor accuracy an issue.
											maxSnail:      40, // Brisk walk.
											dropDodgy:   true  // Avoid wireless Pong.
											})
					.then(subscription => {travelSubscription = subscription}, locFail)
					.catch(err => {
							console.log('Travel Manager Subscription failed with ' + err);
							reportError({header:"Could not subscribe to Travel Manager",
									message:err});
						});
			});
}
(Reporter)

Comment 3

11 months ago
Created attachment 8883793 [details]
TravelManagerPolyfill all the UA coders have to do!

The attached code (~400 lines Javascript) is almost all that the UA coderes have to do to give us background geolocation!

Ok, Okay! At the moment Firefox does run in the background but that's another story. Good for testing but!!!
(Reporter)

Updated

11 months ago
Attachment #8883774 - Attachment description: echo.js → echo.js code for POC ServiceWorker. Uses Message Event to simulate up-coming Travel Event
If you need 1min idle timeouts, couldn't you just use event.waitUntil to get that?
You can call event.waiUntil on multiple promises and your worker will live until all of them are resolved.

Here your worker will live between 1minute and 5minutes (if no other events that further extend its lifetime are received).
onmessage = function(event) {
  event.waitUntil(new Promise(function(resolve) { setTimeout(resolve, 60000);}));
  event.waitUntil(new Promise(function(resolve) {
    // do stuff and resolve when done
  }));
}
(Reporter)

Comment 5

11 months ago
@Comment 4 Looks like a plan!

So, a) if the UA does have to kill a SW instance then all outstanding promises and timers will be destroyed and b) a new extendable event can interrupt an existing extendable event as if it had yeilded with a "return" (pretty sure they can but it's late :-) as in https://github.com/w3c/ServiceWorker/issues/838 then we're good to go.

I'll try it tomorrow. 

Didn't know you could have multiple event.waitUntil calls for the same event. Good tip! Any difference/preference to event.waitUntil(Promise.all(myArray)) ?

So if Firefox dilly-dally on a 2sec change to  dom.serviceWorker.idle_timeout to bring it into line with the rest of the world then we can defeat the check as you suggested. Cool!

The only downside is that now we have to clean-up the last promise and timer whenever we get a position-update, push-message, fetch etc. Not the end of the world I guess.

BTW Where did the "5minutes" figure come from? Another about:config parameter that I missed? A hard, no ifs or buts, termination point? Bit of a worry :-( Errors reported? Quiescent?

Related: https://github.com/w3c/ServiceWorker/issues/1039
(In reply to Richard Maher from comment #5)
> @Comment 4 Looks like a plan!
> 
> So, a) if the UA does have to kill a SW instance then all outstanding
> promises and timers will be destroyed and b) a new extendable event can
> interrupt an existing extendable event as if it had yeilded with a "return"
> (pretty sure they can but it's late :-) as in
> https://github.com/w3c/ServiceWorker/issues/838 then we're good to go.
> 
> I'll try it tomorrow. 
> 
> Didn't know you could have multiple event.waitUntil calls for the same
> event. Good tip! Any difference/preference to
> event.waitUntil(Promise.all(myArray)) ?
There actually is a difference. If one of your promises rejects, Promise.all([]) will reject
without waiting for any pending promises, potentially causing your worker to be killed.
Calling event.waitUntil multiple times will wait until each promise was resolved or rejected.
 
> So if Firefox dilly-dally on a 2sec change to 
> dom.serviceWorker.idle_timeout to bring it into line with the rest of the
> world then we can defeat the check as you suggested. Cool!
Could you please explain how knowing when the worker gets terminated is useful to your app?
The point is you shouldn't base your app logic on these settings.
UA could potentially kill workers without a timeout, resulting in each event handler running on a fresh js global.

> The only downside is that now we have to clean-up the last promise and timer
> whenever we get a position-update, push-message, fetch etc. Not the end of
> the world I guess.
> 
> BTW Where did the "5minutes" figure come from? Another about:config
> parameter that I missed? A hard, no ifs or buts, termination point? Bit of a
> worry :-( Errors reported? Quiescent?
Yes, each event can extend the lifetime by a maximum of 5minutes. If you do
event.waitUntil(new Promise(function(){})); // will never resolve
Firefox will kill the worker after 5 minutes. Chrome has (or used to have) a similar threshold.
Note, you can still keep your worker running longer than that by sending new events from a window.

The pref is dom.serviceWorkers.idle_extended_timeout

> 
> Related: https://github.com/w3c/ServiceWorker/issues/1039
(Reporter)

Comment 7

11 months ago
> Could you please explain how knowing when the worker gets terminated is useful to your app?
> The point is you shouldn't base your app logic on these settings.
> UA could potentially kill workers without a timeout, resulting in each event handler running on a fresh js global.

Please let me restate what I said over in stackoverflow: -
"I am 100% ALL for the ability for a UA to kill a SW at will ANY time. I 100% am NOT asking for extended trans-SW-instance memory or Fat-SW syndrome! Just asking why kill it if we've got heaps of memory and CPU?"

Please also believe me when I say this is not a big issue for me. One extra SW instantiation here or there is not going to break the bank. As I pointed out previously to Ben, by the w/e I will have a Web-App that is useful for showing people just how perfectly Background Geolocation is deployed on the ServiceWorker paradigm!

The only reason I reported the bug was because it is behaviour that is specific and, in its aggressiveness, unique to Firefox. There is a VERY arbitrary idle timeout of 30secs and I suggest it should be 1min for the simple reason that people take longer than 30secs to walk longer than a sensible minimum number of metres for a valid GPS reading.

IOW, I currently, and this is parameterized and not engraved in stone either, ignore any geolocation update that is less than 50 metres from the last position. This is for obvious accuracy and reliability reasons. Now it just so happens that it takes the average person more than 30secs to walk 50m. Therefore Firefox has killed the last SW before the UA has had a chance to tell it that heuristically the person is actually loitering/standing.

First-world problem? Yes :-)

Does it indicate that the spec-writers and designers have not got this area right? Yes?

Will I lose sleep if nothing happens? No.

Did I really just raise this bug as yet another chance to bang-on about Background-Geolocation with Service Workers? What do you think?

More on the week-end. . .

PS. "please explain how knowing when the worker gets terminated is useful to your app?"
It's usefulness lies in my ability to show the very favourable Service-Worker-Instance to Interesting-Geolocation-Update ratio that comes from driving and riding!!! Firefox makes the walking trips a little more CPU expensive. Big deal.
(Reporter)

Comment 8

11 months ago
NB: Please look at the attach Service Worker example (echo.js) in particular the *very* appropriately named variable: -

var doryRegister = [];  // Meet someone new every day.

and later . . . 

	function sendClient(client, cmd, position)
	{				  
		if (doryRegister[client.id] == undefined) {
			doryRegister[client.id] = 0;
			if (!postIt(client,{'cmd':INTRO})) return false;
		}

I don't *rely* on the Global the longevity or anything else not SW-NEWable!

The App "needs" to know when I get a new instance so I can educate the ignorant about how Service Workers and Background Geolocation together like Fish & Chips!
(Reporter)

Updated

11 months ago
Attachment #8883793 - Attachment mime type: application/javascript → text/plain
(Reporter)

Comment 9

11 months ago
Ok, I give up. I have attached a slightly modified version of my Service Worker code. The only difference is 3 lines all containing the variable name "firefoxValium". Strangely if you uncomment the following line: -

e.waitUntil(new Promise(resolver=>{firefoxValium = resolver})); 

You will observe/hypothesize that on Firefox 54.0.1 on Android 7 a new extended event *cannot* interrupt my running extended event. That is, the code: -

firefoxValium.resolve();// Don't care if already resolved

never gets called as the new Message Event is queued waiting for the current waitUntil() to complete. I can't say for sure that is happening (and I am losing interest) but it fits the available evidence.

NB: The *same* code on Windows 7 Firefox 53.0 works and allows the "Loiter" message through after 40 secs.

If I've done something stupid then please let me know otherwise OTY.

Cheers Richard
(Reporter)

Comment 10

11 months ago
Created attachment 8884145 [details]
echo.js Modified in an attempt to defeat the SW timeout for certain message types/events.

Doesn't work on mobile so I'll stick with the o/head of a few extra SWs.
(Reporter)

Updated

11 months ago
Attachment #8884145 - Attachment description: echo.js → echo.js Modified in an attempt to defeat the SW timeout for certain message types/events.
(Reporter)

Comment 11

11 months ago
This is probably out-of-scope for this bug (slightly related) but just while it's fresh in my head: -

The above bug addresses the many-SW per page scenario but the flip-side is also giving me concerns. I'm guessing that a "client" and its client.id refers to a window/tab and not a page, so any page in-SW-scope hosted by that client can receive messages POSTed by the SW?

Clearly, I can use "scope" to control which pages see my SW but what happens to the SW-posted messages to me/client when I refresh? I expected the event/message queue(s) to be garbage collected and new ones created? On Chrome (admittedly with debug on) I am seeing the odd old (old = pre-current page instance) message turning up. In one case 30mins later.

What is *most* concerning is that it was out of sequence!

For example: -

Messages: -
8:52:22 AM Kick off!
8:52:40 AM New ServiceWorker instance 0
8:52:40 AM 8:51:58 AM SWI0 2 2 start lng: 115.8107681 lat: -31.982061499999997 +/- 1697
8:52:40 AM 8:52:07 AM SWI0 3 3 start lng: 115.8107681 lat: -31.982061499999997 +/- 1697
8:52:40 AM 8:52:21 AM SWI0 4 4 start lng: 115.8107681 lat: -31.982061499999997 +/- 1697
*--->9:26:35 AM 8:50:56 AM SWI0 4 1 loiter lng: 115.8107681 lat: -31.982061499999997 +/- 1697<---*
9:27:01 AM 8:52:21 AM SWI0 5 5 loiter lng: 115.8107681 lat: -31.982061499999997 +/- 1697

See the 2nd last entry. 30 mins old; Seq #1 for last session, but coming out as #4 in this session.

The real question is "Can you confirm that all pending messages/events should be wiped on refresh?"

Anyway back to "Can an extended event interrupt another extended event?"

Comment 12

10 months ago
(In reply to Richard Maher from comment #11)
> Clearly, I can use "scope" to control which pages see my SW but what happens
> to the SW-posted messages to me/client when I refresh? I expected the
> event/message queue(s) to be garbage collected and new ones created? On
> Chrome (admittedly with debug on) I am seeing the odd old (old = pre-current
> page instance) message turning up. In one case 30mins later.

A Client and its associated ID represent a single javascript global.  So either a window global or a worker self global.  Refreshing the page should create a new global and give you a new Client.

Something like document.open() or document.write() will give you a new global per the spec and that is what we do in firefox.  I believe chrome does not create a new global in that case, though, so the Client may not change.

If you are seeing messages to one Client showing up in a different Client, then that sounds like a chrome bug.  I recommend trying to reduce the failing case and filing a bug over there:

https://chromiumbugs.appspot.com/?token=4ORKxCiG4Ga_kIyUHqHqHzoxNDk5Njk2Njg0&role=&continue=https%3A//bugs.chromium.org/p/chromium/issues/entry.do
(Reporter)

Comment 13

10 months ago
@Ben Really appreciate the help and clarification of what is a client FF & Chrome! I am taking the proverbial by talking about Chrome behaviour here :-) I know where monorail is and will enter a bug there if need be. 

Can you or Catalin explain the difference in behaviour between the 2 Firefox versions I quoted WRT the attached and revised echo.js? Is everyone else satisfied that the 3 "firefoxValium" lines I inserted should be fine and there's no way an existing Extended Message event would block a new Message event.

History tells me that pilot-error is the most likely explanation but it was only 3 lines and 3 lines that worked on Windows 7 FF 53.0 yet blocked on Android 7 FF 54.0.1.

Comment 14

10 months ago
The main difference between desktop and android is that android runs in a single process.  On desktop we are expanding the number of processes we use by default.  In particular, newer versions of firefox may end up running separate instances of the same service worker in different processes by accident.  We are working to fix this, but its a large effort and we don't have a target release yet.

The overall effort is in bug 1231208.

That being said, I don't think one extended event should block another, even in single process android.  This line should not prevent a further event from firing:

  e.waitUntil(new Promise(resolver=>{firefoxValium = resolver}));

What it could prevent, though, is an upgrade of the service worker script.  If you were in development and tried updating your script it might not finish updating until this previous promise resolved or the worker was timed out.
(Reporter)

Comment 15

10 months ago
> I don't think one extended event should block another, even in single process android

Me either, but that's what I saw :-( Anyway let's leave it till I have a SSCCE that you can run on your own kit. Unfortunately I can only do it out-of-hours and on the w/e I started caring about what the UI looked like so had a scope blow out. This week-end for sure!

If you or anyone has time to look at a fundamental Promises architecture question about the atomicity of the unwinding of Resolver Functions can you please have a look her https://stackoverflow.com/questions/45012039/javascript-promise-thenable-resolvingfunctions-release-order 

The fact that a .THEN on an already resolved promise executes immediately in-line leads me to believe that FIFO Promise delivery is impossible to guarantee. This would be catastrophic IMHO so, again, I must be wrong but . . .
Priority: -- → P2
(Reporter)

Comment 16

10 months ago
As promised/threatened here is an example that can be used to see Firefox's trigger-happy nature in action: -

Please see https://drive.google.com/open?id=0B7Rmd3Rn8_hDMmFqWjlHUG5OQzQ for what is arguably the way forward with Geolocation and Web Apps. There is a aaa_readme.txt.

Note how beautifully and EFFICIENTLY ServiceWorkers underpin background geolocation?
(Reporter)

Comment 18

8 months ago
Please be aware that a new version of the Web App is available at the same link https://drive.google.com/open?id=0B7Rmd3Rn8_hDMmFqWjlHUG5OQzQ

Doesn't add/subtract from this bug but hopefully with FF57 and fullscreen mode will look as smicko as it does on Chrome.

There is now a Summary page when you press Arrive and the trip can be mapped and replayed.

@ben *PLEASE* see the utility of the proposed background geolocation and help it get up! (Or at least tell me what's wrong with it)
You need to log in before you can comment on or make changes to this bug.