Open Bug 1203274 Opened 9 years ago Updated 1 year ago

Service worker wakeup and lifetime telemetry

Categories

(Core :: DOM: Service Workers, task, P3)

task

Tracking

()

Tracking Status
firefox43 --- affected

People

(Reporter: nsm, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tw-dom])

We should log the following:
1) How long a SW that has woken up gets to run before it is killed.
 a) The reason it was killed - timeout, idle timeout, or stopped controlling
2) Every time an event is sent to the worker (what kind of event) and whether the worker was spawned for this or an existing one was used.
  It might be better to not log "fetch" since it will happen frequently and is a hot path, but we may want to log some proxy like how often a SW controls documents or how often requests get actually intercepted by the SW
3) The time SWs are usually taking between the event being fired and either the event dispatch finishing, or, if waitUntil() was called, until the promise(s) resolve(s).
Whiteboard: [tw-dom]
I'd like to work on this!
Assignee: nobody → ttung
Status: NEW → ASSIGNED
(In reply to Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?) from comment #0)
Hi Ben,

I've put my notes on comment #0 below. I'd like to ask for your quick feedback before I start implementing it. Also, please correct me if anything is wrong. Thanks!

> We should log the following:
> 1) How long a SW that has woken up gets to run before it is killed.
In the normal case, this is the time between state_active and state_redundant for a SWInfo. Should we also consider cases where a SW is killed because of a failure or exception?

>  a) The reason it was killed - timeout, idle timeout, or stopped controlling
Add a counter for each of them.
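
To make the bookkeeping for 1) and 1a) concrete, here is a rough sketch in Python. It is only an illustration, not Gecko code: the class, method names, and the caller-supplied millisecond timestamps are all hypothetical, and the three kill reasons are simply the ones listed in comment #0.

```python
import enum
from collections import Counter


class KillReason(enum.Enum):
    # The three reasons named in comment #0.
    TIMEOUT = "timeout"
    IDLE_TIMEOUT = "idle_timeout"
    STOPPED_CONTROLLING = "stopped_controlling"


class LifetimeTelemetry:
    """Hypothetical recorder: one lifetime sample and one reason count per kill."""

    def __init__(self):
        self.kill_counts = Counter()   # KillReason -> number of kills
        self.lifetimes_ms = []         # run time of each woken-up worker
        self._woken_at = {}            # worker id -> wakeup timestamp (ms)

    def on_wakeup(self, worker_id, now_ms):
        self._woken_at[worker_id] = now_ms

    def on_killed(self, worker_id, reason, now_ms):
        started = self._woken_at.pop(worker_id, None)
        if started is None:
            # No matching wakeup was seen; skip rather than record a bogus sample.
            return
        self.lifetimes_ms.append(now_ms - started)
        self.kill_counts[reason] += 1
```

A kill for a failure or exception, if we decide to track it, would just be one more KillReason member.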

> 2) Every time an event is sent to the worker (what kind of event) and
> whether the worker was spawned for this or an existing one was used.
>   It might be better to not log "fetch" since it will happen frequently and
> is a hot path, but we may want to log some proxy like how often a SW
> controls documents or how often requests get actually intercepted by the SW

I need to get:
- The frequency of each type of event.
Add a counter for each event type (excluding the fetch event).

- How often does aNewWorkerCreated come back true from SpawnWorkerIfNeeded()?
Add counters to get num_of_aNewWorkerCreated_true / num_of_calling_SpawnWorkerIfNeeded

- How often do requests actually get intercepted by the SW?
Add counters to get num_of_intercepted_request / num_of_request
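
The three measurements above could be kept with plain counters, roughly like this. Again this is a Python sketch for illustration only: EventTelemetry and its methods are hypothetical names, and it assumes the spawn check is counted for every event (including fetch) while the per-type counter skips fetch.

```python
from collections import Counter


class EventTelemetry:
    """Hypothetical counters for event frequency, spawn ratio, and interception ratio."""

    def __init__(self):
        self.event_counts = Counter()  # event type -> count, fetch excluded
        self.spawn_calls = 0           # times the spawn check ran
        self.new_workers = 0           # times a new worker had to be created
        self.requests = 0              # requests seen
        self.intercepted = 0           # requests actually intercepted by the SW

    def on_event(self, event_type, new_worker_created):
        self.spawn_calls += 1
        if new_worker_created:
            self.new_workers += 1
        if event_type != "fetch":      # keep per-type counting off the hot path
            self.event_counts[event_type] += 1

    def on_request(self, was_intercepted):
        self.requests += 1
        if was_intercepted:
            self.intercepted += 1

    def spawn_ratio(self):
        # Fraction of spawn checks that created a new worker.
        return self.new_workers / self.spawn_calls if self.spawn_calls else 0.0

    def interception_ratio(self):
        # Fraction of requests that were intercepted.
        return self.intercepted / self.requests if self.requests else 0.0
```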

> 3) The time SWs are usually taking between the event being fired and either
> the event dispatch finishing, or, if waitUntil() was called, until the
> promise(s) resolve(s).

If we collect this data, we may not need separate counters for each event type: the number of dispatch-time samples recorded for each kind of event already gives us that count.
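
A small Python sketch of that idea, for illustration only (DispatchDurations and its methods are hypothetical, and the caller is assumed to pass the time the event was fired and the time dispatch finished or the waitUntil() promise(s) settled):

```python
from collections import defaultdict


class DispatchDurations:
    """Hypothetical per-event-type duration samples; the sample count doubles as the event count."""

    def __init__(self):
        self._samples = defaultdict(list)  # event type -> list of durations (ms)

    def record(self, event_type, fired_ms, settled_ms):
        # settled_ms: dispatch finished, or the waitUntil() promise(s) settled.
        self._samples[event_type].append(settled_ms - fired_ms)

    def count(self, event_type):
        # One sample per dispatched event, so no separate counter is needed.
        return len(self._samples.get(event_type, []))

    def mean_ms(self, event_type):
        samples = self._samples.get(event_type)
        if not samples:
            return None
        return sum(samples) / len(samples)
```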
Flags: needinfo?(bkelly)
I need to think about this a bit.  I'm not sure comment 0 is completely up-to-date any more, but there are some questions it would be useful to answer with this type of data.  I think we should try to define these questions first and then figure out exactly what data we need to answer them.
Assignee: ttung → nobody
Status: ASSIGNED → NEW
(In reply to Ben Kelly [:bkelly] from comment #4)
> I need to think about this a bit.  I'm not sure comment 0 is completely
> up-to-date any more, but there are some questions it would be useful to
> answer with this type of data.  I think we should try to define these
> questions first and then figure out exactly what data we need to answer them.

Think it would be worthwhile to discuss this in our next team meeting?
Priority: -- → P3
I never got a chance to look at the telemetry again.  I do think it is important to have probes to make sure things are working as intended, though.
Flags: needinfo?(bkelly)
Type: defect → task
Severity: normal → S3