Launch application with ServiceWorker registered takes much longer

NEW
Unassigned

Status

Firefox OS
Performance
P1
normal
3 years ago
9 months ago

People

(Reporter: ting, Unassigned)

Tracking

(Depends on: 1 bug, Blocks: 1 bug, {perf})

unspecified
FxOS-S7 (18Sep)
ARM
Gonk (Firefox OS)
Dependency tree / graph

Firefox Tracking Flags

(tracking-b2g:+)

Details

Attachments

(1 attachment)

(Reporter)

Description

3 years ago
While in Paris NGA work week, Francisco told me they have some benchmark tests for ServiceWorker [1]. And the result he most concerns is the app empty-sw, since it takes ~2x startup time comapre to baseline which does not register any. Note the app registers just an empty ServiceWorker, and fetch a json file when startup.

I cpatured two profiles on Aries L, one for baseline [2] and one for empty-sw [3] (add the apps to apps/verticalhome/build/default-homescreens.json so I can launch them). I also enabled ServiceWorker preferences "dom.serviceWorkers.enabled", "dom.serviceWorkers.interception.enabled" and installed the ServiceWorker before profiling.

The results of running Raptor to measure both apps on Aries L:

  baseline:
    | Metric                                  | Mean    | Median  | Min     | Max     | StdDev | p95     |
    | --------------------------------------- | ------- | ------- | ------- | ------- | ------ | ------- |
    | coldlaunch.benchmark-js-file-parsed     | 111.067 | 94.500  | 70.000  | 248.000 | 40.281 | 182.000 |
    | coldlaunch.benchmark-dom-content-loaded | 122.172 | 109.000 | 76.000  | 222.000 | 38.361 | 205.850 |
    | coldlaunch.benchmark-load               | 144.833 | 129.000 | 96.000  | 257.000 | 45.155 | 251.000 |
    | coldlaunch.benchmark-payload-loaded     | 161.567 | 142.500 | 110.000 | 290.000 | 48.242 | 285.000 |
    | coldlaunch.benchmark-payload-read       | 178.900 | 159.000 | 126.000 | 297.000 | 47.796 | 292.000 |
    | coldlaunch.fullyLoaded                  | 179.000 | 159.000 | 126.000 | 297.000 | 47.914 | 293.000 |

  empty-sw:
    | Metric                              | Mean    | Median  | Min     | Max     | StdDev | p95     |
    | ----------------------------------- | ------- | ------- | ------- | ------- | ------ | ------- |
    | coldlaunch.benchmark-js-file-parsed | 273.967 | 269.000 | 213.000 | 338.000 | 34.135 | 332.000 |
    | coldlaunch.benchmark-load           | 275.400 | 270.500 | 214.000 | 338.000 | 34.190 | 333.000 |
    | coldlaunch.benchmark-payload-loaded | 312.367 | 313.000 | 239.000 | 377.000 | 35.130 | 369.000 |
    | coldlaunch.benchmark-payload-read   | 315.967 | 316.500 | 251.000 | 380.000 | 34.727 | 371.000 |
    | coldlaunch.fullyLoaded              | 316.167 | 316.500 | 252.000 | 380.000 | 34.706 | 372.000 |

Versions:
  Gecko: f3b757156f69
  Gaia: 581de383687dc441a878d2c91a0167c6ec688fef

[1] https://github.com/azasypkin/sw-tests
[2] http://people.mozilla.org/~bgirard/cleopatra/#report=049737cc6516a32ce4de73d123c10196a199910b
[3] http://people.mozilla.org/~bgirard/cleopatra/#report=2511bcf01a1c118b6c4442edf6f6fb7cd5f381f2
(Reporter)

Comment 1

3 years ago
(In reply to Ting-Yu Chou [:ting] from comment #0)
> I cpatured two profiles on Aries L, one for baseline [2] and one for empty-sw [3]

The profile of empty-sw shows child process main thread idles for ~150ms before WorkerMainThreadRunnable::Run(), I'll try to figure out why.

Ben, please let us know if you have any ideas, thanks.
Flags: needinfo?(bkelly)

Comment 2

3 years ago
Is it spending time spinning up the ServiceWorker thread (even though you have no fetch event handler)?
Flags: needinfo?(bkelly)
(Reporter)

Comment 3

3 years ago
I am not sure, will figure that out.

Maybe I should change the subject a bit, the problem is not about whether there's a fetch event handler or not, but why it's idling.
Summary: Lauch application with ServiceWorker registered takes much longer, even it is empty → Lauch application with ServiceWorker registered takes much longer

Comment 4

3 years ago
What do you mean by "idling"?  Do you mean why is the ServiceWorker going away between requests?

The answer to that is we haven't implemented a grace period to keep the ServiceWorker alive for a short time before killing it.  We should add this to debounce the worker thread and minimize teardown/startup costs.

Catalin, is this related to what you are looking at now?
Flags: needinfo?(catalin.badea392)
Summary: Lauch application with ServiceWorker registered takes much longer → Launch application with ServiceWorker registered takes much longer
(Reporter)

Comment 5

3 years ago
I mean the child process main thread is idling for ~150ms during application startup.
(Reporter)

Comment 6

3 years ago
(In reply to Ting-Yu Chou [:ting] from comment #5)
> I mean the child process main thread is idling for ~150ms during application
> startup.

"idling" is waiting in epoll_wait().

Comment 7

3 years ago
Is it in a single epoll_wait() or spread across many?
Created attachment 8643802 [details] [diff] [review]
Perf patch

I am trying to set some perf marks on the SW implementation to see where the time is spent with the attached patch and this is the output that I got so far:

I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|0.000000|1438795590256
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_CREATION|594146.488887|0.000000|1438795590258
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594146.806908|0.000000|1438795590258
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594533.544096|0.000000|1438795590645
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|389.006771|1438795590646
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594539.975867|0.000000|1438795590651
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|395.438542|1438795590652
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|0.000000|1438795590702
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_CREATION|594596.358002|0.000000|1438795590707
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594596.463887|0.000000|1438795590708
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|0.000000|1438795590708
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_REUSED|594603.246648|0.000000|1438795590714
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594603.460241|0.000000|1438795590715
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594709.592117|0.000000|1438795590821
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|118.522344|1438795590821
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594710.466908|0.000000|1438795590822
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|119.397135|1438795590822
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-js-file-parsed|584.330000|0.000000|1438795590825
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594716.858575|0.000000|1438795590828
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|119.719427|1438795590828
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|0.000000|1438795590829
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_REUSED|594719.572273|0.000000|1438795590831
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594719.824617|0.000000|1438795590831
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594720.922429|0.000000|1438795590832
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|123.783281|1438795590832
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-load|592.960000|0.000000|1438795590834
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594738.534356|0.000000|1438795590850
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|21.133646|1438795590850
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594764.198002|0.000000|1438795590875
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|46.797292|1438795590876
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-payload-loaded|698.780000|0.000000|1438795590940
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-payload-read|715.210000|0.000000|1438795590956
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|fullyLoaded|715.420000|0.000000|1438795590956
I/PerformanceTiming( 2228): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/img/icons/icon48x48.png|989178.313789|0.000000|1438795591069
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.501645|0.000000|1438795596515
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.593520|0.000000|1438795596515
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.647113|0.000000|1438795596515
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.695603|0.000000|1438795596515

I am measuring the time that it takes from starting the interception of a request to sending the fetch event and to resetting the interception of the request. IIUC, the numbers show the following:

- We take ~390ms to get to the interception reset for the first resource (index.html).
- We take ~125ms for the following resources (app.js and dummy.js).
- We take ~45ms for the last resource (payload.json).

As you can see on the log, the requests for app.js and dummy.js are different in a way that we create a worker private for app.js but we reuse it for dummy.js, but still the times are quite similar.

The difference between the times of the app.js and dummy.js and payload.json requests may be because payload.json is requested via the Fetch API (?)
Attachment #8643802 - Attachment is patch: true
BTW I run the tests on a Flame device

Comment 10

3 years ago
(In reply to Fernando Jiménez Moreno [:ferjm] (PTO July 27th - July 31th and Aug10th - Aug 24th) from comment #8)
> As you can see on the log, the requests for app.js and dummy.js are
> different in a way that we create a worker private for app.js but we reuse
> it for dummy.js, but still the times are quite similar.
> 
> The difference between the times of the app.js and dummy.js and payload.json
> requests may be because payload.json is requested via the Fetch API (?)

This is probably because dummy.js starts at almost the same time as app.js.  It reuses the WorkerPrivate, but the worker is still not started yet.

In contrast, payload.json starts just before app.js and dummy.js complete.  So it does benefit from the WorkerPrivate thread being fully spun up.

Comment 11

3 years ago
If the WorkerPrivate thread from handling index.html interception was still alive when app.js was triggered, then it seems we would save ~100ms.
(In reply to Ben Kelly [:bkelly] from comment #4)
> What do you mean by "idling"?  Do you mean why is the ServiceWorker going
> away between requests?
> 
> The answer to that is we haven't implemented a grace period to keep the
> ServiceWorker alive for a short time before killing it.  We should add this
> to debounce the worker thread and minimize teardown/startup costs.
> 
> Catalin, is this related to what you are looking at now?

From what I can tell, there are two problems here:
1. We're not reusing the service worker to handle sequences of fetch events.
2. We dispatch fetch events to a service worker that doesn't have a fetch event handler.

My patches from bug 1188545 should fix the first problem - the service worker will live at least ~30 seconds
after each event.

At the moment, each fetch event holds a reference to the service worker that keeps it alive (which in turn keeps the WorkerPrivate alive) until the fetch handler was executed. Thus, having a very fast succession of fetches should reuse the WorkerPrivate. I suspect the reason the first fetch is slower than the rest in Fernando's results is because that's the only time the worker thread is actually created.
Gecko keeps threads around for a short amount of time (I think 10s) and reuses them when creating a new WorkerPrivate.
Flags: needinfo?(catalin.badea392)
(In reply to Ting-Yu Chou [:ting] from comment #1)
> (In reply to Ting-Yu Chou [:ting] from comment #0)
> > I cpatured two profiles on Aries L, one for baseline [2] and one for empty-sw [3]
> 
> The profile of empty-sw shows child process main thread idles for ~150ms
> before WorkerMainThreadRunnable::Run(), I'll try to figure out why.
> 
> Ben, please let us know if you have any ideas, thanks.

Is that the WorkerMainThreadRunnable that resets the channel for index.html?
Flags: needinfo?(janus926)

Comment 14

3 years ago
(In reply to Cătălin Badea (:catalinb) from comment #12)
> 2. We dispatch fetch events to a service worker that doesn't have a fetch
> event handler.

This is not something we can address ourselves.  We agreed to add a disableFetch() call on InstallEvent to opt-out of fetch event processing.  We haven't implemented that yet, though.

> My patches from bug 1188545 should fix the first problem - the service
> worker will live at least ~30 seconds
> after each event.

Awesome!

> At the moment, each fetch event holds a reference to the service worker that
> keeps it alive (which in turn keeps the WorkerPrivate alive) until the fetch
> handler was executed. Thus, having a very fast succession of fetches should
> reuse the WorkerPrivate. I suspect the reason the first fetch is slower than
> the rest in Fernando's results is because that's the only time the worker
> thread is actually created.

If there is no overlap in the FetchEvent, then the ServiceWorker ref drops to zero.

> Gecko keeps threads around for a short amount of time (I think 10s) and
> reuses them when creating a new WorkerPrivate.

Keeps threads alive, but it still starts shutting down the javascript running inside of it immediately when ref count of the DOM object reaches zero as far as I know.  Loading, parsing, and executing the javascript is probably heavier weight than actually spawning an OS thread.
(Reporter)

Comment 15

3 years ago
(In reply to Cătălin Badea (:catalinb) from comment #13)
> Is that the WorkerMainThreadRunnable that resets the channel for index.html?

It is CompileScriptRunnable that takes time.

Child process main thread is mostly idling between nsJARChannel::AsyncOpen() intercepting index.html and FetchEventRunnable calling nsJARChannel::ResetInterception(), and the CompileScriptRunnable for sw.js is processed before FetchEventRunnable.

There're also two problems here:

  - Why does CompileScriptRunnable take time for an empty script?
  - Can we compile the service worker script ahead of time (AOT), e.g., startup cache?
Flags: needinfo?(janus926)
(Reporter)

Updated

3 years ago
Blocks: 1168078
(In reply to Ting-Yu Chou [:ting] from comment #15)
> (In reply to Cătălin Badea (:catalinb) from comment #13)
> > Is that the WorkerMainThreadRunnable that resets the channel for index.html?
> 
> It is CompileScriptRunnable that takes time.
> 
> Child process main thread is mostly idling between nsJARChannel::AsyncOpen()
> intercepting index.html and FetchEventRunnable calling
> nsJARChannel::ResetInterception(), and the CompileScriptRunnable for sw.js
> is processed before FetchEventRunnable.
> 
> There're also two problems here:
> 
>   - Why does CompileScriptRunnable take time for an empty script?
>   - Can we compile the service worker script ahead of time (AOT), e.g.,
> startup cache?

For the JS scripts in general I was hoping to be able to use some of the work in bug 1164389. So far the patches there does not support Workers but it should not be too hard to add in dom/workers/ScriptLoader.cpp (irrc).
(Reporter)

Comment 17

3 years ago
(In reply to Ting-Yu Chou [:ting] from comment #15)
>   - Why does CompileScriptRunnable take time for an empty script?

It takes time from ScriptLoaderRunnable::RunInternal() calling CacheCreator::Load() til CacheCreator::ResolvedCallback() is called. I'll take a deeper look.
(Reporter)

Comment 18

3 years ago
The dealy is from two runnables which both dispatch to and pending on chrome process main thread: PrincipalVerifier and QuotaInitRunnable, thus ScripLoaderRunnable receives CacheCreator::ResolvedCallback() late in content process.

I added profiler markers (PVDispatch, PVRun, QIRDispatch, QIRRun) before dispatching the runnable and when it runs, the profile [1] shows B2G process main thread was busy working in:

  1. awf_launch() handling webapps-launch event in system app (~15ms),
  2. refresh driver tick #1 (50ms),
  3. awm_switchApp() to switch app to foreground/background in system app (~25ms),
  4. refresh driver tick #2 (80ms)

There's a bug 1110625 for 2 and 4.

[1] http://people.mozilla.org/~bgirard/cleopatra/#report=8a3f722ae0eb9733943fc52208878e4b29a115d4
Depends on: 1110625

Comment 19

3 years ago
(In reply to Ting-Yu Chou [:ting] from comment #18)
>   1. awf_launch() handling webapps-launch event in system app (~15ms),
>   2. refresh driver tick #1 (50ms),
>   3. awm_switchApp() to switch app to foreground/background in system app
> (~25ms),
>   4. refresh driver tick #2 (80ms)

We could try to collapse the Cache main thread runnables into a single runnable, but it seems like it will still get delayed on these operations.

Unfortunately I don't think there is any way to avoid main thread when opening an origin in QuotaManager right now.
(Reporter)

Comment 20

3 years ago
Do we always need Cache to be ready before loading service worker script?

Comment 21

3 years ago
(In reply to Ting-Yu Chou [:ting] from comment #20)
> Do we always need Cache to be ready before loading service worker script?

Currently the service worker scripts are stored in the Cache API, so yes we do.

We chose Cache API partly because it was convenient (didn't need to get http cache pinning implemented) and partly because the scripts were exposed in the spec via a Cache object.  I believe they are no longer exposed in the spec, however, so we could stop using Cache API in theory.  This would be a pretty large change, though.

Comment 22

3 years ago
Also, we're talking about moving other persistent data like SW registrations into quota manager.  So they would also grow a "main thread runnable at startup" requirement.

Jan, does your work to switch QM over to PBackground reduce the need to go to main thread to initialize origin storage in QM?
Flags: needinfo?(Jan.Varga)

Comment 23

3 years ago
(In reply to Ben Kelly [:bkelly] from comment #22)
> Also, we're talking about moving other persistent data like SW registrations
> into quota manager.  So they would also grow a "main thread runnable at
> startup" requirement.
> 
> Jan, does your work to switch QM over to PBackground reduce the need to go
> to main thread to initialize origin storage in QM?

Once quota manager is on PBackground thread, the main thread will be only used occasionally.
For example the first call to QuotaManager::GetOrCreate(nsIRunnable* aCallback) will have to dispatch a runnable to main thread to get profile directory and preferences.
Flags: needinfo?(Jan.Varga)
(Reporter)

Comment 24

3 years ago
Aleh, bug 1196604 is landed, do you see any improvements from that?
Flags: needinfo?(azasypkin)
(In reply to Ting-Yu Chou [:ting] from comment #24)
> Aleh, bug 1196604 is landed, do you see any improvements from that?

Mmm, I've just re-run tests I use to measure SW performance. I see that app with registered SW loads a bit faster now (it takes ~50ms less for the first app performance mark).  

See the latest numbers [1] and compare with the previous ones [2].

Only thing that bothers me a bit is the initial numbers [3] look even better than both [2] and [3]... Not sure what has changed.

[1] https://github.com/nga-crash-tests/service-worker-benchmarks#24.08.2015
[2] https://github.com/nga-crash-tests/service-worker-benchmarks#12.08.2015
[3] https://github.com/nga-crash-tests/service-worker-benchmarks#15.07.2015
Flags: needinfo?(azasypkin)
Target Milestone: --- → FxOS-S7 (18Sep)
Depends on: 1188545

Updated

2 years ago
tracking-b2g: --- → +
Keywords: perf
Priority: -- → P1
(Reporter)

Comment 26

2 years ago
Bug 961049 was just landed to switch QuotaManager to PBackground.
You need to log in before you can comment on or make changes to this bug.