Closed Bug 1191339 Opened 9 years ago Closed 7 years ago

Launch application with ServiceWorker registered takes much longer

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(tracking-b2g:+)

RESOLVED WONTFIX
FxOS-S7 (18Sep)
tracking-b2g +

People

(Reporter: ting, Unassigned)

References

Details

(Keywords: perf)

Attachments

(1 file)

While in Paris for the NGA work week, Francisco told me they have some benchmark tests for ServiceWorker [1]. The result he is most concerned about is the empty-sw app, since it takes ~2x the startup time compared to a baseline app that does not register any ServiceWorker. Note the app registers just an empty ServiceWorker and fetches a JSON file on startup.

I captured two profiles on Aries L, one for baseline [2] and one for empty-sw [3] (I added the apps to apps/verticalhome/build/default-homescreens.json so I could launch them). I also enabled the ServiceWorker preferences "dom.serviceWorkers.enabled" and "dom.serviceWorkers.interception.enabled" and installed the ServiceWorker before profiling.

The results of running Raptor to measure both apps on Aries L:

baseline:

| Metric                                  | Mean    | Median  | Min     | Max     | StdDev | p95     |
| --------------------------------------- | ------- | ------- | ------- | ------- | ------ | ------- |
| coldlaunch.benchmark-js-file-parsed     | 111.067 | 94.500  | 70.000  | 248.000 | 40.281 | 182.000 |
| coldlaunch.benchmark-dom-content-loaded | 122.172 | 109.000 | 76.000  | 222.000 | 38.361 | 205.850 |
| coldlaunch.benchmark-load               | 144.833 | 129.000 | 96.000  | 257.000 | 45.155 | 251.000 |
| coldlaunch.benchmark-payload-loaded     | 161.567 | 142.500 | 110.000 | 290.000 | 48.242 | 285.000 |
| coldlaunch.benchmark-payload-read       | 178.900 | 159.000 | 126.000 | 297.000 | 47.796 | 292.000 |
| coldlaunch.fullyLoaded                  | 179.000 | 159.000 | 126.000 | 297.000 | 47.914 | 293.000 |

empty-sw:

| Metric                              | Mean    | Median  | Min     | Max     | StdDev | p95     |
| ----------------------------------- | ------- | ------- | ------- | ------- | ------ | ------- |
| coldlaunch.benchmark-js-file-parsed | 273.967 | 269.000 | 213.000 | 338.000 | 34.135 | 332.000 |
| coldlaunch.benchmark-load           | 275.400 | 270.500 | 214.000 | 338.000 | 34.190 | 333.000 |
| coldlaunch.benchmark-payload-loaded | 312.367 | 313.000 | 239.000 | 377.000 | 35.130 | 369.000 |
| coldlaunch.benchmark-payload-read   | 315.967 | 316.500 | 251.000 | 380.000 | 34.727 | 371.000 |
| coldlaunch.fullyLoaded              | 316.167 | 316.500 | 252.000 | 380.000 | 34.706 | 372.000 |

Versions:
Gecko: f3b757156f69
Gaia: 581de383687dc441a878d2c91a0167c6ec688fef

[1] https://github.com/azasypkin/sw-tests
[2] http://people.mozilla.org/~bgirard/cleopatra/#report=049737cc6516a32ce4de73d123c10196a199910b
[3] http://people.mozilla.org/~bgirard/cleopatra/#report=2511bcf01a1c118b6c4442edf6f6fb7cd5f381f2
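The table columns above are the usual Raptor summary statistics over repeated cold-launch samples. As a rough illustration of how such a summary is derived (this is a sketch, not Raptor's actual code; in particular the nearest-rank p95 method is an assumption):

```javascript
// Sketch of Raptor-style summary statistics over launch-time samples (ms).
// Not Raptor's actual implementation; the percentile method is an assumption.
function summarize(samples) {
  const sorted = [...samples].sort((a, b) => a - b);
  const n = sorted.length;
  const mean = sorted.reduce((sum, x) => sum + x, 0) / n;
  const median = n % 2 === 1
    ? sorted[(n - 1) / 2]
    : (sorted[n / 2 - 1] + sorted[n / 2]) / 2;
  // Population variance of the samples.
  const variance = sorted.reduce((sum, x) => sum + (x - mean) ** 2, 0) / n;
  // Nearest-rank p95: smallest value with >= 95% of samples at or below it.
  const p95 = sorted[Math.max(0, Math.ceil(0.95 * n) - 1)];
  return {
    mean,
    median,
    min: sorted[0],
    max: sorted[n - 1],
    stdDev: Math.sqrt(variance),
    p95,
  };
}
```

The large StdDev relative to the means in both tables is why Raptor reports median and p95 alongside the mean.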
(In reply to Ting-Yu Chou [:ting] from comment #0) > I captured two profiles on Aries L, one for baseline [2] and one for empty-sw [3] The profile of empty-sw shows the child process main thread idles for ~150ms before WorkerMainThreadRunnable::Run(); I'll try to figure out why. Ben, please let us know if you have any ideas, thanks.
Flags: needinfo?(bkelly)
Is it spending time spinning up the ServiceWorker thread (even though you have no fetch event handler)?
Flags: needinfo?(bkelly)
I am not sure, will figure that out. Maybe I should change the subject a bit: the problem is not whether there's a fetch event handler or not, but why the main thread is idling.
Summary: Lauch application with ServiceWorker registered takes much longer, even it is empty → Lauch application with ServiceWorker registered takes much longer
What do you mean by "idling"? Do you mean why is the ServiceWorker going away between requests? The answer to that is we haven't implemented a grace period to keep the ServiceWorker alive for a short time before killing it. We should add this to debounce the worker thread and minimize teardown/startup costs. Catalin, is this related to what you are looking at now?
Flags: needinfo?(catalin.badea392)
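The grace period Ben describes is a debounce: every event to the worker resets an idle timer, and the worker is only torn down once no event has arrived for the whole grace window. A minimal sketch of that logic (an illustration only, not Gecko's implementation; the class name and the 30s default are assumptions, with time injected so the behavior is easy to check):

```javascript
// Illustration of the grace-period/debounce idea for keeping a ServiceWorker
// alive between requests. Not Gecko's implementation; graceMs is an assumed
// value. Time is passed in explicitly rather than read from a clock.
class WorkerGracePeriod {
  constructor(graceMs = 30000) {
    this.graceMs = graceMs;
    this.lastEventAt = -Infinity;
  }

  // Called whenever a fetch (or other functional) event is dispatched
  // to the worker; this extends its lifetime.
  notifyEvent(nowMs) {
    this.lastEventAt = nowMs;
  }

  // An idle-timer callback would consult this before killing the worker
  // thread, avoiding the teardown/startup cost for back-to-back requests.
  shouldTerminate(nowMs) {
    return nowMs - this.lastEventAt > this.graceMs;
  }
}
```

With this in place, the sequence of interceptions during app startup (index.html, then app.js/dummy.js, then payload.json) would all land inside one grace window and reuse a single live worker.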
Summary: Lauch application with ServiceWorker registered takes much longer → Launch application with ServiceWorker registered takes much longer
I mean the child process main thread is idling for ~150ms during application startup.
(In reply to Ting-Yu Chou [:ting] from comment #5) > I mean the child process main thread is idling for ~150ms during application > startup. By "idling" I mean the thread is waiting in epoll_wait().
Is it in a single epoll_wait() or spread across many?
Attached patch Perf patch (Splinter Review)
I am trying to set some perf marks in the SW implementation to see where the time is spent with the attached patch, and this is the output I got so far:

I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|0.000000|1438795590256
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_CREATION|594146.488887|0.000000|1438795590258
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594146.806908|0.000000|1438795590258
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594533.544096|0.000000|1438795590645
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|389.006771|1438795590646
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594539.975867|0.000000|1438795590651
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|395.438542|1438795590652
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|0.000000|1438795590702
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_CREATION|594596.358002|0.000000|1438795590707
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594596.463887|0.000000|1438795590708
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|0.000000|1438795590708
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_REUSED|594603.246648|0.000000|1438795590714
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594603.460241|0.000000|1438795590715
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594709.592117|0.000000|1438795590821
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|118.522344|1438795590821
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594710.466908|0.000000|1438795590822
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|119.397135|1438795590822
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-js-file-parsed|584.330000|0.000000|1438795590825
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594716.858575|0.000000|1438795590828
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|119.719427|1438795590828
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|0.000000|1438795590829
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_REUSED|594719.572273|0.000000|1438795590831
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594719.824617|0.000000|1438795590831
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594720.922429|0.000000|1438795590832
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|123.783281|1438795590832
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-load|592.960000|0.000000|1438795590834
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594738.534356|0.000000|1438795590850
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|21.133646|1438795590850
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594764.198002|0.000000|1438795590875
I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|46.797292|1438795590876
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-payload-loaded|698.780000|0.000000|1438795590940
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-payload-read|715.210000|0.000000|1438795590956
I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|fullyLoaded|715.420000|0.000000|1438795590956
I/PerformanceTiming( 2228): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/img/icons/icon48x48.png|989178.313789|0.000000|1438795591069
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.501645|0.000000|1438795596515
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.593520|0.000000|1438795596515
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.647113|0.000000|1438795596515
I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.695603|0.000000|1438795596515

I am measuring the time it takes from starting the interception of a request to sending the fetch event, and to resetting the interception of the request. IIUC, the numbers show the following:

- We take ~390ms to get to the interception reset for the first resource (index.html).
- We take ~125ms for the following resources (app.js and dummy.js).
- We take ~45ms for the last resource (payload.json).

As you can see in the log, the requests for app.js and dummy.js differ in that we create a WorkerPrivate for app.js but reuse it for dummy.js, yet the times are quite similar. The difference between the times of the app.js/dummy.js requests and the payload.json request may be because payload.json is requested via the Fetch API (?)
Attachment #8643802 - Attachment is patch: true
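The TIME_FROM_INTERCEPTION_TO_FETCH measures above can also be recomputed directly from the mark lines, since each entry has the form `Gecko|mark|NAME_url|startTime|duration|epoch`. A quick sketch of that computation (an illustrative helper, not part of the patch):

```javascript
// Recompute interception -> fetch-event latency per URL from PerformanceTiming
// log lines of the form "... Gecko|mark|NAME_app://host/path|start|dur|epoch".
// Illustrative helper only, not part of the attached patch.
function interceptionToFetch(lines) {
  const starts = new Map();  // url -> INTERCEPTION_START time
  const deltas = new Map();  // url -> fetch-event dispatch latency (ms)
  for (const line of lines) {
    // Capture the mark name, the intercepted URL, and the start timestamp.
    const m = line.match(/Gecko\|mark\|([A-Z_]+)_(app:\/\/[^|]+)\|([\d.]+)\|/);
    if (!m) continue; // skip measures and marks without a URL suffix
    const [, name, url, start] = m;
    if (name === "INTERCEPTION_START") {
      starts.set(url, parseFloat(start));
    } else if (name === "DISPATCH_FETCH_EVENT" && starts.has(url)) {
      deltas.set(url, parseFloat(start) - starts.get(url));
    }
  }
  return deltas;
}
```

Run against the index.html pair in the log above, this reproduces the ~389ms figure reported by the TIME_FROM_INTERCEPTION_TO_FETCH measure.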
BTW I ran the tests on a Flame device
(In reply to Fernando Jiménez Moreno [:ferjm] (PTO July 27th - July 31th and Aug10th - Aug 24th) from comment #8) > As you can see on the log, the requests for app.js and dummy.js are > different in a way that we create a worker private for app.js but we reuse > it for dummy.js, but still the times are quite similar. > > The difference between the times of the app.js and dummy.js and payload.json > requests may be because payload.json is requested via the Fetch API (?) This is probably because dummy.js starts at almost the same time as app.js. It reuses the WorkerPrivate, but the worker is still not started yet. In contrast, payload.json starts just before app.js and dummy.js complete. So it does benefit from the WorkerPrivate thread being fully spun up.
If the WorkerPrivate thread from handling index.html interception was still alive when app.js was triggered, then it seems we would save ~100ms.
(In reply to Ben Kelly [:bkelly] from comment #4) > What do you mean by "idling"? Do you mean why is the ServiceWorker going > away between requests? > > The answer to that is we haven't implemented a grace period to keep the > ServiceWorker alive for a short time before killing it. We should add this > to debounce the worker thread and minimize teardown/startup costs. > > Catalin, is this related to what you are looking at now? From what I can tell, there are two problems here: 1. We're not reusing the service worker to handle sequences of fetch events. 2. We dispatch fetch events to a service worker that doesn't have a fetch event handler. My patches from bug 1188545 should fix the first problem - the service worker will live at least ~30 seconds after each event. At the moment, each fetch event holds a reference to the service worker that keeps it alive (which in turn keeps the WorkerPrivate alive) until the fetch handler was executed. Thus, having a very fast succession of fetches should reuse the WorkerPrivate. I suspect the reason the first fetch is slower than the rest in Fernando's results is because that's the only time the worker thread is actually created. Gecko keeps threads around for a short amount of time (I think 10s) and reuses them when creating a new WorkerPrivate.
Flags: needinfo?(catalin.badea392)
(In reply to Ting-Yu Chou [:ting] from comment #1) > (In reply to Ting-Yu Chou [:ting] from comment #0) > > I captured two profiles on Aries L, one for baseline [2] and one for empty-sw [3] > > The profile of empty-sw shows child process main thread idles for ~150ms > before WorkerMainThreadRunnable::Run(), I'll try to figure out why. > > Ben, please let us know if you have any ideas, thanks. Is that the WorkerMainThreadRunnable that resets the channel for index.html?
Flags: needinfo?(janus926)
(In reply to Cătălin Badea (:catalinb) from comment #12) > 2. We dispatch fetch events to a service worker that doesn't have a fetch > event handler. This is not something we can address ourselves. We agreed to add a disableFetch() call on InstallEvent to opt-out of fetch event processing. We haven't implemented that yet, though. > My patches from bug 1188545 should fix the first problem - the service > worker will live at least ~30 seconds > after each event. Awesome! > At the moment, each fetch event holds a reference to the service worker that > keeps it alive (which in turn keeps the WorkerPrivate alive) until the fetch > handler was executed. Thus, having a very fast succession of fetches should > reuse the WorkerPrivate. I suspect the reason the first fetch is slower than > the rest in Fernando's results is because that's the only time the worker > thread is actually created. If there is no overlap in the FetchEvent, then the ServiceWorker ref drops to zero. > Gecko keeps threads around for a short amount of time (I think 10s) and > reuses them when creating a new WorkerPrivate. Keeps threads alive, but it still starts shutting down the javascript running inside of it immediately when ref count of the DOM object reaches zero as far as I know. Loading, parsing, and executing the javascript is probably heavier weight than actually spawning an OS thread.
(In reply to Cătălin Badea (:catalinb) from comment #13) > Is that the WorkerMainThreadRunnable that resets the channel for index.html? It is CompileScriptRunnable that takes time. The child process main thread is mostly idling between nsJARChannel::AsyncOpen() intercepting index.html and FetchEventRunnable calling nsJARChannel::ResetInterception(), and the CompileScriptRunnable for sw.js is processed before the FetchEventRunnable. There are also two problems here: - Why does CompileScriptRunnable take time for an empty script? - Can we compile the service worker script ahead of time (AOT), e.g., via the startup cache?
Flags: needinfo?(janus926)
(In reply to Ting-Yu Chou [:ting] from comment #15) > (In reply to Cătălin Badea (:catalinb) from comment #13) > > Is that the WorkerMainThreadRunnable that resets the channel for index.html? > > It is CompileScriptRunnable that takes time. > > Child process main thread is mostly idling between nsJARChannel::AsyncOpen() > intercepting index.html and FetchEventRunnable calling > nsJARChannel::ResetInterception(), and the CompileScriptRunnable for sw.js > is processed before FetchEventRunnable. > > There're also two problems here: > > - Why does CompileScriptRunnable take time for an empty script? > - Can we compile the service worker script ahead of time (AOT), e.g., > startup cache? For JS scripts in general I was hoping to be able to use some of the work in bug 1164389. So far the patches there do not support Workers, but it should not be too hard to add in dom/workers/ScriptLoader.cpp (iirc).
(In reply to Ting-Yu Chou [:ting] from comment #15) > - Why does CompileScriptRunnable take time for an empty script? It takes time from ScriptLoaderRunnable::RunInternal() calling CacheCreator::Load() until CacheCreator::ResolvedCallback() is called. I'll take a deeper look.
The delay comes from two runnables, PrincipalVerifier and QuotaInitRunnable, which are both dispatched to, and left pending on, the chrome process main thread; as a result, ScriptLoaderRunnable receives CacheCreator::ResolvedCallback() late in the content process. I added profiler markers (PVDispatch, PVRun, QIRDispatch, QIRRun) before dispatching each runnable and when it runs, and the profile [1] shows the B2G process main thread was busy working in:

1. awf_launch() handling the webapps-launch event in the system app (~15ms),
2. refresh driver tick #1 (~50ms),
3. awm_switchApp() switching the app to foreground/background in the system app (~25ms),
4. refresh driver tick #2 (~80ms).

There's a bug 1110625 for 2 and 4.

[1] http://people.mozilla.org/~bgirard/cleopatra/#report=8a3f722ae0eb9733943fc52208878e4b29a115d4
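The Dispatch/Run marker pairs (PVDispatch/PVRun, QIRDispatch/QIRRun) measure how long each runnable sat queued behind the busy main thread: simply the Run timestamp minus the Dispatch timestamp. A small sketch of that pairing (the marker names follow this comment; the helper itself is hypothetical):

```javascript
// Pair up "<Name>Dispatch"/"<Name>Run" profiler markers and report how long
// each runnable waited in the target thread's event queue. Marker names
// mirror the PVDispatch/PVRun and QIRDispatch/QIRRun markers described
// above; the helper itself is illustrative, not Gecko code.
function queueLatencies(markers) {
  const dispatched = new Map(); // runnable name -> dispatch timestamp
  const latencies = {};         // runnable name -> queue wait (ms)
  for (const { name, time } of markers) {
    if (name.endsWith("Dispatch")) {
      dispatched.set(name.slice(0, -"Dispatch".length), time);
    } else if (name.endsWith("Run")) {
      const key = name.slice(0, -"Run".length);
      if (dispatched.has(key)) {
        latencies[key] = time - dispatched.get(key);
      }
    }
  }
  return latencies;
}
```

Large latencies here point at main-thread contention (the refresh-driver ticks and system-app work listed above) rather than at the runnables themselves.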
Depends on: 1110625
(In reply to Ting-Yu Chou [:ting] from comment #18) > 1. awf_launch() handling webapps-launch event in system app (~15ms), > 2. refresh driver tick #1 (50ms), > 3. awm_switchApp() to switch app to foreground/background in system app > (~25ms), > 4. refresh driver tick #2 (80ms) We could try to collapse the Cache main thread runnables into a single runnable, but it seems like it will still get delayed on these operations. Unfortunately I don't think there is any way to avoid main thread when opening an origin in QuotaManager right now.
Do we always need Cache to be ready before loading service worker script?
(In reply to Ting-Yu Chou [:ting] from comment #20) > Do we always need Cache to be ready before loading service worker script? Currently the service worker scripts are stored in the Cache API, so yes we do. We chose Cache API partly because it was convenient (didn't need to get http cache pinning implemented) and partly because the scripts were exposed in the spec via a Cache object. I believe they are no longer exposed in the spec, however, so we could stop using Cache API in theory. This would be a pretty large change, though.
Also, we're talking about moving other persistent data like SW registrations into quota manager. So they would also grow a "main thread runnable at startup" requirement. Jan, does your work to switch QM over to PBackground reduce the need to go to main thread to initialize origin storage in QM?
Flags: needinfo?(Jan.Varga)
(In reply to Ben Kelly [:bkelly] from comment #22) > Also, we're talking about moving other persistent data like SW registrations > into quota manager. So they would also grow a "main thread runnable at > startup" requirement. > > Jan, does your work to switch QM over to PBackground reduce the need to go > to main thread to initialize origin storage in QM? Once quota manager is on PBackground thread, the main thread will be only used occasionally. For example the first call to QuotaManager::GetOrCreate(nsIRunnable* aCallback) will have to dispatch a runnable to main thread to get profile directory and preferences.
Flags: needinfo?(Jan.Varga)
Aleh, bug 1196604 has landed; do you see any improvements from it?
Flags: needinfo?(azasypkin)
(In reply to Ting-Yu Chou [:ting] from comment #24) > Aleh, bug 1196604 is landed, do you see any improvements from that? Mmm, I've just re-run the tests I use to measure SW performance. I see that the app with a registered SW loads a bit faster now (it takes ~50ms less for the first app performance mark). See the latest numbers [1] and compare with the previous ones [2]. The only thing that bothers me a bit is that the initial numbers [3] look even better than both [1] and [2]... Not sure what has changed. [1] https://github.com/nga-crash-tests/service-worker-benchmarks#24.08.2015 [2] https://github.com/nga-crash-tests/service-worker-benchmarks#12.08.2015 [3] https://github.com/nga-crash-tests/service-worker-benchmarks#15.07.2015
Flags: needinfo?(azasypkin)
Target Milestone: --- → FxOS-S7 (18Sep)
tracking-b2g: --- → +
Keywords: perf
Priority: -- → P1
Bug 961049 just landed, switching QuotaManager to PBackground.
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX