Closed Bug 1335900 Opened 8 years ago Closed 5 years ago

Enabling service worker slows down time-to-interactive on dropbox.com by 600ms

Categories

(Core :: DOM: Service Workers, defect, P3)

Tracking

RESOLVED WORKSFORME
Tracking Status
platform-rel --- +

People

(Reporter: dzbarsky, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [platform-rel-Dropbox])

Attachments

(1 file)

Two test accounts:
dzbarsky+test_sw@gmail.com (service worker enabled) and dzbarsky+test_sw_off@gmail.com (service worker disabled)
Both have password 123456

Log in and go to https://www.dropbox.com/home
Once it finishes loading, you can do "require.s.contexts['embedded-app'].require('modules/clean/web_timing_logger').time_to_interactive - performance.timing.navigationStart" in the console to figure out when we mark the app as interactive. The timing is pretty noisy, but aggregate telemetry numbers from users in the wild show a 600ms difference on the browse page.

You may be able to reproduce this on a simpler page: load https://www.dropbox.com/profiling/maestro_blank_embedded_app and run the same command in the console.
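For convenience, the same measurement can be written as a short console snippet (identical expression to the one above, just split up for readability):

  const timingLogger = require.s.contexts['embedded-app']
      .require('modules/clean/web_timing_logger');
  console.log(timingLogger.time_to_interactive - performance.timing.navigationStart);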
Catalin is going to help here (he's busy this week, though).
Assignee: nobody → catalin.badea392
From what I can see, the registered service worker doesn't serve any cached resources and will just fall back to the network, at least for https://www.dropbox.com/profiling/maestro_blank_embedded_app.

I've measured the time it takes from when we decide to intercept the request until we reset the channel, which should be the overhead introduced by going through the service worker. We spend most of the time waiting for the ResumeRequest runnable to be executed on the main thread. Not using the throttled event queue reduces this time considerably.

The first dispatch to the service worker takes between 14ms and 30ms; this is probably due to the SW waking up. Subsequent dispatches usually take less than 1ms.

The averages and max values are taken while loading https://www.dropbox.com/profiling/maestro_blank_embedded_app with a registered service worker and then refreshing 6 times.

ThrottledEventQueue (refreshing the page 6 times):
Time between interception and channel reset: average=76.7275 ms, max=574.190719 ms
Time between ResumeRequest dispatch and execution on MT: average=76.454 ms, max=574.041316 ms

The max value is a bit unusual; measuring just the first load doesn't yield such high values.

ThrottledEventQueue (start Firefox, load https://www.dropbox.com/profiling/maestro_blank_embedded_app, then quit) x 6:
Time between interception and channel reset: average=74.6755 ms, max=243.249065 ms
Time between ResumeRequest dispatch and execution on MT: average=74.2079 ms, max=243.175957 ms

NS_DispatchToMainThread:
Time between interception and channel reset: average=17.0206 ms, max=68.695675 ms
Time between ResumeRequest dispatch and execution on MT: average=16.8428 ms, max=68.552611 ms
They claim that serving resources out of the Cache API also slowed it down, but it's hard to measure when they have that disabled. I tried to explain that they should expect a perf hit if they interpose a JavaScript fetch event handler that doesn't do anything. Running the JavaScript is always going to be slower than not.
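For illustration, the "handler that doesn't do anything" case looks roughly like this (a minimal sketch, not Dropbox's actual worker script):

  // sw.js: this handler never calls respondWith(), so every request falls
  // back to the network, but each request still pays the cost of dispatching
  // a fetch event to the worker (and of waking the worker up the first time).
  self.addEventListener('fetch', event => {
    // intentionally empty
  });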
This is the patch I used to do the measurements. It's a bit of a hack, and I still need to make sure the measurements are correct.
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #3)
> They claim that serving resources out of the Cache API also slowed it down,
> but it's hard to measure when they have that disabled. I tried to explain
> that they should expect a perf hit if they interpose a JavaScript fetch
> event handler that doesn't do anything. Running the JavaScript is always
> going to be slower than not.

Right, but we may be able to improve on this by not throttling fetch response runnables or channel reset runnables. Also, this might be a good opportunity to write talos tests.

Could we skip the throttled queue in this case? Or give these runnables a higher priority, not sure how it works.
Flags: needinfo?(bkelly)
(In reply to Cătălin Badea (:catalinb) from comment #5)
> Right, but we may be able to improve on this by not throttling fetch
> response runnables or channel reset runnables. Also, this might be a good
> opportunity to write talos tests.
> 
> Could we skip the throttled queue in this case? Or give these runnables a
> higher priority, not sure how it works.

You mean bypassing ThrottledEventQueue?  I don't think we should do that.  That only affects timing if the main thread is busy.  If the main thread is busy, then you're not going to be getting good performance here anyway.

ThrottledEventQueue does not actually "throttle".  It just yields to other main thread work between each runnable.
Flags: needinfo?(bkelly)
You could see if disabling their loading animation thing helps.  Perhaps they are hitting the main thread hard to do that animation.
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #6)
> (In reply to Cătălin Badea (:catalinb) from comment #5)
> > Right, but we may be able to improve on this by not throttling fetch
> > response runnables or channel reset runnables. Also, this might be a good
> > opportunity to write talos tests.
> > 
> > Could we skip the throttled queue in this case? Or give these runnables a
> > higher priority, not sure how it works.
> 
> You mean bypassing ThrottledEventQueue?  I don't think we should do that. 
> That only affects timing if the main thread is busy.  If the main thread is
> busy, then you're not going to be getting good performance here anyway.
> 
> ThrottledEventQueue does not actually "throttle".  It just yields to other
> main thread work between each runnable.

Yes, but we're delaying the network request, which is not handled on the main thread. Maybe this can cause an overall longer loading time.
(In reply to Ben Kelly [not reviewing due to deadline][:bkelly] from comment #7)
> You could see if disabling their loading animation thing helps.  Perhaps
> they are hitting the main thread hard to do that animation.

I'll be sure to check that.
David, does your telemetry data include the client version? I wonder if there's any difference between Firefox releases.
Flags: needinfo?(dzbarsky)
(In reply to Cătălin Badea (:catalinb) from comment #8)
> > ThrottledEventQueue does not actually "throttle".  It just yields to other
> > main thread work between each runnable.
> 
> Yes, but we're delaying the network request, which is not handled on the
> main thread. Maybe this can cause an overall longer loading time.

I don't think letting the worker jank the main thread is the solution.  Also, any benefit would be very racy and timing-dependent anyway, because you might still end up scheduled behind the work that's currently causing problems.

Let's see what other work is happening on the main thread that is conflicting with the SW runnables.
Btw, I'm glad to see that LinkedIn turned on Cache-Control: immutable.
After investigating the tests at https://github.com/samertm/firefox-sw-perf, it turns out that using URL objects in the fetch handler is one of the main causes of the increased loading time. For the "/with_dbxsw" path, dropping the use of URL objects reduces the loading time from ~500ms to ~30ms.

It's interesting that the service worker actually causes Gecko to perform more font face style flushes, which keep the main thread busy and make URL syncloop requests even slower. I *think* the increase in style flushes happens because the CSS resources arrive more spread out in time, so the style operations are no longer coalesced.
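For illustration, the slow pattern is roughly the first handler below, where each new URL() in the worker requires a synchronous round trip to the (busy) main thread, while the second sticks to plain string matching (a hypothetical sketch, using the benchmark's "/with_dbxsw" path as the example):

  // Slow: constructing a URL object in the fetch handler blocks on the main thread.
  self.addEventListener('fetch', event => {
    const url = new URL(event.request.url);
    if (url.pathname.startsWith('/with_dbxsw')) {
      // ...
    }
  });

  // Faster: plain string matching avoids the URL object entirely.
  self.addEventListener('fetch', event => {
    if (event.request.url.includes('/with_dbxsw')) {
      // ...
    }
  });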
Another path we probably want to optimize is event.respondWith(fetch(event.request)). While this will always add some overhead, I think a lot of people will end up doing something like:

  self.addEventListener('fetch', event => {
    if (list_of_special_paths.some(path => event.request.url.includes(path))) {
      // special handling
    } else {
      event.respondWith(fetch(event.request));
    }
  });

On https://github.com/samertm/firefox-sw-perf, a service worker that just responds with a fetch of the same request bumps the loading times from 30-40ms to 600ms. Chrome doesn't have this issue.
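For reference, the pass-through worker measured there is essentially the following (a minimal sketch; the actual benchmark script may differ):

  // sw.js: every request is re-issued via respondWith(fetch(...)), so each
  // one takes the interception, worker-dispatch, and channel-reset path.
  self.addEventListener('fetch', event => {
    event.respondWith(fetch(event.request));
  });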
platform-rel: --- → ?
Whiteboard: [platform-rel-Dropbox]
Whiteboard: [platform-rel-Dropbox] → [platform-rel-Dropbox][qf]
Priority: -- → P1
Whiteboard: [platform-rel-Dropbox][qf] → [platform-rel-Dropbox][qf:p1]
platform-rel: ? → +
Dropbox dropped the use of URL objects in their service worker, but still experiences poor performance with Firefox (compared to Chrome). I couldn't reproduce these new issues using the perf benchmark because the Performance API is broken for synthesized responses (bug 1351521). From what I investigated some time ago, I couldn't find individual issues we can fix at this time. I'd like to take another stab at this after we land parent interception, which would allow me to fix the benchmark.
Whiteboard: [platform-rel-Dropbox][qf:p1] → [platform-rel-Dropbox][qf:p2]
(I'm moving this to our P3 bucket because this bug in itself is not actionable at this point and depends on other work here)
Priority: P1 → P3
Keywords: perf
Re-evaluating in 2 months when Service Worker rewrite lands.
Whiteboard: [platform-rel-Dropbox][qf:p2] → [platform-rel-Dropbox][qf:p3]
Assignee: catalin.badea392 → nobody

Part of this is most likely that worker code uses the throttled queue for any URL creation (used inside fetch() and the Cache API) and also for initiating network connections.
The throttled queue should be used only for things like postMessage.

Whiteboard: [platform-rel-Dropbox][qf:p3] → [platform-rel-Dropbox][qf?]

Removing qf, since this is just a variant of bug 1522316.

Depends on: 1522316
Whiteboard: [platform-rel-Dropbox][qf?] → [platform-rel-Dropbox]

(In reply to Olli Pettay [:smaug] (PTO-ish Feb 16-23) from comment #18)
> Part of this is most likely that worker code uses the throttled queue for
> any URL creation (used inside fetch() and the Cache API) and also for
> initiating network connections.
> The throttled queue should be used only for things like postMessage.

So, I just looked into this and was confused, because we made http/https URL creation bypass the main thread in bug 1344751. But it seems that bug 1454656 regressed that as part of an attempt to unify/clean up URL code, causing us to consult the main thread in every case. Specifically, the hunk removed at https://hg.mozilla.org/mozilla-central/rev/b5051b2393f2#l5.272 was our fast path.

Component: DOM → DOM: Core & HTML

Given the new information, is this still a P3?

Flags: needinfo?(htsai)
Component: DOM: Core & HTML → DOM: Service Workers

Bug 1558923 could have helped here.

I'll defer to Jens as he is managing the team. :)

Flags: needinfo?(htsai)

So my understanding of the situation is:

  • The DropBox ServiceWorker uses (used?) URL a lot. (Which is reasonable.)
  • Main thread contention was delaying ServiceWorker-initiated fetches.

So we think performance should have improved from the above. However, on Nightly we also think that the enabling of parent-intercept in bug 1456995 may introduce additional latency, which we are tracking in bug 1587759.

In general, we know we will need to focus on addressing ServiceWorker latency problems in the near and medium term. So the big question is whether we have active contacts at Dropbox who would like to work with us on optimizing our performance as it relates to Dropbox specifically.

So, I am going to toggle the needinfo on dzbarsky. We'll resolve WFM if we don't hear back in a while.

Flags: needinfo?(dzbarsky)

Do you have any numbers on DropBox ServiceWorker performance under Firefox 69 release or the current 70 release? Thanks!

Flags: needinfo?(dzbarsky)

Thanks for the update, Andrew.

We experimented with service workers when I first reported this, but ultimately did not end up using them in production (both due to Firefox perf issues and because Chrome didn't let us reliably update the worker script).

Happy to hear you've got improvements coming though!

Flags: needinfo?(dzbarsky)

Okay, I'm going to resolve this WORKSFORME because it seems there are no specific further actions to be taken here, but please feel free to reopen or file a new bug if you perform further ServiceWorker investigations involving Firefox. In the meantime, we know we have a lot of performance enhancements to work on and will be looking to get representative tests added to talos (automated performance regression tests).

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME