1139513 - Warn and gather data if ServiceWorker hits max workers per domain limit

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Reporter

Description

•

10 years ago

This should be a fairly easy bug requiring adding one telemetry probe and some warnings to RuntimeService::RegisterWorker(). 1) Warn to window console or browser console (when no window) if a ServiceWorker spawn gets queued due to hitting max workers per domain limit. 2) Add a Telemetry probe to count when a ServiceWorker hits the limit. We may also need another probe to count that a SW was spawned, so we get a percentage of failures, unless metrics has some other way to determine this. Andrea, would you be willing to mentor since step 1 is the big piece and you know about it?

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Reporter

Updated

•

10 years ago

Blocks: ServiceWorkers-v1

Flags: needinfo?(amarchesini)

Whiteboard: [good first bug][lang=C++]

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 1

•

10 years ago

Go for it!

Assignee: nobody → josea.olivera

Noemí Freire (:noemi)

Updated

•

10 years ago

Status: NEW → ASSIGNED

Andrea Marchesini [:baku]

Updated

•

10 years ago

Mentor: amarchesini

Flags: needinfo?(amarchesini)

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 2

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Just show a warning in the web console when a ServiceWorker spawn gets queued.

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 3

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

Added telemetry probes for service worker spawns. I'm planing to add a couple of more probes for the number of domains and the number of domains hitting the limit.

Attachment #8576971 - Attachment is obsolete: true

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 4

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

(In reply to José Antonio Olivera Ortega [:jaoo] from comment #3) > I'm planing to add a > couple of more probes for the number of domains and the number of domains > hitting the limit. Probes for the hit-the-limit thing I commented in the last version of the patch.

José Antonio Olivera Ortega [:jaoo]

Assignee

Updated

•

10 years ago

Attachment #8577501 - Attachment is obsolete: true

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 5

•

10 years ago

Comment on attachment 8578731 [details] [diff] [review] WIP Andrea, I would be nice to have some feedback at this point. Would you mind to have a look a it? Thanks! PS. When you are back from your holidays! ;)

Attachment #8578731 - Flags: feedback?(amarchesini)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Reporter

Comment 6

•

10 years ago

Comment on attachment 8578731 [details] [diff] [review] WIP Review of attachment 8578731 [details] [diff] [review]: ----------------------------------------------------------------- Patch wide fixes: 1) You need to call telemetry only when it is a service worker, and not for both service and shared workers. 2) Always AssertIsOnMainThread() before calling Telemetry, unless the containing block already does so. ::: dom/workers/RuntimeService.cpp @@ +1420,5 @@ > aWorkerPrivate->IsServiceWorker(); > if (isSharedOrServiceWorker) { > AssertIsOnMainThread(); > > + Telemetry::Accumulate(Telemetry::SERVICE_WORKER_SPAWN_ATTEMPS, 1); Only for serviceworkers (use aWorkerPrivate->IsServiceWorker()), don't count shared workers. @@ +1455,5 @@ > > domainInfo = new WorkerDomainInfo(); > domainInfo->mDomain = domain; > + if (isSharedOrServiceWorker && !domainInfo->mDomainTracked) { > + Telemetry::Accumulate(Telemetry::DOMAIN_WITH_SERVICE_WORKERS, 1); Nit: Whitespace. @@ +1473,5 @@ > + // ServiceWorker spawn gets queued due to hitting max workers per domain > + // limit so let's log a warning. > + // Note: aWorkerPrivate->GetDocument() call might result nullptr due to > + // no window so the message warning will show up in the browser console. > + nsContentUtils::ReportToConsole(nsIScriptError::warningFlag, Before you call this, AssertIsOnMainThread(). @@ +1489,5 @@ > else if (parent) { > domainInfo->mChildWorkerCount++; > + if (isSharedOrServiceWorker) { > + Telemetry::Accumulate(Telemetry::SERVICE_WORKER_WAS_SPAWNED, 1); > + } SWs can never have a parent (yet), so this block is not needed @@ +1495,5 @@ > else { > domainInfo->mActiveWorkers.AppendElement(aWorkerPrivate); > + if (isSharedOrServiceWorker) { > + Telemetry::Accumulate(Telemetry::SERVICE_WORKER_WAS_SPAWNED, 1); > + } If you put this at the end of RegisterWorker, right before |return true;| then you will only count workers that really did get a thread created for them, which is what we want. @@ +1515,5 @@ > } > > // From here on out we must call UnregisterWorker if something fails! > + // TODO: Should we decrease the counters of service workers spawns if the > + // worker ends up unregistered? You don't need this if you move the probe as suggested above. ::: dom/workers/RuntimeService.h @@ +47,5 @@ > nsTArray<WorkerPrivate*> mQueuedWorkers; > nsClassHashtable<nsCStringHashKey, SharedWorkerInfo> mSharedWorkerInfos; > uint32_t mChildWorkerCount; > + bool mDomainTracked; > + bool mDomainHitMaxWorkersLimitTracked; You set these to true but never use them. Is there a follow up? I don't think you need to preserve this state. ::: toolkit/components/telemetry/Histograms.json @@ +7678,5 @@ > "expires_in_version": "never", > "kind": "count", > "description": "Count plugin hang notices in e10s" > + }, > + "SERVICE_WORKER_SPAWN_ATTEMPS": { Nit: ATTEMPTS @@ +7681,5 @@ > + }, > + "SERVICE_WORKER_SPAWN_ATTEMPS": { > + "expires_in_version": "never", > + "kind": "count", > + "description": "Tracking ServiceWorker spawn attemps" description should be "Count attempts to spawn a ServiceWorker for a domain"

Attachment #8578731 - Flags: feedback?(amarchesini)

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 7

•

10 years ago

Attached patch WIP (obsolete) — Details — Splinter Review

(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #6) > Comment on attachment 8578731 [details] [diff] [review] > WIP > > Review of attachment 8578731 [details] [diff] [review]: > ----------------------------------------------------------------- Thanks for the review! > Patch wide fixes: > 1) You need to call telemetry only when it is a service worker, and not for > both service and shared workers. > 2) Always AssertIsOnMainThread() before calling Telemetry, unless the > containing block already does so. Done! > ::: dom/workers/RuntimeService.h > @@ +47,5 @@ > > nsTArray<WorkerPrivate*> mQueuedWorkers; > > nsClassHashtable<nsCStringHashKey, SharedWorkerInfo> mSharedWorkerInfos; > > uint32_t mChildWorkerCount; > > + bool mDomainTracked; > > + bool mDomainHitMaxWorkersLimitTracked; > > You set these to true but never use them. Is there a follow up? I don't > think you need to preserve this state. I use them. There is no such follow up work. They are useful to have a way to avoid reporting twice the same domain hit the limit. I'll ping you today so we could go over the patch and see whether this domain probes are useful or no.

Attachment #8578731 - Attachment is obsolete: true

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Reporter

Comment 8

•

10 years ago

Comment on attachment 8579318 [details] [diff] [review] WIP Review of attachment 8579318 [details] [diff] [review]: ----------------------------------------------------------------- The reason I'm not in favour of tracking it across domains is that it doesn't give us much more useful information. We measure "X SWs hit the limit out of Y attempts". "A domains hit the limit out of Z domains" does not add much to it. Also, using booleans like this is wrong because: 1) user visits example.com, spawns various workers flips counts and booleans etc. So telemetry is bumped up once. 2) user closes all tabs for example.com, at this point runtime service removes the domain info from domain map 3) step 1 happens again. Which means we measure how many times the domaininfo is created and then sw is spawned and hits the limit. the domain infos don't persist, another reason why I think having this isn't useful. ::: dom/workers/RuntimeService.cpp @@ +1458,5 @@ > NS_ASSERTION(!parent, "Shouldn't have a parent here!"); > > domainInfo = new WorkerDomainInfo(); > domainInfo->mDomain = domain; > + if (isServiceWorker && !domainInfo->mDomainTracked) { mDomainTracked will always be false here since it was just created. ::: dom/workers/RuntimeService.h @@ +46,5 @@ > nsTArray<WorkerPrivate*> mActiveWorkers; > nsTArray<WorkerPrivate*> mQueuedWorkers; > nsClassHashtable<nsCStringHashKey, SharedWorkerInfo> mSharedWorkerInfos; > uint32_t mChildWorkerCount; > + bool mDomainTracked; if you decide to keep it, rename to mTelemetryUsesServiceWorker. @@ +47,5 @@ > nsTArray<WorkerPrivate*> mQueuedWorkers; > nsClassHashtable<nsCStringHashKey, SharedWorkerInfo> mSharedWorkerInfos; > uint32_t mChildWorkerCount; > + bool mDomainTracked; > + bool mDomainHitMaxWorkersLimitTracked; similarly, mTelemetryHitMaxWorkersLimit.

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 9

•

10 years ago

Attached patch v1 (obsolete) — Details — Splinter Review

(In reply to Nikhil Marathe [:nsm] (needinfo? please) from comment #8) > Comment on attachment 8579318 [details] [diff] [review] > WIP > > Review of attachment 8579318 [details] [diff] [review]: > ----------------------------------------------------------------- Thanks for explain your reason in detail. > The reason I'm not in favour of tracking it across domains is that it > doesn't give us much more useful information. Let's remove the domain related probes then. I guess this's ready for review. Would you mind please? Thanks!

Attachment #8579318 - Attachment is obsolete: true

Attachment #8581514 - Flags: review?(nsm.nikhil)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Reporter

Comment 10

•

10 years ago

Comment on attachment 8581514 [details] [diff] [review] v1 Review of attachment 8581514 [details] [diff] [review]: ----------------------------------------------------------------- looks good. Needs worker peer review. ::: dom/locales/en-US/chrome/dom/dom.properties @@ +160,5 @@ > # LOCALIZATION NOTE: Do not translate "IndexedDB". > IndexedDBTransactionAbortNavigation=An IndexedDB transaction that was not yet complete has been aborted due to page navigation. > # LOCALIZATION NOTE (WillChangeBudgetWarning): Do not translate Will-change, %1$S,%2$S,%3$S are numbers. > WillChangeBudgetWarning=Will-change memory consumption is too high. Surface area covers %1$S pixels, budget is the document surface area multiplied by %2$S (%3$S pixels). All occurences of will-change in the document are ignored when over budget. > +HittingMaxWorkersPerDomain=ServiceWorker spawn gets queued due to hitting max workers per domain limit. Nit: s/gets/got

Attachment #8581514 - Flags: review?(nsm.nikhil) → review?(amarchesini)

Andrea Marchesini [:baku]

Comment 11

•

10 years ago

Comment on attachment 8581514 [details] [diff] [review] v1 Review of attachment 8581514 [details] [diff] [review]: ----------------------------------------------------------------- lgtm but before landing I want to know if we really need these 2 additional telemetry logs. ::: dom/workers/RuntimeService.cpp @@ +27,5 @@ > #include "jsfriendapi.h" > #include "mozilla/ArrayUtils.h" > #include "mozilla/CycleCollectedJSRuntime.h" > #include "mozilla/TimeStamp.h" > +#include "mozilla/Telemetry.h" alphabetic order. @@ +1418,5 @@ > > + bool isServiceWorker = aWorkerPrivate->IsServiceWorker(); > + if (isServiceWorker) { > + AssertIsOnMainThread(); > + Telemetry::Accumulate(Telemetry::SERVICE_WORKER_SPAWN_ATTEMPTS, 1); I don't think we want this. Have you spoken with nsm about this? @@ +1551,5 @@ > } > > + if (isServiceWorker) { > + AssertIsOnMainThread(); > + Telemetry::Accumulate(Telemetry::SERVICE_WORKER_WAS_SPAWNED, 1); Same for this. ::: toolkit/components/telemetry/Histograms.json @@ +7688,5 @@ > + "expires_in_version": "never", > + "kind": "count", > + "description": "Count ServiceWorkers that really did get a thread created for them" > + }, > + "SERVICE_WORKER_SPAWN_GETS_QUEUED": { Actually, this is the only telemetry is required for this bug, right? The other 2 are not related to this bug: I don't think we want to know how many SWs are created.

Attachment #8581514 - Flags: review?(amarchesini) → review+

Andrea Marchesini [:baku]

Comment 12

•

10 years ago

> looks good. Needs worker peer review. nsm, what do you think about these 2 additional telemetry logs? attempt/spawned?

Andrea Marchesini [:baku]

Comment 13

•

10 years ago

Comment on attachment 8581514 [details] [diff] [review] v1 Review of attachment 8581514 [details] [diff] [review]: ----------------------------------------------------------------- bent, we need a review from a worker peer. The patch looks good to me.

Attachment #8581514 - Flags: feedback?(bent.mozilla)

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 14

•

10 years ago

(In reply to Andrea Marchesini (:baku) from comment #12) > > looks good. Needs worker peer review. > > nsm, what do you think about these 2 additional telemetry logs? > attempt/spawned?

Flags: needinfo?(nsm.nikhil)

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Reporter

Comment 15

•

10 years ago

(In reply to Andrea Marchesini (:baku) from comment #12) > > looks good. Needs worker peer review. > > nsm, what do you think about these 2 additional telemetry logs? > attempt/spawned? I'd like to have it in to get a good idea of the relative number of SWs that get queued. That way, if we see a large percentage of queues, we will have to look at alternate strategies for SWs like a thread pool and aggressive termination. If it is fairly small, we can just tweak the limits a little. :jaoo feel free to land once :bent approves. Thanks for the patch!

Flags: needinfo?(nsm.nikhil)

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 16

•

10 years ago

Comment on attachment 8581514 [details] [diff] [review] v1 Review of attachment 8581514 [details] [diff] [review]: ----------------------------------------------------------------- This looks great, thanks! I just have two suggestions: ::: dom/locales/en-US/chrome/dom/dom.properties @@ +160,5 @@ > # LOCALIZATION NOTE: Do not translate "IndexedDB". > IndexedDBTransactionAbortNavigation=An IndexedDB transaction that was not yet complete has been aborted due to page navigation. > # LOCALIZATION NOTE (WillChangeBudgetWarning): Do not translate Will-change, %1$S,%2$S,%3$S are numbers. > WillChangeBudgetWarning=Will-change memory consumption is too high. Surface area covers %1$S pixels, budget is the document surface area multiplied by %2$S (%3$S pixels). All occurences of will-change in the document are ignored when over budget. > +HittingMaxWorkersPerDomain=ServiceWorker spawn gets queued due to hitting max workers per domain limit. Nit: I think this would be better: "A ServiceWorker could not be started immediately because other documents in the same origin are already using the maximum number of workers. The ServiceWorker is now queued and will be started after some of the other workers have completed." And please mark with # LOCALIZATION NOTE: Do not translate "ServiceWorker". so that our localizers know that "ServiceWorker" is something special. ::: dom/workers/RuntimeService.cpp @@ +1415,5 @@ > } > > nsCString sharedWorkerScriptSpec; > > + bool isServiceWorker = aWorkerPrivate->IsServiceWorker(); Nit: Make this const

Attachment #8581514 - Flags: feedback?(bent.mozilla) → feedback+

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 17

•

10 years ago

Attached patch v2 (obsolete) — Details — Splinter Review

Addresses review comments from comment 11 and comment 16. Ben, would mind to have a look a it again please? Thank! PS. Should we request review at someone from the telemetry team?

Attachment #8581514 - Attachment is obsolete: true

Attachment #8583046 - Flags: review?(bent.mozilla)

Ben Turner (not reading bugmail, use the needinfo flag!)

Updated

•

10 years ago

Attachment #8583046 - Flags: review?(bent.mozilla) → review+

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 18

•

10 years ago

(In reply to José Antonio Olivera Ortega [:jaoo] from comment #17) > PS. Should we request review at someone from the telemetry team? If you'd like, but I think this is fine. Thanks!

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 19

•

10 years ago

Comment on attachment 8583046 [details] [diff] [review] v2 I'd like :rvitillo to have a look at the patch so he could let's know if this is ready to go.

Attachment #8583046 - Flags: review?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Comment 20

•

10 years ago

Ben, could you confirm that this probe should never expire? Just making sure "never" is not misused as a default value.

Flags: needinfo?(bent.mozilla)

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 21

•

10 years ago

I went ahead and pushed this to try. Find the results at https://treeherder.mozilla.org/#/jobs?repo=try&revision=9899f1f39e47 please.

Nikhil Marathe [:nsm] (No longer reading bugmail, please needinfo?)

Reporter

Comment 22

•

10 years ago

(In reply to rvitillo@mozilla.com from comment #20) > Ben, could you confirm that this probe should never expire? Just making sure > "never" is not misused as a default value. Good point, something like 50 should be reasonable.

Flags: needinfo?(bent.mozilla)

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 23

•

10 years ago

Attached patch v3 — Details — Splinter Review

Setting expire to gecko 50. Carrying out r=bent. Thanks!

Attachment #8583046 - Attachment is obsolete: true

Attachment #8583046 - Flags: review?(rvitillo)

Attachment #8583305 - Flags: review?(rvitillo)

Roberto Agostino Vitillo (:rvitillo)

Updated

•

10 years ago

Attachment #8583305 - Flags: review?(rvitillo) → review+

José Antonio Olivera Ortega [:jaoo]

Assignee

Comment 24

•

10 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=b3c166fa508a

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 25

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/3604d6055fe5

Keywords: checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 26

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/3604d6055fe5

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

status-firefox39: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla39

WIP 10 years ago José Antonio Olivera Ortega [:jaoo] 3.19 KB, patch		Details \| Diff \| Splinter Review
WIP 10 years ago José Antonio Olivera Ortega [:jaoo] 7.20 KB, patch		Details \| Diff \| Splinter Review
WIP 10 years ago José Antonio Olivera Ortega [:jaoo] 8.85 KB, patch		Details \| Diff \| Splinter Review
WIP 10 years ago José Antonio Olivera Ortega [:jaoo] 8.30 KB, patch		Details \| Diff \| Splinter Review
v1 10 years ago José Antonio Olivera Ortega [:jaoo] 6.35 KB, patch	baku : review+ bent.mozilla : feedback+	Details \| Diff \| Splinter Review
v2 10 years ago José Antonio Olivera Ortega [:jaoo] 6.53 KB, patch	bent.mozilla : review+	Details \| Diff \| Splinter Review
v3 10 years ago José Antonio Olivera Ortega [:jaoo] 6.53 KB, patch	rvitillo : review+	Details \| Diff \| Splinter Review