GetUsage takes too much time on shutdown
Categories
(Core :: Storage: Quota Manager, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox68 | --- | fixed |
People
(Reporter: tt, Assigned: tt)
References
Details
Attachments
(5 files)
If we try/catch [1], there is an edge case that we never call the callback for this function and thus cause the problem.
I also suspect we have the same problem with all the caller using PQuota protocol.
Jan and baku, please correct me or elaborate more if I miss something. Thanks!
Assignee | ||
Comment 1•5 years ago
|
||
Andrea, do you happen to have a step to reproduce or a crash report about the case?
In the 5 most recent reports, they [1-5] all seem to be congested in the QuotaManager IO thread and they are getting the usages. I was thinking the problem is that QuotaManager/QuotaManagerService somehow has never answered the callback to the callee of the getUsageForPrincipal (even after getting the usage). So, I'm wondering if I misunderstood something. Please correct me if I was wrong. Thanks!
[1] https://crash-stats.mozilla.com/report/index/c492faf9-7ba7-4e08-95b5-0d6a40190227#allthreads
[2] https://crash-stats.mozilla.com/report/index/f36f1656-30a7-4ecd-8cc8-b5cdf0190227#allthreads
[3] https://crash-stats.mozilla.com/report/index/98dbd1eb-1754-4812-ae91-4b0940190227#allthreads
[4] https://crash-stats.mozilla.com/report/index/c8273dbb-d27a-4062-9c06-c38010190227#allthreads
[5] https://crash-stats.mozilla.com/report/index/48108423-a259-4770-b765-721f70190227#allthreads
Comment 2•5 years ago
|
||
The problem I see is that the callback is not executed:
https://hg.mozilla.org/mozilla-central/annotate/110ea2a7c3d4f34b5079c195f7ea57966748e6da/browser/modules/Sanitizer.jsm#l750
Sanitizer.jsm blocks the shutdown, and after 1 minute, firefox crashes and we have those crash reports.
If you see the metadata of them, you see the last 'progress' step:
https://crash-stats.mozilla.com/report/index/c492faf9-7ba7-4e08-95b5-0d6a40190227#tab-metadata
For instance, the first one has:
{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize on shutdown","state":{"progress":{"isShutdown":true,"advancement":"get-principals","step":"principals-quota-manager"}},"filename":"resource:///modules/Sanitizer.jsm","lineNumber":152,"stack":["resource:///modules/Sanitizer.jsm:onStartup:152"]}]}
The last step for all of them is "principals-quota-manager" instead of "principals-quota-manager-getUsage" (or something else).
Comment 3•5 years ago
|
||
And note that this is all happening during the "profile-change-teardown" observer stage of shutdown, so QuotaManager itself should not have begun to shutdown. (The order is: "profile-change-net-teardown", "profile-change-teardown", "profile-before-change", "profile-before-change-qm", "profile-before-change-telemetry". "Places Client shutdown" is the ClientsShutdownBlocker
's label, which is added as a blocker[1] to nsIAsyncShutdown.profileChangeTeardown[2].)
1: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#488
https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#398
2: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/asyncshutdown/nsIAsyncShutdown.idl#188
Assignee | ||
Comment 4•5 years ago
|
||
(In reply to Andrea Marchesini [:baku] from comment #2)
Thanks for the explanation!
Combined this with reports I read, I think the problem is that it takes too much time between calling getUsage() and getting its callback. Note that from those reports QuotaManager is traversing the profile and is calculating usage.
There are a few things came to my mind:
-
While executing GetUsageOp in QuotaManager, we, in fact, traverse the disk twice if the profile hasn't been initialized. I think if we can optimize this case, we could probably mitigate this issue a bit. (Also, it seems that if it's not initialized yet, we perhaps don't need to initialize QM in this case because it implies QM hasn't written anything yet.)
-
Does "Sanitizer.jsm" really need
GetUsage
[1] (GetUsageForPrincipal
[2] can be an alternative solution if we just need to get usage, but the downside is that result might be out-dated)? The difference is that GetUsage actually traverses the disk and GetUsageForPrincipal just get the in-memory data recorded in QuotaManager. I guess it's probably related to the needs for getAllPrincipalsInternal(), I will check the code.
[1] https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/dom/quota/nsIQuotaManagerService.idl#66
[2] https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/dom/quota/nsIQuotaManagerService.idl#84
Assignee | ||
Comment 5•5 years ago
|
||
(In reply to Andrew Sutherland [:asuth] from comment #3)
Thanks for the note! That could explain why some reports show their QM are initializing or getting the usage and verify that we don't need to consider time racing of events' order.
And note that this is all happening during the "profile-change-teardown" observer stage of shutdown, so QuotaManager itself should not have begun to shutdown. (The order is: "profile-change-net-teardown", "profile-change-teardown", "profile-before-change", "profile-before-change-qm", "profile-before-change-telemetry". "Places Client shutdown" is the
ClientsShutdownBlocker
's label, which is added as a blocker[1] to nsIAsyncShutdown.profileChangeTeardown[2].)1: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#488
https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#398
2: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/asyncshutdown/nsIAsyncShutdown.idl#188
Assignee | ||
Comment 6•5 years ago
|
||
(In reply to Tom Tung [:tt, :ttung] from comment #4)
- While executing GetUsageOp in QuotaManager, we, in fact, traverse the disk twice if the profile hasn't been initialized. I think if we can optimize this case, we could probably mitigate this issue a bit. (Also, it seems that if it's not initialized yet, we perhaps don't need to initialize QM in this case because it implies QM hasn't written anything yet.)
Correcting myself, GetUsageOp doesn't traverse the storage directory twice [1].
[1] https://searchfox.org/mozilla-central/source/dom/quota/ActorsParent.cpp#6612
Assignee | ||
Comment 7•5 years ago
|
||
Andrea, I'm thinking to customize a function for getAllPrincipals
only, because the reports show that QuotaManager IO thread was either getting usage or initializing. I guess the fix for the crashes that QM io thread is getting usage should be doing something to shorten the time of getUsage
or create another simpler method to achieve the same goal.
I'm not familiar with Sanitizer.jsm, so I might misunderstand something. Could you check if the following things are correct or not? Thanks!
-
getAllPrincipalsInternal
wants to get all the principals from QuotaManager only. It seems that the usage of each origin is not important here. -
getAllPrincipals
is used while we want to clean the session only storage. If it's correct, can I assume that QM doesn't need to return any principal if the QuotaManager is not initialized yet (that implies there is no any quota-client has written anything into the disk yet)?
Comment 8•5 years ago
|
||
getAllPrincipalsInternal
wants to get all the principals from QuotaManager only. It seems that the usage of each origin is not important here.
Correct.
getAllPrincipals
is used while we want to clean the session only storage. If it's correct, can I assume that QM doesn't need to return any principal if the QuotaManager is not initialized yet (that implies there is no any quota-client has written anything into the disk yet)?
Correct.
Assignee | ||
Comment 9•5 years ago
|
||
Assignee | ||
Comment 10•5 years ago
|
||
Depends on D21723
Assignee | ||
Comment 11•5 years ago
|
||
Depends on D21724
Assignee | ||
Comment 12•5 years ago
|
||
Note to myself:
I also want to get the time of getUsage. I guess it should check:
- Time for waiting for being executed from the child process. (I wonder if it often take noticeable time. e.g. blocking by initialization or something else)
- Time for getting all the usage. (This should take a lot of time since it touches the disk and doing IO)
- Time for calling back.
So that we could have more concept about the average time it takes from the method is called to the callback is called of getting usage.
Assignee | ||
Comment 13•5 years ago
|
||
The idea of this patch is to track how long does it takes to complete getUsage.
Currently, it divides getUsage into four pharses:
- Time between sending ipc from the child process and receiving in the parent
process. - Time between contructing the GetUsageOp on the background thread and start to
traverse the storage. - Time for traversing the storage.
- Time for transferring the data(an array of originUsage) back to the original
process.
Comment 14•5 years ago
|
||
Driving by with a quick note that this new data collection will require Data Collection Review before it lands.
Assignee | ||
Comment 15•5 years ago
|
||
(In reply to Chris H-C :chutten from comment #14)
Driving by with a quick note that this new data collection will require Data Collection Review before it lands.
Will do. Thanks for the notice!
I haven't done that because I'm not sure whether is the time I measured in the patch is appropriate or not. (e.g. granularity is proper or not) Will do that after making sure of that.
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 16•5 years ago
|
||
Assignee | ||
Comment 17•5 years ago
|
||
Comment 18•5 years ago
|
||
Pushed by shes050117@gmail.com: https://hg.mozilla.org/integration/autoland/rev/7bf89a70ca20 P0 - Change some code to align the clang format; r=janv https://hg.mozilla.org/integration/autoland/rev/203d88b18106 P1 - Have a method to get origins only on QuotaManagerService; r=asuth https://hg.mozilla.org/integration/autoland/rev/f6bd764df03f P2 - Have a test to verify listInitializedOrigins; r=baku https://hg.mozilla.org/integration/autoland/rev/7e8e7504a696 P3 - Use the listInitializedOrigins on Sanitizer.jsm; r=baku
Comment 19•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/7bf89a70ca20
https://hg.mozilla.org/mozilla-central/rev/203d88b18106
https://hg.mozilla.org/mozilla-central/rev/f6bd764df03f
https://hg.mozilla.org/mozilla-central/rev/7e8e7504a696
Description
•