Closed Bug 1529301 Opened 5 years ago Closed 5 years ago

GetUsage takes too much time on shutdown

Categories

(Core :: Storage: Quota Manager, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox68 --- fixed

People

(Reporter: tt, Assigned: tt)

References

Details

Attachments

(5 files)

If we try/catch [1], there is an edge case that we never call the callback for this function and thus cause the problem.

I also suspect we have the same problem with all the caller using PQuota protocol.

Jan and baku, please correct me or elaborate more if I miss something. Thanks!

[1] https://searchfox.org/mozilla-central/rev/93905b660fc99a5d52b683690dd26471daca08c8/dom/quota/QuotaManagerService.cpp#530

Andrea, do you happen to have a step to reproduce or a crash report about the case?

In the 5 most recent reports, they [1-5] all seem to be congested in the QuotaManager IO thread and they are getting the usages. I was thinking the problem is that QuotaManager/QuotaManagerService somehow has never answered the callback to the callee of the getUsageForPrincipal (even after getting the usage). So, I'm wondering if I misunderstood something. Please correct me if I was wrong. Thanks!

[1] https://crash-stats.mozilla.com/report/index/c492faf9-7ba7-4e08-95b5-0d6a40190227#allthreads
[2] https://crash-stats.mozilla.com/report/index/f36f1656-30a7-4ecd-8cc8-b5cdf0190227#allthreads
[3] https://crash-stats.mozilla.com/report/index/98dbd1eb-1754-4812-ae91-4b0940190227#allthreads
[4] https://crash-stats.mozilla.com/report/index/c8273dbb-d27a-4062-9c06-c38010190227#allthreads
[5] https://crash-stats.mozilla.com/report/index/48108423-a259-4770-b765-721f70190227#allthreads

Flags: needinfo?(amarchesini)

The problem I see is that the callback is not executed:
https://hg.mozilla.org/mozilla-central/annotate/110ea2a7c3d4f34b5079c195f7ea57966748e6da/browser/modules/Sanitizer.jsm#l750

Sanitizer.jsm blocks the shutdown, and after 1 minute, firefox crashes and we have those crash reports.

If you see the metadata of them, you see the last 'progress' step:
https://crash-stats.mozilla.com/report/index/c492faf9-7ba7-4e08-95b5-0d6a40190227#tab-metadata

For instance, the first one has:
{"phase":"Places Clients shutdown","conditions":[{"name":"sanitize.js: Sanitize on shutdown","state":{"progress":{"isShutdown":true,"advancement":"get-principals","step":"principals-quota-manager"}},"filename":"resource:///modules/Sanitizer.jsm","lineNumber":152,"stack":["resource:///modules/Sanitizer.jsm:onStartup:152"]}]}

The last step for all of them is "principals-quota-manager" instead of "principals-quota-manager-getUsage" (or something else).

Flags: needinfo?(amarchesini)

And note that this is all happening during the "profile-change-teardown" observer stage of shutdown, so QuotaManager itself should not have begun to shutdown. (The order is: "profile-change-net-teardown", "profile-change-teardown", "profile-before-change", "profile-before-change-qm", "profile-before-change-telemetry". "Places Client shutdown" is the ClientsShutdownBlocker's label, which is added as a blocker[1] to nsIAsyncShutdown.profileChangeTeardown[2].)

1: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#488
https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#398
2: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/asyncshutdown/nsIAsyncShutdown.idl#188

(In reply to Andrea Marchesini [:baku] from comment #2)
Thanks for the explanation!

Combined this with reports I read, I think the problem is that it takes too much time between calling getUsage() and getting its callback. Note that from those reports QuotaManager is traversing the profile and is calculating usage.

There are a few things came to my mind:

  • While executing GetUsageOp in QuotaManager, we, in fact, traverse the disk twice if the profile hasn't been initialized. I think if we can optimize this case, we could probably mitigate this issue a bit. (Also, it seems that if it's not initialized yet, we perhaps don't need to initialize QM in this case because it implies QM hasn't written anything yet.)

  • Does "Sanitizer.jsm" really need GetUsage [1] (GetUsageForPrincipal [2] can be an alternative solution if we just need to get usage, but the downside is that result might be out-dated)? The difference is that GetUsage actually traverses the disk and GetUsageForPrincipal just get the in-memory data recorded in QuotaManager. I guess it's probably related to the needs for getAllPrincipalsInternal(), I will check the code.

[1] https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/dom/quota/nsIQuotaManagerService.idl#66
[2] https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/dom/quota/nsIQuotaManagerService.idl#84

Summary: GetUsageForPrincipal might create problem on shutdown → GetUsage takes too much time on shutdown

(In reply to Andrew Sutherland [:asuth] from comment #3)
Thanks for the note! That could explain why some reports show their QM are initializing or getting the usage and verify that we don't need to consider time racing of events' order.

And note that this is all happening during the "profile-change-teardown" observer stage of shutdown, so QuotaManager itself should not have begun to shutdown. (The order is: "profile-change-net-teardown", "profile-change-teardown", "profile-before-change", "profile-before-change-qm", "profile-before-change-telemetry". "Places Client shutdown" is the ClientsShutdownBlocker's label, which is added as a blocker[1] to nsIAsyncShutdown.profileChangeTeardown[2].)

1: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#488
https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/places/Database.cpp#398
2: https://searchfox.org/mozilla-central/rev/2a6f3dde00801374d3b2a704232de54a132af389/toolkit/components/asyncshutdown/nsIAsyncShutdown.idl#188

(In reply to Tom Tung [:tt, :ttung] from comment #4)

  • While executing GetUsageOp in QuotaManager, we, in fact, traverse the disk twice if the profile hasn't been initialized. I think if we can optimize this case, we could probably mitigate this issue a bit. (Also, it seems that if it's not initialized yet, we perhaps don't need to initialize QM in this case because it implies QM hasn't written anything yet.)

Correcting myself, GetUsageOp doesn't traverse the storage directory twice [1].

[1] https://searchfox.org/mozilla-central/source/dom/quota/ActorsParent.cpp#6612

Andrea, I'm thinking to customize a function for getAllPrincipals only, because the reports show that QuotaManager IO thread was either getting usage or initializing. I guess the fix for the crashes that QM io thread is getting usage should be doing something to shorten the time of getUsage or create another simpler method to achieve the same goal.

I'm not familiar with Sanitizer.jsm, so I might misunderstand something. Could you check if the following things are correct or not? Thanks!

  • getAllPrincipalsInternal wants to get all the principals from QuotaManager only. It seems that the usage of each origin is not important here.

  • getAllPrincipals is used while we want to clean the session only storage. If it's correct, can I assume that QM doesn't need to return any principal if the QuotaManager is not initialized yet (that implies there is no any quota-client has written anything into the disk yet)?

Flags: needinfo?(amarchesini)
  • getAllPrincipalsInternal wants to get all the principals from QuotaManager only. It seems that the usage of each origin is not important here.

Correct.

  • getAllPrincipals is used while we want to clean the session only storage. If it's correct, can I assume that QM doesn't need to return any principal if the QuotaManager is not initialized yet (that implies there is no any quota-client has written anything into the disk yet)?

Correct.

Flags: needinfo?(amarchesini)

Note to myself:

I also want to get the time of getUsage. I guess it should check:

  • Time for waiting for being executed from the child process. (I wonder if it often take noticeable time. e.g. blocking by initialization or something else)
  • Time for getting all the usage. (This should take a lot of time since it touches the disk and doing IO)
  • Time for calling back.

So that we could have more concept about the average time it takes from the method is called to the callback is called of getting usage.

The idea of this patch is to track how long does it takes to complete getUsage.
Currently, it divides getUsage into four pharses:

  • Time between sending ipc from the child process and receiving in the parent
    process.
  • Time between contructing the GetUsageOp on the background thread and start to
    traverse the storage.
  • Time for traversing the storage.
  • Time for transferring the data(an array of originUsage) back to the original
    process.

Driving by with a quick note that this new data collection will require Data Collection Review before it lands.

(In reply to Chris H-C :chutten from comment #14)

Driving by with a quick note that this new data collection will require Data Collection Review before it lands.

Will do. Thanks for the notice!

I haven't done that because I'm not sure whether is the time I measured in the patch is appropriate or not. (e.g. granularity is proper or not) Will do that after making sure of that.

Attachment #9047759 - Attachment description: Bug 1529301 - P2 - Have a test to verify getOrigin; → Bug 1529301 - P2 - Have a test to verify listInitializedOrigins;
Attachment #9047760 - Attachment description: Bug 1529301 - P3 - Use the getOrigin on Sanitizer.jsm; → Bug 1529301 - P3 - Use the listInitializedOrigins on Sanitizer.jsm;
Pushed by shes050117@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/7bf89a70ca20
P0 - Change some code to align the clang format; r=janv
https://hg.mozilla.org/integration/autoland/rev/203d88b18106
P1 - Have a method to get origins only on QuotaManagerService; r=asuth
https://hg.mozilla.org/integration/autoland/rev/f6bd764df03f
P2 - Have a test to verify listInitializedOrigins; r=baku
https://hg.mozilla.org/integration/autoland/rev/7e8e7504a696
P3 - Use the listInitializedOrigins on Sanitizer.jsm; r=baku
Regressions: 1584986
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: