Bug 1871215 (Open) · Opened 9 months ago · Updated 8 months ago

Add a mechanism to nsIQuotaManagerService to wedge threads until a promise is resolved to create edge cases related to IPC timing

Categories

(Core :: Storage: Quota Manager, enhancement, P3)

People

(Reporter: asuth, Assigned: aiunusov)


For bug 1870594 (specifically its comment 2), I propose that we might be able to exercise an edge case if we had a mechanism for blocking various Quota Client threads in a controlled manner and we then caused an abrupt termination of a content process that resembles a crash/teardown of the content process without the global being fully and cleanly torn down. (For the Cache API, a SharedWorker or ServiceWorker will actually get the most graceful shutdown, whereas if the last window for an origin is being closed, the window and any of its dedicated workers will likely not get a chance to fully shut down in the way the Cache API wants.)

My rough proposal would be something like:

  • Add a method nsIQuotaRequest blockQuotaClientThreadUntilPromiseResolve(in ACString aPersistenceType, in nsIPrincipal aPrincipal, in AString aClientType, in AString aClientArg, in Promise aPromise) to nsIQuotaManagerService (a rough JS usage sketch follows this list), where:
    • aPersistenceType, aPrincipal let us identify the storage key / directory lock that needs to be acquired
    • In the future we'd also potentially have AString aBucketName or something like that too.
    • aClientType identifies the storage endpoint / quota client
    • aClientArg lets the caller provide additional data to the quota client in case there are multiple threads per storage key; for IDB, for example, we would need to specify the database name. For the Cache API we don't need the arg.
    • aPromise is the promise that we wait on to unblock the thread.
    • The return value is an nsIQuotaRequest for consistency and because there is potential for an async error; presumably the Quota Client should report an error if there isn't actually an existing thread to block.
  • Plumb that call from QuotaManagerService through PQuota into the QuotaParent actor ("Quota") on PBackground and its QuotaManager glue.
    • Because the main thread ends up needing to send both the initial request to block and a subsequent, associated "hey, stop blocking" message, this probably makes sense as a sub-actor: nsIQMS sends the actor constructor when it wants the blocking to start and sends __delete__ when it wants the blocking to stop.
  • Add a method to quota::Client that propagates effectively the same information. I suppose this might have a signature like (PersistenceType aPersistenceType, const OriginMetadata& aOriginMetadata, const nsACString& aClientArg, nsIRunnable* aTheRunnableToDispatchThatDoesTheBlocking)? :janv might have a better idea for this.
  • Implement the method for the Cache API so that it looks up the right manager and right thread and dispatches the runnable to that thread.
  • The runnable would look something like the mozStorage ThreadWedger, where the runnable just blocks until a monitor notifies it to wake up. (Note that there may be a bug in that wedger if the reentrant monitor is allowed to wake up spuriously the way pthread_cond_wait can; I think the safest idiom is to use a "while" loop on the flag variable.)
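
To make the proposed nsIQMS entry point concrete, here is a rough chrome-JS usage sketch. Everything about blockQuotaClientThreadUntilPromiseResolve is an assumption taken from the proposal above (the method does not exist yet), including the "default" persistence string, the "cache" client type string, and the callback on the returned request; only Services.qms and createContentPrincipalFromOrigin are existing APIs.

  // Sketch only: blockQuotaClientThreadUntilPromiseResolve is the proposed,
  // not-yet-existing method described above.
  const principal = Services.scriptSecurityManager
    .createContentPrincipalFromOrigin("https://example.com");

  // The promise we resolve later to un-wedge the Cache API thread.
  let unblock;
  const blockerPromise = new Promise(resolve => {
    unblock = resolve;
  });

  // "default" persistence, the Cache quota client, no client arg needed for Cache.
  const request = Services.qms.blockQuotaClientThreadUntilPromiseResolve(
    "default", principal, "cache", "", blockerPromise);
  request.callback = () => {
    // An async error (e.g. no thread to block) would presumably surface here.
  };

  // ... issue the Cache API operations that should pile up behind the wedge ...

  // Let the quota client thread run again.
  unblock();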

The specific structure of a test like this would be to write a browser mochitest. Pseudocode would be something like the following (a rough JS skeleton follows the list):

  • Create a tab in the origin in question
  • Use ContentTask.spawn to open a Cache API cache in the tab, waiting for the open to complete.
  • Invoke the nsIQMS API to wedge the Cache API
  • Use ContentTask.spawn to perform the Cache API operations in the tab that we think might create the interesting edge cases. (We can just try everything, really.) We do need to wait for the calls to be issued, but obviously we shouldn't wait for their resolution, as we'd just hang.
  • (hand-waving) Terminate/forcibly shut down the process and verify we know the channel was torn down; ContentParent::ActorDestroy does send an "ipc:content-shutdown" observer notification, which is at least one option. The key thing about this shutdown is that the globals should not get a chance to tear down normally.
  • Open a new tab for the origin in question, because that old one is now sad.
  • Use ContentTask.spawn to open a Cache API cache for the origin in the tab, but with a different cache name, waiting for the open to complete. This should provide sufficient ordering guarantees that the manager has processed the queue backlog and PBackground has processed all the channel teardown work.
  • Close the tab, maybe waiting for that process to disappear too. This should have cleaned things up.
  • The Cache API QuotaClient state is either now permanently broken or not. Maybe we would want to augment nsIQuotaManagerService to dump its state in a way that we can introspect, similar to what we end up generating for crash reports when we hang.
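
Here is a rough JS skeleton of that browser mochitest, with assumptions flagged: the wedge call is the hypothetical API sketched earlier, BrowserTestUtils.crashFrame is just one existing way to approximate the hand-wavy forcible teardown (it simulates a content-process crash), the page URL is a placeholder, and resolving the blocker promise after the teardown (so the backlog can drain before the second open) is my guess at the intended ordering.

  "use strict";

  const ORIGIN = "https://example.com";
  // Placeholder; a real test would load a dedicated page from the test directory.
  const PAGE = ORIGIN + "/";

  add_task(async function test_cache_wedged_process_teardown() {
    // Create a tab in the origin in question.
    const tab = await BrowserTestUtils.openNewForegroundTab(gBrowser, PAGE);

    // Open a Cache API cache in the tab, waiting for the open to complete.
    await ContentTask.spawn(tab.linkedBrowser, null, async () => {
      await content.caches.open("first-cache");
    });

    // Wedge the Cache API quota client thread (hypothetical API from above).
    let unblock;
    const blockerPromise = new Promise(resolve => (unblock = resolve));
    const principal = Services.scriptSecurityManager
      .createContentPrincipalFromOrigin(ORIGIN);
    Services.qms.blockQuotaClientThreadUntilPromiseResolve(
      "default", principal, "cache", "", blockerPromise);

    // Issue Cache API operations but do not await their resolution; they are
    // stuck behind the wedge and awaiting them would just hang the test.
    await ContentTask.spawn(tab.linkedBrowser, ORIGIN, origin => {
      content.caches.open("first-cache").then(cache => {
        cache.put(origin + "/wedged", new content.Response("wedged"));
        cache.keys();
      });
      content.caches.delete("first-cache");
    });

    // Forcibly tear down the content process without clean global teardown;
    // "ipc:content-shutdown" from ContentParent::ActorDestroy tells us the
    // channel is gone.
    const channelGone = TestUtils.topicObserved("ipc:content-shutdown");
    await BrowserTestUtils.crashFrame(tab.linkedBrowser);
    await channelGone;
    BrowserTestUtils.removeTab(tab);

    // Un-wedge so the Cache thread can work through the backlog and the
    // teardown that queued up behind it.
    unblock();

    // Open a different cache for the same origin in a fresh tab; its
    // completion implies the backlog and channel teardown were processed.
    const tab2 = await BrowserTestUtils.openNewForegroundTab(gBrowser, PAGE);
    await ContentTask.spawn(tab2.linkedBrowser, null, async () => {
      await content.caches.open("second-cache");
    });

    // Clean up; a permanently broken Cache QuotaClient would likely show up
    // as a hang or rejection here or in the open above.
    BrowserTestUtils.removeTab(tab2);
  });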