Closed Bug 1877619 Opened 1 year ago Closed 9 months ago

ServiceWorker failure for drive.google.com which breaks Google Drive/Docs Labeling sidebar

Categories

(Core :: DOM: Service Workers, defect, P2)

RESOLVED FIXED

People

(Reporter: asuth, Assigned: asuth)

Details

In a Slack thread there were reports of a problem with the "Labels" sidebar mechanism in Google Docs. After brief investigation, it appeared to correlate with ServiceWorkers, and in particular with NS_ERROR_INTERCEPTION_FAILED errors on subresource fetches (for which we currently have no mitigations). After further investigation over Zoom and local reproduction, it appears that the problem occurs only on Release (Firefox 122) and not on Nightly (Firefox 124).

Complicating matters, if devtools is connected to the ServiceWorker when the fault occurs, devtools disconnects for an unknown reason, which causes the ServiceWorker to terminate.

Using the Profiler, we can see an "unhandledrejection" event firing, but we cannot see its content. Interestingly, an "updatefound" event is also fired.

I plan to continue investigating tomorrow, with the primary goal of identifying the source of the unhandledrejection and why Nightly is unaffected.

The steady-state problem appears to be that the ServiceWorker tries to use an IndexedDB database "ODPMC" and create a transaction involving the object store "versions", but there is no such object store in the database. In fact, there are no object stores at all. There should be 3: "versions" and "modules", introduced at version 1, and "cssModules", introduced at version 2.

The most likely explanations for problems like this are either:

  • A bugged onupgradeneeded implementation.
  • The global gets torn down, but to IDB this looks like an onupgradeneeded implementation that did nothing. ServiceWorker lifecycle complexities potentially exacerbate this situation.
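
For reference, a versioned onupgradeneeded handler for the schema described above could look like the following sketch. It is not Google's code: the store names come from this bug, the target version of 2 is inferred from "cssModules" being introduced at version 2, and key paths and other store options are unknown, so none are passed. The migration is a plain function of the database and oldVersion so it can be checked without a browser; upgradeODPMC, openODPMC and missingStores are hypothetical helpers, the last one detecting the steady-state symptom (a database at the current version with no object stores).

```javascript
// Hypothetical sketch of the ODPMC schema migration. Store names are from
// the bug; key paths and other options are unknown and therefore omitted.
const ODPMC_VERSION = 2;
const ODPMC_STORES = ["versions", "modules", "cssModules"];

// Apply every schema step the database has not seen yet, in order.
function upgradeODPMC(db, oldVersion) {
  if (oldVersion < 1) {
    db.createObjectStore("versions");
    db.createObjectStore("modules");
  }
  if (oldVersion < 2) {
    db.createObjectStore("cssModules");
  }
}

// Returns the expected stores that are absent, e.g. all three when an
// upgrade "completed" without the handler ever running.
function missingStores(db) {
  return ODPMC_STORES.filter((name) => !db.objectStoreNames.contains(name));
}

// Browser wiring (ServiceWorker or window global); not called outside one.
function openODPMC() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open("ODPMC", ODPMC_VERSION);
    request.onupgradeneeded = (event) => {
      upgradeODPMC(request.result, event.oldVersion);
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```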

The profiler ended up being lower-yield than I'd hoped, so I switched to Pernosco (which meant setting up my new machine for it). I'm going to grab a Pernosco trace of the initial SW install to try to pin down what's happening.

Okay, so there appear to be 2 things going on here:

  1. We fail the "install" job, which causes us to terminate the worker.
  2. The timing works out such that, by the time we reach BackgroundDatabaseChild::RecvPBackgroundIDBVersionChangeTransactionConstructor, we have already run WorkerPrivate::NotifyInternal(Canceling). That has disconnected the IDBOpenDBRequest DETH (DOMEventTargetHelper), which has cleared out its mListenerManager. So when we go to dispatch the "upgradeneeded" event we find no event listeners at all, rather than trying to call the listener and getting an error from CallbackObject.
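
The effect described in item 2 can be illustrated with a deliberately simplified model. ToyRequest and runVersionChange are hypothetical JavaScript stand-ins, not Gecko's C++ classes: once teardown clears the request's listeners, dispatching "upgradeneeded" is a silent no-op, and the version change still commits with no object stores created.

```javascript
// Toy model of the teardown race; these are hypothetical stand-ins,
// not Gecko classes.
class ToyRequest {
  constructor() {
    this.listeners = new Map(); // plays the role of mListenerManager
  }
  addEventListener(type, fn) {
    if (!this.listeners.has(type)) this.listeners.set(type, []);
    this.listeners.get(type).push(fn);
  }
  // Models global teardown (NotifyInternal(Canceling)) dropping listeners.
  disconnect() {
    this.listeners.clear();
  }
  // With no listeners registered this silently does nothing; no error.
  dispatch(type, event) {
    for (const fn of this.listeners.get(type) ?? []) fn(event);
  }
}

// The version change commits whether or not any handler actually ran.
function runVersionChange(request, db, oldVersion, newVersion) {
  request.dispatch("upgradeneeded", { oldVersion, db });
  db.version = newVersion;
  return db;
}
```

The raced case reproduces the steady-state symptom from earlier in this bug: the database reports the new version but contains no object stores.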

The "install" job rejection that appears to cause the problem has this stack:

0 Ph/e.g</e.h<(h = "Error: Ia`Cache got opaque response with bad status 0 while trying to add request ") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":112:502]
    this = null
1 $h(a = "[object Object]", b = "3", c = "Error: Ia`Cache got opaque response with bad status 0 while trying to add request ") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":115:438]
2 Wh(a = "null", b = "[object Object]", c = "3", d = "Error: Ia`Cache got opaque response with bad status 0 while trying to add request ") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":115:295]
3 U.prototype.s() ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":115:178]
    this = [object Object]
4 Ch("undefined") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":109:511]

That we try to move forward with the version change transaction is somewhat surprising, because we intend to handle this case through the !EnsureDOMObject() call. In particular, we would hope the !factory.GetParentObject() check would detect this. But IDBFactory is not a DOMEventTargetHelper subclass; it clears mGlobal only in IDBFactory::DisconnectFromGlobal, which is called only from nsGlobalWindowInner and never for workers. The good news is that :saschanaz introduced GlobalTeardownObserver for cases like this, and we can use it to replace the window-specific logic in IDBFactory::DisconnectFromGlobal. I will file a separate IDB bug for this, since this bug also covers the first issue (the failing "install" job).
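
The observer-based approach can be sketched as a toy JavaScript model. ToyGlobal, ToyFactory, disconnectFromOwner and startVersionChange are hypothetical names and do not reflect GlobalTeardownObserver's real C++ API; the point is only that when every global type (window or worker) notifies the factory on teardown, a GetParentObject()-style check can stop the version change transaction on workers too.

```javascript
// Hypothetical model of the observer idea; the real GlobalTeardownObserver
// is a C++ class in Gecko with a different API.
class ToyGlobal {
  constructor() {
    this.observers = new Set();
  }
  register(observer) {
    this.observers.add(observer);
  }
  // Every global type runs this on teardown, so workers are covered too.
  teardown() {
    for (const obs of this.observers) obs.disconnectFromOwner();
    this.observers.clear();
  }
}

class ToyFactory {
  constructor(global) {
    this.parent = global;
    global.register(this);
  }
  // Replaces a window-only DisconnectFromGlobal-style hook.
  disconnectFromOwner() {
    this.parent = null;
  }
  getParentObject() {
    return this.parent;
  }
  // Models the check made before constructing the version change transaction.
  startVersionChange() {
    return this.getParentObject() ? "started" : "aborted";
  }
}
```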

Depends on: 1879259

I filed bug 1879259 for the IDB issue. I will finish looking into the "install" job issue tomorrow using the Pernosco traces I uploaded for this. (Unfortunately the traces contain authentication data that cannot be shared, so I can't link to them.)

The opaque response behind the error message in the ServiceWorker stack above seems to come from this code:

c=a.C.map(function(e){e=new Request(e,{mode:"no-cors",credentials:"include"});return b.add(e)});

where the first failure comes from fetching the URL "https://ssl.gstatic.com/docs/doclist/images/empty_state_details.png". That fetch is doomed to fail: it's a cross-origin no-cors request, so the response will be opaque. An opaque filtered response is defined to have a status of 0, and step 5.7.1 of addAll reads "or response’s status is not an ok status...reject responsePromise with a TypeError.", so Cache.add is required to fail.
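
The relevant response check in addAll can be emulated as a simplified function (it ignores other steps such as the Vary handling). It takes plain { type, status } objects because an opaque Response cannot be constructed directly outside a browser; checkAddAllResponse is a hypothetical name.

```javascript
// Simplified emulation of the addAll response check that Cache.add relies on.
function isOkStatus(status) {
  return status >= 200 && status <= 299;
}

function checkAddAllResponse(response) {
  // Opaque responses always have status 0, so they always fail here.
  if (
    response.type === "error" ||
    !isOkStatus(response.status) ||
    response.status === 206
  ) {
    throw new TypeError(
      `Cache.add rejected: type=${response.type}, status=${response.status}`
    );
  }
}
```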

This should cause problems in all web browsers, so I wonder whether we're being served something different.

We are being served different ServiceWorker scripts, but difftastic suggests that most of the differences come from Closure Compiler-allocated identifiers being shuffled, without major changes. We don't have great tooling for this otherwise, so I'm going to reproduce under rr/Pernosco.

This now reproduces for me on Nightly, although there seem to be races affecting whether the problem shows up on the first invocation of "File... Labels". Given that my goal in reproducing was to see the SW present in "serviceworker.txt", I suspect the cases where it seemed to work were cases where the SW failed to install and we did not retry the install, so the iframe was not intercepted.

I'm going to work up a fix for the IDB issue in bug 1879259 and mail the Google list about the apparent misuse of Cache.add.
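
For illustration only (this is not necessarily what Google's code should or will do), one way to precache cross-origin no-cors resources is to fetch them and store them with Cache.put, which, unlike Cache.add, does not require an ok status and so accepts opaque responses. precacheNoCors is a hypothetical helper; the cache and fetch implementation are injected so it can be tested outside a browser.

```javascript
// Sketch: precache URLs with Cache.put so opaque (status 0) responses are
// accepted. `cache` and `fetchImpl` are injected; in a ServiceWorker they
// would be an opened Cache and self.fetch.
async function precacheNoCors(cache, urls, fetchImpl) {
  await Promise.all(
    urls.map(async (url) => {
      const init = { mode: "no-cors", credentials: "include" };
      const response = await fetchImpl(url, init);
      // An opaque response hides the real status (it is always 0), so an
      // opaque 404 would be cached too; only network errors (which reject
      // the fetch) and non-opaque error statuses cause a failure here.
      if (response.type !== "opaque" && !response.ok) {
        throw new TypeError(`Bad status ${response.status} for ${url}`);
      }
      // Unlike Cache.add, Cache.put does not require an ok status.
      await cache.put(url, response);
    })
  );
}
```

In a ServiceWorker this might be called as precacheNoCors(await caches.open(name), urls, fetch), with the cache name chosen by the application. The trade-off is that the worker can no longer tell a successful opaque response from a failed one.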

Summary: Reproducible ServiceWorker failure for drive.google.com on Release (Firefox 122) but not nightly (Firefox 124.0a1) which breaks Google Drive/Docs Labeling sidebar; devtools disconnects on fault → ServiceWorker failure for drive.google.com which breaks Google Drive/Docs Labeling sidebar

This was fixed by my fix in bug 1879259.

Status: ASSIGNED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED