ServiceWorker failure for drive.google.com which breaks Google Drive/Docs Labeling sidebar
Categories
(Core :: DOM: Service Workers, defect, P2)
People
(Reporter: asuth, Assigned: asuth)
A Slack thread reported a problem with the "Labels" sidebar mechanism in Google Docs. Brief investigation correlated it with ServiceWorkers, and in particular with NS_ERROR_INTERCEPTION_FAILED errors on subresource fetches (for which we don't have any current mitigations). After additional investigation over Zoom and local reproduction, it appears that the problem occurs only on Release (Firefox 122) and not on Nightly (Firefox 124).
Complicating matters, if devtools is connected to the ServiceWorker when the fault occurs, devtools disconnects for some reason and this induces the ServiceWorker to terminate.
Using the Profiler we can see an "unhandledrejection" event firing, but cannot see its content. Interestingly, an "updatefound" event is also being fired.
I plan to continue my investigation tomorrow with the primary goal of identifying the rejection behind the "unhandledrejection" event and why Nightly is unaffected.
Comment 1 (Assignee) • 1 year ago
The steady state problem appears to be that the ServiceWorker is trying to use an IndexedDB db "ODPMC" and create a transaction that involves the object store "versions", but there is no such object store in the db. In fact, there are no object stores. But there should be 3: "versions" and "modules" introduced at version 1 and "cssModules" introduced at version 2.
The most likely explanations for problems like this are either:
- A bugged onupgradeneeded implementation.
- The global gets torn down but to IDB it looks like an onupgradeneeded implementation that did nothing. ServiceWorker lifecycle complexities potentially exacerbate this situation.
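To make the expected schema concrete, here is a minimal sketch of a version-stepped upgrade handler that would produce the three stores described above. The function names (`applyUpgrade`, `openOdpmc`) and the default store options are hypothetical; only the database name, store names, and version numbers come from this bug, and this is not the actual Google code.

```javascript
// Hypothetical sketch: create the object stores each schema version introduces.
// `db` is an IDBDatabase (or any object exposing createObjectStore).
function applyUpgrade(db, oldVersion) {
  if (oldVersion < 1) {
    // Stores introduced at version 1.
    db.createObjectStore("versions");
    db.createObjectStore("modules");
  }
  if (oldVersion < 2) {
    // Store introduced at version 2.
    db.createObjectStore("cssModules");
  }
}

// Browser/worker-only wrapper; assumes a global `indexedDB` is available.
function openOdpmc() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open("ODPMC", 2);
    // If this handler never runs, the upgrade can still complete and leave
    // the database at version 2 with no object stores at all.
    request.onupgradeneeded = (event) => applyUpgrade(request.result, event.oldVersion);
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}
```

Either explanation listed above ends in the same state: the database version is bumped, but none of the `createObjectStore` calls ever happen.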
Using the profiler ended up being lower yield than I'd hoped so I ended up using pernosco (and needing to set up my new machine for pernosco). I'm going to grab a pernosco trace for the initial SW install to try and lock down what's happening.
Comment 2 (Assignee) • 1 year ago
Okay, so there appear to be 2 things going on here:
- We fail the "install" job, which causes us to terminate the worker.
- The timing works out such that by the time we get to BackgroundDatabaseChild::RecvPBackgroundIDBVersionChangeTransactionConstructor we have already done WorkerPrivate::NotifyInternal(Canceling). That has disconnected the IDBOpenDBRequest's DOMEventTargetHelper (DETH), which has cleared out its mListenerManager. So when we go to dispatch the "upgradeneeded" event, we find no event listeners, rather than trying to call the listener and getting an error from CallbackObject.
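The second point can be illustrated with a small simulation. This is only an analogy under stated assumptions: a plain `EventTarget` stands in for the IDBOpenDBRequest, `removeEventListener` stands in for the cleared listener manager, and `simulateUpgrade` is a hypothetical name. It does not model Gecko's actual internals.

```javascript
// Simulation only: a plain EventTarget stands in for the IDBOpenDBRequest.
function simulateUpgrade(listenersCleared) {
  const request = new EventTarget();
  const stores = [];
  const onUpgradeNeeded = () => {
    stores.push("versions", "modules", "cssModules");
  };
  request.addEventListener("upgradeneeded", onUpgradeNeeded);

  if (listenersCleared) {
    // Stand-in for worker cancellation clearing out the listener manager.
    request.removeEventListener("upgradeneeded", onUpgradeNeeded);
  }

  // Dispatch neither throws nor reports that nobody was listening.
  const dispatched = request.dispatchEvent(new Event("upgradeneeded"));
  return { dispatched, stores };
}
```

In both cases the dispatch itself looks successful; only the side effects differ. That is why, from the dispatcher's point of view, a disconnected request is indistinguishable from an upgrade handler that chose to do nothing.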
The "install" job rejection that appears to cause the problem has this stack:
0 Ph/e.g</e.h<(h = "Error: Ia`Cache got opaque response with bad status 0 while trying to add request ") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":112:502]
this = null
1 $h(a = "[object Object]", b = "3", c = "Error: Ia`Cache got opaque response with bad status 0 while trying to add request ") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":115:438]
2 Wh(a = "null", b = "[object Object]", c = "3", d = "Error: Ia`Cache got opaque response with bad status 0 while trying to add request ") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":115:295]
3 U.prototype.s() ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":115:178]
this = [object Object]
4 Ch("undefined") ["https://drive.google.com/_/scs/drive-static/_/js/k=boq-drive.OdpOfflineServiceWorker.en.HQ4o0-AvYsA.es5.O/am=wA/d=1/rs=AH299oe2AeNqnDEqX3D5q4B-njY4UifxXA/m=base":109:511]
That we try and move forward with the version change transaction is somewhat surprising because we intend to handle this through the call to !EnsureDOMObject(). In particular, we would hope the !factory.GetParentObject() check would detect this. But IDBFactory is not a DOMEventTargetHelper subclass; it only clears mGlobal in IDBFactory::DisconnectFromGlobal, which is only called from nsGlobalWindowInner and never on workers. The good news is that :saschanaz introduced GlobalTeardownObserver for cases like this, and we can use that to replace the window-specific logic in IDBFactory::DisconnectFromGlobal. I will file a specific IDB bug for this, since this bug also covers the first issue above (the failed "install" job).
Comment 3 (Assignee) • 1 year ago
I filed bug 1879259 for the IDB issue. I will finish looking into the "install" job issue tomorrow with the pernosco traces I uploaded for this. (Unfortunately there is auth data in the pernosco trace that cannot be shared, so I can't link to the pernosco traces.)
Comment 4 (Assignee) • 1 year ago
The source of the opaque response that is emitting the error message we see in the stack in the ServiceWorker seems to be the code:
c=a.C.map(function(e){e=new Request(e,{mode:"no-cors",credentials:"include"});return b.add(e)});
where the first failure case comes from the URL "https://ssl.gstatic.com/docs/doclist/images/empty_state_details.png". That add is doomed to fail: it is a cross-origin no-cors request, so the response will be opaque, and an opaque filtered response is defined to have a status of 0. Cache.add is therefore required to fail, because step 5.7.1 of addAll is "or response’s status is not an ok status...reject responsePromise with a TypeError."
This should cause problems in all web browsers, so I wonder if we're being served something distinct.
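As a rough illustration of that rule, the sketch below models the status check as a pure function and shows the commonly used fetch-then-put alternative, which does not apply the ok-status check. The names `addWouldReject` and `cacheAllowingOpaque` are hypothetical, and the model is simplified (it ignores the spec's other checks, such as the Vary header handling).

```javascript
// Simplified model of the response check in Cache.addAll: a response is
// rejected if it is a network error, its status is not in 200-299, or it is
// a 206 partial response.
function addWouldReject(response) {
  const okStatus = response.status >= 200 && response.status <= 299;
  return response.type === "error" || !okStatus || response.status === 206;
}

// Possible alternative for cross-origin no-cors resources (service worker or
// window context only). Cache.put stores the opaque response without checking
// for an ok status, so the caller cannot tell whether the cached body is
// actually an error page.
async function cacheAllowingOpaque(cache, url) {
  const request = new Request(url, { mode: "no-cors", credentials: "include" });
  const response = await fetch(request);
  await cache.put(request, response);
}
```

For example, `addWouldReject({ type: "opaque", status: 0 })` is true, which matches the "bad status 0" in the error message from the stack.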
Comment 5 (Assignee) • 1 year ago
We are being served different ServiceWorkers, but difftastic suggests most of the differences are due to the Closure Compiler-allocated identifiers being shuffled, without major changes. We don't really have great tooling for this otherwise, so I'm going to do a repro under rr/pernosco.
Comment 6 (Assignee) • 1 year ago
This is reproducing for me on Nightly now, although there seem to be some races in terms of whether it's perceived on the first invocation of "File... Labels" or not. Given that my goal in reproducing was to see the SW present in "serviceworker.txt", I suspect the cases where it seemed to work are cases where the SW failed the install and we did not retry the install, so the iframe did not get intercepted.
I'm going to work up a fix for the IDB issue in bug 1879259 and mail the Google list about the apparent misuse of Cache.add.
Comment 7 (Assignee) • 9 months ago
This was fixed by my fix in bug 1879259.