Clicking on a twitter link inside of a google doc opens an about:blank page without loading the twitter url
Categories
(Core :: DOM: Service Workers, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox82 | --- | fixed |
firefox83 | --- | fixed |
People
(Reporter: annyG, Assigned: asuth)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(3 files, 2 obsolete files)
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details | Review |
This is with Fission enabled and I came across this issue quite a few times.
Steps: (ensure fission is enabled)
- Open https://docs.google.com/document/d/1WDpmoJnK_pPrWposjWfkZWSA1gzB4uC2Ft7_ehyWN18/edit
- Wait until the document has loaded, try clicking on the twitter link a few times (close the twitter tabs after you open them)
- If you haven't succeeded yet, try opening a few other sites, e.g. I've been opening a few https://web.telegram.org/ pages VERY QUICKLY (copy pasting the url and hitting enter fast) and closing them after they load, before doing the next step.
- Wait a few minutes, make sure there are no twitter processes in
about:process
and try clicking on the twitter link again.
Expected:
- A new tab opens with twitter url
Current:
- about:blank page opens and does not redirect us to twitter.com
Though this is quite annoying, it is the only way I have been able to reproduce.
I have used mozregression (for the first time ever) to try to determine if it's due to my patches in bug 1589102, but 2 out of the 4 times, mozregression showed me that it was this range https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=3ab251f67b71bfefc6f3aa29bea44f1d9b1de1a4&tochange=4ee63a7a01c558ab33cbe80885cf966630e27dc9 and not my patches.
Perhaps QA can retest this to verify if it is indeed the patches above that caused this? Or maybe it will turn out to be my patches after all.
Reporter | ||
Comment 1•4 years ago
|
||
Another way to check this, is for me to download the firefox binary that includes my changes, but excludes the commit range above, use it for a day or two and see if it happens with my patches as well. I'll do that in the meantime.
Comment 2•4 years ago
|
||
I'm seeing very similar symptoms. For me, this is also happening frequently with Reddit as well as Twitter. I also see a similar symptom where the page just doesn't load when typing the URL in the bar.
Comment 3•4 years ago
|
||
(For me, I was following links to Twitter from a message in Discord.)
Assignee | ||
Comment 4•4 years ago
|
||
:rpl added a bunch of "RemoteWorkerManager" MOZ_LOGging in those patches, it would be great if you could enable that for any runs where you haven't backed out those changes.
Reporter | ||
Comment 5•4 years ago
|
||
:mccr8 regarding pages not loading when typing the URL in the bar - I CC'ed you on bug 1662927, which I also filed.
Today I have been using firefox binary that only has my changes and none of the service worker changes and I haven't run into either of these issues.. Will report back tomorrow.
Comment 6•4 years ago
|
||
I've discussed this a bit on Matrix, so clearing my ni?
There's a chance this is being caused by a race when intercepting the twitter.com load due to a serviceworker, potentially related to the use of GetNewOrUsedBrowserProcessAsync
here: https://searchfox.org/mozilla-central/rev/b2716c233e9b4398fc5923cbe150e7f83c7c6c5b/dom/workers/remoteworkers/RemoteWorkerManager.cpp#621-627
Ideally, this sort of thing would be switched to use GetNewOrUsedLaunchingBrowserProcess
, AddKeepAlive
, and WaitForLaunchAsync
like is done by CanonicalBrowsingContext
here: https://searchfox.org/mozilla-central/rev/b2716c233e9b4398fc5923cbe150e7f83c7c6c5b/docshell/base/CanonicalBrowsingContext.cpp#1127-1144
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment 7•4 years ago
|
||
Andrew, please provide an estimate date for the patch in the Whiteboard field as M6b is nearing completion.
Comment 8•4 years ago
|
||
Updated•4 years ago
|
Comment 9•4 years ago
|
||
This is a WIP patch, and currently fails.
Assignee | ||
Comment 11•4 years ago
|
||
Our current understanding of this bug thanks to Nika's excellent test and investigation work is that the ServiceWorker's mechanism for enqueueing FetchEvent operations is deficient and has always been. Specifically, FetchEventOpParent::Initialize is assuming that the RemoteWorkerParent already exists when the request to send a fetch event reaches PBackground from the main-thread. However, the act of creating a RemoteWorkerParent is asynchronous when process launching is involved, which it frequently is in the fission case. (In the non-fission "web" case, there's usually already a "web" process.)
I'm going to provide a fix for this newly uncovered problem and then we will ideally land the whole stack. I'm clearing the needinfo so I can be re-needinfo'ed but this bug remains my top priority.
Comment 12•4 years ago
|
||
re-assigning back to asuth, as they're handling the changes to FetchEventOpParent
Comment 13•4 years ago
|
||
Andrew, any updates on the patch and an ETA? The experiment launch date is Oct 6 and we need fixes by Oct 1.
Assignee | ||
Comment 14•4 years ago
|
||
I have a fix locally that passes Nika's test (with a minor shutdown cleanup change; no need to try and manually shutdown the active worker) and the bug 1622451 verification test. My plan is to augment Nika's test to have a variant that creates an appropriate request body, clean up the fix patch a little more, then do some try pushes and ask for review.
Assignee | ||
Comment 15•4 years ago
|
||
Assignee | ||
Comment 16•4 years ago
|
||
moz-phab really didn't like my arc checkout of nika's test patch and my attempt to placate it and fell over in a "Validation Error" leaving my checkout without my test enhancement and fix patches. Unfortunately I'm using hg and have been historically burned by hg a lot so I'm not quite comfortable assuming that the hg bookmark mechanism works as reliably as git reflog
(hg bookmarks
shows I'm not on my bookmark, so presumably the stack is still there?) at this time of day, so I'll deal with this tomorrow, but I think https://github.com/mystor/phlay may have found a new user!
Try pushes which do have the test enhancements and fix safely preserved in the cloud:
- auto: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a10adc5beb7e4334eaa305849c7a3604e06aa463
- syntax "dom" tests plus manual WPT linux 64 coverage noting that the xpcshell test harness when used with "syntax" seems to no longer understand the concept of path filters for xpcshell, or at least not when the path itself is not a test directory containing an .ini file for xpcshell tests (I used "dom" because it was one of the examples in the
--help
docs after more targeted attempts failed): https://treeherder.mozilla.org/#/jobs?repo=try&revision=d36d1bdb6c2ba320b7b1205e7fb3c6b6bc7bf9f3
Assignee | ||
Comment 17•4 years ago
|
||
Depends on D91173
Assignee | ||
Comment 18•4 years ago
|
||
Excerpting some documentation I wrote up for an OVERVIEW.md for Service
Workers for this exact use situation below. The basic situation of this patch
is that we were trying to create the FetchEventOpProxyParent immediately
before the managing PRemoteWorker instance existed, which isn't a thing we can
do. Because FetchEvent ops are more complicated they require somewhat more
unique handling, but it should have been unified with the PendingOp
infrastructure and was not. The one notable thing going on actor-wise is that
the request body (if present) requires special RemoteLazyInputStream
serialization and this is something that can only be done once we have the
RemoteWorkerParent. See https://phabricator.services.mozilla.com/D73173 and
its commit message and my "Restating" blocks for more context.
Threads and Proxies
Main Thread
ServiceWorkerManager's authoritative ServiceWorker state lives on the main
thread in the parent process. This is due to a combination of legacy factors
and that currently the nsHttpChannel AsyncOpen logic where navigation
interception occurs must be on the main thread.
IPDL Background Thread
The IPDL Background Thread is the thread where PBackground parent actors are
created. Because IPDL actors are explicitly tied to the thread they are created
on and PBackground is the only top-level protocol created for use by DOM
Workers, this thread is the natural home for book-keeping and authoritative
state for APIs that are accessed via PBackground-managed protocols. For
example, IndexedDB and other QuotaManager-managed storage APIs.
The Remote Worker APIs are all PBackground managed and so all messages sent via
the Remote Worker API need to be sent from the IPDL Background thread.
Main Thread to IPDL Background Proxies
There are 2 ways to get data from the main thread to the IPDL Background thread:
either via direct runnable dispatch or via IPDL IPC. We use IPDL IPC (which
has optimizations for same-process communication).
The following interfaces exist exclusively to proxy requests from the
ServiceWorkerManager on the main thread to the IPDL Background thread.
PRemoteWorkerController
is a proxy wrapper around the
RemoteWorkerController
API exposed on the IPDL Background thread.PFetchEventOp
is paired withPFetchEventOpProxy
managed by the above
PRemoteWorkerController
.PFetchEventOp
gets the data to the IPDL
Background thread from the main thread, thenPFetchEventOpProxy
gets the
data down to the content process.
Non-Fetch ServiceWorker events AKA ExtendableEvents
How non-fetch events are dispatched to the serviceworker (including the IPC).
Because ServiceWorkers are intended to be shutdown and restarted on demand and
most event processing is asynchronous, there needs to be a way to track
outstanding requests and for content logic to indicate when it is done
processing requests. ExtendableEvent
s are the mechanism for this. A method
waitUntil(promise)
adds promises to be track as long as the event is still
"active".
This straightforward lifecycle means that these events map well to our IPC
async return value mechanism. This is used by
PRemoteWorker::ExecServiceWorkerOp
.
Fetch events and FetchEvent.respondWith()
FetchEvent
s have a different lifecycle and dataflow than regular
ExtendableEvents
. They expose a respondWith()
method that will eventually
resolve with a fetch Response
object that potentially needs to be propagated
before the ExtendableEvent is no longer active. (The response will never be
propagated after waitUntil()
settles because every call to respondWith()
implicitly calls waitUntil()
.)
From an IPC perspective, this means that FetchEvent
instances need their own
IPC actor rather than being able to depend on the async return value mechanism
of IPDL. That's why PFetchEventOpProxy
exists and is managed by
PRemoteWorker
.
Depends on D92020
Assignee | ||
Comment 19•4 years ago
|
||
It turns out moz-phab has its own mechanism of downloading patches from phabricator, nicely documented at https://github.com/mozilla-conduit/review#downloading-a-patch-from-phabricator so I should have done moz-phab patch D91173
. (My previous use of arc had come from previous uses of arc as directed by since-updated Firefox version control or conduit documentation but I'd never tried to submit any of the resulting applied patches, just use them.)
Unfortunately when submitting my stack which built on Nika's test this did permute the canonical test revision and its stack to no longer depend on Nika's provision fix. I think this might actually be desired, but it's weird. (And to its credit moz-phab did warn me about this, but after I panicked the first time after agreeing and hit control-c it left my hg bookmark pointing at the wrong thing, so I restored my state and just went ahead with the stack clobbering.)
Updated•4 years ago
|
Comment 20•4 years ago
|
||
This has been reported even without Fission so removing Fission milestone flag.
Comment 21•4 years ago
|
||
This is much worse with Fission so we do need to block M6b on this.
Comment 22•4 years ago
|
||
Comment 23•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/5572b2a892c9
https://hg.mozilla.org/mozilla-central/rev/c4920ccb9fbd
https://hg.mozilla.org/mozilla-central/rev/49008be99669
Assignee | ||
Comment 24•4 years ago
|
||
Comment on attachment 9178901 [details]
Bug 1662925 - Make SW fetch events properly queue. r=#dom-workers-and-storage
Beta/Release Uplift Approval Request
- User impact if declined: Websites that involve ServiceWorker interception may not properly launch on the first try if they are launching a content process to host the ServiceWorker. This is more of a major problem for fission, but the circumstances also seem to arise when normal e10s Firefox is first started and possibly also GeckoView.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): After all is said and done, this is a relatively straightforward life-cycle management bug that definitely was broken before. The underlying pending operation tracking has been exercising related paths for some time and has been refined and the MozPromises involved in all of this already existed before and had the same lifetime relationships.
We don't actually have to take the new test and its refinements, but it seems appropriate to help verify the fix.
- String changes made/needed: none
Assignee | ||
Updated•4 years ago
|
Comment 25•4 years ago
|
||
As we're about to build the last beta for 82, uplifting makes me nervous. is this a new issue, or somehow getting worse for users? if there are regressions from it, how likely are we to notice and link them back to that change within a couple of days?
Assignee | ||
Comment 26•4 years ago
•
|
||
This is not a new issue. It also is recoverable if the user goes to the URL bar and hits enter again, etc. That said, bug 1663125 which impacted Firefox 80 would have made the problem disappear because ServiceWorkers were not having their registrations loading from disk on startup. That regression was fixed late in the 81 cycle (81.0rc2) as well, so it might appear like a new regression for some set of users.
See the blockers and see also's for potentially related bugs. Note that there's a separate twitter issue which is more concerning and getting some awareness and that's the one that involves the corrupted content UI (bug 1665368). That is likely unrelated to this bug.
In terms of regressions, I think we'd notice it pretty quickly because the failure modes for a screw-up would be:
- Crash. We should see a ton of crashes on twitter.com for those that report the URL if so.
- Hangs forever / displays about:blank. This is actually the problem that the bug fixes, but if that didn't get fixed right or increased the incidence, I'd still expect us to hear about it because of twitter.
- Increased corrupted content error rate. We actually may see this in the success case because whatever's happening with the twitter SW throwing a rejection might happen more because the set of successfully launched ServiceWorkers with FetchEvents dispatched at them would increase.
Note again that the Twitter corrupted content error (bug 1665368) is a separate bug I'm working on a fix for and that one is definitely not suitable for immediate uplift. So maybe it makes sense to wait to for that fix to land and both to ride the beta train. I just wanted to make sure this potential fix was known to release management for consideration.
Comment 27•4 years ago
|
||
Comment on attachment 9178901 [details]
Bug 1662925 - Make SW fetch events properly queue. r=#dom-workers-and-storage
Thanks for the details Andrew. Let's get this one in 82.0b9.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 28•4 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/3e19be3e6824
https://hg.mozilla.org/releases/mozilla-beta/rev/8c783a21273a
https://hg.mozilla.org/releases/mozilla-beta/rev/bba66a0ea401
Comment 31•4 years ago
|
||
This was caused by the changes in bug 1626362. I didn't see it in the regressed by field.
Updated•4 years ago
|
Updated•4 years ago
|
Comment 34•4 years ago
|
||
Comment on attachment 9178652 [details]
Bug 1662925 - Doc improvements for Remote and Shared workers.
Revision D91887 was moved to bug 1701726. Setting attachment 9178652 [details] to obsolete.
Updated•3 years ago
|
Description
•