Closed Bug 1662925 Opened 4 years ago Closed 4 years ago

Clicking on a twitter link inside of a google doc opens an about:blank page without loading the twitter url

Tracking

()

Status:

RESOLVED FIXED

Milestone:

83 Branch

Project Flags:

Fission Milestone

M6b

Tracking Flags:

Tracking

Status

firefox-esr78

---

unaffected

firefox82

---

fixed

firefox83

---

fixed

People

(Reporter: annyG, Assigned: asuth)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(3 files, 2 obsolete files)

Bug 1662925 - Hold a keepalive while waiting for worker process launch, 4 years ago Nika Layzell [:nika] (ni? for response) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1662925 - Part 1: Test fetch intercepting spawning a serviceworker process, 4 years ago Nika Layzell [:nika] (ni? for response) 47 bytes, text/x-phabricator-request	jcristau : approval-mozilla-beta+	Details \| Review
Bug 1662925 - Doc improvements for Remote and Shared workers. 4 years ago Andrew Sutherland [:asuth] (he/him) 47 bytes, text/x-phabricator-request		Details \| Review
Bug 1662925 - Test enhancements. r=nika 4 years ago Andrew Sutherland [:asuth] (he/him) 47 bytes, text/x-phabricator-request	jcristau : approval-mozilla-beta+	Details \| Review
Bug 1662925 - Make SW fetch events properly queue. r=#dom-workers-and-storage 4 years ago Andrew Sutherland [:asuth] (he/him) 47 bytes, text/x-phabricator-request	jcristau : approval-mozilla-beta+	Details \| Review

Anny Gakhokidze [:annyG]

Reporter

Description

•

4 years ago

This is with Fission enabled and I came across this issue quite a few times.

Steps: (ensure fission is enabled)

Open https://docs.google.com/document/d/1WDpmoJnK_pPrWposjWfkZWSA1gzB4uC2Ft7_ehyWN18/edit
Wait until the document has loaded, try clicking on the twitter link a few times (close the twitter tabs after you open them)
If you haven't succeeded yet, try opening a few other sites, e.g. I've been opening a few https://web.telegram.org/ pages VERY QUICKLY (copy pasting the url and hitting enter fast) and closing them after they load, before doing the next step.
Wait a few minutes, make sure there are no twitter processes in about:process and try clicking on the twitter link again.

Expected:

A new tab opens with twitter url

Current:

about:blank page opens and does not redirect us to twitter.com

Though this is quite annoying, it is the only way I have been able to reproduce.

I have used mozregression (for the first time ever) to try to determine if it's due to my patches in bug 1589102, but 2 out of the 4 times, mozregression showed me that it was this range https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=3ab251f67b71bfefc6f3aa29bea44f1d9b1de1a4&tochange=4ee63a7a01c558ab33cbe80885cf966630e27dc9 and not my patches.

Perhaps QA can retest this to verify if it is indeed the patches above that caused this? Or maybe it will turn out to be my patches after all.

Flags: needinfo?(nkochar)

Flags: needinfo?(nika)

Anny Gakhokidze [:annyG]

Reporter

Comment 1

•

4 years ago

Another way to check this, is for me to download the firefox binary that includes my changes, but excludes the commit range above, use it for a day or two and see if it happens with my patches as well. I'll do that in the meantime.

Andrew McCreight [:mccr8]

Comment 2

•

4 years ago

I'm seeing very similar symptoms. For me, this is also happening frequently with Reddit as well as Twitter. I also see a similar symptom where the page just doesn't load when typing the URL in the bar.

Andrew McCreight [:mccr8]

Comment 3

•

4 years ago

(For me, I was following links to Twitter from a message in Discord.)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 4

•

4 years ago

:rpl added a bunch of "RemoteWorkerManager" MOZ_LOGging in those patches, it would be great if you could enable that for any runs where you haven't backed out those changes.

Anny Gakhokidze [:annyG]

Reporter

Comment 5

•

4 years ago

:mccr8 regarding pages not loading when typing the URL in the bar - I CC'ed you on bug 1662927, which I also filed.

Today I have been using firefox binary that only has my changes and none of the service worker changes and I haven't run into either of these issues.. Will report back tomorrow.

Nika Layzell [:nika] (ni? for response)

Comment 6

•

4 years ago

I've discussed this a bit on Matrix, so clearing my ni?

There's a chance this is being caused by a race when intercepting the twitter.com load due to a serviceworker, potentially related to the use of GetNewOrUsedBrowserProcessAsync here: https://searchfox.org/mozilla-central/rev/b2716c233e9b4398fc5923cbe150e7f83c7c6c5b/dom/workers/remoteworkers/RemoteWorkerManager.cpp#621-627

Ideally, this sort of thing would be switched to use GetNewOrUsedLaunchingBrowserProcess, AddKeepAlive, and WaitForLaunchAsync like is done by CanonicalBrowsingContext here: https://searchfox.org/mozilla-central/rev/b2716c233e9b4398fc5923cbe150e7f83c7c6c5b/docshell/base/CanonicalBrowsingContext.cpp#1127-1144

Flags: needinfo?(nika)

Andrew McCreight [:mccr8]

Updated

•

4 years ago

Keywords: regression

Neha Kochar [:neha]

Updated

•

4 years ago

Fission Milestone: ? → M6b

Component: DOM: Navigation → DOM: Service Workers

Flags: needinfo?(nkochar) → needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

4 years ago

Assignee: nobody → bugmail

Status: NEW → ASSIGNED

Flags: needinfo?(bugmail)

David Teller [:Yoric] - still alive but not very active

Updated

•

4 years ago

Blocks: 1663737

Neha Kochar [:neha]

Comment 7

•

4 years ago

Andrew, please provide an estimate date for the patch in the Whiteboard field as M6b is nearing completion.

Flags: needinfo?(bugmail)

Whiteboard: ETA: ?

Nika Layzell [:nika] (ni? for response)

Comment 8

•

4 years ago

Attached file Bug 1662925 - Hold a keepalive while waiting for worker process launch, (obsolete) — Details

Neha Kochar [:neha]

Updated

•

4 years ago

Assignee: bugmail → nika

Whiteboard: ETA: ?

Nika Layzell [:nika] (ni? for response)

Comment 9

•

4 years ago

Attached file Bug 1662925 - Part 1: Test fetch intercepting spawning a serviceworker process, — Details

This is a WIP patch, and currently fails.

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 11

•

4 years ago

Our current understanding of this bug thanks to Nika's excellent test and investigation work is that the ServiceWorker's mechanism for enqueueing FetchEvent operations is deficient and has always been. Specifically, FetchEventOpParent::Initialize is assuming that the RemoteWorkerParent already exists when the request to send a fetch event reaches PBackground from the main-thread. However, the act of creating a RemoteWorkerParent is asynchronous when process launching is involved, which it frequently is in the fission case. (In the non-fission "web" case, there's usually already a "web" process.)

I'm going to provide a fix for this newly uncovered problem and then we will ideally land the whole stack. I'm clearing the needinfo so I can be re-needinfo'ed but this bug remains my top priority.

Flags: needinfo?(bugmail)

Nika Layzell [:nika] (ni? for response)

Comment 12

•

4 years ago

re-assigning back to asuth, as they're handling the changes to FetchEventOpParent

Assignee: nika → bugmail

Neha Kochar [:neha]

Comment 13

•

4 years ago

Andrew, any updates on the patch and an ETA? The experiment launch date is Oct 6 and we need fixes by Oct 1.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 14

•

4 years ago

I have a fix locally that passes Nika's test (with a minor shutdown cleanup change; no need to try and manually shutdown the active worker) and the bug 1622451 verification test. My plan is to augment Nika's test to have a variant that creates an appropriate request body, clean up the fix patch a little more, then do some try pushes and ask for review.

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 15

•

4 years ago

Attached file Bug 1662925 - Doc improvements for Remote and Shared workers. (obsolete) — Details

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 16

•

4 years ago

moz-phab really didn't like my arc checkout of nika's test patch and my attempt to placate it and fell over in a "Validation Error" leaving my checkout without my test enhancement and fix patches. Unfortunately I'm using hg and have been historically burned by hg a lot so I'm not quite comfortable assuming that the hg bookmark mechanism works as reliably as git reflog (hg bookmarks shows I'm not on my bookmark, so presumably the stack is still there?) at this time of day, so I'll deal with this tomorrow, but I think https://github.com/mystor/phlay may have found a new user!

Try pushes which do have the test enhancements and fix safely preserved in the cloud:

auto: https://treeherder.mozilla.org/#/jobs?repo=try&revision=a10adc5beb7e4334eaa305849c7a3604e06aa463
syntax "dom" tests plus manual WPT linux 64 coverage noting that the xpcshell test harness when used with "syntax" seems to no longer understand the concept of path filters for xpcshell, or at least not when the path itself is not a test directory containing an .ini file for xpcshell tests (I used "dom" because it was one of the examples in the --help docs after more targeted attempts failed): https://treeherder.mozilla.org/#/jobs?repo=try&revision=d36d1bdb6c2ba320b7b1205e7fb3c6b6bc7bf9f3

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 17

•

4 years ago

Attached file Bug 1662925 - Test enhancements. r=nika — Details

Depends on D91173

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 18

•

4 years ago

Attached file Bug 1662925 - Make SW fetch events properly queue. r=#dom-workers-and-storage — Details

Excerpting some documentation I wrote up for an OVERVIEW.md for Service
Workers for this exact use situation below. The basic situation of this patch
is that we were trying to create the FetchEventOpProxyParent immediately
before the managing PRemoteWorker instance existed, which isn't a thing we can
do. Because FetchEvent ops are more complicated they require somewhat more
unique handling, but it should have been unified with the PendingOp
infrastructure and was not. The one notable thing going on actor-wise is that
the request body (if present) requires special RemoteLazyInputStream
serialization and this is something that can only be done once we have the
RemoteWorkerParent. See https://phabricator.services.mozilla.com/D73173 and
its commit message and my "Restating" blocks for more context.

Threads and Proxies

Main Thread

ServiceWorkerManager's authoritative ServiceWorker state lives on the main
thread in the parent process. This is due to a combination of legacy factors
and that currently the nsHttpChannel AsyncOpen logic where navigation
interception occurs must be on the main thread.

IPDL Background Thread

The IPDL Background Thread is the thread where PBackground parent actors are
created. Because IPDL actors are explicitly tied to the thread they are created
on and PBackground is the only top-level protocol created for use by DOM
Workers, this thread is the natural home for book-keeping and authoritative
state for APIs that are accessed via PBackground-managed protocols. For
example, IndexedDB and other QuotaManager-managed storage APIs.

The Remote Worker APIs are all PBackground managed and so all messages sent via
the Remote Worker API need to be sent from the IPDL Background thread.

Main Thread to IPDL Background Proxies

There are 2 ways to get data from the main thread to the IPDL Background thread:
either via direct runnable dispatch or via IPDL IPC. We use IPDL IPC (which
has optimizations for same-process communication).

The following interfaces exist exclusively to proxy requests from the
ServiceWorkerManager on the main thread to the IPDL Background thread.

PRemoteWorkerController is a proxy wrapper around the
RemoteWorkerController API exposed on the IPDL Background thread.
PFetchEventOp is paired with PFetchEventOpProxy managed by the above
PRemoteWorkerController. PFetchEventOp gets the data to the IPDL
Background thread from the main thread, then PFetchEventOpProxy gets the
data down to the content process.

Non-Fetch ServiceWorker events AKA ExtendableEvents

How non-fetch events are dispatched to the serviceworker (including the IPC).

Because ServiceWorkers are intended to be shutdown and restarted on demand and
most event processing is asynchronous, there needs to be a way to track
outstanding requests and for content logic to indicate when it is done
processing requests. ExtendableEvents are the mechanism for this. A method
waitUntil(promise) adds promises to be track as long as the event is still
"active".

This straightforward lifecycle means that these events map well to our IPC
async return value mechanism. This is used by
PRemoteWorker::ExecServiceWorkerOp.

Fetch events and FetchEvent.respondWith()

FetchEvents have a different lifecycle and dataflow than regular
ExtendableEvents. They expose a respondWith() method that will eventually
resolve with a fetch Response object that potentially needs to be propagated
before the ExtendableEvent is no longer active. (The response will never be
propagated after waitUntil() settles because every call to respondWith()
implicitly calls waitUntil().)

From an IPC perspective, this means that FetchEvent instances need their own
IPC actor rather than being able to depend on the async return value mechanism
of IPDL. That's why PFetchEventOpProxy exists and is managed by
PRemoteWorker.

Depends on D92020

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 19

•

4 years ago

It turns out moz-phab has its own mechanism of downloading patches from phabricator, nicely documented at https://github.com/mozilla-conduit/review#downloading-a-patch-from-phabricator so I should have done moz-phab patch D91173. (My previous use of arc had come from previous uses of arc as directed by since-updated Firefox version control or conduit documentation but I'd never tried to submit any of the resulting applied patches, just use them.)

Unfortunately when submitting my stack which built on Nika's test this did permute the canonical test revision and its stack to no longer depend on Nika's provision fix. I think this might actually be desired, but it's weird. (And to its credit moz-phab did warn me about this, but after I panicked the first time after agreeing and hit control-c it left my hg bookmark pointing at the wrong thing, so I restored my state and just went ahead with the stack clobbering.)

Neha Kochar [:neha]

Updated

•

4 years ago

Whiteboard: patches waiting for review

Luca Greco [:rpl] [:luca] [:lgreco]

Updated

•

4 years ago

Blocks: 1656310

Luca Greco [:rpl] [:luca] [:lgreco]

Updated

•

4 years ago

Comment 20

•

4 years ago

This has been reported even without Fission so removing Fission milestone flag.

Fission Milestone: M6b → ---

Neha Kochar [:neha]

Comment 21

•

4 years ago

This is much worse with Fission so we do need to block M6b on this.

Fission Milestone: --- → M6b

Whiteboard: patches waiting for review

Tom Tung [:tt, :ttung]

Updated

•

4 years ago

Comment 22

•

4 years ago

Pushed by bugmail@asutherland.org:
https://hg.mozilla.org/integration/autoland/rev/5572b2a892c9
Part 1: Test fetch intercepting spawning a serviceworker process, r=asuth
https://hg.mozilla.org/integration/autoland/rev/c4920ccb9fbd
Test enhancements. r=nika
https://hg.mozilla.org/integration/autoland/rev/49008be99669
Make SW fetch events properly queue. r=dom-workers-and-storage-reviewers,janv

Treeherder Bug Filer

Updated

•

4 years ago

Regressions: 1669876

Atila Butkovits

Comment 23

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/5572b2a892c9
https://hg.mozilla.org/mozilla-central/rev/c4920ccb9fbd
https://hg.mozilla.org/mozilla-central/rev/49008be99669

Status: ASSIGNED → RESOLVED

Closed: 4 years ago

status-firefox83: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 83 Branch

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 24

•

4 years ago

Comment on attachment 9178901 [details]
Bug 1662925 - Make SW fetch events properly queue. r=#dom-workers-and-storage

Beta/Release Uplift Approval Request

User impact if declined: Websites that involve ServiceWorker interception may not properly launch on the first try if they are launching a content process to host the ServiceWorker. This is more of a major problem for fission, but the circumstances also seem to arise when normal e10s Firefox is first started and possibly also GeckoView.
Is this code covered by automated tests?: Yes
Has the fix been verified in Nightly?: Yes
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): After all is said and done, this is a relatively straightforward life-cycle management bug that definitely was broken before. The underlying pending operation tracking has been exercising related paths for some time and has been refined and the MozPromises involved in all of this already existed before and had the same lifetime relationships.

We don't actually have to take the new test and its refinements, but it seems appropriate to help verify the fix.

String changes made/needed: none

Attachment #9178901 - Flags: approval-mozilla-beta?

Andrew Sutherland [:asuth] (he/him)

Assignee

Updated

•

4 years ago

Attachment #9177434 - Flags: approval-mozilla-beta?

Attachment #9178900 - Flags: approval-mozilla-beta?

Julien Cristau [:jcristau]

Comment 25

•

4 years ago

As we're about to build the last beta for 82, uplifting makes me nervous. is this a new issue, or somehow getting worse for users? if there are regressions from it, how likely are we to notice and link them back to that change within a couple of days?

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment 26

•

4 years ago

•

Edited

This is not a new issue. It also is recoverable if the user goes to the URL bar and hits enter again, etc. That said, bug 1663125 which impacted Firefox 80 would have made the problem disappear because ServiceWorkers were not having their registrations loading from disk on startup. That regression was fixed late in the 81 cycle (81.0rc2) as well, so it might appear like a new regression for some set of users.

See the blockers and see also's for potentially related bugs. Note that there's a separate twitter issue which is more concerning and getting some awareness and that's the one that involves the corrupted content UI (bug 1665368). That is likely unrelated to this bug.

In terms of regressions, I think we'd notice it pretty quickly because the failure modes for a screw-up would be:

Crash. We should see a ton of crashes on twitter.com for those that report the URL if so.
Hangs forever / displays about:blank. This is actually the problem that the bug fixes, but if that didn't get fixed right or increased the incidence, I'd still expect us to hear about it because of twitter.
Increased corrupted content error rate. We actually may see this in the success case because whatever's happening with the twitter SW throwing a rejection might happen more because the set of successfully launched ServiceWorkers with FetchEvents dispatched at them would increase.

Note again that the Twitter corrupted content error (bug 1665368) is a separate bug I'm working on a fix for and that one is definitely not suitable for immediate uplift. So maybe it makes sense to wait to for that fix to land and both to ride the beta train. I just wanted to make sure this potential fix was known to release management for consideration.

Flags: needinfo?(bugmail)

Julien Cristau [:jcristau]

Comment 27

•

4 years ago

Comment on attachment 9178901 [details]
Bug 1662925 - Make SW fetch events properly queue. r=#dom-workers-and-storage

Thanks for the details Andrew. Let's get this one in 82.0b9.

Attachment #9178901 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Julien Cristau [:jcristau]

Updated

•

4 years ago

Attachment #9177434 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Julien Cristau [:jcristau]

Updated

•

4 years ago

Attachment #9178900 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Julien Cristau [:jcristau]

Comment 28

•

4 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/3e19be3e6824
https://hg.mozilla.org/releases/mozilla-beta/rev/8c783a21273a
https://hg.mozilla.org/releases/mozilla-beta/rev/bba66a0ea401

status-firefox82: --- → fixed

Mathew Hodson

Comment 31

•

4 years ago

This was caused by the changes in bug 1626362. I didn't see it in the regressed by field.

Ryan VanderMeulen [:RyanVM]

Updated

•

4 years ago

status-firefox-esr78: --- → unaffected

Flags: in-testsuite+

Regressed by: 1626362

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes

Alice0775 White

Updated

•

4 years ago

Regressions: 1673534

Andrew Sutherland [:asuth] (he/him)

Assignee

Comment on attachment 9178652 [details]
Bug 1662925 - Doc improvements for Remote and Shared workers.

Revision D91887 was moved to bug 1701726. Setting attachment 9178652 [details] to obsolete.

Attachment #9178652 - Attachment is obsolete: true

Phabricator Automation

Updated

•

3 years ago

Attachment #9177007 - Attachment is obsolete: true

You need to log in before you can comment on or make changes to this bug.