Closed Bug 1351231 Opened 6 years ago Closed 14 days ago

Implement PFetch protocol and backing FetchService to allow Workers to request fetches without involving the main thread of their process

Categories

(Core :: DOM: Networking, task, P3)

task

Tracking

()

RESOLVED FIXED
111 Branch
Performance Impact medium
Tracking Status
firefox111 --- fixed

People

(Reporter: bkelly, Assigned: edenchuang, NeedInfo)

References

(Blocks 7 open bugs, Regressed 2 open bugs)

Details

(Keywords: perf:pageload, Whiteboard: [necko-triaged])

Attachments

(5 files, 1 obsolete file)

Currently fetch() using nsIChannel on the main thread in the current process.  This means that a worker thread (like service worker) can hit main thread jank when it tries to perform a network request.  This is particularly bad for service workers since delaying the fetch() may in turn delay the network result for the main page load.

The network team is working on making nsIChannel support PBackground so it can be used OMT.  That will probably take a long time to fully happen, however.

In addition, our service worker e10s refactor will move the service worker instances to a new process.  That will also take some time to happen.  Its also possible the memory overhead of this new process will be too large.

An alternative we could do in the short term would be to implement a separate PBackground actor just for fetch().  We already support serializing Request and Response objects across IPC in the Cache API.  In addition, Andrea recently landed parent-to-child AutoIPCStream support.  Between these it might be easy to use PBackground to go to parent, proxy to main thread there, and then create the nsIChannel.

If we did this, though, we would not be able to fully remove the current implementation.  A PBackground fetch() would bypass the child-side service worker intercept today.  We would only be able to use it in fetch() calls from within the service worker global itself.  Eventually we could remove the old fetch() implementation after SWM is moved to the parent.
We may want to do this even with the necko changes.  Right now I think they are only making OnDataAvailable() work OMT which means we still need to touch main thread for On(Start|Stop)Request().

Another factor to consider, though, is we would need to do something special for devtools.  It currently requires nsIChannel in order to show up in the network monitor.
Blocks: 1330826
Priority: P3 → P2
Whiteboard: [qf]
Blocks: 1522316
Whiteboard: [qf] → [qf:p1:pageload]

Note: From a conversation with Dragana on Feb 14, 2019, I understand the plan is to make AsyncOpen, OnStart, and OnStop work off the main thread, so this bug probably wants to be duped.

Component: DOM: Service Workers → DOM: Networking
Priority: P2 → --
Depends on: omt-networking
Priority: -- → P2
Whiteboard: [qf:p1:pageload] → [qf:p1:pageload][necko-triaged]
Priority: P2 → P3
See Also: → 1567556
Blocks: 1613912
Blocks: 1567556
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Summary: Consider using separate PBackground actor for fetch() → Implement PFetch protocol and backing FetchService to allow Workers to request fetches without involving the main thread of their process
Assignee: bugmail → ytausky
Whiteboard: [qf:p1:pageload][necko-triaged] → [qf:p2:pageload][necko-triaged]

Does this block parsing worklets as module scripts, i.e., bug 1572644?

(In reply to Anne (:annevk) from comment #4)

Does this block parsing worklets as module scripts, i.e., bug 1572644?

No. One could implement bug 1311726 while still performing the current "let's do all the network stuff on the main thread" behavior. Noting that right now the worklet script loader just assumes the one single invocation of WorkletFetchHandler that feeds into the module compilation.

The network-on-the-main-thread uses could then subsequently be replaced with PFetch later on.

Blocks: 1290958
No longer depends on: omt-networking
Assignee: ytausky → echuang
Depends on: 1725567
No longer blocks: 1290958
Performance Impact: --- → P2
Keywords: perf:pageload
Whiteboard: [qf:p2:pageload][necko-triaged] → [necko-triaged]
Attachment #9270005 - Attachment is obsolete: true
Attachment #9270007 - Attachment description: WIP: Bug 1351231 - FetchService integration for PFetch. → Bug 1351231 - FetchService integration for PFetch.
Attachment #9270006 - Attachment description: WIP: Bug 1351231 - PFetch protocol declaration and implementation. → Bug 1351231 - PFetch protocol declaration and implementation.
Attachment #9270006 - Attachment description: Bug 1351231 - PFetch protocol declaration and implementation. → Bug 1351231 - PFetch protocol declaration and implementation. r=#dom-worker-reviewers
Attachment #9270007 - Attachment description: Bug 1351231 - FetchService integration for PFetch. → Bug 1351231 - FetchService integration for PFetch. r=#dom-worker-reviewers

Various Investigation Notes

Re: the hand-off try push https://treeherder.mozilla.org/jobs?repo=try&selectedTaskRun=YUCMBF2yRbmQ5HHlTS4EIQ.0&revision=e1d06a199d576b745b6fa496312dcaa59b13f470

  • A bunch of the last bunch of the try failures turn out to have been bug 1777181, but this isn't to say the tests Eden identified as failing won't actually still be failing on try with those gone.
  • browser_download_canceled.js has a reproducible failure for me locally. It also has some (low occurrence) intermittent test bugs: single tracker bug 1776002 and Got "timeout", expected "canceled" bug 1771953.
    • I reproduced locally and here's a pernosco trace: https://pernos.co/debug/mKbEio8ZspzHWJfbhGV2TQ/index.html
    • I need to do more investigation here, but it looks a lot like at least what I reproduced may be a bug in the test. Specifically, nsExternalAppHandler::SaveDestinationAvailable is being invoked with a non-null target of my downloads folder which is what happens if you don't cancel the download.
      • This raises questions about how the test ever passes in automation. It does seem that we have explicitly disabled the test under "verify" so a theory I plan to disprove next is that browser_download_canceled only works in automation because a prior download-related test properly sets some configuration option that carries over to browser_download_canceled and if you run it on its own it's missing that inherited flag, and so it fails. (I believe verify only ever runs the test itself which would have identical semantics.)
      • In this case it's possible there will still be a PFetch bug after addressing any test issues; the background file saver intentionally invokes NS_AsyncCopy with aCloseSource=false because of the pivoting mechanism it uses (first it downloads to a temporary file, then when the user picks a final destination, the download is pivoted to the new target file), and there could be some edge-cases related to this. A notable thing here is that there is really an incredible level of useful discussion in the original bug at https://bugzilla.mozilla.org/show_bug.cgi?id=789932#c8 around this (and I think the code is also unusually well commented as well).
      • I need to call it a day but I'll dig more into that tomorrow.

Also, I stood up a searchfox index on the "dev" channel for review related aspects of landing this, so that's https://dev.searchfox.org/

Severity: normal → S3

There are some r+ patches which didn't land and no activity in this bug for 2 weeks.
:edenchuang, could you have a look please?
If you still have some work to do, you can add an action "Plan Changes" in Phabricator.
For more information, please visit auto_nag documentation.

Flags: needinfo?(rjesup)
Flags: needinfo?(echuang)
Flags: needinfo?(rjesup)
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d8fa8a006948
Preference for PFetch. r=dom-worker-reviewers,jesup.
https://hg.mozilla.org/integration/autoland/rev/5c3b5a384f2f
Support conversation between InternalRequest and IPCInternalRequest. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/84162d09eef8
PFetch protocol declaration and implementation. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/c0cb3c17f246
FetchService integration for PFetch. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/cc26eeeaf3dd
Integrate FetchChild into Fetch.cpp r=dom-worker-reviewers,jesup
Backout by sstanca@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ec41ebc45aec
Backed out 5 changesets for Fetch related failures. CLOSED TREE

Backed out for Fetch related failures.

  • Backout link
  • Push with failures
  • Failure Log
  • Failure line: PROCESS-CRASH | /fetch/api/request/request-bad-port.any.sharedworker.html | application crashed [@ (anonymous namespace)::ParentImpl::AssertIsOnBackgroundThread()]

*It also failed on some mochitests jobs. Failure log
*It also failed like this. Failure log

And also failed on this mochitests job. Failure log

Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fd2a843599d1
Preference for PFetch. r=dom-worker-reviewers,jesup.
https://hg.mozilla.org/integration/autoland/rev/a1fcf22a2a0e
Support conversation between InternalRequest and IPCInternalRequest. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/3eb8fdd0ba6d
PFetch protocol declaration and implementation. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/66b279e5a513
FetchService integration for PFetch. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/da5c4a821428
Integrate FetchChild into Fetch.cpp r=dom-worker-reviewers,jesup
Type: enhancement → task
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8c21c9468949
Preference for PFetch. r=dom-worker-reviewers,jesup.
https://hg.mozilla.org/integration/autoland/rev/ea27cd66fefd
Support conversation between InternalRequest and IPCInternalRequest. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/dbdca4661a35
PFetch protocol declaration and implementation. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/6cbb442f4772
FetchService integration for PFetch. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/a41252141399
Integrate FetchChild into Fetch.cpp r=dom-worker-reviewers,jesup

Backed out 5 changesets (Bug 1351231) for causing multiple wpt failures.
Backout link
Push with failures <--> wpt7
Failure Log
Also wpt7 Failure Log

Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/19b78046ffef
Preference for PFetch. r=dom-worker-reviewers,jesup.
https://hg.mozilla.org/integration/autoland/rev/9da00c1364a5
Support conversation between InternalRequest and IPCInternalRequest. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/7a4e3f5f674a
PFetch protocol declaration and implementation. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/4fd92351d64b
FetchService integration for PFetch. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/257553919696
Integrate FetchChild into Fetch.cpp r=dom-worker-reviewers,jesup

Also, this assertion became very frequent Assertion failure: aResponseTarget, at /builds/worker/workspace/obj-build/dist/include/mozilla/MozPromise.h:504 there is a bug for this, Bug 1756635.
Please take a look at this too before you re-land this bug.
LATER EDIT:
also please take a look at:

  • Bug 1810967 which was caused by this bug. Bug 1810967 it is now closed as INVALID because this bug was backed out.
  • Bug 1810821 which was caused by this bug. Bug 1810821 it is now closed as INVALID because this bug was backed out.
  • Bug 1811064 which was caused by this bug. Bug 1811064 it is now closed as INVALID because this bug was backed out.
Pushed by echuang@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/dcfa6d7e82c0
Preference for PFetch. r=dom-worker-reviewers,jesup.
https://hg.mozilla.org/integration/autoland/rev/d77b78a71a7e
Support conversation between InternalRequest and IPCInternalRequest. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/989cbb08b87e
PFetch protocol declaration and implementation. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/b9d73f961292
FetchService integration for PFetch. r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/162fe96c2c33
Integrate FetchChild into Fetch.cpp r=dom-worker-reviewers,jesup
https://hg.mozilla.org/integration/autoland/rev/0c260a33541a
1351231, 1351231, 1351231, 1351231: apply code formatting via Lando

There are some bugs that became frequent with the landings in this bug, for eg:

In both of them can be observed the same behavior:
Bug 1351231 -> landed failures went up,
Bug 1351231 was backed out -> no more failures on tree
Bug 1351231 was relanded - failures are again frequent.

Eden, please have a look over these frequent failures. Thank you.

Regressions: 1810828
Regressions: 1810816
Regressions: 1811682
See Also: → 1812039
You need to log in before you can comment on or make changes to this bug.