Closed Bug 1383905 Opened 7 years ago Closed 7 years ago

push events can fire in service worker in web extension process

Categories

(Core :: DOM: Service Workers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox-esr52 --- unaffected
firefox55 --- wontfix
firefox56 --- fixed
firefox57 --- fixed

People

(Reporter: bkelly, Assigned: asuth)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

See my comment here:

https://github.com/devtools-html/debugger.html/issues/3432#issuecomment-317555835

We seem to be running service worker threads in the web extension process instead of the content process sometimes.

Does anyone have time to look at this?
Andrew, does anyone have time to look at this?  We probably should not ship FF56 like this.
Flags: needinfo?(overholt)
We just need to change PushNotifier::Dispatch to filter based on the result of ContentParent::GetRemoteType().  Right now we could also end up in the file process or large allocation processes, both of which seem bad.  This is a trivial change, fix coming up.
Assignee: nobody → bugmail
Status: NEW → ASSIGNED
Flags: needinfo?(overholt)
This introduces filtering to avoid attempting to dispatch notifications
to WebExtensions processes and other types of content processes.


The fix was easy, but I got confused by about:memory calling the web
extensions process a "Web Content" process, so now we have bug 1383978
so I don't get confused again in the future.

Unfortunately, about:memory is fancy and sorts processes after the main
process by resident size, so we can't use it as means of telling what
the underlying order of ContentParent::GetAll is.  Thankfully GDB
exists and it's how I was planning to verify my fix anyways.

Here's my super cool gdb transcript showing that the 0th process is an
extension process, and that when we set a breakpoint in
PushNotifier::Dispatch the loop skips the "extension" process type and
continues on to "web", deciding to process it by sending the
permissions.  Inductively this will also work for the non-e10s case for
people who want to debug ServiceWorkers.

110	      for (uint32_t i = 0; i < contentActors.Length(); ++i) {
(gdb) p contentActors
$6 = nsTArray<mozilla::dom::ContentParent*> = {0x7fffcbc57000, 0x7fffb73fe000, 0x7fffb3571000, 0x7fffb06f3000}
(gdb) p ((mozilla::dom::ContentParent*)0x7fffcbc57000)->mRemoteType
$7 = u"extension"
(gdb) p ((mozilla::dom::ContentParent*)0x7fffb73fe000)->mRemoteType
$8 = u"web"
(gdb) p ((mozilla::dom::ContentParent*)0x7fffb3571000)->mRemoteType
$9 = u"web"
(gdb) p ((mozilla::dom::ContentParent*)0x7fffb06f3000)->mRemoteType
$10 = u"web"

110	      for (uint32_t i = 0; i < contentActors.Length(); ++i) {
(gdb) p i
$11 = 0
(gdb) next
113	        if (!contentActors[i]->GetRemoteType().EqualsLiteral(
(gdb) next
110	      for (uint32_t i = 0; i < contentActors.Length(); ++i) {
(gdb) p i
$12 = 1
(gdb) next
113	        if (!contentActors[i]->GetRemoteType().EqualsLiteral(
(gdb) next
122	          TransmitPermissionsForPrincipal(aDispatcher.GetPrincipal());
Attachment #8889718 - Flags: review?(bkelly)
Comment on attachment 8889718 [details] [diff] [review]
Push notifications should only use "web" content processes

Review of attachment 8889718 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the quick fix!

Should we uplift to FF55?
Attachment #8889718 - Flags: review?(bkelly) → review+
Backed out for timing out in xpcshell's dom/push/test/xpcshell/test_observer_remoting.js, at least on OS X:

https://hg.mozilla.org/integration/mozilla-inbound/rev/506c0fc7d2f25e68ba7d000d957db4f9781518f3

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=c49b7e6e9811851c02017fae4150a3624a84f3fc&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=117537078&repo=mozilla-inbound

08:07:25     INFO -  TEST-START | dom/push/test/xpcshell/test_observer_remoting.js
08:12:25  WARNING -  TEST-UNEXPECTED-TIMEOUT | dom/push/test/xpcshell/test_observer_remoting.js | Test timed out
08:12:25     INFO -  TEST-INFO took 300009ms
08:12:25     INFO -  >>>>>>>
08:12:25     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
08:12:25     INFO -  (xpcshell/head.js) | test run_next_test 0 pending (2)
08:12:25     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
08:12:25     INFO -  running event loop
08:12:25     INFO -  dom/push/test/xpcshell/test_observer_remoting.js | Starting test_observer_remoting
08:12:25     INFO -  (xpcshell/head.js) | test test_observer_remoting pending (2)
08:12:25     INFO -  PID 8599 | JavaScript error: resource://gre/modules/FileUtils.jsm, line 63: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]
08:12:25     INFO -  (xpcshell/head.js) | test run in child pending (3)
08:12:25     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (3)
08:12:25     INFO -  "CONSOLE_MESSAGE: (error) [JavaScript Error: "NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIProperties.get]" {file: "resource://gre/modules/FileUtils.jsm" line: 63}]"
08:12:25     INFO -  "CONSOLE_MESSAGE: (info) No chrome package registered for chrome://branding/locale/brand.properties"
08:12:25     INFO -  CHILD-TEST-STARTED
08:12:25     INFO -  (xpcshell/head.js) | test MAIN run_test pending (1)
08:12:25     INFO -  (xpcshell/head.js) | test run_next_test 0 pending (2)
08:12:25     INFO -  (xpcshell/head.js) | test MAIN run_test finished (2)
08:12:25     INFO -  running event loop
08:12:25     INFO -  dom/push/test/xpcshell/test_observer_remoting.js | Starting test_observer_remoting
08:12:25     INFO -  (xpcshell/head.js) | test test_observer_remoting pending (2)
08:12:25     INFO -  (xpcshell/head.js) | test run_next_test 0 finished (2)
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 87] Should fire push notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 87] Should fire push notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 90] Should include the principal in the push message - {} == {}
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 90] Should include the principal in the push message - {} == {}
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 92] Should include data - "Hello from child!" === "Hello from child!"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 92] Should include data - "Hello from child!" === "Hello from child!"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 98] Should fire subscription change notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 98] Should fire subscription change notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 100] Should pass the principal as the subject of a change notification - {} == {}
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 100] Should pass the principal as the subject of a change notification - {} == {}
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 107] Should fire subscription modified notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 109] Should pass the principal as the subject of a modified notification - {} == {}
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 107] Should fire subscription modified notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 109] Should pass the principal as the subject of a modified notification - {} == {}
08:12:25     INFO -  (xpcshell/head.js) | test pending (3)
08:12:25     INFO -  (xpcshell/head.js) | test finished (3)
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 87] Should fire push notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 90] Should include the principal in the push message - {} == {}
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 92] Should include data - "Hello from parent!" === "Hello from parent!"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 98] Should fire subscription change notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 100] Should pass the principal as the subject of a change notification - {} == {}
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 107] Should fire subscription modified notifications with the correct scope - "chrome://test-scope" == "chrome://test-scope"
08:12:25     INFO -  TEST-PASS | dom/push/test/xpcshell/test_observer_remoting.js | test_observer_remoting - [test_observer_remoting : 109] Should pass the principal as the subject of a modified notification - {} == {}
08:12:25     INFO -  <<<<<<<
08:12:25     INFO -  xpcshell return code: None
08:12:25     INFO -  dom/push/test/xpcshell/test_observer_remoting.js | Process still running after test!
Flags: needinfo?(bugmail)
Blocks: 1394429
xpcshell tests use ContentParent::GetNewOrUsedBrowserProcess(); which uses a default remote type of NO_REMOTE_TYPE == "".  If we convert that to DEFAULT_REMOTE_TYPE, the test passes.

nsXULAppInfo::EnsureContentProcess() also uses the no-argument variant, but that's a little more harmless since it just invokes the method as a side-effect.  More importantly, it has no callers that searchfox can find.

All other control flow goes through ContentParent::CreateBrowser which happens via nsFrameLoader.

I'm going to file a bug against Core::IPC to remote NO_REMOTE_TYPE in favor because of DEFAULT_REMOTE_TYPE because I think NO_REMOTE_TYPE was a "why risk getting blamed for breaking tests?" cop-out, although I haven't done all of the archaeology required.

Details:
- callsite of GetNewOrUsedBrowserProcess():
https://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/toolkit/xre/nsEmbedFunctions.cpp#904
- prototype:
https://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/dom/ipc/ContentParent.h#172
- constant declaration:
https://searchfox.org/mozilla-central/rev/51b3d67a5ec1758bd2fe7d7b6e75ad6b6b5da223/dom/ipc/ContentParent.h#39
Flags: needinfo?(bugmail)
Depends on: 1395827
https://hg.mozilla.org/mozilla-central/rev/d69a6595950c
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment on attachment 8889718 [details] [diff] [review]
Push notifications should only use "web" content processes

Approval Request Comment
[Feature/Bug causing the regression]:
Bug 1357486 enabled the Out-Of-Process (OOP) WebExtensions process on Windows for Firefox 56, which caused the underlying emergent bug to manifest.

(The underlying emergent bug might be some type of E10SUtils.shouldLoadURI()/E10SUtils.redirectLoad() loop resulting from the ServiceWorker openWindow() code path somewhat uniquely invoking nsWindowWatcher::OpenWindow2 in the content process.  That's its own tech debt issue :bkelly has a fix for on bug 1293277.  I can investigate further, but with this fix it becomes a moot issue.)

[User impact if declined]:
Bug 1394429 "notification opens perpetually loading empty tab" happens.  Push events that attempt to trigger opening a window will hang in this "forever-loading tab that's also chewing up a lot of your CPU" if 1) there's a webext OOP process and 2) it's the first content process in the list.  Usually the push event will trigger a desktop notification popup and the user must click on it before the ServiceWorker openWindow() call happens, but the breakage is technically in the ServiceWorker path.  (And the breakage is unaffected by the user waiting long enough that the ServiceWorkerPrivate gets terminated for being idle.  The notification/alert infrastructure can be said to have content process affinity; that desktop notification will just spin the SW back up in the wrong process when it gets clicked.)

[Is this code covered by automated tests?]:
Yes and no.  Yes, we have coverage that makes sure the patch didn't break push notifications.  (That's what caused the initial back-out and necessitated bug 1395827's fix).

No in the sense that there's no test for this specific edge-case, and this is also likely too complex/specific to create a test for.  We will likely favor adding more diagnostic assertions.  Unfortunately, this wasn't an invariant any of the affected code knew it needed when it was originally written and our multi-e10s ServiceWorker refactorings are still in process that are aware of such invariants.

[Has the fix been verified in Nightly?]:
Yes.

[Needs manual test from QE? If yes, steps to reproduce]: 
No, especially because it's all probabilistic.  However, if you want it for paranoia:
- Test on Windows (or any platform with "extensions.webextensions.remote" set to true.)
- Have at least one WebExtension.
- Use https://gauntface.github.io/simple-push-demo/ to install a SW and trigger push notifications.  Success is clicking on the notification and having a tab open and load.  Failure is clicking on the notification and having the tab spin like it's loading forever, eating up your CPU.  If you don't get a desktop notification, that's unrelated to this bug.  Just ask for another one.
- Keep quitting Firefox and restarting and opening that site and triggering notifications and ensuring it works.  Maybe do it 10 times because of the probabilistic nature of the scenario.  It certainly took me less than 10 tries to reproduce.  Note that you do not need to create a fresh profile each time, but you do want to have your profile set so that it doesn't open your previous session; we want as few content processes activated as possible at startup.

[List of other uplifts needed for the feature/fix]:
Bug 1395827's https://bugzilla.mozilla.org/attachment.cgi?id=8903461&action=edit must be landed and for bisection purposes, this patch wants to come after that patch.  Just like it landed on trunk.

I will request permission on that bug's attachment too so it can have its own risk assessment.

[Is the change risky?]:
No.

[Why is the change risky/not risky?]:
This is a very straightforward case of us filtering out webextensions processes.  We never want to be running ServiceWorkers there.

[String changes made/needed]:
No string changes made.
Attachment #8889718 - Flags: approval-mozilla-beta?
I'm marking 56 affected because it's for-sure affected by the OOP webexts on windows in Firefox 56.

I'm marking 55 "?" because I would expect a similar thing to happen for the "file" protocol process introduced in bug 1147911 which landed (and shipped?) in Firefox 53.  Given that most people don't use "file://" URLs, I would expect the chance of the problem occurring there are greatly reduced, especially because we'd expect a normal content process to show up first in the content process list in that case.   (The only way I could see that happening is if sessionstore persisted a "file://" URL as the first-loaded tab and the file process got created before a proper "web" content process.)
Comment on attachment 8889718 [details] [diff] [review]
Push notifications should only use "web" content processes

Fix an issue related to push events dispatch. Beta56+.
Attachment #8889718 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
I'm still having an issue on the latest nightly. If I have a twitch.tv tab open and then click a push notification from twitch.tv, all of my tabs on that website will try to load the page for the notification.
(In reply to blakewolf from comment #14)
> I'm still having an issue on the latest nightly. If I have a twitch.tv tab
> open and then click a push notification from twitch.tv, all of my tabs on
> that website will try to load the page for the notification.

Please file a new bug for this.  It's likely something the site script is doing on purpose, though
You need to log in before you can comment on or make changes to this bug.