Closed Bug 1279503 Opened 3 years ago Closed 3 years ago

Push Notifications don't work in e10s depending on process configuration

Categories

(Core :: DOM: Push Notifications, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
e10s + ---
firefox48 + wontfix
firefox49 + fixed
firefox50 + fixed

People

(Reporter: bigben, Assigned: baku)

References

(Blocks 1 open bug)

Details

Attachments

(4 files, 2 obsolete files)

It seems there are issues with our Push Notification/Service Worker API when e10s mode is enabled. Indeed, if there is only a main process spawned (just launch nightly and open about:debugging only for example, make sure you have no tab containing remote content), you can't receive push notifications [1].

Moreover, if the current content/child process never opened the page that registered the notification service worker, you can't receive push notifications either [2].

I'm not sure this is very clear, so here is a long scenario to reproduce the various sides of this bug:

[Steps to reproduce]:
1. Launch Firefox Nightly.
2. Open about:debugging and go to service workers section.
3. Open a tab with https://gauntface.github.io/simple-push-demo/
4. Copy and paste the CURL command. You can click the XHR button to check that you receive the notifications. Since you opened the push demo page recently, the notification service worker should be alive (you can check this in about:debugging).
5. Open a tab with content, such as https://www.mozilla.org/
6. Close the tab which contained the simple push demo.
7. Check that the service worker is dead (in about:debugging), or wait for it to die.
8. Now, run the CURL command. You can notice that the service worker is woken up and you receive the push notification.
9. Now, close the spare tab you opened (containing mozilla.org or any other website) and wait a few seconds for the content process to die.
10. Run the CURL command again. You won't receive any notification! [1]
If you check about:debugging, the service worker isn't woken up. Even if you force it to start, you still don't receive notifications.
11. Open a new tab, containing https://mozillians.org/ for instance. A new content process should be spawned.
12. Run the CURL command again. Even with a content process, you still won't receive any notification, because this new content process never contained the original service page. [2]

[Expected result]:
Whenever Firefox is launched and whatever tabs you have opened inside (none, plenty, only about:* pages), you should receive push notifications for the services you registered

[Actual result]:
You can only receive push notifications if :
- There is a content process spawned
- This content process contains (or contained during its life) the service web page

Please note that this bug doesn't happen if you disable e10s.
In a single-process version of Firefox, you always receive the notifications when running the CURL command, regardless of your tabs.

I hope this is clear enough!
Thanks for filing this bug Benoit! I was able to reproduce it.

What works (in e10s):
- Receiving a Push notification, when a content process is running and it has previously loaded the Service Worker scope before

What doesn't work (in e10s):
- Receiving a Push notification without any content process (just the main process running).
- Receiving a Push notification with a content process that has never seen any page from the Service Worker scope before (e.g. just www.mozilla.org).

Kit, do you have any idea if this is expected behavior? Shouldn't the Push Service spawn a content process if none is available when your receive a push message?
Flags: needinfo?(kcambridge)
Blocks: 1164829
I think the issue without any content process is a known problem and we need the service worker e10s refactor to fix it.

The issue with a content process not having opened the scope before, though, is unexpected and probably can/should be fixed.
tracking-e10s: --- → ?
I spent some time looking at the issue with a content process not having opened the scope before. There's something odd with how we initialize the ServiceWorkerManager in the child.

Normally, if I open a new window, `TabChild::RecvLoadURL` (http://searchfox.org/mozilla-central/rev/970569ad57ac4436ff31aa2ac63f38ed3ee2932d/dom/ipc/TabChild.cpp#1331-1365) gets called in the child, which initializes all the service worker registrations.

This works correctly on startup, and if I close all windows, then open a new window with a page that's not controlled by the service worker and send a push.

However, I can reproduce the issue consistently if I do this:

1. Close all windows.
2. Open a new window.
3. Open "about:debugging", "about:addons", or "about:about". At this point, the content process shuts down...I'm guessing because these aren't remote?
4. Open an uncontrolled page in a new tab.
5. Try sending a push via cURL.
6. (Optionally, close the "about:" page, and open some more pages that aren't controlled by the service worker. That has no effect on the bug).

Neither RecvLoadURL nor ServiceWorkerManager::LoadRegistrations gets called, so ServiceWorkerManager::SendPushEvent bails because it doesn't have any registrations.

If I open an uncontrolled page in a new *window*, though, RecvLoadURL *is* called, so I can send pushes.

Any ideas, Ben? It's almost like loading a non-remote tab as the first tab in a window breaks it...even if I open some remote tabs and close the non-remote one in that same window.
Flags: needinfo?(kcambridge) → needinfo?(bkelly)
Andrea, can you help Kit with this issue?  I think you implemented our service worker registrar loading logic originally.
Flags: needinfo?(bkelly) → needinfo?(amarchesini)
The issue here is that we should spawn a content process if we don't have anything running. In the PushNotifier code there is
 PushDispatcher::HandleNoChildProcesses() but it does nothing:

https://dxr.mozilla.org/mozilla-central/source/dom/push/PushNotifier.cpp#264

This issue is probably related to moving ServiceWorkerManager to the parent process. Once this is done, we can centralize how ServiceWorkers are spawn also when we don't have any running content process.
Flags: needinfo?(amarchesini)
Andrea, in this case:

> - Receiving a Push notification with a content process that has never seen any page from the Service Worker scope before (e.g. just www.mozilla.org).

From comment 0 or Kit's situation in comment 3, there is a content process.

So I think we have an issue we could solve now before the full e10s refactor.
Flags: needinfo?(amarchesini)
There are 2 issues here, the first one is that, without content process we don't dispatch notifications as comment 0 says:

9. Now, close the spare tab you opened (containing mozilla.org or any other website) and wait a few seconds for the content process to die.
10. Run the CURL command again. You won't receive any notification! [1]

Then, with a new content process:

11. Open a new tab, containing https://mozillians.org/ for instance. A new content process should be spawned.
12. Run the CURL command again. Even with a content process, you still won't receive any notification, because this new content process never contained the original service page. [2]

So, yes, tomorrow I'll debug this second issue because it should work with the current setup.
Flags: needinfo?(amarchesini)
[Tracking Requested - why for this release]:
See comment 7, a service worker / push related issue that might need addressing before e10s rolls out.

bkelly, is this something we should block on?
Flags: needinfo?(bkelly)
I think its a bit of a corner case, but I'd love to see it fixed in 48 beta.

Andrea, any progress here?  It would be really good to fix this before we ship e10s in release.
Flags: needinfo?(bkelly) → needinfo?(amarchesini)
Assignee: nobody → amarchesini
Flags: needinfo?(amarchesini)
Attachment #8767744 - Flags: review?(mrbkap)
gabor, do you want to steal these patches from the Blake's review queue?
Flags: needinfo?(gkrizsanits)
(In reply to Andrea Marchesini (:baku) from comment #12)
> gabor, do you want to steal these patches from the Blake's review queue?

I talked to Blake and he does not mind if I start it and will take a look at it next week.

If I understand this patch correctly, pre-patch we did a swm->LoadRegistrations(aConfig.serviceWorkerRegistrations()); each time the parent requested a new URI to be loaded which is kind of a waste considering that swm is a singleton per content process if I read the code right, and wrong since it is not always called (Comment 3).

Post-patch the problem in comment 3 is fixed but I'm wondering if we could/should do the LoadRegistration instead of per browser creation only at the startup of a new content process. We have some functions on PContent like GetProcessAttributes()
or GetXPCOMProcessAttributes() for initial process data like this.

One more thing, could you add a test for this case to make this work future-/refactoringproof?
Flags: needinfo?(gkrizsanits) → needinfo?(amarchesini)
> If I understand this patch correctly, pre-patch we did a

Right.

> Post-patch the problem in comment 3 is fixed but I'm wondering if we
> could/should do the LoadRegistration instead of per browser creation only at
> the startup of a new content process. We have some functions on PContent
> like GetProcessAttributes()
> or GetXPCOMProcessAttributes() for initial process data like this.

Oh. Sure. This is better! And all of this happens before anything else, right?

> One more thing, could you add a test for this case to make this work
> future-/refactoringproof?

Sure.
Flags: needinfo?(amarchesini)
(In reply to Andrea Marchesini (:baku) from comment #14)
> Oh. Sure. This is better! And all of this happens before anything else,
> right?

Yeah, see ContentChild::Init for more details about the order of things we initialize there.
I moved everything to InitXPCOM because I need PBackground up and running.
This is a huge simplification of the code. Thanks for suggesting this!
Attachment #8767745 - Attachment is obsolete: true
Attachment #8767745 - Flags: review?(mrbkap)
Attachment #8769427 - Flags: review?(gkrizsanits)
Comment on attachment 8769427 [details] [diff] [review]
part 2 - moving BrowserConfiguration in the ContentChild::InitXPCOM

Review of attachment 8769427 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/ipc/ContentChild.cpp
@@ +1050,5 @@
>    RecvSetConnectivity(isConnected);
>    RecvBidiKeyboardNotify(isLangRTL, haveBidiKeyboards);
>  
> +  // Loading the ServiceWorker configuration.
> +  BrowserConfiguration configuration;

Do you know if there is any reason for this to be called BrowserConfiguration when it is only used for ServiceWorker data? 

Anyway, more important part is that I would prefer this LoadRegistration part a few lines lower, after the domain policy, CPOW inits and global->SetInitialProcessData(data) parts.

It can come before InitOnContentProcessCreated() I think.
Attachment #8769427 - Flags: review?(gkrizsanits) → review+
Attachment #8767744 - Flags: review?(mrbkap) → review+
> Do you know if there is any reason for this to be called
> BrowserConfiguration when it is only used for ServiceWorker data? 

The idea was to keep some data with the loadURI. But it seems that we don't have anything else than ServiceWorkerRegistrations.
We can probably rename BrowserConfiguration to ServiceWorkerConfiguration or something similar.
Follow up, of course.
Pushed by amarchesini@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/999f189f1147
part 1 - removed a non-used variable in BrowserConfiguration, r=gabor
https://hg.mozilla.org/integration/mozilla-inbound/rev/2f89df793074
part 2 - moving BrowserConfiguration in the ContentChild::InitXPCOM, r=gabor
https://hg.mozilla.org/mozilla-central/rev/999f189f1147
https://hg.mozilla.org/mozilla-central/rev/2f89df793074
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
The add-on LastPass stopped working I think with this patch.  This patch seems to be the only one between the GOOD and BAD.

GOOD
-------
https://hg.mozilla.org/integration/mozi ... 3d9f0c3b11

BAD
-----
https://hg.mozilla.org/integration/mozi ... 470409cf5e
Depends on: 1286015
Depends on: 1286126
Track this as e10s will be roll out in 48.
Do you want to request uplift for this fix?  If so, is there work in other bugs that should go along with it?

Jim, what do you think about the risk of bringing this into 48 beta 7?
Flags: needinfo?(jmathies)
Flags: needinfo?(amarchesini)
This bug lust land together with bug 1286126 (we broke nightly badly with just the patches attach to this bug).
To me, uplift them are OK for aurora. Beta seems a bit too risky.
Flags: needinfo?(amarchesini)
I'll rely on baku's response. bkelly stated in comment 9 this was corner case so sounds like aurora should suffice.
Flags: needinfo?(jmathies)
Comment on attachment 8767744 [details] [diff] [review]
part 1 - removed a non-used variable

Approval Request Comment
[Feature/regressing bug #]: ServiceWorkers
[User impact if declined]: content process doesn't have the correct serviceWorker entries, sometimes.
[Describe test coverage new/current, TreeHerder]: no tests. Only manual.
[Risks and why]: this patch is ok. The next one breaks nightly and it must land with bug 1286126
[String/UUID change made/needed]: none
Attachment #8767744 - Flags: approval-mozilla-aurora?
Comment on attachment 8769427 [details] [diff] [review]
part 2 - moving BrowserConfiguration in the ContentChild::InitXPCOM

Approval Request Comment
[Feature/regressing bug #]: ServiceWorkers
[User impact if declined]: content process doesn't have the correct serviceWorker entries, sometimes.
[Describe test coverage new/current, TreeHerder]: no tests. Only manual.
[Risks and why]: this patch must land with bug 1286126
[String/UUID change made/needed]: none
Attachment #8769427 - Flags: approval-mozilla-aurora?
Comment on attachment 8767744 [details] [diff] [review]
part 1 - removed a non-used variable

Review of attachment 8767744 [details] [diff] [review]:
-----------------------------------------------------------------

This patch fixes push notification not working in e10s which is high user impact. Take it in aurora.
Attachment #8767744 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8769427 [details] [diff] [review]
part 2 - moving BrowserConfiguration in the ContentChild::InitXPCOM

Review of attachment 8769427 [details] [diff] [review]:
-----------------------------------------------------------------

Same as above. Take it in aurora.
Attachment #8769427 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
has problems uplifting

grafting 353595:2f89df793074 "Bug 1279503 - part 2 - moving BrowserConfiguration in the ContentChild::InitXPCOM, r=gabor"
merging dom/ipc/ContentChild.cpp
merging dom/ipc/ContentChild.h
merging dom/ipc/ContentParent.cpp
merging dom/ipc/ContentParent.h
merging dom/ipc/PBrowser.ipdl
merging dom/ipc/TabChild.cpp
merging dom/ipc/TabChild.h
merging dom/ipc/TabParent.cpp
merging dom/ipc/TabParent.h
warning: conflicts while merging dom/ipc/ContentChild.cpp! (edit, then use 'hg resolve --mark')
abort: unresolved conflicts, can't continue
(use 'hg resolve' and 'hg graft --continue')
Flags: needinfo?(amarchesini)
Attached patch part 1 - m-a (obsolete) — Splinter Review
Flags: needinfo?(amarchesini)
Attached patch part 1 - m-aSplinter Review
Attachment #8772766 - Attachment is obsolete: true
Attached patch part 2 - m-aSplinter Review
Duplicate of this bug: 1291020
You need to log in before you can comment on or make changes to this bug.