Open Bug 1488067 Opened 2 years ago Updated 2 years ago

Web push signal is sometimes lost on Android

Categories

(Core :: DOM: Push Notifications, defect, P2)

61 Branch

UNCONFIRMED

People

(Reporter: collimarco91, Unassigned)

Details

User Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10_13_6) AppleWebKit/605.1.15 (KHTML, like Gecko) Version/11.1.2 Safari/605.1.15

Steps to reproduce:

When we send web push notifications to Firefox on Android (7.0, Huawei P9 lite), they usually work. Sometimes, however, the push signal is lost and the notification is not fetched from the application server. Only when a second notification (push signal) arrives are both notifications fetched from the application server and displayed to the user.

We noticed this behavior multiple times, but it is not easy to reproduce because it only happens "sometimes".

Basically this is what happens:

1st notification: application server -- (success) --> Mozilla autopush -- (lost) --> ??

2nd notification: application server -- (success) --> Mozilla autopush -- (success) --> service worker activated --> fetch all notifications from application server --> 1st and 2nd notifications are displayed at the same time

Note that if you don't send a second notification, the 1st one will never be displayed (and will probably expire due to TTL after some days/weeks).

Also note that we don't send any payload with the notification: we only send a signal that activates the service worker, and then we fetch the notification payload from our servers.

We would also like to underline that the response code from Mozilla autopush is successful for both notifications. Also, the endpoint has not expired or changed (otherwise the query to fetch the notifications from our application server would return only the new notification associated with the new endpoint, and not both notifications).
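
To make the endpoint argument concrete, here is a minimal Python sketch of the behavior described above. It is only a model, not our production code: `PendingStore`, `enqueue`, `fetch_pending` and `simulate` are hypothetical names, and it assumes the application server keeps undelivered notifications per endpoint until the service worker fetches them.

```python
from collections import defaultdict


class PendingStore:
    """Hypothetical model of the application server's per-endpoint queue."""

    def __init__(self):
        # endpoint URL -> notifications not yet fetched by the service worker
        self._pending = defaultdict(list)

    def enqueue(self, endpoint, notification):
        """Store a notification; the payload-less push signal is sent separately."""
        self._pending[endpoint].append(notification)

    def fetch_pending(self, endpoint):
        """Return and clear everything waiting for this endpoint."""
        return self._pending.pop(endpoint, [])


def simulate(signal_delivered):
    """Replay a sequence of pushes; each flag says whether the signal arrived.

    Returns what the service worker would display after each push.
    """
    store = PendingStore()
    endpoint = "https://push.example/endpoint-1"  # placeholder, not a real endpoint
    shown = []
    for number, delivered in enumerate(signal_delivered, start=1):
        store.enqueue(endpoint, f"notification {number}")
        # The service worker only wakes up (and fetches) if the signal arrives.
        shown.append(store.fetch_pending(endpoint) if delivered else [])
    return shown


if __name__ == "__main__":
    # First signal lost, second delivered: both notifications show up together.
    print(simulate([False, True]))
```

In this model, both notifications can only come back from a single fetch if they were queued under the same endpoint, which is why we conclude the endpoint did not change between the two pushes.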


Actual results:

The push signal is lost and doesn't activate the service worker (or the service worker doesn't process the push signal).


Expected results:

The push signal (even without payload) must always be processed.
snorp, do you know who knows about Android push stuff?
Flags: needinfo?(snorp)
Priority: -- → P2
There are more than a few moving parts here that may make this a very difficult bug to chase.

1) For Android, we use GCM as a bridge system. When you send a message through our service, we hand it off to GCM nearly instantly. The (201) reply you get back from our server indicates that the message is now in the hands of GCM, which has accepted the message for delivery without error.

2) We don't track messages through GCM, for a lot of reasons. GCM, like WebPush in general, is "best effort" delivery. This means that some messages can and do get "lost" in transmission. Likewise, it's not a "real time" delivery system. Effort is made to deliver quickly, but there is no guarantee that a message will be received.

3) GCM is slated to go away next year. Effort is underway by the Android team to move to FCM, but there are a lot of issues in doing that.
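
As a rough illustration of point 1, a sender could interpret the response along these lines. This is a Python sketch, not autopush code: `describe_push_response` is a hypothetical helper, and the handling of 404/410 reflects how expired subscriptions are commonly reported by push services, which is an assumption not stated in this bug.

```python
def describe_push_response(status_code):
    """Map a push service HTTP status to what the sender can conclude from it."""
    if status_code == 201:
        # Accepted and (for Android) handed to the bridge; delivery is NOT confirmed.
        return "accepted"
    if status_code in (404, 410):
        # Assumption: commonly treated as "subscription no longer valid".
        return "subscription_gone"
    # Anything else needs case-by-case handling on the sender side.
    return "other"
```

The point is that "accepted" only describes the hand-off; nothing in the response says whether the device ever received the message.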

What I expect may be happening is that GCM or the Android OS may be delaying delivery of messages for "reasons" (improved battery life, server load management, just feel like it, etc.). Those delays are out of our control.

Note that most of the mobile platforms we support use bridge services to carry the notification. Each has its own "quirks" that we are subject to.

Someone on the android team may be able to give more insight as well.
Yeah, as comment #2 illustrates, it's very complicated.

Reporter, are you able to reproduce this consistently? If you are able to capture a logcat from an affected device when you think it should be receiving the first notification, we may be able to tell if the GCM message is received. You should see a line like "Message received.  Processing on background thread." If you see that, we're dropping the message somewhere after receiving via GCM. If you don't, then it's an upstream problem.
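
If it helps, a saved logcat dump can be searched for that line with something like the Python sketch below. The message text is the one quoted above; the whitespace is matched loosely because the exact spacing in real output is an assumption, and `find_gcm_receipts` is a hypothetical helper name.

```python
import re

# Message text quoted in the comment above; spacing is matched loosely
# because the exact logcat formatting is an assumption.
GCM_RECEIVED = re.compile(
    r"Message received\.\s+Processing on background thread\."
)


def find_gcm_receipts(logcat_text):
    """Return (line_number, line) pairs where the GCM receipt message appears."""
    hits = []
    for number, line in enumerate(logcat_text.splitlines(), start=1):
        if GCM_RECEIVED.search(line):
            hits.append((number, line.strip()))
    return hits
```

A dump can be saved with `adb logcat -d > logcat.txt` and its contents passed to `find_gcm_receipts`. An empty result around the time of the first notification would point to an upstream problem; a hit would suggest the message was dropped after GCM delivered it.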
Flags: needinfo?(snorp) → needinfo?(collimarco91)
> If you are able to capture a logcat

How can I see the log?

> are you able to reproduce this consistently?

It happens only rarely so it will be very difficult to reproduce.
(In reply to collimarco91 from comment #4)
> > If you are able to capture a logcat
> 
> How can I see the log?

You can download Android Studio or just the sdktools (at the bottom of the page) here https://developer.android.com/studio/

> 
> > are you able to reproduce this consistently?
> 
> It happens only rarely so it will be very difficult to reproduce.

I see. If you're able to plug your phone in and get a logcat after you believe a message was dropped, we may still be able to get some usable information.
I can reproduce the problem quite often if:

1. I close all applications (from the Android task bar -> click trash)
2. I reboot Android
3. I send a push signal


I have used this test website: https://vapid-example.pushpad.xyz - you can create a domain like that using Pushpad Express https://pushpad.xyz/docs/pushpad_express_getting_started

Tested on FF 61, Android 7.0
Flags: needinfo?(collimarco91)