Closed Bug 1029426 Opened 10 years ago Closed 9 years ago

Decide what to do about handling incoming notifications whilst client is disconnected/offline

Categories

(Hello (Loop) :: Client, defect, P4)

defect

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: standard8, Assigned: abr)

References

Details

(Whiteboard: [notification])

Sometimes the web socket for the push notification might drop, e.g. due to a network issue, restarting the browser for an update, or just the user shutting down the browser for a period of time.

We need to decide what to do (if anything) about notifications that come in during the network issue. For example, if we're offline for 1-2 seconds, and we happen to get a push notification in, then we could still pick up the details and handle the call before the caller end times out.

If we're offline for longer, then we could still potentially use the information for tracking missed calls.

There's a couple of possible ways of doing this:

1) Resend the uaid to the push server as part of the "hello" message.

This would get the push server to send us any pending notifications. However, this requires the push server to re-accept the uaid. It would also mean that we need to store more information locally.

2) Query the loop server after each re-connection/re-registration

This requires an additional ping to the loop server, but it means that we're using the server as the single authoritative rather than having state split across two servers.



Assigning to Adam to look at, as he said he needed to think about it on irc - I filed so that it can be tracked.
The SimplePush server is not providing any queuing messages, so if the websocket is down during a push no notification will be send.

I don't think we should queue any notification on the server but I do agree that we should ask the server for pending calls on a reconnection.

Also we will need to make sure to agree about the call information TTL.

The protocol documentation is waiting for 10s before terminating the call if no connection on the websocket.
(In reply to Rémy Hubscher (:natim) from comment #1)
> The SimplePush server is not providing any queuing messages, so if the
> websocket is down during a push no notification will be send.

FYI, I actually tested this with the push.services.mozilla.com service (after a quick hack so that Loop stored the uaid), and it did queue up the messages even when the socket was disconnected, it also sent them on the next hello.

Granted the specification is actually unclear (and still developing), so it may not remain there in future, but there is some sense of a queue in the documentation about notification and I based this discussion on what the server is currently doing.
Oh ok, nice to know.

JR do you know for how long the SimplePush server is queuing notifications?
Flags: needinfo?(jrconlin)
Is that behavior part of the specification? If it's not then we shouldn't rely on it. 

If it is, that would avoid having the client pinging the server without reason to have the list of calls on startup (is it what the client is currently doing, Mark?)

Another easy solution is to ignore these queued calls on startup and always have a look at the list of calls just after registration (since a call is valid for a small amount of time, that would provide the same results).
SimplePush queues a notification event for 72 hours. Of course, it may be replaced by a newer version, but if a client is unable to immediately receive a notification, the server retains it until the device reconnects.
Flags: needinfo?(jrconlin)
Thank you JR.

Since the call information are valid for only 30s I guess this is good enough for us.
If these are stored for 72 hours, then we will get notified of calls we don't care about. I believe we should ignore these queued calls on startup and always have a look at the list of calls just after registration. That would provide the same results.
Target Milestone: mozilla33 → mozilla34
RT: looks like a good scenario to handle, but is it more edge than needed for MVP.  can it wait until Fx35 while we get Hello looking good, good call, and the UX good?

If yes, please move to Fx35
Flags: needinfo?(rtestard)
triaged together.  edge case and risky logic - so want to de-risk.
Flags: needinfo?(rtestard)
Priority: P3 → P2
Target Milestone: mozilla34 → mozilla35
Depends on: 1028871
backlog: --- → Fx38?
Target Milestone: mozilla35 → ---
Hi Adam,

this wasn't part of the spec - can you write a straw man possibly - depending on importance based against your other priorities?  also totally dependent on impact you think this scenario has and risk posed.  not that sure what this bug causes.

backlogging and can pull out based on your input.
backlog: Fx38? → backlog
Flags: needinfo?(adam)
Priority: P2 → P3
I think the only really clean way to handle this is to re-query the Loop server for state whenever the push connection has gone offline for any period of time. Given that this should (presumably) be a relatively infrequent error situation, the increase of load on the Loop server should be imperceptible. This should also be pretty easy to implement.

*However*, we should test the premise that this is a rare situation: the "re-poll for state" functionality should land in tandem with a telemetry patch that counts the number of times this happens. In order to allow us to reasonably compare numbers, we should also peg the number of times an initial connection occurs. If the telemetry bit is unclear, ping me on IRC and I can explain.

____
As a sidenote, I know that the Simple Push team has made some optimizations to the Loop Push servers that make several simplifying assumptions based on the behavior that we need. If my memory serves, one of those optimizations was elimination of message queuing.
Flags: needinfo?(adam)
backlog: backlog+ → backlog-
we are merging some of rooms and calling - so this is more determining how to handle for both.

mark is thinking of a few things and will look at this.

possibly change title to reflect both.
backlog: backlog- → ---
Rank: 45
Flags: needinfo?(standard8)
Flags: firefox-backlog+
Priority: P3 → P4
Whiteboard: [notification]
Summary: Decide what to do about handling incoming call notifications whilst client is disconnected/offline → Decide what to do about handling incoming notifications whilst client is disconnected/offline
Currently the Loop push servers don't do queuing of events.

When we switch to the new push backend, (bug 1131813), then we'll start getting notifications. I did some testing and analysis ahead of this:

- Rooms: On startup, we should be getting the list of rooms, we don't currently. If there's been room changes, we'll presumably get poked by the push server, at which point, because we've not retrieved any rooms yet we'll get a full list which should be the up to date one.

(Note: there's probably a bug here that we should be getting the list of rooms at startup, in case someone is in one, although I think having push notifications would solve most of the cases).

- Direct calls: I tested altering a client with a delayed handling of the push notification. I delayed it for ~10 seconds, and the call was gone from the list when queried on the server. Delaying the push notification handling for ~8 seconds meant the call still got handled.

At the moment, I think we could close this as invalid, but when we swap to the new service, we should do some testing around network connectivity to ensure we get all the right information.
Flags: needinfo?(standard8)
I did the testing in comment 13 for the new push server switch that happened recently - even without the new backend, our existing backend resends the same uaid, and the servers now deliver offline notifications.

Hence, we're now handling the calls and room notifications as per comment 13. So we can close this as WFM.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.