Closed Bug 1002414 Opened 6 years ago Closed 5 years ago

Loop Service needs to handle push server disconnects / expiry (Push connection 'silently' drops, unable to receive incoming call)

Categories

(Hello (Loop) :: Client, defect, P1)

defect

Tracking

(Not tracked)

VERIFIED FIXED
33 Sprint 3- 7/21
Blocking Flags:
backlog mlp?

People

(Reporter: standard8, Assigned: pkerr)

References

Details

(Whiteboard: [p=4][gecko][qa-])

User Story

MozLoopPushHandler:onStop/onServerClose:

- If either of these is triggered, re-initiate the websocket and try reconnecting.
-- If reconnecting fails, implement a backoff strategy to re-try occasionally.

MozLoopService/MozLoopPushHandler interactions:

- When a reconnect succeeds, we need to ensure that the channel is still registered.
- When a reconnect succeeds, if we're given a different push url in the registration step, we need to update the MozLoopService to have that push url.

Tests:

The existing MozLoopPushHandler/MozLoopService xpcshell tests should be extended to cover all new existing code.

Background info for protocol: https://wiki.mozilla.org/WebAPI/SimplePush/Protocol

Attachments

(2 files, 6 obsolete files)

The workarounds in MozLoopService don't handle the websocket being disconnected, nor do they handle a push url expiry (if there is such a thing, which happens without disconnection).

This means that a user could loose connection at some stage.
Blocks: 1013251
No longer blocks: loop_mvp
Depends on: 976789
We need to handle this for mvp, so hitting in FF 33 timescale.
Priority: -- → P2
Target Milestone: --- → mozilla33
need to make sure we have stability

we do not want to rely on the existing Push implementation in desktop - so need to continue to use our custom implementation.  network API we are using will be available, but desktop PUSH is changing to evolving w3c spec.
Duplicate of this bug: 1013251
Depends on: 1022689
Blocks: 1002416
Assignee: nobody → standard8
Whiteboard: [p=?][investigating]
Priority: P2 → P1
Summary: Loop Service needs to handle push server disconnects / expiry → Loop Service needs to handle push server disconnects / expiry (Push connection 'silently' drops, unable to receive incoming call)
Blocks: 1028865
Blocks: 1028868
Blocks: 1028869
User Story: (updated)
Whiteboard: [p=?][investigating] → [p=4]
Assignee: standard8 → nobody
Blocks: 1029426
Whiteboard: [p=4] → [p=4][gecko]
Assignee: nobody → pkerr
Code changes to add retry functionality when connection to PushServer is lost. Patch to follow with updated xpcshell tests.
Attachment #8454766 - Flags: review?(standard8)
Comment on attachment 8454766 [details] [diff] [review]
Part 1: Add retry logic to PushServer user agent

Review of attachment 8454766 [details] [diff] [review]:
-----------------------------------------------------------------

Generally looks good to me, most of the comments are trivial.

I haven't got perms to review this, Justin Dolske is the person to ask this week.

::: browser/components/loop/MozLoopPushHandler.jsm
@@ +32,5 @@
>    pushUrl: undefined,
> +  // Set to true once the channelID has been registered with the PushServer.
> +  registered: false,
> +
> +  _minRetryDelay_ms: (() => { 

nit: trailing whitespace

@@ +41,5 @@
> +      return 60000 // 1 minute
> +    }
> +  })(),
> +
> +  _maxRetryDelay_ms: (() => { 

nit: trailing whitespace

@@ +97,5 @@
> +    // If a uaID has already been assigned, assume this is a re-connect
> +    // and send the uaID in order to re-synch with the
> +    // PushServer. If a registration has been completed, send the channelID.
> +    this._retryEnd();
> +    let helloMsg = { messageType: "hello",

I think it makes clearer reading to put the comment after the retryEnd before the let.

@@ +159,5 @@
>          break;
>      }
>    },
>  
> +  _onRegister: function(msg) {

nit: please document this function and arguments

@@ +192,5 @@
> +   * was received, calling asyncOpen() on the same interface will
> +   * trigger a "alreay open socket" exception even though the channel
> +   * is logically closed.
> +   */
> +  _openSocket: function() {

Due to bug 1038304, I think it would be useful to incorporate the Services.io.offline check (similar to that in initialize) in here. This is what was throwing me when I was testing it earlier.

iirc I had put the check in initialize due to the bug, but then failed to file the core bug.

Supporting offline mode also makes it easier to test, even if it isn't used that often.

@@ +216,5 @@
>        channelID: this.channelID
>      }));
> +  },
> +
> +  _retryOperation: function(delayedOp) {

nit: please document this function and arguments

@@ +228,5 @@
> +    }
> +    this._timeoutID = setTimeout(delayedOp, this._retryDelay);
> +  },
> +
> +  _retryEnd: function() {

nit: please document this function and arguments
Attachment #8454766 - Flags: review?(standard8) → feedback+
Update based on feedback
Attachment #8454766 - Attachment is obsolete: true
Attachment #8456288 - Attachment is obsolete: true
Updated xpcshell unit test.
Attachment #8456483 - Flags: review?(dolske)
Attachment #8456484 - Flags: review?(dolske)
Comment on attachment 8456483 [details] [diff] [review]
Part 1: Add retry logic to PushServer user agent

Review of attachment 8456483 [details] [diff] [review]:
-----------------------------------------------------------------

I don't really know Loop code, so I think you'd be better off with a review from another Loop developer. But feedback+ with a couple nits, and if your other reviewer is OK with the changes I'm ok with that review being sufficient for landing, as a module peer.

::: browser/components/loop/MozLoopPushHandler.jsm
@@ +21,2 @@
>   */
> +//XXX - do we need to handle non-response from the PushServer during the hello handshake?

Nit -- if it's a question that needs answering, general preferred style it to either do it in this bug (and remove the XXX) or file a new bug on the issue and note the bug number here. (As had been done with the comment being removed.)

@@ +43,5 @@
> +  })(),
> +
> +  _maxRetryDelay_ms: (() => {
> +    try {
> +      return Services.prefs.getIntPref("loop.retry_delay.limit")

Unless these are extremely obscure, I'd suggest adding them to all.js/firefox.js so they're visible in about:prefs by default.
Attachment #8456483 - Flags: review?(dolske) → feedback+
Comment on attachment 8456484 [details] [diff] [review]
Part 2: Add additional tests for new functional paths

Review of attachment 8456484 [details] [diff] [review]:
-----------------------------------------------------------------

(ditto)
Attachment #8456484 - Flags: review?(dolske) → feedback+
Comment on attachment 8456483 [details] [diff] [review]
Part 1: Add retry logic to PushServer user agent

Given Justin's comment, I can take on the review.
Attachment #8456483 - Flags: review?(standard8)
1) Added retry operation to the PushServer initial handshake. 2) Put retry timeouts into prefs
Attachment #8456483 - Attachment is obsolete: true
Attachment #8456483 - Flags: review?(standard8)
Attachment #8458012 - Flags: review?(standard8)
Attachment #8456484 - Flags: review?(standard8)
Comment on attachment 8458012 [details] [diff] [review]
Part 1: Add retry logic to PushServer user agent

Review of attachment 8458012 [details] [diff] [review]:
-----------------------------------------------------------------

The additional changes here look good. r=Standard8

::: browser/app/profile/firefox.js
@@ +1520,1 @@
>  #endif

Just to note there's a little bit of bitrot here as the ifdefs have been removed.
Attachment #8458012 - Flags: review?(standard8) → review+
Comment on attachment 8456484 [details] [diff] [review]
Part 2: Add additional tests for new functional paths

Review of attachment 8456484 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. r=Standard8

::: browser/components/loop/test/xpcshell/test_looppush_initialize.js
@@ +29,5 @@
> +      },
> +      function(version) {
> +        Assert.equal(version, 15, "Should have version number 15");
> +        run_next_test();
> +      }, 

nit: trailing whitespace.

@@ +36,5 @@
>  
> +  add_test(function test_reconnect_websocket() {
> +    MozLoopPushHandler.uaID = undefined;
> +    MozLoopPushHandler.pushUrl = undefined; //Do this to force a new registration callback.
> +    mockWebSocket.stop();

I think it would be either worth documenting that these tests rely on the callbacks set up in test_initalize_websocket, or redefine those in these tests. It took me a couple of minutes to work out how this was succeeding.
Attachment #8456484 - Flags: review?(standard8) → review+
fix of bit-rotted patch
Attachment #8458012 - Attachment is obsolete: true
Attachment #8458310 - Attachment is obsolete: true
Attachment #8456484 - Attachment is obsolete: true
tagged with r=standard8
Keywords: checkin-needed
Target Milestone: mozilla33 → 33 Sprint 2- 7/7
Comment on attachment 8458349 [details] [diff] [review]
Part 1: Add retry logic to PushServer user agent

marking r=standard8
Attachment #8458349 - Flags: review+
Comment on attachment 8458350 [details] [diff] [review]
Part 2: Add additional tests for new functional paths

marking r=standard8
Attachment #8458350 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/a561499de7a4
https://hg.mozilla.org/mozilla-central/rev/89b195298ca3
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: 33 Sprint 2- 7/7 → 33 Sprint 3- 7/21
Does this need QA testing or is it sufficiently covered by automation?
Flags: needinfo?(pkerr)
QA Contact: anthony.s.hughes
I believe that automation should cover the failure modes. It is sufficiently covered.
Flags: needinfo?(pkerr)
(In reply to Paul Kerr [:pkerr] from comment #24)
> I believe that automation should cover the failure modes. It is sufficiently
> covered.

Thanks Paul, please needinfo? me if you change your mind.
Status: RESOLVED → VERIFIED
Whiteboard: [p=4][gecko] → [p=4][gecko][qa-]
You need to log in before you can comment on or make changes to this bug.