Closed Bug 1095379 Opened 6 years ago Closed 6 years ago

Opening the Loop panel before automatic registration causes rooms not to be displayed

Categories

(Hello (Loop) :: Client, defect, P1)

defect
Points:
2

Tracking

(firefox35 verified, firefox36 verified)

VERIFIED FIXED
mozilla36
Iteration:
36.3
Tracking Status
firefox35 --- verified
firefox36 --- verified
Blocking Flags:
backlog Fx35+

People

(Reporter: standard8, Assigned: MattN)

References

Details

(Whiteboard: [rooms])

Attachments

(1 file, 4 obsolete files)

STR:

1) In a profile with rooms enabled, create some rooms.
2) Restart Firefox, and open the panel as soon as it comes up.
3) Switch to the Rooms display

Expected Results

After registration completes, the rooms are displayed

Actual Results

No rooms are displayed. Lots of errors on the console along the lines of:

Failed to register with Loop server with sessionType 1" Error: error: channel already registered:
onRegistered@resource:///modules/loop/MozLoopService.jsm:330:20
MozLoopPushHandler.register@resource:///modules/loop/MozLoopPushHandler.jsm:116:7
MozLoopServiceInternal.promiseRegisteredWithPushServer/registerForNotification/<@resource:///modules/loop/MozLoopService.jsm:344:9
Iteration: --- → 36.2
Priority: -- → P1
Assignee: nobody → mdeboer
Status: NEW → ASSIGNED
Points: --- → 2
Whiteboard: [rooms]
Flags: qe-verify+
Flags: needinfo?(mmucci)
Flags: firefox-backlog+
Comment on attachment 8518868 [details] [diff] [review]
Patch v2: deal with race condition when the push server returns an error when attempting to register the same channel twice

Review of attachment 8518868 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/loop/LoopRooms.jsm
@@ +130,5 @@
>        version = null;
>      }
>  
>      Task.spawn(function* () {
> +      yield MozLoopService.promiseRegisteredWithServers(LOOP_SESSION_TYPE.FXA);

The additional parameter here isn't used, and I can't understand why it would be needed.

Was this change a mistake?

::: browser/components/loop/MozLoopService.jsm
@@ +318,5 @@
>        log.debug("registerForNotification", channelID);
>        return new Promise((resolve, reject) => {
>          function onRegistered(error, pushUrl) {
>            log.debug("registerForNotification onRegistered:", error, pushUrl);
> +          let raceURL = MozLoopServiceInternal.pushHandler.registeredChannels[channelID];

This needs a comment to explain what's happening - the fact that we may be handling urls that have already been registered due to timing issues.
As discussed over Vidyo, things are a bit hairier than I expected:

 - `promiseRegisteredWithServers(1)` and `promiseRegisteredWithServers(2)` are racing.
 - Both invoke `promiseRegisteredWithPushServer()`. The first wins the race but takes longer to return to the client with the pushURL, so the second one - loser - gets back to the client earlier with an error message, saying the channel is already registered.
 - OUCH!
Attachment #8518868 - Attachment is obsolete: true
Attachment #8518868 - Flags: review?(standard8)
Attachment #8518919 - Flags: review?(standard8)
Added to IT 36.2
Flags: needinfo?(mmucci)
This seems to fail xpcshell-tests:

 0:04.36 TEST_STATUS: Thread-37 test_register_websocket_success_loop_server_fail FAIL [test_register_websocket_success_loop_server_fail : 25] Expected no errors in websocket registration - "undefined" == "404"
/Users/mark/loop/gecko-dev/objdir-ff/_tests/xpcshell/browser/components/loop/test/xpcshell/test_loopservice_registration.js:test_register_websocket_success_loop_server_fail/<:25
resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:Handler.prototype.process:871
resource://gre/modules/Promise.jsm -> resource://gre/modules/Promise-backend.js:this.PromiseWalker.walkerLoop:747
/Users/mark/loop/gecko-dev/testing/xpcshell/head.js:_do_main:184
/Users/mark/loop/gecko-dev/testing/xpcshell/head.js:_execute_test:372
Marco, please re-assign this bug to me.
Assignee: mdeboer → jaws
Flags: needinfo?(mmucci)
Added to IT 36.2
Flags: needinfo?(mmucci)
Comment on attachment 8518919 [details] [diff] [review]
Patch v3: deal with race condition when the push server returns an error when attempting to register the same channel twice

Review of attachment 8518919 [details] [diff] [review]:
-----------------------------------------------------------------

I think another solution to consider is changing promiseRegisteredWithPushServer to do the FxA and guest push registrations separately (with a sessionType arg). Then we would use deferredRegistrations instead of adding a new deferred variable that we have to keep in sync.

::: browser/components/loop/MozLoopService.jsm
@@ +106,5 @@
>  let gFxAOAuthClient = null;
>  let gErrors = new Map();
>  let gLastWindowId = 0;
>  let gConversationWindowData = new Map();
> +let gRegisterWithPushServerPromise = null;

I think we should stop adding globals and add this as a private member as discussed in bug 1079198 and as I already started. It's much easier to test a member over a non-exported global.
Attached file MozReview Request: bz://1095379/MattN (obsolete) —
/r/383 - Bug 1095379 - WIP to separate push registrations by sessionType.

Pull down this commit:

hg pull review -r 228c6e602fc199e1ca99db7af85ee4711159ea52
Comment on attachment 8520253 [details]
MozReview Request: bz://1095379/MattN

I think this approach is easier to maintain and it's something I wanted to do anyways but left it as-is before to minimize changes.
Attachment #8520253 - Attachment description: MozReview Request: bz://1095379/MattN → MozReview Request: bz://1095379/MattN (Option 4)
Attachment #8520253 - Flags: feedback?(standard8)
Iteration: 36.2 → 36.3
We are already planning to fix this for Rooms Fx35, but the blocking flag wasn't set.

Currently, the decision is whether to go with Mike's approach or MattN's approach to fix this bug.
backlog: --- → Fx35+
(In reply to Maire Reavy [:mreavy] (Plz needinfo me) from comment #13)
> We are already planning to fix this for Rooms Fx35, but the blocking flag
> wasn't set.
> 
> Currently, the decision is whether to go with Mike's approach or MattN's
> approach to fix this bug.

Mark -- as reviewer, which approach do you prefer?
Flags: needinfo?(standard8)
I'll look at this in the morning.
Flags: needinfo?(standard8)
Comment on attachment 8520253 [details]
MozReview Request: bz://1095379/MattN

Ok, I like this version, as it's reducing the registrations when we don't need them.

However, I'm still slightly concerned that we're leaving ourselves open to a bug which Mike's second patch was trying to fix - namely that of stopping reentrancy failing.

Although I don't think that can happen with the current flow, I'd prefer to remove the possibility now, rather than finding another issue with it later.

Do you think it'd be possible to come up with a patch that combines the two approaches?
Attachment #8520253 - Flags: feedback?(standard8)
Attachment #8518919 - Flags: review?(standard8)
Attachment #8520253 - Attachment description: MozReview Request: bz://1095379/MattN (Option 4) → MozReview Request: bz://1095379/MattN
Attachment #8520253 - Flags: review?(standard8)
/r/383 - Bug 1095379 - Separate push registrations by sessionType and prevent calling promiseRegisteredWithPushServer from outside of promiseRegisteredWithServers. r=Standard8

Pull down this commit:

hg pull review -r df971242191b196191d97e91523860fa2d98870d
Attachment #8518919 - Attachment is obsolete: true
Assignee: jaws → MattN+bmo
I confirmed that the patch fixes comment 0
Attachment #8520253 - Flags: review?(standard8) → review+
https://reviewboard.mozilla.org/r/381/#review241

Looks good, seems to work in the various scenarios I tested.
https://hg.mozilla.org/integration/fx-team/rev/4a505fb25edb
Whiteboard: [rooms] → [rooms][fixed-in-fx-team]
https://hg.mozilla.org/mozilla-central/rev/4a505fb25edb
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Whiteboard: [rooms][fixed-in-fx-team] → [rooms]
Target Milestone: --- → mozilla36
qe-verify- because this already has test coverage. Please need-info me if this needs testing beyond that coverage.
Flags: qe-verify+ → qe-verify-
Anthony, it doesn't have test coverage. There were fixes to a test but no regression test for the fix.
Flags: qe-verify?
Flags: qe-verify-
Flags: needinfo?(anthony.s.hughes)
Flags: in-testsuite+
(In reply to Matthew N. [:MattN] (UTC+1 until Nov. 22; away Nov. 14-19) from comment #24)
> Anthony, it doesn't have test coverage. There were fixes to a test but no
> regression test for the fix.

Certainly something we can verify then, but can this have additional test coverage added? It's not really something we should be spending QE time on repeatedly checking indefinitely.
Flags: qe-verify?
Flags: qe-verify+
Flags: needinfo?(anthony.s.hughes)
Verified fixed 36.0a2 (2014-12-17), 35b4 Win 7
Status: RESOLVED → VERIFIED
Attachment #8520253 - Attachment is obsolete: true
Attachment #8618569 - Flags: review+
You need to log in before you can comment on or make changes to this bug.