Closed Bug 1109102 Opened 10 years ago Closed 9 years ago

Inconsistencies in the rooms list in the panel ui - not reflecting reality/expected state

Categories

(Hello (Loop) :: Client, defect, P2)

Points: 2

Tracking

(firefox35 affected, firefox36 affected, firefox37 affected)

RESOLVED FIXED
mozilla35
backlog Fx35+

People

(Reporter: cos_flaviu, Assigned: mikedeboer)

Details

(Whiteboard: [fixed by bug 1102432/bug 1111579])

Reproducible on the latest Beta (BuildID: 20141208150535):
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:35.0) Gecko/20100101 Firefox/35.0
Reproducible on the latest Aurora (BuildID: 20141209004001):
Mozilla/5.0 (Windows NT 6.1; WOW64; rv:36.0) Gecko/20100101 Firefox/36.0
Reproducible on the latest Nightly (BuildID: 20141208030202): 
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:37.0) Gecko/20100101 Firefox/37.0
Mozilla/5.0 (X11; Linux i686; rv:37.0) Gecko/20100101 Firefox/37.0
Mozilla/5.0 (Macintosh; Intel Mac OS X 10.8; rv:37.0) Gecko/20100101 Firefox/37.0


Steps to reproduce:
1. Launch Firefox;
2. Start a conversation using Loop;
3. Copy the conversation URL to clipboard;
4. Paste the conversation URL into Chrome and click the 'Join the conversation' button;
5. Close the conversation tab from Chrome.

Expected results:
Once the conversation tab is closed, the Loop icon turns from blue to grey and the conversation name is no longer bold.

Actual results:
The Loop icon remains blue and the conversation name remains bold forever.

Notes:
1. This issue first appeared with the introduction of the rooms UI.
backlog: --- → Fx35+
Priority: -- → P1
Assignee: nobody → mdeboer
Status: NEW → ASSIGNED
Iteration: --- → 37.1
Points: --- → 2
Flags: qe-verify+
Flags: needinfo?(mmucci)
Flags: firefox-backlog+
Added to IT 37.2
Iteration: 37.1 → 37.2
Flags: needinfo?(mmucci)
AFAICT, this is caused by one of these problems (might be both):
 * The standalone client doesn't send an `action=leave` POST request to the Loop server, or doesn't get time to do so because the window closes too quickly, cancelling any XHRs that are still in flight. In that case we need to do a synchronous XHR.
 * The video stream does get disconnected immediately and the TokBox SDK does the right thing, so my guess is that the Loop Server should also get an event when this happens. Is it possible for the server to just rely on this event or are we missing metadata at that point?
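
For illustration, a minimal sketch of the first option: build the `action=leave` request as a synchronous one and send it at most once from an unload handler. The URL layout, the `sessionToken` field, and all function names here are assumptions made for the example, not the actual Loop client code. The transport is injected, so in a browser it could wrap `XMLHttpRequest` with the third argument of `open()` set to `false`.

```typescript
// Hypothetical request shape; the real Loop client/server API may differ.
interface LeaveRequest {
  method: "POST";
  url: string;
  body: string;
  async: boolean;
}

// Build the leave request. The "/rooms/<token>" layout is an assumption.
function buildLeaveRequest(
  serverUrl: string,
  roomToken: string,
  sessionToken: string
): LeaveRequest {
  return {
    method: "POST",
    url: `${serverUrl}/rooms/${encodeURIComponent(roomToken)}`,
    body: JSON.stringify({ action: "leave", sessionToken }),
    // Synchronous, so the request is sent before the window finishes closing.
    async: false,
  };
}

type Transport = (req: LeaveRequest) => void;

// Returns a handler suitable for an unload-style event.
function makeUnloadHandler(
  transport: Transport,
  serverUrl: string,
  roomToken: string,
  sessionToken: string
): () => void {
  let sent = false;
  return () => {
    // Guard against more than one unload-style event firing.
    if (sent) {
      return;
    }
    sent = true;
    transport(buildLeaveRequest(serverUrl, roomToken, sessionToken));
  };
}
```

Blocking on unload slows down closing the tab, so this only makes sense if the asynchronous request really is being cancelled.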

I don't have the definitive answer here, so n-i'ing the best we have...
Flags: needinfo?(standard8)
Flags: needinfo?(rhubscher)
Flags: needinfo?(adam)
I've run through the STR, and it appears that sometimes this works, and sometimes it does not.

Far more importantly, the "stuck room" phenomenon does not appear to be isolated to the "Chrome user closes a tab" issue, either. I've been able to make it happen with "Chrome user clicks on 'Leave' button" (as well as other variations on users leaving rooms).

I've also seen the "someone is in your room" indicators stay active after a room owner joins an otherwise empty room and then departs.

So it appears that the issue isn't necessarily triggered by closing a Chrome tab. It appears that there's some issue that prevents reliable notification of a room state change. I note that it doesn't seem to arise when users *join* a room -- at least, I haven't been able to make that happen -- so I suspect it's *not* related to the loop push service. I would start by looking at the loop server logs to see what *it* thinks is happening when someone leaves a room but the room owner doesn't have that reflected in their UI.

Over to Remy and Alexis...
Flags: needinfo?(adam) → needinfo?(alexis+bugs)
Earlier today I was working at a coffee shop and looking at a bug about being notified when others join your room. For the life of me I was unable to trigger any notifications of a user joining my room.

Now that I'm connected from home, everything is working again. I wonder if something is being blocked. Will return to the coffee shop later today to test it again.

Either way, it's not reliable, as abr mentioned.
My experience matches abr's - especially with larger numbers of rooms open (that may open the timing window).  And it only seems to be 'leaves' that fail to work, not arrivals.
With a stuck blue dot/etc, quit and restart - if it's not stuck anymore, that implies the server knows the state and we just didn't get it somehow.
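
The restart check above amounts to comparing what the panel has cached with a fresh room list from the server. A rough sketch of that comparison follows; `RoomSummary`, `participantCount`, and `findStaleRooms` are hypothetical names for the example, not the real client data model.

```typescript
// Hypothetical summary of a room as shown in the panel.
interface RoomSummary {
  roomToken: string;
  participantCount: number;
}

// Return the tokens of rooms whose cached participant count disagrees with
// a freshly fetched list. A non-empty result suggests the server knew the
// correct state and the client missed (or never received) the update.
function findStaleRooms(cached: RoomSummary[], fresh: RoomSummary[]): string[] {
  const freshCounts = new Map<string, number>();
  fresh.forEach((room) => {
    freshCounts.set(room.roomToken, room.participantCount);
  });

  const stale: string[] = [];
  cached.forEach((room) => {
    const serverCount = freshCounts.get(room.roomToken);
    // Rooms missing from the fresh list are ignored here.
    if (serverCount !== undefined && serverCount !== room.participantCount) {
      stale.push(room.roomToken);
    }
  });
  return stale;
}
```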
Leave when closing a tab has definitely been unreliable - bug 1097862 now has a patch for this.

I don't see how Leave when pressing the button could be unreliable (from a client perspective) - we don't destroy any of the relevant objects that control that, so there's no obvious cause there.

(In reply to Mike de Boer [:mikedeboer] from comment #2)
>  * The video stream does get disconnected immediately and the TokBox SDK
> does the right thing, so my guess is that the Loop Server should also get an
> event when this happens. Is it possible for the server to just rely on this
> event or are we missing metadata at that point?

AFAIK, there's no functionality available for the TokBox servers to tell the loop-server that a client has left (or for the loop-server to poll this info).

Currently, we rely on the 5-min timeout to reset the state, although the server doesn't actually notify the client when that happens. There are two things here:

- The server *never* tells the client for that room unless a full re-request is made - it should at least tell it in the next notification (bug 1095010)
- If a client times out then the server doesn't tell the client straight away. This seems to be a design requirement, although I don't know where it's come from

Adam, can you tell us if it's intentional that the server doesn't notify the client straight away on a timeout? I think I read something about it somewhere, but can't find it.
Depends on: 1097862, 1095010
Flags: needinfo?(standard8) → needinfo?(adam)
Depends on: 1111579
Ok, trying to update the title to reflect what's being discussed here.
Summary: The Loop icon remains blue and the conversation name remains bold forever → Inconsistencies in the rooms list in the panel ui - not reflecting reality/expected state
Blocks: 1110309
(In reply to Mark Banner (:standard8) from comment #7)
> Adam, can you tell us if its intentional that the server doesn't notify the
> client straight away on a timeout? I think I read something about it
> somewhere, but can't find it.

Yes, it's intentional, because the server guys were worried that having to run proactive timers would be difficult to scale. Since this should be a relatively rare corner case (pretty much limited to crashes, loss of network connectivity, and battery failures), the cost for being responsive didn't seem to be warranted.

If we're seeing that this situation arises more frequently than we originally anticipated, we probably want to revisit whether the server needs to run a timer to let the clients know of room departure in a more timely fashion.
Flags: needinfo?(adam)
Blocks: 1074679
Clearing the NI for me since Adam answered the question already.
Flags: needinfo?(alexis+bugs)
Clearing the NI for me since Adam answered the question already.
Flags: needinfo?(rhubscher)
Iteration: 37.2 → 37.3
Just FYI: we believe this is solely caused by server bug 1111579 and bug 1095010. We are keeping this open until those land (and the server is updated) so we can verify that there isn't another bug here.
Lowering this to a P2 (based on our new priority definitions) because I strongly believe the bug here is solely on the server-side and if there were a client-side bug here, I don't think we'd block the release for it.  We would however work to fix it ASAP and see how far we could/should uplift it.
Severity: normal → major
Priority: P1 → P2
Even if there is no notification when the participant expires from the room, after closing the browser and restarting it (after 5 minutes) the participant count returned by GET /rooms should be accurate.

We will be working on Bug 1095010 to notify the room owner on participant expiration.
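
To make the behaviour described in this comment concrete, here is a small model of expiry without server-side timers: a participant who never sends a leave is dropped only when the room is next read, once the 5-minute timeout has passed. This is a sketch under assumptions, not the loop-server implementation; the `Room` class and its method names are made up for the example.

```typescript
// The 5-minute timeout mentioned in this bug.
const PARTICIPANT_TTL_MS = 5 * 60 * 1000;

// Hypothetical in-memory room model with lazy participant expiry.
class Room {
  // Participant id -> time (ms) at which that participant expires.
  private expiresAt = new Map<string, number>();

  // Joining and refreshing both push the expiry time forward.
  joinOrRefresh(participantId: string, now: number): void {
    this.expiresAt.set(participantId, now + PARTICIPANT_TTL_MS);
  }

  // An explicit leave removes the participant immediately.
  leave(participantId: string): void {
    this.expiresAt.delete(participantId);
  }

  // No timers run: expired entries are purged only when the room is read,
  // so nobody is notified at the moment a participant times out.
  participantCount(now: number): number {
    this.expiresAt.forEach((expiry, participantId) => {
      if (expiry <= now) {
        this.expiresAt.delete(participantId);
      }
    });
    return this.expiresAt.size;
  }
}
```

In this model the count is correct whenever it is requested, which matches the restart case, but the room owner's panel stays stale until it asks again or a separate notification is sent.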
Iteration: 37.3 - 12 Jan → ---
We believe this was fixed by bug 1102432/bug 1111579.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Whiteboard: [fixed by bug 1102432/bug 1111579]
Target Milestone: --- → mozilla35