Closed Bug 1119765 Opened 9 years ago Closed 9 years ago

Joining and leaving a Hello room quickly can leave the room as "full" (desktop-only)

Categories

(Hello (Loop) :: Client, defect, P2)

Points:
3

Tracking

(firefox36 verified, firefox37 verified, firefox38 verified)

VERIFIED FIXED
mozilla38
Iteration:
38.1 - 26 Jan
Tracking Status
firefox36 --- verified
firefox37 --- verified
firefox38 --- verified
backlog Fx36+

People

(Reporter: RT, Assigned: standard8)

Details

Attachments

(1 file)

Environment: Firefox Beta 35 on Windows 8

What happened:
1 Someone joined one of my rooms, but I did not get an audio notification and the Hello icon did not turn blue
2 When I brought down the panel, the indicator showed that someone was in one of my rooms
3 When I joined my room I got a "Something went wrong" error, and the browser logs were:

about:loopconversation#d9fz7pTJTdM : Unable to run script because scripts are blocked internally.
no element found ClientEvent:1
no element found ClientEvent:1
"Loop hawkRequest error:" Object { code: 400, errno: 202, error: "The room is full." } MozLoopService.jsm:552
"Error in state `room-media-wait`:" Object { code: 400, errno: 202, error: "The room is full." } activeRoomStore.js:114

Security Error: Content at about:loopconversation#d9fz7pTJTdM may not load data from blob:null/b6be0c48-e8b6-4b8f-893a-924894b38300.
Security Error: Content at about:loopconversation#d9fz7pTJTdM may not load data from blob:null/d5c07ca3-7461-44ab-91fe-f8573ca6e822.
no element found ClientEvent:1
no element found ClientEvent:1
about:loopconversation#d9fz7pTJTdM : Unable to run script because scripts are blocked internally.

I tried several times and could not join until I closed/reopened Firefox and it then worked.
RT -- Was this (the one that didn't work) the first call/conversation you had since starting the browser, or had you had other calls prior to the one that didn't work?
Flags: needinfo?(rtestard)
No calls had failed before this one.
I think it was the first one since the browser start but I am not 100% sure.
Flags: needinfo?(rtestard)
Ok, I've taken a look at this. In the server logs for the room, I could see an instance where:

- The room was joined by RT
- No leave was posted to the loop-server
- Another member joined the room
- RT tried to rejoin, but got "room full"
- RT then couldn't join until the other member left the room.

It's probable that he would have been able to rejoin once the server had timed out his connection. That timeout is currently one hour, though it should be 5 minutes (bug 1119782).
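
To illustrate the sequence above, here is a minimal sketch of a room with a participant cap and a per-participant expiry. This is not loop-server code: the `Room` class, the participant ids, the cap of 2 and the `{ code: 200 }` success value are all assumptions made for illustration. Only the "room full" error object is copied from the browser log.

```javascript
// Hypothetical model of a capped room with expiring participants.
// Not loop-server code; names and the cap of 2 are assumptions.
class Room {
  constructor(maxSize, ttlMs) {
    this.maxSize = maxSize;
    this.ttlMs = ttlMs;
    this.participants = new Map(); // participant id -> expiry time (ms)
  }

  // Drop participants whose entry has timed out.
  prune(now) {
    for (const [id, expiresAt] of this.participants) {
      if (expiresAt <= now) {
        this.participants.delete(id);
      }
    }
  }

  join(id, now) {
    this.prune(now);
    if (this.participants.size >= this.maxSize) {
      // Same shape as the error seen in the browser log.
      return { code: 400, errno: 202, error: "The room is full." };
    }
    this.participants.set(id, now + this.ttlMs);
    return { code: 200 };
  }

  leave(id) {
    this.participants.delete(id);
  }
}

const HOUR = 60 * 60 * 1000;
const room = new Room(2, HOUR);

room.join("rt-first-window", 0);  // join posted, window closed, no leave posted
room.join("other-member", 1000);  // second participant fills the room

// The stale entry still counts, so the owner is rejected.
const rejoin = room.join("rt-second-window", 2000);

// Once the stale entry has timed out, the same join succeeds.
const afterExpiry = room.join("rt-second-window", HOUR + 1);
```

With a 5 minute timeout instead of an hour, the stale entry would clear much sooner, but the owner would still be locked out until then.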

I've reproduced this locally. You have to open the conversation and then close it at just the right time: somewhere around 1 second after it starts opening, while it is still doing its initial set-up.

The other confusing issue is that the server is clearing the room state when the other member leaves the room. I'll file a separate bug for that when I get a few more details.
Assignee: nobody → standard8
backlog: --- → Fx37?
OS: Windows 8 → All
Priority: -- → P1
Hardware: x86_64 → All
Ok, at first glance at the code, the issue appears to be that there's a small period of time in which we have sent a join request but haven't yet stored the fact that we have. Hence, if the window is closed in that time slot, the leave won't get sent.

On my setup, I'm seeing round trip times of ~200ms for the join to rooms, so that's probably the rough time slot you've got to avoid closing the window in.
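
As a rough sketch of that time slot (not the actual activeRoomStore code), the store below only records the join once the response comes back. `createBuggyStore`, `createFakeServer` and their methods are hypothetical names, and the fake server lets the join response be delivered by hand so the ordering is explicit.

```javascript
// Hypothetical client store: it only remembers the join once the
// response arrives, which is the suspected problem.
function createBuggyStore(server) {
  const store = {
    joined: false,
    join() {
      // The request goes out now; the response arrives later (~200ms).
      server.sendJoin(() => {
        store.joined = true;
      });
    },
    windowClosed() {
      // A leave is only sent if we already know that we joined.
      if (store.joined) {
        server.sendLeave();
      }
    },
  };
  return store;
}

// Fake server that records requests and holds the join response
// until deliverJoinResponse() is called.
function createFakeServer() {
  const server = {
    log: [],
    pending: null,
    sendJoin(onResponse) {
      server.log.push("join");
      server.pending = onResponse;
    },
    sendLeave() {
      server.log.push("leave");
    },
    deliverJoinResponse() {
      if (server.pending) {
        server.pending();
        server.pending = null;
      }
    },
  };
  return server;
}

const server = createFakeServer();
const store = createBuggyStore(server);

store.join();                 // join request sent
store.windowClosed();         // window closed before the response arrives
server.deliverJoinResponse(); // response arrives too late to matter

// server.log now holds only the join; no leave was ever sent.
```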

In theory, this could happen on standalone as well, but I think the process there would be:

- Hit join the conversation
- Accept the media prompt
- Close the tab within a second

I think that's quite unlikely to be hit. I also tried reproducing with the "Leave" button on standalone, but I think that's too late in the process.

So the main issue here is likely on the desktop side.
Iteration: --- → 37.3 - 12 Jan
Points: --- → 3
Mark is taking this as his next bug. We think this may be worth uplifting to Fx36 (hopefully end of next week).

It's a small timing hole (200 ms), and the room recovers after a few minutes (especially once the server refresh is set back to 5 mins), but the result is bad. The server may also be able to mitigate or even eliminate the chances of the user hitting this (at least for the non-FxA use case).
backlog: Fx37? → Fx36?
Not sure if this is a true P1 for Fx36. I think it's a P2. We can debate the priority once we have a fix in hand and know more about whether the server will be able to mitigate the chances of the user hitting this.
Priority: P1 → P2
Summary: Could not join one of my rooms where someone was there already → Joining and leaving a Hello room quickly can leave the room as "full" (desktop-only)
I considered several ways of dealing with this (flags, separate objects etc), and creating a new state seemed the best way. With the new state, we now know we've already sent the join, and so need to send a leave as well.
Attachment #8547539 - Flags: review?(mdeboer)
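
A minimal sketch of the new-state idea, for anyone following along without the patch open. The state names, `createStore` and the fake server here are assumptions for illustration and may not match what the patch actually uses. The point is that the store records the join before the response arrives, so closing the window in that slot still sends a leave.

```javascript
// Hypothetical state names; the actual patch may use different ones.
const STATES = {
  INIT: "init",
  JOINING: "joining", // join sent, response not yet received
  JOINED: "joined",
  CLOSED: "closed",
};

function createStore(server) {
  const store = {
    state: STATES.INIT,
    join() {
      // Record that the join went out before waiting for the response.
      store.state = STATES.JOINING;
      server.sendJoin(() => {
        // Ignore a response that arrives after the window was closed.
        if (store.state === STATES.JOINING) {
          store.state = STATES.JOINED;
        }
      });
    },
    windowClosed() {
      // A leave is needed whether the join is in flight or complete.
      if (store.state === STATES.JOINING || store.state === STATES.JOINED) {
        server.sendLeave();
      }
      store.state = STATES.CLOSED;
    },
  };
  return store;
}

// Fake server that records requests and holds the join response.
const log = [];
let pendingJoin = null;
const fakeServer = {
  sendJoin(onResponse) {
    log.push("join");
    pendingJoin = onResponse;
  },
  sendLeave() {
    log.push("leave");
  },
};

const store = createStore(fakeServer);
store.join();
store.windowClosed(); // closed while the join is still in flight
pendingJoin();        // the late response does not change the closed state
```

A boolean flag would cover the same case, but a distinct state keeps the "sent but not confirmed" period visible to anything else that inspects the store.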
(In reply to Mark Banner (:standard8) from comment #7)
> Created attachment 8547539 [details] [diff] [review]
> Joining and Leaving a Loop room quickly can leave the room as full. Ensure
> we send the leave notification if we've already sent the join.
> 
> I considered several ways of dealing with this (flags, separate objects
> etc), and creating a new state seemed the best way. With the new state, we
> now know we've already sent the join, and so need to send a leave as well.

Would this impact the FxOS client which is currently implementing rooms? Copying Maria..
Comment on attachment 8547539 [details] [diff] [review]
Joining and Leaving a Loop room quickly can leave the room as full. Ensure we send the leave notification if we've already sent the join.

Review of attachment 8547539 [details] [diff] [review]:
-----------------------------------------------------------------

I agree, introducing this additional state is most appropriate. Nice work!
Attachment #8547539 - Flags: review?(mdeboer) → review+
https://hg.mozilla.org/integration/fx-team/rev/2d99ceafd759
Iteration: 37.3 - 12 Jan → 38.1 - 26 Jan
Target Milestone: --- → mozilla38
Comment on attachment 8547539 [details] [diff] [review]
Joining and Leaving a Loop room quickly can leave the room as full. Ensure we send the leave notification if we've already sent the join.

Approval Request Comment
[Feature/regressing bug #]: Loop Rooms
[User impact if declined]: There's a chance that users can get into a state where they cannot enter their own room. We've seen this reported in the wild.
[Describe test coverage new/current, TBPL]: Landed in mozilla-central. Includes unit level tests.
[Risks and why]: Low risk - small patch, low impact, only touches Loop rooms. Is part of the room joining process so is tested with each join.
[String/UUID change made/needed]: None
Attachment #8547539 - Flags: approval-mozilla-beta?
Attachment #8547539 - Flags: approval-mozilla-aurora?
Attachment #8547539 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8547539 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
backlog: Fx36? → Fx36+
Reproduced the problem in Nightly 2015-01-09 Win7.
Verified fixed FF 36b3, 37.0a2 (2015-01-23), 38.0a1 (2015-01-22).