Closed Bug 1145819 Opened 5 years ago Closed 5 years ago

2-way direct call telemetry counts incorrectly

Categories

(Hello (Loop) :: Client, defect, P2)

x86
macOS
defect

Tracking

(firefox39 verified)

VERIFIED FIXED
mozilla39
Tracking Status
firefox39 --- verified
Blocking Flags:
backlog backlog+

People

(Reporter: dmose, Assigned: dmose)

References

Details

(Whiteboard: [metrics])

Attachments

(1 file, 1 obsolete file)

In doing verification on the direct calls stuff by hand with the logs, I found two problems:

* Both incoming and outgoing calls are being counted, unlike in the rooms case, so the numbers aren't comparable.  OtSdkDriver._isDesktop is by itself an insufficient guard for guaranteeing that we only count each session once.  I think we're going to want to have something passed in via the setupStreamElements actionData from OngoingCallView about whether or not to send telemetry for a given session. 

* OtSdkDriver._maybePublishLocalStream is passing performance.now as an argument rather than the result of performance.now().  This bugs means that some of the above calls won't get double-counted.  Fix is trivial, happily.
[Tracking Requested - why for this release]:

Otherwise the metrics for call session length will be significantly wrong, and which will make the overall rooms/calls session length significantly wrong.
backlog: --- → backlog+
I was thinking about this a bit, and I think this is a matter of perspective:

For the amount of time an individual uses Hello, this would be for time spent in rooms, time making incoming calls and time making outgoing calls.

What you're saying is that it should only be two of those. Going back to the original requirement it is:

"As a product manager I want to measure the impact that the Hello service has on the time users spend on Firefox in general."

That feels to me, like it wants all three measured as that's how much a user will be spending in Hello conversations.
Because the user story wasn't super explicitly, we asked RT what we wanted measured when this was in progress, and we implemented that ("only one side gets counted") for rooms.  Unfortunately, as per the above, both sides are getting counted for direct calls.  Since these all show up in the same histogram ("session length"), having them be different makes the data not very useful/coherent.

Excitingly, I implemented a fix for the basic double-counting using setupStreamElements, and even with that fix, clicking hangup and then close causes a call to still be counted twice (it was presumably counted three times before).

setupStreamElements as an approach has some problems, and I think it may just be too late to capture the ideal amount of logging; I'm not sure.  I'm gonna have to rethink this a bit and take another run at it on Monday.
Assignee: nobody → dmose
Attachment #8582086 - Flags: review?(standard8)
(In reply to Dan Mosedale (:dmose) - needinfo? me for response from comment #3)
>
> Excitingly, I implemented a fix for the basic double-counting using
> setupStreamElements, and even with that fix, clicking hangup and then close
> causes a call to still be counted twice (it was presumably counted three
> times before).

This turns out to have just been a bug in the previous patch iteration.  It's gone now.
Comment on attachment 8582086 [details] [diff] [review]
Fix loop Telemetry counting of direct calls

Review of attachment 8582086 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/loop/test/shared/activeRoomStore_test.js
@@ +499,5 @@
> +       "store._isDesktop is true", function() {
> +      store._isDesktop = true;
> +      var actionData = new sharedActions.JoinedRoom(fakeJoinedData);
> +
> +      store.joinedRoom(actionData);

nit: no need for the intermediate here - we don't normally do this elsewhere.

@@ +511,5 @@
> +       "store._isDesktop is false", function() {
> +      store._isDesktop = false;
> +      var actionData = new sharedActions.JoinedRoom(fakeJoinedData);
> +
> +      store.joinedRoom(actionData);

nit: ditto

::: browser/components/loop/test/shared/otSdkDriver_test.js
@@ +280,5 @@
>        sinon.assert.calledWith(session.connect, "1234567890", "1357924680");
>      });
>  
> +    it("should set the two-way media start time to 'uninitialized' " +
> +       "when sessionData.sendTwoWayMediaTelemetry is true'", function() {

That's not quite true - it sets it to uninitialized regardless (which I think is fine, just the test title needs updating)
Attachment #8582086 - Flags: review?(standard8) → review+
Rank: 20
Flags: firefox-backlog+
Comment on attachment 8582086 [details] [diff] [review]
Fix loop Telemetry counting of direct calls

Review of attachment 8582086 [details] [diff] [review]:
-----------------------------------------------------------------

::: browser/components/loop/test/shared/activeRoomStore_test.js
@@ +499,5 @@
> +       "store._isDesktop is true", function() {
> +      store._isDesktop = true;
> +      var actionData = new sharedActions.JoinedRoom(fakeJoinedData);
> +
> +      store.joinedRoom(actionData);

Actually, I was just copying the test above, which does exactly this.  :-)

However, there is a good reason to do this: the construction of the actionData is part of the setup block, but the call into the store is part of the execution block, so IMO it makes the test a bit easier to understand.

::: browser/components/loop/test/shared/otSdkDriver_test.js
@@ +280,5 @@
>        sinon.assert.calledWith(session.connect, "1234567890", "1357924680");
>      });
>  
> +    it("should set the two-way media start time to 'uninitialized' " +
> +       "when sessionData.sendTwoWayMediaTelemetry is true'", function() {

I'm believe it's correct as written: _setTwoWayMediaStartTime() returns early if this._sendTwoWayMediaTelemetry is not set to something truthy.
(In reply to Dan Mosedale (:dmose) - needinfo? me for response from comment #7)
> Comment on attachment 8582086 [details] [diff] [review]
> Fix loop Telemetry counting of direct calls
> 
> Review of attachment 8582086 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: browser/components/loop/test/shared/activeRoomStore_test.js
> @@ +499,5 @@
> > +       "store._isDesktop is true", function() {
> > +      store._isDesktop = true;
> > +      var actionData = new sharedActions.JoinedRoom(fakeJoinedData);
> > +
> > +      store.joinedRoom(actionData);
> 
> Actually, I was just copying the test above, which does exactly this.  :-)
> 
> However, there is a good reason to do this: the construction of the
> actionData is part of the setup block, but the call into the store is part
> of the execution block, so IMO it makes the test a bit easier to understand.

Yeah, it just feels a bit long winded for our tests. I'm not sure I buy the setup argument as its just wrapping the data for the call. We've not had an issue with the rest of the tests where we do this - afaik this is the only instance unless there's some extra modification going on.
Attachment #8582086 - Attachment is obsolete: true
Attachment #8582607 - Flags: review+
Ok, I've removed the intermediates in those tests; carrying forward r+.
Steps for manually verifying that the critical pieces of this are functioning as expected:

This test requires two browsers, each one running in a different profile, each logged in to a different Firefox Account.

In each browser profile:
* use about:config to set loop.debug.twoWayMediaTelemetry to true
* restart
* open Tools -> Web Developer -> Browser Console
* create and/or log in to a Firefox account

In the browser that will be making the calls (the “caller”):
* drop down the Loop panel
* click the contacts icon (looks like a person)
* click “New Contact"
* Fill in a name, and the email address of the other (“callee”) firefox account

At the beginning of each test, in both browsers:
 
* in the browser console, type the string “Telem” into the search field to filter out irrelevant junk

In the “caller” browser:

* drop down the Loop menu
* click the contacts icon if necessary
* click the “video call” button next to the new contact that you created above

== case 1 ==

* in the callee browser, click the “Accept” button in the incoming call window
* wait for the video & audio connection to complete
* verify that no new Telemetry messages have appeared in the browser console
* in the caller browser, verify that you see exactly one message that looks something like this:

"Loop Telemetry: noted two-way connection start, start time in ms:" 9550.395333

(Telemetry messages of other forms can be ignored).

* in the callee browser, click the hangup button

* in the caller browser, ensure that a message of the form:

"Loop Telemetry: noted two-way media connection in bucket: " "MORE_THAN_5M"

has been logged

* in the callee browser’s inset chat window, click the “X” button to close it

* in the callee browser’s console window, verify that no Loop Telemetry messages have been logged 

* in the caller browser inset chat window, click the X button to close it

* in the caller browser console, verify there is still only a single message of the mentioning the word “bucket”
Flags: qe-verify+
Whiteboard: [metrics]
Tracking for whatever help I can give. Looks like your fix will land soon. Thanks for the great steps to reproduce, super helpful.

Florin, do you want me to keep needinfo-ing you for things like this or is adding qe-verify+ enough to call your team in?  Let me know. For now here's a ni since we want to have try and have this this verified before 39 moves to aurora. Thanks!
Flags: needinfo?(florin.mezei)
Usually qe-verify+ is enough, but in cases like this where we need something verified with higher priority it's better to also needinfo me.
https://hg.mozilla.org/mozilla-central/rev/666e41236982
Status: NEW → RESOLVED
Closed: 5 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
It would be very helpful if this Could be verified before 39 uplifts to Aurora.  Adding a needinfo to FlorinM to get this on that radar.
Oh, this is already on that radar, so never mind.
Tested on the latest Firefox 39 Nightly as part of bug 1123660.

Following the scenarios from comment 11 I get Loop Telemetry messages only in the caller's console. Here's how the series of events looks like:
----- when callee accepts conversation -----
"Loop Telemetry: noted two-way connection start, start time in ms:" -1
"Loop Telemetry: noted two-way connection start, start time in ms:" 8500.318651…
----- on hangup (from any side) -----
"Loop Telemetry: noted two-way connection start, start time in ms:" -2
"Loop Telemetry: noted two-way media connection in bucket: " “BETWEEN_10S_AND_30S”
"Loop Telemetry: noted two-way connection start, start time in ms:" -1
----- if caller clicks "x" (inset chat window) after the callee ended the conversation -----
"Loop Telemetry: noted two-way connection start, start time in ms:" -1

Everything looks as expected except for the "start time" messages that show "-1" and "-2". As stated in bug 1123660, I'm not sure whether these are expected or not. Will wait for conclusions in bug 1123660 to close this.
Flags: needinfo?(florin.mezei)
According to Dan's comment in https://bugzilla.mozilla.org/show_bug.cgi?id=1123660#c64, the "-1" and "-2" messages are just for debugging, so I'm also marking this as Verified.
Status: RESOLVED → VERIFIED
Thanks very much Florin! Untracking this for 39. 

Dan please don't hesitate to re-nominate this for tracking if anything changes.
You need to log in before you can comment on or make changes to this bug.