Closed Bug 1122506 Opened 11 years ago Closed 10 years ago

Report Hello conversation URL funnel

Category: Cloud Services :: Operations: Metrics/Monitoring
Type: task
Priority: Not set
Severity: normal
Tracking: (Not tracked)
Status: RESOLVED FIXED
People: (Reporter: RT, Assigned: kparlante)
User Story

Report Hello conversation URL funnel

Acceptance criteria:
[P1] Daily provide:
Number of conversations created
Number of conversation URLs Copied/Shared
Unique conversation URLs clicked
Unique conversation joined on the clicker side
Number of occurrences of “2 users in a conversation at the same time”
[P2] Provide graph showing evolution of the above numbers daily
No description provided.
Assignee: nobody → kparlante
Blocks: 1122505
User Story: (updated)
:whd, we want these counts (per day) in a json file: room_funnel.json

* room_created
This is a simple count of successful calls to "POST /rooms" (https://docs.services.mozilla.com/loop/apis.html#post-rooms). Note that we're not counting "POST /rooms/<token>", which is used for joining/leaving a room.
The equivalent kibana query is: "method:post AND path:rooms AND -token:* AND errno:0"

* room_url_clicked
We need to look at the nginx log for this one. The URL that the user clicks looks like "GET https://hello.firefox.com/xxxxxxxxxxx", where "xxxxxxxxxxx" is the room token. It hits the static http server. One thing to look out for: we don't want to match any file extensions. We want to count the number of unique room tokens for this one.

* room_joined
We want to count the number of rooms that were joined by the link clicker: unique room tokens seen on "POST /rooms/<token>" with action=join (https://docs.services.mozilla.com/loop/apis.html#joining-the-room).
kibana query: "method:post AND path:rooms AND action:join AND errno:0" - count unique room tokens, in this case handily available in the "token" field.
The extra wrinkle here is that we only want to count "link clickers", not people who rejoin a room after having left it. The "link clickers" should be using basic auth instead of hawk (https://docs.services.mozilla.com/loop/apis.html#rooms). I'm guessing that we'll have a uid for the hawk session users, but I can't test that out in kibana.

* two_in_room
We want to count the number of rooms that have ever had two participants in the room at the same time. Once 0.15 goes to production, we should have a "participants" field (https://bugzilla.mozilla.org/show_bug.cgi?id=1112379).
kibana query: "method:post AND path:rooms AND action:join AND errno:0 AND participants:2" - unique room tokens, again found in "token".

Let's combine these into one file, so:
[{"date":"2014-07-24","time_t":1406160000,"room_created":100,"room_url_clicked":99,"room_joined":98,"two_in_room":97},...]
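The combined daily record described above could be assembled along these lines (a sketch only; field names follow the example in the spec, the counts are illustrative, and the producer of the per-metric counts is out of scope here):

```python
import json
from datetime import datetime, timezone

def make_funnel_record(date_str, counts):
    """Build one daily entry for room_funnel.json (field names from the spec above)."""
    # time_t is midnight UTC of the given day, as a Unix timestamp
    dt = datetime.strptime(date_str, "%Y-%m-%d").replace(tzinfo=timezone.utc)
    record = {"date": date_str, "time_t": int(dt.timestamp())}
    record.update(counts)
    return record

# Illustrative counts, matching the example row in the spec
records = [make_funnel_record("2014-07-24",
    {"room_created": 100, "room_url_clicked": 99,
     "room_joined": 98, "two_in_room": 97})]
print(json.dumps(records))
```

Note that midnight UTC on 2014-07-24 comes out to 1406160000, matching the `time_t` in the example row.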
Assignee: kparlante → whd
* room_created
This is a cbuf filter with message matcher:
Logger == 'mozilla-loop-server' && Fields[method] == 'post' && Fields[path] == '/rooms' && Fields[token] == NIL && Fields[errno] == 0

* room_url_clicked
This is a HLL filter on the request field with message matcher:
Logger == 'LoopWebserver' && Fields[request] =~ /^GET \/[^\.\/]{11}$/

* room_joined
This is a HLL filter on the token field with message matcher:
Logger == 'mozilla-loop-server' && Fields[method] == 'post' && Fields[path] =~ /rooms/ && Fields[action] == 'join' && Fields[errno] == 0
Regarding the wrinkle: it appears all "join" actions are accompanied by a uid, so I can't tell from loop server logs which requests used basic auth and which did not. I'm going to start out implementing naively and we can fix it up from there.

* two_in_room
This is a simple HLL filter on the token field with message matcher:
Logger == 'mozilla-loop-server' && Fields[method] == 'post' && Fields[path] =~ /rooms/ && Fields[action] == 'join' && Fields[errno] == 0 && Fields[participants] >= 2

I'm starting work on this in the loop_room_metrics branch of puppet-config.
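The room_url_clicked matcher hinges entirely on that request regex, so its intent is worth spelling out: exactly 11 characters after the slash, with no dot (to exclude file extensions) and no further slash (to exclude sub-paths). A quick sketch with the same pattern and some illustrative request strings:

```python
import re

# Same pattern as the HLL filter's message matcher above: "GET /" followed
# by exactly 11 characters containing neither "." nor "/".
ROOM_URL = re.compile(r"^GET /[^./]{11}$")

assert ROOM_URL.match("GET /aBcDeFgHiJk")        # an 11-char room token: counted
assert not ROOM_URL.match("GET /favicon.ico")    # 11 chars but has a dot: skipped
assert not ROOM_URL.match("GET /css/site.css")   # sub-path: skipped
assert not ROOM_URL.match("GET /short")          # wrong length: skipped
```

"favicon.ico" is a useful edge case: it is exactly 11 characters long, so the length check alone would count it; the "no dot" character class is what keeps file fetches out of the unique-token count.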
These filters are live and if the data looks good I can backfill the JSON at: https://metrics.services.mozilla.com/loop-server-dashboard/data/loop_room_funnel.json for all metrics except ones involving participants.
I have this graphing locally and it looks reasonable. Let's get it backfilled. Also, "two_in_room" data should be available now that "participants" is showing up.
For the "method:post AND path:rooms AND action:join AND participants:2" query in kibana, I'm seeing only errno 202 and HTTP 400 code responses, so those aren't currently being counted in the "two_in_room" filter (which requires errno 0). I can change the message matcher if need be, but it looks like something might be up app-side.
202 is "Room Full": https://github.com/mozilla-services/loop-server/blob/master/loop/errno.json (there's a bug looking into the high-ish frequency of that error: https://bugzilla.mozilla.org/show_bug.cgi?id=1123588).
It looks like "participants" is the number of people in the room before the join, so let's change the matcher to:
Logger == 'mozilla-loop-server' && Fields[method] == 'post' && Fields[path] =~ /rooms/ && Fields[action] == 'join' && Fields[errno] == 0 && Fields[participants] >= 1
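The "before the join" semantics are what drive the >= 1 threshold: the second successful joiner arrives with participants == 1, and that is the moment two people are in the room. A toy illustration with hypothetical join events:

```python
# Hypothetical join events for one room. "participants" is the count
# *before* each join, so the second successful join carries
# participants == 1 -- that's when the room first has two people in it.
joins = [
    {"token": "aBcDeFgHiJk", "errno": 0, "participants": 0},  # first joiner
    {"token": "aBcDeFgHiJk", "errno": 0, "participants": 1},  # second joiner
]

# Unique rooms that have ever had two participants at the same time,
# mirroring the message matcher's errno == 0 && participants >= 1 test.
two_in_room = {j["token"] for j in joins
               if j["errno"] == 0 and j["participants"] >= 1}
print(len(two_in_room))  # 1
```

With the earlier participants:2 threshold, neither event would match and this room would be missed entirely.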
The matcher has been updated.
The data has been backfilled. In this pass I added the filter "Fields[user_agent_browser] != NIL" as we have with some other metrics to discount load tests.
This is now live on the dashboard: https://metrics.services.mozilla.com/loop-server-dashboard/

Missing:
- Number of conversation URLs Copied/Shared (should come from FHR+Telemetry, ideally working with Saptarshi to make sure we have a count that we can compare to the others in the funnel)
- Ideally, correlate with Tokbox data to know whether or not a successful audio/video connection happened.
- Not on a separate "analytics" dashboard

Romain, let me know if you'd like any of the text tweaked for terminology consistency.
Assignee: whd → kparlante
Mark, the shared URLs on Telemetry seem broken, is there a bug tracking this already? Or is this something that should be part of the FHR work?
Flags: needinfo?(standard8)
(In reply to Katie Parlante from comment #9) > This is now live on the dashboard: > https://metrics.services.mozilla.com/loop-server-dashboard/ > > Missing: > - Number of conversation URLs Copied/Shared (should come from FHR+Telemetry, > ideally working with Saptarshi to make sure we have a count that we can > compare to the others in the funnel) > - Ideally, correlate with Tokbox data to know whether or not a successful > audio/video connection happened. > - Not on a separate "analytics" dashboard > > Romain, let me know if you'd like any of the text tweaked for terminology > consistency. Thanks Katie, this all looks good!
Is the spike on Jan 27th related to the load tests? Is there any way to take out the data related to the load tests?
Flags: needinfo?(kparlante)
(In reply to Romain Testard [:RT] from comment #12) > The spike on Jan 27th is related to the load tests? > Any way to take out the data related to the load tests? The room funnel data has the load test spike filtered out, as do most of the call metrics. The metrics on the graph that have not been filtered are "active daily" (unique users that hit any endpoint) and the call setup stats that come from websocket logging. Those are trickier to filter, as we don't have the user agents on the log -- we have to associate the callId with other endpoints and look at those user agents. I'll log a separate bug for that one. It might be prudent to just asterisk the data.
Flags: needinfo?(kparlante)
(In reply to Wesley Dawson [:whd] from comment #8) > The data has been backfilled. In this pass I added the filter > "Fields[user_agent_browser] != NIL" as we have with some other metrics to > discount load tests. If we filter this out permanently are we going to hide from ourselves if other folks are writing clients and using them with Hello? (I don't expect this to be a high usage, but you never know). Or if we do filter out, maybe we should make sure we can track it somewhere...
Depends on: 1127574
(In reply to Romain Testard [:RT] from comment #10) > Mark, the shared URLs on Telemetry seem broken, is there a bug tracking this > already? > Or is this something that should be part of the FHR work? I've filed bug 1127574 on this. I'm sure we've discussed it before - we never took telemetry info across to call URLs, though I think it's different info, to be honest. I think we should keep it separate from the FHR work at the moment - we're more likely to get something out sooner.
Flags: needinfo?(standard8)
(In reply to Mark Banner (:standard8) from comment #15) > If we filter this out permanently are we going to hide from ourselves if > other folks are writing clients and using them with Hello? (I don't expect > this to be a high usage, but you never know). Or if we do filter out, maybe > we should make sure we can track it somewhere... This will potentially hide information about folks writing clients from the high-level dashboard (such as the "load test" client...). We still keep this info in kibana. Alternatively we know what the non-standard user agent we use with our load tests is, and we can filter out that agent explicitly instead.
(In reply to Mark Banner (:standard8) from comment #15) > (In reply to Wesley Dawson [:whd] from comment #8) > > The data has been backfilled. In this pass I added the filter > > "Fields[user_agent_browser] != NIL" as we have with some other metrics to > > discount load tests. > > If we filter this out permanently are we going to hide from ourselves if > other folks are writing clients and using them with Hello? (I don't expect > this to be a high usage, but you never know). Or if we do filter out, maybe > we should make sure we can track it somewhere... We're not filtering it from elasticsearch/kibana or any ops team monitoring, we're just filtering it out of the aggregates for the custom dashboard. The specter of some third party client hitting the hello server endpoints makes me wonder if we should send a flag with each endpoint indicating the application the HTTP or websocket request is on behalf of. FxA has a service=sync, for example. That could also help when we need to distinguish standalone from desktop. Just a thought.
(In reply to Wesley Dawson [:whd] from comment #17) > Alternatively we know what the non-standard user agent we use with our load > tests is, and we can filter out that agent explicitly instead. One detail to explain here: the raw user agent is not passed along to kibana right now, in part because it can be used as pii. The user agent field is parsed and put into common buckets for user_agent_browser, user_agent_os, user_agent_version. Looks like the load tester shows up as user_agent_os=Linux but the other fields don't fall into a recognized bucket. I think whd's proposal is that we modify that bucketing logic to specifically identify the load tester (user_agent_browser=loads) and then filter on that. Which is an excellent idea.
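The bucketing change described could look roughly like this. This is a sketch only: the real user agent parser is more involved, and the exact marker string in the load tester's user agent is an assumption made for illustration.

```python
def bucket_user_agent(raw_ua):
    """Map a raw user agent onto coarse buckets (sketch; the production
    parser is assumed to be more thorough). The load tester gets its own
    browser bucket so downstream aggregates can filter it explicitly."""
    if raw_ua and "loads" in raw_ua.lower():  # assumed load-tester marker
        return {"user_agent_browser": "loads",
                "user_agent_os": "Linux",
                "user_agent_version": None}
    if raw_ua and "Firefox/" in raw_ua:
        version = raw_ua.split("Firefox/")[-1].split(".")[0]
        return {"user_agent_browser": "Firefox",
                "user_agent_os": "Other",   # real parser would detect the OS
                "user_agent_version": version}
    return {"user_agent_browser": None,
            "user_agent_os": None,
            "user_agent_version": None}

# Dashboard aggregation can then skip the load tester by name instead of
# relying on the blunt "user_agent_browser != NIL" filter:
events = [bucket_user_agent("Mozilla/5.0 (X11; Linux x86_64) Firefox/35.0"),
          bucket_user_agent("loads/1.0")]
counted = [e for e in events if e["user_agent_browser"] != "loads"]
print(len(counted))  # 1
```

The advantage over the NIL filter is that unrecognized third-party clients stay visible in the aggregates; only the explicitly named load tester is dropped.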
I think this is done. Reopen if necessary.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED