Closed Bug 1248569 Opened 8 years ago Closed 8 years ago

Report active users per new definition of an active user

Categories

(Hello (Loop) :: General, defect, P1)

defect

Tracking

(Not tracked)

RESOLVED INCOMPLETE

People

(Reporter: RT, Unassigned)

References

Details

(Whiteboard: [btpp-active][metrics][other team])

User Story

The definition of “Active” for engagement metrics is changing. From now on, “Active” users are the users who have joined a Room in a given period. This definition applies to both Desktop Clients and Link Clickers. We need to have reports for:

*MAU: Count of unique Desktop Client and Link Clicker users that have accessed the Hello application (joined a room) in the last 28 days (MADU + MAWU). "Accessed" is defined as joining a Hello Room.
---MADU: Count of unique Desktop Client users that have accessed a Room in the last 28 days;
---MAWU: Count of unique Link Clicker site users that have accessed the site in the last 28 days;
*WAU: Count of unique Desktop Client and Link Clicker users that have accessed the Hello application in the last 7 days (WADU + WAWU). "Accessed" is defined as joining a Hello Room.
*DAU: Count of unique Desktop Client and Link Clicker users that have accessed the Hello application in the last 1 day (DADU + DAWU). "Accessed" is defined as joining a Hello Room.
The logs for these metrics will follow the same format that we are using now.
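A minimal sketch of the trailing-window counts described above, assuming activity has already been reduced to hypothetical (activity_date, user_id) pairs, one per room join. Windows of 1, 7 and 28 days end on day d inclusive; the function names `active_users` and `xau` are illustrative, not part of any existing report code.

```python
from datetime import date, timedelta
from typing import Iterable, Tuple

# Hypothetical record shape: one (activity_date, user_id) pair per room join.
Activity = Tuple[date, str]

def active_users(records: Iterable[Activity], d: date, window_days: int) -> int:
    """Distinct users with at least one room join in the window_days ending on d (inclusive)."""
    start = d - timedelta(days=window_days - 1)
    return len({uid for day, uid in records if start <= day <= d})

def xau(records: Iterable[Activity], d: date) -> dict:
    """DAU/WAU/MAU for one segment (desktop clients or link clickers) on day d."""
    rows = list(records)  # materialize so the iterable can be scanned three times
    return {
        "DAU": active_users(rows, d, 1),
        "WAU": active_users(rows, d, 7),
        "MAU": active_users(rows, d, 28),
    }

# Per the user story, the headline figure sums the two segments, e.g.
# MAU = xau(desktop_rows, d)["MAU"] + xau(web_rows, d)["MAU"]
```

Repeated joins by the same user inside a window collapse to one, which is what "unique" requires; the per-segment sum is kept separate because the segments may overlap.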

Attachments

(1 file, 1 obsolete file)

User Story: (updated)
Blocks: 1248602
Rank: 19
Priority: -- → P1
Blocks: 1249646
No longer blocks: 1248602
Whiteboard: [btpp-active]
We have the data on the loop server already.
Katie, is this something your team could help us with?
We need the JSON output similar to what you did before (no graphing necessary, just the JSON output).
Flags: needinfo?(kparlante)
- Yes, my team can help with creating the JSON output
- We do have a requirement to put engagement ratio on a dashboard: https://bugzilla.mozilla.org/show_bug.cgi?id=1249763
- needinfo to rweiss, she should talk to you about engagement ratio (MAU & DAU) definitions (including whether/when we should move to a telemetry based definition)
- supporting hello engagement ratio metrics will be a high priority, once rweiss blesses the definition
Flags: needinfo?(kparlante) → needinfo?(rweiss)
See Also: → 1249763
Romain, where are these counts to be drawn from?  Unified Telemetry or is this from Google Analytics?

I'll note a few changes below. 

Desktop MAU = Count of unique Desktop Client users that have accessed a room in the last 28 days
Web MAU = Count of unique Link Clickers within the site that have accessed a room in the last 28 days
Desktop DAU(d) = The average count of unique Desktop Client users that have accessed a room on a date d 
Web DAU(d) = The average count of unique Link Clickers that have accessed a room on a date d

These are aligned with the engagement ratio measurements.

:dzeber, can you confirm that these measurements don't have additional properties that necessitate a different MAU/DAU formulation?
Flags: needinfo?(rweiss) → needinfo?(dzeber)
(In reply to Rebecca Weiss from comment #3)
> Romain, where are these counts to be drawn from?  Unified Telemetry or is
> this from Google Analytics?

This data actually lives on the loop server.
Previous reports on https://metrics.services.mozilla.com/loop-server-dashboard/ were created from loop server data directly too.

> 
> I'll note a few changes below. 
> 
> Desktop MAU = Count of unique Desktop Client users that have accessed a room
> in the last 28 days
> Web MAU = Count of unique Link Clickers within the site that have accessed a
> room in the last 28 days
> Desktop DAU(d) = The average count of unique Desktop Client users that have
> accessed a room on a date d 
> Web DAU(d) = The average count of unique Link Clickers that have
> accessed a room on a date d
> 
> These are aligned with the engagement ratio measurements.
> 
> :dzeber, can you confirm that these measurements don't have additional
> properties that necessitate a different MAU/DAU formulation?
(In reply to Rebecca Weiss from comment #3)
> Desktop MAU = Count of unique Desktop Client users that have accessed a room
> in the last 28 days
> Web MAU = Count of unique Link Clickers within the site that have accessed a
> room in the last 28 days
> Desktop DAU(d) = The average count of unique Desktop Client users that have
> accessed a room on a date d 
> Web DAU(d) = The average count of unique Link Clickers that have
> accessed a room on a date d

So long as there is a reliable way to measure whether a given client was active (joined a room) on any given day, these definitions apply.

I imagine a single client could be a Desktop Client on some day and a Link Clicker on another. In that case, MAU = Desktop MAU + Web MAU would double-count, so we'd probably want to compute (D/W/M)AU as "count of unique users that have accessed a room either from Desktop or by clicking a link in the last (1/7/28) days."
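To make the double-counting concrete, here is a small sketch comparing the summed segment counts with the count of unique users. It assumes a common identifier exists for a person across both segments (the set contents are made up for illustration); without such an identifier only the sum can be computed.

```python
def combined_active_users(desktop_ids: set, web_ids: set) -> dict:
    """Compare summed segment counts with the true unique count (requires a shared ID)."""
    return {
        "sum": len(desktop_ids) + len(web_ids),  # counts users present in both segments twice
        "unique": len(desktop_ids | web_ids),    # set union counts each person once
        "overlap": len(desktop_ids & web_ids),   # users active in both segments
    }
```

The gap between "sum" and "unique" is exactly "overlap", i.e. the number of users active both from Desktop and by clicking a link in the window.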

For reference, the definitions we've been working with on Desktop are here: https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c19. These are slightly more general in that they allow for segmentation, and we could actually apply that here by thinking of "Desktop clients" and "Link clickers" as segments.
Flags: needinfo?(dzeber)
(In reply to Dave Zeber [:dzeber] from comment #5)
> (In reply to Rebecca Weiss from comment #3)
> > Desktop MAU = Count of unique Desktop Client users that have accessed a room
> > in the last 28 days
> > Web MAU = Count of unique Link Clickers within the site that have accessed a
> > room in the last 28 days
> > Desktop DAU(d) = The average count of unique Desktop Client users that have
> > accessed a room on a date d 
> > Web DAU(d) = The average count of unique Link Clickers that have
> > accessed a room on a date d
> 
> So long as there is a reliable way to measure whether a given client was
> active (joined a room) on any given day, then these definitions apply.
> 
> I imagine a single client could be both a Desktop Client on some day and a
> Link Clicker on another. In that case, MAU = Desktop MAU + Web MAU would
> double-count, so we'd probably want to compute (D/W/M)AU as "count of unique
> users that have accessed a room either from Desktop or by clicking a link in
> the last (1/7/28) days."
> 
Unfortunately, we have no way to identify such users (link clickers can be on Chrome, Firefox or Opera; identifying them would probably require IP address logging, which would still not be accurate), so we will have to include these double-counted users in our stats.

> For reference, the definitions we've been working with on Desktop are here:
> https://bugzilla.mozilla.org/show_bug.cgi?id=1240849#c19. These are slightly
> more general in that they allow for segmentation, and we could actually
> apply that here by thinking of "Desktop clients" and "Link clickers" as
> segments.
User Story: (updated)
In that case, I'd suggest proceeding with the definitions in Comment 3, and adding a disclaimer when presenting AU = DesktopAU + WebAU that the actual number of unique users may be lower.
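When only the per-segment totals are available, the true unique count can still be bracketed. The sketch below (function name hypothetical) returns the bounds implied by set arithmetic: the reported sum is the upper bound (no overlap), and the larger segment is the lower bound (the smaller segment is entirely contained in it).

```python
def unique_user_bounds(desktop_au: int, web_au: int) -> tuple:
    """Return (lower, upper) bounds on unique users given only segment counts.

    upper: the segments do not overlap, so unique = desktop_au + web_au
    lower: every user of the smaller segment also appears in the larger one
    """
    return max(desktop_au, web_au), desktop_au + web_au
```

For example, with illustrative counts of 1200 desktop and 300 web users, the number of unique users lies between 1200 and 1500, which is one way to phrase the disclaimer.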
Thanks. Katie, I'm NI-ing you since we seem to be good to go for implementing this.
Please let me know if it would be useful for me to set up a call with Gareth and Mark, who will be able to help with Link Clicker (Google Analytics) and desktop client (loop server) metrics respectively.
Flags: needinfo?(kparlante)
Depends on: 1258956
RT: I've created the bug for the lua filter implementation (Bug#1258956), we can do any calculations based on loop server metrics. One flag, though -- we have access to loop server data, but not google analytics. My team doesn't have the resources to automate pulling data from GA.
Flags: needinfo?(kparlante)
Thanks Katie, that sounds OK, we should be able to extract the data from analytics ourselves.
So yes let's just do the loop server piece on this bug.
Component: Client → General
Whiteboard: [btpp-active] → [btpp-active][metrics][other team]
Updating US to reflect that monthly users are per 28-day time periods
User Story: (updated)
Depends on: 1266011
Blocks: 1268106
I took a quick look through the puppet-config repo, but it isn't obvious where the loop logs are stored. Please provide me the S3 bucket name (and read access if I don't have it).
Flags: needinfo?(whd)
Flags: needinfo?(rtestard)
whd hooked me up with creds and the URI
Flags: needinfo?(whd)
Flags: needinfo?(rtestard)
Ok, I have taken a look at the data.  The description above states
> From now “Active” users are the users that have joined a Room

However, there is no definition/specification describing what joining a room looks like in the raw data. Ideally it would be something like: for all log entries where Logger == 'mozilla-loop-server' && Fields[action] == 'join', count the distinct number of Fields[uid] for the Timestamp in the given range.

- the join log message doesn't always have a uid (the ones without a uid all appear to be UserType: Link-clicker and the ones with a uid all appear to be UserType: Unregistered)
- should a 'refresh' action be treated as a join or will there always be an initial join and the refresh is redundant (i.e., the user was already counted so we can ignore it)

What is used to identify a desktop client vs a link clicker?
- The UserType? Is the Unregistered above a desktop user? 
- A user_agent_os/browser combo?
- How should a unique link clicker be identified/counted?
Flags: needinfo?(rtestard)
Adding Gareth and Mark needinfo to help with Trink's questions in https://bugzilla.mozilla.org/show_bug.cgi?id=1248569#c14
Flags: needinfo?(standard8)
Flags: needinfo?(garethcull.bugs)
(In reply to Mike Trinkala [:trink] from comment #14)
> Ok, I have taken a look at the data.  The description above states
> > From now “Active” users are the users that have joined a Room
> 
> However, there is no definition/specification describing what joining a room
> looks like from the raw data.  Ideally it would be something like: for all
> log entries where Logger == 'mozilla-loop-server' && Fields[action] ==
> 'join' count the distinct number of Field[uid] for the Timestamp in the
> given range.
> 
> - the join log message doesn't always have a uid (the ones without a uid all
> appear to be UserType: Link-clicker and the ones with a uid all appear to be
> UserType: Unregistered)
> - should a 'refresh' action be treated as a join or will there always be an
> initial join and the refresh is redundant (i.e., the user was already
> counted so we can ignore it)
> 
> What is used to identify a desktop client vs a link clicker?
> - The UserType? Is the Unregistered above a desktop user? 
> - A user_agent_os/browser combo?
> - How should a unique link clicker be identified/counted?

I think that what Katie did on https://metrics.services.mozilla.com/loop-server-dashboard/ can help.
Check out "Daily unique user IDs in rooms" for an example. This was using the HawkID which I believe is what we should be using for this too.
To be clear, "Daily unique user IDs in rooms" as represented on https://metrics.services.mozilla.com/loop-server-dashboard/ used a more restrictive definition of a user (a desktop client user who had been in a Hello session with someone else).

For the rest of your questions, Mark could you please help Mike since I'm not sure about the implementation details.
Removing Gareth from the NI since he focuses on the GA reporting side of things.
Flags: needinfo?(rtestard)
Flags: needinfo?(garethcull.bugs)
(In reply to Romain Testard [:RT] from comment #16)
Yeah, I have already looked at the previous plugins, and as you state they don't match the new definition, hence the needinfo. Using the above definition with the existing data could produce results that are over-counted several times over, depending on actual usage patterns. Since there are a lot of compromises and assumptions happening here, I need your team to explicitly state what is acceptable. IMO, using this as a rough estimate of usage seems more appropriate than any kind of 'user'-based count, but that is your call.
I was going to set a meeting for you, Mark and I to discuss your concerns although your calendar looks packed this week! 
Can you clarify what is the issue you are seeing in the logs that would make it impossible to track the number of unique desktop client users (Firefox room creators) joining a room?
Mark, could you please help Mike if you know of an event that would make it easy to report what is required here?
The time blocked on my calendar represents this sprint's work (some of which is the loop analysis) please setup a meeting (overlapping that reserved time is fine) as I have already outlined the issues with the data.
The sprint is winding down and without a specification on how the team would like the raw data mapped to the xAU definitions there will be no implementation.  If it is not available by the end of day tomorrow this will have to be rescheduled.
Sorry for the delay.

(In reply to Mike Trinkala [:trink] from comment #14)
> Ok, I have taken a look at the data.  The description above states
> > From now “Active” users are the users that have joined a Room
> 
> However, there is no definition/specification describing what joining a room
> looks like from the raw data.  Ideally it would be something like: for all
> log entries where Logger == 'mozilla-loop-server' && Fields[action] ==
> 'join' count the distinct number of Field[uid] for the Timestamp in the
> given range.

Yes that's roughly right.

> - the join log message doesn't always have a uid (the ones without a uid all
> appear to be UserType: Link-clicker and the ones with a uid all appear to be
> UserType: Unregistered)

Yeah, that's correct. Note there's also a UserType of "Registered"

> - should a 'refresh' action be treated as a join or will there always be an
> initial join and the refresh is redundant (i.e., the user was already
> counted so we can ignore it)

I think ignore it. refreshing doesn't auto-join.

> What is used to identify a desktop client vs a link clicker?
> - The UserType? Is the Unregistered above a desktop user? 

UserType === "Registered" || UserType === "Unregistered"

> - How should a unique link clicker be identified/counted?

Per comment 10, we're doing this via Google Analytics, and we'll deal with that separately.

So in summary, I think we want something like:

Fields[action] == "join" && (Fields[UserType] == "Registered" || Fields[UserType] == "Unregistered")

and count the distinct number of Fields[uid] for the Timestamp in the given range.
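The summary rule above can be sketched as a predicate plus a distinct count. This assumes each decoded log entry is a hypothetical dict with `Logger`, `Timestamp` (assumed to be nanoseconds since the epoch) and a `Fields` dict using the field names from this thread (`action`, `UserType`, `uid`); the real implementation is the Lua filter tracked in the dependent bug, not this code.

```python
DESKTOP_USER_TYPES = {"Registered", "Unregistered"}

def is_desktop_join(entry: dict) -> bool:
    """True for a desktop-client room join; 'refresh' is ignored since it does not auto-join."""
    fields = entry.get("Fields", {})
    return (
        entry.get("Logger") == "mozilla-loop-server"
        and fields.get("action") == "join"
        and fields.get("UserType") in DESKTOP_USER_TYPES  # link clickers are counted via GA
        and fields.get("uid") is not None
    )

def distinct_desktop_joiners(entries, start_ns: int, end_ns: int) -> int:
    """Distinct uid values among desktop joins with start_ns <= Timestamp < end_ns."""
    return len({
        e["Fields"]["uid"]
        for e in entries
        if is_desktop_join(e) and start_ns <= e["Timestamp"] < end_ns
    })
```

A half-open range keeps consecutive daily windows from counting an entry stamped exactly at midnight twice.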
Flags: needinfo?(standard8)
Attached file Sample loop active users report (obsolete)
Attachment #8752221 - Attachment description: Sample loop retention output (daily analysis) → Sample loop active users report
Who needs to review this? I am more concerned that the business logic is what you need; the output format can easily be changed.
Flags: needinfo?(rtestard)
Thanks Mike
I NI Jorge who'll import it to our reporting dashboard and check the data
Flags: needinfo?(rtestard) → needinfo?(jorge.munuera)
No longer blocks: 1268106
The format is fine; this is already integrated in the dashboard. Regarding the values, the DAU data has strange values for some dates (the last one, for example). Could you review it?
Flags: needinfo?(jorge.munuera)
2015-12-19 Double-checked, and the values match.
2016-04-24 Looks like there was an analysis load error from S3 on that date; after the reload I am seeing 6111.
2016-05-12 is partial data (the remainder of the May 12 logs will show up in the May 13th upload).  

On the other end, the WAU is incomplete until the 7th day in and the MAU is incomplete until the 28th day in, so neither should be considered representative until that point.
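The completeness rules above can be expressed as two small checks, sketched here with hypothetical function names. A trailing window is only representative once it starts on or after the first day of available data, and a single day is only complete once the next day's upload has arrived (since, as noted for 2016-05-12, part of a day's logs lands in the following upload).

```python
from datetime import date, timedelta

def window_is_complete(first_data_day: date, d: date, window_days: int) -> bool:
    """True once the window_days ending on d lies entirely on or after first_data_day."""
    return d - timedelta(days=window_days - 1) >= first_data_day

def day_is_complete(d: date, latest_upload_day: date) -> bool:
    """Assumes part of day d's logs arrive in day d+1's upload, so d needs a later upload."""
    return latest_upload_day > d
```

A report could then publish WAU/MAU only when both checks pass for the window's last day, rather than showing partial values.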

The missing data on April 24th skews all the results after that. I will run the full set again and see if I can reproduce the load failure.
Yes, I was able to reproduce a failure in which the remainder of a file was skipped and the process continued on to the next day (since the data was just being piped into the analysis).

('Connection broken: IncompleteRead(1538404 bytes read, 6850204 more expected)', IncompleteRead(1538404 bytes read, 6850204 more expected))
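One way to keep a truncated download from silently flowing into the analysis is to verify the byte count and retry, failing loudly if the object never arrives whole. The sketch below is an assumption-laden illustration: `open_stream` is a hypothetical callable returning a readable stream plus the expected length (for S3, typically the object's Content-Length), and the exception classes worth retrying depend on the client library actually used.

```python
import http.client
import io
from typing import BinaryIO, Callable, Tuple

class IncompleteLoad(Exception):
    """Raised when an object cannot be read in full."""

def read_fully(
    open_stream: Callable[[], Tuple[BinaryIO, int]],
    retries: int = 3,
    chunk_size: int = 1 << 20,
    retry_on: tuple = (OSError, http.client.IncompleteRead),
) -> bytes:
    """Read an object completely or raise; never hand a truncated file to the analysis."""
    last_error = None
    for attempt in range(1, retries + 1):
        stream, expected = open_stream()
        buf = bytearray()
        try:
            while True:
                chunk = stream.read(chunk_size)
                if not chunk:
                    break
                buf.extend(chunk)
        except retry_on as exc:  # connection dropped mid-body: try again
            last_error = exc
            continue
        finally:
            stream.close()
        if len(buf) == expected:
            return bytes(buf)
        # Short read without an exception: treat it as a failed attempt too.
        last_error = IncompleteLoad(f"attempt {attempt}: read {len(buf)} of {expected} bytes")
    raise IncompleteLoad(f"giving up after {retries} attempts") from last_error
```

Raising at the end (instead of moving on to the next day) means a bad day stops the run, so a gap like the April 24th one shows up as an error rather than as a skewed count.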
Attachment #8752221 - Attachment is obsolete: true
Thanks Mike, looks like data and format are good now.
What's the best way to make it available to digest and auto-update.
Could we have it on https://metrics.services.mozilla.com/loop-server-dashboard/ with the other metrics?
Depends on: 1268106
Depends on: 1277383
Support for Hello/Loop has been discontinued.

https://support.mozilla.org/kb/hello-status

Hence closing the old bugs. Thank you for your support.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE