Closed
Bug 1234225
Opened 9 years ago
Closed 8 years ago
Report retention
Categories
(Hello (Loop) :: General, defect, P1)
Tracking
(Not tracked)
RESOLVED
INCOMPLETE
People
(Reporter: RT, Unassigned)
References
(Blocks 1 open bug)
Details
(Whiteboard: [metrics][other team])
User Story
As a product manager, I want to measure retention daily, weekly and monthly for cohorts of desktop client users who started using Hello in the same week (this avoids noise from specific days of the week that have higher usage), so that I understand how sticky Hello is.
User retention: percentage of desktop client users in each cohort that accessed a Hello room in a specific time period.
Time periods to graph user retention for:
- Daily (graph the percentage of users in each cohort that accessed Hello the day after initial use, the second day after initial use, the third day after initial use, ...)
- Weekly (graph the percentage of users in each cohort that accessed Hello in the 7 days following the initial use, in the following 7 days and also the 7 days following that, ...)
- Monthly (graph the percentage of users in each cohort that accessed Hello in the 28 days following the initial use, in the following 28 days and also the 28 days following that, ...)
This bug is about creating the automated JSON file that will provide the data needed to create the retention graph (the graph itself will be built by the product team in the Google sheet currently used for reporting).
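A minimal sketch of the retention computation described in the user story, in Python. It assumes the input is a list of (uid, date) pairs for room-join events, that "initial use" is the first join seen in the data, and that cohort weeks start on Sunday (the cohort labels in the later example output, e.g. "24 Apr. 2016 - 30 Apr. 2016", run Sunday to Saturday). The function names `cohort_week_start` and `retention`, and the interval rule (interval k covers days (k-1)*w+1 through k*w after initial use, with the day of initial use excluded), are hypothetical illustrations, not the code used for the attached reports.

```python
from collections import defaultdict
from datetime import date, timedelta
from typing import Dict, Iterable, List, Set, Tuple


def cohort_week_start(d: date) -> date:
    """Sunday that starts the 7-day cohort containing d (Sunday-start weeks assumed)."""
    # date.weekday() is Monday=0 .. Sunday=6, so Sunday maps to an offset of 0.
    return d - timedelta(days=(d.weekday() + 1) % 7)


def retention(events: Iterable[Tuple[str, date]],
              window_days: int) -> Dict[date, Tuple[int, List[float]]]:
    """Return {cohort start: (cohort size, [fraction active in interval 1, 2, ...])}.

    window_days is 1 (daily), 7 (weekly) or 28 (monthly). Interval k covers
    days (k-1)*window_days+1 .. k*window_days after initial use; the day of
    initial use itself is not counted.
    """
    days_by_uid: Dict[str, Set[date]] = defaultdict(set)
    for uid, d in events:
        days_by_uid[uid].add(d)

    cohort_size: Dict[date, int] = defaultdict(int)
    active: Dict[date, Dict[int, Set[str]]] = defaultdict(lambda: defaultdict(set))
    for uid, days in days_by_uid.items():
        first = min(days)  # initial use = first join seen in the data
        cohort = cohort_week_start(first)
        cohort_size[cohort] += 1
        for d in days:
            offset = (d - first).days
            if offset >= 1:
                active[cohort][(offset - 1) // window_days + 1].add(uid)

    result = {}
    for cohort, size in sorted(cohort_size.items()):
        last = max(active[cohort], default=0)  # longest tail seen for this cohort
        result[cohort] = (size, [len(active[cohort][k]) / size for k in range(1, last + 1)])
    return result
```

`retention(events, 7)` gives the weekly view and `retention(events, 28)` the monthly one. Each cohort's list is as long as its own tail, so no zero padding is needed.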
Attachments
(4 files, 3 obsolete files)
No description provided.
Reporter
Updated•9 years ago
User Story: (updated)
Reporter
Updated•9 years ago
Rank: 28
Priority: -- → P2
Reporter
Comment 1•9 years ago
It sounds like we could use the Hawk ID; RT to check with the server team to confirm.
Flags: needinfo?(rtestard)
Reporter
Comment 2•9 years ago
Notes from a separate e-mail thread with Ian/Standard8/Erik about finding a sufficiently persistent client ID (at least 60 days):
- We cannot use the room push URL, since we don't actually store the uaid for the push channel across restarts. Also, if we move to shared code, it may be harder for us to be notified when a uaid changes.
- We actually have durable "client IDs" attached to FF profiles now (for MAU purposes); we never turn them over.
Natim, do you know if the Hawk ID would be a suitable client ID for the purpose of churn measurement?
Flags: needinfo?(rtestard) → needinfo?(rhubscher)
Comment 3•9 years ago
> Natim, do you know if the Hawk ID would be a suitable client ID for the purpose of churn measurement?
AFAIK, yes, it seems to fit the need.
Flags: needinfo?(rhubscher)
Reporter
Updated•9 years ago
User Story: (updated)
Summary: Report churn → Report retention
Reporter
Updated•9 years ago
Rank: 28 → 19
Priority: P2 → P1
Reporter
Comment 4•9 years ago
RT to review with Erik at the reporting meeting.
Flags: needinfo?(rtestard)
Reporter
Comment 5•9 years ago
Updated user story to reflect the fact we want to look at cohorts of people spanning 7 days of usage.
User Story: (updated)
Reporter
Comment 6•9 years ago
We have the data on the loop server already.
Katie, is this something your team could help us with?
We need the JSON output similar to what you did before (no graphing necessary, just the JSON output).
Flags: needinfo?(kparlante)
Reporter
Comment 7•9 years ago
Attaching an example JSON file for weekly user retention that Jorge produced.
Monthly user retention would be another file following the same principles.
Reporter
Comment 8•9 years ago
Hi Katie, can you please let us know if you could help us with the creation of the JSON file per the example attached?
Comment 9•9 years ago
Romain,
Yes, we can help you, although the work will be prioritized below helping e10s and top line kpis reporting.
Adding needinfo to rweiss to review retention measure.
Flags: needinfo?(rweiss)
Updated•9 years ago
Flags: needinfo?(kparlante)
Comment 10•9 years ago
I have one comment on the retention measures: in the user story you start with daily, weekly, and monthly measures but then only request the following reports:
Time periods to graph user retention for:
- Weekly (graph the percentage of users in each cohort that accessed Hello in the 7 days following the initial use, in the 7 days following the initial use and also the 7 days following that, .....)
- Monthly (graph the percentage of users in each cohort that had accessed Hello in the 30 days following the initial use, in the 30 days following the initial use and also the 30 days following that, .....)
Why would you also not want daily, e.g.
- Daily (graph the percentage of users in each cohort that accessed Hello the previous day following initial use).
My other comment is to verify that your definition of "usage" is the same as defined for MAU and DAU (see the data glossary here for the formal definition of "active" and "user" for other product areas [https://metrics.services.mozilla.com/firefox-kpis/])
Flags: needinfo?(rweiss) → needinfo?(rtestard)
Reporter
Comment 11•9 years ago
(In reply to Rebecca Weiss from comment #10)
> I have one comment on the retention measures: in the user story you start
> with daily, weekly, and monthly measures but then only request the following
> reports:
>
> Time periods to graph user retention for:
> - Weekly (graph the percentage of users in each cohort that accessed Hello
> in the 7 days following the initial use, in the 7 days following the initial
> use and also the 7 days following that, .....)
> - Monthly (graph the percentage of users in each cohort that had accessed
> Hello in the 30 days following the initial use, in the 30 days following the
> initial use and also the 30 days following that, .....)
>
> Why would you also not want daily, e.g.
> - Daily (graph the percentage of users in each cohort that accessed Hello
> the previous day following initial use).
Agreed; now added back into the user story.
>
> My other comment is to verify that your definition of "usage" is the same as
> defined for MAU and DAU (see the data glossary here for the formal
> definition of "active" and "user" for other product areas
> [https://metrics.services.mozilla.com/firefox-kpis/])
Per bug 1248569 “Active” users are the users that have joined a Room in a given period.
I see in the glossary doc that FxA, Pocket and Firefox all define "users" and "active" differently; it seems we should add the Hello definitions to that document?
User Story: (updated)
Flags: needinfo?(rtestard)
Comment 12•9 years ago
Yes, we will want to add Hello's definitions. If you send them to me as a separate document, I'll add them to the glossary (I think the glossary is write-locked for the moment). However, we won't add them until the pipeline team has provided a data set that allows us to compute those numbers and populate the dashboard.
We will also need to ensure that the logic used to compute each of those retention numbers passes muster (e.g. we should see an MVP of how the statistic is computed for a given day with real data in a notebook or something similar that can be r+). That can be added to this bug when it's done.
Reporter
Comment 13•9 years ago
(In reply to Rebecca Weiss from comment #12)
> Yes, we will want to add Hello's definitions. If you send them to me as a
> separate document, I'll add them to the glossary (I think the glossary is
> write-locked for the moment). We won't add them until the pipeline team has
> provided a data set that allows to compute for those numbers that can
> populate the dashboard, however.
>
Here is the document for Firefox Hello definitions: https://docs.google.com/document/d/1bnRid9Z7c2DblGPoBJi6oBZMYWlA0ggIgD-6UTVJiHY/edit#
Comment 14•9 years ago
I've copied the Active and User definitions in the "to appear" section of the data glossary. Retention will not be added to that dashboard yet, as we have not established how many KPIs should appear on that dashboard...yet...
Once the report has been generated and we have a pipeline to these measurements, we can update the dashboard with those values (DAU, MAU, and engagement ratio).
Updated•9 years ago
Component: Client → General
Whiteboard: [metrics][other team]
Comment 15•9 years ago
Is 'initial use' the first time a uid is seen in the data, or is there some flag indicating they are a new user?
If it is the first time the uid is seen in the data, then the cohorts can change if a different slice of the data is processed (e.g. after old data is pruned). Is this acceptable?
If there is a flag:
1) What is it?
2) We may never see the flag, since the data is not analysed from the beginning of time, so those users will not show up in any cohort. Is this acceptable?
Daily clarification: graph the percentage of users in each cohort that accessed Hello the day after initial use, as opposed to 'the previous day following initial use'.
The MAU is specified as a 28-day window, yet retention uses 30 days; why the difference?
Flags: needinfo?(rtestard)
Comment 16•9 years ago
(In reply to Mike Trinkala [:trink] from comment #15)
> Is 'initial use' the first time a uid is seen in the data or is there
> some flag indicating they are a new user?
>
> If it is the first time the uid is seen in the data then the cohorts can
> change if a different slice of the data is processed (e.g. after old data is
> pruned). Is this acceptable?
>
> If there is a flag
> 1) What is it?
> 2) We may not ever see the flag since the data is not analysed from the
> beginning of time and they will not show up in any cohort. Is this
> acceptable?
>
> Daily clarification: graph the percentage of users in each cohort that
> accessed Hello the day after initial use as opposed to 'the previous day
> following initial use'
Adding needinfo to client & server devs for the best way to identify 'initial use'.
Flags: needinfo?(standard8)
Flags: needinfo?(rhubscher)
Comment 17•9 years ago
From a server point of view, we can identify new authenticated users (new userID) and new sessions (new hawkID).
- A session can be linked to a user (FxA-authenticated users).
- A session can be an unauthenticated user (in that case a session == a new unauthenticated user).
The first occurrence of a new Hawk session ID will give you a new usage of Hello in a new browser (profile or user authentication).
The first occurrence of a new userID will give you a new authenticated user.
Flags: needinfo?(rhubscher)
Comment 18•9 years ago
So, described in terms of the data you are providing (https://mana.mozilla.org/wiki/display/CLOUDSERVICES/Loop+Logging):
the Hawk session ID is 'uid' and the userId == ??
Currently I am only looking at 'action' == "join" events; do I need to look at others?
Flags: needinfo?(rhubscher)
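A sketch of the event filter implied here, in Python. It assumes each log record is a dict with the `uid` and `action` fields described on the Loop Logging page, plus a hypothetical `timestamp` field holding Unix seconds (the real field name and format may differ). Only `action == "join"` records are kept, matching the "joined a Room" definition of active from bug 1248569, and records without a uid are skipped. The helper name `join_events` is illustrative.

```python
from datetime import date, datetime, timezone
from typing import Iterable, Iterator, Mapping, Tuple


def join_events(records: Iterable[Mapping]) -> Iterator[Tuple[str, date]]:
    """Yield (uid, UTC date) for each room-join record.

    Assumes dict-like records with 'uid' and 'action' keys plus a hypothetical
    'timestamp' key in Unix seconds; adjust to the real log format.
    """
    for rec in records:
        if rec.get("action") != "join" or not rec.get("uid"):
            continue  # only room joins count as activity; records without a uid cannot be attributed
        day = datetime.fromtimestamp(rec["timestamp"], tz=timezone.utc).date()
        yield rec["uid"], day
```

Bucketing by UTC date is itself a choice; a report using a different day boundary would assign late-evening joins to different days.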
Comment 19•9 years ago
According to https://github.com/mozilla-services/loop-server/blob/master/loop/middlewares.js#L75, you do not need to distinguish between HawkID and UserID, because the server already does it for you.
The first occurrence of a new uid will give you a new user (regardless of whether it is authenticated or not).
Flags: needinfo?(rhubscher)
Comment 20•9 years ago
OK, thanks.
So if I start processing the data on Mar 6, every uid seen that week will be considered 'new' and belong to that cohort. This really subverts the expectation of a "new user" cohort: as it stands, the cohort would track a group of users that all used Loop within a given week. Is this acceptable?
Flags: needinfo?(rweiss)
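The cohort shift described above can be reproduced with a small Python sketch (hypothetical helper `first_seen_cohorts`, Sunday-start weeks assumed): a user who first appeared in February lands in the Mar 6 cohort when processing starts on Mar 6, but in the Feb 7 cohort when earlier data is included.

```python
from datetime import date, timedelta
from typing import Dict, Iterable, Tuple


def first_seen_cohorts(events: Iterable[Tuple[str, date]], start: date) -> Dict[str, date]:
    """Map uid -> start of its cohort week, using only events on or after `start`."""
    first: Dict[str, date] = {}
    for uid, d in events:
        if d >= start and (uid not in first or d < first[uid]):
            first[uid] = d
    # Sunday-start weeks (Monday=0 .. Sunday=6 in date.weekday()).
    return {uid: d - timedelta(days=(d.weekday() + 1) % 7) for uid, d in first.items()}


events = [("old", date(2016, 2, 10)), ("old", date(2016, 3, 8)), ("new", date(2016, 3, 8))]
print(first_seen_cohorts(events, date(2016, 3, 6)))  # both users land in the Mar 6 cohort
print(first_seen_cohorts(events, date(2016, 2, 7)))  # "old" moves to the Feb 7 cohort
```

The first cohort after the processing start therefore mixes genuinely new users with existing ones; later cohorts are only affected by users who were inactive for the whole processed period before returning.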
Reporter
Comment 21•9 years ago
(In reply to Mike Trinkala [:trink] from comment #15)
> Is 'initial use' the first time a uid is seen in the data or is there
> some flag indicating they are a new user?
>
> If it is the first time the uid is seen in the data then the cohorts can
> change if a different slice of the data is processed (e.g. after old data is
> pruned). Is this acceptable?
>
> If there is a flag
> 1) What is it?
> 2) We may not ever see the flag since the data is not analysed from the
> beginning of time and they will not show up in any cohort. Is this
> acceptable?
If I understand correctly, this means that the retention data won't be fully accurate for existing users, but retention will be accurate for all new users?
Given the currently low number of users and high churn, this seems fine.
>
> Daily clarification: graph the percentage of users in each cohort that
> accessed Hello the day after initial use as opposed to 'the previous day
> following initial use'
Good point, now fixed in the user story
>
> The MAU is specified as a 28 day window and yet retention is 30 why the
> difference?
We moved the MAU to 28 days (it was 30 initially) to align with Mozilla's standard for tracking MAU.
If it makes more sense to track retention over 28 days (if all other retention metrics at Mozilla use 28 days) then let's do it.
Rebecca can you please confirm that 28 days is what we should look at for retention?
Flags: needinfo?(rtestard)
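One property worth noting when choosing between the two windows, in line with the user story's goal of avoiding day-of-week noise: any 28-day window contains each weekday exactly four times, while any 30-day window contains two weekdays five times. A quick Python check (the helper name `weekday_counts` is illustrative and the start date is arbitrary):

```python
from collections import Counter
from datetime import date, timedelta


def weekday_counts(start: date, window_days: int) -> Counter:
    """Count how many times each weekday (Monday=0 .. Sunday=6) falls in the window."""
    return Counter((start + timedelta(days=i)).weekday() for i in range(window_days))


for window in (28, 30):
    print(window, sorted(weekday_counts(date(2016, 4, 24), window).values()))
# 28 [4, 4, 4, 4, 4, 4, 4]
# 30 [4, 4, 4, 4, 4, 5, 5]
```

A 28-day window is also exactly four 7-day intervals, so monthly interval boundaries coincide with weekly ones for the same initial-use date.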
Reporter
Updated•9 years ago
User Story: (updated)
Comment 22•9 years ago
Flags: needinfo?(standard8)
Flags: needinfo?(rweiss)
Comment 23•9 years ago
Who needs to review this? I am more concerned that the business logic is what you need; the output format can easily be changed.
Flags: needinfo?(rtestard)
Comment 24•9 years ago
Comment 25•9 years ago
Reporter
Comment 26•9 years ago
Thanks Mike.
NI'ing Jorge, who will import it into our reporting dashboard and check the data.
Flags: needinfo?(rtestard) → needinfo?(jorge.munuera)
Comment 27•9 years ago
Mike, could we follow the format we agreed (https://bugzilla.mozilla.org/show_bug.cgi?id=1234225#c7)?
Flags: needinfo?(jorge.munuera) → needinfo?(mtrinkala)
Reporter
Comment 28•9 years ago
Changing the monthly retention description from 30 days to 28 days in the user story per Comment 21
User Story: (updated)
Comment 29•9 years ago
Jorge: sure, if you can provide me a specification for each type of report (JSON schema(s) would be ideal, as it appears there would be a different schema for each analysis interval), or you could just modify the output code directly to meet your needs.
There are several issues with the example provided, which is why it wasn't used:
- It looks like you want the keys to change for each interval type, i.e. "day X", "week X", and "month X" (or 4 weeks, as it may now be)
- Special-casing "week 0" to contain the date range is less than ideal (I would actually have expected Week 0 to be the total)
- As for the date format, would you still want a range for a day, i.e. "13 dec. 2015 - 13 dec. 2015"?
- I assume "week 1" actually represents the total number of users in the cohort
- It is hard-coded to a fixed number of intervals, so it will not show long tails, and short tails end up padded with a bunch of zero values
The currently attached output data has a more flexible schema, as it doesn't have to change for each interval specification. It is also more human-readable, since it doesn't overload interval keys and instead provides the cohort names, the total number of users per cohort, and an array of interval values (supporting long and short tails).
Flags: needinfo?(mtrinkala)
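A sketch of the flexible shape described above, in Python. The key names `cohort`, `total` and `intervals` and the helper names are hypothetical (the attached file's actual keys are not reproduced in this thread); the numbers are taken from the weekly example in comment 32, assuming its interval values are user counts.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import List


@dataclass
class CohortRow:
    cohort: str   # cohort name, e.g. its date range (hypothetical key)
    total: int    # number of users in the cohort (hypothetical key)
    # users active in interval 1, 2, ...; length varies per cohort, so no zero padding
    intervals: List[int] = field(default_factory=list)


def to_json(rows: List[CohortRow]) -> str:
    """Serialise rows; the same shape works for daily, weekly and monthly reports."""
    return json.dumps([asdict(r) for r in rows], indent=2)


rows = [
    CohortRow("24 Apr. 2016 - 30 Apr. 2016", 35017, [1715, 186, 6]),
    CohortRow("08 May. 2016 - 14 May. 2016", 17955, [368]),
]
print(to_json(rows))
```

Because the interval granularity is not encoded in the keys, one consumer can read all three reports; the report type would have to be conveyed by the file name or a separate field.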
Comment 30•9 years ago
The current monthly report attachment was run with a configuration of 30 days: https://github.com/mozilla-services/data-pipeline/blob/master/reports/loop/run/analysis/retention_monthly.cfg#L3
It looks like the reports will be re-run once we settle on an output specification; I will set it to 28 at that point.
Comment 31•9 years ago
Hi Mike, we can work with your current format (it is fine for us). We are working on using this structure in the product dashboard. We chose the other format because it is what we have in GA for the link clickers, but it is fine to use this new format.
Comment 32•9 years ago
Jorge: it was quick enough to tweak the output on the re-run (these also have the partial day of data error on Apr 24).
If this works better for you, let me know and I will upload the corrected versions in this format. FYI: since the objects are just key-value pairs, my JSON outputter doesn't order them.
So 'Week 0' is the cohort title, 'Week 1' is the total number of users in the cohort, and then you have the interval values. Daily would be 'Day #' and monthly would be 'Month #' (the re-run uses 28 days instead of 30).
[
  ...
  {
    "Week 0": "24 Apr. 2016 - 30 Apr. 2016",
    "Week 3": 186,
    "Week 1": 35017,
    "Week 4": 6,
    "Week 2": 1715
  },
  {
    "Week 3": 43,
    "Week 0": "01 May. 2016 - 07 May. 2016",
    "Week 1": 31553,
    "Week 2": 1543
  },
  {
    "Week 0": "08 May. 2016 - 14 May. 2016",
    "Week 1": 17955,
    "Week 2": 368
  }
]
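Because the outputter doesn't order keys, a consumer has to sort them. A minimal Python sketch (hypothetical helper `parse_keyed_rows`) that sorts keys by their numeric suffix, so "Week 10" sorts after "Week 9" rather than after "Week 1", treats key 0 as the cohort label and key 1 as the cohort total, and, assuming the remaining values are user counts, converts them to retention fractions:

```python
import json
import re
from typing import List, Tuple


def parse_keyed_rows(text: str) -> List[Tuple[str, int, List[float]]]:
    """Turn [{"Week 0": label, "Week 1": total, "Week 2": n, ...}, ...] into
    (label, total, [fraction of the cohort active in each later interval])."""
    out = []
    for row in json.loads(text):
        # Keys arrive unordered; sort numerically so "Week 10" follows "Week 9".
        items = sorted(row.items(), key=lambda kv: int(re.search(r"\d+$", kv[0]).group()))
        (_, label), (_, total), *rest = items  # key 0 = label, key 1 = cohort total
        out.append((label, total, [n / total for _, n in rest]))
    return out
```

Only the numeric suffix is used, so the same function reads "Day #" and "Month #" keys. It assumes no interval key is skipped. For the 24 Apr. 2016 cohort above, the first interval gives 1715 / 35017, about 4.9% of the cohort.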
Comment 33•9 years ago
Thanks Mike, don't worry, I have already updated the dashboard with the new format you generated.
Comment 34•9 years ago
Attachment #8752224 -
Attachment is obsolete: true
Comment 35•9 years ago
Attachment #8752227 -
Attachment is obsolete: true
Updated•9 years ago
Attachment #8753484 -
Attachment description: Updated for Apr 24 → Daily report updated for Apr 24
Comment 36•9 years ago
Switched from a 30 day window to a 28 day window
Updated•9 years ago
Attachment #8752229 -
Attachment is obsolete: true
Reporter
Comment 37•9 years ago
Thanks Mike, looks like data and format are good now.
What's the best way to make it available to digest and auto-update? Could we have it on https://metrics.services.mozilla.com/loop-server-dashboard/ with the other metrics?
Comment 38•9 years ago
I was informed this was a one-off, since we cannot run it on our analysis framework (a.t.m.o). However, it could be wired up to run as a cron job on a box that has the correct credentials and dump the results to an S3 bucket that is published to where you need it. If you want something official and in production (as opposed to a cron job running on a random dev box), we need to loop in Ops. Either way it is a bit of work, so a new deployment issue should be filed so we can get it into a sprint.
Reporter
Comment 39•9 years ago
Do you know how the JSON files available on https://metrics.services.mozilla.com/loop-server-dashboard/ were implemented? That approach is good enough for us (not sure if it relies on a cron job); could we apply the same update/storage method?
I am surprised this was considered a one-off, this is one of the 3 Hello KPIs that we'll be reporting so we'll have to keep it updated regularly to observe the evolution as we bring changes to the product.
Comment 41•9 years ago
Romain,
For Q2, the priority was to make sure you have the data you need for business partner meetings.
Generally speaking, my team needs to be focusing on infrastructure and tools. We need alignment on who will be doing ongoing loop reporting work -- we should probably coordinate with Rebecca and Nick about priorities & resources.
Comment 42•9 years ago
The files on https://metrics.services.mozilla.com/loop-server-dashboard/ are generated by real time processing of the data using heka. Presumably we could use :trink's new filters in a similar capacity.
Flags: needinfo?(whd)
Reporter
Comment 43•9 years ago
(In reply to Wesley Dawson [:whd] from comment #42)
> The files on https://metrics.services.mozilla.com/loop-server-dashboard/ are
> generated by real time processing of the data using heka. Presumably we
> could use :trink's new filters in a similar capacity.
That would be ideal: our data analysis currently happens in Google spreadsheets, and the JSON files from this page feed directly into these spreadsheets.
This JSON (retention) and the JSON resulting from bug 1248569 (engagement) would ideally be available on that page to bring us what we need.
Before I talk to Rebecca and Nick, do you know how much effort is needed to implement this?
Flags: needinfo?(whd)
Reporter
Comment 44•8 years ago
Guys, can you please help me understand the amount of effort needed here?
We were planning on running experiments with a door hanger to bring new users to Hello and this is dependent on having a way to measure retention for these users.
Updated•8 years ago
Flags: needinfo?(whd)
Comment 45•8 years ago
Support for Hello/Loop has been discontinued.
https://support.mozilla.org/kb/hello-status
Hence closing the old bugs. Thank you for your support.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE