
Make Stub Attribution data available in re:dash

RESOLVED FIXED

Status

Product: Cloud Services
Component: Metrics: Pipeline
Priority: P1
Severity: normal
Status: RESOLVED FIXED
Reported: a year ago
Modified: 2 months ago

People

(Reporter: ckprice, Assigned: amiyaguchi)

Tracking

(Blocks: 3 bugs)

Firefox Tracking Flags

(Not tracked)

Details

In bug 1292360 we are adding data to the main telemetry ping for Stub Attribution[0].

We would like this data available in re:dash via a dashboard. You can find the approved mockup for the dashboard in bug 1259614 comment 4.

The telemetry bug is currently on beta and landing with Firefox 50. Ideally we'll have something rough to look at by then.

[0] https://dxr.mozilla.org/mozilla-beta/source/toolkit/components/telemetry/docs/environment.rst#62
There is some special logic[0] we'd like to apply to these values, and there is an open question of where some of it should live (notably the step that changes the medium to `organic` for search engine traffic). Could we include this logic as we're building the data set? NI kparlante

[0] pseudo code: https://docs.google.com/document/d/1DzIg19kAdtYEzS_waQNCQBfi8CSGj4cl9N8T24WSiyc/edit#
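
For illustration only, the `organic` rewrite mentioned above could be sketched roughly as follows. The linked pseudo code is not reproduced in this bug, so the search engine list, the substring match on `source`, and the function name `normalize_attribution` are all assumptions rather than the agreed logic.

```python
# Hypothetical list of search engine names; the real list lives in the
# linked pseudo code and may differ.
SEARCH_ENGINE_SOURCES = {"google", "bing", "yahoo", "duckduckgo"}


def normalize_attribution(attribution):
    """Return a copy of the attribution dict, with medium set to
    'organic' when the source looks like a search engine (assumed rule)."""
    result = dict(attribution)  # do not mutate the caller's dict
    source = (result.get("source") or "").lower()
    if any(engine in source for engine in SEARCH_ENGINE_SOURCES):
        result["medium"] = "organic"
    return result
```

Applying this while the dataset is built would keep the rule in one place instead of repeating it in every dashboard query.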
Flags: needinfo?(kparlante)

Updated

a year ago
Component: Metrics: Product Metrics → Metrics: Pipeline
Flags: needinfo?(kparlante)

Comment 2

11 months ago
:cmore, can you specify here the telemetry fields that you need in the dataset?
Flags: needinfo?(chrismore.bugzilla)

Comment 3

11 months ago
Please add points once we have enough info to do so.
Priority: -- → P3
Cmore/Katie's team to follow-up here. Cmore noted that we should have the dashboards available by January.

We also may need to have a bug to have retention data available in re:dash. Cmore to file bug.

Comment 5

11 months ago
Cmore needs this to be available by mid-January.

Updated

11 months ago
Depends on: 1311816
Blocks: 1311820

Comment 6

10 months ago
Yes, mid-Jan is fine. If it was raw data that I would have to put into a pivot table in a spreadsheet, that would suffice for a short amount of time.
Flags: needinfo?(chrismore.bugzilla)

Comment 7

8 months ago
please add points.
Assignee: nobody → amiyaguchi
Priority: P3 → P1
(Assignee)

Comment 8

8 months ago
I am planning to set the start of this stub attribution dataset to 2016-11-15, corresponding to the Firefox 50 release. Is this reasonable?
Flags: needinfo?(chrismore.bugzilla)
(Assignee)

Updated

8 months ago
Depends on: 1331082

Comment 9

8 months ago
(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #8)
> I am planning to set the start of this stub attribution dataset to
> 2016-11-15, corresponding to the Firefox 50 release. Is this reasonable?

That should be fine even though it isn't live yet. Hopefully we'll be live for a dark launch just after the Firefox 51 release.
Flags: needinfo?(chrismore.bugzilla)

Updated

8 months ago
Depends on: 1331702
(Assignee)

Comment 10

8 months ago
We should be collecting attribution data in the main summary dataset sometime this week. Since the stub attribution hasn't launched, I'm going to opt out of backfilling for this one unless necessary. I've found that there are no entries for `environment.settings.attribution` on 2016-01-01, and none of the pings over the last 5 days contain the field.
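
A check like the one described above (whether any pings carry `environment.settings.attribution`) might look like this minimal sketch. It assumes pings are already loaded as plain Python dicts; the helper names are hypothetical, and the real check would run against the telemetry data in Spark.

```python
def has_attribution(ping):
    """True if the ping carries a non-empty environment.settings.attribution."""
    environment = ping.get("environment") or {}
    settings = environment.get("settings") or {}
    return bool(settings.get("attribution"))


def count_with_attribution(pings):
    """Count how many pings in an iterable include attribution data."""
    return sum(1 for ping in pings if has_attribution(ping))
```

A count of zero over a sample of recent pings would match the observation that the field is not being sent yet.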
(Assignee)

Updated

8 months ago
No longer depends on: 1331702
(Assignee)

Comment 11

8 months ago
I am currently generating a dataset that is a relevant subset of the main summary [1]. This can only be accessed via spark and will be moved into the retention dataset soon. I created a dashboard that displays traffic from the last 7 days [2], which may be useful to track the status of the Stub Attribution launch.

In parallel, I am working on the Re:Dash dashboard. A basic version using a bar graph can be found at [3]. A bar graph was chosen as a prototype because individual attributes need to be summed up for a line plot, whereas the bar graph captures most of the useful information.

The data is generated with 100 clients and 5 weeks of data [4]. This data does not capture the volume or the long tail of potential attributes, but it is interesting enough to develop the dashboard against.

Future work (possibly in the next sprint) involves adding the stub attribution fields to the churn/retention dataset, since the hard work of slicing and dicing is done by redash/presto.

[1] https://gist.github.com/acmiyaguchi/f5eef1c11a4a5f24616bf50aeb5a8d7e/f663f82c3b7e8f8134847be9b60c6029f3172bff
[2] https://pipeline-cep.prod.mozaws.net/dashboard_output/graphs/analysis.amiyaguchi.amiyaguchi_stub_attribution_count.messages_per_hour.html
[3] https://sql.telemetry.mozilla.org/queries/2655
[4] https://gist.github.com/acmiyaguchi/d3cc2b2f9e8441a3c49a97ff09c12d53
(Assignee)

Updated

7 months ago
Depends on: 1337037

Comment 12

6 months ago
Anthony: What is the latest on the attribution report that can be filtered and pivoted across various dimensions? We need this report available before the end of the month, as it was supposed to be available well before the end of Q1. Thanks
Flags: needinfo?(amiyaguchi)
(Assignee)

Comment 13

6 months ago
The data has been available as of 2017-03-09 as per bug 1337037 comment 4.

I've created an example dashboard of this data using the top 20 elements in each dimension as separate graphs [1]. Redash is somewhat limiting for querying over various dimensions through UI elements. This could be used as a starting point for further exploration by creating custom queries for individual sub cohorts.

The first two graphs display all top 20 attributes at the same time, while the last two take advantage of the "::multi-filter" feature of redash to choose individual attributes.

[1] https://sql.telemetry.mozilla.org/dashboard/stub-attribution-aggregates?p_start_date=20170118
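
As a rough sketch of the "top 20 elements in each dimension" idea, the aggregation could be expressed in plain Python as below. The actual dashboard is built from SQL queries in redash; the row shape (one dict per client) and the name `top_values` are assumptions for illustration.

```python
from collections import Counter

# The four attribution dimensions named in this bug.
DIMENSIONS = ("source", "medium", "campaign", "content")


def top_values(rows, dimension, n=20):
    """Return the n most common values of one dimension as
    (value, share of rows that have the dimension) pairs."""
    counts = Counter(
        row[dimension] for row in rows if row.get(dimension) is not None
    )
    total = sum(counts.values())
    return [(value, count / total) for value, count in counts.most_common(n)]
```

Calling `top_values(rows, dim)` once per entry in `DIMENSIONS` yields one ranked list per graph.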
Flags: needinfo?(amiyaguchi)

Comment 14

6 months ago
(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #13)
> The data has been available as of 2017-03-09 as per bug 1337037 comment 4.
> 
> I've created an example dashboard of this data using the top 20 elements in
> each dimension as separate graphs [1]. Redash is somewhat limiting for
> querying over various dimensions through UI elements. This could be used as
> a point for further exploration by creating custom queries for individual
> sub cohorts.
> 
> The first two graphs displays all top 20 attributes at the same time, while
> the last two take advantage of the "::multi-filter" feature of redash to
> choose individual attributes.
> 
> [1] https://sql.telemetry.mozilla.org/dashboard/stub-attribution-aggregates?p_start_date=20170118

Thanks!

Another question. Do you have the same data available in tabular format if we wanted to download it all to a spreadsheet to pivot it there?
Flags: needinfo?(amiyaguchi)
(Assignee)

Comment 15

6 months ago
If you want to see the source data for the dashboard, you can click on the dashboard and then click on the table tab [1]. This is a table of week number, attribute, and percentage.

The query for the specific set of (source, medium, campaign, content) combinations would probably look most similar to the `[Stub Attribution] Retention by has_attribute` query, if you wanted it.

[1] https://sql.telemetry.mozilla.org/queries/3841?p_start_date=20170118#table
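
If the table is downloaded for use in a spreadsheet, the (week number, attribute, percentage) rows could be pivoted into one column per attribute with something like the sketch below. It assumes the rows are available as Python tuples in that order; `pivot_to_csv` is a hypothetical helper, not part of redash.

```python
import csv
import io


def pivot_to_csv(rows):
    """Pivot (week, attribute, percentage) rows into CSV text with one
    row per week and one column per attribute; missing cells stay empty."""
    weeks = sorted({week for week, _, _ in rows})
    attributes = sorted({attr for _, attr, _ in rows})
    cells = {(week, attr): pct for week, attr, pct in rows}

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["week"] + attributes)
    for week in weeks:
        writer.writerow(
            [week] + [cells.get((week, attr), "") for attr in attributes]
        )
    return out.getvalue()
```

The resulting text can be saved to a `.csv` file and opened directly in a spreadsheet.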
Flags: needinfo?(amiyaguchi)
(Assignee)

Comment 16

6 months ago
This dataset has been in production for a few months. Any follow-up issues should be filed as new bugs.
Status: NEW → RESOLVED
Last Resolved: 6 months ago
Resolution: --- → FIXED

Comment 17

6 months ago
Thanks!
(Assignee)

Updated

2 months ago
Blocks: 1381806