Closed Bug 1309290 Opened 8 years ago Closed 7 years ago

Make Stub Attribution data available in re:dash

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ckprice, Assigned: amiyaguchi)

References

(Blocks 1 open bug)

Details

In bug 1292360 we are adding data to the main telemetry ping for Stub Attribution[0].

We would like this data available in re:dash via a dashboard. You can find the approved mockup for the dashboard in bug 1259614 comment 4.

The telemetry bug is currently on beta and landing with Firefox 50. Ideally we'll have something rough to look at by then.

[0] https://dxr.mozilla.org/mozilla-beta/source/toolkit/components/telemetry/docs/environment.rst#62
There is some special logic[0] we'd like to apply to these values, and there is an open question of where to add some of it (notably the piece that changes the medium to `organic` for search engine traffic). Could we include this logic as we're building the data set? NI kparlante

[0] pseudo code: https://docs.google.com/document/d/1DzIg19kAdtYEzS_waQNCQBfi8CSGj4cl9N8T24WSiyc/edit#
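
For illustration only, here is a minimal sketch of the kind of rule described above, written in Python. It is not the logic from the linked pseudo code, which is not reproduced in this bug. The `SEARCH_ENGINES` set, the assumption that `source` holds a hostname, and the function name `normalize_medium` are all hypothetical.

```python
# Hypothetical set of search-engine names; the real list would come from
# the pseudo code document, which is not reproduced here.
SEARCH_ENGINES = {"google", "bing", "yahoo", "duckduckgo"}


def normalize_medium(attribution):
    """Return a copy of an attribution record with medium set to
    'organic' when the source looks like a search engine.

    Assumes `attribution` is a dict with optional 'source' and 'medium'
    keys, and that 'source' is a hostname such as 'www.google.com'.
    """
    result = dict(attribution)
    source = (result.get("source") or "").lower()
    # Compare each hostname label against the assumed engine names.
    if set(source.split(".")) & SEARCH_ENGINES:
        result["medium"] = "organic"
    return result
```

Applying this while the dataset is built, rather than in each dashboard query, would keep every downstream query consistent, which is the trade-off the question above is asking about.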
Flags: needinfo?(kparlante)
Component: Metrics: Product Metrics → Metrics: Pipeline
Flags: needinfo?(kparlante)
:cmore, can you specify here the telemetry fields that you need in the dataset?
Flags: needinfo?(chrismore.bugzilla)
please add points once we have enough info to do so
Priority: -- → P3
Cmore/Katie's team to follow-up here. Cmore noted that we should have the dashboards available by January.

We also may need to have a bug to have retention data available in re:dash. Cmore to file bug.
Cmore needs this to be available mid jan
Depends on: 1311816
Yes, mid-Jan is fine. If it was raw data that I would have to put into a pivot table in a spreadsheet, that would suffice for a short amount of time.
Flags: needinfo?(chrismore.bugzilla)
please add points.
Assignee: nobody → amiyaguchi
Priority: P3 → P1
I am planning to set the start of this stub attribution dataset to 2016-11-15, corresponding to the Firefox 50 release. Is this reasonable?
Flags: needinfo?(chrismore.bugzilla)
Depends on: 1331082
(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #8)
> I am planning to set the start of this stub attribution dataset to
> 2016-11-15, corresponding to the Firefox 50 release. Is this reasonable?

That should be fine even though it isn't live yet. Hopefully we'll be live for a dark launch just after the Firefox 51 release.
Flags: needinfo?(chrismore.bugzilla)
Depends on: 1331702
We should be collecting attribution data in the main summary dataset sometime this week. Since the stub attribution hasn't launched, I'm going to opt out of backfilling for this one unless necessary. I've found that there are no entries for `environment.settings.attribution` on 2016-01-01, and none of the pings over the last 5 days contain the field.
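
A rough sketch of the kind of presence check described above, assuming each ping has already been parsed into a nested Python dict. The helper names are hypothetical; only the `environment.settings.attribution` path comes from this bug.

```python
def has_attribution(ping):
    """Return True if the ping carries a non-empty
    environment.settings.attribution section.

    Missing or null intermediate sections are treated as absent.
    """
    environment = ping.get("environment") or {}
    settings = environment.get("settings") or {}
    return bool(settings.get("attribution"))


def count_with_attribution(pings):
    """Count how many pings in an iterable contain attribution data."""
    return sum(1 for ping in pings if has_attribution(ping))
```

A count of zero over a sample of recent pings is consistent with the field not being populated yet.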
No longer depends on: 1331702
I am currently generating a dataset that is a relevant subset of the main summary [1]. This can only be accessed via spark and will be moved into the retention dataset soon. I created a dashboard that displays traffic from the last 7 days [2], which may be useful to track the status of the Stub Attribution launch.

In parallel, I am working on the Re:Dash dashboard. A basic version using a bar graph can be found at [3]. A bar graph was chosen as a prototype because individual attributes need to be summed up for a line plot, whereas the bar graph captures most of the useful information.

The data is generated with 100 clients and 5 weeks of data [4]. This data does not capture the volume or the long tail of potential attributes, but it is interesting enough to develop the dashboard against.

Future work (possibly in the next sprint) involves adding the stub attribution fields to the churn/retention dataset, since the hard work of slicing and dicing is done by redash/presto.

[1] https://gist.github.com/acmiyaguchi/f5eef1c11a4a5f24616bf50aeb5a8d7e/f663f82c3b7e8f8134847be9b60c6029f3172bff
[2] https://pipeline-cep.prod.mozaws.net/dashboard_output/graphs/analysis.amiyaguchi.amiyaguchi_stub_attribution_count.messages_per_hour.html
[3] https://sql.telemetry.mozilla.org/queries/2655
[4] https://gist.github.com/acmiyaguchi/d3cc2b2f9e8441a3c49a97ff09c12d53
Depends on: 1337037
Anthony: What is the latest on the attribution report that is able to be filtered and pivoted in various dimensions? We need this report available before the end of the month as this was supposed to be available well before the end of Q1. Thanks
Flags: needinfo?(amiyaguchi)
The data has been available as of 2017-03-09 as per bug 1337037 comment 4.

I've created an example dashboard of this data using the top 20 elements in each dimension as separate graphs [1]. Redash is somewhat limited when it comes to querying over various dimensions through UI elements. The dashboard could be used as a starting point for further exploration by creating custom queries for individual sub-cohorts.

The first two graphs display all top 20 attributes at the same time, while the last two take advantage of the "::multi-filter" feature of redash to choose individual attributes.

[1] https://sql.telemetry.mozilla.org/dashboard/stub-attribution-aggregates?p_start_date=20170118
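
To make "top 20 elements in each dimension" concrete, here is a small Python sketch. It is not the dashboard query itself; it assumes the rows are dicts keyed by the four attribution fields, and the function name `top_values` is hypothetical.

```python
from collections import Counter

# The attribution dimensions discussed in this bug.
DIMENSIONS = ("source", "medium", "campaign", "content")


def top_values(rows, dimension, n=20):
    """Return the n most frequent values of one dimension as
    (value, count) pairs, skipping rows where the value is missing."""
    counts = Counter(
        row[dimension] for row in rows if row.get(dimension) is not None
    )
    return counts.most_common(n)


def top_values_by_dimension(rows, n=20):
    """Compute the top-n list for every attribution dimension."""
    rows = list(rows)  # allow a generator to be scanned once per dimension
    return {dim: top_values(rows, dim, n) for dim in DIMENSIONS}
```

Limiting each graph to the most frequent values keeps the long tail of rare attributes from cluttering the plots.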
Flags: needinfo?(amiyaguchi)
(In reply to Anthony Miyaguchi [:amiyaguchi] from comment #13)
> The data has been available as of 2017-03-09 as per bug 1337037 comment 4.
> 
> I've created an example dashboard of this data using the top 20 elements in
> each dimension as separate graphs [1]. Redash is somewhat limiting for
> querying over various dimensions through UI elements. This could be used as
> a point for further exploration by creating custom queries for individual
> sub cohorts.
> 
> The first two graphs displays all top 20 attributes at the same time, while
> the last two take advantage of the "::multi-filter" feature of redash to
> choose individual attributes.
> 
> [1] https://sql.telemetry.mozilla.org/dashboard/stub-attribution-aggregates?p_start_date=20170118

Thanks!

Another question. Do you have the same data available in tabular format if we wanted to download it all to a spreadsheet to pivot it there?
Flags: needinfo?(amiyaguchi)
If you want to be able to see the source data for the dashboard, you can click on the dashboard and then click on the table tab. [1] This is a table of week number, attribute, and percentage.

The query for the specific set of (source, medium, campaign, content) combinations would probably look most similar to the `[Stub Attribution] Retention by has_attribute` query, if you wanted it.

[1] https://sql.telemetry.mozilla.org/queries/3841?p_start_date=20170118#table
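
If the table is downloaded and needs reshaping before it goes into a spreadsheet, something like the following Python sketch could pivot it. It assumes the rows arrive as (week, attribute, percentage) tuples, matching the columns described above; the function name `pivot_to_csv` is hypothetical.

```python
import csv
import io


def pivot_to_csv(rows):
    """Pivot (week, attribute, percentage) rows into CSV text with one
    row per week and one column per attribute. Missing combinations
    are left as empty cells."""
    rows = list(rows)
    weeks = sorted({week for week, _, _ in rows})
    attributes = sorted({attribute for _, attribute, _ in rows})
    cells = {(week, attribute): pct for week, attribute, pct in rows}

    out = io.StringIO()
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["week"] + attributes)
    for week in weeks:
        writer.writerow(
            [week] + [cells.get((week, attribute), "") for attribute in attributes]
        )
    return out.getvalue()
```

The resulting text can be saved to a `.csv` file and opened directly in a spreadsheet.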
Flags: needinfo?(amiyaguchi)
This dataset has been in production for a few months. Any follow-up issues should be filed as new bugs.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Thanks!
Blocks: 1381806
Product: Cloud Services → Cloud Services Graveyard