Import datasets to Looker for the New Tab report
Categories: Data Platform and Tools :: General, task
Tracking: Not tracked
People: Reporter: mmccorquodale, Assigned: frank
Details
We need to build a Looker report on New Tab, including data from multiple sources. The tables we will need to be able to query are:
- contextual_services.event_aggregates
- contextual_services.topsites_click
- contextual_services.topsites_impression
- activity_stream.events
- activity_stream.sessions
- activity_stream.impression_stats_flat
- search.search_clients_daily
If there are any questions or issues, please let me know.
Comment 1 (Assignee) • 4 years ago
I'm mulling this over, and I think this should be a custom dashboard in the DUET project. We're not quite ready to deploy explores for all of these tables more generally.
Comment 2 (Assignee) • 4 years ago
Megan and I met to plan this out. The current plan is as follows:
- Enable Search data in Looker, per the Desktop Firefox Data in Looker Proposal. That should enable the data she needs from search.
- Enable activity-stream data as a new Namespace. We need input from Nan on what the Namespace(s) should be for this data; we are currently thinking either "New Tab" or "Activity Stream". We could consider breaking the activity stream data up into multiple Namespaces as well. We will make this available as mozilla-confidential, and it should be relatively easy to enable.
- Enable Contextual Services data via the bigquery oauth connection. This will limit capabilities (i.e. no caching), but initially Megan does not need to use the topsites_impression table, which is the largest one.
We hope to have the mozilla-confidential data available by Wednesday, which means we'll need to quickly iterate on the activity_stream and search data. We will follow up with the contextual_services data soon after.
Once these are deployed we will have them available as explores for everyone to use (except for contextual services, which will have the same permissions as the underlying BQ tables).
Megan, can you confirm that that matches your understanding?
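As a rough illustration of the plan above, here is a minimal Python sketch that records which dataset goes to which Namespace and connection. The `NamespacePlan` class, the connection labels `"default"` and `"bigquery-oauth"`, the "Search" Namespace name, and the `caching` flag are all hypothetical names chosen for this sketch; they are not taken from lookml-generator or from Looker's configuration.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class NamespacePlan:
    """One entry of the rollout plan (hypothetical structure, for illustration only)."""
    name: str                # Namespace label; the final names are still under discussion
    dataset: str             # BigQuery dataset named in the bug description
    tables: Tuple[str, ...]  # tables requested in the bug description
    connection: str          # hypothetical connection label
    caching: bool            # assumption: only the oauth connection lacks caching


PLAN = [
    NamespacePlan("Search", "search", ("search_clients_daily",), "default", True),
    NamespacePlan(
        "Activity Stream",
        "activity_stream",
        ("events", "sessions", "impression_stats_flat"),
        "default",
        True,
    ),
    # topsites_impression is the largest table and is not needed initially.
    NamespacePlan(
        "Contextual Services",
        "contextual_services",
        ("event_aggregates", "topsites_click", "topsites_impression"),
        "bigquery-oauth",
        False,
    ),
]


def uncached_tables(plan: List[NamespacePlan]) -> List[str]:
    """Return dataset-qualified table names whose dashboards cannot be cached."""
    return [f"{p.dataset}.{t}" for p in plan if not p.caching for t in p.tables]
```

Calling `uncached_tables(PLAN)` lists the Contextual Services tables, which are the ones whose dashboards would run without caching under this plan.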
Comment 3 (Assignee) • 4 years ago
This relies on https://github.com/mozilla/lookml-generator/issues/126.
Comment 4 (Assignee) • 4 years ago
Here is my first pass at enabling Activity Stream data in Looker:
Namespace: Activity Stream
Explores:
- Event Counts - A view on the events table, with an event count measure and client count measure (and optionally, a ranking of the top events, like we did for UJET)
- Session Counts - A view on the sessions table, with a session count measure and client count measure
- Pocket Tile Impressions - A view on impression_stats_flat, with multiple count measures: load count, impression count, click count, block count, and pocketed count. We may also consider an e.g. clients loaded count, which would be a distinct count on client_id for clients who loaded the tile.
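To make the proposed Pocket Tile Impressions measures concrete, here is a minimal Python sketch of the aggregation they describe. It is not LookML and not the actual Looker implementation. The column names `loaded`, `impressions`, `clicks`, `blocked`, and `pocketed` are assumptions about `impression_stats_flat` made for this example; only `client_id` is named in the comment above.

```python
from collections import Counter

# Assumed column name -> proposed measure name (column names are hypothetical).
COLUMN_TO_MEASURE = {
    "loaded": "load_count",
    "impressions": "impression_count",
    "clicks": "click_count",
    "blocked": "block_count",
    "pocketed": "pocketed_count",
}


def tile_measures(rows):
    """Aggregate rows (dicts) into the count measures proposed for the explore."""
    totals = Counter()
    clients_loaded = set()
    for row in rows:
        for column, measure in COLUMN_TO_MEASURE.items():
            # Treat missing or NULL values as zero.
            totals[measure] += row.get(column) or 0
        # Distinct count on client_id, restricted to clients who loaded the tile.
        if (row.get("loaded") or 0) > 0:
            clients_loaded.add(row["client_id"])
    result = {measure: totals[measure] for measure in COLUMN_TO_MEASURE.values()}
    result["clients_loaded_count"] = len(clients_loaded)
    return result


rows = [
    {"client_id": "a", "loaded": 1, "impressions": 3, "clicks": 1, "blocked": 0, "pocketed": 0},
    {"client_id": "a", "loaded": 1, "impressions": 2, "clicks": 0, "blocked": 1, "pocketed": 0},
    {"client_id": "b", "loaded": None, "impressions": 4, "clicks": 0, "blocked": 0, "pocketed": 1},
]
measures = tile_measures(rows)
```

In this sample, client `a` loaded the tile twice and client `b` never did, so the load count is 2 while the clients loaded count is 1. That difference is why the distinct count is proposed as a separate measure.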
Comment 5 (Reporter) • 4 years ago
This sounds good to me, thanks Frank.
Updated (Assignee) • 4 years ago
Comment 6 • 4 years ago
:frank - your proposal looks great to me.
Just wanted to point out that the contextual-services datasets are restricted to a certain group of folks, the group permission management is co-managed by two directors (ckarlof and atsay), and the redash access is managed separately by data ops. Not sure if Looker needs the same setup, please let us know if certain permission management is required on Looker.
Also, we definitely want to have all tables under Contextual-services available on Looker, let's use this as a starter for that work. Glad to chat about this in detail with you when you believe Looker is ready for this dataset :)
Comment 7 (Assignee) • 4 years ago
Thank you for the review, Nan!
> Just wanted to point out that the contextual-services datasets are restricted to a certain group of folks, the group permission management is co-managed by two directors (ckarlof and atsay), and the redash access is managed separately by data ops. Not sure if Looker needs the same setup, please let us know if certain permission management is required on Looker.
> Also, we definitely want to have all tables under Contextual-services available on Looker, let's use this as a starter for that work. Glad to chat about this in detail with you when you believe Looker is ready for this dataset :)
This is good to know - let's try it with the bigquery-oauth connection, for now. Once Jason is back we can talk about switching to a different auth mechanism which would let us utilize PDTs and caching to speed up dashboards. It should be fine to enable topsites_impression right now, except that we can't speed up dashboards that use it.
Comment 8 • 4 years ago
Sounds good, let's do that.
Comment 9 (Assignee) • 4 years ago
Okay, I'm closing this out, but we have follow-up work to do for the imported Namespaces:
- Contextual Services - We should enable all the tables available in that dataset
- Activity Stream - As above, all tables should be queryable
I'll track that work in new bugs/issues.