Closed Bug 1711955 Opened 4 years ago Closed 4 years ago

Import datasets to Looker for the New Tab report

Categories

(Data Platform and Tools :: General, task)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mmccorquodale, Assigned: frank)

References

Details

We need to build a Looker report on New Tab, including data from multiple sources. The tables we will need to be able to query are:

  • contextual_services.event_aggregates
  • contextual_services.topsites_click
  • contextual_services.topsites_impression
  • activity_stream.events
  • activity_stream.sessions
  • activity_stream.impression_stats_flat
  • search.search_clients_daily

If there are any questions or issues, please let me know.

I'm mulling this over, and I think this should be a custom dashboard in the DUET project. We're not quite ready to deploy explores for all of these tables more generally.

Megan and I met to plan this out. The current plan is as follows:

  1. Enable Search data in Looker, per the Desktop Firefox Data in Looker Proposal. That should enable the data she needs from search.
  2. Enable activity-stream data as a new Namespace. We need input from Nan on what the Namespace(s) should be for this data; we are currently thinking either "New Tab" or "Activity Stream". We could consider breaking the activity stream data up into multiple Namespaces as well. We will make this available as mozilla-confidential, and should be relatively easy to enable.
  3. Enable Contextual Services data via the bigquery oauth connection. This will limit capabilities (i.e. no caching), but initially Megan does not need to use the topsites_impression table, which is the largest one.

We hope to have the mozilla-confidential data available by Wednesday, which means we'll need to quickly iterate on the activity_stream and search data. We will follow-up with the contextual_services data soon after.

Once these are deployed we will have them available as explores for everyone to use (except for contextual services, which will have the same permissions as the underlying BQ tables).

Megan, can you confirm that that matches your understanding?

Here is my first pass at enabling Activity Stream data in Looker:

Namespace: Activity Stream
Explores:

  • Event Counts - A view on the events table, with an event count measure and client count measure (and optionally, a ranking of the top events, like we did for UJET)
  • Session Counts - A view on the sessions table, with a session count measure and client count measure
  • Pocket Tile Impressions - A view on impression_stats_flat, with multiple count measures: load count, impression count, click count, block count, and pocketed count. We may also consider an e.g. clients loaded count, which would be a distinct count on client_id for clients who loaded the tile.

This sounds good to me, thanks Frank.

Depends on: 1712956
Assignee: nobody → fbertsch

:frank - your proposal looks great to me.

Just wanted to point out that the contextual-services datasets are restricted to a certain group of folks, the group permission management is co-managed by two directors (ckarlof and atsay), and the redash access is managed separately by data ops. Not sure if Looker needs the same setup, please let us know if certain permission management is required on Looker.

Also, we definitely want to have all tables under Contextual-services available on Looker, let's use this as a starter for that work. Glad to chat about this in detail with you when you believe Looker is ready for this dataset :)

Thank you for the review, Nan!

Just wanted to point out that the contextual-services datasets are restricted to a certain group of folks, the group permission management is co-managed by two directors (ckarlof and atsay), and the redash access is managed separately by data ops. Not sure if Looker needs the same setup, please let us know if certain permission management is required on Looker.

Also, we definitely want to have all tables under Contextual-services available on Looker, let's use this as a starter for that work. Glad to chat about this in detail with you when you believe Looker is ready for this dataset :)

This is good to know - let's try it with the bigquery-oauth connection, for now. Once Jason is back we can talk about switching to a different auth mechanism which would let us utilize PDTs and caching to speed up dashboards. It should be fine to enable topsites_impression right now, except that we can't speed up dashboards that use it.

Sounds good, let's do that.

Okay, I'm closing this out, but we have follow-up work to do for the imported Namespaces:
Contextual Services - We should enable all the tables available in that dataset
Activity Stream - As above, all tables should be queryable

I'll track that work in new bugs/issues.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.