Should we surface "live" views in user-facing datasets?
Categories: Data Platform and Tools :: General, enhancement, P1
Tracking: Not tracked
People: Reporter: klukas, Assigned: klukas
Details: Whiteboard: [dataplatform]
In our table layout and naming docs, we define parameters around user-facing datasets and say that they "contain user-facing views on top of the tables in the corresponding stable and derived datasets."
Note that we specifically do not include any automated views on top of live datasets. We have not documented a policy around live data, but I have generally advocated that live data not be considered user-facing. The assumption was that few use cases actually require live data, and that users could be confused by the weaker deduplication contract and the performance impact of different clustering.
In practice we have needed to provision views on top of live data, but we have generally published these only in the shared-prod project, and only under *_derived datasets, as a way of avoiding advertising these to users and making a contract about how to access the live data.
It may no longer be valid to assume that "there are few use cases that actually require live data", so it may be time to introduce clearer policy and practice around this.
It's also worth noting that we have several machine consumers of live views that access data via service accounts. These depend on the stability of the names and the existence of views under _derived datasets, so the idea that these are not "user-facing" already does not hold up very well. See https://bugzilla.mozilla.org/show_bug.cgi?id=1682343 for an example of a machine user.
This may warrant a proposal to discuss what new documentation and guidance would be needed to confidently provision user-facing views on top of live data.
Comment 1 (Assignee) • 4 years ago
:jrmuizel pointed me to https://sql.telemetry.mozilla.org/dashboard/webrender which contains a variety of queries based on telemetry_live.main_v4. He also mentioned interest in having a main_1pct_live or main_nightly_live that could provide better performance.
Comment 2 (Assignee) • 3 years ago
I'm doing some info gathering this week about use cases for live data and materialized views, with the goal of writing up a compact proposal recommending a path forward here.
Comment 3 (Assignee) • 3 years ago
Proposal is up for review: https://docs.google.com/document/d/1Zn8MbwQt2ANA7LyVpTAJnINDTnoX1Q7DSgjsEngwRDw/edit#heading=h.5x0d5h95i329
Comment 4 (Assignee) • 3 years ago
The proposal is now accepted, and implementation of the docs updates is tracked in metabug https://bugzilla.mozilla.org/show_bug.cgi?id=1727071