Closed Bug 1697285 Opened 4 years ago Closed 3 years ago

Should we surface "live" views in user-facing datasets?

Categories

(Data Platform and Tools :: General, enhancement, P1)

Points: 5

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: klukas, Assigned: klukas)

Details

(Whiteboard: [dataplatform])

In our table layout and naming docs, we define parameters around user-facing datasets and say that they:

contain user-facing views on top of the tables in the corresponding stable and derived datasets.

Note that we specifically do not include any automated views on top of live datasets. We have not documented a policy around live data, but I have generally advocated that live data not be considered user-facing. That position assumes that few use cases actually require live data, and that users could be confused by the weaker deduplication contract and by the performance impact of different clustering.
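To make the deduplication concern concrete, here is a minimal sketch of what a consumer of live data might need to do for itself. It assumes that each row carries a `document_id` field and that the same `document_id` can appear more than once in a live table; the function name and the row shape are hypothetical and only illustrate the idea.

```python
from typing import Iterable, Iterator


def dedupe_by_document_id(rows: Iterable[dict]) -> Iterator[dict]:
    """Yield only the first row seen for each document_id.

    Assumption: rows are dicts with a "document_id" key, and duplicates
    may be present because no deduplication guarantee applies upstream.
    """
    seen = set()
    for row in rows:
        doc_id = row["document_id"]
        if doc_id in seen:
            # A row with this document_id was already emitted; skip it.
            continue
        seen.add(doc_id)
        yield row


rows = [
    {"document_id": "a", "value": 1},
    {"document_id": "b", "value": 2},
    {"document_id": "a", "value": 1},  # duplicate delivery of "a"
]
unique_rows = list(dedupe_by_document_id(rows))
```

A user who counts rows without a step like this could overcount, which is the kind of surprise the current policy tries to avoid.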

In practice we have needed to provision views on top of live data, but we have generally published these only in the shared-prod project, and only under *_derived datasets. This avoids advertising the views to users and committing to a contract for how live data is accessed.

It may no longer be valid to assume that "there are few use cases that actually require live data", so it may be time to introduce clearer policy and practice around this.

It's also worth noting that we have several machine consumers of live views that access data via service accounts. These depend on the views under _derived datasets continuing to exist with stable names, so the idea that these views are not "user-facing" already does not hold up very well. See https://bugzilla.mozilla.org/show_bug.cgi?id=1682343 for an example of a machine user.

This may warrant a proposal to discuss what new documentation and guidance would be needed to confidently provision user-facing views on top of live data.

:jrmuizel pointed me to https://sql.telemetry.mozilla.org/dashboard/webrender, which contains a variety of queries based on telemetry_live.main_v4. He also mentioned interest in having a main_1pct_live or main_nightly_live that could provide better performance.

Points: --- → 5
Priority: -- → P3
Whiteboard: [data-platform-infra-wg]
See Also: → 1719893

I'm doing some info gathering this week about use cases for live data and materialized views, with the goal of writing up a compact proposal recommending a path forward here.

Assignee: nobody → jklukas
See Also: → 1727071

The proposal is now accepted, and implementation of the docs updates is tracked in metabug https://bugzilla.mozilla.org/show_bug.cgi?id=1727071

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Component: Datasets: General → General
Whiteboard: [data-platform-infra-wg] → [dataplatform]