Update Main Summary tutorial to include usage for sample_id column

Status: NEW
Product: Data Platform and Tools
Component: Documentation and Knowledge Repo (RTMO)
Priority: P2
Severity: normal
Reported: a year ago
Modified: 10 months ago
Reporter: amiyaguchi
Assigned to: amiyaguchi


Description

a year ago
The main summary is fairly large, so it's useful to work with a smaller sample of the data when a project relies heavily on the dataset. Documentation on how to use the sample_id attribute for this is sparse.

The sample_id field is documented in telemetry-batch-view repository. [1]
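A minimal sketch of how a sample_id could be derived, assuming (from our reading of the telemetry-batch-view docs [1], not verified here) that it is `crc32(client_id) % 100`. The helper name `sample_id_for` and the example client ID are hypothetical. The property that matters for sampling is that the bucket depends only on `client_id`, so a fixed set of sample_ids selects the same clients on every day.

```python
import zlib


def sample_id_for(client_id: str) -> int:
    """Map a client_id to a bucket in [0, 99].

    Assumption: sample_id = crc32(utf-8 bytes of client_id) % 100.
    """
    # In Python 3, zlib.crc32 returns an unsigned 32-bit integer.
    return zlib.crc32(client_id.encode("utf-8")) % 100


# The bucket depends only on client_id, so the same client falls into the
# same sample_id on every submission date (hypothetical example ID).
example_client = "c0ffee00-0000-4000-8000-000000000000"
assert sample_id_for(example_client) == sample_id_for(example_client)
print(sample_id_for(example_client))
```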

The Main Summary Tutorial [2] should be updated to explain the nuances of obtaining a small, representative sample of the data, covering Parquet partitioning and including short code snippets that take advantage of the sample_id field.
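The Parquet partitioning point can be illustrated with plain path construction. This sketch assumes a Hive-style layout of `submission_date_s3=YYYYMMDD/sample_id=N` under a dataset root; the root `s3://example-bucket/main_summary` and the helper `partition_paths` are hypothetical placeholders, not the real dataset location.

```python
from datetime import date, timedelta
from typing import List

# Hypothetical dataset root; substitute the real location of main_summary.
DATASET_ROOT = "s3://example-bucket/main_summary"


def partition_paths(start: date, days: int, sample_ids: List[int]) -> List[str]:
    """Build partition directory paths for the requested days and sample_ids.

    Assumes a layout of <root>/submission_date_s3=YYYYMMDD/sample_id=N/.
    """
    paths = []
    for offset in range(days):
        day = (start + timedelta(days=offset)).strftime("%Y%m%d")
        for sid in sample_ids:
            paths.append(f"{DATASET_ROOT}/submission_date_s3={day}/sample_id={sid}")
    return paths


# One sample_id (roughly 1% of clients) for a single week.
for p in partition_paths(date(2017, 1, 1), 7, [42]):
    print(p)
```

Reading only these directories (for example with PySpark's `spark.read.option("basePath", root).parquet(*paths)`, which keeps the partition columns) avoids touching the remaining partitions; filtering a DataFrame read from the root on `sample_id` should let Spark prune partitions in a similar way.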

Showing the distribution of sample_id values within a month or a day would help explain how it works.
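To suggest what that distribution might look like, the sketch below hashes synthetic, randomly generated UUID client IDs with the same assumed `crc32(client_id) % 100` rule and counts clients per bucket. The data is simulated, not taken from main_summary; with a well-mixed hash, each bucket should hold roughly 1% of clients.

```python
import random
import uuid
import zlib
from collections import Counter


def simulate_distribution(n_clients: int, seed: int = 0) -> Counter:
    """Count synthetic clients per sample_id bucket (assumed crc32 % 100 rule)."""
    rng = random.Random(seed)  # seeded so the simulation is reproducible
    counts = Counter()
    for _ in range(n_clients):
        # Random version-4 UUID string standing in for a real client_id.
        client_id = str(uuid.UUID(int=rng.getrandbits(128), version=4))
        counts[zlib.crc32(client_id.encode("utf-8")) % 100] += 1
    return counts


counts = simulate_distribution(100_000)
# With 100,000 clients each bucket should be close to 1,000.
print(min(counts.values()), max(counts.values()))
```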

The report should be added to RTMO.

[1] https://github.com/mozilla/telemetry-batch-view/blob/master/docs/MainSummary.md
[2] https://gist.github.com/mreid-moz/518f7515aac54cd246635c333683ecce
(Assignee)

Updated

a year ago
Assignee: nobody → amiyaguchi
Points: --- → 1
(Assignee)

Comment 1

a year ago
Created attachment 8824270 [details]
How to use main_summary.sample_id

I've written a gist that shows the properties of the sample_id field and how to use it in PySpark to select a subset of the main summary.
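Once a subset has been selected, counts computed on it need to be scaled back up. A minimal sketch, assuming clients are spread uniformly over 100 sample_id buckets: a count observed in `k` buckets is multiplied by `100 / k`. The record layout (dicts with `client_id` and `sample_id` keys), the toy data, and the function names are hypothetical and are not taken from the gist.

```python
from typing import Iterable, List, Mapping, Set

N_BUCKETS = 100  # assumed number of distinct sample_id values (0-99)


def filter_by_sample(records: Iterable[Mapping], sample_ids: Set[int]) -> List[Mapping]:
    """Keep only records whose sample_id (int or numeric string) is in the subset."""
    return [r for r in records if int(r["sample_id"]) in sample_ids]


def estimate_total(sample_count: int, n_sample_ids: int) -> float:
    """Scale a count from n_sample_ids buckets up to all N_BUCKETS buckets."""
    if not 0 < n_sample_ids <= N_BUCKETS:
        raise ValueError("n_sample_ids must be between 1 and N_BUCKETS")
    return sample_count * N_BUCKETS / n_sample_ids


# Tiny made-up dataset for illustration only.
records = [
    {"client_id": "a", "sample_id": "3"},
    {"client_id": "b", "sample_id": "42"},
    {"client_id": "c", "sample_id": "42"},
]
subset = filter_by_sample(records, {42})
distinct_clients = len({r["client_id"] for r in subset})
print(estimate_total(distinct_clients, 1))  # 2 clients in 1 bucket -> 200.0
```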
(Assignee)

Comment 2

a year ago
Comment on attachment 8824270 [details]
How to use main_summary.sample_id

Link to gist on how to use main_summary.sample_id

https://gist.github.com/acmiyaguchi/0b3772807f146575420a9e157b10fbb9
Attachment #8824270 - Attachment is obsolete: true

Updated

a year ago
Priority: -- → P2

Updated

10 months ago
Component: Metrics: Pipeline → Documentation and Knowledge Repo (RTMO)
Product: Cloud Services → Data Platform and Tools

Comment 3

10 months ago
It would be great if we could move this to a cookbook in the gitbook!