Closed Bug 1324943 Opened 9 years ago Closed 9 years ago

Add churn/retention v2 dataset to hive

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P1)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

The production location for the desktop churn/retention dataset will be `s3://telemetry-parquet/churn/v2`. Adding the dataset to Hive is necessary to expose the Parquet partitions to Presto and Redash.
Assignee: nobody → amiyaguchi
Blocks: 1311816
Points: --- → 1
Priority: -- → P1
:sunahsuh provided me with the hive machine location and :wdh sent me the private key to access the machine. I appended the job for the churn location with the new version, which should update the partitions automatically once a day. For reference, `crontab -l` will list all parquet2hive jobs. A new job should probably not overlap with any of the existing ones, to avoid OOM (for example main_summary, which takes 6+ hours as of now, though that will change in the near future).
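As a rough illustration of the "do not overlap" advice above, here is a minimal Python sketch that checks whether a proposed daily job window collides with existing ones. Everything in it is an assumption for illustration: the start times, the 6.5 hour duration standing in for "6 hours+", and the helper names are hypothetical and are not taken from the actual crontab on the hive machine.

```python
from datetime import time

MINUTES_PER_DAY = 24 * 60


def to_minutes(t):
    """Convert a time of day to minutes since midnight."""
    return t.hour * 60 + t.minute


def windows_overlap(start_a, dur_a, start_b, dur_b):
    """Return True if two daily jobs overlap.

    Durations are in minutes. The day is treated as a circle so that a job
    starting late in the evening can collide with one starting after midnight.
    """
    a = to_minutes(start_a)
    b = to_minutes(start_b)
    gap_ab = (b - a) % MINUTES_PER_DAY  # minutes from A's start to B's next start
    gap_ba = (a - b) % MINUTES_PER_DAY  # minutes from B's start to A's next start
    return gap_ab < dur_a or gap_ba < dur_b


def conflicts(existing, start, duration):
    """List the names of existing jobs whose window overlaps the proposed one."""
    return [
        name
        for name, (job_start, job_duration) in existing.items()
        if windows_overlap(job_start, job_duration, start, duration)
    ]


# Hypothetical schedule: main_summary assumed to start at midnight and run 6.5 hours.
EXISTING_JOBS = {"main_summary": (time(0, 0), 6 * 60 + 30)}

if __name__ == "__main__":
    # A 30 minute churn job proposed at 05:00 collides; one at 07:00 does not.
    print(conflicts(EXISTING_JOBS, time(5, 0), 30))
    print(conflicts(EXISTING_JOBS, time(7, 0), 30))
```

In practice the durations would have to be measured from past runs, since cron itself only records start times.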
The data is now available for querying [1], limited to 3 months of backfill. [1] https://sql.telemetry.mozilla.org/queries/1957/source#table
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard