Closed
Bug 1325667
Opened 8 years ago
Closed 8 years ago
Persist parquet data from hindsight to S3
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mreid, Assigned: robotblake)
Details
(Whiteboard: [SvcOps])
Given an output plugin that converts data directly to Parquet, we would like to upload that data to S3 so it can be made available in re:dash.
That should involve:
1. Decide upon an S3 location. I propose s3://telemetry-parquet/data-lake/<docType>/v<targetVersion>/<partitions>/somefile.parquet
2. Set up a data lake loader (similar to our data warehouse loader that uploads Heka-framed data) that writes files locally.
3. Set up the file uploader used on the existing edge nodes to upload and prune completed Parquet files.
4. Run parquet2hive on the "data-lake" dir so that new partitions are made available to re:dash.
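The S3 layout proposed in step 1 can be sketched as a pair of helpers: one that builds an object key from a docType, target version and partition values, and one that parses a key back into those parts, which is the information a tool like parquet2hive needs to discover new partitions. This is a minimal sketch under assumptions: the Hive-style `name=value` partition segments, the `submission_date` partition name and the `build_key`/`parse_key` helpers are hypothetical illustrations, not the pipeline's actual code.

```python
from typing import Dict, Tuple

BUCKET = "telemetry-parquet"  # bucket proposed in step 1
ROOT = "data-lake"            # top-level prefix proposed in step 1


def build_key(doc_type: str, target_version: int,
              partitions: Dict[str, str], filename: str) -> str:
    """Build data-lake/<docType>/v<targetVersion>/<partitions>/<file>.

    Assumption: partitions are written as Hive-style "name=value"
    segments, in the (insertion) order given by the caller.
    """
    parts = [ROOT, doc_type, f"v{target_version}"]
    parts += [f"{name}={value}" for name, value in partitions.items()]
    parts.append(filename)
    return "/".join(parts)


def parse_key(key: str) -> Tuple[str, int, Dict[str, str], str]:
    """Split a key produced by build_key back into its components."""
    segments = key.split("/")
    if len(segments) < 4 or segments[0] != ROOT or not segments[2].startswith("v"):
        raise ValueError(f"unexpected key layout: {key}")
    doc_type = segments[1]
    version = int(segments[2][1:])
    # Every segment between the version and the filename is a partition.
    partitions = dict(seg.split("=", 1) for seg in segments[3:-1])
    return doc_type, version, partitions, segments[-1]


if __name__ == "__main__":
    # Illustrative values only.
    key = build_key("core", 1, {"submission_date": "20170101"}, "part-0.parquet")
    print(f"s3://{BUCKET}/{key}")
    print(parse_key(key))
```

Keeping the docType and version as fixed path segments means each `<docType>/v<targetVersion>` directory can map to one table, with the remaining segments treated as partitions.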
Comment 1•8 years ago
Please work with Trink for more info as needed, and update the scope.
Assignee: nobody → whd
Priority: -- → P2
Comment 2•8 years ago
I hadn't seen this bug, but all the steps I'm responsible for (1-3) are complete. The data currently goes to the s3://net-mozaws-prod-us-west-2-pipeline-data/*-parquet prefixes, but this can easily be changed (s3://telemetry-parquet is hosted in dev, so I am avoiding writing to it from prod). We're currently performing direct-to-parquet conversion for core (bug #1333203) and testpilot (bug #1333206) pings, and others can be added as needed. This runs on the regular DWL (data warehouse loader).
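The upload-and-prune behaviour from step 3 can be sketched as a loop that walks the loader's local output directory, uploads each completed Parquet file under a prefix that mirrors its relative path, and deletes the local copy only after the upload returns. This is a minimal sketch, not the edge-node file uploader itself: `upload_and_prune` is a hypothetical helper, the `upload` callable stands in for whatever S3 client is used, and it assumes the loader gives in-progress files a different suffix (e.g. `.tmp`) so that only finished files end in `.parquet`.

```python
import os
from typing import Callable, List


def upload_and_prune(local_dir: str, prefix: str,
                     upload: Callable[[str, str], None],
                     suffix: str = ".parquet") -> List[str]:
    """Upload completed files under local_dir to prefix, then delete them.

    Assumption: only files ending in `suffix` are complete; anything
    still being written uses another suffix and is skipped.
    """
    uploaded = []
    for root, _dirs, files in os.walk(local_dir):
        for name in sorted(files):
            if not name.endswith(suffix):
                continue  # still being written (or not a Parquet file)
            path = os.path.join(root, name)
            # Keep the partition directories in the object key.
            rel = os.path.relpath(path, local_dir).replace(os.sep, "/")
            key = f"{prefix.rstrip('/')}/{rel}"
            # If upload raises, the local file is kept for the next run.
            upload(path, key)
            os.remove(path)
            uploaded.append(key)
    return sorted(uploaded)
```

With boto3, `upload` could be something like `lambda path, key: s3.upload_file(path, bucket, key)` for an S3 client `s3`; the bucket and prefix would be whichever of the locations discussed above is chosen.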
I believe (4) is blocked by bug #1333066, so I'm marking that as a blocker.
Depends on: 1333066
Updated•8 years ago
Whiteboard: [SvcOps]
Comment 3•8 years ago
Per bug #1344349, we've adopted a standard versioning policy for direct-to-parquet data, and :robotblake is working on the parquet2hive (p2h) import, which will automatically import new direct-to-parquet datasets into Presto. When that's done, I think this bug can be closed.
Assignee: whd → bimsland
Status: NEW → ASSIGNED
Points: --- → 1
Updated•8 years ago
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Product: Cloud Services → Cloud Services Graveyard