Closed Bug 1506677 Opened 7 years ago Closed 7 years ago

Convert one day of Heka-protobuf main pings into newline-delimited json

Categories

(Data Platform and Tools :: General, enhancement, P1)

enhancement
Points:
1

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: amiyaguchi, Assigned: amiyaguchi)

References

Details

Attachments

(1 file)

We want 1 day of data from the telemetry pipeline copied into GCP for performance testing. The basic procedure will be the following:
* Use the moztelemetry API to read an RDD[HekaMessage] and write out an RDD[JSON]
* Store the data in a temporary staging area
* Copy the data over to GCS

The day will be a weekday to get a reasonably sized load. In addition, it may be useful to generate 90 days of nightly data in the same format for further testing.
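To illustrate the target format only, here is a minimal Python sketch of writing records as gzip-compressed newline-delimited JSON, one object per line. It is not the conversion job itself, which per the procedure above reads Heka messages through the moztelemetry API on Spark. The function names `write_ndjson_gzip` and `read_ndjson_gzip` and the sample record fields are hypothetical, and the sketch assumes each ping has already been decoded into a plain dict.

```python
import gzip
import io
import json


def write_ndjson_gzip(records, fileobj):
    """Write an iterable of dicts as gzip-compressed newline-delimited JSON.

    Hypothetical helper: assumes each record is already a JSON-serializable dict.
    Returns the number of records written.
    """
    count = 0
    with gzip.GzipFile(fileobj=fileobj, mode="wb") as gz:
        for record in records:
            # json.dumps escapes embedded newlines, so each record stays on one line.
            line = json.dumps(record, separators=(",", ":"), sort_keys=True)
            gz.write(line.encode("utf-8") + b"\n")
            count += 1
    return count


def read_ndjson_gzip(fileobj):
    """Read gzip-compressed newline-delimited JSON back into a list of dicts."""
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        return [json.loads(line) for line in gz if line.strip()]


if __name__ == "__main__":
    # Made-up sample records; real main pings are much larger.
    sample = [
        {"clientId": "a", "payload": {"x": 1}},
        {"clientId": "b", "note": "line\nbreak"},
    ]
    buf = io.BytesIO()
    written = write_ndjson_gzip(sample, buf)
    buf.seek(0)
    print(written, read_ndjson_gzip(buf) == sample)
```

Keeping one JSON object per line means downstream readers can split the data on newlines without parsing a whole file at once. Note that a single gzip file is not splittable, which is one reason to write many partitions rather than one large file.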
Blocks: 1506710
Blocks: 1506711
This converts a day of data for 20181101. It takes 9 hours to process, resulting in a 1 TiB dataset with 160 partitions. It is stored in the following S3 location:
> s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/20181101-main-json-gzip
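As a rough sanity check on the figures above, the following Python sketch derives the average partition size and the average output rate. It assumes the 1 TiB figure is the size of the written dataset, that the 160 partitions are roughly equal in size, and that the 9 hours cover the whole job. None of these is confirmed in this bug, and the function names are hypothetical.

```python
GIB = 1024 ** 3
MIB = 1024 ** 2
TIB = 1024 ** 4


def average_partition_gib(dataset_bytes, partitions):
    """Average partition size in GiB, assuming evenly sized partitions."""
    return dataset_bytes / partitions / GIB


def average_output_mib_per_s(dataset_bytes, hours):
    """Average output rate in MiB/s over the whole run."""
    return dataset_bytes / (hours * 3600) / MIB


if __name__ == "__main__":
    size = 1 * TIB  # reported dataset size
    print(round(average_partition_gib(size, 160), 2))    # 6.4 GiB per partition
    print(round(average_output_mib_per_s(size, 9), 1))   # about 32.4 MiB/s
```

Under these assumptions each partition averages about 6.4 GiB, and the job writes about 32.4 MiB/s on average.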
Attachment #9024842 - Flags: review?(fbertsch)
The data is now also available under the following bucket:
> gs://bug-1506674/amiyaguchi/20181101-main-json-gzip/
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment on attachment 9024842: Bug 1506677 - main ping conversion (Scala)
LGTM
Attachment #9024842 - Flags: review?(fbertsch) → review+