Closed
Bug 1506677
Opened 7 years ago
Closed 7 years ago
Convert one day of Heka-protobuf main pings into newline-delimited json
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: amiyaguchi, Assigned: amiyaguchi)
References
Details
Attachments
(1 file)
We want 1 day of data from the telemetry pipeline copied into GCP for performance testing.
The basic procedure will be the following:
* Use the moztelemetry API to read an RDD[HekaMessage] and write out an RDD[JSON]
* Store data in temporary staging area
* Copy data over to GCS
The day will be a weekday, to get a reasonably sized load. In addition, it may be useful to generate 90 days of nightly data in the same format for further testing.
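The output format named in the title, newline-delimited JSON (gzip-compressed, judging by the dataset name), can be sketched as follows. This is a minimal standalone illustration, not the Scala job attached to this bug: it does not use moztelemetry or Spark, and the helper names `write_ndjson_gzip` and `read_ndjson_gzip` are hypothetical.

```python
import gzip
import json
from typing import Iterable, Iterator


def write_ndjson_gzip(records: Iterable[dict], path: str) -> int:
    """Write each record as one JSON object per line into a gzip file.

    Hypothetical helper: stands in for the per-partition output step only.
    Returns the number of records written.
    """
    count = 0
    with gzip.open(path, "wt", encoding="utf-8") as out:
        for record in records:
            # json.dumps escapes newlines inside strings, so every record
            # occupies exactly one physical line.
            out.write(json.dumps(record, separators=(",", ":")))
            out.write("\n")
            count += 1
    return count


def read_ndjson_gzip(path: str) -> Iterator[dict]:
    """Read the file back, yielding one decoded record per non-empty line."""
    with gzip.open(path, "rt", encoding="utf-8") as src:
        for line in src:
            if line.strip():
                yield json.loads(line)
```

Because each line is an independent JSON document, a consumer can decode the file one record at a time without holding the whole dataset in memory.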
Comment 1 • 7 years ago (Assignee)
This converts one day of data, for 20181101. Processing takes 9 hours and results in a 1 TiB dataset with 160 partitions. It is stored in the following S3 location:
> s3://net-mozaws-prod-us-west-2-pipeline-analysis/amiyaguchi/20181101-main-json-gzip
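For sizing the performance tests, the figures above imply a rough per-partition size and output rate. The arithmetic below is only a back-of-the-envelope sketch: it assumes the reported 1 TiB and 9 hours are approximate, that the 160 partitions are evenly sized, and the function names are hypothetical.

```python
def partition_size_gib(total_bytes: int, partitions: int) -> float:
    """Average partition size in GiB, assuming evenly sized partitions."""
    return total_bytes / partitions / 1024**3


def output_rate_mib_s(total_bytes: int, hours: float) -> float:
    """Average output rate in MiB/s over the whole run."""
    return total_bytes / 1024**2 / (hours * 3600)


TOTAL_BYTES = 1024**4  # 1 TiB, as reported in the comment
PARTITIONS = 160
HOURS = 9

print(f"{partition_size_gib(TOTAL_BYTES, PARTITIONS):.1f} GiB per partition")
print(f"{output_rate_mib_s(TOTAL_BYTES, HOURS):.1f} MiB/s average output rate")
```

That works out to roughly 6.4 GiB per partition and about 32 MiB/s of output averaged over the run.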
Attachment #9024842 - Flags: review?(fbertsch)
Comment 2 • 7 years ago (Assignee)
The data is now also available under the following bucket:
> gs://bug-1506674/amiyaguchi/20181101-main-json-gzip/
Updated • 7 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Comment 3 • 7 years ago
Comment on attachment 9024842 [details]
Bug 1506677 - main ping conversion (Scala)
LGTM
Attachment #9024842 - Flags: review?(fbertsch) → review+