Closed
Bug 922745
Opened 11 years ago
Closed 10 years ago
Write Hadoop job to convert and export historic Telemetry data to AWS format
Categories
(Mozilla Metrics :: Hadoop/HBase Operations, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Unreviewed
People
(Reporter: mreid, Assigned: jonasfj)
Details
In order to export the historic data from mango to AWS, we need to implement a map/reduce version of the code that processes incoming submissions in the AWS-based Telemetry data pipeline.
To summarize:
- Read json payload
- Convert to validated format
- Write to partitioned directory structure
- lzma/xz compress
- Export to S3
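A minimal local sketch of the steps summarized above may help make the shape of the job concrete. It is not the actual pipeline code: the partition fields (`appName`, `appUpdateChannel`), the `info` section check used as "validation", and the output file name are assumptions chosen for illustration, and the final S3 upload is left out because it depends on credentials and a client library.

```python
import json
import lzma
import os

# Hypothetical partition dimensions, assumed to live under payload["info"].
PARTITION_FIELDS = ("appName", "appUpdateChannel")


def convert(raw_line):
    """Parse one JSON payload; return (partition_dims, record) or None if invalid."""
    try:
        payload = json.loads(raw_line)
    except ValueError:
        return None
    # Stand-in validation: require an "info" object (an assumption, not the real schema).
    if not isinstance(payload, dict) or not isinstance(payload.get("info"), dict):
        return None
    info = payload["info"]
    # Replace path separators so a field value cannot create extra directory levels.
    dims = tuple(
        str(info.get(field, "UNKNOWN")).replace("/", "_") for field in PARTITION_FIELDS
    )
    return dims, json.dumps(payload, sort_keys=True)


def export(lines, out_dir):
    """Group valid records by partition and write one xz-compressed file per partition."""
    groups = {}
    for line in lines:
        result = convert(line)
        if result is None:
            continue  # skip payloads that fail parsing or validation
        dims, record = result
        groups.setdefault(dims, []).append(record)

    written = []
    for dims, records in sorted(groups.items()):
        directory = os.path.join(out_dir, *dims)
        os.makedirs(directory, exist_ok=True)
        path = os.path.join(directory, "part-00000.jsonl.xz")  # hypothetical file name
        with lzma.open(path, "wt", encoding="utf-8") as handle:
            for record in records:
                handle.write(record + "\n")
        written.append(path)
    return written  # these files would then be uploaded to S3
```

In a map/reduce setting, `convert` would roughly correspond to the map step (emitting the partition as the key) and the per-partition write to the reduce step.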
Reporter
Updated • 11 years ago
Assignee: tmeyarivan → mreid
Reporter
Updated • 11 years ago
Group: metrics-private
Reporter
Updated • 11 years ago
Assignee: mreid → jopsen
Assignee
Comment 1 • 11 years ago
hmm... Is this something we want to do?
So tmary and I arrived at two possible solutions:
A)
Aggregate historic telemetry data on mango using a map/reduce job.
This produces output to be displayed on the telemetry-dashboard.
B)
Export raw telemetry pings to S3, process them server-side with the incoming-processing
server from telemetry-server, and publish them to a bucket once converted and validated.
Then we can do whatever analysis we might want to do...
Strategy (A) is implemented; the job finished last night, and I'll be verifying the data today and tomorrow.
Hopefully, I can then merge it into a data file usable by the dashboard.
Strategy (B) is fairly easy to implement on the mango-side, and can be fired up in an hour or so.
Once everything is in S3, we can launch incoming-process servers on spot nodes as if the exported data had been collected from HTTP nodes. A few hacks to the incoming-process servers would be required; these are already implemented (I have them lying around somewhere).
If (A) works, I'm not sure we want to do (B). That's what I'm asking.
Note that this bug suggests (C): conversion and validation of telemetry pings on mango. In terms of work required, I think it'll be faster to do (B).
-----
Anyways, do we want to export, if (A) works?
Assignee
Comment 2 • 11 years ago
Just for reference, the script doing (A) on mango is `process_seq_file-fast.py`, available here:
https://github.com/jonasfj/telemetry-dashboard/tree/hadoop-extraction-script
Assignee
Comment 3 • 10 years ago
Pretty sure we did this... Then sat on the data so long that we care to import it anyways :)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED