Closed Bug 922745 Opened 11 years ago Closed 10 years ago

Write Hadoop job to convert and export historic Telemetry data to AWS format

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Unreviewed

People

(Reporter: mreid, Assigned: jonasfj)

Details

Mark Reid [:mreid]

Reporter

Description

•

11 years ago

In order to export the historic data from mango to AWS, we need to implement a map/reduce version of the code that processes incoming submissions in the AWS-based Telemetry data pipeline.

To summarize:
- Read json payload
- Convert to validated format
- Write to partitioned directory structure
- lzma/xz compress
- Export to S3
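For illustration only, the first four steps above could be sketched roughly as follows. This is not the telemetry-server code: the partition fields, the `info` key, the `part-00000.xz` file name, and the validation rule are all assumptions made up for this sketch, and the S3 export step is left out because it depends on credentials and a client library.

```python
import json
import lzma
import os
import re

# Hypothetical partition fields; the real pipeline's schema is not stated in this bug.
PARTITION_FIELDS = ("reason", "appName", "appUpdateChannel", "appVersion")


def safe_component(value):
    """Turn a field value into a single safe directory name."""
    cleaned = re.sub(r"[^A-Za-z0-9_.-]", "_", str(value))
    return "UNKNOWN" if cleaned in ("", ".", "..") else cleaned


def convert(raw_line):
    """Parse one JSON payload; return (partition_values, record) or None if invalid."""
    try:
        payload = json.loads(raw_line)
    except ValueError:
        return None
    # Assumed minimal validation: payload is an object with an "info" object.
    if not isinstance(payload, dict) or not isinstance(payload.get("info"), dict):
        return None
    info = payload["info"]
    values = tuple(safe_component(info.get(f, "UNKNOWN")) for f in PARTITION_FIELDS)
    return values, json.dumps(payload, sort_keys=True, separators=(",", ":"))


def write_partitioned(raw_lines, out_dir):
    """Group valid payloads by partition and write one xz file per partition."""
    groups = {}
    for line in raw_lines:
        converted = convert(line)
        if converted is None:
            continue  # skip payloads that fail validation
        values, record = converted
        groups.setdefault(values, []).append(record)
    paths = []
    for values, records in sorted(groups.items()):
        part_dir = os.path.join(out_dir, *values)
        os.makedirs(part_dir, exist_ok=True)
        path = os.path.join(part_dir, "part-00000.xz")
        with lzma.open(path, "wt", encoding="utf-8") as handle:
            for record in records:
                handle.write(record + "\n")
        paths.append(path)
    return paths
```

In a map/reduce setting, `convert` would correspond roughly to the map step (keyed by partition) and the per-partition write to the reduce step; the resulting files could then be uploaded to S3 with whatever client is available on the cluster.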

Mark Reid [:mreid]

Reporter

Updated

•

11 years ago

Assignee: tmeyarivan → mreid

Mark Reid [:mreid]

Reporter

Updated

•

11 years ago

Group: metrics-private

Mark Reid [:mreid]

Reporter

Updated

•

11 years ago

Assignee: mreid → jopsen

Jonas Finnemann Jensen (:jonasfj)

Assignee

Comment 1

•

11 years ago

hmm... Is this something we want to do?

So tmary and I arrived at two possible solutions:

A) Aggregate historic telemetry data on mango using a map/reduce job. This produces output to be displayed on the telemetry-dashboard.

B) Export raw telemetry pings to S3, process them server side with the incoming-process server from telemetry-server, and publish them to a bucket once converted and validated. Then we can do whatever analysis we might want to do...

Strategy (A) is implemented. The job finished last night, and I'll be verifying the data today and tomorrow, hopefully merging it into a data file usable by the dashboard.

Strategy (B) is fairly easy to implement on the mango side, and can be fired up in an hour or so. Once everything is in S3, we can launch incoming-process servers on spot nodes as if the exported data was collected from HTTP nodes. A few hacks to the incoming-process servers would be required; these are already implemented (I have them lying around somewhere).

If (A) works, I'm not sure we want to do (B); that's what I'm asking.

Note, this bug suggests (C): conversion and validation of telemetry pings on mango. In terms of work required, I think it'll be faster to do (B).

-----

Anyways, do we want to export if (A) works?

Jonas Finnemann Jensen (:jonasfj)

Assignee

Comment 2

•

11 years ago

Just, for reference, the script doing (A) on mango, is `process_seq_file-fast.py`, available here: https://github.com/jonasfj/telemetry-dashboard/tree/hadoop-extraction-script

Jonas Finnemann Jensen (:jonasfj)

Assignee

Comment 3

•

10 years ago

Pretty sure we did this... Then sat on the data so long that we care to import it anyways :)

Status: NEW → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED


Categories

(Mozilla Metrics :: Hadoop/HBase Operations, defect)

Security

(public)
