Closed
Bug 1122969
Opened 9 years ago
Closed 9 years ago
Redshift output
Categories
(Cloud Services Graveyard :: Metrics: Pipeline, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: kparlante, Assigned: trink)
References
Details
No description provided.
Assignee
Updated • 9 years ago
Status: NEW → ASSIGNED
Assignee
Comment 1 • 9 years ago
Depends on: https://github.com/mozilla-services/heka/issues/1303
Reporter
Comment 2 • 9 years ago
Comments from bug triage:
- Hopefully by the end of the week
- First will use dummy tables (headers/JSON blobs)
- Next step: use FHR schema
Risks:
- Synchronous, could jam up Heka
Assignee
Comment 3 • 9 years ago
Redshift output was tested/running on Feb 6 against a basic message/table schema. The speed was reasonable from my home machine when bulk loading (10K inserts/sec) and should be much better from machines within AWS. Synchronous individual inserts were painfully slow, about 10 per second. Katie: ETA on the real schema?
Flags: needinfo?(kparlante)
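The gap reported in Comment 3 (about 10 rows/sec for synchronous single-row inserts versus roughly 10K rows/sec when bulk loading) is consistent with paying one network round trip per statement. Below is a minimal Python sketch of the batching idea only; it is not the code from the Heka output. The helper names `batches` and `multi_row_insert`, the `messages` table, and its columns are hypothetical, and the `%s` placeholders assume a DB-API driver that uses the "format" parameter style, such as psycopg2.

```python
from itertools import islice
from typing import Iterable, Iterator, List, Sequence, Tuple


def batches(rows: Iterable[Sequence], size: int) -> Iterator[List[Sequence]]:
    """Yield successive lists of at most `size` rows (hypothetical helper)."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        chunk = list(islice(it, size))
        if not chunk:
            return
        yield chunk


def multi_row_insert(
    table: str, columns: Sequence[str], rows: Sequence[Sequence]
) -> Tuple[str, list]:
    """Build one parameterized INSERT statement covering every row in `rows`.

    Table and column names are assumed to be trusted identifiers; only the
    values are passed as parameters.
    """
    row_placeholder = "(" + ", ".join(["%s"] * len(columns)) + ")"
    sql = "INSERT INTO {} ({}) VALUES {}".format(
        table,
        ", ".join(columns),
        ", ".join([row_placeholder] * len(rows)),
    )
    # Flatten the rows so the parameters line up with the placeholders.
    params = [value for row in rows for value in row]
    return sql, params


if __name__ == "__main__":
    # Hypothetical messages standing in for pipeline output.
    messages = [(i, "payload-%d" % i) for i in range(5)]
    for chunk in batches(messages, 2):
        sql, params = multi_row_insert("messages", ["id", "payload"], chunk)
        print(sql, params)
```

Each `(sql, params)` pair would be handed to a single `cursor.execute(sql, params)` call, so the batch size controls how many rows share one round trip. The right batch size is a tuning decision and would need measuring against the real schema.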
Reporter
Comment 4 • 9 years ago
The FHR data is ingested by Bagheera and stored in HDFS initially. It is then processed by bcolloran's de-orphaning script. Saptarshi's code creates a set of samples from the de-orphaned data, which are loaded into Vertica.
Here's the full Vertica schema (includes ADI and other tables):
https://mana.mozilla.org/wiki/download/attachments/43724740/vertica_tables.txt
And more info about the rollup and Vertica import scripts:
https://mana.mozilla.org/wiki/display/BIDW/FHR+rollups
Reporter
Updated • 9 years ago
Flags: needinfo?(kparlante)
Assignee
Comment 5 • 9 years ago
So what do we actually need here?
- Something to read the de-orphaned results out of HDFS and put them in Redshift instead?
- Perform the de-orphaning in the pipeline data stream and populate Redshift, avoiding HDFS?
- ??
The generic Redshift output is done, so I am closing this. Please open bug(s) for the implementation of the specific FHR use cases.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee
Comment 6 • 9 years ago
The example is here: https://github.com/mozilla-services/data-pipeline/blob/master/heka/sandbox/outputs/redshift.lua
Updated • 6 years ago
Product: Cloud Services → Cloud Services Graveyard