Closed Bug 1122969 · Opened 9 years ago · Closed 9 years ago

Redshift output

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: kparlante, Assigned: trink)

References

Details

      No description provided.
Blocks: 1122972
Status: NEW → ASSIGNED
Comments from bug triage:
- Hopefully by the end of the week
- First will use dummy tables (headers/json blobs)
- Next step use FHR schema

Risks:
- synchronous, could jam up heka
Redshift output was tested/running on Feb 6 against a basic message/table schema. The speed was reasonable from my home machine when bulk loading (10K inserts/sec) and should be much better from machines within AWS. Synchronous individual inserts were painfully slow, about 10 per second.
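The bulk-vs-synchronous gap described above (roughly 10K inserts/sec batched vs. ~10/sec for individual synchronous inserts) is typical for Redshift, which pays a fixed per-statement cost. A minimal sketch of the batching idea follows; the `messages` table, column names, and batch size are hypothetical and not from this bug, and real code would use parameterized queries (or, better for Redshift, `COPY` from S3) rather than string-built SQL.

```python
# Hypothetical sketch: collapse many rows into one multi-row INSERT per
# batch, instead of issuing one synchronous INSERT per row.
# Table/column names are illustrative; production code should use
# parameterized statements or Redshift's COPY command, not repr().

def chunk_rows(rows, size):
    """Yield successive slices of `rows` of at most `size` elements."""
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

def batched_inserts(table, columns, rows, batch_size=1000):
    """Return one multi-row INSERT statement per batch of rows."""
    stmts = []
    cols = ", ".join(columns)
    for batch in chunk_rows(rows, batch_size):
        values = ", ".join(
            "(" + ", ".join(repr(v) for v in row) + ")" for row in batch
        )
        stmts.append(f"INSERT INTO {table} ({cols}) VALUES {values};")
    return stmts

rows = [("uuid-%d" % i, "{}") for i in range(2500)]
stmts = batched_inserts("messages", ("uuid", "payload"), rows)
# 2500 rows at batch_size=1000 -> 3 round trips instead of 2500
```

The same trade-off is behind the "synchronous, could jam up heka" risk noted in triage: a blocking per-message insert stalls the pipeline, while buffering and flushing in batches amortizes the round-trip cost.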

Katie: ETA on the real schema?
Flags: needinfo?(kparlante)
The FHR data is ingested by Bagheera and stored in HDFS initially. It is then processed by bcolloran's de-orphaning script. Saptarshi's code creates a set of samples from the de-orphaned data, which are loaded into Vertica.

Here's the full Vertica schema (includes ADI and other tables): https://mana.mozilla.org/wiki/download/attachments/43724740/vertica_tables.txt

And more info about the rollup & Vertica import scripts:
https://mana.mozilla.org/wiki/display/BIDW/FHR+rollups
Flags: needinfo?(kparlante)
So what do we actually need here?
- Something to read the de-orphaned results out of HDFS and load them into Redshift instead?
- Perform the de-orphaning in the pipeline data stream and populate Redshift directly, bypassing HDFS?
- ??

The generic Redshift output is done, so I am closing this. Please open a new bug (or bugs) for the implementation of the specific FHR use cases.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard