Status: RESOLVED FIXED
Product: Cloud Services
Component: Metrics: Pipeline
Reported: 3 years ago
Last modified: 3 years ago
Reporter: Katie Parlante
Assignee: trink
Tracking flags: (Not tracked)

(Reporter)

Updated

3 years ago
Blocks: 1122972
(Assignee)

Updated

3 years ago
Status: NEW → ASSIGNED
(Reporter)

Comment 2

3 years ago
Comments from bug triage:
- Hopefully by the end of the week
- First will use dummy tables (headers/json blobs)
- Next step use FHR schema

Risks:
- Synchronous inserts could jam up Heka
(Assignee)

Comment 3

3 years ago
Redshift output was tested and running on Feb 6 against a basic message/table schema. Bulk-loading speed was reasonable from my home machine (10K inserts/sec) and should be much better from machines within AWS. Synchronous individual inserts were painfully slow, about 10 per second.
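The 10K-vs-10 inserts/sec gap above comes down to round trips: one INSERT per record pays a full network round trip each time, while a multi-row INSERT amortizes it. A minimal sketch of the batching idea (table name, columns, and batch size are illustrative placeholders, not the real schema; a production loader would use parameterized statements or Redshift's COPY from S3 rather than string interpolation):

```python
def batch_inserts(table, columns, rows, batch_size=500):
    """Group rows into multi-row INSERT statements.

    One statement per row incurs a network round trip each time
    (~10 rows/sec in the test above); batching amortizes that cost.
    """
    stmts = []
    for i in range(0, len(rows), batch_size):
        chunk = rows[i:i + batch_size]
        # NOTE: naive quoting for illustration only; real code should
        # use parameterized queries or COPY to avoid SQL injection.
        values = ",".join(
            "(" + ",".join("'%s'" % v for v in row) + ")" for row in chunk
        )
        stmts.append(
            "INSERT INTO %s (%s) VALUES %s" % (table, ",".join(columns), values)
        )
    return stmts

# 1200 rows with a batch size of 500 yield 3 statements instead of 1200.
rows = [("id%d" % i, "payload") for i in range(1200)]
stmts = batch_inserts("dummy_table", ["id", "blob"], rows)
```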

Katie: ETA on the real schema?
Flags: needinfo?(kparlante)
(Reporter)

Comment 4

3 years ago
The FHR data is ingested by Bagheera and stored in HDFS initially. It is then processed by bcolloran's de-orphaning script. Saptarshi's code creates a set of samples from the de-orphaned data, which are loaded into Vertica.

Here's the full Vertica schema (includes ADI and other tables): https://mana.mozilla.org/wiki/download/attachments/43724740/vertica_tables.txt

And more info about the rollup and Vertica import scripts:
https://mana.mozilla.org/wiki/display/BIDW/FHR+rollups
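The existing FHR flow described above can be summarized as a sequence of stages; a sketch (stage names are placeholders for the real scripts, which are not named in this bug):

```python
# Stages of the current FHR data flow, per the comment above.
FHR_PIPELINE = [
    ("ingest", "Bagheera ingests FHR submissions"),
    ("store", "raw data lands in HDFS"),
    ("deorphan", "bcolloran's de-orphaning script processes the HDFS data"),
    ("sample", "Saptarshi's code creates samples from the de-orphaned data"),
    ("load", "samples are loaded into Vertica"),
]

def describe(pipeline):
    """Render the stage order as a readable arrow chain."""
    return " -> ".join(stage for stage, _ in pipeline)
```

The open question in this bug is where Redshift fits: replacing the Vertica `load` stage, or fed directly from the pipeline stream, bypassing HDFS.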
(Reporter)

Updated

3 years ago
Flags: needinfo?(kparlante)
(Assignee)

Comment 5

3 years ago
So what do we actually need here? 
- Something to read the de-orphaned results out of HDFS and put them in Redshift instead?
- Something to perform the de-orphaning in the pipeline data stream and populate Redshift directly, avoiding HDFS?
- ??

The generic Redshift output is done, so I am closing this. Please open a bug for each specific FHR use case to be implemented.
Status: ASSIGNED → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED