Comments from bug triage:
- Hopefully done by the end of the week
- First pass will use dummy tables (headers/JSON blobs)
- Next step: use the FHR schema

Risks:
- The output is synchronous, which could jam up Heka.
The Redshift output was tested and running on Feb 6 against a basic message/table schema. Bulk-loading speed was reasonable from my home machine (about 10K inserts/sec) and should be much better from machines within AWS. Synchronous individual inserts were painfully slow, about 10 per second. Katie: any ETA on the real schema?
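For reference, the difference between the two paths looks roughly like the sketch below. This is not the actual output plugin, just a minimal illustration: it assumes psycopg2, a hypothetical "messages" table with (uuid, payload) columns, and placeholder connection parameters.

import json
import psycopg2
from psycopg2.extras import execute_values

# Placeholder connection details; Redshift speaks the Postgres wire protocol.
conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="pipeline", user="loader",
                        password="...")
cur = conn.cursor()

rows = [("uuid-%d" % i, json.dumps({"seq": i})) for i in range(10000)]

# Slow path: one round trip and one commit per row (roughly the 10
# inserts/sec seen from outside AWS).
# for uuid, payload in rows:
#     cur.execute("INSERT INTO messages (uuid, payload) VALUES (%s, %s)",
#                 (uuid, payload))
#     conn.commit()

# Fast path: multi-row VALUES batches with a single commit (roughly the
# 10K inserts/sec seen when bulk loading).
execute_values(cur, "INSERT INTO messages (uuid, payload) VALUES %s",
               rows, page_size=1000)
conn.commit()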
The FHR data is ingested by Bagheera and stored in HDFS initially. It is then processed by bcolloran's de-orphaning script. Saptarshi's code creates a set of samples from the de-orphaned data, which are loaded into Vertica.

Here's the full Vertica schema (includes ADI and other tables): https://mana.mozilla.org/wiki/download/attachments/43724740/vertica_tables.txt

And more info about the rollup & Vertica import scripts: https://mana.mozilla.org/wiki/display/BIDW/FHR+rollups
So what do we actually need here?
- Something to read the de-orphaned results out of HDFS and put them in Redshift instead? (A rough sketch follows below.)
- Perform the de-orphaning in the pipeline data stream and populate Redshift directly, avoiding HDFS?
- ??

The generic Redshift output is done, so I am closing this. Please open a bug (or bugs) for the implementation of the specific FHR use cases.
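For whoever picks up the first option: a rough sketch of what it could look like, assuming the de-orphaned records land in HDFS as newline-delimited JSON. All paths, bucket names, table names, and credentials below are hypothetical; the one firm point is that Redshift's bulk path is COPY from S3, which is what gives the bulk-load speeds mentioned above.

import subprocess
import boto3
import psycopg2

HDFS_PATH = "/fhr/deorphaned/part-*"          # hypothetical location
S3_BUCKET, S3_KEY = "fhr-staging", "deorphaned/batch-0001.json"

# 1. Pull the de-orphaned output out of HDFS (the HDFS shell expands
#    the glob itself).
data = subprocess.check_output(["hdfs", "dfs", "-cat", HDFS_PATH])

# 2. Stage it in S3, since Redshift's COPY loads from S3, not HDFS.
boto3.client("s3").put_object(Bucket=S3_BUCKET, Key=S3_KEY, Body=data)

# 3. Bulk-load into Redshift with COPY, which is orders of magnitude
#    faster than row-by-row INSERTs.
conn = psycopg2.connect(host="example-cluster.redshift.amazonaws.com",
                        port=5439, dbname="fhr", user="loader",
                        password="...")
cur = conn.cursor()
cur.execute("""
    COPY fhr_payloads FROM 's3://%s/%s'
    CREDENTIALS 'aws_iam_role=arn:aws:iam::123456789012:role/RedshiftCopy'
    FORMAT AS JSON 'auto';
""" % (S3_BUCKET, S3_KEY))
conn.commit()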
Status: ASSIGNED → RESOLVED
Resolution: --- → FIXED