Closed Bug 1429955 Opened 8 years ago Closed 8 years ago

Create a prototype ingestion pipeline in GCP

Categories

(Data Platform and Tools :: General, enhancement, P1)

Points:
3

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: relud, Assigned: relud)

At a glance, it looks like this would be a good pipeline:

App Engine (1) -> Cloud Pub/Sub -> App Engine (2) -> Cloud Storage -> Cloud Function -> BigQuery

Then set up BigQuery to be accessible in Redash.

App Engine (1) validates incoming JSON against a schema and forwards it to Pub/Sub.
Cloud Pub/Sub makes sure we can connect other data warehouses and real-time analytics.
App Engine (2) batches up incoming data and writes it to Cloud Storage (maybe in Avro format).
Cloud Storage makes sure we don't run into limits when inserting into BigQuery.
Cloud Function executes on a regular basis and inserts data from Cloud Storage into date-partitioned BigQuery tables.
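To make the validate, batch, and partition steps concrete, here is a rough sketch in plain Python with no GCP client libraries. It is not the prototype's actual code. The field names in REQUIRED_FIELDS, the functions validate, partition_for, and batch_by_partition, and the max_batch_size default are all hypothetical, and the field check is a hand-rolled stand-in for real JSON Schema validation.

```python
import json
from collections import defaultdict
from datetime import datetime, timezone

# Hypothetical required fields; the bug does not specify the real schema.
REQUIRED_FIELDS = {"id": str, "submission_time": str, "payload": dict}


def validate(raw: bytes) -> dict:
    """Stand-in for app engine (1): parse JSON and check required fields."""
    message = json.loads(raw)  # raises ValueError on malformed JSON
    if not isinstance(message, dict):
        raise ValueError("message must be a JSON object")
    for field, expected_type in REQUIRED_FIELDS.items():
        if not isinstance(message.get(field), expected_type):
            raise ValueError(f"missing or invalid field: {field}")
    return message


def partition_for(message: dict) -> str:
    """Return the UTC day (YYYYMMDD) that a message belongs to."""
    submitted = datetime.fromisoformat(message["submission_time"])
    if submitted.tzinfo is None:
        # Assumption: timestamps without an offset are already UTC.
        submitted = submitted.replace(tzinfo=timezone.utc)
    return submitted.astimezone(timezone.utc).strftime("%Y%m%d")


def batch_by_partition(raw_messages, max_batch_size=500):
    """Stand-in for app engine (2): group valid messages into per-day batches.

    Returns (batches, rejected), where batches maps a YYYYMMDD partition to a
    list of batches, and rejected holds (raw, reason) pairs.
    """
    batches = defaultdict(list)
    rejected = []
    for raw in raw_messages:
        try:
            message = validate(raw)
            partition = partition_for(message)
        except ValueError as err:
            rejected.append((raw, str(err)))
            continue
        day_batches = batches[partition]
        # Start a new batch when there is none yet or the last one is full.
        if not day_batches or len(day_batches[-1]) >= max_batch_size:
            day_batches.append([])
        day_batches[-1].append(message)
    return dict(batches), rejected
```

In the proposed pipeline each batch would become one object in cloud storage, and the cloud function would load it into the matching day's partition. Grouping by day up front keeps each load targeted at a single partition.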
Priority: P2 → P1
Points: 2 → 3
Blocks: 1431769
Blocks: 1431770
The Google doc in comment 3 is up to date with current work on this. Resolving as fixed. Future work will be done in other bugs.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Component: Pipeline Ingestion → General