Closed
Bug 1429955
Opened 8 years ago
Closed 8 years ago
Create a prototype ingestion pipeline in GCP
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: relud, Assigned: relud)
References
Details
Description
At a glance it looks like this would be a good pipeline:
App Engine (1) -> Cloud Pub/Sub -> App Engine (2) -> Cloud Storage -> Cloud Function -> BigQuery
Then set up BigQuery to be accessible in Redash.
App Engine (1) validates the JSON schema and forwards to Pub/Sub.
Cloud Pub/Sub makes sure we can connect other data warehouses and real-time analytics.
App Engine (2) batches up incoming data and writes it to Cloud Storage (in Avro format, maybe).
Cloud Storage makes sure we don't run into limits when inserting into BigQuery.
Cloud Function executes on a regular basis and inserts data from Cloud Storage into data partitioned BigQuery tables. (Rough sketches of these stages follow below.)
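A minimal sketch of what the first App Engine service could look like, assuming a Flask handler plus the jsonschema and google-cloud-pubsub libraries; the project ID, topic name, and schema below are placeholders, not values from this bug.

import json

from flask import Flask, request
from google.cloud import pubsub_v1
from jsonschema import ValidationError, validate

app = Flask(__name__)
publisher = pubsub_v1.PublisherClient()

PROJECT_ID = "my-project"  # placeholder project
TOPIC = publisher.topic_path(PROJECT_ID, "telemetry-raw")  # placeholder topic
SCHEMA = {  # placeholder schema; the real pipeline would use per-doctype schemas
    "type": "object",
    "properties": {"payload": {"type": "object"}},
    "required": ["payload"],
}


@app.route("/submit", methods=["POST"])
def submit():
    # Reject anything that is not JSON or does not match the schema.
    doc = request.get_json(force=True, silent=True)
    if doc is None:
        return ("invalid json", 400)
    try:
        validate(instance=doc, schema=SCHEMA)
    except ValidationError as e:
        return ("schema validation failed: %s" % e.message, 400)
    # Forward the validated document to Pub/Sub for downstream consumers.
    publisher.publish(TOPIC, json.dumps(doc).encode("utf-8"))
    return ("", 201)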
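A similar sketch of the second App Engine service: pull a batch of messages from a Pub/Sub subscription and write them to Cloud Storage as a single object. The description suggests Avro; newline-delimited JSON is used here only to keep the sketch short, and the project, subscription, and bucket names are placeholders.

import uuid

from google.cloud import pubsub_v1, storage

PROJECT_ID = "my-project"  # placeholder project
SUBSCRIPTION = "telemetry-raw-batch"  # placeholder subscription
BUCKET = "telemetry-landing"  # placeholder bucket

subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT_ID, SUBSCRIPTION)
bucket = storage.Client().bucket(BUCKET)


def drain_one_batch(max_messages=500):
    # Pull up to max_messages, write them to one object, then ack.
    response = subscriber.pull(subscription=sub_path, max_messages=max_messages)
    if not response.received_messages:
        return
    lines = b"\n".join(m.message.data for m in response.received_messages)
    blob = bucket.blob("batches/%s.ndjson" % uuid.uuid4().hex)
    blob.upload_from_string(lines, content_type="application/x-ndjson")
    subscriber.acknowledge(
        subscription=sub_path,
        ack_ids=[m.ack_id for m in response.received_messages],
    )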
Comment 1•8 years ago
date* partitioned BigQuery tables (correcting "data partitioned" in the description)
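A sketch of the scheduled load stage, assuming a Cloud Function triggered on a schedule that loads the batched Cloud Storage objects into a date-partitioned BigQuery table with a load job (load jobs avoid the insert limits mentioned in the description). The dataset, table, and bucket path are placeholders, and the source format matches the newline-delimited JSON from the sketch above rather than Avro.

from google.cloud import bigquery

client = bigquery.Client()
DATASET = "telemetry"  # placeholder dataset
TABLE = "sample"       # placeholder table


def load_batches(event=None, context=None):
    # Entry point for a scheduled Cloud Function: one load job per run.
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
        write_disposition=bigquery.WriteDisposition.WRITE_APPEND,
        # Ingestion-time date partitioning, per comment 1.
        time_partitioning=bigquery.TimePartitioning(
            type_=bigquery.TimePartitioningType.DAY,
        ),
    )
    load_job = client.load_table_from_uri(
        "gs://telemetry-landing/batches/*.ndjson",  # placeholder bucket/path
        "%s.%s" % (DATASET, TABLE),
        job_config=job_config,
    )
    load_job.result()  # wait for the load to finish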
Updated•8 years ago
Priority: P2 → P1
Comment 2•8 years ago
Updated•8 years ago
Points: 2 → 3
Comment 3•8 years ago
Recording work here: https://docs.google.com/document/d/1ENoZqLYBl-EyS9b8dZ-QDpsYLBZfGLKRdYO4NbpfMYk/edit#
Git repo here: https://github.com/relud/telemetry-sample
Comment 4•8 years ago
The Google Doc in comment 3 is up to date with the current work on this. Resolving as FIXED; future work will be done in other bugs.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•3 years ago
Component: Pipeline Ingestion → General