Setup exports for tiles redshift data to be used with spark

Status

RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: emtwo, Assigned: relud)

Tracking

(Blocks: 1 bug)

Firefox Tracking Flags

(Not tracked)

Details

(Assignee)

Updated

2 years ago
Component: General → Tiles: Ops
Product: Cloud Services → Content Services
(Reporter)

Updated

2 years ago
Blocks: 1272388
(Assignee)

Comment 1

2 years ago
Unfortunately, this has proven more difficult than expected. The Redshift cluster is in our prod IAM, while the Spark clusters are in our dev IAM. This means firewall access is not as simple as adding a security group rule on both ends: traffic between the two goes over the internet, and the Spark clusters do not have specific EIPs that a rule could allow.

We have a few options here:

1) We can create daily rollup files in S3 and give Spark access to those.

2) We can set up some sort of NAT, so that Spark accesses Redshift from a consistent source IP.

I think 1 will take less effort to set up, but I don't know what your requirements are for this.
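
For reference, a rough sketch of what option 1 could look like on the Redshift side, assuming the rollup is produced with an UNLOAD statement. The table name, the `date` column, the target prefix, and the IAM role ARN are all hypothetical placeholders, not the real schema or the actual export job:

```python
from datetime import date


def build_unload_sql(table: str, day: date, s3_prefix: str, iam_role: str) -> str:
    """Build a Redshift UNLOAD statement for one day's rows (illustrative only)."""
    day_str = day.isoformat()
    # Hypothetical schema: assumes the table has a column named "date".
    query = f"SELECT * FROM {table} WHERE date = '{day_str}'"
    # The query is passed to UNLOAD as a quoted string, so single quotes
    # inside it have to be doubled.
    escaped = query.replace("'", "''")
    base = s3_prefix.rstrip("/")
    target = f"{base}/{table}/{day_str}/"
    return (
        f"UNLOAD ('{escaped}')\n"
        f"TO '{target}'\n"
        f"IAM_ROLE '{iam_role}'\n"
        "DELIMITER ',' GZIP ALLOWOVERWRITE;"
    )


if __name__ == "__main__":
    # All argument values here are made up for the example.
    print(build_unload_sql(
        "example_table",
        date(2016, 6, 1),
        "s3://example-bucket/tiles/",
        "arn:aws:iam::123456789012:role/example-unload-role",
    ))
```

Writing each day under its own prefix would let Spark read a single day or a whole range without needing any network path to the Redshift cluster.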
(Assignee)

Updated

2 years ago
Flags: needinfo?(msamuel)
(Reporter)

Comment 2

2 years ago
Hi Daniel, sorry about the delay here and thanks for looking into this.

Option 1) sounds fine to me. Are there any more details you need from me to follow through on it?
Flags: needinfo?(msamuel)
(Assignee)

Comment 3

2 years ago
Not yet. I'll look into it and let you know here if I run into issues.
(Reporter)

Comment 4

2 years ago
Hey relud, I just wanted to follow up on this and see what the status is. Thanks!
Flags: needinfo?(dthorn)
(Assignee)

Comment 5

2 years ago
Sorry, this fell off my to-do list. Here it is:

https://github.com/mozilla-services/puppet-config/pull/2222
https://github.com/mozilla-services/svcops/pull/1209

It's up in stage, so you can see some stage outputs with:

> aws s3 ls --recursive s3://net-mozaws-stage-us-east-1-pipeline-analysis/tiles/

In prod it would be:

> aws s3 ls --recursive s3://net-mozaws-prod-us-west-2-pipeline-analysis/tiles/
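
If it helps to do the same listing from a Python job or notebook, here is a minimal sketch using the boto3 `list_objects_v2` paginator. The bucket names and the `tiles/` prefix come from the commands above; the function name and the `env` argument are only illustrative, and the client is passed in so the caller decides how credentials are set up:

```python
from typing import Iterator

# Bucket names and prefix as used in the aws s3 ls commands above.
BUCKETS = {
    "stage": "net-mozaws-stage-us-east-1-pipeline-analysis",
    "prod": "net-mozaws-prod-us-west-2-pipeline-analysis",
}
PREFIX = "tiles/"


def list_tiles_exports(s3_client, env: str) -> Iterator[str]:
    """Yield the s3:// URI of every object under tiles/ for the given env."""
    bucket = BUCKETS[env]
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=PREFIX):
        # Pages with no matching objects carry no "Contents" key.
        for obj in page.get("Contents", []):
            yield f"s3://{bucket}/{obj['Key']}"
```

Usage would be roughly `list_tiles_exports(boto3.client("s3"), "stage")`, assuming boto3 is installed and the caller has read access to the bucket.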
Flags: needinfo?(dthorn)
(Assignee)

Updated

2 years ago
Summary: Setup security group + permissions for tiles redshift to be used with spark → Setup exports for tiles redshift data to be used with spark
(Assignee)

Comment 7

2 years ago
This is exporting as expected in prod.
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED