Closed Bug 1121597 Opened 10 years ago Closed 10 years ago

Reload Impression, App and Error data from S3 into DDFS

Categories

(Content Services Graveyard :: Tiles: Ops, defect)

Type: defect
Priority: Not set
Severity: normal
Points: 3

Tracking

(Not tracked)

RESOLVED FIXED
Iteration: 38.2 - 9 Feb

People

(Reporter: tspurway, Assigned: tspurway)

Details

(Whiteboard: .008)

We will need all of the data we have from Jan 13th back to wherever the data starts (assuming a 7 day window). Note that we will have to delete all of the data from the 13th and then reload it, as it is a partial day. We should follow these steps to ensure consistency (a sketch of the procedure follows below):

- all data transmitted must be tagged with a DDFS 'processed:' prefix
- delete the data from the 13th; the order is important, and the incoming: prefixed tags must be deleted first:
  - ddfs rm incoming:app:2015-01-13 incoming:error:2015-01-13 incoming:impression:2015-01-13
  - ddfs rm processed:app:2015-01-13 processed:error:2015-01-13 processed:impression:2015-01-13
- load the app, error, and impression data for the period <beginning> .. 2015-01-13 inclusive (use processed: as the DDFS prefix for ALL tags)

I can do this job as an Inferno task, rather than burden ops with a bunch of data transfers. I will need read access to the relevant s3 buckets and the naming conventions, :relud.
Flags: needinfo?(dthornton)
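A minimal sketch of the delete-then-reload ordering above, using Disco's Python DDFS client. The DDFS() defaults, the date-range start, and the load step are assumptions for illustration, not details confirmed in this bug; in practice the reload ran as an Inferno job reading from S3.

    from datetime import date, timedelta
    from disco.ddfs import DDFS

    DATA_TYPES = ("app", "error", "impression")
    PARTIAL_DAY = "2015-01-13"

    ddfs = DDFS()  # defaults to the locally configured DDFS master

    # Delete the partial day first; order matters: incoming: before processed:.
    for prefix in ("incoming", "processed"):
        for dtype in DATA_TYPES:
            tag = "%s:%s:%s" % (prefix, dtype, PARTIAL_DAY)
            if ddfs.exists(tag):
                ddfs.delete(tag)

    # Reload <beginning> .. 2015-01-13 inclusive, tagging everything processed:.
    day, last = date(2015, 1, 7), date(2015, 1, 13)  # assumed 7-day window start
    while day <= last:
        for dtype in DATA_TYPES:
            tag = "processed:%s:%s" % (dtype, day.isoformat())
            # ddfs.push(tag, files_for(dtype, day))  # files_for is hypothetical
        day += timedelta(days=1)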
The infernyx host has been granted s3:* permission on tiles-incoming-prod-us-west-2 and all objects in it. These are broad permissions, because this is temporary; please only list and read objects from the bucket. The file structure is currently "<md5>-<app|impression>-<YYYY>.<MM>.<DD>".
Flags: needinfo?(dthornton)
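A minimal sketch of listing and grouping keys under that layout with boto, the S3 library the traceback below shows infernyx using. The 32-hex-character md5 pattern and the credential source are assumptions:

    import re
    import boto

    # Current key layout per the comment above:
    # "<md5>-<app|impression>-<YYYY>.<MM>.<DD>"
    KEY_RE = re.compile(
        r"^(?P<md5>[0-9a-f]{32})-(?P<dtype>app|impression)-"
        r"(?P<date>\d{4}\.\d{2}\.\d{2})$")

    conn = boto.connect_s3()  # assumes instance-role credentials
    bucket = conn.get_bucket("tiles-incoming-prod-us-west-2")

    # List and read only, as requested above: report type and date per key.
    for key in bucket.list():
        m = KEY_RE.match(key.name)
        if m:
            print(m.group("dtype"), m.group("date"), key.name)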
The file structure will be changing soon, though: https://bugzilla.mozilla.org/show_bug.cgi?id=1121694
Assignee: nobody → tspurway
relud: I would like to run an Inferno job to load all of the data (> 78,000 blobs). I am getting an error that seems to indicate the disco slave nodes don't have access to s3. Could we widen the (temporary) permissions to include the disco slaves?

FATAL: [map:0] Traceback (most recent call last):
  File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/__init__.py", line 340, in main
    job.worker.start(task, job, **jobargs)
  File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/__init__.py", line 303, in start
    self.run(task, job, **jobargs)
  File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 328, in run
    getattr(self, task.stage)(task, params)
  File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 341, in map
    for key, val in self['map'](entry, params):
  File "infernyx/s3import.py", line 23, in s3_import_map
  File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 502, in get_bucket
    return self.head_bucket(bucket_name, headers=headers)
  File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 535, in head_bucket
    raise err
S3ResponseError: S3ResponseError: 403 Forbidden
Flags: needinfo?(dthornton)
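The failing frame is infernyx's s3_import_map; its body isn't shown in this bug, but a hypothetical reconstruction makes the 403 easy to see. boto's get_bucket() HEADs the bucket (the head_bucket call in the traceback) before any object is read, and that HEAD runs on each disco slave under the slave's own credentials:

    import boto

    def s3_import_map(entry, params):
        # Hypothetical reconstruction of infernyx/s3import.py:23; the real
        # implementation is not shown in this bug.
        conn = boto.connect_s3()
        # get_bucket() issues a HEAD on the bucket, which is the head_bucket
        # call that returned 403 while the slaves lacked s3 permissions.
        bucket = conn.get_bucket("tiles-incoming-prod-us-west-2")
        key = bucket.get_key(entry)
        for line in key.get_contents_as_string().splitlines():
            yield entry, line

Because the map stage runs on the slaves, granting the permission to the infernyx host alone was not enough, which matches the fix recorded in the next comment.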
The disco slave nodes have been granted the same access as the infernyx host.
Flags: needinfo?(dthornton)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Iteration: --- → 38.2 - 9 Feb
Points: --- → 3
Whiteboard: .008