Closed
Bug 1121597
Opened 10 years ago
Closed 10 years ago
Reload Impression, App and Error data from S3 into DDFS
Categories
(Content Services Graveyard :: Tiles: Ops, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Iteration:
38.2 - 9 Feb
People
(Reporter: tspurway, Assigned: tspurway)
Details
(Whiteboard: .008)
We will need all of the data we have, from January 13th back to wherever the data starts (assuming a 7-day window). Note that we will have to delete all of the data from the 13th and then reload it, as it is a partial day.
We should follow these steps to ensure consistency:
- all data transmitted must be tagged with a DDFS 'processed: ...' prefix
- delete the data from the 13th; note that the order is important: incoming:-prefixed tags must be deleted first:
- ddfs rm incoming:app:2015-01-13 incoming:error:2015-01-13 incoming:impression:2015-01-13
- ddfs rm processed:app:2015-01-13 processed:error:2015-01-13 processed:impression:2015-01-13
- load the app, error, and impression data for the period <beginning> .. 2015-01-13 inclusive (use processed: as the DDFS prefix for ALL tags); see the sketch below
I can do this job as an Inferno task, rather than burden ops with a bunch of data transfers. I will need read access to the relevant S3 buckets and their naming conventions, :relud.
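For reference, here is a minimal sketch of what the reload step amounts to, assuming boto and the disco DDFS Python client on a host that can read the bucket, plus the key naming convention described in comment 1 below; the actual reload was run as an Inferno (infernyx) job, not this script.

import os
import re
import tempfile

import boto
from disco.ddfs import DDFS

BUCKET = 'tiles-incoming-prod-us-west-2'   # bucket name from comment 1
CUTOFF = '2015-01-13'                      # inclusive upper bound for the reload
# key format per comment 1: "<md5>-<app|impression>-<YYYY>.<MM>.<DD>";
# error blobs are assumed to follow the same pattern
KEY_RE = re.compile(r'^[0-9a-f]+-(app|impression|error)-(\d{4})\.(\d{2})\.(\d{2})$')

def reload_from_s3():
    ddfs = DDFS()  # assumes the disco master is reachable via local settings
    bucket = boto.connect_s3().get_bucket(BUCKET)
    for key in bucket.list():
        match = KEY_RE.match(key.name)
        if not match:
            continue
        kind, yyyy, mm, dd = match.groups()
        date = '%s-%s-%s' % (yyyy, mm, dd)
        if date > CUTOFF:  # ISO dates compare correctly as strings
            continue
        # download the blob to a temp file, then push it under a processed: tag
        fd, path = tempfile.mkstemp()
        os.close(fd)
        try:
            key.get_contents_to_filename(path)
            ddfs.push('processed:%s:%s' % (kind, date), [path])
        finally:
            os.unlink(path)

if __name__ == '__main__':
    reload_from_s3()

Pushing blob by blob keeps memory flat but makes one DDFS call per object; batching several local files into a single push per tag would cut round trips if the 78,000-blob volume makes this too slow.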
Updated•10 years ago
Flags: needinfo?(dthornton)
Comment 1•10 years ago
The infernyx host has been granted s3:* permission on tiles-incoming-prod-us-west-2 and all objects in it. These are broad permissions because this is temporary; please only list and read objects from the bucket.
The file structure is currently "<md5>-<app|impression>-<YYYY>.<MM>.<DD>".
Flags: needinfo?(dthornton)
Comment 2•10 years ago
The file structure will be changing soon, though: https://bugzilla.mozilla.org/show_bug.cgi?id=1121694
Assignee
Updated•10 years ago
Assignee: nobody → tspurway
Assignee
Comment 3•10 years ago
relud: I would like to run an Inferno job to load all of the data (> 78,000 blobs). I am getting an error that seems to indicate the disco slave nodes don't have access to s3. Could we widen the (temporary) permissions to include the disco slaves?
FATAL: [map:0] Traceback (most recent call last):
File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/__init__.py", line 340, in main
job.worker.start(task, job, **jobargs)
File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/__init__.py", line 303, in start
self.run(task, job, **jobargs)
File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 328, in run
getattr(self, task.stage)(task, params)
File "/usr/var/disco/data/ip-172-31-26-122/30/bulk_load@58d:a9e7c:c0b62/usr/lib/python2.7/site-packages/disco/worker/classic/worker.py", line 341, in map
for key, val in self['map'](entry, params):
File "infernyx/s3import.py", line 23, in s3_import_map
File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 502, in get_bucket
return self.head_bucket(bucket_name, headers=headers)
File "/usr/lib/python2.7/site-packages/boto/s3/connection.py", line 535, in head_bucket
raise err
S3ResponseError: S3ResponseError: 403 Forbidden
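As a quick way to verify the fix once permissions are widened, a small sketch (plain boto, outside infernyx) that could be run on a slave node; it repeats the HEAD bucket request that failed above and then lists a few keys:

import boto
from boto.exception import S3ResponseError

BUCKET = 'tiles-incoming-prod-us-west-2'  # bucket from comment 1

def check_bucket_access():
    conn = boto.connect_s3()  # picks up instance-role / environment credentials
    try:
        # get_bucket() issues the same HEAD bucket request that raised the 403 above
        bucket = conn.get_bucket(BUCKET)
    except S3ResponseError as err:
        print 'still denied: %s %s' % (err.status, err.reason)
        return False
    # list a few keys to confirm s3:ListBucket works as well
    for i, key in enumerate(bucket.list()):
        print key.name
        if i >= 4:
            break
    return True

if __name__ == '__main__':
    check_bucket_access()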
Assignee
Updated•10 years ago
Flags: needinfo?(dthornton)
Comment 4•10 years ago
The disco slave nodes have been granted the same access as the infernyx host.
Updated•10 years ago
Flags: needinfo?(dthornton)
Assignee
Updated•10 years ago
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
Iteration: --- → 38.2 - 9 Feb
Points: --- → 3
Whiteboard: .008