Bug 1109882
Opened 10 years ago
Closed 10 years ago
Set appropriate log retention policy for cloudfront
Categories
(Content Services Graveyard :: Tiles: Ops, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Iteration: 37.2
People
(Reporter: Mardak, Assigned: mostlygeek)
Details
(Whiteboard: .009)
Our retention policy for our view and click pings is to keep the raw data with IP addresses for at most 7 days. It sounds like we also happen to be getting access logs from cloudfront in an S3 bucket.
We aren't processing that data right now, so it would make sense to set it to 0 days of retention.
If we do have a need to look at those, it would be reasonable to extend it to 7 days as we have for our pings.
oyiptong/tspurway, any reason right now to keep the logs?
Flags: needinfo?(tspurway)
Flags: needinfo?(oyiptong)
Comment 1 (Assignee) • 10 years ago
If we don't need the logs we can simply turn off shipping of logs to S3 from cloudfront.
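For illustration, turning off log shipping might look like the sketch below. It assumes boto3 and a hypothetical distribution ID supplied by the caller; nothing in this bug says how the distribution is actually managed. The helper `with_logging_disabled` is an illustrative name, not an existing API.

```python
import copy


def with_logging_disabled(distribution_config: dict) -> dict:
    """Return a copy of a CloudFront DistributionConfig with access logging off."""
    config = copy.deepcopy(distribution_config)
    # CloudFront's Logging block carries these four fields; empty strings
    # for Bucket and Prefix are used when logging is disabled.
    config["Logging"] = {
        "Enabled": False,
        "IncludeCookies": False,
        "Bucket": "",
        "Prefix": "",
    }
    return config


def disable_access_logs(distribution_id: str) -> None:
    """Fetch the current config, disable logging, and push it back."""
    import boto3  # imported lazily so the helper above works without AWS libraries

    cloudfront = boto3.client("cloudfront")
    current = cloudfront.get_distribution_config(Id=distribution_id)
    cloudfront.update_distribution(
        Id=distribution_id,
        IfMatch=current["ETag"],  # required to guard against concurrent edits
        DistributionConfig=with_logging_disabled(current["DistributionConfig"]),
    )
```

Note that this only stops new logs from being written; objects already in the bucket would still need to be deleted or expired separately.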
Comment 2 • 10 years ago
There may be a reason to keep them for 7 days. We could use IP addresses as additional data for fraud detection.
For instance: image downloads (or even HEAD requests) should occur soon after a fetch happens.
Likewise, clicks/impressions could occur shortly after a fetch.
That said, we'd need to think about this more, perhaps.
Flags: needinfo?(oyiptong)
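The timing idea in comment 2 could be sketched roughly as follows: flag image requests that have no fetch from the same IP shortly beforehand. The record shape (`(ip, unix_timestamp)` tuples), the function name, and the 60-second window are all assumptions made for illustration, not anything the team has settled on.

```python
from bisect import bisect_right
from collections import defaultdict


def unmatched_image_requests(fetches, image_requests, window_seconds=60.0):
    """Return image requests with no fetch from the same IP in the preceding window.

    Both inputs are iterables of (ip, unix_timestamp) tuples (hypothetical shape).
    """
    fetch_times = defaultdict(list)
    for ip, ts in fetches:
        fetch_times[ip].append(ts)
    for times in fetch_times.values():
        times.sort()

    suspicious = []
    for ip, ts in image_requests:
        times = fetch_times.get(ip, [])
        i = bisect_right(times, ts)  # count of fetches at or before this request
        if i == 0 or ts - times[i - 1] > window_seconds:
            suspicious.append((ip, ts))
    return suspicious
```

A check like this only works while the IP addresses are still available, which is the argument for keeping the logs for the same 7 days as the pings.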
Comment 3 • 10 years ago
Right now, we are processing logs as they appear, in a streaming / aggregating fashion. This is great for log aggregation, but many other types of processing are usually run on a daily, weekly or monthly basis (although for us, the longest interval will be weekly):
- unique visitor analysis (daily)
- fraud detection (daily)
- user segment / categorization analysis (weekly)
When we start considering machine learning / clustering, there will be many more.
Flags: needinfo?(tspurway)
Comment 4 (Reporter) • 10 years ago
I believe all of those types of analysis can be handled through our existing fetch/view/click logs, and we aren't using the cloudfront logs right now. The existing logs are probably a better source for some of those analyses anyway.
That said, I think we'll want to have the cloudfront logs for other types of analysis, e.g., image hotlinking and different types of fraud.
Comment 5 (Reporter) • 10 years ago
mostlygeek, can you set the retention of cloudfront logs to be the same as our other storage with IP addresses, i.e., 7 days?
Assignee: nobody → bwong
Status: NEW → ASSIGNED
Iteration: --- → 37.2
Points: --- → 1
Comment 6 (Assignee) • 10 years ago
Retention has been set to 7 days.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
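For reference, a 7-day retention like the one requested in comment 5 could be expressed as an S3 lifecycle rule. This is a minimal sketch assuming boto3, with the bucket name and key prefix supplied by the caller as hypothetical values; the bug does not record how the retention was actually configured.

```python
def build_retention_rule(prefix: str, days: int) -> dict:
    """Build an S3 lifecycle configuration expiring objects under `prefix` after `days`."""
    if days < 1:
        # S3 lifecycle expiration needs a positive day count, so "0 days"
        # would instead mean not shipping the logs at all.
        raise ValueError("S3 expiration requires a positive number of days")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


def apply_retention(bucket: str, prefix: str, days: int = 7) -> None:
    """Apply the rule to a bucket (bucket and prefix are caller-supplied assumptions)."""
    import boto3  # imported lazily so the rule builder works without AWS libraries

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_retention_rule(prefix, days),
    )
```

Keeping the rule builder separate from the API call makes the policy easy to inspect before it touches a real bucket.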
Updated (Reporter) • 10 years ago
Whiteboard: .? → .009