Closed Bug 1109882 Opened 10 years ago Closed 10 years ago

Set appropriate log retention policy for cloudfront

Categories

(Content Services Graveyard :: Tiles: Ops, defect)

defect
Not set
normal
Points:
1

Tracking

(Not tracked)

RESOLVED FIXED
Iteration:
37.2

People

(Reporter: Mardak, Assigned: mostlygeek)

Details

(Whiteboard: .009)

Our retention policy for our view and click pings is to have the raw data with IP addresses for at most 7 days. It sounds like we happen to be getting access logs from cloudfront in an S3 bucket. We aren't processing that data right now, so it would make sense to set it to 0 days of retention. If we do have a need to look at those, it would be reasonable to extend it to 7 days as we have for our pings. oyiptong/tspurway, any reason right now to keep the logs?
Flags: needinfo?(tspurway)
Flags: needinfo?(oyiptong)
If we don't need the logs we can simply turn off shipping of logs to S3 from cloudfront.
There may be a reason to keep them for 7 days. We could use IP addresses as additional data for fraud detection. For instance: image downloads (or even HEAD requests) should occur soon after a fetch happens. After a fetch, clicks/impressions could occur shortly after. That said, we'd need to think about this more, perhaps.
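As a rough illustration of the timing idea above, here is a minimal sketch that flags clicks/impressions from an IP that had no fetch shortly beforehand. The event tuple shape `(ip, kind, timestamp)`, the function name `clicks_without_recent_fetch`, and the 30-minute window are all assumptions made up for this example; nothing in this bug specifies them.

```python
from datetime import datetime, timedelta


def clicks_without_recent_fetch(events, window=timedelta(minutes=30)):
    """Return click/impression events whose IP had no fetch within `window` before them.

    `events` is an iterable of (ip, kind, timestamp) tuples (hypothetical shape).
    """
    last_fetch = {}   # ip -> timestamp of the most recent fetch seen so far
    suspicious = []
    # Process events in chronological order so "most recent fetch" is meaningful.
    for ip, kind, ts in sorted(events, key=lambda e: e[2]):
        if kind == "fetch":
            last_fetch[ip] = ts
        elif kind in ("click", "impression"):
            seen = last_fetch.get(ip)
            if seen is None or ts - seen > window:
                suspicious.append((ip, kind, ts))
    return suspicious
```

Something like this would only be a first filter; a real check would need the thinking mentioned above about what a normal gap between fetch and click looks like.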
Flags: needinfo?(oyiptong)
Right now, we are processing logs as they appear in a streaming / aggregating fashion. This is great for log aggregation, but there are many other types of processing that are usually run on a daily, weekly or monthly basis (although for us, the highest granularity will be weekly):
- unique visitor analysis (daily)
- fraud detection (daily)
- user segment / categorization analysis (weekly)
When we start considering machine learning / clustering, there are many more.
Flags: needinfo?(tspurway)
I believe all of those types of analysis can be handled through our existing fetch/view/click logs, and we aren't using the cloudfront logs right now. The existing logs are probably a better source for some of those analyses anyway. I think we'll want to have the cloudfront logs for different types of analysis, e.g., image hotlinking and different types of fraud.
mostlygeek, can you set the retention of cloudfront logs to be the same as our other storage with IP addresses, i.e., 7 days?
Assignee: nobody → bwong
Status: NEW → ASSIGNED
Iteration: --- → 37.2
Points: --- → 1
retention has been set to 7 days.
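For reference, one way a 7-day retention could be expressed is an S3 lifecycle rule that expires objects under the log prefix. The bug does not say how the setting was actually applied, so the following is only a sketch: the helper name `expiration_rule` and the `cloudfront-logs/` prefix are hypothetical.

```python
def expiration_rule(prefix, days=7):
    """Build an S3 lifecycle configuration that expires objects under `prefix`.

    The prefix and rule ID are illustrative assumptions, not values from this bug.
    """
    # As far as I know, S3 only accepts a positive number of days here, so a
    # "0 days" policy would mean turning off log delivery rather than using a rule.
    if days < 1:
        raise ValueError("days must be a positive integer")
    return {
        "Rules": [
            {
                "ID": f"expire-after-{days}-days",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Expiration": {"Days": days},
            }
        ]
    }


config = expiration_rule("cloudfront-logs/", days=7)
```

A dict of this shape could be passed as the `LifecycleConfiguration` argument to boto3's `put_bucket_lifecycle_configuration`, with the real bucket name supplied by whoever runs it.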
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Whiteboard: .? → .009