Develop a data retention policy for Input



3 years ago


(Reporter: laura, Assigned: willkg)



(Whiteboard: u=dev c=data p=2 s=input.2015q1)



3 years ago
Similar to bug 629211 for Socorro, we need a data retention policy for Input, stating how long we keep user data and other kinds of data, including weblogs.
We did some data retention work a while back and already have some of the pieces in place, including some text about it on the About page:

I'll dig up the bug numbers for that work and summarize the relevant bits here, along with the code that goes with it, later today. Summarizing it here will make it easier to see whether there are outstanding issues that still need to be fixed.

The things we're definitely missing from the policy are:

1. weblogs: I have no idea where weblogs are kept, who maintains them, where they go, etc. Someone in webops might know. I'll have to track that down.

2. database dumps, backups and other copies of the db: I'll have to talk to Sheeri or whoever maintains this side of things.
Whiteboard: u=dev c=data p= s=input.2014q4
Depends on: 1112317
Data retention:

1. Bug #946456 covered creating a deletion policy for Input. We keep feedback data forever, but delete email addresses and feedback context after 6 months. We created a cron job and some minor infrastructure to handle this. It kicks off every Sunday at 3 am and emails a record of how many items of each type it deleted.

2. Bug #1055785 covered making that data retention policy "public" in the About Input page.

3. Bug #1104934 covered creating the browser data model and data purging code. We keep browser data for 6 months and then delete it.

4. (not done) Bug #1112096 covers deleting journal entries. The journal app is used to log events at the application layer so we can diagnose problems. So far, we've used it for feedback description translations. That's fine. However, a few weeks ago we started using it for heartbeat API errors, which will need to be covered by a data retention policy.

5. (not done) Recently we started collecting heartbeat data. We currently have no data retention policy for this. That's covered in bug #1112317.
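To make the purge described in item 1 concrete, here is a minimal sketch of that kind of job. It is not Input's actual code: the `Feedback` class, the `purge` and `summary` functions, and the 180-day approximation of "6 months" are all assumptions made for illustration.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Optional

# Assumption: "6 months" is approximated as 180 days.
RETENTION = timedelta(days=180)


@dataclass
class Feedback:
    # Hypothetical stand-in for a feedback record; not Input's real model.
    created: datetime
    description: str
    email: Optional[str] = None
    context: Optional[dict] = None


def purge(records, now):
    """Strip email and context from records older than RETENTION.

    The feedback itself (description) is kept forever. Returns a count
    per data type so the result can be reported.
    """
    cutoff = now - RETENTION
    counts = {"email": 0, "context": 0}
    for rec in records:
        if rec.created >= cutoff:
            continue  # still inside the retention window
        if rec.email is not None:
            rec.email = None
            counts["email"] += 1
        if rec.context is not None:
            rec.context = None
            counts["context"] += 1
    return counts


def summary(counts):
    """Format the counts as the body of a report email."""
    return "\n".join(f"{kind}: {n} deleted" for kind, n in sorted(counts.items()))
```

A crontab schedule of `0 3 * * 0` would run such a job at 03:00 every Sunday. Sending the summary by email is left out of the sketch.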

Database dumps:

1. There are 3 days' worth of Input db dumps on sumotools1. We don't keep dumps longer than 3 days.

2. (not done) I sent an email asking about backups and whether there are other copies floating around. I'll reply here when I know more.


Server logs:

1. (not done) I sent an email asking about where server logs are kept and for how long. I'll reply here when I know more.
Depends on: 1112096
Bumping to next quarter.
Whiteboard: u=dev c=data p= s=input.2014q4 → u=dev c=data p= s=input.2015q1
Changing this to a P1. It should get done asap.
Priority: -- → P1
From Sheeri re: db backups:

> We keep input backups daily, for a month, and started keeping
> monthly backups August 1, 2014. (this is for production as well
> as dev/stage).
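As a rough illustration of the backup schedule quoted above, the rule can be written as a small predicate. This is only a sketch: the function name `keep_backup`, treating "a month" as 30 days, and treating the backup taken on the 1st of each month as the monthly one are all assumptions, not details confirmed in this bug.

```python
from datetime import date, timedelta

DAILY_WINDOW = timedelta(days=30)   # assumption: "a month" means 30 days
MONTHLY_START = date(2014, 8, 1)    # monthly backups kept from this date on


def keep_backup(taken, today):
    """Return True if a backup taken on `taken` should still exist on `today`."""
    if taken > today:
        return False  # a backup cannot come from the future
    if today - taken <= DAILY_WINDOW:
        return True   # daily backups are kept for a month
    # Assumption: the monthly backup is the one taken on the 1st of the month.
    return taken.day == 1 and taken >= MONTHLY_START
```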
I talked to Lonnen and codified the data retention policy here:

We'll update that and treat it as the source of truth. Plus it gives me a place to point people to. I compared it to what Socorro has and I think we've covered most/all the bases except the "weblogs" one. I'm pretty sure we don't have anything interesting in the weblogs. It'd be whatever IT normally does.

Even with that caveat, I'm going to mark this as FIXED.

Laura: If you think I'm in error, please reopen.
Last Resolved: 3 years ago
Resolution: --- → FIXED
Marking this as 2 points retroactively.
Whiteboard: u=dev c=data p= s=input.2015q1 → u=dev c=data p=2 s=input.2015q1