Closed Bug 850644 Opened 13 years ago Closed 12 years ago

Create daily cron job to export sumo production db on (new) sumotools

Categories

(Data & BI Services Team :: DB: MySQL, task)

Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: rrosario, Assigned: scabral)

Details

(Whiteboard: [2013q2] April)

Every so often, we (sumodevs) refresh our local development database with production data. Historically, we have grabbed a copy that is always available on the old sumotools (which is going away soon!). As soon as we copy and import the database locally, we run an "anonymizer" script to remove email addresses, session data, etc. It would be great if our dump just excluded this data from the start. This is our current anonymizer script: https://github.com/mozilla/kitsune/blob/master/scripts/anonymize.sql
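The "excluded from the start" idea could also be approximated by scrubbing the dump file itself before it is published. A minimal sketch, assuming the dump is plain text; the regex, file names, and table contents here are illustrative only, not part of the real anonymize.sql (which runs as SQL after import):

```shell
# Toy dump line containing an email address (stand-in for real dump output).
echo 'INSERT INTO `auth_user` VALUES (1,"rrosario","rrosario@example.com");' > users.sql

# Replace anything that looks like an email address before the dump ships.
sed -E 's/[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}/redacted@example.com/g' \
  users.sql > users.scrubbed.sql
```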
Making this a q2 goal, so it happens sooner rather than later. In the meantime, if you need a production export, we can get you a full one just by copying a file, so do not hesitate to ask for that whenever you need it.
Whiteboard: [2013q2]
Sheeri, I tried my hand at this since I needed a db dump today. It works, but it's probably horrible; feel free to throw it away and do it properly. Since there is over 4GB of unnecessary session data in the db, I decided to dump the schema first and then dump the tables excluding the ones specified in the anonymize script. That reduced the time for a dump to 5 minutes and the zipped file size to ~800MB. I couldn't find a way to execute simple SQL queries like in the anonymize script on a dumped db file. At first glance it looks fairly simple to convert them to regexes, though, but I didn't attempt that. I added the following to the crontab, executing every day at 0:15, plus a purge of all dumps older than 30 days.

dump_support_mozilla.com.sh:

#structure only
mysqldump -u sumo -pfoxkeh --single-transaction --no-data support_mozilla_com > /data/db_dumps/support_mozilla_com/support_mozilla_com-$(date +%Y%m%d).sql

#data too, but ignoring some tables
mysqldump -u sumo -pfoxkeh --single-transaction support_mozilla_com \
  --ignore-table=support_mozilla_com.tidings_watchfilter \
  --ignore-table=support_mozilla_com.tidings_watch \
  --ignore-table=support_mozilla_com.django_session \
  --ignore-table=support_mozilla_com.messages_inboxmessage \
  --ignore-table=support_mozilla_com.messages_outboxmessage_to \
  --ignore-table=support_mozilla_com.messages_outboxmessage \
  >> /data/db_dumps/support_mozilla_com/support_mozilla_com-$(date +%Y%m%d).sql

#zip
cd /data/db_dumps/support_mozilla_com/
tar -czf support_mozilla_com-$(date +%Y%m%d).sql.tar.gz support_mozilla_com-$(date +%Y%m%d).sql

#remove raw file
rm /data/db_dumps/support_mozilla_com/support_mozilla_com-$(date +%Y%m%d).sql
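The "convert them to regexes" idea floated above is straightforward for the statements that only delete whole tables' rows: mysqldump (with its default --extended-insert) emits one `INSERT INTO \`table\` ...` line per table, so dropping matching lines removes that table's data. A hedged sketch against a toy dump file, not the script that was actually deployed:

```shell
# Toy dump standing in for real mysqldump output.
printf '%s\n' \
  'INSERT INTO `django_session` VALUES ("abc123","...");' \
  'INSERT INTO `questions_question` VALUES (1,"How do I...");' > dump.sql

# Delete every INSERT line for the sensitive table; the schema
# (CREATE TABLE statements) is left untouched.
sed '/^INSERT INTO `django_session`/d' dump.sql > dump.filtered.sql
```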
Oh, we already have the backups. We also have a way to ignore the tables you're ignoring, if that's what you'd like. The problem is copying the file over, making sure
Whiteboard: [2013q2] → [2013q2] April
whoops.... ...making sure the firewall is open, getting the timing right so first the export happens, then the copy happens, then the import happens. :D
Assignee: server-ops-database → scabral
Hrm, your list looks very similar to what we do in: https://bugzilla.mozilla.org/show_bug.cgi?id=816972#c12

Can we just give you a backup of stage, then? (Or can you use stage? Not sure what purposes these things are used for.)
(In reply to Sheeri Cabral [:sheeri] from comment #5)
> Can we just give you a backup of stage, then? (or can you use stage? not
> sure what purposes these things are used for)

No, because stage has lots of QA data and will be up to 30 days old. We like our data fresh :)
Right! Because that's only on the 1st of the month. OK, I know what I have to work with now.
Sheeri, since the production db is already replicated on sumotools we don't need to do any copying from other machines. We just need a dump of the one on sumotools. I just wanted to check if everything is okay with the dumps the cronjob is creating right now, but the server seems to be down :(
sumotools is up, what's the problem you're having?
Also - my apologies. I have been working on several bugs which call for an import of a production backup, which is why I was focusing on that. Yes, certainly we can do a daily mysqldump export of that stuff, that's much easier than what I was thinking.
I worked on this script yesterday, and it ran perfectly overnight:

[root@sumotools1.webapp.phx1 support_mozilla_com]# ls -rlth
total 1.8G
-rw-r--r-- 1 root root 884M Apr 25 13:29 support_mozilla_com.2013.04.25.sql.tar.gz
-rw-r--r-- 1 root root   98 Apr 26 01:52 export.log
-rw-r--r-- 1 root root 885M Apr 26 01:56 support_mozilla_com.2013.04.26.sql.tar.gz

The 4/25 one is one I did manually; the 4/26 was done via cron. I have set it to delete old backups after 7 days.
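The 7-day retention isn't shown in the bug; one plausible way to implement it is a find sweep in the same cron job. A sketch demonstrated on a temp directory (on sumotools the real dump directory would replace it, and the setup lines would not exist):

```shell
# Hypothetical retention sweep, demonstrated on a temp directory.
backup_dir=$(mktemp -d)
touch -d '10 days ago' "$backup_dir/support_mozilla_com.2013.04.18.sql.tar.gz"
touch "$backup_dir/support_mozilla_com.2013.04.26.sql.tar.gz"

# -mtime +7 matches files last modified more than 7 full days ago;
# -delete removes them in place.
find "$backup_dir" -name '*.sql.tar.gz' -mtime +7 -delete
```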
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
The exports are in: /data/exports/support_mozilla_com
Product: mozilla.org → Data & BI Services Team