Closed
Bug 832353
Opened 11 years ago
Closed 10 years ago
Create a method for releng to clone the production graphs DB to the staging graphs DB
Categories
(Data & BI Services Team :: DB: MySQL, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: armenzg, Unassigned)
Details
(Whiteboard: [2014q1] February)
I would like to have a way to clone the production DB to the staging DB as things get out of sync. Is there a way for releng to trigger such process? Thanks in advance!
Comment 1•11 years ago
Armen, do you want the cloning to happen regularly (e.g. daily, weekly, monthly)? Or would you rather have some kind of button to refresh whenever you need to?
Reporter
Comment 2•11 years ago
Some of my coworkers were not comfortable with frequent cloning. I think that monthly and manual would be best.
Comment 3•11 years ago
Noted. When would you like the first cloning to happen?
Reporter
Comment 4•11 years ago
The 1st of every month should be good enough. Thanks Sheeri!
Comment 5•11 years ago
Well, I missed 2/1. Are folks OK with me cloning sometime this week? I can work up some scripts and run them manually this week, and set them to auto-run on the 1st of the month. Should I do this during the evening, so it doesn't interfere with anything, perhaps? It should only take one evening, but I'm just getting back from some time off so I don't know quite when I'll be all caught up.
Reporter
Comment 6•11 years ago
Any time this week should be fine. We don't touch the staging DB regularly. Just let us know what the window will be and I will let the rest of the team know.
Comment 7•11 years ago
OK, I'll work on it for tonight. How about after 6 pm Pacific time? I haven't tested how long it will take, but probably just a few hours.
Reporter
Comment 8•11 years ago
It should not cause any sort of downtime, right? Are you concerned it could affect read/write performance? After 6pm works for us. FTR, load starts slowing down after 6pm, but there will nevertheless be a lot of jobs running for a few hours (since there is a lot of backlog) posting to the graph server past 6pm. Not sure if that affects what you're doing in any way, but worth pointing out in case you needed to know. Thanks!
Comment 9•11 years ago
Will the import from production mess up the load? The import drops the tables and re-creates them, then adds the data.
Reporter
Comment 10•11 years ago
IIUC only the staging DB will be unusable. That's fine.
Comment 11•11 years ago
Did you want the db dropped and re-created with only the tables in the backup? Or just import the backup, which drops and re-creates the tables but not the whole db (so any tables not in the backup, like ones created sometime today, would be gone)?
Comment 12•11 years ago
[17:59:56] <armenzg> sheeri: first time to drop everything and recreate
[18:00:06] <armenzg> after that I just care that the data is up-to-date
Comment 13•11 years ago
Drive by comment - if reimporting is slow, perhaps we can neglect the actual test results from production. AIUI the motivation here is having tables like os_list, machines, branches, and tests in sync - the config that we modify as requirements change.
Comment 14•11 years ago
Hrm, sounds like everything except test_runs and test_run_values, then? Or can I also exclude tests? (The first 2 are the large tables.) I had an error last night trying to import the backup, which is large mostly due to those first 2 tables:
[root@stage1 scabral]# time mysql graphs_stage2_prod < graphs_mozilla_org_new.2013.02.06.sql
ERROR 1197 (HY000) at line 4541: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again
real 50m11.244s
user 2m6.103s
sys 0m7.322s
[root@stage1 scabral]#
Line 4541 imports into test_run_values....so skipping that would be tremendous. I'll work on some scripts for that.
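As a hedged sketch of the kind of dump command being discussed here: mysqldump's real `--ignore-table` flag takes `db.table` pairs, so the two large tables can be skipped per-table. The DB and table names below come from this bug, but the actual releng script is not shown here, so this is illustrative only.

```shell
# Build a mysqldump invocation that skips the two large result
# tables, so only the schema/config data is dumped (sketch only;
# DB/table names taken from the comments in this bug).
DB=graphs_mozilla_org_new
IGNORE=""
for t in test_runs test_run_values; do
  IGNORE="$IGNORE --ignore-table=${DB}.${t}"
done
# Print the command rather than running it, since no server is assumed here.
echo mysqldump --single-transaction $IGNORE "$DB"
```

Note that `--ignore-table` must be repeated once per excluded table; there is no comma-separated form.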
Comment 15•11 years ago
Taking out just those 2 tables means that the backup goes from 21G to 20M. I did the import:
[root@stage1 scabral]# time mysql graphs_stage2_prod < graphs_mozilla_org_new.2013.02.07.sql
real 0m19.040s
user 0m0.590s
sys 0m0.044s
Can you double-check that everything looks good? Should I truncate test_run_values on stage? (test_runs is already truncated.)
Reporter
Comment 16•11 years ago
We're fine with truncation. We don't really need the test data on staging as we don't have a way of visualizing it. I think http://graphs.allizom.org/graph.html was meant to do it but I don't think we have it properly piped. In other words, we only care about having consistency so our patches on staging can apply on production. We only care about what nthomas mentions.
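The truncation step from comments 15-16 amounts to emptying the two large tables on stage while keeping their schema in sync with production. A minimal sketch (table names from this bug; the actual releng script is not shown):

```shell
# Emit TRUNCATE statements for the two large result tables so they
# stay present but empty on stage (sketch only; in practice the
# output would be piped into: mysql graphs_stage2_prod).
gen_truncate() {
  for t in test_runs test_run_values; do
    printf 'TRUNCATE TABLE %s;\n' "$t"
  done
}
gen_truncate
```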
Comment 17•11 years ago
Awesome, well, it was done as a one-off yesterday, and I'll set the scripts to truncate for the automation.
Comment 18•11 years ago
Today I am setting and testing our db logical export script to take parameters like --ignore-tables so we can do this properly....
Comment 19•11 years ago
I have a script that creates the backup with the appropriate truncation on backup5.db.scl3 and copies it to stage1.db.scl3. I can continue tomorrow with a script that will import the logical export. What time should I set the script to run? I can set it to run the first Monday of the month in the morning before folks are working on it, but I'd need to know when that is.
Reporter
Comment 20•11 years ago
We have nthomas on New Zealand's timezone. So perhaps the last Saturday of every month works best; do you think this could work? Thanks for your help!
Comment 21•11 years ago
Sure, that could work. Shall I have an import done tomorrow, since we haven't done one in 2 months? Then I can set up an auto-import for the last Saturday of this month.
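cron has no direct notion of "last Saturday of the month", so scheduling this takes a small workaround. One common pattern (an assumption about how it could be scheduled, not necessarily what was deployed) is to run every Saturday and bail out unless the following week falls in a new month. GNU `date -d` is assumed:

```shell
# A Saturday is the month's last one exactly when the date 7 days
# later lands in a different month (GNU date -d assumed).
is_last_saturday() {
  [ "$(date -d "$1 +7 days" +%m)" != "$(date -d "$1" +%m)" ]
}
# 2013-04-27 was the last Saturday of April 2013:
is_last_saturday "2013-04-27" && echo run || echo skip
```

In a crontab this would run every Saturday (`* * 6`) with the check guarding the actual dump command.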
Whiteboard: April 2013
Reporter
Comment 22•11 years ago
(In reply to Sheeri Cabral [:sheeri] from comment #21)
> Sure, that could work. Shall I have an import done tomorrow, since we
> haven't done one in 2 months? Then I can set up an auto-import for the last
> Saturday of this month.
Yep. That should work. Many thanks!
Comment 23•11 years ago
Sorry I didn't reply earlier, there was a typo in the script when I ran it from cron (path issue). I just fixed the script and am running it by hand, to copy the file to stage.
Comment 24•11 years ago
OK, all the scripts are in place, checked by running them by hand, and I have put the cron entry in place for the 2nd Saturday of the month, so let's check on Monday to make sure it has recent data.
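For "2nd Saturday of the month" the classic crontab trick applies (a sketch, not the actual cron file, which only appears later in comment 30): cron ORs day-of-month with day-of-week when both are restricted, so instead you restrict day-of-month to 8-14 and test the weekday in the command itself. The script path is taken from comment 30; the scheduling line is hypothetical.

```shell
# Hypothetical crontab line: 05:30 on days 8-14, firing only when
# that day is a Saturday (date +%u prints 6 for Saturday). In a
# real crontab, % must be escaped as \% (cron treats % as newline).
CRON='30 5 8-14 * * root [ "$(date +\%u)" = 6 ] && /data/backups/bin/make_and_copy_graphs_to_stage.sh'
echo "$CRON"
```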
Comment 25•11 years ago
Armen -
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Comment 26•11 years ago
This script had stopped working (I was checking cron errors), and I got it to work again, but it's importing a full production db, not one without tests and test_runs. Working on making this right again.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
Comment 27•11 years ago
I figured out what's wrong: I need to be able to pass in another argument to the underlying backup script, for the location of the smaller backup, but the underlying backup script only takes 2 args. And since it's a perl script from 2001, it's using shift to do that. I want to do this the right way and use GetOpts, but it's more than I can do tonight, so I'll have to look at it in the morning.
Comment 28•11 years ago
Had to yum install bcmath for the digest page to work. Had to add date.timezone='UTC' in php.ini to get rid of warnings when using the date() function.
The checksum conversion from hex to dec in Anemometer.php line 641 is wrong. Firstly, it's using capital letters, whereas MySQL does not, so I was getting that it didn't know what "a" was. Once I fixed that, calculations were still wrong, e.g.:
checksum hex - ebd4f01d29940465bd2fad27544946b5
actual decimal - 105866025350050960000
decimal calculated by anemometer - 313474216032254030094408162781080143541
I changed all instances of
$checksum = $this->translate_checksum(get_var('checksum'));
to
$checksum = get_var('checksum');
in anemometer.php, and the digest page now works!
Comment 29•11 years ago
Clearly that last comment is in the wrong bug...sorry!
Updated•10 years ago
Whiteboard: April 2013 → [2014q1]
Comment 30•10 years ago
These notes are mostly for me to be able to pick back up if interrupted, but lurkers are always welcome!
Current cron & script:
[root@backup3.db.scl3 cron.d]# more backups_devtools
MAILTO="infra-dbnotices@mozilla.com"
30 5 * * * root /data/backups/bin/make_and_copy_graphs_to_stage.sh
But that calls db-sqldump. I have changed db-sqldump (the underlying script) to take in parameters:
--db=s instead of the db hard-coded as the first argument
--extra=s instead of the extra stuff hard-coded as the second argument
--targetdir to be the target directory
--target to be the target name
Need to change: make_and_copy_graphs_to_stage and all the other scripts that call db-sqldump, to use the new format. targetdir for make_and_copy_graphs_to_stage will be /data/backups; target is graphs_small in this case, so it doesn't overwrite the regular backup.
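Comment 27's fix in Perl would be Getopt::Long, but the core idea, replacing positional `shift` arguments with named `--opt=value` flags, can be sketched in shell as well. The option names below come from comment 30; the parsing logic is illustrative, not the releng script:

```shell
# Minimal long-option parsing sketch: accept --db=, --targetdir=,
# --target= in any order instead of relying on argument position.
parse() {
  for arg in "$@"; do
    case "$arg" in
      --db=*)        db="${arg#--db=}" ;;
      --targetdir=*) targetdir="${arg#--targetdir=}" ;;
      --target=*)    target="${arg#--target=}" ;;
    esac
  done
  echo "$db $targetdir $target"
}
parse --db=graphs --targetdir=/data/backups --target=graphs_small
```

Named options are what makes the later `--db` to `--cluster` rename in comment 35 a one-line change on each side, rather than a reshuffling of positional arguments.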
Comment 31•10 years ago
Fixed the underlying scripts, now to test them out...
Comment 32•10 years ago
Testing worked out, and I changed the target on the import script to be the proper name and tested that. I will wait until Friday to close this out, just to make sure I haven't messed up all the logical export backups.
Comment 33•10 years ago
The backups are getting extra exports for the other databases on devtools, so there's some minimal tweaking that needs to be done. But the meat is done and working - the last error we had was 1/23, so things have been importing smoothly since then.
Reporter
Comment 34•10 years ago
Thanks for keeping at it!
Comment 35•10 years ago
Nagios alerts went off saying sql backups haven't been running for a few days in scl3. Tracked it back to this bug. When the script db-sqldump was updated in change 81353, the input option name changed from "db" to "cluster". Unfortunately, you didn't also change the wrapper script (run-all-sql-backups) that iteratively loops through the execution of db-sqldump, so it was still passing a flag called "db" to the script. Adjusted the wrapper script in change 81444: "fixed db parameter to be cluster parameter, following in the footsteps of change 81353 which renamed it to cluster inside the executing script, db-sqldump"
bash-3.2$ svn diff -r 81353:81444 run-all-sql-backups
Index: run-all-sql-backups
===================================================================
--- run-all-sql-backups (revision 81353)
+++ run-all-sql-backups (revision 81444)
@@ -21,7 +21,7 @@
 date >> $LOG
 /usr/bin/perl /data/backups/bin/stop-sql-threads >> $LOG
 for cluster in $MYSQL_CLUSTERS; do
-  /usr/bin/perl /data/backups/bin/db-sqldump --db=$cluster --targetdir=/data/backups >> $LOG
+  /usr/bin/perl /data/backups/bin/db-sqldump --cluster=$cluster --targetdir=/data/backups >> $LOG
 done
 /usr/bin/perl /data/backups/bin/start-sql-threads >> $LOG
 date >> $LOG
Comment 36•10 years ago
Backups are all good after Brandon's catch last week....so we can close this now!
Status: REOPENED → RESOLVED
Closed: 11 years ago → 10 years ago
Resolution: --- → FIXED
Updated•10 years ago
Whiteboard: [2014q1] → [2014q1] February
Updated•10 years ago
Product: mozilla.org → Data & BI Services Team