832353 - Create a method for releng to clone the production graphs DB to the staging graphs DB

Reporter

Description

•

11 years ago

I would like to have a way to clone the production DB to the staging DB as things get out of sync.

Is there a way for releng to trigger such process?

Thanks in advance!

Sheeri Cabral [:sheeri]

Comment 1

•

11 years ago

Armen,

Do you want the cloning to be regularly? (e.g. daily, weekly, monthly) Or just have some kind of button to refresh whenever you need to?

Armen [:armenzg]

Reporter

Comment 2

•

11 years ago

Some of my coworkers where not comfortable with frequent cloning.
I think that monthly and manual would be best.

Sheeri Cabral [:sheeri]

Comment 3

•

11 years ago

Noted. When would you like the first cloning to happen?

Armen [:armenzg]

Reporter

Comment 4

•

11 years ago

The 1st of every month should be good enough. Thanks Sheeri!

Sheeri Cabral [:sheeri]

Comment 5

•

11 years ago

Well, I missed 2/1. Are folks OK with me cloning sometime this week? I can work up some scripts and run them manually this week, and set them to auto-run on the 1st of the month.

Should I do this during the evening, so it doesn't interfere with anything, perhaps? It should only take one evening, but I'm just getting back from some time off so I don't know quite when I'll be all caught up.

Armen [:armenzg]

Reporter

Comment 6

•

11 years ago

Any time this week should be fine.
We don't touch the staging DB regularly.
Just let us know what the window will be and I will let the rest of the team know.

Sheeri Cabral [:sheeri]

Comment 7

•

11 years ago

OK, I'll work on it for tonight. How about after 6 pm Pacific time? I haven't tested how long it will take, but probably just a few hours.

Armen [:armenzg]

Reporter

Comment 8

•

11 years ago

It should not cause any sort of downtime, right?

Are you concern it could affect the performance of reading/writing?

After 6pm works for us.
FTR load starts slowing down after 6pm, there will nevertheless be a lot of jobs running for few hours (since there is a lot of backlog) posting to the graph server past 6pm. Not sure if it affects in anyway what you're doing but worth pointing it out if you needed to know.

Thanks!

Sheeri Cabral [:sheeri]

Comment 9

•

11 years ago

Will the import from production mess up the load? the import drops tables and creates them, then adding the data.

Armen [:armenzg]

Reporter

Comment 10

•

11 years ago

IIUC only staging DB will be unusable.
That's fine.

Sheeri Cabral [:sheeri]

Comment 11

•

11 years ago

 did you want the db dropped and re-created only with the tables in the backup? or just import the backup, which drops and re-creates the tables, but not the whole db (so if there are tables not backed up, like they were created sometime today, they'll be gone)

Sheeri Cabral [:sheeri]

Comment 12

•

11 years ago

[17:59:56] <armenzg>	 sheeri: first time to drop everything and recreate
[18:00:06] <armenzg>	 after that I just care that the data is up-to-date

Nick Thomas [:nthomas] (UTC+12)

Comment 13

•

11 years ago

Drive by comment - if reimporting is slow, perhaps we can neglect the actual test results from production. AIUI the motivation here is having tables like os_list, machines, branches, and tests in sync - the config that we modify as requirements change.

Sheeri Cabral [:sheeri]

Comment 14

•

11 years ago

Hrm, sounds like everything except test_runs and test_run_values, then? Or can I also exclude tests? (the first 2 are the large tables).

I had an error last night trying to import the backup, which is large mostly due to those first 2 tables:


[root@stage1 scabral]# time mysql graphs_stage2_prod < graphs_mozilla_org_new.2013.02.06.sql
ERROR 1197 (HY000) at line 4541: Multi-statement transaction required more than 'max_binlog_cache_size' bytes of storage; increase this mysqld variable and try again

real    50m11.244s
user    2m6.103s
sys     0m7.322s
[root@stage1 scabral]# 

Line 4541 imports into test_run_values....so skipping that would be tremendous. I'll work on some scripts for that.

Sheeri Cabral [:sheeri]

Comment 15

•

11 years ago

Taking out just those 2 tables means that the backup goes from 21G to 20M.

I did the import:

[root@stage1 scabral]# time mysql graphs_stage2_prod < graphs_mozilla_org_new.2013.02.07.sql 

real    0m19.040s
user    0m0.590s
sys     0m0.044s

Can you double-check that everything looks good? Should I truncate test_run_values on stage (test_runs is already truncated).

Armen [:armenzg]

Reporter

Comment 16

•

11 years ago

We're fine with truncation.

We don't really need the data of the tests on staging as we don;t have a way of visualizing it. I think http://graphs.allizom.org/graph.html was to do it but I don't think we have it properly piped.

In other words, we only care about having consistency so our patches on staging can apply on production. We only care about what nthomas mentions.

Sheeri Cabral [:sheeri]

Comment 17

•

11 years ago

Awesome, well, it was done as a one-off yesterday, and I'll set the scripts to truncate for the automation.

Sheeri Cabral [:sheeri]

Comment 18

•

11 years ago

Today I am setting and testing our db logical export script to take parameters like --ignore-tables so we can do this properly....

Sheeri Cabral [:sheeri]

Comment 19

•

11 years ago

I have a script that creates the backup with the appropriate truncation on backup5.db.scl3 and copies it to stage1.db.scl3.

I can continue tomorrow with a script that will import the logical export.

What time should I set the script to run? I can set it to run the first Monday of the month in the morning before folks are working on it, but I'd need to know when that is.

Armen [:armenzg]

Reporter

Comment 20

•

11 years ago

We have nthomas on New Zealand's timezone.
So perhaps the last Saturday of every month works best; do you think this could work?

Thanks for your help!

Sheeri Cabral [:sheeri]

Comment 21

•

11 years ago

Sure, that could work. Shall I have an import done tomorrow, since we haven't done one in 2 months? Then I can set up an auto-import for the last Saturday of this month.

Whiteboard: April 2013

Armen [:armenzg]

Reporter

Comment 22

•

11 years ago

(In reply to Sheeri Cabral [:sheeri] from comment #21)
> Sure, that could work. Shall I have an import done tomorrow, since we
> haven't done one in 2 months? Then I can set up an auto-import for the last
> Saturday of this month.

Yep. That should work. Many thanks!

Sheeri Cabral [:sheeri]

Comment 23

•

11 years ago

Sorry I didn't reply earlier, there was a typo in the script when I ran it from cron (path issue).

I just fixed the script and am running it by hand, to copy the file to stage.

Sheeri Cabral [:sheeri]

Comment 24

•

11 years ago

OK, all the scripts are in place, checked by running by hand, and I have put the cron in script for the 2nd Saturday of the month, and so let's check on Monday to make sure it has recent data.

Sheeri Cabral [:sheeri]

Comment 25

•

11 years ago

Armen -

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → DUPLICATE

Sheeri Cabral [:sheeri]

Comment 26

•

11 years ago

This script had stopped working (i was checking cron errors), and I got it to work again, but it's importing a full production db, not one without tests and test_runs. Working on making this right again.

Status: RESOLVED → REOPENED

Resolution: DUPLICATE → ---

Sheeri Cabral [:sheeri]

Comment 27

•

11 years ago

I figured out what's wrong - I need to be able to pass in another argument to the underlying backup script, for the location of the smaller backup, but the underlying backup script only takes 2 args. And since it's a perl script from 2001, it's using shift to do that. I want to do this the right way and use GetOpts, but it's more than I can do tonight, so I"ll have to look in on it in the morning.

Sheeri Cabral [:sheeri]

Comment 28

•

11 years ago

Had to yum install bcmath for the digest page to work. Had to add date.timezone='UTC' in php.ini to get rid of warnings when using the date() function.

Checksum conversion from hex to dec in "Anemometer.php" line 641 is wrong - firstly, it's using capital letters, whereas MySQL does not, so I was getting that it didn't know what "a" was. Once I did that, calculations were still wrong, e.g. 

checksum hex - ebd4f01d29940465bd2fad27544946b5 
actual decimal - 105866025350050960000
decimal calculated by anemomoeter - 313474216032254030094408162781080143541

I changed all instances of 
        $checksum = $this->translate_checksum(get_var('checksum'));
to
        $checksum = get_var('checksum');
in anemometer.php

And the digest page now works!

Sheeri Cabral [:sheeri]

Comment 29

•

11 years ago

Clearly that last comment is in the wrong bug...sorry!

Sheeri Cabral [:sheeri]

Updated

•

10 years ago

Whiteboard: April 2013 → [2014q1]

Sheeri Cabral [:sheeri]

Comment 30

•

10 years ago

These notes are mostly for me to be able to pick back up if interrupted, but lurkers are always welcome!

Current cron & script:
[root@backup3.db.scl3 cron.d]# more backups_devtools 
MAILTO="infra-dbnotices@mozilla.com"
30 5 * * * root /data/backups/bin/make_and_copy_graphs_to_stage.sh

But that calls db-sqldump.

I have changed db-sqldump (the underlying script) to take in parameters:
--db=s instead of the db hard-coded as the first argument
--extra=s instead of the extra stuff hard-coded as the second argument
--targetdir to be the target directory
--target to be the target name

Need to change: make_and_copy_graphs_to_stage & and all the other scripts that call db-sqldump, to use the new format.

targetdir for make_and_copy_graphs_to_stage will be /data/backups, target is graphs_small in this case, so it doesn't overwrite the regular backup.

Sheeri Cabral [:sheeri]

Comment 31

•

10 years ago

Fixed the underlying scripts, now to test them out...

Sheeri Cabral [:sheeri]

Comment 32

•

10 years ago

Testing worked out, and I changed the target on the import script to be the proper name and tested that. I will wait until Friday to close this out, just to make sure I haven't messed up all the logical export backups.

Sheeri Cabral [:sheeri]

Comment 33

•

10 years ago

The backups are getting extra exports for the other databases on devtools, so there's some minimal tweaking that needs to be done.

But the meat is done and working - the last error we had was 1/23, so things have been importing smoothly since then.

Armen [:armenzg]

Reporter

Comment 34

•

10 years ago

Thanks for keep beating at it!

Brandon Johnson [:cyborgshadow]

Comment 35

•

10 years ago

Nagios alerts went off saying sql backups haven't been running for a few days in scl3. Tracked it back to this bug.

When the script db-sqldump was updated in update 81353, the input option name changed from "db" to "cluster".

Unfortunately, you didn't also change the wrapper script (run-all-sql-backups) that iteratively loops through the execution of db-sqldump, so it still was passing a flag called "db" to the script.

Adjusted the wrapper script in change 81444: "fixed db parameter to be cluster parameter, following in the footsteps of change 81353 which renamed it to cluster inside the executing script, db-sqldump"

bash-3.2$ svn diff -r 81353:81444 run-all-sql-backups
Index: run-all-sql-backups
===================================================================
--- run-all-sql-backups	(revision 81353)
+++ run-all-sql-backups	(revision 81444)
@@ -21,7 +21,7 @@
 date >> $LOG
 /usr/bin/perl /data/backups/bin/stop-sql-threads >> $LOG
 for cluster in $MYSQL_CLUSTERS; do
-  /usr/bin/perl /data/backups/bin/db-sqldump --db=$cluster --targetdir=/data/backups >> $LOG
+  /usr/bin/perl /data/backups/bin/db-sqldump --cluster=$cluster --targetdir=/data/backups >> $LOG
 done
 /usr/bin/perl /data/backups/bin/start-sql-threads >> $LOG
 date >> $LOG

Sheeri Cabral [:sheeri]

Comment 36

•

10 years ago

Backups are all good after Brandon's catch last week....so we can close this now!

Status: REOPENED → RESOLVED

Closed: 11 years ago → 10 years ago

Resolution: --- → FIXED

Sheeri Cabral [:sheeri]

Updated

•

10 years ago

Whiteboard: [2014q1] → [2014q1] February

Nobody; OK to take it and work on it

Updated

•

10 years ago

Product: mozilla.org → Data & BI Services Team