defrag sentry_log table periodically (was: Querying sentry_log on tm-bouncer01-slave01:bouncer causes replication lag)

Status: VERIFIED FIXED
Product: Data & BI Services Team
DB: MySQL
Reporter: nthomas
Assigned: sheeri
Reported: 6 years ago
Last updated: 4 years ago

(Reporter)

Description

6 years ago
If I make a query on the sentry_log table of tm-bouncer01-slave01 there is often no response, and within a few minutes Nagios reports replication lag. Since I don't want to break it frequently I haven't experimented much, but queries like this appear fine:
  select log_date, mirror_id from sentry_log limit 10;
but adding any other field causes it to hang.

This table will be turning over quite rapidly as we write logs for 238 mirrors every 5 minutes, most of them 64k in size. There must be an expiry job if the table is only 63k rows yet 50GB in size, so the problem might be scanning all those deleted rows? It's a MyISAM table.
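
For reference, a rough way to check how much of that size is just deleted-row overhead (a sketch; for MyISAM, Data_free counts bytes still allocated to deleted rows):

SHOW TABLE STATUS LIKE 'sentry_log'\G
-- compare Data_length (live rows) with Data_free (space held by deleted rows);
-- a Data_free that dwarfs Data_length would confirm the fragmentation theory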
(Assignee)

Comment 1

6 years ago
Yes, that'd be a big problem - tons of fragmentation.
(Assignee)

Comment 2

6 years ago
Replication lags because MyISAM uses table-level locking: while a long read is running on the table, nothing can write to it, including the replication SQL thread, so the slave falls behind.
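
A quick way to see the blocking while one of those long reads is running (a sketch; nothing specific to our monitoring setup is assumed):

SHOW FULL PROCESSLIST;
-- the replication SQL thread's INSERT/DELETE on sentry_log sits in the 'Locked'
-- state until the read finishes
SHOW SLAVE STATUS\G
-- Seconds_Behind_Master keeps climbing for as long as the SQL thread is blocked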
(Assignee)

Comment 3

6 years ago
Also, 65k rows could be a lot of data, since there is a text field there... maybe not 52G, but still.

CREATE TABLE `sentry_log` (
  `log_date` datetime NOT NULL,
  `check_time` timestamp NOT NULL default CURRENT_TIMESTAMP,
  `mirror_id` int(10) unsigned NOT NULL,
  `mirror_active` enum('0','1') NOT NULL,
  `mirror_rating` int(11) NOT NULL,
  `reason` text,
  UNIQUE KEY `log_entry` (`mirror_id`,`log_date`),
  KEY `log_date_idx` (`log_date`),
  KEY `log_entry2` (`mirror_id`,`check_time`)
) ENGINE=MyISAM DEFAULT CHARSET=utf8
1 row in set (0.00 sec)
(Assignee)

Comment 4

6 years ago
mysql> optimize table sentry_log;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| bouncer.sentry_log | optimize | status   | OK       | 
+--------------------+----------+----------+----------+
1 row in set (31 min 15.34 sec)

Took 31 minutes on tm-bouncer01-slave02, which is not in production use. The data file was 52G before the defrag and 3.9G after:

-rw-rw---- 1 mysql mysql 3.9G Mar 27 21:39 sentry_log.MYD

I am working on a saner backfill for tm-bouncer01-slave01, so that the table isn't offline for 30 minutes.

Let's talk about how we can better use this table to avoid this kind of fragmentation in the future. I'd recommend partitioning, but that isn't a feature until MySQL 5.1 and we're on 5.0 for tm-bouncer01.
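
For the record, a rough sketch of what that could look like once we're on 5.1 (partition names and boundaries are placeholders, not a worked-out plan); since log_date is part of the table's only unique key, range partitioning on it is allowed:

ALTER TABLE bouncer.sentry_log
  PARTITION BY RANGE (TO_DAYS(log_date)) (
    PARTITION p2011w12 VALUES LESS THAN (TO_DAYS('2011-03-28')),
    PARTITION p2011w13 VALUES LESS THAN (TO_DAYS('2011-04-04')),
    PARTITION pmax     VALUES LESS THAN MAXVALUE
  );

-- expiry then becomes a metadata operation instead of a fragmenting DELETE:
ALTER TABLE bouncer.sentry_log DROP PARTITION p2011w12;

New weeks would get split out of pmax with REORGANIZE PARTITION as they roll over.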
(Assignee)

Comment 5

6 years ago
Still need to defragment tp-bouncer01-master01.phx.mozilla.com and slave03, but I'm too weary tonight to work on it and they aren't critical IIRC.
(Assignee)

Comment 6

6 years ago
Defragmenting the tp-bouncer01-master01 cluster and slave03 now.
(Assignee)

Comment 7

6 years ago
Everything has been defragmented. Keeping this open because we should defrag the table periodically.
Summary: Querying sentry_log on tm-bouncer01-slave01:bouncer causes replication lag → defrag sentry_log table periodically was: Querying sentry_log on tm-bouncer01-slave01:bouncer causes replication lag
(Assignee)

Comment 8

6 years ago
(or we could use partitioning, but that's MySQL 5.1 and up and we're only on MySQL 5.0 right now).
(Assignee)

Updated

6 years ago
Assignee: server-ops-database → scabral
There was supposedly an hourly cron job on tm-b01-master01 which deleted any rows older than 7 days, 500(?) rows at a time, to prevent lag...
Of course, that host is gone now because everything moved... probably that cron job didn't move with it?
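
If anyone wants to resurrect it, a hypothetical reconstruction of what that job probably ran (retention and batch size are taken from the recollection above, so treat them as guesses):

-- run hourly from cron, repeating while rows match; the small LIMIT keeps each
-- statement (and the table lock it takes) short enough to avoid replication lag
DELETE FROM bouncer.sentry_log
 WHERE log_date < NOW() - INTERVAL 7 DAY
 LIMIT 500;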
(Assignee)

Comment 11

6 years ago
Nope, tm-bouncer01 hasn't moved yet. tm-b01-master01 has moved, but that's not what this server is on.

However, when it does move, we'll have a window to upgrade and put partitioning in place.
(Assignee)

Comment 12

6 years ago
mysql> optimize table sentry_log;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| bouncer.sentry_log | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (7.25 sec)

This was on the Phoenix master, and the file went from 4.1G to 1.5G. Not bad.

This was last defragmented 3/29, so it's only been about a month. I think we can defrag weekly, maybe Sundays at noon Pacific, so that if it causes lag it doesn't wake anyone up?
(Assignee)

Comment 13

6 years ago
Just did it manually now and it's fine:

mysql> \! ls -lh /var/lib/mysql/bouncer/sentry_log.*
-rw-rw---- 1 mysql mysql 8.6K Mar 31  2011 /var/lib/mysql/bouncer/sentry_log.frm
-rw-rw---- 1 mysql mysql 2.4G May 23 15:40 /var/lib/mysql/bouncer/sentry_log.MYD
-rw-rw---- 1 mysql mysql 2.3M May 23 15:40 /var/lib/mysql/bouncer/sentry_log.MYI
mysql> optimize table bouncer.sentry_log;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| bouncer.sentry_log | optimize | status   | OK       | 
+--------------------+----------+----------+----------+
1 row in set (9.21 sec)

mysql> \! ls -lh /var/lib/mysql/bouncer/sentry_log.*
-rw-rw---- 1 mysql mysql 8.6K Mar 31  2011 /var/lib/mysql/bouncer/sentry_log.frm
-rw-rw---- 1 mysql mysql 2.4G May 23 15:40 /var/lib/mysql/bouncer/sentry_log.MYD
-rw-rw---- 1 mysql mysql 1.9M May 23 15:40 /var/lib/mysql/bouncer/sentry_log.MYI

This should definitely be made into a MySQL event (if possible; if not, a cron job).
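
Rough sketch of the event route (untested; the event name and STARTS time below are placeholders, it needs event_scheduler=ON, and events only exist in 5.1+, so cron stays the fallback — also fall back to cron if the scheduler objects to a statement that returns a result set in the body):

CREATE EVENT IF NOT EXISTS defrag_sentry_log
  ON SCHEDULE EVERY 1 WEEK
  STARTS '2012-05-27 19:00:00'  -- Sunday noon Pacific expressed in UTC; adjust as needed
  DO OPTIMIZE TABLE bouncer.sentry_log;

The cron version would just be a weekly crontab entry running mysql -e 'OPTIMIZE TABLE bouncer.sentry_log'.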
(Assignee)

Comment 14

6 years ago
Did this manually today, but it looks like the table was already pretty well defragmented:

mysql> \! ls -lrth sentry_log.M*
-rw-rw---- 1 mysql mysql 2.5M Jul 16 13:09 sentry_log.MYI
-rw-rw---- 1 mysql mysql 2.6G Jul 16 13:09 sentry_log.MYD
mysql> optimize table bouncer.sentry_log;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| bouncer.sentry_log | optimize | status   | OK       | 
+--------------------+----------+----------+----------+
1 row in set (10.58 sec)

mysql> \! ls -lrth sentry_log.M*
-rw-rw---- 1 mysql mysql 2.6G Jul 16 13:09 sentry_log.MYD
-rw-rw---- 1 mysql mysql 2.1M Jul 16 13:09 sentry_log.MYI
mysql>
(Assignee)

Comment 15

6 years ago
Nick, I just checked today, and there was no fragmentation at all on the table:

mysql> \! ls -lh */*sentry_log*
-rw-rw---- 1 mysql mysql 8.6K Jul 25 12:07 bouncer/sentry_log.frm
-rw-rw---- 1 mysql mysql 264M Dec 21 11:00 bouncer/sentry_log.MYD
-rw-rw---- 1 mysql mysql  50K Dec 21 11:00 bouncer/sentry_log.MYI
mysql> optimize table bouncer.sentry_log;
+--------------------+----------+----------+----------+
| Table              | Op       | Msg_type | Msg_text |
+--------------------+----------+----------+----------+
| bouncer.sentry_log | optimize | status   | OK       |
+--------------------+----------+----------+----------+
1 row in set (0.00 sec)

mysql> \! ls -lh */*sentry_log*
-rw-rw---- 1 mysql mysql 8.6K Jul 25 12:07 bouncer/sentry_log.frm
-rw-rw---- 1 mysql mysql 264M Dec 21 11:00 bouncer/sentry_log.MYD
-rw-rw---- 1 mysql mysql  50K Dec 21 11:02 bouncer/sentry_log.MYI

Have you changed the way you handle sentry_log stuff on bouncer? Does it no longer do those deletions all the time?
(Reporter)

Comment 16

6 years ago
Only 5 logs will be coming in every 5 minutes now that we've moved downloads to a CDN and disabled all the volunteer mirrors (cf. the 238 logs in comment #0). I don't know if there is a cron doing an optimize.

jakem has made some changes to the bouncer code, but AFAIK that was just enhancing the content rather than anything else.

Comment 17

6 years ago
I don't believe there is such a cron (at least, not anymore... it may have existed back before the SCL3 migrations), unless it lives directly on a database server somewhere. I don't know where the DELETEs are issued from either, though, so it's possible the same piece of the puzzle handles both things.

Since the table shrank so much between comment 14 and comment 15, I'm inclined to believe something *is* happening, somewhere.

The turnover on this table *is* much smaller now than it used to be, which would explain why the problem seems much reduced compared to before.

Not only are there far fewer records now, but I suspect they're much more likely to be the same size from one run to the next. Perhaps we're simply not getting much fragmentation anymore because MyISAM can keep reusing the holes?

@sheeri: feel free to set up a MySQL scheduled event if you'd like... I don't know anything about how they work. I suspect monthly would be often enough, but I leave that up to you... I believe the "0.00 sec" runtime is likely to hold up pretty consistently, so if you want to go more often I'm okay with that, too. :)
(Assignee)

Comment 18

6 years ago
Well, it sounds like we don't need the defragmentation any more. Since the defrag in May, we haven't really had any fragmentation.

Now, this might change if we ever change what we use for downloads, but at this point a scheduled defrag would be legacy and unnecessary, so I'd rather not put one in place. I think we're all in agreement, and I'll resolve this as fixed, since the fragmentation isn't happening and isn't likely to happen unless we stop using the CDN.
Status: NEW → RESOLVED
Last Resolved: 6 years ago
Resolution: --- → FIXED

Comment 19

6 years ago
So say we all.
Status: RESOLVED → VERIFIED
Product: mozilla.org → Data & BI Services Team