from #bteam: [1:28 am] <gozer> cyborgshadow: wait? How can a *read* query cause replication lag ? [1:28 am] <cyborgshadow> gozer: bugs_fulltext is MyISAM. You can't read and write to it at the same time. [1:28 am] <cyborgshadow> lock that table for 15 minutes on a read...you can't write to it during that time. [1:28 am] <gozer> cyborgshadow: oh damn, seriously ? [1:28 am] <cyborgshadow> the replication log plays things forward in a serial order. If it can't write, it stops and waits until it can. [1:28 am] <gozer> cyborgshadow: ouch [1:29 am] <cyborgshadow> gozer: so, it got to a write on the bugs_fulltext table...and had to wait that 15 minutes. [1:29 am] <cyborgshadow> gozer: yes. We've talked about moving it to InnoDB now that InnoDB has fulltext capability (and doesn't suffer from the same type of locking) [1:29 am] <gozer> cyborgshadow: I wouldn't have guessed a read query could stall replication, that's one hell of a gotcha [1:29 am] <cyborgshadow> gozer: as long as it's reading from the table replication is trying to currently write to, yes, it can (only in the case of MyISAM) upstream have bug 868869, but they are blocked on requiring mysql 5.6 for this. we have no such concerns, and given the DBAs have pointed to myiam being a factor in the replication lag issues (bug 974338) it makes sense for us switch bugs_fulltext to innodb.
Created attachment 8388351 [details] [diff] [review] 981487_1.patch i've tested bugzilla running with the current code against a database with an innodb bugs_fulltext and didn't encounter any issues. the plan here is to get the DBAs to alter the table to innodb, which they are able to do without any downtime (bug 981496). once that's done, we can push out this patch which mostly moves the updating of bugs_fulltext within the update's transaction.
cyborgshadow pointed out on irc that updating the db master _will_ required downtime, so we can't push this code until that's done.
The downtime we require is minimal, and can be done by failing over when the slaves are done. The first step is to do stage, let's do that. While we're doing that, let's make sure that the configs line up, since we've been doing some stuff on the slaves in bug 974338. And of course while this looks promising and is a great idea anyway, it doesn't account for the fact that the phx slaves have lag, with no queries running on them at all, so this isn't the only solution we need - but it is a good one!
Created attachment 8389146 [details] [diff] [review] 981487_2.patch this revision adds a check for mysql 5.6.12 (went with .12 because that's the version which is currently in production).
Hrm, any way to pull it apart and to 5.6.12 or higher? e.g. 5.6.20 or 5.7.10?
(In reply to Sheeri Cabral [:sheeri] from comment #5) > Hrm, any way to pull it apart and to 5.6.12 or higher? e.g. 5.6.20 or 5.7.10? 5.6.12 is the minimum mysql version we'll support (the current value is 5.0.15).
Comment on attachment 8389146 [details] [diff] [review] 981487_2.patch Review of attachment 8389146 [details] [diff] [review]: ----------------------------------------------------------------- r=dkl
Just to note: all the production systems are using InnoDB fulltext now. This code can be pushed during the next maintenance window, or anytime before then if we have proper warning. (We'll need to cancel the alter table statements at time of execution since they're already done in production).
True. But we should convert stage and push to stage first.
Roger. I was going to let the code push take care of stage. glob had already tested dev by the time we did prod. :)
This patch was done over two months ago. Can we schedule some downtime to push it out?
Mark - not sure what you need here. This is for stage, and the code push will do the table change, so it's really all what the bugzilla team wants to do, and when they want to do it. We can be around for it in case something goes wrong, just let us know.
Ah okay, thought this required DBAs. I'll talk to glob.
Removing DBAs since our side was done a while ago.
finally circling back around to this. this patch still looks good (aside from trivial to fix bitrot). i've tested that it works with a dump created after bmo prod's schema was updated to innodb as well as when creating a database from scratch. To ssh://firstname.lastname@example.org/webtools/bmo/bugzilla.git c7a6d5e..d247ef1 master -> master