Closed
Bug 848816
Opened 11 years ago
Closed 11 years ago
viewvc queries causing pain
Categories
(Data & BI Services Team :: DB: MySQL, task, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: scabral, Assigned: scabral)
Details
(Whiteboard: [2013q2] June)
Recently a few viewvc queries on the generic-rw db host caused the host to slow down and affect all 50 applications on the site. Here are the queries: 23102526 viewvc 10.8.70.202:49584 viewvc Query 1382 Sorting result SELECT checkins.* FROM checkins,repositories,people WHERE (checkins.repositoryid=repositories.id) AND (checkins.whoid=people.id) AND (repositories.repository='/repo/svn/mozilla') AND (people.who REGEXP '.*') ORDER BY checkins.ci_when DESC,descid LIMIT 1001 23104278 viewvc 10.8.70.202:61680 viewvc Query 1330 Sorting result SELECT checkins.* FROM checkins,repositories,people WHERE (checkins.repositoryid=repositories.id) AND (checkins.whoid=people.id) AND (repositories.repository='/repo/svn/mozilla') AND (people.who REGEXP '.*') ORDER BY checkins.ci_when DESC,descid LIMIT 1001 23105734 viewvc 10.8.70.202:7034 viewvc Query 1288 Sorting result SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001 23106824 viewvc 10.8.70.202:16140 viewvc Query 1253 Sorting result SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001 23108973 viewvc 10.8.70.202:31784 viewvc Query 1195 Sorting result SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001 23115077 viewvc 10.8.70.202:29716 viewvc Query 922 Sorting result SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001 There are several improvements that can be made to the checkins table. Firstly, it's MyISAM. Is that necessary? Or can we convert it to InnoDB? The table itself is <100M so the conversion would be fast. This would give the table better transactionality. Secondly, there's a redundant index: UNIQUE KEY `repositoryid` (`repositoryid`,`dirid`,`fileid`,`revision`), KEY `repositoryid_2` (`repositoryid`), That 2nd key is a waste of space, because the 'repositoryid' unique key already takes care of anything that would need that index. However, since you are ordering by checkins.ci_when in all the queries, it would be worth it to drop the 'repositoryid_2' key and add in a key on (repository_id,ci_when). This would eliminate the need for an extra pass-through the data to sort the query, and make it faster. And lastly, but most importantly, why are these running against the read-write IP for the generic database? It should have read-write splitting such that these queries would be on the read-only IP for the generic database. We can scale reads much more easily than we can scale writes, and it's a huge security issue if one application like viewvc can effectively DoS the whole server.
Updated•11 years ago
|
Assignee: nobody → server-ops-webops
Group: websites-security
Component: XUL Explorer → Server Operations: Web Operations
Product: Other Applications → mozilla.org
QA Contact: nmaul
Version: unspecified → other
Comment 1•11 years ago
|
||
Adding Dev Services to see if they have any knowledge if viewvc can do r/w splitting in mysql I'll also do some research
Assignee: server-ops-webops → bburton
Comment 2•11 years ago
|
||
Background: this powers viewvc.svn.mozilla.org. It's a web interface for SVN, although technically it can work with CVS too, so there's stuff to work with both in the code/config. To give you an idea of the age of this schema, here's a comment I found at the top of the script which creates the tables. :) # administrative program for CVSdb; creates a clean database in # MySQL 3.22 or later So umm... yeah. :) (In reply to Sheeri Cabral [:sheeri] from comment #0) > Firstly, it's MyISAM. Is that necessary? Or can we convert it to InnoDB? The > table itself is <100M so the conversion would be fast. This would give the > table better transactionality. The schema requests MyISAM specifically, but given that it was written for MySQL 3.22, this doesn't concern me at all... it was likely done so as to guarantee choosing "MyISAM" instead of "ISAM". I see no reason why it can't be InnoDB. Not many columns, and no blob columns of any type, so row length isn't a concern. It's possible there are some "select count(*)" things going on, although I think that's unlikely, and even so the row count might be low enough to be insignificant (820k rows). I'm not worried about the few seconds' pause to ALTER it. This should not be a high-traffic site these days. Most projects are in hg or git... SVN gets a lot of usage, but far less than those. Additionally this would only block the viewvc interface, not actual SVN+SSH traffic, so the scope of impact is relatively small. I don't feel a need to even announce a window for it. I made a copy and timed this... took 65 seconds. Feel free to do this at your leisure. > Secondly, there's a redundant index: > > UNIQUE KEY `repositoryid` (`repositoryid`,`dirid`,`fileid`,`revision`), > KEY `repositoryid_2` (`repositoryid`), > > That 2nd key is a waste of space, because the 'repositoryid' unique key > already takes care of anything that would need that index. Sounds good to me. Go for it. I'm not worried about updating any schema files or anything for repeatability. :) > However, since you are ordering by checkins.ci_when in all the queries, it > would be worth it to drop the 'repositoryid_2' key and add in a key on > (repository_id,ci_when). This would eliminate the need for an extra > pass-through the data to sort the query, and make it faster. Sounds good to me, however in my brief testing this didn't seem to affect the output of EXPLAIN on one of those queries from comment 0, apart from adding an additional possible key (which was not chosen). Don't know why that is. Might be worth a closer look. If it still looks valuable to you, go for it. > And lastly, but most importantly, why are these running against the > read-write IP for the generic database? It should have read-write splitting > such that these queries would be on the read-only IP for the generic > database. We can scale reads much more easily than we can scale writes, and > it's a huge security issue if one application like viewvc can effectively > DoS the whole server. This app supports a read-only username/password, but not (as far as I can tell) a read-only database. It's old (obviously, given that comment I pasted above) and not our own code... probably was just never designed with that in mind. fox2mike and bkero might have some input on how feasible it would be to retrofit this.
Comment 3•11 years ago
|
||
ViewVC is still very much an active project upstream. I hope any recommendations are pushed upstream to be committed for others to benefit as well.
Comment 4•11 years ago
|
||
Fair enough! Here's the upstream DB creation script. It's a bit newer than ours (the MySQL 3.22 comment is gone), but the checkins table is identical. Haven't examined others: http://viewvc.tigris.org/svn/viewvc/trunk/bin/make-database Info on contributing is here: http://viewvc.org/contributing.html. Patches (against that make-database script) should be sent to their Issue Tracker, which is linked on that page. Looks like a Bugzilla installation. Doesn't seem to get many bugs, but there was something about this very script that was fixed less than a month ago. I don't expect they'd be keen to take a patch that changes this one table over to InnoDB, but the index changes seem likely.
Updated•11 years ago
|
Group: core-security
Assignee | ||
Comment 5•11 years ago
|
||
While I'm happy to submit patches, is there any reason we couldn't make the changes on our own installation? I'm certainly happy to convert all the tables to MyISAM.
Assignee | ||
Updated•11 years ago
|
Assignee: bburton → scabral
Comment 6•11 years ago
|
||
(In reply to Sheeri Cabral [:sheeri] from comment #5) > While I'm happy to submit patches, is there any reason we couldn't make the > changes on our own installation? I'm certainly happy to convert all the > tables to MyISAM. Also, if there's a need to re-create the tables with a newer schema and move stuff around, I'm more than happy to do that and re-import all the data from svn.
Group: websites-security → infra
Component: Server Operations: Web Operations → Server Operations: Database
QA Contact: nmaul → cshields
Assignee | ||
Updated•11 years ago
|
Component: Server Operations: Database → Server Operations: Web Operations
QA Contact: cshields → nmaul
Whiteboard: [2013q2]
Comment 7•11 years ago
|
||
This isn't strictly owned by Webops, it's Dev Services. What exactly in here is a Q2 goal?
Component: Server Operations: Web Operations → Server Operations: Developer Services
QA Contact: nmaul → shyam
Assignee | ||
Comment 8•11 years ago
|
||
The q2 goal is for the DB Engineering team to convert tables to InnoDB and submit the patches upstream.
Assignee | ||
Comment 9•11 years ago
|
||
(also to change the indexing as recommended in the description and if applicable, submit those patches upstream as well)
Comment 10•11 years ago
|
||
Moving back to the DB team then :)
Component: Server Operations: Developer Services → Server Operations: Database
QA Contact: shyam → cshields
Assignee | ||
Updated•11 years ago
|
Whiteboard: [2013q2] → [2013q2] June
Assignee | ||
Comment 11•11 years ago
|
||
FYI most tables were converted to InnoDB on the dev instance, but I'm not seeing a viewvc dev/staging instance. :(
Assignee | ||
Comment 12•11 years ago
|
||
Bug 853466 converted these tables to InnoDB: viewvc.branches viewvc.checkins viewvc.descs viewvc.dirs viewvc.files viewvc.people viewvc.repositories viewvc.tags Submitted the point to the existing issue (already resolved) about using InnoDB - http://viewvc.tigris.org/issues/show_bug.cgi?id=262 - the resolution in 2007 was "nobody will need the power of InnoDB so we're keeping it MyISAM". Opened issue 529 about the redundant index and sorting: http://viewvc.tigris.org/issues/show_bug.cgi?id=529 And I have actually performed the ALTER TABLE on the checkins table: MariaDB [viewvc]> ALTER TABLE checkins DROP INDEX repositoryid_2, ADD INDEX repositoryid_2 (repositoryid,ci_when); Query OK, 834905 rows affected (35.56 sec) Records: 834905 Duplicates: 0 Warnings: 0 I believe all the actions are complete - change the tables at Mozilla to use InnoDB and to make better indexes on the checkins table, and I have submitted issues upstream. (I have offered diff patches, I don't think they actually need them, but just in case they want them...)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 13•11 years ago
|
||
This is quite awesome, thank you!
Comment 14•11 years ago
|
||
Can we open this bug up now?
Assignee | ||
Comment 15•11 years ago
|
||
For sure! I should have opened it up once I realized viewvc had no confidential info in it.
Group: infra
Updated•10 years ago
|
Product: mozilla.org → Data & BI Services Team
You need to log in
before you can comment on or make changes to this bug.
Description
•