Closed Bug 848816 Opened 11 years ago Closed 11 years ago

viewvc queries causing pain

Categories

(Data & BI Services Team :: DB: MySQL, task, P2)

x86
macOS

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: scabral, Assigned: scabral)

Details

(Whiteboard: [2013q2] June)

Recently a few viewvc queries on the generic-rw db host caused the host to slow down and affect all 50 applications on the site. Here are the queries:

23102526        viewvc  10.8.70.202:49584       viewvc  Query   1382 Sorting result   SELECT checkins.* FROM checkins,repositories,people WHERE (checkins.repositoryid=repositories.id) AND (checkins.whoid=people.id) AND (repositories.repository='/repo/svn/mozilla') AND (people.who REGEXP '.*') ORDER BY checkins.ci_when DESC,descid LIMIT 1001
23104278        viewvc  10.8.70.202:61680       viewvc  Query   1330 Sorting result   SELECT checkins.* FROM checkins,repositories,people WHERE (checkins.repositoryid=repositories.id) AND (checkins.whoid=people.id) AND (repositories.repository='/repo/svn/mozilla') AND (people.who REGEXP '.*') ORDER BY checkins.ci_when DESC,descid LIMIT 1001
23105734        viewvc  10.8.70.202:7034        viewvc  Query   1288 Sorting result   SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001
23106824        viewvc  10.8.70.202:16140       viewvc  Query   1253 Sorting result   SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001
23108973        viewvc  10.8.70.202:31784       viewvc  Query   1195 Sorting result   SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001
23115077        viewvc  10.8.70.202:29716       viewvc  Query   922 Sorting result   SELECT checkins.* FROM checkins,repositories,descs WHERE (checkins.repositoryid=repositories.id) AND (checkins.descid=descs.id) AND (repositories.repository='/repo/svn/mozilla') AND (descs.description='619842') ORDER BY checkins.ci_when DESC,descid LIMIT 1001

There are several improvements that can be made to the checkins table.

Firstly, it's MyISAM. Is that necessary? Or can we convert it to InnoDB? The table itself is <100M so the conversion would be fast. This would give the table better transactionality.

Secondly, there's a redundant index:

  UNIQUE KEY `repositoryid` (`repositoryid`,`dirid`,`fileid`,`revision`),
  KEY `repositoryid_2` (`repositoryid`),

That 2nd key is a waste of space, because the 'repositoryid' unique key already takes care of anything that would need that index. 

However, since you are ordering by checkins.ci_when in all the queries, it would be worth it to drop the 'repositoryid_2' key and add in a key on (repository_id,ci_when). This would eliminate the need for an extra pass-through the data to sort the query, and make it faster.

And lastly, but most importantly, why are these running against the read-write IP for the generic database? It should have read-write splitting such that these queries would be on the read-only IP for the generic database. We can scale reads much more easily than we can scale writes, and it's a huge security issue if one application like viewvc can effectively DoS the whole server.
Assignee: nobody → server-ops-webops
Group: websites-security
Component: XUL Explorer → Server Operations: Web Operations
Product: Other Applications → mozilla.org
QA Contact: nmaul
Version: unspecified → other
Adding Dev Services to see if they have any knowledge if viewvc can do r/w splitting in mysql

I'll also do some research
Assignee: server-ops-webops → bburton
Background: this powers viewvc.svn.mozilla.org. It's a web interface for SVN, although technically it can work with CVS too, so there's stuff to work with both in the code/config.

To give you an idea of the age of this schema, here's a comment I found at the top of the script which creates the tables. :)

# administrative program for CVSdb; creates a clean database in
# MySQL 3.22 or later

So umm... yeah. :)


(In reply to Sheeri Cabral [:sheeri] from comment #0)
> Firstly, it's MyISAM. Is that necessary? Or can we convert it to InnoDB? The
> table itself is <100M so the conversion would be fast. This would give the
> table better transactionality.

The schema requests MyISAM specifically, but given that it was written for MySQL 3.22, this doesn't concern me at all... it was likely done so as to guarantee choosing "MyISAM" instead of "ISAM". I see no reason why it can't be InnoDB. Not many columns, and no blob columns of any type, so row length isn't a concern. It's possible there are some "select count(*)" things going on, although I think that's unlikely, and even so the row count might be low enough to be insignificant (820k rows).

I'm not worried about the few seconds' pause to ALTER it. This should not be a high-traffic site these days. Most projects are in hg or git... SVN gets a lot of usage, but far less than those. Additionally this would only block the viewvc interface, not actual SVN+SSH traffic, so the scope of impact is relatively small. I don't feel a need to even announce a window for it.

I made a copy and timed this... took 65 seconds. Feel free to do this at your leisure.


> Secondly, there's a redundant index:
> 
>   UNIQUE KEY `repositoryid` (`repositoryid`,`dirid`,`fileid`,`revision`),
>   KEY `repositoryid_2` (`repositoryid`),
> 
> That 2nd key is a waste of space, because the 'repositoryid' unique key
> already takes care of anything that would need that index. 

Sounds good to me. Go for it. I'm not worried about updating any schema files or anything for repeatability. :)

> However, since you are ordering by checkins.ci_when in all the queries, it
> would be worth it to drop the 'repositoryid_2' key and add in a key on
> (repository_id,ci_when). This would eliminate the need for an extra
> pass-through the data to sort the query, and make it faster.

Sounds good to me, however in my brief testing this didn't seem to affect the output of EXPLAIN on one of those queries from comment 0, apart from adding an additional possible key (which was not chosen). Don't know why that is. Might be worth a closer look. If it still looks valuable to you, go for it.

> And lastly, but most importantly, why are these running against the
> read-write IP for the generic database? It should have read-write splitting
> such that these queries would be on the read-only IP for the generic
> database. We can scale reads much more easily than we can scale writes, and
> it's a huge security issue if one application like viewvc can effectively
> DoS the whole server.

This app supports a read-only username/password, but not (as far as I can tell) a read-only database. It's old (obviously, given that comment I pasted above) and not our own code... probably was just never designed with that in mind. fox2mike and bkero might have some input on how feasible it would be to retrofit this.
ViewVC is still very much an active project upstream. I hope any recommendations are pushed upstream to be committed for others to benefit as well.
Fair enough!

Here's the upstream DB creation script. It's a bit newer than ours (the MySQL 3.22 comment is gone), but the checkins table is identical. Haven't examined others:
http://viewvc.tigris.org/svn/viewvc/trunk/bin/make-database

Info on contributing is here: http://viewvc.org/contributing.html.

Patches (against that make-database script) should be sent to their Issue Tracker, which is linked on that page. Looks like a Bugzilla installation. Doesn't seem to get many bugs, but there was something about this very script that was fixed less than a month ago.

I don't expect they'd be keen to take a patch that changes this one table over to InnoDB, but the index changes seem likely.
Group: core-security
While I'm happy to submit patches, is there any reason we couldn't make the changes on our own installation? I'm certainly happy to convert all the tables to MyISAM.
Assignee: bburton → scabral
(In reply to Sheeri Cabral [:sheeri] from comment #5)
> While I'm happy to submit patches, is there any reason we couldn't make the
> changes on our own installation? I'm certainly happy to convert all the
> tables to MyISAM.

Also, if there's a need to re-create the tables with a newer schema and move stuff around, I'm more than happy to do that and re-import all the data from svn.
Group: websites-security → infra
Component: Server Operations: Web Operations → Server Operations: Database
QA Contact: nmaul → cshields
Component: Server Operations: Database → Server Operations: Web Operations
QA Contact: cshields → nmaul
Whiteboard: [2013q2]
This isn't strictly owned by Webops, it's Dev Services.

What exactly in here is a Q2 goal?
Component: Server Operations: Web Operations → Server Operations: Developer Services
QA Contact: nmaul → shyam
The q2 goal is for the DB Engineering team to convert tables to InnoDB and submit the patches upstream.
(also to change the indexing as recommended in the description and if applicable, submit those patches upstream as well)
Moving back to the DB team then :)
Component: Server Operations: Developer Services → Server Operations: Database
QA Contact: shyam → cshields
Whiteboard: [2013q2] → [2013q2] June
FYI most tables were converted to InnoDB on the dev instance, but I'm not seeing a viewvc dev/staging instance. :(
Bug 853466 converted these tables to InnoDB:

viewvc.branches
viewvc.checkins
viewvc.descs
viewvc.dirs
viewvc.files
viewvc.people
viewvc.repositories
viewvc.tags


Submitted the point to the existing issue (already resolved) about using InnoDB - http://viewvc.tigris.org/issues/show_bug.cgi?id=262 - the resolution in 2007 was "nobody will need the power of InnoDB so we're keeping it MyISAM".

Opened issue 529 about the redundant index and sorting:

http://viewvc.tigris.org/issues/show_bug.cgi?id=529

And I have actually performed the ALTER TABLE on the checkins table:
MariaDB [viewvc]> ALTER TABLE checkins DROP INDEX repositoryid_2, ADD INDEX repositoryid_2 (repositoryid,ci_when);
Query OK, 834905 rows affected (35.56 sec)             
Records: 834905  Duplicates: 0  Warnings: 0

I believe all the actions are complete - change the tables at Mozilla to use InnoDB and to make better indexes on the checkins table, and I have submitted issues upstream. (I have offered diff patches, I don't think they actually need them, but just in case they want them...)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
This is quite awesome, thank you!
Can we open this bug up now?
For sure! I should have opened it up once I realized viewvc had no confidential info in it.
Group: infra
Product: mozilla.org → Data & BI Services Team
You need to log in before you can comment on or make changes to this bug.