Closed Bug 635765 · Opened 13 years ago · Closed 13 years ago

pushlog queries perform table scans because changesets' pushid is not a primary key or in an index

Categories: mozilla.org Graveyard :: Server Operations (type: task, priority: Not set, severity: normal)

Tracking: (Not tracked)

Status: RESOLVED FIXED

People: (Reporter: asuth, Assigned: aravind)
A much longer blog post about the problem, written for tool awareness rather than because this was a difficult problem to investigate:
http://www.visophyte.org/blog/?p=698

In a nutshell, an index needs to be added:
CREATE INDEX changeset_pushid ON changesets (pushid);
to this file:
http://hg.mozilla.org/hgcustom/hghooks/file/63a618e9a3d3/mozhghooks/pushlog.py

And then someone needs to go open all the pushlog2.db files for all the big repos and run the same command since the databases won't fix themselves.  (Or the pushlog needs to check for the presence of the index every time it runs.)
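The "check for the presence of the index every time it runs" option is cheap to implement, since SQLite supports idempotent index creation. A minimal sketch (the function name and structure are illustrative; the real hook lives in mozhghooks/pushlog.py):

```python
import sqlite3


def ensure_pushid_index(db_path):
    """Create the changesets.pushid index if it is missing.

    Safe to call on every hook run: CREATE INDEX IF NOT EXISTS is a
    no-op when the index already exists, so existing pushlog2.db files
    get fixed up the first time the hook touches them.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE INDEX IF NOT EXISTS changeset_pushid "
            "ON changesets (pushid)")
        conn.commit()
    finally:
        conn.close()
```

This avoids the manual one-off pass over existing databases at the cost of one trivial statement per push.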
OS: Linux → All
Hardware: x86_64 → All
Blocks: 508863
I'm not sure who owns the pushlog (bsmedberg I think?) or who watches this component -- you may want to ping bsmedberg and maybe releng to make sure this gets done.
Ted owns the pushlog, I work on it as well.
But creating an index is probably a job for the hg server admin, which has been Aravind. I read he's moving on to another job, but not sure when that will take effect or who will take over his job of looking after the hg webheads.
@djc: I am here through this week, and should be able to wrap it up in that time.

@Ted: Please update the pushlog code and let me know when it's done.  I will push out the code and update the pushlog db files around the same time.

@Corey: Need someone to own this stuff moving forward.
Shows what a crappy DBA I am.
I pushed a fix here:
http://hg.mozilla.org/hgcustom/hghooks/rev/033dd0cc2d55

For the existing repos, you should be able to do something like the following:
find /path/ -name "pushlog2.db" -exec sqlite3 {} "CREATE INDEX changeset_pushid ON changesets (pushid);" \;

assuming you have the sqlite3 binary in $PATH. Aravind: you can probably figure out something smarter than that with your mighty shell-fu.
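For reference, the same one-shot update can also be scripted in Python, which makes it easier to skip databases that are locked or have an unexpected schema. A sketch, assuming the repos all live under one root (the function name and error handling are illustrative, not what was actually run on the server):

```python
import os
import sqlite3


def index_all_pushlogs(root):
    """Walk `root` and add the pushid index to every pushlog2.db found."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "pushlog2.db" not in filenames:
            continue
        db = os.path.join(dirpath, "pushlog2.db")
        conn = sqlite3.connect(db)
        try:
            conn.execute(
                "CREATE INDEX IF NOT EXISTS changeset_pushid "
                "ON changesets (pushid)")
            conn.commit()
        except sqlite3.OperationalError:
            # e.g. database locked mid-push; skip and move on
            pass
        finally:
            conn.close()
```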

Assigning this to aravind to get the existing databases updated.
Assignee: nobody → aravind
Will push this live sometime today.
Component: Hg: Customizations → Server Operations
QA Contact: hg.customizations → mrz
All done, new hook deployed and all the pushlogs updated with the index.

Thanks to Andrew and Ted for the bug finding and fixing.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Thanks very much for rapidly moving to address the problem!

Unfortunately, a URL along the lines of
http://hg.mozilla.org/mozilla-central/json-pushes?full=1&startdate=02/20/2011%2016:57:05Z&enddate=02/21/2011%2016:17:14Z

still takes the same amount of time for me.  Are you sure the index changes took?

Note that there appears to be some kind of transparent caching layer in play, so you need to permute the URL of the request slightly to force the request to actually hit the implementation.  I suggest twiddling the number of minutes or seconds or what not.

If we believe the index is now there, it's possible there's some other problem.  It would be interesting to see the schema dump of the actual mozilla-central table to sanity check things.  One can do that like so:

sqlite3 /path/to/pushlog2.db ".schema"
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Yup, the index does exist.  Here is the schema.

[root@dm-svn01 mozilla-central]# sqlite3 .hg/pushlog2.db ".schema"
CREATE TABLE changesets (pushid INTEGER, rev INTEGER, node text);
CREATE TABLE pushlog (id INTEGER PRIMARY KEY AUTOINCREMENT, user TEXT, date INTEGER);
CREATE UNIQUE INDEX changeset_node ON changesets (node);
CREATE INDEX changeset_pushid ON changesets (pushid);
CREATE UNIQUE INDEX changeset_rev ON changesets (rev);
CREATE INDEX pushlog_date ON pushlog (date);
CREATE INDEX pushlog_user ON pushlog (user);
[root@dm-svn01 mozilla-central]#
urgh.  Maybe the table-scan was not the long pole.  The first sin of optimization and all that...

Any chance of getting a copy of the pushlog database so I can profile what is going on in a more realistic fashion taking the meta-data retrieval from the hg repo into account?  Using the ".dump" command to generate a (bzipped) SQL dump is probably the safest way to get a coherent database, and ideally it should not induce additional pushlog latency.  (I am presuming there is no secret information that would be exposed by such a dump.  Also worth noting that I'm a Mozilla Messaging employee if there is privileged info.)
Oy, I probably should have been more thorough and compared using "full=1" and not using "full=1".  If I lose the full=1, things are fast enough.  This suggests it's the meta-data retrieval that is slow.

I'll look into this briefly to see if the problem is in the pushlog code.  If it appears the problem is hg proper, I'll bail on it and seek getting the changeset meta-info via means that are more friendly to the server.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to comment #11)
> urgh.  Maybe the table-scan was not the long pole.  The first sin of
> optimization and all that...
It feels like the normal pushlog page is a lot faster to me, for what it's worth.  However, I'm a bit confused as to why adding an index on changesets.pushid helped at all for date queries (I read the blog post and it made sense though!).  I'm willing to bet that the bottleneck is this gem:
0|0|0|USE TEMP B-TREE FOR ORDER BY

I think what we actually want is to drop the current index pushlog_date, and create it again like this (this is totally untested though):
CREATE INDEX pushlog_date ON pushlog (date DESC);

Do we ever want the date to actually be sorted by ascending anyway?  I don't think so, but I could be wrong!

On a related note, we'd probably see a perf win across the board by using a SQLite compiled with STAT2 and running ANALYZE on the database once a month or so.
(In reply to comment #13)
> (In reply to comment #11)
> > urgh.  Maybe the table-scan was not the long pole.  The first sin of
> > optimization and all that...
> It feels like the normal pushlog page is a lot faster to me, for what it's
> worth.  However, I'm a bit confused as to why adding an index on
> changesets.pushid helped at all for date queries (I read the blog post and it
> made sense though!).  I'm willing to bet that the bottleneck is this gem:
> 0|0|0|USE TEMP B-TREE FOR ORDER BY

The ephemeral b-tree only holds the result set, which is tiny, and so should have no meaningful impact.  See the visualized control flow:

http://www.visophyte.org/blog/wp-content/uploads/2011/02/with-index-blocks.png

Having said that, not having to create the ephemeral table would indeed be a good thing.  Certainly if someone issues malicious queries, not having it would be more of a boon.

> On a related note, we'd probably see a perf win across the board by using a
> SQLite compiled with STAT2 and running ANALYZE on the database once a month or
> so.

I think the analyzer's guesses are pretty much spot on without any information right now; not sure this would provide an additional win, especially given the regularity of the data.
Preliminary investigation by inspection suggests that the most expensive computation is figuring out what tags a revision is associated with.  All the other information is immediately available once the changelog lookup cost is paid for.  localrepository.nodetags() actually has to build a mapping of all the tags for the entire repository to answer the question.

It does cache that information, but that assumes that no one throws away the localrepository instance...

Does anyone know what the execution model of the hgweb stuff is on hg.mozilla.org?  Specifically, are processes reused, and under what situations are they killed if they are reused?  (For example, would a process be killed if it crossed a memory threshold that might be correlated with actually caching the required metadata...?)
I'm pretty sure each request will recreate the required repository object, so yeah, you have to pay for tag determination each time. On the other hand, if everything is configured correctly, there are on-disk caches that should help. These only get written if the permissions are set correctly, but I think we already went through setting that up right at some point. Aravind, can you check the mtimes on mozilla-central/.hg/*.cache?
(In reply to comment #16)
> On the other hand, if
> everything is configured correctly, there are on-disk caches that should help.
> These only get written if the permissions are set correctly, but I think we
> already went through setting that up right at some point. Aravind, can you
> check the mtimes on mozilla-central/.hg/*.cache?

So... this is probably why our queries take longer: they are served out of an NFS read-only mount point on the vcview* servers.  This was the outcome of a security bug to separate out the read and the read/write parts of the application.

The *.cache directories correspondingly have not been updated in a while.

-rw-rw-r-- Jul  9  2009 branch.cache
-rw-rw-r-- Feb 23 16:28 branchheads.cache

Is it possible to override the locations of these cache directories into some sort of a (different) shared read-write location?
No. In the upcoming Mercurial 1.8 (to be released on March 1 if everything goes according to plan), there's a separate dir .hg/cache where these are saved, but I'm not sure if that will help this use case?
(In reply to comment #17)
> (In reply to comment #16)
> > On the other hand, if
> > everything is configured correctly, there are on-disk caches that should help.
> > These only get written if the permissions are set correctly, but I think we
> > already went through setting that up right at some point. Aravind, can you
> > check the mtimes on mozilla-central/.hg/*.cache?
> 
> So.. this is probably why our queries take longer - they are served out of a
> nfs read-only mount point on the vcview* servers.  This was the outcome of a
> security bug to separate out the read and the read/write parts of the
> application.
> 
> The *.cache directories correspondingly have not been updated in a while.
> 
> -rw-rw-r-- Jul  9  2009 branch.cache
> -rw-rw-r-- Feb 23 16:28 branchheads.cache
> 
> Is it possible to override the locations of these cache directories into some
> sort of a (different) shared read-write location?

(In reply to comment #18)
> No. In the upcoming Mercurial 1.8 (to be released on March 1 if everything goes
> according to plan), there's a separate dir .hg/cache where these are saved, but
> I'm not sure if that will help this use case?

Since Mercurial 1.8 has been released, can an infrastructure person spin this out to its own bug targeted at someone who will address this, or if it has already been spun out, cite the bug number here?  While the transparent proxying that happens is likely saving hg.mozilla.org from catching on fire due to TBPL, this seems like the kind of thing that would be a pretty big win for hg.mozilla.org.  (I am not sure where to file the bug or who to assign it to in order to increase its chance of being addressed.)
I filed bug 640439 to cover the hg.mozilla.org mercurial caching so this does not get lost.  Thanks all for the investigation work!  I look forward to hg.mozilla.org requests being much snappier in the future. :)
See Also: → 640439
Product: mozilla.org → mozilla.org Graveyard