Closed Bug 635765 · Opened 13 years ago · Closed 13 years ago

pushlog queries perform table scans because changesets' pushid is not a primary key or in an index

Categories: mozilla.org Graveyard :: Server Operations (type: task, priority: Not set, severity: normal)

Tracking: (Not tracked)

Status: RESOLVED FIXED

People: (Reporter: asuth, Assigned: aravind)
A much longer blog post about the problem, written for tool awareness rather than because this was a difficult problem to investigate:
http://www.visophyte.org/blog/?p=698

In a nutshell, an index needs to be added:
CREATE INDEX changeset_pushid ON changesets (pushid);
to this file:
http://hg.mozilla.org/hgcustom/hghooks/file/63a618e9a3d3/mozhghooks/pushlog.py

And then someone needs to go open all the pushlog2.db files for all the big repos and run the same command since the databases won't fix themselves.  (Or the pushlog needs to check for the presence of the index every time it runs.)
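The "check for the presence of the index every time it runs" option is cheap to implement, since SQLite supports idempotent index creation. A minimal sketch (the function name and structure are illustrative; the real hook lives in mozhghooks/pushlog.py):

```python
import sqlite3


def ensure_pushid_index(db_path):
    """Create the changesets.pushid index if it is missing.

    Safe to call on every hook run: CREATE INDEX IF NOT EXISTS is a
    no-op when the index already exists, so existing pushlog2.db files
    get fixed up the first time the hook touches them.
    """
    conn = sqlite3.connect(db_path)
    try:
        conn.execute(
            "CREATE INDEX IF NOT EXISTS changeset_pushid "
            "ON changesets (pushid)")
        conn.commit()
    finally:
        conn.close()
```

This avoids the manual one-off pass over existing databases at the cost of one trivial statement per push.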
OS: Linux → All
Hardware: x86_64 → All
Blocks: 508863
I'm not sure who owns the pushlog (bsmedberg I think?) or who watches this component -- you may want to ping bsmedberg and maybe releng to make sure this gets done.
Ted owns the pushlog, I work on it as well.
But creating an index is probably a job for the hg server admin, which has been Aravind. I read he's moving on to another job, but not sure when that will take effect or who will take over his job of looking after the hg webheads.
@djc: I am here through this week, and should be able to wrap it up in that time.

@Ted: Please update the pushlog code and let me know when it's done.  I will push out the code and update the pushlog db files around the same time.

@Corey: Need someone to own this stuff moving forward.
Shows what a crappy DBA I am.
I pushed a fix here:
http://hg.mozilla.org/hgcustom/hghooks/rev/033dd0cc2d55

For the existing repos, you should be able to do something like the following:
find /path/ -name "pushlog2.db" -exec sqlite3 {} "CREATE INDEX changeset_pushid ON changesets (pushid);" \;

assuming you have the sqlite3 binary in $PATH. Aravind: you can probably figure out something smarter than that with your mighty shell-fu.
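For reference, the same one-shot update can also be scripted in Python, which makes it easier to skip databases that are locked or have an unexpected schema. A sketch, assuming the repos all live under one root (the function name and error handling are illustrative, not what was actually run on the server):

```python
import os
import sqlite3


def index_all_pushlogs(root):
    """Walk `root` and add the pushid index to every pushlog2.db found."""
    for dirpath, dirnames, filenames in os.walk(root):
        if "pushlog2.db" not in filenames:
            continue
        db = os.path.join(dirpath, "pushlog2.db")
        conn = sqlite3.connect(db)
        try:
            conn.execute(
                "CREATE INDEX IF NOT EXISTS changeset_pushid "
                "ON changesets (pushid)")
            conn.commit()
        except sqlite3.OperationalError:
            # e.g. database locked mid-push; skip and move on
            pass
        finally:
            conn.close()
```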

Assigning this to aravind to get the existing databases updated.
Assignee: nobody → aravind
Will push this live sometime today.
Component: Hg: Customizations → Server Operations
QA Contact: hg.customizations → mrz
All done, new hook deployed and all the pushlogs updated with the index.

Thanks to Andrew and Ted for the bug finding and fixing.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Thanks very much for rapidly moving to address the problem!

Unfortunately, a URL along the lines of
http://hg.mozilla.org/mozilla-central/json-pushes?full=1&startdate=02/20/2011%2016:57:05Z&enddate=02/21/2011%2016:17:14Z

still takes the same amount of time for me.  Are you sure the index changes took?

Note that there appears to be some kind of transparent caching layer in play, so you need to permute the URL of the request slightly to force the request to actually hit the implementation.  I suggest twiddling the number of minutes or seconds or what not.

If we believe the index is now there, it's possible there's some other problem.  It would be interesting to see the schema dump of the actual mozilla-central table to sanity check things.  One can do that like so:

sqlite3 /path/to/pushlog2.db ".schema"
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Yup, the index does exist.  Here is the schema.

[root@dm-svn01 mozilla-central]# sqlite3 .hg/pushlog2.db ".schema"
CREATE TABLE changesets (pushid INTEGER, rev INTEGER, node text);
CREATE TABLE pushlog (id INTEGER PRIMARY KEY AUTOINCREMENT, user TEXT, date INTEGER);
CREATE UNIQUE INDEX changeset_node ON changesets (node);
CREATE INDEX changeset_pushid ON changesets (pushid);
CREATE UNIQUE INDEX changeset_rev ON changesets (rev);
CREATE INDEX pushlog_date ON pushlog (date);
CREATE INDEX pushlog_user ON pushlog (user);
[root@dm-svn01 mozilla-central]#
urgh.  Maybe the table-scan was not the long pole.  The first sin of optimization and all that...

Any chance of getting a copy of the pushlog database so I can profile what is going on in a more realistic fashion taking the meta-data retrieval from the hg repo into account?  Using the ".dump" command to generate a (bzipped) SQL dump is probably the safest way to get a coherent database, and ideally it should not induce additional pushlog latency.  (I am presuming there is no secret information that would be exposed by such a dump.  Also worth noting that I'm a Mozilla Messaging employee if there is privileged info.)
Oy, I probably should have been more thorough and compared using "full=1" and not using "full=1".  If I lose the full=1, things are fast enough.  This suggests it's the meta-data retrieval that is slow.

I'll look into this briefly to see if the problem is in the pushlog code.  If it appears the problem is hg proper, I'll bail on it and seek getting the changeset meta-info via means that are more friendly to the server.
Status: REOPENED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
(In reply to comment #11)
> urgh.  Maybe the table-scan was not the long pole.  The first sin of
> optimization and all that...
It feels like the normal pushlog page is a lot faster to me, for what it's worth.  However, I'm a bit confused as to why adding an index on changesets.pushid helped at all for date queries (I read the blog post and it made sense though!).  I'm willing to bet that the bottleneck is this gem:
0|0|0|USE TEMP B-TREE FOR ORDER BY

I think what we actually want is to drop the current index pushlog_date, and create it again like this (this is totally untested though):
CREATE INDEX pushlog_date ON pushlog (date DESC);

Do we ever want the date to actually be sorted by ascending anyway?  I don't think so, but I could be wrong!

On a related note, we'd probably see a perf win across the board by using a SQLite compiled with STAT2 and running ANALYZE on the database once a month or so.
(In reply to comment #13)
> (In reply to comment #11)
> > urgh.  Maybe the table-scan was not the long pole.  The first sin of
> > optimization and all that...
> It feels like the normal pushlog page is a lot faster to me, for what it's
> worth.  However, I'm a bit confused as to why adding an index on
> changesets.pushid helped at all for date queries (I read the blog post and it
> made sense though!).  I'm willing to bet that the bottleneck is this gem:
> 0|0|0|USE TEMP B-TREE FOR ORDER BY

The ephemeral b-tree only holds the result set, which is tiny, and so should have no meaningful impact.  See the visualized control flow:

http://www.visophyte.org/blog/wp-content/uploads/2011/02/with-index-blocks.png

Having said that, not having to create the ephemeral table would indeed be a good thing.  Certainly if someone issues malicious queries, not having it would be more of a boon.

> On a related note, we'd probably see a perf win across the board by using a
> SQLite compiled with STAT2 and running ANALYZE on the database once a month or
> so.

I think the analyzer's guesses are pretty much spot on without any information right now; not sure this would provide an additional win, especially given the regularity of the data.
Preliminary investigation by inspection suggests that the most expensive computation is figuring out what tags a revision is associated with.  All the other information is immediately available once the changelog lookup cost is paid for.  localrepository.nodetags() actually has to build a mapping of all the tags for the entire repository to answer the question.

It does cache that information, but that assumes that no one throws away the localrepository instance...

Does anyone know what the execution model of the hgweb stuff is on hg.mozilla.org?  Specifically, are processes reused, and under what situations are they killed if they are reused?  (For example, would a process be killed if it crossed a memory threshold that might be correlated with actually caching the required metadata...?)
I'm pretty sure each request will recreate the required repository object, so yeah, you have to pay for tag determination each time. On the other hand, if everything is configured correctly, there are on-disk caches that should help. These only get written if the permissions are set correctly, but I think we already went through setting that up right at some point. Aravind, can you check the mtimes on mozilla-central/.hg/*.cache?
(In reply to comment #16)
> On the other hand, if
> everything is configured correctly, there are on-disk caches that should help.
> These only get written if the permissions are set correctly, but I think we
> already went through setting that up right at some point. Aravind, can you
> check the mtimes on mozilla-central/.hg/*.cache?

So... this is probably why our queries take longer: they are served out of an NFS read-only mount point on the vcview* servers.  This was the outcome of a security bug to separate out the read and the read/write parts of the application.

The *.cache directories correspondingly have not been updated in a while.

-rw-rw-r-- Jul  9  2009 branch.cache
-rw-rw-r-- Feb 23 16:28 branchheads.cache

Is it possible to override the locations of these cache directories into some sort of a (different) shared read-write location?
No. In the upcoming Mercurial 1.8 (to be released on March 1 if everything goes according to plan), there's a separate dir .hg/cache where these are saved, but I'm not sure if that will help this use case?
(In reply to comment #17)
> (In reply to comment #16)
> > On the other hand, if
> > everything is configured correctly, there are on-disk caches that should help.
> > These only get written if the permissions are set correctly, but I think we
> > already went through setting that up right at some point. Aravind, can you
> > check the mtimes on mozilla-central/.hg/*.cache?
> 
> So.. this is probably why our queries take longer - they are served out of a
> nfs read-only mount point on the vcview* servers.  This was the outcome of a
> security bug to separate out the read and the read/write parts of the
> application.
> 
> The *.cache directories correspondingly have not been updated in a while.
> 
> -rw-rw-r-- Jul  9  2009 branch.cache
> -rw-rw-r-- Feb 23 16:28 branchheads.cache
> 
> Is it possible to override the locations of these cache directories into some
> sort of a (different) shared read-write location?

(In reply to comment #18)
> No. In the upcoming Mercurial 1.8 (to be released on March 1 if everything goes
> according to plan), there's a separate dir .hg/cache where these are saved, but
> I'm not sure if that will help this use case?

Since Mercurial 1.8 has been released, can an infrastructure person spin this out to its own bug targeted at someone who will address this, or if it has already been spun out, cite the bug number here?  While the transparent proxying that happens is likely saving hg.mozilla.org from catching on fire due to TBPL, this seems like the kind of thing that would be a pretty big win for hg.mozilla.org.  (I am not sure where to file the bug or who to assign it to in order to increase its chance of being addressed.)
I filed bug 640439 to cover the hg.mozilla.org mercurial caching so this does not get lost.  Thanks all for the investigation work!  I look forward to hg.mozilla.org requests being much snappier in the future. :)
See Also: → 640439
Product: mozilla.org → mozilla.org Graveyard