Open Bug 1173441 Opened 9 years ago Updated 9 years ago

Optimize |hg pushhead("tree")|

Categories

(Developer Services :: Mercurial: mozext, defect)

defect
Not set
normal

Tracking

(Not tracked)

ASSIGNED

People

(Reporter: nalexander, Assigned: nalexander)

Details

Attachments

(2 files)

My use case is Bug 1162191.  I want to make it easy to fetch good remote binaries for the current tip; to do this, I want to use the pushlog.  Right now, |hg pushhead("tree")| is pretty slow; this ticket tracks making it faster.
Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps
Attachment #8620433 - Flags: review?(gps)
Bug 1173441 - Optimize pushhead(tree) revset. r?gps

This patch adds an index for the specific query, guards against bad
pushlog data, and makes the revset lazy.  Locally, it performs
significantly better than the earlier version.  (pushhead() without an
argument is not performant enough to use locally, with or without this
patch.)

My use case is finding pushes "close" to tip.  Limiting the query to
pushes in a certain time range is probably the way to improve the
performance.

Without this patch, I see results around:
nalexander@chocho ~/M/gecko> time hg log -r 'last(pushhead("fx-team"), 20)' --template "{node}\n"
<snip>
        3.88 real         3.72 user         0.14 sys
With this patch, I see results around:
nalexander@chocho ~/M/gecko> time hg log -r 'last(pushhead("fx-team"), 20)' --template "{node}\n"
<snip>
        0.99 real         0.89 user         0.09 sys
(These timings are not robust.)
Attachment #8620434 - Flags: review?(gps)
Comment on attachment 8620433 [details]
MozReview Request: Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps

https://reviewboard.mozilla.org/r/10749/#review9425

I like the feature but not the color of the bikeshed.

::: pylib/mozautomation/mozautomation/changetracker.py:79
(Diff revision 1)
> +        with self._db:
> +            for table in ['trees', 'pushes', 'changeset_pushes', 'bug_changesets']:
> +                self._db.execute('DELETE FROM ' + table)
> +

At the point you are truncating every table, it's probably better to delete the database file from disk. That's what I've been doing locally to reset my database :)
Attachment #8620433 - Flags: review?(gps)
Attachment #8620434 - Flags: review?(gps)
Comment on attachment 8620434 [details]
MozReview Request: Bug 1173441 - Optimize pushhead(tree) revset. r?gps

https://reviewboard.mozilla.org/r/10751/#review9427

::: hgext/mozext/__init__.py:1002
(Diff revision 1)
> -
> -        return [r for r in subset if r in heads]
> -
> -    revs = []
> +            if head_changeset not in repo:
> +                # There are some malformed pushes.  Ignore them.
> +                continue
> +            head = repo[head_changeset].rev()

While changeset lookups are pretty fast, doing redundant lookups is overhead. Use the following to only do a single lookup:

try:
    head = repo[head_changeset].rev()
except error.LookupError:
    # Malformed pushes. Ignore.
    continue

::: pylib/mozautomation/mozautomation/changetracker.py:145
(Diff revision 1)
> -            'trees.id = pushes.tree_id AND trees.name=? '
> +            'GROUP BY head_changeset '

I'm not sure why this GROUP BY is here. There should only be one row per (changeset, tree).

::: hgext/mozext/__init__.py:1000
(Diff revision 1)
> -        heads = set(repo[r[4]].rev() for r in
> +        for push_id, head_changeset in repo.changetracker.tree_push_head_changesets(tree):

Since the time this code was written, Mercurial has created some new revset APIs to enable more lazy evaluation. See https://dxr.mozilla.org/hgcustom:version-control-tools/source/hgext/hgmo/__init__.py#104 for an example. It might be better to refactor the code for emitting the revs that are push heads as a generator function/closure then do something like:

return subset & revset.generatorset(pushheads())

You may want to pull up mercurial/revset.py and look at what the `iterasc` argument to generatorset does.

::: hgext/mozext/__init__.py:1013
(Diff revision 1)
> -                revs.append(r)
> +                    yield r

Again, while you are here, please utilize revset.generatorset or use ``subset.filter``.
https://reviewboard.mozilla.org/r/10749/#review9425

> At the point you are truncating every table, it's probably better to delete the database file from disk. That's what I've been doing locally to reset my database :)

In theory, I agree.  In reality, deleting the file means closing the connection and recreating the schema with an existing object, which means making the control flow needs to handle "re-initializing" the object.  What we really want is to call __init__ again, which might be possible but isn't standard.  Is this really worth it?
https://reviewboard.mozilla.org/r/10751/#review9427

> I'm not sure why this GROUP BY is here. There should only be one row per (changeset, tree).

This is not true.  Each row includes (changeset, head_changeset), and only the head_changeset should be considered a pushhead.

Now, I'd be willing to use different SQL than GROUP BY if you feel it's more clear: DISTINCT or whatever the syntax is.
Comment on attachment 8620433 [details]
MozReview Request: Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps

Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps
Attachment #8620433 - Flags: review?(gps)
Comment on attachment 8620434 [details]
MozReview Request: Bug 1173441 - Optimize pushhead(tree) revset. r?gps

Bug 1173441 - Optimize pushhead(tree) revset. r?gps

This patch adds an index for the specific query, guards against bad
pushlog data, and makes the revset lazy.  Locally, it performs
significantly better than the earlier version.  pushhead() without an
argument is not performant enough to use locally without this patch;
with this patch, it is usable.

My use case is finding pushes "close" to tip.  Limiting the query to
pushes in a certain time range is probably the way to improve the
performance.
Attachment #8620434 - Flags: review?(gps)
Comment on attachment 8620433 [details]
MozReview Request: Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps

https://reviewboard.mozilla.org/r/10749/#review11147

Ship It!
Attachment #8620433 - Flags: review?(gps) → review+
Attachment #8620434 - Flags: review?(gps) → review+
Comment on attachment 8620434 [details]
MozReview Request: Bug 1173441 - Optimize pushhead(tree) revset. r?gps

https://reviewboard.mozilla.org/r/10751/#review11149

::: pylib/mozautomation/mozautomation/changetracker.py:147
(Diff revision 2)
> -            yield name, pid, t, user, str(head)
> +            yield push_id, head

You should return a str, not unicode via `head.encode('ascii', 'strict')`

::: hgext/mozext/__init__.py:1004
(Diff revision 2)
> -        return [r for r in subset if r in heads]
> +                    head = repo[str(head_changeset)].rev()

Please do the str conversion in called function.
url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/32e3f6f8a94fae7f626b3588eeb136ddb6e32546
changeset:  32e3f6f8a94fae7f626b3588eeb136ddb6e32546
user:       Nick Alexander <nalexander@mozilla.com>
date:       Wed Jun 10 10:28:17 2015 -0700
description:
Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r=gps

url:        https://hg.mozilla.org/hgcustom/version-control-tools/rev/fce1db104487b0f4ccbb4568e2adfb1cbc3d5926
changeset:  fce1db104487b0f4ccbb4568e2adfb1cbc3d5926
user:       Nick Alexander <nalexander@mozilla.com>
date:       Wed Jun 10 10:40:17 2015 -0700
description:
Bug 1173441 - Optimize pushhead(tree) revset. r=gps

This patch adds an index for the specific query, guards against bad
pushlog data, and makes the revset lazy.  Locally, it performs
significantly better than the earlier version.  pushhead() without an
argument is not performant enough to use locally without this patch;
with this patch, it is usable.

My use case is finding pushes "close" to tip.  Limiting the query to
pushes in a certain time range is probably the way to improve the
performance.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: