Open
Bug 1173441
Opened 9 years ago
Updated 9 years ago
Optimize |hg pushhead("tree")|
Categories
(Developer Services :: Mercurial: mozext, defect)
Developer Services
Mercurial: mozext
Tracking
(Not tracked)
ASSIGNED
People
(Reporter: nalexander, Assigned: nalexander)
Details
Attachments
(2 files)
My use case is Bug 1162191. I want to make it easy to fetch good remote binaries for the current tip; to do this, I want to use the pushlog. Right now, |hg pushhead("tree")| is pretty slow; this ticket tracks making it faster.
Assignee | ||
Comment 1•9 years ago
|
||
Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps
Attachment #8620433 -
Flags: review?(gps)
Assignee | ||
Comment 2•9 years ago
|
||
Bug 1173441 - Optimize pushhead(tree) revset. r?gps This patch adds an index for the specific query, guards against bad pushlog data, and makes the revset lazy. Locally, it performs significantly better than the earlier version. (pushhead() without an argument is not performant enough to use locally, with or without this patch.) My use case is finding pushes "close" to tip. Limiting the query to pushes in a certain time range is probably the way to improve the performance. Without this patch, I see results around: nalexander@chocho ~/M/gecko> time hg log -r 'last(pushhead("fx-team"), 20)' --template "{node}\n" <snip> 3.88 real 3.72 user 0.14 sys With this patch, I see results around: nalexander@chocho ~/M/gecko> time hg log -r 'last(pushhead("fx-team"), 20)' --template "{node}\n" <snip> 0.99 real 0.89 user 0.09 sys (These timings are not robust.)
Attachment #8620434 -
Flags: review?(gps)
Comment 3•9 years ago
|
||
Comment on attachment 8620433 [details] MozReview Request: Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps https://reviewboard.mozilla.org/r/10749/#review9425 I like the feature but not the color of the bikeshed. ::: pylib/mozautomation/mozautomation/changetracker.py:79 (Diff revision 1) > + with self._db: > + for table in ['trees', 'pushes', 'changeset_pushes', 'bug_changesets']: > + self._db.execute('DELETE FROM ' + table) > + At the point you are truncating every table, it's probably better to delete the database file from disk. That's what I've been doing locally to reset my database :)
Attachment #8620433 -
Flags: review?(gps)
Updated•9 years ago
|
Attachment #8620434 -
Flags: review?(gps)
Comment 4•9 years ago
|
||
Comment on attachment 8620434 [details] MozReview Request: Bug 1173441 - Optimize pushhead(tree) revset. r?gps https://reviewboard.mozilla.org/r/10751/#review9427 ::: hgext/mozext/__init__.py:1002 (Diff revision 1) > - > - return [r for r in subset if r in heads] > - > - revs = [] > + if head_changeset not in repo: > + # There are some malformed pushes. Ignore them. > + continue > + head = repo[head_changeset].rev() While changeset lookups are pretty fast, doing redundant lookups is overhead. Use the following to only do a single lookup: try: head = repo[head_changeset].rev() except error.LookupError: # Malformed pushes. Ignore. continue ::: pylib/mozautomation/mozautomation/changetracker.py:145 (Diff revision 1) > - 'trees.id = pushes.tree_id AND trees.name=? ' > + 'GROUP BY head_changeset ' I'm not sure why this GROUP BY is here. There should only be one row per (changeset, tree). ::: hgext/mozext/__init__.py:1000 (Diff revision 1) > - heads = set(repo[r[4]].rev() for r in > + for push_id, head_changeset in repo.changetracker.tree_push_head_changesets(tree): Since the time this code was written, Mercurial has created some new revset APIs to enable more lazy evaluation. See https://dxr.mozilla.org/hgcustom:version-control-tools/source/hgext/hgmo/__init__.py#104 for an example. It might be better to refactor the code for emitting the revs that are push heads as a generator function/closure then do something like: return subset & revset.generatorset(pushheads()) You may want to pull up mercurial/revset.py and look at what the `iterasc` argument to generatorset does. ::: hgext/mozext/__init__.py:1013 (Diff revision 1) > - revs.append(r) > + yield r Again, while you are here, please utilize revset.generatorset or use ``subset.filter``.
Assignee | ||
Comment 5•9 years ago
|
||
https://reviewboard.mozilla.org/r/10749/#review9425 > At the point you are truncating every table, it's probably better to delete the database file from disk. That's what I've been doing locally to reset my database :) In theory, I agree. In reality, deleting the file means closing the connection and recreating the schema with an existing object, which means making the control flow needs to handle "re-initializing" the object. What we really want is to call __init__ again, which might be possible but isn't standard. Is this really worth it?
Assignee | ||
Comment 6•9 years ago
|
||
https://reviewboard.mozilla.org/r/10751/#review9427 > I'm not sure why this GROUP BY is here. There should only be one row per (changeset, tree). This is not true. Each row includes (changeset, head_changeset), and only the head_changeset should be considered a pushhead. Now, I'd be willing to use different SQL than GROUP BY if you feel it's more clear: DISTINCT or whatever the syntax is.
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8620433 [details] MozReview Request: Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps
Attachment #8620433 -
Flags: review?(gps)
Assignee | ||
Comment 8•9 years ago
|
||
Comment on attachment 8620434 [details] MozReview Request: Bug 1173441 - Optimize pushhead(tree) revset. r?gps Bug 1173441 - Optimize pushhead(tree) revset. r?gps This patch adds an index for the specific query, guards against bad pushlog data, and makes the revset lazy. Locally, it performs significantly better than the earlier version. pushhead() without an argument is not performant enough to use locally without this patch; with this patch, it is usable. My use case is finding pushes "close" to tip. Limiting the query to pushes in a certain time range is probably the way to improve the performance.
Attachment #8620434 -
Flags: review?(gps)
Comment 9•9 years ago
|
||
Comment on attachment 8620433 [details] MozReview Request: Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r?gps https://reviewboard.mozilla.org/r/10749/#review11147 Ship It!
Attachment #8620433 -
Flags: review?(gps) → review+
Updated•9 years ago
|
Attachment #8620434 -
Flags: review?(gps) → review+
Comment 10•9 years ago
|
||
Comment on attachment 8620434 [details] MozReview Request: Bug 1173441 - Optimize pushhead(tree) revset. r?gps https://reviewboard.mozilla.org/r/10751/#review11149 ::: pylib/mozautomation/mozautomation/changetracker.py:147 (Diff revision 2) > - yield name, pid, t, user, str(head) > + yield push_id, head You should return a str, not unicode via `head.encode('ascii', 'strict')` ::: hgext/mozext/__init__.py:1004 (Diff revision 2) > - return [r for r in subset if r in heads] > + head = repo[str(head_changeset)].rev() Please do the str conversion in called function.
Assignee | ||
Comment 11•9 years ago
|
||
url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/32e3f6f8a94fae7f626b3588eeb136ddb6e32546 changeset: 32e3f6f8a94fae7f626b3588eeb136ddb6e32546 user: Nick Alexander <nalexander@mozilla.com> date: Wed Jun 10 10:28:17 2015 -0700 description: Bug 1173441 - Pre: Add |hg pushlogsync --reset| to wipe local pushlog and start fresh. r=gps url: https://hg.mozilla.org/hgcustom/version-control-tools/rev/fce1db104487b0f4ccbb4568e2adfb1cbc3d5926 changeset: fce1db104487b0f4ccbb4568e2adfb1cbc3d5926 user: Nick Alexander <nalexander@mozilla.com> date: Wed Jun 10 10:40:17 2015 -0700 description: Bug 1173441 - Optimize pushhead(tree) revset. r=gps This patch adds an index for the specific query, guards against bad pushlog data, and makes the revset lazy. Locally, it performs significantly better than the earlier version. pushhead() without an argument is not performant enough to use locally without this patch; with this patch, it is usable. My use case is finding pushes "close" to tip. Limiting the query to pushes in a certain time range is probably the way to improve the performance.
You need to log in
before you can comment on or make changes to this bug.
Description
•