Closed Bug 1065050 Opened 10 years ago Closed 10 years ago

Rewrite pushlog as an extension

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gps, Unassigned)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1292] )

Attachments

(10 files, 1 obsolete file)

42 bytes, text/x-review-board-request
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
38 bytes, text/x-review-board-request
bkero
: review+
Details
Once bug 1065041 is in place and all the pushlog code is in the same repository, we'll be able to rewrite pushlog as a proper extension. I plan to do this, preserving the old hook "entry point" in the process.

Major goals of this refactor are to facilitate syncing of pushlog data via `hg pull` and to better facilitate testing. Making pushlog handle rollback and `hg strip` should also be easier.
to ask, and be clear;

Is "Local non-moco users can pull pushlog data if they install the extension locally" in scope or out of scope of this bug?

(I'd suggest/envision out of scope, but being an envisioned ideal, and *possibly* possible out of the gate, but by accident if so)
(In reply to Justin Wood (:Callek) from comment #1)
> to ask, and be clear;
> 
> Is "Local non-moco users can pull pushlog data if they install the extension
> locally" in scope or out of scope of this bug?

It will be in scope as a side-effect of how it is implemented. Although, designing a system that aggregates pushlog data from multiple remotes would be out of scope initially. However, I do want that to eventually be supported because doing local forensics with pushlog data can be insanely useful, as my mozext extension has demonstrated.
/r/155 - Bug 1065050 - Rewrite pushlog as an extension
/r/156 - Bug 1065050 - Write to pushlog through extension API

Pull down these commits:

hg pull review -r 5eb959823e399ba4e7ad3c820ce5002189de8271
I'll be pushing things to ReviewBoard so people can see what's involved. So far I've only pushed a refactoring of the hook to use the extension API.
Comment on attachment 8486637 [details]
Review for review ID: bz://1065050/gps

/r/155 - Bug 1065050 - Rewrite pushlog as an extension
/r/156 - Bug 1065050 - Write to pushlog through extension API
/r/157 - Bug 1065050 - Move the pretxnchangegroup hook into the extension
/r/158 - Bug 1065050 - Move pushlog tests into extension
/r/159 - pushlog: use a context manager for sqlite connection
/r/160 - pushlog: add a test for `hg strip`
/r/161 - pushlog: handle changeset stripping

Pull down these commits:

hg pull review -r 34305872b615ebb06296cd89a6e9c7165410c50a
Comment on attachment 8486637 [details]
Review for review ID: bz://1065050/gps

/r/155 - Bug 1065050 - Rewrite pushlog as an extension
/r/156 - Bug 1065050 - Write to pushlog through extension API
/r/157 - Bug 1065050 - Move the pretxnchangegroup hook into the extension
/r/158 - Bug 1065050 - Move pushlog tests into extension
/r/159 - pushlog: use a context manager for sqlite connection
/r/160 - pushlog: add a test for `hg strip`
/r/161 - pushlog: handle changeset stripping
/r/163 - pushlog: add a test for rollback behavior
/r/164 - pushlog: refactor connection logic
/r/165 - pushlog: handle transaction rollback
/r/166 - pushlog: remove message about interrupting pushlog

Pull down these commits:

hg pull review -r 0db0a95a66a7b33fdc735f9bcf711bfe5e46e76e
I decided to fix `hg strip` and transaction abort in pushlog before I tackle the syncing problem. Solve existing bugs before optimization, etc.

With the patch series I just posted, `hg strip` will remove stripped data from pushlog (no more manual sqlite hackery required) and an aborted push won't result in the pushlog database being updated.

Properly handling the transaction rollback requires Mercurial 3.0+. I could potentially backport things to 2.5.4. Now that we have a test suite that tests against various Mercurial versions, it's easy enough to verify that it all works.

I'm going on PTO tomorrow. Not sure I'll have time to get these reviewed and to incorporate any fixes. But, hey, we have something to look forward to deploying when I return!
Comment on attachment 8486637 [details]
Review for review ID: bz://1065050/gps

/r/155 - Bug 1065050 - Rewrite pushlog as an extension
/r/156 - Bug 1065050 - Write to pushlog through extension API
/r/157 - Bug 1065050 - Move the pretxnchangegroup hook into the extension
/r/158 - Bug 1065050 - Move pushlog tests into extension
/r/159 - pushlog: use a context manager for sqlite connection
/r/160 - pushlog: add a test for `hg strip`
/r/161 - pushlog: handle changeset stripping
/r/163 - Bug 966545 - Add a test for rollback behavior
/r/164 - Bug 966545 - Refactor connection logic
/r/165 - Bug 966545 - Handle transaction rollback
/r/166 - Bug 966545 - Remove message about interrupting pushlog

Pull down these commits:

hg pull review -r 3811872b8b42dca5b33bb139a461036451146eb8
Comment on attachment 8486637 [details]
Review for review ID: bz://1065050/gps

/r/155 - Bug 1065050 - Rewrite pushlog as an extension
/r/156 - Bug 1065050 - Write to pushlog through extension API
/r/157 - Bug 1065050 - Move the pretxnchangegroup hook into the extension
/r/158 - Bug 1065050 - Move pushlog tests into extension
/r/159 - pushlog: use a context manager for sqlite connection
/r/160 - pushlog: add a test for `hg strip`
/r/161 - pushlog: handle changeset stripping
/r/163 - Bug 966545 - Add a test for rollback behavior
/r/164 - Bug 966545 - Refactor connection logic
/r/165 - Bug 966545 - Handle transaction rollback
/r/166 - Bug 966545 - Remove message about interrupting pushlog
/r/167 - pushlog: don't record changes in pushlog unless they came from a push
/r/168 - pushlog: support pulling pushlog data over wire protocol

Pull down these commits:

hg pull review -r 2b09bd7fef158634f4d1dd194ac91b2687d3a299
While that last commit implements automagical pushlog fetching over `hg pull`, the code is not robust enough to deploy in production. I'll finish it up on the other side of my holiday.

Ted, et al: if anyone wants to start reviewing patches, you have a few weeks before I get back :)
Product: Release Engineering → Developer Services
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/72]
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/72] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1281] [kanban:engops:https://kanbanize.com/ctrl_board/6/72]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1281] [kanban:engops:https://kanbanize.com/ctrl_board/6/72] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1286] [kanban:engops:https://kanbanize.com/ctrl_board/6/72]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1286] [kanban:engops:https://kanbanize.com/ctrl_board/6/72] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1287] [kanban:engops:https://kanbanize.com/ctrl_board/6/72]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1287] [kanban:engops:https://kanbanize.com/ctrl_board/6/72] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1292] [kanban:engops:https://kanbanize.com/ctrl_board/6/72]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1292] [kanban:engops:https://kanbanize.com/ctrl_board/6/72] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1292]
Attached file MozReview Request: bz://1065050/gps (obsolete) —
Attachment #8527968 - Flags: review?(mh+mozilla)
Attachment #8527968 - Flags: review?(bkero)
/r/953 - Bug 1065050 - Rewrite pushlog as an extension
/r/955 - Bug 1065050 - Write to pushlog through extension API
/r/957 - Bug 1065050 - Move the pretxnchangegroup hook into the extension
/r/959 - Bug 1065050 - Move pushlog tests into extension
/r/961 - pushlog: add a test for `hg strip`
/r/963 - pushlog: add a test for rollback behavior
/r/965 - pushlog: don't record changes in pushlog unless they came from a push
/r/967 - pushlog: use a context manager for sqlite connection
/r/969 - pushlog: support pulling pushlog data over wire protocol

Pull down these commits:

hg pull review -r 56b9cc5c2225159d9d7da06f7b855004539ae299
https://reviewboard.mozilla.org/r/963/#review531

::: hgext/pushlog/tests/test-rollback.t
(Diff revision 1)
> +A failuring during the transaction should cause the pushlog to not

s/failuring/failure/
https://reviewboard.mozilla.org/r/953/#review559

Note, without looking at the other patches, if you're kind-of starting from scratch, why not make the pushlog a revlog with an additional index to map push head nodeids and pushlog entry? that would make a lot of things easier.

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +        c.close()

You should at the very least to a try/finally. The current (old) code catches any exception that occurs in sqlite code, and you don't.
https://reviewboard.mozilla.org/r/955/#review561

::: hghooks/mozhghooks/pushlog.py
(Diff revision 1)
> -    createdb = False
> +        print('repository not properly configured; missing pushlog extension.')

while you're here, you should turn those prints into ui.write()
https://reviewboard.mozilla.org/r/957/#review563

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +    ui.setconfig('hooks', 'pretxnchangegroup.pushlog', pretxnchangegrouphook)

You should add a fourth argument with the name of the extension.
https://reviewboard.mozilla.org/r/959/#review565

::: hgext/pushlog/tests/test-basic.t
(Diff revision 1)
> +  $ . $TESTDIR/hghooks/tests/common.sh

I didn't look at this patch because the diff doesn't contain the move information that would make it readable in reviewboard (presumably).
https://reviewboard.mozilla.org/r/967/#review573

::: hgext/pushlog/__init__.py
(Diff revision 1)
>          c.close()

This close should disappear.
https://reviewboard.mozilla.org/r/969/#review575

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +            res = c.execute('SELECT id, user, date, rev, node from pushlog '

might as well a "group by" clause with group_concat(node,'') (which sqlite supports), which will give you exactly one row per push. Then you can split that column every 40 characters.

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +                node = row[4].encode('utf-8')

You should encode nodeids as as ascii

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +            for row in res:

sqlite results can't be destructured?

::: hgext/pushlog/__init__.py
(Diff revision 1)
> -        c.close()
> +            for push in pushes:

destructure?

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +                    current = (pushid, who, row[2], [bin(node)])

What's the point of making this function return nodes in the 20 byte binary form if the only consumer then turns that to the 40 bytes hex form?

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +                ' '.join(hex(n) for n in push[3])))

You can skip the spaces between nodeids from a same push if you want to remove a few bytes from the wire. nodeids are always 40 bytes, so you can split every 40 bytes.

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +    if lines[0] == '0':

Isn't 0 a valid pushid? Use a value that is not a valid pushid, like a plain string ("error"?)

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +    lines = pullop.remote._call('pushlog', firstpush='%d' % fetchfrom)

str(fetchfrom)

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +        nodes = [bin(n) for n in nodes.split()]

You could bin() in recordremotepushes for repo[n], and keep the node in hex form here. Then you don't need ctx.hex()

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +    for line in lines[1:]:

You could avoid a list copy (which, on a first pull of, say, m-c, would be may tens of thousands items) by doing something like:
lines = iter(lines.splitlines())
line = lines.next()
if line == ...

pushes = []
for line in lines:
    ...
https://reviewboard.mozilla.org/r/953/#review579

Somewhere on my machine I have code for doing this. I remember writing it in Tokyo last December while we were meeting there :)

I agree that this would solve a lot of problems. However, replacing pushlog with a different backend bloats scope to changing pushlog-feed.py and upgrading all the pushlogs on the server to the new storage format. I would rather avoid that extra work at this time.

My goal with this patch series (and subsequent patches) is to get pushlog in a slightly better state while preserving backwards compatibility. Once we've converted to a modern and well-tested architecture, we can move forward with a BC incompatible change.

> You should at the very least to a try/finally. The current (old) code catches any exception that occurs in sqlite code, and you don't.

Good catch. Although I think I address this in a subsequent patch where I convert the db connection to a context manager.
https://reviewboard.mozilla.org/r/953/#review581

> Good catch. Although I think I address this in a subsequent patch where I convert the db connection to a context manager.

I'll just fix this in the next push.
https://reviewboard.mozilla.org/r/957/#review587

> You should add a fourth argument with the name of the extension.

Oh, I didn't realize that was a convention. Congratulations on out-hg'ing me :)
https://reviewboard.mozilla.org/r/969/#review591

> What's the point of making this function return nodes in the 20 byte binary form if the only consumer then turns that to the 40 bytes hex form?

Most Mercurial internal APIs return 20 byte raw nodes. If I had my way, we'd be using blobs in the database and storing 20 bytes raw instead of 40 byte hex. But backwards compatibility.

I agree it is a little weird to do the double conversion. I can probably get away with changing it since all consumers should be in version-control-tools and it will be easy enough to change the API later.

> sqlite results can't be destructured?

They can. I'll fix this.

> You should encode nodeids as as ascii

I agree: that makes more sense and may prevent a footgun if we accidentally store wrong values in the store.
https://reviewboard.mozilla.org/r/969/#review595

> You could avoid a list copy (which, on a first pull of, say, m-c, would be may tens of thousands items) by doing something like:
> lines = iter(lines.splitlines())
> line = lines.next()
> if line == ...
> 
> pushes = []
> for line in lines:
>     ...

Good idea. I'll refactor this.
https://reviewboard.mozilla.org/r/969/#review597

> might as well a "group by" clause with group_concat(node,'') (which sqlite supports), which will give you exactly one row per push. Then you can split that column every 40 characters.

That sounded like a good idea until I read the SQLite manual:

    The group_concat() function returns a string which is the concatenation of all non-NULL values of X. If parameter Y is present then it is used as the separator between instances of X. A comma (",") is used as the separator if Y is omitted. The order of the concatenated elements is arbitrary.
    
"arbitrary order" makes this a deal breaker.

Well, not realy. We could do something fancy with revs or use Mercurial to turn the nodes back into revs. But at that point, why bother writing a more clever SQL query?
/r/953 - Bug 1065050 - Rewrite pushlog as an extension
/r/955 - Bug 1065050 - Write to pushlog through extension API
/r/957 - Bug 1065050 - Move the pretxnchangegroup hook into the extension
/r/959 - Bug 1065050 - Move pushlog tests into extension
/r/961 - pushlog: add a test for `hg strip`; r=glandium
/r/963 - pushlog: add a test for rollback behavior; r=glandium
/r/965 - pushlog: don't record changes in pushlog unless they came from a push
/r/967 - pushlog: use a context manager for sqlite connection
/r/969 - pushlog: support pulling pushlog data over wire protocol

Pull down these commits:

hg pull review -r 20cbe3e2e9256ecfed56e3d045083dcf668b4521
https://reviewboard.mozilla.org/r/951/#review599

I think I hit every review comment one way or another.

I added r=glandium to the patches that received ship it's that I haven't changed. There was 1 patch you granted ship it on that I subsequently changed and want re-review on.
https://reviewboard.mozilla.org/r/955/#review751

::: hgext/pushlog-legacy/tests/helpers.sh
(Diff revision 2)
> +pushlog = $TESTDIR/hgext/pushlog

Did you mean to add this here?
https://reviewboard.mozilla.org/r/959/#review757

There still is no mercurial move information in this patch. Still not looking at it.
https://reviewboard.mozilla.org/r/969/#review763

::: hgext/pushlog/__init__.py
(Diff revisions 1 - 2)
> +    The first line starts with "0" or "1". This indicates success. If the line

How about making that number the number of pushes to expect subsequently?

::: hgext/pushlog/__init__.py
(Diff revisions 1 - 2)
>          nodes = [bin(n) for n in nodes.split()]

You say recordpushes can take both bin and hex form, why convert to bin here?
Comment on attachment 8527968 [details]
MozReview Request: bz://1065050/gps

Still one patch I haven't looked at (see comment 37)
Attachment #8527968 - Flags: review?(mh+mozilla)
https://reviewboard.mozilla.org/r/959/#review895

I checked out the code per RB's instructions and hg diff -c'd it myself. The file was deleted and moved into a new file with the correct contents. Marking shipit.
(In reply to Ben Kero [:bkero] from comment #43)
> https://reviewboard.mozilla.org/r/959/#review895
> 
> I checked out the code per RB's instructions and hg diff -c'd it myself. The
> file was deleted and moved into a new file with the correct contents.
> Marking shipit.

That's still not a desirable mercurial history, though, since mercurial likes to track renames manually...
https://reviewboard.mozilla.org/r/955/#review933

> Did you mean to add this here?

Yes.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 1112415
Attachment #8527968 - Flags: review?(bkero) → review+
Attachment #8527968 - Attachment is obsolete: true
Attachment #8618330 - Flags: review+
Attachment #8618331 - Flags: review+
Attachment #8618332 - Flags: review+
Attachment #8618333 - Flags: review+
Attachment #8618334 - Flags: review+
Attachment #8618335 - Flags: review+
Attachment #8618336 - Flags: review+
Attachment #8618337 - Flags: review+
Attachment #8618338 - Flags: review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: