Closed Bug 966545 Opened 10 years ago Closed 9 years ago

Interrupting a push with "Ctrl+c" can keep your (successful) push from making it into the pushlog

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dholbert, Assigned: gps)

References

Details

(Keywords: sheriffing-P2, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/822] )

Attachments

(5 files, 1 obsolete file)

As described in bug 966523, I discovered today that it's possible to interrupt 'hg push' such that your cset *is* pushed, but doesn't make it into the pushlog.

Here's what I saw on my end:
{
    [dholbert@indigo:/scratch/work/builds/mozilla-inbound/mozilla]$ hg push
    pushing to ssh://hg.mozilla.org/integration/mozilla-inbound/
    searching for changes
    h^Cinterrupted!
    remote: adding changesets
    remote: adding manifests
    remote: adding file changes
    remote: added 1 changesets with 1 changes to 1 files
    [dholbert@indigo:/scratch/work/builds/mozilla-inbound/mozilla]$ 
}

and this ended up successfully making:
 http://hg.mozilla.org/integration/mozilla-inbound/rev/a4bd676baee6
without adding a pushlog entry -- i.e. this is blank:
 http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=a4bd676baee6
(and this cset was never displayed anywhere on TBPL or on the pushloghtml listing).

That probably shouldn't be allowed to happen. Filing this bug on investigating ways to prevent this sort of thing.
Summary: Interrupting a push with "Ctrl+c" can keep your push from making it into the pushlog → Interrupting a push with "Ctrl+c" can keep your (successful) push from making it into the pushlog
(To be clear, the expected behavior is that the whole push should either atomically fail or succeed, as a unit. It shouldn't be possible to get a push into the repo without also updating the pushlog.)
OS: Linux → All
Hardware: x86_64 → All
gps, will the proposed solution you made in bug 827123 prevent this?
This may be a side-effect of the current implementation of Mercurial's wire protocol and the choice of hook we are using for pushlog. Mercurial pushes currently consist of multiple commands to the server. If pushlog is being written in a command after the initial changeset push, a ctrl+c would result in changesets not being recorded by pushlog.

I'd need to know which hook we're using for recording pushlog on the server to diagnose this.

FWIW, the non-atomic Mercurial push issue should be resolved with "bundle format 2." It's unfortunately not the highest in the priority list for Mercurial. They would accept the patch if someone wrote it though.
Flags: needinfo?(bkero)
(In reply to Gregory Szorc [:gps] from comment #3)
> I'd need to know which hook we're using for recording pushlog on the server
> to diagnose this.

https://hg.mozilla.org/hgcustom/hghooks/file/tip/mozhghooks/pushlog.py
The source code of pushlog.py says nothing about which Mercurial hook we're running the code as part of.
So, we have two choices:

1) Rewrite pushlog to honor Mercurial's transaction semantics
2) Move pushlog to "changegroup" hook

I don't think #2 is a good idea because if pushlog fails, the transaction won't get rolled back and we'll have changesets in the repo with no pushlog.

I don't believe there's an easy way to hook into the rollback event from a hook. The preferred way to do this is to have pushlog implemented on top of Mercurial's storage API, which is transaction and rollback aware. We either need to build a custom SQLite adapter that talks Mercurial's storage API. Or, we just replace pushlog storage with revlogs or simple files, both of which can be rolled back.

From bug 827123, you know what my preferred solution is.

I may poke some Mercurial devs about hooking into rollback. I'm pretty sure they'll tell me to use the supported storage interface, however.
Component: General Automation → Repos and Hooks
QA Contact: catlee → hwine
Pushlog has been running as a 'changegroup' hook for some time now.

[root@hgssh1.dmz.scl3 ~]# grep pushlog /etc/mercurial/hgrc
changegroup.z_loghistory = python:mozhghooks.pushlog.log

It's proven very difficult for pushlog to fail, considering it's simply two entries in a sqlite database. The only time it has ever failed has been due to permission problems. That's only happened on some smaller autogenerated repositories (in example l10n).

As I understand things if an (out of process) hook returns anything but 0, the transaction is condered failed and the commit should be rolled back. Is this not the desired behavior?

One thing I'd suggest that might help (and has been suggested to us several times by the Mercurial authors) is to switch to out-of-process hooks. The original reason we used the in-process python hooks is that we required some information that was impossible to obtain through the environment variables available to out-of-process hooks. This is no longer the case. This could help make things clearer for the rollback scenario.
Flags: needinfo?(bkero)
16:07 <paolo> edmorley|sheriffduty: I hit CTRL+C during my last push to fx-team and it doesn't appear in the hg pushlog,
16:07 <paolo> despite it shows up in the list of revisions at https://hg.mozilla.org/integration/fx-team/summary
16:07 <paolo> any idea on how to fix this?
16:07 <paolo> or just wait for the next push?
16:09 <edmorley|sheriffduty> paolo: it requires a manual DB edit; I've closed fx-team and there hasn't been another push since so we should be fine
16:09 <edmorley|sheriffduty> paolo: in general ctrl+c when pushing is discouraged fwiw
16:09 <paolo> I see :-)
16:09 <edmorley|sheriffduty> paolo: though the push should be happening as one transaction in an ideal world :-/
Keywords: sheriffing-P2
Depends on: 1046802
Depends on: 1048063
I fixed this properly by rewriting pushlog as an extension. See bug 1065050 and https://reviewboard-dev.allizom.org/r/154/. I opted for the quick and dirty hack that allows pushlog to continue using sqlite. We can migrate pushlog to something else later. Scope bloat, etc.

Unfortunately, I'm going on PTO tomorrow and this change requires upgrading to the server to Mercurial 3.0. I may be able to backport to 2.5.4 to get it out sooner. Either way, I'd like to be around for the deployment in case things go wrong.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Product: Release Engineering → Developer Services
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/129]
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/129] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/822] [kanban:engops:https://kanbanize.com/ctrl_board/6/129]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/822] [kanban:engops:https://kanbanize.com/ctrl_board/6/129] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/822]
Attached file MozReview Request: bz://966545/gps (obsolete) —
Attachment #8541030 - Flags: review?(mh+mozilla)
/r/1705 - pushlog: add verifypushlog command
/r/1707 - Bug 966545 - Refactor connection logic
/r/1709 - Bug 966545 - Handle transaction rollback
/r/1711 - Bug 966545 - Remove message about interrupting pushlog

Pull down these commits:

hg pull review -r b231f893ca77edfdf1edc6f3bac735da4dfc6d7e
https://reviewboard.mozilla.org/r/1705/#review1093

::: hgext/pushlog/__init__.py
(Diff revision 1)
>                  who = who.encode('utf-8')

While I see this, and since I know that our mercurial history contains non-utf-8 authors, you should check this code (or higher in the chain) doesn't die in that case.

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +        for pushid, who, when, nodes in self.pushes():

for pushcount, (pushid, who, ...) in enumerate(self.pushes()):

::: hgext/pushlog/tests/test-verify.t
(Diff revision 1)
> +  Inserted into the pushlog db successfully.

For good measure, you should have a push with multiple changesets.

::: hgext/pushlog/__init__.py
(Diff revision 1)
> +                    ui.warn('pushlog entry has no nodes: %s\n' % pushid)

Looks like this should be a if not nodes test outside the loop.

::: hgext/pushlog/tests/test-verify.t
(Diff revision 1)
> +  pushlog entry has no nodes: 2

This would read better if the pushid was prefixed with #.
https://reviewboard.mozilla.org/r/1709/#review1097

::: hgext/pushlog/tests/test-rollback.t
(Diff revision 1)
> +  fake hook success

Nice. I would feel better with a test for a client-initiated abort.
Attachment #8541030 - Flags: review?(mh+mozilla)
https://reviewboard.mozilla.org/r/1705/#review1107

> While I see this, and since I know that our mercurial history contains non-utf-8 authors, you should check this code (or higher in the chain) doesn't die in that case.

The `who` here is from the USER variable from the SSH context. That variable in turn comes from your LDAP username. I'm 99% confident that these are ASCII in our systems.

Still, adding a test at insertion time can't hurt.
Attachment #8541030 - Flags: review?(mh+mozilla)
/r/1705 - pushlog: add verifypushlog command
/r/1707 - pushlog: refactor connection logic; r=glandium
/r/1709 - pushlog: handle transaction rollback (bug 966545); r=glandium
/r/1711 - pushlog: remove message about interrupting pushlog; r=glandium
/r/1737 - pushlog: ensure usernames are utf-8

Pull down these commits:

hg pull review -r baeed19df210d7a800ea92195790be20beaf8b3b
https://reviewboard.mozilla.org/r/1705/#review1117

::: hgext/pushlog/__init__.py
(Diff revisions 1 - 2)
> -                (len(seennodes), pushcount))
> +                (len(seennodes), pushcount + 1))

You can use enumerate(iterator, 1) if you want pushcount to start from 1 instead of adding 1 afterwards.

::: hgext/pushlog/__init__.py
(Diff revisions 1 - 2)
> -                (len(seennodes), pushcount))
> +                (len(seennodes), pushcount + 1))

You can use enumerate(iterator, 1) if you want pushcount to start from 1 instead of adding 1 afterwards.

::: hgext/pushlog/__init__.py
(Diff revisions 1 - 2)
> -        for pushid, who, when, nodes in self.pushes():
> +        pushcount = -1

I understand you're setting this value to make the status message happy when the loop doesn't run, but I'd argue the status message doesn't make sense when there are no pushes in the pushlog.
https://reviewboard.mozilla.org/r/1711/#review1119

::: hgext/pushlog/__init__.py
(Diff revision 2)
> -    ui.write('Please do not interrupt...\n')

Note this message was useless anyways, since none of this shows up on the client until after everything is finished. http://bz.selenic.com/show_bug.cgi?id=4248
Attachment #8541030 - Flags: review?(mh+mozilla) → review+
https://reviewboard.mozilla.org/r/1705/#review1199

> You can use enumerate(iterator, 1) if you want pushcount to start from 1 instead of adding 1 afterwards.

I guess I learn something about Python every day!
Needinfo me to deploy this after the new year and network freeze.
Flags: needinfo?(gps)
Since we're dealing with a pushlog-related issue when creating new repos in bug 1116328 and it looks like a deploy of that patch is imminent, I went ahead and pushed these changes to production. That way we test different code changes independently so we can isolate any regressions (that we're unlikely to encounter).

To recap, these patches make pushlog insertion follow the same transaction semantics as Mercurial itself. The SQLite transaction is committed iff the Mercurial transaction is committed. The SQLite transaction is rolled back if the Mercurial transaction is not committed. With this deployment, we should not lose pushlog inserts if a push is interrupted. With this deployment, we should not have extra pushlog entries for pushes that were aborted and rolled back.

Finally.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Flags: needinfo?(gps)
Resolution: --- → FIXED
Thanks for fixing this! I never understood enough Mercurial internals to figure out how to do the right thing here, sadly.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #27)
> Thanks for fixing this! I never understood enough Mercurial internals to
> figure out how to do the right thing here, sadly.

In your defense, I think Mercurial lacked the proper extension points to hook into the transaction API until recently. This patch would have been very painful to write against Mercurial 2.5.4. It's not your fault.
Attachment #8541030 - Attachment is obsolete: true
Attachment #8618068 - Flags: review+
Attachment #8618069 - Flags: review+
Attachment #8618070 - Flags: review+
Attachment #8618071 - Flags: review+
Attachment #8618072 - Flags: review+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: