Open Bug 787620 Opened 9 years ago Updated 3 months ago

Enforce correct formatting of user name & email address in mercurial repositories

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

REOPENED

People

(Reporter: nbp, Assigned: gps)

References

()

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/740] )

Attachments

(1 file)

Add hooks to all mercurial repositories to avoid future malformed user name and email addresses.  This will reduce the number of potential downtime of git mirrors (mountain view and torronto) knowing that eshan repository is used to keep b2g in sync with mozilla-central changes.
Product: mozilla.org → Release Engineering
Product: Release Engineering → Developer Services
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/295]
Whiteboard: [kanban:engops:https://kanbanize.com/ctrl_board/6/295] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/740] [kanban:engops:https://kanbanize.com/ctrl_board/6/295]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/740] [kanban:engops:https://kanbanize.com/ctrl_board/6/295] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/740]
glandium: you were mentioning issues round-tripping Git<->Hg author names. We should implement a hook to ensure things are sane. Care to provide details and/or code?
Flags: needinfo?(mh+mozilla)
Git<->Hg round-trip is not really interesting for us. The Hg->Git conversion is all that happens, and we never convert back that git repo to mercurial, so it doesn't really matter. People that do push from git to mercurial will have their commit done from git already, and valid git author names being a subset of valid mercurial author names, that just works.
That being said, the Hg->Git conversion does awful things to mercurial author names that git doesn't support, for purpose of round-tripping. The most awful thing it does is this:
http://git.mozilla.org/?p=integration/gecko-dev.git;a=commit;h=ce5a887c60aa5a2d9da107de82247f097fba68e8
(corresponding to http://hg.mozilla.org/mozilla-central/rev/761071f57ab6 )
http://git.mozilla.org/?p=integration/gecko-dev.git;a=commit;h=1c243ba23b0530160d9597ff4f193c10e7505c3a
(corresponding to http://hg.mozilla.org/mozilla-central/rev/d104c58ca967 )
which is an unreadable mess.

If you ask me, we should just reject on mercurial's end anything that doesn't match "name <e@mail>" (with exactly one < and one > in the whole expression ; fwiw, git removes any < and > that would be set in GIT_AUTHOR_NAME or GIT_AUTHOR_EMAIL or equivalent git config strings).
Flags: needinfo?(mh+mozilla)
Looks like low hanging fruit.
Assignee: nobody → gps
Status: NEW → ASSIGNED
QA Contact: hwine
hghooks: add hook to validate author field well-formedness (bug 787620); r?glandium

Mercurial isn't strict about what goes in the author/user field. It is
supposed to be an RFC 822 "NAME <EMAIL>" value. However, Mercurial by
itself will allow pretty much everything. Historically, this has caused
problems converting Mercurial repos to Git repos, as Git is more strict
about what goes in this field. The tools do normalize values to
something Git will accept. But it would be nice to avoid the problem in
the first place so there is parity between Mercurial and Git repos.

This commit introduces a new Mercurial hook that validates the
well-formedness of the author/user field on changesets. The hook has
been globally enabled on hg.mozilla.org.
Attachment #8683390 - Flags: review?(mh+mozilla)
Comment on attachment 8683390 [details]
MozReview Request: hghooks: add hook to validate author field well-formedness (bug 787620); r?glandium

https://reviewboard.mozilla.org/r/24311/#review21825

::: hghooks/mozhghooks/author_format.py:13
(Diff revision 1)
> +RE_PROPER_AUTHOR = re.compile('^.+\s?\<.+@.+\>$')

While "foo<foo@bar>" is a valid RFC822 address (no space), maybe we should enforce that there is one? At least it would look nicer.

::: hghooks/mozhghooks/author_format.py:29
(Diff revision 1)
> +        elif user.count('>') != 1:
> +            valid = False
> +        elif user.count('<') != 1:
> +            valid = False

One way to avoid those tests and thus simplify the function is to replace the dots in the regexp with [^<>]

::: hghooks/mozhghooks/author_format.py:62
(Diff revision 1)
> +             'username\n')

This could all be

ui.write(
    'user fields must be of the format.... \n'
    'e.g. ...\n'
    '...\n'
    ' hg up {parent} && hg graft --currentuser -r {first}::\n'
    '...\n'
    ' hg up {parent} && hg graft --user ... -r {first}::{tip}\n'
    '...\n'
    .format(
        parent=short(repo[node].p1().node()),
        first=short(repo[node].node()),
        tip=short(repo['tip'].node()),
    })
)

::: hghooks/tests/test-author-format.t:55
(Diff revision 1)
> +  saved backup bundle to $TESTTMP/client/.hg/strip-backup/d28cf3d35b22-3e561411-amend-backup.hg (glob)

Isn't it possible to disable backup bundles to avoid this output?
Attachment #8683390 - Flags: review?(mh+mozilla) → review+
https://reviewboard.mozilla.org/r/24311/#review21825

> Isn't it possible to disable backup bundles to avoid this output?

Yes. I was lazy. I'll add this in before landing.
https://reviewboard.mozilla.org/r/24311/#review21825

> Yes. I was lazy. I'll add this in before landing.

Apparently it isn't available on `hg commit --amend` :/
https://reviewboard.mozilla.org/r/24311/#review21825

> While "foo<foo@bar>" is a valid RFC822 address (no space), maybe we should enforce that there is one? At least it would look nicer.

I agree. I'll change this.

> One way to avoid those tests and thus simplify the function is to replace the dots in the regexp with [^<>]

Good idea.
https://hg.mozilla.org/hgcustom/version-control-tools/rev/9721f3f74db77a49b884cb57a72b899a2a6831d3
hghooks: add hook to validate author field well-formedness (bug 787620); r=glandium
Deploying now.

This /may/ impact our ability to clone repos on the server. Will investigate after deploy.
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
It doesn't impact `hg clone` because clone uses "pull" under the hood, which is excluded by the hook.
I'd be worried if this blocks the next merge-day type stuff, e.g. if anything (recent, say year) exists that would prevent those pushes from succeeding.
If it does then we'll deal with it then.
> +RE_PROPER_AUTHOR = re.compile('^[^<]+\s\<[^<>]+@[^<>]+\>$')

This makes "> <foo@bar>" valid.
So... this is going to be a problem very quickly, because of things like this:
http://hg.mozilla.org/mozilla-central/rev/05d85d9bbce4

`hg log --template '{rev} {author}\n' | sed '/[0-9]\+ [^<]\+\s<[^<>]\+@[^<>]\+>$/d'` for a full list of funny things. ffxbld is the only one from automation, from a quick glance.
I'm going to back this out because it has been causing issues.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
QA Contact: hwine → klibby

Just spotted this today:

changeset:   563343:10cf8563c5ed
user:        Akshat Dixit <Akshat Dixit> <Akshat Dixit>
date:        Fri Jan 15 11:17:29 2021 +0000

Have we moved away from the tools that caused problems (https://bugzilla.mozilla.org/show_bug.cgi?id=787620#c15)?

You need to log in before you can comment on or make changes to this bug.