Closed Bug 506949 Opened 15 years ago Closed 13 years ago

Enforce a set of allowed commit messages as hg precommit hooks on hg.m.o

Categories

(Developer Services :: Mercurial: hg.mozilla.org, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: ehsan.akhgari, Assigned: jruderman)

References

()

Details

Attachments

(4 files, 3 obsolete files)

It was suggested in a dev.planning thread that we use a precommit hook on hg.mozilla.org in order to force correctly formatted commit comments.  We probably need multiple regex's to cover various possible comments.

Here are some crude examples:

For normal bugs:

/^Bug\s+\d+\s+-\s+(.*)\s*r=[^\s]+(\s+sr=[^\s]+)?(\s+a\d+=[^\s]+)?(\(NPOTB\))?$/

Note that the summary field is intentionally optional for security sensitive bugs.

Backouts:

/^Backed out changeset [\da-f]{12}.*\b[bB]ug\s+\d+/
/^Backout\s+[Bb]ug\s+\d+/

Merges:

/^Merge /

Custom messages for things not covered about (example: branching):

/\(xxx\)$/

I'm not sure what to replace xxx above with.  :-)
Ah, and also:

/bustage fix/i
This seems really hard to get right with the wide variety of things people wind up committing. I'm open to ideas though! It would be interesting to take any proposed set of regexes and run them against the history of m-c commits (at least back to the end of cvs mirroring) and see if there are exceptions that we'd need to handle.
Are we in particular only requiring these rules on the last commit of each push? How would you allow e.g. tracemonkey->m-c merges for which a bug# and review annotations are irrelevant (because they are contained within the original commit when it was pushed to TM)?
If we go down the route of enforcing a specific style of comments, we need to make sure the release automation scripts either follow the enforced style or to exclude what that does in the hook.
CCing bhearsum, who probably knows those scripts best - Ben, note that there's no decision on doing this yet, we just shouldn't be caught off-guard if it's actually done.
Yeah, I'm watching this.
Man, do we assign a prize for Most Useless Bug yet? ;-)
It's certainly not useless. There have been plenty of patches committed without bug numbers or reviewers specified (or either, there are a few commits with just the MQ patch notation in them).
Hopefully the warning that hg 1.3 throws when you qfinish a patch that has no message will start to help there soon.
OK, I do think we have consensus to do this (finally!).

We're going to proceed with the plan Jesse laid out in bug 660705 comment 0.
If we're going with the one-off excluding of certain users for releng purposes (which I still find too fragile) then we need to add at least seabld and (I think) tbirdbld as well.
I agree with comment 2
(In reply to Bug 660705 comment #0)
> Commits without bug numbers mess with regression hunting, regression filing,
> backouts, and patch attribution.
> 
> I also rely on being able to get a list of bugs from hg, in order to write
> The Burning Edge, identify fixes that need fuzzing or security review, and
> identify fixes that need Feature Pages.
> 
> Allow commits (on mozilla-central and feeder branches) only if they:
> * Have "bug" or "b=" followed by a bug number
> * Have "no bug"
> * Start with
>     (back out|backing out|backed out|backout)( rev| changeset|)? [0-9a-f]{12}
> * Start with "merge"
> * Come from user "ffxbld"

Allow me to suggest some changes/additions:

* Have "bug" or "b=" followed by a bug number
* Have "no bug"
* Start with
    (back out|backing out|backed out|backout)( rev| changeset| cset)? [0-9a-f]{12}
* Starts with (back out|backing out|backed out|backout)( bug | b=)\d{1,9}
* has any cset in push that starts with "merge"
** Change to starts with "merge" for csets committed in a datetime of [Month Deployed + 3] or later [to account for local commits, where changing a commit message is hard, and for project branches/user repos that are created after this goes into affect]
* Come from user "ffxbld"
* Come from user "seabld"
* Come from user "tbirdbld"
* Come from user "cltbld"
* All tests above should be compared case insen.

In regard to the ONE regex here that refers to a rev, please see the following Hg Help for revisions, which I have cited below. I expect us to match a valid rev here without fail!

> A 40-digit hexadecimal string is treated as a unique revision identifier.
>
> A hexadecimal string less than 40 characters long is treated as a unique
> revision identifier and is referred to as a short-form identifier. A
> short-form identifier is only valid if it is the prefix of exactly one
> full-length identifier.
(In reply to comment #12)
> I agree with comment 2

Jesse has done that, see the thread.
(In reply to comment #14)
> (In reply to comment #12)
> > I agree with comment 2
> 
> Jesse has done that, see the thread.

No, it was done with an awfully restrictive 2 weeks.

Which is not all that much data, in reality.

It also misses the release-repo's which might matter for sake of argument here.

I propose:

pull m-c, m-aurora, m-beta, m-release, m-1.9.2, m-1.9.1, and m-1.9.0 to the same local repo.

Run this test across at least a year of data, to see if we need to tweak the regex's before we actually deploy this hook.
(In reply to comment #15)
> (In reply to comment #14)
> > (In reply to comment #12)
> > > I agree with comment 2
> > 
> > Jesse has done that, see the thread.
> 
> No, it was done with an awfully restrictive 2 weeks.

Whatever we come up with, we're bound to forget a case.  We can always modify the hook, and the hook includes switch off triggers such as "no bug" anyways.

> Which is not all that much data, in reality.

Compared to my previous guesswork, it is!

> It also misses the release-repo's which might matter for sake of argument here.

We don't need to deploy this hook on release repos.

> Run this test across at least a year of data, to see if we need to tweak the
> regex's before we actually deploy this hook.

I'm not sure what the merit of doing this would be.  The next person could ask for 13 months of data.  There is no reason for us to assume that 1 year is enough.
(In reply to comment #16)
> (In reply to comment #15)
> > (In reply to comment #14)
> > > (In reply to comment #12)
> > > > I agree with comment 2
> > > 
> > > Jesse has done that, see the thread.
> > 
> > No, it was done with an awfully restrictive 2 weeks.
> 
> Whatever we come up with, we're bound to forget a case.  We can always
> modify the hook, and the hook includes switch off triggers such as "no bug"
> anyways.

We can always modify the hook yes, but we must (a) modify, (b) get review, (c) test, (d) get IT to deploy

Which has a definite turnaround involved, which could unnecessarily block someones work (especially someone not great with hg to know about the qimport -r tip trick)

> > Which is not all that much data, in reality.
> 
> Compared to my previous guesswork, it is!

It is, but c#2 mentioned all of hg csets, and you implied that was done. I was just pointing out it was not.

> > It also misses the release-repo's which might matter for sake of argument here.
> 
> We don't need to deploy this hook on release repos.

Any repo that mirrors m-c, should have the m-c type hooks deployed. There is the SAME issue with bisecting in release repos as beta/aurora/etc. repos. and we already decided in the thread that we want this everywhere there is a m-c-based repo.

> > Run this test across at least a year of data, to see if we need to tweak the
> > regex's before we actually deploy this hook.
> 
> I'm not sure what the merit of doing this would be.  The next person could
> ask for 13 months of data.  There is no reason for us to assume that 1 year
> is enough.

I don't think even all of hg would be enough [to make this perfect] since there will always be cases we miss. I do however, feel 2 weeks is FAR FAR too few to have a reasonable reliability involved in getting this deployed. but for my proposal of 1 year, I weighed the aspects of time vs data in my mind on what would produce a useable set of data that we can skim for patterns that could/should change the above regex's.

I don't think we need to deeply investigate why a certain cset was a MQ header, etc. but just a look around of "is this rejected cset message one that should have matched but doesn't"
To make it more clear, I have no objection for you to convince Jesse to run his analysis on a larger chunk of time.  I just think that it won't catch all cases, but since you seem to know that already, I'll stop being grumpy about it.  :-)
Comment on attachment 539660 [details]
starter perl script to check comments against a set of regular expressions

we need to whitelist the specific releng users.
Attachment #539660 - Attachment is patch: false
> * Start with
>     (back out|backing out|backed out|backout)( rev| changeset| cset)?
> [0-9a-f]{12}

I'll add "cset". I've also added an allowance for plurals.

> * Starts with (back out|backing out|backed out|backout)( bug | b=)\d{1,9}

Isn't this redundant with allowing commits that have bug numbers?

> * has any cset in push that starts with "merge"
> ** Change to starts with "merge" for csets committed in a datetime of [Month
> Deployed + 3] or later [to account for local commits, where changing a
> commit message is hard, and for project branches/user repos that are created
> after this goes into affect]

That's one solution, but I like Ehsan's better: allow the tip to contain a magic phrase that overrides the hook for the rest of the commit.  This allows new experimental branches to operate in experimental mode until they prepare to merge.  Once a branch is no longer experimental (e.g. it has merged to mozilla-central) it should have the hook enabled.

> I expect us to match a valid rev here without fail!

In practice, we use 12-digit or 40-digit rev ids, so requiring 12+ digits shouldn't cause problems.  Accepting shorter ones would make future conflicts likely.
Updated taking into account feedback from bz, njn, pbiggar, and callek. Seems to have strong consensus behind it.
(In reply to comment #21)
> That's one solution, but I like Ehsan's better: allow the tip to contain a
> magic phrase that overrides the hook for the rest of the commit.

That magic phrase thing would also allow us to get rid of those one-off users for releng, if we'd just patch the comments of releng commits to use that (like we already do with the |CLOSED TREE a=release| in those).
RelEng bots could use "no bug", but they wouldn't be able to use the other magic phrase, which only applies to the *other* commits in a push.
Actually, "no bug" probably makes really sense there in any case. We should file a bug that they should add that.
(In reply to comment #25)
> Actually, "no bug" probably makes really sense there in any case. We should
> file a bug that they should add that.

For the record, I'd like the "specific user" stuff baked in from the start (if its not too complex) to make sure that we don't have a block on this from multiple sources. (such as all the various releng teams updating repo's to production with that magic)
(In reply to comment #26)
(hit submit too early)...

The specific user stuff we can drop at a point in time later when all releng users do "no bug" like we think makes the most sense.
Jesse,

I just caught/remember another case that I think warrants mention:

http://hg.mozilla.org/releases/mozilla-release/rev/921e788dbe19

|Automated blocklist update from host moz2-linux-slave34|

Do we want to (a) allow that message as it stands, (b) anotate as "no bug", (c) Enforce doing this with bugs, (d) something else. or (e) one of a or b with a followup discussion to decide a better way forward?
(In reply to comment #28)
> Do we want to (a) allow that message as it stands, (b) anotate as "no bug",
> (c) Enforce doing this with bugs, (d) something else. or (e) one of a or b
> with a followup discussion to decide a better way forward?

For one thing, it's "ffxbld" so triggers the "some users are special" clause, but for the other, I think it should have "no bug".
Assignee: nobody → jruderman
Attachment #546087 - Flags: review?(ted.mielczarek)
Fewer than 4% of commits rejected. Most were mistakes or bustage fixes.

(This is the result of running the script on its own, instead of as a hook, and then stripping out email addresses and adding a summary table.)
I tested this as a pretxnchangegroup hook, pushing from one local repo to another, and it worked great. (I had to set PYTHONPATH for local testing.)
Comment on attachment 546088 [details]
list of commits rejected in the last 120 days

>Rev 3ce0d40d9eba needs a bug number.
>Neil Rashbrook
>Fix DOS line endings rs=joedrew!
>DONTBUILD
>
>Rev d6de4c330619 needs a bug number.
>Neil Rashbrook
>Fix DOS line endings rs=sdwilsh
>DONTBUILD
So, what do I do here, just file a generic "DOS line endings creeping in to the tree on an all-too-regular basis" bug?
(In reply to comment #33)
> So, what do I do here, just file a generic "DOS line endings creeping in to
> the tree on an all-too-regular basis" bug?

You also can add "no bug" to the comment - the hook accepts this as it's clear you didn't inadvertently forget to add a bug number ;-)
Or we could add another hook enforcing line endings ;)
Comment on attachment 546087 [details] [diff] [review]
patch adding good_commit_messages.py

Review of attachment 546087 [details] [diff] [review]:
-----------------------------------------------------------------

We're going to need a wiki doc somewhere describing the requirements here. Can you write that, or get someone to write it?

::: mozhghooks/good_commit_messages.py
@@ +1,1 @@
> +#!/usr/bin/env python

All the hook scripts here are GPL just for convenience (since Mercurial is GPL), please put a license header here. (You can copy from pushlog.py.) Also if you could add a short comment describing the purpose of the script, that'd be great.

@@ +4,5 @@
> +from mercurial.node import hex
> +
> +# Before enabling this hook on mozilla-central (and feeder repos), this should be
> +# set to a date in the near future.
> +DATE_HOOK_ENABLED = 1300331446 # March 2011 (CHANGE ME)

Is this just to make sure we don't reject pushes from repos merging to m-c after the hook was enabled? Can we instead just be more lenient about merge pushes (where the tip changeset has >1 parent), and not sanity check all the child changesets, assuming they were already checked when they were originally pushed?

@@ +54,5 @@
> +            continue
> +        elif not okCommit(c):
> +            rejecting = 1
> +            # Keep looping so the pusher sees all commits they need to fix.
> +        if "IGNORE BAD COMMIT MESSAGES" in c.description():

Is there a reason we need this safety valve?

@@ +62,5 @@
> +
> +
> +# TESTS
> +
> +def testMessageRegexen():

Can you move these tests into the existing test harness in runtests.py?
http://hg.mozilla.org/hgcustom/hghooks/file/60fe3c851eb2/runtests.py
Attachment #546087 - Flags: review?(ted.mielczarek) → review+
I'm a little late to the party, but should this reject commits that otherwise pass the tests but contain the string "try:"?
> @@ +4,5 @@
> > +from mercurial.node import hex
> > +
> > +# Before enabling this hook on mozilla-central (and feeder repos), this should be
> > +# set to a date in the near future.
> > +DATE_HOOK_ENABLED = 1300331446 # March 2011 (CHANGE ME)
> 
> Is this just to make sure we don't reject pushes from repos merging to m-c
> after the hook was enabled?

Yes, along with the other direction (m-c merging to other repos).

> Can we instead just be more lenient about merge
> pushes (where the tip changeset has >1 parent), and not sanity check all the
> child changesets, assuming they were already checked when they were
> originally pushed?

Merge changesets are also used for other things.  Commits don't carry metadata stating which repositories they have been pushed to, and trying to figure that out while running a hook seems difficult.

> @@ +54,5 @@
> > +            continue
> > +        elif not okCommit(c):
> > +            rejecting = 1
> > +            # Keep looping so the pusher sees all commits they need to fix.
> > +        if "IGNORE BAD COMMIT MESSAGES" in c.description():
> 
> Is there a reason we need this safety valve?

It's primarily for allowing an experimental-rapid-informal branch to become a serious project branch once it becomes clear that the experiment is likely to succeed and merge to central.  We could force type inference to rewrite its commit history, but that would cause a bucket of other problems.
Thanks for the explanations. If you address the other comments I added, I'm fine with this landing and being enabled.

(In reply to comment #37)
> I'm a little late to the party, but should this reject commits that
> otherwise pass the tests but contain the string "try:"?

Jesse already thought of that, his patch implements that.
What's about rs=, this has been used in the tracemonkey branch for "no bug" stuff.
(In reply to comment #40)
> What's about rs=, this has been used in the tracemonkey branch for "no bug"
> stuff.

Stands for "rubber-stamp", see bug 650621 comment 6 for an example.
Comment on attachment 546087 [details] [diff] [review]
patch adding good_commit_messages.py

Review of attachment 546087 [details] [diff] [review]:
-----------------------------------------------------------------

I know i am not the reviewer here, just some things i noticed.

::: mozhghooks/good_commit_messages.py
@@ +4,5 @@
> +from mercurial.node import hex
> +
> +# Before enabling this hook on mozilla-central (and feeder repos), this should be
> +# set to a date in the near future.
> +DATE_HOOK_ENABLED = 1300331446 # March 2011 (CHANGE ME)

datetime.date(2011, 3, 17)

@@ +13,5 @@
> +    re.compile('''(back out|backing out|backed out|backout).*( |\:)[0-9a-f]{12}.*'''),
> +    re.compile('''(revert|reverted|reverting).*( |\:)[0-9a-f]{12}.*'''),
> +    re.compile('''update nanojit-import-rev stamp.'''),
> +    re.compile('''add(ed|ing)? tag.*'''),
> +    re.compile('''.*no bug.*'''),

Using triple quotes seems to unnecessary here.

@@ +18,5 @@
> +]
> +
> +def okCommit(c):
> +    def message(fmt):
> +        print fmt % hex(c.node())[0:12] + "\n" + c.user() + "\n" + c.description() + "\n"

Just my personal preference here.
"\n".join([fmt.format(rev=hex(c.node())[:12]), c.user(), c.description()])

Also you could put some new lines after this method (some other places, too)

@@ +23,5 @@
> +    d = c.description()
> +    dlower = d.lower()
> +    if c.user() in ["ffxbld", "seabld", "tbirdbld", "cltbld"]:
> +        return True
> +    if "try: -" in d:

This seems to be very restrictive, isn't it possible to trigger try eg. with "try:     -a" ?

@@ +24,5 @@
> +    dlower = d.lower()
> +    if c.user() in ["ffxbld", "seabld", "tbirdbld", "cltbld"]:
> +        return True
> +    if "try: -" in d:
> +        message("Rev %s uses try syntax. (Did you mean to push to Try instead?)")

would be message("Rev {rev} uses try syntax. (Did you mean to push to Try instead?)")

@@ +54,5 @@
> +            continue
> +        elif not okCommit(c):
> +            rejecting = 1
> +            # Keep looping so the pusher sees all commits they need to fix.
> +        if "IGNORE BAD COMMIT MESSAGES" in c.description():

This should have some extra comment. And why isn't this in okCommit?
It looks like the triple quotes were to avoid having to double-escape backslashes, maybe.  If so, Python has raw strings for this purpose:

  r"a\b" == "a\\b"
I poked Jesse about this in IRC a day or two ago, but I sometimes have a problem pushing patches which I haven't updated with r= information, and I mark all patches that aren't complete or haven't been reviewed with "NOT REVIEWED YET".  Very rarely I forget to update the commit message with the proper r= information prior to pushing -- once in January 2009, and twice again this month (!).  Adding some sort of reasonably sensible phrase that would prevent a commit from happening would be nice, so I don't forget to add that r= information again.
What is the status here?
(In reply to comment #42)

> > +DATE_HOOK_ENABLED = 1300331446 # March 2011 (CHANGE ME)
> 
> datetime.date(2011, 3, 17)

I think it has to be numeric because it gets compared with numeric dates from hg.

> Using triple quotes seems to unnecessary here.

Seemed less error-prone than using r"x", but I could switch.

> > +def okCommit(c):
> > +    def message(fmt):
> > +        print fmt % hex(c.node())[0:12] + "\n" + c.user() + "\n" + c.description() + "\n"
> 
> Just my personal preference here.
> "\n".join([fmt.format(rev=hex(c.node())[:12]), c.user(), c.description()])
> 
> Also you could put some new lines after this method (some other places, too)

Ok. Your change leaves off the trailing line break; is that intentional?

> > +    if "try: -" in d:
> 
> This seems to be very restrictive, isn't it possible to trigger try eg. with
> "try:     -a" ?

Dunno. But it's possible to get past the try hook with just "try: "!

> > +        if "IGNORE BAD COMMIT MESSAGES" in c.description():
> 
> This should have some extra comment.

I guess I'll copy a few sentences from attachment 539708 [details].

> And why isn't this in okCommit?

It's not in okCommit because it applies only to the following commits (earlier in history).
Attached patch updated patch (obsolete) — Splinter Review
So Jesse was a bit short on time, so i asked him to finish this. I fixed the nits i had + the comments by Jesse.

I'm going to write wiki entry about what this patch changed. I also would like to propose, some standardized for "normal" commits, before writing this. Furthermore i would like to like to make some more useful hooks, like disallowing trailing whitespace, checking that the bug id exists etc.
Jesse suggested adding a feature so you could run this hook locally, and it would warn you about bad commits. Some of the proposals above would be nice to have for local commits in case we don't want to run them on the server.

This needs some tinkering (at least for me)
#> export PYTHONPATH=~/hooks # change this to the path of good_commit_message.py
Add this to your hgrc.
[hooks]
pretxncommit.goodCommitMessage = python:good_commit_message.hook
Attachment #546087 - Attachment is obsolete: true
Attachment #549476 - Flags: review?(ted.mielczarek)
Comment on attachment 549476 [details] [diff] [review]
updated patch

Clearing review till i got some other stuff done here.
Attachment #549476 - Flags: review?(ted.mielczarek)
So started the documentation. Not sure where this should really go and what needs to link to it. https://wiki.mozilla.org/User:Evilpie/commit-restrictions?action=purge. Next up is fixing the tests.
Attached patch v2 (obsolete) — Splinter Review
Attachment #549476 - Attachment is obsolete: true
Attachment #552886 - Flags: review?(ted.mielczarek)
Attachment #552886 - Flags: review?(jruderman)
Sorry that this took so long, i just ignored this for the most part, because i didn't want to look add tests. But it turns out that it is actually easy to do. Please also look at https://wiki.mozilla.org/User:Evilpie/commit-restrictions. I also don't know yet where to really put it, and what needs to be done when we finalize that. Comments on mailing list? I would also write a blog post, but i have none ^^
Comment on attachment 552886 [details] [diff] [review]
v2

>+    r"update nanojit-import-rev stamp.",

I guess you wanted to escape that "."
Comment on attachment 552886 [details] [diff] [review]
v2

Review of attachment 552886 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with some changes.

I skimmed the tests, thanks for adding them!

::: mozhghooks/commit-message.py
@@ +23,5 @@
> +# Before enabling this hook on mozilla-central (and feeder repos), this should be
> +# set to a date in the near future.
> +DATE_HOOK_ENABLED = time.mktime(datetime.datetime(2011, 3, 1).timetuple()) # March 2011 (CHANGE ME)
> +
> +okMessageRegexen = map(re.compile, [

This would be more Pythonic written as:
okMessageRegexen = [re.compile(r) for r in [...]]

@@ +41,5 @@
> +        print c.description()
> +        print ""
> +
> +    d = c.description()
> +    dlower = d.lower()

I think we could skip the .lower here and compile the regexes with re.I instead.

@@ +50,5 @@
> +        message("Rev {rev} uses try syntax. (Did you mean to push to Try instead?)")
> +        return False
> +
> +    for r in okMessageRegexen:
> +        if r.match(dlower):

If you switch this to r.search, you don't need the leading and trailing .* on all of the regexes. (If you really want to match the beginning of the string you can use ^.)

@@ +53,5 @@
> +    for r in okMessageRegexen:
> +        if r.match(dlower):
> +            return True
> +
> +    if dlower.startswith("merge") or dlower.startswith("merging") or dlower.startswith("Automated merge"):

This last condition can't ever be true, since you're asking whether a lowercase string starts with an uppercase character.
Attachment #552886 - Flags: review?(ted.mielczarek) → review+
Attached patch v3Splinter Review
Good catches, thanks.
Attachment #553220 - Flags: review?(ted.mielczarek)
Attachment #553220 - Flags: review?(ted.mielczarek) → review+
Could we just use that now and improve it later?
Blocks: 692932
Attachment #552886 - Attachment is obsolete: true
Attachment #552886 - Flags: review?(jruderman)
Blocks: 698221
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Product: Release Engineering → Developer Services
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: