Closed Bug 1515119 Opened 5 years ago Closed 5 years ago

Require intentional action to land to mozilla-central (possibly excepting sheriffing somehow) and log them

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mcote, Assigned: lars)

References

Details

(Keywords: leave-open)

Attachments

(17 files, 3 obsolete files)

47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
Although we want most landings to mozilla-central (and, later, other branches) to go through Lando, since we are building more automation on top of it, we will continue to allow direct pushes to hg.mozilla.org/mozilla-central; however, users will have to signify that there is an exceptional circumstance, such as Lando being broken, or a backout (which are currently annoying to do in Phab/Lando).  These exceptions will be logged publicly, although the reason will (at least to start) be kept private.

This bug should be split up into several separate bugs for the various tasks.

Enabling this should be the last step before resolving bug 1514805.

Connor do you have thoughts on this? (Not asking for you to start work on it, just thoughts on implementation)

Flags: needinfo?(sheehan)

We have similar functionality implemented in the tree closure hook. When Treestatus indicates that a tree is closed, we require the "magic words" (in this case "CLOSED TREE") to override pushing to the closed repo. The first thing that comes to mind is to model this functionality in a similar fashion. Write a hook that stops all pushes to the repo unless they come from automation, or include the "magic words". If we see the magic words in use, we allow the push and write to a log somewhere (either on hgmo, in the blackbox for example or to some service, etc).

(In reply to Mark Côté [:mcote] from comment #0)

These exceptions will be logged publicly, although the reason will (at least to start) be kept private.

I'm not sure if this is supposed to mean "we have a list showing exceptions that does not include the reason for the exception" or "the reason for the exception is explicitly not public information". If it's the former, and the reason can't be included in the commit message or the diff itself, this is slightly more complicated. I won't spend too much time thinking of a solution for that until I know that was the intended message. :)

Flags: needinfo?(sheehan)

I don't know the context of this sentence that Mark wrote. I'm not sure why the reason why would be kept private. Smacleod would you have any insight on this requirement?

Flags: needinfo?(smacleod)

(In reply to Kim Moir [:kmoir] ET from comment #3)

I don't know the context of this sentence that Mark wrote. I'm not sure why the reason why would be kept private. Smacleod would you have any insight on this requirement?

Mark and I discussed whether the reason should be made public, decided against it, then made our decision public, but the reasoning for it private... Which I find a little comical. It seems neither of us documented our rationale, and I really can't remember the cases we were worried about that had us decide to keep it private.

It might have had something to do with manually pushing security bugs because something broke down in Lando and not wanting to publicize "Hey, I'm pushing this directly because it's a security fix and something is broken with security fixes"... I'm not convinced this is worth all the extra work to design something more complex than what connor has proposed.

I think we should probably just proceed and allow the reasons to be public.

Flags: needinfo?(smacleod)
Assignee: nobody → lars
Blocks: 1545176

Before I can act on this, there is some vagueness that needs clarification.

What is the definition of "logged publicly"? Email list announcements? Posting to IRC and/or Slack?

How does the person pushing to hg.mozilla.org/mozilla-central indicate that the push is an exception and how is the reason specified? Is is a part of the commit message or is there some other out-of-band metadata accompanying the push?

Does the "reason" require approval by someone acting as an authority? How would authorization be specified?

Flags: needinfo?(smacleod)
No longer blocks: 1545176
Depends on: 1545176
Flags: needinfo?(smacleod)
Depends on: 1547755
Depends on: 1550583

We need to determine exactly what events we're logging to Sentry. Here's a potential list:

  • Pushes by Lando
  • Successful pushes by SCM Level 4 authorized people
  • Successful pushes by SCM Level 3 with the correct magic words and justification
  • Unsuccessful pushes from SCM Level 3 with missing magic words
  • Unsuccessful pushes from SCM Level 3 with missing/rejected justification
  • Unsuccessful pushes by anyone else

To get an account on Sentry for HGMO (Bug 1550583), we'll need some estimates for max number of hits/second from our system.

Flags: needinfo?(sheehan)

(In reply to K Lars Lohn [:lars] [:klohn] from comment #6)

We need to determine exactly what events we're logging to Sentry. Here's a potential list:

  • Pushes by Lando

This would be far too noisy, since people are supposed to be pushing via Lando. We only want the notification when someone does something out-of-policy.

  • Successful pushes by SCM Level 4 authorized people
  • Successful pushes by SCM Level 3 with the correct magic words and justification

We need these.

  • Unsuccessful pushes from SCM Level 3 with missing magic words
  • Unsuccessful pushes from SCM Level 3 with missing/rejected justification
  • Unsuccessful pushes by anyone else

If the push in unsuccessful, then the hook is working. So no need for a notification here.

To get an account on Sentry for HGMO (Bug 1550583), we'll need some estimates for max number of hits/second from our system.

A very liberal estimate would be once a minute, and that will only happen during the initial rollout. Once the hook is in place and all the major stakeholders are aware of the new processes, I'd only expect a few notifications a week, if that.

Steven, anything I've missed here?

Flags: needinfo?(sheehan) → needinfo?(smacleod)
Depends on: 1550755

(In reply to Connor Sheehan [:sheehan] from comment #7)

  • Successful pushes by SCM Level 4 authorized people
  • Successful pushes by SCM Level 3 with the correct magic words and justification

We need these.

Ideally the number of "Successful pushes by SCM Level 4 authorized people" would be low, but to start there will be regular pushes by this group, for things like merging central/inbound/autoland. I'd say we should start without notifying on these and maybe add it in later once inbound is gone.

Steven, anything I've missed here?

Everything else you said makes sense to me 👍

Flags: needinfo?(smacleod)

This hghook will vet pushes to mozilla-central and only allow
Pushes under these conditions:
1 - the user has SCM_ALLOW_DIRECT_PUSH
2 - without SCM_ALLOW_DIRECT_PUSH, the tip commit message contains
specific magic words followed by a justification

The code is written as a hierarchy of methods rather than objects to mimic
how other hooks have been written.

Hg hooks use a command-line style logic to indicate success or failure
that is opposite of the way that Python interprets conversion between integers
and boolean values. This module has been written entirely in terms of that
backwards logic using the definitions HOOK_TRUE and HOOK_FALSE. I thought
it best to be consistent as it makes the code more readable than mixing True/False
and 0/1 in an unconventional manner.

I chose the magic words "EXCEPTIONAL PUSH:" with the colon at the end to
coax the user into understanding that more than just the magic words are
required for a successful push.

temp commit

Depends on: 1553274

This hghook will vet pushes to mozilla-central and only allow
Pushes under these conditions:
1 - the user has SCM_ALLOW_DIRECT_PUSH
2 - without SCM_ALLOW_DIRECT_PUSH, the tip commit message contains
specific magic words followed by a justification

The code is written using the "Check" method of writing a hook.

I chose the magic words "PRIVILEGED PUSH:" with the colon at the end to
coax the user into understanding that more than just the magic words are
required for a successful push.

Attachment #9065796 - Attachment is obsolete: true

A comment in the file, https://hg.mozilla.org/hgcustom/version-control-tools/file/tip/testing/vcttesting/ldap.py#l180, which I just happened to see in passing, brought to my attention that there is a system of shadow LDAP groups that follow some of the scm_level_* groups. I asked :jabba about this and he gave me a great explanation about a heretofore unknown and convoluted system around all scm* groups. The explanation can be found here: https://bugzilla.mozilla.org/show_bug.cgi?id=1551292#c9 Since that is a confidential bug, i'll give a quick summary here. There is also a related issue in github about this: https://github.com/mozilla-iam/iam-project-backlog/issues/144#issuecomment-301854203

There are three shadow groups for all the scm_* groups: all_scm_*, active_scm_* and expired_scm_*. There exists a job that runs hourly that adjusts the membership in these groups for any user that is a member of any scm_* group. For example, if a user has scm_level_3, they also have active_scm_level_3 unless they've not used the privilege within some period of time. If so, they're apparently dropped from membership in active_scm_level_3 and given expired_scm_level_3. They must then act to have active_scm_level_3 restored.

This makes me suspicious that this Privileged Push Hook, really ought to be triggering on active_scm_* rather than just the bare scm_* group membership.

If any one could offer some clarification, it would be greatly appreciated.

Flags: needinfo?(smacleod)
Attachment #9067399 - Attachment description: mozhghooks: (50% review - DO NOT LAND) block pushes to mozilla-central without intentional action (Bug 1515119) r?sheehan → mozhghooks: block pushes to mozilla-central without intentional action (Bug 1515119) r?sheehan

Adds a new scm group called scm_allow_direct_push. This intended for
the new PrivilegedPushCheck that will force holders of scm_level_3 to use
Lando instead of pushing directly. Holders of scm_allow_direct_push will
be allowed to push without justification.

This new integration tests tries the new PrivilegedPushCheck by exercising each
of the combinations of scm_level_3 and scm_allow_direct_push with various
commit messages.

(In reply to K Lars Lohn [:lars] [:klohn] from comment #11)

This makes me suspicious that this Privileged Push Hook, really ought to be triggering on active_scm_* rather than just the bare scm_* group membership.

If any one could offer some clarification, it would be greatly appreciated.

This was discussed in person, but just documenting here. Yes, we should be using active_scm_*, so that when access expires so will this. In practice it probably doesn't matter as expiry is checked elsewhere by the server as well, but it's probably best to do the correct thing here.

Flags: needinfo?(smacleod)

In the document for the announcement, bholley and dbaron suggest using simpler wording than PRIVILEGED PUSH since the word privilege is difficult to spell. MANUAL PUSH was suggested. Could patches be updated to reflect this?

Flags: needinfo?(sheehan)

My plan was to land the existing patches as they are now, so the original author (Lars) gets credit for his work. I can make any other changes we would like as separate follow up reviews, join the stack together in Phabricator and land them as a whole. :)

MANUAL PUSH sounds reasonable to me! I'll send a patch to change it.

Flags: needinfo?(sheehan)

sounds good, thank you!

After consulting with some engineers from other groups, it was
pointed out that the real intent of the magic words and message
is to have the pushing user provide their reason for not using
Lando. The "privileges" required for this are not as important
as having the user acknowledge they are going outside the
required process, and providing their reason for doing so.

This commit changes the name of the check and most terminology
to "Lando Required Check". The true technical details of this
check are to enforce landings via Lando for a set of repositories,
and create a process for circumventing the Lando requirement.
If the user wishes to go around the automated landing process,
they must specify they are pushing "manually", and thus the
magic words have been changed to indicate they should be used
for that purpose.

These strings are not required to be class-level constant variables,
and their presence within the check class has no functional value.
Leaving them in the class requires prefixing self. to each use
of the variable. This commit moves all the variables to the top
level and removes many uses of self.

manual_push.py has a more verbose license header than other
checks in the same directory.

Previously this was added to define configuration in the
check file itself, but it was moved to a different file.
We also change the import style to multi-line.

The check step in the manual push hook included several
nested if statements. This commit flattens the function
while ensuring the same output, increasing readability.

Several methods in ManualPushCheck are static and can be moved
outside the class definition.

When changing the name of the "privileged push" hook, I forgot
to update the docker-hg-ssh role to modify the correct Ansible
variable in testing. As a result, the variable was missing from
the config altogether and many tests were failing. This is because
we were not checking the return value of ui.config before attempting
to call .split() on it. This commit adds logic to assert the
config value is usable before creating the list of repos.

Mostly just moving variables to their new locations and removing
a few mock Check objects that are no longer needed.

Let's only enable the hook on my user repo, so I can test
the LDAP integration in production.

This probably wouldn't have caused any issues in practice,
but it's considered more correct to check the strings are
equal in value, not identical objects.

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/f165a1d49900
mozhghooks: block pushes to mozilla-central without intentional action r=sheehan
https://hg.mozilla.org/hgcustom/version-control-tools/rev/6778c5c94612
hg-ssh-server: ldap changes for new scm group r=sheehan
https://hg.mozilla.org/hgcustom/version-control-tools/rev/3fb3265831d7
mozhghooks: integration test for PrivilegedPushCheck r=sheehan
https://hg.mozilla.org/hgcustom/version-control-tools/rev/7afaa8a8e1cc
hghooks: rename privileged push check and change magic words r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/a4f50f7a9240
hghooks: move hard-coded string constants to top level r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/be31a1fbd411
hghooks: use == for string comparisons r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/de20439d0889
hghooks: use same license header as other checks r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/61e67e3f5010
hghooks: remove unused import r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/17c064383157
hghooks: flatten ManualPushCheck.check r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/7acd1ddd0b66
hghooks: move static methods outside class r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/7bc74db2d5fe
hghooks: do not fail when lando_required_repo_list config is absent r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/5b46ec42ff31
hghooks: fix unit test for lando required check r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/25c9bed28326
ansible/hg-ssh: set lando_required_repo_list to my user repo for testing r=glob

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED

I still need to deploy this and confirm the Sentry integration with Slack works as intended. That will happen on Monday morning.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/83cdb0bd4dcf
ansible/hg-ssh-server: change scm_allow_direct_push gid to 692 r=glob

Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Keywords: leave-open
Resolution: FIXED → ---

When testing the Lando required hook, I noticed that I was
able to send notifications from the pash virtualenv on
hgssh1 using the Python REPL, but could not get the message
through to Sentry when performing a push. After some testing
I discovered the Sentry SDK does not like Mercurial's
demandimporter. This commit disables the importer during the
Sentry SDK import.

We also move the import into the logging function. This
is both for performance, and to avoid an ImportError
in the unifyrepo process, which does not have the
Sentry SDK installed and fails when attempting to import
the SDK without demandimport.

Using the term "top" is confusing, since it could refer to either
the commit at the "base" of the stack, or the "tip" of the stack.
Many VCS' use "head" to refer to a commit with no children,
so this commit changes the Lando required hook to use the less
ambiguous term.

This is a micro-optimization, but worth doing if we're updating
the hook anyways. We can also make the logging funcion slightly
flatter with this change.

Adding some extra context for use in Sentry.

This will cause the repo name to be displayed in Slack,
making it easier to find the commit containing the
"justification" message.

This commit adds more context to the Sentry event messages.

The core message that is displayed in Slack is extended to add
the "justification" for using a manual push, and some unnecesary
cruft is removed. The two unique message formats (one for each
acceptable scm level) are consolidated into a single message
with the scm level as a parameter. As part of this change,
changeset_has_justification is renamed and changed from a
boolean check into a function which returns a value or None.

The Sentry APIs are used more extensively, adding user metadata
as well as some "tags" and extra pieces of information. This
will make organizing events by different parameters easier in
Sentry. For example, determining which repo is used most
frequently for direct pushes, which user is pushing directly
most often, etc.

Since users with scm_allow_allow_direct_push do not need to
provide justification for their pushes, a default string is
inserted in place of the justification.

Attachment #9080116 - Attachment is obsolete: true
Attachment #9080117 - Attachment is obsolete: true
Pushed by cosheehan@mozilla.com:
https://hg.mozilla.org/hgcustom/version-control-tools/rev/273828b4d8d7
hghooks/lando_required: disable demandimporter when importing Sentry SDK r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/dbfd6ddc0525
hghooks/lando_required: refer to most recent topological commit as "head" r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/6cce06c04c63
hghooks/lando_required: avoid double read of Sentry DSN r=glob
https://hg.mozilla.org/hgcustom/version-control-tools/rev/2a8976cf424a
hghooks/lando_required: add more context to Sentry notifications r=glob

The hook has been turned on for mozilla-central, integration/autoland and integration/mozilla-inbound.

Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Regressions: 1568853
Depends on: 1568927
Regressions: 1569611
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: