Closed Bug 1031378 Opened 10 years ago Closed 8 years ago

Changes to manifests in b2g-manifest that reference repos that are not mirrored to git.mozilla.org should trigger devs/releng/sheriffs

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: pmoore, Unassigned)

We had (again) a change to b2g-manifest that caused bustage, because repos were not mirrored on git.mozilla.org:

https://github.com/mozilla-b2g/b2g-manifest/commit/7031c46b9bd6f9ba1b83024576a02045fe169a7c

In buildbot-master66.srv.releng.usw2.mozilla.com:/builds/b2g_bumper/master.log we could see:

09:05:55  WARNING - Returned 128 - sleeping and retrying
09:05:55  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/libpcap gitweb DENIED by fallthru
09:05:55  WARNING - (or you mis-spelled the reponame)
09:05:58  WARNING - Returned 128 - sleeping and retrying
09:05:58  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/tcpdump gitweb DENIED by fallthru
09:05:58  WARNING - (or you mis-spelled the reponame)
09:06:26  WARNING - Returned 128 - sleeping and retrying
09:06:26  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/libpcap gitweb DENIED by fallthru
09:06:26  WARNING - (or you mis-spelled the reponame)
09:06:29  WARNING - Returned 128 - sleeping and retrying
09:06:29  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/tcpdump gitweb DENIED by fallthru
09:06:29  WARNING - (or you mis-spelled the reponame)
09:06:56  WARNING - Returned 128 - sleeping and retrying
09:06:56  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/libpcap gitweb DENIED by fallthru
09:06:56  WARNING - (or you mis-spelled the reponame)
09:06:59  WARNING - Returned 128 - sleeping and retrying
09:06:59  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/tcpdump gitweb DENIED by fallthru
09:06:59  WARNING - (or you mis-spelled the reponame)
09:07:26  WARNING - Returned 128 - sleeping and retrying
09:07:26  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/libpcap gitweb DENIED by fallthru
09:07:26  WARNING - (or you mis-spelled the reponame)
09:07:29  WARNING - Returned 128 - sleeping and retrying
09:07:29  WARNING - Got output: fatal: remote error: FATAL: R any external/caf/platform/external/tcpdump gitweb DENIED by fallthru
09:07:29  WARNING - (or you mis-spelled the reponame)

The fallout was also reported by nagios-releng in #buildduty:

Fri 08:52:28 PDT [4562] buildbot-master66.srv.releng.usw2.mozilla.com:File Age - /builds/b2g_bumper/b2g_bumper.stamp is CRITICAL: FILE_AGE CRITICAL: /builds/b2g_bumper/b2g_bumper.stamp is 21036 seconds old and 0 bytes (http://m.mozilla.org/File+Age+-+/builds/b2g_bumper/b2g_bumper.stamp)


The problem here is that it is not always clear to devs that repos need to be mirrored to git.mozilla.org before they can be referenced in b2g-manifest.

An optimal solution would be to auto-revert commits where the git.mozilla.org repo did not exist (e.g. in a repo hook) but I'm not sure if that is possible.

A less aggressive solution would be to auto-report whenever this condition is discovered, e.g. by auto-creating a bug in Bugzilla, marked as a blocker and assigned to sheriffs (i.e. requesting a revert), or maybe by sending an email or an IRC alert: something to make sure people know this has happened and that it needs to be reverted.
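As a rough illustration of how this condition could be detected before anyone needs to be alerted, here is a minimal Python sketch. It assumes the manifest follows the usual repo-tool XML layout (<remote>, <default>, <project>) and that each remote's fetch attribute already points at the mirror; any URL rewriting done by the real tooling is not modelled. The function names (project_urls, git_repo_exists, missing_mirrors) are hypothetical and not part of b2g_bumper.

```python
import subprocess
import xml.etree.ElementTree as ET


def project_urls(manifest_xml):
    """Return {project name: repository URL} for each <project> in a manifest."""
    root = ET.fromstring(manifest_xml)
    remotes = {r.get("name"): r.get("fetch") for r in root.findall("remote")}
    default = root.find("default")
    default_remote = default.get("remote") if default is not None else None
    urls = {}
    for project in root.findall("project"):
        fetch = remotes.get(project.get("remote", default_remote))
        if fetch is None:
            continue  # remote not declared in this file; nothing to check
        urls[project.get("name")] = fetch.rstrip("/") + "/" + project.get("name")
    return urls


def git_repo_exists(url):
    """True if `git ls-remote` can read the repository at url."""
    try:
        result = subprocess.run(
            ["git", "ls-remote", "--heads", url],
            capture_output=True,
            timeout=60,
        )
    except subprocess.TimeoutExpired:
        return False
    return result.returncode == 0


def missing_mirrors(manifest_xml, exists=git_repo_exists):
    """Return the sorted names of projects whose repository cannot be read."""
    urls = project_urls(manifest_xml)
    return sorted(name for name, url in urls.items() if not exists(url))
```

A non-empty result from missing_mirrors() would be the trigger for whichever notification (bug, email, IRC) is chosen. The exists parameter is injectable so the check can be exercised without network access.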

Possible distribution list would be releng, sheriffs, devs.

Another action point should be to further publicize the requirement that repos need to be mirrored before they can be referenced in b2g-manifest. Even if the dev does not know, hopefully the reviewer does.

Lastly - we may want to consider pulling from github as a fallback if we can't access from git.mozilla.org, and sending an email/creating a bug to say the repo(s) need(s) to urgently be mirrored.
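The fallback idea could look roughly like the following Python sketch. It is only an illustration: fetch_with_fallback is a hypothetical helper, the fetch callable stands in for whatever the real tooling uses to pull a repo, and the ordering of sources (git.mozilla.org first, github second) is taken from the suggestion above.

```python
def fetch_with_fallback(urls, fetch):
    """Try each URL in order with the caller-supplied fetch callable.

    Returns (url_used, failures), where failures lists the (url, error)
    pairs for every source that was tried and failed before one worked.
    Raises RuntimeError if every source fails.
    """
    failures = []
    for url in urls:
        try:
            fetch(url)
        except Exception as exc:  # any failure means "try the next source"
            failures.append((url, str(exc)))
            continue
        return url, failures
    raise RuntimeError(
        "all sources failed: " + ", ".join(url for url, _ in failures)
    )
```

If failures is non-empty after a successful call, the primary mirror was unavailable, and that is the point at which the email or bug asking for the repo(s) to be mirrored would be sent.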

This bug is about the long term solution - the short term fix for this particular instance of this problem was solved in bug 1025788.
(another idea - maybe we could set up our vcs sync tool to automatically mirror the repos in b2g-manifest, and set up our git server to allow vcs sync user to create repos that are missing)

Then we could auto-mirror required repos. This would be neat.
Now that I think about it, this solution makes much more sense than just reporting on the problem or reverting it. The real issue is that there is legwork to be done before the change can land in b2g-manifest, but that legwork can also be done automatically. Auto-mirroring of referenced repos is, I believe, the right way to go.
Agreed that the "FILE_AGE CRITICAL" warning in #buildduty without any other context is very opaque for diagnosing what's broken. We could have had the problematic change to b2g-manifest found and reverted much more quickly if the alert had provided more information pointing to those external repos being the cause.
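One way to give the alert more context would be to summarize the denied repos from master.log. The Python sketch below is an assumption about how that might be done, not how the nagios check works: the regular expression is derived only from the log lines quoted in comment 0, and denied_repos and alert_message are hypothetical names.

```python
import re

# Matches the gitolite-style denial seen in the b2g_bumper log excerpt.
DENIED = re.compile(r"FATAL: R any (\S+) gitweb DENIED by fallthru")


def denied_repos(log_text):
    """Return the distinct repo paths that were denied, in first-seen order."""
    seen = []
    for match in DENIED.finditer(log_text):
        repo = match.group(1)
        if repo not in seen:
            seen.append(repo)
    return seen


def alert_message(log_text):
    """Build a one-line alert that names the repos blocking b2g_bumper."""
    repos = denied_repos(log_text)
    if not repos:
        return "b2g_bumper is stale; no repository access errors found in the log"
    return "b2g_bumper is stale; cannot read from git.mozilla.org: " + ", ".join(repos)
```

Run against the excerpt above, this would name external/caf/platform/external/libpcap and external/caf/platform/external/tcpdump once each, rather than leaving the reader with only the stamp file's age.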
(In reply to Pete Moore [:pete][:pmoore] from comment #1)
> (another idea - maybe we could set up our vcs sync tool to automatically
> mirror the repos in b2g-manifest, and set up our git server to allow vcs
> sync user to create repos that are missing)
> 
> Then we could auto-mirror required repos. This would be neat.

We already have this capability for certain repo paths on git.m.o.
For example, when the new vcssync l10n goes live, we'll auto-create any missing git.m.o repos. When adding a new locale, we only have to worry about filing a bug for the hg.m.o side.

So we can technically do this, by changing the external/ path to also auto-create.  IIRC Hal had some misgivings about this, and I do not remember what they are.
An alternative that would WFM would just be to make the nagios alert clearer (particularly given that iirc in this case the repo URLs were wrong, so auto-create wouldn't be the right answer).
See Also: → 1042122
See Also: → 1025788
(In reply to Aki Sasaki (not actively reading bugmail) from comment #4)
> So we can technically do this, by changing the external/ path to also
> auto-create.  IIRC Hal had some misgivings about this, and I do not remember
> what they are.

(In reply to Ed Morley [:edmorley] from comment #5)
> An alternative that would WFM would just be to make the nagios alert clearer
> (particularly given that iirc in this case the repo URLs were wrong, so
> auto-create wouldn't be the right answer).

That would be the reason! :)

For "controlled" cases, such as l10n, I think it's a wonderful idea. For cases where the input URL is questionable, it could quickly generate a lot of crud on the vcs servers.

See bug 1047501 where the b2g team is comfortable with requiring pre-approval of the upstream repos. Assuming that process is implemented, then:
 a) this bustage was correct, in that an out-of-process change had been made
 b) beefing up the error message resolves a lot of the issues
 c) once bug 1047501 is fully implemented on the b2g side, we could enable auto creation of mirrors as described in comment 4.

I will note that (c) won't prevent bustage and tree closure. Our experience with some upstream servers is that connectivity can vary, and the initial clone is the largest data transfer. That is, it could be hours (8-12) before the mirror is available. I think for non-mozilla upstream repos, the deterministic path is to ensure the mirror is populated prior to landing the manifest change.
See Also: → 1081825
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Component: Tools → General