Closed Bug 1350696 Opened 3 years ago Closed 3 years ago

Git mirror https://github.com/mozilla/gecko-dev stopped updating/is stuck

Categories

(Developer Services :: Git, defect, major)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Assigned: dhouse)

References

Details

Flags: needinfo?(dhouse)
I have not yet found the cause of this. We have successes mixed in with failures reported for the esr52 push to gitmo, so it may help to disable or remove esr52 temporarily to reduce the noise.

I ran a forced update, https://wiki.mozilla.org/ReleaseEngineering/How_To/VCSSync#How_to_force_the_process_to_pull.2Fbookmark.2Fconvert.2Fpush_a_repo.2C_even_if_nothing.27s_changed, but that did not fix the problem: I do not see a change in the pushes compared with the older logs and gecko-dev is still behind.

:hal, does anything in the logs look like a failure to you, or what would you suggest as a next step? I see timeouts and the "vcs None" entries (for gitmo not being configured for esr52), but there have been runs marked as successful for beagle in the past few days (https://groups.google.com/a/mozilla.com/forum/?hl=en#!searchin/releng-ops-trial/successful$20conversion$20for$20beagle%7Csort:date).
Assignee: nobody → dhouse
Flags: needinfo?(dhouse) → needinfo?(hwine)
I've disabled the gitmo-beagle push for esr52 in mozharness. I'm waiting for the current run to finish to test the conversion with that change. If we still see esr52/other warnings, I'll look at those and consider disabling esr52/other entirely until we get the base sync working again.
esr52 is marked as successful now (by skipping gitmo), but I'm still seeing timeouts and gecko-dev has not updated.
mozilla-central is logged in the latest run as not having any updates. I'm hoping that on the next cron run, with mozilla-central changes, it will succeed (i.e. that the esr52 failure was breaking the full set). If not, I'll try a forced push again this afternoon.
The latest run shows changes for mozilla-central (hash matches current latest on hg.mo) and no failure, but the update did not come over to gecko-dev:
```
18:37:57     INFO - Running command: ['/opt/vcs2vcs/vcssync1/build/venv/bin/hg', '--config', 'web.cacerts=/etc/pki/tls/certs/ca-bundle.crt', 'pull', '-r', '4c7c05a49f3c', '/opt/vcs2vcs/vcssync1/build/stage_source/mozilla-central'] in /opt/vcs2vcs/vcssync1/build/conversion/beagle
18:37:57     INFO - Copy/paste: /opt/vcs2vcs/vcssync1/build/venv/bin/hg --config web.cacerts=/etc/pki/tls/certs/ca-bundle.crt pull -r 4c7c05a49f3c /opt/vcs2vcs/vcssync1/build/stage_source/mozilla-central
18:38:25     INFO -  pulling from /opt/vcs2vcs/vcssync1/build/stage_source/mozilla-central
18:38:25     INFO -  searching for changes
18:38:25     INFO -  adding changesets
18:38:25     INFO -  adding manifests
18:38:25     INFO -  adding file changes
18:38:25     INFO -  added 9 changesets with 76 changes to 73 files (+1 heads)
18:38:26     INFO -  (run 'hg heads .' to see heads, 'hg merge' to merge)
18:38:26     INFO - Return code: 0
18:38:26     INFO - Running command: ['/opt/vcs2vcs/vcssync1/build/venv/bin/hg', '--config', 'web.cacerts=/etc/pki/tls/certs/ca-bundle.crt', 'bookmark', '-f', '-r', '4c7c05a49f3c', 'master'] in /opt/vcs2vcs/vcssync1/build/conversion/beagle
18:38:26     INFO - Copy/paste: /opt/vcs2vcs/vcssync1/build/venv/bin/hg --config web.cacerts=/etc/pki/tls/certs/ca-bundle.crt bookmark -f -r 4c7c05a49f3c master
18:38:27     INFO - Return code: 0
```
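
For reference, a minimal verification sketch, assuming the conversion-repo path and the 4c7c05a49f3c changeset shown in the log above; this is a standalone check, not part of vcs_sync.py or the sync tooling itself.

```
# Standalone verification sketch; path and changeset are taken from the log
# above, and this is not part of the sync tooling itself.
import subprocess

CONVERSION_DIR = "/opt/vcs2vcs/vcssync1/build/conversion/beagle"

# Ask hg where the 'master' bookmark points (12-character short node).
node = subprocess.check_output(
    ["hg", "-R", CONVERSION_DIR, "log", "-r", "master",
     "--template", "{node|short}"]
).decode().strip()

if node != "4c7c05a49f3c":
    raise SystemExit("master bookmark is on %s, not the pulled tip" % node)
print("master bookmark points at the pulled changeset")
```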
(In reply to Dave House [:dhouse] from comment #2)
> I've disabled the gitmo-beagle push for esr52 in mozharness.

AFAIK, nothing should (any longer) be attempting to push to gitmo -- that host/service has been decommissioned. :/ Any references left to gitmo can (and should) be removed.
(In reply to Dave House [:dhouse] from comment #5)
> The latest run shows changes for mozilla-central (hash matches current
...
> 18:38:25     INFO -  added 9 changesets with 76 changes to 73 files (+1
> heads)
> 18:38:26     INFO -  (run 'hg heads .' to see heads, 'hg merge' to merge)
> 18:38:26     INFO - Return code: 0

Something isn't right here -- mozilla-central should never have more than one head. I would expect to see an error further on, when the process does a trial git push locally -- multiple heads could end up as a non-fast-forward change, which would be rejected there and never attempted directly on github.
Flags: needinfo?(hwine)
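As a rough illustration of the failure mode described above, here is a diagnostic sketch only: the conversion-dir path comes from the earlier log, the test-push repo path is hypothetical, and the .hg/git location is hg-git's default rather than anything confirmed for this host.

```
# Diagnostic sketch, not the actual vcs_sync.py check: count heads on the
# converted branch, then dry-run the local push where a non-fast-forward
# ref update would be rejected before anything is attempted against github.
import subprocess

CONVERSION_DIR = "/opt/vcs2vcs/vcssync1/build/conversion/beagle"
TEST_PUSH_REPO = "/path/to/local/test_push/gecko-dev"  # hypothetical path

# More than one head on the branch is the suspect condition.
heads = subprocess.check_output(
    ["hg", "-R", CONVERSION_DIR, "heads", ".", "--template", "{node|short}\n"]
).decode().split()
print("heads on this branch:", heads)

# git refuses a non-fast-forward update unless forced; --dry-run surfaces the
# rejection without modifying the test repo. The gexport output is assumed to
# live at .hg/git (hg-git's default), which may not match the real layout.
result = subprocess.run(
    ["git", "--git-dir", CONVERSION_DIR + "/.hg/git",
     "push", "--dry-run", TEST_PUSH_REPO, "master"],
    capture_output=True, text=True,
)
print(result.stdout or result.stderr)
```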
At the moment the gexport runs are taking a long time and timing out (25 min). That is "normal" after a new branch is introduced. Will look more tomorrow.
I just noticed that only inbound synced to github. Is it possible that the rest of the branches are not syncing from the correct location?
For example, the mercurial tree for aurora is https://hg.mozilla.org/releases/mozilla-aurora/. Maybe the script is looking elsewhere?
FWIW, these are the branches that are out-of-date on gecko-dev:
- aurora
- beta
- esr45
- esr52
- central

And the ones that are up-to-date (as of writing):
- esr17, esr24, esr31, esr38
- fx-team
- inbound
- release

Ironically, the tip of central is an ancestor of the current tip of inbound, so, technically, the commit for the tip of central is there. The branch just doesn't point to it.
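
That ancestry claim is easy to check against a local clone of gecko-dev; a sketch follows (the git branch names, "master" for central and "inbound" for inbound, are assumptions about the mirror's layout).

```
# Sketch: confirm the tip of central is an ancestor of the tip of inbound in
# a local clone of gecko-dev. Branch names are assumptions, not confirmed.
import subprocess

def is_ancestor(repo, ancestor_ref, descendant_ref):
    """True if ancestor_ref is reachable from descendant_ref (exit code 0)."""
    return subprocess.call(
        ["git", "-C", repo, "merge-base", "--is-ancestor",
         ancestor_ref, descendant_ref]
    ) == 0

print(is_ancestor("gecko-dev", "origin/master", "origin/inbound"))
```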
I think master branch in git just got updated.
Perhaps they are syncing on 'next checkin'?
Indeed, aurora and central are up-to-date now. I'm not sure about aurora, but central is still on the same changeset as when I wrote comment 10, so a new checkin doesn't explain that one.
The graphics branch in mozilla/gecko-projects also got updated when I pushed something to it.
Hal, thank you! Do you think the problem was caused by the timeouts or by the multiple heads? (Did you fix the multiple heads manually?) Would breaking the beagle set out into two jobs, or extending the timeout, help prevent this?
Flags: needinfo?(hwine)
I didn't do anything except look -- so I'm unsure what got things working.

I don't believe either extending the timeout or breaking into two jobs would help (or is even advisable).

aki: do you recall if I'm correct about that? (see comment 14)
Flags: needinfo?(hwine) → needinfo?(aki)
Hm. This is from a very long time ago.

- removing all traces of gitmo from the configs is good, as it is gone

- output timeout: this seems to be hardcoded [1] and errors are FATAL, which means we shouldn't go past it iirc.  And yet the logs show us erroring and going past it.

I vaguely seem to remember wanting to bump the timeout for new ESRs or large merges, but I'm not sure if this ended up being true.  Bumping the hardcoded timeout and re-running manually might work for ESR; if this is actually a fix, it might be nice to add a commandline option to bump this timeout without editing the script.  Also, as the repo grows over time, we may have to bump this timeout anyway.

[1] https://hg.mozilla.org/build/mozharness/file/tip/scripts/vcs-sync/vcs_sync.py#l885

- A fast-forward error will break on the test_push, which is a push to a repo on local disk. I'm not sure about the multiple-heads issue; did that get resolved?
Flags: needinfo?(aki)
(In reply to Hal Wine [:hwine] (use NI) from comment #15)
> I don't believe either extending the timeout or breaking into two jobs would
> help (or is even advisable)

Extending the timeout for a one-time conversion might not be bad.

Agreed that breaking into two jobs may not be advisable...  Trimming old, known-retired branches like b2g* and esr <45 might help, but since they shouldn't be seeing new pushes, that probably won't help gexport at all unless we can actually remove the changesets from the conversion dir.  I seem to remember the conversion dir was rebuildable if we ever wanted to start fresh, but I also remember beagle has the CVS prepending which needs special treatment.

Maybe let's bump the gexport timeout for a one-time conversion and see how things look afterwards?
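To illustrate the shape of that change, here is a sketch only: plain argparse stands in for the real mozharness option machinery, and the 25-minute default is just the timeout figure mentioned earlier in this bug.

```
# Illustrative sketch: make the gexport timeout a command-line option instead
# of a hardcoded constant, so a one-time conversion can bump it without
# editing the script. This is not the actual vcs_sync.py code.
import argparse
import subprocess

DEFAULT_GEXPORT_TIMEOUT = 25 * 60  # seconds; the ~25 min figure noted earlier

parser = argparse.ArgumentParser()
parser.add_argument(
    "--gexport-timeout", type=int, default=DEFAULT_GEXPORT_TIMEOUT,
    help="seconds to allow hg gexport before giving up",
)
args = parser.parse_args()

# Run gexport in the conversion dir; subprocess.run's overall timeout is a
# simplification of mozharness's output-based timeout.
subprocess.run(
    ["hg", "-R", "/opt/vcs2vcs/vcssync1/build/conversion/beagle", "gexport"],
    timeout=args.gexport_timeout, check=True,
)
```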
Thank you! I'll look into bumping the gexport timeout for a one-time conversion. :hwine, I'll look for you online or on your calendar tomorrow to discuss the whole thing, if that works for you.
Depends on: 1352478
Marking this as resolved; I am following up in bug 1352478 with actions to try to prevent this from happening again.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Things are stuck again; the new bug is bug 1362350.
See Also: → 1362350