Closed
Bug 1459200
Opened 8 years ago
Closed 7 years ago
Automate the synchronization of the https://github.com/mozilla/gecko mirror (hg→git-cinnabar)
Categories
(Developer Services :: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: janx, Assigned: janx, Mentored)
Attachments
(1 file)
patch, 1.41 KB, myk: feedback+
If you're using Git to work on Firefox, you probably know:
- "git-cinnabar", a Mercurial-protocol extension for Git,[0] and
- "gecko-dev", a GitHub mirror of hg.mozilla.org repositories.[1]
Unfortunately, you can't use both in the same workflow, because their commit histories are entirely different and incompatible.
That's why we recently started a second GitHub mirror called "gecko",[2] that's based on hg.mozilla.org commits translated by git-cinnabar. This is useful because cloning mozilla-central via GitHub is *much* faster than via hg.mozilla.org with git-cinnabar.
However, this "gecko"[2] mirror is currently synchronized manually ~every day by :myk. This isn't ideal, and we should probably automate this sync in order to:
- save :myk's time and sanity;
- improve the mirror's bus factor;[3]
- guarantee that it remains up-to-date consistently; and
- synchronize more than just the `master` branch (i.e. it would be great to also sync the `beta`, `release` and `esr*` branches/repositories)
Thankfully, this could be fairly easy to achieve, e.g. with a recurring CI job in a service like Taskcluster, CircleCI, or similar.
[0] https://github.com/glandium/git-cinnabar/
[1] https://github.com/mozilla/gecko-dev/
[2] https://github.com/mozilla/gecko/
[3] https://en.wikipedia.org/wiki/Bus_factor
Comment 1•8 years ago
> Unfortunately, you can't use both in the same workflow, because their commit histories are entirely different and incompatible.
You actually can: https://github.com/glandium/git-cinnabar/wiki/Mozilla:-Using-a-git-clone-of-gecko%E2%80%90dev-to-push-to-mercurial
And it would be even more usable if gecko-dev switched to git-cinnabar for new commits.
Comment 2•8 years ago
(In reply to Jan Keromnes [:janx] (away until May 14) from comment #0)
> If you're using Git to work on Firefox, you probably know:
> - "git-cinnabar", a Mercurial-protocol extension for Git,[0] and
> - "gecko-dev", a GitHub mirror of hg.mozilla.org repositories.[1]
>
> Unfortunately, you can't use both in the same workflow, because their commit
> histories are entirely different and incompatible.
>
> That's why we recently started a second GitHub mirror called "gecko",[2]
> that's based on hg.mozilla.org commits translated by git-cinnabar. This is
> useful because cloning mozilla-central via GitHub is *much* faster than via
> hg.mozilla.org with git-cinnabar.
There's a bit more context and a comparison of mozilla/gecko-dev, mozilla/gecko, and mozilla/gecko-projects in <https://wiki.mozilla.org/GitHub/Gecko_Repositories>.
(In reply to Mike Hommey [:glandium] from comment #1)
> > Unfortunately, you can't use both in the same workflow, because their commit histories are entirely different and incompatible.
>
> You actually can:
> https://github.com/glandium/git-cinnabar/wiki/Mozilla:-Using-a-git-clone-of-gecko%E2%80%90dev-to-push-to-mercurial
> And it would be even more usable if gecko-dev switched to git-cinnabar for
> new commits.
gps mentioned this in bug 1406792 (which grew out of a conversation in dev-platform <https://groups.google.com/forum/#!msg/mozilla.dev.platform/kqMX4H6Iw5M/Kfu4x_aFBgAJ>).
One of the details that isn't clear to me is whether it would be possible to push to try/inbound/mozreview/etc. after cloning a gecko-dev that switched to git-cinnabar without also fetching mozilla-central using git-cinnabar. Currently, that isn't possible with mozilla/gecko, because cloning that repo doesn't fetch the metadata that git-cinnabar needs to map Git and Hg revision IDs.
(I think I recall a mention of this changing in git-cinnabar 0.5, although it doesn't seem to be the case with 0.5.0b2, which is what I'm using to sync and fetch mozilla/gecko.)
Comment 3•8 years ago
git-cinnabar master can do that, but it's slow at the moment.
| Assignee | ||
Comment 4•7 years ago
The attached patch is a proof-of-concept that easily automates the git-cinnabar mirroring of a Firefox repository on GitHub.
It's a CircleCI configuration that:
- runs every 6 hours;
- clones mozilla-unified from hg.m.o using git-cinnabar, and pushes to my personal git-cinnabar-based fork https://github.com/jankeromnes/gecko to keep it up-to-date;
- takes just under 30 minutes for the entire process (thanks to glandium's pre-computed cinnabar metadata); and
- works with free CircleCI accounts (I currently run it on my own CircleCI account).
It currently lives in its own repository, https://github.com/jankeromnes/gecko-sync/, but it can be forked; moved under https://github.com/mozilla/; or directly merged into the Firefox tree.
Myk, I took inspiration from the steps that you run manually to sync https://github.com/mozilla/gecko every day. So, if you wanted, you could just fork my repository; slightly edit the configuration; and activate it on CircleCI (you'll also need to set up GIT_USER_EMAIL, GIT_USER_NAME, GITHUB_LOGIN and GITHUB_TOKEN in CircleCI's environment).
This would use your own CircleCI account (free tier) to basically automate what you're already doing manually today. I can help you set this up, and it would just take a few minutes.
Or, we could land it in Firefox or move it under https://github.com/mozilla/, and use Mozilla's CircleCI account to sync the official git-cinnabar mirror. I'm fine either way.
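The steps this configuration automates can be sketched as a small shell function. This is a hedged illustration, not the attached `.circleci/config.yml`: the `run` and `sync_mirror` helpers and the `WORKDIR`, `REFSPEC` and `DRY_RUN` variables are made up for this example, and it assumes that git-cinnabar is installed and that Git can already authenticate to GitHub (e.g. through a credential helper fed from GITHUB_TOKEN).

```shell
#!/bin/sh
# Illustrative sketch only; helper and variable names are assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

sync_mirror() {
  # Abort early if a required setting is missing.
  : "${GIT_USER_EMAIL:?}" "${GIT_USER_NAME:?}" "${GITHUB_LOGIN:?}" "${REFSPEC:?}"
  workdir="${WORKDIR:-gecko}"

  # The hg:: prefix makes Git delegate the transfer to git-cinnabar.
  run git clone hg::https://hg.mozilla.org/mozilla-unified "$workdir"
  run git -C "$workdir" config user.email "$GIT_USER_EMAIL"
  run git -C "$workdir" config user.name "$GIT_USER_NAME"
  # Assumption: GitHub credentials are provided outside this function.
  run git -C "$workdir" push "https://github.com/$GITHUB_LOGIN/gecko" "$REFSPEC"
}
```

Calling `sync_mirror` with `DRY_RUN=1` prints the Git commands instead of executing them, which is a cheap way to review the steps before pointing the function at a real fork.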
Attachment #8994798 -
Flags: feedback?(myk)
| Assignee | ||
Updated•7 years ago
Attachment #8994798 -
Flags: feedback?(mh+mozilla)
| Assignee | ||
Comment 5•7 years ago
A quick note: My proof-of-concept syncs the "default" branch of mozilla-unified, while https://github.com/mozilla/gecko tracks the "central" branch of mozilla-central.
My config can easily be edited to sync mozilla-central instead, but I thought it would be interesting to maybe sync all the branches from mozilla-unified.
| Assignee | ||
Comment 6•7 years ago
Comment on attachment 8994798 [details] [diff] [review]
.circleci/config.yml
Fubar, what sort of security clearance would such a set-up require? Rapid Risk Assessment? (I guess the most sensitive part would be the GitHub "Personal Access Token" stored in CircleCI.)
Attachment #8994798 -
Flags: feedback?(klibby)
Comment 7•7 years ago
If we're going to do this, I have a strong preference for the processes running in Taskcluster instead of on a 3rd party service (CircleCI). And we could potentially even define the CI in mozilla-central as part of .cron.yml (https://hg.mozilla.org/mozilla-central/file/4c4abe35d808/.cron.yml).
| Assignee | ||
Comment 8•7 years ago
(In reply to Gregory Szorc [:gps] from comment #7)
> If we're going to do this, I have a strong preference for the processes
> running in Taskcluster instead of on a 3rd party service (CircleCI). And we
> could potentially even define the CI in mozilla-central as part of
> .cron.yml
> (https://hg.mozilla.org/mozilla-central/file/4c4abe35d808/.cron.yml).
Thank you for your input and suggestion!
Note that Mozilla already uses CircleCI (paid account) for various critical processes, for example signing Mozilla Docker images as part of Cloud Ops' Dockerflow app deployments: https://github.com/mozilla-services/Dockerflow
But I agree that we should at least investigate how this could be done in Taskcluster:
- I like your suggestion of adding it to `.cron.yml`, but the set-up would potentially be much more complicated than the attached 30-line CircleCI config, because `.cron.yml` only schedules decision tasks (which, in my understanding, generate the task definitions for *all* Mozilla projects before filtering and triggering the relevant tasks). We'd then need to set up a transformed Taskcluster CI task; store the tokens in Taskcluster Secrets; query them in-task via an API through Taskcluster Proxy; ensure we have the right scopes for all this; etc.
- A simpler solution may be to define the mirroring task as a single self-contained Hook like `focus-nightly`: https://tools.taskcluster.net/hooks/project-mobile/focus-nightly
In summary, I believe that an extremely simple solution like my CircleCI config (or a self-contained Taskcluster Hook) would already be a major improvement over today's manual syncing of https://github.com/mozilla/gecko, while implementing a new task graph might be over-engineering.
Comment 9•7 years ago
I don't think either CircleCI or Taskcluster is the right place for this. Something more akin to vcssync would be a better setup (and if vcssync could be switched to use git-cinnabar, it would be even better).
Updated•7 years ago
Attachment #8994798 -
Flags: feedback?(mh+mozilla)
Comment 10•7 years ago
This solution is good for periodic updates and solves the problem we've got at the moment.
However, if/when we would like to update the GitHub repo on a push to `mozilla-central`, it would need to morph into a server with a volume holding the cloned repository.
Comment 11•7 years ago
All of the current cron tasks are decision tasks, but for something like this it would be easy enough to add a new `job.type` that lets you just do what you need to do. With a little bit more delicacy, you could add a very similar task to .taskcluster.yml that runs on every push, such that pushes are mirrored almost immediately when they are made, rather than every 15 minutes. A plain-old hook would also work.
| Assignee | ||
Comment 12•7 years ago
(In reply to Mike Hommey [:glandium] from comment #9)
> I don't think either CircleCI or Taskcluster is the right place for this.
> Something more akin to vcssync would be a better setup (and if vcssync
> could be switched to use git-cinnabar, it would be even better).
Unfortunately, from what I understand, vcssync is a very old and complicated system that nobody dares to touch anymore (or at least, new work on it is no longer accepted).
This is why we're looking into simpler mirroring systems.
(In reply to Piotr Zalewa [:zalun] from comment #10)
> This solution is good for periodic updates and solves the problem we've got
> at the moment.
Thanks for your feedback!
> However, if/when we would like to update the GitHub repo on a push to
> `mozilla-central`, it would need to morph into a server with a volume holding
> the cloned repository.
I'm not sure why a volume would be required here. This seems orthogonal to sync-on-push, which could be achieved e.g. by subscribing the mirroring task to the right Pulse message instead of basing it on a cron schedule.
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #11)
> All of the current cron tasks are decision tasks, but for something like
> this it would be easy enough to add a new `job.type` that lets you just do
> what you need to do. With a little bit more delicacy, you could add a very
> similar task to .taskcluster.yml that runs on every push, such that pushes
> are mirrored almost immediately when they are made, rather than every 15
> minutes. A plain-old hook would also work.
Aha, thanks a lot for these suggestions! I'll try to come up with a quick Taskcluster proof-of-concept in the next few days, so that we can compare it with the CircleCI one.
Comment 13•7 years ago
There are two "vcssync" systems that are kinda/sorta maintained by the old Developer Services team. One of them is based on mozharness and does the Git syncing to GitHub. That code is very unmaintained and we want to see it go away.
The other "vcssync" code lives in version-control-tools: https://hg.mozilla.org/hgcustom/version-control-tools/file/d47e8b2ce5f3/vcssync and https://mozilla-version-control-tools.readthedocs.io/en/latest/vcssync/index.html. That's what we used for keeping the Servo Git repo in sync with the Firefox Mercurial repo. That code and architecture are a bit better. It was written with the intent of being extended for use on other projects.
Both projects rely on running on a dedicated machine. That is less preferable than running in an ad-hoc manner (such as what CircleCI and Taskcluster do). We prefer ad-hoc tasks. But for VCS syncing, ad-hoc tasks almost certainly require you to sacrifice low latency (in the seconds). Best you can do with the Firefox repo is probably a few to several minutes of latency. Since the original patch used CircleCI, I assumed that ad-hoc infrastructure (and its high latency) was acceptable. I suggested Taskcluster because this automation is intricately tied to mozilla-central / Firefox development and all the rest of Firefox automation uses Taskcluster. I think we should avoid fragmentation if possible.
Comment 14•7 years ago
(In reply to Jan Keromnes [:janx] from comment #12)
> (In reply to Piotr Zalewa [:zalun] from comment #10)
> > However, if/when we would like to update the GitHub repo on a push to
> > `mozilla-central`, it would need to morph into a server with a volume holding
> > the cloned repository.
>
> I'm not sure why a volume would be required here. This seems orthogonal to
> sync-on-push, which could be achieved e.g. by subscribing the mirroring task
> to the right Pulse message instead of basing it on a cron schedule.
Anything that doesn't require cloning the entire repository for each update is good. Holding it in a volume would get the job done in seconds. There might be other solutions.
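A minimal sketch of that idea, assuming a directory that persists between runs: fetch incrementally when a clone is already there, and fall back to a full git-cinnabar clone otherwise. The `run` and `update_clone` helpers and the `DRY_RUN` variable are hypothetical, and the remote is assumed to be named `origin`.

```shell
#!/bin/sh
# Illustrative sketch only; helper and variable names are assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# update_clone DIR: reuse an existing clone in DIR, or create one.
update_clone() {
  workdir="${1:?usage: update_clone DIR}"
  if [ -d "$workdir/.git" ]; then
    # A clone survived from a previous run: only fetch what is new.
    run git -C "$workdir" fetch origin
  else
    # No cached clone: pay the full cloning cost once.
    run git clone hg::https://hg.mozilla.org/mozilla-unified "$workdir"
  fi
}
```

Whether the directory comes from a mounted volume, a CI cache, or a pre-built Docker image is left open here; the function only cares that the `.git` directory is present.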
Comment 15•7 years ago
Comment on attachment 8994798 [details] [diff] [review]
.circleci/config.yml
(In reply to Jan Keromnes [:janx] (away until July 31) from comment #5)
> A quick note: My proof-of-concept syncs the "default" branch of
> mozilla-unified, while https://github.com/mozilla/gecko tracks the "central"
> branch of mozilla-central.
>
> My config can easily be edited to sync mozilla-central instead, but I
> thought it would be interesting to maybe sync all the branches from
> mozilla-unified.
This is indeed interesting. To ensure I understand, when you say "branches from mozilla-unified", are you referring to the "bookmarks" in that repository <https://hg.mozilla.org/mozilla-unified/bookmarks>?
In any case, I would avoid syncing the "default" Hg branch in mozilla-unified (which appears to track inbound) or naming a Git branch "default" in the GitHub repository, since all the Hg repos collected in mozilla-unified have their own default branches, so that name could be confusing.
I would instead sync the "default" Hg branches of the various Hg repos (or the bookmarks in mozilla-unified) to Git branches named after the originating repos: central, inbound, beta, etc.
(In reply to Gregory Szorc [:gps] from comment #13)
> There are two "vcssync" systems that are kinda/sorta maintained by the old
> Developer Services team. One of them is based on mozharness and does the Git
> syncing to GitHub. That code is very unmaintained and we want to see it go
> away.
My interpretation of glandium's comment is that he isn't necessarily advocating for vcssync itself but rather a solution like vcssync, in which a dedicated service maintains a clone of the Git repository and uses it to sync changes as they occur. (glandium, correct me if I misunderstood!)
> Both projects rely on running on a dedicated machine. That is less
> preferable than running in an ad-hoc manner (such as what CircleCI and
> Taskcluster do). We prefer ad-hoc tasks. But for VCS syncing, ad-hoc tasks
> almost certainly require you to sacrifice low latency (in the seconds). Best
> you can do with the Firefox repo is probably a few to several minutes of
> latency.
For the current frequency of sync of mozilla/gecko (approximately nightly), ad-hoc is sufficient. And that's probably still true if we increase the frequency by something less than an order of magnitude, as janx has done in his CircleCI configuration (approximately every six hours).
Eventually it'd be useful to have a task that syncs every change as it occurs, for which an ad-hoc task may be sufficient for infrequently-updated repos (central, beta) but will probably be insufficient for frequently-updated ones (inbound, autoland).
Nevertheless, I wouldn't let that perfect be the enemy of this good. I'd happily accept automation of low-frequency syncing via an ad-hoc task in the short term while investigating high-frequency syncing via a dedicated service in the long term.
> Since the original patch used CircleCI, I assumed that ad-hoc
> infrastructure (and its high latency) was acceptable. I suggested
> Taskcluster because this automation is intricately tied to mozilla-central /
> Firefox development and all the rest of Firefox automation uses Taskcluster.
> I think we should avoid fragmentation if possible.
I'm ok with a CircleCI task, as it's a known good quantity, but I too would like to see how Taskcluster handles this task, as I would expect it to provide better perf (and have more knobs to turn if it doesn't).
Attachment #8994798 -
Flags: feedback?(myk) → feedback+
| Assignee | ||
Updated•7 years ago
Attachment #8994798 -
Flags: feedback?(klibby)
| Assignee | ||
Comment 16•7 years ago
Thanks all for the great feedback and help so far!
I'm happy to report that I have a working Taskcluster-based proof of concept. More details below.
(In reply to Myk Melez [:myk] [@mykmelez] from comment #15)
> (In reply to Jan Keromnes [:janx] (away until July 31) from comment #5)
> > My config can easily be edited to sync mozilla-central instead, but I
> > thought it would be interesting to maybe sync all the branches from
> > mozilla-unified.
>
> This is indeed interesting. To ensure I understand, when you say "branches
> from mozilla-unified", are you referring to the "bookmarks" in that
> repository <https://hg.mozilla.org/mozilla-unified/bookmarks>?
Correct, I was referring to these mozilla-unified bookmarks as "branches".
It probably doesn't make sense to sync *all* of these on GitHub, but we can easily pick a subset that seems useful (e.g. "central", "beta", "release"?)
(In reply to Myk Melez [:myk] [@mykmelez] from comment #15)
> In any case, I would avoid syncing the "default" Hg branch in
> mozilla-unified (which appears to track inbound) or naming a Git branch
> "default" in the GitHub repository, since all the Hg repos collected in
> mozilla-unified have their own default branches, so that name could be
> confusing.
Ok agreed, I've removed the branch called "default" in my GitHub fork, and reinstated the branch called "central".
I was confused about these names, thank you for explaining them.
(In reply to Myk Melez [:myk] [@mykmelez] from comment #15)
> (In reply to Gregory Szorc [:gps] from comment #13)
> > Since the original patch used CircleCI, I assumed that ad-hoc
> > infrastructure (and its high latency) was acceptable. I suggested
> > Taskcluster because this automation is intricately tied to mozilla-central /
> > Firefox development and all the rest of Firefox automation uses Taskcluster.
> > I think we should avoid fragmentation if possible.
>
> I'm ok with a CircleCI task, as it's a known good quantity, but I too would
> like to see how Taskcluster handles this task, as I would expect it to
> provide better perf (and have more knobs to turn if it doesn't).
So I've created a Taskcluster Hook called "gecko-cinnabar-mirror":
https://tools.taskcluster.net/hooks/project-gh-mirrors/gecko-cinnabar-mirror
Every 6 hours, it uses git-cinnabar to fetch https://hg.mozilla.org/mozilla-unified and push the "central" bookmark to my personal fork https://github.com/jankeromnes/gecko (which is currently 256 commits ahead of https://github.com/mozilla/gecko), basically like so:
# Start a Docker container that already has a cinnabar clone of mozilla-unified
# Configure Git with GitHub credentials (from a TaskCluster Secret)
git fetch origin
git remote add $GITHUB_LOGIN https://github.com/$GITHUB_LOGIN/gecko
git push $GITHUB_LOGIN origin/bookmarks/central:central
The whole process takes about 23 minutes from start to finish on a Taskcluster "github-worker".
As you can see, it's pretty easy to edit the mirroring script to add more push targets, for example:
- the repository https://github.com/mozilla/gecko (e.g. in addition to the personal fork currently synced)
- other mozilla-unified "branches" (e.g. "beta", "release", maybe "esr60")
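Adding push targets could look roughly like the loop below. It is only a sketch: `run`, `push_bookmarks` and `DRY_RUN` are hypothetical names, and the remotes passed in are assumed to have been added with `git remote add` beforehand. The destination is written as `refs/heads/NAME` because Git cannot guess the prefix for a short destination name when the source is a remote-tracking ref and the branch does not exist on the remote yet.

```shell
#!/bin/sh
# Illustrative sketch only; helper and variable names are assumptions.

# Print the command when DRY_RUN=1, otherwise execute it.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# push_bookmarks REMOTE BOOKMARK...: push each bookmark to a same-named branch.
push_bookmarks() {
  remote="${1:?usage: push_bookmarks REMOTE BOOKMARK...}"
  shift
  for bookmark in "$@"; do
    # Fully qualified destination, so new branches can be created too.
    run git push "$remote" "origin/bookmarks/$bookmark:refs/heads/$bookmark"
  done
}
```

For instance, `push_bookmarks "$GITHUB_LOGIN" central beta release` would push three bookmarks to the fork, and a second call with another remote name would cover an additional repository.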
(In reply to Myk Melez [:myk] [@mykmelez] from comment #15)
> (In reply to Gregory Szorc [:gps] from comment #13)
> > Both projects rely on running on a dedicated machine. That is less
> > preferable than running in an ad-hoc manner (such as what CircleCI and
> > Taskcluster do). We prefer ad-hoc tasks. But for VCS syncing, ad-hoc tasks
> > almost certainly require you to sacrifice low latency (in the seconds). Best
> > you can do with the Firefox repo is probably a few to several minutes of
> > latency.
>
> For the current frequency of sync of mozilla/gecko (approximately nightly),
> ad-hoc is sufficient. And that's probably still true if we increase the
> frequency by something less than an order of magnitude, as janx has done in
> his CircleCI configuration (approximately every six hours).
>
> Eventually it'd be useful to have a task that syncs every change as it
> occurs, for which an ad-hoc task may be sufficient for infrequently-updated
> repos (central, beta) but will probably be insufficient for
> frequently-updated ones (inbound, autoland).
>
> Nevertheless, I wouldn't let that perfect be the enemy of this good. I'd
> happily accept automation of low-frequency syncing via an ad-hoc task in the
> short term while investigating high-frequency syncing via a dedicated
> service in the long term.
Both CircleCI and Taskcluster Hook examples can bring the GitHub mirror's "freshness" from "at worst a few days old" to "at worst 6 hours old".
We could arguably tweak the cron schedule to make the mirroring task run hourly, and get to "at worst an hour old", if we wanted.
The next step will be unlocked when it becomes possible to trigger Taskcluster Hooks from Pulse messages directly, which could remove the need for a cron schedule, and bring us to "at worst as old as the mirroring task takes to complete".
I agree that anything faster than that might require a high-performance mirroring service e.g. with dedicated machines, but that seems a bit overkill for now.
| Assignee | ||
Comment 17•7 years ago
Mission accomplished!
The mirror https://github.com/mozilla/gecko is now synchronized automatically, by a Taskcluster Hook that currently runs every 6 hours.
- The mirroring Hook: https://tools.taskcluster.net/hooks/project-gh-mirrors/gecko-cinnabar-mirror
- The configuration Secret: https://tools.taskcluster.net/secrets/project%2Fgh-mirrors%2Fgecko-cinnabar-mirror
- The Hook and Secret are administered by this Group: https://mozillians.org/en-US/group/gh-mirrors-admins/
Thanks a lot to Dustin and Myk for the precious help!
Potential next steps:
A) In addition to the "central" branch, the Hook could also mirror other branches like "beta" and "release".
To do that, simply add these instructions to the Hook's payload command:
&& git push mozilla origin/bookmarks/beta:beta
&& git push mozilla origin/bookmarks/release:release
B) The mirroring frequency of "once every 6 hours" can be adjusted by editing the Hook's cron schedule. It is currently:
0 0 */6 * * *
C) It would be great to generate this Hook from code that lives inside a repository.
The mozilla-central repository itself was ruled out, because its Taskcluster infrastructure is quite complicated, and has a lot of assumptions about in-tree tasks that would make implementing such a simple standalone Hook impractical.
However, the ci-configuration [0] and ci-admin [1] repositories are perceived to be good candidates:
- A new file called "hooks.yml", containing the full description of stand-alone Hooks, could be added to [0]
- This file could then be read, and Hooks could be generated from it, by new code added to [1]
This would be very similar to what currently happens with "actions.yml", "environments.yml", "grants.yml" and "projects.yml" in [0].
[0] https://hg.mozilla.org/build/ci-configuration/file/
[1] https://hg.mozilla.org/build/ci-admin/file/
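For next step B, the schedule string can be derived from the desired interval. The sketch below assumes the six-field cron format that the current value suggests (seconds, minutes, hours, day of month, month, day of week); `schedule_every_hours` is a hypothetical helper for illustration, not part of Taskcluster.

```shell
#!/bin/sh
# Illustrative sketch only; the helper name is an assumption.

# schedule_every_hours HOURS: print a six-field cron line firing every HOURS hours.
schedule_every_hours() {
  hours="${1:?usage: schedule_every_hours HOURS}"
  # Only accept a whole number of hours.
  case "$hours" in
    ''|*[!0-9]*) echo "invalid interval: $hours" >&2; return 1 ;;
  esac
  # Keep the step within a single day.
  if [ "$hours" -lt 1 ] || [ "$hours" -gt 23 ]; then
    echo "interval out of range: $hours" >&2
    return 1
  fi
  # Fire at second 0, minute 0 of every HOURS-th hour.
  echo "0 0 */$hours * * *"
}
```

With an argument of 6 this reproduces the schedule quoted above, and an argument of 1 gives an hourly schedule.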
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED