Closed Bug 1027167 Opened 10 years ago Closed 8 years ago

Publish git-hg mappings to mapper by default from new vcs sync (would currently affect gecko-dev and gecko-projects)

Categories

(Developer Services :: General, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: pmoore, Assigned: hwine)

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/1013] )

Attachments

(3 files)

At the moment, the action 'publish-to-mapper' is not in the list of default actions in vcs sync, and is only enabled in build/* repos via the config file, e.g.:
http://hg.mozilla.org/build/mozharness/file/0183ef99b9ea/configs/vcs_sync/build-repos.py#l108

Other than the build/* repos, the only other projects currently *in production* in the new vcs sync system are gecko-dev (also known as "beagle") and gecko-projects.

By adding 'publish-to-mapper' as a default action in vcs_sync.py, both of these vcs sync projects (gecko-dev and gecko-projects), together with any new projects that go live in the new vcs sync (most notably, l10n and gecko-git) can automatically publish to mapper.

The following changes would be required:

1) Add project names to mapper using mapper API

e.g. for each repository that has conversion mappings:

  * curl -d foo=bar -L -H "Authentication: Bearer ${RELENGAPI_INSERT_HGGIT_PROJECTS_AUTH_TOKEN}" "${mapper_url}/${project_name}"

where:
  * mapper_url=https://api-pub-build.allizom.org/mapper (staging) or https://api.pub.build.mozilla.org/mapper (production)
  * RELENGAPI_INSERT_HGGIT_PROJECTS_AUTH_TOKEN is the secret authentication key for the mapper environment
  * project_name is a unique name per mapfile-to-be-pushed (to be stored in the "projects" table of mapper mysql database)
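
For anyone scripting step 1 rather than running curl by hand, here is a minimal Python sketch of the same request. It only builds the POST and does not send it. The helper name `build_project_request` and the project name "gecko-projects" are assumptions for illustration; the header, the `foo=bar` body and the staging URL are copied from the curl example above.

```python
import os
import urllib.parse
import urllib.request

# Staging URL as given above; swap in the production URL when ready.
STAGING_MAPPER_URL = "https://api-pub-build.allizom.org/mapper"


def build_project_request(mapper_url, project_name, token):
    """Build (but do not send) the POST that registers one project name."""
    # Mirrors `curl -d foo=bar`: a small form-encoded body makes it a POST.
    body = urllib.parse.urlencode({"foo": "bar"}).encode("ascii")
    return urllib.request.Request(
        url="%s/%s" % (mapper_url.rstrip("/"), project_name),
        data=body,
        headers={"Authentication": "Bearer %s" % token},
        method="POST",
    )


if __name__ == "__main__":
    token = os.environ.get(
        "RELENGAPI_INSERT_HGGIT_PROJECTS_AUTH_TOKEN", "dummy-token")
    # "gecko-projects" is an assumed project name, used here for illustration.
    for project in ("gecko-dev", "gecko-projects"):
        request = build_project_request(STAGING_MAPPER_URL, project, token)
        print(request.get_method(), request.full_url)
```

Passing the returned object to `urllib.request.urlopen` would actually send it; that is left out here so the sketch can be run without touching either mapper environment.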

2) Using this config file sample as a guide:
  * http://hg.mozilla.org/build/mozharness/file/0183ef99b9ea/configs/vcs_sync/build-repos.py#l35
add mapper config to all other config files under http://hg.mozilla.org/build/mozharness/file/tip/configs/vcs_sync - making sure that the project names used in step 1 match those used in this step.

3) Update default_actions to include 'publish-to-mapper' directly before 'push' (i.e. insert at line 91 if using the following version of vcs_sync.py):
http://hg.mozilla.org/build/mozharness/file/0183ef99b9ea/scripts/vcs-sync/vcs_sync.py#l91

4) Test gecko-dev and gecko-projects in staging, making sure mappings get published correctly. If all is good, get the code changes reviewed, r+'d, landed, and merged into the production branch. Once merged to the production branch, they will automatically be picked up by the vcs sync production systems.

5) Monitor vcs sync email to make sure it is all working ok, and validate that mappings have been added to production mapper successfully!
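
The default_actions change described above (placing 'publish-to-mapper' directly before 'push') amounts to a one-line list edit. A minimal sketch of the intended ordering follows; the helper `insert_before` and the placeholder action "some-earlier-action" are hypothetical and are not taken from vcs_sync.py.

```python
def insert_before(actions, new_action, before):
    """Return a copy of `actions` with `new_action` directly before `before`."""
    if new_action in actions:
        # Already present: return an unchanged copy.
        return list(actions)
    index = actions.index(before)  # raises ValueError if `before` is missing
    return actions[:index] + [new_action] + actions[index:]


# Hypothetical action list; "some-earlier-action" is a placeholder.
default_actions = ["some-earlier-action", "push", "combine-mapfiles", "notify"]

if __name__ == "__main__":
    print(insert_before(default_actions, "publish-to-mapper", "push"))
```

In the real script the list is a literal, so the actual patch would simply add the string in place; the helper only makes the required position explicit.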

Please note, I also have this script, which I've used for testing the integrity of the mappings in the mapper database, which may be useful (or a slightly modified form for the new repos - currently just tests build/* repos):

https://github.com/petemoore/myscrapbook/blob/master/validate_mapper_database.sh

It checks the live git-hg mappings file against the results in the database, shows the md5 of the complete set of mappings per project for the raw file and the database results (so they should match) and displays a diff, if there is one, between the two. In other words, if you run it, and it shows no diffs, and the md5's match, you know the process is working ok - the raw map files on the vcs sync server match the results returned by mapper taken from its mysql database.
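
For readers who cannot run the shell script, the check it performs can be sketched in Python roughly as follows. This is not a port of validate_mapper_database.sh; it assumes both sides have already been fetched as text with one mapping per line, and the function names are made up for this example.

```python
import difflib
import hashlib


def normalise(mapping_text):
    """Sort the non-empty lines so ordering differences do not matter."""
    return sorted(
        line.strip() for line in mapping_text.splitlines() if line.strip())


def md5_of(lines):
    """md5 over the normalised mapping lines."""
    return hashlib.md5("\n".join(lines).encode("utf-8")).hexdigest()


def compare_mappings(mapfile_text, database_text):
    """Return (md5 of raw mapfile, md5 of mapper results, diff lines)."""
    file_lines = normalise(mapfile_text)
    db_lines = normalise(database_text)
    diff = list(difflib.unified_diff(
        file_lines, db_lines, "mapfile", "mapper", lineterm=""))
    return md5_of(file_lines), md5_of(db_lines), diff
```

Matching md5 values together with an empty diff correspond to the "process is working ok" outcome described above.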

Please note, I'm not currently marking this bug as blocking Bug 799719 - "(vcs-sync) tracker to retire legacy vcs2vcs" since I don't believe the old vcs sync process is currently providing map files for gecko-dev and gecko-projects, so I *think* this is a feature enhancement. However, if legacy vcs sync is already taking care of this, then feel free to put this as a blocker for Bug 799719.

Thanks!
Pete
Aki - we can also do this together with create-git-notes action - maybe best to do both in one fell swoop?

If we decide to do that, we should also git pull from staging gecko-dev, staging gecko-projects and compare performance against the current production versions, to see if there is any performance impact, having the new notes (please note, by default, notes will *not* get pulled down - only if devs *explicitly* update their .git/config to add the refs/notes/commits refspec to their git fetch).
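
To make the opt-in step concrete: a developer who wants the notes would add one extra fetch refspec to their clone. The sketch below builds the corresponding `git config` command from Python. The remote name "origin" and the helper names are assumptions; the refspec targets refs/notes/commits as mentioned above.

```python
import subprocess

# Force-update refspec for the default notes ref.
NOTES_REFSPEC = "+refs/notes/commits:refs/notes/commits"


def notes_fetch_command(remote="origin"):
    """Command that adds the notes refspec to the remote's fetch settings."""
    return ["git", "config", "--add", "remote.%s.fetch" % remote, NOTES_REFSPEC]


def enable_notes_fetch(repo_dir, remote="origin"):
    """Run the command inside an existing clone at `repo_dir`."""
    subprocess.check_call(notes_fetch_command(remote), cwd=repo_dir)
```

Clones without this extra refspec keep fetching exactly what they fetch today, which is why any performance comparison needs one clone with it and one without.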
I think testing both makes sense.
I assume if we hit some big performance problem, we can delete all refs/notes/* in staging and keep testing without them?
There will be a number of cvs-based git shas in gecko* that have no corresponding hg sha... does your git notes code account for that?
fwiw, there are currently 127 repositories needing mapping services in the legacy system:
    $ repo_type 
    . is hg (http://hg.mozilla.org/users/hwine_mozilla.com/repo-sync-configs)
    $ grep yes */mapfile_needed | wc -l
        127

These are, roughly:
 - all hg->git l10n repos
 - all git->hg gaia mirrors
 - releases/gecko.git (the partner visible flavor of gecko)
(In reply to Aki Sasaki [:aki] from comment #2)
> I think testing both makes sense.
> I assume if we hit some big performance problem, we can delete all
> refs/notes/* in staging and keep testing without them?

Sure this will be no problem.

(In reply to Aki Sasaki [:aki] from comment #3)
> There will be a number of cvs-based git shas in gecko* that have no
> corresponding hg sha... does your git notes code account for that?

I believe that will be fine - the process for adding git notes is to look for new entries added in the git-mapfile by the hggit plugin after it has run the gexport command, and iterate through the set, creating a git note for each new mapping it finds.
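
As a rough illustration of that process (not the actual mozharness code), the sketch below diffs two snapshots of git-mapfile and produces one `git notes add` command per new mapping. It assumes each mapfile line is "<git sha> <hg sha>"; the note text and the function names are invented for this example.

```python
def new_mappings(old_mapfile_text, new_mapfile_text):
    """Return (git_sha, hg_sha) pairs present only in the newer snapshot."""
    seen = set(old_mapfile_text.splitlines())
    pairs = []
    for line in new_mapfile_text.splitlines():
        if not line.strip() or line in seen:
            continue
        # Assumed line format: "<git sha> <hg sha>".
        git_sha, hg_sha = line.split()
        pairs.append((git_sha, hg_sha))
    return pairs


def note_commands(pairs):
    """One `git notes add` command per new mapping (note text is made up)."""
    return [
        ["git", "notes", "add", "-m", "hg changeset: %s" % hg_sha, git_sha]
        for git_sha, hg_sha in pairs
    ]
```

Under this approach a git commit only gets a note if it has an entry in the mapfile, so commits without a mapfile entry are simply never visited.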
(In reply to Pete Moore - on PTO until June 27 [:pete][:pmoore] from comment #5)
> (In reply to Aki Sasaki [:aki] from comment #3)
> > There will be a number of cvs-based git shas in gecko* that have no
> > corresponding hg sha... does your git notes code account for that?
> 
> I believe that will be fine - the process for adding git notes is to look
> for new entries added in the git-mapfile by the hggit plugin after it has
> run the gexport command, and iterate through the set, creating a git note
> for each new mapping it finds.

Ok. The process for spinning up a new gecko* repo (hopefully we never have to) should then probably include
1) running initial_beagle.py, and/or extracting the initial3.tar.bz2 tarball and renaming the conversion dir
2) doing whatever is needed to avoid creating git notes for the cvs mapfile entries (either by only creating git notes for anything after initial3.tar.bz2, or by somehow having a copy of the cvs hg-git mapfile -- not sure how to do this latter one)
3) proceeding as normal
It's also possible the git shas pre-hg aren't in the mapfile, which will make this moot.  I'm not sure.
I'm currently testing gecko-dev (beagle) on github-sync4.dmz.scl3.mozilla.com with this patch: https://github.com/petemoore/build-mozharness/compare/mozilla:production...bug1027167

However, in order to avoid re-converting the entire beagle history in hggit, I am first rsync'ing the current working dir /opt/vcs2vcs/build to my account on github-sync4. Since I do not have direct network connectivity between vcssync1.srv.releng.usw2.mozilla.com and github-sync4.dmz.scl3.mozilla.com to do this, I've resorted to pumping the 47GB of data through my laptop as an intermediary step. Being on an ADSL connection in Europe with the data in America, this is taking some time: I guess it will take around 2 days in total to get the 47GB copied across, which is still quicker than the 8 or so days I believe the conversion takes.
So since discovering bug 1034725 I think it does not make sense to globally enable git notes by default quite yet. When a fix for that lands, that would be a good time to do it.

However, regarding enabling publishing of map files by default (so long as a mapper url is specified) - I've been busy setting up the staging environment with beagle.

However, since the working directory is ~50GB it has been a difficult process - I am not able to sync directly from the production machine (vcssync1.srv.releng.usw2.mozilla.com) to the staging machine (github-sync4.dmz.scl3.mozilla.com) since the network flows do not allow it. Therefore I've been syncing via my laptop, which of course, is taking an age.

Taking a step back - a pragmatic approach might be to enable the mapper publishing in production, and to roll it back if it fails or causes problems. It is essentially switching a config on, and if it fails, switching it off again. There is nothing else in the code that should be affected - so either it runs the http posts, or it doesn't. If it breaks, no harm is done, because nothing is currently relying on the production mapper system.

Therefore, if it is ok with you, Aki, Hal - I'd like to go ahead and do that next week one morning European time, while I am on buildduty. If it causes problems, I'll switch it off again. We've seen that it is already working for the build-* repos, and worst case it could bring down production mapper - which nothing is relying on. This could save several days of machine-setup and testing in staging, for a simple config on/off setting.

If you agree, I'll still add a patch to this bug for review, of course.

Thanks,
Pete
Flags: needinfo?(hwine)
Flags: needinfo?(aki)
I still worry a bit about this.
You're going to have to have a staging env for gecko.git anyway, correct?  I think if you're able to do this successfully from any one of gecko{.git,-dev,-projects} in staging, that's sufficient testing.

I have a gecko.git staging setup on gd2:~asasaki/, not currently running since maybe February (pre-Antarctica).  If you need to be able to sync stuff to your gecko.git staging env, you should be able to use that or gd4:~asasaki/initial3.tar.bz2 .
Flags: needinfo?(aki)
I defer to Aki on this -- I know of the existing requirements for the current mapper, which are a small subset of the mapper services provided by the new code (In legacy, they are completely separate sub-systems). As long as it doesn't break b2g builds :)

That said, if we're debating this much, perhaps this is a case to apply the YAGNI rule.
Flags: needinfo?(hwine)
Product: Release Engineering → Developer Services
Current plan:
 - enable pushing to mapper for gecko-dev
 - enable pushing to mapper for gecko-projects

We'll not add git-notes at this time.
Assignee: nobody → hwine
disabling old "upload" method. Enabling push to new mapper for gecko-dev first, as :kats used the old upload for that.
Attachment #8620450 - Flags: review?(pmoore)
Component: Mercurial: hg.mozilla.org → General
QA Contact: hwine
Comment on attachment 8620450 [details] [diff] [review]
tweak configs to push beagle to modern mapper

Review of attachment 8620450 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with above changes

::: configs/vcs_sync/beagle.py
@@ +38,5 @@
>          "vcs": "hg",
> +        "mapper": {
> +            "url": "https://api.pub.build.mozilla.org/mapper",
> +            "project": "gecko-dev"
> +        },

I think you have to add this to every conversion_repo. For example see https://github.com/mozilla/build-mozharness/blob/1181704ac9b87559ed394ff214722e3c04611549/configs/vcs_sync/build-repos.py#L34-L37 (in this example the conversion_repos are added in a loop, but in the current beagle.py they are all listed in exploded form).

@@ +678,5 @@
> +        'publish-to-mapper',
> +        'push',
> +        'combine-mapfiles',
> +        'notify',
> +    ],

I think now that upload has gone you can remove the upload_config section too (https://github.com/mozilla/build-mozharness/blob/1181704ac9b87559ed394ff214722e3c04611549/configs/vcs_sync/beagle.py#L651-L657)
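
The first review comment above (adding the mapper section to every conversion repo) could be expressed as a small helper instead of editing each exploded entry by hand. This is only a sketch: the "vcs" and "mapper" keys come from the diff hunk, while the "repo_name" key, its values and the helper `with_mapper` are assumptions for illustration.

```python
import copy


def with_mapper(conversion_repos, url, project):
    """Return a copy where every conversion repo has a "mapper" section."""
    updated = copy.deepcopy(conversion_repos)
    for repo in updated:
        # Keep an existing section untouched; add one where it is missing.
        repo.setdefault("mapper", {"url": url, "project": project})
    return updated


if __name__ == "__main__":
    # Hypothetical entries; "repo_name" is an assumed key.
    repos = [
        {"repo_name": "first-repo", "vcs": "hg"},
        {"repo_name": "second-repo", "vcs": "hg"},
    ]
    mapper_url = "https://api.pub.build.mozilla.org/mapper"
    for repo in with_mapper(repos, mapper_url, "gecko-dev"):
        print(repo["repo_name"], repo["mapper"]["project"])
```

Whether a loop like this or explicit per-repo entries reads better in beagle.py is a style choice for the patch author.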
Attachment #8620450 - Flags: review?(pmoore) → review+
(erm "below" changes rather than "above" changes) :p
Comment on attachment 8620450 [details] [diff] [review]
tweak configs to push beagle to modern mapper

landed on both default & production, as vcs-sync impacting only
   https://hg.mozilla.org/build/mozharness/rev/8da12cef38df
   https://hg.mozilla.org/build/mozharness/rev/a05ca7f7bd46
Attachment #8620450 - Flags: checked-in+
pick up changes from comment 14, carry through :pmoore's r+
Attachment #8620491 - Flags: review+
Comment on attachment 8620491 [details] [diff] [review]
bz1027167-2.patch

landed on default & production, as impacts vcs-sync only:
   https://hg.mozilla.org/build/mozharness/rev/361d0597182c
   https://hg.mozilla.org/build/mozharness/rev/b8ef350e0bfd
Attachment #8620491 - Flags: checked-in+
Attached patch: bustage patch
bustage - virtualenv package list needs updating when enabling pushes to mapper.

landed on both default & production:
   https://hg.mozilla.org/build/mozharness/rev/046017433da7
   https://hg.mozilla.org/build/mozharness/rev/222eb5626643
Attachment #8620680 - Flags: checked-in+
Speaking of publishing git-hg mappings, now that hg.mozilla.org can publish metadata on changeset pages, I'd eventually like to show the Git SHA-1 there as well. All I need is the mappings on the server so the lookup can be fast.
(In reply to Gregory Szorc [:gps] from comment #20)
> Speaking of publishing git-hg mappings, now that hg.mozilla.org can publish
> metadata on changeset pages, I'd eventually like to show the Git SHA-1 there
> as well. All I need is the mappings on the server so the lookup can be fast.

You should be able to get these from https://wiki.mozilla.org/ReleaseEngineering/Applications/Mapper

By the way, due to confusion around what constitutes a project, I'd propose having an api endpoint for pulling back mappings across all projects. In the end the SHAs are enough to uniquely identify commits, so in the case where a consumer knows a sha but doesn't know the (internal) project name, this shouldn't be an issue. Also useful if hg.mozilla.org just wants to pull all mappings regardless of project.

Note, we should probably also have a /projects api endpoint to pull back the list of projects.

Another consideration is whether the projects should map to a bunch of source repos and target repos, so combining the source repo(s) or target repo(s) with the associated commits would allow you to know exactly which repositories they reside on, e.g. you should be able to say "I have this git commit, which repos does this exist on in git, and which repos does it exist on in hg, and what is the hg commit sha for it?". An API endpoint could even return a list of urls to browse the commit on both all the git repos and all the hg repos it exists on.
Note, this is what the "project" essentially defines - a set of hg and git repos with inclusion/exclusion patterns for tags/heads and branch/tag name mapping between repos. So that information would have to be persisted in the database too, so that the full set of valid urls for all the locations of a given commit across all hg and git repos could be extrapolated.
FWIW I found that with my script, which just grepped the mapfile, I didn't have to know or care if a given SHA was hg or git, it would just find the entry in the mapfile either way and I could pick out the other one. The API endpoints for per-SHA lookup don't have this ability (you have to pick hg/ or /git/ in the URL).
Agreed - there is an argument for this too. You have a sha, someone has given it to you, and you're not sure if it is a git sha or an hg sha - you could go to an endpoint with just the sha, and it could return a bunch of urls pointing to this sha in all repos it exists in, plus the url of all the places it has been mirrored to (where it has an alternative sha). TBH it is a bit confusing in the api when you specify either /hg/... or /git/... since I can never remember if you have to specify the system the mapping is from, or the system you want to get the sha for! :) If nothing else, it might be better to have /hg-to-git/ or /git-to-hg/ instead of /hg/ or /git/.
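
To illustrate the "just give me the other sha" lookup being discussed, here is a minimal sketch that scans mapfile lines and returns the counterpart without the caller saying which system the sha came from. It assumes each line is "<git sha> <hg sha>" and accepts abbreviated shas by prefix; the function name is made up and this is not the mapper API.

```python
def find_counterpart(mapfile_lines, sha):
    """Return ("hg", hg_sha) or ("git", git_sha) for the first match, else None."""
    sha = sha.strip().lower()
    for line in mapfile_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        git_sha, hg_sha = parts
        if git_sha.startswith(sha):
            return ("hg", hg_sha)
        if hg_sha.startswith(sha):
            return ("git", git_sha)
    return None
```

The first element of the result says which system the returned sha belongs to, which avoids the from/to ambiguity of /hg/ versus /git/ mentioned above.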
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from bug 1027167 comment #23)
> FWIW I found that with my script, which just grepped the mapfile, I didn't
> have to know or care if a given SHA was hg or git, it would just find the
> entry in the mapfile either way and I could pick out the other one. The API
> endpoints for per-SHA lookup don't have this ability (you have to pick hg/
> or /git/ in the URL).

moved to bug 1174211 as separate request
(In reply to Gregory Szorc [:gps] from comment #20)
> Speaking of publishing git-hg mappings, now that hg.mozilla.org can publish
> metadata on changeset pages, I'd eventually like to show the Git SHA-1 there
> as well. All I need is the mappings on the server so the lookup can be fast.

As Pete noted, all the functionality you need is there today, as long as you know the repository -> project mapping. I'll cc you on that request. If you need anything else, please file a separate bug. I hope to close this one RSN :)
(In reply to Pete Moore [:pmoore][:pete] from comment #21)
> I'd propose
> having an api endpoint for pulling back mappings across all projects. In the
> end the SHAs are enough to uniquely identify commits, so the in the case a
> consumer knows a sha but doesn't know the (internal) project name, this
> shouldn't be an issue. Also useful if hg.mozilla.org just wants to pull all
> mappings regardless of project.

So far, I only see a hypothetical use case for this, so not opening a bug. If someone has a use case, please open a bug.

> 
> Note, we should probably also have a /projects api endpoint to pull back the
> list of projects.

bug 1174215 opened for the /projects endpoint

> Another consideration is whether the projects should map to a bunch of
> source repos and target repos, so combining the source repo(s) or target
> repo(s) with the associated commits would allow you to know exactly which
> repositories they reside on, e.g. you should be able to say "I have this git
> commit, which repos does this exist on in git, and which repos does it exist
> on in hg, and what is the hg commit sha for it?". An API endpoint could even
> return a list of urls to browse the commit on both all the git repos and all
> the hg repos it exists on.

Again, I don't see a use case, so didn't open a bug. This one is tricky as it's conflating "branch" and "repository" (as we historically have done.) We'd like to not (blindly) maintain that confusion going forward.
I recently had to look up the git equivalent of the mercurial cset 3c26bef95d54 (full hash is 3c26bef95d54870e5891b43b5fdbfabd1c8b026e). It exists neither in the full mapfile returned by mapper nor in the per-hash lookup (https://api.pub.build.mozilla.org/mapper/gecko-dev/rev/hg/3c26bef95d54870e5891b43b5fdbfabd1c8b026e). Do you know what's wrong?
Flags: needinfo?(hwine)
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #28)
> I recently had to lookup up the git equivalent of the mercurial cset
> This
> doesn't exist either in the full mapfile returned by mapper, nor in the
> per-hash lookup

Opened bug 1175684 for this bustage, to keep this bug for the deployment work.
Flags: needinfo?(hwine)
Not a priority, B2G deprioritised.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX