vcs sync should only push tags/heads that have changed since last successful push

RESOLVED WONTFIX

Status

Developer Services
General
RESOLVED WONTFIX
3 years ago
a year ago

People

(Reporter: pmoore, Assigned: kartik gupta, NeedInfo)

Tracking

(Blocks: 1 bug)

Details

(Whiteboard: [good first bug][lang=python])

Attachments

(5 attachments, 7 obsolete attachments)

(Reporter)

Description

3 years ago
For repos with hundreds of tags, when new commits are converted, we push all branches/tags regardless of whether these branches/tags have changed.

This has a performance impact, and can result in a change in a head resulting in hundreds of tags getting pushed, which do not need to be.

For example, buildbotcustom currently has 1391 tags and 6 branches (heads):

buildbotcustom master $ git ls-remote  mozilla | grep 'refs/tags' | wc -l
    1391
buildbotcustom master $ git ls-remote  mozilla | grep 'refs/heads' | wc -l
       6

When a change occurs e.g. on default branch, we will push all tags and all branches to each downstream mirror (in this case, one downstream mirror on github). We batch them up in groups of 10 heads/tags at a time, so for 1397 targets, we'd need to do 140 git push commands, each one taking a couple of seconds.

Instead if we only push the changed references, we should see a significant performance improvement.

To do this, we would need to additionally record the state of each branch/tag with each push, or read it dynamically upfront. e.g. before pushing, we can run git ls-remote -h <git repo> to get the branch refs, and git ls-remote -t <git repo> to get the tag refs, and these should run in a second or two, still much quicker than pushing all tags/heads. The benefit of this approach is just that it is a little bit safer: if someone were to delete tags from a downstream git server (although hopefully they wouldn't have permission) - this way we would spot that the tag needed to be pushed, because it wouldn't be present when we run the ls-remote to query it.
(Reporter)

Comment 1

3 years ago
I realise it isn't completely trivial - there are regex's for branches/tags which need to be respected. It might be something like we generate the full list of ref specs we want to push (as usual) but then have a pre-processing phase where from looking at the ref spec ("+local ref:remote ref") we can build a dictionary of all the local refs and remote refs with git branch -l and git tag -l (for the local ones) and git ls-remote -h and git ls-remote -t (for the remote ones) - and then iterate through the ref specs, comparing what is in the local dictionary for that local ref, what is in the remote dictionary for the remote ref spec, and if they are equal shas, remove it from the list...

Something like this?
It's not clear to me where we're being hurt by current performance. Can you elaborate on the current problems being experienced?
(Reporter)

Comment 3

3 years ago
(In reply to Hal Wine [:hwine] (use needinfo) from comment #2)
> It's not clear to me where we're being hurt by current performance. Can you
> elaborate on the current problems being experienced?

Hey Hal,

Currently, if there is a single commit to a repo which has > 1000 tags (which is all repos which are tagged with Firefox, Thunderbird releases, including all mozilla central repos, plus things like mozharness, tools, ...) it takes several minutes for the vcs sync process to run for this repo (maybe 5 minutes?). With this conceptually trivial change, we would reduce this approximately by a factor of 100 (1 git push + 1 git ls-remote instead of 140 git pushes).

So the improvements are:
*) each test cycle is 100 times quicker - this reduces the hurt during development/test cycles, where we iteratively make changes, and test them locally
*) the delay between making a change and it landing downstream is 100 times smaller (so downstream repo users get their change much quicker)
*) from a network bandwidth / CPU perspective, we need only a 100th of the hardware we currently need, since it is 100 times more efficient (of course, disk space may be a limiting factor) - this means we can scale much better
*) from a cost perspective, needing less equipment means the financial hurt is less
*) from github perspective, we are using far less of their resources, not needlessly pushing when there are no changes - this means they are less likely to block our traffic, and also frees up their resources for other worthy open source projects

Other than the benefits stated above, it is my opinion that great products (software or otherwise) are born from a process of constant incremental improvement, which involves many many iterations. If we want great products (which I believe we all do) then I think the way to achieve this is to improve them, when we see opportunities to do so, rather than accept them, however inefficient, only altering them when problems are being experienced.
Agreed in principle. And, knowing whether something is a "bug" or "enhancement" is useful for resource allocation, and when triaging this bug.

Comment 5

3 years ago
As noted in IRC.

* I agree with Pete that this is good to do in general
* We are experiencing github issues that may or may not be exacerbated by the lack of this fix.  I say yes; hwine says no.
** If we are experiencing issues with pushing multiple tags to github, it will be a matter of time before this spreads to other, more critical repos.
* We can potentially have a flag to push all tags for certain repos.  build/* is certainly a no brainer candidate for this.  --tags will allow for pushing all converted tags without listing.  We would need to make this a non-default option, because we don't want any chance of gecko* pushing all tags.  And if it's a non-default option, this bug may still be needed later for gecko*.

Short term fix: kill tag conversion/pushing in build/tools
Potential medium term fix: optional --tags push for certain repos.
Longer term fix: this bug

Comment 6

3 years ago
build/tools finally pushed successfully, so this can be a lower priority item again; we no longer need the short term fix above.  However, I still see this as needed at some point, at a non-blocker level.

Comment 7

3 years ago
Created attachment 8453605 [details] [diff] [review]
build-tags

I *think* this will kill the tag push for buildbot-configs, buildbotcustom, tools.
Attachment #8453605 - Flags: review?(pmoore)
(Reporter)

Comment 8

3 years ago
Comment on attachment 8453605 [details] [diff] [review]
build-tags

This patch is updated remote_targets, but the definition of which tags get pushed to github is defined in the targets section, not the remote targets section. No worries, I hadn't spotted you had created this patch - I created one in bug 1036819 for the same thing. The difference is I decided to stop pushing all tags, not just for these three repos, since I think this problem could spread on any repo with tags.

Thanks,
Pete
Attachment #8453605 - Flags: review?(pmoore) → review-
(Reporter)

Updated

3 years ago
Whiteboard: [good first bug][lang=python]
(Assignee)

Comment 9

3 years ago
Created attachment 8479946 [details]
this file is my config script

pmoore sir this the config file i am using
Flags: needinfo?(pmoore)
(Assignee)

Comment 10

3 years ago
Created attachment 8479947 [details]
build-repos_info.log

pmoore sir this is the log i am getting
(Reporter)

Comment 11

3 years ago
pmoore|dinner
18:22:39 kartikgupta0909: you need to create directory /home/kartik/build/stage_source/
18:22:53 e.g.
18:22:58 Can't run command ['/home/kartik/build/venv/bin/hg', '--config', 'web.cacerts=/etc/pki/tls/certs/ca-bundle.crt', 'pull'] in non-existent directory '/home/kartik/build/stage_source/build-autoland'
18:23:34 kartikgupta0909: there are no email-related error messages i see in the log file
18:24:35 kartikgupta0909: i would also recommend deleting /home/kartik/build/upload/repo_update.json since it looks like it has some pollution in there
18:24:41 it will get regenerated
Flags: needinfo?(pmoore)
(Reporter)

Comment 12

3 years ago
(In reply to kartik gupta from comment #10)
> Created attachment 8479947 [details]
> build-repos_info.log
> 
> pmoore sir this is the log i am getting


your config says:

21:03:56     INFO -  'hg_options': ('--config', 'web.cacerts=/etc/pki/tls/certs/ca-bundle.crt'),

then you get error:

ERROR -  abort: could not find web.cacerts: /etc/pki/tls/certs/ca-bundle.crt

See http://mercurial.selenic.com/wiki/CACertificates#Configuration_of_HTTPS_certificate_authorities - you need to change the location of web.cacerts in your config.

Thanks,
Pete
Flags: needinfo?(kartikgupta0909)
(Assignee)

Comment 13

3 years ago
(In reply to pmoore from comment #12)
i have added the following line to /etc/mercurial/hgrc 
[web]
cacerts = etc/ssl/certs/ca-certificates.crt

i also tried adding this
[web]
cacerts = /etc/pki/tls/certs/ca-bundle.crt
but still its giving the same error as before
Flags: needinfo?(kartikgupta0909) → needinfo?(pmoore)
(Assignee)

Comment 14

3 years ago
(In reply to pmoore from comment #12)
after fixing that bug in my config file, it did not give the same error , now its giving the following error
KeyError: 'RELENGAPI_INSERT_HGGIT_MAPPINGS_AUTH_TOKEN'
(Reporter)

Comment 15

3 years ago
(In reply to kartik gupta from comment #14)
> (In reply to pmoore from comment #12)
> after fixing that bug in my config file, it did not give the same error ,
> now its giving the following error
> KeyError: 'RELENGAPI_INSERT_HGGIT_MAPPINGS_AUTH_TOKEN'

Hi Kartik,

That is great news.

The new error you are getting is because we publish the git commit sha <-> hg changeset sha mappings to a service called "mapper", so that other tools can easily lookup git shas based on the hg sha, and vice versa. For the purposes of this bug, this activity is not needed, and can be disabled by removing publish-to-mapper from the list of default actions in your config. Otherwise you would need to set up a mapper environment too, which is quite a big job, and not really needed for this bug.

Let me know how you get on!

Thanks,
Pete
Flags: needinfo?(pmoore) → needinfo?(kartikgupta0909)
(Assignee)

Comment 16

3 years ago
Created attachment 8483419 [details]
Now everything else is working, but it is having problems in the final step of pushing it to my git repo.
Flags: needinfo?(kartikgupta0909)
(Assignee)

Comment 17

3 years ago
Created attachment 8483422 [details]
Now everything else is working, but it is having problems in the final step of pushing it to my git repo.

pmoore ,
after fixing my config file it is unable to push to my git repo, can you see the log and tell me the possibilities as to why it is giving an error.
Flags: needinfo?(pmoore)
(Reporter)

Comment 18

3 years ago
Kartik,

In the logs it says:

17:12:43     INFO - Calling ['git', 'push', '-f', 'git@github.com:kartikgupta0909/gittest.git', '+refs/notes/commits:refs/notes/commits', '+refs/heads/master:refs/heads/master'] with output_timeout 1800
17:12:43     INFO -  ssh: connect to host github.com port 22: Connection refused
17:12:43     INFO -  fatal: Could not read from remote repository.
17:12:43     INFO -  Please make sure you have the correct access rights
17:12:43     INFO -  and the repository exists.

What happens when you run the command manually? Do you get the same output. Make sure to run it locally from the same directory as it states in the logs.

The logs are very detailed - please have a good luck at them when you hit a problem - if you can't solve it, of course no problem to contact me. But most of the time the logs will tell you what you need to know.

Good luck with this last step. I think you're close!

Thanks,
Pete
Flags: needinfo?(pmoore) → needinfo?(kartikgupta0909)
(Assignee)

Comment 19

3 years ago
Created attachment 8496822 [details]
vcs_copy.py

The Patch has been added from line 547 to 574 in the attached file.
Flags: needinfo?(kartikgupta0909) → needinfo?(pmoore)
(Reporter)

Comment 20

3 years ago
Hi Kartik,

Thanks for your patch. The way we normally review these is to attach a diff of the changes, rather than an original file - e.g. if you worked in git or hg, you can git diff or hg diff between the original revision, and the completed revision, to get a diff output of your changes. Then redirect this to a file, call it <something>.patch, and attach this to the bug. I usually call it bug<bug number>_<repo to patch>_v<version number>.patch.

I will create bug1020613_mozharness_v1.patch for you now, attach it to this bug (so you see how it should be done) and will then review it.

Let me know if you have questions.

I actually generated your patch like this:

# clone mozharness
git clone git@github.com:mozilla/build-mozharness mozharness

# cd into the mozharness checkout
cd mozharness

# replace the vcs_sync.py script with your new version
curl -L 'https://bugzilla.mozilla.org/attachment.cgi?id=8496822' > scripts/vcs-sync/vcs_sync.py

# generate the patch file, based on the output of the diff
git diff > bug1020613_mozharness_v1.patch

When attaching the patch file to the bug, you can mark it as needing to be reviewed, by setting the "review" flag to "?" in the "Create New Attachment" page, and then in the text field that appears after selecting "?", you can type the full email address of the reviewer. This is different to "needinfo" - that is when you ask a question about something - the "review" flag of an attachment says "I have made a code change that I would like you to review".

Thanks,
Pete
Flags: needinfo?(pmoore)
(Reporter)

Comment 21

3 years ago
Created attachment 8497371 [details] [diff] [review]
bug1020613_mozharness_v1.patch

(In reply to Pete Moore [:pete][:pmoore] from comment #20)

> I will create bug1020613_mozharness_v1.patch for you now, attach it to this
> bug (so you see how it should be done) and will then review it.
Attachment #8496822 - Attachment is obsolete: true
Attachment #8497371 - Flags: review?(pmoore)
(Reporter)

Comment 22

3 years ago
Comment on attachment 8497371 [details] [diff] [review]
bug1020613_mozharness_v1.patch

Review of attachment 8497371 [details] [diff] [review]:
-----------------------------------------------------------------

First, many thanks Kartik, for all the work you have done so far! It is a great first patch.

This is your first patch, so please don't be surprised there is a lot of feedback in the review for you, this is quite normal. Also, since it is your first, there are also some things to learn about the mechanics of requesting a review - i.e. how to create a patch, how to mark it as needing review, etc.

Looking forward to your next patch - let me know if you need a hand with anything.

Good luck!
Pete

::: scripts/vcs-sync/vcs_sync.py
@@ +380,4 @@
>                          'error_list': HgErrorList,
>                          'success_codes': [0, 1, 256],
>                      },
> +		)

Please use spaces not tabs on line 383

@@ +380,5 @@
>                          'error_list': HgErrorList,
>                          'success_codes': [0, 1, 256],
>                      },
> +		)
> +                

Please remove the trailing whitespace on line 384

@@ +385,5 @@
>                  if status in (1, 256):
>                      self.info("No changes for %s; skipping." % repo_name)
>                      # Overload self.failures to tell downstream actions to noop on
>                      # this repo
> +		    self.failures.append(repo_name)

Please use spaces not tabs in line 389

@@ +429,4 @@
>              out of the target_repo list loop, and the commands loop borks that.
>              """
>          commands = []
> +        if refs_list:	    

Please delete trailing whitespace on line 432

@@ +544,4 @@
>                                  if tag_name != 'tip' and regex.search(tag_name) is not None:
>                                      refs_list += ['+refs/tags/%s:refs/tags/%s' % (tag_name, tag_name)]
>                                      continue
> +		"""Here begins the patch added by Kartik Gupta"""

In the final patch, please don't include this comment on line 547 or the one on line 576

@@ +546,5 @@
>                                      continue
> +		"""Here begins the patch added by Kartik Gupta"""
> +		path = ""
> +		current = os.getcwd()
> +		path = os.getcwd() + "/build/conversion/%s/.git" %(repo_config['repo_name'])

Please use conversion_dir variable, that was already set on line 457, e.g.  os.path.join(conversion_dir, '.git')

@@ +549,5 @@
> +		current = os.getcwd()
> +		path = os.getcwd() + "/build/conversion/%s/.git" %(repo_config['repo_name'])
> +		os.chdir(path)
> +		os.system("git show-ref --tags > local_tags")
> +		os.system("git ls-remote -t https://github.com/kartikgupta0909/%s > remote_tags" % (repo_config['repo_name']))

1) There is a hardcoded reference to https://github.com/kartikgupta0909/ - that should be fixed

2) When running commands, please use something like:

output = self.get_output_from_command(<command to run>, cwd=os.path.join(conversion_dir, '.git')) rather than executing commands directly with os.system(..). This is because we want the commands to use the mozharness logging framework so we can see in logs what is going on. This also means we do not need to create files to store the output, and the output will also be logged in the logs.

Look in the rest of the script for examples of self.get_output_from_command.

@@ +572,5 @@
> +			    refs_list.remove(j)
> +
> +		os.chdir(current)
> +
> +		"""Here Ends My Patch"""

1) Please use spaces, not tabs, in this whole section (lines 548 -> 575).
2) This only covers tags, not branches - please also take care to only push branches that have changed too
3) It might be more efficient and easier to troubleshoot if tags and branches are only added if there are changes, rather than adding everything, and then later removing entries (line 572). In other words, maybe better to read tags and branches from local and remote repos first, and then when adding entries to refs_list (e.g. lines 511, 517, 525, 545) only do this if the local and remote versions are known to be different. In this case, the reading of the local tags and branches needs to have happened before we reach line 511, so that at that point, it can be decided whether to add items to refs_list or not.

@@ +922,4 @@
>                  git_revision = self._query_mapped_revision(
>                      revision=rev, mapfile=generated_mapfile)
>                  repo_map['repos'][repo_name]['branches'][branch]['git_revision'] = git_revision
> +	self._write_repo_update_json(repo_map)

Please use spaces not tabs on line 925
Attachment #8497371 - Flags: review?(pmoore) → review-
(Assignee)

Comment 23

3 years ago
Created attachment 8498823 [details] [diff] [review]
bug1020613_mozharness_v2.patch
Attachment #8498823 - Flags: review?(pmoore)
(Reporter)

Comment 24

3 years ago
Comment on attachment 8498823 [details] [diff] [review]
bug1020613_mozharness_v2.patch

Review of attachment 8498823 [details] [diff] [review]:
-----------------------------------------------------------------

Kartik,

Overall, this is a great patch! There are still just a few things that need cleaning up. But we're getting close! =)

You attached the patch correctly to the bug, many thanks for this!

Again this patch had many tabs for indentation, rather than spaces. Please always use spaces and not tabs - maybe if you configure this in your IDE it will avoid that tabs creep into the patch.

Looking forward to your next version!

Kind regards,
Pete

::: scripts/vcs-sync/vcs_sync.py
@@ +496,4 @@
>                  # We query hg for these because the conversion dir will have
>                  # branches from multiple hg repos, and the regexes may match
>                  # too many things.
> +		output1 = self.get_output_from_command("git show-ref --tags",cwd=os.path.join(conversion_dir, '.git'))

1) a space after the comma here: ,cwd 
2) a code comment would be helpful, like "we only want to push branches and tags that have changed, so first store details of all local branches and tags, and all remote branches and tags, so that only changed ones get pushed." (obviously split over several lines if the text is too long)
3) maybe local_tags_output / remote_tags_output instead of output1 and output2 variable names?

@@ +496,5 @@
>                  # We query hg for these because the conversion dir will have
>                  # branches from multiple hg repos, and the regexes may match
>                  # too many things.
> +		output1 = self.get_output_from_command("git show-ref --tags",cwd=os.path.join(conversion_dir, '.git'))
> +		output2 = self.get_output_from_command("git ls-remote -t https://github.com/mozilla/%s" % repo_config['repo_name'], cwd=os.path.join(conversion_dir, '.git'))

1) you don't need cwd=... for git ls-remote command - it can be run from any directory
2) the url hardcoded to github - the url should be from variable target_git_repo, we don't only mirror to github

@@ +510,4 @@
>                  if repo_config.get('generate_git_notes', False):
>                      refs_list.append('+refs/notes/commits:refs/notes/commits')
> +		output1 = self.get_output_from_command('git show-ref --heads',cwd = os.path.join(conversion_dir, '.git'))
> +	        output2 = self.get_output_from_command('git ls-remote -h https://github.com/mozilla/%s' % repo_config['repo_name'],cwd=os.path.join(conversion_dir,'.git'))

1) same with variable names here - maybe local_branches_output/remote_branches_output instead of output1/output2?
2) same with not needing cwd=... here
3) same with the url hardcoded to github - the url should be from variable target_git_repo, we don't only mirror to github

@@ +527,4 @@
>                  # local git branch and the value is the target git branch.
>                  if target_config.get("branch_config"):
>                      for (branch, target_branch) in branch_map.items():
> +			if remote_heads.has_key("refs/heads/%s" % branch) and remote_heads["refs/heads/%s" % branch] == local_heads["refs/heads/%s" % branch]:

1) local branch name is in /branch/ variable, and remote name is in /target_branch/ variable - you are using /branch/ for both
2) here I believe you can use remote_heads.get("refs/heads/%s" % target_branch, None) rather than first checking if it exists, and then using it - if it does not exist, /None/ is returned

@@ +534,5 @@
>                  # branch; use the git branch for both local and target git
>                  # branch names.
>                  else:
>                      for (hg_branch, git_branch) in branch_map.items():
> +			if remote_heads.has_key("refs/heads/%s" % git_branch) and remote_heads["refs/heads/%s" % git_branch] == local_heads["refs/heads/%s" % git_branch]:

1) same as above - you can use remote_heads.get(..., None) rather than check if key exists first
2) this time branch local name and remote name are the same - so correct to use git_branch for both

@@ +545,4 @@
>                  tag_config = target_config.get('tag_config', repo_config.get('tag_config', {}))
>                  if tag_config.get('tags'):
>                      for (tag, target_tag) in tag_config['tags'].items():
> +			tag = tag

I think this line does nothing and can be removed

@@ +545,5 @@
>                  tag_config = target_config.get('tag_config', repo_config.get('tag_config', {}))
>                  if tag_config.get('tags'):
>                      for (tag, target_tag) in tag_config['tags'].items():
> +			tag = tag
> +			if remote_tags.has_key("refs/tags/%s" % (tag)) and remote_tags["refs/tags/%s" % (tag)] == local_tags["refs/tags/%s" % (tag)]:

1) Again no need for .has_key if you use .get(..., None)
2) Again use target_tag when looking at remote tags, and tag for local tags
3) No need to wrap tag in brackets - i.e. (tag) -> tag

@@ +568,4 @@
>                              tag_name = tag_parts[0]
>                              for regex in regex_list:
>                                  if tag_name != 'tip' and regex.search(tag_name) is not None:
> +			            tag = tag_name

I think this line does not add any clarity - better to just use tag_name below rather than create new variable tag that is the same

@@ +568,5 @@
>                              tag_name = tag_parts[0]
>                              for regex in regex_list:
>                                  if tag_name != 'tip' and regex.search(tag_name) is not None:
> +			            tag = tag_name
> +				    if remote_tags.has_key("refs/tags/%s" % (tag)) and remote_tags["refs/tags/%s" % (tag)] == local_tags["refs/tags/%s"      % (tag)]:

Same as above:
1) no need for .has_key
2) no need for brackets around tag: (tag) -> tag
3) switch to tag_name instead of tag
4) local and remote tag have same name, so correct to use the same variable
5) extra spacing has crept in, please can you remove:
  ["refs/tags/%s"      % (tag)] -> ["refs/tags/%s" % (tag)]
Attachment #8498823 - Flags: review-
(Reporter)

Comment 25

3 years ago
Kartik,

By the way, the review might be easier to read if you click on:
https://bugzilla.mozilla.org/page.cgi?id=splinter.html&bug=1020613&attachment=8498823

And then click on "All Files"...

I remember it was several reviews before I noticed this myself - hope this helps.

Pete
Assignee: nobody → kartikgupta0909
Status: NEW → ASSIGNED
(Reporter)

Updated

3 years ago
Attachment #8497371 - Attachment is obsolete: true
(Reporter)

Comment 26

3 years ago
Last point - when you attach a new patch, you can mark it as a replacement for the previous one (there is a tickbox for obsoleting other bug attachments in the html form you complete to add an attachment) and this way, the previous patch becomes obsoleted and is not displayed by default.
(Reporter)

Comment 27

3 years ago
Comment on attachment 8483419 [details]
Now everything else is working, but it is having problems in the final step of pushing it to my git repo.

Looks like this was attached twice - obsoleting one of them...
Attachment #8483419 - Attachment is obsolete: true
(Assignee)

Comment 28

3 years ago
It was working fine in my repo, can you please attach the info log.
Flags: needinfo?(pmoore)
(Reporter)

Comment 29

3 years ago
(In reply to kartik gupta from comment #28)
> It was working fine in my repo, can you please attach the info log.

I think this might be the point of confusion:

I obsoleted a file which you had named "Now everything else is working, but it is having problems in the final step of pushing it to my git repo." in comment 27 - so that was your comment, not mine! :D
Flags: needinfo?(pmoore)
(Reporter)

Comment 30

3 years ago
Hal has highlighted some important points in IRC:

1) We should try to stand up tests to test this stuff - at the moment we don't have any
2) We should try to reduce network traffic - maybe we can combine the two git ls-remote commands into one (i.e. git ls-remote -h -t <repo>)
3) We need to make sure we avoid hanging git commands (we touch on hanging commands in bug 1054050) - maybe a timeout should be used for the new git remote calls
4) We need to make sure we handle errors well - although I think if we hit an error, with the current implementation we will fail fast - but we should test this. Subsequent iterations should recover.

Kartik - this is a lot to take in, so I would propose continue with cleaning up your current patch, and we can address these issues afterwards. The only one you might want to join in with your current patch is point number 2). We can work on 1) together afterwards, and probably Ricardo might be able to help us with 3) from his work in bug 1054050. 4) can be tested manually, maybe also have unit tests.
(Reporter)

Updated

3 years ago
Attachment #8498823 - Flags: review?(pmoore)
(Assignee)

Comment 31

3 years ago
Created attachment 8499444 [details] [diff] [review]
Improved patch
Attachment #8498823 - Attachment is obsolete: true
Attachment #8499444 - Flags: review?(pmoore)
(Assignee)

Comment 32

3 years ago
Created attachment 8500482 [details] [diff] [review]
bug1020613_mozharness_v4.patch

In this version the git ls-remote -t and git ls-remote -h commands have been combined to reduce network traffic
Attachment #8499444 - Attachment is obsolete: true
Attachment #8499444 - Flags: review?(pmoore)
Attachment #8500482 - Flags: review?(pmoore)
(Reporter)

Comment 33

3 years ago
Comment on attachment 8500482 [details] [diff] [review]
bug1020613_mozharness_v4.patch

Review of attachment 8500482 [details] [diff] [review]:
-----------------------------------------------------------------

Really great work Kartik. Some small points - I'm confident after you've addressed these, we'll be able to land the code in staging, and run some tests, and then roll out to production. The only one that is a real sticking point and was the reason for the r- was that the variable remote_tags_output stores both tags and heads, which is a little confusing. The other points are all minor beautifications based on my personal preferences.

Can you also give details about what testing you have done? What might be useful here, is if all scenarios can be tested:

 * Running the script, then adding a new tag, and then running again. Did just the one new tag get added?
 * Running the script, then changing a tag, and running again. Did just the changed tag get updated?
 * The above two tests for branches instead of tags.
 * Then a combination test - maybe adding a new tag, changing a tag, adding a new branch, and changing a branch - did all four get added?

These tests, we can also write as unit tests afterwards, so that they will be tested on our CI every time there is a change (see http://hg.mozilla.org/build/mozharness/file/default/test).

Looking forward to your next patch! Many thanks again for all your work on this!

Pete

::: scripts/vcs-sync/vcs_sync.py
@@ +496,5 @@
>                  # We query hg for these because the conversion dir will have
>                  # branches from multiple hg repos, and the regexes may match
>                  # too many things.
> +                # Storing the details of local and remote tags and heads to filter the mirroring.
> +                local_tags_output = self.get_output_from_command("git show-ref --tags", cwd=os.path.join(conversion_dir, '.git'))

Since you combine the remote tags and heads, I think it would be cleaner also to combine the local tags and heads - e.g.

git show-ref --tags --heads

This will also save some lines of code too.

@@ +497,5 @@
>                  # branches from multiple hg repos, and the regexes may match
>                  # too many things.
> +                # Storing the details of local and remote tags and heads to filter the mirroring.
> +                local_tags_output = self.get_output_from_command("git show-ref --tags", cwd=os.path.join(conversion_dir, '.git'))
> +                remote_tags_output = self.get_output_from_command("git ls-remote -t -h %s" % target_git_repo)

Can we call this remote_tags_and_heads_output?

@@ +500,5 @@
> +                local_tags_output = self.get_output_from_command("git show-ref --tags", cwd=os.path.join(conversion_dir, '.git'))
> +                remote_tags_output = self.get_output_from_command("git ls-remote -t -h %s" % target_git_repo)
> +                local_tags = {}
> +                remote_tags_and_heads = {}
> +                local_tags_output = re.split('\n',local_tags_output)

Optional: an alternative is to delete line 504 and just add .splitlines() in line 506, i.e. in line 506:
    for in in local_tags_output.splitlines()

The same applies for lines 508/509.

If you keep your version, please add a space after the ','.

@@ +501,5 @@
> +                remote_tags_output = self.get_output_from_command("git ls-remote -t -h %s" % target_git_repo)
> +                local_tags = {}
> +                remote_tags_and_heads = {}
> +                local_tags_output = re.split('\n',local_tags_output)
> +		# Iterating over the output to convert into a dictionary of the format "tag_name:tag_sha"

Two tabs in line 505 - please change to spaces

@@ +516,5 @@
> +                local_heads_output = re.split('\n',local_heads_output)
> +                # Iterating over the heads outputs to convert them into a dictionary of the format "head_name:head_sha"
> +                for i in local_heads_output:
> +                    local_heads[i.split()[1]] = i.split()[0]
> +	        branch_map = self.query_branches(

Tab in line 520 - please change to spaces

@@ +525,4 @@
>                  # local git branch and the value is the target git branch.
>                  if target_config.get("branch_config"):
>                      for (branch, target_branch) in branch_map.items():
> +                        if remote_tags_and_heads.get("refs/heads/%s" % target_branch, None) == local_heads.get("refs/heads/%s" % branch):

I realise now that you don't technically need the ", None" since None is already the default if you do not provide a default - so maybe best to have ", None" on neither side of the condition, or on both, e.g.:

if remote_tags_and_heads.get("refs/heads/%s" % target_branch) == local_heads.get("refs/heads/%s" % branch):

I personally prefer without ", None" on both sides - but you can choose what you like.

Same for lines 536, 546, 568.

@@ +532,5 @@
>                  # branch; use the git branch for both local and target git
>                  # branch names.
>                  else:
>                      for (hg_branch, git_branch) in branch_map.items():
> +                        if remote_tags_and_heads.get("refs/heads/%s" % git_branch, None) == local_heads.get("refs/heads/%s" % hg_branch):

this should be git_branch on both sides - not hg_branch (see line 538)
Attachment #8500482 - Flags: review?(pmoore) → review-
(Reporter)

Comment 34

3 years ago
(In reply to Pete Moore [:pete][:pmoore] from comment #33)

> The only one that is a real sticking
> point and was the reason for the r- was that the variable remote_tags_output
> stores both tags and heads, which is a little confusing.

and the bug on line 536 (using hg_branch instead of git_branch)
(Assignee)

Comment 35

3 years ago
Created attachment 8501763 [details] [diff] [review]
bug1020613_mozharness_v5.patch

The suggested changes have been made in this version.
Attachment #8500482 - Attachment is obsolete: true
Attachment #8501763 - Flags: review?(pmoore)
(Reporter)

Comment 36

3 years ago
Comment on attachment 8501763 [details] [diff] [review]
bug1020613_mozharness_v5.patch

Review of attachment 8501763 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome patch!!!

I plan to test this in staging, and if it goes well, we can land it, and roll it out to production.

There are two ways we can land this commit, I think option 1) is the best since it guarantees that we get the user and date fields of the hg changeset exactly as you have them.

1) You generate a patch using hg export
2) I land this patch with your user

So far you provided a diff, which is enough for me to review, but if I should land the patch, it is useful to have the hg headers that store the information from the commit, such as date it landed, your commit message, and your user (name and email address) - since you do not have permission (yet) to land changes in our mozilla build repos.

Step 1 would look something like this:

<install hg>
<set up ~/.hgrc with your username, and any other settings you want>
cd <mozharness *git* directory where you have your code in git>
hg clone http://hg.mozilla.org/build/mozharness /tmp/mozharness_hg
<git command you normally use to generate patch> | patch -p1 -i - -d /tmp/mozharness_hg
hg -R /tmp/mozharness_hg commit -m "Bug 1020613 - vcs sync should only push tags/heads that have changed since last successful push,r=pmoore"
hg -R /tmp/mozharness_hg export -r tip > ~/bug1020613_mozharness_v5_with_hg_headers.patch
<upload new file ~/bug1020613_mozharness_v5_with_hg_headers.patch to this bug for review, obsoleting previous version


The steps above essentially import your git patch into hg, then commit it in hg, then export the commit (containing the commit message, etc) so that I can land the patch exactly as you have it locally.

Please note our convention is to include the bug number and description in the commit message (I copy/paste directly from the bugzilla bug) and to include r=<reviewer> at the end, hence the r=pmoore at the end of the commit message.
Attachment #8501763 - Flags: review?(pmoore) → review+
(Assignee)

Comment 37

3 years ago
Created attachment 8505434 [details] [diff] [review]
bug1020613_mozharness_v5.patch
Attachment #8505434 - Flags: review?(pmoore)
(Reporter)

Comment 38

3 years ago
Comment on attachment 8505434 [details] [diff] [review]
bug1020613_mozharness_v5.patch

Review of attachment 8505434 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks Kartik! This patch is almost identical to the last, and good enough to go.

The next step is that I would like to write a basic unit test to start the ball rolling, which I hope to find some time to do soon. Then we can add specific tests for your new functionality.

Will write back in the bug when I'm ready with that!

Many thanks again!
Pete

::: scripts/vcs-sync/vcs_sync.py
@@ +534,5 @@
>                  # may match too many things.
>                  tag_config = target_config.get('tag_config', repo_config.get('tag_config', {}))
>                  if tag_config.get('tags'):
>                      for (tag, target_tag) in tag_config['tags'].items():
> +                        if remote_tags_and_heads.get("refs/tags/%s" % target_tag, None) == local_tags_and_heads.get("refs/tags/%s" % tag):

It looks like ", None" has slipped in here since the last patch - but it does no harm.

Other than that, this patch is the same as the last one.
Attachment #8505434 - Flags: review?(pmoore) → review+
(Reporter)

Comment 39

3 years ago
Comment on attachment 8501763 [details] [diff] [review]
bug1020613_mozharness_v5.patch

Superseded by newer patch
Attachment #8501763 - Attachment is obsolete: true
(Reporter)

Comment 40

3 years ago
Putting a needinfo on myself so I don't forget to work on the unit tests..
Flags: needinfo?(pmoore)
(Reporter)

Updated

3 years ago
Blocks: 1091089
(Reporter)

Comment 41

3 years ago
OK, so before we go any further, I'd like the unit tests to run in travis, so that contributors (such as Kartik) can see them... Working on this first over in bug 1076810.
Depends on: 1076810
(Reporter)

Comment 42

2 years ago
Now we have mozharness unit tests running in travis:
https://travis-ci.org/mozilla/build-mozharness

There are instructions for contributors to fork and run the unit tests in travis themselves:
https://github.com/mozilla/build-mozharness/blob/master/README.md

Kartik has kindly offered to take a look at setting up unit tests for the changes delivered in this bug, so hopefully we will then be in a position to land the changes and finally close the bug.

Thanks Kartik! Give me a shout if you need a hand with anything.

Pete
Flags: needinfo?(pmoore)
(Reporter)

Updated

2 years ago
Blocks: 1133685
(Assignee)

Comment 43

2 years ago
Hello Pete, sorry for neglecting this bug for a long time , i was a little busy in contributing to some other org for my Gsoc project.
Can you explain in a little more detail what exactly do i need to do to run tests in travis. I have no prior experience in using that.
In the link you provided i see that we have to tox, will i have to write simple python scripts that contain unit tests, and travis records the result of these tests. Is it how travis works?
(Reporter)

Comment 44

2 years ago
(In reply to kartik gupta from comment #43)
> Hello Pete, sorry for neglecting this bug for a long time , i was a little
> busy in contributing to some other org for my Gsoc project.
> Can you explain in a little more detail what exactly do i need to do to run
> tests in travis. I have no prior experience in using that.
> In the link you provided i see that we have to tox, will i have to write
> simple python scripts that contain unit tests, and travis records the result
> of these tests. Is it how travis works?

Yes, travis is a general CI, and easy to set up for github projects. If you follow the instructions https://github.com/mozilla/build-mozharness/blob/master/README.md you will be able to set up testing in travis for your mozharness fork. To test locally, just run tox (the README should help you with this too).

Good luck! =)
Flags: needinfo?(kartikgupta0909)
Move to new product/component for vcs-sync work, flag for followup by :hwine
Component: Tools → General
Flags: needinfo?(hwine)
Product: Release Engineering → Developer Services
QA Contact: hwine
(Reporter)

Updated

a year ago
Flags: needinfo?(kartikgupta0909)
(Reporter)

Comment 46

a year ago
Too bad this never landed, especially as it was written, but just not thoroughly tested. But I don't think it is realistic it will be landed now, as vcs sync is not an important system with deprioritisation of B2G.
Status: ASSIGNED → RESOLVED
Last Resolved: a year ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.