Closed Bug 1408051 Opened 3 years ago Closed 1 month ago

Remove mozharness' copy of virtualenv and use the one under /third_party/python instead

Categories

(Release Engineering :: Applications: MozharnessCore, enhancement)

enhancement
Not set
normal

Tracking

(firefox80 fixed, firefox81 fixed)

RESOLVED FIXED
Tracking Status
firefox80 --- fixed
firefox81 --- fixed

People

(Reporter: ahal, Assigned: hamzah18051, Mentored)

References

(Blocks 3 open bugs)

Details

Attachments

(3 files)

In bug 1297515 we copied the in-tree virtualenv to testing/mozharness/external_tools. We needed to do this to upgrade pip, since buildbot was downloading the 'testing/mozharness' directory directly from hg.m.o and there was no easy way to also package other directories.

This means there are now two copies of virtualenv in the tree. Once everything is off of buildbot, we can remove the copy under testing/mozharness/external_tools and instead package virtualenv into the mozharness.zip artifact that taskcluster builds produce.
Catlee indicated this should be actionable now that everything is off buildbot. If anyone would like to try and take a stab at this I'm happy to mentor. Otherwise I may get around to this eventually :).
Mentor: ahalberstadt
We have various bits of releng automation (merge day which gets it from the archiver [1], l10n bumper which downloads it from hg.m.o as a tarball [2]) which would need to be updated to get virtualenv from the new location.

[1] https://github.com/mozilla-releng/releasewarrior-2.0/blob/master/docs/mergeduty/howto.md#connect-to-remote-instance-and-get-a-copy-of-mozharness
[2] https://hg.mozilla.org/build/puppet/file/tip/modules/l10n_bumper/templates/download_mozharness.sh.erb#l7
Assignee: nobody → catlee
Comment on attachment 8976582 [details]
Bug 1408051: Remove mozharness' copy of virtualenv

https://reviewboard.mozilla.org/r/244686/#review250744

::: testing/mozharness/mozharness/base/python.py:344
(Diff revision 1)
>          c = self.config
>          dirs = self.query_abs_dirs()
>          venv_path = self.query_virtualenv_path()
>          self.info("Creating virtualenv %s" % venv_path)
>  
> -        # Always use the virtualenv that is vendored since that is deterministic.
> +        for p in ('base_work_dir', 'abs_src_dir'):

This comment is probably worth keeping.

::: testing/mozharness/mozharness/base/python.py:510
(Diff revision 1)
>          if sys.version_info[:2] < (2, 7):
>              self.warning('Resource monitoring will not be enabled! Python 2.7+ required.')
>              return
>  
>          try:
> +            self.activate_virtualenv()

Does this want to be in the `try:` block? I think a failure here is distinct enough that it shouldn't be conveted to the generic error about the resource monitor.
Attachment #8976582 - Flags: review?(mozilla) → review+
Pushed by catlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a00ab75aea2b
Remove mozharness' copy of virtualenv r=tomprince
Comment on attachment 8976582 [details]
Bug 1408051: Remove mozharness' copy of virtualenv

https://reviewboard.mozilla.org/r/244686/#review250914

::: python/mozbuild/mozbuild/action/test_archive.py:376
(Diff revision 3)
>              'pattern': 'mozharness/**',
>          },
> +        {
> +            'source': buildconfig.topsrcdir,
> +            'base': 'third_party/python/virtualenv',
> +            'dest': 'third_party/python/virtualenv',

Would it be better to prefix this with `mozharness/`? Or, maybe even better, change the prefix to match the old location: `mozharness/external_tools/virtualenv`? It is unusual to have a a zip or tarball extract to multiple top-level directories.

This means we don't need the changes to the taskcluster scripts (which isn't the reason), and some changes to the logic for finding the virtualenv location.
Attachment #8976582 - Flags: review+ → review-
Comment on attachment 8976582 [details]
Bug 1408051: Remove mozharness' copy of virtualenv

https://reviewboard.mozilla.org/r/244686/#review250914

> Would it be better to prefix this with `mozharness/`? Or, maybe even better, change the prefix to match the old location: `mozharness/external_tools/virtualenv`? It is unusual to have a a zip or tarball extract to multiple top-level directories.
> 
> This means we don't need the changes to the taskcluster scripts (which isn't the reason), and some changes to the logic for finding the virtualenv location.

To be clear:

We should definitely have mozharness.zip only have one top-level directory. I don't have strong feelings on where virtualenv goes inside of it.
(In reply to Tom Prince [:tomprince] from comment #11)
> Comment on attachment 8976582 [details]
> Bug 1408051: Remove mozharness' copy of virtualenv
> 
> https://reviewboard.mozilla.org/r/244686/#review250914
> 
> > Would it be better to prefix this with `mozharness/`? Or, maybe even better, change the prefix to match the old location: `mozharness/external_tools/virtualenv`? It is unusual to have a a zip or tarball extract to multiple top-level directories.
> > 
> > This means we don't need the changes to the taskcluster scripts (which isn't the reason), and some changes to the logic for finding the virtualenv location.
> 
> To be clear:
> 
> We should definitely have mozharness.zip only have one top-level directory.
> I don't have strong feelings on where virtualenv goes inside of it.

I wanted to try and keep the structure of mozharness.zip as close as possible to the original source directory. I'll look into moving virtualenv into mozharness/third_party/... inside the zip file.
Flags: needinfo?(catlee)
(In reply to Tom Prince [:tomprince] from comment #4)
> ::: testing/mozharness/mozharness/base/python.py:510
> (Diff revision 1)
> >          if sys.version_info[:2] < (2, 7):
> >              self.warning('Resource monitoring will not be enabled! Python 2.7+ required.')
> >              return
> >  
> >          try:
> > +            self.activate_virtualenv()
> 
> Does this want to be in the `try:` block? I think a failure here is distinct
> enough that it shouldn't be conveted to the generic error about the resource
> monitor.

Now I remember why I put this in. If the main create_virtualenv action fails, then activate_virtualenv() will also raise an exception in the post-action handler. The original exception from create_virtualenv is then lost. I think this is actually a more fundamental bug in how BaseScript is not logging exceptions from the main action methods.
Comment on attachment 8976582 [details]
Bug 1408051: Remove mozharness' copy of virtualenv

https://reviewboard.mozilla.org/r/244686/#review251104

::: taskcluster/scripts/tester/test-macosx.sh:41
(Diff revision 4)
>  
>      echo "Downloading mozharness"
>  
>      while [[ $attempt < $max_attempts ]]; do
>          if curl --fail -o mozharness.zip --retry 10 -L $MOZHARNESS_URL; then
> -            rm -rf mozharness
> +            rm -rf mozharness third_party

This change is not needed.
Attachment #8976582 - Flags: review?(mozilla) → review+
Pushed by catlee@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5afad9b7fd21
Remove mozharness' copy of virtualenv r=tomprince
The jobs are taking like 20 minutes to set up their virtualenv on OSX. Not sure why yet....
Flags: needinfo?(catlee)
With some more debug logging, it looks like the new pip is spending a long time fetching indexes from https endpoints. e.g.

old:
18:13:47     INFO -    Getting page https://pypi.pub.build.mozilla.org/pub
18:13:47     INFO -    Starting new HTTPS connection (1): pypi.pub.build.mozilla.org
18:13:47     INFO -    "GET /pub HTTP/1.1" 301 246
18:13:47     INFO -    "GET /pub/ HTTP/1.1" 200 None
18:13:48     INFO -    Analyzing links from page https://pypi.pub.build.mozilla.org/pub/

new:
15:38:39     INFO -    Getting page https://pypi.pub.build.mozilla.org/pub
15:38:39     INFO -    Starting new HTTPS connection (1): pypi.pub.build.mozilla.org
15:38:39     INFO -    https://pypi.pub.build.mozilla.org:443 "GET /pub HTTP/1.1" 301 246
15:38:39     INFO -    https://pypi.pub.build.mozilla.org:443 "GET /pub/ HTTP/1.1" 200 None
15:40:40     INFO -    Analyzing links from page https://pypi.pub.build.mozilla.org/pub/

Aki found this, which may be relevant: https://github.com/pyca/pyopenssl/issues/625
Assignee: catlee → nobody
Blocks: 1201085
It might be worth upgrading our in-tree pip and trying again (current is 9.0.3, released is 18.0). Also this reply has a workaround that seems worth investigating:
https://github.com/pyca/pyopenssl/issues/625#issuecomment-300322038
Assignee: nobody → hamzah18051
Status: NEW → ASSIGNED
Blocks: 1654457
Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7864e0d0295f
Remove mozharness' copy of virtualenv and use the one under /third_party/python instead r=ahal,rail
Status: ASSIGNED → RESOLVED
Closed: 2 months ago
Resolution: --- → FIXED
Regressions: 1655164
Regressions: 1655666

I think we should consider backing this out and fixing the regressions on our own time.

This change had a few regressions that will be easier to fix without the pressure of time.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---

The somewhat good news is that bug 1655666 has a patch, and the issues from bug 1655164 and bug 1655180 look the same. So I think there is only thing to fix here. I couldn't figure out why it was happening from a brief glance, so decided to back this change out in the meantime.

Here is the traceback:

[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL - Uncaught exception: Traceback (most recent call last):
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -   File "/builds/task_1595246656/workspace/mozharness/mozharness/base/script.py", line 2137, in run
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -     self.run_action(action)
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -   File "/builds/task_1595246656/workspace/mozharness/mozharness/base/script.py", line 2076, in run_action
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -     self._possibly_run_method(method_name, error_if_missing=True)
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -   File "/builds/task_1595246656/workspace/mozharness/mozharness/base/script.py", line 2031, in _possibly_run_method
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -     return getattr(self, method_name)()
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -   File "/builds/task_1595246656/workspace/mozharness/mozharness/mozilla/testing/raptor.py", line 483, in install_chrome_android
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -     self.device.shell_output("settings put global verifier_verify_adb_installs 0")
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -   File "/builds/task_1595246656/workspace/mozharness/mozharness/mozilla/testing/android.py", line 57, in device
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -     adb = self.adb_path
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -   File "/builds/task_1595246656/workspace/mozharness/mozharness/mozilla/testing/android.py", line 45, in adb_path
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -     self.activate_virtualenv()
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -   File "/builds/task_1595246656/workspace/mozharness/mozharness/base/python.py", line 442, in activate_virtualenv
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL -     exec(open(activate).read(), dict(__file__=activate))
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL - IOError: [Errno 2] No such file or directory: '/builds/task_1595246656/workspace/build/venv/bin/activate_this.py'
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL - Running post_fatal callback...
[task 2020-07-20T12:13:21.829Z] 12:13:21    FATAL - Exiting -1

I just tested and the version of virtualenv we have in-tree does produce an activate_this.py script, so I have no idea why it appears to be missing. Maybe the whole venv path is wrong or something?

I think it's happening because it's not able to find abs_src_dir or base_work_dir in the dictionary

Status: REOPENED → RESOLVED
Closed: 2 months ago2 months ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Do you also want the backout on beta, or are we ok there?

Flags: needinfo?(ahal)

Thanks for asking! I don't think it's a huge deal to leave it, but I guess ideally it would be backed out on beta too.

Julien, would you be able to handle that for me?

Flags: needinfo?(ahal) → needinfo?(jcristau)

I think with the fix from bug 1655666 we don't need to land it on beta, unless there are other issues it is causing. :ahal, with that fix, is there any reason this couldn't land again? I didn't investigate all the regressions, but based on comment 27, it sounds like they all had the same root cause.

Flags: needinfo?(jcristau) → needinfo?(ahal)

I was under the impression that bug 1655666 was a separate issue from the other two, but I haven't investigated very closely. If they are all the same issue then I guess I backed it out for now reason :(.

Hamzah, can you confirm if bug 1655666 fixed the other regression?

Flags: needinfo?(ahal) → needinfo?(hamzah18051)

bug1655666 fixed perma android regression(bug 1655164) , but it didn't fix bug 1655180.
I tried fixing that and did a try run . Everything seemed to work fine.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdc11189ee0e590df5038b3db3bca67c60a44dd0
Andrew if you could please confirm this then that would be great.

Flags: needinfo?(hamzah18051) → needinfo?(ahal)

(In reply to Hamzah Akhtar from comment #36)

bug1655666 fixed perma android regression(bug 1655164) , but it didn't fix bug 1655180.

Ah ok, thanks for clarifying! It looks like a separate patch landed in bug 1655164 though (aside from the one in bug 1655666). Maybe that second patch also ended up fixing bug 1655180?

I tried fixing that and did a try run . Everything seemed to work fine.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=bdc11189ee0e590df5038b3db3bca67c60a44dd0
Andrew if you could please confirm this then that would be great.

That bug talks about mach raptor being broken, and your try push doesn't contain any raptor tasks (even if it did, I think the issue was only showing up locally). So it's hard to say.

Could you answer the following please?

  1. Can you reproduce the mach raptor issue on the latest central with only the patch from this bug? (i.e excluding the fix from your try push)
  2. If so, can you confirm that your try push from comment 36 fixes it?

Either way sounds like we're close here.

Flags: needinfo?(ahal)

After looking at that try run that I mentioned above again, I remembered that it was related to perma Android regression, I think that is fixed now.
About the mach raptor command being broken, yeah I was able to reproduce to it using the patch in this bug, also I was able to fix it
I'll submit a patch for that in bug 1655180

Perfect, thanks for clarifying. It might be cleanest to incorporate the changes from that fix in the patch here, but it's up to you :).

Pushed by ahalberstadt@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/d25219ad617a
Remove mozharness' copy of virtualenv and use the one under /third_party/python instead r=ahal,rail
Status: REOPENED → RESOLVED
Closed: 2 months ago1 month ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.