Closed Bug 1783635 Opened 3 years ago Closed 3 months ago

mach try commands fail when connecting via ssh: AttributeError: 'sshv1peer' object has no attribute '_initstack'

Categories

(Firefox Build System :: General, defect, P3)

Tracking

(Not tracked)

RESOLVED INACTIVE

People

(Reporter: emz, Unassigned)

References

(Blocks 1 open bug)

Details

When I attempt to push something to try via mach try fuzzy, mach try auto, or mach try again, it fails when pushing to ssh://hg.mozilla.org/try.

Notable errors include:

ValueError: module object for 'OpenSSL.crypto' substituted in sys.modules during a lazy load
Exception ignored in: <function sshv1peer.__del__ at 0x16b10c9d0>
Traceback (most recent call last):
  File "/Users/pbz/Library/Python/3.8/lib/python/site-packages/mercurial/sshpeer.py", line 449, in __del__
    self._cleanup(warn=self._initstack)
AttributeError: 'sshv1peer' object has no attribute '_initstack'

Full error log: https://gist.github.com/Trikolon/afe00a6234438ffddd629860f42ce8e5

Running ssh hg.mozilla.org manually works fine and prints all my access levels.

This happens on macOS 12.5 on latest central state freshly bootstrapped via mach bootstrap.

Summary: mach try commands fail when connecting via ssh → mach try commands fail when connecting via ssh: AttributeError: 'sshv1peer' object has no attribute '_initstack'

Not sure if it's relevant: I'm using the 1Password SSH agent. This has worked fine before.

On first glance this looked like an issue with mach lazy-loading, but looking again I think this is caused by some out-of-date packages vendored into v-c-t. I'm doing some other dependency upgrades in v-c-t so I'll update these while I'm there.

Assignee: nobody → sheehan
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED

:pbz can you try updating version-control-tools (via mach vcs-setup or hg -R ~/.mozbuild/version-control-tools pull -u) and see if you still have this problem?

Flags: needinfo?(pbz)

I've updated the version control tools via the command you provided. Unfortunately the error persists. However, there is a new module error for urllib3._version. Not sure if that's relevant.

[~/src/moz/mozilla-unified]$ mach try auto                                  
Creating temporary commit for remote...
A try_task_config.json
pushing to ssh://hg.mozilla.org/try
temporary commit removed, repository restored
abort: No module named 'urllib3._version'
Exception ignored in: <function sshv1peer.__del__ at 0x111308ca0>
Traceback (most recent call last):
  File "/Users/pbz/Library/Python/3.8/lib/python/site-packages/mercurial/sshpeer.py", line 449, in __del__
    self._cleanup(warn=self._initstack)
AttributeError: 'sshv1peer' object has no attribute '_initstack'
Flags: needinfo?(pbz)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

What version of hg do you have installed (hg version)? On my Mac I'm getting errors on 6.2 or higher (though they're not the same as your errors), but if I downgrade to 6.1.4 or lower, everything works fine.

pip install --upgrade mercurial==6.1.4

Flags: needinfo?(pbz)

I have mercurial 6.1.2 installed. Same error after running pip install --upgrade mercurial==6.1.4 though. Is there anything else I could share that would help troubleshooting this?

Flags: needinfo?(pbz)

I think the original error is the one that matters; the _initstack error is a red herring. If you go up far enough in the stack you see the assignment to self._initstack, which never happens because of the ValueError being thrown at the bottom.

  File "/Users/pbz/Library/Python/3.8/lib/python/site-packages/mercurial/sshpeer.py", line 404, in __init__
    self._initstack = b''.join(util.getstackframes(1))

So I'm thinking 'load value error' is the actual problem (but I don't necessarily think it's linked to mach lazy loading, since I should be able to reproduce it if that alone was the root cause).

  File "/Applications/Xcode.app/Contents/Developer/Library/Frameworks/Python3.framework/Versions/3.8/lib/python3.8/importlib/util.py", line 250, in __getattribute__
    raise ValueError(f"module object for {original_name!r} "
ValueError: module object for 'cryptography.hazmat.primitives.ciphers.algorithms' substituted in sys.modules during a lazy load
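The "red herring" reasoning can be reproduced in isolation. The following is a minimal sketch, not Mercurial's code: Peer and construct_and_report are hypothetical names, and the ValueError is simulated. It shows that when __init__ raises before an attribute is assigned, __del__ still runs and produces a secondary AttributeError that Python reports as "Exception ignored in".

```python
import gc
import sys


class Peer:
    """Hypothetical stand-in for a peer object; this is not Mercurial's class."""

    def __init__(self, fail):
        if fail:
            # Simulates the ValueError raised before the attribute is assigned.
            raise ValueError("simulated failure during setup")
        self._initstack = "stack captured at construction"

    def __del__(self):
        # Runs even if __init__ raised, so the attribute may not exist yet.
        self._cleanup(warn=self._initstack)

    def _cleanup(self, warn):
        pass


def construct_and_report():
    """Return the primary error name and any errors raised inside __del__."""
    secondary = []
    previous_hook = sys.unraisablehook
    # Capture what would otherwise be printed as "Exception ignored in ...".
    sys.unraisablehook = lambda info: secondary.append(info.exc_type.__name__)
    primary = None
    try:
        try:
            Peer(fail=True)
        except ValueError as exc:
            primary = type(exc).__name__
        # Make sure the half-constructed object has been finalized.
        gc.collect()
    finally:
        sys.unraisablehook = previous_hook
    return primary, secondary


if __name__ == "__main__":
    print(construct_and_report())
```

The primary error is the one to debug; the AttributeError only appears because cleanup ran on an object that never finished initializing.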

There has to be something different with your local environment/config. Kind of a shot in the dark, but did you upgrade to a newer version of Xcode? I see a beta for version 14 was released in June.

Flags: needinfo?(pbz)

I'm on Xcode 13.4.1 (13F100) installed via the app store. Do you mean I should try the beta instead?
I've also tried to reset my ~/.mozbuild directory and run mach bootstrap again, but that didn't help.

My .hgrc: https://gist.github.com/Trikolon/84c0a2c72d0f3a45508b59d49a278688

Flags: needinfo?(pbz)

(In reply to Paul Zühlcke [:pbz] from comment #9)

I'm on Xcode 13.4.1 (13F100) installed via the app store.

I upgraded my Xcode to 13.4.1 to match yours and still wasn't able to reproduce. So that's not it.

(In reply to Paul Zühlcke [:pbz] from comment #9)

Do you mean I should try the beta instead?

No, I was just taking a guess as to something that might be a difference between your machine and mine. Eagerly upgrading to a beta version of something seems like a plausible way to encounter errors others aren't seeing.

My .hgrc: https://gist.github.com/Trikolon/84c0a2c72d0f3a45508b59d49a278688

I tried using these settings (albeit slightly tweaked to my username) and still could not reproduce.

I have not tried the 1Password SSH Agent you mentioned in comment 1, but I don't have a subscription. Could you try using the default SSH agent instead?

Flags: needinfo?(pbz)

I've just tested it and I can reproduce the issue with the default SSH agent too unfortunately.

Edit: Seeing the urllib3 error again.

I unfortunately don't have much experience with mach or python to debug this myself. I'm happy to do a video call some time if it's helpful.

Flags: needinfo?(pbz)

Could you please run pip list and share the output here? I'm thinking you might have pip installed something (maybe cryptography?) that's causing a bad interaction.

I did pip install cryptography and it installed 37.0.4 and still couldn't reproduce, but maybe the issue is with a different version?

I'm only homing in on cryptography because I see it in the call stack (and I didn't have it installed). I do see various versions that I presume are pip installed in version-control-tools, but maybe the one you have installed is taking precedence and causing the discrepancy?

Flags: needinfo?(pbz)

Here is the output from pip list:

[~/src/moz/mozilla-unified]$ pip list
Package    Version
---------- -------
pip        22.0.4
setuptools 58.1.0
WARNING: You are using pip version 22.0.4; however, version 22.2.2 is available.
You should consider upgrading via the '/Users/pbz/.pyenv/versions/3.10.5/bin/python3.10 -m pip install --upgrade pip' command.
Flags: needinfo?(pbz)
Product: Firefox Build System → Developer Infrastructure

Okay I was finally able to reproduce after working on a different issue.

To get this issue to show up, I installed Python 3.10 via brew and ran brew link python@3.10. After that, python3 --version showed Python 3.10.6. When running ./mach try auto I got the same error as you did, but it was still coming from the site-packages dir for Python 3.8. It looks like mach starts with Python 3.10, and then when it starts doing things with mercurial it goes to the 3.8 version of mercurial (or whatever version it was originally installed under during the macOS build setup), and there's some incompatibility there (which isn't surprising).

I fixed it by installing mercurial for my new version of python (3.10.6) eg:

python3 -m pip install --user mercurial==6.1.2

Specifying mercurial==6.1.2 was necessary. Without it I grabbed 6.2.1 and there's a recent regression with fsmonitor if you're on 6.2.

I think everything makes sense. When switching to a new version of Python, mercurial must be re-installed. Though, mercurial from a different Python installation being used does seem kind of wrong, but I'm not too sure why that's happening. It'd be great if we could stop that and/or add a check to see if mercurial is installed for the running version of Python so that there's a helpful error message instead of what we saw here.
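A check of the kind suggested here could look roughly like the sketch below. It is an illustration only, not code from mach: missing_module_message is a hypothetical helper, and it only answers whether the mercurial package is importable by the interpreter that is currently running.

```python
import importlib.util
import sys


def missing_module_message(module_name="mercurial"):
    """Return a hint if module_name is not importable here, else None.

    Hypothetical helper for illustration; mach does not define this function.
    """
    if importlib.util.find_spec(module_name) is not None:
        return None
    version = "{}.{}".format(*sys.version_info[:2])
    return (
        f"{module_name} is not installed for the running Python {version} "
        f"({sys.executable}). Try: {sys.executable} -m pip install --user {module_name}"
    )


if __name__ == "__main__":
    print(missing_module_message() or "mercurial is importable by this interpreter")
```

This would only catch the "not installed for this Python" case. It says nothing about which hg executable is found first on PATH.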

I do see that in comment 7 you mention that you tried upgrading/installing mercurial already. I'm going to assume that was into a different Python installation (probably 3.8 again) than the one that ended up running mach (3.10). The steps regarding brew link and/or brew unlink may be necessary to resolve this (prior to installing mercurial).

Thanks! I can't use python latest (3.10.6) because of Bug 1784861.
I've tried it with the following setup which still throws:

[~/src/moz/mozilla-unified]$ pip list                    
Package    Version
---------- -------
mercurial  6.1.2
pip        22.2.2
setuptools 58.1.0
[~/src/moz/mozilla-unified]$ python3 --version                             
Python 3.10.5
[~/src/moz/mozilla-unified]$ which pip
/Users/pbz/.pyenv/shims/pip
[~/src/moz/mozilla-unified]$ which python3
/Users/pbz/.pyenv/shims/python3
[~/src/moz/mozilla-unified]$ hg --version
Mercurial Distributed SCM (version 6.1.2)

I'll try to completely reinstall python without pyenv following the build instructions again.

Ok so python 3.10 doesn't work because of Bug 1784861. But I can get it to work when I brew unlink python@3.10 which means it will use python from the OS v3.8.9. Generally what is the recommended / supported python version for mach?

Status: REOPENED → RESOLVED
Closed: 3 years ago → 3 years ago
Resolution: --- → FIXED

Can you try running either of the commands in comment 4 and checking if this works for you? I messed up vendoring a library but have now fixed it, just want to confirm this isn't the cause of this problem.

Flags: needinfo?(pbz)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I'm afraid I can't because mach doesn't work at all on latest python because of Bug 1784861.

Flags: needinfo?(pbz)

(In reply to Paul Zühlcke [:pbz] from comment #17)

Generally what is the recommended / supported python version for mach?

The minimum supported is Python 3.6. I'd personally recommend Python 3.9 (due to the various issues with Python 3.10, not limited to the one you linked) so that you have the newest that's reliable.

Just to clarify, does switching to Python 3.9 and re-installing mercurial as described above solve both this issue and the one mentioned in Bug 1784861?

Flags: needinfo?(pbz)

If I brew unlink the brew-installed python version my OS falls back to Python 3.8.9. Using this older version and reinstalling Mercurial fixes both issues.

Flags: needinfo?(pbz)

You should be able to install Python 3.9 with brew install python@3.9 (For me it grabbed 3.9.13). Then link it to python3 in zsh with ln -s /usr/local/bin/python3.9 /usr/local/bin/python3. That should make mach use Python 3.9. I'm not entirely sure why linking through brew doesn't make it take precedence over the OS installed Python. Let me know if it doesn't work, or if there's a better way to achieve the same thing (I'm not a Mac expert or anything, just figuring this out as I go and this worked for me on my Mac).

Flags: needinfo?(pbz)

Symlinking and brew linking python@3.9 like you showed didn't work, it still used 3.8.9. Anyway, I'm fine with sticking to my macOS python version for now since mach works fine on that.
Thanks for your help!

Flags: needinfo?(pbz)

It looks like the problem is a which("hg") when initializing the repository object in mach. For example, on MacOS, with a running Python version of 3.10 that doesn't have hg installed, but there exists a Python 3.8 on the system with hg installed, this can yield a path to the hg installed in a Python that isn't running (eg: /Users/<user>/Library/Python/3.8/bin/hg).

This can't occur on Windows for those using MozillaBuild (but might be possible if invoking mach outside of MozillaBuild? I did not attempt). I tried reproducing this on Ubuntu 20.04, but got plenty of other errors that made it clear mercurial was missing in Python before getting to this specific ./mach try error with _initstack.

Connor, should we add an additional check (Maybe just for MacOS?) that ensures the result of which("hg") resides in a site-packages dir that's the same version of the running Python? I can't think of a reason why we would want to allow a mismatch like that. My thinking is that even with a mismatch, most of the time everything probably still works, but every once in a while we'd get an extremely obscure error like this due to incompatibility between versions.

The root cause will always be that the user didn't install mercurial to the version of Python they upgraded to, but it'd be nice if we had a better warning/error message for that. Alternatively, we could do this on mach startup where we check the minimum Python version, or even add it as a check to ./mach doctor (but I think there it's probably less likely to help the user due to lack of visibility).
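The proposed mismatch check might be sketched as follows. This is an assumption-laden illustration rather than anything in mozversioncontrol: python_version_in_path and hg_python_mismatch are hypothetical names, and the version is guessed from the path text alone (for example Python/3.8 or python3.10), which is only a heuristic.

```python
import re
import shutil
import sys


def python_version_in_path(path):
    """Guess (major, minor) from a component like 'Python/3.8' or 'python3.10'."""
    match = re.search(r"[Pp]ython[/\\]?(\d+)\.(\d+)", path)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def hg_python_mismatch(hg_path=None, running=sys.version_info[:2]):
    """Return a warning if hg appears to live under another Python, else None.

    Hypothetical helper; hg_path defaults to whatever shutil.which finds.
    """
    hg_path = hg_path or shutil.which("hg")
    if hg_path is None:
        return None
    found = python_version_in_path(hg_path)
    if found is None or found == tuple(running):
        return None
    return (
        f"hg at {hg_path} looks like it belongs to Python {found[0]}.{found[1]}, "
        f"but mach is running on Python {running[0]}.{running[1]}"
    )


if __name__ == "__main__":
    print(hg_python_mismatch() or "no mismatch detected from the hg path")
```

A path that carries no version, such as a pyenv shim or /usr/local/bin/hg, yields no warning, so a check like this could miss cases.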

Edit: I forgot to mention it, but we'd also have to take git into consideration, not just hg.

Flags: needinfo?(sheehan)

Connor, ping for the needinfo above.

(In reply to Alex Hochheiden [:ahochheiden] from comment #26)

It looks like the problem is a which("hg") when initializing the repository object in mach. For example, on MacOS, with a running Python version of 3.10 that doesn't have hg installed, but there exists a Python 3.8 on the system with hg installed, this can yield a path to the hg installed in a Python that isn't running (eg: /Users/<user>/Library/Python/3.8/bin/hg).

This can't occur on Windows for those using MozillaBuild (but might be possible if invoking mach outside of MozillaBuild? I did not attempt). I tried reproducing this on Ubuntu 20.04, but got plenty of other errors that made it clear mercurial was missing in Python before getting to this specific ./mach try error with _initstack.

Connor, should we add an additional check (Maybe just for MacOS?) that ensures the result of which("hg") resides in a site-packages dir that's the same version of the running Python? I can't think of a reason why we would want to allow a mismatch like that. My thinking is that even with a mismatch, most of the time everything probably still works, but every once in a while we'd get an extremely obscure error like this due to incompatibility between versions.

I'm not sure I fully understand why invoking a Py3.8 hg from a Py3.10 mach via subprocess would cause an interaction in the first place. Our lazy loading shouldn't make its way into the Mercurial process, right? Mercurial has its own lazy loading module system, not sure how these two would interact. IMO the Mercurial install would optimally be completely independent of how mach manages its Python install/dependencies.

The root cause will always be that the user didn't install mercurial to the version of Python they upgraded to, but it'd be nice if we had a better warning/error message for that. Alternatively, we could do this on mach startup where we check the minimum Python version, or even add it as a check to ./mach doctor (but I think there it's probably less likely to help the user due to lack of visibility).

Edit: I forgot to mention it, but we'd also have to take git into consideration, not just hg.

What would we need to do for git? Check that git-cinnabar is running on the same Python version as mach?

Flags: needinfo?(sheehan) → needinfo?(ahochheiden)
Assignee: sheehan → nobody

Moving to Firefox Build System::General since the problem seems to be in mozversioncontrol.

Component: Try → General
Product: Developer Infrastructure → Firefox Build System

(In reply to Connor Sheehan [:sheehan] from comment #28)

I'm not sure I fully understand why invoking a Py3.8 hg from a Py3.10 mach via subprocess would cause an interaction in the first place. Our lazy loading shouldn't make its way into the Mercurial process, right? Mercurial has its own lazy loading module system, not sure how these two would interact. IMO the Mercurial install would optimally be completely independent of how mach manages its Python install/dependencies.

I honestly don't fully understand the interaction either. I'm just going off of how I was eventually able to reproduce this.

I did encounter a similar issue while trying to reproduce bug 1800776 (the problem in the bug itself doesn't matter, only my steps for reproducing it are relevant):

Using Python3.9 I wasn't able to reproduce.
I switched to Python3.11 (using pyenv) and wasn't able to reproduce.
I installed mercurial for Python3.11, then I was able to reproduce.
I switched back to Python3.9 and I was still able to reproduce.

I think this example makes more sense than what I saw reproducing this bug. Basically, mercurial encounters an error when run with Python 3.11. It only runs with Python 3.11 if which hg points to the one in the Python 3.11 scripts folder. If I switch to using a different Python version (eg: 3.9) while which hg still points to the one in Python 3.11, it will run with Python 3.11 even though I'm using Python 3.9.

That still ultimately boils down to 'user error' (the user didn't make which hg point to the correct one) which is the same for this bug, but it's not very obvious (it took me a few minutes to realize what the problem was, and I worked on this bug).
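One way to see which interpreter a given hg will actually run under is to read the first line of the file that which hg returns. The sketch below assumes hg is a script with a shebang line, as pip-installed scripts generally are. script_interpreter is a hypothetical helper, and it returns None for anything it cannot read as a shebang (for example a compiled binary).

```python
import shutil


def script_interpreter(path):
    """Return the interpreter command from a script's shebang line, or None."""
    try:
        with open(path, "rb") as handle:
            first_line = handle.readline(256)
    except OSError:
        return None
    if not first_line.startswith(b"#!"):
        return None
    return first_line[2:].strip().decode("utf-8", "replace")


if __name__ == "__main__":
    hg_path = shutil.which("hg")
    if hg_path is None:
        print("hg not found on PATH")
    else:
        print(hg_path, "->", script_interpreter(hg_path))
```

If the printed interpreter is not the Python currently in use, hg will still run under the interpreter named in its shebang, which matches the behaviour described above.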

What doesn't make sense in this bug is:

./mach try with Python 3.8 with which hg -> Python 3.8: works fine.
Upgrade to Python 3.10, then ./mach try with Python 3.10 with which hg -> Python 3.8: does not work.

If it worked on the previous Python version (3.8), why does calling it from Python 3.10 (and it running hg with Python 3.8) not work? That's what we're seeing, but I don't understand it.

I also don't have a great solution for either problem. which hg could point to a shim (that's what pyenv does) so checking if that's in the current python site isn't always going to work. We could check if the base python site packages has mercurial installed, but that doesn't necessarily fix the 'user error' of which hg pointing to the wrong thing, just the 'user error' of not having it installed (which users usually notice pretty quickly if they can't use version control commands, and only very rarely encounter issues like the one in this bug).

Edit: I forgot to mention it, but we'd also have to take git into consideration, not just hg.

What would we need to do for git? Check that git-cinnabar is running on the same Python version as mach?

That is what I had in mind, but it might not be worth the effort. I was just thinking if hg can have this 'different python version' problem in regards to the install, it seemed like a good idea to consider git-cinnabar as well, but I think we should leave it as-is until/if a similar problem is encountered.

Flags: needinfo?(ahochheiden)

FWIW, git-cinnabar doesn't need python anymore.

The severity field is not set for this bug.
:ahochheiden, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(ahochheiden)
Severity: -- → S3
Flags: needinfo?(ahochheiden)
Priority: -- → P3
Status: REOPENED → RESOLVED
Closed: 3 years ago → 3 months ago
Resolution: --- → INACTIVE