Closed Bug 1739067 Opened 3 years ago Closed 2 years ago

Don't share "Mach" virtualenv between checkouts ("./mach create-mach-environment" must be run when moving across repositories)

Categories

(Firefox Build System :: Mach Core, defect, P2)

Tracking

(firefox-esr91 unaffected, firefox94 unaffected, firefox95 unaffected, firefox96 fixed)

RESOLVED FIXED
96 Branch

People

(Reporter: standard8, Assigned: mhentges)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

I'm using two different source trees on my local disk and I'm currently frequently swapping between them.

I've pulled them both and they are up to date with the latest mozilla-central. When I build in one, I get the message:

The "mach" virtualenv is not up-to-date, please run "./mach create-mach-environment"

So I run the command as suggested and then it builds fine. However, when I swap to my other repository, I get exactly the same message and have to rebuild the virtualenv again.

This repeats forever.

After a quick bit of debugging, I noticed that two checks fail: https://searchfox.org/mozilla-central/rev/1e7f7235cf822e79cd79ba9e200329ede3d37925/python/mach/mach/virtualenv.py#177-185,226-227

The first is the modification-time check. If I remove that, the second one, the path comparison, fails because the directories differ:

/Users/mark/dev/mozilla-central/build
vs
/Users/mark/dev/gecko/build

self.virtualenv_root is /Users/mark/.mozbuild/_virtualenvs/mach. So it appears using the same virtualenv across repositories isn't possible?
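
A minimal sketch of why one shared virtualenv cannot satisfy both checkouts, assuming the up-to-date check compares a path recorded when the virtualenv was built against the current checkout. `VenvMetadata` and `is_up_to_date` are hypothetical names for illustration, not Mach's actual code; only the two paths come from the report above.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class VenvMetadata:
    """Hypothetical record written when the virtualenv is created."""

    created_from: PurePosixPath  # checkout directory that built the virtualenv


def is_up_to_date(metadata: VenvMetadata, current_checkout: PurePosixPath) -> bool:
    # Assumed check: the virtualenv is only valid for the checkout that built it.
    return metadata.created_from == current_checkout


central = PurePosixPath("/Users/mark/dev/mozilla-central/build")
gecko = PurePosixPath("/Users/mark/dev/gecko/build")

# One shared virtualenv, most recently rebuilt from the mozilla-central checkout.
shared = VenvMetadata(created_from=central)

print(is_up_to_date(shared, central))  # the checkout that rebuilt it passes
print(is_up_to_date(shared, gecko))    # the other checkout fails until it rebuilds
```

Under that assumption, whichever checkout rebuilt the virtualenv last is the only one that passes, which would match the "repeats forever" behaviour described above.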

Flags: needinfo?(mhentges)

Set release status flags based on info from the regressing bug 1732948

I'm glad you brought this up, thanks for the ticket.
We used to share a single Mach virtualenv in ~/.mozbuild/_virtualenvs/mach. However, this Mach virtualenv is tied to a specific repository, since it:

  • Uses that repo's first-party Python modules (mozbuild, mozversioncontrol, vendored code, etc.)
  • Has installed "pypi" dependencies according to its associated checkout (glean-sdk, psutil, zstandard)
  • Has a "virtualenv metadata file" whose structure is currently evolving, so is dependent on which revision you're on.

Two workaround-y pieces of information:

  • The create-mach-environment call will happen automatically "soon". It will still be slow to change trees, but won't impact your workflow
  • You can set the MACH_USE_SYSTEM_PYTHON=1 environment variable, but:
    • This is slower because Mach has to verify that your system python provides necessary packages
    • You need to keep your environment compatible with Mach

The actual, real solution to this problem is that we shouldn't put the Mach virtualenv in a shared location - it should be in the objdir with the rest of the virtualenvs. So, there'd be a "Mach" virtualenv for each checkout you have.

Type: defect → enhancement
Flags: needinfo?(mhentges)
Summary: Swapping between multiple source trees in different directories broken - Continuous 'The "mach" virtualenv is not up-to-date' failures → Don't share "Mach" virtualenv between checkouts
Type: enhancement → defect
See Also: → 1736762
Priority: -- → P2

Ideally, there should be one per source tree, not per objdir.

Ideally, there should be one per source tree, not per objdir.

True, though our distinction between "target-specific output dir" and "general tool output/cache/etc dir" isn't currently very strong.
However, that's a discussion for another time - after a chat with Connor yesterday, I think the right solution here is to chuck it in the state_dir(srcdir=True) (~/.mozbuild/srcdirs/<checkout>/)
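
A rough sketch of what a checkout-specific state directory could look like, assuming each source tree maps to its own folder under ~/.mozbuild/srcdirs/. The `checkout_state_dir` helper and the "<name>-<hash>" folder naming are illustrative assumptions, not the behaviour of the real state_dir() function mentioned above.

```python
import hashlib
from pathlib import Path


def checkout_state_dir(state_root: Path, topsrcdir: Path) -> Path:
    """Hypothetical helper: map a checkout path to its own state directory."""
    src = str(topsrcdir)
    # A short hash of the full path keeps same-named checkouts apart.
    digest = hashlib.sha256(src.encode("utf-8")).hexdigest()[:12]
    return state_root / "srcdirs" / f"{topsrcdir.name}-{digest}"


root = Path.home() / ".mozbuild"
a = checkout_state_dir(root, Path("/Users/mark/dev/mozilla-central"))
b = checkout_state_dir(root, Path("/Users/mark/dev/gecko"))
print(a)
print(b)
```

Because the directory is derived only from the checkout path, the same tree always resolves to the same location, and two trees never share one.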

Assignee: nobody → mhentges
Status: NEW → ASSIGNED

Build and run the Mach virtualenv from a state_dir that is
"specific-to-topsrcdir".

As part of this, move get_state_dir() to mach so that it's usable
before sys.path entries are fully set up.

This is blocked on us activating the Mach virtualenv in Python-land, not sh-land.

Depends on: 1717051
Attachment #9249267 - Attachment description: Bug 1739067: Scope Mach virtualenv to be checkout-specific → WIP: Bug 1739067: Scope Mach virtualenv to be checkout-specific
Summary: Don't share "Mach" virtualenv between checkouts → Don't share "Mach" virtualenv between checkouts ("./mach create-mach-environment" must be run when moving across repositories)
See Also: → 1740114

(In reply to Mitchell Hentges [:mhentges] 🦀 from comment #2)

This bug is horrible for productivity with multiple checkouts. Is there any ETA on getting this fixed, and can we backout the original offending patch until then?

I tried MACH_USE_SYSTEM_PYTHON=1 workaround, which worked for 15 minutes, until something changed and now:

Skipping automatic management of Python dependencies since the 'MACH_USE_SYSTEM_PYTHON' environment variable is set.
The following issues were found while validating your Python environment:
glean-sdk==40.0.0: Installed with unexpected version "42.2.0"
Flags: needinfo?(mhentges)

Two options to resolve this:

  1. python3 -m pip install glean-sdk==40.0.0 --user (if you're on Windows, omit the --user)
    • If glean-sdk keeps getting reinstalled later as 42.2.0, I'd recommend narrowing down what's causing that
  2. Unset MACH_USE_SYSTEM_PYTHON and set MOZBUILD_STATE_PATH to a different directory for each srcdir.
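
Option 2 could be scripted roughly as follows. This is only a sketch: the `mach_env_for` helper and the ".mozbuild-<checkout name>" naming are assumptions made for illustration, and only the two environment variable names come from this thread.

```python
import os
from pathlib import Path
from typing import Dict, Mapping


def mach_env_for(topsrcdir: Path, base_env: Mapping[str, str]) -> Dict[str, str]:
    """Return a copy of base_env with a state directory unique to this checkout."""
    env = dict(base_env)
    # The workaround needs MACH_USE_SYSTEM_PYTHON to be unset.
    env.pop("MACH_USE_SYSTEM_PYTHON", None)
    # Hypothetical naming scheme: ~/.mozbuild-<checkout directory name>
    env["MOZBUILD_STATE_PATH"] = str(Path.home() / f".mozbuild-{topsrcdir.name}")
    return env


env = mach_env_for(Path("/Users/mark/dev/gecko"), os.environ)
print(env["MOZBUILD_STATE_PATH"])
# To launch Mach with it: subprocess.run(["./mach", "build"], cwd=checkout, env=env)
```

Note that this naming scheme collides if two checkouts share a directory name, so a real setup would need a different way of telling them apart.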

Is there any ETA on getting this fixed

It's probably going to be another couple weeks.

can we backout the original offending patch until then?

We're in a catch-22 situation where:

  • Without the "up-to-date" check, modifying virtualenv-handling behaviour causes failures with out-of-date virtualenvs
  • If we can't modify the virtualenv-handling code, then we can't reach a point where the "up-to-date" check smoothly resolves itself.

Miko, for your case here, I'm pretty sure that the easiest solution will be to continue using MACH_USE_SYSTEM_PYTHON. If the glean-sdk version change happens again, you should be able to narrow down what's causing it. Otherwise, it should tide your use case over until the up-to-date check becomes automatically handled.

Flags: needinfo?(mhentges)

(In reply to Mitchell Hentges [:mhentges] 🦀 from comment #8)

can we backout the original offending patch until then?

We're in a catch-22 situation where:

  • Without the "up-to-date" check, modifying virtualenv-handling behaviour causes failures with out-of-date virtualenvs
  • If we can't modify the virtualenv-handling code, then we can't reach a point where the "up-to-date" check smoothly resolves itself.

Can you send a note about the situation to dev-platform? It sounds like quite a few people are running into this and it would be good to let a wider audience know about the problem and the workarounds.

Flags: needinfo?(mhentges)

It's MozPhab that installs a newer glean-sdk, which is quite unfortunate.

Can you send a note about the situation to dev-platform?

Good call, submitted.

It's MozPhab that installs a newer glean-sdk, which is quite unfortunate.

Oh shoot, because glean-sdk isn't pinned by moz-phab ("glean-sdk>=36.0.0"), so pip blows out the old one and installs a new one.
Fortunately, moz-phab is only installed if it isn't yet installed, which is good.

So, it is a bummer for first-time setups for multi-worktree users right now, because they'll have to redo the python3 -m pip install glean-sdk==40.0.0 after they bootstrap the first time.
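
To illustrate the mismatch: the installed 42.2.0 satisfies moz-phab's range but not the exact version the checkout expects. The sketch below uses a deliberately simplified comparison that handles only plain dotted integers with "==" and ">="; it is not pip's real resolver, and `parse_version` and `satisfies` are hypothetical helpers.

```python
def parse_version(text: str) -> tuple:
    """Turn '40.0.0' into (40, 0, 0). Handles plain dotted integers only."""
    return tuple(int(part) for part in text.split("."))


def satisfies(installed: str, requirement: str) -> bool:
    """Check an installed version against an '==X' or '>=X' requirement."""
    for operator in ("==", ">="):
        if requirement.startswith(operator):
            wanted = parse_version(requirement[len(operator):])
            have = parse_version(installed)
            return have == wanted if operator == "==" else have >= wanted
    raise ValueError(f"unsupported requirement: {requirement!r}")


MACH_PIN = "==40.0.0"        # version the checkout expects, per the error above
MOZ_PHAB_RANGE = ">=36.0.0"  # moz-phab's requirement, as quoted above

for installed in ("40.0.0", "42.2.0"):
    print(installed, satisfies(installed, MACH_PIN), satisfies(installed, MOZ_PHAB_RANGE))
# 40.0.0 True True
# 42.2.0 False True
```

Version 40.0.0 satisfies both requirements, which is why reinstalling it by hand keeps both tools working for now.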

Flags: needinfo?(mhentges)

Don't moz-phab self upgrades have the potential to break it too?

They do, but moz-phab releases are infrequent enough (the last one was in May) that we should make some progress on this issue before the next release.

FWIW, one other option we can (temporarily?) provide is a "don't use the Mach environment or the system python, and act independently" setting. This will likely run into issues around fetching .zst artifacts, though.

The severity field is not set for this bug.
:mhentges, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(mhentges)
Severity: -- → S2
Flags: needinfo?(mhentges)

Would it be sensible to just run create-mach-environment if out-of-date?

Flags: needinfo?(mhentges)

IIRC, we can't do that on Windows because we'll already be executing inside the Mach virtualenv by the time we realize that it's out-of-date, and if we try to rebuild it then it'll fail due to some of the files being "in use".

Flags: needinfo?(mhentges)

IIRC, Windows allows file or directory renames. Mach could move the entire virtualenv directory aside and create the new one, and... well, then all bets are off because, as I recently found out, os.exec* functions don't end the calling process until the subprocess dies. I guess we could leave the old virtualenv around until the next update, at which point we'd need to remove the old one before renaming.
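
The rename-then-recreate idea could be sketched like this. It is an illustration of the proposal only, not something Mach does: `replace_virtualenv` and `fake_build` are hypothetical, and the sketch does not verify whether Windows really permits renaming a directory whose files are in use.

```python
import shutil
import tempfile
from pathlib import Path


def replace_virtualenv(venv_dir: Path, build_new) -> None:
    """Swap in a fresh virtualenv without deleting files that may be in use."""
    stale = venv_dir.with_name(venv_dir.name + ".old")
    # Clear the leftover from the previous update first; by now nothing
    # should still be running out of it.
    if stale.exists():
        shutil.rmtree(stale)
    # Renaming moves the possibly in-use files aside instead of deleting them.
    if venv_dir.exists():
        venv_dir.rename(stale)
    build_new(venv_dir)


def fake_build(path: Path) -> None:
    # Stand-in for real virtualenv creation, so the sketch runs anywhere.
    path.mkdir()
    (path / "marker.txt").write_text("fresh")


with tempfile.TemporaryDirectory() as tmp:
    venv = Path(tmp) / "mach"
    replace_virtualenv(venv, fake_build)  # first creation
    replace_virtualenv(venv, fake_build)  # update: old copy kept as mach.old
    print(sorted(p.name for p in Path(tmp).iterdir()))  # ['mach', 'mach.old']
```

The cost is one stale copy on disk between updates, and a process still running from the ".old" directory during the next update would hit the original "in use" problem again.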

We're currently a six-patch stack from 1717051 being resolved.

Considering that this mostly affects multi-tree users, for whom we have rough-but-viable workarounds (MACH_USE_SYSTEM_PYTHON, multiple state dirs), I think the time is better spent getting the stack landed.

A side effect of this is that I'm now afraid of running mach create-mach-environment && mach command in one directory while a build happens in another directory, as I'm not sure this wouldn't mess up my build. It would be good to have some information about these parallel invocations.

Hmm, parallel invocations are definitely risky. Off the top of my head, one potential issue is if an ongoing build attempts to import a package from the Mach virtualenv just as ./mach create-mach-environment has deleted and is re-creating the virtualenv - you'll get a strange ImportError.

Also, if the two srcdirs have different pip-installed Mach dependencies (e.g.: glean-sdk==36.0.0 for one, glean-sdk==40.0.0 for the other) then that can cause failures as well.

This will be improved by the "mach virtualenv per srcdir" work, but there are more concerns afoot here: bootstrapped files are stored in a global location. For example, a "sysroot" expected by a build may be entirely replaced by a different sysroot while the build is running.

TL;DR: I'm a little bit shocked that you haven't encountered bizarre issues due to parallel-builds so far!

Attachment #9249267 - Attachment description: WIP: Bug 1739067: Scope Mach virtualenv to be checkout-specific → Bug 1739067: Scope Mach virtualenv to be checkout-specific
Pushed by mhentges@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9e0e7bc1308f
Scope Mach virtualenv to be checkout-specific r=perftest-reviewers,ahal,sparky
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 96 Branch
Regressions: 1743592
Has Regression Range: --- → yes