Open Bug 1697493 Opened 4 years ago Updated 4 years ago

Investigate a shared working directory in the state dir for tooling

Categories

(Firefox Build System :: General, enhancement, P5)

enhancement

Tracking

(Not tracked)

People

(Reporter: ahal, Unassigned)

Details

Bug 1409733 implements a mach command that updates the user's working directory in version control. There were some understandable concerns about this that came up in review. Ultimately we decided that the target audience of this feature was people who knew what they were doing, and that it was not much different than, e.g., running moz-phab.

That said, ideally mach commands (and moz-phab for that matter) wouldn't mess with the user's working directory. It would be neat if we could provide a way for tools to move between revisions in version control without touching the user's working directory.

With mercurial, we could leverage the hg share extension. With git, worktrees might accomplish what we need. Ideally we'd have a single shared working directory somewhere in the state dir that all tools could access.
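A minimal sketch of the commands a tool might run, assuming the shared directory lives at a hypothetical "shared-checkout" path under the mach state dir (~/.mozbuild by default). hg share (from the share extension) and git worktree add are real commands; the helper names create_cmd and update_cmd are made up for illustration.

```python
from pathlib import Path
from typing import List

# Hypothetical location inside the mach state dir (~/.mozbuild by default).
SHARED_DIR = Path.home() / ".mozbuild" / "shared-checkout"


def create_cmd(vcs: str, source: Path, dest: Path = SHARED_DIR) -> List[str]:
    """Build the command that creates a shared working directory at dest."""
    if vcs == "hg":
        # hg share reuses the source repo's store; -U skips the initial checkout.
        return ["hg", "share", "-U", str(source), str(dest)]
    if vcs == "git":
        # A linked worktree shares the object database with source;
        # --no-checkout defers populating files until a revision is chosen.
        return ["git", "-C", str(source), "worktree", "add",
                "--detach", "--no-checkout", str(dest)]
    raise ValueError(f"unsupported VCS: {vcs}")


def update_cmd(vcs: str, rev: str, dest: Path = SHARED_DIR) -> List[str]:
    """Build the command that moves the shared working directory to rev."""
    if vcs == "hg":
        # --clean discards any local changes left behind in the shared tree.
        return ["hg", "-R", str(dest), "update", "--clean", "-r", rev]
    if vcs == "git":
        return ["git", "-C", str(dest), "checkout", "--detach", "--force", rev]
    raise ValueError(f"unsupported VCS: {vcs}")
```

A tool would then run each command with subprocess.run(cmd, check=True). Only the shared directory ever changes revision, so the user's checkout is left alone.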

For now, given there is only a single instance of us needing this, this bug doesn't feel very worthwhile. But with this feature in place, we could envision new commands that suddenly become viable (e.g., regression bisection, or running linters/tests across all patches in a stack to verify each commit is green on its own).
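To make the "is each commit green on its own" idea concrete, here is a minimal sketch that walks a list of revisions in a shared working directory and reports which ones fail a check. The function name failing_revs, the Runner type, and the update/check command arguments are all hypothetical; the runner is injectable so the loop can be exercised without a real repository.

```python
import subprocess
from typing import Callable, List, Optional, Sequence

# A runner executes a command in a directory and returns its exit code.
Runner = Callable[[Sequence[str], Optional[str]], int]


def run(cmd: Sequence[str], cwd: Optional[str] = None) -> int:
    # Output is not captured, so it goes straight to the user's terminal.
    return subprocess.run(list(cmd), cwd=cwd).returncode


def failing_revs(revs: Sequence[str], workdir: str,
                 update: Callable[[str], List[str]],
                 check: Sequence[str], runner: Runner = run) -> List[str]:
    """Check out each revision in the shared tree; return those whose check fails."""
    failures = []
    for rev in revs:
        # Only the shared working directory moves; the user's checkout is untouched.
        if runner(update(rev), workdir) != 0:
            raise RuntimeError(f"could not update {workdir} to {rev}")
        if runner(check, workdir) != 0:
            failures.append(rev)
    return failures
```

For git, the revisions could come from git rev-list --reverse <base>..HEAD (oldest first), and update could build a git checkout --detach command for the shared directory.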

It's worth calling out the two main issues with having a separate worktree:

  • Disk space usage (a checkout requires a couple of gigabytes, I believe).
  • Performance: depending on the gap between the currently checked-out revision and the target revision, the update can take a while (especially on Windows).
    • Additionally, there's the question of how to communicate this to the user. Can we show the fancy progress bar that hg shows, even though hg runs as a subprocess and we control stdout?
  • Also, a tertiary issue: what if some confident deviant tries to use the shared work tree from two different Firefox checkouts simultaneously? We'll probably have to handle locking :(
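For the locking concern, a minimal sketch of a cross-platform lock file, assuming a hypothetical lock path next to the shared work tree. It relies on os.open with O_CREAT | O_EXCL, which atomically fails if the file already exists on both POSIX and Windows; shared_tree_lock is an illustrative name, not an existing mach helper.

```python
import os
import time
from contextlib import contextmanager
from pathlib import Path
from typing import Iterator


@contextmanager
def shared_tree_lock(lock_path: Path, timeout: float = 60.0,
                     poll: float = 0.5) -> Iterator[None]:
    """Hold an exclusive lock on the shared work tree for the duration of the block."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # O_EXCL makes creation fail if another process already holds the lock.
            fd = os.open(lock_path, os.O_CREAT | os.O_EXCL | os.O_WRONLY)
            break
        except FileExistsError:
            if time.monotonic() >= deadline:
                raise TimeoutError(f"shared work tree is locked: {lock_path}")
            time.sleep(poll)
    try:
        try:
            # Record the owner's PID to help diagnose stale locks.
            os.write(fd, str(os.getpid()).encode())
        finally:
            os.close(fd)
        yield
    finally:
        os.unlink(lock_path)
```

One caveat of this design: if the holder is killed outright, the lock file is left behind, so a real implementation would probably also need stale-lock detection (e.g. checking whether the recorded PID is still alive).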

Of course, it's a better solution than requiring that the user's active checkout gets updated!

However, IMO, the ideal solution here is for all use cases requiring historical file state to fetch it from VCS history without mutating the checkout directory. moz-phab will be able to do this in the future.
We should be able to expose a mach API that simplifies this VCS access, and port use cases to use it. There are performance considerations with these operations, but if we can expose a "batch query" operation, it shouldn't be worse than updating a shared state dir.
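As an illustration of a "batch query" for git, a minimal sketch built on git cat-file --batch, which reads one object name per line on stdin (here rev:path) and answers each with "<oid> <type> <size>", the contents, and a newline, or "<name> missing" if the object doesn't exist. The helper names are hypothetical, and this only covers git; hg would need its own backend.

```python
import subprocess
from typing import Dict, List, Optional


def parse_cat_file_batch(output: bytes, count: int) -> List[Optional[bytes]]:
    """Parse count responses from git cat-file --batch output (None = not found)."""
    results: List[Optional[bytes]] = []
    pos = 0
    for _ in range(count):
        end = output.index(b"\n", pos)
        header = output[pos:end]
        pos = end + 1
        if header.endswith((b" missing", b" ambiguous")):
            results.append(None)
            continue
        size = int(header.split()[2])  # header is "<oid> <type> <size>"
        results.append(output[pos:pos + size])
        pos += size + 1  # skip the contents and the trailing newline
    return results


def read_files_at_rev(repo: str, rev: str,
                      paths: List[str]) -> Dict[str, Optional[bytes]]:
    """Read many files as of rev with one git process, without touching the checkout."""
    request = "".join(f"{rev}:{path}\n" for path in paths).encode()
    proc = subprocess.run(
        ["git", "-C", repo, "cat-file", "--batch"],
        input=request, stdout=subprocess.PIPE, check=True,
    )
    return dict(zip(paths, parse_cat_file_batch(proc.stdout, len(paths))))
```

Because all requests go through a single process, the cost is one git startup per batch rather than one per file.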

(In reply to Mitchell Hentges [:mhentges] 🦀 from comment #1)

> However, IMO, the ideal solution here is for all use cases requiring historical file state to fetch it from VCS history without mutating the checkout directory. moz-phab will be able to do this in the future.
> We should be able to expose a mach API that simplifies this VCS access, and port use cases to use it. There are performance considerations with these operations, but if we can expose a "batch query" operation, it shouldn't be worse than updating a shared state dir.

Interesting! I'm not sure this will be viable for mach taskgraph, though, as it depends on thousands of files, and we don't know what they are ahead of time. Maybe there's a way we can set it up to fetch them on demand when something tries to access them? Sounds tricky, though.

Assuming that the mach VCS API was flexible enough, it could allow on-demand access:

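def on_file_loaded(file, seen):
    # Hypothetical mach VCS API: load_file(path, callback) fetches path from
    # VCS history and calls callback with the loaded file.
    for dep in file.parse_dependents():
        if dep not in seen:  # skip files already requested (avoids cycles)
            seen.add(dep)
            mach_context.vcs.load_file(dep, lambda f: on_file_loaded(f, seen))
    # ... do work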

(or however that would look, depending on performance constraints, the interfaces of hg and git, etc)
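A runnable take on the same idea, using a worklist instead of callbacks. fetch and parse_deps are hypothetical stand-ins: fetch would read a file from VCS history (not the checkout), and parse_deps would extract the files it references. Each file is fetched exactly once, even with circular references.

```python
from collections import deque
from typing import Callable, Dict, Iterable, Set


def load_closure(roots: Iterable[str],
                 fetch: Callable[[str], bytes],
                 parse_deps: Callable[[str, bytes], Iterable[str]]) -> Dict[str, bytes]:
    """Fetch roots and everything they transitively reference, each exactly once."""
    loaded: Dict[str, bytes] = {}
    queued: Set[str] = set(roots)
    pending = deque(queued)
    while pending:
        path = pending.popleft()
        data = fetch(path)  # e.g. read from VCS history, not the working directory
        loaded[path] = data
        for dep in parse_deps(path, data):
            if dep not in queued:  # guards against cycles and duplicate fetches
                queued.add(dep)
                pending.append(dep)
    return loaded
```

An iterative worklist avoids deep recursion on large dependency graphs, and it also makes batching easy: instead of popping one path at a time, each round could drain the whole pending queue into a single batch query.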

I'm not very interested in this being implemented as a separate worktree. 2.7GB (currently) for occasional use of some of those files doesn't seem like a price worth paying. I'd be more interested in a VFS layer that gets the data from hg cat and git cat-file --batch. The former would likely be excruciatingly slow, so we'd probably need a small script that uses the Mercurial libraries directly (which should be simple enough that it would virtually never break with newer versions of Mercurial). Another option might be to use hg serve... that could be fast enough.

Anyways, I do have a use case for this: mach artifact toolchain.

(In reply to Mike Hommey [:glandium] from comment #4)

> Anyways, I do have a use case for this: mach artifact toolchain.

Interestingly, though, this is another use case based on the taskgraph code, so maybe we only need a solution for taskgraph.

A VFS layer sounds ideal, just unsure how feasible that is, especially one that supports hg and git.

What about a separate worktree containing only the taskgraph sparse profile? (Not sure if this works with git.)

Personally, I don't mind an extra 3GB, though I agree it's unnecessary for most developers. Maybe, as a poor man's fix, we could make these tools configurable so that one can point them at a different work tree; developers could then optionally set up a separate working directory for purposes like this...
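The "point the tool at a different work tree" fix could be as small as the following sketch. The environment variable name MOZ_TOOLING_WORKTREE and the function resolve_worktree are hypothetical; a mach config setting would work just as well as an environment variable.

```python
import os
from pathlib import Path
from typing import Mapping, Optional


def resolve_worktree(topsrcdir: Path,
                     env: Optional[Mapping[str, str]] = None) -> Path:
    """Pick the tree a tool should read from: an explicit override, else the checkout."""
    env = os.environ if env is None else env
    override = env.get("MOZ_TOOLING_WORKTREE")  # hypothetical variable name
    if override:
        path = Path(override).expanduser()
        if not path.is_dir():
            raise FileNotFoundError(f"MOZ_TOOLING_WORKTREE does not exist: {path}")
        return path
    return topsrcdir
```

Defaulting to the user's checkout keeps today's behaviour for everyone who doesn't opt in.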

> A VFS layer sounds ideal, just unsure how feasible that is, especially one that supports hg and git.

Would you mind clarifying this?
I think that we know that it's possible in the technical sense: hg has cat (or serve), and git has cat-file.

Flags: needinfo?(ahal)

I assumed there were existing tools we could use to handle this (like Microsoft's VFS for Git), and figured it was unlikely we'd find an off-the-shelf solution that handles both VCSes. Though I guess you are talking about writing our own?

Both options sound complicated to me, but I have no idea what I'm talking about :)

Flags: needinfo?(ahal)

Ah, I see!
I think the term VFS is a little overloaded here. It could mean:

  • An actual OS-level virtual file system with all the associated abilities that are expected (mtime, ownership, etc).
  • A mach-level read-only file system abstraction that has just enough behaviour to fulfill our use case.

I've been implicitly referring to the second interpretation. I imagine that we'd adjust mozversioncontrol to add the functionality we need.
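A minimal sketch of what that second interpretation could look like. RevisionFS and BatchReader are hypothetical names, not existing mozversioncontrol APIs; the reader is assumed to return an entry (None if absent) for every path it is asked about, and for git it could wrap git cat-file --batch.

```python
from typing import Callable, Dict, List, Optional

# Reads many paths at one revision in a single call; None means "not present".
BatchReader = Callable[[str, List[str]], Dict[str, Optional[bytes]]]


class RevisionFS:
    """Read-only, mach-level view of the tree at a single revision.

    Deliberately minimal: no mtimes, ownership, or writes, just enough for
    tooling that needs historical file contents.
    """

    def __init__(self, rev: str, reader: BatchReader) -> None:
        self.rev = rev
        self._reader = reader
        self._cache: Dict[str, Optional[bytes]] = {}

    def prefetch(self, paths: List[str]) -> None:
        """Fetch any uncached paths in one batch query."""
        missing = [p for p in paths if p not in self._cache]
        if missing:
            self._cache.update(self._reader(self.rev, missing))

    def exists(self, path: str) -> bool:
        self.prefetch([path])
        return self._cache[path] is not None

    def read_bytes(self, path: str) -> bytes:
        self.prefetch([path])
        data = self._cache[path]
        if data is None:
            raise FileNotFoundError(f"{path} does not exist at {self.rev}")
        return data

    def read_text(self, path: str, encoding: str = "utf-8") -> str:
        return self.read_bytes(path).decode(encoding)
```

Callers that know what they need up front can call prefetch() once, while callers that discover files on demand still work, just with more round trips.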

Marking as P5 until we have a stronger need/additional use cases for a solution here.

Priority: -- → P5