Closed Bug 1789814 Opened 2 years ago Closed 2 years ago

Can no longer bootstrap / build

Categories

(Firefox Build System :: Task Configuration, defect)

Tracking

(firefox-esr91 unaffected, firefox-esr102 unaffected, firefox104 unaffected, firefox105 unaffected, firefox106 fixed)

RESOLVED FIXED
106 Branch

People

(Reporter: emilio, Unassigned)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

After bug 1784232 (https://phabricator.services.mozilla.com/D155978) in particular, I can no longer build or bootstrap on a git-cinnabar set-up.

When digging, mach artifact fails:

$ ./mach artifact toolchain --bootstrap --from-build linux64-clang-tidy --verbose                                                                                                                                                            
error: No such remote 'origin'
fatal: no upstream configured for branch 'gecko-7'
fatal: no upstream configured for branch 'gecko-7'
Error running mach:

    ['artifact', 'toolchain', '--bootstrap', '--from-build', 'linux64-clang-tidy', '--verbose']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
You can invoke |./mach busted| to check if this issue is already on file. If it
isn't, please use |./mach busted file artifact| to report it. If |./mach busted| is
misbehaving, you can also inspect the dependencies of bug 1543241.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

RuntimeError: Unable to find default branch. Got: []

  File "/home/emilio/src/moz/gecko-7/python/mozbuild/mozbuild/artifact_commands.py", line 379, in artifact_toolchain
    tasks = toolchain_task_definitions()
  File "/home/emilio/src/moz/gecko-7/python/mozbuild/mozbuild/toolchains.py", line 19, in toolchain_task_definitions
    toolchains = load_tasks_for_kind(params, "toolchain", root_dir=root_dir)
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 451, in load_tasks_for_kind
    for task in tgg.full_task_set
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 159, in full_task_set
    return self._run_until("full_task_set")
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 427, in _run_until
    k, v = next(self._run)
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 263, in _run
    parameters = self._parameters(graph_config)
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 351, in get_parameters
    parameters = load_parameters_file(
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 302, in load_parameters_file
    return Parameters(strict=strict, repo_root=repo_root, **overrides)
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 151, in __init__
    kwargs = Parameters._fill_defaults(repo_root=repo_root, **kwargs)
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 191, in _fill_defaults
    defaults.update(fn(repo_root))
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 88, in _get_defaults
    default_base_ref = repo.default_branch
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py", line 349, in default_branch
    return self._guess_default_branch()
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py", line 382, in _guess_default_branch
    raise RuntimeError(f"Unable to find default branch. Got: {branches}")

Sentry event ID: 524c6f9428c640baada80a513d91b2e8
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit
Flags: needinfo?(jlorenzo)
Flags: needinfo?(ahal)

If I set an upstream to bypass the error with:

$ g branch --set-upstream-to mozilla/bookmarks/central                                                                                                                                                                                     
branch 'gecko-7' set up to track 'mozilla/bookmarks/central'.

Then I get a different error:

$ mach artifact toolchain --bootstrap --from-build linux64-clang-tidy --verbose                                                                                                                                                            
error: No such remote 'origin'
fatal: Not a valid object name branches/default/tip
Error running mach:

    ['artifact', 'toolchain', '--bootstrap', '--from-build', 'linux64-clang-tidy', '--verbose']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
You can invoke |./mach busted| to check if this issue is already on file. If it
isn't, please use |./mach busted file artifact| to report it. If |./mach busted| is
misbehaving, you can also inspect the dependencies of bug 1543241.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

subprocess.CalledProcessError: Command '('/usr/bin/git', 'merge-base', 'branches/default/tip', '572ebcaa6f43d2e7622b8d430bbf63cf3358f77c')' returned non-zero exit status 128.

  File "/home/emilio/src/moz/gecko-7/python/mozbuild/mozbuild/artifact_commands.py", line 379, in artifact_toolchain
    tasks = toolchain_task_definitions()
  File "/home/emilio/src/moz/gecko-7/python/mozbuild/mozbuild/toolchains.py", line 19, in toolchain_task_definitions
    toolchains = load_tasks_for_kind(params, "toolchain", root_dir=root_dir)
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 451, in load_tasks_for_kind
    for task in tgg.full_task_set
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 159, in full_task_set
    return self._run_until("full_task_set")
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 427, in _run_until
    k, v = next(self._run)
  File "/home/emilio/src/moz/gecko-7/taskcluster/gecko_taskgraph/generator.py", line 263, in _run
    parameters = self._parameters(graph_config)
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 351, in get_parameters
    parameters = load_parameters_file(
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 302, in load_parameters_file
    return Parameters(strict=strict, repo_root=repo_root, **overrides)
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 151, in __init__
    kwargs = Parameters._fill_defaults(repo_root=repo_root, **kwargs)
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 191, in _fill_defaults
    defaults.update(fn(repo_root))
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/parameters.py", line 92, in _get_defaults
    "base_rev": repo.find_latest_common_revision(default_base_ref, repo.head_rev),
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py", line 455, in find_latest_common_revision
    return self.run("merge-base", base_ref_or_rev, head_rev).strip()
  File "/home/emilio/src/moz/gecko-7/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py", line 42, in run
    return subprocess.check_output(
  File "/usr/lib/python3.10/subprocess.py", line 420, in check_output
    return run(*popenargs, stdout=PIPE, timeout=timeout, check=True,
  File "/usr/lib/python3.10/subprocess.py", line 524, in run
    raise CalledProcessError(retcode, process.args,

Sentry event ID: 148f5ebdf6dc42d0a80eb970fb024739
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-C to quit

That one I can fix with:

diff --git a/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py b/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py
index 31703cd86cc88..200d7c17fc60e 100644
--- a/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py
+++ b/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py
@@ -452,7 +452,7 @@ class GitRepository(Repository):
         self.run("checkout", ref)
 
     def find_latest_common_revision(self, base_ref_or_rev, head_rev):
-        return self.run("merge-base", base_ref_or_rev, head_rev).strip()
+        return self.run("merge-base", f"{self.remote_name}/{base_ref_or_rev}", head_rev).strip()
 
     def does_revision_exist_locally(self, revision):
         try:

With that patch, I can build on my setup, but only if my default remote is called origin rather than mozilla. So this patch "fixes" it for me completely:

diff --git a/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py b/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py
index 31703cd86cc88..275dde1e06619 100644
--- a/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py
+++ b/third_party/python/taskcluster_taskgraph/taskgraph/util/vcs.py
@@ -325,6 +325,9 @@ class GitRepository(Repository):
         if "origin" in remotes:
             return "origin"
 
+        if "mozilla" in remotes:
+            return "mozilla"
+
         raise RuntimeError(
             f"Cannot determine remote repository name. Candidate remotes: {remotes}"
         )
@@ -452,7 +455,7 @@ class GitRepository(Repository):
         self.run("checkout", ref)
 
     def find_latest_common_revision(self, base_ref_or_rev, head_rev):
-        return self.run("merge-base", base_ref_or_rev, head_rev).strip()
+        return self.run("merge-base", f"{self.remote_name}/{base_ref_or_rev}", head_rev).strip()
 
     def does_revision_exist_locally(self, revision):
         try:

Set release status flags based on info from the regressing bug 1784232

Thank you for the super detailed bug report, :emilio! I just asked sheriffs to back bug 1784232 out of mozilla-central.

Flags: needinfo?(jlorenzo)

Fixed by backout of bug 1784232

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(ahal)
Resolution: --- → FIXED
Target Milestone: --- → 106 Branch

Status update: after discussing with :emilio and another affected developer, there are 2 bugs in this report. The first is that the local clone has several remote repositories defined and none of them is called origin. The second is that the default branch is not checked out locally. Both bugs make sense in the context of git-cinnabar + the multiple repos we have to work with (e.g.: mozilla-unified, try, but also the old mozilla-central, etc.). Moreover, the default branch (named branches/default/tip) points to autoland in the case of mozilla-unified, so it's tempting to delete this branch.
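
For anyone wanting to check whether their own clone hits either condition, here is a rough diagnostic sketch (plain git calls only; branches/default/tip is the default-branch name mentioned above, adjust as needed):

    import subprocess

    def git(*args):
        # Run git in the current checkout and return its trimmed stdout.
        return subprocess.check_output(("git",) + args, text=True).strip()

    remotes = git("remote").splitlines()
    if "origin" not in remotes:
        print(f"bug 1: no remote named 'origin' (remotes: {remotes})")

    # bug 2: does the default branch exist locally?
    exists = subprocess.call(
        ("git", "rev-parse", "--verify", "--quiet", "branches/default/tip"),
        stdout=subprocess.DEVNULL,
    ) == 0
    if not exists:
        print("bug 2: the default branch is not checked out locally")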

:ahal and I had a lengthy conversation about how we should approach this. The conclusions we came up with are:
a. When multiple remotes are present, select whatever the first one is, use it to get the default branch, and warn users about this potentially unexpected behavior. The warning message will also suggest that developers define a remote called origin. (A rough sketch of this is shown after this list.)
b. taskgraph will stop assuming the default branch is checked out locally and will use the remote one (which is the approach suggested by :emilio)
c. we will communicate about this change next time bug 1784232 lands on central.
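
For illustration, a rough sketch of what (a) could look like in GitRepository.remote_name; the structure and warning text here are illustrative, not the change that will actually land in taskgraph:

    import logging

    logger = logging.getLogger(__name__)

    @property
    def remote_name(self):
        remotes = self.run("remote").splitlines()
        if not remotes:
            raise RuntimeError("No git remotes are configured")
        if "origin" in remotes:
            return "origin"
        # Conclusion (a): instead of raising, fall back to the first remote and
        # warn that it might not be the repository the developer expects.
        logger.warning(
            f"Unable to find a remote named 'origin'; defaulting to '{remotes[0]}'. "
            "Consider defining a remote called 'origin' to avoid surprises."
        )
        return remotes[0]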

Should we allow for an env var to specify which remote is the base?

    if os.environ.get("GECKO_GIT_REMOTE", "origin") in remotes:
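
A more self-contained sketch of that idea (GECKO_GIT_REMOTE is a proposed name here, nothing reads it today):

    import os

    def pick_remote(remotes):
        # Honor an explicit override first, then fall back to the usual default.
        preferred = os.environ.get("GECKO_GIT_REMOTE", "origin")
        if preferred in remotes:
            return preferred
        raise RuntimeError(f"No remote named '{preferred}'; candidates: {remotes}")

    # e.g. with GECKO_GIT_REMOTE=mozilla exported:
    # pick_remote(["mozilla", "try"]) == "mozilla"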

Hey :aki! Sorry, I just saw your message. :ahal and I agreed on this new implementation[1]. Just to make sure I'm not missing anything obvious: how would this env var be used? Would developers have to define it if origin doesn't exist?

[1] https://github.com/taskcluster/taskgraph/pull/110

Flags: needinfo?(aki)

It feels to me this should look at git rev-parse --symbolic-full-name @{upstream} before trying to do wild guesses.

That being said, the base revision shouldn't matter for what the bootstrap code is using the taskcluster modules for; it seems to me it would be better to not even try to guess the base revision in that case.
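
For illustration, deriving the remote name from the upstream could look something like this sketch (it assumes the current branch actually has an upstream configured, which is exactly what's missing in the setup above):

    import subprocess

    def upstream_remote():
        # Prints e.g. "refs/remotes/mozilla/bookmarks/central" when the current
        # branch tracks a remote branch; fails when no upstream is configured.
        ref = subprocess.check_output(
            ("git", "rev-parse", "--symbolic-full-name", "@{upstream}"),
            text=True,
        ).strip()
        prefix = "refs/remotes/"
        if ref.startswith(prefix):
            return ref[len(prefix):].split("/", 1)[0]
        return None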

(In reply to Johan Lorenzo [:jlorenzo] from comment #8)

Hey :aki! Sorry, I just saw your message. :ahal and I agreed on this new implementation[1]. Just to make sure I'm not missing anything obvious: how would this env var be used? Would developers have to define it if origin doesn't exist?

[1] https://github.com/taskcluster/taskgraph/pull/110

    default_remote_name = os.environ.get("NAME_OF_ENV_VAR", "origin")
Flags: needinfo?(aki)

(In reply to Mike Hommey [:glandium] from comment #9)

It feels to me this should look at git rev-parse --symbolic-full-name @{upstream} before trying to do wild guesses.

GMTA! 😃 That's exactly what taskgraph does first[1] when it wants to get the remote name. It didn't work in :emilio's case (described above) because his local branch doesn't track any remote one. That's why taskgraph falls back to looking at all remotes to see if there's a suitable one.

That being said, the base revision shouldn't matter for what the bootstrap code is using the taskcluster modules for; it seems to me it would be better to not even try to guess the base revision in that case.

Yeah, I didn't know taskgraph was called by ./mach bootstrap. I agree it doesn't make sense to fetch this value. That said, I don't know why taskgraph is called in ./mach bootstrap anyway. Maybe we could take it out? What do you think?

(In reply to Aki Sasaki [:aki] (he/him) (UTC-6) from comment #10)

    default_remote_name = os.environ.get("NAME_OF_ENV_VAR", "origin")

I'm sorry, my questions were too ambiguous. I'm not worried about getting an environment variable with a default value if it's not set 🙂 I meant: how would developers discover this environment variable? How can we let them persist this variable so they can keep their local clone free of an origin remote?

[1] https://github.com/taskcluster/taskgraph/pull/93/files#diff-4ccccf532575772eec31d198c18aa2e767a8355f0a339e1c3add7be8539b26b2R214

Flags: needinfo?(mh+mozilla)
Flags: needinfo?(aki)

(In reply to Johan Lorenzo [:jlorenzo] from comment #11)

Yeah, I didn't know taskgraph was called by ./mach bootstrap. I agree it doesn't make sense to fetch this value. That said, I don't know why taskgraph is called in ./mach bootstrap anyway. Maybe we could take it out? What do you think?

bootstrap uses taskgraph to extract the index for toolchain tasks. There is no other way to get this information.

Flags: needinfo?(mh+mozilla)

Oh, that makes total sense, then! Defining the base_rev is super useful when taskgraph runs in the context of the decision task. I think we should keep the same behavior on the decision task and on local machines. This way, there are fewer surprises when something runs perfectly locally but not on CI. I'll do my best to ensure mach isn't busted on local machines. That said, I'll keep in mind we could just stop setting a default base_rev if that happens to break too many things.

(In reply to Johan Lorenzo [:jlorenzo] from comment #11)

(In reply to Aki Sasaki [:aki] (he/him) (UTC-6) from comment #10)

    default_remote_name = os.environ.get("NAME_OF_ENV_VAR", "origin")

I'm sorry, my questions were too ambiguous. I'm not worried about getting an environment variable with a default value if it's not set 🙂 I meant: how would developers discover this environment variable? How can we let them persist this variable so they can keep their local clone free of an origin remote?

I'm thinking documentation.
I generally rename my origin remote to upstream and my fork to escapewindow so I can confidently push to a remote without having to worry about whether origin refers to the official repo or my fork. In my case, if I hit taskgraph bustage due to this remote rename, I could work around it by specifying which remote to use.

Flags: needinfo?(aki)

Got it. I understand the use case. I'm on board with renaming origin to upstream; I do it too. That said, I then set my fork to origin so that any command that defaults to origin goes to my fork, which I'm usually fine breaking.

The new logic :ahal and I agreed on a few days ago is to use whatever the first remote is (in your case, it'll very likely be escapewindow due to alphabetical sort) and warn users to set an origin remote. How does that sound to you, :aki?

Flags: needinfo?(aki)

(In reply to Johan Lorenzo [:jlorenzo] from comment #15)

Got it. I understand the use case. I'm on board with renaming origin to upstream; I do it too. That said, I then set my fork to origin so that any command that defaults to origin goes to my fork, which I'm usually fine breaking.

The new logic :ahal and I agreed on a few days ago is to use whatever the first remote is (in your case, it'll very likely be escapewindow due to alphabetical sort) and warn users to set an origin remote. How does that sound to you, :aki?

I sometimes fetch other people's remotes so I can do reviews more easily, so it's possible the first remote in my list will be, e.g., ahal.
Having said that, other automation does require origin as a remote and I end up setting that as a duplicate of one of my other remotes. I might prefer the option of setting an env var, documented, but the decision is up to you and :ahal and the team.

Flags: needinfo?(aki)