Closed Bug 1636797 Opened 1 year ago Closed 9 months ago

mach bootstrap fails installing minidump stackwalk or other artifacts ('Could not find artifacts for a toolchain build named...')

Categories

(Firefox Build System :: Task Configuration, defect, P1)

Tracking

(firefox-esr68 unaffected, firefox-esr78 wontfix, firefox79 wontfix, firefox80 wontfix, firefox81 fixed)

RESOLVED FIXED
81 Branch

People

(Reporter: tnikkel, Assigned: rstewart)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(8 files, 1 obsolete file)

This is on mozilla-central rev f44e64a61ed1
No changes to the source tree, no local commits. hg diff is empty.
Note that artifact builds are mentioned a few times in the error messages but I'm not trying to do an artifact build.

./mach bootstrap

[...]

Would you like to run a configuration wizard to ensure Mercurial is
optimally configured? (Yn): n
 0:03.86 Setting up artifact node.tar.xz
 0:03.86 Using artifact from local cache: /Users/tim/.mozbuild/toolchains/46c579dc11a71cd1-node.tar.xz
 0:03.90 rm tree: /Users/tim/.mozbuild/node
 0:04.30 untarring "/Users/tim/.mozbuild/node.tar.xz"
 0:03.93 Setting up artifact fix-stacks.tar.xz
 0:03.93 Using artifact from local cache: /Users/tim/.mozbuild/toolchains/2d7a56f8b2600435-fix-stacks.tar.xz
 0:03.93 rm tree: /Users/tim/.mozbuild/fix-stacks
 0:03.93 untarring "/Users/tim/.mozbuild/fix-stacks.tar.xz"
 0:02.90 Could not find artifacts for a toolchain build named `macosx64-minidump-stackwalk`. Local commits and other changes in your checkout may cause this error. Try updating to a fresh checkout of mozilla-central to use artifact builds.
Error running mach:

    ['bootstrap']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
You can invoke |./mach busted| to check if this issue is already on file. If it
isn't, please use |./mach busted file| to report it. If |./mach busted| is
misbehaving, you can also inspect the dependencies of bug 1543241.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

subprocess.CalledProcessError: Command '['/usr/local/opt/python/bin/python3.7', '/Users/tim/ffandroid/src/mach', 'artifact', 'toolchain', '--from-build', 'macosx64-minidump-stackwalk']' returned non-zero exit status 1.

  File "/Users/tim/ffandroid/src/python/mozboot/mozboot/mach_commands.py", line 44, in bootstrap
    bootstrapper.bootstrap()
  File "/Users/tim/ffandroid/src/python/mozboot/mozboot/bootstrap.py", line 540, in bootstrap
    checkout_root)
  File "/Users/tim/ffandroid/src/python/mozboot/mozboot/bootstrap.py", line 396, in maybe_install_private_packages_or_exit
    self.instance.ensure_minidump_stackwalk_packages(state_dir, checkout_root)
  File "/Users/tim/ffandroid/src/python/mozboot/mozboot/osx.py", line 568, in ensure_minidump_stackwalk_packages
    minidump_stackwalk.MACOS_MINIDUMP_STACKWALK)
  File "/Users/tim/ffandroid/src/python/mozboot/mozboot/base.py", line 358, in install_toolchain_artifact
    subprocess.check_call(cmd, cwd=state_dir)
  File "/usr/local/Cellar/python/3.7.7/Frameworks/Python.framework/Versions/3.7/lib/python3.7/subprocess.py", line 363, in check_call
    raise CalledProcessError(retcode, cmd)

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: General → Bootstrap Configuration

Nicholas, is that something you can help with?
thanks

Flags: needinfo?(n.nethercote)
Duplicate of this bug: 1636850

Actually, not Nicholas :)

Flags: needinfo?(nalexander)
Flags: needinfo?(n.nethercote)
Flags: needinfo?(gbrown)
Regressed by: 1635834

Had same issue building on windows 64.

It gave an error attempting to download toolchain "win32-minidump-stackwalk"

Set release status flags based on info from the regressing bug 1635834

This appears to be for any operating system and not just for android builds as this also occurs for desktop builds as well as windows and linux.

I am confused, as 'mach bootstrap' for either android or desktop works for me, on linux. Also, "mach artifact toolchain --from-build macosx64-minidump-stackwalk" (or linux64-minidump-stackwalk, or win32-minidump-stackwalk) works for me.

:bc has this problem and provided this additional info:

$ ./mach artifact toolchain --from-build macosx64-minidump-stackwalk --verbose
 0:01.96 Could not find artifacts for a toolchain build named `macosx64-minidump-stackwalk`. Local commits and other changes in your checkout may cause this error. Try updating to a fresh checkout of mozilla-central to use artifact builds.

This even happens with no local commits. That looks like bug 1635852. :tomprince -- any idea?

Flags: needinfo?(mozilla)
See Also: → 1635852

(In reply to Geoff Brown [:gbrown] from comment #10)

Oh, maybe https://bugzilla.mozilla.org/show_bug.cgi?id=1635852#c5?

Yes, bc and calixte report that variations on that fixed this problem for them.

calixte needed to delete toolkit/crashreporter/__pycache__.
bc needed to find -name '*.pyc' | xargs rm

(In reply to Geoff Brown [:gbrown] from comment #11)

(In reply to Geoff Brown [:gbrown] from comment #10)

Oh, maybe https://bugzilla.mozilla.org/show_bug.cgi?id=1635852#c5?

Yes, bc and calixte report that variations on that fixed this problem for them.

calixte needed to delete toolkit/crashreporter/__pycache__.
bc needed to find -name '*.pyc' | xargs rm

Just FYI I believe there exists a ./mach clobber python that should achieve this as well.
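
For anyone who wants to see what is lying around before deleting anything, here is a minimal dry-run sketch in Python. It only lists *.pyc files and __pycache__ directories and removes nothing. The function name find_bytecode and the directory in the demo are illustrative, and this is not a description of what mach clobber python does internally.

```python
import os
from pathlib import Path


def find_bytecode(root):
    """Return sorted paths of *.pyc files and __pycache__ directories under root."""
    found = []
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames:
            if name == "__pycache__":
                found.append(Path(dirpath) / name)
        for name in filenames:
            if name.endswith(".pyc"):
                found.append(Path(dirpath) / name)
    return sorted(found)


if __name__ == "__main__":
    # Dry run only: print candidates, delete nothing.
    # The directory is the one mentioned in this bug; adjust as needed.
    for path in find_bytecode("toolkit/crashreporter"):
        print(path)
```

Run it from the top of the source tree and review the output before removing anything by hand.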

Flags: needinfo?(nalexander)

Is there anything we can/should land to avoid the bootstrap failure automatically, or should we just advise "mach clobber python" or similar?

(In reply to Geoff Brown [:gbrown] from comment #13)

Is there anything we can/should land to avoid the bootstrap failure automatically, or should we just advise "mach clobber python" or similar?

Aggressively setting PYTHONDONTWRITEBYTECODE=1 (or whatever the setting is) would be a start, excluding the *.pyc files would be another, but advertising mach clobber python in the error message (since the error message doesn't actually help a lot of the time, and can be rage-inducing when hg status or similar shows a clean tree but things are still broken) would be a good start.
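
For reference, the setting is indeed PYTHONDONTWRITEBYTECODE: when it is set to a non-empty string, the interpreter does not write .pyc files, and sys.dont_write_bytecode is True. A minimal sketch to check the effect on a child interpreter follows; the helper name child_writes_bytecode is made up for illustration, and nothing here is mach code.

```python
import os
import subprocess
import sys


def child_writes_bytecode(env_value):
    """Start a child interpreter and report whether it would write .pyc files.

    env_value is the value given to PYTHONDONTWRITEBYTECODE, or None to
    leave the variable unset in the child.
    """
    env = dict(os.environ)
    env.pop("PYTHONDONTWRITEBYTECODE", None)
    if env_value is not None:
        env["PYTHONDONTWRITEBYTECODE"] = env_value
    out = subprocess.check_output(
        [sys.executable, "-c", "import sys; print(sys.dont_write_bytecode)"],
        env=env,
    )
    # The child prints "True" when bytecode writing is disabled.
    return out.decode().strip() == "False"
```

Setting the variable only stops new .pyc files from appearing; files that already exist in the tree stay where they are.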

The issue is the list of files includes a *, and so will pick up any files in those directories. The logic for doing that is here. It might make sense to take into account ignored files[1] or look at version control rather than the filesystem (maybe with logic like this).

This is similar to Bug 1580622 where we encountered both the issue in Bug 1635852 and the issue here (though in automation rather than locally). We do (since that bug) set PYTHONDONTWRITEBYTECODE in the decision task for exactly this reason.

[1] We'd probably want to be careful and include files that match ignore patterns but are in version-control
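
To illustrate the failure mode described above, here is a rough sketch under stated assumptions. hash_patterns is a hypothetical stand-in for the real hashing logic, which is not reproduced in this bug, and the ** pattern style is assumed. The point is only that a digest computed from whatever the filesystem contains changes as soon as an untracked or ignored file shows up under a matched directory.

```python
import glob
import hashlib
import os


def hash_patterns(root, patterns):
    """Hash every file matched by the glob patterns, relative to root.

    Hypothetical illustration: it reads the filesystem directly and knows
    nothing about hg or git, so ignored files are hashed like any other.
    """
    digest = hashlib.sha256()
    matched = set()
    for pattern in patterns:
        matched.update(glob.glob(os.path.join(root, pattern), recursive=True))
    for path in sorted(matched):
        if os.path.isfile(path):
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode())
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()
```

With a digest like this, a local tree that contains a stray __pycache__ directory computes a different value from automation, so the lookup keyed on that digest finds nothing.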

Flags: needinfo?(mozilla)
See Also: → 1637042

I'm throwing a patch out there to augment the error message here as suggested in comment 14. I'm also adding the leave-open keyword here because it seems like the actual underlying cause of this bug is something that should be fixed (and probably not by me because it seems more like a TC thing rather than a build system thing).

RE: comment 15, am I understanding it correctly that we're enumerating every file in some set of subdirectories, including ignored files such as .pyc files, and incorporating them into a hash, which is what's causing these cache misses? In that case, yeah, I strongly support working directly with version control to accomplish this sort of thing (and frankly I'm surprised that's not what it's already doing). git (and, I assume, hg as well) would make this all pretty trivial.

Keywords: leave-open

The current error message leaves you with basically no recourse besides filing a bug if you're already at the latest HEAD. Meanwhile, mach clobber will fix it but in doing so you're taking a very blunt sledgehammer to the problem. Instead, I've updated this error message to tell you you can mach clobber python. I also removed the explicit reference to "artifact builds" because you can encounter this error outside of artifact builds as well. Finally, I added another reminder that mach bootstrap and mach artifact don't work for old revisions of central because I keep getting bugs about it and more screaming about how it's unsupported can't hurt.

Attachment #9147410 - Attachment description: Bug 1636797 - Improve error messgae when artifacts cannot be downloaded → Bug 1636797 - Improve error message when artifacts cannot be downloaded
Assignee: nobody → rstewart
Status: NEW → ASSIGNED

Thanks Ricky. That looks good to me.

Flags: needinfo?(gbrown)
Pushed by rstewart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5597d830b70f
Improve error message when artifacts cannot be downloaded r=froydnj
Summary: mach bootstrap fails for android builds on mac → mach bootstrap fails install minidump stackwalk
Summary: mach bootstrap fails install minidump stackwalk → mach bootstrap fails installing minidump stackwalk

(In reply to Geoff Brown [:gbrown] from comment #11)

(In reply to Geoff Brown [:gbrown] from comment #10)

Oh, maybe https://bugzilla.mozilla.org/show_bug.cgi?id=1635852#c5?

Yes, bc and calixte report that variations on that fixed this problem for them.

calixte needed to delete toolkit/crashreporter/__pycache__.

What is this __pycache__ thing? It looks like mach clobber python should delete that too, but I'd like to understand what it is before I write a patch for it. :)

(In reply to Nathan Froyd [:froydnj] from comment #21)

(In reply to Geoff Brown [:gbrown] from comment #11)

(In reply to Geoff Brown [:gbrown] from comment #10)

Oh, maybe https://bugzilla.mozilla.org/show_bug.cgi?id=1635852#c5?

Yes, bc and calixte report that variations on that fixed this problem for them.

calixte needed to delete toolkit/crashreporter/__pycache__.

What is this __pycache__ thing? It looks like mach clobber python should delete that too, but I'd like to understand what it is before I write a patch for it. :)

https://stackoverflow.com/questions/16869024/what-is-pycache

It's just the folder that contains .pyc files. mach clobber python will clean out all the __pycache__ folders -- adding another bit of logic to additionally delete folders named __pycache__ would be overkill. :)

rickystewart-a5lvdq:mozilla-unified rickystewart$ for D in `find . -type d -name __pycache__`; do find $D -type f; done | rev | cut -d. -f1 | rev | sort | uniq
pyc

(In reply to Ricky Stewart from comment #16)

RE: comment 15, am I understanding it correctly that we're enumerating every file in some set of subdirectories, including ignored files such as .pyc files, and incorporating them into a hash, which is what's causing these cache misses?

Yes. In the case of this bug, for example, all these paths are directories.

In that case, yeah, I strongly support working directly with version control to accomplish this sort of thing (and frankly I'm surprised that's not what it's already doing). git (and, I assume, hg as well) would make this all pretty trivial.

Yeah, I think that would be a win (I mentioned that in both the bugs I linked). We do have a finder implementation that looks at the vcs rather than the filesystem that we could use. One thing to be mindful of is supporting comm-central tasks that use this (not sure if there are any currently).
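
As a rough sketch of what asking version control instead of the filesystem could look like (this is not the existing finder implementation mentioned above; the helper names list_command, split_nul, and tracked_files are invented for illustration): both hg files and git ls-files list only tracked files, so ignored files such as .pyc never enter the picture.

```python
import os
import subprocess


def list_command(repo_root, subdir):
    """Pick the command that lists tracked files under subdir, NUL-separated."""
    if os.path.isdir(os.path.join(repo_root, ".hg")):
        return ["hg", "files", "-0", subdir]
    if os.path.exists(os.path.join(repo_root, ".git")):
        return ["git", "ls-files", "-z", "--", subdir]
    raise RuntimeError("no .hg or .git found in %s" % repo_root)


def split_nul(raw):
    """Split NUL-separated command output into a sorted list of paths."""
    return sorted(p for p in raw.decode("utf-8").split("\0") if p)


def tracked_files(repo_root, subdir):
    """Return the files the VCS tracks under subdir."""
    raw = subprocess.check_output(list_command(repo_root, subdir), cwd=repo_root)
    return split_nul(raw)
```

Locally modified tracked files would still be listed and hashed with their modified contents, so this addresses ignored and untracked files only.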

(In reply to Ricky Stewart from comment #22)

(In reply to Nathan Froyd [:froydnj] from comment #21)

What is this __pycache__ thing? It looks like mach clobber python should delete that too, but I'd like to understand what it is before I write a patch for it. :)

https://stackoverflow.com/questions/16869024/what-is-pycache

It's just the folder that contains .pyc files. mach clobber python will clean out all the __pycache__ folders -- adding another bit of logic to additionally delete folders named __pycache__ would be overkill. :)

Comment 11 suggests that, at least in some cases, removing __pycache__ is necessary; bug 1637034 comment 5 suggests much the same thing. Maybe there's a different explanation, though?

I haven't seen anyone say that mach clobber python didn't fix the problem but deleting __pycache__ folders did, including in the comments that you cited. They're basically just two different ways of accomplishing more or less the same thing. So I would want to see someone reproduce the scenario where deleting __pycache__ fixes a problem that mach clobber python didn't before we take that patch.

I don't see any reason why having empty __pycache__ directories around in tree would break anything, either on the Python side or on the TC side (the source doesn't seem to have anything in it that would risk being broken by the presence of empty directories).

(In reply to Ricky Stewart from comment #25)

I haven't seen anyone say that mach clobber python didn't fix the problem but deleting __pycache__ folders did, including in the comments that you cited. They're basically just two different ways of accomplishing more or less the same thing. So I would want to see someone reproduce the scenario where deleting __pycache__ fixes a problem that mach clobber python didn't before we take that patch.

The entirety of bug 1637034 says that somebody tried mach clobber python and that didn't work, whereas nuking toolkit/crashreporter/ did. But that was only on OS X, for whatever reason.

./mach clobber doesn't do ./mach clobber python.
Also after ./mach clobber python ./mach bootstrap still fails. (Windows 10)

rm -rf toolkit/crashreporter/__pycache__ seems to allow downloading minidump_stackwalk.tar.xz again.

.pyc files are ALWAYS ignored and are never relevant when hashing the state of a working tree. It would be better overall to not consult the filesystem directly and go through the VCS to ensure we never try to hash ignored files, but the .pyc files seem to be the main stumbling block and the primary cause of bugs like bug 1636797, so this is a fine stopgap in the meantime.
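
A minimal sketch of the idea behind this stopgap, not the patch itself: skip files with the listed suffixes while walking the tree, so that bytecode caches no longer influence the digest. hash_tree and IGNORED_SUFFIXES are illustrative names.

```python
import hashlib
import os

# Suffixes named in the patch description.
IGNORED_SUFFIXES = (".pyc", ".pyd", ".pyo")


def hash_tree(root):
    """Hash file names and contents under root, skipping the ignored suffixes."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # visit subdirectories in a stable order
        for name in sorted(filenames):
            if name.endswith(IGNORED_SUFFIXES):
                continue
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root).replace(os.sep, "/")
            digest.update(rel.encode() + b"\0")
            with open(path, "rb") as f:
                digest.update(f.read())
    return digest.hexdigest()
```

Any other stray file (an editor backup, a .orig file, a test artifact) still changes the result, which is why this is only a stopgap.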

Attachment #9148038 - Attachment description: Bug 1636797 - Don't include .pyc files in hash in taskgraph → Bug 1636797 - Don't include .pyc, .pyd, or .pyo files in hash in taskgraph
Pushed by rstewart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/465ff2025b7f
Don't include .pyc, .pyd, or .pyo files in hash in taskgraph r=tomprince,glandium

As of today, I updated the tree about an hour ago or so, I got
Could not find artifacts for a toolchain build named linux64-minidump-stackwalk
on linux amd64 version. (It is Debian GNU/Linux if it matters.)
Why I wanted to do this is explained below. (*)

This happens even after I ran |mach clobber python|.
So should I run
rm -rf toolkit/crashreporter/__pycache__ seems to allow downloading minidump_stackwalk.tar.xz again.
as in comment 28?

Well, I did the |rm ...| command, but no, |mach bootstrap| still fails with
Could not find artifacts for a toolchain build named linux64-minidump-stackwalk

(*) Why I wanted to run |mach bootstrap| was this.
Maybe this lack of appropriate binary related to fix-stacks is why I get a series of blank lines in the local log where the stackdump would have been for the last couple of days... I thought something was amiss and wanted to obtain the latest binaries just in case.

TIA

I think we've exhausted the amount of work we can do to fix this at a build system level. With the understanding that this is an issue in TC code I'm moving this over to that Bugzilla component.

Component: Bootstrap Configuration → General
Product: Firefox Build System → Taskcluster

(In reply to ISHIKAWA, Chiaki from comment #33)

As of today, I updated the tree about an hour ago or so, I got
Could not find artifacts for a toolchain build named linux64-minidump-stackwalk
on linux amd64 version. (It is Debian GNU/Linux if it matters.)
Why I wanted to do this is explained below. (*)

This happens even after I ran |mach clobber python|.
So should I run
rm -rf toolkit/crashreporter/__pycache__ seems to allow downloading minidump_stackwalk.tar.xz again.
as in comment 28?

Well, I did the |rm ...| command, but no, |mach bootstrap| still fails with
Could not find artifacts for a toolchain build named linux64-minidump-stackwalk

(*) Why I wanted to run |mach bootstrap| was this.
Maybe this lack of appropriate binary related to fix-stacks is why I get a series of blank lines in the local log where the stackdump would have been for the last couple of days... I thought something was amiss and wanted to obtain the latest binaries just in case.

TIA

So unfortunately, deleting random Python-related stuff in your working directory is not a robust solution to this issue (instead it just fixes things in the majority of circumstances for the majority of people). Any dirty or ignored file, if it's in the right place in tree, is liable to cause this error. I'm in the process of figuring out exactly which use cases/mach commands/etc. are creating these dirty/ignored files and what they are. (I suspect it's some sort of test but I'm not sure which one and can't provide specific advice without knowing that, and the code isn't well-suited at this point to diagnose the problem.)

Checking out a clean copy of central, combined with a properly constructed call to git clean or hg purge to wipe your ignored files, should fix things. So for me that git clean command would look like this (YMMV, I recommend giving this sort of thing a dry run beforehand, you might want to adjust this slightly to avoid wiping files you need):

git clean -dfx --exclude=obj* --exclude=mozconfig

Some people are also fixing this by just giving up and freshly cloning central, but that's a super heavyweight solution to the problem and the above should be sufficient.
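
For the dry run recommended above, a small sketch that wraps git clean -n (which only reports what would be removed) and returns the list for review. The function names are illustrative and the exclude patterns are simply the ones from the command above.

```python
import subprocess


def parse_dry_run(output):
    """Extract paths from the 'Would remove ...' lines of git clean -n output."""
    prefix = "Would remove "
    return [
        line[len(prefix):]
        for line in output.splitlines()
        if line.startswith(prefix)
    ]


def clean_preview(repo_root, excludes=("obj*", "mozconfig")):
    """List what git clean -dx would delete, without deleting anything."""
    cmd = ["git", "clean", "-ndx"] + ["--exclude=" + e for e in excludes]
    out = subprocess.check_output(cmd, cwd=repo_root)
    return parse_dry_run(out.decode("utf-8", "replace"))
```

If the preview contains anything you need, add another exclude pattern before running the real command.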

Assignee: rstewart → nobody
Status: ASSIGNED → NEW

Ricky, can you explain how this is related to TC code? I'm not seeing the link.

Flags: needinfo?(rstewart)
Flags: needinfo?(rstewart)

Ah, thanks, that's the task-graph, not Taskcluster :)

Component: General → Task Configuration
Product: Taskcluster → Firefox Build System

(In reply to Ricky Stewart from comment #34)

I think we've exhausted the amount of work we can do to fix this at a build system level. With the understanding that this is an issue in TC code I'm moving this over to that Bugzilla component.

Comment 29 is still an issue, though.

(In reply to Ricky Stewart from comment #35)

...
So unfortunately, deleting random Python-related stuff in your working directory is not a robust solution to this issue (instead it just fixes things in the majority of circumstances for the majority of people). Any dirty or ignored file, if it's in the right place in tree, is liable to cause this error. I'm in the process of figuring out exactly which use cases/mach commands/etc. are creating these dirty/ignored files and what they are. (I suspect it's some sort of test but I'm not sure which one and can't provide specific advice without knowing that, and the code isn't well-suited at this point to diagnose the problem.)

Checking out a clean copy of central, combined with a properly constructed call to git clean or hg purge to wipe your ignored files, should fix things. So for me that git clean command would look like this (YMMV, I recommend giving this sort of thing a dry run beforehand, you might want to adjust this slightly to avoid wiping files you need):

git clean -dfx --exclude=obj* --exclude=mozconfig

I followed your advice, but used hg:
I used the following.

# Dry run:
hg purge -p      # to print out the files to be deleted. 
                         # Noticed some unexpected files under the mozilla's top directory. Saved a few.
                         # hg purge -p | grep -v /   
                         # helped me to identify such top-level files.
                         # Also noticed a lot of __pycache__ directories were removed.
# For real:
hg purge          #  really removed it.

Voila, finally I could install linux64-minidump-stackwalk, and not only that |mach bootstrap| installed a few more binaries I was not aware of.
Maybe it did install the earlier versions of these tools, but I was not paying attention. I added the proper PATH to these additional tools just in case. (I use GCC, gcc-9, to build TB and so some tools are not used at all, I suspect.)

Some people are also fixing this by just giving up and freshly cloning central, but that's a super heavyweight solution to the problem and the above should be sufficient.

|hg purge| worked perfectly for me.

Thank you.

(In reply to ISHIKAWA, Chiaki from comment #40)

I hasten to add:

  • I build binaries locally for TB.
  • I issue |mach bootstrap| and the above |hg purge| while my current working directory is
    |./mozilla|, the top directory of my local M-C source tree.
  • My obj directory is outside the M-C/C-C tree.
    So I did not have to exclude obj* directories, and my mozconfig for local TB is under
    mozilla/comm and it is NOT removed by the |hg purge| above.
    Thus I did not have to exclude it either.
    This is because |hg purge| issued under the ./mozilla directory only purges files under the control of
    the M-C hg repository.
    comm is under the C-C hg repository.
    We have TWO separate repositories for development of TB.

So the reader may want to check what |hg purge -p| lists carefully, and save files if necessary.
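
To make that check easier, here is a small sketch that does the same job as the |hg purge -p | grep -v /| step above and also counts entries per top-level directory. Feed it the text printed by |hg purge -p| (or |hg purge -p --all| if ignored files should be listed too). The function name is made up for illustration, and it assumes forward slashes in the listing.

```python
from collections import Counter


def summarize_purge_listing(listing):
    """Split a purge listing into top-level files and per-directory counts."""
    top_level = []
    per_dir = Counter()
    for line in listing.splitlines():
        path = line.strip()
        if not path:
            continue
        if "/" in path:
            # Count the entry under its first path component.
            per_dir[path.split("/", 1)[0]] += 1
        else:
            top_level.append(path)
    return top_level, per_dir
```

The top-level list is where hand-made files such as a local mozconfig tend to show up, so that is the part to read most carefully.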

Assignee: nobody → gbrown
Status: NEW → ASSIGNED
Assignee: gbrown → nobody
Status: ASSIGNED → NEW
Pushed by gbrown@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7737a0e28618
Purge __pycache__ in 'mach clobber python'; r=firefox-build-system-reviewers,rstewart
Assignee: nobody → gbrown
Status: NEW → ASSIGNED
Assignee: gbrown → nobody
Status: ASSIGNED → NEW
Duplicate of this bug: 1637034
Duplicate of this bug: 1643278
Summary: mach bootstrap fails installing minidump stackwalk → mach bootstrap fails installing minidump stackwalk or other artifacts ('Could not find artifacts for a toolchain build named...')

I'm getting this error for linux64-minidump-stackwalk with a tree I just updated. I tried ./mach clobber python, I tried manually deleting every __pycache__ or *.pyc in the srcdir, I tried clobbering the objdir completely, and I tried git clean -df. (git clean -ndx doesn't show anything interesting that would be gitignored, other than maybe third_party/python/psutil/build.) What am I missing?

./mach bootstrap --no-system-changes is able to make enough progress before the crash that I can still build.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #47)

I'm getting this error for linux64-minidump-stackwalk with a tree I just updated. I tried ./mach clobber python, I tried manually deleting every __pycache__ or *.pyc in the srcdir, I tried clobbering the objdir completely, and I tried git clean -df. (git clean -ndx doesn't show anything interesting that would be gitignored, other than maybe third_party/python/psutil/build.) What am I missing?

./mach bootstrap --no-system-changes is able to make enough progress before the crash that I can still build.

I used hg but I failed with all the workarounds.
My workaround is to check out another clean copy like esr78 and bootstrap there.
(I did not check out central again, because that would leave me with duplicate copies of central.)

As a comparison point, I updated the tree yesterday (M-C, C-C) and |mach bootstrap| worked (Debian GNU/Linux amd64 using the testing repository; my /etc/debian_version contains "bullseye/sid"). But of course, I had deleted the unwanted files as described in comment 40 and comment 41.
I noticed mozilla/python/mozboot/mozboot/debian.py has been changed lately, and that also might have helped in my case.

(In reply to Jed Davis [:jld] ⟨⏰|UTC-6⟩ ⟦he/him⟧ from comment #47)

./mach bootstrap --no-system-changes is able to make enough progress before the crash that I can still build.

This actually isn't great, because it crashes before it gets to the tools for building Gecko.

As a workaround, I can edit out the ensure_minidump_stackwalk_packages call from python/mozboot/mozboot/bootstrap.py.

But something closer to a fix is to make a clone of the local checkout (like cd .. && git clone gecko-dev scratch-gecko-dev; this doesn't use the network and is relatively fast), cd into that, copy the mozconfig and do ./mach artifact toolchain --bootstrap --from-build linux64-minidump-stackwalk from there. That downloads the .tar.xz file (and unpacks it into the wrong place, but that doesn't matter) — and somehow that changed things so that, even if I delete the downloaded file, running mach bootstrap from the original source tree works correctly and redownloads the file if needed.

This also means that I can no longer reproduce this bug (I didn't think to take a snapshot of my home volume first), so I can't get any more information unless it happens again.

I think I know part of why I'd been having so much trouble with this, and probably the cause of the weird result in my last comment: one of my topic branches has some minor changes in crash reporter code that I'd forgotten about. (The earlier comments in this bug were all talking about stale files or Python caches, and it didn't occur to me to look for changes in version control itself.)

Specifically, the “regressing commit” touched toolkit/crashreporter/breakpad-client/linux/handler/exception_handler.cc, which isn't directly in any of these directories, but it is recursively underneath toolkit/crashreporter. But in that case, I don't understand why several (nested) subdirectories of toolkit/crashreporter are also included. Maybe it doesn't need to depend on all of those files?
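
A quick way to check whether a local commit falls under one of the hashed directories, as a sketch under assumptions: the list of resource directories passed in is whatever the toolchain definition names, and only toolkit/crashreporter is taken from this bug. touches_resources is an illustrative helper, not existing code.

```python
from pathlib import PurePosixPath


def touches_resources(changed_path, resource_dirs):
    """Return True if changed_path lies anywhere beneath one of resource_dirs."""
    changed = PurePosixPath(changed_path)
    return any(PurePosixPath(d) in changed.parents for d in resource_dirs)
```

Running this over the paths reported by hg status or git diff --name-only against central would flag local changes like the one described above, which stale-file cleanup cannot fix.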

Duplicate of this bug: 1648668

What is the status here? It still fails for me locally on win10x64 (current m-c).

Flags: needinfo?(gbrown)

I'm not actively working on this bug and I don't think anyone else is.

Are you seeing the error message added in https://phabricator.services.mozilla.com/D74732? Are you running 'mach bootstrap' from a fresh, current checkout of mozilla-central? Have you tried running 'mach clobber python'? If so, we might be able to discover more about this problem if you ran something like 'mach artifact toolchain --verbose --from-build=<artifact name that is failing>' and showed us your log.

Flags: needinfo?(gbrown)

(In reply to Geoff Brown [:gbrown] from comment #54)

I'm not actively working on this bug and I don't think anyone else is.

Are you seeing the error message added in https://phabricator.services.mozilla.com/D74732?

yes

Are you running 'mach bootstrap' from a fresh, current checkout of mozilla-central?

from an up-to-date, but not fresh

Have you tried running 'mach clobber python'?

yes

If so, we might be able to discover more about this problem if you ran something like 'mach artifact toolchain --verbose --from-build=<artifact name that is failing>' and showed us your log.

will do.

Flags: needinfo?(honzab.moz)

The output of hg status and hg status -i might also be informative.

$ ./mach artifact toolchain --verbose --from-build=win32-minidump-stackwalk
 0:03.90 Searching for public/build/minidump_stackwalk.tar.xz in ['gecko.cache.level-3.toolchains.v3.win32-minidump-stackwalk.hash.5c51c53de87557c966be2c53090bd5b2f7cfd6ea95a1997ed89ec58538310bc1']
 0:04.47 Could not find artifacts for a toolchain build named `win32-minidump-stackwalk`. Local commits, dirty/stale files, and other changes in your checkout may cause this error. Make sure you are on a fresh, current checkout of mozilla-central. If you are already, you may be able to avoid this error by running `mach clobber python`. Beware that commands like `mach bootstrap` and `mach artifact` are unlikely to work on any versions of the code besides recent revisions of mozilla-central.

not very informative IMO.

In contrast, I see:

$ ./mach artifact toolchain --verbose --from-build=win32-minidump-stackwalk
 0:13.18 Searching for public/build/minidump_stackwalk.tar.xz in ['gecko.cache.level-3.toolchains.v3.win32-minidump-stackwalk.hash.58753a0dd0a3379a08feec2037ea8f58c644adb65c97a6b8dd6630774b6245a4']
 0:13.37 Found public/build/minidump_stackwalk.tar.xz in D1hMP1fPTXex5jhYljvaiQ
 0:13.37 attempt 1/5
 0:13.84 Setting up artifact minidump_stackwalk.tar.xz
 0:13.84 attempt 1/5
 0:13.84 Downloading artifact to local cache: /home/gbrown/.mozbuild/toolchains/0bd29b8d8c59a1cf-minidump_stackwalk.tar.xz
 0:13.97 Downloading... 0.0 %
 0:13.98 Downloading... 7.1 %
...
 0:14.20 Downloading... 96.1 %
 0:14.20 Downloading... 100.0 %
 0:14.21 hashed '/home/gbrown/.mozbuild/toolchains/0bd29b8d8c59a1cf-minidump_stackwalk.tar.xz' with sha256 to be 949428257f29c22e846108587ab6e36fcfb1070b8855d1d8b162097226f10671
 0:14.21 rm tree: /home/gbrown/src/minidump_stackwalk
 0:14.22 untarring "/home/gbrown/src/minidump_stackwalk.tar.xz"

I note the differing hash and suspect that's where this runs into trouble, but I don't have any insight into the relevant code.
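
To compare the two "Searching for ..." lines mechanically, here is a small sketch that splits an index path at its .hash. component. It assumes only the shape visible in the logs above (a name followed by .hash.<hex digest>); the function names are invented for illustration.

```python
def index_digest(index_path):
    """Return (name, digest) for an index path ending in .hash.<digest>."""
    name, sep, digest = index_path.rpartition(".hash.")
    if not sep:
        raise ValueError("no '.hash.' component in %r" % index_path)
    return name, digest


def same_toolchain_different_digest(a, b):
    """True when both paths name the same toolchain but carry different digests."""
    name_a, digest_a = index_digest(a)
    name_b, digest_b = index_digest(b)
    return name_a == name_b and digest_a != digest_b
```

For the two logs in this bug the names match and the digests differ, which is consistent with the failing tree hashing different local content.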

I'm facing this on macOS and Linux. mach clobber python is not enough, mach clobber is not enough. find . -name "*.pyc" and hg purge --all seem to have made this work.

I'm seeing logs similar to Geoff's in comment 57 and 58.

Attached file `hg st -i` logs
$./mach clobber
$./mach clobber python
$hg st -i > bootstrap-failure-before.txt
$./mach bootstrap

Note on Artifact Mode:

Artifact builds download prebuilt C++ components rather than building
them locally. Artifact builds are faster!

Artifact builds are recommended for people working on Firefox or
Firefox for Android frontends, or the GeckoView Java API. They are unsuitable
for those working on C++ code. For more information see:
https://developer.mozilla.org/en-US/docs/Artifact_builds.

Please choose the version of Firefox you want to build:
  1. Firefox for Desktop Artifact Mode
  2. Firefox for Desktop
  3. GeckoView/Firefox for Android Artifact Mode
  4. GeckoView/Firefox for Android
Your choice: 2
Your version of Mercurial (5.0.1) is sufficiently modern.
Your version of Python (2.7.15) is new enough.
Your version of Rust (1.44.1) is new enough.
Rust supports aarch64-linux-android, i686-linux-android, i686-pc-windows-msvc, thumbv7neon-linux-androideabi, x86_64-linux-android, x86_64-pc-windows-msvc targets.

Mozilla recommends a number of changes to Mercurial to enhance your
experience with it.

Would you like to run a configuration wizard to ensure Mercurial is
optimally configured? (Yn): n
 0:05.57 Setting up artifact node.tar.bz2
 0:05.57 Using artifact from local cache: c:\Users\xmayhemer\.mozbuild\toolchains\97a30da439a065e2-node.tar.bz2
 0:05.65 rm tree: c:\Users\xmayhemer\.mozbuild\node
 0:06.72 untarring "c:\Users\xmayhemer\.mozbuild\node.tar.bz2"
 0:04.94 Setting up artifact fix-stacks.tar.bz2
 0:04.94 Using artifact from local cache: c:\Users\xmayhemer\.mozbuild\toolchains\7857866b42f45825-fix-stacks.tar.bz2
 0:04.98 rm tree: c:\Users\xmayhemer\.mozbuild\fix-stacks
 0:04.98 untarring "c:\Users\xmayhemer\.mozbuild\fix-stacks.tar.bz2"
 0:04.38 Could not find artifacts for a toolchain build named `win32-minidump-stackwalk`. Local commits, dirty/stale files, and other changes in your checkout may cause this error. Make sure you are on a fresh, current checkout of mozilla-central. If you are already, you may be able to avoid this error by running `mach clobber python`. Beware that commands like `mach bootstrap` and `mach artifact` are unlikely to work on any versions of the code besides recent revisions of mozilla-central.
Error running mach:

    ['bootstrap']

The error occurred in code that was called by the mach command. This is either
a bug in the called code itself or in the way that mach is calling it.
You can invoke |./mach busted| to check if this issue is already on file. If it
isn't, please use |./mach busted file bootstrap| to report it. If |./mach busted| is
misbehaving, you can also inspect the dependencies of bug 1543241.

If filing a bug, please include the full output of mach, including this error
message.

The details of the failure are as follows:

subprocess.CalledProcessError: Command '['c:\\Mozilla\\mozilla-build\\python3\\python3.exe', 'c:\\Mozilla\\src\\mozilla-central3\\mach', 'artifact', 'toolchain', '--bootstrap', '--from-build', 'win32-minidump-stackwalk']' returned non-zero exit status 1.

  File "c:\Mozilla\src\mozilla-central3\python/mozboot/mozboot/mach_commands.py", line 46, in bootstrap
    bootstrapper.bootstrap()
  File "c:\Mozilla\src\mozilla-central3\python/mozboot\mozboot\bootstrap.py", line 565, in bootstrap
    checkout_root)
  File "c:\Mozilla\src\mozilla-central3\python/mozboot\mozboot\bootstrap.py", line 414, in maybe_install_private_packages_or_exit
    self.instance.ensure_minidump_stackwalk_packages(state_dir, checkout_root)
  File "c:\Mozilla\src\mozilla-central3\python/mozboot\mozboot\mozillabuild.py", line 194, in ensure_minidump_stackwalk_packages
    minidump_stackwalk.WINDOWS_MINIDUMP_STACKWALK)
  File "c:\Mozilla\src\mozilla-central3\python/mozboot\mozboot\base.py", line 353, in install_toolchain_artifact
    subprocess.check_call(cmd, cwd=state_dir)
  File "c:\Mozilla\mozilla-build\python3\lib\subprocess.py", line 291, in check_call
    raise CalledProcessError(retcode, cmd)
Sentry is attempting to send 0 pending error messages
Waiting up to 2 seconds
Press Ctrl-Break to quit

$hg st -i > bootstrap-failure-after.txt
Flags: needinfo?(honzab.moz)

Deleting third_party/python/psutil-cp27-none-win_amd64/psutil/__pycache__ and python/mozbuild/mozbuild/__pycache__ didn't help.

I would guess that the files under toolkit/, at least, would be having an effect here.

(In reply to Nathan Froyd [:froydnj] from comment #62)

I would guess that the files under toolkit/, at least, would be having an effect here.

If you mean toolkit/components/telemetry/tests/marionette/__pycache__/mach_commands.cpython-36.pyc there was no .pyc file before I ran bootstrap.

I thought the relevant files were those listed at https://searchfox.org/mozilla-central/rev/5a4aaccb28665807a6fd49cf48367d47fbb5a19a/taskcluster/ci/toolchain/minidump_stackwalk.yml#16-25; if that's the case, the intersection with Honza's list is (mfbt/mfbt.vcxproj, mfbt/mfbt.vcxproj.filters, mfbt/mfbt.vcxproj.user), I think.

Could/should the list of filename extensions excluded from hashing be expanded to include .vcxproj and its variants? Any better ideas?

https://searchfox.org/mozilla-central/rev/5a4aaccb28665807a6fd49cf48367d47fbb5a19a/taskcluster/taskgraph/util/hash.py#43

(In reply to Geoff Brown [:gbrown] from comment #65)

Could/should the list of filename extensions excluded from hashing be expanded to include .vcxproj and its variants? Any better ideas?

https://searchfox.org/mozilla-central/rev/5a4aaccb28665807a6fd49cf48367d47fbb5a19a/taskcluster/taskgraph/util/hash.py#43

Note: .vcxproj files aren't in .{git,hg}ignore, but they probably should be, huh?

I mean, yes, we should do this since it looks like another band-aid that might buy us some time. But it's just another band-aid.

Duplicate of this bug: 1653661

This is pretty broadly impacting a bunch of people's development flows, and really needs to be fixed properly.

Severity: -- → S2
Priority: -- → P1
elif conditions.is_git(self):
    cmd = ['git', 'clean', '-d', '-f', '-x', '*.py[cdo]', '*/__pycache__',
           'third_party/python/']
$ git clean -nxd */__pycache__
Would remove build/__pycache__/
Would remove config/__pycache__/
Would remove python/__pycache__/
Would remove remote/__pycache__/
Would remove security/__pycache__/
Would remove taskcluster/__pycache__/
Would remove testing/__pycache__/
Would remove tools/__pycache__/

Oops:

$ git clean -nxd
Would remove .mozconfig
Would remove accessible/xpcom/__pycache__/
Would remove browser/app/winlauncher/freestanding/__pycache__/
Would remove browser/locales/__pycache__/
Would remove build/__pycache__/
Would remove build/valgrind/__pycache__/
Would remove config/__pycache__/
Would remove config/external/ffi/__pycache__/
Would remove configure
Would remove devtools/shared/css/generated/__pycache__/
Would remove devtools/shared/webconsole/__pycache__/
Would remove dom/base/__pycache__/
Would remove dom/bindings/__pycache__/
Would remove dom/bindings/mozwebidlcodegen/__pycache__/
Would remove dom/bindings/parser/__pycache__/
Would remove dom/encoding/__pycache__/
Would remove gfx/layers/d3d11/__pycache__/
Would remove intl/locale/__pycache__/
[...]

E.g.:

$ git clean -nxd */*/__pycache__
Would remove accessible/xpcom/__pycache__/
Would remove browser/locales/__pycache__/
Would remove build/valgrind/__pycache__/
Would remove dom/base/__pycache__/
Would remove dom/bindings/__pycache__/
Would remove dom/encoding/__pycache__/
Would remove intl/locale/__pycache__/
Would remove layout/generic/__pycache__/
Would remove layout/style/__pycache__/
Would remove media/libdav1d/__pycache__/
Would remove mobile/android/__pycache__/
Would remove mozglue/dllservices/__pycache__/
Would remove netwerk/dns/__pycache__/
Would remove python/safety/__pycache__/
Would remove security/apps/__pycache__/
Would remove taskcluster/taskgraph/__pycache__/
Would remove testing/awsy/__pycache__/
Would remove testing/condprofile/__pycache__/
Would remove testing/firefox-ui/__pycache__/
Would remove testing/geckodriver/__pycache__/
Would remove testing/marionette/__pycache__/
Would remove testing/mochitest/__pycache__/
Would remove testing/mozharness/__pycache__/
Would remove testing/raptor/__pycache__/
Would remove testing/talos/__pycache__/
Would remove testing/tps/__pycache__/
Would remove testing/web-platform/__pycache__/
Would remove testing/xpcshell/__pycache__/
Would remove toolkit/library/__pycache__/
Would remove toolkit/locales/__pycache__/
Would remove tools/browsertime/__pycache__/
Would remove tools/compare-locales/__pycache__/
Would remove tools/lint/__pycache__/
Would remove tools/moztreedocs/__pycache__/
Would remove tools/phabricator/__pycache__/
Would remove tools/power/__pycache__/
Would remove tools/tryselect/__pycache__/
Would remove tools/vcs/__pycache__/
Would remove xpcom/base/__pycache__/
Would remove xpcom/build/__pycache__/
Would remove xpcom/components/__pycache__/
Would remove xpcom/ds/__pycache__/

Should probably be **/__pycache__.

It also feels to me like we should exclude paths containing /__pycache__/ when computing the hash.
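
A minimal sketch of this exclusion, assuming the hash is computed over a list of repo-relative, forward-slash paths. The helper name is_hashable is hypothetical and is not the actual code in taskcluster/taskgraph/util/hash.py; the suffix set simply mirrors the *.py[cdo] pattern used by mach clobber python.

```python
from pathlib import PurePosixPath

# Hypothetical helper (not the real hash.py code): decide whether a
# repo-relative path should contribute to the toolchain hash.
def is_hashable(path):
    p = PurePosixPath(path)
    # Skip anything that lives inside a __pycache__ directory at any depth.
    if "__pycache__" in p.parts:
        return False
    # Skip bytecode-style files, mirroring the *.py[cdo] clean pattern.
    return p.suffix not in {".pyc", ".pyd", ".pyo"}


paths = [
    "taskcluster/taskgraph/util/hash.py",
    "taskcluster/taskgraph/__pycache__/hash.cpython-36.pyc",
    "build/foo.pyc",
]
kept = [p for p in paths if is_hashable(p)]
```

Checking path components rather than doing a substring search avoids accidentally excluding a file whose name merely contains the text __pycache__.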

Assignee: nobody → rstewart
Status: NEW → ASSIGNED
Assignee: rstewart → nobody
Status: ASSIGNED → NEW
Assignee: nobody → rstewart
Status: NEW → ASSIGNED

(In reply to Mike Hommey [:glandium] from comment #71)

It also feels to me like we should exclude paths containing /__pycache__/ when computing the hash.

Yes definitely.

(In reply to Mike Hommey [:glandium] from comment #70)

Should probably be **/__pycache__.

This doesn't change the results on my machine.

Use PYTHONDONTWRITEBYTECODE=1 by default with mach.
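
As a rough illustration of what this setting does (this is not the attached patch): Python skips writing .pyc files when PYTHONDONTWRITEBYTECODE is set to a non-empty string, and the same switch is visible at runtime as sys.dont_write_bytecode. The helper name run_without_bytecode is hypothetical.

```python
import os
import subprocess
import sys

# Hypothetical helper: run a Python child process that will not write
# __pycache__/*.pyc files into the source tree.
def run_without_bytecode(args):
    env = dict(os.environ, PYTHONDONTWRITEBYTECODE="1")
    return subprocess.run(
        [sys.executable] + list(args),
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )


# The child reports the flag that the environment variable switched on.
result = run_without_bytecode(
    ["-c", "import sys; print(sys.dont_write_bytecode)"]
)
flag = result.stdout.strip()
```

This only prevents new bytecode files from appearing; files already written to the tree would still need to be cleaned up separately.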

I've attached a patch that fixes clobber-python, but I think now hashes fail because we're missing files.
The real fix is to not hash irrelevant files (maybe all git-ignored files?).

(In reply to Jeff Gilbert [:jgilbert] from comment #73)

(In reply to Mike Hommey [:glandium] from comment #71)

It also feels to me like we should exclude paths containing /__pycache__/ when computing the hash.

Yes definitely.

(In reply to Mike Hommey [:glandium] from comment #70)

Should probably be **/__pycache__.

This doesn't change the results on my machine.

**/__pycache__/* does.

*/__pycache__/* does too, actually.

Oh ok so git clean -nxd "**/__pycache__/" (which is what the python calls) works but git clean -nxd **/__pycache__/ from my console does not! WFM
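
The gap between a single-level * and a recursive ** can be reproduced with Python's pathlib. This is only an analogy: shell expansion and git pathspec matching follow their own rules, and the directory names below are just examples taken from the listings above.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Build a tiny tree with __pycache__ directories at three depths.
    for d in (
        "build/__pycache__",
        "dom/base/__pycache__",
        "dom/bindings/parser/__pycache__",
    ):
        (root / d).mkdir(parents=True)

    def rel(pattern):
        # Return matches relative to the root, in a stable order.
        return sorted(p.relative_to(root).as_posix() for p in root.glob(pattern))

    one_level = rel("*/__pycache__")      # exactly one directory above
    two_levels = rel("*/*/__pycache__")   # exactly two directories above
    recursive = rel("**/__pycache__")     # any depth
```

Here one_level and two_levels each find a single directory, while recursive finds all three, which matches the pattern seen in the git clean dry runs: each extra */ only reaches one level deeper.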

Attachment #9165224 - Attachment description: Bug 1636797 - Use ** rather than * when deleting files from `mach clobber python` → Bug 1636797 - Tweak `git clean` call in `mach clobber python`
Pushed by rstewart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/57e33fc3bce2
Tweak `git clean` call in `mach clobber python` r=mhentges,jgilbert,froydnj
Attachment #9165229 - Attachment is obsolete: true
Assignee: rstewart → nobody
Status: ASSIGNED → NEW

I have done hg purge --all and I'm still getting this error. I'm on central with no other commits or changes.

(In reply to Paul Bone [:pbone] from comment #81)

I have done hg purge --all and I'm still getting this error. I'm on central with no other commits or changes.

Okay, a fresh checkout seems to work, but IDK what I could have missed.

FWIW you can use git check-ignore ... to query Git to see if the specified files are ignored; I bet that Mercurial has a similar command.
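
A small sketch of scripting that query from Python. The function names are hypothetical; the only Git behaviour relied on is that git check-ignore prints the ignored paths one per line and exits with 0 when at least one path is ignored, 1 when none are, and another status on error.

```python
import subprocess

# Hypothetical helper: build the `git check-ignore` invocation.
def check_ignore_command(paths):
    # `--` separates options from paths so odd file names are not
    # mistaken for flags.
    return ["git", "check-ignore", "--"] + list(paths)


# Hypothetical helper: return the subset of `paths` that Git ignores.
def ignored_paths(paths, cwd="."):
    proc = subprocess.run(
        check_ignore_command(paths),
        cwd=cwd,
        capture_output=True,
        text=True,
    )
    # 0: at least one path is ignored; 1: none are ignored.
    # Anything else is an error (for example, not a Git repository).
    if proc.returncode not in (0, 1):
        raise RuntimeError(proc.stderr.strip())
    return proc.stdout.splitlines()
```

For example, ignored_paths(["build/__pycache__/foo.pyc", "mach"]) run from the top of a Git checkout would list only the entries Git treats as ignored. Paths with unusual characters may come back quoted, so this sketch is best suited to ordinary file names.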

Still happening.

Just wasted 1h: After updating Rust, ./mach bootstrap failed; Tried successively: ./mach clobber, ./mach clobber python, manually deleting __pycache__ directories and others, hg purge --all, none helped; had to hg clone a fresh repo. And I'll probably waste a bit more time importing my WIP patches from the old tree...

I'll keep the broken tree around, in case someone would like me to test things. I'm on Windows 10.

(In reply to Gerald Squelart [:gerald] (he/him) from comment #84)

Still happening.

Just wasted 1h: After updating Rust, ./mach bootstrap failed; Tried successively: ./mach clobber, ./mach clobber python, manually deleting __pycache__ directories and others, hg purge --all, none helped; had to hg clone a fresh repo. And I'll probably waste a bit more time importing my WIP patches from the old tree...

I'll keep the broken tree around, in case someone would like me to test things. I'm on Windows 10.

Please provide/upload the results of hg status -i post clobbering/manually removing things/etc.

Flags: needinfo?(gsquelart)

$ hg purge --all
warning: testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/kindle/shasta/photos/icon_newwindow.V209801570.jpg%5C%22.html cannot be removed
warning: testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/kindle/turing/btn-add-to-cart-md-pri.V192549178.gif%5C%22.html cannot be removed
warning: testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/kindle/turing/btn-processing-md-st.V192549173.gif%5C%22.html cannot be removed
warning: testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/x-locale/personalization/yourstore/message-bullet.V192186523.gif% cannot be removed
warning: testing/talos/talos/tests/tp5n/bild.de/www.bild.de/code/core,15400948.29-15400978.22-15400980.22-15400986.22-15400984.22-15400960.27-15400952.18-15400958.22-15400950.22-15400966.31-15400974.24-15400968.39-15400962.22.bild.js cannot be removed
warning: testing/talos/talos/tests/tp5n/dailymotion.com/www.facebook.com/plugins/likebox.php@href=http%3A%2F%2Fwww.facebook.com%2Fdailymotionusa&width=300&colorscheme=light&connections=5&stream=false&header=false&height=180.html cannot be removed
warning: testing/talos/talos/tests/tp5n/digg.com/dads.new.digg.com/view.html@kw=zone%3A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%3Apermalink&template=5.html cannot be removed
warning: testing/talos/talos/tests/tp5n/filestube.com/www.facebook.com/plugins/likebox.php@href=http%3A%2F%2Fwww.facebook.com%2Fpages%2FFilesTube-Media-Search-Engine%2F135577699745&width=292&connections=0&stream=false&header=true&heig cannot be removed
warning: testing/talos/talos/tests/tp5n/store.apple.com/metrics.apple.com/b/ss/appleglobal,applestoreWW,applestoreus,applestoreUSconsum/1/H.8--NS/0@AQB=1&pccr=true&vidn=26CFC25C85013A76-4000010A40120A71&pageName=No-Script%3AAOS%3A+hom cannot be removed
warning: testing/talos/talos/tests/tp5n/store.apple.com/storeimages.apple.com/1834/as-images.apple.com/is/image/AppleInc/MC531_GEO_US@wid=90&hei=80&fmt=jpeg&qlt=95&op_sharpen=0&resMode=bicub&op_usm=0.5,0.5,0,0&iccEmbed=0&layer=comp cannot be removed
warning: testing/talos/talos/tests/tp5n/store.apple.com/storeimages.apple.com/1834/as-images.apple.com/is/image/AppleInc/ipad-takeover-hero@wid=619&hei=217&fmt=jpeg&qlt=95&op_sharpen=0&resMode=bicub&op_usm=0.5,0.5,0,0&iccEmbed=0&layer cannot be removed
warning: testing/talos/talos/tests/tp5n/terra.com.br/p2.trrsf.com.br/image/get@o=cf&w=89&h=67&src=http%3A%2F%2Fsdp.terra.com.br%2FThumBox%2Ffree%2Fcnt358895_h90_w120_imagens-mostram-frieza-de-assassino-em-escola-de-realengo.jpg cannot be removed
warning: testing/talos/talos/tests/tp5n/tmall.com/a.tbcdn.cn/s/kissy/1.1.6/index.html@%3Fsuggest%2Fsuggest-pkg-min.js,anim%2Fanim-pkg-min.js,switchable%2Fswitchable-pkg-min.js,datalazyload%2Fdatalazyload-pkg-min.js,ajax%2Fajax-pkg-min cannot be removed
warning: testing/talos/talos/tests/tp5n/yahoo.co.jp/b8.yahoo.co.jp/b@P=23L2vsvY..mh_8X5j6PdByB3RagwF02fgoIADTIT&T=14335av37%2FX=1302299266%2FE=2079181999%2FR=jp_toppage%2FK=5%2FV=3.1%2FW=J%2FY=jp%2FF=4001638655%2FQ=-1%2FS=1%2FJ=16F8D8 cannot be removed

$ hg status -i
I testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/kindle/shasta/photos/icon_newwindow.V209801570.jpg%5C%22.html
I testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/kindle/turing/btn-add-to-cart-md-pri.V192549178.gif%5C%22.html
I testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/kindle/turing/btn-processing-md-st.V192549173.gif%5C%22.html
I testing/talos/talos/tests/tp5n/amazon.com/www.amazon.com/Kindle-Wireless-Reader-Wifi-Graphite/dp/B002Y27P3M/%5C%22http%3A/g-ecx.images-amazon.com/images/G/01/x-locale/personalization/yourstore/message-bullet.V192186523.gif%
I testing/talos/talos/tests/tp5n/bild.de/www.bild.de/code/core,15400948.29-15400978.22-15400980.22-15400986.22-15400984.22-15400960.27-15400952.18-15400958.22-15400950.22-15400966.31-15400974.24-15400968.39-15400962.22.bild.js
I testing/talos/talos/tests/tp5n/dailymotion.com/www.facebook.com/plugins/likebox.php@href=http%3A%2F%2Fwww.facebook.com%2Fdailymotionusa&width=300&colorscheme=light&connections=5&stream=false&header=false&height=180.html
I testing/talos/talos/tests/tp5n/digg.com/dads.new.digg.com/view.html@kw=zone%3A5&kw=mozilla&kw=nice&kw=logo&kw=firefox&kw=mozzilla&kw=new&kw=proposal&kw=really&kw=browser&kw=check&kw=pagetype%3Apermalink&template=5.html
I testing/talos/talos/tests/tp5n/filestube.com/www.facebook.com/plugins/likebox.php@href=http%3A%2F%2Fwww.facebook.com%2Fpages%2FFilesTube-Media-Search-Engine%2F135577699745&width=292&connections=0&stream=false&header=true&heig
I testing/talos/talos/tests/tp5n/store.apple.com/metrics.apple.com/b/ss/appleglobal,applestoreWW,applestoreus,applestoreUSconsum/1/H.8--NS/0@AQB=1&pccr=true&vidn=26CFC25C85013A76-4000010A40120A71&pageName=No-Script%3AAOS%3A+hom
I testing/talos/talos/tests/tp5n/store.apple.com/storeimages.apple.com/1834/as-images.apple.com/is/image/AppleInc/MC531_GEO_US@wid=90&hei=80&fmt=jpeg&qlt=95&op_sharpen=0&resMode=bicub&op_usm=0.5,0.5,0,0&iccEmbed=0&layer=comp
I testing/talos/talos/tests/tp5n/store.apple.com/storeimages.apple.com/1834/as-images.apple.com/is/image/AppleInc/ipad-takeover-hero@wid=619&hei=217&fmt=jpeg&qlt=95&op_sharpen=0&resMode=bicub&op_usm=0.5,0.5,0,0&iccEmbed=0&layer
I testing/talos/talos/tests/tp5n/terra.com.br/p2.trrsf.com.br/image/get@o=cf&w=89&h=67&src=http%3A%2F%2Fsdp.terra.com.br%2FThumBox%2Ffree%2Fcnt358895_h90_w120_imagens-mostram-frieza-de-assassino-em-escola-de-realengo.jpg
I testing/talos/talos/tests/tp5n/tmall.com/a.tbcdn.cn/s/kissy/1.1.6/index.html@%3Fsuggest%2Fsuggest-pkg-min.js,anim%2Fanim-pkg-min.js,switchable%2Fswitchable-pkg-min.js,datalazyload%2Fdatalazyload-pkg-min.js,ajax%2Fajax-pkg-min
I testing/talos/talos/tests/tp5n/yahoo.co.jp/b8.yahoo.co.jp/b@P=23L2vsvY..mh_8X5j6PdByB3RagwF02fgoIADTIT&T=14335av37%2FX=1302299266%2FE=2079181999%2FR=jp_toppage%2FK=5%2FV=3.1%2FW=J%2FY=jp%2FF=4001638655%2FQ=-1%2FS=1%2FJ=16F8D8

I've tried removing these files from Windows Explorer, but they stick around. Too-long path?
It doesn't feel like talos test files should impact bootstrap though!? 🤔

Flags: needinfo?(gsquelart)

Presumably hg status -A just tells you about the ignored files and nothing else?

Flags: needinfo?(gsquelart)

(In reply to Nathan Froyd [:froydnj] from comment #87)

Presumably hg status -A just tells you about the ignored files and nothing else?

It gives a list of all files, including "clean" (committed but unchanged) ones.
I can see the same ignored files as in comment 86, and 283,218 clean files, nothing else.

Oh, an extra thing I did (but didn't write in comment 85) was to reboot the computer, to make sure there were no files kept open by some stray Python program (as sometimes happens).

Flags: needinfo?(gsquelart)

It doesn't feel like talos test files should impact bootstrap though!? 🤔

For clarity: ANY kind of dirty file ANYWHERE in the tree can possibly trigger this bug. It is not necessarily restricted to Python files, non-test files, files unrelated to talos or any other project, or anything else.
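
To see why one stray file matters, here is a simplified sketch. It assumes, as the discussion in this bug suggests, that the toolchain index hash is derived from every file found on disk under a set of directories. hash_tree is a hypothetical stand-in, not the real hash.py, and the file names are only examples.

```python
import hashlib
import tempfile
from pathlib import Path

# Hypothetical stand-in for the real hashing code: fold the relative
# path and contents of every file under `root` into one digest.
def hash_tree(root):
    h = hashlib.sha256()
    for path in sorted(p for p in Path(root).rglob("*") if p.is_file()):
        h.update(path.relative_to(root).as_posix().encode("utf-8"))
        h.update(b"\0")
        h.update(path.read_bytes())
    return h.hexdigest()


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "mfbt").mkdir()
    (root / "mfbt" / "Assertions.h").write_text("// tracked source\n")
    clean = hash_tree(root)

    # An untracked file (for example an IDE project file) appears.
    (root / "mfbt" / "mfbt.vcxproj.user").write_text("<Project/>\n")
    dirty = hash_tree(root)
```

The two digests differ even though no tracked file changed. Since the index path that mach searches embeds this hash, a locally different value points at an entry that was never published, which surfaces as the "Could not find artifacts" error.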

Is there some way to diagnose this and move forward? Add more logs, perhaps? Any hints where to look or what to hack to move on? I can't bootstrap at all. I already cleared all hg ignored files from the tree I could think of.

(In reply to Honza Bambas (:mayhemer) from comment #90)

Is there some way to diagnose this and move forward? Add more logs, perhaps? Any hints where to look or what to hack to move on? I can't bootstrap at all. I already cleared all hg ignored files from the tree I could think of.

What does hg status -ui say? What is your base m-c version?

Flags: needinfo?(honzab.moz)

Can we stop hashing the world? Until we stop hashing the whole tree, we're just playing whack-a-mole with adding exceptions.
What's the benefit of hashing globs of source files instead of embedding a file that says which toolchain to pull?
This is a recurring developer productivity blocker, and we should prioritize addressing it properly.

Flags: needinfo?(rstewart)

I'm not the assignee or triage owner so I'm not sure why I've been needinfo'd.

Flags: needinfo?(rstewart)

If you have a better idea of who to ask, pass the NI along!

Flags: needinfo?(mozilla)

I think we've asked to be able to pull artifact aliases (taskcluster.v3.gecko.linux64-minidump-stackwalk) rather than task names in the past and that idea has been met with "let's not do that". But it does seem like that is the right way out of this problem. Tom?

The right way to solve this is to change https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/util/hash.py#24-49 to use VCS information to limit the files it looks at.

I think we've asked to be able to pull artifact aliases (taskcluster.v3.gecko.linux64-minidump-stackwalk) rather than task names in the past and that idea has been met with "let's not do that".

The reason to not use .latest is that caches aren't restricted by branch (deliberately, so that we don't rebuild things on each branch), so that .latest might end up pointing at an esr78 toolchain, if a change is made to a toolchain there.

Flags: needinfo?(mozilla)

I've attached a proof-of-concept patch, but it takes a bunch of terrible shortcuts.

I've made a clean bundle clone of m-c, pulled/updated from hg.m.o, unset MOZCONFIG just in case, and ran bootstrap. No problems. Time to play the "find 10 differences" game?

Note that I have _obj-* dirs under my normal working trees, which, by the way, use a directory junction to point the dist subdir at a different drive (for build speed). This works very well when building.

Flags: needinfo?(honzab.moz)
Attached file hg st -ui output

(In reply to Nathan Froyd [:froydnj] from comment #91)

What does hg status -ui say? What is your base m-c version?

Note that this is from a tree where bootstrap DOES NOT work, not from the clean checkout.

(apparently, I forgot to press 'y' at the confirmation of uploading the patch, and found it waiting for me this morning)

Depends on: 1658626
Assignee: nobody → rstewart

This resolves a long-standing issue in development where mach artifact (and therefore mach bootstrap) would fail unpredictably if you had dirty, but ignored, files in your checkout. Resolving this problem often required unwieldy hg purge/git clean incantations that are easy to get wrong.

This patch addresses the problem by doing what we "should" have been doing all along, and consulting the VCS to list tracked files rather than listing EVERY file on disk and applying heuristics to determine whether they should be included in the hash.
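
A hedged sketch of the general idea, not the code from this patch: ask the VCS for its tracked files instead of walking the disk. The function names are hypothetical, and the landed patch may well use different commands or options. The sketch relies only on git ls-files -z and hg files -0, both of which print tracked paths separated by NUL bytes.

```python
import subprocess

# Hypothetical helper: the command that lists tracked files for a VCS.
def tracked_files_command(vcs):
    if vcs == "git":
        # -z separates entries with NUL so unusual names survive intact.
        return ["git", "ls-files", "-z"]
    if vcs == "hg":
        # -0 (--print0) is the Mercurial equivalent of NUL separation.
        return ["hg", "files", "-0"]
    raise ValueError("unsupported VCS: %r" % (vcs,))


# Split NUL-separated output into a sorted list of paths.
def parse_tracked_files(output):
    return sorted(p for p in output.split("\0") if p)


# Hypothetical helper: list every tracked file in the checkout at `cwd`.
def list_tracked_files(vcs, cwd="."):
    proc = subprocess.run(
        tracked_files_command(vcs),
        cwd=cwd,
        check=True,
        capture_output=True,
        text=True,
    )
    return parse_tracked_files(proc.stdout)
```

Because ignored and untracked files never appear in this listing, they cannot influence a hash computed from it, regardless of how many exclusion heuristics would otherwise be needed.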

We keep the old FileFinder implementation in case it's useful.

The implementation shunts everything into this one file, rather than adding a new abstraction to mozpack/files.py for this. I tried doing that, but mach artifact is reasonably computation-heavy and involves matching LOTS of patterns against LOTS of files and the existing code in files.py isn't well-suited for doing this in a performant way. In order to keep from regressing performance a great deal, the implementation here caches a lot of stuff and does binary search to find candidate files in the master list of every file in the monorepo where appropriate. It's not trivial, but it's possible to refactor this stuff back into files.py to make everything slightly cleaner and to possibly allow other use cases to take advantage of perf improvements in the common code.
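
The binary-search idea can be sketched as follows. This illustrates the technique under simple assumptions (a sorted list of forward-slash paths, directory-prefix lookups only) and is not the code from the patch; files_under is a hypothetical name.

```python
from bisect import bisect_left

# Hypothetical helper: return the entries of `sorted_files` that live
# under `directory`, using two binary searches instead of a full scan.
def files_under(sorted_files, directory):
    prefix = directory.rstrip("/") + "/"
    lo = bisect_left(sorted_files, prefix)
    # U+10FFFF is the highest code point, so prefix + that character
    # sorts after every ordinary path that starts with the prefix.
    hi = bisect_left(sorted_files, prefix + "\U0010ffff")
    return sorted_files[lo:hi]


files = sorted([
    "build/a.py",
    "mfbt/Assertions.h",
    "mfbt/tests/x.cpp",
    "mfbt2/y.h",
    "toolkit/z.cpp",
])
under_mfbt = files_under(files, "mfbt")
```

In a sorted list all paths sharing a directory prefix are contiguous, so each lookup costs two O(log n) searches plus the size of the result, rather than a pattern match against every file in the repository.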

Duplicate of this bug: 1658886
Pushed by rstewart@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/98fddfef58ce
In `hash.py`, enumerate files from the VCS rather than searching the filesystem directly r=ahal
Regressions: 1659602

My understanding is that with this patch, this particular issue should be fixed for everyone. Re-open this bug or file a new one if you think I'm wrong about that. :)

Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch
See Also: → 1660165

I've seen enough evidence to suggest that this issue is not entirely fixed. The patch I wrote is certainly an IMPROVEMENT, at least in certain respects and for certain kinds of breakages, but it doesn't fix every kind of issue.

One of the three following scenarios must be true:

  1. The patch D86780 is buggy, i.e., the algorithm as intended is correct but the code translation of that algorithm is wrong.

  2. The patch D86780 is wrong inasmuch as the algorithm is wrong, so we need to augment or replace that patch with the "correct" algorithm that does the correct thing.

  3. Some other piece of infrastructure or code is not working as expected, so the problem is completely unrelated to the hashing stuff we've been discussing to this point, and the patch as written wouldn't possibly have fixed it anyway.

I currently have no evidence to suggest which of those three scenarios we're currently living under, and unfortunately I can't reproduce any of the reported issues, which severely inhibits my ability to root cause.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
See Also: → 1660381

Never mind. The examples we've seen thus far are looking like user error.

Status: REOPENED → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Regressions: 1699188