Closed Bug 1753182 Opened 4 years ago Closed 4 years ago

building fails if build dir is symlinked, >96.0

Categories

(Firefox Build System :: General, defect)

Firefox 96
defect

Tracking

(firefox-esr91 unaffected, firefox97 wontfix, firefox98 wontfix, firefox99 fixed)

RESOLVED FIXED
99 Branch

People

(Reporter: juippis, Assigned: mhentges)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

Steps to reproduce:

Symlink the directory firefox is being built in. E.g.
mkdir -p /slow/tmp_portage
ln -s /slow/tmp_portage /var/tmp/portage

Proceed trying to build Firefox >96.0 in that symlinked directory, /var/tmp/portage/...

Actual results:

 0:14.26 Traceback (most recent call last):
 0:14.26   File "/usr/lib/python3.10/runpy.py", line 196, in _run_module_as_main
 0:14.27     return _run_code(code, main_globals, None,
 0:14.27   File "/usr/lib/python3.10/runpy.py", line 86, in _run_code
 0:14.27     exec(code, run_globals)
 0:14.27   File "/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/python/mozbuild/mozbuild/action/webidl.py", line 20, in <module>
 0:14.27     sys.exit(log_build_task(main, sys.argv[1:]))
 0:14.27   File "/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/python/mozbuild/mozbuild/action/util.py", line 18, in log_build_task
 0:14.27     return f(*args, **kwargs)
 0:14.27   File "/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/python/mozbuild/mozbuild/action/webidl.py", line 16, in main
 0:14.27     manager.generate_build_files()
 0:14.27   File "/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/dom/bindings/mozwebidlcodegen/__init__.py", line 293, in generate_build_files
 0:14.27     self._parse_webidl()
 0:14.27   File "/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/dom/bindings/mozwebidlcodegen/__init__.py", line 413, in _parse_webidl
 0:14.27     self._config = Configuration(
 0:14.27   File "/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/dom/bindings/Configuration.py", line 117, in __init__
 0:14.27     raise TypeError(
 0:14.27 TypeError: Interfaces which are exposed to the web may only be defined in a DOM WebIDL root ('/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/dom/webidl', '/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/dom/bindings', '/var/tmp/portage/www-client/firefox-96.0.3/work/firefox_build/dom/bindings'). Consider marking the interface [ChromeOnly] or [Func='nsContentUtils::IsCallerChromeOrFuzzingEnabled'] if you do not want it exposed to the web.
 0:14.27 /slow/tmp_portage/www-client/firefox-96.0.3/work/firefox_build/dom/bindings/Animation.webidl line 18:10
 0:14.27 interface Animation : EventTarget {
 0:14.27           ^
 0:14.74 gmake[4]: *** [Makefile:54: webidl.stub] Error 1
 0:14.74 gmake[4]: Leaving directory '/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox_build/dom/bindings'
 0:14.74 gmake[3]: *** [/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/config/recurse.mk:99: dom/bindings/export] Error 2
 0:14.74 gmake[3]: *** Waiting for unfinished jobs....
 0:23.71 touch ipdl.track
 0:23.71 gmake[4]: Leaving directory '/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox_build/ipc/ipdl'
 0:23.71 gmake[3]: Leaving directory '/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox_build'
 0:23.71 gmake[2]: *** [/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/config/recurse.mk:34: export] Error 2
 0:23.71 gmake[2]: Leaving directory '/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox_build'
 0:23.71 gmake[1]: *** [/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox-96.0.3/config/rules.mk:352: default] Error 2
 0:23.71 gmake[1]: Leaving directory '/slow/tmp_portage/www-client/firefox-96.0.3/work/firefox_build'

Full build.log attached.

Expected results:

The build should finish. This works in firefox-95 so something has been introduced between 95 and 96 to break it. I suspect https://bugzilla.mozilla.org/show_bug.cgi?id=1742564 is somehow relevant.

Since the error happens quite early in the build, I should be able to bisect the exact commit if needed.

The Bugbug bot thinks this bug should belong to the 'Firefox Build System::General' component, and is moving the bug to that component. Please revert this change in case you think the bot is wrong.

Component: Untriaged → General
Product: Firefox → Firefox Build System

Hmm, thanks for the report!
Are you able to bisect this failure to a specific revision? That would be really helpful :)

Flags: needinfo?(juippis)

So I tried to get into this today. I've learned it's related to Gentoo's Portage (the package manager): if I manually run mach build in a symlinked directory, it does not fail, even for 96.0.3. I can't exactly pinpoint the error yet. Maybe it's the sandboxing, or the sandbox environment itself somehow, but this surely is somehow related to the new python/mach setup since that wasn't used in 95 right?

I'll keep investigating on Gentoo-side, since this bug affects our users.

I'll post back here if I somehow manage to find the exact cause.

Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Flags: needinfo?(juippis)
Resolution: --- → INVALID

Sounds good, thanks.

> but this surely is somehow related to the new python/mach setup since that wasn't used in 95 right?

Yeah, it could be related to the pathlib migration, since we've added some .resolve() calls.

I can confirm this (on a non-gentoo system, but a system which uses symlinks to build dirs by default). Easily reproducible:

ln -s mozilla foo
cd foo
mkdir bar
cd bar
(configure and build as usual, oops it fails)

i.e. it doesn't just fail if the build dir is symlinked: it fails if any directory on the path is symlinked.

The message shows the problem:

0:37.66 TypeError: Interfaces which are exposed to the web may only be defined in a DOM WebIDL root ('/usr/src/mozilla/mozilla/dom/webidl', '/usr/src/mozilla/mozilla/dom/bindings', '/usr/src/mozilla/x86_64-loom/shai-build.loom/instrumented/dom/bindings'). Consider marking the interface [ChromeOnly] or [Func='nsContentUtils::IsCallerChromeOrFuzzingEnabled'] if you do not want it exposed to the web.

That .resolve call really fubars things :/ you can even tell which bit was pre-resolve() and which bit was post-...

A crude solution: resolve the current directory and cd into it as the first thing we do, and resolve the $MOZCONFIG and $MOZ_OBJDIR, and (for me at least) the problem goes away. (It would be hard to see how it wouldn't.)

(I've done this in the build system, not Mach, but Mach is probably the right place to do it.)
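A minimal sketch of that crude workaround, written in Python on the assumption that it would live in a wrapper that runs before the build starts. The helper name `resolved_build_env` is hypothetical and not part of Mach or the build system, and treating `PWD` the same way as the two variables named above is also an assumption.

```python
import os


def resolved_build_env(env):
    """Return a copy of env with symlinks resolved in path-valued build variables.

    Hypothetical helper: only PWD, MOZCONFIG and MOZ_OBJDIR are touched,
    and the input mapping is left unmodified.
    """
    resolved = dict(env)
    for name in ("PWD", "MOZCONFIG", "MOZ_OBJDIR"):
        value = resolved.get(name)
        if value:
            # realpath() resolves every symlinked component; a final
            # component that does not exist yet is kept as-is.
            resolved[name] = os.path.realpath(value)
    return resolved
```

A wrapper could then `os.chdir()` into the resolved working directory and pass the returned mapping as the environment of the build process, so that every later path is derived from physical locations only.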

> you can even tell which bit was pre-resolve() and which bit was post-...

Looking at the message you pasted, I can't find any that are pre-resolve: none of them are /usr/src/foo. Can you elaborate?

> Easily reproducible:

I'm trying this exact technique:

  1. Clone Firefox
  2. ln -s firefox firefox-symlink
  3. cd firefox-symlink
  4. ./mach clobber && ./mach build (build succeeded)
  5. pwd && readlink -f .
/home/mitch/dev/firefox-symlink
/home/mitch/dev/firefox

I'm currently on revision e8444fbb022b.

Nix, can you provide additional information on how to reproduce this issue?

Flags: needinfo?(nix)

Oh sorry, that wasn't very clear: I shouldn't have changed naming schemes in the middle of the post! In the system above, /usr/src/mozilla/x86_64-loom is a symlink to 'mozilla', which is the actual build directory. Under that is the build directory (actually a bind-mount), 'shai-build.loom'.

So in

0:37.66 TypeError: Interfaces which are exposed to the web may only be defined in a DOM WebIDL root ('/usr/src/mozilla/mozilla/dom/webidl', '/usr/src/mozilla/mozilla/dom/bindings', '/usr/src/mozilla/x86_64-loom/shai-build.loom/instrumented/dom/bindings'). Consider marking the interface [ChromeOnly] or [Func='nsContentUtils::IsCallerChromeOrFuzzingEnabled'] if you do not want it exposed to the web.

the last path hasn't had its components resolved, but the first two have (the x86_64-loom symlink has been resolved to mozilla). Resolving the paths explicitly before invoking make works around the problem, but really checks like this should ensure that path components are resolved themselves (or the build system should resolve everything first).
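To illustrate the mismatch described above, here is a small self-contained sketch. It does not reproduce the actual check in dom/bindings/Configuration.py; the function names are hypothetical, and the directory names only mimic the layout from this comment. It assumes a POSIX system where creating symlinks is allowed.

```python
import os
import tempfile


def naive_is_under_any_root(path, roots):
    """String-prefix check: fails when path and roots mix resolved and unresolved forms."""
    return any(path.startswith(root.rstrip(os.sep) + os.sep) for root in roots)


def is_under_any_root(path, roots):
    """Compare physical locations by resolving symlinks on both sides first."""
    real = os.path.realpath(path)
    for root in roots:
        real_root = os.path.realpath(root)
        if os.path.commonpath([real, real_root]) == real_root:
            return True
    return False


def demo():
    """Build a tiny tree with one symlinked component and run both checks."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp = os.path.realpath(tmp)
        real_dir = os.path.join(tmp, "mozilla")
        os.makedirs(os.path.join(real_dir, "obj", "dom", "bindings"))
        link = os.path.join(tmp, "x86_64-loom")
        os.symlink(real_dir, link)
        # The root is spelled through the symlink (unresolved) ...
        roots = [os.path.join(link, "obj", "dom", "bindings")]
        # ... while the file path has already been resolved.
        webidl = os.path.join(real_dir, "obj", "dom", "bindings", "Animation.webidl")
        return naive_is_under_any_root(webidl, roots), is_under_any_root(webidl, roots)


if __name__ == "__main__":
    print(demo())  # (False, True)
```

The naive comparison rejects the file even though it physically sits inside the root, which is the shape of the TypeError above. Resolving both sides before comparing makes the check independent of how the paths were spelled.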

So, for me, this fails:

Clone firefox
PKGROOT=$(pwd)
ln -s firefox firefox-symlink
mkdir build
# fill out build/mozconfig with MOZILLA_OFFICIAL=t, various BINDGEN_CFLAGS, configure args etc
export MOZCONFIG=$PKGROOT/firefox-symlink/build/mozconfig
export MOZ_OBJDIR=$PKGROOT/firefox-symlink/build
./mach configure && ./mach build -j $(getconf _NPROCESSORS_ONLN)

I suspect the presence of the symlinked path component in MOZCONFIG or (more likely) MOZ_OBJDIR.

Flags: needinfo?(nix)

Ugggh. Sorry about the markdown fubar, should have previewed it :( would edit it but don't see how...

No worries - for future reference, there should be a "pencil" button in the top-right of each comment that you can use for editing.
Thanks for the additional information, I'll take a look.

Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---

The first revision that fails with this error is ca4d439114f3.

Regressed by: 1730712

Previously, the Python virtualenv path would be realpath'd before the
virtualenv was activated [1].

However, now that (when going through Mach) we're calling configure.py
with the build virtualenv's Python binary directly, that realpath()
was lost.

We could realpath(self.topsrcdir) in building.py, but then the
virtualenv will be needlessly re-created when it's called from a
non-normpath'd context.

Instead, let's leave realpath-ing Mach's self.topsrcdir to another
day, and let's spot-fix this issue: when evaluating PYTHON3 in
configure, do realpath() on the path we get from the running Python
process.

Note: sys.prefix was normpath'd instead of (...).python_path,
because on Linux virtualenv's bin/python is symlinked to the system
installation it's associated with, which we don't want here.

[1] https://hg.mozilla.org/mozilla-central/rev/ca4d439114f3#l1.61
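The distinction in that note can be sketched as follows. This is not the landed patch, only an illustration under the assumption of a POSIX virtualenv layout (`<prefix>/bin/python`); the function name `python_from_prefix` is hypothetical.

```python
import os
import sys


def python_from_prefix(prefix=None):
    """Interpreter path built from the symlink-resolved environment prefix.

    Resolving the prefix removes symlinked directories from the path, while
    the final bin/python component is left alone, so the result still points
    into the virtualenv rather than at the system installation.
    """
    prefix = os.path.realpath(sys.prefix if prefix is None else prefix)
    # Assumption: POSIX virtualenv layout.
    return os.path.join(prefix, "bin", "python")


# By contrast, os.path.realpath(sys.executable) would also follow the
# bin/python symlink itself and could land on the system interpreter.
```

Resolving the prefix rather than the executable keeps the virtualenv's own interpreter in use, which is the property the note above asks for.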

Assignee: nobody → mhentges

I can confirm this patch fixes the problem I observed: FF 97 now builds for me happily (or rather I'm fighting wasi and llvm now, which is much further into the build!). Joonas, can you confirm? (I tried it under portage as well as under my crazy homebrew autobuilder and it seemed to be happy enough.)

Flags: needinfo?(juippis)

Yes, I can confirm this works with portage's build dir being symlinked, and it doesn't break the normal case when it's not. Thank you both for working on this! I did manage to unpack the source and look for a realpath solution, but it never materialized into a patch. 97 and 91.6.0 have kept me busy this week.

I'll ship the patch with Gentoo's 97.0, thanks again! Will report back if new problems arise.

Flags: needinfo?(juippis)

Set release status flags based on info from the regressing bug 1730712

Has Regression Range: --- → yes
Pushed by mhentges@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/199a0f0d49eb Get realpath of Python during configure r=glandium
Status: REOPENED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch