Closed Bug 1363655 Opened 3 years ago Closed 3 years ago

stylo: configure should fail when libclang is not actually there

Categories

(Firefox Build System :: General, defect, P2)

defect

Tracking

(firefox56 fixed)

RESOLVED FIXED
mozilla56
Tracking Status
firefox56 --- fixed

People

(Reporter: glandium, Assigned: froydnj)

References

(Blocks 1 open bug)

Details

Attachments

(9 files, 9 obsolete files)

3.23 KB, patch
rillian
: review+
Details | Diff | Splinter Review
2.38 KB, patch
rillian
: review+
Details | Diff | Splinter Review
1.77 KB, patch
rillian
: review+
Details | Diff | Splinter Review
1.60 KB, patch
rillian
: review+
Details | Diff | Splinter Review
5.71 KB, patch
rillian
: review+
Details | Diff | Splinter Review
1.86 KB, patch
rillian
: review+
Details | Diff | Splinter Review
12.44 KB, patch
rillian
: review+
Details | Diff | Splinter Review
4.81 KB, patch
rillian
: review+
Details | Diff | Splinter Review
2.32 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
My first build ever with --enable-stylo (with a local patch that presumably refailed with:
thread '<unnamed>' panicked at 'Unable to find libclang: "couldn\'t find any of [\'libclang.so\', \'libclang.so.1\'], set the LIBCLANG_PATH environment variable to a path where one of these files can be found (skipped: [])"', src/libcore/result.rs:868

LIBCLANG_PATH was correctly set by configure, because I do have llvm 3.9 installed, but I didn't have libclang 3.9 installed.

Installing the libclang1-3.9 package then further failed with:
thread '<unnamed>' panicked at 'called `Result::unwrap()` on an `Err` value: "could not run executable: `/usr/lib/llvm-3.9/bin/clang`"', src/libcore/result.rs:868

It would be better to fail at configure time.
Please also check bitness mismatches. I got 'Unable to find libclang: "the `libclang` shared library could not be opened: c:/Users/kimu/.mozbuild/clang\\bin\\libclang.dll"' even though the file actually exists. It is really confusing.
Depends on: 1310852, 1333036
Priority: -- → P2
Summary: configure should fail when libclang is not actually there → stylo: configure should fail when libclang is not actually there
Duplicate of this bug: 1366939
Checking --with-clang-path or similar is pretty easy.

Checking --with-libclang-path or the result returned from LLVM_CONFIG is a little harder, because the exact name of the library(ies) that we're looking for depends on the target and what compiler we're using, in the case of mingw builds.  Depending on the target is no problem; depending on whether we're doing a mingw build is a little harder, because we'd need to know the compiler, which means we need an --enable-compile-environment build.  None of the Stylo stuff is currently behind --enable-compile-environment, though arguably it should be; llvm-config and clang are not tools we expect people to have laying around in a --disable-compile-environment build.

I don't know how to address the bitness problems from comment 1.

glandium, does moving all the stylo configuration stuff behind --enable-compile-environment make sense?  Are we going to shoot ourselves in the foot with artifact builds somehow?
Flags: needinfo?(mh+mozilla)
Duplicate of this bug: 1333036
Assigning to Nathan because he is working on this bug.
Assignee: nobody → nfroyd
(In reply to Nathan Froyd [:froydnj] from comment #3)
> glandium, does moving all the stylo configuration stuff behind
> --enable-compile-environment make sense?  Are we going to shoot ourselves in
> the foot with artifact builds somehow?

At least the parts that require clang/libclang/llvm. Things like what makes the pref go on/off should stay independent, since it touches a .js file.
Flags: needinfo?(mh+mozilla)
Posting WIP patches so people (i.e. glandium) can reproduce what I am seeing.
We're trying to get to a point where we can:

- Only check for stylo bindgen bits when --enable-compile-environment; so we can:

- check `target` to determine what our EXE_SUFFIX/LIB_{PREFIX,SUFFIX} are, so:

- we can verify that clang shared libraries/clang itself actually exist.
On some systems, llvm-config may be installed, but not the files to
which it needs to refer.  We should ensure that the values returned
actually make some sense.
stylo.build is just a copy of stylo_config.build, so we should use the
former to limit the number of duplicate concepts we have.  Using
stylo_config consistently enables us to separate out what artifact
builds need vs. non-artifact builds.
We can get this information directly from --enable-stylo-build-bindgen.
Prior to this patch, we had a not-so-great hack in place: the bit of
configure that determined bindgen config paths *and* the bit of
configure that returned various pieces of information about Stylo were
both guarded on --enable-stylo-build-bindgen.  We should really only
have one place that depends on --enable-stylo-build-bindgen, and this
patch puts us in a place to do that, by making bindgen_config_paths
properly use a when= argument to determine when it needs to run.

This commit makes the `stylo` function entirely redundant, and said
function will be removed in the next patch.
After the previous set of patches, it's just a bloated wrapper for
bindgen_config_paths.
Up until this point, everything has worked as expected.  We now have a clearly
delineated set of things that we want to move under the protection of
--enable-compile-environment, mostly so Stylo-enabled artifact builds (to turn
Stylo bits on by default) will not try to check for bindgen things that they're
never going to use.

I was initially going to stick this in an `include(...)` bit, as is done for
everything else, but glandium pointed out that `with only_when(...):` worked
just as well, and also enables us to preserve some semblance of history.
Either way, though, we run into the problems described below.

We only want to check for LLVM_CONFIG when we're building stylo bindgen, and we
only want to offer the ability to tweak stylo bindgen when we're building with
--enable-compile-environment.  But the patch as written results in:

[...frames elided...]
  File "/home/froydnj/src/gecko-dev.git/toolkit/moz.configure", line 679, in <module>
    what='llvm-config', allow_missing=True)
  File "/home/froydnj/src/gecko-dev.git/python/mozbuild/mozbuild/configure/__init__.py", line 733, in wrapper
    ret = template(*args, **kwargs)
  File "/home/froydnj/src/gecko-dev.git/python/mozbuild/mozbuild/configure/__init__.py", line 1000, in wrapped
    return new_func(*args, **kwargs)
  File "/home/froydnj/src/gecko-dev.git/build/moz.configure/checks.configure", line 121, in check_prog
    @checking('for %s' % what, lambda x: quote(x) if x else 'not found')
  File "/home/froydnj/src/gecko-dev.git/python/mozbuild/mozbuild/configure/__init__.py", line 733, in wrapper
    ret = template(*args, **kwargs)
  File "/home/froydnj/src/gecko-dev.git/build/moz.configure/util.configure", line 379, in decorator
    @depends(*args, when=when)
  File "/home/froydnj/src/gecko-dev.git/python/mozbuild/mozbuild/configure/__init__.py", line 667, in depends_impl
    raise ConfigureError('@depends function needs the same `when` '
mozbuild.configure.ConfigureError: @depends function needs the same `when` as options it depends on

How should I go about fixing this?
Attachment #8875886 - Flags: feedback?(mh+mozilla)
Comment on attachment 8875886 [details] [diff] [review]
WIP: try to make stylo bindgen bits depend on a compile environment

I played around with this a lot today, and I don't think there's a good way forward with a separate file or a context manager.  I'm just going to do:

@depends(..., '--enable-compile-environment')

and go from there.
Attachment #8875886 - Flags: feedback?(mh+mozilla)
--with-libclang-path is supposed to be a directory, and
--with-clang-path is supposed to be a file, but it was easy to pass a
directory for the latter, and it was easy to misinterpret the
documentation for --with-libclang-path as pointing to one of the
libraries themselves.  Clearer documentation and additional checks
should help with this situation.
Attachment #8878200 - Flags: review?(giles)
Attachment #8875875 - Attachment is obsolete: true
On some systems, llvm-config may be installed, but not the files to
which it needs to refer.  We should ensure that the values returned
actually make some sense.
Attachment #8878201 - Flags: review?(giles)
Attachment #8875876 - Attachment is obsolete: true
stylo.build is just a copy of stylo_config.build, so we should use the
former to limit the number of duplicate concepts we have.  Using
stylo_config consistently enables us to separate out what artifact
builds need vs. non-artifact builds.
Attachment #8878202 - Flags: review?(giles)
Attachment #8875877 - Attachment is obsolete: true
We can get this information directly from --enable-stylo-build-bindgen.
Attachment #8878204 - Flags: review?(giles)
Attachment #8875878 - Attachment is obsolete: true
Prior to this patch, we had a not-so-great hack in place: the bit of
configure that determined bindgen config paths *and* the bit of
configure that returned various pieces of information about Stylo were
both guarded on --enable-stylo-build-bindgen.  We should really only
have one place that depends on --enable-stylo-build-bindgen, and this
patch puts us in a place to do that, by making bindgen_config_paths
properly use a when= argument to determine when it needs to run.

This commit makes the `stylo` function entirely redundant, and said
function will be removed in the next patch.
Attachment #8878205 - Flags: review?(giles)
Attachment #8875879 - Attachment is obsolete: true
After the previous set of patches, it's just a bloated wrapper for
bindgen_config_paths.
Attachment #8878206 - Flags: review?(giles)
Attachment #8875880 - Attachment is obsolete: true
People building without a compilation environment (artifact builds, l10n
builds) won't have a compiler available, let alone the bits to build
bindgen.  We should limit our checks for bindgen-y things accordingly.
Attachment #8878207 - Flags: review?(giles)
Attachment #8875886 - Attachment is obsolete: true
Ideally this check will alert people that something is wrong with their
configuration.  This check shouldn't normally be firing with our `mach
boostrap` setup, but if somebody chooses to install distribution
packages for some reason, this will at least prevent some problems.

This patch relies on bug 1372987 for library_name_info.
Attachment #8878209 - Flags: review?(giles)
Depends on: 1372987
Comment on attachment 8878200 [details] [diff] [review]
part 1 - clarify the required values for stylo bindgen options

It seems to me this is just being annoying. If the user is passing a file to libclang-path, it's not hard to strip the file name and (try to) use the resulting directory. Likewise for clang, in the other direction.
Comment on attachment 8878207 [details] [diff] [review]
part 7 - make stylo bindgen bits depend on a compile environment

Review of attachment 8878207 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ +659,5 @@
>  
>  option('--disable-stylo-build-bindgen',
>         help='Enable build-time bindgen for Stylo')
>  
> +@depends(stylo_config, '--enable-stylo-build-bindgen', '--enable-compile-environment')

Is it where "when='--enable-compile-environment'" doesn't work?
Comment on attachment 8878200 [details] [diff] [review]
part 1 - clarify the required values for stylo bindgen options

Review of attachment 8878200 [details] [diff] [review]:
-----------------------------------------------------------------

As someone who's been confused by exactly this issue, I think this patch is good. I'd also be ok with glandium's suggestion of just fixing the dir vs. library issue, although I think you should print a warning if you do that there's some chance of checking what moz.configure is actually using.

::: toolkit/moz.configure
@@ +673,5 @@
>  option('--disable-stylo-build-bindgen',
>         help='Enable build-time bindgen for Stylo')
>  
>  option('--with-libclang-path', nargs=1,
> +       help='Absolute path to a directory containing Clang/LLVM libraries for Stylo (version 3.9.x')

Bindgen works with llvm 4 and 5 now, so I'd drop the version number while you're here. It's likely to be out of date soon anyway, given the problems we've had with 3.9.0 on macOS.
Attachment #8878200 - Flags: review?(giles) → review+
Comment on attachment 8878201 [details] [diff] [review]
part 2 - check for existence of paths returned by llvm-config

Review of attachment 8878201 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ +732,5 @@
> +        # they need.
> +        if not os.path.exists(libclang_path) or \
> +           not os.path.isdir(libclang_path):
> +            die(dedent('''\
> +            The path {} returned by `llvm-config {}` is not a directory.

I think this is trying to do too much at once, and mostly duplicates the later check for --with-clang-path/--with-libclang-path options.

llvm-config is less likely to confuse file and directory paths, so maybe just check for existence and say "The directory {} returned by `llvm-config {}` doesn't exist."

@@ +751,5 @@
>  
>      if (not libclang_path and clang_path) or \
>         (libclang_path and not clang_path):
>          die(dedent('''\
>          You must provide both of --with-libclang-path and --with-clang-path

Since this error message is specific to the --with-*clang-path option, it would be more clear to move this check and the subsequent return above the llvm_config block, to make more clear that these options override that logic.
Attachment #8878201 - Flags: review?(giles) → review+
Comment on attachment 8878202 [details] [diff] [review]
part 3 - remove stylo.build key in favor of stylo_config.build

Review of attachment 8878202 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ -790,5 @@
>          PATH may expose 'clang' as well, potentially altering your compiler,
>          which may not be what you intended.'''))
>  
>      return namespace(
> -        build=bool(stylo_config.build),

stylo_config.build can be None instead of False. Does that cause problems with dropping the `bool()`?

You can also remove the key from the `if not bindgen_enabled` stanza higher up, I think.
Attachment #8878202 - Flags: review?(giles) → review+
(In reply to Mike Hommey [:glandium] from comment #24)
> ::: toolkit/moz.configure
> @@ +659,5 @@
> >  
> >  option('--disable-stylo-build-bindgen',
> >         help='Enable build-time bindgen for Stylo')
> >  
> > +@depends(stylo_config, '--enable-stylo-build-bindgen', '--enable-compile-environment')
> 
> Is it where "when='--enable-compile-environment'" doesn't work?

I couldn't get something like:

with only_when('--enable-compile-environment'):
    llvm_config = check_prog('LLVM_CONFIG', when=building_stylo_bindgen)

    @depends(..., when=building_stylo_bindgen)
    def bindgen_config_paths(...)

or variations to work.  Which I think is a "yes" to your question, but I'm not sure.
(In reply to Mike Hommey [:glandium] from comment #23)
> It seems to me this is just being annoying. If the user is passing a file to
> libclang-path, it's not hard to strip the file name and (try to) use the
> resulting directory. Likewise for clang, in the other direction.

I think this is true, though I'd be a little surprised if people were passing an actual .so/.dll/.dylib file to --with-libclang-path.  I could see trying to be friendly for --with-clang-path, though.

The ideal here is that people will specify LLVM_CONFIG and things will Just Work; the other options are these as fallbacks for cases (Windows) where llvm-config doesn't exist...although I think newer versions have finally started packaging llvm-config on Windows as well, so maybe we'll be able to drop these separate options soon?
Attachment #8878204 - Flags: review?(giles) → review+
Comment on attachment 8878205 [details] [diff] [review]
part 5 - make things properly depend on stylo bindgen building

Review of attachment 8878205 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ +657,5 @@
>          enable = enable_stylo,
>      )
>  
> +option('--disable-stylo-build-bindgen',
> +       help='Enable build-time bindgen for Stylo')

Please s/Enable/Disable/ in the help string too, while you're here. That's less confusing and matches the other --disable options we give.
Attachment #8878205 - Flags: review?(giles) → review+
Attachment #8878206 - Flags: review?(giles) → review+
Comment on attachment 8878207 [details] [diff] [review]
part 7 - make stylo bindgen bits depend on a compile environment

Review of attachment 8878207 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ +623,5 @@
>  option('--enable-stylo', nargs='?', choices=('build',),
>         help='Include Stylo in the build and/or enable it at runtime')
>  
> +@depends('--enable-stylo', milestone, target, '--help')
> +def stylo_config(value, milestone, target, _):

Does passing '--help' here make the help output omit the clang-path stuff if '--disable-compile-environment'?
Attachment #8878207 - Flags: review?(giles) → review+
Comment on attachment 8878209 [details] [diff] [review]
part 8 - check for the existence of a libclang bindgen will use

Review of attachment 8878209 [details] [diff] [review]:
-----------------------------------------------------------------

lgtm with comments addressed. Sorry the review took so long.
Attachment #8878209 - Flags: review?(giles) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/af0dd402ca5b
part 1 - clarify the required values for stylo bindgen options; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/5d8ef49108c1
part 2 - check for existence of paths returned by llvm-config; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae3346e96d47
part 3 - remove stylo.build key in favor of stylo_config.build; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/45f92bb6cf01
part 4 - remove bindgen_enabled key from namespace returned from stylo; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/f8c24ace77cb
part 5 - make things properly depend on stylo bindgen building; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff90fedc24b2
part 6 - remove the `stylo` configure function; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/247d01ab53c7
part 7 - make stylo bindgen bits depend on a compile environment; r=rillian
https://hg.mozilla.org/integration/mozilla-inbound/rev/cc58f2b74c4f
part 8 - check for the existence of a libclang bindgen will use; r=rillian
This seems to have broken building on Windows for me. I get:

  TypeError: coercing to Unicode: need string or buffer, PositiveOptionValue found

here:

  https://hg.mozilla.org/mozilla-central/file/f8bb96fb5c4f/toolkit/moz.configure#l776

At that point `libclang_path` is PositiveOptionValue(u'c:/Users/Brian/.mozbuild/clang/bin',)

My mozconfig being:

> . $topsrcdir/browser/config/mozconfig
> mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-stylo-debug-opt
> mk_add_options AUTOCLOBBER=1
> ac_add_options --enable-optimize
> ac_add_options --enable-debug
> ac_add_options --enable-tests
> ac_add_options --enable-stylo
> ac_add_options --target=x86_64-pc-mingw32
> ac_add_options --host=x86_64-pc-mingw32
> ac_add_options --with-libclang-path="c:/Users/Brian/.mozbuild/clang/bin"
> ac_add_options --with-clang-path="c:/Users/Brian/.mozbuild/clang/bin/clang.exe"
This patch fixes the problem for me at least.
Attachment #8880212 - Flags: review?(nfroyd)
Comment on attachment 8880212 [details] [diff] [review]
Unwrap (lib)clang_path before passing it to os.path.* methods

Review of attachment 8880212 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/moz.configure
@@ +778,3 @@
>              die(dedent('''\
>              The argument to --with-libclang-path is not a directory: {}
>              '''.format(libclang_path)))

Actually, I guess we need to do the same thing here too...
Attached patch unwrapPaths (obsolete) — Splinter Review
Attachment #8880212 - Attachment is obsolete: true
Attachment #8880212 - Flags: review?(nfroyd)
Attachment #8880213 - Flags: review?(nfroyd)
Attachment #8880213 - Attachment is obsolete: true
Attachment #8880213 - Flags: review?(nfroyd)
Attachment #8880215 - Flags: review?(nfroyd)
Comment on attachment 8880215 [details] [diff] [review]
Unwrap (lib)clang_path before passing it to os.path.* methods

Review of attachment 8880215 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for the patch!  I'm not exactly sure how this worked in the first place; chatter in #build indicates that os.path.exists will happily accept tuples on Linux but not on Mac (or Windows apparently) (!).

One small fix which I can apply locally and push to inbound for you.

::: toolkit/moz.configure
@@ +773,5 @@
>              You must provide both of --with-libclang-path and --with-clang-path
>              or neither of them.'''))
>  
> +        if not os.path.exists(libclang_path[0]) or \
> +           not os.path.isdir(libclang_path[0]):

I think it's cleaner to just unwrap once and for all before this check:

libclang_path = libclang_path[0]
clang_path = clang_path[0]

and then we don't have to worry about it at all (we do have to be careful to change the values in the namespace() call below, though).
Attachment #8880215 - Flags: review?(nfroyd) → review+
Pushed by nfroyd@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/842a147e06b8
Unwrap (lib)clang_path before passing it to os.path.* methods; r=froydnj
Depends on: 1375798
Depends on: 1387510
Well, it might be fixed, but believe it or not, some of us only install llvm for essentials (in my case, for Mesa) and do NOT install clang. I did test 56.0b7 on a machine where clang was installed, and ended up having to disable stylo because it couldn't find headers from system nspr. So I thought I'd try the current version on this machine which has llvm without clang to see if that had been fixed. But it doesn't get very far.

ERROR: Could not find the clang shared library in the path /usr/lib
returned by `llvm-config --libdir` (searched for files [u'libclang.so.1', u'libclang.so']).
Please check your system configuration.

Looking at the mozconfig files, it seems to be doing this as part of a process to decide if stylo should be enabled. AFAICS clang is still an optional extension of llvm, so I would expect the tests to in effect say "well, we can find llvm, but clang is missing so we can't build stylo".
(In reply to Ken Moffat from comment #43)
> Well, it might be fixed, but believe it or not, some of us only install llvm
> for essentials (in my case, for Mesa) and do NOT install clang. I did test
> 56.0b7 on a machine where clang was installed, and ended up having to
> disable stylo because it couldn't find headers from system nspr. So I
> thought I'd try the current version on this machine which has llvm without
> clang to see if that had been fixed. But it doesn't get very far.
> 
> ERROR: Could not find the clang shared library in the path /usr/lib
> returned by `llvm-config --libdir` (searched for files [u'libclang.so.1',
> u'libclang.so']).
> Please check your system configuration.
> 
> Looking at the mozconfig files, it seems to be doing this as part of a
> process to decide if stylo should be enabled. AFAICS clang is still an
> optional extension of llvm, so I would expect the tests to in effect say
> "well, we can find llvm, but clang is missing so we can't build stylo".

If --disable-stylo doesn't work to disable this check, please file a followup bug rather than trying to change the flags on this bug.
My apologies if my reading of bugzilla's advice to me caused inappropriate action - I thought it was encouraging me to change the flags on this.

Yes, --disable-stylo works around this, my points are that the message from the failure doesn't point to that as a workaround, that adding an extra build dependency of clang on top of rust is very unpleasant, and assuming that having llvm-config implies clang is present is a strange decision - at least for linux users.
(In reply to Ken Moffat from comment #45)
> Yes, --disable-stylo works around this, my points are that the message from
> the failure doesn't point to that as a workaround, that adding an extra
> build dependency of clang on top of rust is very unpleasant, and assuming
> that having llvm-config implies clang is present is a strange decision - at
> least for linux users.

Perhaps the error message should be changed to indicate the existence of --disable-stylo and/or perhaps we should check for the existence of `clang` first, rather than the shared libraries for clang.  Would you mind filing bugs for either or both of those?

Regardless, clang is not a hard dependency at this point and for many cases is an easier dependency to deal with than Rust: there are none of the bootstrapping issues Rust has, for instance.
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.