Closed Bug 1710235 Opened 4 years ago Closed 4 years ago

"no matching function" error with GCC 11

Categories

(Firefox Build System :: General, defect, P3)

x86_64
Linux
defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: anotherworldofworld, Unassigned)

Attachments

(4 files, 2 obsolete files)

Attached file build.log (obsolete) —

User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:88.0) Gecko/20100101 Firefox/88.0

Steps to reproduce:

Building gecko-dev from master branch with gcc version 11.1.1 20210428 (Red Hat 11.1.1-1) (GCC) on Fedora 34.

The attached build.log contains the full output of the command:

$ MACH_USE_SYSTEM_PYTHON=1 ./mach build

The actual error starts at line 3193.

OS: Unspecified → Linux
Hardware: Unspecified → x86_64

The failure appears to be:

0:09.48 /home/alex/disk2/gecko-dev/obj-x86_64-pc-linux-gnu/dist/include/nsTHashtable.h:317:27: error: no matching function for call to ‘nsTHashtable<detail::VoidPtrHashKey>::WithEntryHandle(const void*&, const fallible_t&, nsTHashtable<detail::VoidPtrHashKey>::PutEntry(nsTHashtable<detail::VoidPtrHashKey>::KeyType, const fallible_t&)::<lambda(auto:7)>)’

Two questions for you:

  1. Why are you using MACH_USE_SYSTEM_PYTHON=1 when doing ./mach build?
  2. Can you share your mozbuild and obj-*/config.status?
Flags: needinfo?(anotherworldofworld)
Attached file config.status (obsolete) —
Flags: needinfo?(anotherworldofworld)

Hello,

Why are you using MACH_USE_SYSTEM_PYTHON=1 when doing ./mach build?

Without it I get the following error:

~/disk2/gecko-dev (master) $ ./mach build
This mach command requires /home/alex/.mozbuild/_virtualenvs/mach/bin/python, which wasn't found on the system!
Consider running 'mach bootstrap' or 'mach create-mach-environment' to create the mach virtualenvs, or set MACH_USE_SYSTEM_PYTHON to use the system Python installation over a virtualenv.

So I tried MACH_USE_SYSTEM_PYTHON as suggested, and it worked before I updated to Fedora 34.

Can you share your mozbuild and obj-*/config.status?

Yes. Attached.

Sweet, thanks. I'm a little low on time this week, but I should be able to dig in more later this week.
Two things:

  1. I'd definitely recommend running ./mach bootstrap, then running ./mach build without the environment variable. ./mach bootstrap supplies dependencies that the build system blindly uses. If the build system is blindly using incorrect tools, sometimes it can explode in a tough-to-diagnose way, such as in this failure
  2. Can you upload your mozconfig (not mozbuild, my mistake) file here, so I can use it to replicate the issue?
Flags: needinfo?(anotherworldofworld)

I'd definitely recommend running ./mach bootstrap, then running ./mach build without the environment variable. ./mach bootstrap supplies dependencies that the build system blindly uses. If the build system is blindly using incorrect tools, sometimes it can explode in a tough-to-diagnose way, such as in this failure

I have tried running ./mach bootstrap and then ./mach build. The build error is the same. I will attach the mach-bootstrap.log and mach-build.log files with the full output of these commands.

Can you upload your mozconfig (not mozbuild, my mistake) file here, so I can use it to replicate the issue?

I'm not using a custom mozconfig. There is just ./browser/config/mozconfig, which I think has the default options:

# This Source Code Form is subject to the terms of the Mozilla Public
# License, v. 2.0. If a copy of the MPL was not distributed with this
# file, You can obtain one at http://mozilla.org/MPL/2.0/.

# This file specifies the build flags for Firefox.  You can use it by adding:
#  . $topsrcdir/browser/config/mozconfig
# to the top of your mozconfig file.

ac_add_options --enable-application=browser

The one difference I've found is that I get the same compile error if I choose option 2 in mach bootstrap:

Please choose the version of Firefox you want to build:

  1. Firefox for Desktop Artifact Mode [default]
  2. Firefox for Desktop
  3. GeckoView/Firefox for Android Artifact Mode
  4. GeckoView/Firefox for Android
  5. SpiderMonkey JavaScript engine

When I use Artifact Mode, Firefox builds successfully.

Attached file mach-bootstrap.log
Flags: needinfo?(anotherworldofworld)
Attached file mach-build.log
Attachment #9220968 - Attachment is obsolete: true
Attachment #9221265 - Attachment is obsolete: true

Nice, thanks for the details. The ./mach bootstrap looks good 👍.

I'm not using a custom mozconfig

2 questions:

  1. If you're not using a custom mozconfig, then how have you configured the build system to use GCC?
  2. Can you upload /home/alex/disk2/gecko-dev/mozconfig?
    • Also, you can try temporarily removing /home/alex/disk2/gecko-dev/mozconfig, then doing a clobber build to see if that works, so we can narrow down the issue.

When I use Artifact Mode, Firefox builds successfully.

That part of the bootstrap is a bit confusing, because the "version of Firefox to build" is recorded in a mozconfig file, but that recording doesn't happen if a mozconfig exists already (it's difficult to parse and adjust an existing mozconfig). So, re-running bootstrap and selecting Artifact Mode probably didn't affect your build.

Flags: needinfo?(anotherworldofworld)

If you're not using a custom mozconfig, then how have you configured the build system to use GCC?

GCC and g++ were the only compilers installed when I first got these errors. In addition, there are:

~$ echo $CC
gcc
~$ echo $CXX
g++

variables set in the system.

Can you upload /home/alex/disk2/gecko-dev/mozconfig? Also, you can try temporarily removing /home/alex/disk2/gecko-dev/mozconfig, then doing a clobber build to see if that works, so we can narrow down the issue.

Yes. I did that right after the Firefox for Desktop Artifact Mode build finished successfully. Once I was convinced that the artifact build works, my steps were:

  • Removed gecko-dev/mozconfig. It contained only one line: ac_add_options --enable-artifact-builds
  • Executed ./mach bootstrap and selected Firefox for Desktop (without artifacts).
  • Executed ./mach build

and got the same error that you can see in mach-build.log.

That part of the bootstrap is a bit confusing, because the "version of Firefox to build" is recorded in a mozconfig file, but that recording doesn't happen if a mozconfig exists already (it's difficult to parse and adjust an existing mozconfig). So, re-running bootstrap and selecting Artifact Mode probably didn't affect your build.

I also didn't have a mozconfig in the gecko-dev root directory before. I cloned the gecko-dev repo from scratch (to have a new, clean environment, just in case), executed ./mach bootstrap, and selected Artifact Mode. After that step was done, a mozconfig was generated with:

# Automatically download and use compiled C++ components:
ac_add_options --enable-artifact-builds

content, and after that ./mach build was successful. I then decided to build gecko-dev without artifacts, so I removed the mozconfig and executed the steps described above.

Flags: needinfo?(anotherworldofworld)

Ah, you've specified the compiler with CC and CXX environment variables from your shell, not via mozconfig. I'll try that and see if I can repro.

👋, I've got a reproduction and I'll be toying with this tomorrow.

Yes, that was it. After I unset the CC and CXX variables, the gecko-dev build was successful! I'm not sure how those environment variables affect the build process and cause these errors, especially since it worked a couple of weeks ago even with them set. The only change was the Fedora update from 33 to 34.

It is probably something system-wise or some confusing compiler settings on my system. Mitchell, thank you very much for your help and your time. One last question: should we investigate the issue with the CC and CXX variables further, or can I close the ticket as invalid?

Flags: needinfo?(mhentges)

Without CC/CXX set, you're building with clang from mach bootstrap. That doesn't mean the problem is gone when building with GCC 11, which is either a problem with GCC 11 or a problem in the code itself.

Ah, yes. Mike, you're right. I've just compared the config.status files from the successful build (with CC and CXX unset) and the broken build, and the successful one was built with the clang taken from /.mozbuild/.

Attached file config.status
Attached file config.status-error

I've attached two config.status files for comparison. config.status is from the successful build, where clang was used. config.status-error is from the build where I get the compilation errors, where GNU gcc was used:

~$ gcc --version
gcc (GCC) 11.1.1 20210428 (Red Hat 11.1.1-1)
Copyright (C) 2021 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.  There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.

It's possible that the Firefox build worked against a previous version of GCC, but is running into issues with the new version that's shipped with Fedora, and that's why it's failing now.
Though, at the same time, I'm able to reproduce both on my Fedora 34 guest (GCC 11.1.1) and my Ubuntu 20.10 host (10.2.0), so perhaps that theory isn't valid.

This is definitely a valid issue, so let's leave this ticket open while I dig in. Cheers 🍻

Flags: needinfo?(mhentges)
Summary: Error(s) during building of Firefox from master on Fedora 34 with GCC 11.1.1 → "no matching function" error with GCC 11

I've got a mostly-minimal reproduction on godbolt here.
I don't understand exactly why this is failing on GCC 11. It seems to require:

  • The aFunc closure.
  • The two EntryHandle definitions.
  • The Maybe::Union definition.

It's unclear to me if this is a GCC 11 bug, a new restriction, or some other situation.
:botond, your C++ knowledge is significantly more vast than mine, would you be able to take a look?

Flags: needinfo?(botond)

The difference between GCC 11 and GCC 10 / Clang seems to boil down to whether it accepts a templated constructor as a move constructor.

Here's the code example reduced further to illustrate this:

#include <utility>  // for std::move

struct NonMovable {
  NonMovable(NonMovable &&) = delete;
};

template <class T>
struct Maybe {
  NonMovable mMember;

  template <typename U>
  Maybe(Maybe<U>&&);
};

void foo(Maybe<int>);

void unlucky(Maybe<int>&& x) {
  foo(std::move(x));
}

Here, NonMovable has a deleted move constructor, so the default move constructor that the compiler would generate for Maybe is ill-formed. However, Maybe has a templated constructor which would be a match for the move if instantiated.

It looks like GCC 10 and Clang don't try to generate a move constructor for Maybe, they just use the templated one. GCC 11, however, ignores the templated constructor and tries to generate the default move constructor, which is then ill-formed.
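For comparison, here is a minimal sketch of the baseline case, where nothing is deleted. It is not the Mozilla code; Holder, g_templateCalls, sameTypeMove and convertingMove are hypothetical names used only for illustration. With a movable member, the implicit move constructor handles a same-type move and the constructor template is only picked when the source type differs.

```cpp
#include <utility>  // for std::move

// Counts how often the constructor template below is actually invoked.
int g_templateCalls = 0;

template <class T>
struct Holder {
  T mValue{};

  Holder() = default;

  // Converting constructor template. It is not a move constructor, so the
  // compiler still declares an implicit Holder(Holder&&).
  template <typename U>
  Holder(Holder<U>&& aOther) : mValue(static_cast<T>(aOther.mValue)) {
    ++g_templateCalls;
  }
};

// Same-type move: the implicit (non-template) move constructor wins the tie
// against the equally good template specialization.
int sameTypeMove() {
  g_templateCalls = 0;
  Holder<int> a;
  Holder<int> b(std::move(a));
  (void)b;
  return g_templateCalls;
}

// Cross-type move: the constructor template is the best match.
int convertingMove() {
  g_templateCalls = 0;
  Holder<int> a;
  Holder<long> b(std::move(a));
  (void)b;
  return g_templateCalls;
}
```

Calling sameTypeMove() should leave the counter at zero, while convertingMove() goes through the template once. The reduced testcase differs only in that the member makes the implicit move constructor unusable.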

The connection between the original testcase and this one is:

  • Maybe<nsTHashtable::EntryHandle>::Union is not movable because it has a member of type nsTHashtable::EntryHandle which has a non-trivial move constructor.
  • The attempt to move an object of type Maybe<EntryHandle> comes from the internals of invoke_result, which tries to check if the lambda (which has an auto parameter and therefore accepts its argument by value) is callable with an argument of type Maybe<EntryHandle>&&. The lambda's call operator is instantiated to have the signature void (Maybe<EntryHandle>), and calling that with Maybe<EntryHandle>&& then requires calling the move constructor of Maybe<EntryHandle>.
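The by-value step in the second bullet can be checked in isolation. This is a hedged sketch, not the nsTHashtable code: MoveOnly, Pinned, byValue and callableWith are made-up names, and std::is_invocable_v stands in for the invoke_result machinery mentioned above. It illustrates that a lambda with an auto parameter is only callable with T&& when a T can be constructed from that rvalue.

```cpp
#include <type_traits>

// Movable but not copyable.
struct MoveOnly {
  MoveOnly() = default;
  MoveOnly(MoveOnly&&) = default;
  MoveOnly(const MoveOnly&) = delete;
};

// Neither movable nor copyable.
struct Pinned {
  Pinned() = default;
  Pinned(Pinned&&) = delete;
};

// A generic lambda whose auto parameter is deduced as a value type, so every
// call must initialize a fresh parameter object from the argument.
auto byValue = [](auto) {};

// True if byValue can be called with an argument of type Arg.
template <typename Arg>
constexpr bool callableWith() {
  return std::is_invocable_v<decltype(byValue), Arg>;
}

static_assert(callableWith<MoveOnly&&>(),
              "rvalue argument: the parameter is move-constructed");
static_assert(!callableWith<MoveOnly&>(),
              "lvalue argument: would need the deleted copy constructor");
static_assert(!callableWith<Pinned&&>(),
              "no usable constructor for the parameter, so not callable");
```

In the original error, GCC 11 effectively treats Maybe<EntryHandle> like Pinned here, which is why WithEntryHandle reports no matching function.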

I'm not sure yet which compiler is right. Will leave the needinfo on me until I can get some more info on that.

(In reply to Botond Ballo [:botond] from comment #21)

I'm not sure yet which compiler is right. Will leave the needinfo on me until I can get some more info on that.

Here's my attempt at an analysis:

  • [class.copy.ctor] p2 says only non-templated constructors count as move constructors. Therefore, Maybe's templated constructor is not considered a move constructor, and thus Maybe does not have an explicitly declared move constructor.
  • [class.copy.ctor] p8 outlines the conditions under which an implicit move constructor is declared as defaulted. It seems to me that in our example, all conditions are met. (In particular, the "if the class does not explicitly declare a move constructor" condition is met per the previous point.) Therefore, in our example, Maybe gets an implicit move constructor declared.
  • [class.copy.ctor] p10 outlines the conditions under which an implicitly-declared move constructor is defined as deleted. In our example, condition (10.1) is met (the subobject mMember has a deleted move constructor). Therefore, in our example, Maybe's implicitly-declared move constructor is defined as deleted.
  • Finally, [over.match.funcs.general] p9 says that during overload resolution, "A defaulted move special member function that is defined as deleted is excluded from the set of candidate functions in all contexts." My interpretation of this is that the presence of the defined-as-deleted move constructor of Maybe should not interfere with successful selection of the templated constructor for the purpose of carrying out the move.
    • This is the part I'm least sure about. The overload resolution spec is quite long, and it's hard to rule out overlooking some other rule that comes into play here.

Based on the above, I believe GCC 10 and Clang are correct here, and the behaviour we're seeing in GCC 11 is a bug.
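The exclusion rule from [over.match.funcs.general] p9 is easier to see without the template. The following sketch uses hypothetical names (CopyOnlyMember, Outer, g_copies, moveFallsBackToCopy) and is not a reduction of the Mozilla code. It relies only on the quoted rule: Outer's implicit move constructor is defined as deleted, so it drops out of the candidate set, and a move request selects the copy constructor instead of failing.

```cpp
#include <type_traits>
#include <utility>  // for std::move

// Counts copy constructions of CopyOnlyMember.
int g_copies = 0;

// Copyable, but with an explicitly deleted move constructor.
struct CopyOnlyMember {
  CopyOnlyMember() = default;
  CopyOnlyMember(const CopyOnlyMember&) { ++g_copies; }
  CopyOnlyMember(CopyOnlyMember&&) = delete;
};

// No constructors are declared here. The implicit move constructor is
// defaulted and defined as deleted (mMember cannot be moved), while the
// implicit copy constructor is usable.
struct Outer {
  CopyOnlyMember mMember;
};

// The deleted defaulted move constructor is not a candidate, so Outer still
// counts as move-constructible (via its copy constructor).
static_assert(std::is_move_constructible_v<Outer>, "falls back to copy");

// Requests a move of an Outer and reports how many member copies it caused.
int moveFallsBackToCopy() {
  g_copies = 0;
  Outer a;
  Outer b(std::move(a));
  (void)b;
  return g_copies;
}
```

By the same reasoning, the templated constructor in the reduced testcase should remain selectable once the deleted implicit move constructor is ignored.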

I'll go ahead and file a bug against GCC 11; we'll see if the GCC devs agree with my analysis.

(In reply to Botond Ballo [:botond] from comment #22)

I'll go ahead and file a bug against GCC 11; we'll see if the GCC devs agree with my analysis.

Filed as https://gcc.gnu.org/bugzilla/show_bug.cgi?id=100644.

Flags: needinfo?(botond)

I should add that I'm not sure whether my analysis of the reduced testcase helps explain the original error in the unreduced Mozilla code (where several things are different, for example Union has explicitly declared constructors).

I haven't built m-c with gcc 11 to look at the original error, but please let me know if you'd like me to do so and investigate the original error further.

Wow, thanks Botond, that's great! Thanks for the upstream report too 👍.

I haven't built m-c with gcc 11 to look at the original error, but please let me know if you'd like me to do so and investigate the original error further.

I wouldn't worry about it just yet, let's see what upstream GCC thinks about the report - if it's an expected behaviour change, then we can address it 👍. I think this is the right call because:

  • Anybody running into this error can work around it by using either clang or GCC 10 instead
  • I don't believe that this is affecting many people: even "bleeding-edge" Ubuntu 21.04 uses GCC 10 by default, so the only people affected are the smaller population of people either running a more modern distro (e.g.: Fedora) or having manually installed GCC 11 (and, of course, who have explicitly configured the build to use GCC instead of clang).
  • I'm interested in seeing what the GCC folks think.
Priority: -- → P3

GCC upstream landed a fix on GCC trunk. I confirmed it fixes the issue when applied to GCC 11. Past this bug, there is bug 1711811 if you build with --enable-warnings-as-errors, and another (not yet filed) error with --enable-stdcxx-compat.

Considering this is a GCC issue, let's close this.

Status: UNCONFIRMED → RESOLVED
Closed: 4 years ago
Resolution: --- → INVALID

(In reply to Mike Hommey [:glandium] from comment #26)

and another (not yet filed) error with --enable-stdcxx-compat.

Filed as bug 1711816.

For the record: Fedora Rawhide has already updated GCC 11 to include the fix (https://koji.fedoraproject.org/koji/buildinfo?buildID=1757008), and the build for Fedora 34 is in progress right now (https://koji.fedoraproject.org/koji/buildinfo?buildID=1757009).
