Closed Bug 1245076 Opened 8 years ago Closed 8 years ago

[libstdc++6] Firefox fails to build - undefined malloc() free() and so on

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox47 affected, firefox48 fixed)

RESOLVED FIXED
mozilla48
Tracking Status
firefox47 --- affected
firefox48 --- fixed

People

(Reporter: stransky, Unassigned)

References

()

Details

Attachments

(3 files)

Fedora 24 migrated to GCC 6.0 and Firefox (44, Trunk) fails to build here.

libstdc++-devel-6.0.0-0.7 ships /usr/include/c++/6.0.0/stdlib.h which is fetched by mozilla header wrappers instead of plain /usr/include/stdlib.h.

Fails (6.0.0):
# 2 "../../dist/system_wrappers/stdlib.h" 3
# 1 "/usr/include/c++/6.0.0/stdlib.h" 1 3
# 1 "../../dist/stl_wrappers/cstdlib" 1 3

Ok (5.x)
# 2 "../../dist/system_wrappers/stdlib.h" 3
# 1 "/usr/include/stdlib.h" 1 3 4

Those lines in "../../dist/system_wrappers/stdlib.h" seems to workaround it:

#define _GLIBCXX_INCLUDE_NEXT_C_HEADERS
#include_next <stdlib.h>
#undef _GLIBCXX_INCLUDE_NEXT_C_HEADERS
Mike, any idea here?
Flags: needinfo?(mh+mozilla)
Trevor, that sounds like what you were looking at the other day, doesn't it?
Flags: needinfo?(mh+mozilla) → needinfo?(tbsaunde+mozbugs)
(In reply to Mike Hommey [:glandium] from comment #3)
> Trevor, that sounds like what you were looking at the other day, doesn't it?

yeah, its the same issue.  I don't have a great idea, but I think I have an idea that at least doesn't involve touching nspr.  I should know how that works shotly.
Flags: needinfo?(tbsaunde+mozbugs)
> yeah, its the same issue.  I don't have a great idea, but I think I have an
> idea that at least doesn't involve touching nspr.  I should know how that
> works shotly.

actually noap.

We can't just include <stdlib.h> with _GLIBCXX_INCLUDE_NEXT_C_HEADERS defined in the stl wrappers because then libstdc++'s <stdlib.h> is the one we include and it tries to include our stdlib.h.  I don't think we can work around that with #include_next because I think that will still get the libstdc++ stdlib.h not the glibc one.

Which I think means we need to change the system wrappers (and so nspr's make-system-wrappers.pl)  That doesn't seem totally correct, including stdlib.h will get you what C says it should include not c++, but that's probably fine.

so nspr patch coming up.
libstdc++ has started shipping a wrapper for <stdlib.h> that includes
<cstdlib>.  our stl wrappers include <stdlib.h> from mozalloc.h, so we end up
in a cycle with our stdlib.h including libstdc++'s stdlib.h which includes our
cstdlib which includes our stdlib.h which includes libstdc++'s again which does
nothing.  So then the code in mozalloc.h comes before the contents of stdlib.h
which doesn't work.

We need to get stdlib.h's contents included before mozalloc.h.  We can't
include  stdlib.h from our cstdlib wrapper because that would still go through
the libstdc++ header which would bring in our cstdlib wrapper again.  As a
result we need to get libstdc++'s stdlib.h to include the real stdlib.h.
libstdc++ uses GLIBCXX_INCLUDE_NEXT_C_HEADERS internally when it needs to
include the real stdlib.h, so we can abuse it to do the same.  This isn't quiet
perfect because it means we get only what C says is in the standard headers not
what C++ says is there, but that's probably fine for now, and its certainly
better than not building at all.
Attachment #8715553 - Flags: review?(mh+mozilla)
I just hit this for nightly on Fedora rawhide today.  Looked and found this bugzilla, which seems to indicate a fix was implemented.  Will that fix get to nightly soon?
(In reply to stan from comment #7)
> I just hit this for nightly on Fedora rawhide today.  

For Fedora rawhide you may also consider Bug 1245783
I'll keep that in mind once I get a successful compile.
I'm still not able to successfully compile nightly without the patch on fedora rawhide with gcc 6.  It does successfully compile with gcc 4.9.  If I apply the patch to my local hg repository so I can compile nightly on rawhide, will it no longer compile with gcc 4.9?  Is there a way to compile one with the patch and one without the patch using hg?

Thanks.
The previous comment doesn't make it clear.  I'm compiling on two different installs, but from the same hg repository.
I patched using the above patch, and nightly compiled successfully.  But when I tried to start it, it immediately crashed.  I have pulled the latest changes, and disabled gio as the above ticket suggested.  Will see what happens.
That also compiled successfully, but crashed on start.
To try to trace the problem of why nightly was crashing at start, I changed the compile flags I use from " -O3 " to " -ggdb -O0 ".  My intention was to run firefox in gdb from a terminal, and see the error it hit.  Just for kicks, I tried starting nightly without gdb, and nightly runs fine with those flags, if slowly.  So, the patch above seems to fix the compile problem, but gcc 6 appears to have problems with O3 optimization.  Next, I suppose I should try " -O2 ".

This is still with gio disabled, but I don't seem to have the crash problem after running for a short time, that the other ticket documented.
I used " -ggdb -O2 " as compile options, and the system crashed on start again.  When I ran using gdb it said that there was an error sig38.  It happened in different libraries.

After this failed start, the regular production version of firefox would not start.  It kept getting segmentation faults.  But I started it in safe mode, and after that it runs normally.  Currently, I'm recompiling nightly with " -ggdb -O0 ", and will use that, with occasional testing to see if optimization is working.
The final error messages from a run of firefox in gdb, compiled with " -ggdb -O2 ".

(firefox:23948): Gtk-WARNING **: State 0 for GtkMenuItem 0x7fffd962a6a0 doesn't match state 128 set via gtk_style_context_set_state ()

Thread 1 "firefox" received signal SIGBUS, Bus error.
0x00007fffe78a3fe2 in ?? () from /usr/local/lib64/firefox-48.0a1/libxul.so
Comment on attachment 8715553 [details] [diff] [review]
define _GLIBCXX_INCLUDE_NEXT_C_HEADERS when including system headers

Review of attachment 8715553 [details] [diff] [review]:
-----------------------------------------------------------------

After thoroughly analysing the problem, I have what I think is a better approach... provided my build ends (but it's going much farther than without my patch)
Attachment #8715553 - Flags: review?(mh+mozilla) → review-
(In reply to stan from comment #13)
> That also compiled successfully, but crashed on start.

This would be a separate bug, possibly even a GCC bug (actually quite likely).
huh... failed with /usr/bin/../lib/gcc/x86_64-linux-gnu/6.0.0/../../../../include/c++/6.0.0/bits/move.h:45:3: error: templates must have C++ linkage
oh, but I'm building with clang + libstdc++ 6, I'm asking for trouble.
Summary: [GCC 6.0] Firefox fails to build - undefined malloc() free() and so on → [libstdc++6] Firefox fails to build - undefined malloc() free() and so on
By including both, the compiler ends up prefering abs(double) instead of
abs(int) for the abs call with a signed char, and subsequently fails to
compile, not wanting to add a double to a pointer.

Review commit: https://reviewboard.mozilla.org/r/39193/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/39193/
Attachment #8728945 - Flags: review?(jfkthame)
Our STL wrappers do various different things, one of which is including
mozalloc.h for infallible operator new. mozalloc.h includes stdlib.h,
which, in libstdc++ >= 6 is now itself a wrapper around cstdlib, which
circles back to our STL wrapper.

But of the things our STL wrappers do, including mozalloc.h is not one
that is necessary for cstdlib. So skip including mozalloc.h in our
cstdlib wrapper.

Additionally, some C++ sources (in media/mtransport) are including
headers in an extern "C" block, which end up including stdlib.h, which
ends up including cstdlib because really, this is all C++, and our
wrapper pre-includes <new> for mozalloc.h, which fails because templates
don't work inside extern "C". So, don't pre-include <new> when we're not
including mozalloc.h.

Review commit: https://reviewboard.mozilla.org/r/39195/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/39195/
Attachment #8728947 - Flags: review?(nfroyd)
Note we could probably get away with not wrapping cstlib at all, but I'm not sure how the __throw stuff in it plays without including throw_gcc.h.
(In reply to Mike Hommey [:glandium] from comment #22)
> Created attachment 8728945 [details]
> MozReview Request: Bug 1245076 - Work around unified sources including both
> cmath and cstdlib in graphite2
> 
> By including both, the compiler ends up prefering abs(double) instead of
> abs(int) for the abs call with a signed char, ...

Is that correct behavior by the compiler? I'm not intimately familiar with all the details of overload resolution, but I'd have assumed an integral promotion would be preferred over an integer-to-floating conversion. That seems to be implied by the "Ranking of implicit conversion sequences" mentioned in [1], but maybe there are other considerations that override it?

[1] http://en.cppreference.com/w/cpp/language/overload_resolution
Comment on attachment 8728947 [details]
MozReview Request: Bug 1245076 - Don't include mozalloc.h from the cstdlib wrapper

https://reviewboard.mozilla.org/r/39195/#review35899

The media/mtransport/ extern "C" stuff makes me sad.  Can we not get that fixed?

::: config/gcc-stl-wrapper.template.h:20
(Diff revision 1)
> +#ifndef moz_dont_include_mozalloc_for_cstdlib

I think a comment here, perhaps just pointing back to this bug, would be splendid.
Attachment #8728947 - Flags: review?(nfroyd) → review+
(In reply to Mike Hommey [:glandium] from comment #18)
> Comment on attachment 8715553 [details] [diff] [review]
> define _GLIBCXX_INCLUDE_NEXT_C_HEADERS when including system headers
> 
> Review of attachment 8715553 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> After thoroughly analysing the problem, I have what I think is a better
> approach... provided my build ends (but it's going much farther than without
> my patch)

great, I'm glad we don't need that patch and sorry I didn't think of this.
https://reviewboard.mozilla.org/r/39195/#review35899

I'm sure that if you created a patch, bcampen would be happy to review it.  I'm fairly familiar with that code, so I don't see why the C files can't include stdlib.h if they need it.
https://reviewboard.mozilla.org/r/39195/#review35899

The problem is C++ code including C code including stdlib.h in a extern "C" block.
https://reviewboard.mozilla.org/r/39195/#review35899

Specifically, the problem is that since this is really C++ code including stdlib.h, libstdc++ 6 headers make that include cstdlib.
(In reply to Jonathan Kew (:jfkthame) from comment #25)
> (In reply to Mike Hommey [:glandium] from comment #22)
> > Created attachment 8728945 [details]
> > MozReview Request: Bug 1245076 - Work around unified sources including both
> > cmath and cstdlib in graphite2
> > 
> > By including both, the compiler ends up prefering abs(double) instead of
> > abs(int) for the abs call with a signed char, ...
> 
> Is that correct behavior by the compiler? I'm not intimately familiar with
> all the details of overload resolution, but I'd have assumed an integral
> promotion would be preferred over an integer-to-floating conversion. That
> seems to be implied by the "Ranking of implicit conversion sequences"
> mentioned in [1], but maybe there are other considerations that override it?
> 
> [1] http://en.cppreference.com/w/cpp/language/overload_resolution

Thanks for making me look deeper. It turns out this all comes from Collider.cpp including math.h instead of cmath.
Attachment #8728945 - Attachment description: MozReview Request: Bug 1245076 - Work around unified sources including both cmath and cstdlib in graphite2 → MozReview Request: Bug 1245076 - Include cmath instead of math.h in Collider.cpp
Comment on attachment 8728945 [details]
MozReview Request: Bug 1245076 - Include cmath instead of math.h in Collider.cpp

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/39193/diff/1-2/
Comment on attachment 8728947 [details]
MozReview Request: Bug 1245076 - Don't include mozalloc.h from the cstdlib wrapper

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/39195/diff/1-2/
Comment on attachment 8728945 [details]
MozReview Request: Bug 1245076 - Include cmath instead of math.h in Collider.cpp

https://reviewboard.mozilla.org/r/39193/#review36113

Ah, that makes more sense - thanks for tracking it down.

(I've sent upstream a message asking them to fix this, so the problem doesn't reappear each time we take an update.)
Attachment #8728945 - Flags: review?(jfkthame) → review+
https://hg.mozilla.org/mozilla-central/rev/27c94617d706
https://hg.mozilla.org/mozilla-central/rev/55212130f19d
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
First, I can confirm that nightly compiles successfully with these changes.  Not that you needed my confirmation, but it doesn't hurt anything.

@comment19, Mike.  I opened a ticket against gcc at fedora, https://bugzilla.redhat.com/show_bug.cgi?id=1316999, and received the following response.  It seems this is a known issue with gcc6 and firefox.

From Jakub Jelinek:
Please try -flifetime-dse=1 or -fno-lifetime-dse and/or -fno-delete-null-pointer-checks, forgot which of these violates firefox, whether trying to preserve data from before construction to after construction, or calling methods on NULL pointers (or both).

I'll be following that advice.  If you have any insight to offer, I'd be happy to receive it, as well.
Thanks.
Jakub provided a reference to c++ changes in gcc6.
https://gcc.gnu.org/gcc-6/porting_to.html

I am currently trying to compile using this advice:

Optimizations remove null pointer checks for this

When optimizing, GCC now assumes the this pointer can never be null, which is guaranteed by the language rules. Invalid programs which assume it is OK to invoke a member function through a null pointer (possibly relying on checks like this != NULL) may crash or otherwise fail at run time if null pointer checks are optimized away. With the -Wnull-dereference option the compiler tries to warn when it detects such invalid code.

If the program cannot be fixed to remove the undefined behavior then the option -fno-delete-null-pointer-checks can be used to disable this optimization. That option also disables other optimizations involving pointers, not only those involving this. 

If using
ac_add_options --enable-optimize=" -Wnull-dereference -fno-delete-null-pointer-checks -O3" 
compiles and runs correctly, then it is likely that the issue is in nightly code incompatible with C++14, I think.
I tried several variations of options, and nightly crashes on start with any optimization regardless of what I do.
ac_add_options --enable-optimize=" -Wall -std=gnu++11 -O3"
ac_add_options --enable-optimize=" -Wall -std=gnu++98 -O3"  # this wouldn't compile
ac_add_options --enable-optimize=" -Wall -fno-delete-null-pointer-checks -O3"
ac_add_options --enable-optimize=" -Wnull-dereference -fno-delete-null-pointer-checks -O3"

I don't know whether it is gcc or nightly with the problem.  I'm out of my depth here.  So, I'll just drop it.
(In reply to Jonathan Kew (:jfkthame) from comment #34)
> (I've sent upstream a message asking them to fix this, so the problem
> doesn't reappear each time we take an update.)

FWIW: https://github.com/silnrsi/graphite/commit/fc73a64f5f86bb00ae082a7991497cbc392bf3fd
Someone at Fedora pointed out that I hadn't tried the -fno-lifetime-dse compiler option mentioned by Jakub.  When I did that, using this option in .mozconfig,
ac_add_options --enable-optimize=" -Wall -fno-lifetime-dse -fno-delete-null-pointer-checks -O3"
the latest nightly compiled fine, and is running fine.  I am using it to write this comment.

So, there seems to be something in the coding style of nightly that runs afoul of the new standards in gcc 6, and that compiler switch fixes it.

That's great, I have my nightly back.  And there is a work around for optimization of nightly in gcc 6.
(In reply to stan from comment #41)
> So, there seems to be something in the coding style of nightly that runs
> afoul of the new standards in gcc 6, and that compiler switch fixes it.

... or not. Because my gcc6 build with the Debian experimental package (6-20160228-1) doesn't crash.
This is offtopic in this bug anyways.
Depends on: 1259537
Please note that applying ONLY https://bugzilla.mozilla.org/attachment.cgi?id=8715553 does NOT seem to fix the issue. So, please be careful...

I have now also applied https://hg.mozilla.org/mozilla-central/rev/55212130f19d and it seems to resume the build process... it hasn't finished yet, will update shortly on the outcome.
After checking more carefully, it seems that https://bugzilla.mozilla.org/attachment.cgi?id=8715553 is NOT required.
Patch https://hg.mozilla.org/mozilla-central/rev/55212130f19d fixes the problem for firefox 45.0.1 (no other patches needed).
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.