Closed Bug 1332788 Opened 7 years ago Closed 7 years ago

PGO build fails with gcc-6.3: Unified_cpp_gfx_thebes1.o: requires dynamic R_X86_64_PC32 reloc against '[too long, see below]' which may overflow at runtime; recompile with -fPIC

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 1389436

People

(Reporter: bircoph, Unassigned)

Details

Attachments

(5 files)

Attached file build.log.xz
User Agent: Mozilla/5.0 (X11; Linux i686; rv:51.0) Gecko/20100101 Firefox/51.0
Build ID: 20170117162408

Steps to reproduce:

I'm building firefox-51.0_beta14 with PGO enabled, I tried with -fPIC added to CFLAGS/CXXFLAGS, but it still fails.

Without PGO the very same configuration builds and works fine.


Actual results:

Build environment: amd64 arch, gcc-6.3.0, binutils-2.27. Gold linker is used, but the same problem is present with BFD one.

1. Original error message:

/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: /var/tmp/portage/www-client/firefox-51.0_beta14/work/firefox-51.0b14/ff/t
oolkit/library/../../gfx/thebes/Unified_cpp_gfx_thebes1.o: requires dynamic R_X86_64_PC32 reloc against '_ZN7gfxFont13GetShapedWordIhEEP13gfxShapedWordPN7mozil
la3gfx10DrawTargetEPKT_jjNS3_7unicode6ScriptEbijP18gfxTextPerfMetrics' which may overflow at runtime; recompile with -fPIC
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: error: read-only segment has dynamic relocations
/var/tmp/portage/www-client/firefox-51.0_beta14/work/firefox-51.0b14/ff/toolkit/library/../../gfx/thebes/Unified_cpp_gfx_thebes1.o:Unified_cpp_gfx_thebes1.cpp:
function gfxTextRun::SetSpaceGlyph(gfxFont*, mozilla::gfx::DrawTarget*, unsigned int, unsigned short) [clone .cold.358]: error: undefined reference to 'gfxShap
edWord* gfxFont::GetShapedWord<unsigned char>(mozilla::gfx::DrawTarget*, unsigned char const*, unsigned int, unsigned int, mozilla::unicode::Script, bool, int,
 unsigned int, gfxTextPerfMetrics*)'
collect2: error: ld returned 1 exit status

2. I recompiled with -fPIC as asked:

../../gfx/thebes/Unified_cpp_gfx_thebes1.o: In function `gfxTextRun::SetSpaceGlyph(gfxFont*, mozilla::gfx::DrawTarget*, unsigned int, unsigned short) [clone .c
old.358]':
Unified_cpp_gfx_thebes1.cpp:(.text.unlikely+0x72d2): undefined reference to `gfxShapedWord* gfxFont::GetShapedWord<unsigned char>(mozilla::gfx::DrawTarget*, un
signed char const*, unsigned int, unsigned int, mozilla::unicode::Script, bool, int, unsigned int, gfxTextPerfMetrics*)'
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: libxul.so: hidden symbol `_ZN7gfxFont13GetShapedWordIhEEP13gfxShapedWordPN7mozil
la3gfx10DrawTargetEPKT_jjNS3_7unicode6ScriptEbijP18gfxTextPerfMetrics' isn't defined
/usr/lib/gcc/x86_64-pc-linux-gnu/6.3.0/../../../../x86_64-pc-linux-gnu/bin/ld: final link failed: Bad value
collect2: error: ld returned 1 exit status

Problematic symbol is still not found. Full build.log is attached below.


Expected results:

PGO build should succeed.

It worked fine with firefox-48.0 on the same box, but with gcc-4.9. With gcc-6.3 firefox-48.0 fails as well. Maybe a regression in gcc, maybe something wrong with firefox code was triggered by gcc-6.3

Unfortunately I can't test firefox-51.0_beta14 with gcc-4.9 due to incompatible libstdc++ changes (binary withot gcc ABI tags can't be linked with libraries with gcc ABI tags).
Component: Untriaged → Build Config
Product: Firefox → Core
trying to build latest central trunk fails the same way
Previous bugs with similar issues are bug 1303085 and bug 1194520.

The common fix is to add a missing header to the system-headers file/list. I'm not a C++ guru. Perhaps froydnj can help tackle this.
Flags: needinfo?(nfroyd)
Given that a non-PGO build succeeds and the symbol that it's complaining about is an internal, Firefox-specific symbol, I don't think this is the same problem as bug 1303085 or bug 1194520.  Usually it'd be a problem with some symbol exported by the system.

Part of the problem might be that gfxFont::GetShapedWord is declared as a template function:

https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFont.h#1754

but its definition is not visible from the header file:

https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFont.cpp#2543

and gfxFont::GetShapedWord is called in a single place outside of gfxFont.cpp:

https://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxTextRun.cpp#1374

and so, somehow, a PGO compilation doesn't find an instantiation inside gfxFont.cpp of the overload that's called in gfxTextRun.cpp.

I see, however, that we explicitly instantiate gfxFont::SplitAndInitTextRun, which calls gfxFont::GetShapedWord, to avoid these sort of problems:

http://dxr.mozilla.org/mozilla-central/source/gfx/thebes/gfxFont.cpp#3074

But the compiler isn't given any information that gfxFont::GetShapedWord might be called outside of the compilation unit gfxFont.cpp (or the unified file that contains it, which is different than the unified file that contains gfxTextRun.cpp).  So some bit of the compiler probably decides that the definition can be deleted during the PGO process, perhaps after it gets inlined into its only caller(s).  Inlining probably doesn't happen with a normal build, I'd bet, because gfxFont::GetShapedWord is...rather large.

I *think* what we need, then, is some declaration that informs the compiler about what's going on.  However, I'm not really set up to reproduce the problem; are you willing to test potential patches for the issue?
Flags: needinfo?(nfroyd) → needinfo?(bircoph)
There's a possibility that this is a GCC bug, too: it's not clear to me whether the compiler is permitted to discard the implicit instantiation of gfxFont::GetShapedWord resulting from the explicit instantiation of gfxFont::SplitAndInitTextRun, or whether it must retain it for potential references external to the gfxFont.cpp compilation unit.  I'd think it would have to retain it, just as it would retain any other templated method instantiation, but I also don't feel like I have a good understanding on how external templates like this are supposed to work.
needinfo to arya for the question at the end of comment 3; I just need one person willing to test some patches.
Flags: needinfo?(arya)
Here's a patch that at least compiles locally for me, so I think all the
syntax-y bits are correct.  What I'm interested in knowing is whether:

1) This patch fixes the PGO build issues;
2) This patch, minus the gfxFont.h changes, fixes the PGO build issues.
   I added the gfxFont.h changes because they seemed like a good idea, but
   if just the gfxFont.cpp changes are necessary, I'd rather not duplicate
   function prototypes all over the place.
Thanks for the patch, I'll test it and report back. This will take some time.
trying the patch as well; will get back to you later
it built! but the text is garbled; see linked image.

https://i.abcdn.co/ff54%20gcc6.3%20patch%20test.png
Same here: with proposed patch firefox-51.0 release builds fine, but fonts are broken like letters are eaten.
Flags: needinfo?(bircoph)
Attached image firefox.jpg
Example of broken page rendering with patch applied.
Are you building with -fno-schedule-insns2 -fno-delete-null-pointer-checks? If not, you need those.
(In reply to Mike Hommey [:glandium] from comment #12)
> Are you building with -fno-schedule-insns2 -fno-delete-null-pointer-checks?
> If not, you need those.

rebuilding
Flags: needinfo?(arya)
Attached image ff text.png
still same problem with the text
(In reply to Mike Hommey [:glandium] from comment #12)
> Are you building with -fno-schedule-insns2 -fno-delete-null-pointer-checks?
> If not, you need those.

Yes, user-provided C++ flags are:
-march=native -pipe -fno-delete-null-pointer-checks -fno-lifetime-dse -fno-schedule-insns2
Attached file build.log.xz
A build log of firefox-51 with proposed patch applied and PGO enabled. Build is successful, but fonts are broken. There are gfx-related warnings, maybe they will ring some bells.
I tried proposed patch with PGO disabled: works fine, no fonts mangling.
I do not understand this: PGO should not change code's functionality, only performance.
(In reply to bircoph from comment #17)
> I tried proposed patch with PGO disabled: works fine, no fonts mangling.
> I do not understand this: PGO should not change code's functionality, only
> performance.

That's the theory. We've historically run into several bugs with the PGO implementations of various compilers. That's one of the reasons why moz.build files allow you to disable PGO on a per-directory basis. See https://dxr.mozilla.org/mozilla-central/search?q=NO_PGO&case=true. So, adding `NO_PGO = True` is a workaround. But it would be nice to get a bug filed against the compiler when we can reproduce PGO behaving badly.
PGO also changes the optimization level of hot code to -O3. That alone can change the optimizer behavior.
putting NO_PGO = true in gfx/thebes/moz.build with or without the patch results in the same garbled text
I had this issue with GCC 7.1.1 and Firefox ESR 52.1.2; I fixed it by adding the following line after the definition of GetShapedWord:

template gfxShapedWord* gfxFont::GetShapedWord(DrawTarget *aDrawTarget, const unsigned char    *aText, uint32_t    aLength, uint32_t    aHash, Script      aRunScript, bool        aVertical, int32_t     aAppUnitsPerDevUnit, uint32_t    aFlags, gfxTextPerfMetrics *aTextPerf GFX_MAYBE_UNUSED);

I don't have the mentioned font issues, but I'm using gtk2, not gtk3, so that could be the cause.
Still not working PGO (gfxFont::GetShapedWord - error), Firefox 55, GCC 7.1. Patch is not match and the recommendation above, too. 

Really want to build fast and optimized Firefox. How long to wait for fix this error ?
tried to delete following profiling files and firefox was compiled successfully

Unified_cpp_gfx_thebes0.gcda
Unified_cpp_gfx_thebes1.gcda
gfxFontUtils.gcda

This is not the solution, but a temporary fix.
Status: UNCONFIRMED → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
Product: Core → Firefox Build System
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: