Closed Bug 1471698 Opened Last year Closed 6 months ago

Work around binutils corrupting mingw clang binaries.

Categories

(Firefox Build System :: General: Unsupported Platforms, enhancement, P5)

enhancement

Tracking

(firefox68- fixed, firefox69 fixed)

RESOLVED FIXED
mozilla69
Tracking Status
firefox68 - fixed
firefox69 --- fixed

People

(Reporter: jacek, Assigned: tjr)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Attachments

(5 files, 1 obsolete file)

The problem is that binutils strip/objcopy corrupts clang+lld-produced libraries unless they are stripped already by lld. Here is an upstream bug report:

https://sourceware.org/bugzilla/show_bug.cgi?id=23348

Ideally, it'd be fixed in binutils (or we'd use other tool), but until then I use a workaround suggested by Martin in bug 1390583 and use -Wl,-s to let lld strip binaries. That's what's in mozconfig in bug 1465800. It works mostly well, except module order test doesn't work because at this point we don't have required symbols.
Comment on attachment 8988289 [details] [diff] [review]
Bug 1471698 - Skip module order test if binaries are stripped by linker

https://reviewboard.mozilla.org/r/253548/#review260290

How about using -S instead of -s, leaving the symbols table intact? Also note you've been bitrotted by bug 1470127 and bug 1471132.

Actually, the symbols should be exported, so it shouldn't matter that it's stripped, so really I wonder what's wrong here. The new check code would give a more useful output.
Attachment #8988289 - Flags: review?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #2)
> 
> How about using -S instead of -s, leaving the symbols table intact?

I'm afraid that lld doesn't support that option (yet), but I guess I could try to implement it.
(In reply to Martin Storsjö from comment #3)
> (In reply to Mike Hommey [:glandium] from comment #2)
> > 
> > How about using -S instead of -s, leaving the symbols table intact?
> 
> I'm afraid that lld doesn't support that option (yet), but I guess I could
> try to implement it.

The -S option is now implemented in LLD since SVN r335947.
Component: General → General: Unsupported Platforms
Product: Core → Firefox Build System
I did quick testing and it seems to work. Thanks!
Attachment #8988289 - Attachment is obsolete: true
Attachment #8988289 - Attachment is patch: true
Attachment #8988289 - Attachment mime type: text/x-review-board-request → text/plain
FWIW, some initial progress on debugging the corruption issue. When LLD produces debug info, the .reloc section contains base relocations for the debug data sections as well. When stripping out the debug data sections, the .reloc section still is intact and contains all of it, including base relocations to sections that no longer exist. When GCC/binutils produces debug info, the .reloc section doesn't seem to contain base relocations for the debug data, so it's fine after stripping as well.

I've yet to study it closer to see why LLD produces these base relocations and why binutils doesn't; if LLD is doing something wrong, or if it's just a case of binutils not handling this case.
I have patches locally that fix this issue in LLD now; see https://reviews.llvm.org/D49350 and https://reviews.llvm.org/D49351.
Attached file sxstrace.txt
So I worked on this a bit.  Here's a build removing the LDFLAGS/objcopy/strip: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0defbcf318df8ece703574b378a656077c7978dd

The builds do not run, with the error 
> The application has failed to start because its side-by-side configuration is incorrect. Please see the application event log or use the command-line sxstrace.exe tool for more detail.

This probably has something to do with this recent commit: https://hg.mozilla.org/mozilla-central/rev/3eb04f5363eb  (But if we leave our existing strip/bin/true workaround in, the build runs fine.)
This problem is currently present in m-c builds even if objcopy/strip hack is used; it doesn't seem related and it's covered by bug 1515982. I think that the patch is fine to go to the tree.

Martin added support for COFF files in llvm-objcopy, so we may use it now. I'd suggest to update to current LLVM trunk and try to use llvm-objcopy.

(In reply to Jacek Caban from comment #11)

Martin added support for COFF files in llvm-objcopy, so we may use it now.
I'd suggest to update to current LLVM trunk and try to use llvm-objcopy.

I gave it a try and used r351992 which should contain the llvm-objcopy fixes if I see that correctly. I bumped the mingw-w64 revision to 8b2c7826b8d68e2ffc79c286b8792efe4168f666 to avoid the _xgetbv builtin issues.

However, I still run into errors related to that. See e.g. in skia code:

9:15.37 /var/tmp/build/firefox-4d0f9fa5fdd5/gfx/skia/skia/src/core/SkCpu.cpp:20:55: error: '__builtin_ia32_xgetbv' needs target feature xsave
9:15.37 static uint64_t xgetbv(uint32_t xcr) { return _xgetbv(xcr); }
9:15.37 ^
9:15.37 /var/tmp/dist/mingw-w64-clang/lib/clang/9.0.0/include/xsaveintrin.h:49:20: note: expanded from macro '_xgetbv'
9:15.37 #define _xgetbv(A) __builtin_ia32_xgetbv((long long)(A))
9:15.37 ^
9:15.40 1 error generated.

Or when compiling the sandbox:

5:30.69 /var/tmp/build/firefox-4d0f9fa5fdd5/security/sandbox/chromium/base/cpu.cc:84:27: error: expected ')'
5:30.69 uint64_t _xgetbv(uint32_t xcr) {
5:30.69 ^
5:30.69 /var/tmp/build/firefox-4d0f9fa5fdd5/security/sandbox/chromium/base/cpu.cc:84:10: note: to match this '('
5:30.69 uint64_t _xgetbv(uint32_t xcr) {
5:30.69 ^
5:30.69 /var/tmp/dist/mingw-w64-clang/lib/clang/9.0.0/include/xsaveintrin.h:49:53: note: expanded from macro '_xgetbv'
5:30.69 #define _xgetbv(A) __builtin_ia32_xgetbv((long long)(A))
5:30.69 ^
5:30.69 /var/tmp/build/firefox-4d0f9fa5fdd5/security/sandbox/chromium/base/cpu.cc:84:10: error: expected expression
5:30.69 uint64_t _xgetbv(uint32_t xcr) {
5:30.69 ^
5:30.69 /var/tmp/dist/mingw-w64-clang/lib/clang/9.0.0/include/xsaveintrin.h:49:56: note: expanded from macro '_xgetbv'
5:30.69 #define _xgetbv(A) __builtin_ia32_xgetbv((long long)(A))
5:30.69 ^
5:30.69 /var/tmp/build/firefox-4d0f9fa5fdd5/security/sandbox/chromium/base/cpu.cc:84:31: error: expected ';' after top level declarator
5:30.69 uint64_t _xgetbv(uint32_t xcr) {
5:30.69 ^
5:30.69 ;
5:30.69 /var/tmp/build/firefox-4d0f9fa5fdd5/security/sandbox/chromium/base/cpu.cc:187:10: error: called object type 'uint64_t' (aka 'unsigned long long') is not a function or function pointer
5:30.69 (_xgetbv(0) & 6) == 6 /* XSAVE enabled by kernel */;
5:30.69 ^~~~~~~~~~
5:30.69 /var/tmp/dist/mingw-w64-clang/lib/clang/9.0.0/include/xsaveintrin.h:49:41: note: expanded from macro '_xgetbv'
5:30.69 #define _xgetbv(A) __builtin_ia32_xgetbv((long long)(A))
5:30.69 ~~~~~~~~~~~~~~~~~~~~~^
5:30.72 1 warning and 4 errors generated.
5:30.72 /var/tmp/build/firefox-4d0f9fa5fdd5/config/rules.mk:1054: recipe for target 'cpu.o' failed
5:30.72 make[4]: *** [cpu.o] Error 1

That's with ESR 60. I am not sure if that is a problem for mozilla-central, too. Martin: Are those errors expected (i.e. we need updated skia and sandbox code, too) or is the _xgetbv issue not fully resolved yet?

Flags: needinfo?(martin)

(In reply to Georg Koppen from comment #12)

(In reply to Jacek Caban from comment #11)

Martin added support for COFF files in llvm-objcopy, so we may use it now.
I'd suggest to update to current LLVM trunk and try to use llvm-objcopy.

I gave it a try and used r351992 which should contain the llvm-objcopy fixes if I see that correctly. I bumped the mingw-w64 revision to 8b2c7826b8d68e2ffc79c286b8792efe4168f666 to avoid the _xgetbv builtin issues.

However, I still run into errors related to that. See e.g. in skia code:

9:15.37 /var/tmp/build/firefox-4d0f9fa5fdd5/gfx/skia/skia/src/core/SkCpu.cpp:20:55: error: '__builtin_ia32_xgetbv' needs target feature xsave
9:15.37 static uint64_t xgetbv(uint32_t xcr) { return _xgetbv(xcr); }
9:15.37 ^
9:15.37 /var/tmp/dist/mingw-w64-clang/lib/clang/9.0.0/include/xsaveintrin.h:49:20: note: expanded from macro '_xgetbv'
9:15.37 #define _xgetbv(A) __builtin_ia32_xgetbv((long long)(A))
9:15.37 ^
9:15.40 1 error generated.

Or when compiling the sandbox:

5:30.69 /var/tmp/build/firefox-4d0f9fa5fdd5/security/sandbox/chromium/base/cpu.cc:84:27: error: expected ')'
5:30.69 uint64_t _xgetbv(uint32_t xcr) {
5:30.69 ^

That's with ESR 60. I am not sure if that is a problem for mozilla-central, too. Martin: Are those errors expected (i.e. we need updated skia and sandbox code, too) or is the _xgetbv issue not fully resolved yet?

I don't regularly build the skia and chromium sandbox code, so I don't know, but I would expect them to be resolved very soon, as chromium has got buildbots that run ~daily with latest clang (and they bump the official builds to use a newer clang every couple weeks or so, afaik).

Flags: needinfo?(martin)

(In reply to Martin Storsjö from comment #13)

(In reply to Georg Koppen from comment #12)

(In reply to Jacek Caban from comment #11)

Martin added support for COFF files in llvm-objcopy, so we may use it now.
I'd suggest to update to current LLVM trunk and try to use llvm-objcopy.

I gave it a try and used r351992 which should contain the llvm-objcopy fixes if I see that correctly. I bumped the mingw-w64 revision to 8b2c7826b8d68e2ffc79c286b8792efe4168f666 to avoid the _xgetbv builtin issues.

However, I still run into errors related to that. See e.g. in skia code:

9:15.37 /var/tmp/build/firefox-4d0f9fa5fdd5/gfx/skia/skia/src/core/SkCpu.cpp:20:55: error: '__builtin_ia32_xgetbv' needs target feature xsave
9:15.37 static uint64_t xgetbv(uint32_t xcr) { return _xgetbv(xcr); }
9:15.37 ^
9:15.37 /var/tmp/dist/mingw-w64-clang/lib/clang/9.0.0/include/xsaveintrin.h:49:20: note: expanded from macro '_xgetbv'
9:15.37 #define _xgetbv(A) __builtin_ia32_xgetbv((long long)(A))
9:15.37 ^
9:15.40 1 error generated.

Or when compiling the sandbox:

5:30.69 /var/tmp/build/firefox-4d0f9fa5fdd5/security/sandbox/chromium/base/cpu.cc:84:27: error: expected ')'
5:30.69 uint64_t _xgetbv(uint32_t xcr) {
5:30.69 ^

That's with ESR 60. I am not sure if that is a problem for mozilla-central, too. Martin: Are those errors expected (i.e. we need updated skia and sandbox code, too) or is the _xgetbv issue not fully resolved yet?

I don't regularly build the skia and chromium sandbox code, so I don't know,
but I would expect them to be resolved very soon, as chromium has got
buildbots that run ~daily with latest clang (and they bump the official
builds to use a newer clang every couple weeks or so, afaik).

Fair enough. Turns out the sandbox issue was already fixed on trunk but the skia one not. I fixed those with attached patches, but there is more broken related to xgetbv in Mozilla code after switching to r351992, alas. :(

Attached patch cpu.patchSplinter Review
Attached patch skia.patchSplinter Review

(In reply to Georg Koppen from comment #14)

Fair enough. Turns out the sandbox issue was already fixed on trunk but the skia one not. I fixed those with attached patches, but there is more broken related to xgetbv in Mozilla code after switching to r351992, alas. :(

Yes, the _xgetbv change seems to be breaking quite a bit of things all over the place.

The change isn't in the 8.0 release branch (it was committed just before the branch, but reverted due to some breakage, and then reapplied after the branch was made). So if it's generally deemed problematic, perhaps it still can be backed out, but that would require actually reporting the issues to the ones who made the change, instead of just fixing all broken projects silently. So far I've had to fix mingw-w64 and Qt.

Priority: -- → P5

I have a green try run for llvm bumped to 359019 (which is right before the bug in Bug 1548624) with three xgetbv patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=8410a7de12c868559edb782fb2d3efa1da3cadbb

And here's a try run with the objcopy changes and clang on trunk: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d4b2ec95091c09cc8655d9f6bf07fb939da9f345

I tested all four builds and they work.

Okay; I switched to the 8.0 release branch at the urging of GeKo and glandium. Jacek can you take a look at these two patches and confirm they're good, then I'll post them for review: https://treeherder.mozilla.org/#/jobs?repo=try&revision=be9059e23dff13359f594b3bf28717d1b27e5a2f

Flags: needinfo?(jacek)

A couple of comments:

  1. Shouldn't the llvm_revision be 356265 like in the other clang-8*.json files?

  •  -DCMAKE_CXX_FLAGS="${DEBUG_FLAGS} -Wno-dll-attribute-on-redeclaration -nostdinc++ -I$SRC_DIR/libcxx/include -DPSAPI_VERSION=2" \
    
  •   -DCMAKE_C_FLAGS="-Wno-dll-attribute-on-redeclaration" \
    

Indentation.

  •  DLLVM_COMPILER_CHECKED=TRUE \
    

It seems you missed a "-" here, but nothing broke. Maybe that option is not needed then?

Okay new try.

Fixed the indent, the dash, and included all the objcopy patches that Martin pointed me to. Build runs and seems good to me. https://treeherder.mozilla.org/#/jobs?repo=try&revision=cf62265191aae0922259dbaec2c07445cb3374bc

binutils part looks good to me. Reverting to branch 8 is a temporary solution does not exactly solve the problem, but I guess that's fine.

Flags: needinfo?(jacek)

As Martin pointed, we could use llvm-objcopy/llvm-strip instead and simplify the toolchain by using those and getting rid of binutils.

This will match the compiler version Tor would like. We backport several
llvm-objcopy patches that landed right after the 8 branch though. We
also grab some upstream changes from mingw-clang in the build script

Pushed by ccoroiu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/adaa62b87e68
Switch the mingw clang compiler to the 8 branch r=froydnj

Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 6 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla69

Comment on attachment 9065229 [details]
Bug 1471698 - Switch the mingw clang compiler to the 8 branch r?#build

Beta/Release Uplift Approval Request

  • User impact if declined: Tor will have to carry the patch and the ESR branch of 68 will use a different compiler than Tor will use.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Affects only the mingw-clang build.
  • String changes made/needed:
Attachment #9065229 - Flags: approval-mozilla-beta?
Pushed by cbrindusan@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7de6c5aab8b4
Remove our binutils-corruption-avoiding workaround for mingw-clang r=froydnj

Comment on attachment 9065229 [details]
Bug 1471698 - Switch the mingw clang compiler to the 8 branch r?#build

mingwclang updates for 68.0b4

Attachment #9065229 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9065230 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Now that Bug 1553481 is fixed, we can commit the second half of this (and uplift it to beta).

Keywords: checkin-needed

Pushed by apavel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cd5d8525541f
Remove our binutils-corruption-avoiding workaround for mingw-clang r=froydnj

Keywords: checkin-needed
Flags: qe-verify-

flagging for beta checkin of part 2.

Whiteboard: [checkin-needed-beta]
Whiteboard: [checkin-needed-beta]
You need to log in before you can comment on or make changes to this bug.