Closed Bug 1426865 Opened 3 years ago Closed 1 year ago

Missing symbols when building standalone SpiderMonkey 60 with Clang

Categories

(Core :: JavaScript Engine, defect, P2)

60 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla72
Tracking Status
firefox-esr52 --- wontfix
firefox-esr68 --- fixed
firefox57 --- wontfix
firefox58 --- wontfix
firefox72 --- fixed

People

(Reporter: lantw44, Assigned: ptomato)

References

(Blocks 1 open bug)

Details

Attachments

(11 files, 4 obsolete files)

447 bytes, patch
Details | Diff | Splinter Review
1.69 KB, patch
Details | Diff | Splinter Review
1.45 KB, patch
Details | Diff | Splinter Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
47 bytes, text/x-phabricator-request
Details | Review
13.02 KB, patch
Details | Diff | Splinter Review
4.72 KB, patch
Details | Diff | Splinter Review
24.57 KB, patch
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (X11; Fedora; Linux x86_64; rv:57.0) Gecko/20100101 Firefox/57.0
Build ID: 20171130114045

Steps to reproduce:

1. Build standalone SpiderMonkey 52 with '/path/to/js/src/configure --enable-posix-nspr-emulation --with-system-zlib --with-intl-api --disable-jemalloc' with Clang on FreeBSD.
2. SpiderMonkey 52 can be built and installed without errors, 
3. Build GNOME's gjs with 'configure --enable-installed-tests --enable-gtk-doc --disable-coverage --disable-Werror' with the same compiler on the same system.
4. gjs cannot be built with SpiderMonkey 52 because of missing symbols.

I have spent several days debugging this problem, and I will provide more details as comments and patches. This problem is caused by the behavior difference between GCC and Clang. We have to build SpiderMonkey 52 with the system compiler, which is Clang 4.0.0, because mixing LLVM libc++ and GCC libstdc++ is known to cause many serious problems.


Actual results:

gmake LIBTOOL=/usr/local/bin/libtool  all-am
gmake[1]: Entering directory '/home/lantw44/gnome/build/gjs'
  CXXLD    gjs-console
./.libs/libgjs.so: undefined reference to `void js::UnsafeTraceManuallyBarrieredEdge<JS::Value>(JSTracer*, JS::Value*, char const*)'
./.libs/libgjs.so: undefined reference to `void js::UnsafeTraceManuallyBarrieredEdge<JSObject*>(JSTracer*, JSObject**, char const*)'
./.libs/libgjs.so: undefined reference to `void js::UnsafeTraceManuallyBarrieredEdge<jsid>(JSTracer*, jsid*, char const*)'
./.libs/libgjs.so: undefined reference to `void JS::TraceEdge<JSObject*>(JSTracer*, JS::Heap<JSObject*>*, char const*)'
clang++: error: linker command failed with exit code 1 (use -v to see invocation)
gmake[1]: *** [Makefile:1821: gjs-console] Error 1
gmake[1]: Leaving directory '/home/lantw44/gnome/build/gjs'
gmake: *** [Makefile:1383: all] Error 2


Expected results:

gjs can be built successfully with SpiderMonkey 52.
This problem is caused by the behavior difference between GCC and Clang on '#pragma GCC visibility push(hidden)', which is used extensively in SpiderMonkey by compiling many things with '-include config/gcc_hidden.h'. It has been reported on LLVM  Bugzilla as https://bugs.llvm.org/show_bug.cgi?id=22254, but it hasn't been fixed.

Both the template 'UnsafeTraceManuallyBarrieredEdge' and 'TraceEdge' are defined and instantiated with 8 types in js/src/gc/Marking.cpp. When we run 'nm -D' on the shared library built by GCC, we can see all of the 8 types are available as weak symbols:

$ nm -DC libmozjs-52.so | grep 'TraceEdge<' | wc -l
8
$ nm -DC libmozjs-52.so | grep 'UnsafeTraceManuallyBarrieredEdge<' | wc -l
8

However, if we run 'nm -D' on the shared library built by Clang, we can find there are many missing symbols:

$ nm -DC libmozjs-52.so | grep 'TraceEdge<' | wc -l
7
$ nm -DC libmozjs-52.so | grep 'UnsafeTraceManuallyBarrieredEdge<' | wc -l
1

After searching the source code for these two templates, we find that types which are missing from the shared library are the ones used in the source code of SpiderMonkey itself. The problem is caused by files calling these two functions, not by js/src/gc/Marking.cpp itself. js/src/gc/Marking.cpp is compiled correctly with 8 weak symbols with default visibility.

By using 'readelf -sW' to read the symbol table of js/src/jsarray.o, or object files of any other source files using UnsafeTraceManuallyBarrieredEdge, we can find the difference between object files generated by GCC and Clang. When a template function with default visibility marked by __attribute__((visibility("default"))) is called with a type with hidden visibility because of the '#pragma GCC visibility push(hidden)' added to the top of the file, GCC generates a global undefined symbol with default visibility for the function call, while Clang generates a global undefined symbol with hidden visibility for it. When these object files are linked into a shared library, the global undefined symbol with hidden visibility converts the corresponding weak symbol defined by js/src/gc/Marking.cpp to a local symbol, making it unavailable to programs using the shared library.

It seems the only way to fix the problem is to mark these 8 types with default visibility. I found two possible solutions:

1. Add a pair of '#pragma GCC visibility push(default)' and '#pragma GCC visibility pop' to header files which declare these 8 types. gjs build can be easily fixed by adding a pair of them to js/public/TypeDecls.h, but I haven't check whether all 8 types are made available. I don't know whether putting this GCC-specific pragma in a public header to fix a Clang issue is acceptable.

2. Mark these 8 types with JS_PUBLIC_API, which is the same as __attribute__((visibility("default"))), when building a shared library. It is possible for multiple header files to declare the same type. I think it is better to modify all headers which have declarations of these 8 types to use JS_PUBLIC_API, but this will be a big change and doing so can take much time. If we want to keep the change small, we can get all types back by modifying only 3 header: js/public/TypeDecls.h, js/public/GCPolicyAPI.h, js/public/Realm.h. TypeDecls.h and GCPolicyAPI.h include all of these 8 types. The reason of modifying Realm.h is that js/src/vm/Realm.cpp includes Realm.h before TypeDecls.h, and Clang requires the visibility mark to be present on the first declaration. Putting the visibility mark on the second declaration without also marking the first declaration causes compilation error.
Waldo, would you be the right person for our JS API? Otherwise, can you need-info the best person?
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(jwalden+bmo)
Priority: -- → P2
I am totally not the right person to understand symbol visibility issues.  :-)  Forwarding along, a bit belatedly (sorry about that).
Flags: needinfo?(jwalden+bmo) → needinfo?(mh+mozilla)
The right thing would be to add JS_*_API annotations on the right symbols, *but*, I'm not the person to tell you whether those should be JS_PUBLIC_API or JS_FRIEND_API.
Flags: needinfo?(mh+mozilla)
(In reply to Mike Hommey [:glandium] from comment #6)
> The right thing would be to add JS_*_API annotations on the right symbols,
> *but*, I'm not the person to tell you whether those should be JS_PUBLIC_API
> or JS_FRIEND_API.

What do 'symbols' mean here? Both UnsafeTraceManuallyBarrieredEdge and TraceEdge are marked as JS_PUBLIC_API in TracingAPI.h. Do 'symbols' here mean the types which should be marked as JS_PUBLIC_API?
Sounds like solution 2 is the correct one, only someone needs to figure out whether it's JS_PUBLIC_API or JS_FRIEND_API. Mike, would you mind forwarding the needinfo to the right person?
Flags: needinfo?(mh+mozilla)
I'm not sure who the right person would be, but I hope Luke would know.
Flags: needinfo?(mh+mozilla) → needinfo?(luke)
Given GC symbols, I'll tentatively punt to sfink?
Flags: needinfo?(luke) → needinfo?(sphink)
Move version to 60 because it can be reproduced with mozjs-60.1.0.
Summary: Missing symbols when building standalone SpiderMonkey 52 with Clang → Missing symbols when building standalone SpiderMonkey 60 with Clang
Version: 52 Branch → 60 Branch
Comment on attachment 8997727 [details] [diff] [review]
Possible solution 2 patch made for SpiderMonkey 60

Review of attachment 8997727 [details] [diff] [review]:
-----------------------------------------------------------------

Huh. If this does the trick, I have no problem with it (including PUBLIC vs FRIEND).
Attachment #8997727 - Flags: review+
Comment on attachment 8997727 [details] [diff] [review]
Possible solution 2 patch made for SpiderMonkey 60

Review of attachment 8997727 [details] [diff] [review]:
-----------------------------------------------------------------

Nope. At least with the current code, `struct JS_PUBLIC_API(Foo)` does not compile.
Attachment #8997727 - Flags: review+ → review-
Comment on attachment 8997727 [details] [diff] [review]
Possible solution 2 patch made for SpiderMonkey 60

Review of attachment 8997727 [details] [diff] [review]:
-----------------------------------------------------------------

If I change it to eg

    class JS_PUBLIC_API JSAtom;

(as in, just remove the parentheses) it seems to be happy. Does that work for you?

Also, if I land this patch in the mainline, what should I use for the author on the patch? Right now, I have it just using your email.
Attachment #8997727 - Flags: feedback?(lantw44)

(In reply to Steve Fink [:sfink] [:s:] from comment #15)

class JS_PUBLIC_API JSAtom;

(as in, just remove the parentheses) it seems to be happy.

Parentheses need to be removed because of the later change https://bugzilla.mozilla.org/show_bug.cgi?id=1508065 that fixed application of clang-format. Using Gjs 1.55.4 with SpiderMonkey 60.1.0 including this patch, fine so far.

Hm. Ok, not so happy. That was enough to compile one sample file, but there are many other declarations that clang gets unhappy about because the visibility doesn't match. Sadly, it does not report where they are.

Is there any testing or compiler-warning-squashing I can do to move this patch along? I'm afraid I didn't understand what the compiler was complaining about.

Sorry for the late reply. This is the updated patch which doesn't have unnecessary parentheses.

Attachment #8997727 - Attachment is obsolete: true
Attachment #8997727 - Flags: feedback?(lantw44)

These need to be marked as public API, otherwise they are hidden by
default, and the symbols of instantiated template types using them will
not be exported when compiling with Clang.

The MFBT header forward-declares JSContext and JSObject, and since it is
included before any JSAPI headers, the visibility annotation needs to be
placed there as well.

I made a new version of the above patch that applies to master, it seems that the forward declarations of JSContext and JSObject in mfbt/RecordReplay.h are the first place those symbols are defined, which was causing the above mentioned compiler errors. Another option discussed with tcampbell would be to use void* in that header instead of forward declaring the symbols.

Well, this is no good: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c15341de04f50212595892d3c070432ad1bb731f&selectedJob=265199191

I guess something else that includes RecordReplay.h also forward-declares the symbols somewhere, but not a file that gets built in the SpiderMonkey-only build so I didn't catch it when building locally... I could move RecordReplay.h to use void* but presumably the build would still fail when the other unknown forward-declaration later encounters the forward declarations in TypeDecls.h.

Any advice?

Have the public interface just use void* pointers so that we don't have to
worry about symbol visibility.

Depends on D46744

Attachment #9090459 - Attachment is obsolete: true

Here's my latest attempt at fixing this. What I've done is just put JS_PUBLIC_API in front of every public API forward-declaration inside TypeDecls.h, and then everywhere else each of these declarations is forward-declared (including broken GCC guards). This seems to work except in RecordReplay.h, so I changed the API to use void* there, in the second patch.

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=19a179eb34a93bf3a5b924bf71efc03bc59eef0a

Hm. The pragma goop is pretty verbose. I wonder if this is an argument for "never forward-declare any JS types yourself, always #include js/TypeDecls.h even if you only need one type." But I'm going to defer to Waldo on that, since I feel like he's dealt with our #include hairball more.

Flags: needinfo?(sphink) → needinfo?(jwalden)

Have the public interface just use void* pointers so that we don't have to
worry about symbol visibility.

(Includes some reformatting of a macro because clang-format required it)

This macro makes any forward declarations unnecessarily verbose, and the
build system uses Clang by default anyway, except in the hazard analysis
which already specified -Wno-attributes.

Depends on D49097

As discussed on IRC a couple of weeks ago, here is a revised patchset that removes JS_BROKEN_GCC_ATTRIBUTE_WARNING altogether. I didn't add -Wno-attributes to the pkg-config file after all, since after bug 1539036 we now suppress compiler warnings in installed header files for embedders.

Does this still need some sort of thing in the main Firefox configure script that will add -Wno-attributes when building Firefox with GCC?

Try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=336d9a9ecb72f28382d806c5bd35a7c5174bdd8c

Attachment #9094492 - Attachment is obsolete: true

This suppresses the warning for GCC builds that was previously suppressed
by the JS_BROKEN_GCC_ATTRIBUTE_WARNING macro.

Depends on D49099

Added the warning suppression for GCC builds; here's a try run that should include all of the builds that have specified FORCE_GCC: https://treeherder.mozilla.org/#/jobs?repo=try&revision=190a911e2b606f537d00092bdf05b26004260c69

Waldo, sfink: Any reservations about these four patches, or can I request checkin?

Flags: needinfo?(sphink)

This has nothing to do with the bug, but it was driving me up the wall.

Assignee: nobody → philip.chimento
Status: NEW → ASSIGNED

(In reply to Philip Chimento [:ptomato] from comment #33)

Waldo, sfink: Any reservations about these four patches, or can I request checkin?

I'm happy with them.

Flags: needinfo?(sphink)
Keywords: checkin-needed

Please rebase your patch:
Details: We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmpoz6Hav mfbt/RecordReplay.cpp Hunk #1 FAILED at 25. 1 out of 1 hunk FAILED -- saving rejects to file mfbt/RecordReplay.cpp.rej abort: patch command failed: exited with status 256

Flags: needinfo?(philip.chimento)
Keywords: checkin-needed

OK, rebased everything. Also note that D50269 depends on the other four patches even though Phabricator doesn't seem to think so.

I can't seem to add checkin-needed back again?

Flags: needinfo?(philip.chimento)
Flags: needinfo?(jwalden)

The "checkin-needed" tag was removed from Bugzilla: https://bugzilla.mozilla.org/show_bug.cgi?id=1546667

You should have a "Checkin-needed" tag on Phabricator.

Tried again to land, but got this:
On Sun, October 27, 2019, 9:15 PM GMT+2, by ccoroiu@mozilla.com.
Revisions: D50269 diff 182272
Details: We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. applying /tmp/tmpf4FJxV mfbt/RecordReplay.cpp Hunk #1 FAILED at 25. 1 out of 1 hunk FAILED -- saving rejects to file mfbt/RecordReplay.cpp.rej abort: patch command failed: exited with status 256

There are 6 patches on this bug, which are the other four?

Flags: needinfo?(philip.chimento)
Attachment #9094493 - Attachment is obsolete: true

I thought I had already abandoned the obsolete patch and it was still showing up for some reason, but it turns out I hadn't, or possibly had but hadn't pressed submit yet. There should now be only five patches, all marked checkin-needed in Phabricator, and I've also tried to add the appropriate dependencies to D50269. I hope it works now.

Flags: needinfo?(philip.chimento)

Still having some issues, D50269 is blocked for landing (Depends on multiple open parents)

Flags: needinfo?(philip.chimento)

Something went wrong in the hg history. I manually removed extra parents in phabricator and then pushed myself with Lando.

Flags: needinfo?(philip.chimento)
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/9f67a98b5632
Don't forward-declare JSContext and JSObject in RecordReplay. r=sfink
https://hg.mozilla.org/integration/autoland/rev/01c0d41a024b
Remove JS_BROKEN_GCC_ATTRIBUTE_WARNING. r=sfink
https://hg.mozilla.org/integration/autoland/rev/0ae96da6fdb2
Add default visibility to JSAPI symbols. r=sfink
https://hg.mozilla.org/integration/autoland/rev/104b2ee49a39
Add -Wno-attributes to GCC builds. r=jwalden,dmajor
https://hg.mozilla.org/integration/autoland/rev/48201e9f3338
Fix formatting of macros in RecordReplay. r=jwalden

Thanks. I'm not sure what went wrong there. I'll try to use a git checkout in the future since it seems I can't wrap my head around hg.

Now that it's landed I'll go ahead and request esr68 uplift for this.

Comment on attachment 9100781 [details]
Bug 1426865 - Don't forward-declare JSContext and JSObject in RecordReplay. r=sfink

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Affects embedders building standalone SpiderMonkey with Clang. Without these patches, an incomplete library is built.
  • User impact if declined: SpiderMonkey embedders will need to apply these patches themselves in order to use standalone SpiderMonkey with Clang.
  • Fix Landed on Version: 72
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): I believe the risk is low because any unforeseen regressions would show up at compile time, not at run time.
  • String or UUID changes made by this patch: None
Attachment #9100781 - Flags: approval-mozilla-esr68?
Attachment #9100782 - Flags: approval-mozilla-esr68?
Attachment #9100783 - Flags: approval-mozilla-esr68?
Attachment #9101172 - Flags: approval-mozilla-esr68?

Comment on attachment 9100781 [details]
Bug 1426865 - Don't forward-declare JSContext and JSObject in RecordReplay. r=sfink

This needs rebasing for ESR68. Not sure about the others. Please double-check and re-request approval.

Flags: needinfo?(philip.chimento)
Attachment #9100781 - Flags: approval-mozilla-esr68?
Attachment #9100782 - Flags: approval-mozilla-esr68?
Attachment #9100783 - Flags: approval-mozilla-esr68?
Attachment #9101172 - Flags: approval-mozilla-esr68?

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: I've uploaded patches manually for the three revisions that didn't apply cleanly to ESR68. D49230 applies cleanly, and D50269 doesn't need backporting. See previous uplift request in this bug for rationale.
  • User impact if declined:
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky):
  • String or UUID changes made by this patch:
Flags: needinfo?(philip.chimento)
Attachment #9105474 - Flags: approval-mozilla-esr68?
Attachment #9101172 - Flags: approval-mozilla-esr68?
Attachment #9105472 - Flags: approval-mozilla-esr68?
Attachment #9105473 - Flags: approval-mozilla-esr68?

Comment on attachment 9101172 [details]
Bug 1426865 - Add -Wno-attributes to GCC builds. r?sfink,jwalden

SM embedding improvements which don't affect Firefox. Approved for 68.3esr.

Attachment #9101172 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
Attachment #9105472 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
Attachment #9105473 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
Attachment #9105474 - Flags: approval-mozilla-esr68? → approval-mozilla-esr68+
You need to log in before you can comment on or make changes to this bug.