Closed Bug 878791 Opened 7 years ago Closed 6 years ago

Firefox on FreeBSD crashes due to loading libc.so.6

Categories

(Toolkit :: OS.File, defect, critical)

x86_64
FreeBSD
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla29

People

(Reporter: dimitry, Assigned: Yoric)

References

Details

(Keywords: crash, Whiteboard: [Async])

Attachments

(1 file, 8 obsolete files)

On the FreeBSD ports mailing list, I noticed reports [1] about Firefox
21 crashes, which had very strange backtraces.  For example:

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 2d8c7e00 (LWP 101017/DOM Worker)]
0x282a854b in strcasecmp_l () from /lib/libc.so.7
(gdb) where
#0  0x282a854b in strcasecmp_l () from /lib/libc.so.7
#1  0x282a8666 in strcasecmp () from /lib/libc.so.7
#2  0x282d195e in .cerror () from /lib/libc.so.7
#3  0x283477b8 in .got () from /usr/local/lib/libffi.so.6
#4  0x28345e87 in ffi_call_SYSV () from /usr/local/lib/libffi.so.6
#5  0x28345cbe in ffi_call () from /usr/local/lib/libffi.so.6
#6  0x2a6c295b in js::SizeOfDataIfCDataObject ()
   from /usr/local/lib/firefox/libxul.so
#7  0x2a41d167 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#8  0x2a3ec016 in js::AutoCTypesActivityCallback::AutoCTypesActivityCallback ()
   from /usr/local/lib/firefox/libxul.so
#9  0x2a41d127 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#10 0x2a41969f in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#11 0x2a412563 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#12 0x2a41d09a in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#13 0x2a3ec016 in js::AutoCTypesActivityCallback::AutoCTypesActivityCallback ()
   from /usr/local/lib/firefox/libxul.so
#14 0x2a41d127 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#15 0x2a41969f in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#16 0x2a412563 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#17 0x2a41d09a in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#18 0x2a41d5e3 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#19 0x2a3a12f6 in JS_CallFunctionValue () from /usr/local/lib/firefox/libxul.so
#20 0x2927b2aa in std::vector<mozilla::gfx::Glyph, std::allocator<mozilla::gfx::Glyph> >::_M_insert_aux () from /usr/local/lib/firefox/libxul.so
#21 0x29c43dc3 in std::vector<mozilla::plugins::IPCByteRange, std::allocator<mozilla::plugins::IPCByteRange> >::_M_fill_insert ()
   from /usr/local/lib/firefox/libxul.so
#22 0x29c43cc2 in std::vector<mozilla::plugins::IPCByteRange, std::allocator<mozilla::plugins::IPCByteRange> >::_M_fill_insert ()
   from /usr/local/lib/firefox/libxul.so
#23 0x2a41d127 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#24 0x2a41d5e3 in js::AutoMaybeTouchDeadZones::~AutoMaybeTouchDeadZones ()
   from /usr/local/lib/firefox/libxul.so
#25 0x2a3a124c in JS_CallFunctionName () from /usr/local/lib/firefox/libxul.so
#26 0x29279429 in std::vector<mozilla::gfx::Glyph, std::allocator<mozilla::gfx::Glyph> >::_M_insert_aux () from /usr/local/lib/firefox/libxul.so
#27 0x29290724 in std::vector<mozilla::gfx::Glyph, std::allocator<mozilla::gfx::Glyph> >::_M_insert_aux () from /usr/local/lib/firefox/libxul.so
#28 0x29287475 in std::vector<mozilla::gfx::Glyph, std::allocator<mozilla::gfx::Glyph> >::_M_insert_aux () from /usr/local/lib/firefox/libxul.so
#29 0x29288f6e in std::vector<mozilla::gfx::Glyph, std::allocator<mozilla::gfx::Glyph> >::_M_insert_aux () from /usr/local/lib/firefox/libxul.so
#30 0x29282933 in std::vector<mozilla::gfx::Glyph, std::allocator<mozilla::gfx::Glyph> >::_M_insert_aux () from /usr/local/lib/firefox/libxul.so
#31 0x29e118c4 in XRE_AddJarManifestLocation ()
   from /usr/local/lib/firefox/libxul.so
#32 0x29dcbf68 in std::vector<mozilla::plugins::IPCByteRange, std::allocator<mozilla::plugins::IPCByteRange> >::_M_fill_insert ()
   from /usr/local/lib/firefox/libxul.so
#33 0x29e10e30 in XRE_AddJarManifestLocation ()
   from /usr/local/lib/firefox/libxul.so
#34 0x2b38a59a in PR_CreateThread () from /usr/local/lib/libplds4.so.1
#35 0x281b3f1a in pthread_getprio () from /lib/libthr.so.3
#36 0x00000000 in ?? ()

The common property of all the backtraces was that they originated from
ffi_call() invocations.  Those invocations tried to call into a
perfectly normal libc function, such as open(2), but for some reason
ended up in a completely different one, usually having disastrous
results.  Most of the time, this ended in segfaults.

After a lot of debugging, where libffi, libc, libthr (the FreeBSD
pthread library) and others were suspected, instrumented and found to
work correctly, the cause turned to be in Firefox itself.

In Firefox's OS.File implementation for UNIX, it loads libc.so.6 if that
is available, and then uses libffi to call into it.  However, the system
C library on recent FreeBSD versions is called libc.so.7 instead.

For compatibility with older applications, there is an installable
package which provides libc.so.6, so if the compatibility package is
installed, Firefox will load that on top of the already loaded
libc.so.7.  Since several internal symbols of these different library
verions conflict, this usually leads to problems, such as the segfault
mentioned above.

As a workaround, users can uninstall the compatibility package, or
remove the libc.so.6 file manually, but it is probably better if this is
fixed in Firefox itself.  Currently,
toolkit/components/osfile/osfile_unix_allthreads.jsm does:

  (function(exports) {
  [...]
    // Open libc
    let libc;
    let libc_candidates =  [ "libSystem.B.dylib",
			     "libc.so.6",
			     "libc.so" ];
    for (let i = 0; i < libc_candidates.length; ++i) {
      try {
	libc = ctypes.open(libc_candidates[i]);
	break;
      } catch (x) {
	LOG("Could not open libc ", libc_candidates[i]);
      }
    }

I propose adding libc.so.7 into the list, just before libc.so.6, like in
the attached patch.  This should not lead to trouble on Linux, where
there is only libc.so.6.  Alternatively, it could be made OS-dependent,
but I am not sure if that is worth the complexity here.

[1]: <http://lists.freebsd.org/pipermail/freebsd-ports/2013-May/083791.html>
Assignee: nobody → general
Severity: normal → critical
Component: General → JavaScript Engine
Keywords: crash
Product: Firefox → Core
Attachment #757393 - Attachment is patch: true
This is a toolkit issue, not a jseng one.
Assignee: general → nobody
Component: JavaScript Engine → OS.File
Product: Core → Toolkit
Apologies, this is the correct patch, which adds libc.so.7 before libc.so.6, instead of replacing it.
Attachment #757393 - Attachment is obsolete: true
Comment on attachment 757453 [details] [diff] [review]
Add libc.so.7 before libc.so.6 in the list of libc candidates for OS.File

Review of attachment 757453 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me.
Attachment #757453 - Flags: review+
I have exactly same issue with uClibc which has libc.so.0.9.32 on Alpine Linux. I know for sure that libc.so.0 is used for other uClibc setups.

I think we can do better than have a static list of known/supported libc's that will break late whenever a new/unknown version shows up. (this is not the way ELF is supposed to work).

I think we have 2 options to solve this properly:

1) Implement a find_library() similar to python's ctype.find_library does, and parse the output of ldconfig -p.

2) from configure script find the current platforms proper libc.so name, something like:

libc_file=$(cat <<EOF | python -
from ctypes.util import find_library
print(find_library('c'))
EOF
)

Store that some place as a constant during compile time and use that constant from toolkit.

It should be safe to detect it compile time since if it changes due to ABI breakage, the application will need to rebuild anyways.
Attached patch workaround for alpine linux (obsolete) — Splinter Review
A patch that works around the issue for Alpine Linux
Attached patch example (obsolete) — Splinter Review
I haven not even compile tested this patch but it gives an idea how we could detect the libc at compile time.
Attachment #759130 - Attachment is obsolete: true
Attached patch A patch that actually works (obsolete) — Splinter Review
I have tested this patch and it appears to actually work. It should fix the problem for freebsd too.
Attachment #759693 - Attachment is obsolete: true
Attachment #759721 - Flags: review?(dteller)
Just as a quick verification, the Python fragment indeed gives the wanted result on FreeBSD:

$ python -c "from ctypes.util import find_library; print(find_library('c'))"
libc.so.7

So that looks good to me.
Comment on attachment 759721 [details] [diff] [review]
A patch that actually works

Review of attachment 759721 [details] [diff] [review]:
-----------------------------------------------------------------

Are you sure it's the right patch?
There should be no ".orig" files. That's artifacts from a hg merge.
Attachment #759721 - Flags: review?(dteller)
(In reply to David Rajchenbach Teller [:Yoric] from comment #9)
> Comment on attachment 759721 [details] [diff] [review]
> A patch that actually works
> 
> Review of attachment 759721 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Are you sure it's the right patch?
> There should be no ".orig" files. That's artifacts from a hg merge.

I manually copied the .orig file from the release source (firefox-21) and manually did git diff -u. (i think)

Do you want me to rebase it against a mercurial branch? which branch in that case?
Regarding FreeBSD, we have recently switched libc.so to be an ld script (this will be the case in the next major release FreeBSD 10), similar to what we can find on Linux (at least on Debian):

debian$ cat /usr/lib/x86_64-linux-gnu/libc.so 
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/libc_nonshared.a  AS_NEEDED ( /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ) )

freebsd$ cat /usr/lib/libc.so
/* $FreeBSD: head/lib/libc/libc.ldscript 251668 2013-06-12 21:12:05Z jlh $ */
GROUP ( /lib/libc.so.7 /usr/lib/libssp_nonshared.a )


Can you point me where the change is to be done please?

I tried naively to the following command in the Firefox source tree
$ grep -rl 'libc\.so' .

But it finds 23 files.  This would greatly help me if someone knowing the Firefox source tree could share his knowledge.

As a side note, I think trying directly libc.so is your best bet. Do you know the rationale that led to open the major-numbered library directly instead of the standard symlink (or ld script)?
Regarding FreeBSD, we have recently switched libc.so to be an ld script (this will be the case in the next major release FreeBSD 10), similar to what we can find on Linux (at least on Debian):

debian$ cat /usr/lib/x86_64-linux-gnu/libc.so 
/* GNU ld script
   Use the shared library, but some functions are only in
   the static library, so try that secondarily.  */
OUTPUT_FORMAT(elf64-x86-64)
GROUP ( /lib/x86_64-linux-gnu/libc.so.6 /usr/lib/x86_64-linux-gnu/libc_nonshared.a  AS_NEEDED ( /lib/x86_64-linux-gnu/ld-linux-x86-64.so.2 ) )

freebsd$ cat /usr/lib/libc.so
/* $FreeBSD: head/lib/libc/libc.ldscript 251668 2013-06-12 21:12:05Z jlh $ */
GROUP ( /lib/libc.so.7 /usr/lib/libssp_nonshared.a )


Can you point me where the change is to be done please?

I tried naively to the following command in the Firefox source tree
$ grep -rl 'libc\.so' .

But it finds 23 files.  This would greatly help me if someone knowing the Firefox source tree could share his knowledge.

As a side note, I think trying directly libc.so is your best bet. Do you know the rationale that led to open the major-numbered library directly instead of the standard symlink (or ld script)?
Wouldn't ldscript break dlopen() ? 

js> ctypes.open("libc.so")
typein:2:0 Error: couldn't open library libc.so

>>> ctypes.CDLL("libc.so")
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python2.7/ctypes/__init__.py", line 365, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: /usr/lib/libc.so: invalid file format

And I'm not sure dlopen(NULL) is supported/desirable by js-ctypes.
(In reply to Natanael Copa from comment #10)
> I manually copied the .orig file from the release source (firefox-21) and
> manually did git diff -u. (i think)
> 
> Do you want me to rebase it against a mercurial branch? which branch in that
> case?

Well, it's not a problem of rebasing. There should be no .orig file in a patch.
Whiteboard: [Async]
Attached patch detect-libc-filename.patch (obsolete) — Splinter Review
This patch should have no .orig ans was rebased against a fresh hg clone.
Attachment #759721 - Attachment is obsolete: true
Attachment #8346464 - Flags: review+
Attachment #8346464 - Flags: review+ → review?(dteller)
Comment on attachment 8346464 [details] [diff] [review]
detect-libc-filename.patch

Review of attachment 8346464 [details] [diff] [review]:
-----------------------------------------------------------------

The OS.File part is fine for me.
Asking ted about the configure.in part.

::: dom/system/OSFileConstants.cpp
@@ +836,5 @@
>    if (!SetStringProperty(cx, objPath, "tmpDir", gPaths->tmpDir)) {
>      return false;
>    }
>  
> +  if (!SetStringProperty(cx, objPath, "libcFilename", NS_LITERAL_STRING(MOZ_LIBC_FILENAME))) {

Just "libc".

::: toolkit/components/osfile/modules/osfile_unix_allthreads.jsm
@@ +45,2 @@
>                           "libc.so.6",
>                           "libc.so" ];

Well, if OS.Constants.Path.libc is correct, we don't need to try the others, do we?
Attachment #8346464 - Flags: review?(ted)
Attachment #8346464 - Flags: review?(dteller)
Attachment #8346464 - Flags: feedback+
This is dangerous to hard-code "libc.so.6".  You may end up using a stale libc from a previous version of the system.

How does it work on Linux?  On Debian, /usr/lib/libc.so is also an ldscript, so there should be already a solution in place that can be re-used on FreeBSD 10+.

-- Jeremie
(In reply to David Rajchenbach Teller [:Yoric] (away 20 Dec to Jan 4th - please use "needinfo?") from comment #16)
> @@ +45,2 @@
> >                           "libc.so.6",
> >                           "libc.so" ];
> 
> Well, if OS.Constants.Path.libc is correct, we don't need to try the others,
> do we?

That is correct. We should not even need to loop over different alternatives. This should be enough:
try {
     libc = ctypes.open(OS.Constants.Path.libcFilename);
catch (x) {
     LOG("Could not open libc ", OS.Constants.Path.libcFilename);
}

I left all the alternatives there in case the unexpected happens.

(I just tested run the python's find_library('c') on an OSX box and there it returns libc.dylib which appears to be a symlink libc.dylib -> libSystem.dylib -> libSystem.B.dylib. I'm not sure if libc.dylib vs libSystem.B.dylib matters.)
(In reply to Jeremie Le Hen from comment #17)
> This is dangerous to hard-code "libc.so.6".

The existence of this bug itself is a proof of that...

My patch detects the libc file name at build time. (if libc name changes runtime you'll have to recompile it all anyways)
(In reply to Natanael Copa from comment #19)
> (In reply to Jeremie Le Hen from comment #17)
> > This is dangerous to hard-code "libc.so.6".
> 
> The existence of this bug itself is a proof of that...
> 
> My patch detects the libc file name at build time. (if libc name changes
> runtime you'll have to recompile it all anyways)

You're right.  I suggest you entirely remove the "libc.so.6" from the list then.  I think it's better to break blatantly rather than sneakily.
Comment on attachment 8346464 [details] [diff] [review]
detect-libc-filename.patch

Review of attachment 8346464 [details] [diff] [review]:
-----------------------------------------------------------------

I don't see how this makes sense. The machines we build on and the machines we run on have no connection, so how can detecting this at build time possibly work? The Windows/Mac cases seem to be trivial (the Python module literally just returns the MSVCRT or libc.dylib for them), and the Linux one sounds like we could just try a list of useful names.

Also in the Windows case the Python ctypes version is just wrong, since it'll return the CRT that Python was built against, not the one that Firefox was built against.

Python ctypes ref:
http://svn.python.org/view/python/branches/release27-maint/Lib/ctypes/util.py?revision=84835&view=markup
Attachment #8346464 - Flags: review?(ted) → review-
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #21)
> Comment on attachment 8346464 [details] [diff] [review]
> detect-libc-filename.patch
> 
> Review of attachment 8346464 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I don't see how this makes sense. The machines we build on and the machines
> we run on have no connection, so how can detecting this at build time
> possibly work?

Do you support crosscompiling of firefox? (eg build Firefox for linux on windows or build for ARM on x86). If you do support crosscompile then the patch will not work.

> The Windows/Mac cases seem to be trivial (the Python module
> literally just returns the MSVCRT or libc.dylib for them), and the Linux one
> sounds like we could just try a list of useful names.

How do you define the list of useful names? How do you define the search order? What if your system you run it on has multiple libc's installed (eg you have both glibc, uclibc and musl libc installed on same system. ELF does not forbid that...). There are atleast 2 systems that breaks due to the current list and as soon any POSIX system updates to an new libc ABI (or somebody wants to port to a new system) the list will need to be updated. Would you be willing to maintain it?

Wouldn't it be better to try detect the libc firefox will be linked against? We could even testcompile a minimal C prog with same CC and CFLAGS as firefox and use readelf to find out what we linked against. Something like:

 echo "int main(void) { return 0; }" > link-test.c
 # CC, CFLAGS, LDFLAGS should correspond to whatever firefox will be built with
 $CC $CFLAGS -o link-test link-test.c $LDFLAGS
 MOZ_LIBC_FILENAME=$(readelf -d link-test | grep 'NEEDED.*\[libc\.' | sed 's/.*\[\(.*\)\].*/\1/')
 rm link-test link-test.c

> Also in the Windows case the Python ctypes version is just wrong, since
> it'll return the CRT that Python was built against, not the one that Firefox
> was built against.

AFAIK, Windows will not use this at all as the issue we are trying to resolve does not affect it. (oh btw, it looks like on windows 'kernel32.dll' will be used regardless of what libc Firefox was built against)

> Python ctypes ref:
> http://svn.python.org/view/python/branches/release27-maint/Lib/ctypes/util.
> py?revision=84835&view=markup

Yeah, the implementation was buggy on Alpine Linux so i have patched that code too. (which is why i knew about it in the first place)
Yes, we support cross-compiling. We have Android builds that are built on Linux, we're working on cross-compiling from Linux to Darwin, and there are people out there cross-compiling from Linux to Windows.
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #23)
> Yes, we support cross-compiling.

So we cannot use the python trick. I think the only way to do it properly is to compile/link a minimal test program and check what libc it got linked to.
Attached patch detect-libc-filename.patch (obsolete) — Splinter Review
This patch will test compile and use readelf -d | grep NEEDED to find what libc we linked against. It should work even when crosscompiling.

It also removes the libc.so.6 from candidate list as requested but still keeps libSystem.B.dylib and libc.so as fallback in case the detection goes wrong.
Attachment #8346464 - Attachment is obsolete: true
Attachment #8348624 - Flags: review?(ted)
Comment on attachment 8348624 [details] [diff] [review]
detect-libc-filename.patch

Review of attachment 8348624 [details] [diff] [review]:
-----------------------------------------------------------------

I'm punting this to glandium.
Attachment #8348624 - Flags: review?(ted) → review?(mh+mozilla)
What is this libc used for?
Flags: needinfo?(dteller)
OS.File dynamically links to this libc and its functions and uses it build the Unix implementation of the cross-platform API:
http://dxr.mozilla.org/mozilla-central/toolkit/components/osfile/osfile_unix_back.jsm
Flags: needinfo?(dteller)
Can't we use the special "a.out" instead of libc?
Flags: needinfo?(dteller)
I seem to remember that a.out doesn't exist on all platforms, but we can put it as first choice and see what gives.
Flags: needinfo?(dteller)
Comment on attachment 8350595 [details] [diff] [review]
Loading a.out as the first choice of libc

Well, the results are encouraging.
Attachment #8350595 - Flags: review?(nfroyd)
Attachment #8350595 - Flags: review?(mh+mozilla)
(In reply to David Rajchenbach Teller [:Yoric] (away 20 Dec to Jan 4th - please use "needinfo?") from comment #32)
> Comment on attachment 8350595 [details] [diff] [review]
> Loading a.out as the first choice of libc
> 
> Well, the results are encouraging.

ISTR that we had issues with this dynamic library loading at some time in the past and/or there might have been some reason we decided to not use a.out.  (Those Android tests on the try push look...unencouraging, for instance.)  Investigating that and finding the bug(s) would put this bug above the "review in 2 minutes or less" threshold I'm currently try to hold to at this point.  If you want to do my research for me, that'd be great! :)  Otherwise, I'll go code spelunking in the new year.
Comment on attachment 8348624 [details] [diff] [review]
detect-libc-filename.patch

I'd rather find a way that doesn't involve configure.
Attachment #8348624 - Flags: review?(mh+mozilla) → feedback-
Comment on attachment 8350595 [details] [diff] [review]
Loading a.out as the first choice of libc

Review of attachment 8350595 [details] [diff] [review]:
-----------------------------------------------------------------

Try says this doesn't work so well, so you may want to try to put a.out last and remove libc.so.6.
Attachment #8350595 - Flags: review?(nfroyd)
Attachment #8350595 - Flags: review?(mh+mozilla)
Attachment #8350595 - Flags: feedback-
(In reply to Mike Hommey [:glandium] from comment #34)
> Comment on attachment 8348624 [details] [diff] [review]
> detect-libc-filename.patch
> 
> I'd rather find a way that doesn't involve configure.

The problem is that the FFI does dlopen(3) of the libc file. What the the file actually is depends completely on the linker at build time. (eg what file is used when linking with -lc). The libc file name is a constant that does not change once the application is linked. Detecting this from configure is IMHO the simplest way to do it reliably.

If not using configure for this, then I think the only sane way to do it is to parse the ELF header in a known binary (for example the firefox binary itself) and read the NEEDED section and find out what libc the firefox binary was actually linked against.

The information is there so there is no need to blindly shoot at a guess list of candidates and hope something hits. There is no guarantee that you will hit the correct libc or any libc at all.
(In reply to Mike Hommey [:glandium] from comment #29)
> Can't we use the special "a.out" instead of libc?

No. a.out has not really anything to do with libc. http://en.wikipedia.org/wiki/A.out
(In reply to Mike Hommey [:glandium] from comment #35)
> Comment on attachment 8350595 [details] [diff] [review]
> Loading a.out as the first choice of libc
> 
> Review of attachment 8350595 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Try says this doesn't work so well, so you may want to try to put a.out last
> and remove libc.so.6.

NACK. There are no sane systems that calls their shared libc for a.out.

Also, on most current GNU/Linux'es the correct file to open will be libc.so.6. The correct filename depends on system/toolchain.
(In reply to Natanael Copa from comment #38)
> (In reply to Mike Hommey [:glandium] from comment #35)
> > Comment on attachment 8350595 [details] [diff] [review]
> > Loading a.out as the first choice of libc
> > 
> > Review of attachment 8350595 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Try says this doesn't work so well, so you may want to try to put a.out last
> > and remove libc.so.6.
> 
> NACK. There are no sane systems that calls their shared libc for a.out.
> 
> Also, on most current GNU/Linux'es the correct file to open will be
> libc.so.6. The correct filename depends on system/toolchain.

Picking the last message to reply to numerous statements that have been made through this bug.

I have the feeling there's some kind of over-engineering going on.  You cannot sanely try to cope with every single operating system weirdness you will find out there.  This is a never-ending chase.

I suggest going the easy and straightforward way: just try to open "libc.so" and MacOS X' equivalent and fail blatantly if you cannot (I personally don't know any OS that calls his libc "a.out"):
- First, this will avoid failing in an awkward manner which may be difficult to debug.
- Second, each maintainer probably has a better insight to deal with his own OS weirdness (they may even have already the bits in place to work around this problem: FreeBSD allows to redirect a dlopen(2)'ed file to another another file [1]).


Performing some introspection on Firefox' own binary (that is, looking for DT_NEEDED entries and trying to match the right one) might be appealing, but you have to keep in mind to honor the ELF spec and the system's implementation specificities: DT_NEEDED entries contains only the basename of the shared library, you have to mimic all the runtime linker's ways to look up the right path ($LD_LIBRARY_PATH, the binary's DT_RPATH, the system config), **and do it in the right order**.  Otherwise you may end up loading the wrong library and fail in an awkward manner again.


If you really want to relieve maintainers from dealing with this problem as much as possible, then one method you may try is to make Firefox execute ldd(1) on itself, capture the output and try to match the right library; that way you delegate the path dance to the system'd runtime linker.  You have to cope with the different format of ldd(1) output out there though; I don't even know if it exists everywhere.  But in that case you can still fall back to the straightforward method.


[1] http://www.freebsd.org/cgi/man.cgi?query=libmap.conf

-- Jeremie
(In reply to Jeremie Le Hen from comment #39)

> I suggest going the easy and straightforward way: just try to open "libc.so"
> and MacOS X' equivalent and fail blatantly if you cannot (I personally don't
> know any OS that calls his libc "a.out"):
> - First, this will avoid failing in an awkward manner which may be difficult
> to debug.

I agree that its better to fail early.

> - Second, each maintainer probably has a better insight to deal with his own
> OS weirdness (they may even have already the bits in place to work around
> this problem: FreeBSD allows to redirect a dlopen(2)'ed file to another
> another file [1]).
> 
> 
> Performing some introspection on Firefox' own binary (that is, looking for
> DT_NEEDED entries and trying to match the right one) might be appealing, but
> you have to keep in mind to honor the ELF spec and the system's
> implementation specificities:

[snipping some insightsful rows] 

> If you really want to relieve maintainers from dealing with this problem as
> much as possible, then one method you may try is to make Firefox execute
> ldd(1) on itself, capture the output and try to match the right library;
> that way you delegate the path dance to the system'd runtime linker.  You
> have to cope with the different format of ldd(1) output out there though; I
> don't even know if it exists everywhere.  But in that case you can still
> fall back to the straightforward method.

Talked with a friend and he said "why do they need to dlopen()? why not just use dlsym(RTLD_DEFAULT, ...)"

And this is a really a good point. Theoretically we don't need open it since it already is open. It also looks that calling dlopen(0,...) will return a handle to the global symbol table[1].

I think that the proper solution would be to use that rather than trying to again try dlopen the libc. It would save us from lots of headache.

Maybe something in the direction of:

--- a/nsprpub/pr/src/linking/prlink.c
+++ b/nsprpub/pr/src/linking/prlink.c
@@ -795,17 +795,21 @@ pr_LoadLibraryByPathname(const char *nam
     /* ensure the file exists if it contains a slash character i.e. path */
     /* DARWIN's dlopen ignores the provided path and checks for the */
     /* plain filename in DYLD_LIBRARY_PATH */
     if (strchr(name, PR_DIRECTORY_SEPARATOR) == NULL ||
         PR_Access(name, PR_ACCESS_EXISTS) == PR_SUCCESS) {
             h = dlopen(name, dl_flags);
         }
 #else
-    h = dlopen(name, dl_flags);
+    if (strcmp(name, "RTLD_GLOBAL") == 0) {
+        h = dlopen(0, dl_flags); /* or: h = RTLD_DEFAULT; */
+    } else {
+        h = dlopen(name, dl_flags);
+    }
 #endif
 #elif defined(USE_HPSHL)
     int shl_flags = 0;
     shl_t h;
 
     /*
      * Use the DYNAMIC_PATH flag only if 'name' is a plain file
      * name (containing no directory) to match the behavior of


[1] http://pubs.opengroup.org/onlinepubs/009695399/functions/dlopen.html
(In reply to Natanael Copa from comment #40)
> And this is a really a good point. Theoretically we don't need open it since
> it already is open. It also looks that calling dlopen(0,...) will return a
> handle to the global symbol table[1].
> 
> I think that the proper solution would be to use that rather than trying to
> again try dlopen the libc. It would save us from lots of headache.

Just FYI, NSPR does this:

http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/linking/prlink.c#166

and assigns that handle to the special name "a.out":

http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/linking/prlink.c#192

So that's why we've been talking about using "a.out".  If you'd like to try the patch that David pushed to Try that uses "a.out", please do!  That would be a useful data point.

But "a.out" doesn't work on Android.  So it's not a fully general solution, hence Mike's comment 35.
(In reply to Nathan Froyd (:froydnj) (AFKish through 3 January 2014) from comment #41)
> (In reply to Natanael Copa from comment #40)
> > And this is a really a good point. Theoretically we don't need open it since
> > it already is open. It also looks that calling dlopen(0,...) will return a
> > handle to the global symbol table[1].
> > 
> > I think that the proper solution would be to use that rather than trying to
> > again try dlopen the libc. It would save us from lots of headache.
> 
> Just FYI, NSPR does this:
> 
> http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/linking/prlink.
> c#166
> 
> and assigns that handle to the special name "a.out":
> 
> http://mxr.mozilla.org/mozilla-central/source/nsprpub/pr/src/linking/prlink.
> c#192

Oh.. that explains alot. The use 'a.out' made me think people were going nuts but it all makes sense now. Sorry.

> So that's why we've been talking about using "a.out".  If you'd like to try
> the patch that David pushed to Try that uses "a.out", please do!  That would
> be a useful data point.

I expect it to just work, but I will test it.
 
> But "a.out" doesn't work on Android.  So it's not a fully general solution,
> hence Mike's comment 35.

Well, the behavior is mentioned in POSIX[1] so either Android is not POSIX compliant (and needs special handling anyways) or is broken and should be fixed. I assume dlopen'ing libc.so.6 won't work on Android either.

I believe this is the direction we want to go.

[1] http://pubs.opengroup.org/onlinepubs/000095399/functions/dlopen.html
(In reply to Natanael Copa from comment #38)
> (In reply to Mike Hommey [:glandium] from comment #35)
> > Comment on attachment 8350595 [details] [diff] [review]
> > Loading a.out as the first choice of libc
> > 
> > Review of attachment 8350595 [details] [diff] [review]:
> > -----------------------------------------------------------------
> > 
> > Try says this doesn't work so well, so you may want to try to put a.out last
> > and remove libc.so.6.
> 
> NACK. There are no sane systems that calls their shared libc for a.out.
> 
> Also, on most current GNU/Linux'es the correct file to open will be
> libc.so.6. The correct filename depends on system/toolchain.

After gaining insight in the special handling of 'a.out', I'd like to revert my NACK to an ACK. Sorry for my ignorance.
Here's a trivial patch that seems to do the trick.
Try: https://tbpl.mozilla.org/?tree=Try&rev=8290759c67ce
Attachment #757453 - Attachment is obsolete: true
Attachment #8348624 - Attachment is obsolete: true
Attachment #8350595 - Attachment is obsolete: true
Attachment #8358332 - Flags: review?(mh+mozilla)
Comment on attachment 8358332 [details] [diff] [review]
Loading a.out as the first choice of libc, v2

Review of attachment 8358332 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/osfile/modules/osfile_unix_allthreads.jsm
@@ +40,5 @@
>  
>  // Open libc
>  let libc;
> +let libc_candidates =  [ "libc.so",
> +                         "a.out" ];

Does that work on mac?
Attachment #8358332 - Flags: review?(mh+mozilla) → review+
(In reply to Mike Hommey [:glandium] from comment #45)
> Does that work on mac?

According to my tests, it does.
https://hg.mozilla.org/mozilla-central/rev/c0298b69e631
Status: UNCONFIRMED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from comment #46)
> (In reply to Mike Hommey [:glandium] from comment #45)
> > Does that work on mac?
> 
> According to my tests, it does.

Looks like it does not at least for the 32bit mode. There is bug 960509 on file now.
Looks like we should readd "libSystem.B.dylib" as last item of the list.
You need to log in before you can comment on or make changes to this bug.