Open Bug 1543602 Opened 1 year ago Updated 1 year ago

[PATCH] Cannot spawn a child process for native messaging on NetBSD

Categories

(Toolkit :: Async Tooling, defect, P2)

66 Branch

Tracking

UNCONFIRMED

People

(Reporter: pho, Unassigned)

Attachments

(1 file, 1 obsolete file)

User Agent: Mozilla/5.0 (X11; NetBSD amd64; rv:66.0) Gecko/20100101 Firefox/66.0

Steps to reproduce:

I installed Tridactyl on Firefox 66.0.2 on NetBSD 8.0/amd64, and tried to launch its native messaging helper program by running its ":native" command.

I have already found out the cause, and will later post a patch to fix it.

Actual results:

The helper program didn't start, and an error was shown in the browser console:

cannot read contents of null pointer ctypes.char.ptr.ptr(ctypes.UInt64("0x0")) subprocess_unix.jsm:115

Expected results:

The helper program ($HOME/.local/share/tridactyl/native_main.py) starts up, and successfully communicates with Tridactyl.

Attached patch bug-1543602.patch (obsolete)

This relates to #1519750 and #1538102, and I believe it is the root cause of both. The problem lies in how subprocess_shared_unix.js tries to acquire the address of environ(7).

On Linux, libc.so is actually an ld script, so dlopen'ing it always fails, and libSystem.B.dylib is only present on Darwin, so the first two candidates are ignored. The last one, "a.out", which stands for the symbol table of the main executable and its dependencies, is therefore the only usable candidate.

On NetBSD, on the other hand, libc.so is a symbolic link to an actual ELF shared library, so dlopen succeeds. However, the symbol environ resides in the bss section of both libc.so and the main executable (crt0.o, actually); the former is overridden by the latter and is thus never used. This means dlsym(dlopen("libc.so", ...), "environ") returns the address of the wrong environ, which always contains NULL. This is usually not a problem because there is normally no reason to dlopen libc.so.

So the fix is to always search for libc symbols in the NULL handle (i.e. the one which dlopen(NULL, ...) returns) instead of explicitly asking for libc.
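
As a rough illustration of the lookup described above, the following Python sketch resolves environ through the NULL handle using ctypes. It is not the patch itself: ctypes.CDLL(None) is assumed to correspond to dlopen(NULL, ...) on ELF platforms such as Linux and NetBSD, and the function name read_environ_via_null_handle is made up for this example.

```python
import ctypes


def read_environ_via_null_handle():
    """Return the process environment as seen through the NULL dlopen handle.

    Assumption: CDLL(None) behaves like dlopen(NULL, ...), i.e. it searches
    the main executable and its dependencies.
    """
    handle = ctypes.CDLL(None)

    # in_dll performs the symbol lookup and views the variable as `char **`.
    environ_ptr = ctypes.POINTER(ctypes.c_char_p).in_dll(handle, "environ")

    entries = []
    if not environ_ptr:
        # A NULL environ is the symptom described in this bug.
        return entries

    # The array is terminated by a NULL entry, which ctypes exposes as None.
    index = 0
    while environ_ptr[index] is not None:
        entries.append(environ_ptr[index])
        index += 1
    return entries


if __name__ == "__main__":
    for entry in read_environ_via_null_handle()[:5]:
        print(entry.decode(errors="replace"))
```

Each returned entry is a bytes object of the form NAME=value. An empty list would indicate that the lookup reached an environ that was never filled in, which is what the explicit libc.so handle is reported to yield on NetBSD.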

Commenting here instead of pkgsrc,

> libSystem.B.dylib is only present on Darwin

I don't think that's a reason to remove it, this file seems to be used on Darwin

(In reply to coypu from comment #2)

> Commenting here instead of pkgsrc,
>
> > libSystem.B.dylib is only present on Darwin
>
> I don't think that's a reason to remove it, this file seems to be used on Darwin

libSystem.B.dylib is the libc on Darwin. This means it is still possible to find libSystem symbols through a NULL dlopen handle (the "a.out"), because the main executable is linked with it anyway.

This looks like it belongs in Toolkit, so I am moving it out of Untriaged.

Component: Untriaged → General
Product: Firefox → Toolkit
Component: General → Async Tooling
Comment on attachment 9057487
bug-1543602.patch

Review of attachment 9057487:
-----------------------------------------------------------------

::: toolkit/modules/subprocess/subprocess_shared_unix.js
@@ +19,5 @@
> + * because they are meant to be overridden by the main executable.
> + * So the most portable way to access libc symbols is to do it through
> + * the NULL handle, i.e. the one which NSPR calls "a.out".
> + */
> +const LIBC_CHOICES = ["a.out"];

Please keep "libSystem.B.dylib" in the list. It is still used on OS X.

I don't think that's needed because the main executable is still linked with libSystem.B.dylib on Darwin, but anyway I rebased and updated my patch.
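
For readers following the review, the candidate search under discussion can be sketched in Python roughly as follows. The list contents are an assumption (the rebased patch is not shown in this thread), None stands in for the NULL handle that NSPR calls "a.out", and open_first_available is a hypothetical helper name.

```python
import ctypes
import os

# Assumed candidate order after the review: the Darwin libc first,
# then the NULL handle (None here, "a.out" in NSPR terms).
CANDIDATES = ["libSystem.B.dylib", None]


def open_first_available(candidates):
    """Return (name, handle) for the first candidate that can be opened."""
    for name in candidates:
        try:
            return name, ctypes.CDLL(name)
        except OSError:
            # This candidate does not exist or is not loadable on this
            # platform, so fall through to the next one.
            continue
    raise OSError("no candidate library could be opened")


if __name__ == "__main__":
    name, handle = open_first_available(CANDIDATES)
    print("opened:", name)
    # libc functions resolve through whichever handle was opened.
    print("getpid matches:", handle.getpid() == os.getpid())
```

Because the first candidate that opens wins, the order of the list decides which symbol table is searched. That is why an explicit libc.so entry ahead of the NULL handle matters on a platform where libc.so can actually be opened.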

Attachment #9057487 - Attachment is obsolete: true

The priority flag is not set for this bug.
:Yoric, could you have a look please?

For more information, please visit auto_nag documentation.

Flags: needinfo?(dteller)
Flags: needinfo?(dteller)
Priority: -- → P2