Open Bug 1284105 Opened 3 years ago Updated Last year

plugin-container segfault on linux gtk3 because we're hitting old-gtk2 compatibility codepaths

Categories

(Core :: Plug-ins, defect, P3)

45 Branch
defect

REOPENED

People

(Reporter: martin, Unassigned)

References

(Blocks 1 open bug)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Firefox/45.0
Build ID: 20160617015606

Steps to reproduce:

Open any page that has Flash content (using Firefox 45.2.0 on Gentoo with KDE5).


Actual results:

Flash content is not displayed; instead, "The Adobe Flash plugin has crashed" is shown in place of the Flash element.

dmesg reports a segfault in libmozgtk.so:
plugin-containe[11853]: segfault at 0 ip 00007f459d5b88e7 sp 00007ffe92da8ef8 error 6 in libmozgtk.so[7f459d5b8000+1000]
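As a reading aid for the dmesg line above, here is a minimal sketch of how its fields can be decoded. It assumes the usual x86 page-fault error-code bits (bit 0: page present, bit 1: write access, bit 2: user mode); the struct and function names are hypothetical and not taken from any Firefox or kernel source.

```c
#include <stdbool.h>
#include <stdint.h>

/* Decoded view of a "segfault at ADDR ip IP ... error CODE in LIB[BASE+SIZE]" line. */
struct segv_info {
    uint64_t offset_in_lib; /* instruction pointer relative to the library mapping */
    bool page_present;      /* error bit 0: set for a protection fault, clear for an unmapped page */
    bool was_write;         /* error bit 1: set when the faulting access was a write */
    bool user_mode;         /* error bit 2: set when the fault happened in user mode */
};

/* Hypothetical helper: split the numeric fields of the dmesg line into a struct. */
static struct segv_info decode_segv(uint64_t ip, uint64_t lib_base, unsigned error_code)
{
    struct segv_info info;
    info.offset_in_lib = ip - lib_base;
    info.page_present = (error_code & 0x1u) != 0;
    info.was_write = (error_code & 0x2u) != 0;
    info.user_mode = (error_code & 0x4u) != 0;
    return info;
}
```

For the values reported here (ip 0x7f459d5b88e7, mapping base 0x7f459d5b8000, error 6) this gives offset 0x8e7 inside libmozgtk.so and a user-mode write to an unmapped page, which fits "segfault at 0", i.e. a write through a null pointer.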

If I run ulimit -c unlimited and then start Firefox from that shell, I get a core file that I can open in gdb. This is the complete backtrace of the thread that caused the segfault:
Program terminated with signal SIGSEGV, Segmentation fault.
(gdb) bt full
#0  0x00007ffff11248e7 in gdk_window_lookup () from /usr/lib64/firefox/libmozgtk.so
No symbol table info available.
#1  0x00007ffff463a259 in mozilla::plugins::PluginInstanceChild::AnswerNPP_SetWindow(mozilla::plugins::NPRemoteWindow const&) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#2  0x00007ffff3850b9f in mozilla::plugins::PPluginInstanceChild::OnCallReceived(IPC::Message const&, IPC::Message*&) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#3  0x00007ffff3849819 in mozilla::plugins::PPluginModuleChild::OnCallReceived(IPC::Message const&, IPC::Message*&) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#4  0x00007ffff3777996 in mozilla::ipc::MessageChannel::DispatchInterruptMessage(IPC::Message const&, unsigned long) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#5  0x00007ffff3778ae0 in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&)
    () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#6  0x00007ffff377a083 in mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ()
   from /usr/lib64/firefox/libxul.so
No symbol table info available.
#7  0x00007ffff373cb93 in MessageLoop::RunTask(Task*) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#8  0x00007ffff373e6bd in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#9  0x00007ffff373f341 in MessageLoop::DoWork() () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#10 0x00007ffff3738f7d in base::MessagePumpForUI::HandleDispatch() ()
   from /usr/lib64/firefox/libxul.so
No symbol table info available.
#11 0x00007ffff3738fb0 in (anonymous namespace)::WorkSourceDispatch(_GSource*, int (*)(void*), void*) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#12 0x00007fffedf201cd in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
No symbol table info available.
#13 0x00007fffedf20478 in g_main_context_iterate.isra () from /usr/lib64/libglib-2.0.so.0
No symbol table info available.
#14 0x00007fffedf2051c in g_main_context_iteration () from /usr/lib64/libglib-2.0.so.0
No symbol table info available.
#15 0x00007ffff3738b3d in base::MessagePumpForUI::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpForUI::Dispatcher*) () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#16 0x00007ffff373cbf3 in MessageLoop::Run() () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#17 0x00007ffff4db07e6 in XRE_InitChildProcess () from /usr/lib64/firefox/libxul.so
No symbol table info available.
#18 0x0000000000408f3c in content_process_main(int, char**) ()
No symbol table info available.
#19 0x00007ffff189cfb0 in __libc_start_main () from /lib64/libc.so.6
No symbol table info available.
#20 0x0000000000408bf9 in _start ()
No symbol table info available.


Expected results:

The Flash element should appear and work normally.
I have been experiencing this issue since I upgraded from Firefox 42 (I updated my whole system, so I'm not sure whether it is caused by the Firefox upgrade or by some other system component).
I don't recognize this as a previously-known crash: what version of Flash and GTK are you running? 

There is a compatibility hack for older versions of GTK (less than 2.18.7) which appears to be triggered here: http://searchfox.org/mozilla-central/source/dom/plugins/ipc/PluginInstanceChild.cpp#1276

But I can't explain why that call to gdk_window_lookup would then crash unless something else were wrong.
Flags: needinfo?(martin)
I'm using GTK+ 3.16.7 and Firefox 45.2.0 ESR (but I tried also 47 and the problem was the same).

Hmm, I don't see a reason why the gdk_window_lookup branch in the mentioned piece of code should be taken, since I'm using a newer version of GTK...
Flags: needinfo?(martin)
I don't either! If you're interested in stepping through this, I'd love to see whether gtk_check_version is returning true or false. I wonder if this is some sort of GTK2/GTK3 mismatch.

In any case, I don't have the engineering resources on the team to diagnose this ourselves. I'm going to leave it open because you may find an explanation that leads to a quick fix, and I'd certainly take a patch!
Priority: -- → P5
Ok, I can now confirm that gtk_check_version returns a non-null value in this case, and thus the branch with gdk_window_lookup is taken.

According to https://developer.gnome.org/gtk3/stable/gtk3-Feature-Test-Macros.html#gtk-check-version, it does not check whether the version "is newer than" the given one; it checks whether "the GTK+ library is compatible with the given version".
It returns "NULL if the GTK+ library is compatible with the given version, or a string describing the version mismatch".

The returned string in this case is "GTK+ version too new (major mismatch)".
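To illustrate why a non-null result does not mean "older", here is a minimal, self-contained sketch. model_check_version is a simplified stand-in written for this example, not GTK's implementation: it only mirrors the documented behavior quoted above, and the "too old" strings are illustrative. runtime_is_older_than is a hypothetical helper showing what an explicit "older than" test could look like.

```c
#include <stddef.h>

/* Simplified model: NULL when the runtime is compatible with the required
 * version, otherwise a string describing the mismatch. */
static const char *model_check_version(unsigned run_major, unsigned run_minor, unsigned run_micro,
                                       unsigned req_major, unsigned req_minor, unsigned req_micro)
{
    if (req_major > run_major)
        return "GTK+ version too old (major mismatch)";
    if (req_major < run_major)
        return "GTK+ version too new (major mismatch)"; /* string reported in this bug */
    if (req_minor > run_minor || (req_minor == run_minor && req_micro > run_micro))
        return "GTK+ version too old (micro mismatch)";
    return NULL;
}

/* Hypothetical explicit test: nonzero only when the runtime really is older. */
static int runtime_is_older_than(unsigned run_major, unsigned run_minor, unsigned run_micro,
                                 unsigned req_major, unsigned req_minor, unsigned req_micro)
{
    if (run_major != req_major)
        return run_major < req_major;
    if (run_minor != req_minor)
        return run_minor < req_minor;
    return run_micro < req_micro;
}
```

With a 3.16.7 runtime and a required version of 2.18.7, the model returns the "too new (major mismatch)" string, so a check written as "!= nullptr means older" takes the old-GTK branch, while the explicit comparison returns 0.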

Unfortunately, disabling this branch (adding "&& 0") is not sufficient as a temporary fix. It segfaults again, but this time in a different place. I will keep digging...
I've been quite stuck since then.

I'm quite sure that some branches that should only be taken with old GTK+ 2.x are incorrectly taken with GTK+ 3.x as well.
I used grep to find gtk_check_version; there are a few occurrences of it, all checking for old GTK+:
./dom/plugins/ipc/PluginModuleChild.cpp:468:        if (gtk_check_version(2,18,7) != nullptr && // older
./dom/plugins/ipc/PluginInstanceChild.cpp:1225:    if (mXEmbed && gtk_check_version(2,18,7) != nullptr) { // older
./dom/plugins/ipc/PluginInstanceChild.cpp:1240:            && gtk_check_version(2, 12, 10) != nullptr) { // older

I'm quite sure this is wrong.
But disabling these branches alone doesn't fix the problem.

After disabling them, I got another segfault, this time with the following backtrace:
Program terminated with signal SIGSEGV, Segmentation fault.
#0  0x00007f688060b94f in gtk_object_get_type () from /usr/lib64/firefox/libmozgtk.so
(gdb) bt full
#0  0x00007f688060b94f in gtk_object_get_type () from /usr/lib64/firefox/libmozgtk.so
#1  0x00007f6871f635bc in ?? () from /usr/lib64/nsbrowser/plugins/libflashplayer.so
#2  0x00007f6871f66369 in ?? () from /usr/lib64/nsbrowser/plugins/libflashplayer.so
#3  0x00007f6871f700cf in ?? () from /usr/lib64/nsbrowser/plugins/libflashplayer.so
#4  0x00007f6883b2011f in mozilla::plugins::PluginInstanceChild::AnswerNPP_SetWindow(mozilla::plugins::NPRemoteWindow const&) () from /usr/lib64/firefox/libxul.so
#5  0x00007f6882d37069 in mozilla::plugins::PPluginInstanceChild::OnCallReceived(IPC::Message const&, IPC::Message*&) () from /usr/lib64/firefox/libxul.so
#6  0x00007f6882d2fce3 in mozilla::plugins::PPluginModuleChild::OnCallReceived(IPC::Message const&, IPC::Message*&) () from /usr/lib64/firefox/libxul.so
#7  0x00007f6882c5de60 in mozilla::ipc::MessageChannel::DispatchInterruptMessage(IPC::Message const&, unsigned long) () from /usr/lib64/firefox/libxul.so
#8  0x00007f6882c5efaa in mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&)
    () from /usr/lib64/firefox/libxul.so
#9  0x00007f6882c6054d in mozilla::ipc::MessageChannel::OnMaybeDequeueOne() ()
   from /usr/lib64/firefox/libxul.so
#10 0x00007f6882c2305d in MessageLoop::RunTask(Task*) () from /usr/lib64/firefox/libxul.so
#11 0x00007f6882c24b87 in MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) () from /usr/lib64/firefox/libxul.so
#12 0x00007f6882c2580b in MessageLoop::DoWork() () from /usr/lib64/firefox/libxul.so
#13 0x00007f6882c1f447 in base::MessagePumpForUI::HandleDispatch() ()
   from /usr/lib64/firefox/libxul.so
#14 0x00007f6882c1f47a in (anonymous namespace)::WorkSourceDispatch(_GSource*, int (*)(void*), void*) () from /usr/lib64/firefox/libxul.so
#15 0x00007f687d4071cd in g_main_context_dispatch () from /usr/lib64/libglib-2.0.so.0
#16 0x00007f687d407478 in g_main_context_iterate.isra () from /usr/lib64/libglib-2.0.so.0
#17 0x00007f687d40751c in g_main_context_iteration () from /usr/lib64/libglib-2.0.so.0
#18 0x00007f6882c1f007 in base::MessagePumpForUI::RunWithDispatcher(base::MessagePump::Delegate*, base::MessagePumpForUI::Dispatcher*) () from /usr/lib64/firefox/libxul.so
#19 0x00007f6882c230bd in MessageLoop::Run() () from /usr/lib64/firefox/libxul.so
#20 0x00007f688429591a in XRE_InitChildProcess () from /usr/lib64/firefox/libxul.so
#21 0x0000000000408f3c in content_process_main(int, char**) ()
#22 0x00007f6880d83fb0 in __libc_start_main () from /lib64/libc.so.6
#23 0x0000000000408bf9 in _start ()

I'm quite lost here, because the only occurrence of gtk_object_get_type in the Firefox code is in:
./widget/gtk/mozgtk/mozgtk.c:593:STUB(gtk_object_get_type)
I suppose this backtrace means that the Flash plugin was invoked after some preparation in AnswerNPP_SetWindow and then something went wrong.

Maybe this warning from the console is also related:
(plugin-container:7572): Gdk-WARNING **: The GDK_NATIVE_WINDOWS environment variable is not supported in GTK3.
See the documentation for gdk_window_ensure_native() on how to get native windows.

But I don't know GTK+ well, so I'm not sure.
I'd like to dig more, but at the moment I have no idea where to start...
Ok, this is interesting. We may not have been suitably careful when we switched from GTK2 to GTK3, not realizing that it would trigger some old compat code.

Adding a couple people who were involved with the GTK3 transition. It could be that these checks are just completely wrong in a gtk3 environment and should be #ifdefed out.
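A rough sketch of what "#ifdefed out" could look like follows. SKETCH_TOOLKIT_GTK_MAJOR and the function name are hypothetical, invented for this example; a real build would take the toolkit version from its configure step, and the actual workaround code is not reproduced here.

```c
/* Hypothetical build-time constant standing in for the configured toolkit major version. */
#ifndef SKETCH_TOOLKIT_GTK_MAJOR
#define SKETCH_TOOLKIT_GTK_MAJOR 3
#endif

/* Returns nonzero when the pre-2.18.7 compatibility branch should run.
 * In a non-GTK2 build the runtime version is never consulted. */
static int needs_pre_2_18_7_workaround(unsigned run_minor, unsigned run_micro)
{
#if SKETCH_TOOLKIT_GTK_MAJOR == 2
    return run_minor < 18 || (run_minor == 18 && run_micro < 7);
#else
    (void)run_minor;
    (void)run_micro;
    return 0;
#endif
}
```

Compiled with the default value of 3, the function always returns 0, so the GTK2-only path is excluded at build time instead of depending on a runtime version string.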
Blocks: gtk3
Flags: needinfo?(stransky)
Flags: needinfo?(karlt)
Priority: P5 → P3
Summary: plugin-container segfault on linux → plugin-container segfault on linux gtk3 because we're hitting old-gtk2 compatibility codepaths
Oh and so you're aware, the gtk_object_get_type call is coming from Flash itself, not from the Firefox code.
Some things need clarification here.

42 and 45 are *not* using Gtk+3. Except maybe in the Fedora packages for Firefox, or for people that enable it manually.

As for Gtk+3 builds, we have ugly hacks in place that make the plugin-container itself still use Gtk+2.
Aha!

> #0  0x00007ffff11248e7 in gdk_window_lookup () from /usr/lib64/firefox/libmozgtk.so

Since this is a plugin-container process for a plugin, the path should be /usr/lib64/firefox/gtk2/libmozgtk.so. LD_LIBRARY_PATH is modified before spawning a plugin process so that it contains the gtk2 directory.
https://dxr.mozilla.org/mozilla-central/source/ipc/glue/GeckoChildProcessHost.cpp#750
So the question becomes why isn't it happening for you, assuming you *are* using a Gtk+3 build.
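For readers unfamiliar with the mechanism, here is a minimal sketch of prepending a gtk2 subdirectory to the library search path before spawning a child process. This is not the code in GeckoChildProcessHost.cpp; the function name, the argument layout, and the choice to append any pre-existing value are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdio.h>

/* Writes "<app_dir>/gtk2:<app_dir>[:<existing>]" into out.
 * Returns 0 on success, -1 if the result did not fit. */
static int build_plugin_ld_path(char *out, size_t out_size,
                                const char *app_dir, const char *existing)
{
    int n;
    if (existing != NULL && existing[0] != '\0')
        n = snprintf(out, out_size, "%s/gtk2:%s:%s", app_dir, app_dir, existing);
    else
        n = snprintf(out, out_size, "%s/gtk2:%s", app_dir, app_dir);
    return (n >= 0 && (size_t)n < out_size) ? 0 : -1;
}
```

A caller could then pass the result to setenv("LD_LIBRARY_PATH", buf, 1) before launching the child, so that the dynamic loader tries the gtk2 directory before the main application directory.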
Ok, that changes the situation, if the previously mentioned parts of the code should *not* be using GTK3.

Yes, that's strange.
I printed LD_LIBRARY_PATH from the piece of code you mentioned and LD_LIBRARY_PATH (new_ld_lib_path) is "/usr/lib64/firefox/gtk2:/usr/lib64/firefox".
There is also libmozgtk.so in /usr/lib64/firefox/gtk2.

But the library used by plugin-container is /usr/lib64/firefox/libmozgtk.so:
(gdb) info sharedlibrary
...
0x00007fbf208287c0  0x00007fbf2082895c  Yes (*)     /usr/lib64/firefox/libmozgtk.so
...

ldd also shows the path in /usr/lib64/firefox/ (but I guess the actual load path depends on LD_LIBRARY_PATH):
ldd /usr/lib/firefox/plugin-container
...
        libmozgtk.so => /usr/lib64/firefox/libmozgtk.so (0x00007fd4b44f6000)
...

I don't know why it doesn't use /usr/lib64/firefox/gtk2/libmozgtk.so...

But thanks for pointing out that trick and the gtk2/gtk3 situation in Firefox. I guess I can rebuild Firefox with gtk2 (using Gentoo USE flags), which would probably fix the original problem for me.

But I'd be glad to help solve this mystery. Any more ideas about what I can try?
I see you use Gentoo, so I expect you build the package yourself. It would be useful to have your build config. You can compare it with the Fedora one, which works fine:

http://pkgs.fedoraproject.org/cgit/rpms/firefox.git/tree/firefox-mozconfig?id=405574520cf35b441647b454f271aeea54a5f6bd
Flags: needinfo?(stransky)
Attached file mozconfig
This should be my .mozconfig (automatically created by emerge, the Gentoo package manager; I did not edit it myself).
I don't know what to look for there... --enable-default-toolkit=cairo-gtk3 is there, but it's different from the Fedora one.
What does this command print?

> LD_LIBRARY_PATH=/usr/lib64/firefox/gtk2:/usr/lib64/firefox ldd /usr/lib64/firefox/libxul.so | grep mozgtk

If that shows /usr/lib64/firefox/libmozgtk.so, then use

> readelf -d /usr/lib64/firefox/libxul.so | grep PATH

to check there is no RPATH set.  RUNPATH should be fine.

If not, then try

> strace -f -e trace=file firefox -no-remote | grep mozgtk

to see which paths are being tested first in each process.
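The RPATH versus RUNPATH distinction above matters because of the lookup order documented for the glibc dynamic loader: DT_RPATH (only when no DT_RUNPATH is present) is consulted before LD_LIBRARY_PATH, while DT_RUNPATH is consulted after it. The sketch below models only that ordering; the enum and function names are hypothetical, and secure-execution mode and other details are ignored.

```c
/* Places the loader may look first, in the simplified model below. */
enum search_step {
    STEP_RPATH,
    STEP_LD_LIBRARY_PATH,
    STEP_RUNPATH,
    STEP_CACHE_AND_DEFAULTS
};

/* Returns the first location consulted for a dependency, given which
 * dynamic tags the requesting object has and whether LD_LIBRARY_PATH is set. */
static enum search_step first_search_step(int has_rpath, int has_runpath,
                                          int ld_library_path_set)
{
    if (has_rpath && !has_runpath)
        return STEP_RPATH;            /* wins over LD_LIBRARY_PATH */
    if (ld_library_path_set)
        return STEP_LD_LIBRARY_PATH;
    if (has_runpath)
        return STEP_RUNPATH;
    return STEP_CACHE_AND_DEFAULTS;
}
```

Under this model, an object carrying only an RPATH that points at /usr/lib64/firefox would pick up libmozgtk.so from there even with the gtk2 directory first in LD_LIBRARY_PATH, which is one possible explanation the readelf check is meant to rule out.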

But note that Firefox 45 was not ready for use with gtk3.
A number of bugs were fixed for 46.
Flags: needinfo?(karlt)
Closing GTK2 related bugs since we removed GTK2 support at the beginning of 2018 in bug 1278282. Probably best to open a new bug in the unlikely event that any of these are still relevant.
Status: UNCONFIRMED → RESOLVED
Closed: Last year
Resolution: --- → INVALID
Oops, shouldn't have closed this one. Is it still happening though?
Status: RESOLVED → REOPENED
Ever confirmed: true
Resolution: INVALID → ---