Closed Bug 468239 (opened 16 years ago, closed 15 years ago)

setting cmdline.preventDefault = true from an extension causes a crash because at-spi was unloaded and illegally used g_atexit

Component: Core :: Disability Access APIs
Type: defect
Priority: Not set
Severity: normal

Status: VERIFIED DUPLICATE of bug 460926

Reporter: Fallen
Assignee: Unassigned

Keywords: crash, testcase

Attachments: 2 files

Attached file Testcase XPI
Tested this with Firefox 3.0.6pre and a custom-compiled Shredder.

Setting cmdline.preventDefault = true causes a crash for me. Example backtrace:

(gdb) bt
#0  0xb7f27430 in __kernel_vsyscall ()
#1  0xb710ade6 in nanosleep () from /lib/tls/i686/cmov/libc.so.6
#2  0xb710abfe in sleep () from /lib/tls/i686/cmov/libc.so.6
#3  0xb7d6b3cd in ah_crap_handler (signum=11) at /home/kewisch/mozilla/comm-central/mozilla/toolkit/xre/nsSigHandlers.cpp:149
#4  0xb7d6b907 in nsProfileLock::FatalSignalHandler (signo=11) at nsProfileLock.cpp:216
#5  <signal handler called>
#6  0xb6c60480 in ?? ()
#7  0xb708468d in __libc_start_main () from /lib/tls/i686/cmov/libc.so.6
#8  0x08049b61 in _start ()

I've also had a backtrace with only the last three frames.

I think this is important even for 1.9.1, since it prevents extensions from installing command-line handlers that don't bring up any UI.
Flags: blocking1.9.1?
I forgot to update the extension to work with all toolkit apps; installing it in a Shredder nightly works out of the box. Sorry about that.

Run the application with -preventDefault to trigger the crash.
Keywords: crash
Stack with libc debug symbols installed: 

#0  0xb6d76480 in ?? ()
#1  0xb719668d in __libc_start_main (main=0x8049c24 <main>, argc=5, ubp_av=0xbfd1e844, init=0x8058460 <__libc_csu_init>, fini=0x8058450 <__libc_csu_fini>, rtld_fini=0xb8010f50 <_dl_fini>, 
    stack_end=0xbfd1e83c) at libc-start.c:252
#2  0x08049b61 in _start ()
Further debugging shows that this happens in one of the numerous __cxa_atexit handlers. I haven't been able to identify which one.
After pairing up the __cxa_atexit registrations with their invocations at shutdown, and installing more debug symbols, it appears that the handler registered in the attached stack is the one that segfaults when it is called.
I suppose I should explain. You can't safely use any atexit unless you are statically part of the running application; otherwise your library risks being unloaded, leaving a dangling function pointer. (Technically it's worse than that: something else could be loaded or allocated in your place, which is just *very* scary.)
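For illustration, a minimal sketch of the alternative: a module whose cleanup is tied to its own lifetime instead of being registered through atexit()/g_atexit(). It relies on the GCC/Clang destructor attribute (a compiler extension, not standard C). The names bridge_start, bridge_cleanup, and bridge_active are hypothetical and not taken from at-spi, and this is not necessarily how the upstream fix was done.

```c
#include <stdbool.h>

/* Hypothetical state owned by a dynamically loaded bridge module. */
static bool bridge_active = false;

/* Hypothetical start-up entry point the host calls after dlopen(). */
void bridge_start(void)
{
    bridge_active = true;
}

/*
 * GCC/Clang extension: a destructor runs when this object is unloaded
 * (dlclose() dropping the last reference) or at normal process exit,
 * whichever happens first. Unlike a bare function pointer handed to
 * g_atexit(), its lifetime is tied to this object, so it cannot be
 * called after this code has been unmapped.
 */
__attribute__((destructor))
static void bridge_cleanup(void)
{
    if (bridge_active) {
        bridge_active = false;
        /* Release sockets, caches, etc. here. */
    }
}
```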

FWIW, the bug I filed is a cookie-cutter bug based on https://bugs.maemo.org/show_bug.cgi?id=3420
Summary: setting cmdline.preventDefault = true from an extension causes a crash → setting cmdline.preventDefault = true from an extension causes a crash because at-spi was unloaded and illegally used g_atexit
This does not appear to be a Mozilla bug. Is there some workaround in the Mozilla code which we can/should apply?
Flags: blocking1.9.1? → blocking1.9.1-
We could try to intentionally leak their library; that should work.

Other than that, if we're really clever (and perhaps we should just do it), we could try to replace atexit/g_atexit.

If we're really lucky and properly hook atexit, we could keep our own data structure that records which library each handler address belongs to. We could then decide either to call the handler before that library is unloaded, or simply to drop it when the library is unloaded.

I don't think this is conceptually hard, and given that these handlers can never work properly while our application is behaving properly, we might want to do it.
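A minimal sketch of the bookkeeping described above, assuming the hook itself is already in place (how atexit/g_atexit would actually be intercepted is not shown). Each handler is stored with an owner pointer; a real implementation might derive the owner from the handler's address (for example with dladdr()), but here the caller passes it in. All names (tracked_atexit, before_unload, run_exit_handlers, the example handlers) and the fixed 64-entry table are hypothetical.

```c
#include <stddef.h>

typedef void (*exit_fn)(void);

typedef struct {
    exit_fn fn;
    const void *owner;   /* assumed: e.g. the owning library's dlopen() handle */
    int live;            /* becomes 0 once the handler has run or been dropped */
} exit_entry;

#define MAX_EXIT_ENTRIES 64
static exit_entry entries[MAX_EXIT_ENTRIES];
static size_t entry_count = 0;

/* Stand-in for atexit(): returns 0 on success, -1 if the table is full. */
int tracked_atexit(exit_fn fn, const void *owner)
{
    if (entry_count == MAX_EXIT_ENTRIES)
        return -1;
    entries[entry_count++] = (exit_entry){ fn, owner, 1 };
    return 0;
}

/* Call just before unloading `owner`: either run its handlers now
 * (newest first, like atexit) or drop them, so none is ever called
 * through a dangling pointer later. */
void before_unload(const void *owner, int run_now)
{
    for (size_t i = entry_count; i-- > 0; ) {
        if (entries[i].live && entries[i].owner == owner) {
            entries[i].live = 0;
            if (run_now)
                entries[i].fn();
        }
    }
}

/* Call at process exit: run whatever is still registered, newest first. */
void run_exit_handlers(void)
{
    for (size_t i = entry_count; i-- > 0; ) {
        if (entries[i].live) {
            entries[i].live = 0;
            entries[i].fn();
        }
    }
}

/* Example handlers, for illustration only. */
static int a_calls = 0, b_calls = 0;
static void handler_a(void) { a_calls++; }
static void handler_b(void) { b_calls++; }
```

Marking an entry dead before calling it means a handler that re-enters the registry can never run twice.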
In which library does spi_atk_bridge_exit_func live, and how would we intentionally leak it?

I think that intercepting atexit/g_atexit is too fragile and is not a good idea.
Component: Cmd-line Features → Disability Access APIs
QA Contact: cmd-line → accessibility-apis
Version: 1.9.1 Branch → Trunk
Ginn, could you look at this?
benjamin: sorry, ask someone (Fallen) to use grep, nm, or gdb's info share (info sharedlibrary).
You leak a library by holding one additional reference to it (an extra dlopen). It won't actually be unloaded until its reference count reaches zero; if its reference count never reaches zero, then, well, it's leaked :).
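A minimal sketch of leaking a library this way, assuming a POSIX dlopen() (link with -ldl on glibc older than 2.34). The path argument is a placeholder: the thread does not say which file contains spi_atk_bridge_exit_func. pin_library is a hypothetical helper, not the code in nsAppRootAccessible.cpp.

```c
#include <dlfcn.h>
#include <stdio.h>

/*
 * Take one extra reference on a shared library and deliberately never
 * release it, so its reference count cannot reach zero and the library
 * stays mapped until process exit. Any exit handlers it registered then
 * still point at valid code. Returns the handle, or NULL on failure.
 */
void *pin_library(const char *path)
{
    void *handle = dlopen(path, RTLD_LAZY);
    if (handle == NULL) {
        fprintf(stderr, "dlopen(%s) failed: %s\n",
                path ? path : "(main program)", dlerror());
        return NULL;
    }
    /* Intentionally no dlclose(handle): this reference is leaked. */
    return handle;
}
```

glibc also accepts an RTLD_NODELETE flag that keeps the object mapped even after dlclose(); the plain extra reference shown here needs no platform-specific flag.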

alexander/ginn: please just get spi_atk fixed upstream, thanks.
We do intentionally leak the libraries; see
http://mxr.mozilla.org/mozilla-central/source/accessible/src/atk/nsAppRootAccessible.cpp#586

But I think the problem is that we never reach the point where we dlopen the library, since there's no UI.

I'll ask Li Yuan to fix it in atk-bridge.
Since we fixed bug 460926, this bug should not be reproducible on newer GNOME releases, e.g. Ubuntu 9.04.
We suppress atk_bridge_init() during startup until we have UI.
When we actually need to call atk_bridge_init(), we dlopen the library and intentionally leak it.
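The startup behaviour described here can be sketched as a lazy, one-shot initializer. The loader is passed in as a function pointer so the gating logic can be shown without a real accessibility stack; ensure_a11y_bridge, bridge_loader, and fake_loader are hypothetical names, and the actual Mozilla code (see the nsAppRootAccessible.cpp link above) may be structured differently.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical loader: would dlopen() the bridge, call its init function,
 * and return the (never closed) handle, or NULL on failure. */
typedef void *(*bridge_loader)(void);

static void *bridge_handle = NULL;

/*
 * Called whenever the application state changes. The bridge is loaded
 * only once a UI exists, so a headless run (e.g. a command-line handler
 * that sets preventDefault) never loads it and it never gets a chance to
 * register exit handlers. Once loaded, the handle is kept forever.
 */
bool ensure_a11y_bridge(bool have_ui, bridge_loader load)
{
    if (bridge_handle != NULL)
        return true;            /* already loaded and pinned */
    if (!have_ui)
        return false;           /* suppress init until there is UI */
    bridge_handle = load();
    return bridge_handle != NULL;
}

/* Example loader, for illustration only. */
static int load_calls = 0;
static int fake_handle;
static void *fake_loader(void)
{
    load_calls++;
    return &fake_handle;
}
```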

I couldn't reproduce it on my box.

Marking it as a DUPE of bug 460926.
Feel free to reopen if I was wrong.

Li Yuan will fix the real problem by using a different mechanism for the callback.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Thanks for looking into this. It seems to be fixed now: the testcase and my extension both work fine.
Status: RESOLVED → VERIFIED