Closed Bug 503645 Opened 15 years ago Closed 12 years ago

Flash 10 prevents Breakpad from catching crashes

Product/Component: Toolkit :: Crash Reporting
Type: defect
Platform: x86, Linux
Priority: Not set
Severity: major
Status: RESOLVED INCOMPLETE
Reporter: touristinresidence
Assignee: Unassigned

User-Agent:       Mozilla/5.0 (compatible; Konqueror/3.5; Linux; X11; i686; en_US) KHTML/3.5.7 (like Gecko)
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.0.11) Gecko/2009060214 Firefox/3.0.11

I'm running Firefox 3.0.11 on a Slackware 12.0 system. Until recently, I had no idea that Breakpad existed on Linux, because it never ran once despite Firefox crashing every other week. A different problem led me to uninstall the Adobe Flash player from my system, and Breakpad began working. I came across Bug 422308 in my research; however, it is (a) listed as Windows-only and (b) supposed to be fixed with Flash 10.

I upgraded to Flash 10, and a very similar problem to Bug 422308 still exists. If Flash is being used (not merely loaded), then Breakpad will not run when Firefox crashes.

Reproducible: Always

Steps to Reproduce:
1. Run ./firefox 
2. Firefox loads with 2 tabs: http://seattletransitblog.com/ (which currently has a Flash video on the main page), and http://www.google.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official.
3. Click the Flash video to start playing
4. Select Tools > Crash me! > Null pointer deref!
Actual Results:  
Firefox crashes, and Breakpad does not run.

Expected Results:  
Firefox crashes, and Breakpad does run, collecting the crash info.

There is a lot of other weirdness which may help point to the problem, or may all be red herrings, so I'll just report what I've observed.

This is using a close to brand new profile, on a barebones setup. The only Add-on I have installed is Crash Me Now! Advanced 0.2, and the only non-default plugin is Shockwave Flash 10.0 r22 (just downloaded from Adobe's site today). The only "major" configuration of Firefox that I have done is to disable Javascript.

I have not tested other URLs than http://seattletransitblog.com/, but I suspect any page with embedded Flash video would cause the same problem.

In the repro scenario above, no error is printed on the console. It looks like:
user@wizzard:~/firefox$ ./firefox 
user@wizzard:~/firefox$ 

If I do NOT start the video playing (skipping step 3 above), Breakpad does run as expected (the "Mozilla Crash Reporter" box appears). Also, the console has a line like:
./run-mozilla.sh: line 131:  3803 Segmentation fault      "$prog" ${1+"$@"}

If I DO start the video playing, but then close that tab (leaving only the tab with Google open) before using Crash Me!, then Breakpad also runs as expected. However, no "Segmentation fault" message is produced on the console.

If I crash Firefox using the original 4 steps above, and manually restart it on the command line, and choose "Restore Previous Session", then the video(s) on the Seattle Transit Blog page no longer play (the static images do load, however).

About half of the Crash Me! methods do not crash Firefox. "Pure virtual call", "Invalid parameter to CRT method", and "ObjC exception" do not do anything at all. "Stack Overflow" causes a crash and a "Segmentation fault" message as above, but Breakpad does not run. "Null pointer deref", "Null function call", and "Divide by zero" all cause Breakpad to run (except, of course, in the testcases above where it does not run). This information is only provided in case it is relevant to the problem of Breakpad not running; I certainly don't expect you to debug the Crash Me! extension. :-)

I can reproduce this problem at will, and would love to help fix it, as Firefox had been crashing very often for me, rendering an upgrade to 3.x unusable, and it seems likely that without the Breakpad reports I cannot figure out why that was happening. Please let me know if you need any more information, or if there are other debugging steps you need performed. Thank you!
Can confirm problems using STR. Instead of a crash I get a total freeze of Firefox. Doesn't just prevent the catch of the crash; seems to prevent the crash. I have to kill the process. (Firefox 3/3.5 on Kubuntu 8.04)

For what it's worth, not all of Crash Me Now's features work on all OSes.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Component: General → Breakpad Integration
Product: Firefox → Toolkit
QA Contact: general → breakpad.integration
Yes, I would just use "Null pointer deref" as the basic crash. The other crashes are only useful if you need to test something very specific.

I don't know what the flash plugin is doing on Linux, but it may be installing its own signal handler, and then not calling back to our original signal handler, thereby bypassing it completely.
How would a plugin keep breakpad from running?
By installing a new exception handler and not forwarding exceptions to the breakpad one, or by explicitly uninstalling the breakpad one.
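
A minimal sketch of the first case (a later handler replacing an earlier one without forwarding), written in Python purely as an illustration. It is not how Flash or Breakpad are implemented: SIGUSR1 stands in for SIGSEGV so the example is safe to run, and crash_reporter_handler, make_plugin_handler, and deliver are hypothetical names.

```python
import signal

# SIGUSR1 stands in for SIGSEGV so the sketch is safe to run.
SIG = signal.SIGUSR1
calls = []  # records which handlers ran, in order


def crash_reporter_handler(signum, frame):
    """Hypothetical stand-in for the crash reporter's handler."""
    calls.append("crash-reporter")


def make_plugin_handler(previous, forward):
    """Build a hypothetical plugin handler that optionally chains to `previous`."""
    def plugin_handler(signum, frame):
        calls.append("plugin")
        if forward and callable(previous):
            previous(signum, frame)  # forward to the handler that was replaced
    return plugin_handler


def deliver(forward):
    """Install both handlers, raise the signal once, and return who ran."""
    calls.clear()
    original = signal.getsignal(SIG)
    try:
        signal.signal(SIG, crash_reporter_handler)   # installed first
        previous = signal.getsignal(SIG)             # what the later code sees
        signal.signal(SIG, make_plugin_handler(previous, forward))  # replaces it
        signal.raise_signal(SIG)
        return list(calls)
    finally:
        # Restore whatever disposition was in place before the demo.
        signal.signal(SIG, original if original is not None else signal.SIG_DFL)
```

With forward=True both handlers run; with forward=False only the later handler runs, which is the situation in which a crash reporter would never be invoked.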
Mike M, can you comment on the likelihood of this being a Flash bug?
I uninstalled Flash Player in Linux while installing that Crash Me Now Firefox extension. No matter what the crash mode, I don't see a Firefox crash invoking Breakpad. This is under Fedora Core 10 where other program crashes launch Breakpad.

Until I can see Breakpad from a Firefox crash but not from a Firefox w/ Flash crash, I can't verify that this is a Flash problem.
Mike: are you using a Fedora distro build, or an official build from mozilla.org? Fedora builds will not have crash reporting enabled; you need a build from here:
http://www.mozilla.com/products/download.html?product=firefox-3.5.3&os=linux&lang=en-US
I thought this bug was about inhibiting Breakpad.
I downloaded Firefox 3.5.3 and unpacked/ran it manually.

Flash 9 + Crash Me Now = no Mozilla crash reporter
Flash 10 + Crash Me Now = Mozilla crash reporter invoked
I can't reproduce.  I get the crash reporter with Ubuntu 8.10, Firefox 3.5.3, Shockwave Flash 10.0 r32, and playing the video at http://seattletransitblog.com/2009/09/27/sunday-open-thread-double-articulated-bus/

Frank/Dave, do you still not get the crash reporter with recent versions?
Dave: do you get the crash reporter if you don't view any Flash sites?
(In reply to comment #8)
> I thought this bug was about inhibiting Breakpad.

Breakpad is the library we use for our crash reporting in Firefox.
Just reproduced this 100% successfully on Slackware 12.0.

Firefox 3.5.3 + Crash Me Now 0.2 ==  Mozilla crash reporter invoked
Firefox 3.5.3 + Crash Me Now 0.2 + Flash 10.0.32.18, video loaded and not playing ==  Mozilla crash reporter invoked
Firefox 3.5.3 + Crash Me Now 0.2 + Flash 10.0.32.18, video loaded and playing ==  Mozilla crash reporter NOT invoked
Also, even if the video is then stopped and no longer playing, the crash reporter is still NOT invoked.
Thanks for checking, Frank.

If you are feeling adventurous: install debug symbols for libc (which hopefully includes libpthread), attach a debugger before playing the video, set breakpoints in signal and __sigaction conditional on sig == 11, continue, and collect a call stack each time the breakpoint is hit. That will likely give some clues as to what is changing the signal handler.
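
A lighter-weight complementary check, offered only as a sketch and not as a replacement for the debugger session: on Linux, /proc/<pid>/status has a SigCgt line holding a hex bitmask of signals that currently have a handler installed. Comparing it before and after the video starts would show whether the signal 11 handler disappears, though not who removed it or whether it was replaced by a different handler. The helper names signal_is_caught and read_status are hypothetical.

```python
import signal


def signal_is_caught(status_text, signum):
    """Return True if `signum` has a handler installed, per the SigCgt mask."""
    for line in status_text.splitlines():
        if line.startswith("SigCgt:"):
            mask = int(line.split(":", 1)[1].strip(), 16)
            # Bit (signum - 1) is set when the signal is caught.
            return bool(mask & (1 << (signum - 1)))
    raise ValueError("no SigCgt line found")


def read_status(pid):
    """Read /proc/<pid>/status (Linux only)."""
    with open(f"/proc/{pid}/status") as f:
        return f.read()
```

For a running browser this would be called as signal_is_caught(read_status(pid), signal.SIGSEGV), where pid is the process ID of firefox-bin.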
I'll give that a go this weekend, and report back my results.
I filed bug 522332, which could possibly be involved.
If there is still no crashreporter with MOZ_DISABLE_SIG_HANDLER=1 in the environment, then that would exclude bug 522332 as the cause of this bug.
Running "export MOZ_DISABLE_SIG_HANDLER=1" before running firefox yields no change in behavior.
Here are the results from the libc-with-debug-symbols exercise. The first two breakpoint hits came shortly after the video started playing, but not, it seems, at the moment it started. They may have been triggered by my opening the "Tools" menu, but I can't swear to that. The third stop, obviously, is after selecting the null pointer deref.

Breakpoint 4, 0xb67a4b09 in __bsd_signal (sig=13, handler=0xaf80d8c0) at ../sysdeps/posix/signal.c:34
34      in ../sysdeps/posix/signal.c
(gdb) bt
#0  0xb67a4b09 in __bsd_signal (sig=13, handler=0xaf80d8c0) at ../sysdeps/posix/signal.c:34
#1  0xaf80eca1 in esd_send_auth () from /usr/lib/libesd.so.0
#2  0xaf80ef40 in esd_open_sound () from /usr/lib/libesd.so.0
#3  0xb7b46b67 in ?? () from ./libxul.so
#4  0xb7e47418 in ?? () from ./libxul.so
#5  0xb7e55cd6 in ?? () from ./libxul.so
#6  0xaed17220 in ?? ()
#7  0xaed17220 in ?? ()
#8  0xaed171e4 in ?? ()
#9  0xb7f58180 in ?? () from ./libxul.so
#10 0xbfb28348 in ?? ()
#11 0xb7b470c3 in ?? () from ./libxul.so
#12 0xaec75a60 in ?? ()
#13 0x00000000 in ?? ()
(gdb) c
Continuing.

Breakpoint 4, 0xb67a4b09 in __bsd_signal (sig=13, handler=0x1) at ../sysdeps/posix/signal.c:34
34      in ../sysdeps/posix/signal.c
(gdb) bt
#0  0xb67a4b09 in __bsd_signal (sig=13, handler=0x1) at ../sysdeps/posix/signal.c:34
#1  0xaf80ed69 in esd_send_auth () from /usr/lib/libesd.so.0
#2  0xaf80ef40 in esd_open_sound () from /usr/lib/libesd.so.0
#3  0xb7b46b67 in ?? () from ./libxul.so
#4  0xb7e47418 in ?? () from ./libxul.so
#5  0xb7e55cd6 in ?? () from ./libxul.so
#6  0xaed17220 in ?? ()
#7  0xaed17220 in ?? ()
#8  0xaed171e4 in ?? ()
#9  0xb7f58180 in ?? () from ./libxul.so
#10 0xbfb28348 in ?? ()
#11 0xb7b470c3 in ?? () from ./libxul.so
#12 0xaec75a60 in ?? ()
#13 0x00000000 in ?? ()
(gdb) c
Continuing.

Program received signal SIGSEGV, Segmentation fault.
0xb5d02b77 in nsCrasher::Crash ()
   from /home/fwojcik/.mozilla/firefox/df8zgkwx.default/extensions/crashme@ted.mielczarek.org/platform/Linux_x86-gcc3/components/libcrashme.so
(gdb) bt
#0  0xb5d02b77 in nsCrasher::Crash ()
   from /home/fwojcik/.mozilla/firefox/df8zgkwx.default/extensions/crashme@ted.mielczarek.org/platform/Linux_x86-gcc3/components/libcrashme.so
#1  0xb7c309db in NS_InvokeByIndex_P () from ./libxul.so
#2  0xb73eb056 in ?? () from ./libxul.so
#3  0xaec79690 in ?? ()
#4  0x00000003 in ?? ()
#5  0x00000001 in ?? ()
#6  0xbfb27924 in ?? ()
#7  0xbfb279cc in ?? ()
#8  0xbfb279b4 in ?? ()
#9  0xbfb27a0c in ?? ()
#10 0x00000001 in ?? ()
#11 0xaed5ce00 in ?? ()
#12 0xaed5ce00 in ?? ()
#13 0xb5cd6a90 in ?? ()
#14 0x00000000 in ?? ()
(gdb) c
Continuing.
No such process

Program exited normally.
(gdb)
In case it matters (nothing above seems to relate to the signal 11 handler, for instance :), the rest of the session was:

fwojcik@wizzard:~/firefox35/firefox$ LD_LIBRARY_PATH=/usr/lib/debug ./firefox -g -d gdb
./run-mozilla.sh -g -d gdb ./firefox-bin
MOZILLA_FIVE_HOME=.
  LD_LIBRARY_PATH=.:./plugins:.:/usr/lib/debug
DISPLAY=:0
DYLD_LIBRARY_PATH=.:.
     LIBRARY_PATH=.:./components:.
       SHLIB_PATH=.:.
          LIBPATH=.:.
       ADDON_PATH=.
      MOZ_PROGRAM=./firefox-bin
      MOZ_TOOLKIT=
        moz_debug=1
     moz_debugger=gdb
/usr/bin/gdb ./firefox-bin -x /tmp/mozargs.BsUXQE
GNU gdb 6.6
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "i486-slackware-linux"...
(no debugging symbols found)
Using host libthread_db library "/usr/lib/debug/libthread_db.so.1".
(gdb) break signal
Function "signal" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 1 (signal) pending.
(gdb) break __sigaction
Function "__sigaction" not defined.
Make breakpoint pending on future shared library load? (y or [n]) y
Breakpoint 2 (__sigaction) pending.
(gdb) run
Starting program: /home/fwojcik/firefox35/firefox/firefox-bin
Breakpoint 3 at 0xb7fbccb5: file ../nptl/sigaction.c, line 38.
Pending breakpoint "__sigaction" resolved
[Thread debugging using libthread_db enabled]
[New Thread -1238067504 (LWP 21608)]
Breakpoint 4 at 0xb67a4b09: file ../sysdeps/posix/signal.c, line 34.
Pending breakpoint "signal" resolved
[New Thread -1245709424 (LWP 21611)]
[Switching to Thread -1238067504 (LWP 21608)]

Breakpoint 4, 0xb67a4b09 in __bsd_signal (sig=13, handler=0x1) at ../sysdeps/posix/signal.c:34
34      ../sysdeps/posix/signal.c: No such file or directory.
        in ../sysdeps/posix/signal.c
(gdb) c
Continuing.
[New Thread -1254098032 (LWP 21612)]
[New Thread -1263535216 (LWP 21613)]

Breakpoint 4, 0xb67a4b09 in __bsd_signal (sig=13, handler=0x1) at ../sysdeps/posix/signal.c:34
34      in ../sysdeps/posix/signal.c
(gdb) c

<lots more New Thread/Thread ... exited events here>
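
As a reading aid for the traces above: on Linux, sig=13 is SIGPIPE and sig=11 is SIGSEGV, and a handler value of 0x1 is SIG_IGN in glibc. So the recorded hits appear to be libesd setting a SIGPIPE handler and then setting SIGPIPE to ignored, not anything touching signal 11. A small Python sketch that decodes the (sig, handler) pairs gdb prints; describe and SPECIAL_HANDLERS are hypothetical names, and the special values assume Linux/glibc.

```python
import signal

# Special handler values as gdb prints them on Linux/glibc (assumption).
SPECIAL_HANDLERS = {0x0: "SIG_DFL", 0x1: "SIG_IGN"}


def describe(sig, handler):
    """Turn a (sig, handler) pair from a gdb frame into readable text."""
    name = signal.Signals(sig).name
    disposition = SPECIAL_HANDLERS.get(handler, f"handler at {handler:#x}")
    return f"{name} -> {disposition}"
```

For example, describe(13, 0x1) corresponds to the frame "__bsd_signal (sig=13, handler=0x1)".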
Thank you Frank for trying that.  It seems to suggest that no other signal handlers are being installed.

(In reply to comment #18)
> Program received signal SIGSEGV, Segmentation fault.
> 0xb5d02b77 in nsCrasher::Crash ()
>    from
> /home/fwojcik/.mozilla/firefox/df8zgkwx.default/extensions/crashme@ted.mielczarek.org/platform/Linux_x86-gcc3/components/libcrashme.so
> (gdb) bt
> #0  0xb5d02b77 in nsCrasher::Crash ()
>    from
> /home/fwojcik/.mozilla/firefox/df8zgkwx.default/extensions/crashme@ted.mielczarek.org/platform/Linux_x86-gcc3/components/libcrashme.so
> #1  0xb7c309db in NS_InvokeByIndex_P () from ./libxul.so
> #2  0xb73eb056 in ?? () from ./libxul.so
> #3  0xaec79690 in ?? ()
> #4  0x00000003 in ?? ()
> #5  0x00000001 in ?? ()
> #6  0xbfb27924 in ?? ()
> #7  0xbfb279cc in ?? ()
> #8  0xbfb279b4 in ?? ()
> #9  0xbfb27a0c in ?? ()
> #10 0x00000001 in ?? ()
> #11 0xaed5ce00 in ?? ()
> #12 0xaed5ce00 in ?? ()
> #13 0xb5cd6a90 in ?? ()
> #14 0x00000000 in ?? ()
> (gdb) c
> Continuing.
> No such process
> 
> Program exited normally.
> (gdb)

I wonder what "No such process" might indicate, and how gdb would know how the program exited if there were no process.
If Mozilla is trying to start the crashreporter, it calls execl from the parent process of a fork.  exec seems to confuse gdb sometimes, so maybe that's happening.

Before calling execl, our crash signal handler calls a number of functions that are not async-signal-safe, so it's possible that there are race conditions with Flash threads, though I'd expect that to cause intermittent problems, and hangs more likely than terminations.  The functions include:
  clone (not sure why this wouldn't be async-signal-safe, given that fork() is)
  opendir
  readdir
  ptrace

Apparently there have been significant changes in upstream breakpad, so maybe we should wait and see if they make a difference when we pick up those changes.
(In reply to comment #20)
> Apparently there have been significant changes in upstream breakpad, so maybe
> we should wait and see if they make a difference when we pick up those changes.

Those landed this morning in bug 514188, so should be included in tomorrow's nightly build.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → INCOMPLETE