Open Bug 1053757 Opened 11 years ago Updated 3 years ago

[Linux] filehandles to resources (including /dev/video) not closed when restarting from crashreport

Categories

(Toolkit :: Crash Reporting, defect)

34 Branch
x86
Linux
defect

People

(Reporter: bmaris, Unassigned)

Attachments

(1 file)

Reproduced on Ubuntu 14.04 32-bit using latest Nightly 34.0a1 (buildID: 20140813030201).
STR:
1. Start Firefox.
2. Install the crashme addon: http://ted.mielczarek.org/mozilla/crashme.html
3. Wait for the OpenH264 addon to install (see about:addons).
4. Visit http://mozilla.github.io/webrtc-landing/pc_test.html and start a call.
5. Crash Firefox using one of the options from the crashme addon.
6. After submitting the crash, open http://mozilla.github.io/webrtc-landing/pc_test.html again and try to make a call.
Expected result: After Firefox crashes the call is interrupted.
Actual result: After Firefox crashes the call is still on (the camera light stays on). If I try another call I get the message 'Failure callback: "Starting video failed"'.
Notes:
1. Possible regression, will investigate further.
2. It only reproduces on Linux (I used Ubuntu 14.04 32-bit).
After using crashme, please do a "ps wax | grep firefox" to verify that firefox isn't still running (in debug linux builds, it will wait 5 minutes for a debugger to attach before exiting).
All the camera access happens from the main process, not the GMP process, right?
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #2)
> All the camera access happens from the main process, not the GMP process,
> right?
Right. GMP plugin-container has no access. Note that crashing the plugin doesn't stop that tab from capturing the camera; the STR aren't clear as to what "open pc_test.html again and try to make a call" means (same tab? Different tab?). First test would be to crash the plugin, submit, then hit reload for the tab, and see if the active capture indicator goes away.
Flags: needinfo?(bogdan.maris)
(In reply to Randell Jesup [:jesup] from comment #1)
> After using crashme, please do a "ps wax | grep firefox" to verify that
> firefox isn't still running (in debug linux builds, it will wait 5 minutes
> for a debugger to attach before exiting).
If I run 'ps wax | grep firefox' I get this:
  11477 pts/0 Sl 0:00 /home/bogdanmaris/Documents/Latest Nightly/crashreporter /home/bogdanmaris/.mozilla/firefox/zqlqevcq.Test WEBRTC/minidumps/0df78b79-29ea-a687-119cefaa-19290738.dmp
  11487 pts/1 S+ 0:00 grep --color=auto firefox
I ran the command while the crash reporter was up. If I hit restart, Firefox opens pc_test.html in the same tab. If I open a new tab, load pc_test.html and start the call, I get 'Failure callback: "Starting video failed"' (https://db.tt/kF5DwqZA). If I go to the first tab, where pc_test.html is already loaded, and start a call there as well, the global indicator appears but the camera shows nothing (https://db.tt/lHKXp1gK).
(In reply to Randell Jesup [:jesup] from comment #3)
> First test would be to
> crash the plugin, submit, then hit reload for the tab, and see if the active
> capture indicator goes away.
I used media.gmp.plugin.crash to crash the plugin; after submitting the crash, the indicator goes away, and if I then reload the tab I can make another call. Note that I did not even start an h264 video; starting one has the same result.
Flags: needinfo?(bogdan.maris) → needinfo?(rjesup)
This appears to be an issue with the CrashReporter's Restart feature; it's not closing all the resources used by the original process (on Linux). Reproduces on Fedora 19 with a crashreporter-enabled m-c opt build.
Component: WebRTC → Breakpad Integration
Flags: needinfo?(rjesup) → needinfo?(ted)
Product: Core → Toolkit
The firefox process does a fork and exec to run the crash reporter: http://hg.mozilla.org/mozilla-central/annotate/0753f7b93ab7/toolkit/crashreporter/nsExceptionHandler.cpp#l825 and then calls _exit(1). The "restart" button in the crash reporter just re-launches the Firefox binary. I don't see how anything the crashreporter binary could do would be a problem here. What resources are we holding on to that aren't being released on _exit?
Flags: needinfo?(ted)
Anthony, is this something you might want to track for Loop?
Flags: needinfo?(anthony.s.hughes)
CCing Maire so she is aware of this bug for Loop. I'm going to suggest we block MVP on this.
Blocks: loop_mvp
Flags: qe-verify+
Flags: needinfo?(bogdan.maris)
Flags: needinfo?(anthony.s.hughes)
Did some more testing:
1) It's not an OpenH264 issue. It happens in normal VP8 calls.
2) It's not a Loop issue. It happens with plain getUserMedia pages, no peerconnections in sight.
3) I'm 99% certain this isn't new (unless crashreporter changed a lot recently), and may go back as far as 22.
4) It strongly appears the issue is a failure to release /dev/video when you hit "restart".
5) I believe this may be causing other files to be left open.
Before restart:
  ls -l /proc/NNNNN/fd | grep video
  lrwx------ 1 jesup jesup 64 Sep 12 17:30 72 -> /dev/video0
After restart, without re-opening anything (FF home page shown):
  ls -l /proc/XXXXX/fd | grep video
  lrwx------ 1 jesup jesup 64 Sep 12 17:31 72 -> /dev/video0
This should not block anything. This should get looked at from the crashreporter side; this may be causing other problems for people who only restart the browser when it crashes.
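The /proc/<pid>/fd check above can also be done programmatically. The following is a minimal sketch, assuming a Linux /proc filesystem; the helper name count_fds_matching is hypothetical and is not part of Firefox or Breakpad.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: count the descriptors of process `pid` whose
 * /proc/<pid>/fd symlink target contains `needle` (e.g. "/dev/video").
 * Prints each match and returns the count, or -1 if the directory
 * cannot be read. Linux-only. */
int count_fds_matching(pid_t pid, const char *needle)
{
    assert(needle != NULL);

    char dirpath[64];
    snprintf(dirpath, sizeof dirpath, "/proc/%ld/fd", (long)pid);

    DIR *dir = opendir(dirpath);
    if (!dir)
        return -1;

    int matches = 0;
    struct dirent *ent;
    while ((ent = readdir(dir)) != NULL) {
        if (ent->d_name[0] == '.')
            continue;                 /* skip "." and ".." */

        char linkpath[PATH_MAX];
        char target[PATH_MAX];
        snprintf(linkpath, sizeof linkpath, "%s/%s", dirpath, ent->d_name);

        ssize_t n = readlink(linkpath, target, sizeof target - 1);
        if (n < 0)
            continue;                 /* fd went away while scanning */
        target[n] = '\0';             /* readlink does not terminate */

        if (strstr(target, needle)) {
            printf("%s -> %s\n", ent->d_name, target);
            matches++;
        }
    }
    closedir(dir);
    return matches;
}
```

Calling count_fds_matching(pid, "/dev/video") once for the crashed Firefox pid and once for the crashreporter pid would show whether the same descriptor number appears in both.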
No longer blocks: loop_mvp
Flags: needinfo?(bogdan.maris) → needinfo?(ted)
Summary: [Linux] webRTC call still on after Firefox crash → [Linux] filehandles to resources (including /dev/video) not closed when restarting from crashreport
As I stated in comment 6: by the time you click "Restart" the Firefox process should be *dead*, we call _exit(1). The crash reporter client re-launches a new instance of Firefox, but it should have no impact on the old one. What is holding on to /dev/video? Does it somehow persist past fork/exec?
Flags: needinfo?(ted)
CrashReporter is inheriting at least some open files from firefox. Not only /dev/video0, but also it appears jprof-log (I use --enable-jprof), though there are two fds to it. The default is to inherit fd's on fork/exec/execve. I'll note none of these are opened with O_CLOEXEC/F_DUPFD_CLOEXEC/FD_CLOEXEC. Note also that Linux mq's are inherited.
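Whether a given descriptor would survive exec can be checked through its FD_CLOEXEC flag. This is a minimal sketch using plain fcntl(2); the helper names is_cloexec and set_cloexec are hypothetical, and nothing here is taken from the Firefox tree.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: returns 1 if `fd` is closed on exec, 0 if it
 * would be inherited by the exec'd program, -1 if `fd` is not open. */
int is_cloexec(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return (flags & FD_CLOEXEC) ? 1 : 0;
}

/* Hypothetical helper: mark an already-open descriptor close-on-exec.
 * Returns 0 on success, -1 on error. */
int set_cloexec(int fd)
{
    assert(fd >= 0);
    int flags = fcntl(fd, F_GETFD);
    if (flags < 0)
        return -1;
    return fcntl(fd, F_SETFD, flags | FD_CLOEXEC);
}
```

A descriptor opened with a plain open() reports 0 from is_cloexec, which matches the default inheritance described above.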
I guess I never realized that, but that's terrible. The supported way seems to be "use O_CLOEXEC everywhere". We could fix this particular issue by doing that when opening video devices. To fix the general case I guess we'd have to add some code to iterate and close all open fds before we exec, which doesn't seem to be easy.
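As a rough illustration of the "use O_CLOEXEC everywhere" approach, the flag can be set atomically at open time, so that another thread cannot fork/exec between the open and a later fcntl call. The wrapper names below are hypothetical, and the sketch does not reflect how the video capture code actually opens its devices.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical wrapper: open `path` with close-on-exec set from the
 * start. Not suitable for O_CREAT, which needs a mode argument. */
int open_cloexec(const char *path, int flags)
{
    assert(path != NULL);
    return open(path, flags | O_CLOEXEC);
}

/* Hypothetical wrapper: duplicate `fd` onto the lowest free descriptor
 * with close-on-exec set. dup() would clear the flag on the copy. */
int dup_cloexec(int fd)
{
    assert(fd >= 0);
    return fcntl(fd, F_DUPFD_CLOEXEC, 0);
}
```

For a video device the call would look like open_cloexec("/dev/video0", O_RDWR); the device path is only an example taken from the listing earlier in this bug.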
This will probably work (but I don't know if it's always safe?):
  int m = getdtablesize();
  for (int i = 3; i < m; i++) {
    close(i);
  }
This would need to use the sys_ wrappers from linux_syscall_support.h, so it'd have to use sys_getrlimit instead of getdtablesize.
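A getrlimit-based variant of that loop might look like the following. This is only a sketch: it uses the ordinary libc getrlimit and close rather than the sys_ wrappers the exception handler would need, the function name close_fds_from is hypothetical, and the 65536 cap is an arbitrary assumption to keep the loop bounded when the limit is very large or unlimited. Descriptors opened above a soft limit that was lowered afterwards would not be covered.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <sys/resource.h>
#include <unistd.h>

/* Hypothetical helper: close every descriptor >= lowfd, up to the
 * current RLIMIT_NOFILE soft limit. Returns the number of descriptors
 * actually closed, or -1 if the limit cannot be queried. */
int close_fds_from(int lowfd)
{
    assert(lowfd >= 0);

    struct rlimit rl;
    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    rlim_t max = rl.rlim_cur;
    if (max == RLIM_INFINITY || max > 65536)
        max = 65536;                  /* arbitrary cap (assumption) */

    int closed = 0;
    for (int fd = lowfd; (rlim_t)fd < max; fd++) {
        if (close(fd) == 0)           /* EBADF for unused slots is fine */
            closed++;
    }
    return closed;
}
```

In the scenario discussed here the call would be close_fds_from(3) in the forked child, just before exec, leaving stdin, stdout and stderr open.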
Severity: normal → S3