Closed Bug 1456101 Opened 6 years ago Closed 6 years ago

Intermittent Linux xserver hang with webrtc screen capture hangs user's desktop

Categories

(Core :: WebRTC, defect, P2)

50 Branch
x86_64
Linux
defect

Tracking

()

VERIFIED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- verified
firefox60 --- wontfix
firefox61 - wontfix
firefox62 - verified

People

(Reporter: dfetis, Assigned: ng)

References

(Blocks 1 open bug)

Details

(Keywords: crash)

Attachments

(3 files)

User Agent: Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.181 Safari/537.36

Steps to reproduce:

1 - Open https://mozilla.github.io/webrtc-landing/gum_test.html in Firefox on Linux (tested with the latest Debian, Ubuntu and Mint).
2 - Click the screen or window button.
3 - Allow capture in the dialog box.
4 - Reload the page and repeat from step 1 several times.


Actual results:

After 5 to 10 iterations, the desktop freezes and only the mouse pointer remains active.
We can see X server "Fatal IO error" messages in /var/log/syslog.

For example, on the Linux Mint 18 Cinnamon edition we got this:

Gdk-WARNING: t+477,72786s: cinnamon-session: Fatal IO error 11 (Ressource temporairement non disponible) on X server :0.
org.a11y.atspi.Registry[2821]: XIO:  fatal IO error 11 (Resource temporarily unavailable) on X server ":0"




Expected results:

No desktop freeze and no error in syslog.
Severity: normal → critical
Has Regression Range: --- → irrelevant
Has STR: --- → yes
Component: Untriaged → WebRTC
Keywords: crash
OS: Unspecified → Linux
Product: Firefox → Core
Hardware: Unspecified → x86_64
I can reproduce this on a Linux Mint VM.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Rank: 29
Priority: -- → P3
(In reply to Nico Grunbaum [:ng] from comment #1)
> I can reproduce this on a Linux Mint VM.

Can you please provide the crash report?
There is no crash report (it isn't crashing). I switched to nightly and I am no longer able to reproduce. I tried it on Ubuntu 17.10, and the latest Linux Mint 18 live CD. Damien, could you see if you can reproduce this issue with Nightly? It can be downloaded here: https://www.mozilla.org/en-US/firefox/channel/desktop/ (scroll down).
Flags: needinfo?(dfetis)
Hi Nico, 
I can reproduce it with the latest Nightly build (61.0a1 (2018-04-27) (64-bit)) on my Linux Mint 18.3.

I captured GetUserMedia logs with NSPR_LOG_MODULES=MediaManager:4,GetUserMedia:4, writing them to a temporary file.

This is what I see just before the screen freezes:

[21089:MediaManager]: D/MediaManager ChooseCapability(kFitness) for mCapability (Allocate) --
[21089:MediaManager]: D/MediaManager Video device 0 allocated
[21089:Main Thread]: D/MediaManager GetUserMediaStreamRunnable::Run()
[21089:Main Thread]: D/MediaManager SourceListener 0x7fb4db230400 activating audio=(nil) video=0x7fb4e0536660
[21089:MediaManager]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::SetTrack(const RefPtr<const mozilla::AllocationHandle>&, const RefPtr<mozilla::SourceMediaStream>&, mozilla::TrackID, const PrincipalHandle&)
[21089:MediaManager]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Start(const RefPtr<const mozilla::AllocationHandle>&)
[21089:MediaManager]: D/MediaManager started all sources
[21089:Main Thread]: D/MediaManager GetUserMediaStreamRunnable::Run: starting success callback following InitializeAsync()
[21089:Main Thread]: D/MediaManager Returning success for getUserMedia()



Then the screen freezes, but the log continues to be written and the Firefox process does not crash.

[21089:Main Thread]: D/MediaManager SourceListener 0x7fb4db230400 stopping video track 1
[21089:Main Thread]: D/MediaManager SourceListener 0x7fb4db230400 this was the last track stopped
[21089:Main Thread]: D/MediaManager SourceListener 0x7fb4db230400 stopping
[21089:MediaManager]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Stop(const RefPtr<const mozilla::AllocationHandle>&)
[21089:Main Thread]: D/MediaManager SourceListener 0x7fb4db230400 StopSharing
[21089:Main Thread]: D/MediaManager SourceListener 0x7fb4db230400 stopping video track 1
[21089:MediaManager]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Deallocate(const RefPtr<const mozilla::AllocationHandle>&)
[21089:MediaManager]: D/MediaManager Video device 0 deallocated
[21144:MediaManager]: D/MediaManager GetUserMediaTask::Run()
[21144:MediaManager]: D/MediaManager virtual nsresult mozilla::MediaEngineRemoteVideoSource::Allocate(const mozilla::dom::MediaTrackConstraints&, const mozilla::MediaEnginePrefs&, const nsString&, const mozilla::ipc::PrincipalInfo&, mozilla::AllocationHandle**, const char**)
[21144:MediaManager]: D/MediaManager ChooseCapability(kFitness) for mCapability (Allocate) ++
[21144:MediaManager]: D/MediaManager bool mozilla::MediaEngineRemoteVideoSource::ChooseCapability(const mozilla::NormalizedConstraints&, const mozilla::MediaEnginePrefs&, const nsString&, webrtc::CaptureCapability&, mozilla::DistanceCalculation)
[21144:MediaManager]: D/MediaManager ChooseCapability: prefs: 640x480 @30fps
[21144:MediaManager]: D/MediaManager Constraints: width: { min: -2147483647, max: 2147483647 }
[21144:MediaManager]: D/MediaManager              height: { min: -2147483647, max: 2147483647 }
[21144:MediaManager]: D/MediaManager              frameRate: { min: -inf, max: inf }
[21144:MediaManager]: D/MediaManager ChooseCapability(kFitness) for mCapability (Allocate) --

After killing Firefox from a virtual console (CTRL+ALT+F2), I can return to the GUI and everything runs normally.

So the Firefox process is freezing the window manager rather than crashing it,
but I have not looked further into what causes this.
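
A log like the one above is easier to scan once each line is split into fields. The following is a minimal sketch, assuming every entry has the shape `[<number>:<thread name>]: <level>/<module> <message>` seen in this comment; the leading number is presumably a process ID, and `parse_line` and `ids_in_order` are hypothetical helper names, not part of any Mozilla tool.

```python
import re

# Matches entries such as:
#   [21089:Main Thread]: D/MediaManager Returning success for getUserMedia()
LOG_LINE = re.compile(
    r"^\[(?P<pid>\d+):(?P<thread>[^\]]+)\]: "
    r"(?P<level>[A-Z])/(?P<module>\S+) (?P<message>.*)$"
)


def parse_line(line):
    """Return the fields of one log entry as a dict, or None if it does not match."""
    match = LOG_LINE.match(line.strip())
    return match.groupdict() if match else None


def ids_in_order(lines):
    """List the distinct leading numbers in order of first appearance."""
    seen = []
    for line in lines:
        record = parse_line(line)
        if record and record["pid"] not in seen:
            seen.append(record["pid"])
    return seen
```

Run over the excerpt in this comment, `ids_in_order` would list 21089 followed by 21144, which makes the switch between the two halves of the log easy to spot.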
Flags: needinfo?(dfetis)
I was able to reproduce it in Nightly, though it took far more attempts (>50). I attached gdb and got a backtrace of the threads in the parent process. There may be a deadlock between thread 59 and thread 1. Thread 59 is in libxcb and thread 1 is in libX11. I am not an X11 expert, but I know that multithreaded access to libX11 can be tricky, and I am suspicious of mixed use of the two libraries.
Damien, if possible could you get it to hang again and run
`sudo gdb --batch -ex "thread apply all bt" -p YOUR_FIREFOX_PARENT_PID | tee hang_backtrace.log`
where YOUR_FIREFOX_PARENT_PID is the PID of the Firefox parent process? Then add that file as an attachment here.  Thanks for taking the time to report and help diagnose this.
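
Once such a backtrace file exists, a small script can point out which threads have libxcb or libX11 frames on their stack. This is only a sketch: it assumes the usual gdb layout of a `Thread N (...)` header followed by `#`-numbered frames, `threads_touching` is a hypothetical helper name, and `SAMPLE` is an invented excerpt for illustration, not the contents of the attached hang_backtrace.log.

```python
import re

THREAD_HEADER = re.compile(r"^Thread (\d+) \(")


def threads_touching(backtrace_text, needles=("libxcb", "libX11")):
    """Map gdb thread number -> set of needle strings found in that thread's frames."""
    hits = {}
    current = None
    for line in backtrace_text.splitlines():
        header = THREAD_HEADER.match(line)
        if header:
            current = int(header.group(1))
            continue
        # Only lines that look like stack frames ("#0 ...", "#1 ...") are inspected.
        if current is None or not line.lstrip().startswith("#"):
            continue
        for needle in needles:
            if needle in line:
                hits.setdefault(current, set()).add(needle)
    return hits


# Invented excerpt, for illustration only.
SAMPLE = """\
Thread 59 (Thread 0x7f00 (LWP 2)):
#0  0x0000 in poll () from /lib/x86_64-linux-gnu/libc.so.6
#1  0x0000 in ?? () from /usr/lib/x86_64-linux-gnu/libxcb.so.1

Thread 1 (Thread 0x7f01 (LWP 1)):
#0  0x0000 in __lll_lock_wait () from /lib/x86_64-linux-gnu/libpthread.so.0
#1  0x0000 in XSetErrorHandler () from /usr/lib/x86_64-linux-gnu/libX11.so.6
"""
```

Calling `threads_touching(open("hang_backtrace.log").read())` would give a quick list of candidate threads to read in full.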
Flags: needinfo?(dfetis)
Attached file hang_backtrace.log
Nico,
I ran your gdb command to capture the Firefox thread backtraces during the freeze and attached the log to this bug.
Flags: needinfo?(dfetis)
[Tracking Requested - why for this release]: Intermittently hangs the linux user's desktop. Workaround is to open a virtual console and close firefox master process. Not a regression AFAIK.
Assignee: nobody → na-g
Rank: 29 → 13
Priority: P3 → P2
Summary: Linux xserver crash with webrtc screen capture → Intermittent Linux xserver crash with webrtc screen capture
Version: Trunk → 50 Branch
Summary: Intermittent Linux xserver crash with webrtc screen capture → Intermittent Linux xserver crash with webrtc screen capture hangs user's desktop
I don't think we need to track this if it goes back all the way to Fx50, but we'd certainly consider backporting a low-risk fix should one be available.
Attachment #8973217 - Attachment mime type: text/x-log → text/plain
main thread:

  gdk_x11_device_core_window_at_position() grabs the Xserver, which
  "disables processing of requests and close downs on all other connections".

video capture thread:

  From XOpenDisplay(), _XConnectXCB() holds _Xglobal_lock during the call to
  xcb_connect_to_display_with_auth_info().

  _XConnectXCB() is poll()ing via read_setup() from xcb_connect_to_fd() for a
  response on the new connection to the X server.  The server will not
  respond until the main thread releases the grab.

main thread:

  gdk_x11_device_core_window_at_position() triggers XSetErrorHandler(), which
  waits for the video thread to release _Xglobal_lock.
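
The cycle described above can be modelled in a few lines. This is a toy sketch, not Firefox or Xlib code: two plain locks stand in for the X server grab and for `_Xglobal_lock`, the capture thread's wait for the server's reply is simplified to waiting on the grab, and timeouts are used so the demonstration reports the hang instead of actually hanging. `simulate_hang` is a hypothetical name.

```python
import threading


def simulate_hang(timeout=0.2):
    """Return which of the two modelled threads failed to make progress."""
    server_grab = threading.Lock()   # stands in for the X server grab
    xglobal_lock = threading.Lock()  # stands in for libX11's _Xglobal_lock
    both_holding = threading.Barrier(2)   # both threads hold their first resource
    both_finished = threading.Barrier(2)  # nobody releases until both have tried
    blocked = {}

    def main_thread():
        with server_grab:  # GDK grabs the server
            both_holding.wait()
            # XSetErrorHandler() needs _Xglobal_lock
            got = xglobal_lock.acquire(timeout=timeout)
            blocked["main"] = not got
            if got:
                xglobal_lock.release()
            both_finished.wait()

    def capture_thread():
        with xglobal_lock:  # XOpenDisplay() holds _Xglobal_lock
            both_holding.wait()
            # the new connection gets no reply while the server is grabbed
            got = server_grab.acquire(timeout=timeout)
            blocked["capture"] = not got
            if got:
                server_grab.release()
            both_finished.wait()

    threads = [threading.Thread(target=main_thread),
               threading.Thread(target=capture_thread)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return blocked
```

In this model both threads report that they are blocked. In the real hang there is no timeout, and because the server stays grabbed, every other X client on the desktop stalls as well.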
The best fix would be to avoid using gdk_display_get_window_at_pointer() in nsWindow.cpp.
Depends on: 510411
Summary: Intermittent Linux xserver crash with webrtc screen capture hangs user's desktop → Intermittent Linux xserver hang with webrtc screen capture hangs user's desktop
Attachment #8985877 - Flags: review?(dminor)
Comment on attachment 8985877 [details]
Bug 1456101 - ensure X11 DesktopCapture module is created on main thread

https://reviewboard.mozilla.org/r/251384/#review257926

LGTM. With the sync dispatch this seems safe.
Attachment #8985877 - Flags: review?(dminor) → review+
While fixing bug 510411 would probably fix most occurrences of this race, the only way to ensure that a race doesn't occur is to dispatch this to the main thread.
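
The idea of a synchronous dispatch to the main thread can be sketched with a toy event loop. This is not the landed patch: `MainLoop`, `dispatch_sync`, `run_until` and `demo` are hypothetical names, and a callable returning the current thread ID stands in for creating the capture module. The point is that the work runs on the main thread, so it cannot overlap with anything else the main thread is doing, while the calling thread still waits for the result.

```python
import queue
import threading
from concurrent.futures import Future


class MainLoop:
    """Toy main-thread event loop; construct it on the thread that will run it."""

    def __init__(self):
        self._tasks = queue.Queue()
        self.thread_id = threading.get_ident()

    def dispatch_sync(self, fn):
        """Run fn on the loop thread and block the caller until it returns."""
        if threading.get_ident() == self.thread_id:
            return fn()  # already on the loop thread, run inline
        future = Future()
        self._tasks.put((fn, future))
        return future.result()

    def run_until(self, stop_event):
        """Process queued tasks until stop_event is set and the queue is empty."""
        while not stop_event.is_set() or not self._tasks.empty():
            try:
                fn, future = self._tasks.get(timeout=0.05)
            except queue.Empty:
                continue
            try:
                future.set_result(fn())
            except Exception as exc:  # hand failures back to the caller
                future.set_exception(exc)


def demo():
    loop = MainLoop()
    done = threading.Event()
    seen = {}

    def capture_thread():
        # Stand-in for "create the capture module": report which thread ran it.
        seen["created_on"] = loop.dispatch_sync(threading.get_ident)
        seen["caller"] = threading.get_ident()
        done.set()

    worker = threading.Thread(target=capture_thread)
    worker.start()
    loop.run_until(done)
    worker.join()
    return (seen["created_on"] == loop.thread_id
            and seen["caller"] != loop.thread_id)
```

`demo()` returns `True` when the stand-in creation step ran on the loop thread even though a different thread requested it.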
No longer depends on: 510411
See Also: → 510411
This is fairly low risk and limited to Linux where the bug takes down the user's entire desktop. That said, we are in the last half of the soft code freeze. Liz, do you think this is appropriate to land?
Flags: needinfo?(lhenry)
That seems reasonable, please do land it and the fix should end up in the 62.0b2 build by the end of the week.
Flags: needinfo?(lhenry)
Pushed by na-g@nostrum.com:
https://hg.mozilla.org/integration/autoland/rev/fb01c7ab313c
ensure X11 DesktopCapture module is created on main thread r=dminor
https://hg.mozilla.org/mozilla-central/rev/fb01c7ab313c
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Do you want to request uplift to 60.2esr?
Flags: needinfo?(na-g)
Comment on attachment 8985877 [details]
Bug 1456101 - ensure X11 DesktopCapture module is created on main thread

[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: this has a major impact on Google Hangouts and other sites which use screen sharing
User impact if declined: sites that use gUM screen sharing may cause the users entire desktop to freeze
Fix Landed on Version: 62
Risk to taking this patch (and alternatives if risky): low, furthermore even if this introduces a crash that would be much preferred to the current behavior of taking down the user's desktop session
String or UUID changes made by this patch: none
Flags: needinfo?(na-g)
Attachment #8985877 - Flags: approval-mozilla-esr60?
Comment on attachment 8985877 [details]
Bug 1456101 - ensure X11 DesktopCapture module is created on main thread

Fixes screen freezes for users using screen sharing. Approved for ESR 60.2.
Attachment #8985877 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
I have managed to reproduce this bug on an affected version 62.0b1(buildID=20180619022742).
I've verified this bug on build 62.0 (buildID=20180827144429) and 60.2.0esr (buildID=20180828172101), using the STR from comment 0. 

This was tested on Ubuntu 16.04 x64.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
See Also: → 1558475