Closed Bug 1195303 Opened 4 years ago Closed 4 years ago

[e10s] Ubuntu completely freezes when sharing a window

Categories

(Core :: WebRTC: Audio/Video, defect, P1, major)

All
Linux
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
e10s + ---
Blocking Flags:

People

(Reporter: bogdan_maris, Assigned: pkerr)

References

Details

Affected builds:
- Latest Aurora 42.0a2 e10s enabled (could not reproduce with e10s disabled)

Affected OSes:
- Ubuntu 14.04 32-bit

STR:
1. Start a conversation.
2. Share a link to the conversation to another PC.
3. Share a window.

Expected behavior: The call and sharing window work as intended.

Actual results: Ubuntu freezes completely. The only way to recover is to switch to a console with Ctrl+Alt+F1 and restart, or to force a restart.

Notes:
- Did not reproduce without e10s enabled.
- I need to investigate this further.
Needinfo on Mark in case I did not CC the right people and he knows someone who could take a look at this.
Flags: needinfo?(standard8)
This almost has to be a Platform issue, not a Hello front-end issue.

Bogdan -- Can you try to repro on Nightly?  And also on Fedora?  (I'm wondering if this affects all Linux distros or just Ubuntu.)

There is a bug that is *very* close to landing (Bug 1104616) that will rewrite most of the platform logic around video sharing; the new bug adds video sandboxing support.  I fully expect 1104616 to land in Fx43 -- very likely this week.  So I think it's worth getting a regression range (if there is one -- or if it's always been this way) and then retesting this bug as soon as 1104616 lands.  If 1104616 "fixes" the problem, we may choose to do nothing here since e10s in Fx42 will not go to Release.

I'm also needinfo'ing Florin since I believe Bogdan is on PTO until 7th of Sept.  Florin -- If you can find someone on your team to investigate this further (as I've described immediately above) before the 7th, I'd really appreciate it.  Thanks!
backlog: --- → webrtc/webaudio+
Component: Client → WebRTC: Audio/Video
Flags: needinfo?(standard8)
Flags: needinfo?(florin.mezei)
Flags: needinfo?(bogdan.maris)
Product: Hello (Loop) → Core
I'm making this a low P1 until we more fully understand the scope/extent of this bug.
Rank: 19
Priority: -- → P1
See Also: → 1104616
I did reproduce using latest Aurora/latest Nightly on Machine (1):
1.1. OSes:
- Fedora 17 x64
- Ubuntu 12.04 x86
1.2. Graphic card:
- AMD Radeon HD 3000
1.3. Processor:
- AMD Athlon II X3 450 3.2 GHz

Machine (2):
2.1. OSes:
- Ubuntu 14.04 x86
- Ubuntu 14.04 x64
2.2. Graphics card:
- AMD Radeon HD 6450
2.3. Processor:
- AMD FX(tm)-8320 Eight-Core Processor 3.50 GHz

Machine (3):
3.1. OSes:
- Ubuntu 14.04 x86
3.2. Graphics card:
- AMD Radeon HD 3000
3.3. Processor:
- AMD FX(tm)-8320 Eight-Core Processor 3.50 GHz

I was UNABLE to reproduce using latest Aurora/latest Nightly on Machine (3) with Fedora 22. I have not tested with an NVIDIA graphics card yet; I will try to get one this week and see how it compares.
Flags: needinfo?(florin.mezei)
Flags: needinfo?(bogdan.maris)
(In reply to Bogdan Maris, QA [:bogdan_maris] from comment #4)
> Did not test with a Nvidia graphics card yet, will try and
> get one this week and see how it compares.

I had a chance to test using this setup:
Machine (4):
4.1. OSes:
- Ubuntu 14.04 x86
4.2. Graphics card:
- NVIDIA GeForce 210
4.3. Processor:
- AMD FX(tm)-8320 Eight-Core Processor 3.50 GHz

I did reproduce the freeze, although it was a bit harder to trigger.
Please let me know if I can help further.
Flags: needinfo?(mreavy)
Thanks, Bogdan.  Can you retest with the very latest Nightly (which would be Nightly from yesterday, August 26th, since I don't believe today's Nightly has finished building)?    

The fix for Bug 1194397 just landed; it made it into Aug 26th's Nightly but not earlier.  Given the symptoms, there's a chance that this bug is a dup of (or at least resolved by) Bug 1194397.

Please retest with the machine that is easiest to repro this problem on and report back.  Thanks for your help with this!
Flags: needinfo?(mreavy) → needinfo?(bogdan.maris)
I can still reproduce on Machine (2) (see comment 4) using Ubuntu 14.04 32-bit with builds:
- latest Nightly from 2015-08-27 (buildID: 20150827030213)
https://hg.mozilla.org/mozilla-central/rev/f8086bd3c84fc1a42c3625cf3cc2253f0a5e8cfd
- latest Nightly tinderbox (buildID: 20150827183149)
https://hg.mozilla.org/mozilla-central/rev/87e23922be375985d0b1906ed5ba5f095f323a38
Flags: needinfo?(bogdan.maris) → needinfo?(mreavy)
I just tried it with 43.0a1 (2015-08-31) on my Ubuntu 14.04.3 LTS 64-bit Linux machine. No problem.

Bogdan, did you always test with the same far-end client?

I can try with Aurora tomorrow.
Reproducible for me with 42. STR as in Bogdan's initial report.
Thanks, Nils.  Can you get a rough range for this on your machine?  For example, can you repro this on a Nightly build from 2 weeks ago?

Also needinfo'ing pkerr since he has an Ubuntu machine where he can also try to repro. I'm very interested in how easy this is to hit when running e10s.
Flags: needinfo?(pkerr)
Flags: needinfo?(mreavy)
Flags: needinfo?(drno)
For me on Nightly the fix must have landed between Nightly 2015-08-30 and 2015-08-31.

That makes Bug 1194397 rather unlikely as that should have hit Nightly before 08-30.

I'll try to bisect on inbound.
Flags: needinfo?(drno)
The fix for me is in bug 1199573.
Depends on: 1199573
Did a build from mozilla-central today. Running on Ubuntu 14.04.2. Running Hello from Ubuntu machine to stand-alone client running on OS X Nightly. Ubuntu machine locked up completely when trying to share a window from the Ubuntu machine (the main desktop).
Flags: needinfo?(pkerr)
(In reply to Paul Kerr [:pkerr] from comment #13)
> Did a build from mozilla-central today. Running on Ubuntu 14.04.2. Running
> Hello from Ubuntu machine to stand-alone client running on OS X Nightly.
> Ubuntu machine locked up completely when trying to share a window from the
> Ubuntu machine (the main desktop).

Strange. I just pulled central, did a clobber build, and it works fine for me. The other side of the call is 42 on Mac as well.
Since my problem seems to be audio-related: my Ubuntu is using PulseAudio. Is your Ubuntu using something else, Paul?
Flags: needinfo?(pkerr)
My system is using PulseAudio. I am running a non-debug optimized build of central.
Flags: needinfo?(pkerr)
On subsequent tests, I was able to regain control over mouse movement but the window manager (Unity) did not respond to any mouse actions. Dropping into the console and killing Nightly was necessary to get back to normal. My experience, after the first test, matches that of Nils.
Paul, is this fixed by bug 1199573?
Flags: needinfo?(pkerr)
This is still broken for me with 1199573 in the build. From the log:

changeset:   259960:9af640b297a1
user:        Jean-Yves Avenard <jyavenard@mozilla.com>
date:        Fri Aug 28 23:56:16 2015 +1000
summary:     Bug 1199573: [MSE] Properly handle partial media header received prior a discontinuity. r=gerald
Flags: needinfo?(pkerr)
Flags: needinfo?(twalker)
I am unable to reproduce this on Ubuntu 14.04.2 (VM).  What service are you all using to "Share" the conversation?  I used yahoo mail.
Flags: needinfo?(twalker) → needinfo?(bogdan.maris)
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #19)
> I am unable to reproduce this on Ubuntu 14.04.2 (VM).  What service are you
> all using to "Share" the conversation?  I used yahoo mail.

I simply shared a terminal window. But since bug 1199573 has landed in 43 and 42 this is fixed for me.
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #19)
> I am unable to reproduce this on Ubuntu 14.04.2 (VM).  What service are you
> all using to "Share" the conversation?  I used yahoo mail.

I shared a terminal, Firefox, VLC, Skype and other windows, and I sent the link through Facebook chat, Skype and Evernote.

(In reply to Nils Ohlmeier [:drno] from comment #20)
> (In reply to [:tracy] Tracy Walker - QA Mentor from comment #19)
> > I am unable to reproduce this on Ubuntu 14.04.2 (VM).  What service are you
> > all using to "Share" the conversation?  I used yahoo mail.
> 
> I simply shared a terminal window. But since bug 1199573 has landed in 43
> and 42 this is fixed for me.

I can still reproduce this using latest Nightly 43.0a1 (20150916030203) on Ubuntu 14.04 32-bit.
Flags: needinfo?(bogdan.maris)
Flags: needinfo?(twalker)
Bogdan,

Do you think there is a correlation between particular OS and graphics card?  Looking to narrow this down.  Can you get a stack for the hang? see instructions in another bug comment https://bugzilla.mozilla.org/show_bug.cgi?id=1199602#c9

Thanks.
Flags: needinfo?(bogdan.maris)
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #22)
> Bogdan,
> 
> Do you think there is a correlation between particular OS and graphics card?
> Looking to narrow this down.  Can you get a stack for the hang? see
> instructions in another bug comment
> https://bugzilla.mozilla.org/show_bug.cgi?id=1199602#c9
> 
> Thanks.

I have only tested on machines with AMD processors or graphics cards, on different OSes (see comment 4 and comment 5). It would be interesting to test on Intel machines, maybe with NVIDIA graphics, but we don't have that kind of setup at the office.

I can't use the instructions from bug 1199602 comment 9 because my OS completely hangs and I can't recover without a restart. I don't know if there is a way to get a stack under these conditions.
Flags: needinfo?(bogdan.maris)
Hmmm, so the system never completely recovers on its own?  I'm not sure how we can get this bug into an actionable state for the developers, unless it becomes reproducible for Nils or Paul again and they can share their machine.
Flags: needinfo?(twalker)
(In reply to [:tracy] Tracy Walker - QA Mentor from comment #24)
> Hmmm, so the system never completely recovers on its own?  I'm not sure how
> we can get this bug into an actionable state for the developers, unless it
> becomes reproducible for Nils or Paul again and they can share their machine.

I never left it for more than 30 minutes to see if it recovers, but during that time it didn't.
Flags: needinfo?(blassey.bugs)
Jim, what information do you need?
Flags: needinfo?(blassey.bugs)
Maire, what is the status here? Is this still an issue and if so how are we going to address it?
Flags: needinfo?(mreavy)
I was actually just looking at this since I'm reviewing all the P1s.  There's a high likelihood that this is tied to specific hardware and/or drivers.  Nils' issue was resolved; so what remains is relatively hard to repro.  Further, it's most likely to be in code outside of WebRTC and outside of Firefox.  

Pkerr can repro it, but I need him to finish importing branch 43 (that's more important than this).  My plan is for pkerr to capture it in the debugger if possible (or add enough logging to find out what it's doing) and isolate the cause -- then we can determine if it is a bug in our code or if there is a possible workaround.  I may wind up transferring this to gfx if we think we need to blacklist a driver.

In the meantime, I'm lowering this to a P2 with the intention of getting it resolved (either fixed or worked around) in the next few weeks.
Rank: 19 → 25
Flags: needinfo?(mreavy)
Priority: P1 → P2
To be safe I just tested again with Nightly and now I have some really strange behavior:
Sharing terminal window 1 within a Loop call works fine. But if I share another terminal window, it freezes X11. Again, I can switch to the command line via Ctrl+Alt+F1, and 'killall firefox' restores X11 to normal behavior.
Assigning to pkerr to investigate.  Given the e10s release plans, bumping to a P1 until we know if there's anything we can do to fix or work around this, or if we need to relnote it.
Assignee: nobody → pkerr
Rank: 25 → 15
Flags: needinfo?(pkerr)
Priority: P2 → P1
I've seen similar behavior on a Debian Jessie / nvidia-driver 352.55-2 system while debugging. If it happens again I'll dig more.
Flags: needinfo?(pkerr)
So I tried again today, with a local build of today's Nightly on my Ubuntu 14.04.3 with talky.io (because Hello in Nightly no longer allows window sharing), and with the official Fx 42 release build and Firefox Hello. No luck: it never freezes for me any more.
The new way that the Hello UX works (tab sharing for the start of the session, only able to run in non-e10s mode) no longer causes me any lock-up on Ubuntu. As with Nils' experience, it also works for talky.io.

My configuration:
Ubuntu 14.04.3 LTS
NVIDIA Quadro K600 1GB (GK107GL)
NVIDIA Driver version 304.131-0ubuntu
xserver-xorg-video-mach64 6.9.1-1build1
We don't have a way to attack this anymore -- and I also believe it's very hard to hit.  I'm resolving this as incomplete, but we'll keep a look out to see if there are any more reports like this.
Status: NEW → RESOLVED
Closed: 4 years ago
Rank: 15 → 12
Resolution: --- → INCOMPLETE