Closed Bug 583053 Opened 14 years ago Closed 14 years ago

hang/corruption while running Rdio

Categories

(Core Graveyard :: Plug-ins, defect)

x86
Windows 7
defect
Not set
normal

Tracking

(blocking2.0 betaN+)

VERIFIED FIXED
mozilla2.0b5
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: shaver, Assigned: benjamin)

References

Details

(Keywords: relnote)

Attachments

(2 files)

Attached file windbg session
I have Rdio as an app tab, and sometimes when I switch back to it, the browser gets into a funny hang.  I often (but not always) have the horizontal-resize cursor everywhere in the window.  The window *controls* are still responsive, and if I minimize and then restore I get the UI drawn but the content area blank.  Pressing the close button treats me to a crash like 

bp-5b3b217b-e514-492d-a454-978e42100729

I attached WinDbg to the firefox process, and also to the plugin-container process, and poked around a bit, but it didn't look like the symbols were correct.  I'll attach the session; I am not good at WinDbg.

D2D + glass enabled, Mozilla/5.0 (Windows; Windows NT 6.1; WOW64; rv:2.0b3pre) Gecko/20100728 Minefield/4.0b3pre, Win 7 64, flash 10.1.53.64
Ah yes, killing the plugin-container process doesn't give me anything back.  Tried waiting a few mins after, even tried multi.

Also: rdio continues to play music in the background without difficulty.
Jim, could you investigate this? If not let me know and I'll see if I can find someone else. Shaver, I take it that this is a regression?
Assignee: nobody → jmathies
blocking2.0: ? → betaN+
Sure, can do as soon as I can get in! Looks like they might be in some sort of limited trial beta?

"Thanks for your interest in Rdio. We've got your email address and will let you know when you can sign up."


Maybe we could contact them for a test account or something? I'm not familiar with the service.
I sent you an invite.  I think it's a few days of free trial, I'll get them to extend it if needed.  I think it happens if I switch back to the tab after a song change, but I'm not 100% sure.

Most recently I restarted my browser, which resumed my rdio stream from mid-song (woo) in the background, and then read bugmail.  Switching to it (I think for the first time this session) to send you the invite hung me.
Do you need it extended?
(In reply to comment #5)
> Do you need it extended?

Just got it, thx.
Hmm, unfortunately I'm not having much luck reproducing. I'll keep it running for the trial period. I guess I have 10 invites now as well if anyone else would care to take this for a spin.
Mike, do you have the desktop app installed?
(In reply to comment #8)
> Mike, do you have the desktop app installed?

Hrm, doesn't seem to change anything.
I've seen this as well, and it's unrelated to app tabs or switching -- I often get it after a few minutes of rearranging/adding/deleting stuff in their collection view.  The content area stops repainting, and the cursor doesn't update based on whatever it was when it crossed the window threshold.

However, if I resize the window, chrome still reflows correctly and gets repainted, just not the content area of the tab.  Attaching with windbg, there's nothing obviously wrong; events are being received and processed, just nothing is happening when I try to click or similar.

I've seen this happen before in debug builds, usually when we end up calling through some object that's been deleted and jump through a busted vtable.  We end up triggering an assert in the event manager that doesn't allow us to dispatch events while we're executing javascript (or something similar -- it spews lots of asserts in a debug build), and the events don't actually get handled by gecko.  This fits the behaviour I'm seeing (win32 event loop is spinning, but it's like gecko doesn't get the input events).

I also don't think it has anything to do with plugins, as killing the plugin-container doesn't change anything.  I'm running with GDI with dwrite enabled; shaver, are you running with GDI or d2d?
Summary: hang when switching back to Rdio tab → hang/corruption while running Rdio
I can't *stop* reproducing this.  I have D2D on, no dwrite.

Should I take a full memory dump the next time it happens?
Yeah, this is what I thought it was -- we get into this state on windows:

###!!! ASSERTION: Want to fire mutation events, but it's not safe: '(aNode->IsNodeOfType(nsINode::eCONTENT) && static_cast<nsIContent*>(aNode)-> IsInNativeAnonymousSubtree()) || sScriptBlockerCount == sRemovableScriptBlockerCount', file c:/proj/firefox/content/base/src/../../../../mozilla-central/content/base/src/nsContentUtils.cpp, line 3561
###!!! ASSERTION: Want to fire mutation events, but it's not safe: '(aNode->IsNodeOfType(nsINode::eCONTENT) && static_cast<nsIContent*>(aNode)-> IsInNativeAnonymousSubtree()) || sScriptBlockerCount == sRemovableScriptBlockerCount', file c:/proj/firefox/content/base/src/../../../../mozilla-central/content/base/src/nsContentUtils.cpp, line 3561
###!!! ASSERTION: constructing frames in the middle of a paint: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_Paint] == 0', file c:\proj\mozilla-central\layout\base\nsPresContext.h, line 1324
###!!! ASSERTION: constructing frames in the middle of a paint: 'mPresContext->mLayoutPhaseCount[eLayoutPhase_Paint] == 0', file c:\proj\mozilla-central\layout\base\nsPresContext.h, line 1324

etc.

This is my debug build failure case when we called into some virtual method of a deleted object or similar, and the net result is that we never allow events to be dispatched, thus we never paint.  It's really annoying.  Someone needs to run with valgrind on linux, and the error should be obvious.  It should also crash on linux, and maybe on OSX.  If no one beats me to it I'll give it a go shortly.
So I tested this out with thunder's mac, with the latest nightly, and could not reproduce it.  So it might be windows only, which means this is going to be a great opportunity for me to learn how to use our replay debugging setup at the office.  I'm going to move this to Core:General until I figure out where the culprit is; along the way I hope to find out how to make the weird exception handling behaviour not happen!
Assignee: jmathies → vladimir
Component: Plug-ins → General
QA Contact: plugins → general
Assignee: vladimir → nobody
Component: General → Plug-ins
QA Contact: general → plugins
Assignee: nobody → benjamin
:bs: did we get bent to look at this? really want it in the beta if possible.
Attachment #465773 - Flags: review?(bent.mozilla)
Comment on attachment 465773 [details] [diff] [review]
Make the double-pass rendering event win RPC races, rev. 1

>+        if (DoublePassRenderingEvent() == npevent->event) {
>+            CallPaint(npremoteevent, &handled);
>+            return handled;
>+        }
>+
>+        switch (npevent->event) {
>             case WM_PAINT:
>             {
>                 RECT rect;
>                 SharedSurfaceBeforePaint(rect, npremoteevent);
>                 CallPaint(npremoteevent, &handled);
>                 SharedSurfaceAfterPaint(npevent);
>                 return handled;
>             }

So... You could still do | case DoublePassRenderingEvent(): | in the switch statement right? And I guess I don't know enough to say for sure but shouldn't you be calling the SharedSurface[Before|After]Paint like in the WM_PAINT case? Maybe just make DoublePassRenderingEvent() and WM_PAINT share the same code?
Preeeeeetty sure that case labels in C++ need to be compile-time constants of integer type.  Too much JS or other friendly languages for you, sir!
Yeah, can't do the switch thing. I don't really know anything about SharedSurfaceBeforePaint/SharedSurfaceAfterPaint, I was just replicating the existing codepath.
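For anyone following along, here is a minimal sketch of why the patch tests the double-pass event with an `if` before the `switch`. It assumes the event ID is only known at run time, which is what makes it unusable as a case label. Everything below (`kWmPaint`, the stand-in `DoublePassRenderingEvent()`, `Dispatch`, and the ID value) is hypothetical and only mirrors the shape of the patch, not the real Gecko code.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Hypothetical stand-in for WM_PAINT: a compile-time constant, so it is
// allowed as a case label.
constexpr std::uint32_t kWmPaint = 0x000F;

// Hypothetical stand-in for DoublePassRenderingEvent(). It is not constexpr:
// the value is assumed to be obtained at run time, so the call cannot appear
// in a case label. The number itself is made up for illustration.
std::uint32_t DoublePassRenderingEvent() {
    static const std::uint32_t id = 0xC001;
    return id;
}

std::string Dispatch(std::uint32_t event) {
    // The run-time ID must not collide with the constant handled below.
    assert(DoublePassRenderingEvent() != kWmPaint);

    // "case DoublePassRenderingEvent():" would be rejected by the compiler,
    // because a case label must be a constant expression. Compare first.
    if (event == DoublePassRenderingEvent()) {
        return "double-pass";
    }

    switch (event) {
        case kWmPaint:
            return "paint";
        default:
            return "unhandled";
    }
}
```

Checking the run-time value ahead of the `switch` keeps the existing constant cases untouched, which matches the structure of the quoted diff.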
Comment on attachment 465773 [details] [diff] [review]
Make the double-pass rendering event win RPC races, rev. 1

Ok, let's do this then. Maybe keep r? for jimm when he gets back about the SharedMem stuff.
Attachment #465773 - Flags: review?(bent.mozilla) → review+
http://hg.mozilla.org/mozilla-central/rev/f52342744bda
http://hg.mozilla.org/mozilla-central/rev/d233936ab314

I'm concerned that we're doing double-pass plugin rendering here, especially with dwrite disabled: that could become a plugin bottleneck pretty quickly. I know we were doing it for a while when d2d was on because we don't have readback (or readback is very slow).

But async plugin rendering (bug 556487) should fix this properly, if I understand that bug correctly.

Marking this FIXED for blocker-tracking purposes, leaving the review request for jimm.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 587715
Comment on attachment 465773 [details] [diff] [review]
Make the double-pass rendering event win RPC races, rev. 1

This is fine, the double pass event acts as a signal indicating what's coming up next. So there's no painting involved on it.
Attachment #465773 - Flags: review?(jmathies) → review+
(In reply to comment #25)
> I'm concerned that we're doing double-pass plugin rendering here, especially
> with dwrite disabled: that could become a plugin bottleneck pretty quickly. I
> know we were doing it for a while when d2d was on because we don't have
> readback (or readback is very slow).

If a windowless plugin is semi-transparent, double pass rendering occurs. This is (according to roc) a very rare case so we just take the hit.
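This thread does not spell out what the two passes compute. As a rough illustration only, one common way to recover transparency from a plugin that cannot report alpha is to have it paint once over black and once over white and compare the results. The sketch below assumes that approach; `RecoverAlpha` is a hypothetical helper, not a Gecko function.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper, not a Gecko API.
// onBlack / onWhite: the same colour channel of the same pixel after the
// plugin painted over a black and over a white background.
// With premultiplied colour p and alpha a (both 0..255):
//   onBlack = p
//   onWhite = p + (255 - a)
// so a = 255 - (onWhite - onBlack).
std::uint8_t RecoverAlpha(std::uint8_t onBlack, std::uint8_t onWhite) {
    // Painting over white can never produce a darker value than over black.
    assert(onWhite >= onBlack);
    return static_cast<std::uint8_t>(255 - (onWhite - onBlack));
}
```

An opaque pixel looks the same in both passes and yields 255; a fully transparent pixel shows the bare backgrounds (0 and 255) and yields 0. Painting twice is the cost the comment above refers to.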
Been using rdio for about an hour now with no issues; used to be that I'd die within seconds, or at most a minute.  Yay!
Status: RESOLVED → VERIFIED
Target Milestone: --- → mozilla2.0b5
Product: Core → Core Graveyard