Closed Bug 26926 Opened 25 years ago Closed 24 years ago

sort out ownership model (WAS: shutdown crash in PresShell::ProcessReflowCommands)

Categories

(Core :: Layout, defect, P3)

x86
Windows NT
defect

Tracking

()

VERIFIED DUPLICATE of bug 58830
mozilla0.9

People

(Reporter: warrensomebody, Assigned: nisheeth_mozilla)

References

Details

(Keywords: crash, Whiteboard: [nsbeta3-])

Attachments

(2 files)

I just did a Ctrl-Q to quit the browser, and crashed with the 
following stack trace. I had brought up the browser window, then opened 
mail/news, read 2 messages (imap), and did Ctrl-Q:

021b0a19()
PresShell::ProcessReflowCommands(PresShell * const 0x021b4740, int 0x00000001) 
line 1921
ReflowEvent::HandleEvent(ReflowEvent * const 0x01bada44) line 1851
PL_HandleEvent(PLEvent * 0x03478f00) line 527
PL_ProcessPendingEvents(PLEventQueue * 0x00529140) line 487 + 6 bytes
_md_EventReceiverProc(HWND__ * 0x0feb0c54, unsigned int 0x0000c0b8, unsigned 
int 0x00000000, long 0x00529140) line 975 + 10 bytes
USER32! 77e71820()
00529140()

It seems to me like maybe a dll is being unloaded prematurely, since it dies at 
021b0a19() which doesn't have a name. However, I'm running an optimized build 
with symbols (undef MOZ_DEBUG, MOZ_PROFILE=1) -- maybe something got confused 
here. Another possibility is that it's another VC6 code generation bug, but I 
can't tell. Here's the calling frame:

1919:     mFrameManager->GetRootFrame(&rootFrame);
017F98C2   mov         eax,dword ptr [esi+7Ch]
017F98C5   lea         edx,[rootFrame]
017F98C8   push        edx
017F98C9   push        eax
017F98CA   mov         ecx,dword ptr [eax]
017F98CC   call        dword ptr [ecx+10h]
1920:     nsresult rv=CreateRenderingContext(rootFrame, &rcx);
017F98CF   mov         eax,dword ptr [esi]
017F98D1   lea         ecx,[this]
017F98D4   push        ecx
017F98D5   push        dword ptr [rootFrame]
017F98D8   push        esi
017F98D9   call        dword ptr [eax+88h]
1921:     if (NS_FAILED(rv)) return rv;
pc==> 017F98DF   test        eax,80000000h
017F98E4   jne         PresShell::ProcessReflowCommands+115h (017f99b7)
017F98EA   push        edi

and here's where it dies:

pc=> 021B0A19   or          ebx,dword ptr [ebx]
021B0A1B   add         ah,byte ptr [eax+0Ah]
021B0A1E   sbb         eax,dword ptr [edx]
021B0A20   fadd        dword ptr [eax]
021B0A22   shl         byte ptr [esp+eax-24h],1
021B0A26   sbb         byte ptr [ecx+1],ah
021B0A29   pop         es

But maybe msdev is fooling me, because when I scroll up to see the assembly 
before this, the code immediately realigns or something, and I see this:

021B0A07   add         byte ptr [ecx],dh
021B0A09   add         byte ptr [eax],al
021B0A0B   add         byte ptr [eax],ah
021B0A0D   add         dword ptr [eax],eax
021B0A0F   add         byte ptr [esp-25ABFDE5h],bl
021B0A16   mov         edx,1B0B2001h
021B0A1B   add         ah,byte ptr [eax+0Ah]
021B0A1E   sbb         eax,dword ptr [edx]
021B0A20   fadd        dword ptr [eax]
021B0A22   shl         byte ptr [esp+eax-24h],1
021B0A26   sbb         byte ptr [ecx+1],ah
021B0A29   pop         es

(notice that the line in question becomes 021B0A16 instead of 021B0A19) (!)
Lost.
I just tried several time, and I can't reproduce this. Boo hoo. This is scary.
Nisheeth, the problem is that you're not revoking any outstanding PLEvents when 
the pres shell is destroyed. That means any ReflowEvent objects you create for 
async reflow of reflow commands remain in the NSPR queue even after the pres 
shell has been destroyed.

You need to "revoke" any outstanding PLEvents in the pres shell destructor (or 
whenever the pres shell is notified we're shutting down).

The easiest way to do this is to call RevokeEvents() on the event queue object 
passing in the pres shell as the "owner". That means you need to set the owner 
when calling PL_InitEvent().

See the frame manager destructor code for an example of what to do.

Marking this as something to fix for beta1
Assignee: troy → nisheeth
Keywords: beta1
*** Bug 26599 has been marked as a duplicate of this bug. ***
Thanks for figuring this out, Troy.  I have the fix ready to go.  Will check it 
in once the tree opens today.
Status: NEW → ASSIGNED
Target Milestone: M14
Whiteboard: [PDT+] Expexted Fix Date: by 2/11
Putting crash in the keyword field.
Keywords: crash
The fix for this is checked in...
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Adding verifyme keyword.
Keywords: verifyme
Under Linux build 2000021715 if I open the browser, open mail, and hit ALT-Q the
browser exits and sits there chewing up 85% of my cpu.

OnUnload from XUL
Clean up ...
WEBSHELL- = 3
WEBSHELL- = 2
WEBSHELL- = 1

(Infinite loop..)

If I do not open mail this problem does NOT occur.
With the Feb 23rd (2000022308) builds running under Windows NT, Win 98, Linux 
Redhat 6.0 and Mac 9.0, I can't reproduce the crash described. Warren , could you 
please check this too ?
I'm not sure how this bug got marked fixed. Someone put a comment in here that
just bringing up mail will show you the crash when you quit! Same stack trace.
This has been a problem for us for 3 or 4 weeks now. I didn't realize the bug to
fix it had been marked fixed or I would have  re-opened it sooner.


Re-opening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Severity: normal → critical
Whiteboard: [PDT+] Expexted Fix Date: by 2/11 → [PDT+]
*** Bug 29644 has been marked as a duplicate of this bug. ***
I can't reproduce this on the Feb 29 build running under Windows 98. Can this 
problem still be reproduced ?
petersen - yes with today's build.  See the duplicate bug that is noted above.  
We get this all the time in mail closing and the stack trace is the same as this 
bug report.  
Is this cross platform or Win only?
linux and mac as per the mail cases from bug 29644 which is marked dup of this 
one.
Using the 3/1 build on NT, I can run mail and shut down cleanly. Not seeing the
crash.
*** Bug 29694 has been marked as a duplicate of this bug. ***
I guess I'm not as lucky as Phil. I still see this all the time in 3/1 builds.
At least 20% of the time when I close a mail or compose window I'll crash with
this stack trace.
I can reproduce this. The stack I see has one more frame than the original one 
quoted (see below). The PresShell in question hasn't yet been destroyed, but it 
looks like some of the other supporting constructs have - including maybe the 
view manager. Still investigating, but don't let that stop someone else from 
looking at it too. :-)

PresShell::CreateRenderingContext(PresShell * const 0x0f62c9d0, nsIFrame * 
0x0eadf298, nsIRenderingContext * * 0x0012fcd4) line 2158 + 33 bytes
PresShell::ProcessReflowCommands(PresShell * const 0x0f62c9d0, int 1) line 2004 
+ 23 bytes
ReflowEvent::HandleEvent() line 1935
HandlePLEvent(ReflowEvent * 0x04f59490) line 1947
PL_HandleEvent(PLEvent * 0x04f59490) line 526 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x01026d80) line 487 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x00040922, unsigned int 49392, unsigned int 0, 
long 16936320) line 975 + 9 bytes
USER32! 77e71250()
Any chance this goes away when you put in all the rpotts/warren thread safety 
code?
Thanks,
Jim (wild guessing in the bleachers) Roskind
Looks like someone recently changed the ownership model for frames and views 
(Travis?). The content viewer now owns the ViewManager (and hence all Views) 
while the PresContext owns all frames. Since neither are refcounted, it's 
important that their lifetime be identical. The problem is that the PresContext 
can outlive the ViewManager (and does in this case), leaving dangling references 
- both the PresContext's weak reference to the ViewManager and Frame references 
to corresponding Views.

I've attached a hacky patch that allows the PresContext to drop its dangling 
reference early and does early returns in the specific places where we try to 
reference views after the ViewManager has gone away. This deals with the 
immediate problem, but the ownership model needs to be revisisted.
Attached patch patch for beta1Splinter Review
mscott, could you verify that the patch fixes the problem on your machine? I'll 
try to get it reviewed in the meantime.
Whiteboard: [PDT+] → [PDT+] Patch available, waiting for review
I'm running with vidur's patch now and will report back with any problems.
Vidur,
Please land the changes as soon as you feel safe.  For now, I'm putting an 
expected landing of on/before 3/7 so that I won't have to re-read this each time 
;-).
Thanks,
Jim
Whiteboard: [PDT+] Patch available, waiting for review → [PDT+] 3/7 Patch available, waiting for review
I haven't seen any ill side effects nor have I seen this crash yet since running
with Vidur's patch. So far so good.
Temporary patch checked in on 3/6/2000. I'm not closing the bug since the real 
fix is to sort out the ownership model. I'll leave the bug on Nisheeth's list, 
though he should probably forward to Travis.
Keywords: beta1
Whiteboard: [PDT+] 3/7 Patch available, waiting for review
Changing milestone to M16...
Status: REOPENED → ASSIGNED
Target Milestone: M14 → M16
Moving bugs out by one milestone...
Target Milestone: M16 → M17
Move back into M16...
Target Milestone: M17 → M16
Changing summary from "shutdown crash in PresShell::ProcessReflowCommands" to 
"sort out ownership model". Removing crash keyword as we're no longer seeing 
crash.
Keywords: crash
Summary: shutdown crash in PresShell::ProcessReflowCommands → sort out ownership model (WAS: shutdown crash in PresShell::ProcessReflowCommands)
Keywords: crash
Adding crash to keyword field.
M16 has been out for a while now, these bugs target milestones need to be 
updated.
We should try to fix the ownership model (see Vidur's earlier comment) early in 
the beta 3 cycle.  Marking nsbeta3...
Keywords: nsbeta3
With Build ID 2000072520 there's no problem.
No crash.
[nsbeta3-], ->future.  Will reconsider if problems reappear.
Whiteboard: [nsbeta3-]
Target Milestone: M16 → Future
note to QA: also verify http://bugzilla.mozilla.org/show_bug.cgi?id=29694 (dupe 
bug) when this is fixed.
Target Milestone: Future → mozilla0.9

*** This bug has been marked as a duplicate of 58830 ***
Status: ASSIGNED → RESOLVED
Closed: 25 years ago24 years ago
Resolution: --- → DUPLICATE
Verified dup.
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: