Closed
Bug 26926
Opened 25 years ago
Closed 24 years ago
sort out ownership model (WAS: shutdown crash in PresShell::ProcessReflowCommands)
Categories
(Core :: Layout, defect, P3)
Tracking
()
mozilla0.9
People
(Reporter: warrensomebody, Assigned: nisheeth_mozilla)
References
Details
(Keywords: crash, Whiteboard: [nsbeta3-])
Attachments
(2 files)
20.29 KB,
image/jpeg
|
Details | |
1.90 KB,
patch
|
Details | Diff | Splinter Review |
I just did a Ctrl-Q to quit the browser, and crashed with the following stack trace. I had brought up the browser window, then opened mail/news, read 2 messages (imap), and did Ctrl-Q: 021b0a19() PresShell::ProcessReflowCommands(PresShell * const 0x021b4740, int 0x00000001) line 1921 ReflowEvent::HandleEvent(ReflowEvent * const 0x01bada44) line 1851 PL_HandleEvent(PLEvent * 0x03478f00) line 527 PL_ProcessPendingEvents(PLEventQueue * 0x00529140) line 487 + 6 bytes _md_EventReceiverProc(HWND__ * 0x0feb0c54, unsigned int 0x0000c0b8, unsigned int 0x00000000, long 0x00529140) line 975 + 10 bytes USER32! 77e71820() 00529140() It seems to me like maybe a dll is being unloaded prematurely, since it dies at 021b0a19() which doesn't have a name. However, I'm running an optimized build with symbols (undef MOZ_DEBUG, MOZ_PROFILE=1) -- maybe something got confused here. Another possibility is that it's another VC6 code generation bug, but I can't tell. Here's the calling frame: 1919: mFrameManager->GetRootFrame(&rootFrame); 017F98C2 mov eax,dword ptr [esi+7Ch] 017F98C5 lea edx,[rootFrame] 017F98C8 push edx 017F98C9 push eax 017F98CA mov ecx,dword ptr [eax] 017F98CC call dword ptr [ecx+10h] 1920: nsresult rv=CreateRenderingContext(rootFrame, &rcx); 017F98CF mov eax,dword ptr [esi] 017F98D1 lea ecx,[this] 017F98D4 push ecx 017F98D5 push dword ptr [rootFrame] 017F98D8 push esi 017F98D9 call dword ptr [eax+88h] 1921: if (NS_FAILED(rv)) return rv; pc==> 017F98DF test eax,80000000h 017F98E4 jne PresShell::ProcessReflowCommands+115h (017f99b7) 017F98EA push edi and here's where it dies: pc=> 021B0A19 or ebx,dword ptr [ebx] 021B0A1B add ah,byte ptr [eax+0Ah] 021B0A1E sbb eax,dword ptr [edx] 021B0A20 fadd dword ptr [eax] 021B0A22 shl byte ptr [esp+eax-24h],1 021B0A26 sbb byte ptr [ecx+1],ah 021B0A29 pop es But maybe msdev is fooling me, because when I scroll up to see the assembly before this, the code immediately realigns or something, and I see this: 021B0A07 add byte ptr [ecx],dh 021B0A09 add byte ptr [eax],al 021B0A0B add byte ptr [eax],ah 021B0A0D add dword ptr [eax],eax 021B0A0F add byte ptr [esp-25ABFDE5h],bl 021B0A16 mov edx,1B0B2001h 021B0A1B add ah,byte ptr [eax+0Ah] 021B0A1E sbb eax,dword ptr [edx] 021B0A20 fadd dword ptr [eax] 021B0A22 shl byte ptr [esp+eax-24h],1 021B0A26 sbb byte ptr [ecx+1],ah 021B0A29 pop es (notice that the line in question becomes 021B0A16 instead of 021B0A19) (!) Lost.
Reporter | ||
Comment 1•25 years ago
|
||
I just tried several time, and I can't reproduce this. Boo hoo. This is scary.
Nisheeth, the problem is that you're not revoking any outstanding PLEvents when the pres shell is destroyed. That means any ReflowEvent objects you create for async reflow of reflow commands remain in the NSPR queue even after the pres shell has been destroyed. You need to "revoke" any outstanding PLEvents in the pres shell destructor (or whenever the pres shell is notified we're shutting down). The easiest way to do this is to call RevokeEvents() on the event queue object passing in the pres shell as the "owner". That means you need to set the owner when calling PL_InitEvent(). See the frame manager destructor code for an example of what to do. Marking this as something to fix for beta1
Assignee: troy → nisheeth
Keywords: beta1
Assignee | ||
Comment 4•25 years ago
|
||
Thanks for figuring this out, Troy. I have the fix ready to go. Will check it in once the tree opens today.
Status: NEW → ASSIGNED
Target Milestone: M14
Assignee | ||
Updated•25 years ago
|
Whiteboard: [PDT+] Expexted Fix Date: by 2/11
Assignee | ||
Comment 6•25 years ago
|
||
The fix for this is checked in...
Status: ASSIGNED → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Under Linux build 2000021715 if I open the browser, open mail, and hit ALT-Q the browser exits and sits there chewing up 85% of my cpu. OnUnload from XUL Clean up ... WEBSHELL- = 3 WEBSHELL- = 2 WEBSHELL- = 1 (Infinite loop..) If I do not open mail this problem does NOT occur.
Comment 10•25 years ago
|
||
With the Feb 23rd (2000022308) builds running under Windows NT, Win 98, Linux Redhat 6.0 and Mac 9.0, I can't reproduce the crash described. Warren , could you please check this too ?
Comment 11•25 years ago
|
||
I'm not sure how this bug got marked fixed. Someone put a comment in here that just bringing up mail will show you the crash when you quit! Same stack trace. This has been a problem for us for 3 or 4 weeks now. I didn't realize the bug to fix it had been marked fixed or I would have re-opened it sooner. Re-opening.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Severity: normal → critical
Whiteboard: [PDT+] Expexted Fix Date: by 2/11 → [PDT+]
Comment 12•25 years ago
|
||
*** Bug 29644 has been marked as a duplicate of this bug. ***
Comment 13•25 years ago
|
||
I can't reproduce this on the Feb 29 build running under Windows 98. Can this problem still be reproduced ?
Comment 14•25 years ago
|
||
petersen - yes with today's build. See the duplicate bug that is noted above. We get this all the time in mail closing and the stack trace is the same as this bug report.
Comment 15•25 years ago
|
||
Is this cross platform or Win only?
Comment 16•25 years ago
|
||
linux and mac as per the mail cases from bug 29644 which is marked dup of this one.
Comment 17•25 years ago
|
||
Using the 3/1 build on NT, I can run mail and shut down cleanly. Not seeing the crash.
Comment 18•25 years ago
|
||
*** Bug 29694 has been marked as a duplicate of this bug. ***
Comment 19•25 years ago
|
||
I guess I'm not as lucky as Phil. I still see this all the time in 3/1 builds. At least 20% of the time when I close a mail or compose window I'll crash with this stack trace.
Comment 20•25 years ago
|
||
I can reproduce this. The stack I see has one more frame than the original one quoted (see below). The PresShell in question hasn't yet been destroyed, but it looks like some of the other supporting constructs have - including maybe the view manager. Still investigating, but don't let that stop someone else from looking at it too. :-) PresShell::CreateRenderingContext(PresShell * const 0x0f62c9d0, nsIFrame * 0x0eadf298, nsIRenderingContext * * 0x0012fcd4) line 2158 + 33 bytes PresShell::ProcessReflowCommands(PresShell * const 0x0f62c9d0, int 1) line 2004 + 23 bytes ReflowEvent::HandleEvent() line 1935 HandlePLEvent(ReflowEvent * 0x04f59490) line 1947 PL_HandleEvent(PLEvent * 0x04f59490) line 526 + 10 bytes PL_ProcessPendingEvents(PLEventQueue * 0x01026d80) line 487 + 9 bytes _md_EventReceiverProc(HWND__ * 0x00040922, unsigned int 49392, unsigned int 0, long 16936320) line 975 + 9 bytes USER32! 77e71250()
Comment 21•25 years ago
|
||
Any chance this goes away when you put in all the rpotts/warren thread safety code? Thanks, Jim (wild guessing in the bleachers) Roskind
Comment 22•25 years ago
|
||
Looks like someone recently changed the ownership model for frames and views (Travis?). The content viewer now owns the ViewManager (and hence all Views) while the PresContext owns all frames. Since neither are refcounted, it's important that their lifetime be identical. The problem is that the PresContext can outlive the ViewManager (and does in this case), leaving dangling references - both the PresContext's weak reference to the ViewManager and Frame references to corresponding Views. I've attached a hacky patch that allows the PresContext to drop its dangling reference early and does early returns in the specific places where we try to reference views after the ViewManager has gone away. This deals with the immediate problem, but the ownership model needs to be revisisted.
Comment 23•25 years ago
|
||
Comment 24•25 years ago
|
||
mscott, could you verify that the patch fixes the problem on your machine? I'll try to get it reviewed in the meantime.
Whiteboard: [PDT+] → [PDT+] Patch available, waiting for review
Comment 25•25 years ago
|
||
I'm running with vidur's patch now and will report back with any problems.
Comment 26•25 years ago
|
||
Vidur, Please land the changes as soon as you feel safe. For now, I'm putting an expected landing of on/before 3/7 so that I won't have to re-read this each time ;-). Thanks, Jim
Whiteboard: [PDT+] Patch available, waiting for review → [PDT+] 3/7 Patch available, waiting for review
Comment 27•25 years ago
|
||
I haven't seen any ill side effects nor have I seen this crash yet since running with Vidur's patch. So far so good.
Comment 28•25 years ago
|
||
Temporary patch checked in on 3/6/2000. I'm not closing the bug since the real fix is to sort out the ownership model. I'll leave the bug on Nisheeth's list, though he should probably forward to Travis.
Keywords: beta1
Whiteboard: [PDT+] 3/7 Patch available, waiting for review
Assignee | ||
Comment 29•24 years ago
|
||
Changing milestone to M16...
Status: REOPENED → ASSIGNED
Target Milestone: M14 → M16
Comment 32•24 years ago
|
||
Changing summary from "shutdown crash in PresShell::ProcessReflowCommands" to "sort out ownership model". Removing crash keyword as we're no longer seeing crash.
Keywords: crash
Summary: shutdown crash in PresShell::ProcessReflowCommands → sort out ownership model (WAS: shutdown crash in PresShell::ProcessReflowCommands)
Comment 33•24 years ago
|
||
Adding crash to keyword field.
Comment 34•24 years ago
|
||
M16 has been out for a while now, these bugs target milestones need to be updated.
Assignee | ||
Comment 35•24 years ago
|
||
We should try to fix the ownership model (see Vidur's earlier comment) early in the beta 3 cycle. Marking nsbeta3...
Keywords: nsbeta3
Comment 36•24 years ago
|
||
With Build ID 2000072520 there's no problem. No crash.
Comment 37•24 years ago
|
||
[nsbeta3-], ->future. Will reconsider if problems reappear.
Whiteboard: [nsbeta3-]
Target Milestone: M16 → Future
Comment 38•24 years ago
|
||
note to QA: also verify http://bugzilla.mozilla.org/show_bug.cgi?id=29694 (dupe bug) when this is fixed.
Updated•24 years ago
|
Target Milestone: Future → mozilla0.9
Assignee | ||
Comment 39•24 years ago
|
||
*** This bug has been marked as a duplicate of 58830 ***
Status: ASSIGNED → RESOLVED
Closed: 25 years ago → 24 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•