Closed
Bug 530070
Opened 15 years ago
Closed 15 years ago
Firefox crashes in [@ nsWindow::GetParentWindow(int)] called from nsLayoutUtils::TranslateWidgetToView
Categories
(Core :: Widget: Win32, defect)
Tracking
()
RESOLVED
FIXED
mozilla1.9.3a5
People
(Reporter: jimm, Assigned: MatsPalmgren_bugz)
References
Details
(Keywords: crash, topcrash, Whiteboard: [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch])
Crash Data
Attachments
(3 files, 2 obsolete files)
4.53 KB,
patch
|
Details | Diff | Splinter Review | |
14.28 KB,
patch
|
Details | Diff | Splinter Review | |
10.71 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
![]() |
Reporter | |
Updated•15 years ago
|
Summary: Crash in [@ nsWindow::GetParentWindow] from [@ nsLayoutUtils::TranslateWidgetToView] → Crash in [@ nsWindow::GetParentWindow(int)] from [@ nsLayoutUtils::TranslateWidgetToView]
Updated•15 years ago
|
Severity: normal → critical
Comment 1•15 years ago
|
||
Just the stack for reference:
0 xul.dll nsWindow::GetParentWindow widget/src/windows/nsWindow.cpp:1063
1 xul.dll nsWindow::GetParent widget/src/windows/nsWindow.cpp:1025
2 xul.dll GetWidgetOffset layout/base/nsLayoutUtils.cpp:839
3 xul.dll nsLayoutUtils::TranslateWidgetToView layout/base/nsLayoutUtils.cpp:860
4 xul.dll nsLayoutUtils::GetEventCoordinatesRelativeTo layout/base/nsLayoutUtils.cpp:680
5 xul.dll PresShell::HandleEvent layout/base/nsPresShell.cpp:6127
I think we should close this bug for now until we know if it's not exploidable.
Group: core-security
Comment 2•15 years ago
|
||
should we remove the brackets from nsLayoutUtils::TranslateWidgetToView in the summary? If we ever get a crash at that address we'll be fooling crash-stats into linking here.
If the line number in the stack is correct it seems like we got a bogus (already destroyed?) widget out of the HWND. At a brief glance it doesn't look like we're doing anything all that dangerous with it, just trying to read its size and position. We're not writing to it, writing to a location returned by it, or calling any virtual functions.
Whiteboard: [sg:investigate]
Comment 3•15 years ago
|
||
I take that back, GetBounds() is virtual.
Updated•15 years ago
|
Whiteboard: [sg:investigate] → [sg:critical?]
Comment 4•15 years ago
|
||
(In reply to comment #2)
> should we remove the brackets from nsLayoutUtils::TranslateWidgetToView in the
> summary? If we ever get a crash at that address we'll be fooling crash-stats
> into linking here.
Using the | character to separate the different frames is the way we normally go, right? If not feel free to update the summary again.
Summary: Crash in [@ nsWindow::GetParentWindow(int)] from [@ nsLayoutUtils::TranslateWidgetToView] → Firefox crashes in [@ nsWindow::GetParentWindow(int) | nsLayoutUtils::TranslateWidgetToView]
Comment 5•15 years ago
|
||
Bug 506108 comment 7 answers that. Reverting Summary.
Summary: Firefox crashes in [@ nsWindow::GetParentWindow(int) | nsLayoutUtils::TranslateWidgetToView] → Firefox crashes in [@ nsWindow::GetParentWindow(int)] called from nsLayoutUtils::TranslateWidgetToView
Mats, can you look into this? No steps to reproduce, so may not be fixable.
Whiteboard: [sg:critical?] → [sg:critical?] [no steps to reproduce]
Updated•15 years ago
|
Assignee: nobody → jmathies
Assignee: jmathies → matspal
Assignee: matspal → nobody
Assignee | ||
Comment 7•15 years ago
|
||
I found one problem, probably not the crash cause though.
In OnDestroy we set mOnDestroyCalled to PR_TRUE (which means GetParent()
will return nsnull) before calling nsBaseWidget::Destroy() which
prevents it from removing us from the parent's child list.
(STR: drag a tab onto the desktop (creating a new window for it) --
each time the child list grows by one)).
The child window in question has mWnd=NULL and mOnDestroyCalled=PR_TRUE
so I don't think it can cause any trouble (other than the temporary
"leak").
Assignee | ||
Comment 8•15 years ago
|
||
The nsWindow parent relationship looks a bit fragile (storing the native
parent HWND on a property on the child HWND), it's a "weak ref" and we
depend on Windows to remove these properties in a timely fashion
(can an event be dispatched to a child window after/during a WM_DESTROY
on a parent?).
Since we can't find the error, I suggest we add protective code to avoid
crashing. I would re-instate the "unnecessary" SetParent(nsnull) in
OnDestroy as a first step.
http://mxr.mozilla.org/mozilla-central/source/widget/src/windows/nsWindow.cpp#6039
Perhaps also do SetParent(nsnull) on all children in nsBaseWidget::Destroy().
If the crashing continues I suggest we give up the current design and
use a strong ref mParent (as we do on gtk2).
![]() |
Reporter | |
Comment 9•15 years ago
|
||
(In reply to comment #8)
> If the crashing continues I suggest we give up the current design and
> use a strong ref mParent (as we do on gtk2).
^^ that would be my preference as well, but I found this wasn't simple to implement. I didn't spend a lot of time on it though. If we have the time now, I'd say taking a shot at remove the prop is the best bet to fixing all these issues.
Assignee | ||
Comment 10•15 years ago
|
||
Another option would be to add the SetParent(nsnull) per above +
NS_ADDREF/RELEASE the parent in SetParent. That would probably
fix the crash but we get a leak instead?
Assignee | ||
Comment 11•15 years ago
|
||
Here's a mParent solution for trunk... I think I can cook this down to a small
low-risk fix for branches.
Assignee: nobody → matspal
Assignee | ||
Comment 12•15 years ago
|
||
Keep a strong ref on the parent window, but don't use it for anything.
Attachment #435131 -
Flags: review?(jmathies)
![]() |
Reporter | |
Comment 13•15 years ago
|
||
+ parent = aParent ? (HWND)aParent->GetNativeData(NS_NATIVE_WINDOW) : NULL;
nit - let's stick with nsnull here.
We're going to land this on m-c first for a bit right?
![]() |
Reporter | |
Updated•15 years ago
|
Attachment #435131 -
Flags: review?(jmathies) → review+
Whiteboard: [sg:critical?] [no steps to reproduce] → [sg:critical?] [no steps to reproduce] [needs landing]
Assignee | ||
Comment 14•15 years ago
|
||
(In reply to comment #13)
> + parent = aParent ? (HWND)aParent->GetNativeData(NS_NATIVE_WINDOW) : NULL;
>
> nit - let's stick with nsnull here.
I'm sticking with the established convention here - to use NULL
for Windows types such as HWND, HCURSOR, HHOOK, HBRUSH etc
and nsnull for Mozilla types.
> We're going to land this on m-c first for a bit right?
I've landed the 1.9.2 patch on m-c for baking:
http://hg.mozilla.org/mozilla-central/rev/1fb768ca019e
Leaving the bug open to revert that and land the correct trunk patch later.
Assignee | ||
Updated•15 years ago
|
Whiteboard: [sg:critical?] [no steps to reproduce] [needs landing] → [sg:critical?] [no steps to reproduce]
Assignee | ||
Comment 15•15 years ago
|
||
Backed out changeset 1fb768ca019e due to unexpected leak:
1 nsBaseDragService
521 nsNativeDragTarget
Updated•15 years ago
|
Whiteboard: [sg:critical?] [no steps to reproduce] → [sg:critical?] [no steps to reproduce][ccbr]
Assignee | ||
Comment 16•15 years ago
|
||
It appears there are circular strong references between
nsWindow/nsNativeDragTarget/nsDragService.
I haven't figured out exactly why it leaked but calling
'mDragService->EndDragSession(PR_FALSE)' when the DnD is canceled
and doing the widget Destroy() call in nsXULWindow::Destroy()
fixes it. This patch is slightly more risky than the last,
but still low-risk I think.
TryServer unit tests pass without leaking.
I also tested the TryServer builds manually on all 3 platforms,
doing nasty stuff like detaching tabs with a plugin in them and such.
Attachment #435131 -
Attachment is obsolete: true
Attachment #439455 -
Flags: review?(jmathies)
![]() |
Reporter | |
Comment 17•15 years ago
|
||
+ else {
+ // Tell the drag service we're done so it can release objects.
+ mDragService->EndDragSession(PR_FALSE);
+ }
This feels like a bit of a hack. Is there any assurance DragOver will always get called after the drag operation canceled? What about unit tests?
Updated•15 years ago
|
Whiteboard: [sg:critical?] [no steps to reproduce][ccbr] → [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch]
Comment 18•15 years ago
|
||
> Leaving the bug open to revert that and land the correct trunk patch later.
Please mark this bug as fixed when you reland the patch on trunk. Leaving bugs open for followups makes it hard to track what security holes affect a given branch, and makes it hard to track regressions. The new bug can be security-sensitive but [sg:nse].
Whiteboard: [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch] → [sg:critical?] [no steps to reproduce][ccbr]
Updated•15 years ago
|
Whiteboard: [sg:critical?] [no steps to reproduce][ccbr] → [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch]
Updated•15 years ago
|
Whiteboard: [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch] → [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch][needs mats to address review comment]
![]() |
Reporter | |
Comment 19•15 years ago
|
||
(In reply to comment #17)
> + else {
> + // Tell the drag service we're done so it can release objects.
> + mDragService->EndDragSession(PR_FALSE);
> + }
>
> This feels like a bit of a hack. Is there any assurance DragOver will always
> get called after the drag operation canceled? What about unit tests?
Mats, is it ok if I take this bug over? Id like to dig into this a bit and understand why this is needed.
Assignee | ||
Comment 20•15 years ago
|
||
> Is there any assurance DragOver will always
> get called after the drag operation canceled?
Not that I can see, no.
So, here's a version that doesn't depend on reaching DragOver.
I'm replacing the mDragCancelled flag with a DragCancel() call
that does the cleanup directly.
Attachment #439455 -
Attachment is obsolete: true
Attachment #446003 -
Flags: review?(jmathies)
Attachment #439455 -
Flags: review?(jmathies)
![]() |
Reporter | |
Comment 21•15 years ago
|
||
Comment on attachment 446003 [details] [diff] [review]
Low-risk patch for 1.9.2, v3
thanks for the update!
Attachment #446003 -
Flags: review?(jmathies) → review+
What has to be done for trunk here? The mixture of trunk and branch comments is a bit confusing.
Assignee | ||
Comment 23•15 years ago
|
||
I landed the 1.9.2 patch on trunk for baking, but had to back out due
to unexpected Tinderbox orange on Linux. The previously intermittent orange
bug 527099 and bug 551538 (Mooth), and bug 542580 (Mo5) became perma-orange.
I've debugged it I understand why it occurs with the patch -- adding the
nsWindow::Destroy call in nsXULWindow::Destroy makes SetPrompt(PR_FALSE),
which comes later, fail which in turn messes up window focus/activation.
I'm going to file a separate bug on this, hoping it'll fix those
"unable to restore focus" errors that is causing intermittent orange
on Linux. I don't know yet if that fix will be safe for branch though.
I've pushed the patch without the explicit Destroy to Try to see if it
still leaks (we've changed mDragCancelled since then which might have helped). I'll report back with the results...
Assignee | ||
Comment 24•15 years ago
|
||
I mean SetModal(PR_FALSE) of course, not SetPrompt.
BTW, that call is in nsXULWindow::ShowModal() after exiting the modal loop.
Assignee | ||
Comment 25•15 years ago
|
||
Yes, it still leaks nsNativeDragTarget without the explicit
mWindow->Destroy(). :-(
http://tinderbox.mozilla.org/showlog.cgi?log=MozillaTry/1274669543.1274670507.10879.gz
Jim, feel free to figure out if there's a better way to plug that leak.
I'm going to see if I can fix the focus problem on Linux (bug 567704)
in a safe way so we can keep the Destroy() call.
Assignee | ||
Comment 26•15 years ago
|
||
There's a fix in bug 567704 for the focus problem, it passed TryServer
unit tests together with this patch.
Depends on: 567704
Whiteboard: [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch][needs mats to address review comment] → [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch][needs fix in bug 567704 first]
Assignee | ||
Comment 28•15 years ago
|
||
All branches as far as I can tell from crash-stats.
Assignee | ||
Updated•15 years ago
|
Whiteboard: [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch][needs fix in bug 567704 first] → [sg:critical?] [no steps to reproduce][ccbr][critsmash:patch]
Assignee | ||
Comment 29•15 years ago
|
||
Landed the lower-risk fix on trunk, will ask for branch approvals
(together with bug 567704) after baking it for a while.
http://hg.mozilla.org/mozilla-central/rev/9a3a1fa8233a
I filed bug 569583 for the additional (riskier) trunk changes (in wip10).
Status: NEW → RESOLVED
Closed: 15 years ago
status1.9.2:
--- → ?
Resolution: --- → FIXED
Target Milestone: --- → mozilla1.9.3a5
Comment 30•15 years ago
|
||
Looks like this might have caused blocking regression bug 585287.
Updated•14 years ago
|
Updated•14 years ago
|
Crash Signature: [@ nsWindow::GetParentWindow(int)]
Updated•10 years ago
|
Group: core-security → core-security-release
Updated•9 years ago
|
Group: core-security-release
You need to log in
before you can comment on or make changes to this bug.
Description
•