Closed Bug 531075 Opened 10 years ago Closed 10 years ago

crash [@ nsXULPopupManager::HidePopupsInDocShell(nsIDocShellTreeItem*) ], [@nsINode::GetOwnerDoc() ] with frame poisoned addresses

Categories

(Core :: Layout, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
Tracking Status
status1.9.2 --- beta5-fixed

People

(Reporter: chofmann, Assigned: dholbert)

References

(Blocks 1 open bug)

Details

(Keywords: regression, Whiteboard: [causes bug 526391])

Attachments

(3 files)

Flags: wanted1.9.2?
a couple of things to figure out with regard to the frame poisoned address crashes

how likely is the planned ramp in weave use likely to increase the fp crashes and possible and discovery of the fp address crashes.  

and any thoughts on if that one be also turned into an exploit against 3.5.x?
Flags: blocking1.9.2?
(In reply to comment #0)
> show frame poisoned address.   
> 
> ranked # 57. 3 0xfffffffff0dea817 Windows NT
> nsMenuPopupFrame::FindMenuWithShortcut(nsIDOMKeyEvent*, int&)
(In reply to comment #1)
> how likely is the planned ramp in weave use likely to increase the fp crashes
> and possible and discovery of the fp address crashes.  

FWIW, that particular crash in FindMenuWithShortcut is particularly easy for users to accidentally trigger (as described in Bug 526391 comment 10 & bug 526391 comment 11).  It basically happens whenever you open Weave's context menu during a Weave sync operation, I believe.  (which is easy when performing a long, full sync or when syncing over a slow connection)

So, I wouldn't be surprised if we see lots more crashes with that signature start coming in, as Weave grows in popularity.
Whiteboard: [depends on bug 526391]
Whiteboard: [depends on bug 526391] → [duplicate of bug 526391]
I'm not sure why we need this to be a separate bug from bug 526391.
Depends on: 526391
Flags: wanted1.9.2?
Flags: wanted1.9.2-
Flags: blocking1.9.2?
Flags: blocking1.9.2-
the query shows a mixed bag of versions and crash address for the same signature

are the frame poisoned firefox 3.6* crashes caused by the same bug as the other firefox 3.5.* crashes? 

or 

will the fix address both releases?  

I didn't know for sure, so I filed this bug.
I worked through this a bit with Mardak, and he came up with a simple patch for Weave that works around the crash  (since the Weave behavior that was triggering this crash was a bit sketchy -- details to follow).

Let's make bug 526391 be about getting this fixed in Weave, and this bug here about the actual underlying Layout bug.
Assignee: nobody → dholbert
Status: NEW → ASSIGNED
Flags: wanted1.9.2-
Flags: blocking1.9.2-
OS: Windows XP → All
Hardware: x86 → All
Whiteboard: [duplicate of bug 526391] → [causes bug 526391]
So what happens here is:
  1. User clicks Weave status bar icon.  Popup menu is shown (and stored in nsXULPopupManager::mPopups)
  2. User chooses "Sync Now".  This does some work, and then will dismiss the popup. (clearing nsXULManager::mPopups)

To illustrate the problem, let's break down Step 2 a bit:
  2a: "Sync Now" makes a synchronous call to Weave.Services.sync()
  2b: This involves network activity, which means we spin the event loop and allow the user to interact with the UI.  The popup has now *visually* disappeared, but it hasn't actually been dismissed yet.  (That won't happen unti "sync()" finishes.)
 
SO: since the user can interact with the UI during step 2b, they're free to click the Weave status-bar icon and reopen the popup menu, before the old one has been properly hidden & cleaned-up-after.

If the user does click the menu again, we'll create a new nsMenuChainItem for the new menu-popup, but we'll re-use the same frame (I think because it's already cached as the primary frame for the popup-menu-element).  This is how we end up in the situation that dbaron described in bug 526391 comment 17, with multiple nsMenuChainItems tied to the same frame.

FWIW, the fix for this on the Weave side is to make the "sync()" call in step 2a *asynchronous*, which lets us clean up the popup right away, before the event loop starts spinning again.
(In reply to comment #6)
>   2b: This involves network activity, which means we spin the event loop and
> allow the user to interact with the UI.  The popup has now *visually*
> disappeared, but it hasn't actually been dismissed yet.  (That won't happen
> unti "sync()" finishes.)

I should clarify what I mean by "hasn't actually been dismissed yet" -- I meant that we won't call nsXULPopupManager::PopupDestroyed, or nsMenuPopupFrame::Destroy, or any onpopuphiding-handlers (in JS) until we're return from doSync() (the JS method that was invoked by clicking "Sync Now").
Attached patch possible fixSplinter Review
Possible fix... Adds a IsPopupOpen() check to prevent us from opening a menu-popup when the menu is already in the focus-list (open or not).

If we were to use this fix, the assertions about "IsPopupOpen" in the contextual code could obviously be removed.  I'm actually a little confused by the presence of those assertions -- they imply that we expect to get into this situation...? (with the mPopups chain getting multiple entries with the same Content()/Frame() pointers)

Neil, does this fix make sense to you?  (and/or, do you know if there are any situations in which we would want to bypass this patch's added IsPopupOpen check -- to allow ourselves to open a popup that's already in our mPopups chain?  AFAICT that gets us into a dangerous state, so I tend to think it should be avoided at all costs...)
Flags: wanted1.9.2?
Flags: wanted1.9.2-
Flags: blocking1.9.2+
It seems to me that it would be better to completely dismiss the menu before running any script, no?
You probably want to askenn for review.
Er, "ask enn"
(In reply to comment #9)
> It seems to me that it would be better to completely dismiss the menu before
> running any script, no?

I was thinking that as well.  nsXULPopupManager::ExecuteMenu is the place we'd do that -- that's the place where the menu gets visually hidden before the command executes, at least.  However, that function has a comment saying we can't completely remove the menu until after the command executes:

> 953   // When a menuitem is selected to be executed, first hide all the open
> 954   // popups, but don't remove them yet. This is needed when a menu command
> 955   // opens a modal dialog. The views associated with the popups needed to be
> 956   // hidden and the accesibility events fired before the command executes, but
> 957   // the popuphiding/popuphidden events are fired afterwards.
http://mxr.mozilla.org/mozilla-central/source/layout/xul/base/src/nsXULPopupManager.cpp#953
Attachment #415365 - Flags: review?(enndeakin)
Enn's gone until at least tomorrow - is there another potential reviewer?
The closest to that would be me.  The patch looks ok to me as a quick-fix, but it should still get review from enn at some point....
And in particular, I think it would take me longer than until tomorrow to re-understand this code well enough to be able to really review the patch.
FWIW, the attached patch passed tests on try-server this morning.
Blocks: 526391
No longer depends on: 526391
Attachment #415365 - Flags: review?(bzbarsky)
Attachment #415365 - Flags: review?(bzbarsky) → review+
Comment on attachment 415365 [details] [diff] [review]
possible fix

Let's get this baking, pending enn's review.
Note-to-self:  As an optimization and a warning-spam-minimization, the added block of code in this bug's existing patch should move down a bit, to below this chunk:
> if (state != ePopupClosed && state != ePopupInvisible)
>   return PR_FALSE;

With the just-pushed patch, in a testcase that just does this...
  var popup = document.getElementById("myMenuPopup");
  popup.openPopup(...);
  popup.openPopup(...);
... we'll trigger the added warning on the second "openPopup" call.  If we move the added code down a bit, then this sort of testcase will instead fall into the already-existing early-return case quoted above. (since the popup isn't closed or invisible -- it's open.)
This automated XUL testcase crashes in nsXULPopupManager::HidePopupsInDocShell or nsXULPopupManager::UpdateKeyboardListeners after I reload it 3 or 4 times.  It requires enhanced privs, so it only works if you run a local copy of it.

It's automated to open the menu and click the entry, which spins the event loop like "Sync" does in weave.  It does this a few times.

Sample crash reports:
HidePopupsInDocShell:
http://crash-stats.mozilla.com/report/index/92875d55-cd68-4b49-807f-9d9922091201
http://crash-stats.mozilla.com/report/index/647fd28f-4360-4e4a-93fc-400cc2091201

UpdateKeyboardListeners:
http://crash-stats.mozilla.com/report/index/c06ed06a-3e9e-48ce-b3a3-89ae12091201
http://crash-stats.mozilla.com/report/index/7b107227-b178-43f4-8149-be0852091201

This testcase is fixed by the attached patch.  (The "Number of running tasks / open popups" field maxes out at 1, at which point this bug's patch kicks in and refuses to open new copies of the popup.)
Here's how to reliably crash with attached automated testcase, in an unpatched Firefox build:
  1. Load a locally-saved copy of the testcase
  2. When prompted for privs, tick the "Remember" box and click Allow
    --> Menu flashes open a few times
  3. After the menu stops flashing, hit "Ctrl+R" to reload (quickly -- within a second or two of the menu flashing)
  4. Wait 5-10 seconds, making sure the task-counter has reset to 0.
  5. Open a new tab...  OR close your tab... OR reload again
    ----> CRASH

I verified that this testcase crashes (with the above procedure) using a 3.6 nightly on Windows XP, too -- however, the XP crashes tend to be at nsINode::GetOwnerDoc:
http://crash-stats.mozilla.com/report/index/5f0ac5fd-f338-4c77-9e80-e4fea2091201
(note that bug 527448 mentions that signature in its title, and that bug is duped to the weave crash bug that this causes)

I wasn't able to reproduce this in Firefox 3.5.5, though. (tried on both Windows & Ubuntu)  --> Adding 'regression' keyword.
Keywords: regression
Summary: crash [@ nsXULPopupManager::HidePopupsInDocShell(nsIDocShellTreeItem*) ] with frame poisoned addresses → crash [@ nsXULPopupManager::HidePopupsInDocShell(nsIDocShellTreeItem*) ], [@nsINode::GetOwnerDoc() ] with frame poisoned addresses
Comment on attachment 415365 [details] [diff] [review]
possible fix

This is ok for the normal case, but I think this will cause the popup to fail to reopen in some cases (for instance, after switching to a different tab containing the popup and back)

What should probably happen is that the line at http://mxr.mozilla.org/mozilla-central/source/layout/xul/base/src/nsXULPopupManager.cpp#873 should look like:

873       frame->HidePopup(aDeselectMenu, called_from_HidePopupsInDocShell ? ePopupClosed : ePopupInvisible);

I can take a more detailed look tomorrow.
(In reply to comment #22)
> (From update of attachment 415365 [details] [diff] [review])
> I think this will cause the popup to fail
> to reopen in some cases (for instance, after switching to a different tab
> containing the popup and back)

I'm not sure I understand how this would happen -- AFAICT, a user can't switch tabs while a menupopup is open. (at least, I can't as a user -- though perhaps it could be done by script running with chrome privs? I'm not sure)

I tried viewing the attached testcase (bugzilla-hosted version) in stock mozilla-central -- if I open its menu, I can't switch tabs until I make the menu go away. Ctrl-PgUp/PgDown do nothing, Ctrl-Tab just closes the menu (as does 'Tab' by itself), and if I manually click a different tab, that just dismisses the menu as well.

Maybe I misunderstood the failure scenario you were describing, though.

> What should probably happen is that the line at [...]

So, it looks like your suggested fix would only change us away from current behavior when we're within HidePopupsInDocShell.  However, I think the HidePopupsInDocShell call is too late to fix the actual issue here (having multiple popups in our open-popup chain with the same frame/content pointers).

In the attached automated testcase, I don't get any calls to HidePopupsInDocShell, except...
 (a) when the enhanced privileges dialog appears
 (b) immediately upon reload

So if we don't get any calls to HidePopupsInDocShell while we're setting up the dangerous situation (opening multiple copies of the menu-popup), it doesn't seem like a change to our behavior during HidePopupsInDocShell would fix this (aside from perhaps making us handle the already-broken popup-chain more gracefully in a way that won't crash).
(In reply to comment #23)
> I think the
> HidePopupsInDocShell call is too late to fix the actual issue here (having
> multiple popups in our open-popup chain with the same frame/content pointers).

Note that this bug causes crashes that don't directly involve calls to HidePopupsInDocShell (at least, not obviously from the stacktrace).  Once we've got a "mPopups" chain with multiple entries for the same frame, there appear to be a number of different function-calls that can kill us.

For example: 
 - crash @ nsXULPopupManager::UpdateKeyboardListeners linked in comment 20
 - crash @ nsMenuPopupFrame::FindMenuWithShortcut (via nsXULPopupManager::KeyPress) in bug 526391 comment 14

As long as the attached fix doesn't break our normal behavior[1], I tend to think we need to catch this issue early (e.g. in MayShowPopup), because my debugging so far has suggested that we're playing with fire once we have multiple popups that point to the same frame.

Enn -- I'd much appreciate any clarification you can offer on how/whether this might break existing behavior, or on alternate fix strategies. (I think we're trying to get this bug fixed ASAP, since it's one of the few remaining 1.9.2 blockers.)
(In reply to comment #24)
> As long as the attached fix doesn't break our normal behavior[1]

Sorry -- I meant to link "[1]" there to the possible breakage that Enn brought up in comment 22 (but which I don't completely follow, per beginning of comment 23)
The patch already checked in is correct. 

However, popups that have been put into the ePopupInvisible state still linger in the menu chain unused. Any attempts to reopen them now just fail.

I was a bit off in my comment above as I only briefly looked at this. What should happen is that the invisible state should only be used for the ExecuteMenu path. For other paths, such as HidePopupsInDocShell, the popup should be hidden and also removed from the menu chain.
OK, let's get the patch on 1.9.2 then.
Whiteboard: [causes bug 526391] → [causes bug 526391][needs 192 landing]
This is fixed on trunk by Daniel's patch.
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
http://hg.mozilla.org/releases/mozilla-1.9.2/rev/f7a7cf25bf99
Whiteboard: [causes bug 526391][needs 192 landing] → [causes bug 526391]
(In reply to comment #26)
> The patch already checked in is correct. 
> 
> However, popups that have been put into the ePopupInvisible state still linger

Thanks Neil -- I'll look into fixing that in a follow-up patch.

(In reply to comment #29)
Thanks for the branch-landing, Karl.  I actually want to move the check down slightly to avoid needless warnings in certain cases, per comment 19, but I can do that as a quick followup patch.
Flags: in-testsuite?
Trivial followup to move the added check slightly lower, per comment 19.
Attachment #415987 - Flags: review?(bzbarsky)
Attachment #415987 - Attachment description: followup fix: check IsPopupOpen slightly later → followup fix: check IsPopupOpen slightly later, so we won't warn about open-and-visible popups
Attachment #415987 - Flags: review?(bzbarsky) → review+
Attachment #415365 - Flags: review?(enndeakin)
Comment on attachment 415365 [details] [diff] [review]
possible fix

The later patch should actually be sufficient after all. After investigating, the cleanup is being done properly.
If this is fixed on 1.9.2, could you set status1.9.2 appropriately?
Yes, sorry -- Karl already set it when he pushed the patch to 1.9.2, actually, but I apparently cleared it somehow when I posted a comment right after that.
Group: core-security
You need to log in before you can comment on or make changes to this bug.