Closed Bug 529834 Opened 11 years ago Closed 11 years ago

How are we getting the poison frame?

Categories

(Core :: Layout, defect)

x86
macOS
defect
Not set

Tracking

()

RESOLVED DUPLICATE of bug 529371

People

(Reporter: davidb, Unassigned)

References

Details

(Whiteboard: [sg:dupe 529371])

Spin off from bug 529371.

See example crash stack:
http://crash-stats.mozilla.com/report/index/bp-6e723c79-9a9d-4b3d-93e5-728442091119

It is possible to request to attempt creation of an nsWeakFrame initialized with a poisoned frame. This currently crashes us. Do we want to guard against this?
Also it appears possible that aSeeminglyLivingFrame->GetNextSibling() can return a poisoned frame.
Well, things are already very wrong if someone is trying to use a poisoned frame.

nsWeakFrame was originally designed to be initialized with a frame, which is
known to be alive. Then while the nsWeakFrame is alive (normally as stack
variable, but it doesn't have to be), weakFrame.IsAlive() check can be done 
after each possibly evil method call.

Why is a11y code keeping raw references to nsIFrame objects?
Although it could slow things down, replacing nsIFrame* with nsWeakFrame
could help. Then later in the code use nsIFrame* f = nsWeakFrame.GetFrame()
(need to null check f of course).
We are not holding raw nsIFrame objects very long anymore. In the crash stack above I don't see how the curFrame could go bad. Is frame poisoning happening off the main thread?
(In reply to comment #3)
> Is frame poisoning happening off the main thread?
No.
OK. Note we use nsWeakFrame quite a bit now; is it ok to hold onto them for a little while? In other words, can nsWeakFrame::GetFrame return a poisoned frame?
If nsWeakFrame is initialized with a valid nsIFrame,
GetFrame returns either that frame (if it is alive), or nsnull.

We should try to avoid using nsWeakFrames if possible. But in some cases we
do need to hold to an nsIFrame object for a bit longer.
For example nsEventStateManager has few nsWeakFrame member variables.
OK, so this bug is probably really about comment #1. I suspect we need one of:

1. GetNextSibling to return a living frame or nsnull.
2. Some API like IsFramePoisoned(nsIFrame*).

Or longer term:
3. A way accessibility can get native anonymous frame content more safely.
(In reply to comment #7)
> 1. GetNextSibling to return a living frame or nsnull.
So in which case can GetNextSibling return a deleted frame?
Is the 'this' in that case really alive?

> 2. Some API like IsFramePoisoned(nsIFrame*).
This shouldn't be needed, and I think wouldn't even be really possible, because
the memory may have been reallocated, so the frame doesn't necessarily
point to poisoned memory, but to a newly created frame object. 
We have nsWeakFrame to detect if something gets deleted.
(In reply to comment #8)
> (In reply to comment #7)
> > 1. GetNextSibling to return a living frame or nsnull.
> So in which case can GetNextSibling return a deleted frame?
> Is the 'this' in that case really alive?
> 

Looking at the crash state here's my take on it (I could be wrong):
Note mState.frame is type nsWeakFrame.
In UpdateFrame(PR_FALSE), we get by this:

nsIFrame *curFrame = mState.frame.GetFrame();
if (!curFrame) {
 return;
}

We then do:
mState.frame = curFrame->GetNextSibling();

This invokes the operator= of nsWeakFrame, and we end up at:
nsWeakFrame::InitInternal
And fail on:
nsIPresShell* shell = mFrame->PresContext()->GetPresShell();

With a poisoned signature.
s/crash state/crash stack

(In reply to comment #8)
> (In reply to comment #7)
> > 2. Some API like IsFramePoisoned(nsIFrame*).
> This shouldn't be needed, and I think wouldn't even be really possible, because
> the memory may have been reallocated, so the frame doesn't necessarily
> point to poisoned memory, but to a newly created frame object. 
> We have nsWeakFrame to detect if something gets deleted.

OK.
(In reply to comment #7)
> OK, so this bug is probably really about comment #1. I suspect we need one of:
> 
> 1. GetNextSibling to return a living frame or nsnull.

Or maybe some nsWeakFrame::GetNextSibling API?

Can anyone confirm my analysis in comment 9?
David, it seems like you don't really understand what's going on here. We probably didn't explain it very well.

0) Lots of operations can destroy frames, including anything that directly or indirectly calls nsIPresShell::FlushPendingNotifications or removes content from the DOM.

1) An nsIFrame* pointer should always refer to a live frame (unless it's a variable that you're never going to use again). If it's ever a dangling pointer that you might use again, that is a critical bug and we're doomed.

2) Assuming that 1) is OK, calling frame->GetNextSibling() will always return a pointer to a live frame, unless the frame tree is corrupt, which is also a critical bug that means we're doomed. Those bugs are not common so it's unlikely that's what's happening in these crashes.

3) Given point 1, IsFramePoisoned is not a useful API (and as Olli pointed out, it's not really implementable either). If you have an nsIFrame* pointer that may or may not be dangling, you just can't use it.

4) All nsWeakFrame does is clear itself to null when the frame it's holding is destroyed. That means if you passed in a good live frame pointer to begin with, calling GetFrame() later will either return null or a live frame.

5) nsWeakFrame effectively hooks itself up to the frame, so if you pass in a bogus frame pointer, it will crash, as we see in these stacks.

Since it's so easy for frames to go away without warning, code outside content/layout really shouldn't ever touch frames. They are dangerous.
OK, so Marco can reproduce the bug. That's good.

What are the exact steps to reproduce? What's the easiest way to trigger accessibility on Linux?
I probably misspoke if it appears I don't understand what is going on here.

Everything in comment 12 makes sense to me. I'm just not seeing where we are holding onto an nsIFrame* pointer during a frame flush in the crash stack above.
(In reply to comment #13)
> OK, so Marco can reproduce the bug. That's good.
> 
> What are the exact steps to reproduce? What's the easiest way to trigger
> accessibility on Linux?

On GNOME in the system prefs, under Assistive Technologies there should be a checkbox like "Enable Assistive Technologies".

I use Accerciser to confirm FF a11y is going:
http://live.gnome.org/Accerciser#head-2b5a8f89c9b01a5a820ffe224f7ebe5e3c8de0d6
For Linux steps to reproduce this stack (from aja I think) suggests displaying addons extensions tab is enough.

http://crash-stats.mozilla.com/report/index/a96616e6-43ae-4c73-a997-51f1e2091119

I'll be updating my linux vm tonight.
I'm on Windows while I reproduce the crash. All I do is:
1. Have NVDA running (http://www.nvda-project.org).
2. Start the nightly of either Minefield or Namoroka.
3. Go to Tools/Add-Ons. By default it opens to the "Extensions" page for me (in builds that don't crash). This dialog never comes up, or at least NVDA doesn't tell me. Instead, Firefox crashes.
(In reply to comment #17)
> I'm on Windows while I reproduce the crash. All I do is:
> 1. Have NVDA running (http://www.nvda-project.org).
> 2. Start the nightly of either Minefield or Namoroka.
> 3. Go to Tools/Add-Ons. By default it opens to the "Extensions" page for me (in
> builds that don't crash). This dialog never comes up, or at least NVDA doesn't
> tell me. Instead, Firefox crashes.

I don't get crash on trunk with 0.6p3.2 NVDA. "Extensions" page is appeared.
I don't get a crash with that version of NVDA on my own build, either.
Summary: nsWeakFrame should detect nsIFrame poisoning → How are we getting the poison frame?
Marco, can you do the following?
1) attach Visual Studio to your nightly build before Firefox crashes
2) set a breakpoint at nsFrame::Destroy
3) set a breakpoint on nsAccessibleTreeWalker::nsAccessibleTreeWalker and set its action to enable the breakpoint created in step #2
4) set a breakpoint on nsAccessibleTreeWalker::~nsAccessibleTreeWalker and set its action to disable the breakpoint created in step #2
5) try to trigger the crash...

I have no idea if Visual Studio is accessible enough to do steps 2 to 4...

David, if you can reproduce on Linux then you could do this using gdb.
Roc, here's the big problem: I don't crash if I build locally or if I use a try-server build. I only crash with regular nightlies, for whatever frickin' reason.

Another problem is that Visual Studio is, in principle, accessible to do the steps you ask me for. But because screen readers (any of them) hook deeply into the Firefox process, the moment the debugger halts at a breakpoint I'm dead in the water. My screen reader is halted along with Firefox. While mouse and keyboard actions still work, I no longer have control over what I'm doing, and I have no sighted assistance currently available to help me in such instances.

And as for the screen reader version: I am using NVDA 2009.1RC1, and I am on Windows 7, 64 bit edition. I never saw this crash (or the original one in bug 525579) when I was still on Windows XP. That might have something to do with it, too.
FYI: I am able to reproduce the still-happening crash of bug 529442 in Windows XP with the regular November 19 nightly build, see bp-780647b3-a6c4-4f27-8729-b01492091119
Aha, I can reproduce with the nightly build!
Oh thank you! Silly popup frame tree was biting us as per comment 1.
(Readers can follow on bug 529371 comment 25)
Pretty sure Roc nailed it. Closing as duplicate of fixed bug 529371; thanks all!
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 529371
Whiteboard: [sg:dupe 529371]
Group: core-security
You need to log in before you can comment on or make changes to this bug.