Closed Bug 529442 Opened 15 years ago Closed 15 years ago

crash [@ nsWeakFrame::InitInternal(nsIFrame*) ]

Categories

(Core :: Disability Access APIs, defect)

defect
Not set
critical

Tracking

()

VERIFIED FIXED
Tracking Status
status1.9.2 --- beta4-fixed

People

(Reporter: davidb, Assigned: surkov)

References

Details

(Keywords: crash, verified1.9.2, Whiteboard: [sg:critical?])

Crash Data

Attachments

(2 files, 3 obsolete files)

In nsWeakFrame::InitInternal:
nsIPresShell* shell = mFrame->PresContext()->GetPresShell();

mFrame appears poisoned?
Blocks: 525579
Hmmm. Maybe this is badness caused by nsWeakFrame(nsnull)?

In our AccessibleTreeWalker we can have this I think...
(via mState.frame = nsnull and the operator= of nsWeakFrame)
  nsIFrame *frame = mState.frame.GetFrame();
  mAccService->GetAccessible(mState.domNode, presShell, mWeakShell,
                             &frame, &mState.isHidden,
                             getter_AddRefs(mState.accessible));
  mState.frame = frame;

This looks bogus. I should have caught it before. We grab the frame from the nsWeakFrame, then we call GetAccessible (which we know might destroy the frame), then we store the (possibly now dead) frame pointer back into mState.frame.

What was that code supposed to be doing? Shouldn't we just remove those two lines before and after GetAccessible?
Oh, I see, it's because the aFrameHint parameter in GetAccessible is an in/out parameter. And sometimes GetAccessible doesn't change that parameter.

Probably the best thing to do here at the moment is change aFrameHint to be an nsWeakFrame*.
Right.

But you are also right about that last assignment, it creeped in when dbaron and I
passed the patch back and forth.
Forget comment 7 (I'm tired).

You are right, it is an in/out param.
(In reply to comment #6)
> Probably the best thing to do here at the moment is change aFrameHint to be an
> nsWeakFrame*.

It will be nicer but it will require a uuid change I think, so I'd like to avoid it for now.
Thinking aloud. Would it make sense to always flush layout just before using our nsAccessibleTreeWalker?
That might help, but it's hard to be sure.

I think you have to change the GetAccessible interface somehow. As it is, the caller can't tell the difference between
1) The frame pointed to by aFrameHint was destroyed and the pointer is now to (poisoned) garbage
2) The frame pointed to by aFrameHint was destroyed, and a new frame was allocated at the same address, and returned in aFrameHint

i.e. the caller cannot know whether the result of aFrameHint is valid or not.
Attached patch patch (obsolete) — Splinter Review
we could do something like this
That changes GetAccessible to return null in *aFrameHint on some paths where it left *aFrameHint unchanged before. Are you sure that's OK? If so, that's a good solution.
(In reply to comment #13)
> That changes GetAccessible to return null in *aFrameHint on some paths where it
> left *aFrameHint unchanged before. Are you sure that's OK? If so, that's a good
> solution.

I didn't want to change the logic. You're right. Either I need to 
if (weakFrameHint.GetFrame())
  *aFrameHint = weakFrameHint.GetFrame()

or to initialize weakFrameHint by *aFrameHint.

Do I understand correct?
> if (weakFrameHint.GetFrame())
>   *aFrameHint = weakFrameHint.GetFrame()

That doesn't fix the bug. In the buggy case, weakFrameHint.GetFrame() will return null because we destroyed the frame, so you won't update *aFrameHint, so it will continue to point to a dead frame, and the caller will crash.

> or to initialize weakFrameHint by *aFrameHint.

That should work. That means if GetAccessible destroys the frame, and doesn't put a new one in weakFrameHint, *aFrameHint will return null. I hope that's OK.
Attached patch patch2 (obsolete) — Splinter Review
Assignee: nobody → surkov.alexander
Attachment #413001 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #413035 - Flags: superreview?(roc)
Attachment #413035 - Flags: review?(bolterbugz)
Note that I can reproduce this crash at will by installing an add-on or by opening Tools/Add-Ons. See these reports by me:
bp-c50f7400-b893-49ce-a8e5-d44fa2091118
and
bp-b8602c11-5b93-40e9-ab54-4378c2091118
Note that I cannot reproduce this problem with a local build. The one thing I do notice is that it takes ages (up to 5 seconds) for the add-on window to come up, but it doesn't crash. Whereas with the regular nightly build it immediately crashes when I open Tools/Add-Ons.

So to verify that the above patch fixes the bug, ideally I'd need a try-server build based off of the 1.9.2 tree.
Attached patch patch192 (obsolete) — Splinter Review
This try-server build doesn't crash for me. I also see a delay now when opening Tools/Add-Ons, but I could install Firebug 1.5xb2 (which caused one of my above quoted crashes to happen with the previous nightly), and then go back in after the restart to check for updates (which caused the other above crash with the regular nightly).

From my end this looks good.
(In reply to comment #15)
> > if (weakFrameHint.GetFrame())
> >   *aFrameHint = weakFrameHint.GetFrame()
> 
> That doesn't fix the bug. In the buggy case, weakFrameHint.GetFrame() will
> return null because we destroyed the frame, so you won't update *aFrameHint, so
> it will continue to point to a dead frame, and the caller will crash.
> 

Right.

> > or to initialize weakFrameHint by *aFrameHint.
> 
> That should work. That means if GetAccessible destroys the frame, and doesn't
> put a new one in weakFrameHint, *aFrameHint will return null. I hope that's OK.

I think we can live with that for now.
We can still see the file ctrl with this patch so that's a good sign.
Comment on attachment 413035 [details] [diff] [review]
patch2

Some nice catches here thanks! I'm not sure we need two weakFrame objects in GetAccessible... I've pushed a tweaked patch to see if Marco can confirm it is still ok and I will attach if it is.
This keeps Alexander's catches but removes the extra weakframe object introduced in that patch.
Try server builds: https://build.mozilla.org/tryserver-builds/dbolter@mozilla.com-try-22b7683077df/
Attachment #413035 - Attachment is obsolete: true
Attachment #413084 - Flags: review?(marco.zehe)
Attachment #413035 - Flags: superreview?(roc)
Attachment #413035 - Flags: review?(bolterbugz)
The m-c try server build looks good. Neither installing an extension nor the update UI crash. So far, so good!

I'm waiting for the 1.9.2 try-server build since that more closely matches the previous conditions.
Waiting for try build...
Attachment #413061 - Attachment is obsolete: true
Attachment #413084 - Flags: review?(surkov.alexander)
(In reply to comment #18)
> Note that I cannot reproduce this problem with a local build.

This is a mystery. Anyone got a good hunch? Do local builds do something different with add-on updates?
I suppose a better question is, do beta builds have any special config for addon updates?
Comment on attachment 413084 [details] [diff] [review]
patch 2.5 (trunk)

Also works on the 1.9.2 try-server build. r=me for the functionality. Now you guys have to figure out the code :)
Attachment #413084 - Flags: review?(marco.zehe) → review+
Comment on attachment 413086 [details] [diff] [review]
patch 2.5 (1.9.2 version)

Note to 192 drivers, this will make FF 3.6 crash a lot less. (The patch simply improves the sync between our raw Frame and weak Frame state). Fix confirmed by Marco Zehe.
Attachment #413086 - Flags: approval1.9.2?
Comment on attachment 413084 [details] [diff] [review]
patch 2.5 (trunk)

r=me for these v2.5 patches.

Let's land on Alexander's behalf.
Attachment #413084 - Flags: review?(surkov.alexander) → review+
I mean land the trunk patch, and the 192 if we get a+ ;)
Comment on attachment 413086 [details] [diff] [review]
patch 2.5 (1.9.2 version)

a1.9.2=dbaron
Attachment #413086 - Flags: approval1.9.2? → approval1.9.2+
FYI, with the tryserver linux build of patch 2.5, the crashes in bugs 529371 and 529397 still occur (though no crash reporter gets launched, so I can't confirm the stacks are still the same).
Can you attach gdb to get some kind of stack?

Hmm, I don't know if there's a way to get Linux symbols for tryserver builds...
http://hg.mozilla.org/mozilla-central/rev/26a2aacf9e3a

Closing since we fixed a crash that Marco could reproduce. Aja, if you can still crash with the trunk build, please file a new bug and request blocking1.9.2!
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Whiteboard: [needs landing] → [needs 192 landing]
Now this is getting bizarre: With nightly builds done today, which include the fixes from this bug (they landed 10 hours before the nightly builds started), I get these crashes:
mozilla-central: bp-6e723c79-9a9d-4b3d-93e5-728442091119
mozilla-1.9.2: bp-b095688d-e759-4d65-ae70-181dd2091119

All I need to do is open Tools/Add-Ons.

This did not happen while I was testing the try-server builds linked to in comment#25 and comment#30 respectively. In fact I could open the add-ons manager multiple times without problems.
Ben, any hunches on comment #41? Build diffs?
(In reply to comment #42)
> Ben, any hunches on comment #41? Build diffs?

I have no idea; the try server opt builds are basically the same configuration as the nightly builds. Perhaps it's a different patch that's caused the crash? Not sure.
Flags: blocking1.9.2? → blocking1.9.2+
Verified fixed in Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.3a1pre) Gecko/20091121 Minefield/3.7a1pre (.NET CLR 3.5.30729)
Status: RESOLVED → VERIFIED
Verified fixed also in Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US; rv:1.9.2b4pre) Gecko/20091121 Namoroka/3.6b4pre (.NET CLR 3.5.30729)
Keywords: verified1.9.2
Severity: normal → critical
Keywords: crash
Summary: crash [@nsWeakFrame::InitInternal(nsIFrame*) ] → crash [@ nsWeakFrame::InitInternal(nsIFrame*) ]
Do older branches need this fix?
blocking1.9.1: --- → ?
status1.9.1: --- → ?
(In reply to comment #46)
> Do older branches need this fix?

No, I don't think we need to back port this use of nsWeakFrame. If we see an increase in treewalker or getaccessible crashes pre 192 we should consider it then. Thanks.
On second thought, Roc's reply to bug 504144 comment 5 is probably correct and we could backport this one too; note the nsPopupSetFrame fix is higher priority I think.
Group: core-security
blocking1.9.1: ? → .7+
Flags: wanted1.9.0.x?
Flags: blocking1.9.0.17?
Whiteboard: [sg:critical?]
Flags: wanted1.9.0.x?
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.17?
Flags: blocking1.9.0.17+
I just took another look at our 1.9.1 code and realize Roc's concern doesn't apply. I retract the 1.9.1 blocking request with apologies.
Clearing pre 192 requests based on last comment.
Flags: wanted1.9.0.x+
Flags: blocking1.9.0.18+
blocking1.9.1: .8+ → ---
I probably should not have cleared status 1.9.1 wanted sorry (I can't set it back).
Crash Signature: [@ nsWeakFrame::InitInternal(nsIFrame*) ]
Group: core-security
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: