Closed Bug 340733 Opened 18 years ago Closed 18 years ago

[FIX]Crash [@ nsCSSFrameConstructor::GetFrameFor] involving image map, due to observer being called after it is removed

Categories

(Core :: Layout, defect)

x86
Linux
defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: jruderman, Assigned: bzbarsky)

References

Details

(5 keywords, Whiteboard: [sg:critical?])

Crash Data

Attachments

(7 files, 3 obsolete files)

Based on stack traces showing random hex addresses at the top, and seeing a 0xDDDDDxxx address in Valgrind output, I think this crash is a security hole.
Attached file testcase
Reloads itself because the crash is only about 50% repro.
It's easier to reproduce the crash if you turn off arena allocation of frames.
Whiteboard: [sg:critical?]
I think I have found the problem...
Assignee: nobody → mats.palmgren
The crash occurs because we get content notifications to an observer
after it has removed itself as an observer.
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/content/base/src/nsDocument.cpp&rev=3.652&root=/cvsroot&mark=2315,2325-2329#2314

When we start looping the observers we have a nsImageMap object
registered on behalf of a nsImageFrame (that owns it).
One of the earlier observers causes the frame to be destroyed (the frame
constructor does a WipeContainingBlock), so we run this:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/generic/nsImageFrame.cpp&rev=1.387&root=/cvsroot&mark=246-248#240
which removes the observer:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/generic/nsImageMap.cpp&rev=3.124&root=/cvsroot&mark=1063-1064#1059

The problem is that we made a copy of mObservers before starting the loop
so it still contains the nsImageMap (which keeps a weak ref to the frame).
See the attached trace for the call sequence.
Attached patch wipSplinter Review
1. add sanity checks to nsImageMap so it doesn't crash
2. check that the observer is still observing before notifying it

Each change on its own fixes the crash, the second makes the loop O(N^2)
although I'm not sure how performance critical this dispatch code is.
(Notifying an observer that has deregistered seems like a violation of the
observer pattern to me though.)

Also, I was a bit confused by the naming of nsImageMap::Destroy() at first
because I thought it actually had something to do with the desctruction of
the nsImageMap object. We should probably rename it to
OwnerWillBeDestroyed() or something...

Comments?
So this is a regression from bug 271669, right?

> although I'm not sure how performance critical this dispatch code is.

Sadly, it is.  It runs a _lot_.

I do agree that it's weird to notify an observer after it's been removed.  :(
Blocks: 271669
Flags: blocking1.9a1?
Flags: blocking1.8.1?
Flags: blocking1.8.0.5?
> So this is a regression from bug 271669, right?

Yes, I'm pretty sure it is.  I can't reproduce the crash with nightlies
but I did a local backout of 271669 and it did seem to fix it...

> Sadly, it is.  It runs a _lot_.

Ok, I did some quick and dirty tests on how often we actually do
notify an observer in a dirty state and it seems very rare (<1%).

So, maybe it would be enough to keep a flag and only re-check
mObservers in that case...

Or, we could do some sort of push/pop mechanism that links up all
these stack copies so that RemoveObserver() can poke at them...
It looks like it would be possible to make the loops remove-resilient.
Keywords: regression
Boris and I think the way to fix this is to create an array type that provides mutation-safe iterators, and use it for the observer array.  With any luck, this should reverse most of the Tdhtml and Txul increase from bug 271669, in addition to fixing the issue of observers getting called after they're removed.

Boris will probably try to implement it on Sunday, unless someone else volunteers to do it.
Summary: Crash [@ nsCSSFrameConstructor::GetFrameFor] involving image map and table → Crash [@ nsCSSFrameConstructor::GetFrameFor] involving image map, due to observer being called after it is removed
> mutation-safe iterators, and use it for the observer array.

I agree, that sounds like the right solution...
Assignee: mats.palmgren → bzbarsky
(In reply to comment #12)
>>mutation-safe iterators, and use it for the observer array.
>I agree, that sounds like the right solution...
Event dispatch could use that too, currently it uses that O(n²) check...
Attached patch Patch to test (obsolete) — Splinter Review
Can someone test it?  I can't reproduce the crash to start with, so can't really test....
> Can someone test it? 

It fixes the crash for me.
Comment on attachment 225217 [details] [diff] [review]
Patch to test

Jonas, could you review?

If we want to factor this code out into a general templated thing we probably can, but I'd like to get the simpler patch in on the branches first...
Attachment #225217 - Flags: superreview?(bugmail)
Attachment #225217 - Flags: review?(bugmail)
Summary: Crash [@ nsCSSFrameConstructor::GetFrameFor] involving image map, due to observer being called after it is removed → [FIX]Crash [@ nsCSSFrameConstructor::GetFrameFor] involving image map, due to observer being called after it is removed
Target Milestone: --- → mozilla1.9alpha
>+      NS_ASSERTION(mStep != 0, "Invalid step size");

Any reason to support an mStep other than 1 and -1?  I think doing so makes it harder to think about correctness.

>+    // How fast to advance.
>+    PRInt8 mStep;

It's more about direction than speed.  Also, please add something like "The special value 0 is used to indicate that this is a non-busy, pre-allocated iterator".

>+  PRBool AppendElement(nsIDocumentObserver* aElement) {
>+    // We're not going to worry about appends, though if we decide that forward
>+    // iterators that are at end-of-list should stay there we could do that.
>+    return mObservers.AppendElement(aElement);
>+  }

This comment makes me uncomfortable.  IMO, if observer X adds a new observer while being called, then whether the new observer is called shouldn't depend on whether X was the last observer in the list.  If we do that ("decide that forward iterators that are at end-of-list should stay there we could do that"), we'd be violating my "rule" and inviting subtle bugs.

Another problem is that if the iterator is at position 5, and someone calls Clear() and then appends 10 observers, odd things happen (for both forward and reverse iteration).  I don't know if that should be considered a problem with AppendElement or a problem with Clear.

>+  Iterator mAllocatedIterators[2];

How did you decide on two?  (How often is the second used?  How often would a third be used?)
> Any reason to support an mStep other than 1 and -1? 

Not really.  I could asser that the step is 1 or -1, sure.

> and someone calls Clear()

The only caller of Clear() is the nsDocument destructor, for what it's worth.

I agree that when the observers are added during notifications it's not predictable whether the get said notifications.  That's fine with me; in my opinion adding observers during notifications is an error on the part of the caller and I would be happy to add documentation to that effect to nsIDocument.  I certainly don't feel that I need to go out of my way to provide consistent behavior for such callers.  I do want to make sure that removal of observers works without crashing.

Of course if we try to make this more generic we'd have to clean it up more...

> How did you decide on two?  (How often is the second used?  How often would a
> third be used?)

I instrumented the code and spent about an hour browsing the web -- slashdot, BBC news, bugzilla, New York times, etc, while logging the length of the mIterators array at each iterator alloc (this was before I added the always-there iterators).  The length was 2 after the alloc about 1.5% of the time and 1 the rest of the time.  No other lengths were observed.
> in my opinion adding observers during notifications is an
> error on the part of the caller

Ok.  Perhaps add an assertion in AppendElement / PrependElement that there are no active iterators, in addition to documenting that?  And add a comment in PrependElement explaining why there's both an assertion and code that executes only in the error case?  Then I can tell you if I ever hit the assertion ;)
I'm almost certain we'd hit those assertions, actually, which makes me hesitant to add them -- I suspect that generally speaking observers are added to watch for notifications other than the one being fired anyway, so the whole thing is a non-issue...
Speaking of assertions, maybe we should take the change I suggested for
nsImageMap::Destroy() and add assertions to the nsImageMap callback methods?
Not that I'm implying that there is anything wrong with your patch Boris,
but it'd be a cheap way to detect future errors like this bug...
Yeah, changing nsImageMap too might be a good idea...  Possibly just the asserts, not the non-debug bail-out, depending on how safe we feel with my patch.
Flags: blocking1.8.0.5? → blocking1.8.0.5+
I'm not super happy about this approach, the alternative would be to force observers to be able to deal with getting notifications even after they're removed.

But if what we do now (copy the observer list) costs us performance then I think we should go with the new approach.

However I think the implementation is overly complicated. Why not simply use stack-based iterators rather than bother with preallocated ones and such. And you could even get rid of the iterators array by making them into a linked list.
> Why not simply use stack-based iterators

Hmm...  I suppose I can, yeah.  I'd forgotten that none of this stuff really has to survive indefinitely.  ;)

OK, I'll do that.  And I like the linked list idea.
Attached patch With stack-allocated iterators (obsolete) — Splinter Review
Mats, could you test this?

This addresses Jesse's comments, I think.

I could add a private operator new to ensure stack-allocation, but that doesn't seem worthwhile.
Attachment #225217 - Attachment is obsolete: true
Attachment #225384 - Flags: superreview?(bugmail)
Attachment #225384 - Flags: review?(bugmail)
Attachment #225217 - Flags: superreview?(bugmail)
Attachment #225217 - Flags: review?(bugmail)
(In reply to comment #25)
> Mats, could you test this?

Works fine for me.

+    // The observer array to work with
+    nsDocumentObserverList& mList;

"array" -> "list"
Attachment #225384 - Flags: approval1.8.0.5?
Attachment #225384 - Flags: approval-branch-1.8.1?(bugmail)
Whiteboard: [sg:critical?] → [sg:critical?] need r/sr=sicking
Comment on attachment 225384 [details] [diff] [review]
With stack-allocated iterators

Looks good.

r/sr/a=sicking
Attachment #225384 - Flags: superreview?(bugmail)
Attachment #225384 - Flags: superreview+
Attachment #225384 - Flags: review?(bugmail)
Attachment #225384 - Flags: review+
Attachment #225384 - Flags: approval-branch-1.8.1?(bugmail)
Attachment #225384 - Flags: approval-branch-1.8.1+
Land this on the trunk and 1.8.1 branch and we'll let it bake until tuesday or wednesday before landing on 1.8.0.

The only thing that i'm worried about is that we no longer keep owning references to observers as we notify them. This was a recent change, but maybe someone's started relying on it already.

Another possibility is to hold strong references... Might be a good idea to follow xpcom rules.
Attached patch With strong refs (obsolete) — Splinter Review
Attachment #225384 - Attachment is obsolete: true
Attachment #226250 - Flags: approval1.8.0.5?
Attachment #225384 - Flags: approval1.8.0.5?
Fixed on trunk and 1.8 branch.
Status: NEW → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Attachment #226250 - Attachment is obsolete: true
Attachment #226255 - Flags: approval1.8.0.5?
Attachment #226250 - Flags: approval1.8.0.5?
And this did indeed seem to improve Tdhtml by about 1%.
I'm getting a lot of "###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file ../../../dist/include/xpcom/nsCOMPtr.h, line 594" in here:

#4  0x00002aaaae2b01b0 in nsDocument::BeginUpdate (this=0xd44ec0,
    aUpdateType=4) at ../../../../mozilla/content/base/src/nsDocument.cpp:2204
2204      NS_DOCUMENT_NOTIFY_OBSERVERS(BeginUpdate, (this, aUpdateType));
I landed this on trunk to fix balsa orange, but I'd love reviews....  I'll handle merging to branches as needed for the XUL template stuff.
Attachment #226278 - Flags: superreview?(bugmail)
Attachment #226278 - Flags: review?(bugmail)
Attachment #226278 - Flags: approval1.8.0.5?
Attachment #226278 - Flags: approval-branch-1.8.1?(bugmail)
Actually.. how about just using an nsRefPtr? I've never thought that nsIDocumentObserver should be an interface anyway.
Whiteboard: [sg:critical?] need r/sr=sicking → [sg:critical?]
Attachment #226278 - Flags: approval-branch-1.8.1?(bugmail)
Using nsRefPtr would mean bigger code, I think (since we're already instantiating nsCOMPtr<nsIDocumentObserver> elsewhere in content, that version of the template is already being created, while there are no nsRefPtr<nsIDocumentObserver>s around).  And if we make it not be an interface, fine; but while it is, things that implement it should QI to it.
Attachment #226278 - Flags: approval1.8.1?
Comment on attachment 226278 [details] [diff] [review]
Patch to fix that

All addref and release stuff is inlined, so there is no difference codesize wise.

But I'm fine with making it a refptr when/if we make it not be an interface.
Attachment #226278 - Flags: superreview?(bugmail)
Attachment #226278 - Flags: superreview+
Attachment #226278 - Flags: review?(bugmail)
Attachment #226278 - Flags: review+
Comment on attachment 226278 [details] [diff] [review]
Patch to fix that

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #226278 - Flags: approval1.8.0.5? → approval1.8.0.5+
Comment on attachment 226255 [details] [diff] [review]
Actually applying to 1.8.0 branch

approved for 1.8.0 branch, a=dveditz for drivers
Attachment #226255 - Flags: approval1.8.0.5? → approval1.8.0.5+
Checked in both patches for 1.8.0.5.  Still waiting on 1.8.1 approval for the second patch...
Keywords: fixed1.8.0.5
*** Bug 342224 has been marked as a duplicate of this bug. ***
Attachment #226278 - Flags: approval1.8.1? → approval1.8.1+
Assertions fixed on 1.8 branch too.
v.fixed on 1.8.0 branch with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
rv:1.8.0.5) Gecko/20060626 Firefox/1.5.0.5, no crash with testcase.
Flags: blocking1.9a1?
Flags: blocking1.8.1?
https://bugzilla.mozilla.org/attachment.cgi?id=224777&action=view
ff2b2 windows, linux, macppc no crash

###!!! ASSERTION: FormatStringFromName() without format parameters: use GetStringFromName() instead: 'aParams && aLength', file /work/mozilla/builds/ff/2.0/mozilla/intl/strres/src/nsStringBundle.cpp, line 226

Would it make sense to use the same approach to fix bug 345529?
Group: security
Flags: in-testsuite?
Crashtest checked in.
Flags: in-testsuite? → in-testsuite+
Crash Signature: [@ nsCSSFrameConstructor::GetFrameFor]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: