Last Comment Bug 340733 - [FIX]Crash [@ nsCSSFrameConstructor::GetFrameFor] involving image map, due to observer being called after it is removed
: [FIX]Crash [@ nsCSSFrameConstructor::GetFrameFor] involving image map, due to...
Status: RESOLVED FIXED
[sg:critical?]
: crash, regression, testcase, verified1.8.0.5, verified1.8.1
Product: Core
Classification: Components
Component: Layout (show other bugs)
: Trunk
: x86 Linux
: -- critical (vote)
: mozilla1.9alpha1
Assigned To: Boris Zbarsky [:bz]
:
Mentors:
: 342224 (view as bug list)
Depends on:
Blocks: stirdom 271669
  Show dependency treegraph
 
Reported: 2006-06-07 16:03 PDT by Jesse Ruderman
Modified: 2011-06-13 10:01 PDT (History)
15 users (show)
dveditz: blocking1.8.0.5+
jruderman: in‑testsuite+
See Also:
Crash Signature:
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
testcase (571 bytes, text/html)
2006-06-07 16:04 PDT, Jesse Ruderman
no flags Details
gdb output (linux debug) (5.88 KB, text/plain)
2006-06-07 16:05 PDT, Jesse Ruderman
no flags Details
valgrind output (linux debug) (9.06 KB, text/plain)
2006-06-07 16:06 PDT, Jesse Ruderman
no flags Details
Call sequence + stack (8.88 KB, text/html)
2006-06-09 15:46 PDT, Mats Palmgren (:mats)
no flags Details
wip (4.74 KB, patch)
2006-06-09 15:51 PDT, Mats Palmgren (:mats)
no flags Details | Diff | Review
Patch to test (15.30 KB, patch)
2006-06-11 16:53 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
With stack-allocated iterators (13.53 KB, patch)
2006-06-12 22:34 PDT, Boris Zbarsky [:bz]
jonas: review+
jonas: superreview+
jonas: approval‑branch‑1.8.1+
Details | Diff | Review
With strong refs (13.48 KB, patch)
2006-06-19 18:01 PDT, Boris Zbarsky [:bz]
no flags Details | Diff | Review
Actually applying to 1.8.0 branch (13.32 KB, patch)
2006-06-19 18:30 PDT, Boris Zbarsky [:bz]
dveditz: approval1.8.0.5+
Details | Diff | Review
Patch to fix that (1.91 KB, patch)
2006-06-19 23:27 PDT, Boris Zbarsky [:bz]
jonas: review+
jonas: superreview+
dveditz: approval1.8.0.5+
mconnor: approval1.8.1+
Details | Diff | Review

Description Jesse Ruderman 2006-06-07 16:03:11 PDT
Based on stack traces showing random hex addresses at the top, and seeing a 0xDDDDDxxx address in Valgrind output, I think this crash is a security hole.
Comment 1 Jesse Ruderman 2006-06-07 16:04:16 PDT
Created attachment 224777 [details]
testcase

Reloads itself because the crash is only about 50% repro.
Comment 2 Jesse Ruderman 2006-06-07 16:05:33 PDT
Created attachment 224778 [details]
gdb output (linux debug)
Comment 3 Jesse Ruderman 2006-06-07 16:06:03 PDT
Created attachment 224779 [details]
valgrind output (linux debug)
Comment 4 Jesse Ruderman 2006-06-07 16:18:39 PDT
It's easier to reproduce the crash if you turn off arena allocation of frames.
Comment 5 Mats Palmgren (:mats) 2006-06-08 19:24:51 PDT
I think I have found the problem...
Comment 6 Mats Palmgren (:mats) 2006-06-09 15:46:02 PDT
Created attachment 225084 [details]
Call sequence + stack
Comment 7 Mats Palmgren (:mats) 2006-06-09 15:47:45 PDT
The crash occurs because we get content notifications to an observer
after it has removed itself as an observer.
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/content/base/src/nsDocument.cpp&rev=3.652&root=/cvsroot&mark=2315,2325-2329#2314

When we start looping the observers we have a nsImageMap object
registered on behalf of a nsImageFrame (that owns it).
One of the earlier observers causes the frame to be destroyed (the frame
constructor does a WipeContainingBlock), so we run this:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/generic/nsImageFrame.cpp&rev=1.387&root=/cvsroot&mark=246-248#240
which removes the observer:
http://bonsai.mozilla.org/cvsblame.cgi?file=/mozilla/layout/generic/nsImageMap.cpp&rev=3.124&root=/cvsroot&mark=1063-1064#1059

The problem is that we made a copy of mObservers before starting the loop
so it still contains the nsImageMap (which keeps a weak ref to the frame).
See the attached trace for the call sequence.
Comment 8 Mats Palmgren (:mats) 2006-06-09 15:51:01 PDT
Created attachment 225085 [details] [diff] [review]
wip

1. add sanity checks to nsImageMap so it doesn't crash
2. check that the observer is still observing before notifying it

Each change on its own fixes the crash, the second makes the loop O(N^2)
although I'm not sure how performance critical this dispatch code is.
(Notifying an observer that has deregistered seems like a violation of the
observer pattern to me though.)

Also, I was a bit confused by the naming of nsImageMap::Destroy() at first
because I thought it actually had something to do with the desctruction of
the nsImageMap object. We should probably rename it to
OwnerWillBeDestroyed() or something...

Comments?
Comment 9 Boris Zbarsky [:bz] 2006-06-09 17:06:11 PDT
So this is a regression from bug 271669, right?

> although I'm not sure how performance critical this dispatch code is.

Sadly, it is.  It runs a _lot_.

I do agree that it's weird to notify an observer after it's been removed.  :(
Comment 10 Mats Palmgren (:mats) 2006-06-09 20:17:19 PDT
> So this is a regression from bug 271669, right?

Yes, I'm pretty sure it is.  I can't reproduce the crash with nightlies
but I did a local backout of 271669 and it did seem to fix it...

> Sadly, it is.  It runs a _lot_.

Ok, I did some quick and dirty tests on how often we actually do
notify an observer in a dirty state and it seems very rare (<1%).

So, maybe it would be enough to keep a flag and only re-check
mObservers in that case...

Or, we could do some sort of push/pop mechanism that links up all
these stack copies so that RemoveObserver() can poke at them...
It looks like it would be possible to make the loops remove-resilient.
Comment 11 Jesse Ruderman 2006-06-09 20:28:08 PDT
Boris and I think the way to fix this is to create an array type that provides mutation-safe iterators, and use it for the observer array.  With any luck, this should reverse most of the Tdhtml and Txul increase from bug 271669, in addition to fixing the issue of observers getting called after they're removed.

Boris will probably try to implement it on Sunday, unless someone else volunteers to do it.
Comment 12 Mats Palmgren (:mats) 2006-06-10 05:43:09 PDT
> mutation-safe iterators, and use it for the observer array.

I agree, that sounds like the right solution...
Comment 13 neil@parkwaycc.co.uk 2006-06-10 14:08:25 PDT
(In reply to comment #12)
>>mutation-safe iterators, and use it for the observer array.
>I agree, that sounds like the right solution...
Event dispatch could use that too, currently it uses that O(nĀ²) check...
Comment 14 Boris Zbarsky [:bz] 2006-06-11 16:53:36 PDT
Created attachment 225217 [details] [diff] [review]
Patch to test

Can someone test it?  I can't reproduce the crash to start with, so can't really test....
Comment 15 Mats Palmgren (:mats) 2006-06-11 19:49:54 PDT
> Can someone test it? 

It fixes the crash for me.
Comment 16 Boris Zbarsky [:bz] 2006-06-11 19:55:21 PDT
Comment on attachment 225217 [details] [diff] [review]
Patch to test

Jonas, could you review?

If we want to factor this code out into a general templated thing we probably can, but I'd like to get the simpler patch in on the branches first...
Comment 17 Jesse Ruderman 2006-06-11 22:17:09 PDT
>+      NS_ASSERTION(mStep != 0, "Invalid step size");

Any reason to support an mStep other than 1 and -1?  I think doing so makes it harder to think about correctness.

>+    // How fast to advance.
>+    PRInt8 mStep;

It's more about direction than speed.  Also, please add something like "The special value 0 is used to indicate that this is a non-busy, pre-allocated iterator".

>+  PRBool AppendElement(nsIDocumentObserver* aElement) {
>+    // We're not going to worry about appends, though if we decide that forward
>+    // iterators that are at end-of-list should stay there we could do that.
>+    return mObservers.AppendElement(aElement);
>+  }

This comment makes me uncomfortable.  IMO, if observer X adds a new observer while being called, then whether the new observer is called shouldn't depend on whether X was the last observer in the list.  If we do that ("decide that forward iterators that are at end-of-list should stay there we could do that"), we'd be violating my "rule" and inviting subtle bugs.

Another problem is that if the iterator is at position 5, and someone calls Clear() and then appends 10 observers, odd things happen (for both forward and reverse iteration).  I don't know if that should be considered a problem with AppendElement or a problem with Clear.

>+  Iterator mAllocatedIterators[2];

How did you decide on two?  (How often is the second used?  How often would a third be used?)
Comment 18 Boris Zbarsky [:bz] 2006-06-11 22:31:22 PDT
> Any reason to support an mStep other than 1 and -1? 

Not really.  I could asser that the step is 1 or -1, sure.

> and someone calls Clear()

The only caller of Clear() is the nsDocument destructor, for what it's worth.

I agree that when the observers are added during notifications it's not predictable whether the get said notifications.  That's fine with me; in my opinion adding observers during notifications is an error on the part of the caller and I would be happy to add documentation to that effect to nsIDocument.  I certainly don't feel that I need to go out of my way to provide consistent behavior for such callers.  I do want to make sure that removal of observers works without crashing.

Of course if we try to make this more generic we'd have to clean it up more...

> How did you decide on two?  (How often is the second used?  How often would a
> third be used?)

I instrumented the code and spent about an hour browsing the web -- slashdot, BBC news, bugzilla, New York times, etc, while logging the length of the mIterators array at each iterator alloc (this was before I added the always-there iterators).  The length was 2 after the alloc about 1.5% of the time and 1 the rest of the time.  No other lengths were observed.
Comment 19 Jesse Ruderman 2006-06-12 00:19:45 PDT
> in my opinion adding observers during notifications is an
> error on the part of the caller

Ok.  Perhaps add an assertion in AppendElement / PrependElement that there are no active iterators, in addition to documenting that?  And add a comment in PrependElement explaining why there's both an assertion and code that executes only in the error case?  Then I can tell you if I ever hit the assertion ;)
Comment 20 Boris Zbarsky [:bz] 2006-06-12 08:18:10 PDT
I'm almost certain we'd hit those assertions, actually, which makes me hesitant to add them -- I suspect that generally speaking observers are added to watch for notifications other than the one being fired anyway, so the whole thing is a non-issue...
Comment 21 Mats Palmgren (:mats) 2006-06-12 08:31:20 PDT
Speaking of assertions, maybe we should take the change I suggested for
nsImageMap::Destroy() and add assertions to the nsImageMap callback methods?
Not that I'm implying that there is anything wrong with your patch Boris,
but it'd be a cheap way to detect future errors like this bug...
Comment 22 Boris Zbarsky [:bz] 2006-06-12 09:18:22 PDT
Yeah, changing nsImageMap too might be a good idea...  Possibly just the asserts, not the non-debug bail-out, depending on how safe we feel with my patch.
Comment 23 Jonas Sicking (:sicking) 2006-06-12 16:24:56 PDT
I'm not super happy about this approach, the alternative would be to force observers to be able to deal with getting notifications even after they're removed.

But if what we do now (copy the observer list) costs us performance then I think we should go with the new approach.

However I think the implementation is overly complicated. Why not simply use stack-based iterators rather than bother with preallocated ones and such. And you could even get rid of the iterators array by making them into a linked list.
Comment 24 Boris Zbarsky [:bz] 2006-06-12 18:23:50 PDT
> Why not simply use stack-based iterators

Hmm...  I suppose I can, yeah.  I'd forgotten that none of this stuff really has to survive indefinitely.  ;)

OK, I'll do that.  And I like the linked list idea.
Comment 25 Boris Zbarsky [:bz] 2006-06-12 22:34:22 PDT
Created attachment 225384 [details] [diff] [review]
With stack-allocated iterators

Mats, could you test this?

This addresses Jesse's comments, I think.

I could add a private operator new to ensure stack-allocation, but that doesn't seem worthwhile.
Comment 26 Mats Palmgren (:mats) 2006-06-13 07:36:15 PDT
(In reply to comment #25)
> Mats, could you test this?

Works fine for me.

+    // The observer array to work with
+    nsDocumentObserverList& mList;

"array" -> "list"
Comment 27 Jonas Sicking (:sicking) 2006-06-19 14:49:42 PDT
Comment on attachment 225384 [details] [diff] [review]
With stack-allocated iterators

Looks good.

r/sr/a=sicking
Comment 28 Jonas Sicking (:sicking) 2006-06-19 16:25:38 PDT
Land this on the trunk and 1.8.1 branch and we'll let it bake until tuesday or wednesday before landing on 1.8.0.

The only thing that i'm worried about is that we no longer keep owning references to observers as we notify them. This was a recent change, but maybe someone's started relying on it already.

Another possibility is to hold strong references... Might be a good idea to follow xpcom rules.
Comment 29 Boris Zbarsky [:bz] 2006-06-19 18:01:29 PDT
Created attachment 226250 [details] [diff] [review]
With strong refs
Comment 30 Boris Zbarsky [:bz] 2006-06-19 18:02:56 PDT
Fixed on trunk and 1.8 branch.
Comment 31 Boris Zbarsky [:bz] 2006-06-19 18:30:08 PDT
Created attachment 226255 [details] [diff] [review]
Actually applying to 1.8.0 branch
Comment 32 Boris Zbarsky [:bz] 2006-06-19 20:16:05 PDT
And this did indeed seem to improve Tdhtml by about 1%.
Comment 33 Robert O'Callahan (:roc) (Exited; email my personal email if necessary) 2006-06-19 22:35:55 PDT
I'm getting a lot of "###!!! ASSERTION: QueryInterface needed: 'query_result.get() == mRawPtr', file ../../../dist/include/xpcom/nsCOMPtr.h, line 594" in here:

#4  0x00002aaaae2b01b0 in nsDocument::BeginUpdate (this=0xd44ec0,
    aUpdateType=4) at ../../../../mozilla/content/base/src/nsDocument.cpp:2204
2204      NS_DOCUMENT_NOTIFY_OBSERVERS(BeginUpdate, (this, aUpdateType));
Comment 34 Boris Zbarsky [:bz] 2006-06-19 23:27:53 PDT
Created attachment 226278 [details] [diff] [review]
Patch to fix that

I landed this on trunk to fix balsa orange, but I'd love reviews....  I'll handle merging to branches as needed for the XUL template stuff.
Comment 35 Jonas Sicking (:sicking) 2006-06-20 00:05:13 PDT
Actually.. how about just using an nsRefPtr? I've never thought that nsIDocumentObserver should be an interface anyway.
Comment 36 Boris Zbarsky [:bz] 2006-06-20 08:07:30 PDT
Using nsRefPtr would mean bigger code, I think (since we're already instantiating nsCOMPtr<nsIDocumentObserver> elsewhere in content, that version of the template is already being created, while there are no nsRefPtr<nsIDocumentObserver>s around).  And if we make it not be an interface, fine; but while it is, things that implement it should QI to it.
Comment 37 Jonas Sicking (:sicking) 2006-06-20 11:18:37 PDT
Comment on attachment 226278 [details] [diff] [review]
Patch to fix that

All addref and release stuff is inlined, so there is no difference codesize wise.

But I'm fine with making it a refptr when/if we make it not be an interface.
Comment 38 Daniel Veditz [:dveditz] 2006-06-20 14:12:36 PDT
Comment on attachment 226278 [details] [diff] [review]
Patch to fix that

approved for 1.8.0 branch, a=dveditz for drivers
Comment 39 Daniel Veditz [:dveditz] 2006-06-20 14:12:45 PDT
Comment on attachment 226255 [details] [diff] [review]
Actually applying to 1.8.0 branch

approved for 1.8.0 branch, a=dveditz for drivers
Comment 40 Boris Zbarsky [:bz] 2006-06-20 14:22:10 PDT
Checked in both patches for 1.8.0.5.  Still waiting on 1.8.1 approval for the second patch...
Comment 41 Jonas Sicking (:sicking) 2006-06-20 17:47:49 PDT
*** Bug 342224 has been marked as a duplicate of this bug. ***
Comment 42 Boris Zbarsky [:bz] 2006-06-22 09:11:37 PDT
Assertions fixed on 1.8 branch too.
Comment 43 Jay Patel [:jay] 2006-06-26 15:13:59 PDT
v.fixed on 1.8.0 branch with Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en-US;
rv:1.8.0.5) Gecko/20060626 Firefox/1.5.0.5, no crash with testcase.
Comment 44 Bob Clary [:bc:] 2006-08-22 10:32:41 PDT
https://bugzilla.mozilla.org/attachment.cgi?id=224777&action=view
ff2b2 windows, linux, macppc no crash

###!!! ASSERTION: FormatStringFromName() without format parameters: use GetStringFromName() instead: 'aParams && aLength', file /work/mozilla/builds/ff/2.0/mozilla/intl/strres/src/nsStringBundle.cpp, line 226

Comment 45 Jesse Ruderman 2006-10-13 07:46:29 PDT
Would it make sense to use the same approach to fix bug 345529?
Comment 46 Jesse Ruderman 2007-12-15 14:58:35 PST
Crashtest checked in.

Note You need to log in before you can comment on or make changes to this bug.