Closed Bug 354711 Opened 13 years ago Closed 13 years ago

Crash when accessing images.length

Categories

(Core :: XPCOM, defect, critical)

1.8 Branch
x86
All
defect
Not set
critical

Tracking

()

VERIFIED FIXED

People

(Reporter: sylvain.pasche, Assigned: smaug)

References

Details

(4 keywords)

Attachments

(4 files, 2 obsolete files)

How to reproduce:

- set extensions.checkCompatbility to false
- install "Extended Statusbar" extension (https://addons.mozilla.org/firefox/1433/)
- Go to http://wms1.ccgis.de/mapbender_dev/frames/login.php?name=test&password=test&mb_user_myGui=gui_digitize

Talkback seems down right now. I will attach a stacktrace.

The code form the extension which crashes is the last line of:


  onProgressChange : function (aProgress, aRequest,
                               aCurSelfProgress, aMaxSelfProgress,
                               aCurTotalProgress, aMaxTotalProgress)
  {
      [...]
      for (var i = 0; i < window._content.window.frames.length; i++)
      {
         docimgs = window._content.window.frames[i].document.images;
         allimgsc += docimgs.length;


When the .length property of the document.images in a frame is accessed.
Attached file stacktrace
s/extensions.checkCompatbility/extensions.checkCompatibility/

sorry
Severity: normal → critical
So nsBaseContentList::Reset gets re-called inside
nsBaseContentList::Reset... :(
It would probably be a good idea to make nsCOMArray::Clear safer such that it kills elements from the back by removing them from the array first and then releases them.
Attached patch fixes the crash (obsolete) — Splinter Review
This fixes the crash, but I agree it is better to make nsCOMArray safer.
Attached patch make nsCOMArray safer (obsolete) — Splinter Review
Yeah, I think we should fix this in the comarray instead.

Actually, the best way to fix it in the comarray would probably be to copy mImpl to a local variable and clear the elements in there.
Comment on attachment 240593 [details] [diff] [review]
make nsCOMArray safer

I'd say this is a bit too slow. Either do the mImpl trick, or you could alternativly add a .swap() function to nsVoidArray and nsCOMArray use that.
Attached patch proposed patch.Splinter Review
The patch fixes the crash (tested both 1.8 and trunk) and
is perhaps the simplest way to fix this.

Adding Swap wouldn't work in general cases, if someone tried
to swap for example SimpleVoidArray and nsAutoVoidArray.
Or at least I couldn't come up with any sensible API.
Assignee: general → Olli.Pettay
Attachment #240591 - Attachment is obsolete: true
Attachment #240593 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Component: DOM → XPCOM
Comment on attachment 240610 [details] [diff] [review]
proposed patch.

Not sure who is the right person to review this.
Attachment #240610 - Flags: review?(dbaron)
QA Contact: ian → xpcom
So wait.  How did we get into a state where the content list was the only thing holding a ref to the image?  That shouldn't be happening, should it?  Or is this on branch and the image is in a disconnected subtree or something?
Or is the issue that state is DIRTY and hence what's in the list has no bearing on reality?

We should probably clear the list when state goes DIRTY now that we're holding strong refs.  :(  Otherwise we could well hold refs to random stuff for a _long_ time.  I don't like the performance aspects, but I'm not that happy with the perf aspects of the COMArray changes either, and those would affect other callers too.  And the memory usage issue remains even with the COMArray changes.
Actually, perf of clearing when going to DIRTY is fine -- just moves the clear to a different location.  And then assert that we're empty where we clear right now.
This is a regression, for what it's worth.  We should really get this fixed for 1.8.1, I think.  :(
Flags: blocking1.9?
Flags: blocking1.8.1?
Flags: blocking1.8.0.8?
(In reply to comment #11)
>Or is this on branch 

This happens both in trunk and branch


Comment on attachment 240610 [details] [diff] [review]
proposed patch.

For what it's worth, I tend to think bugs like this are bugs in the caller -- and I'm not really confident that this will prevent bad things from happening in the future, since we'll still make it all the way to recurring into nsBaseContentList::Reset, which could easily do bad things.

That said, this patch does seem ok, although

> void
> nsCOMArray_base::Clear()
> {
>-    mArray.EnumerateForwards(ReleaseObjects, nsnull);
>-    mArray.Clear();
>+    nsVoidArray tmpArray;

I don't know of a good way to write an assertion around types without writing a bunch of templates, but at least add an assertion that sizeof(tmpArray) and sizeof(mArray) are equal, with the assertion text that tmpArray and mArray must be the same type.


I'd offset these 4 lines with whitespace before and after them, with the comment "swap tmpArray and mArray".
>+    char tmp[sizeof(nsVoidArray)];
>+    memcpy(tmp, &tmpArray, sizeof(nsVoidArray));
>+    memcpy(&tmpArray, &mArray, sizeof(nsVoidArray));
>+    memcpy(&mArray, tmp, sizeof(nsVoidArray));

>+    tmpArray.EnumerateForwards(ReleaseObjects, nsnull);

>+    tmpArray.Clear();

No need to bother with explicitly Clearing tmpArray.

Might want additional review from bsmedberg if he's available.
Attachment #240610 - Flags: review?(dbaron)
Attachment #240610 - Flags: review?(benjamin)
Attachment #240610 - Flags: review+
(Recurring into PopulateSelf seems more dangerous than recurring into Reset, and we'd still do that as well.)
Comment on attachment 240610 [details] [diff] [review]
proposed patch.

Why can't this just be:

nsVoidArray tmp(mArray);
mArray.Clear();

tmp.EnumerateForward(ReleaseObjects, nsnull);
(In reply to comment #18)
> (From update of attachment 240610 [details] [diff] [review] [edit])
> Why can't this just be: 

because
‘nsVoidArray::nsVoidArray(const nsVoidArray&)’ is private


Yay for us. Then how about nsVoidArray tmp; tmp = mArray; since the assignment operator is implemented?
(In reply to comment #20)
> Yay for us. Then how about nsVoidArray tmp; tmp = mArray; since the assignment
> operator is implemented?
> 

That does more work than just those 3 memcpys.

But I'm just testing whether this could be fixed by calling Reset()
whenever state goes dirty.
This works too. Clearing the elements list whenever the state goes dirty.
(Tested on 1.8 and trunk, though this patch applies to trunk only)
Attachment #240637 - Flags: review?(bzbarsky)
Any idea when this regressed? 
So what caused this regression, and why?  It seems like we might be better off moving to as well-tested a state as we can at this point.

The second patch seems to be asserting that we're better off releasing during ContentInserted, ContentAppended, ContentRemoved, and AttributeChanged.  How do we know it's safe to release there?  Could it introduce some other bug?
Talking to DBaron and Sicking we are a little nervous about this fix for RC2.  Given we have the same problem on 1.5.0.7 (unfortunately) we should probably review and land the second patch on trunk and try and get this on a 1.8.1.1 release..
Flags: blocking1.8.1?
Flags: blocking1.8.1.1?
Flags: blocking1.8.1-
Comment on attachment 240610 [details] [diff] [review]
proposed patch.

Removing my review+ given that I think it's really the callers responsibility not to recur into manipulating objects that are currently being manipulated.
Attachment #240610 - Flags: review+
Yeah, this is a regression from bug 348062.

I do think we should fix this on branches, and I can see being nervous about it for RC2.  Given that this issue can only be triggered by extensions (because it involves progress listeners) and that we can probably live with the memory issues, doing this for 1.8.0.9/1.8.1.1 sounds reasonable.
Blocks: 348062
Comment on attachment 240637 [details] [diff] [review]
alternative patch

>Index: content/base/src/nsContentList.h
>+  /**
>+   * Sets the state to LIST_DIRTY and clears mElements array.
>+   */

Document that this is the only acceptable way to set state to LIST_DIRTY?

With that r+sr=bzbarsky. Thanks for dealing with this!
Attachment #240637 - Flags: superreview+
Attachment #240637 - Flags: review?(bzbarsky)
Attachment #240637 - Flags: review+
So we have three options here:

1. Back out the original patch that caused this. That regresses a security bug.
   Though arguably that patch caused another security bug, i.e. this one, so 
   we'd regress an uncommon crasher for a more common one.

2. Leave as status quo. I'm actually worried that there may be other things that
   could be regressed here. All sorts of things tend to happen in destructors,
   so it's bad that we're potentially killing nodes in unexpected places.

3. Take one of the patches in this bug.


I actually think we should do 3. I don't think that the crash we're seing in this bug is known uncommon enough that we should try to live with it until the next dot-release. Landing attachment 240637 [details] [diff] [review] makes us behave much more like we did before the regression, without exposing us to security issues.

I do think we should land attachment 240610 [details] [diff] [review] as well (though in a slightly different form), but not for 2.0 release.
Flags: blocking1.8.1- → blocking1.8.1?
Comment on attachment 240637 [details] [diff] [review]
alternative patch

>@@ -787,20 +783,19 @@ void 
> nsContentList::PopulateSelf(PRUint32 aNeededLength)
> {
>   if (!mRootNode) {
>     return;
>   }
> 
>   ASSERT_IN_SYNC;
> 
>-  if (mState == LIST_DIRTY) {
>-    Reset();
>-  }

Please leave this in on the branch, just in case.

>   PRUint32 count = mElements.Count();
>+  NS_ASSERTION(mState != LIST_DIRTY || count == 0,
>+               "Reset() not called when setting state to LIST_DIRTY?");
> 
>   if (count >= aNeededLength) // We're all set
>     return;
> 
>   PRUint32 elementsToAppend = aNeededLength - count;
> #ifdef DEBUG
>   PRUint32 invariant = elementsToAppend + mElements.Count();
> #endif
Comment on attachment 240637 [details] [diff] [review]
alternative patch

r/sr=me fwiw
Attachment #240637 - Flags: approval1.8.1?
Comment on attachment 240637 [details] [diff] [review]
alternative patch

Approved for RC2. Thanks everyone.
Attachment #240637 - Flags: approval1.8.1? → approval1.8.1+
Whiteboard: [checkin needed (1.8 branch)]
Checked in to trunk and 1.8.1
Keywords: fixed1.8.1
Whiteboard: [checkin needed (1.8 branch)]
Attachment #240637 - Flags: approval1.8.0.8?
(Btw, when using *trunk* to test this (with or without patches), I see quite often crash Bug 330776 happening before the actual test is loaded. 
I've seen that bug also in some other cases when leaving a bug document. But anyway, that has nothing to do with this bug.)
This caused bug 355026, apparently on the 1.8 branch only.
(In reply to comment #37)
> This caused bug 355026, apparently on the 1.8 branch only.
> 

ARGH!
Comment on attachment 240637 [details] [diff] [review]
alternative patch

Because of bug 355026, need to think better patch for branches.
Attachment #240637 - Flags: approval1.8.0.8?
Backed out from 1.8.1 per comment
https://bugzilla.mozilla.org/show_bug.cgi?id=355026#c7
Keywords: fixed1.8.1
Clearing the 1.8.1 nom.  We are likely going to be shipping with this since the fix has caused regressions.  Would be great to get a fix ready for 1.8.1.1
Flags: blocking1.8.1?
Er, right.  So the problem is that the design on branch is to mark state as dirty at the end of PopulateSelf if we have no way of keeping track of mutations.

That doesn't work if marking dirty clears the list.

So the answer is probably to hoist that SetDirty call out to the callers of PopulateSelf.  So the code flow should be:

1)  PopulateSelf
2)  Get the answer the caller wants (node or index or whatever)
3)  SetDirty().

Right now #3 is happening before #2, which means we always have an empty list there, which is what breaks.
I'm more or less gone until sometime Tuesday late afternoon.  Sicking, if you decide this is ok, and it gets some testing, would you mind driving it in?
Attachment #240859 - Flags: superreview?(bugmail)
Attachment #240859 - Flags: review?(bugmail)
Comment on attachment 240859 [details] [diff] [review]
Branch patch with comment 42 taken into account

Doesn't really matter, but r=me, even though this is a bit ugly. (But don't know anything else that could be done on branches.)
Attachment #240859 - Flags: review+
Keywords: regression
Moving nomination to 1.8.0.9 -- if it's too risky for 1.8.1 then we don't want it in 1.8.0.8 either. Especially given the regressions.

If 1.8.1 approves attachment 240859 [details] [diff] [review] we'll take it for 1.8.0.8 too, but given the paltry number of nightly 1.8.0.x testers we can't take as many regression risks.
Flags: blocking1.8.0.8? → blocking1.8.0.9?
Oh, and note to self and everyone else here. When we're this close to release we should always make sure to write, review and test branch patches, not trunk patches. Not trying to put blame one anyone else since I reviewed and recommended the original patch to go on branch, just trying to learn from the misstake.
Comment on attachment 240859 [details] [diff] [review]
Branch patch with comment 42 taken into account

> nsContentList::PopulateSelf(PRUint32 aNeededLength)
> {
>   if (mState == LIST_DIRTY) {
>     Reset();
>   }
>   PRUint32 count = mElements.Count();
>+  NS_ASSERTION(mState != LIST_DIRTY || count == 0,
>+               "Reset() not called when setting state to LIST_DIRTY?");

The assertion should go before the |if| statement above.

r/sr=me with that. But I do think we should wait until 1.8.1.1 for this. Unfortunately the trunk is different enough that we can't really trunk bake this patch, so we should land asap after the tree opens.

Nominating for 1.8.0.8, but we may consider waiting for .9
Attachment #240859 - Flags: superreview?(bugmail)
Attachment #240859 - Flags: superreview+
Attachment #240859 - Flags: review?(bugmail)
Attachment #240859 - Flags: review+
Attachment #240859 - Flags: approval1.8.0.8?
The other thing we should do is add dynamic regression tests for content lists (there are XXX comments to this effect in the file) so we could exercise this code.  If we'd had those up, testing of the initial patch would have caught the problem.  At least if we had a way to run the tests on branch...
Oh, and removing 1.9 blocking nomination, since things are all good there.
Flags: blocking1.9?
Attachment #240610 - Flags: review?(benjamin)
Comment on attachment 240859 [details] [diff] [review]
Branch patch with comment 42 taken into account

We still don't want this on 1.8.0.8 if you think it should wait for 1.8.1.1. Moving approval request to 1.8.0.9
Attachment #240859 - Flags: approval1.8.0.9?
Attachment #240859 - Flags: approval1.8.0.8?
Attachment #240859 - Flags: approval1.8.0.8-
Comment on attachment 240637 [details] [diff] [review]
alternative patch

Clearing a flag to match patch backout
Attachment #240637 - Flags: approval1.8.1+
Moving nomination so we don't lose track, since this hasn't landed on trunk for testing the current release seems unlikely.
Flags: blocking1.8.1.2?
Flags: blocking1.8.1.1?
Flags: blocking1.8.0.9?
Flags: blocking1.8.0.10?
This has landed on trunk, but 1.8 patch is a bit different.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Attachment #240859 - Flags: approval1.8.1.1?
(In reply to comment #53)
> This has landed on trunk, but 1.8 patch is a bit different.
> 

So, shouldn't this be reopened, since it is a branch bug? Or is there another bug for branch and make this one trunk?
Comment on attachment 240859 [details] [diff] [review]
Branch patch with comment 42 taken into account

approved for 1.8/1.8.0 branch, a=dveditz for drivers
Attachment #240859 - Flags: approval1.8.1.1?
Attachment #240859 - Flags: approval1.8.1.1+
Attachment #240859 - Flags: approval1.8.0.9?
Attachment #240859 - Flags: approval1.8.0.9+
Fixed on both branches.
Flags: blocking1.8.1.2?
Flags: blocking1.8.1.1+
Flags: blocking1.8.0.9+
Flags: blocking1.8.0.10?
Verified fixed on trunk and branches, by using builds (with the Extended Statusbar extension installed) before the patch was checked in and after the patch was checked in.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.