Closed Bug 668245 Opened 9 years ago Closed 9 years ago

Crash [@ mozilla::places::History::NotifyVisited(nsIURI*) ]

Categories

(Toolkit :: Places, defect, critical)

x86
All
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla8
Tracking Status
firefox5 - wontfix
firefox6 + fixed
firefox7 + fixed
blocking2.0 --- -
status2.0 --- wontfix
status1.9.2 --- unaffected
status1.9.1 --- unaffected

People

(Reporter: bc, Assigned: mak)

References

(Blocks 1 open bug, )

Details

(Keywords: crash, verified-aurora, verified-beta, Whiteboard: [sg:critical?][qa!])

Crash Data

Attachments

(8 files, 1 obsolete file)

Attached file crash report 1
1. http://bella.heavenforum.org/t1220-koreana-film-mesum-bali
2. Opens multitudes of external programs for various protocols for example mail instances (Outlook on the Crash Automation) as well as IRC if you have that set up. 
3. Crash Aurora

Original Socorro report bp-6165cea3-0d6e-4841-ba25-f33692110626

0 	xul.dll 	mozilla::places::History::NotifyVisited 	
1 	xul.dll 	mozilla::places::`anonymous namespace'::NotifyVisitObservers::Run 	toolkit/components/places/History.cpp:477
2 	xul.dll 	nsThread::ProcessNextEvent 	xpcom/threads/nsThread.cpp:618
3 	nspr4.dll 	PR_GetThreadPrivate 	nsprpub/pr/src/threads/prtpd.c:232
4 	xul.dll 	NS_ProcessNextEvent_P 	obj-firefox/xpcom/build/nsThreadUtils.cpp:250
5 	xul.dll 	nsXULWindow::ShowModal 	xpfe/appshell/src/nsXULWindow.cpp:419
6 	xul.dll 	nsContentTreeOwner::ShowAsModal 	xpfe/appshell/src/nsContentTreeOwner.cpp:554
7 	xul.dll 	nsWindowWatcher::OpenWindowJSInternal 	

Other reports on Windows Firefox 4 and 5:

https://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=contains&query=mozilla%3A%3Aplaces%3A%3AHistory%3A%3ANotifyVisited&reason_type=contains&date=06%2F29%2F2011%2009%3A47%3A44&range_value=1&range_unit=weeks&hang_type=any&process_type=any&do_query=1&signature=mozilla%3A%3Aplaces%3A%3AHistory%3A%3ANotifyVisited%28nsIURI%2A%29

crash report 1 

Operating system: Windows NT
                  5.1.2600 Service Pack 3
CPU: x86
     GenuineIntel family 6 model 44 stepping 2
     1 CPU

Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
Crash address: 0x11

Thread 0 (crashed)
 0  xul.dll!nsTArray_base<nsTArrayDefaultAllocator>::Length() [nsTArray.h : 139 + 0x5]
    eip = 0x10061b1c   esp = 0x00119d00   ebp = 0x00119d04   ebx = 0x00000000
    esi = 0x12411750   edi = 0xffff0007   eax = 0x04442848   ecx = 0x00000011
    edx = 0x00000001   efl = 0x00010206
    Found by: given as instruction pointer in context
 1  xul.dll!mozilla::places::History::NotifyVisited(nsIURI *) [History.cpp : 1304 + 0x7]
    eip = 0x1106b584   esp = 0x00119d0c   ebp = 0x00119d2c
    Found by: call frame info
Attached file crash report 2
perating system: Windows NT
                  5.1.2600 Service Pack 3
CPU: x86
     GenuineIntel family 6 model 44 stepping 2
     1 CPU

Crash reason:  EXCEPTION_ACCESS_VIOLATION_READ
Crash address: 0x67726f2e

Thread 0 (crashed)
 0  xul.dll!nsTArray_base<nsTArrayDefaultAllocator>::Length() [nsTArray.h : 139 + 0x5]
    eip = 0x10061b1c   esp = 0x00119d00   ebp = 0x00119d04   ebx = 0x00000000
    esi = 0x12411750   edi = 0xffff0007   eax = 0x0446d320   ecx = 0x67726f2e
    edx = 0x00000001   efl = 0x00010206
    Found by: given as instruction pointer in context
 1  xul.dll!mozilla::places::History::NotifyVisited(nsIURI *) [History.cpp : 1304 + 0x7]
    eip = 0x1106b584   esp = 0x00119d0c   ebp = 0x00119d2c
    Found by: call frame info

Ted's exploitable tool rates this report as highly exploitable.
Attached file wgetted page content
Whiteboard: [sg:critical?]
Assignee: nobody → mak77
blocking2.0: --- → -
status2.0: --- → wontfix
I've been able to crash Aurora, but not Nightly.
Do we have any evidence of this affecting Nightly?
No, just Aurora and Windows XP afaict.
a couple interesting assertions we hit:

###!!! ASSERTION: Setting the link state of an unregistered Link!: 'mRegistered', file c:/mozilla/mozilla-aurora/content/base/src/Link.cpp, line 83
###!!! ASSERTION: Setting state to the currently set state!: 'mLinkState != aState', file c:/mozilla/mozilla-aurora/content/base/src/Link.cpp, line 85

I suspect Bug 598833 part 12 may have fixed this crash, but needs more investigation.
Also, keeping on this ends up with ASSERTION: Calling SetLinkState added or removed an observer! this means SetLinkState causes a call to UnregisterVisitedCallback(), the array changes its size while we are looping through it, next iteration tries to access memory for the just removed element.
Attached file Stack 1
Attached file stack 2
Attached file stack 3
All these stacks seem to be related to a modal dialog spinning the events loop
cc-ing boris who knows the "other-side" (HTML/DOM) better than me and changed SetLinkState recently.
The loop itself seems pretty unsafe if something can cause concurrent calls to ResetLinkState while we are invoking SetLinkState.
> ###!!! ASSERTION: Setting the link state of an unregistered Link!:
> 'mRegistered', file c:/mozilla/mozilla-aurora/content/base/src/Link.cpp, line 83

Hrm.  How did that happen?  mRegistered can only be false if we've never registered with history, if we've gotten a SetLinkState() call, or if we've successfully called UnregisterVisitedCallback()....

> Calling SetLinkState added or removed an observer!

I believe that _this_ is currently possible: SetLinkState triggers a ContentStateChanged which does selector matching, which can trigger Link::LinkState calls on arbitrary previous siblings and ancestors of the element.

Can we just use an nsTObserverArray for the observers here?
And does doing that fix the assertions from comment 5?
(In reply to comment #11)
> > ###!!! ASSERTION: Setting the link state of an unregistered Link!:
> > 'mRegistered', file c:/mozilla/mozilla-aurora/content/base/src/Link.cpp, line 83
> 
> Hrm.  How did that happen?  mRegistered can only be false if we've never
> registered with history, if we've gotten a SetLinkState() call, or if we've
> successfully called UnregisterVisitedCallback()....

Well, SetLinkState sets both mLinkState and mRegistered. From stack 3 looks like the next call to ContentStateChanged causes the opening of a modal dialog. This spins the events loop and may cause a nested NotifyVisited call. If this happens to be for the same URI, we may end up handling again the Link we were already handling.

> Can we just use an nsTObserverArray for the observers here?

Yes, I think this may solve at least the access to random memory, I was not aware we had such a nice array type, I can think of other uses for it. Btw, I'm about to test this.
> From stack 3 looks like the next call to ContentStateChanged causes the opening
> of a modal dialog.

That is ... possibly broken.  Filed bug 669934.

So I think we definitely want to use an nsTObserverArray here (per comment 11 second part).  But we may also want to wrap a script blocker around the whole notification pass, to avoid that nested modal dialog in the first place.  Does doing that work?
Specifically, nsAutoScriptBlocker at the top of History::NotifyVisited.
This means I have to make History.cpp depend on nsContentUtils.h, shouldn't be a big deal now that libxul is default, but it's creating some deep linking around, I can't find other toolkit code using ContentUtils.

A possible alternative may be to annotate in KeyClass that we are notifying its observers, and in NotifyVisited bail out early if this flag is set.

I was also looking at other crash stacks, one is crashing when trying to get Length from the array. I think this may happen if a nested call destroys the array while the external call is walking it, indeed after walking the array it is destroyed by the call to RemoveEntry(aURI).


Finally, I'm getting other interesting assertions on this page:
###!!! ASSERTION: We shouldn't be reentering here: '!mFrameIsUpdatingScrollbar',
 file c:/mozilla/mozilla-beta/layout/generic/nsGfxScrollFrame.cpp, line 3164
###!!! ASSERTION: Adding a child where we already have a child? This may misbehave: 'Error', file c:/mozilla/mozilla-beta/docshell/shistory/src/nsSHEntry.cpp, line 703
Attached patch patch v1.0 (obsolete) — Splinter Review
In previous comment I suggest something like this
> shouldn't be a big deal now that libxul is default

Right; that's why it's doable at all.

> A possible alternative may be to annotate in KeyClass that we are notifying
> its observers, and in NotifyVisited bail out early if this flag is set.

If we guarantee that observers are always appended and that they are notified in order, this would work, yes.

> ###!!! ASSERTION: Adding a child where we already have a child?

This is a known issue.  The scrollframe assert is not....
observers (Links) are only added in RegisterVisitedCallback:

  // Start tracking our Link.
  if (!observers.AppendElement(aLink)) {
    // Curses - unregister and return failure.
    (void)UnregisterVisitedCallback(aURI, aLink);
    return NS_ERROR_OUT_OF_MEMORY;
  }

so yeah, I think we can say they are always appended unless nsTObserverArray does fancy reordering!
nsTObserverArray doesn't play any games like that.  It's just an array that lets you know when it changes.  ;)
Even with the patch, I've been able to crash with a stack similar to stack 3 (obviously this time in  	xul.dll!nsTArray_base<nsTArrayDefaultAllocator>::Length()).
While mObservers has a valid address, and key is not null, key->array is invalid.
These are nested NotifyVisited calls, but not for the same uri.
I'm trying to figure out how can we end up with a valid key containing a null array.
Does the hashtable end up mutating while we're working with one of its entries?
Well, a previous call to NotifyVisited for URI1 causes a modal dialog (I think it's the alert we show for the "Unknown handler for this protocol" error), the events loop spins and NotifyVisited for URI2 is invoked. It does:

KeyClass* key = mObservers.GetEntry(aURI);
[ ... ]
ObserverArray::ForwardIterator iter(key->array);
while (iter.HasMore())

So I don't think it can mutate between the GetEntry call and the HasMore() call (that invokes Length on a null base array). I was thinking of a OOM condition but I'd not expect the key to be notnull then.
Not sure what's going on there...

I really think a scriptblocker is the right tool here.
You were right, during the iteration (when iter has mPosition = 1, so the first iteration happened, the second has to start), key starts pointing to bad memory, checking in the immediate window mObservers->GetEntry(aURI), it returns a quite different address, so the hash must have changed.
Various nested calls do a PutEntry and add observers, I suppose this may force the nsTHashTable to realloc.
The nsAutoScriptBlocker seems to avoid all nesting and perf hog, so I agree that it looks like the best approach, I just wanted to be sure there was not something more. Will attach the patch for that.

Most likely this page may cause other crashes not related to History, for sure I've seen once a memory writing violation during GC, always the nsGfxScrollFrame assertion and sometimes OOM. Most of the issues are being generated by the events loop spinning of the modal dialog, so maybe Jesse may create some fuzz tests that open these modal alerts.
Attached patch beta patch v1.0Splinter Review
patch for beta, going to make one for central/aurora.
Notice that, even with this patch, browser still crashes on OOM conditions.
Attachment #544591 - Attachment is obsolete: true
Attachment #544760 - Flags: review?(bzbarsky)
unbitrot and uses MOZ_ASSERT rather than a simple assertion, since that kind of failure should really be noticed.
Attachment #544763 - Flags: review?(bzbarsky)
Comment on attachment 544760 [details] [diff] [review]
beta patch v1.0

s/array, this/array, so this/

r=me

Why are we hitting OOM, though?  Followup bug needed?
Attachment #544760 - Flags: review?(bzbarsky) → review+
Comment on attachment 544763 [details] [diff] [review]
central/aurora patch v1.0

r=me
Attachment #544763 - Flags: review?(bzbarsky) → review+
(In reply to comment #28)
> Why are we hitting OOM, though?  Followup bug needed?

Sorry for late, I was traveling.  I reproduced this crash in a VM with 2GB memory, the page opened one hundred Outlook Express windows and a bunch of application handler selectors, plus its contents that is a bunch of iframes. Sometimes just doing that ended up with Outlook itself alerting for "Not enough memory". But I didn't pay enough attention to memory shared across the various apps. This week I don't have the XP box at hand, I may check that next week.
fwiw, on Windows 7 Beta/Firefox 6 I saw 3 Assertion failure: !cx->thread->data.conservativeGC.requestThreshold with address 0xffffffffdddddddd on 7/7.

I can also reproduce nsTArray_base<nsTArrayDefaultAllocator>::Length | mozilla::places::History::NotifyVisited mozilla::places::::NotifyVisitObservers::Run nsThread::ProcessNextEvent NS_ProcessNextEvent_P nsXULWindow::ShowModal on Mac OS X 10.5 and have seen that signature earlier in Windows 7. Linux may just be missing because my vms aren't configured with external protocol handlers. Flipping OS to All. I'll retest as soon as the patches land.
OS: Windows XP → All
the nsTArray length is fixed by this patch as well, the array mutates position in memory when the hash changes, so length fails.
Dunno about the GC, doesn't seem related at first glance.
Comment on attachment 544763 [details] [diff] [review]
central/aurora patch v1.0

Review of attachment 544763 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/places/History.cpp
@@ +1307,5 @@
> +    ObserverArray::ForwardIterator iter(key->array);
> +    while (iter.HasMore()) {
> +      Link* link = iter.GetNext();
> +      link->SetLinkState(eLinkState_Visited);
> +      // Verify that the observers hash doesn't mutate while looping through it.

We aren't looping through the observers hash though, so I don't think this comment is right.
right, will fix the comment on push, my English failed.
Backed out for a new intermittent-but-frequent |make check| orange on linux in http://hg.mozilla.org/integration/mozilla-inbound/rev/c4676497cbf3.
Thank you, I think the failures are barely related and the test is maybe a bit fragile by itself, but the make check is for sure this patch... Looks like I hit a glib bug or something like that, we saw similar issues in the past with glib...
I'll try to push a patch without moz_assert to tryserver and do a bunch of respins, since I suspect it's mostly causing the problem, even if I suspect fixing it properly may be hairy if it's really glib.
Any luck on an updated patch here, it'd be great to get this in on beta and aurora!
Whiteboard: [sg:critical?] → [sg:critical?] [needs updated patches]
sorry, I was traveling, looking into it today.
I'm waiting on these results, so far so good:
http://tbpl.mozilla.org/?tree=Try&rev=b9f0e38cf98b

Since the make check failures were on debug builds, my theory on MOZ_ASSERT sounds plausible, the patch on try uses NS_ABORT_IF_FALSE instead. Once I have enough green builds I'll reland.
according to tryserver it should stick...
http://hg.mozilla.org/mozilla-central/rev/bebc2764bed5
will ask approvals as soon as I see everything is fine in central.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
Attachment #544763 - Flags: approval-mozilla-aurora?
Comment on attachment 544760 [details] [diff] [review]
beta patch v1.0

notice, I'm going to use NS_ABORT_IF_FALSE on all branches.
Attachment #544760 - Flags: approval-mozilla-beta?
Whiteboard: [sg:critical?] [needs updated patches] → [sg:critical?]
Attachment #544760 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #544763 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
thank you very much, transplant worked fine from what I see.
qa+ for verification on Firefox 7
Whiteboard: [sg:critical?] → [sg:critical?][qa+]
verified on Nightly/9, Aurora/8, Beta/7 no crash with url on Windows XP, Windows 7, Linux 32/64 bit, Mac OS X 10.4
Status: RESOLVED → VERIFIED
Whiteboard: [sg:critical?][qa+] → [sg:critical?][qa!]
Group: core-security
You need to log in before you can comment on or make changes to this bug.