Closed Bug 589469 Opened 12 years ago Closed 11 years ago

Crash [@ TouchBadMemory][@ UnlockEnumerator]

Categories

(Core :: ImageLib, defect)

x86
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- beta5+

People

(Reporter: bc, Assigned: bholley)

References

()

Details

(Keywords: crash, regression)

Crash Data

Attachments

(4 files, 2 obsolete files)

Attached file crash report winxp
1. http://derstandard.at/
2. crash winxp/mac os x 10.5 intel32
   during shutdown.
   haven't seen on mac os x 10.5 ppc32 or vista
   mac implicated cycle collector with an abort related to removing an image not in the tracker from cycle collection.

Note mac had:

# set uninitialized memory to 0xAA
MallocPreScribble=1
# set freed memory to 0x55
MallocScribble=1
# add guard pages before/after large allocs
MallocGuardEdges=1
# abort() if heap corruption detected
MallocCheckHeapAbort=1
# abort() if illegal free() called
MallocBadFreeAbort=1

Operating system: Windows NT
                  5.1.2600 Service Pack 3
CPU: x86
     GenuineIntel family 6 model 23 stepping 8
     1 CPU

Crash reason:  EXCEPTION_ACCESS_VIOLATION
Crash address: 0x86ac910

Thread 0 (crashed)
 0  xul.dll!UnlockEnumerator(imgIRequest *,unsigned int,void *) [nsDocument.cpp : 8042 + 0x3]
    eip = 0x10a13816   esp = 0x0012b478   ebp = 0x0012b478   ebx = 0x7ffd4000
    esi = 0x000000c0   edi = 0x0022ae28   eax = 0x086ac910   ecx = 0x10a13810
    edx = 0x00000001   efl = 0x00010212
    Found by: given as instruction pointer in context
 1  xul.dll!nsBaseHashtable<nsPtrHashKey<imgIRequest>,unsigned int,unsigned int>::s_EnumReadStub(PLDHashTable *,PLDHashEntryHdr *,unsigned int,void *) [nsBaseHashtable.h : 345 + 0x1d]
    eip = 0x10a19990   esp = 0x0012b480   ebp = 0x0012b498
    Found by: call frame info
 2  xul.dll!PL_DHashTableEnumerate [pldhash.c : 754 + 0x18]
    eip = 0x10525e48   esp = 0x0012b4a0   ebp = 0x0012b4e4
    Found by: call frame info
 3  xul.dll!nsBaseHashtable<nsPtrHashKey<imgIRequest>,unsigned int,unsigned int>::EnumerateRead(PLDHashOperator (*)(imgIRequest *,unsigned int,void *),void *) [nsBaseHashtable.h : 206 + 0x11]
    eip = 0x10a159e2   esp = 0x0012b4ec   ebp = 0x0012b504
    Found by: call frame info
 4  xul.dll!nsDocument::SetImageLockingState(int) [nsDocument.cpp : 8057 + 0x29]
    eip = 0x10a1387d   esp = 0x0012b50c   ebp = 0x0012b51c
    Found by: call frame info


Operating system: Mac OS X
                  10.5.8 9L34
CPU: x86
     GenuineIntel family 6 model 26 stepping 5
     1 CPU

Crash reason:  EXC_BAD_ACCESS / KERN_PROTECTION_FAILURE
Crash address: 0x0

Thread 0 (crashed)
 0  libmozalloc.dylib!TouchBadMemory [mozalloc_abort.cpp : 64 + 0x5]
    eip = 0x0001af43   esp = 0xbfff6b90   ebp = 0xbfff6b98   ebx = 0x0001af66
    esi = 0x1660f170   edi = 0x05b96932   eax = 0x00000000   ecx = 0x0001af3d
    edx = 0x00000000   efl = 0x00010286
    Found by: given as instruction pointer in context
 1  libmozalloc.dylib!mozalloc_abort [mozalloc_abort.cpp : 85 + 0x4]
    eip = 0x0001af9f   esp = 0xbfff6ba0   ebp = 0xbfff6bb8   ebx = 0x0001af66
    esi = 0x1660f170   edi = 0x05b96932
    Found by: call frame info
 2  XUL!Abort [nsDebugImpl.cpp : 379 + 0xa]
    eip = 0x05bf8f95   esp = 0xbfff6bc0   ebp = 0xbfff6bd8   ebx = 0x05bf9297
    esi = 0x1660f170   edi = 0x05b96932
    Found by: call frame info
 3  XUL!NS_DebugBreak_P [nsDebugImpl.cpp : 337 + 0xd]
    eip = 0x05bf94e9   esp = 0xbfff6be0   ebp = 0xbfff7008   ebx = 0x05bf9297
    esi = 0x1660f170   edi = 0x05b96932
    Found by: call frame info
 4  XUL!nsDocument::RemoveImage [nsDocument.cpp : 8007 + 0x37]
    eip = 0x04b85737   esp = 0xbfff7010   ebp = 0xbfff7058   ebx = 0x04b85692
    esi = 0x1660f170   edi = 0x05b96932
    Found by: call frame info

nsXPConnect::CommenceShutdown()
###!!! ABORT: Removing image that wasn't in the tracker!: 'found', file /work/mozilla/builds/2.0.0/mozilla/content/base/src/nsDocument.cpp, line 8007
mozilla::layers::Image::Release()+0x001DE525 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x005E3737]
mozilla::layers::Image::Release()+0x0023AB33 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0063FD45]
mozilla::layers::Image::Release()+0x0023AD4D [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0063FF5F]
mozilla::layers::Image::Release()+0x0023AEDF [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x006400F1]
gfxRect::Scale(double, double)+0x00017637 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0076A851]
mozilla::layers::Image::Release()+0x002493BF [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0064E5D1]
mozilla::layers::Image::Release()+0x002286B1 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0062D8C3]
gfxRect::Scale(double, double)+0x00017018 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0076A232]
gfxRGBA::gfxRGBA()+0x001266F2 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x015D0966]
NS_CycleCollectorSuspect_P+0x00000C4B [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0165F8A9]
NS_CycleCollectorSuspect_P+0x00000CAB [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0165F909]
NS_CycleCollectorSuspect_P+0x0000127C [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0165FEDA]
NS_CycleCollectorSuspect_P+0x000013B9 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x01660017]
NS_CycleCollectorSuspect_P+0x000013EF [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0166004D]
NS_GetDebug_P+0x000004CC [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x015D7EBE]
NS_ShutdownXPCOM_P+0x00000011 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x015D828D]
XRE_GetBinaryPath+0x00000B8E [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x00006F54]
XRE_main+0x00003917 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/XUL +0x0000FF67]
start+0x000003B6 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/firefox-bin +0x0000181A]
start+0x00000036 [/work/mozilla/builds/2.0.0/mozilla/firefox-debug/dist/FirefoxDebug.app/Contents/MacOS/firefox-bin +0x0000149A]
###!!! ABORT: Removing image that wasn't in the tracker!: 'found', file /work/mozilla/builds/2.0.0/mozilla/content/base/src/nsDocument.cpp, line 8007
Attached file mac intel crash report
this is quite useful - thanks. I'm guessing this is the same as bug 588681, but there's more information here (specifically, the abort), so I'm going to leave them separate for now.
Assignee: nobody → bobbyholley+bmo
So here's my take on this stuff. I need dbaron's input, so I'm CCing him.

The crashes we're seeing are related to the nsDocument accessing a pointer to an imgIRequest that has gone away. In theory, the content/style classes registering the imgIRequests with the tracker all hold strong references to them, and are supposed to deregister those images before releasing them.

However, we also now have evidence of a second phenomenon: nsImageLoadingContent is calling nsDocument::RemoveImage(foo) without ever having called nsDocument::AddImage(foo). I can think of 3 reasonable explanations of this:

1-That one of imgLoader::LoadImageWithChannel(), nsContentUtils::LoadImage(), or imgIRequest::Clone() has a codepath where it returns an error but still sets its out-param to something non-null. The returned error would prevent the loaded image from being tracked, but the non-null out-param will cause the image to be untracked.

2-That it is possible for nsImageLoadingContent::GetOurDocument() to return null at some point in its lifetime (preventing the image from being tracked), and then at some _later_ time, not return null (allowing the image to be untracked).

3-That it is possible for the result of nsImageLoadingContent::GetOurDocument() to _change_ from one nsIDcoument to another during the lifetime of an nsImageLoadingContent object.

Option 3 would be the most satisfying, because it would explain both the ABORT _and_ the crash: we call AddImage(foo) on document a, but call RemoveImage(foo) on document b. Document b is confused because foo was never added, and Document a holds onto the stale foo pointer indefinitely, causing a crash down the line.

So I'm hoping dbaron can tell me whether it would ever be possible for the document referred to by nsImageLoadingContent::GetOurDocument to change. Note that all of stack we're getting appear to originate from the cycle-collector, if that's any help.
Severity: normal → critical
Bobby: Would these crashes in crash stats map to this bug - http://tinyurl.com/2a8gass?
(In reply to comment #4)
> Bobby: Would these crashes in crash stats map to this bug -
> http://tinyurl.com/2a8gass?

More specifically bug 588681. But if theory #3 is correct, this and that bug are symptoms of the same underlying problem.
Apparently dbaron is out this week, so I'm CCing bz.

bz - can you weigh in on comment 3?
Depends on: 590395
So Jesse's testcase over in bug 590395 appears to imply that an nsImageLoadingContent can be shared between a document and its iframe. Is this intentional? It seems wrong to me...
> 3-That it is possible for the result of nsImageLoadingContent::GetOurDocument()
> to _change_ from one nsIDcoument to another during the lifetime of an
> nsImageLoadingContent object.

Of course it is.  See nsIDOMDocument.adoptNode.  What patch introduced code that assumes it can't?

I'll put comments about bug 590395 in that bug, given the security state there.
Blocks: 512260
blocking2.0: --- → beta5+
Keywords: regression
Attached patch patch - v1 (obsolete) — Splinter Review
Added a patch that seems to do the trick. Flagging bz for review.
Attachment #469731 - Flags: review?(bzbarsky)
Comment on attachment 469731 [details] [diff] [review]
patch - v1

It's a scriptable interface, so random stuff can implement it, so that static_cast isn't really safe.  I'd prefer we just add a noscript notxpcom method on the interface that the nodeutils code will call (and of course rev the iid).  r=me with that.
Attachment #469731 - Flags: review?(bzbarsky) → review+
Attached patch patch - v2 (obsolete) — Splinter Review
Updated patch, addressing bz's concerns.
Attachment #469731 - Attachment is obsolete: true
Comment on attachment 469784 [details] [diff] [review]
patch - v2

carrying over r+
Attachment #469784 - Flags: review+
You need to rev the iid.  See comment 10.
Attached patch patch - v3Splinter Review
My mistake - apologies.

On that note, I guess this technically needs sr too. So I might as well reflag bz for everything.
Attachment #469784 - Attachment is obsolete: true
Attachment #469788 - Flags: superreview?(bzbarsky)
Attachment #469788 - Flags: review?(bzbarsky)
Comment on attachment 469788 [details] [diff] [review]
patch - v3

r+sr=me
Attachment #469788 - Flags: superreview?(bzbarsky)
Attachment #469788 - Flags: superreview+
Attachment #469788 - Flags: review?(bzbarsky)
Attachment #469788 - Flags: review+
Pushed the patch to mozilla-central:

http://hg.mozilla.org/mozilla-central/rev/cf4d7946e2e0

Leaving open so we can confirm that it's fixed.
Given that this is a beta-5 blocker, I'm going to close it, because I'm 99% sure that that patch fixed it. Please re-open if you see it again.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Duplicate of this bug: 590106
Attached file crash report
Bobby: I no longer see a crash with http://derstandard.at/ but I do see a sort of similar crash at:

http://www.astonmartin.com/configurator/db9coupe.asp?td=UL%2526x=14%2526y=10
http://www.astonmartin.com/configurator/db9volante.asp?td=UL%2526x=6%2526y=8

Is this the same issue or should I file a new bug?
(In reply to comment #19)

> Is this the same issue or should I file a new bug?

Different issue.
filed Bug 609499
Crash Signature: [@ TouchBadMemory] [@ UnlockEnumerator]
You need to log in before you can comment on or make changes to this bug.