Closed Bug 973533 Opened 11 years ago Closed 10 years ago

Crash [@ moz_abort | isalloc_validate | mozilla::scache::StartupCache::SizeOfEntryExcludingThis(nsACString_internal const&, nsAutoPtr<mozilla::scache::CacheEntry> const&, unsigned int (*)(void const*), void*) ]

Categories

(Core :: XPCOM, defect)

x86
Windows 8.1
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox30 - affected

People

(Reporter: andrei, Unassigned)

References

()

Details

(Keywords: crash, intermittent-failure, sec-moderate, Whiteboard: [mozmill])

Crash Data

Attachments

(1 file)

Attached file console_log.txt
This is the crash report: https://crash-stats.mozilla.com/report/index/b3406b26-1056-47ee-82e9-722ff2140217 Crashed while running mozmill tests on a Win81 32b node. http://mozmill-daily.blargon7.com/#/endurance/report/e35f22a68fec3f6d144713f9abdbca7a The full console log is attached. Prior to the crash we have a few messages like this: > [Child 1676] WARNING: pipe error: 109: file c:/builds/moz2_slave/m-cen-w32-ntly-000000000000000/build/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 310 I found a similar message reported in bug 966094
This crash sounds kinda bad. We are accessing some memory which should not be accessed (I think) Crash Reason EXCEPTION_BREAKPOINT Crash Address 0x73612555 Here the first 10 frames from the stack: 0 mozglue.dll moz_abort memory/mozjemalloc/jemalloc.c 1 mozglue.dll isalloc_validate memory/mozjemalloc/jemalloc.c 2 xul.dll mozilla::scache::StartupCache::SizeOfEntryExcludingThis(nsACString_internal const &,nsAutoPtr<mozilla::scache::CacheEntry> const &,unsigned int (*)(void const *),void *) startupcache/StartupCache.cpp 3 xul.dll nsBaseHashtable<nsCStringHashKey,nsAutoPtr<mozilla::scache::CacheEntry>,mozilla::scache::CacheEntry *>::s_SizeOfStub(PLDHashEntryHdr *,unsigned int (*)(void const *),void *) obj-firefox/dist/include/nsBaseHashtable.h 4 xul.dll SizeOfEntryExcludingThisEnumerator xpcom/glue/pldhash.cpp 5 xul.dll PL_DHashTableEnumerate(PLDHashTable *,PLDHashOperator (*)(PLDHashTable *,PLDHashEntryHdr *,unsigned int,void *),void *) xpcom/glue/pldhash.cpp 6 xul.dll PL_DHashTableSizeOfExcludingThis(PLDHashTable const *,unsigned int (*)(PLDHashEntryHdr *,unsigned int (*)(void const *),void *),unsigned int (*)(void const *),void *) xpcom/glue/pldhash.cpp 7 xul.dll nsBaseHashtable<nsCStringHashKey,nsAutoPtr<mozilla::scache::CacheEntry>,mozilla::scache::CacheEntry *>::SizeOfExcludingThis(unsigned int (*)(nsACString_internal const &,nsAutoPtr<mozilla::scache::CacheEntry> const &,unsigned int (*)(void const *),void *),unsigned int (*)(void const *),void *) obj-firefox/dist/include/nsBaseHashtable.h 8 xul.dll mozilla::scache::StartupCache::HeapSizeOfIncludingThis(unsigned int (*)(void const *)) startupcache/StartupCache.cpp 9 xul.dll mozilla::scache::StartupCache::CollectReports(nsIMemoryReporterCallback *,nsISupports *) startupcache/StartupCache.cpp 10 xul.dll nsMemoryReporterManager::GetReportsForThisProcess(nsIMemoryReporterCallback *,nsISupports *) xpcom/base/nsMemoryReporterManager.cpp Andrei, is that crash reproducible? If yes, we should find the regression range. Are other platforms also affected or only Windows 8.1?
Severity: normal → critical
Flags: needinfo?(andrei.eftimie)
Keywords: crash
Whiteboard: [mozmill]
Marking as security bug for now until we know if it is one or not.
Group: core-security
I haven't seen any other instance besides this one. Also note that this crash was on the 13th Feb, and we ran tests every day with no other occurrence. I'll give it a shot to try reproducing it in the same environment, but I think its not likely that it will reproduce.
Flags: needinfo?(andrei.eftimie)
(Not a networking cache but startup cache)
Component: Networking: Cache → XPCOM
Nick: how bad is it when this memory reporter (added in bug 697335) fires?
Flags: needinfo?(n.nethercote)
> Nick: how bad is it when this memory reporter (added in bug 697335) fires? I don't understand the question... how bad is what? If you're asking when it happens, it's when a user visits about:memory and hits the "measure" button. As you found, this is fairly old code that hasn't changed recently, AFAIK. Judging from the stack, I'd guess we're calling malloc_size_of() on something that's not a heap block. Here's the full code for CacheEntry: struct CacheEntry { nsAutoArrayPtr<char> data; uint32_t size; CacheEntry() : data(nullptr), size(0) { } // Takes possession of buf CacheEntry(char* buf, uint32_t len) : data(buf), size(len) { } ~CacheEntry() { } size_t SizeOfExcludingThis(mozilla::MallocSizeOf mallocSizeOf) { return mallocSizeOf(data); } }; Normally, the structure of a SizeOfExcludingThis() function is quite similar to the structure of a destructor, except that it measures heap blocks instead of freeing them. But this class isn't like that... ~CacheEntry() doesn't free |data|. In fact, I can't tell where |data| is freed. Taras, Nathan, or Aaron might understand this code better.
Flags: needinfo?(n.nethercote)
(In reply to Nicholas Nethercote [:njn] from comment #6) > Normally, the structure of a SizeOfExcludingThis() function is quite similar > to the structure of a destructor, except that it measures heap blocks > instead of freeing them. But this class isn't like that... ~CacheEntry() > doesn't free |data|. In fact, I can't tell where |data| is freed. ~nsAutoArrayPtr will take care of deleting the data. We only ever build a CacheEntry with heap-allocated data: http://mxr.mozilla.org/mozilla-central/source/startupcache/StartupCache.cpp#372 Hm, are we trying to get a memory report from a child that's already freed its startup cache? Perhaps this assert: http://mxr.mozilla.org/mozilla-central/source/memory/mozjemalloc/jemalloc.c#4384 is firing because we've already freed the pointer?
We're pretty clearly operating on a bad pointer, but it really doesn't look like we've already freed the cache. We could be hitting threading issues with startup cache clients: we have debug assertions about PutBuffer being on the correct thread but not release-mode checks. It could also be that we're doing unsafe modification of the hashtable itself via callbacks in the CacheCloseHelper enumerator. I must admit I've lost track of what the CacheCloseHelper is actually supposed to be doing, and its calls seem unable to re-enter at first glance, but I'm not 100% sure of that. I guess this was a release build that doesn't have assertions about threading or nested hashtable modification?
It seems like this is not a commonly occurring issues and that we don't have any reproducibility so since it's sec-moderate I'm not going to track this unless we see more activity on this signature.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Group: core-security → core-security-release
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: