Closed Bug 1432615 Opened 7 years ago Closed 5 years ago

Crash in shutdownhang | mozilla::detail::HashKnownLength<T>

Categories

(Core :: Networking: Cache, defect, P3)

56 Branch
x86_64
Windows 7
defect

Tracking


RESOLVED WORKSFORME

People

(Reporter: worcester12345, Unassigned)

References

Details

(Keywords: crash, crashreportid, Whiteboard: [necko-triaged])

Crash Data

This bug was filed from the Socorro interface and is report bp-bbe2bfc9-6744-4686-aeac-b252d1180123.
=============================================================
Top 5 frames of crashing thread:
0 xul.dll mozilla::detail::HashKnownLength<unsigned char> obj-firefox/dist/include/mozilla/HashFunctions.h:226
1 xul.dll js::HashableValue::setValue js/src/builtin/MapObject.cpp:43
2 xul.dll js::MapObject::has_impl js/src/builtin/MapObject.cpp:693
3 xul.dll js::MapObject::has js/src/builtin/MapObject.cpp:704
4 @0x2cac4834eea
=============================================================
Component: Tabbed Browser → JavaScript Engine
Product: Firefox → Core
Keywords: crash, crashreportid
A non-opt stack would include more frames. I think like this:

HashableValue::setValue
-> AtomizeString
-> js::AtomHasher::Lookup::Lookup(const JSAtom*)
-> mozilla::HashString()
-> mozilla::HashString(const unsigned char* aStr, size_t aLength)
-> mozilla::detail::HashKnownLength<unsigned char>(unsigned char const*, unsigned __int64)

This is super rare and probably the Value is already corrupt by the time we get here, so we have nothing to go on. Not actionable for us in Core::JavaScript land.
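For reference, a minimal sketch of what a known-length string hash at the bottom of this stack does. The names below are hypothetical and the mixing step is simplified; the real implementation lives in mfbt/HashFunctions.h and differs in the details.

    #include <cstddef>
    #include <cstdint>

    // Simplified stand-in for the per-character mixing step; only illustrative.
    static uint32_t SketchAddToHash(uint32_t aHash, uint32_t aValue) {
      const uint32_t kGoldenRatio = 0x9E3779B9u;
      return kGoldenRatio * ((aHash << 5) ^ (aHash >> 27) ^ aValue);
    }

    // O(aLength) loop over every character of the key, with no early exit.
    // A huge bogus length would normally run into unmapped memory and crash
    // rather than hang here.
    template <typename T>
    static uint32_t SketchHashKnownLength(const T* aStr, size_t aLength) {
      uint32_t hash = 0;
      for (size_t i = 0; i < aLength; i++) {
        hash = SketchAddToHash(hash, aStr[i]);
      }
      return hash;
    }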
There is a more common stack that crashes with the same signature:

either mozilla::net::CacheStorageService::DropPrivateBrowsingEntries() or mozilla::net::CacheStorageService::Clear()
-> mozilla::net::CacheStorageService::DoomStorageEntries(nsACString_internal const&, nsILoadContextInfo*, bool, bool, nsICacheEntryDoomCallback*)
-> mozilla::net::RemoveExactEntry
-> nsRefPtrHashtable<nsCStringHashKey, mozilla::net::CacheEntry>::Get(nsTSubstring<char> const&, mozilla::net::CacheEntry**)
-> PLDHashTable::Search(void const*)
-> mozilla::detail::HashKnownLength<unsigned char>(unsigned char const*, unsigned __int64)

For example, bp-4ea210fa-c6a0-4799-8745-3da2e0180213.
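For context on why HashKnownLength shows up under the cache path: every lookup keyed by the full URL string has to hash the whole key before the bucket search. A rough, hypothetical sketch, with std::unordered_map standing in for nsRefPtrHashtable/PLDHashTable:

    #include <string>
    #include <unordered_map>

    struct SketchCacheEntry { /* placeholder */ };
    using SketchEntryTable = std::unordered_map<std::string, SketchCacheEntry*>;

    // Rough analogue of a per-key removal: find() hashes the whole URL key
    // (the HashKnownLength frame in the signature) and then compares keys
    // within the bucket; erase(it) just unlinks the found node.
    static SketchCacheEntry* SketchRemoveExactEntry(SketchEntryTable& aTable,
                                                    const std::string& aKey) {
      auto it = aTable.find(aKey);
      if (it == aTable.end()) {
        return nullptr;
      }
      SketchCacheEntry* entry = it->second;
      aTable.erase(it);
      return entry;
    }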
Component: JavaScript Engine → Networking: Cache
Has this appeared recently? This code is really old.
Assignee: nobody → honzab.moz
Priority: -- → P3
Whiteboard: [necko-triaged]
(In reply to Honza Bambas (:mayhemer) from comment #3)
> Has this appeared recently? This code is really old.

23 days ago.
Nicholas, have we changed something in our hashtable hashing algorithms (string keys), or anything else recently, that could lead to a major slowdown of Get()?
Flags: needinfo?(n.nethercote)
(In reply to Honza Bambas (:mayhemer) from comment #5)
> Nicholas, have we changed something in our hashtable hashing algorithms
> (string keys), or anything else recently, that could lead to a major
> slowdown of Get()?

I'm not an MFBT reviewer, but `hg blame` says that mfbt/HashFunctions.h hasn't changed since July 2017, and mfbt/HashFunctions.cpp hasn't changed since July 2014.

It's a strange place to have a hang. HashKnownLength rips through memory pretty quickly. If one of its parameters were bogus -- e.g. a super large length -- you'd think it would crash by hitting inaccessible memory.
Flags: needinfo?(n.nethercote)
Yep, I think as well this is probably a swapping problem...

Question is, would doing just a simple Clear() on the hashtable help? It would also mean rethinking and rewriting some stuff inside the top layers of cache2...

Do you think using hashes (fast) as keys would help?
(In reply to Honza Bambas (:mayhemer) from comment #7)
> Yep, I think as well this is probably a swapping problem...

You say "as well" but I didn't say that I thought it was a swapping problem. I don't think there's enough information in the crash reports to draw any conclusions.

> Question is, would doing just a simple Clear() on the hashtable help? It
> would also mean rethinking and rewriting some stuff inside the top layers
> of cache2...
>
> Do you think using hashes (fast) as keys would help?

I don't know what theoretical problem either of these suggestions would address.
(In reply to Nicholas Nethercote [:njn] from comment #8)
> (In reply to Honza Bambas (:mayhemer) from comment #7)
> > Yep, I think as well this is probably a swapping problem...
>
> You say "as well" but I didn't say that I thought it was a swapping problem.
> I don't think there's enough information in the crash reports to draw any
> conclusions.

Ahh! Yes, I didn't read carefully, sorry.

We have the strings under our control, but they contain full URLs. Extra long URLs are possible, but nowhere near 2^32 characters; I believe we would OOM somewhere else before that.

This is a shutdown hang, so something simply takes too long. The code does two nested iterations: the top level is expected to be no more than three items, but the second level can be large, and for each item of the second level we do a lookup+delete in another, smaller hashtable (see the sketch below). The number of items in those hashtables can be one factor; another could be that memory is swapping on a heavily loaded system.

From the report it's hard to draw a conclusion, and in the past we were never able to determine the cause of similar reports. When there was an actionable problem, it was usually a bug like an indefinite or buggy loop. That's not likely in this case, since the code has not changed for a very long time.

> > Question is, would doing just a simple Clear() on the hashtable help? It
> > would also mean rethinking and rewriting some stuff inside the top layers
> > of cache2...
> >
> > Do you think using hashes (fast) as keys would help?
>
> I don't know what theoretical problem either of these suggestions would
> address.

I was thinking about lowering the memory footprint here and thus lowering the probability of this piece of code taking too long to execute during shutdown, but that's a wild guess only.
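To make the shape of that shutdown work concrete, a hedged, hypothetical sketch of the nested iteration described above (invented names, not the actual cache2 code):

    #include <string>
    #include <unordered_map>

    struct SketchEntry { /* placeholder */ };
    using SketchEntryTable = std::unordered_map<std::string, SketchEntry*>;
    using SketchStorageMap = std::unordered_map<std::string, SketchEntryTable>;

    // Outer loop: expected to be no more than ~3 storages.
    // Inner loop: can hold a large number of entries, and each one pays a
    // full hash of its URL key in the erase() on the other table. Under
    // memory pressure / swapping this inner loop can keep the shutdown
    // watchdog waiting long enough to fire.
    static void SketchDoomAllEntries(SketchStorageMap& aStorages,
                                     SketchEntryTable& aGlobalEntries) {
      for (auto& storage : aStorages) {
        for (auto& entry : storage.second) {
          aGlobalEntries.erase(entry.first);  // lookup+delete per entry
        }
        storage.second.clear();
      }
    }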
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Status: ASSIGNED → NEW
This has a low crash rate (~1.5 crashes a day) and is mostly seen only on ESR. About 30% of the crashes come from mozilla::net::CacheStorageService::DropPrivateBrowsingEntries() and the rest from cache2's Clear() at shutdown time. Both could be optimized to just drop everything instead of walking the in-memory entry tables entry by entry (see the sketch below).

Anyway, at the moment I'm returning this to the unassigned pool; I've been holding this bug for too long and won't have time to work on it soon.
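A hedged sketch of the "drop everything" idea (hypothetical names, not a proposed patch; real entries would still need their doom callbacks and refcount handling, which is why this is not a trivial swap):

    #include <string>
    #include <unordered_map>
    #include <vector>

    struct SketchEntry { /* placeholder */ };
    using SketchEntryTable = std::unordered_map<std::string, SketchEntry*>;

    // Per-entry walk, roughly the pattern implied by the stacks above:
    // one full URL-key hash + bucket search for every entry being dropped.
    static void SketchDoomEntryByEntry(SketchEntryTable& aEntries,
                                       const std::vector<std::string>& aKeys) {
      for (const auto& key : aKeys) {
        aEntries.erase(key);
      }
    }

    // Drop-all alternative: a single bulk teardown, no per-key hashing.
    static void SketchDropAll(SketchEntryTable& aEntries) {
      aEntries.clear();
    }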
Assignee: honzab.moz → nobody

Seems this is gone - no crashes in the past month other than 52.xx ESR.

Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME