Closed
Bug 1132502
Opened 9 years ago
Closed 8 years ago
crash in js::detail::HashTable<T>::lookupForAdd
Categories
(Core :: JavaScript Engine, defect, P1)
Tracking
()
People
(Reporter: alex_mayorga, Assigned: n.nethercote)
References
Details
(Keywords: crash, Whiteboard: [caf priority: p2][CR 1004423])
Crash Data
Attachments
(2 files)
11.76 KB,
patch
|
jandem
:
review+
ritu
:
approval-mozilla-beta+
n.nethercote
:
checkin+
|
Details | Diff | Splinter Review |
4.23 KB,
patch
|
jandem
:
review+
ritu
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-67343b27-8079-489a-a04f-6c2972150212. =============================================================
Updated•9 years ago
|
Component: General → JavaScript Engine
Product: Firefox → Core
Reporter | ||
Comment 1•9 years ago
|
||
bp-27e8a12a-fea7-401c-ba97-2ac602150320 20/03/2015 02:03 p.m.
status-firefox36:
--- → affected
status-firefox37:
--- → affected
Updated•9 years ago
|
Crash Signature: [@ js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&)] → [@ js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&)]
[@ js::detail::HashTabl…
Comment hidden (spam) |
Updated•9 years ago
|
status-firefox41:
--- → affected
status-firefox42:
--- → affected
Priority: -- → P1
Version: 36 Branch → 42 Branch
Reporter | ||
Comment 3•8 years ago
|
||
¡Hola! There were 743 crashes in the past week per https://crash-stats.mozilla.com/report/list?product=Firefox&signature=js%3A%3Adetail%3A%3AHashTable%3CT%3E%3A%3AlookupForAdd bp-f3a3a531-609f-4933-bf1e-64b6f2160216 16/02/2016 03:52 p.m. Crashing Thread (0) Frame Module Signature Source 0 xul.dll js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&) js/public/HashTable.h 1 xul.dll AddClassInfo js/src/vm/MemoryMetrics.cpp 2 xul.dll StatsCellCallback<0> js/src/vm/MemoryMetrics.cpp 3 xul.dll IterateCompartmentsArenasCells js/src/gc/Iteration.cpp 4 xul.dll js::IterateZonesCompartmentsArenasCells(JSRuntime*, void*, void (*)(JSRuntime*, void*, JS::Zone*), void (*)(JSRuntime*, void*, JSCompartment*), void (*)(JSRuntime*, void*, js::gc::Arena*, JS::TraceKind, unsigned int), void (*)(JSRuntime*, void*, void*, JS::TraceKind, unsigned int)) js/src/gc/Iteration.cpp 5 xul.dll StatsCompartmentCallback js/src/vm/MemoryMetrics.cpp 6 xul.dll nsTHashtable<nsISupportsHashKey>::nsTHashtable<nsISupportsHashKey>() xpcom/glue/nsTHashtable.h 7 xul.dll xpc::JSReporter::CollectReports(nsDataHashtable<nsUint64HashKey, nsCString>*, nsDataHashtable<nsUint64HashKey, nsCString>*, nsIMemoryReporterCallback*, nsISupports*, bool) js/xpconnect/src/XPCJSRuntime.cpp 8 xul.dll nsWindowMemoryReporter::CollectReports(nsIMemoryReporterCallback*, nsISupports*, bool) dom/base/nsWindowMemoryReporter.cpp
status-firefox44:
--- → affected
status-firefox45:
--- → affected
status-firefox46:
--- → affected
status-firefox47:
--- → affected
Comment 4•8 years ago
|
||
When i look at mozilla crash reporter/other reports/reports, there are about 75% of those errors that happen at address 0xffffffffe5e5e5e5 Doesn't that information help in pinpointing the faulty code location?
Updated•8 years ago
|
Flags: needinfo?(nihsanullah)
Comment 5•8 years ago
|
||
Terrence this looks similar to https://bugzilla.mozilla.org/show_bug.cgi?id=1189934 and maybe https://bugzilla.mozilla.org/show_bug.cgi?id=1180954
Flags: needinfo?(nihsanullah) → needinfo?(terrence)
Comment 6•8 years ago
|
||
Any sort of heap corruption appears likely to hit this signature, so it is probably unrelated if there is a new spike.
Comment 7•8 years ago
|
||
If we expand to look at the 28 day table view, we can see that the 2016031515 build has 2521 crashes, where all the other builds are metastable between 1-4 crashes and tens to a few hundred. I think this massive spike is almost certainly an isolated incident that has already been fixed. The smaller spikes are probably in the same category for less noticeable bugs. Clearing the ni? as there's nothing else I can say here without STR.
Flags: needinfo?(terrence)
Comment 8•8 years ago
|
||
One of my desktops gets this crash about once per day, is there something i can do to help diagnose the issue?
Comment 9•8 years ago
|
||
(In reply to bugzilla.m7 from comment #8) > One of my desktops gets this crash about once per day, is there something i > can do to help diagnose the issue? Yes, there is. We suspect that some large portion of these random-looking crashes are caused by the roughly 8%[1] RAM failure rate. If you could run memtest86 overnight, that might give us some data to say whether that is likely cause or not. Thanks! 1- http://www.zdnet.com/article/dram-error-rates-nightmare-on-dimm-street/
Comment 10•8 years ago
|
||
Hmm, don't you think that i would get BSOD if i had faulty memory? and why would firefox be the only app to crash? and the fact that 75% of crashes are at address 0xffffffffe5e5e5e5 ? does it not mean that this is not a random crash?
Comment 11•8 years ago
|
||
(In reply to bugzilla.m7 from comment #10) > Hmm, don't you think that i would get BSOD if i had faulty memory? Not necessarily. We know a number of cases where people had crashes in GC code with varying signatures where faulty RAM was actually the issue. That said, if you always get this signature, I'd think it to be less likely. > and the fact that 75% of crashes are at address 0xffffffffe5e5e5e5 ? That actually makes me wonder more. The leading ffff... may be bogus, the e5e5... pattern points to memory that we have previously freed and that should not happen that consistently. There still could be some interaction with a graphics driver or some other 3rd-party (e.g. security) software running in our process that could play a role there. Anything you know that makes this machines different from others?
Comment 12•8 years ago
|
||
> Anything you know that makes this machines different from others? Well, i checked the configurations here: https://crash-stats.mozilla.com/report/list?signature=js%3A%3Adetail%3A%3AHashTable%3CT%3E%3A%3AlookupForAdd&range_value=7&range_unit=days&date=2016-04-14#tab-reports and it seems to happen to all sorts of OSSes, ffox versions... The only pattern i could find was this strange address 0xffffffffe5e5e5e5 common to 70% of all reports (not only mine), 13 of 19 pages of report during the last 28 days have this address. FYI the anti-virus on the offending PC is the free version of avast. PS: is there a way to get a csv of the reports data? i could run some analysis on it and check for other correlations
Comment 13•8 years ago
|
||
> 0xffffffffe5e5e5e5
Looks like jemalloc freed memory poison value that has been sign extended.
Comment 15•8 years ago
|
||
(In reply to bugzilla.m7 from comment #10) > Hmm, don't you think that i would get BSOD if i had faulty memory? and why > would firefox be the only app to crash? and the fact that 75% of crashes are > at address 0xffffffffe5e5e5e5 ? does it not mean that this is not a random > crash? Sorry, I think I got unlucky and happened to only see crashes without the jemalloc free pattern. That said, it's entirely possible to not get a BSOD with bad RAM. Most core OS software is written in C/C++, which is comparatively sparse in pointers, has a fairly fixed static size, and doesn't walk the heap continuously. Comparatively, firefox is huge (and hugely dynamic), mostly pointers, and continuously touching all of them.
Comment 16•8 years ago
|
||
I looked at 3 different crash reports (I picked random FF 45.0.1 reports with high values for uptime and last crash). We're crashing in StatsCellCallback because we have a BaseShape with clasp_->name == 0xe5e5e5e5. That's a bit weird - if we had a bogus BaseShape, I'd expect us to crash accessing clasp_ instead of clasp_->name. Most Classes have a statically-allocated name, but we do have this weird XPCNativeScriptableShared thing that manufactures Classes on the fly. Maybe that's causing problems somehow - shouldn't be too hard to add instrumentation to confirm that.
Comment 17•8 years ago
|
||
njn, see comment 16. You did some JSClass work recently and I vaguely remember you running into crashy BaseShape stuff in this code.
Comment 18•8 years ago
|
||
After reading more XPConnect code I'm pretty sure this is indeed an XPCNativeScriptableShared Class. I looked at ~30 crash reports and they all have (non-standard) add-ons. Question is what's the best way to fix it. We could set a flag on the BaseShape if clasp_->isWrappedNative(). Then we know it's unsafe to access clasp_, because it might be dynamically allocated. It's not very elegant and there's a risk of this footgun showing up elsewhere (like ObjectGroup::clasp_), but it gets the job done without adding a lot of complexity.
Updated•8 years ago
|
Summary: crash in js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&) → crash in js::detail::HashTable<T>::lookupForAdd
Assignee | ||
Comment 19•8 years ago
|
||
Jan: thank you for the extra info. I agree that this doesn't look like generic heap corruption or faulty hardware. It's much too regular for that. We also have bug 1243529 which is an intermittent orange that looks like it has the same root cause. I had a look at three minidumps yesterday and found some interesting stuff; I'll now take a look at what Jan found and see how that fits into the puzzle.
Assignee: nobody → n.nethercote
Blocks: 1243529
Assignee | ||
Comment 20•8 years ago
|
||
The code in StatsCellCallback() looks like this: > case JS::TraceKind::BaseShape: { > BaseShape* base = static_cast<BaseShape*>(thing); > CompartmentStats* cStats = GetCompartmentStats(base->compartment()); > > JS::ClassInfo info; // This zeroes all the sizes. > info.shapesGCHeapBase += thingSize; > // No malloc-heap measurements. > > cStats->classInfo.add(info); > > const Class* clasp = base->clasp(); > const char* className = clasp->name; > AddClassInfo(granularity, cStats, className, info); > break; > } AddClassInfo() looks like this: > static void > AddClassInfo(Granularity granularity, CompartmentStats* cStats, const char* className, > JS::ClassInfo& info) > { > if (granularity == FineGrained) { > if (!className) > className = "<no class name>"; > CompartmentStats::ClassesHashMap::AddPtr p = > cStats->allClasses->lookupForAdd(className); // CRASH! > if (!p) { > bool ok = cStats->allClasses->add(p, className, info); > // Ignore failure -- we just won't record the > // object/shape/base-shape as notable. > (void)ok; > } else { > p->value().add(info); > } > } > } In the three minidumps I looked at, in AddInfo(): - cStats was nullptr - className was something bogus (I saw 0x6, 0x8a, 0xe5e5e5e5) It looks like the crash is actually occurring when trying to hash the className, which is unsurprising, given that className is bogus. If cStats is truly null, we probably would have crashed earlier, in the |cStats->classInfo.add(info);| call, or earlier in the |cStats->allClasses->lookupForAdd()| sequence. Furthermore, based on how the memory reporting works, I can't see how cStats could be null. So I don't really trust the cStats==nullptr claim, though I will change GetCompartmentStats() to check that. That way, if it really is nullptr, we'll start getting a new crash signature with a clearer cause. Now, onto Jan's theory about XPCNativeScriptableShared. If true, then we must have some BaseShapes with a clasp that points to a js::Class in XPCNativeScriptShared that has been freed, which seems dangerous in general. Why doesn't this cause problems elsewhere? Perhaps those BaseShapes are unreachable and so not touched by "normal" execution, but memory reporting does touch them because it touches every GC cell whether reachable or not. I asked mrbkap and he thought this was plausible. So I'll try to write a second patch that adds a flag indicating if a BaseShape points to a heap-allocated class. I also wonder if this flag should be consulted when measuring objects and shapes, which are handled similarly.
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 21•8 years ago
|
||
We have inconclusive evidence that compartmentStats is sometimes nullptr during memory reporting, which would be bad. This patch makes us abort in that case. It also changes some pointers to references to make the expected non-nullness clearer.
Attachment #8741591 -
Flags: review?(jdemooij)
Assignee | ||
Comment 22•8 years ago
|
||
I looked into adding the flag to BaseShape. But it was a bit painful, and I wouldn't have any way of testing it. Furthermore, bholley says that XPCNativeScriptableShared is likely to be removed reasonably soon. So I went with the simpler option of just skipping this code for all BaseShapes. The consequences are minor and are described in a comment in the patch.
Attachment #8741637 -
Flags: review?(jdemooij)
Comment 23•8 years ago
|
||
Comment on attachment 8741591 [details] [diff] [review] (part 1) - Abort if compartmentStats is null during memory reporting Review of attachment 8741591 [details] [diff] [review]: ----------------------------------------------------------------- FWIW, did Visual Studio say it was nullptr? I don't really trust its "Locals" window when debugging opt builds - when in doubt I usually try to infer stuff from the code and the stack (the return addresses on the stack are useful to figure out where we are). This is a nice refactoring no matter what, though. ::: js/src/jscompartment.h @@ +764,5 @@ > + > + public: > + // This should only be called when it is non-null, i.e. during memory > + // reporting. > + JS::CompartmentStats& compartmentStats() { Nit: To avoid unexpected copies, maybe delete CompartmentStat's copy constructor? CompartmentStats(const CompartmentStats& other) = delete;
Attachment #8741591 -
Flags: review?(jdemooij) → review+
Comment 24•8 years ago
|
||
Comment on attachment 8741637 [details] [diff] [review] (part 2) - Don't call AddClassInfo() for BaseShapes Review of attachment 8741637 [details] [diff] [review]: ----------------------------------------------------------------- Thanks for taking this bug. The sooner these fake Classes go away the better, it's too easy to assume Class pointers stay valid. ::: js/src/vm/MemoryMetrics.cpp @@ +534,5 @@ > + // non-trivial change, and heap-allocated js::Class instances are > + // likely to go away soon. > + // > + // So for now we just skip this code for all BaseShapes. The > + // consequence is that means that all BaseShapes will show up in Nit: s/that means//
Attachment #8741637 -
Flags: review?(jdemooij) → review+
Comment 25•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #20) > I also wonder if this flag should be > consulted when measuring objects and shapes, which are handled similarly. Objects are probably fine, because as long as we have an object with this class, it seems the XPConnect code will keep the Class alive. When the object is dead/swept, I think the iteration code here shouldn't visit it anymore. It seems we'll need something similar for Shapes though? I can imagine JIT caches etc holding onto a shape even though there are no live objects with that shape..
Updated•8 years ago
|
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 26•8 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #23) > > FWIW, did Visual Studio say it was nullptr? Yes, for three separate minidumps. > ::: js/src/jscompartment.h > @@ +764,5 @@ > > + > > + public: > > + // This should only be called when it is non-null, i.e. during memory > > + // reporting. > > + JS::CompartmentStats& compartmentStats() { > > Nit: To avoid unexpected copies, maybe delete CompartmentStat's copy > constructor? > > CompartmentStats(const CompartmentStats& other) = delete; compartmentStats() returns a reference. Is there a risk of a copy occurring? I would surely hope not...
Flags: needinfo?(n.nethercote)
Assignee | ||
Comment 27•8 years ago
|
||
> It seems we'll need something similar for Shapes though? I can imagine JIT
> caches etc holding onto a shape even though there are no live objects with
> that shape..
I haven't seen any crashes for Shapes. Also, Shapes use a lot more memory than BaseShapes, so losing the per-class reporting would be a shame. So for the moment I will just do this for BaseShapes.
Assignee | ||
Comment 29•8 years ago
|
||
(In reply to Nicholas Nethercote [:njn] from comment #27) > > I haven't seen any crashes for Shapes. Also, Shapes use a lot more memory > than BaseShapes, so losing the per-class reporting would be a shame. So for > the moment I will just do this for BaseShapes. I just saw https://crash-stats.mozilla.com/report/index/e2d28f14-f41d-495c-acd1-be88d2150731 from bug 1189934 and it *is* a Shape crash. Bleh.
Assignee | ||
Comment 31•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/223a5febb34f8dfdb326d15e4736c82d51a43811 Bug 1132502 (part 1) - Abort if compartmentStats is null during memory reporting. r=jandem.
Assignee | ||
Updated•8 years ago
|
Keywords: leave-open
Assignee | ||
Updated•8 years ago
|
Attachment #8741591 -
Flags: checkin+
Updated•8 years ago
|
Blocks: CAF-v2.2-metabug
Updated•8 years ago
|
Whiteboard: [CR 1004423]
Updated•8 years ago
|
Whiteboard: [CR 1004423] → [caf priority: p2][CR 1004423]
Comment 32•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/223a5febb34f
Assignee | ||
Comment 33•8 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/ffb1d08f2bd00a9490f76c89c58325e32c199066 Bug 1132502 (part 2) - Don't call AddClassInfo() for BaseShapes. r=jandem.
Comment 34•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ffb1d08f2bd0
Assignee | ||
Comment 35•8 years ago
|
||
Comment on attachment 8741591 [details] [diff] [review] (part 1) - Abort if compartmentStats is null during memory reporting Approval Request Comment [Feature/regressing bug #]: N/A. [User impact if declined]: Crashes when looking at about:memory. [Describe test coverage new/current, TreeHerder]: No specific tests. Has been on trunk for a week. [Risks and why]: These two patches are aimed at addressing a moderately frequent crash. The first patch is very simple, and just adds a null check that will crash on failure, which is mostly for diagnostic purposes. The second patch is slightly more complex, but just disables a particular path in memory reporting that is crashy. (In case it's not clear, I want this uplifted to 47, which may be Beta by the time this request is assessed. I don't want it uplifted to 46.) [String/UUID change made/needed]: none.
Attachment #8741591 -
Flags: approval-mozilla-beta?
Assignee | ||
Updated•8 years ago
|
Attachment #8741637 -
Flags: approval-mozilla-beta?
Comment on attachment 8741591 [details] [diff] [review] (part 1) - Abort if compartmentStats is null during memory reporting Crash fix that has stabilized in Nightly for a week, Beta47+
Attachment #8741591 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
status-firefox48:
--- → fixed
Attachment #8741637 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 37•8 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/c1689b70c98e https://hg.mozilla.org/releases/mozilla-beta/rev/df8d482a008d
Assignee | ||
Updated•8 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•