Closed Bug 1132502 Opened 9 years ago Closed 8 years ago

crash in js::detail::HashTable<T>::lookupForAdd

Categories

(Core :: JavaScript Engine, defect, P1)

42 Branch
x86
Windows NT
defect

Tracking


RESOLVED FIXED
Tracking Status
firefox36 --- affected
firefox37 --- affected
firefox41 --- affected
firefox42 --- affected
firefox44 --- affected
firefox45 --- affected
firefox46 --- affected
firefox47 --- fixed
firefox48 --- fixed

People

(Reporter: alex_mayorga, Assigned: n.nethercote)

References

Details

(Keywords: crash, Whiteboard: [caf priority: p2][CR 1004423])

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is 
report bp-67343b27-8079-489a-a04f-6c2972150212.
=============================================================
Component: General → JavaScript Engine
Product: Firefox → Core
See Also: → 1180954
Crash Signature: [@ js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&)] → [@ js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&)] [@ js::detail::HashTabl…
Priority: -- → P1
Version: 36 Branch → 42 Branch
¡Hola!

There were 743 crashes in the past week per https://crash-stats.mozilla.com/report/list?product=Firefox&signature=js%3A%3Adetail%3A%3AHashTable%3CT%3E%3A%3AlookupForAdd

bp-f3a3a531-609f-4933-bf1e-64b6f2160216
	16/02/2016	03:52 p.m.

Crashing Thread (0)
Frame 	Module 	Signature 	Source
0 	xul.dll 	js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&) 	js/public/HashTable.h
1 	xul.dll 	AddClassInfo 	js/src/vm/MemoryMetrics.cpp
2 	xul.dll 	StatsCellCallback<0> 	js/src/vm/MemoryMetrics.cpp
3 	xul.dll 	IterateCompartmentsArenasCells 	js/src/gc/Iteration.cpp
4 	xul.dll 	js::IterateZonesCompartmentsArenasCells(JSRuntime*, void*, void (*)(JSRuntime*, void*, JS::Zone*), void (*)(JSRuntime*, void*, JSCompartment*), void (*)(JSRuntime*, void*, js::gc::Arena*, JS::TraceKind, unsigned int), void (*)(JSRuntime*, void*, void*, JS::TraceKind, unsigned int)) 	js/src/gc/Iteration.cpp
5 	xul.dll 	StatsCompartmentCallback 	js/src/vm/MemoryMetrics.cpp
6 	xul.dll 	nsTHashtable<nsISupportsHashKey>::nsTHashtable<nsISupportsHashKey>() 	xpcom/glue/nsTHashtable.h
7 	xul.dll 	xpc::JSReporter::CollectReports(nsDataHashtable<nsUint64HashKey, nsCString>*, nsDataHashtable<nsUint64HashKey, nsCString>*, nsIMemoryReporterCallback*, nsISupports*, bool) 	js/xpconnect/src/XPCJSRuntime.cpp
8 	xul.dll 	nsWindowMemoryReporter::CollectReports(nsIMemoryReporterCallback*, nsISupports*, bool) 	dom/base/nsWindowMemoryReporter.cpp
When I look at the Mozilla Crash Reporter (Other Reports / Reports), about 75% of those crashes happen at address 0xffffffffe5e5e5e5.

Doesn't that information help in pinpointing the faulty code location?
Flags: needinfo?(nihsanullah)
Terrence this looks similar to 

https://bugzilla.mozilla.org/show_bug.cgi?id=1189934 and maybe https://bugzilla.mozilla.org/show_bug.cgi?id=1180954
Flags: needinfo?(nihsanullah) → needinfo?(terrence)
Any sort of heap corruption appears likely to hit this signature, so it is probably unrelated if there is a new spike.
If we expand to the 28-day table view, we can see that the 2016031515 build has 2521 crashes, whereas all the other builds hold steady at anywhere from 1-4 crashes up to a few hundred. I think this massive spike is almost certainly an isolated incident that has already been fixed. The smaller spikes are probably in the same category, for less noticeable bugs.

Clearing the ni? as there's nothing else I can say here without STR.
Flags: needinfo?(terrence)
One of my desktops gets this crash about once per day. Is there something I can do to help diagnose the issue?
(In reply to bugzilla.m7 from comment #8)
> One of my desktops gets this crash about once per day. Is there something I
> can do to help diagnose the issue?

Yes, there is. We suspect that a large portion of these random-looking crashes are caused by the roughly 8%[1] RAM failure rate. If you could run memtest86 overnight, that might give us some data on whether that is the likely cause. Thanks!

1- http://www.zdnet.com/article/dram-error-rates-nightmare-on-dimm-street/
Hmm, don't you think that I would get a BSOD if I had faulty memory? And why would Firefox be the only app to crash? And doesn't the fact that 75% of crashes are at address 0xffffffffe5e5e5e5 mean that this is not a random crash?
(In reply to bugzilla.m7 from comment #10)
> Hmm, don't you think that I would get a BSOD if I had faulty memory?

Not necessarily. We know of a number of cases where people had crashes in GC code with varying signatures and faulty RAM was actually the issue. That said, if you always get this exact signature, I'd think that less likely.

> And doesn't the fact that 75% of crashes are at address 0xffffffffe5e5e5e5 mean that this is not a random crash?

That actually makes me wonder more. The leading ffff... may be bogus, but the e5e5... pattern points to memory that we have previously freed, and that should not happen this consistently. There could still be some interaction with a graphics driver or other third-party (e.g. security) software running in our process that plays a role here. Is there anything you know of that makes this machine different from others?
> Is there anything you know of that makes this machine different from others?

Well, I checked the configurations here:
https://crash-stats.mozilla.com/report/list?signature=js%3A%3Adetail%3A%3AHashTable%3CT%3E%3A%3AlookupForAdd&range_value=7&range_unit=days&date=2016-04-14#tab-reports

and it seems to happen on all sorts of OSes and Firefox versions...

The only pattern I could find was this strange address 0xffffffffe5e5e5e5, common to 70% of all reports (not only mine); 13 of 19 pages of reports during the last 28 days have this address.

FYI, the anti-virus on the offending PC is the free version of Avast.

PS: Is there a way to get a CSV of the report data? I could run some analysis on it and check for other correlations.
> 0xffffffffe5e5e5e5

Looks like the jemalloc freed-memory poison value (the 0xe5 fill pattern) that has been sign-extended.
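
For reference, a minimal standalone sketch (illustrative only, not from any crash dump) of how the 32-bit 0xe5e5e5e5 poison pattern sign-extends to the 0xffffffffe5e5e5e5 address seen in these reports:

#include <cstdint>
#include <cstdio>

int main() {
    // jemalloc fills freed memory with 0xe5 bytes, so a 32-bit pointer read
    // from freed memory comes back as 0xe5e5e5e5.
    int32_t poisoned = int32_t(0xe5e5e5e5);   // negative as a signed value
    // Widening the signed value to 64 bits sign-extends it, producing the
    // 0xffffffffe5e5e5e5 address reported by the crash reporter.
    int64_t widened = poisoned;
    printf("0x%016llx\n", (unsigned long long)widened);  // 0xffffffffe5e5e5e5
    return 0;
}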
I can take a quick look at some crash dumps.
Flags: needinfo?(jdemooij)
(In reply to bugzilla.m7 from comment #10)
> Hmm, don't you think that I would get a BSOD if I had faulty memory? And why
> would Firefox be the only app to crash? And doesn't the fact that 75% of
> crashes are at address 0xffffffffe5e5e5e5 mean that this is not a random
> crash?

Sorry, I think I got unlucky and happened to only see crashes without the jemalloc free pattern.

That said, it's entirely possible to not get a BSOD with bad RAM. Most core OS software is written in C/C++ that is comparatively sparse in pointers, has a fairly fixed static footprint, and doesn't walk the heap continuously. By comparison, Firefox is huge (and hugely dynamic), mostly pointers, and continuously touching all of them.
I looked at 3 different crash reports (I picked random FF 45.0.1 reports with high values for uptime and last crash).

We're crashing in StatsCellCallback because we have a BaseShape with clasp_->name == 0xe5e5e5e5. That's a bit weird - if we had a bogus BaseShape, I'd expect us to crash accessing clasp_ instead of clasp_->name.

Most Classes have a statically-allocated name, but we do have this weird XPCNativeScriptableShared thing that manufactures Classes on the fly. Maybe that's causing problems somehow - shouldn't be too hard to add instrumentation to confirm that.
njn, see comment 16. You did some JSClass work recently and I vaguely remember you running into crashy BaseShape stuff in this code.
After reading more XPConnect code I'm pretty sure this is indeed an XPCNativeScriptableShared Class. I looked at ~30 crash reports and they all have (non-standard) add-ons.

The question is what's the best way to fix it. We could set a flag on the BaseShape if clasp_->isWrappedNative(); then we'd know it's unsafe to access clasp_, because it might be dynamically allocated.

It's not very elegant and there's a risk of this footgun showing up elsewhere (like ObjectGroup::clasp_), but it gets the job done without adding a lot of complexity.
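
A hypothetical sketch of that flag idea (all names here are invented for illustration and are not the real SpiderMonkey API):

struct Class { const char* name; };

class BaseShape {
    const Class* clasp_;
    uint32_t flags_;

  public:
    static const uint32_t HEAP_ALLOCATED_CLASS = 1 << 0;

    BaseShape(const Class* clasp, bool heapAllocatedClass)
      : clasp_(clasp), flags_(heapAllocatedClass ? HEAP_ALLOCATED_CLASS : 0)
    {}

    // Memory reporting would consult the flag instead of blindly reading
    // clasp_->name, which may point into freed memory for dynamic Classes.
    const char* classNameForReporting() const {
        return (flags_ & HEAP_ALLOCATED_CLASS) ? "<dynamic class>" : clasp_->name;
    }
};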
Summary: crash in js::detail::HashTable<js::HashMapEntry<char const*, JS::ClassInfo>, js::HashMap<char const*, JS::ClassInfo, js::CStringHashPolicy, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::lookupForAdd(char const* const&) → crash in js::detail::HashTable<T>::lookupForAdd
Jan: thank you for the extra info.

I agree that this doesn't look like generic heap corruption or faulty hardware. It's much too regular for that. We also have bug 1243529 which is an intermittent orange that looks like it has the same root cause. I had a look at three minidumps yesterday and found some interesting stuff; I'll now take a look at what Jan found and see how that fits into the puzzle.
Assignee: nobody → n.nethercote
Blocks: 1243529
The code in StatsCellCallback() looks like this:

> case JS::TraceKind::BaseShape: {
>   BaseShape* base = static_cast<BaseShape*>(thing);
>   CompartmentStats* cStats = GetCompartmentStats(base->compartment());
> 
>   JS::ClassInfo info;        // This zeroes all the sizes.
>   info.shapesGCHeapBase += thingSize;
>   // No malloc-heap measurements.
> 
>   cStats->classInfo.add(info);
> 
>   const Class* clasp = base->clasp();
>   const char* className = clasp->name;
>   AddClassInfo(granularity, cStats, className, info);
>   break;
> }

AddClassInfo() looks like this:

> static void
> AddClassInfo(Granularity granularity, CompartmentStats* cStats, const char* className,
>              JS::ClassInfo& info)
> {
>     if (granularity == FineGrained) {
>         if (!className)
>             className = "<no class name>";
>         CompartmentStats::ClassesHashMap::AddPtr p =
>             cStats->allClasses->lookupForAdd(className);      // CRASH!
>         if (!p) {
>             bool ok = cStats->allClasses->add(p, className, info);
>             // Ignore failure -- we just won't record the
>             // object/shape/base-shape as notable.
>             (void)ok;
>         } else {
>             p->value().add(info);
>         }
>     }
> }

In the three minidumps I looked at, in AddClassInfo():
- cStats was nullptr
- className was something bogus (I saw 0x6, 0x8a, 0xe5e5e5e5)

It looks like the crash is actually occurring when trying to hash the
className, which is unsurprising, given that className is bogus. If cStats is
truly null, we probably would have crashed earlier, in the
|cStats->classInfo.add(info);| call, or earlier in the
|cStats->allClasses->lookupForAdd()| sequence. Furthermore, based on how the
memory reporting works, I can't see how cStats could be null.

So I don't really trust the cStats==nullptr claim, though I will change
GetCompartmentStats() to check that. That way, if it really is nullptr, we'll
start getting a new crash signature with a clearer cause.
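
As an aside, a simplified sketch (not the real js::CStringHashPolicy) of why the fault surfaces inside lookupForAdd(): the hash policy reads the C string's bytes before any table memory is touched, so a bogus className like 0x6, 0x8a, or 0xe5e5e5e5 faults during hashing.

#include <stdint.h>

static uint32_t HashCString(const char* s) {
    uint32_t h = 0;
    for (; *s; s++)                  // first read through the bad pointer
        h = h * 33 + uint8_t(*s);    // djb2-style mixing, for illustration
    return h;
}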

Now, onto Jan's theory about XPCNativeScriptableShared. If true, then we must
have some BaseShapes with a clasp that points to a js::Class in
XPCNativeScriptableShared that has been freed, which seems dangerous in general.
Why doesn't this cause problems elsewhere? Perhaps those BaseShapes are
unreachable and so not touched by "normal" execution, but memory reporting does
touch them because it touches every GC cell whether reachable or not. I asked
mrbkap and he thought this was plausible.

So I'll try to write a second patch that adds a flag indicating if a BaseShape
points to a heap-allocated class. I also wonder if this flag should be
consulted when measuring objects and shapes, which are handled similarly.
Flags: needinfo?(jdemooij)
We have inconclusive evidence that compartmentStats is sometimes nullptr during
memory reporting, which would be bad. This patch makes us abort in that case.
It also changes some pointers to references to make the expected non-nullness
clearer.
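
A rough standalone sketch of that shape of change (types simplified; the real patch is against JSCompartment in js/src/jscompartment.h):

#include <cassert>

struct CompartmentStats { /* ... measured sizes ... */ };

struct Compartment {
    CompartmentStats* compartmentStats_ = nullptr;

    // Should only be called during memory reporting, when the stats object
    // is attached. The real patch aborts even in release builds so a null
    // pointer yields its own clear crash signature.
    CompartmentStats& compartmentStats() {
        assert(compartmentStats_);
        return *compartmentStats_;
    }
};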
Attachment #8741591 - Flags: review?(jdemooij)
I looked into adding the flag to BaseShape. But it was a bit painful, and I
wouldn't have any way of testing it. Furthermore, bholley says that
XPCNativeScriptableShared is likely to be removed reasonably soon. So I went
with the simpler option of just skipping this code for all BaseShapes. The
consequences are minor and are described in a comment in the patch.
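
In rough outline (mirroring the StatsCellCallback() snippet quoted in comment 20, simplified here), the skip looks like this:

case JS::TraceKind::BaseShape: {
    BaseShape* base = static_cast<BaseShape*>(thing);
    CompartmentStats& cStats = GetCompartmentStats(base->compartment());

    JS::ClassInfo info;              // zeroes all the sizes
    info.shapesGCHeapBase += thingSize;
    cStats.classInfo.add(info);

    // Deliberately no clasp()/AddClassInfo() here: the class pointer may
    // dangle, so all BaseShapes land in the coarse, class-less bucket.
    break;
}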
Attachment #8741637 - Flags: review?(jdemooij)
Comment on attachment 8741591 [details] [diff] [review]
(part 1) - Abort if compartmentStats is null during memory reporting

Review of attachment 8741591 [details] [diff] [review]:
-----------------------------------------------------------------

FWIW, did Visual Studio say it was nullptr? I don't really trust its "Locals" window when debugging opt builds - when in doubt I usually try to infer stuff from the code and the stack (the return addresses on the stack are useful to figure out where we are).

This is a nice refactoring no matter what, though.

::: js/src/jscompartment.h
@@ +764,5 @@
> +
> +  public:
> +    // This should only be called when it is non-null, i.e. during memory
> +    // reporting.
> +    JS::CompartmentStats& compartmentStats() {

Nit: To avoid unexpected copies, maybe delete CompartmentStats's copy constructor?

CompartmentStats(const CompartmentStats& other) = delete;
Attachment #8741591 - Flags: review?(jdemooij) → review+
Comment on attachment 8741637 [details] [diff] [review]
(part 2) - Don't call AddClassInfo() for BaseShapes

Review of attachment 8741637 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for taking this bug.

The sooner these fake Classes go away the better; it's too easy to assume Class pointers stay valid.

::: js/src/vm/MemoryMetrics.cpp
@@ +534,5 @@
> +        // non-trivial change, and heap-allocated js::Class instances are
> +        // likely to go away soon.
> +        //
> +        // So for now we just skip this code for all BaseShapes. The
> +        // consequence is that means that all BaseShapes will show up in

Nit: s/that means//
Attachment #8741637 - Flags: review?(jdemooij) → review+
(In reply to Nicholas Nethercote [:njn] from comment #20)
> I also wonder if this flag should be
> consulted when measuring objects and shapes, which are handled similarly.

Objects are probably fine, because as long as we have an object with this class, it seems the XPConnect code will keep the Class alive. When the object is dead/swept, I think the iteration code here shouldn't visit it anymore.

It seems we'll need something similar for Shapes, though? I can imagine JIT caches etc. holding onto a Shape even though there are no live objects with that shape...
Flags: needinfo?(n.nethercote)
(In reply to Jan de Mooij [:jandem] from comment #23)
> 
> FWIW, did Visual Studio say it was nullptr?

Yes, for three separate minidumps.


> ::: js/src/jscompartment.h
> @@ +764,5 @@
> > +
> > +  public:
> > +    // This should only be called when it is non-null, i.e. during memory
> > +    // reporting.
> > +    JS::CompartmentStats& compartmentStats() {
> 
> Nit: To avoid unexpected copies, maybe delete CompartmentStat's copy
> constructor?
> 
> CompartmentStats(const CompartmentStats& other) = delete;

compartmentStats() returns a reference. Is there a risk of a copy occurring? I would surely hope not...
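
For what it's worth, the copy risk with a reference return is `auto` deduction on the caller side (comp here is a hypothetical caller, not code from the patch):

JS::CompartmentStats& ref = comp.compartmentStats();  // fine: no copy
auto copy = comp.compartmentStats();                  // auto deduces a value: copies!
// With CompartmentStats(const CompartmentStats&) = delete; the second line
// becomes a compile error instead of a silent copy.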
Flags: needinfo?(n.nethercote)
> It seems we'll need something similar for Shapes though? I can imagine JIT
> caches etc holding onto a shape even though there are no live objects with
> that shape..

I haven't seen any crashes for Shapes. Also, Shapes use a lot more memory than BaseShapes, so losing the per-class reporting would be a shame. So for the moment I will just do this for BaseShapes.
(In reply to Nicholas Nethercote [:njn] from comment #27)
> 
> I haven't seen any crashes for Shapes. Also, Shapes use a lot more memory
> than BaseShapes, so losing the per-class reporting would be a shame. So for
> the moment I will just do this for BaseShapes.

I just saw https://crash-stats.mozilla.com/report/index/e2d28f14-f41d-495c-acd1-be88d2150731 from bug 1189934 and it *is* a Shape crash. Bleh.
https://hg.mozilla.org/integration/mozilla-inbound/rev/223a5febb34f8dfdb326d15e4736c82d51a43811
Bug 1132502 (part 1) - Abort if compartmentStats is null during memory reporting. r=jandem.
Keywords: leave-open
Attachment #8741591 - Flags: checkin+
Whiteboard: [CR 1004423]
Whiteboard: [CR 1004423] → [caf priority: p2][CR 1004423]
Comment on attachment 8741591 [details] [diff] [review]
(part 1) - Abort if compartmentStats is null during memory reporting

Approval Request Comment

[Feature/regressing bug #]: N/A.

[User impact if declined]: Crashes when looking at about:memory.

[Describe test coverage new/current, TreeHerder]: No specific tests. Has been on trunk for a week.

[Risks and why]: These two patches are aimed at addressing a moderately frequent crash. The first patch is very simple, and just adds a null check that will crash on failure, which is mostly for diagnostic purposes. The second patch is slightly more complex, but just disables a particular path in memory reporting that is crashy. (In case it's not clear, I want this uplifted to 47, which may be Beta by the time this request is assessed. I don't want it uplifted to 46.)

[String/UUID change made/needed]: none.
Attachment #8741591 - Flags: approval-mozilla-beta?
Attachment #8741637 - Flags: approval-mozilla-beta?
Comment on attachment 8741591 [details] [diff] [review]
(part 1) - Abort if compartmentStats is null during memory reporting

Crash fix that has stabilized in Nightly for a week, Beta47+
Attachment #8741591 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #8741637 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Status: NEW → RESOLVED
Closed: 8 years ago
Keywords: leave-open
Resolution: --- → FIXED