Closed Bug 1296631 Opened 8 years ago Closed 4 years ago

Crash in PLDHashTable::Add | CCGraphBuilder::NoteJSChild

Categories

(Core :: XPCOM, defect, P3)

x86
Windows XP
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox49 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox53 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- wontfix
firefox62 --- wontfix
firefox63 --- wontfix
firefox64 --- wontfix
firefox65 --- wontfix

People

(Reporter: davidb, Unassigned)

References

Details

(4 keywords)

Crash Data

Crash volume for signature 'PLDHashTable::Add | CCGraphBuilder::NoteJSChild':
 - nightly (version 52): 2 crashes from 2016-09-19.
 - aurora  (version 51): 10 crashes from 2016-09-19.
 - beta    (version 50): 253 crashes from 2016-09-20.
 - release (version 49): 1257 crashes from 2016-09-05.
 - esr     (version 45): 0 crashes from 2016-06-01.

Crash volume on the last weeks (Week N is from 10-03 to 10-09):
            W. N-1  W. N-2
 - nightly       1       1
 - aurora       10       0
 - beta        232      21
 - release    1049     207
 - esr           0       0

Affected platform: Windows

Crash rank on the last 7 days:
           Browser   Content     Plugin
 - nightly           #481
 - aurora  #300      #152
 - beta    #62       #111
 - release #53       #64
 - esr
Keywords: regression
Any thoughts on this, mccr8?
Flags: needinfo?(continuation)
(In reply to Ryan VanderMeulen [:RyanVM] from comment #3)
> Any thoughts on this, mccr8?

I looked at this before, and I wasn't able to come up with anything. It is probably an OOM crash, given that almost all of the crashes are null derefs, and the "system memory use" numbers are very high. Maybe we're not null checking some allocation somewhere.
Flags: needinfo?(continuation)
Seems unactionable given comment 4.
Priority: -- → P3
https://crash-stats.mozilla.com/report/index/18c65c27-fae3-4fa3-a7c9-7db442161204
If that's helping somehow, firefox was idle when it crashed.
njn, can the uptime team in Taipei take a look at this?
Flags: needinfo?(n.nethercote)
(In reply to Jim Mathies [:jimm] from comment #8)
> njn, can the uptime team in Taipei take a look at this?

Kan-Ru would be a better person to ask about that...
Flags: needinfo?(n.nethercote) → needinfo?(kchen)
I tried to look at this but like comment 4 said, it's likely a missing OOM check somewhere else. Maybe we can add a MOZ_DIAGNOSTIC_ASSERT to find out which pointer that got added to CC is nullptr. Or we can try to look at Coverity and see if we can find where we miss a check.

That said, given the affected user were already OOMing, I doubt that fixing this can really help them.
Flags: needinfo?(kchen)
Crash Signature: [@ PLDHashTable::Add | CCGraphBuilder::NoteJSChild] → [@ PLDHashTable::Add | CCGraphBuilder::NoteJSChild] [@ PLDHashTable::Add | TraversalTracer::onChild] [@ PLDHashTable::Add | js::DispatchTyped<T>] [@ PtrToNodeMatchEntry ]
This is a top crash in 55.0.3 and in beta 56. If most of these are win 7 and OOM crashes, maybe this is not actionable. 

There are also significant Fennec crashes.
Signature report for PLDHashTable::Add | TraversalTracer::onChild

Showing results from 7 days ago

Firefox 	60.0a1 	12 	1.0% 	16
Firefox 	59.0b6 	7 	0.6% 	6
Firefox 	59.0b5 	33 	2.8% 	24
Firefox 	59.0b4 	26 	2.2% 	34
Firefox 	59.0b3 	8 	0.7% 	6
Firefox 	58.0b99 	4 	0.3% 	5
Firefox 	58.0b16 	3 	0.3% 	3
Firefox 	58.0b14 	3 	0.3% 	3
Firefox 	58.0 	79 	6.8% 	67
Firefox 	57.0b4 	1 	0.1% 	1
Firefox 	57.0.4 	36 	3.1% 	35
Firefox 	57.0.2 	1 	0.1% 	1
One place where we can end up with an incomplete hash entry is in |CCGraphBuilder::AddNode| [1]. If |mNodeBuilder.Add| fails you end up with an incomplete |mGraph| entry which would lead to null deref in comparison.

A simple solution is to remove the entry if |mNodeBuilder.Add| fails, but it's not clear what downstream affects that will have as most consumers don't really check for failure.

[1] https://searchfox.org/mozilla-central/rev/ad36eff63e208b37bc9441b91b7cea7291d82890/xpcom/base/nsCycleCollector.cpp#2278-2286
This looks like a high volume crash in 62 with 3000+ crashes in the last week on release, split between desktop and Firefox for Android. These exact signatures aren't showing up in 63/64, but other possibly related ones are.  
Should we bump up the priority of investigating this crash from P3?
Nathan or Eric, what do you think?
Flags: needinfo?(nfroyd)
Flags: needinfo?(erahm)
Keywords: topcrash
These are roughly the same as the various GC crashes, and are not really actionable. Something bad happened in the past, and this crash is just the first thing to notice.
Flags: needinfo?(nfroyd)
Flags: needinfo?(erahm)
> This looks like a high volume crash in 62 with 3000+ crashes in the last week on release, split between desktop and Firefox for Android. These exact signatures aren't showing up in 63/64, but other possibly related ones are.

Bug 1477627 changed the hash table used within CCGraph from PLDHashTable to mozilla::HashSet. That might explain changed signatures, but shouldn't have any other effect.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.