Closed Bug 1353702 Opened 5 years ago Closed 1 year ago

Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) and OOM | small during CC. memory corruption, or memory leak? (Thunderbird)

Categories

(Thunderbird :: General, defect)

52 Branch
x86
Windows 7
defect
Not set
critical

Tracking

(thunderbird_esr52 unaffected, thunderbird51 wontfix, thunderbird52 wontfix, thunderbird53 wontfix, thunderbird54 unaffected)

RESOLVED WORKSFORME
Tracking Status
thunderbird_esr52 --- unaffected
thunderbird51 --- wontfix
thunderbird52 --- wontfix
thunderbird53 --- wontfix
thunderbird54 --- unaffected

People

(Reporter: wsmwk, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug, )

Details

(Keywords: crash, regression, regressionwindow-wanted, Whiteboard: [regression:TB50?])

Crash Data

#6 crash for the early goings of TB52.0. I don't have time to analyze, so don't discount the possibility of being an extension.

Users who report this also tend to report crash OOM | Small

For example bp-663a1222-2036-477e-b565-29e8e2170405 (OOM | Small) and bp-4dea5b6e-2b8b-4077-9421-cd75f2170405.
 0 	xul.dll	CompareCacheMatchEntry	security/manager/ssl/nsCertTree.cpp:68
1 	xul.dll	PLDHashTable::SearchTable<1>(void const*, unsigned int)	xpcom/glue/PLDHashTable.cpp:388
2 	xul.dll	PLDHashTable::Add(void const*, mozilla::fallible_t const&)	xpcom/glue/PLDHashTable.cpp:571
3 	xul.dll	CCGraph::AddNodeToMap(void*)	xpcom/base/nsCycleCollector.cpp:924
4 	xul.dll	CCGraphBuilder::AddNode(void*, nsCycleCollectionParticipant*)	xpcom/base/nsCycleCollector.cpp:2214
5 	xul.dll	CCGraphBuilder::NoteChild(void*, nsCycleCollectionParticipant*, nsCString&)	xpcom/base/nsCycleCollector.cpp:2150
6 	xul.dll	CCGraphBuilder::NoteJSChild(JS::GCCellPtr const&)	xpcom/base/nsCycleCollector.cpp:2422
7 	xul.dll	TraversalTracer::onChild(JS::GCCellPtr const&)	xpcom/base/CycleCollectedJSContext.cpp:360
8 	xul.dll	JS::CallbackTracer::onObjectEdge(JSObject**)	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/objdir-tb/dist/include/js/TracingAPI.h:142
9 	xul.dll	DoCallbackFunctor<JS::Value>::operator()<JSObject>(JSObject*, JS::CallbackTracer*, char const*)	js/src/gc/Tracer.cpp:62
10 	xul.dll	js::DispatchTyped<DoCallbackFunctor<JS::Value>, JS::CallbackTracer*&, char const*&>(DoCallbackFunctor<JS::Value>, JS::Value const&, JS::CallbackTracer*&, char const*&)	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/objdir-tb/dist/include/js/Value.h:1483
11 	xul.dll	DoCallback<JS::Value>(JS::CallbackTracer*, JS::Value*, char const*)	js/src/gc/Tracer.cpp:70
12 	xul.dll	DispatchToTracer<JS::Value>(JSTracer*, JS::Value*, char const*)	js/src/gc/Marking.cpp:676
13 	xul.dll	DispatchToTracer<JS::Value>(JSTracer*, JS::Value*, char const*)	js/src/gc/Marking.cpp:676
14 	xul.dll	js::TraceManuallyBarrieredEdge<js::Shape*>(JSTracer*, js::Shape**, char const*)	js/src/gc/Marking.cpp:676
15 	xul.dll	JS::DispatchTraceKindTyped<TraceChildrenFunctor, JSTracer*&, void*&>(TraceChildrenFunctor, JS::TraceKind, JSTracer*&, void*&)	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/objdir-tb/dist/include/js/TraceKind.h:186
16 	xul.dll	mozilla::CycleCollectedJSContext::NoteGCThingJSChildren(JS::GCCellPtr, nsCycleCollectionTraversalCallback&)	xpcom/base/CycleCollectedJSContext.cpp:639
17 	xul.dll	mozilla::CycleCollectedJSContext::TraverseGCThing(mozilla::CycleCollectedJSContext::TraverseSelect, JS::GCCellPtr, nsCycleCollectionTraversalCallback&)	xpcom/base/CycleCollectedJSContext.cpp:697
18 	xul.dll	mozilla::JSGCThingParticipant::Traverse(void*, nsCycleCollectionTraversalCallback&)	xpcom/base/CycleCollectedJSContext.cpp:307
19 	xul.dll	CCGraphBuilder::BuildGraph(js::SliceBudget&)	xpcom/base/nsCycleCollector.cpp:2282
20 	xul.dll	nsCycleCollector::MarkRoots(js::SliceBudget&)	xpcom/base/nsCycleCollector.cpp:2879 

Another user bp-63542d50-5840-4812-890c-c23942170405 (OOM | Small) and bp-57538aaf-81b3-487a-8b9f-6e73b2170405
What is this doing?
"Trace" suggests some type of GC operation?
Oh yes, I see CycleCollector at the end of the stack trace above.
But what is this "11 DoCallback"
It seems there is some type of Callback defined that is called upon entry to "Trace..."
9,10 suggests there is a meta function to invoke such callbacks.

9 	xul.dll	DoCallbackFunctor<JS::Value>::operator()<JSObject>(JSObject*, JS::CallbackTracer*, char const*)	js/src/gc/Tracer.cpp:62
10 	xul.dll	js::DispatchTyped<DoCallbackFunctor<JS::Value>, JS::CallbackTracer*&, char const*&>(DoCallbackFunctor<JS::Value>, JS::Value const&, JS::CallbackTracer*&, char const*&)	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/objdir-tb/dist/include/js/Value.h:1483

Then we have
8 	xul.dll	JS::CallbackTracer::onObjectEdge(JSObject**)	C:/builds/moz2_slave/tb-rel-c-esr52-w32_bld-0000000/build/objdir-tb/dist/include/js/TracingAPI.h:142
So this could be the callback thus hooked (?)
The rest of the stack suggests that we are creating some graphs (for cycle detection), and
when the graph creation calls PDLhash, presumably to see if a node is already is in the being-built graph (which means duplication, and presumably suggests a cycle since a preexisting node is reached by following a series of edges),
somehow TB crashes.
I wonder why this is JA-related.

> see https://crash-stats.mozilla.com/signature/?signature=CompareCacheMatchEntry#comments
I see.
I guess I am just lucky?
Wayne, what the OSs these people are using.
I am using TB with English menus under linux without many issues.
Also I use TB with Japanese menus under Windows 10 without many issues.

[Not sure if my TB has been upgraded yet.
OUCH. I just checked. The window TB version was 45.8.0 !!! 
So I am likely to see the crashes. OMG.]


Is there a fixed set of steps to reproduce this?
Maybe I want to run TB under valgrind to see if there are memory-related issues that valgrind/memcheck can uncover if the problem also manifests under linux.

TIA
Flags: needinfo?(vseerror)
OK, the top of the stack.

 0 	xul.dll	CompareCacheMatchEntry	security/manager/ssl/nsCertTree.cpp:68
1 	xul.dll	PLDHashTable::SearchTable<1>(void const*, unsigned int)	xpcom/glue/PLDHashTable.cpp:388

I can understand why PLDHashTable is used for detecting node duplication. Hash table is an efficient mechanism for this type of match.

However, why on earth are we calling CompareCacheMatchEntry in nsCertTree.cpp?
Is there anything special about this node (in the graph that is being created)
that requires the invocation of CERT-specific matching function.

Maybe the CERT contains a Japanese character or two in a field which this matching function looks at?
(In reply to ISHIKAWA, Chiaki from comment #3)
> OK, the top of the stack.
> 
>  0 	xul.dll	CompareCacheMatchEntry	security/manager/ssl/nsCertTree.cpp:68
> 1 	xul.dll	PLDHashTable::SearchTable<1>(void const*, unsigned int)
> xpcom/glue/PLDHashTable.cpp:388
> 
> I can understand why PLDHashTable is used for detecting node duplication.
> Hash table is an efficient mechanism for this type of match.
> 
> However, why on earth are we calling CompareCacheMatchEntry in
> nsCertTree.cpp?

Due to compiler optimization.  VC's linker will merge same implementation code during linker time.

> Is there anything special about this node (in the graph that is being
> created)
> that requires the invocation of CERT-specific matching function.
> 
> Maybe the CERT contains a Japanese character or two in a field which this
> matching function looks at?

I don't think so because this is into cycle collector.
OS are all MS-Windows
Windows 7	2121	94.5%
Windows 10	85	3.8%
Windows Vista	25	1.1%
Windows 8.1	7	0.3%
Windows XP	7	0.3%

But perhaps there is a different crash signature for Mac?
Flags: needinfo?(vseerror)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #5)
> OS are all MS-Windows
> Windows 7	2121	94.5%
> Windows 10	85	3.8%
> Windows Vista	25	1.1%
> Windows 8.1	7	0.3%
> Windows XP	7	0.3%
> 
> But perhaps there is a different crash signature for Mac?

[I thought I replied to this. But obviously the post failed.]

I don't have development environment for MS Windows.

Given that all these come from Windows, then I suspect toolchain issue or
W32API and/or other MS library issues.

However, you mention "Mac".

How can we take a look at the top N crash reports.
OK, I figured it out.
After looking at the top crash and the bugzilla comment, I conclude that
there maybe something fishy with GC, especially CycleCollector, that is invoked from time to time(?)

If I am not mistaken, all the stack trace associated with the crash here 
shows that the crash occurs in CycleCollector phase.


.
Short of compiler error or eager linker error that merges code erroneously 

looking at the code,
 mozilla-central/xpcom/ds/PLDHashTable.cpp
[Note this is when there was a collision and we are checking the
collision chain.]

    entry = AddressEntry(hash1);
    if (EntryIsFree(entry)) {
      return (Reason == ForAdd) ? (firstRemoved ? firstRemoved : entry)
                                : nullptr;
    }

    if (MatchEntryKeyhash(entry, aKeyHash) &&  <=== This is in the stack when the crash occurs
        matchEntry(entry, aKey)) {
      return entry;
    }

Given EntryIsFree(entry) is this


  static bool EntryIsFree(PLDHashEntryHdr* aEntry)
  {
    return aEntry->mKeyHash == 0;
  }

We can't have |entry| to be nullptr because if would have blown up in the |EntryIsFree()| call before TB blows up.

So let us take a look at CompareCacheMatchEntry().
This one in SSL mentioned in the final stack when TB blows up.

static bool
CompareCacheMatchEntry(const PLDHashEntryHdr *hdr, const void *key)
{
  const CompareCacheHashEntryPtr *entryPtr = static_cast<const CompareCacheHashEntryPtr*>(hdr);
  return entryPtr->entry->key == key;
}

|hdr| cannot be null. If the cast is not buggy then |entryPtr| is not null.
Hence |entryPtr->entry| must be null to cause the reference to 0x0.
But wait a second I saw something interesting for CompareCacheMatchEntry.


https://dxr.mozilla.org/mozilla-central/source/security/manager/ssl/nsCertTree.cpp#86
static const PLDHashTableOps gMapOps = {
  PLDHashTable::HashVoidPtrKeyStub,
  CompareCacheMatchEntry,
  PLDHashTable::MoveEntryStub,
  CompareCacheClearEntry,
  CompareCacheInitEntry
};
So it has defined the operation for hashtable manipulation.
That is why all of a sudden in the PDLHash operation, we see the call to
CompareCacheMatchEntry.

This function substitution is done in
https://dxr.mozilla.org/comm-central/source/mozilla/xpcom/ds/PLDHashTable.cpp#352
early in searchTable:

  // Hit: return entry.
  PLDHashMatchEntry matchEntry = mOps->matchEntry; <=== THIS
  if (MatchEntryKeyhash(entry, aKeyHash) &&
      matchEntry(entry, aKey)) {
    return entry;
  }

I thought hash table operation has been checked throughly by its extensive use.
However, cert code overriding the compare operation suggest that there is a chance that 
Cert code is broken one way or the other under Windows.

Someone with Windows development environment ought to insert the dump for
|entryPtr| and |entry| in CompareCacheMatchEntry.

The obvious bandage would be to add
 if (entryPtr->entry == nullptr)
    return false;
before the current call in CompareCacheMatchEntry, but I think this masks the underlying possibly deeper issue.

I think someone familiar with Cert code ought to take a look.

Maybe the chance of having a collision in the cache was rare before?
Hmm. Even 52.0.1 does not seem to solve the problem.

https://crash-stats.mozilla.com/report/index/18bb82ae-7144-41f2-8fda-d8e102170417

Wayne, I have a suspicion that there may be Cert-related problem under Windows.

Would you like to take a survey of 10-20 users to ask if they are using public (meaning that anyone can subscribe)
mail service?
Use: IMAP, POP3 ?
(AND *IF* they have multiple mail accounts.: if my guess that collision in the hash chain may be an issue, this question is
important. Ask them WHAT mail servers they are using. If they have authentication enabled on them. Are they public or private mail servers?)

The reason I am asking is that I may be able to subscribe to such a public mail service
under my Japanese Win10 and/or win7 (lucky me: a recently acquired second hand PC has win7 on it. I am trying to set up
my development environment Linux image on it.)
and see if the problem can be recreated there.

You can come up a question phrase in English, and then I can create a Japanese translation and add "answering in Japanese is fine", etc.
You can send out the question in English + my Japanese translation to the Japanese audience that sent back the crash report.
We can start to see what is going there if there is an unlucky public mail server that caused the crash issue.

TIA
PS: I am using TB under Windows10 on one of my PCs. At the office, I use linux.
If TB under windows10 crashes as reported by the users, I will be deep trouble.
Flags: needinfo?(vseerror)
This crash occurred for me on windows 7 with only one account - gmail via imap.
(In reply to Marc Auslander from comment #9)
> This crash occurred for me on windows 7 with only one account - gmail via
> imap.

I suppose that this is unlikely, but are you using a Japanese localized version of TB?
(I have to ask since I recall that some localized versions have different, say, search engine setting, etc.)

TIA
Flags: needinfo?(marcausl)
It's the standard English version, gotten but update from the previous release.
Flags: needinfo?(marcausl)
(In reply to Marc Auslander from comment #11)
> It's the standard English version, gotten but update from the previous
> release.

Thanks. Once my PC transition is completely over, I will investigate if using gmail with the latest TB under windows 10 ever crashes on me. 
[I could finally create TB with my local patches on the  new (actually second hand) PC. but I was shocked to see that |make mozmill| had so many errors and only about 650 tests succeeded and there were three or more timeouts. This is bad :-(
Something went very bad in the last couple of months. 

Then I realize that I have to move over the ssh key to my new environment so that I can submit try-comm-central job. I never thought moving environment was such a hassle. Maybe because in my current setup, due to electricity supply problem, I cannot turn on both the older and newer PC. At least, I have found that moving TB with its profile and mail files to a new environment is not that difficult. I have done so a few times. Also, I run a special command to save mail articles. So TB alone is not such a hassle to move over to a new PC. The problem is that I have come to depended on so many tiny bits in many places: firefox add-ons, various web services for which I can't even recall the ID/password on my own (firefox remembers them for me), etc. Even tools such as gawk under linux is not installed by default, etc.   In the hassle, I realize TB and FF are not such bad software as far as moving the user data/profile to a new environment: that is a relief.
OTOH, I am not sure how the next generation of users will cope with the proliferation of services including mail-like services that require users to log in with ID/passwords. Services such as skype that hold the all the user profiles ON THEIR HOSTS are much easier to deal with. But a lot of users are likely to (and ought to) feel uncomfortable with such services.
Well, I digress. This is probably good for tb-planning discussion.

Again, I will look into this issue (and pending cleaning up of my patches for enabling buffering for TB) and a few other bug issues. It is just I was so shocked that I have come to depend on "free" services in my computing and that moving all the
profiles to the new environemtn was necessary: in the old days, I think all I needed was the mail server password and that was it.

TIA
Add another signature.  (when compiler doesn't optimize well, symbol becomes PtrToNodeMatchEntry).

This is NOT thunderbird 52+ bug.  This is CC issue.
Crash Signature: [@ CompareCacheMatchEntry] → [@ CompareCacheMatchEntry] [@ PtrToNodeMatchEntry]
(In reply to Makoto Kato [:m_kato] from comment #13)
> Add another signature.  (when compiler doesn't optimize well, symbol becomes
> PtrToNodeMatchEntry).
> 
> This is NOT thunderbird 52+ bug.  This is CC issue.

I'm not certain what you are saying. Are you saying it should move to javascript: GC?


(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #0)
> 
> Users who report this also tend to report crash OOM | Small

And also signature CCGraphBuilder::BuildGraph

I am not able to correlate this to an addon.
Flags: needinfo?(vseerror) → needinfo?(m_kato)
Summary: Crash in CompareCacheMatchEntry → Crash in CompareCacheMatchEntry during GC
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #14)
> (In reply to Makoto Kato [:m_kato] from comment #13)
> > Add another signature.  (when compiler doesn't optimize well, symbol becomes
> > PtrToNodeMatchEntry).
> > 
> > This is NOT thunderbird 52+ bug.  This is CC issue.
> 
> I'm not certain what you are saying. Are you saying it should move to
> javascript: GC?

Since this is cycle collector, it is Core: XPCOM.
Flags: needinfo?(m_kato)
Component: General → XPCOM
Product: Thunderbird → Core
Summary: Crash in CompareCacheMatchEntry during GC → Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) during GC
I've seen some crashes like this in Firefox (for instance: bp-8553ac72-5a04-4ae5-903d-caf5a0170424) but never in this high volume. I always assumed it was some OOM failure where we failed to allocate memory, and somehow a null pointer got out. I never figured out exactly how. But this hash table can be large if there's a lot of garbage, and on 32-bit builds, it can be hard to find contiguous space for allocations of more than a few megabytes. That does seem to match up with comment 0 indicating the this is associated with OOM | small. Maybe there's some memory leak?
Summary: Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) during GC → Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) during CC
jorg, we will want this for esr. note comment 7.

the earliest buildid I find is 50.0b2 20161017040505 in bp-ba918105-9eeb-49a1-b317-4eab20170420 
unfortunately no crashes found for nightly builds.
Well, it crashed in M-C code, security/manager/ssl/nsCertTree.cpp, and apparently the CC is at fault, also an M-C piece. So as much as we may want it, the TB team won't fix the problem. And if it's not seen in Daily, even harder to catch.
Number of beta crashes in the prior 6 months:
53.0b2 - 20
53.0b1 - 0 
52.0b4 - 0
52.0b3 - 1
52.0b2 - 3
52.0b1 - 0
51.0b1 - 1
50.0b3 - 21
50.0b2 - 1

Only one nightly crash - bp-2f1810c2-4f63-44e9-aa1a-d62642170108 53.0a1 (user has bitdefender addon installed)
Summary: Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) during CC → Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) during CC. memory leak?
Whiteboard: [regression:TB??] → [regression:TB50?]
I am not sure if this is directly related, but I seem to have found a use-after-free problem using valgrind.
(Well, recent source code must have changed the timeout-value in a few places and so I get timeout-abort
and not much testing is done, but I see this use-after-free problems in some tests very clearly and I have not seen them before [ > 3 months ]. 
I see PDHash-related function GC-related function in the usage, allocation and release stack frames.

I have used GCC-5 and will check if GCC-6 will produce similar results.
More on this later, hopefully over the weekend.
FF 53.02 still causes this crash: https://crash-stats.mozilla.com/report/index/897843c2-8626-4ed9-8a6a-802fb1170515

In my case I was not doing anything in the browser in terms of active user actions/clicks/mouse movements, but there was a long YouTube video playing back. The memory footprint got fatter and fatter and then I got the crash (weirdly enough, I am getting sites' icons wiped out in all loaded tabs: #bug 1363397).
(In reply to User Dderss from comment #21)
> FF 53.02 still causes this crash:
> https://crash-stats.mozilla.com/report/index/897843c2-8626-4ed9-8a6a-
> 802fb1170515
> 
> In my case I was not doing anything in the browser in terms of active user
> actions/clicks/mouse movements, but there was a long YouTube video playing
> back. The memory footprint got fatter and fatter and then I got the crash
> (weirdly enough, I am getting sites' icons wiped out in all loaded tabs:
> #bug 1363397).

Hi, 

From the bug report 
> Product 	Firefox
and your description,
I take that you were browsing in Firefox when the issue happened, correct?

This is WONDERFUL report in that this is probably the first time a firefox user reported this issue!

It is now clear that the bad code is lurking in so called M-C firefox portion of the code and
so the hunt should be focused there (and I assume that everyone who saw the stack trace should have realized that already but for strange reasons, the bug is easily triggered in C-C TB , and strangely very often in Japanese locale !?)

BTW, I found a use-after-free issue by running C-C TB under valgrind while a test suite was run using |make mozmill| command.
It *may* be related to this issue, but may be not.
Bug 1364543 - Possible use-after-free with C-C TB (a new twist unlike bug 1298103)

TIA
Blocks: 1353704
Depends on: 1364543
Is cyclic collector reentrant?
Or for that matter mark/sweep invoked by timer reentrant?

See https://bugzilla.mozilla.org/show_bug.cgi?id=1364543#c0

If it is not, and the next invocation happens due to periodic timer firing while the previous invocation is still progressing,
then we have a problem. 

In my valgrind testing, the slowdown of CPU execution relative to wall clock time
may have caused the problem related to reentrancy. If under tight-memory condition, the memory is very much committed, Cyclic Collector may take a LOOONG time to finish, and thus may not return until the next timer event.
(In reply to ISHIKAWA, Chiaki from comment #23)
> Is cyclic collector reentrant?

It isn't supposed to be, but it does seem like there are conditions where it is. I've tried adding some release asserts to catch this, but I don't think they've turned up anything yet.
(In reply to Andrew McCreight [:mccr8] from comment #24)
> (In reply to ISHIKAWA, Chiaki from comment #23)
> > Is cyclic collector reentrant?
> 
> It isn't supposed to be, but it does seem like there are conditions where it
> is. I've tried adding some release asserts to catch this, but I don't think
> they've turned up anything yet.

Fingers crossed :-)

It is always easy to not to call Cycllic Collector or periodic Mark/Sweep if previous invocation is running, but then again, we may be disabling a fine-grained opportunity for GC progress...
Currently assessing whether this crash is gone in recent 54 betas
This crash signature doesn't exist in 54 beta, does exist in 53.0b2, but none reported for 53.0b1.  jorg, NI you in hopes you may have an idea ...

I have contacted many heavy crashing users, and the ~6 replies in the last 3 days report crash is gone when they use the beta.  (And as previously noted most of these also crash OOM | small)  So a puzzling and interesting question is, what patch exists in 54 beta that hasn't been uplifted to 52, either from comm-beta or mozilla-beta??  Or, maybe we need to ask a different question?  Is there a regression that 54 beta for some reason doesn't have, or is otherwise masked? 

I am pushing on this because we really must have this for 52 (and it needs to be marked tracking+), the suspected relationship to other top crashes like bug 1353702, and it is somewhat unclear so far whether bug 1364543 is the key to fixing this.
Flags: needinfo?(jorgk)
Flags: needinfo?(ishikawa)
What can I say, the crash is well and truly in M-C. The cycle collection calls xpcom/glue/PLDHashTable.cpp:388 and that calls into security/manager/ssl/nsCertTree.cpp:68.

I think tearing our hair about what's different in the different versions doesn't get us anywhere since many things are different and we don't know how this error is triggered.

The crash can be avoided easily. We know it crashes on this line:
https://hg.mozilla.org/mozilla-central/annotate/63b447888a64/security/manager/ssl/nsCertTree.cpp#l68
  return entryPtr->entry->key == key
So either entryPtr is null or entryPtr->entry is null. We can test that and not crash. But someone owning that code needs to look into it.
Flags: needinfo?(jorgk)
Component: XPCOM → General
Product: Core → Thunderbird
CompareCacheMatchEntry in 52.2.0 is gone (is currently #3 in 52.1.1)

PtrToNodeMatchEntry in 52.2.0 is #79  (not seeing it in top 200 of 52.1.1)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #29)
> CompareCacheMatchEntry in 52.2.0 is gone (is currently #3 in 52.1.1)
> 
> PtrToNodeMatchEntry in 52.2.0 is #79  (not seeing it in top 200 of 52.1.1)

Currently running version 52.2.0 (32-bit).

Are you saying here this is broken and will not be fixed on this version, but WILL be fixed on version 53? How far off is that happening? Also, is it not broken in version 54? Would it just make sense to skip version 53 and jump to version 54 to avoid these bugs?

Thank you.
You will note there is no patch in this bug report nor in bug 1364543. So we do not know what changed to affect the rate of these crash signatures, and it also is not possible predict what is going to happen in future versions.
Summary: Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) during CC. memory leak? → Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) and OOM | small during CC. memory leak?
At this moment, I have nothing to add to this bug...
Flags: needinfo?(ishikawa)
(In reply to Worcester12345 from comment #30)
> Would it just make sense to skip version 53 and jump to version 54 to avoid these bugs?

Are you / were you impacted?


Crash rate dropped substantially, at the same time as signatures for bug 1353704.  CompareCacheMatchEntry  is #3 crash for 52.1.1 but not in top 100 for 52.2.1.  zero crashes on nightly channel in last 6 months https://crash-stats.mozilla.com/signature/?product=Thunderbird&release_channel=nightly&signature=CompareCacheMatchEntry&date=%3E%3D2017-01-21T11%3A29%3A36.000Z&date=%3C2017-07-21T11%3A29%3A36.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1

PtrToNodeMatchEntry is #77 for 52.2.1. And no nightly crashes since bp-b062f838-aa49-4240-b014-e0bc50170703 2017-07-03 05:42:13	Thunderbird	56.0a1	20170702030207
Flags: needinfo?(worcester12345)
(In reply to Wayne Mery (:wsmwk, NI for questions) from comment #33)
> (In reply to Worcester12345 from comment #30)
> > Would it just make sense to skip version 53 and jump to version 54 to avoid these bugs?
> 
> Are you / were you impacted?

I do not think so. Things seem to be working right now.

Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:55.0) Gecko/20100101 Firefox/55.0 ID:20170724055319
Flags: needinfo?(worcester12345)
CompareCacheMatchEntry: #64 crash for 52.4.0.  56 beta has none.  54 beta is the first to have none for CompareCacheMatchEntry according to statistics - https://crash-stats.mozilla.com/signature/?product=Thunderbird&release_channel=%21release&signature=CompareCacheMatchEntry&date=%3E%3D2017-07-05T12%3A54%3A00.000Z&date=%3C2017-10-24T12%3A54%3A59.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports

Several users report relief starting with 52.2.0. But graph of the last few months shows no change
https://crash-stats.mozilla.com/signature/?product=Thunderbird&release_channel=release&signature=CompareCacheMatchEntry&date=%3E%3D2017-07-05T12%3A54%3A00.000Z&date=%3C2017-10-24T12%3A54%3A59.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#graphs

However, approximately 2/3 of release channel users of the past month are using version older than 52.4.0, for example 52.1.1 and 52.1.0 (there appear to be none for 52.3.0). So I think if we subtracted those users out we would see a change in the graph

Several of the users have/had multiple signature, for example (ranked in this order)
OOM | small  
CompareCacheMatchEntry 
OOM | large | js::AutoEnterOOMUnsafeRegion::crash | js::AutoEnterOOMUnsafeRegion::crash | js::TenuringTracer::moveToTenured  (bug 1337105, bug 1257387)
xul.dll@0x11844d4   
CCGraphBuilder::BuildGraph   (bug 1272230)
OOM | large | NS_ABORT_OOM | nsAString_internal::Replace  
OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | ToNewUnicode   
OOM | unknown | NS_ABORT_OOM | nsBaseHashtable<T>::Put | mozilla::image::ImageSurfaceCache::Insert  
OOM | unknown | js::AutoEnterOOMUnsafeRegion::crash | js::gc::StoreBuffer::MonoTypeBuffer<T>::put
Summary: Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) and OOM | small during CC. memory leak? → Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) and OOM | small during CC. memory leak? (Thunderbird)
See Also: → 1434413
(In reply to Wayne Mery (:wsmwk) from comment #37)
> So it turns out PtrToNodeMatchEntry had no crashes for 57 beta (seems odd),
> but 56 did eventually.  And a couple so far for 58 beta

seems to be back with 59.0b1 and 60.0a1

https://crash-stats.mozilla.com/signature/?product=Thunderbird&release_channel=beta&release_channel=nightly&_sort=-version&_sort=email&_sort=-date&signature=PtrToNodeMatchEntry&date=%3E%3D2017-08-19T16%3A07%3A41.000Z&date=%3C2018-02-19T15%3A07%3A41.000Z#graphs
See Also: → 1439393
Blocks: 1330872
Summary: Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) and OOM | small during CC. memory leak? (Thunderbird) → Crash in PtrToNodeMatchEntry (CompareCacheMatchEntry) and OOM | small during CC. memory corruption, or memory leak? (Thunderbird)
https://support.mozilla.org/en-US/questions/1213987 is another report.

The person associated with https://support.mozilla.org/en-US/questions/1163018 rebuilt the machine and the problem is gone. So at least in that case some external forces were likely related.
PtrToNodeMatchEntry 
- overall (which includes Firefox) Crash rate significantly dropped mid-September and mid-November when there were big pushes to version 60 https://crash-stats.mozilla.com/signature/?signature=PtrToNodeMatchEntry&date=%3E%3D2018-06-18T07%3A45%3A29.000Z&date=%3C2018-12-18T07%3A45%3A29.000Z#graphs
- Thunderbird only, crash rate is pretty flat, with a slight spike in Dec https://crash-stats.mozilla.com/signature/?product=Thunderbird&signature=PtrToNodeMatchEntry&date=>%3D2018-06-17T16%3A45%3A29.000Z&date=<2018-12-17T16%3A45%3A29.000Z#graphs

CompareCacheMatchEntry 
- 99% of crashes are THunderbird, and crash rate significantly increased early to mid-November https://crash-stats.mozilla.com/signature/?signature=CompareCacheMatchEntry&date=>%3D2018-06-18T07%3A44%3A42.000Z&date=<2018-12-18T07%3A44%3A42.000Z#graphs

PtrToNodeMatchEntry

See Also: → 1560037

No version 68 or 78 crashes, so => WFM (a toss up with incomplete)

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.