Closed
Bug 1284400
Opened 8 years ago
Closed 7 years ago
Crash in nsCycleCollector::FreeSnowWhite for null dereferencing
Categories
(Core :: XPCOM, defect, P3)
Core
XPCOM
Tracking
()
People
(Reporter: ting, Unassigned)
Details
(Keywords: crash)
Crash Data
This bug was filed from the Socorro interface and is report bp-8c202645-c0cf-4247-a301-e91b22160704. ============================================================= #17 of 20160703 linux nightly. This happens on all the platforms, and roughly >20 crashes per day. It is null dereferencing [1], more reports: bp-cd766b1b-f9b6-4c66-9aaf-caa3c2160702 bp-9d439a46-c3a4-48aa-b721-461b52160702 bp-6c6fe2d8-5391-4539-a220-5af452160106 [1] https://crash-stats.mozilla.com/signature/?product=Firefox&address=%3D0x0&date=%3E%3D2016-01-01&signature=nsCycleCollector%3A%3AFreeSnowWhite&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&page=1#reports
Comment 1•8 years ago
|
||
I'm not sure where the null deref is actually happening. In the report you linked, it is actually happening in CanonicalizeXPCOMParticipant, but the other ones list a line in FreeSnowWhite. Maybe there should be a null check of mObject in SnowWhiteKiller somewhere? I don't remember the invariant that is supposed to ensure that is non-null.
Comment 2•8 years ago
|
||
Well, either you have mObject pointing to an object and the entry is in purple buffer, or the entry is in the free list. And PurpleBuffer::VisitEntries has if (!(uintptr_t(e->mObject) & uintptr_t(1)) && e->mObject) { Based on the stack traces it isn't clear to me what actually is null.
Reporter | ||
Comment 3•8 years ago
|
||
By reading the disassembly, it seems the nsPurpleBufferEntry->mRefCnt is null, which is accessed without checking here: https://dxr.mozilla.org/mozilla-central/rev/95ffbc4ff63584631c408e8d9912961fcf68bb09/xpcom/base/nsCycleCollector.cpp#2694 and here: https://dxr.mozilla.org/mozilla-central/rev/95ffbc4ff63584631c408e8d9912961fcf68bb09/xpcom/base/nsCycleCollector.cpp#2801
Reporter | ||
Comment 4•8 years ago
|
||
I am not sure what does a null mRefCnt mean for a visitor, is changing the condition from |!aEntry->mRefCnt->get()| to |aEntry->mRefCnt && !aEntry->mRefCnt->get()| sufficient? aEntry->mRefCnt->IsPurple() is also need to be checked.
Flags: needinfo?(bugs)
Reporter | ||
Updated•8 years ago
|
Assignee: nobody → janus926
Comment 5•8 years ago
|
||
Why would we ever have null mRefCnt but non-null mObject (except when dealing with free list - then mObject is 1). NS_CycleCollectorSuspect3 should be called only by nsCycleCollectingAutoRefCnt and it passes always 'this', and then nsPurpleBuffer::Put sets both mObject and mRefCnt. Do we have some issue in free list handling?
Flags: needinfo?(bugs)
Reporter | ||
Comment 6•8 years ago
|
||
I read the code carefully, but didn't see a path could make null mRefCnt but non-null mObject. There are only two places set mRefCnt null: 1. UnmarkRemainingPurpleVisitor::Visit(), which will set mObject null 2. nsPurpleBuffer::Remove(), which will or mObject with 0x1 so PurpleBlock::VisitEntries() will not call Visit() on the entry Also nsPurpleBuffer::SelectPointers() will call mFirstBlock.InitNextPointers() after FreeBlocks(), so mObject will be reset properly. But there're two things I noticed: a) nsPurpleBuffer::Put() and nsPurpleBuffer::Remove() are not atomic functions and can be interleaved, b) there's no guarantee that there can be at most one visitor visiting the purple buffer. :smaug, could (a) or (b) happen?
Flags: needinfo?(bugs)
Comment 7•8 years ago
|
||
Interleaved how? Cycle collectors are per-thread, so no atomic operation should be needed, and re-entrancy shouldn't happen with ::Put or ::Remove. Hmm, multiple visitors...do we somewhere notify observerservice while a visitor is on stack. I don't see any such case. And in VisitEntries don't seem to be called nestedly.
Flags: needinfo?(bugs)
Comment 8•8 years ago
|
||
Putting a reentrance guard on the purple buffer might not be a bad idea, with a MOZ_RELEASE_ASSERT. Another bit of purple buffer code I don't really trust is the code that clears the buffer if things are left behind. I'm not sure how well tested it is.
Reporter | ||
Comment 9•8 years ago
|
||
If neither (a) nor (b) of comment 6 can happen, the only thing I can think about is hardware error as we use 1 bit to denote whether it is a free entry. If this is the case, maybe we don't need a patch. (In reply to Andrew McCreight [:mccr8] from comment #8) > Putting a reentrance guard on the purple buffer might not be a bad idea, > with a MOZ_RELEASE_ASSERT. Another bit of purple buffer code I don't really > trust is the code that clears the buffer if things are left behind. I'm not > sure how well tested it is. Do you mean the FreeBlocks() in ~nsPurpleBuffer()?
Comment 10•8 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #9) > Do you mean the FreeBlocks() in ~nsPurpleBuffer()? Yes.
Reporter | ||
Comment 11•8 years ago
|
||
Unassign because I am not actively working on this.
Assignee: janus926 → nobody
Comment 12•8 years ago
|
||
Crash volume for signature 'nsCycleCollector::FreeSnowWhite': - nightly (version 50): 9 crashes from 2016-06-06. - aurora (version 49): 6 crashes from 2016-06-07. - beta (version 48): 788 crashes from 2016-06-06. - release (version 47): 1635 crashes from 2016-05-31. - esr (version 45): 67 crashes from 2016-04-07. Crash volume on the last weeks: W. N-1 W. N-2 W. N-3 W. N-4 W. N-5 W. N-6 W. N-7 - nightly 0 1 2 1 3 1 0 - aurora 2 2 1 0 1 0 0 - beta 99 131 94 91 129 142 67 - release 217 236 235 188 215 216 207 - esr 2 12 6 10 8 7 7 Affected platforms: Windows, Mac OS X, Linux
status-firefox48:
--- → affected
status-firefox49:
--- → affected
status-firefox-esr45:
--- → affected
Comment 13•8 years ago
|
||
Crash volume for signature 'nsCycleCollector::FreeSnowWhite': - nightly (version 51): 2 crashes from 2016-08-01. - aurora (version 50): 0 crashes from 2016-08-01. - beta (version 49): 4 crashes from 2016-08-02. - release (version 48): 243 crashes from 2016-07-25. - esr (version 45): 89 crashes from 2016-05-02. Crash volume on the last weeks (Week N is from 08-22 to 08-28): W. N-1 W. N-2 W. N-3 - nightly 0 0 2 - aurora 0 0 0 - beta 1 1 1 - release 79 67 41 - esr 11 9 8 Affected platforms: Windows, Mac OS X, Linux Crash rank on the last 7 days: Browser Content Plugin - nightly - aurora - beta #6164 - release #257 - esr #705
status-firefox51:
--- → affected
Comment 14•7 years ago
|
||
Still see some crashes here, but not enough to be worried about? Andrew, are you willing to attempt comment 8?
Severity: critical → normal
Flags: needinfo?(continuation)
Priority: -- → P3
Comment 15•7 years ago
|
||
I think I already did.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(continuation)
Resolution: --- → INCOMPLETE
Updated•2 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•