Closed Bug 1284400 Opened 4 years ago Closed 2 years ago

Crash in nsCycleCollector::FreeSnowWhite for null dereferencing

Categories

(Core :: XPCOM, defect, P3)

defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox48 --- affected
firefox49 --- affected
firefox-esr45 --- affected
firefox50 --- affected
firefox51 --- affected

People

(Reporter: ting, Unassigned)

Details

(Keywords: crash)

Crash Data

I'm not sure where the null deref is actually happening. In the report you linked, it is actually happening in CanonicalizeXPCOMParticipant, but the other ones list a line in FreeSnowWhite. Maybe there should be a null check of mObject in SnowWhiteKiller somewhere? I don't remember the invariant that is supposed to ensure that is non-null.
Well, either you have mObject pointing to an object and the entry is in purple buffer, or the entry is in the free list.
And PurpleBuffer::VisitEntries has
if (!(uintptr_t(e->mObject) & uintptr_t(1)) && e->mObject) {

Based on the stack traces it isn't clear to me what actually is null.
I am not sure what does a null mRefCnt mean for a visitor, is changing the condition from |!aEntry->mRefCnt->get()| to |aEntry->mRefCnt && !aEntry->mRefCnt->get()| sufficient?

aEntry->mRefCnt->IsPurple() is also need to be checked.
Flags: needinfo?(bugs)
Assignee: nobody → janus926
Why would we ever have null mRefCnt but non-null mObject (except when dealing with free list - then mObject is 1). NS_CycleCollectorSuspect3 should be called only by nsCycleCollectingAutoRefCnt and it passes always 'this', and then nsPurpleBuffer::Put sets both mObject and mRefCnt.
Do we have some issue in free list handling?
Flags: needinfo?(bugs)
I read the code carefully, but didn't see a path could make null mRefCnt but non-null mObject.

There are only two places set mRefCnt null:

1. UnmarkRemainingPurpleVisitor::Visit(), which will set mObject null
2. nsPurpleBuffer::Remove(), which will or mObject with 0x1 so PurpleBlock::VisitEntries() will not call Visit() on the entry

Also nsPurpleBuffer::SelectPointers() will call mFirstBlock.InitNextPointers() after FreeBlocks(), so mObject will be reset properly.

But there're two things I noticed: a) nsPurpleBuffer::Put() and nsPurpleBuffer::Remove() are not atomic functions and can be interleaved, b) there's no guarantee that there can be at most one visitor visiting the purple buffer.

:smaug, could (a) or (b) happen?
Flags: needinfo?(bugs)
Interleaved how? Cycle collectors are per-thread, so no atomic operation should be needed, and
re-entrancy shouldn't happen with ::Put or ::Remove.

Hmm, multiple visitors...do we somewhere notify observerservice while a visitor is on stack.
I don't see any such case. And in VisitEntries don't seem to be called nestedly.
Flags: needinfo?(bugs)
Putting a reentrance guard on the purple buffer might not be a bad idea, with a MOZ_RELEASE_ASSERT. Another bit of purple buffer code I don't really trust is the code that clears the buffer if things are left behind. I'm not sure how well tested it is.
If neither (a) nor (b) of comment 6 can happen, the only thing I can think about is hardware error as we use 1 bit to denote whether it is a free entry. If this is the case, maybe we don't need a patch.

(In reply to Andrew McCreight [:mccr8] from comment #8)
> Putting a reentrance guard on the purple buffer might not be a bad idea,
> with a MOZ_RELEASE_ASSERT. Another bit of purple buffer code I don't really
> trust is the code that clears the buffer if things are left behind. I'm not
> sure how well tested it is.

Do you mean the FreeBlocks() in ~nsPurpleBuffer()?
(In reply to Ting-Yu Chou [:ting] from comment #9)
> Do you mean the FreeBlocks() in ~nsPurpleBuffer()?
Yes.
Unassign because I am not actively working on this.
Assignee: janus926 → nobody
Crash volume for signature 'nsCycleCollector::FreeSnowWhite':
 - nightly (version 50): 9 crashes from 2016-06-06.
 - aurora  (version 49): 6 crashes from 2016-06-07.
 - beta    (version 48): 788 crashes from 2016-06-06.
 - release (version 47): 1635 crashes from 2016-05-31.
 - esr     (version 45): 67 crashes from 2016-04-07.

Crash volume on the last weeks:
            W. N-1  W. N-2  W. N-3  W. N-4  W. N-5  W. N-6  W. N-7
 - nightly       0       1       2       1       3       1       0
 - aurora        2       2       1       0       1       0       0
 - beta         99     131      94      91     129     142      67
 - release     217     236     235     188     215     216     207
 - esr           2      12       6      10       8       7       7

Affected platforms: Windows, Mac OS X, Linux
Crash volume for signature 'nsCycleCollector::FreeSnowWhite':
 - nightly (version 51): 2 crashes from 2016-08-01.
 - aurora  (version 50): 0 crashes from 2016-08-01.
 - beta    (version 49): 4 crashes from 2016-08-02.
 - release (version 48): 243 crashes from 2016-07-25.
 - esr     (version 45): 89 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       2
 - aurora        0       0       0
 - beta          1       1       1
 - release      79      67      41
 - esr          11       9       8

Affected platforms: Windows, Mac OS X, Linux

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly
 - aurora
 - beta    #6164
 - release #257
 - esr     #705
Still see some crashes here, but not enough to be worried about?  Andrew, are you willing to attempt comment 8?
Severity: critical → normal
Flags: needinfo?(continuation)
Priority: -- → P3
I think I already did.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(continuation)
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.