Closed Bug 1321338 Opened 8 years ago Closed 7 years ago

XPCOM refcount logging still doesn't work (in debug builds only)

Categories

(Core :: XPCOM, defect)

52 Branch
defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1307961
Tracking Status
firefox50 --- unaffected
firefox51 --- unaffected
firefox52 --- fix-optional
firefox53 --- fix-optional

People

(Reporter: ehsan.akhgari, Assigned: ehsan.akhgari)

References

Details

(Keywords: regression)

Attachments

(1 file, 1 obsolete file)

Bug 1316527 wasn't a complete fix.

STR:

1. export XPCOM_MEM_LOG_CLASSES=Timeout
2. Run a mochitest.

Result:
Assertion failure: !aCreate (If an object already has a serial number, we should be destroying it.), at /Users/ehsan/moz/src/xpcom/base/nsTraceRefcnt.cpp:584
Process 68766 stopped
* thread #1: tid = 0x1b7b93f, 0x0000000101d38020 XUL`GetSerialNumber(aPtr=0x0000000191c08b00, aCreate=true) + 160 at nsTraceRefcnt.cpp:584, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
    frame #0: 0x0000000101d38020 XUL`GetSerialNumber(aPtr=0x0000000191c08b00, aCreate=true) + 160 at nsTraceRefcnt.cpp:584
   581                                              HashNumber(aPtr),
   582                                              aPtr);
   583    if (hep && *hep) {
-> 584      MOZ_RELEASE_ASSERT(!aCreate, "If an object already has a serial number, we should be destroying it.");
   585      return static_cast<SerialNumberRecord*>((*hep)->value)->serialNumber;
   586    }
   587
Timeout uses MOZ_COUNT_CTOR: <http://searchfox.org/mozilla-central/rev/ef3cede9a5b050fffade9055ca274654f2bc719e/dom/base/Timeout.cpp#27>

NS_LogCtor() puts the pointer inside the gSerialNumbers hashtable.  Then NS_LogAddRef() gets called, and it asserts that the pointer cannot be found in the hashtable.

This whole notion that the first call to NS_LogAddRef() for an object can assert anything about whether the object exists is bogus due to MOZ_COUNT_CTOR, and we have no way of knowing whether the object is using MOZ_COUNT_CTOR or not.
We really have three distinct cases here:
  * Where the caller knows a new serial no shouldn't be created.
  * Where the caller knows a new serial number should be created.
  * Where the caller can guess that a new serial no should probably be created.

When NS_LogAddRef() sees a refcount of 1, it should assume the third case, since
a previous call to NS_LogCtor() may have created the serial no before the first
call to AddRef().
Attachment #8815810 - Flags: review?(nfroyd)
Hmm, another issue is the reverse problem in NS_LogDtor.  It's possible for the last call to NS_LogRelease() to free up the serial number.
Comment on attachment 8815810 [details] [diff] [review]
Make sure the XPCOM refcount logging deals with refcounted objects which use MOZ_COUNT_CTOR correctly

Review of attachment 8815810 [details] [diff] [review]:
-----------------------------------------------------------------

The correct thing to do here is to change Timeout so that it doesn't use MOZ_COUNT_{D,C}TOR, since it already has leak logging through being reference counted.  We have a bug open on MOZ_COUNT_{D,C}TOR erroring when you use it in a refcounted class.
Attachment #8815810 - Flags: review?(nfroyd)
We really have three distinct cases here:
  * Where the caller knows a new serial no shouldn't be created.
  * Where the caller knows a new serial number should be created.
  * Where the caller can guess that a new serial no should probably be created.

When NS_LogAddRef() sees a refcount of 1, it should assume the third case, since
a previous call to NS_LogCtor() may have created the serial no before the first
call to AddRef().
Attachment #8815812 - Flags: review?(nfroyd)
Attachment #8815810 - Attachment is obsolete: true
Is there a good reason to make people's debugging sessions crash while that bug is being fixed?

(And do you have a bug # handy by any chance?  Thanks!)
Flags: needinfo?(nfroyd)
I started writing a patch so that refcounted classes don't use MOZ_COUNT_CTOR, but there were like 100 classes or more classes that did that so I wasn't entirely sure I was right that that's the way things should be done.
(In reply to Andrew McCreight [:mccr8] from comment #7)
> I started writing a patch so that refcounted classes don't use
> MOZ_COUNT_CTOR, but there were like 100 classes or more classes that did
> that so I wasn't entirely sure I was right that that's the way things should
> be done.

I'm not disputing that this is the Right Way (TM) to do this.  I'm just saying that given the current reality where things use MOZ_COUNT_CTOR when they shouldn't, the assertions in the code are going to cause crashes for every single instance of someone doing something like comment 0.

(FWIW we already have code in the clang plugin for detecting refcounted objects, so making an analysis which would make such MOZ_COUNT_CTOR's not compile is super simple once we've fixed the occurrences.  Or perhaps we can employ a simpler static_assert based check to make sure the object doesn't have an AddRef/Release pair?)
Version: unspecified → 52 Branch
(In reply to :Ehsan Akhgari from comment #6)
> Is there a good reason to make people's debugging sessions crash while that
> bug is being fixed?

So they can fix their buggy code?</glib-answer>

> (And do you have a bug # handy by any chance?  Thanks!)

Bug 1307961.
Flags: needinfo?(nfroyd)
Summary: XPCOM refcount logging still doesn't work → XPCOM refcount logging still doesn't work (in debug builds only)
Ehsan told me we'd rather fix bug 1307961.
Per https://bugzilla.mozilla.org/show_bug.cgi?id=1307961#c21 and the fact that this regression is not affecting users, I'm marking this fix-optional for 52 and 53.
Let's just dupe this!
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: