nsAutoLock trashes stack and triggers MSVC 7 RTC warning about corrputed stack perhaps due to incorrect calling conventions

RESOLVED INVALID

Status

()

Core
XPCOM
--
critical
RESOLVED INVALID
12 years ago
12 years ago

People

(Reporter: Ben Turner (not reading bugmail, use the needinfo flag!), Assigned: dougt)

Tracking

1.8 Branch
x86
Windows XP
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(3 attachments)

I've been working on replacing some windows-specific threading code to use nspr and nsAutoLock. I've repeatedly run into the same crash where it looks like the nsAutoLock constructor is trashing the stack and then the destructor causes an exception trying to either unlock a bad PRLock* or the function returns to a bogus memory address. Running in debug mode with the runtime check compiler switches (/RTC1 and /GS) turned on I receive the warning 'Run-Time Check Failure #2 - Stack around the variable 'guidLock' was corrupted.'

Here's the function I'm crashing in:

  NS_IMETHODIMP DatabaseQuery::SetDatabaseGUID(const PRUnichar *dbGUID)
  {
    if (!dbGUID)
      return NS_ERROR_NULL_POINTER;

    nsAutoLock guidLock(m_pDatabaseGUIDLock);
    // m_DatabaseGUID = dbGUID;
  
    return NS_OK;
  }

The constructor and destructor:

  DatabaseQuery::DatabaseQuery()
  : m_pDatabaseGUIDLock(nsnull)
  {
    PRLock* pLock = PR_NewLock();
    if (pLock)
      m_pDatabaseGUIDLock = pLock;
  }

  DatabaseQuery::~DatabaseQuery()
  {
    if (m_pDatabaseGUIDLock)
      PR_DestroyLock(m_pDatabaseGUIDLock);
  }

When I saw that the stack was getting trashed I commented out everything in the function except the nsAutoLock just to make sure that the lock was indeed the culprit. I verified that m_pDatabaseGUIDLock is getting set correctly. Then I hopped into the debugger and started looking at the memory. Here's what's going on: 

nsAutoLock has two members:
  PRLock* mLock
  PRBool mLocked.
nsAutoLockBase has three members (in debug mode):
  void* mAddr
  nsAutoLockBase* mDown
  nsAutoLockType mType

The VC compiler is either not making enough space on the stack or the stack is misaligned. The base variables are initialized in the nsAutoLockBase constructor and occupy the 12 bytes immediately preceeding the other stack data (return pointer and other stuff). Then, when the nsAutoLock constructor runs, it overwrites the next 8 bytes (which contained the return address and some other data) and I get a friendly access violation. I'll describe that process a bit more here:

Stack (before constructors):

      ...     <--- lots more cc above here (this is VC padding)
  cc cc cc cc
  cc cc cc cc
  cc cc cc cc <--- &guidLock
  cc cc cc cc
  cc cc cc cc
  f0 1f e2 00 <--- return address
  .. .. .. .. \
  .. .. .. .. } <--- other data
  .. .. .. .. /

Then, in steps: (1,2,3) the base constructor is run; (4,5) then the other.

      ...
  cc cc cc cc
  cc cc cc cc
  68 47 43 01 <--- &guidLock (1) AND mAddr
  00 00 00 00 <--- (2) mDown
  00 00 00 00 <--- (3) mType
  f0 1f e2 00 <--- return address
  .. .. .. .. <--- other data

      ...
  cc cc cc cc
  cc cc cc cc
  68 47 43 01 <--- mAddr
  00 00 00 00 <--- mDown
  00 00 00 00 <--- mType
  68 47 43 01 <--- return address <- (4) mLock
  01 00 00 00 <--- other data <- (5) mLocked
  .. .. .. .. 

I'm pretty sure the problem has to do with calling conventions... My compiler is set to use __cdecl by default, and I'm pretty sure that nspr is compiled with __cdecl: bsmedberg pointed me to prtypes.h to figure that out. However, I realized that the NS_IMETHODIMP macro gets expanded to 'virtual __stdcall nsresult' and so that function was actually using __stdcall. We use the NS_IMETHODIMP macro because that is what XPIDL told us to use in this XPCOM component. Sadly, replacing all the instances of NS_IMETHODIMP declaring SetDatabaseGUID to 'virtual __cdecl nsresult' didn't fix the problem entirely. When compiled with that convention the stack was still misaligned but instead of overwriting 8 bytes of data it only overwrote 4. So I believe that I'm on the right track to figuring this out... Is XPIDL giving me the correct recommendation? Or is this somehow a real bug in nsAutoLock?
 
I've CC'ed the only folks I could find in bugzilla that had previously worked with nsAutoLock. There aren't many of you!
Created attachment 212202 [details]
Stack Trace

Trace of the call stack. I'm pretty sure this is useless info, but just in case...

Comment 2

12 years ago
Created attachment 212211 [details]
nsAutoLock.i + dbg.mozconfig

Ok. bugzilla and IE hate me. and I hate them. I'm having a really terrible morning. This comment has been eaten *Twice*. I'm very sorry for the fact that the comment you get is not going to be very good.

For now, let's pretend calling conventions are a red herring. please put back the NS_IMETHODIMP stuff and do:

make -C xpcom/threads nsAutoLock.i
make ... CDatabaseQuery.i

zip the two .i files and your mozconfig together and attach them. I won't be able to look before sunday.
if you want your functions to be callable from javascript, you better use the calling convention that NS_IMETHODIMP gives you...
Created attachment 212229 [details]
zip containing the 3 requested files

(In reply to comment #2)

Here you go.

Something interesting that I noticed: Line 130032 of DatabaseQuery.i is where nsAutoLockBase is defined. It does *not* show that it has any members (mAddr, mDonw, mType). Here's what the included header says, though:

class NS_COM nsAutoLockBase {
protected:
    nsAutoLockBase() {}
    enum nsAutoLockType {eAutoLock, eAutoMonitor, eAutoCMonitor};

#ifdef DEBUG
    nsAutoLockBase(void* addr, nsAutoLockType type);
    ~nsAutoLockBase();

    void            Show();
    void            Hide();

    void*           mAddr;
    nsAutoLockBase* mDown;
    nsAutoLockType  mType;
#else
    nsAutoLockBase(void* addr, nsAutoLockType type) {}
    ~nsAutoLockBase() {}

    void            Show() {}
    void            Hide() {}
#endif
};


Weird. I'll investigate this a bit more today.
(In reply to comment #4)
> Weird. I'll investigate this a bit more today.

What's the mystery?  You've pretty clearly compiled with and without DEBUG defined.

/be
Status: NEW → RESOLVED
Last Resolved: 12 years ago
Resolution: --- → INVALID
Argh, yes. Silly.

The MSVC standard define for debug mode is _DEBUG, and the mozilla define is DEBUG.

Sheesh.

Thanks guys.
You need to log in before you can comment on or make changes to this bug.