Closed Bug 1448635 Opened 6 years ago Closed 6 years ago

Crash in arena_t::DallocSmall | static void arena_dalloc

Categories

(Core :: Networking, defect)

Unspecified
Windows 10
defect
Not set
critical

Tracking


RESOLVED DUPLICATE of bug 1448612

People

(Reporter: jseward, Unassigned)

Details

(Keywords: crash)

Crash Data

This is the #4 topcrash of the Windows nightly 20180322100349,
with 38 crashes from 7 different installations.  From the
backtraces it looks to me like a heap corruption bug, 
possibly to do with XPCOM shutdown.

This bug was filed from the Socorro interface and is
report bp-048175a8-3ea3-448b-99c7-1afef0180323.
=============================================================

Top 10 frames of crashing thread:

0 mozglue.dll arena_t::DallocSmall memory/build/mozjemalloc.cpp:3418
1 mozglue.dll static void arena_dalloc memory/build/mozjemalloc.cpp:3504
2 nss3.dll static void DefaultFreeEntry nsprpub/lib/ds/plhash.c:55
3 nss3.dll PL_HashTableDestroy nsprpub/lib/ds/plhash.c:115
4 nss3.dll SECOID_Shutdown security/nss/lib/util/secoid.c:2258
5 nss3.dll nss_Shutdown security/nss/lib/nss/nssinit.c:1159
6 nss3.dll NSS_Shutdown security/nss/lib/nss/nssinit.c:1221
7 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:1011
8 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1508
9 xul.dll XREMain::XRE_main toolkit/xre/nsAppRunner.cpp:4939

=============================================================
Looking at all 7 unique crash stacks I can't see any obvious pattern
other than "crash at shutdown".  Nathan, any ideas?
Flags: needinfo?(nfroyd)
Looking at more of the crash reports, e.g.:

https://crash-stats.mozilla.com/report/index/d58de278-0277-4358-9b5c-393b50180326
https://crash-stats.mozilla.com/report/index/7e2708c7-7dc8-450b-b504-bb49b0180326
https://crash-stats.mozilla.com/report/index/9ae120ad-ff06-46fd-b941-bb2310180325
https://crash-stats.mozilla.com/report/index/ab3e135c-c403-460d-aa41-5cb4b0180325
https://crash-stats.mozilla.com/report/index/f74edb69-bf1c-43cd-bee0-f79f00180325
https://crash-stats.mozilla.com/report/index/3d201026-ce98-4bbd-b6a6-e4b0c0180325
https://crash-stats.mozilla.com/report/index/2bf056fa-d635-492f-8556-fd5b60180325
https://crash-stats.mozilla.com/report/index/98f97781-2cbc-42a1-8574-30d2b0180324
https://crash-stats.mozilla.com/report/index/7fd5558c-96d7-4a25-ace4-cfc4a0180324
https://crash-stats.mozilla.com/report/index/de165b7a-8311-41e9-8cfe-961030180324
https://crash-stats.mozilla.com/report/index/f54230f2-20ba-48dd-bf82-f8e7d0180324
https://crash-stats.mozilla.com/report/index/ede50dc9-e987-4a9e-8eae-17f880180324
https://crash-stats.mozilla.com/report/index/10dd5d14-5289-428e-a26f-f65b80180324

which are basically all of the ones that happened for Nightlies on or after 20180324, we seem to be hitting some issues with streams.

Then again, the NSS shutdown crashes seem to be a thing, in addition to the crash in comment 0:

https://crash-stats.mozilla.com/report/index/fa9c0918-1756-43e6-be17-146cd0180319
https://crash-stats.mozilla.com/report/index/1b5713be-71a1-4276-996d-a22160180324

I don't think the two are related; they look like two separate issues.  Actually, the NSS shutdown crashes appear to be bug 721710.

Looking at the streams crashes first, we're crashing in nsStreamLoader::OnStopRequest:

    // provide nsIStreamLoader::request during call to OnStreamComplete
    mRequest = request;
    size_t length = mData.length();
    uint8_t* elems = mData.extractOrCopyRawBuffer();
    nsresult rv = mObserver->OnStreamComplete(this, mContext, aStatus,
                                              length, elems);
    if (rv != NS_SUCCESS_ADOPTED_DATA) {
      // The observer didn't take ownership of the extracted data buffer, so
      // put it back into mData.
      mData.replaceRawBuffer(elems, length); // XXX crash inside here
    }
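
For context, the ownership contract here (as I read nsIStreamLoader.idl) is that the observer can adopt the extracted buffer by returning NS_SUCCESS_ADOPTED_DATA from OnStreamComplete, in which case it owns the buffer and must eventually free() it, and nsStreamLoader must not put it back into mData.  A minimal sketch of such an observer, with the class and member names invented for illustration:

#include "mozilla/UniquePtr.h"
#include "nsIStreamLoader.h"

// Hypothetical observer that adopts the data instead of copying it.
class MyStreamObserver final : public nsIStreamLoaderObserver
{
public:
  NS_DECL_ISUPPORTS

  NS_IMETHOD OnStreamComplete(nsIStreamLoader* aLoader,
                              nsISupports* aContext,
                              nsresult aStatus,
                              uint32_t aLength,
                              const uint8_t* aData) override
  {
    if (NS_FAILED(aStatus)) {
      // Don't adopt; nsStreamLoader puts the buffer back into mData.
      return aStatus;
    }
    // Take ownership of the malloc'ed buffer; we must free() it ourselves.
    mBytes.reset(const_cast<uint8_t*>(aData));
    mLength = aLength;
    return NS_SUCCESS_ADOPTED_DATA;
  }

private:
  ~MyStreamObserver() = default;

  mozilla::UniqueFreePtr<uint8_t> mBytes;  // calls free() on destruction
  uint32_t mLength = 0;
};

NS_IMPL_ISUPPORTS(MyStreamObserver, nsIStreamLoaderObserver)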

replaceRawBuffer is pretty straightforward: we want to take ownership of the passed-in buffer, if possible:

template<typename T, size_t N, class AP>
void
Vector<T, N, AP>::replaceRawBuffer(T* aP, size_t aLength, size_t aCapacity)
{
  MOZ_REENTRANCY_GUARD_ET_AL;

  /* Destroy what we have. */
  Impl::destroy(beginNoCheck(), endNoCheck());
  if (!usingInlineStorage()) {
    this->free_(beginNoCheck());
  }

  /* Take in the new buffer. */
  if (aCapacity <= kInlineCapacity) {
    /*
     * We convert to inline storage if possible, even though aP might
     * otherwise be acceptable.  Maybe this behaviour should be
     * specifiable with an argument to this function.
     */
    mBegin = inlineStorage();
    mLength = aLength;
    mTail.mCapacity = kInlineCapacity;
    Impl::moveConstruct(mBegin, aP, aP + aLength);
    Impl::destroy(aP, aP + aLength);
    this->free_(aP);
  } else {
    mBegin = aP;
    mLength = aLength;
    mTail.mCapacity = aCapacity;
  }
#ifdef DEBUG
  mTail.mReserved = aCapacity;
#endif
}
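
Note that the call in OnStopRequest above uses the two-argument form.  If I'm reading mfbt/Vector.h correctly, that overload just forwards the length as the capacity, so at this call site aCapacity == aLength:

template<typename T, size_t N, class AP>
void
Vector<T, N, AP>::replaceRawBuffer(T* aP, size_t aLength)
{
  replaceRawBuffer(aP, aLength, aLength);
}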

nsStreamLoader::mData doesn't *have* inline storage, so we could hit the first free_ call, or we could hit the second one if aCapacity is 0.  We can't hit frees when destroying elements because the elements are uint8_t, i.e. trivial.

But the state of the vector at this replaceRawBuffer call should be an empty vector; we should have extracted whatever data was resident in the extractOrCopyRawBuffer call in OnStopRequest prior to this.  (The copy case shouldn't be hit; we'd only copy the data out if we were using an inline buffer, which we aren't.  Therefore, we should always be passing out a pointer for the Vector's allocated data.)  So the first free() call should always receive nullptr...I think.  Unless we're hitting OnStopRequest on multiple threads?
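
To make that expectation concrete, here is roughly the single-threaded sequence I think we should be seeing, sketched with a plain mozilla::Vector<uint8_t, 0> standing in for mData (the real alloc policy shouldn't matter for the argument):

#include "mozilla/Vector.h"

static void ExpectedSequence()
{
  mozilla::Vector<uint8_t, 0> data;  // stand-in for nsStreamLoader::mData
  if (!data.append(42)) {            // pretend some bytes arrived; heap-allocated
    return;
  }

  size_t length = data.length();
  // Hands the heap buffer to the caller and leaves `data` empty, pointing
  // back at its (zero-sized) inline storage.
  uint8_t* elems = data.extractOrCopyRawBuffer();

  // ... mObserver->OnStreamComplete(...) runs and does not adopt the data ...

  // Putting the buffer back: `data` is empty and on inline storage, so the
  // destroy/free_ at the top of replaceRawBuffer should have nothing live to
  // release.
  data.replaceRawBuffer(elems, length);

  // If OnStopRequest somehow ran a second time with the same `elems` (say,
  // from another thread), the second replaceRawBuffer would hand the Vector a
  // buffer it already owns, and the free_ on entry would release it out from
  // under the caller, which would look a lot like these crashes.
}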

But hitting the second free is equally uninteresting: the aLength passed in would have to be 0, and the buffer hasn't been obviously freed by anybody else.  The passed-in buffer ought to be solely owned by us in the first place!  So I don't know what's going on there either.

It would be super-helpful to know what mObserver is, so we could see if that object's OnStreamComplete was doing peculiar things.  But I have taken a cursory look at all the OnStreamComplete implementations in the tree and none of them seem to be doing anything especially unusual.

The peculiar thing is that this is all running from a ThrottledEventQueue, so we're on the main thread, running things through a TabGroup, either for a worker or a timer (cf. http://dxr.mozilla.org/mozilla-central/source/dom/base/TabGroup.cpp#68 ).  That feels significant, but I'm out of ideas for now.

Nothing on the stack for these crashes or related-ish code looks like it's changed recently.  ni? to baku and valentin (streams and networking, respectively) to see if they have any ideas about what might be going wrong here, or to poke holes in the above analysis.  Tentatively moving this to Networking...
Component: XPCOM → Networking
Flags: needinfo?(valentin.gosu)
Flags: needinfo?(amarchesini)
Sigh, I thought I redirected this...
Flags: needinfo?(nfroyd)
This seems a dup of bug 1448612.
Flags: needinfo?(amarchesini)
(In reply to Andrea Marchesini [:baku] from comment #4)
> This seems a dup of bug 1448612.

Yes, I think this is correct.  Thank you!
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(valentin.gosu)
Resolution: --- → DUPLICATE
I still need to understand why this happens. In theory, when that method is called, we have a ThreadSafeWorkerRef that keeps the worker alive, and the dispatching of a WorkerControlRunnable should not fail. Or at least it was not failing before the introduction of ThreadSafeWorkerRef.
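
For reference, this is my mental model of the pattern being described, sketched with invented surrounding code; StrongWorkerRef and ThreadSafeWorkerRef are the real classes from dom/workers/WorkerRef.h, everything else (function name, ref name, include path) is illustrative:

#include "mozilla/dom/WorkerRef.h"  // assumed export path

using namespace mozilla::dom;

// Hypothetical helper: pin the worker so that later dispatches back to it
// (e.g. a WorkerControlRunnable) should not fail.
nsresult PinWorker(WorkerPrivate* aWorkerPrivate,
                   RefPtr<ThreadSafeWorkerRef>& aOutRef)
{
  RefPtr<StrongWorkerRef> strongRef =
    StrongWorkerRef::Create(aWorkerPrivate, "StreamLoaderListener",
                            []() { /* worker is going away */ });
  if (NS_WARN_IF(!strongRef)) {
    return NS_ERROR_FAILURE;
  }

  // A ThreadSafeWorkerRef can be handed to another thread (e.g. the main
  // thread); while it is alive, the worker stays alive too.
  aOutRef = new ThreadSafeWorkerRef(strongRef);
  return NS_OK;
}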
Group: core-security