Closed Bug 1433046 Opened 2 years ago Closed 2 years ago

Crash in @0x0 | nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID

Categories

(Core :: IPC: MSCOM, defect, P1, critical)

Unspecified
Windows 7
defect

Tracking

()

RESOLVED FIXED
mozilla60
Tracking Status
firefox-esr52 --- unaffected
firefox58 --- unaffected
firefox59 --- fixed
firefox60 --- verified

People

(Reporter: calixte, Assigned: aklotz)

References

(Blocks 1 open bug)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files)

This bug was filed from the Socorro interface and is
report bp-44504d30-e078-415f-937b-20cdc0180125.
=============================================================

Top 10 frames of crashing thread:

0  @0x0 
1 xul.dll nsCOMPtr<nsIDOMXULLabelElement>::~nsCOMPtr<nsIDOMXULLabelElement> 
2 xul.dll mozilla::mscom::Interceptor::GetInterceptorForIID ipc/mscom/Interceptor.cpp:720
3 xul.dll mozilla::mscom::Interceptor::WeakRefQueryInterface ipc/mscom/Interceptor.cpp:826
4 xul.dll mozilla::mscom::WeakReferenceSupport::QueryInterface ipc/mscom/WeakRef.cpp:121
5 ole32.dll CStdMarshal::CreateStub 
6 ole32.dll CStdMarshal::ConnectSrvIPIDEntry 
7 ole32.dll CStdMarshal::MarshalServerIPID 
8 ole32.dll CStdMarshal::MarshalObjRef 
9 ole32.dll CStdMarshal::MarshalInterface 

=============================================================

There are 2 crashes (from 1 installation) in nightly 60 with buildid 20180124220129. In analyzing the backtrace, the regression may have been introduced by patch [1] to fix bug 1428759.

[1] https://hg.mozilla.org/mozilla-central/rev?node=06e5dd66fb82
Flags: needinfo?(aklotz)
Crash Signature: [@ @0x0 | nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] → [@ @0x0 | nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ mozilla::mscom::Interceptor::WeakRefQueryInterface]
Crash Signature: [@ @0x0 | nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ mozilla::mscom::Interceptor::WeakRefQueryInterface] → [@ @0x0 | nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ mozilla::mscom::Interceptor::WeakRefQueryInterface] [@ nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID]
Flags: needinfo?(aklotz)
Crash Signature: [@ @0x0 | nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ mozilla::mscom::Interceptor::WeakRefQueryInterface] [@ nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] → [@ @0x0 | nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ mozilla::mscom::Interceptor::WeakRefQueryInterface] [@ nsCOMPtr<T>::~nsCOMPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ @0x0 | nsBoxLayoutState…
Crash Signature: nsBoxLayoutState::~nsBoxLayoutState] → nsBoxLayoutState::~nsBoxLayoutState] [@ nsBoxLayoutState::~nsBoxLayoutState]
Crash Signature: nsBoxLayoutState::~nsBoxLayoutState] [@ nsBoxLayoutState::~nsBoxLayoutState] → nsBoxLayoutState::~nsBoxLayoutState] [@ nsBoxLayoutState::~nsBoxLayoutState] [@ @0x0 | RefPtr<T>::~RefPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ RefPtr<T>::~RefPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID]
Assignee: nobody → aklotz
Status: NEW → ASSIGNED
Keywords: leave-open
Crash Signature: nsBoxLayoutState::~nsBoxLayoutState] [@ nsBoxLayoutState::~nsBoxLayoutState] [@ @0x0 | RefPtr<T>::~RefPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ RefPtr<T>::~RefPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] → nsBoxLayoutState::~nsBoxLayoutState] [@ nsBoxLayoutState::~nsBoxLayoutState] [@ @0x0 | RefPtr<T>::~RefPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ RefPtr<T>::~RefPtr<T> | mozilla::mscom::Interceptor::GetInterceptorForIID] [@ Activat…
Comment on attachment 8945576 [details]
Bug 1433046: Use WeakReferenceSupport::StabilizeRefCount instead of regular kung-fu death grips when aggregating;

https://reviewboard.mozilla.org/r/215708/#review221910
Attachment #8945576 - Flags: review?(jteh) → review+
The patch looks fine and makes sense, so I was happy to r+, but I have a couple of questions/clarifications for my own understanding.

(In reply to Aaron Klotz [:aklotz] from comment #1)
> But if we're in the process of creating the outer object, that refcount might
> not yet have been incremented by 1, so the inner object's invocation of the
> outer object's Release() could trigger a deletion.

To clarify, the previous kungFuDeathGrip would have solved this, right? Because the kungFuDeathGrip outlives the inner's call to Release. Am I correct in saying the problem you're trying to solve with this patch is that with a regular kungfuDeathGrip, the release of the kungFuDeathGrip itself can cause deletion? That is:
1. Outer object begins creation. Ref count is at 0.
2. Outer object acquires kungFuDeathGrip. Ref count goes to 1.
3. Inner object does AddRef. Ref count goes to 2.
4. Inner object does Release. Ref count goes to 1.
5. kungFuDeathGrip gets released. Ref count goes to 0. Deletion ensues. Bam!

Assuming the above is correct, have you any idea why this would be new after bug 1428759? I understand that we had a critical section around QI there, but I wouldn't have thought that would protect all cases where this is a possibility. In fact, this should only be possible during Interceptor creation, but I don't think Interceptor creation can occur *within* a QI; there has to be an Interceptor to QI on in the first place in order for that previous critical section to have an impact.
(In reply to James Teh [:Jamie] from comment #3)
> Assuming the above is correct, have you any idea why this would be new after
> bug 1428759? I understand that we had a critical section around QI there,
> but I wouldn't have thought that would protect all cases where this is a
> possibility. In fact, this should only be possible during Interceptor
> creation, but I don't think Interceptor creation can occur *within* a QI;
> there has to be an Interceptor to QI on in the first place in order for that
> previous critical section to have an impact.

The short answer is: you are correct and I'm not sure.

But another piece of the puzzle here is that I have confirmed that the crashing smart pointer is the one named |interceptor|. Due to aggregation, |interceptor|'s IUnknown will actually be the mscom::WeakReferenceSupport IUnknown. So something is clearly going wrong with respect to our own refcount.
At the very least, your patch makes this area "correct" and allows us to eliminate one more variable. At best, it might fix the crashes. :)
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8e77a375a5e3
Use WeakReferenceSupport::StabilizeRefCount instead of regular kung-fu death grips when aggregating; r=Jamie
What are the next steps here?
Flags: needinfo?(aklotz)
There still are 35 crashes in nightly 60 with signature 'mozilla::mscom::Interceptor::WeakRefQueryInterface' since the patch landed: http://bit.ly/2ErUpim
I took a look at the dump for bp-14ecedd3-1f57-4bc2-beba-784cb0180208. WinDBG shows a bit more in the stack after Interceptor::WeakRefQueryInterface:

 # ChildEBP RetAddr  
WARNING: Frame IP not in any known module. Following frames may be wrong.
00 2460ebb4 11265970 0xe00000
01 (Inline) -------- xul!mozilla::RefPtrTraits<ICallInterceptor>::Release+0x6 [z:\build\build\src\obj-firefox\dist\include\mozilla\refptr.h @ 41] 
02 (Inline) -------- xul!RefPtr<ICallInterceptor>::ConstRemovingRefPtrTraits<ICallInterceptor>::Release+0x6 [z:\build\build\src\obj-firefox\dist\include\mozilla\refptr.h @ 398] 
03 (Inline) -------- xul!RefPtr<ICallInterceptor>::{dtor}+0xe [z:\build\build\src\obj-firefox\dist\include\mozilla\refptr.h @ 79] 
04 2460ec6c 11266229 xul!mozilla::mscom::Interceptor::GetInterceptorForIID(struct _GUID * aIid = 0x2a17a538 --- memory read error at address 0x2a17a538 ---, void ** aOutInterceptor = 0x2460ed0c)+0x4b1 [z:\build\build\src\ipc\mscom\interceptor.cpp @ 720] 

Aaron, I'm probably barking up the wrong tree here (COM aggregation still confuses the hell out of me on a regular basis). Still, the worst that can happen is I'm indisputably wrong. :) In GetInterceptorForIID, if we need to create a new interceptor, we get its ICallInterceptor interface so we can register our sink:

690       RefPtr<ICallInterceptor> interceptor;
691       hr = unkInterceptor->QueryInterface(IID_ICallInterceptor,
692                                           (void**)getter_AddRefs(interceptor));

Later, if we determine we've raced with another thread, we throw away the IUnknown for the interceptor we just created (|unkInterceptor|):

702         // We might have raced with another thread, so first check that we don't
703         // already have an entry for this
704         MapEntry* entry = Lookup(interceptorIid);
705         if (entry && entry->mInterceptor) {
706           unkInterceptor = entry->mInterceptor;

That means |interceptor| (RefPtr<ICallInterceptor>) will get cleaned up when the function returns. Normally, that'd be fine because both pointers refer to the same object, just different interfaces. However, if I understand correctly, unkInterceptor is the *inner* IUnknown, which controls the reference count of the *inner* object. In contrast, the QI to ICallInterceptor will AddRef on the ICallInterceptor, which controls the ref count of the *outer* object. So, destroying unkInterceptor deletes the inner object, thus invalidating the ICallInterceptor pointer.

If I'm correct, the fix is fairly simple: clear |interceptor| before resetting |unkInterceptor|. I really hope I'm correct, because this took me *hours* to puzzle out. :P
Priority: -- → P1
Comment on attachment 8949292 [details]
Bug 1433046: mscom::Interceptor: Don't destroy an aggregated interceptor before releasing its interface.

https://reviewboard.mozilla.org/r/218678/#review226228
Attachment #8949292 - Flags: review?(aklotz) → review+
That's a good point, Jamie! Let's see how this goes...
Flags: needinfo?(aklotz)
Keywords: leave-open
Pushed by aklotz@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a26e2e7c3746
mscom::Interceptor: Don't destroy an aggregated interceptor before releasing its interface. r=aklotz
https://hg.mozilla.org/mozilla-central/rev/a26e2e7c3746
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla60
Comment on attachment 8945576 [details]
Bug 1433046: Use WeakReferenceSupport::StabilizeRefCount instead of regular kung-fu death grips when aggregating;

Approval Request Comment
[Feature/Bug causing the regression]: Exposed by bug 1428759, but may have occurred under other circumstances also.
[User impact if declined]: Crashes for accessibility users after bug 1428759.
[Is this code covered by automated tests?]: No, because we don't have a mechanism for platform accessibility testing.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: Other patch from this bug.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Simple correctness patch which prevents premature destruction of an object.
[String changes made/needed]: None.
Attachment #8945576 - Flags: approval-mozilla-beta?
Comment on attachment 8949292 [details]
Bug 1433046: mscom::Interceptor: Don't destroy an aggregated interceptor before releasing its interface.

Approval request comment
[Feature/Bug causing the regression]: Exposed by bug 1428759, but may have occurred under other circumstances also.
[User impact if declined]: Crashes for accessibility users after bug 1428759.
[Is this code covered by automated tests?]: No, because we don't have a mechanism for platform accessibility testing.
[Has the fix been verified in Nightly?]: Yes.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: Other patch from this bug.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: Simple correctness patch which prevents use of a destroyed object.
[String changes made/needed]: None.
Attachment #8949292 - Flags: approval-mozilla-beta?
Comment on attachment 8945576 [details]
Bug 1433046: Use WeakReferenceSupport::StabilizeRefCount instead of regular kung-fu death grips when aggregating;

Needed to support the uplift of bug 1428759. Approved for Fx59rc1.
Attachment #8945576 - Flags: approval-mozilla-beta? → approval-mozilla-release+
Attachment #8949292 - Flags: approval-mozilla-beta? → approval-mozilla-release+
You need to log in before you can comment on or make changes to this bug.