Open Bug 1528170 Opened 7 years ago Updated 1 year ago

Null pointer / assert triggered by mozilla::PeerConnectionCtx::EverySecondTelemetryCallback_m

Categories

(Core :: WebRTC: Audio/Video, defect, P3)

defect

Tracking

()

People

(Reporter: hanno, Unassigned)

Details

(Keywords: regression)

I observed a null pointer write access in the ASAN builds. Unfortunately not reproducible, so I can only offer a stack trace. It looks like this is an assert-type bug.

According to the stack trace it looks like it's related to the telemetry code.

==20099==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000001 (pc 0x7f3c6dc869e8 bp 0x7ffc98764940 sp 0x7ffc98764930 T0)
==20099==The signal is caused by a WRITE memory access.
==20099==Hint: address points to the zero page.
#0 0x7f3c6dc869e7 in mozilla::MozPromise<mozilla::UniquePtr<mozilla::RTCStatsQuery, mozilla::DefaultDelete<mozilla::RTCStatsQuery> >, nsresult, true>::ThenValueBase::AssertIsDead() /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/MozPromise.h:429:9
#1 0x7f3c6dc86260 in mozilla::MozPromise<mozilla::UniquePtr<mozilla::RTCStatsQuery, mozilla::DefaultDelete<mozilla::RTCStatsQuery> >, nsresult, true>::AssertIsDead() /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/MozPromise.h:945:13
#2 0x7f3c6dc85ba8 in mozilla::MozPromise<mozilla::UniquePtr<mozilla::RTCStatsQuery, mozilla::DefaultDelete<mozilla::RTCStatsQuery> >, nsresult, true>::~MozPromise() /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/MozPromise.h:986:5
#3 0x7f3c6dc860ed in mozilla::MozPromise<mozilla::UniquePtr<mozilla::RTCStatsQuery, mozilla::DefaultDelete<mozilla::RTCStatsQuery> >, nsresult, true>::Private::~Private() /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/MozPromise.h:242:9
#4 0x7f3c6dc0b87e in Release /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/MozPromise.h:137:3
#5 0x7f3c6dc0b87e in Release /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:45
#6 0x7f3c6dc0b87e in Release /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:362
#7 0x7f3c6dc0b87e in ~RefPtr /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/RefPtr.h:76
#8 0x7f3c6dc0b87e in mozilla::PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer*, void*) /builds/worker/workspace/build/src/media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp:287
#9 0x7f3c6beb8c35 in nsTimerImpl::Fire(int) /builds/worker/workspace/build/src/xpcom/threads/nsTimerImpl.cpp:559:7
#10 0x7f3c6beb83ad in nsTimerEvent::Run() /builds/worker/workspace/build/src/xpcom/threads/TimerThread.cpp:260:11
#11 0x7f3c6becaad6 in nsThread::ProcessNextEvent(bool, bool*) /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp:1162:14
#12 0x7f3c6bef53f1 in NS_InvokeByIndex /builds/worker/workspace/build/src/xpcom/reflect/xptcall/md/unix/xptcinvoke_asm_x86_64_unix.S:106
#13 0x7f3c6d997b90 in Invoke /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1638:10
#14 0x7f3c6d997b90 in Call /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1186
#15 0x7f3c6d997b90 in XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNative.cpp:1152
#16 0x7f3c6d99dc66 in XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*) /builds/worker/workspace/build/src/js/xpconnect/src/XPCWrappedNativeJSOps.cpp:946:10
#17 0x35321f3fccff (<unknown module>)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/MozPromise.h:429:9 in mozilla::MozPromise<mozilla::UniquePtr<mozilla::RTCStatsQuery, mozilla::DefaultDelete<mozilla::RTCStatsQuery> >, nsresult, true>::ThenValueBase::AssertIsDead()
==20099==ABORTING

Looks like this is in WebRTC code, not Telemetry.

Component: Telemetry → WebRTC: Audio/Video
Product: Toolkit → Core

Indeed, thanks.

Nico, can you have a look at this one?

Flags: needinfo?(na-g)

This looks to be a MozPromise usage issue. Nothing is holding onto the promise so it is being immediately torn down upon creation. Jean-Yves, do you mind having a look?

Flags: needinfo?(na-g) → needinfo?(jyavenard)

(In reply to Nico Grunbaum [:ng] from comment #3)

This looks to be a MozPromise usage issue. Nothing is holding onto the promise so it is being immediately torn down upon creation. Jean-Yves, do you mind having a look?

A MozPromise typically keeps an internal strong reference to themselves. Let delve into it:
First, temporary objects are kept alive until the end of the current expression, that is the last ; which is after the then. So there's no issue of the original RefPtr<MozPromise> going out of scope until MozPromise::Then() has been called.

Then if you look at the implementation of Then, https://searchfox.org/mozilla-central/source/xpcom/threads/MozPromise.h#914
you see:
return ReturnType(aCallSite, thenValue.forget(), this);
ReturnType here is typename ThenCommand<ThenValueType> (https://searchfox.org/mozilla-central/source/xpcom/threads/MozPromise.h#911)

The constructor of ThenCommand is:
https://searchfox.org/mozilla-central/source/xpcom/threads/MozPromise.h#842

which will store the MozPromise in mReceiver which is, with a definition of RefPtr<MozPromise> mReceiver;

So the RefPtr<MozPromise>::Then(...) found at media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp:287 in the crash report.
Return a ThenCommand() which holds a strong reference to the original MozPromise.
Upon finishing the for loop scope, now the RefPtr<ThenCommand> will go out of scope, and ~ThenCommand() is called:
https://searchfox.org/mozilla-central/source/xpcom/threads/MozPromise.h#852

We have the mThenValue being defined, and so that calls:
mReceiver->ThenInternal(mThenValue.forget(), mCallSite);

https://searchfox.org/mozilla-central/source/xpcom/threads/MozPromise.h#805
MozPromise<T>::ThenInternal will then dispatch the task, as IsPending() is false (we have never dispatched a task yet)

https://searchfox.org/mozilla-central/source/xpcom/threads/MozPromise.h#439

This will create a ResolveOrRejectRunnable which takes the MozPromise as an argument, https://searchfox.org/mozilla-central/source/xpcom/threads/MozPromise.h#377
The ResolveOrRejectRunnable has a mPromise member which is a refptr to the original MozPromise.

So as you can see, a strong reference of MozPromise is being held all the way to the task being dispatched that will run the two lambdas. Only once the task has been run will the MozPromise be deleted as its refcount becomes 0.

So this isn't a lifetime issue.

In the crash stack, what you see here is an assertion being triggered, not an actual 0 page write. This is due to a returned MozPromise not being used. Someone more familiar with the code needs to look into it, :bwc wrote that code.

The code is rather confusing in many ways. Why capture all the variables to start with in the lambda, none of them are used.
Then why use rvalue of a UniquePtr (which is good) only to pass a reference to the object. Why not move the object as C++11 allows?
Having said that, at a quick glance, everything looks okay to me.

Forwarding to :bwc as he's the author of this code.

Flags: needinfo?(jyavenard) → needinfo?(docfaraday)

What happens if this fails? Would that end up hitting the assertion we're seeing?

https://searchfox.org/mozilla-central/rev/f0ef51bfafc8ad0c3a2f523bf076edc57dc4891a/xpcom/threads/MozPromise.h#449

Flags: needinfo?(docfaraday) → needinfo?(jyavenard)
Priority: -- → P2

(In reply to Byron Campen [:bwc] from comment #5)

What happens if this fails? Would that end up hitting the assertion we're seeing?

https://searchfox.org/mozilla-central/rev/f0ef51bfafc8ad0c3a2f523bf076edc57dc4891a/xpcom/threads/MozPromise.h#449

I guess it could.

Work around would be to register for the shutdown service, and reject all pending premises should shutdown has started.

Flags: needinfo?(jyavenard)

Did this happen when you were closing the browser, or a tab?

Flags: needinfo?(hanno)

(In reply to Jean-Yves Avenard [:jya] from comment #6)

(In reply to Byron Campen [:bwc] from comment #5)

What happens if this fails? Would that end up hitting the assertion we're seeing?

https://searchfox.org/mozilla-central/rev/f0ef51bfafc8ad0c3a2f523bf076edc57dc4891a/xpcom/threads/MozPromise.h#449

I guess it could.

Work around would be to register for the shutdown service, and reject all pending premises should shutdown has started.

We should probably be shutting down this periodic telemetry ping when the shutdown service fires also.

(In reply to Byron Campen [:bwc] from comment #7)

Did this happen when you were closing the browser, or a tab?

I can't say for sure, but it's very well possible this was during closing the browser.

Flags: needinfo?(hanno)
Severity: normal → S3
Priority: P2 → P3
You need to log in before you can comment on or make changes to this bug.