Closed Bug 970685 Opened 11 years ago Closed 11 years ago

Add telemetry for WebRTC media quality metrics

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: abr, Assigned: jib)

References

Details

(Whiteboard: [p=1, 1.5:p3, ft:webrtc])

Attachments

(2 files, 2 obsolete files)

We should report telemetry for key quality metrics, minimally including packet loss (locally detected and RTCP-reported), jitter (locally detected and RTCP-reported), and RTT.
Blocks: 970426
This telemetry also needs to include information sufficient to debug sync issues; to wit: the delta between audio and video timestamps when received (prior to any imposition of any receiver-side synchronization).
Assignee: nobody → jib
Priority: -- → P3
Whiteboard: [p=2, 1.5:p3, ft:webrtc]
Target Milestone: --- → mozilla32
Priority: P3 → P2
Telemetry definitions + removed 1 unused dictionary.
Attachment #8419370 - Flags: review?(rjesup)
Comment on attachment 8419370 [details] [diff] [review] Part 1: definitions for WebRTC telemetry for jitter, packet-loss and RTT Review of attachment 8419370 [details] [diff] [review]: ----------------------------------------------------------------- r+ but because you touch a webidl file I think you need a DOM reviewer (checkin hooks block otherwise)
Attachment #8419370 - Flags: review?(rjesup) → review+
Attachment #8419370 - Flags: review?(bugs)
Works. This is the simplest approach. - High-level thread reuses stats code. - Telemetry is separate for video and audio, inbound and outbound once per second. RTT on outbound. - Wakes up every second once first call is made, and keeps going. Telemetry stops being reported once calls end by the virtue of the stats API returning zero data for zero PCs (though I should probably detect idle and stop timer?) Jesup and I discussed this thread approach vs. a piggyback approach which would not need a thread but may fail to report telemetry in cases where there are no incoming packets. The thread approach may be useful for other stuff we want to add, and it was the easiest to write, so there it is. I'm perhaps a bit concerned about the efficiency of this approach, but maybe it's OK. Let me know what you think. Notes to self: - Document that telememtry for RTCP would be more accurate using an approach that calculated a running average from every (sub-second) RTCP packet. However, this is arguably less important since the best measure of quality at the remote is at the remote. For locally derived info we assume the stack uses some form of averaging already, so this per-second sample should suffice. - Also calculate and add bandwidth telemetry.
Attachment #8419386 - Flags: feedback?(rjesup)
Attachment #8419386 - Flags: feedback?(docfaraday)
Whiteboard: [p=2, 1.5:p3, ft:webrtc] → [p=1, 1.5:p3, ft:webrtc]
(In reply to Jan-Ivar Bruaroey [:jib] from comment #4) > I'm perhaps a bit concerned about the efficiency of this approach, but maybe > it's OK. I should explain the code to those unfamiliar with the stats code: The threading model around the pipelines is weird: - The pipelines are containers, - the containers are only safe on main thread, the members are only safe on STS, - hence the there and back again approach.
Attachment #8419370 - Flags: review?(bugs) → review+
Comment on attachment 8419386 [details] [diff] [review] Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT Review of attachment 8419386 [details] [diff] [review]: ----------------------------------------------------------------- With one change, this should be pretty efficient when idle, and about as efficient a use of the stats API as possible. Some other minor nits. ::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp @@ +252,5 @@ > +static void > +EverySecondTelemetryCallback_s(nsAutoPtr<RTCStatsQueries> aQueryList) { > + using namespace Telemetry; > + > + for (auto pc = aQueryList->begin(); pc != aQueryList->end(); ++pc) { |pc| is a confusing name here to me. @@ +295,5 @@ > +PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer* timer, void *) { > + MOZ_ASSERT(NS_IsMainThread()); > + if (PeerConnectionCtx::isActive()) { > + PeerConnectionCtx *ctx = PeerConnectionCtx::GetInstance(); > + MOZ_ASSERT(ctx); Do we actually need these checks, if the timer is only created when a PeerConnectionCtx is initted, destroyed when the same is destroyed, and a PeerConnectionCtx* passed to us here in that second param? @@ +314,5 @@ > + p->second->BuildStatsQuery_m(nullptr, // all tracks > + queries->back()); > + } > + } > + rv = RUN_ON_THREAD(stsThread, You probably want to bail here if |queries| is empty; that will help the efficiency a lot (PeerConnectionCtx::isActive() is not going to tell you whether there are any PeerConnectionImpl). Once you do this, you aren't doing much at all, and you're only doing it once a second.
Attachment #8419386 - Flags: feedback?(docfaraday) → feedback+
(In reply to Byron Campen [:bwc] from comment #6) > > + for (auto pc = aQueryList->begin(); pc != aQueryList->end(); ++pc) { > > |pc| is a confusing name here to me. Doesn't each QueryList pretty much represent a PeerConnection? No worries, I'll call it q or something. > > + if (PeerConnectionCtx::isActive()) { > > + PeerConnectionCtx *ctx = PeerConnectionCtx::GetInstance(); > > + MOZ_ASSERT(ctx); > > Do we actually need these checks, if the timer is only created when a > PeerConnectionCtx is initted, destroyed when the same is destroyed, and a > PeerConnectionCtx* passed to us here in that second param? isActive()? You're right. I cribbed it from your WebRTCGlobalInformation code, but it's not needed here. I'll change it to a MOZ_ASSERT. > > + rv = RUN_ON_THREAD(stsThread, > > You probably want to bail here if |queries| is empty; that will help the > efficiency a lot (PeerConnectionCtx::isActive() is not going to tell you > whether there are any PeerConnectionImpl). Once you do this, you aren't > doing much at all, and you're only doing it once a second. Thanks, good idea!
Comment on attachment 8419386 [details] [diff] [review] Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT Review of attachment 8419386 [details] [diff] [review]: ----------------------------------------------------------------- r+ with nits/comments ::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp @@ +246,5 @@ > +#ifdef MOZILLA_INTERNAL_API > +typedef Vector<nsAutoPtr<RTCStatsQuery>> RTCStatsQueries; > + > +static void > +Free_m(nsAutoPtr<RTCStatsQueries> aQueryList) { MOZ_ASSERT(NS_IsMainThread()); } Please comment this; it will confuse people. Also comment the threading model you mentioned in the bug. Also, more than one line please. @@ +253,5 @@ > +EverySecondTelemetryCallback_s(nsAutoPtr<RTCStatsQueries> aQueryList) { > + using namespace Telemetry; > + > + for (auto pc = aQueryList->begin(); pc != aQueryList->end(); ++pc) { > + auto& q = *pc; Huh, we can use 'auto' now. Handy. @@ +258,5 @@ > + PeerConnectionImpl::ExecuteStatsQuery_s(q); > + auto& r = q->report; > + if (r.mInboundRTPStreamStats.WasPassed()) { > + auto& array = r.mInboundRTPStreamStats.Value(); > + for (uint32_t i = 0; i < array.Length(); i++) { Hmmm, not trivial to recode this to use auto for 'i'. uint32_t will do @@ +389,5 @@ > + MOZ_ASSERT(mTelemetryTimer); > + nsresult rv = mTelemetryTimer->SetTarget(gMainThread); > + NS_ENSURE_SUCCESS(rv, rv); > + mTelemetryTimer->InitWithFuncCallback(EverySecondTelemetryCallback_m, this, 1000, > + nsITimer::TYPE_REPEATING_PRECISE_CAN_SKIP); I'd even be ok with SLACK here since the timings aren't very critical. However, PRECISE_CAN_SKIP is also probably fine. ::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.h @@ +104,4 @@ > // Telemetry Peer conection counter > int mConnectionCounter; > > + nsCOMPtr<nsITimer> mTelemetryTimer; Since we don't explicitly stop this, I presume ~PeerConnectionCtx() dies on the same thread that started the timer.
Attachment #8419386 - Flags: feedback?(rjesup)
Attachment #8419386 - Flags: feedback?(docfaraday)
Attachment #8419386 - Flags: feedback+
Comment on attachment 8419386 [details] [diff] [review] Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT undoing accidental re-f? to bwc
Attachment #8419386 - Flags: feedback?(docfaraday) → feedback+
(In reply to Randell Jesup [:jesup] from comment #8) > I'd even be ok with SLACK here since the timings aren't very critical. > However, PRECISE_CAN_SKIP is also probably fine. I think the latter gives more accuracy to the telemetry numbers over time. You can literally add up the totals to know how man seconds of call time this represents. Skipping shouldn't normally happen at this interval I think. > > + nsCOMPtr<nsITimer> mTelemetryTimer; > > Since we don't explicitly stop this, I presume ~PeerConnectionCtx() dies on > the same thread that started the timer. Good question. ~ was called on main thread when I checked in the debugger. http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsITimer.idl#51 says only the target thread should interact with a timer after it's been launched, and main is our target, so this should be fine. Should I add a main-thread assert to ~PeerConnectionCtx?
Fixed two typos in histogram descriptions. Carrying forward r+ from jesup and smaug.
Attachment #8419370 - Attachment is obsolete: true
Attachment #8421204 - Flags: review+
Comment on attachment 8420946 [details] [diff] [review] Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT (2) Review of attachment 8420946 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp @@ +397,5 @@ > + mTelemetryTimer = do_CreateInstance(NS_TIMER_CONTRACTID); > + MOZ_ASSERT(mTelemetryTimer); > + nsresult rv = mTelemetryTimer->SetTarget(gMainThread); > + NS_ENSURE_SUCCESS(rv, rv); > + mTelemetryTimer->InitWithFuncCallback(EverySecondTelemetryCallback_m, this, 1000, PR_MSEC_PER_SEC, though this is pretty optional
Attachment #8420946 - Flags: review?(rjesup) → review+
I put bandwidth patches in Bug 875556.
Blocks: 875556
Patches in comment 17 and comment 18 belong to Bug 875556. I put the wrong bug# in the commit msg. Sorry.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: