Closed
Bug 970685
Opened 11 years ago
Closed 11 years ago
Add telemetry for WebRTC media quality metrics
Categories
(Core :: WebRTC: Audio/Video, defect, P2)
Core
WebRTC: Audio/Video
Tracking
()
RESOLVED
FIXED
mozilla32
People
(Reporter: abr, Assigned: jib)
References
Details
(Whiteboard: [p=1, 1.5:p3, ft:webrtc])
Attachments
(2 files, 2 obsolete files)
|
7.57 KB,
patch
|
jesup
:
review+
|
Details | Diff | Splinter Review |
|
4.77 KB,
patch
|
jib
:
review+
|
Details | Diff | Splinter Review |
We should report telemetry for key quality metrics, minimally including packet loss (locally detected and RTCP-reported), jitter (locally detected and RTCP-reported), and RTT.
| Reporter | ||
Comment 1•11 years ago
|
||
This telemetry also needs to include information sufficient to debug sync issues; to wit: the delta between audio and video timestamps when received (prior to any imposition of any receiver-side synchronization).
| Assignee | ||
Updated•11 years ago
|
Assignee: nobody → jib
Updated•11 years ago
|
Blocks: webrtc_media_quality
Updated•11 years ago
|
Priority: -- → P3
Whiteboard: [p=2, 1.5:p3, ft:webrtc]
Target Milestone: --- → mozilla32
Updated•11 years ago
|
Priority: P3 → P2
| Assignee | ||
Comment 2•11 years ago
|
||
Telemetry definitions + removed 1 unused dictionary.
Attachment #8419370 -
Flags: review?(rjesup)
Comment 3•11 years ago
|
||
Comment on attachment 8419370 [details] [diff] [review]
Part 1: definitions for WebRTC telemetry for jitter, packet-loss and RTT
Review of attachment 8419370 [details] [diff] [review]:
-----------------------------------------------------------------
r+ but because you touch a webidl file I think you need a DOM reviewer (checkin hooks block otherwise)
Attachment #8419370 -
Flags: review?(rjesup) → review+
| Assignee | ||
Updated•11 years ago
|
Attachment #8419370 -
Flags: review?(bugs)
| Assignee | ||
Comment 4•11 years ago
|
||
Works. This is the simplest approach.
- High-level thread reuses stats code.
- Telemetry is separate for video and audio, inbound and outbound once per second.
RTT on outbound.
- Wakes up every second once first call is made, and keeps going. Telemetry
stops being reported once calls end by the virtue of the stats API returning
zero data for zero PCs (though I should probably detect idle and stop timer?)
Jesup and I discussed this thread approach vs. a piggyback approach which would not need a thread but may fail to report telemetry in cases where there are no incoming packets. The thread approach may be useful for other stuff we want to add, and it was the easiest to write, so there it is.
I'm perhaps a bit concerned about the efficiency of this approach, but maybe it's OK.
Let me know what you think.
Notes to self:
- Document that telememtry for RTCP would be more accurate
using an approach that calculated a running average from every
(sub-second) RTCP packet. However, this is arguably less important
since the best measure of quality at the remote is at the remote.
For locally derived info we assume the stack uses some form
of averaging already, so this per-second sample should suffice.
- Also calculate and add bandwidth telemetry.
Attachment #8419386 -
Flags: feedback?(rjesup)
Attachment #8419386 -
Flags: feedback?(docfaraday)
| Assignee | ||
Updated•11 years ago
|
Whiteboard: [p=2, 1.5:p3, ft:webrtc] → [p=1, 1.5:p3, ft:webrtc]
| Assignee | ||
Comment 5•11 years ago
|
||
(In reply to Jan-Ivar Bruaroey [:jib] from comment #4)
> I'm perhaps a bit concerned about the efficiency of this approach, but maybe
> it's OK.
I should explain the code to those unfamiliar with the stats code:
The threading model around the pipelines is weird:
- The pipelines are containers,
- the containers are only safe on main thread, the members are only safe on STS,
- hence the there and back again approach.
Updated•11 years ago
|
Attachment #8419370 -
Flags: review?(bugs) → review+
Comment 6•11 years ago
|
||
Comment on attachment 8419386 [details] [diff] [review]
Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT
Review of attachment 8419386 [details] [diff] [review]:
-----------------------------------------------------------------
With one change, this should be pretty efficient when idle, and about as efficient a use of the stats API as possible. Some other minor nits.
::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp
@@ +252,5 @@
> +static void
> +EverySecondTelemetryCallback_s(nsAutoPtr<RTCStatsQueries> aQueryList) {
> + using namespace Telemetry;
> +
> + for (auto pc = aQueryList->begin(); pc != aQueryList->end(); ++pc) {
|pc| is a confusing name here to me.
@@ +295,5 @@
> +PeerConnectionCtx::EverySecondTelemetryCallback_m(nsITimer* timer, void *) {
> + MOZ_ASSERT(NS_IsMainThread());
> + if (PeerConnectionCtx::isActive()) {
> + PeerConnectionCtx *ctx = PeerConnectionCtx::GetInstance();
> + MOZ_ASSERT(ctx);
Do we actually need these checks, if the timer is only created when a PeerConnectionCtx is initted, destroyed when the same is destroyed, and a PeerConnectionCtx* passed to us here in that second param?
@@ +314,5 @@
> + p->second->BuildStatsQuery_m(nullptr, // all tracks
> + queries->back());
> + }
> + }
> + rv = RUN_ON_THREAD(stsThread,
You probably want to bail here if |queries| is empty; that will help the efficiency a lot (PeerConnectionCtx::isActive() is not going to tell you whether there are any PeerConnectionImpl). Once you do this, you aren't doing much at all, and you're only doing it once a second.
Attachment #8419386 -
Flags: feedback?(docfaraday) → feedback+
| Assignee | ||
Comment 7•11 years ago
|
||
(In reply to Byron Campen [:bwc] from comment #6)
> > + for (auto pc = aQueryList->begin(); pc != aQueryList->end(); ++pc) {
>
> |pc| is a confusing name here to me.
Doesn't each QueryList pretty much represent a PeerConnection? No worries, I'll call it q or something.
> > + if (PeerConnectionCtx::isActive()) {
> > + PeerConnectionCtx *ctx = PeerConnectionCtx::GetInstance();
> > + MOZ_ASSERT(ctx);
>
> Do we actually need these checks, if the timer is only created when a
> PeerConnectionCtx is initted, destroyed when the same is destroyed, and a
> PeerConnectionCtx* passed to us here in that second param?
isActive()? You're right. I cribbed it from your WebRTCGlobalInformation code, but it's not needed here. I'll change it to a MOZ_ASSERT.
> > + rv = RUN_ON_THREAD(stsThread,
>
> You probably want to bail here if |queries| is empty; that will help the
> efficiency a lot (PeerConnectionCtx::isActive() is not going to tell you
> whether there are any PeerConnectionImpl). Once you do this, you aren't
> doing much at all, and you're only doing it once a second.
Thanks, good idea!
Comment 8•11 years ago
|
||
Comment on attachment 8419386 [details] [diff] [review]
Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT
Review of attachment 8419386 [details] [diff] [review]:
-----------------------------------------------------------------
r+ with nits/comments
::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp
@@ +246,5 @@
> +#ifdef MOZILLA_INTERNAL_API
> +typedef Vector<nsAutoPtr<RTCStatsQuery>> RTCStatsQueries;
> +
> +static void
> +Free_m(nsAutoPtr<RTCStatsQueries> aQueryList) { MOZ_ASSERT(NS_IsMainThread()); }
Please comment this; it will confuse people. Also comment the threading model you mentioned in the bug. Also, more than one line please.
@@ +253,5 @@
> +EverySecondTelemetryCallback_s(nsAutoPtr<RTCStatsQueries> aQueryList) {
> + using namespace Telemetry;
> +
> + for (auto pc = aQueryList->begin(); pc != aQueryList->end(); ++pc) {
> + auto& q = *pc;
Huh, we can use 'auto' now. Handy.
@@ +258,5 @@
> + PeerConnectionImpl::ExecuteStatsQuery_s(q);
> + auto& r = q->report;
> + if (r.mInboundRTPStreamStats.WasPassed()) {
> + auto& array = r.mInboundRTPStreamStats.Value();
> + for (uint32_t i = 0; i < array.Length(); i++) {
Hmmm, not trivial to recode this to use auto for 'i'. uint32_t will do
@@ +389,5 @@
> + MOZ_ASSERT(mTelemetryTimer);
> + nsresult rv = mTelemetryTimer->SetTarget(gMainThread);
> + NS_ENSURE_SUCCESS(rv, rv);
> + mTelemetryTimer->InitWithFuncCallback(EverySecondTelemetryCallback_m, this, 1000,
> + nsITimer::TYPE_REPEATING_PRECISE_CAN_SKIP);
I'd even be ok with SLACK here since the timings aren't very critical. However, PRECISE_CAN_SKIP is also probably fine.
::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.h
@@ +104,4 @@
> // Telemetry Peer conection counter
> int mConnectionCounter;
>
> + nsCOMPtr<nsITimer> mTelemetryTimer;
Since we don't explicitly stop this, I presume ~PeerConnectionCtx() dies on the same thread that started the timer.
Attachment #8419386 -
Flags: feedback?(rjesup)
Attachment #8419386 -
Flags: feedback?(docfaraday)
Attachment #8419386 -
Flags: feedback+
Comment 9•11 years ago
|
||
Comment on attachment 8419386 [details] [diff] [review]
Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT
undoing accidental re-f? to bwc
Attachment #8419386 -
Flags: feedback?(docfaraday) → feedback+
| Assignee | ||
Comment 10•11 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #8)
> I'd even be ok with SLACK here since the timings aren't very critical.
> However, PRECISE_CAN_SKIP is also probably fine.
I think the latter gives more accuracy to the telemetry numbers over time. You can literally add up the totals to know how man seconds of call time this represents. Skipping shouldn't normally happen at this interval I think.
> > + nsCOMPtr<nsITimer> mTelemetryTimer;
>
> Since we don't explicitly stop this, I presume ~PeerConnectionCtx() dies on
> the same thread that started the timer.
Good question. ~ was called on main thread when I checked in the debugger. http://mxr.mozilla.org/mozilla-central/source/xpcom/threads/nsITimer.idl#51 says only the target thread should interact with a timer after it's been launched, and main is our target, so this should be fine.
Should I add a main-thread assert to ~PeerConnectionCtx?
| Assignee | ||
Comment 11•11 years ago
|
||
Incorporated feedback. Green try - https://tbpl.mozilla.org/?tree=Try&rev=c437a83ded5c
Attachment #8419386 -
Attachment is obsolete: true
Attachment #8420946 -
Flags: review?(rjesup)
| Assignee | ||
Comment 12•11 years ago
|
||
Fixed two typos in histogram descriptions. Carrying forward r+ from jesup and smaug.
Attachment #8419370 -
Attachment is obsolete: true
Attachment #8421204 -
Flags: review+
Comment 13•11 years ago
|
||
Comment on attachment 8420946 [details] [diff] [review]
Part 2: Thread approach for WebRTC telemetry for jitter, packet-loss and RTT (2)
Review of attachment 8420946 [details] [diff] [review]:
-----------------------------------------------------------------
::: media/webrtc/signaling/src/peerconnection/PeerConnectionCtx.cpp
@@ +397,5 @@
> + mTelemetryTimer = do_CreateInstance(NS_TIMER_CONTRACTID);
> + MOZ_ASSERT(mTelemetryTimer);
> + nsresult rv = mTelemetryTimer->SetTarget(gMainThread);
> + NS_ENSURE_SUCCESS(rv, rv);
> + mTelemetryTimer->InitWithFuncCallback(EverySecondTelemetryCallback_m, this, 1000,
PR_MSEC_PER_SEC, though this is pretty optional
Attachment #8420946 -
Flags: review?(rjesup) → review+
| Assignee | ||
Updated•11 years ago
|
Keywords: checkin-needed
Comment 15•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/79f9c9553e29
https://hg.mozilla.org/integration/mozilla-inbound/rev/307078994617
Keywords: checkin-needed
Comment 16•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/79f9c9553e29
https://hg.mozilla.org/mozilla-central/rev/307078994617
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 18•11 years ago
|
||
| Assignee | ||
Comment 19•11 years ago
|
||
Patches in comment 17 and comment 18 belong to Bug 875556. I put the wrong bug# in the commit msg. Sorry.
You need to log in
before you can comment on or make changes to this bug.
Description
•