Closed
Bug 970742
Opened 11 years ago
Closed 11 years ago
Add metrics/telemetry for errors and error recovery
Categories
(Core :: WebRTC, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla32
People
(Reporter: mreavy, Assigned: jesup)
References
(Blocks 1 open bug)
Details
(Whiteboard: [p=3, 1.5:p1, ft:webrtc])
Attachments
(3 files, 2 obsolete files)
13.79 KB,
patch
|
jib
:
review+
|
Details | Diff | Splinter Review |
30.01 KB,
patch
|
jesup
:
review+
|
Details | Diff | Splinter Review |
10.78 KB,
patch
|
jib
:
review+
|
Details | Diff | Splinter Review |
We want to measure how effective our congestion control, error correction, and compositor rendering (Does the compositor render every frame it's given) is.
Assignee | ||
Comment 1•11 years ago
|
||
Congestion control measurements: (mostly only useful in Telemetry and about:webrtc, not Talos unless/until we have real network simulation)
* Time from call start until within (say) 10% of max stable bitrate (not max instantaneous).
* percentage of time in the different modes of congestion control (increase, stable, drain)
* Average and max-stable RTT minus minimum stable RTT (basically try to subtract out "real" RTT to determine how much buffering is occurring in the network on top of that).
Assignee | ||
Comment 2•11 years ago
|
||
Error Correction: what modes it chooses for error recovery (rexmit, full-frame PLI, older-reference-frame).
How long it took to recover in frames/ms (histogram). More/mostly useful in about:webrtc and telemetry than Talos.
Assignee | ||
Comment 3•11 years ago
|
||
Assigning to pkerr, but this is tentative - jib or I might end up doing it
Assignee: nobody → paulrkerr
Updated•11 years ago
|
Priority: -- → P1
Whiteboard: [p=5, 1.5:p1, ft:webrtc]
Target Milestone: --- → mozilla32
Reporter | ||
Updated•11 years ago
|
Priority: P1 → P2
Reporter | ||
Updated•11 years ago
|
Whiteboard: [p=5, 1.5:p1, ft:webrtc] → [p=3, 1.5:p1, ft:webrtc]
Comment 4•11 years ago
|
||
This should be applicable to B2G as well. Isn't this going into the v2.0 release? If so, I think we should start tracking this.
Reporter | ||
Comment 5•11 years ago
|
||
We are tracking this as part of the WebRTC Platform release for Fx 32 (which aligns with v2.0). Please see the following if you want more details: https://wiki.mozilla.org/Media/WebRTC#Sprint:_mozilla32_WebRTC_Platform
Assignee | ||
Comment 6•11 years ago
|
||
I'll file followup bugs for the stuff I split off from this one, which was a bit of a grab-bag of wants.
Assignee: pkerr → rjesup
Summary: Add metric tests for congestion control, error correction, compositor rendering → Add metrics/telemetry for errors and error recovery
Assignee | ||
Comment 7•11 years ago
|
||
Assignee | ||
Comment 8•11 years ago
|
||
Upstream patch
Assignee | ||
Updated•11 years ago
|
Attachment #8436020 -
Attachment is obsolete: true
Assignee | ||
Comment 9•11 years ago
|
||
gecko/telemetry patch
Assignee | ||
Updated•11 years ago
|
Attachment #8436242 -
Flags: review?(jib)
Assignee | ||
Updated•11 years ago
|
Attachment #8436243 -
Flags: review?(jib)
Comment 10•11 years ago
|
||
Comment on attachment 8436242 [details] [diff] [review]
Monitor decoder error state to enable recording errors and error recovery times
Review of attachment 8436242 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm. Some comments/questions.
::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/receiver.cc
@@ +60,3 @@
> } else {
> state_ = kPassive;
> + receiveState_ = kReceiveStateWaitingKey; // XXX Initial?
Maybe remove comment or clarify?
I find uncertain-sounding comments ambiguous, for instance, here I can't tell if you're uncertain or if you're soliciting opinions with this comment. You're inventing these states elsewhere, so I suspect the former, which I think means you get to pick. If not, please elaborate.
@@ +391,5 @@
> }
> // Update the dual receiver state.
> if (frame.Complete() && frame.FrameType() == kVideoFrameKey) {
> UpdateState(kPassive);
> + return;
If frame.MissingFrame() then this now sets state to kPassive whereas before it set it to kReceiving. Intentional?
::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/receiver.h
@@ +74,5 @@
> // Dual decoder.
> bool DualDecoderCaughtUp(VCMEncodedFrame* dual_frame,
> VCMReceiver& dual_receiver) const;
> VCMReceiverState State() const;
> + VideoReceiveState ReceiveState() const;
Naming is pretty dense here. A receiver with two "receiver-states". Can we come up with better names?
::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/video_receiver.cc
@@ +432,3 @@
> } else {
> _dualReceiver.Reset();
> + SetReceiveState(kReceiveStateWaitingKey); // XXX is this correct?
I don't know.
::: media/webrtc/trunk/webrtc/video/receive_statistics_proxy.cc
@@ +67,5 @@
>
> +void ReceiveStatisticsProxy::ReceiveStateChange(const int video_channel,
> + VideoReceiveState state) {
> + CriticalSectionScoped cs(lock_.get());
> + printf("*** New state for %d: %d (was %d)\n", video_channel, state, receive_state_);
remove printf
Updated•11 years ago
|
Attachment #8436242 -
Flags: review?(jib) → review+
Comment 11•11 years ago
|
||
Comment on attachment 8436243 [details] [diff] [review]
Log and send to telemetry video errors and error recovery time in webrtc
Review of attachment 8436243 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm. Disclaimer: I wasn't able to deduce a state transition diagram for the new receive state. A comment somewhere on that might be helpful.
::: media/webrtc/signaling/src/media-conduit/VideoConduit.cpp
@@ +112,5 @@
> + if (mStatsProxy) {
> + mStatsProxy->EndOfCallStats();
> + }
> + mPtrRTP->DeregisterReceiveChannelRtcpStatisticsCallback(mChannel, mStatsProxy);
> + mPtrRTP->DeregisterReceiveChannelRtpStatisticsCallback(mChannel, mStatsProxy);
Do you want to call these when mStatsProxy == nullptr? If not, perhaps put this inside the if above.
@@ +395,5 @@
> mPtrViEBase->LastError());
> return kMediaConduitRTCPStatusError;
> }
> +
> + mStatsProxy = new WebrtcReceiveStats(0, // XXX local_ssrc?
Odd comment. Can we find out more?
@@ +404,5 @@
> + if(mPtrRTP->RegisterReceiveChannelRtpStatisticsCallback(mChannel, mStatsProxy) != 0)
> + {
> + CSFLogError(logTag, "%s RegisterReceiveChannelRtpStatistics Failed %d ", __FUNCTION__,
> + mPtrViEBase->LastError());
> + return kMediaConduitRtpStatsError;
Looks like these can't fail, so these error definitions are perhaps overkill?
@@ +1133,5 @@
> }
>
> int WebrtcVideoConduit::SendRTCPPacket(int channel, const void* data, int len)
> {
> + CSFLogDebug(logTag, "%s : 0x%02x %d - channel %d , len %d, transmitting: %d", __FUNCTION__,
Why dump the first byte of data? A comment explaining the intent would be helpful.
@@ +1381,5 @@
>
> +void WebrtcReceiveStats::ReceiveStateChange(const int aChannel,
> + webrtc::VideoReceiveState aState)
> +{
> + printf("*** New state for %d: %d (was %d)\n", aChannel, aState, mReceiveState);
remove printf
@@ +1386,5 @@
> +#ifdef MOZILLA_INTERNAL_API
> + if (mFirstDecodeTime.IsNull()) {
> + mFirstDecodeTime = TimeStamp::Now();
> + }
> + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) {
switch?
@@ +1388,5 @@
> + mFirstDecodeTime = TimeStamp::Now();
> + }
> + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) {
> + // in a normal state
> + if (aState != webrtc::kReceiveStateNormal && aState != webrtc::kReceiveStateInitial) { // paranoia
Isn't receiving a state-change of kReceiveStateInitial at least grounds for an assert?
Since I think you invent these states in the other patch, I'd prefer to see a firmer state transition model, perhaps with assertions instead of silent-running paranoia if()s.
@@ +1396,5 @@
> + }
> + }
> + } else {
> + // not in a normal state
> + if (aState == webrtc::kReceiveStateNormal || aState == webrtc::kReceiveStateInitial) { // paranoia
Same question here.
@@ +1402,5 @@
> + if (mReceiveState == webrtc::kReceiveStatePreemptiveNACK) {
> + mRecoveredBeforeLoss++;
> + } else if (!mReceiveFailureTime.IsNull()) { // safety
> + TimeDuration timeDelta = TimeStamp::Now() - mReceiveFailureTime;
> + CSFLogError(logTag, "Video error duration: %u ms", (uint32_t) timeDelta.ToMilliseconds());
Maybe use static_cast<> here or %f ?
@@ +1403,5 @@
> + mRecoveredBeforeLoss++;
> + } else if (!mReceiveFailureTime.IsNull()) { // safety
> + TimeDuration timeDelta = TimeStamp::Now() - mReceiveFailureTime;
> + CSFLogError(logTag, "Video error duration: %u ms", (uint32_t) timeDelta.ToMilliseconds());
> + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_ERROR_RECOVERY_MS, timeDelta.ToMilliseconds());
Odd that this doesn't produce a warning. ToMilliseconds() returns double.
Perhaps consider AccumulateTimeDelta or a cast?
@@ +1424,5 @@
> + if (callDelta.ToSeconds() != 0) {
> + CSFLogError(logTag, "Video recovery before error per min %f",
> + (float) mRecoveredBeforeLoss/(callDelta.ToSeconds()/60));
> + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_RECOVERY_BEFORE_ERROR_PER_MIN,
> + (float) mRecoveredBeforeLoss/(callDelta.ToSeconds()/60));
I think a stack temporary to reduce chance of repetition bugs would be nice here.
@@ +1428,5 @@
> + (float) mRecoveredBeforeLoss/(callDelta.ToSeconds()/60));
> + CSFLogError(logTag, "Video recovery after error per min %f",
> + (float) mRecoveredLosses/(callDelta.ToSeconds()/60));
> + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_RECOVERY_AFTER_ERROR_PER_MIN,
> + (float) mRecoveredLosses/(callDelta.ToSeconds()/60));
same here
@@ +1432,5 @@
> + (float) mRecoveredLosses/(callDelta.ToSeconds()/60));
> + CSFLogError(logTag, "Video error time percentage %f%%",
> + (float) mTotalLossTime.ToSeconds()*100/(callDelta.ToSeconds()));
> + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_ERROR_TIME_PERMILLE,
> + (float) mTotalLossTime.ToSeconds()*1000/(callDelta.ToSeconds()));
and here
::: toolkit/components/telemetry/Histograms.json
@@ +5290,5 @@
> + "high": 1000,
> + "n_buckets": 200,
> + "description": "Number of losses recovered after error per min"
> + },
> + "WEBRTC_VIDEO_ERROR_TIME_PERMILLE": {
Description says this is time in recovery, so maybe include "RECOVERY" somewhere in the name?
Attachment #8436243 -
Flags: review?(jib) → review+
Assignee | ||
Comment 12•11 years ago
|
||
removed some no-longer-needed bits, carry forward r=jib
Assignee | ||
Updated•11 years ago
|
Attachment #8436242 -
Attachment is obsolete: true
Assignee | ||
Comment 13•11 years ago
|
||
Replaces gecko changes for receive state monitoring with CodecStatistics object use
Assignee | ||
Updated•11 years ago
|
Attachment #8436410 -
Flags: review+
Assignee | ||
Updated•11 years ago
|
Attachment #8436411 -
Flags: review?(jib)
Comment 14•11 years ago
|
||
Comment on attachment 8436411 [details] [diff] [review]
Add receive state monitoring to webrtc CodecStatistics
Review of attachment 8436411 [details] [diff] [review]:
-----------------------------------------------------------------
r=me. Nits optional.
::: media/webrtc/signaling/src/media-conduit/CodecStatistics.cpp
@@ +92,5 @@
> + * WaitingKey -> PreemptiveNACK
> + * DecodingWithErrors -> PreemptiveNACK
> + */
> +
> + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) {
Two states to keep track of in lots of if-statements. Maybe:
switch(mReceiveState) {
case kReceiveStateInitial:
case kReceiveStateNormal:
// in a normal state
break;
default:
// no longer in a normal state
}
to break things up a bit? Also you're using namespace webrtc here, so you don't need webrtc::
@@ +94,5 @@
> + */
> +
> + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) {
> + // in a normal state
> + if (aState != webrtc::kReceiveStateNormal && aState != webrtc::kReceiveStateInitial) { // paranoia
Again not a fan of silent paranoia if() statements, but I like the transitions comment. Would prefer asserts to back it up though.
@@ +102,5 @@
> + }
> + }
> + } else {
> + // not in a normal state
> + if (aState == webrtc::kReceiveStateNormal || aState == webrtc::kReceiveStateInitial) { // paranoia
webrtc:: and paranoia
@@ +112,5 @@
> + TimeDuration timeDelta = TimeStamp::Now() - mReceiveFailureTime;
> + CSFLogError(logTag, "Video error duration: %u ms",
> + static_cast<uint32_t>(timeDelta.ToMilliseconds()));
> + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_ERROR_RECOVERY_MS,
> + static_cast<uint32_t>(timeDelta.ToMilliseconds()));
Use AccumulateTimeDelta here.
::: toolkit/components/telemetry/Histograms.json
@@ +5330,5 @@
> + "high": 10000,
> + "n_buckets": 500,
> + "description": "Time to recover from a video error in ms"
> + },
> + "WEBRTC_VIDEO_RECOVERY_BEFORE_ERROR_PER_MIN": {
This is also in the ReceiveStatisticsProxy patch in this bug, but I understand from IRC that this replaces that patch, so all good.
Attachment #8436411 -
Flags: review?(jib) → review+
Assignee | ||
Comment 15•11 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/460666c9b930
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f82d42806cb
Blocks: webrtc_upstream_bugs
Comment 16•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/460666c9b930
https://hg.mozilla.org/mozilla-central/rev/0f82d42806cb
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
No longer blocks: webrtc_upstream_bugs
You need to log in
before you can comment on or make changes to this bug.
Description
•