Closed
Bug 970742
Opened 10 years ago
Closed 10 years ago
Add metrics/telemetry for errors and error recovery
Categories
(Core :: WebRTC, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla32
People
(Reporter: mreavy, Assigned: jesup)
References
(Blocks 1 open bug)
Details
(Whiteboard: [p=3, 1.5:p1, ft:webrtc])
Attachments
(3 files, 2 obsolete files)
13.79 KB,
patch
|
jib
:
review+
|
Details | Diff | Splinter Review |
30.01 KB,
patch
|
jesup
:
review+
|
Details | Diff | Splinter Review |
10.78 KB,
patch
|
jib
:
review+
|
Details | Diff | Splinter Review |
We want to measure how effective our congestion control, error correction, and compositor rendering (Does the compositor render every frame it's given) is.
Assignee | ||
Comment 1•10 years ago
|
||
Congestion control measurements: (mostly only useful in Telemetry and about:webrtc, not Talos unless/until we have real network simulation) * Time from call start until within (say) 10% of max stable bitrate (not max instantaneous). * percentage of time in the different modes of congestion control (increase, stable, drain) * Average and max-stable RTT minus minimum stable RTT (basically try to subtract out "real" RTT to determine how much buffering is occurring in the network on top of that).
Assignee | ||
Comment 2•10 years ago
|
||
Error Correction: what modes it chooses for error recovery (rexmit, full-frame PLI, older-reference-frame). How long it took to recover in frames/ms (histogram). More/mostly useful in about:webrtc and telemetry than Talos.
Assignee | ||
Comment 3•10 years ago
|
||
Assigning to pkerr, but this is tentative - jib or I might end up doing it
Assignee: nobody → paulrkerr
Updated•10 years ago
|
Priority: -- → P1
Whiteboard: [p=5, 1.5:p1, ft:webrtc]
Target Milestone: --- → mozilla32
Reporter | ||
Updated•10 years ago
|
Priority: P1 → P2
Reporter | ||
Updated•10 years ago
|
Whiteboard: [p=5, 1.5:p1, ft:webrtc] → [p=3, 1.5:p1, ft:webrtc]
Comment 4•10 years ago
|
||
This should be applicable to B2G as well. Isn't this going into the v2.0 release? If so, I think we should start tracking this.
Reporter | ||
Comment 5•10 years ago
|
||
We are tracking this as part of the WebRTC Platform release for Fx 32 (which aligns with v2.0). Please see the following if you want more details: https://wiki.mozilla.org/Media/WebRTC#Sprint:_mozilla32_WebRTC_Platform
Assignee | ||
Comment 6•10 years ago
|
||
I'll file followup bugs for the stuff I split off from this one, which was a bit of a grab-bag of wants.
Assignee: pkerr → rjesup
Summary: Add metric tests for congestion control, error correction, compositor rendering → Add metrics/telemetry for errors and error recovery
Assignee | ||
Comment 7•10 years ago
|
||
Assignee | ||
Comment 8•10 years ago
|
||
Upstream patch
Assignee | ||
Updated•10 years ago
|
Attachment #8436020 -
Attachment is obsolete: true
Assignee | ||
Comment 9•10 years ago
|
||
gecko/telemetry patch
Assignee | ||
Updated•10 years ago
|
Attachment #8436242 -
Flags: review?(jib)
Assignee | ||
Updated•10 years ago
|
Attachment #8436243 -
Flags: review?(jib)
Comment 10•10 years ago
|
||
Comment on attachment 8436242 [details] [diff] [review] Monitor decoder error state to enable recording errors and error recovery times Review of attachment 8436242 [details] [diff] [review]: ----------------------------------------------------------------- lgtm. Some comments/questions. ::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/receiver.cc @@ +60,3 @@ > } else { > state_ = kPassive; > + receiveState_ = kReceiveStateWaitingKey; // XXX Initial? Maybe remove comment or clarify? I find uncertain-sounding comments ambiguous, for instance, here I can't tell if you're uncertain or if you're soliciting opinions with this comment. You're inventing these states elsewhere, so I suspect the former, which I think means you get to pick. If not, please elaborate. @@ +391,5 @@ > } > // Update the dual receiver state. > if (frame.Complete() && frame.FrameType() == kVideoFrameKey) { > UpdateState(kPassive); > + return; If frame.MissingFrame() then this now sets state to kPassive whereas before it set it to kReceiving. Intentional? ::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/receiver.h @@ +74,5 @@ > // Dual decoder. > bool DualDecoderCaughtUp(VCMEncodedFrame* dual_frame, > VCMReceiver& dual_receiver) const; > VCMReceiverState State() const; > + VideoReceiveState ReceiveState() const; Naming is pretty dense here. A receiver with two "receiver-states". Can we come up with better names? ::: media/webrtc/trunk/webrtc/modules/video_coding/main/source/video_receiver.cc @@ +432,3 @@ > } else { > _dualReceiver.Reset(); > + SetReceiveState(kReceiveStateWaitingKey); // XXX is this correct? I don't know. ::: media/webrtc/trunk/webrtc/video/receive_statistics_proxy.cc @@ +67,5 @@ > > +void ReceiveStatisticsProxy::ReceiveStateChange(const int video_channel, > + VideoReceiveState state) { > + CriticalSectionScoped cs(lock_.get()); > + printf("*** New state for %d: %d (was %d)\n", video_channel, state, receive_state_); remove printf
Updated•10 years ago
|
Attachment #8436242 -
Flags: review?(jib) → review+
Comment 11•10 years ago
|
||
Comment on attachment 8436243 [details] [diff] [review] Log and send to telemetry video errors and error recovery time in webrtc Review of attachment 8436243 [details] [diff] [review]: ----------------------------------------------------------------- lgtm. Disclaimer: I wasn't able to deduce a state transition diagram for the new receive state. A comment somewhere on that might be helpful. ::: media/webrtc/signaling/src/media-conduit/VideoConduit.cpp @@ +112,5 @@ > + if (mStatsProxy) { > + mStatsProxy->EndOfCallStats(); > + } > + mPtrRTP->DeregisterReceiveChannelRtcpStatisticsCallback(mChannel, mStatsProxy); > + mPtrRTP->DeregisterReceiveChannelRtpStatisticsCallback(mChannel, mStatsProxy); Do you want to call these when mStatsProxy == nullptr? If not, perhaps put this inside the if above. @@ +395,5 @@ > mPtrViEBase->LastError()); > return kMediaConduitRTCPStatusError; > } > + > + mStatsProxy = new WebrtcReceiveStats(0, // XXX local_ssrc? Odd comment. Can we find out more? @@ +404,5 @@ > + if(mPtrRTP->RegisterReceiveChannelRtpStatisticsCallback(mChannel, mStatsProxy) != 0) > + { > + CSFLogError(logTag, "%s RegisterReceiveChannelRtpStatistics Failed %d ", __FUNCTION__, > + mPtrViEBase->LastError()); > + return kMediaConduitRtpStatsError; Looks like these can't fail, so these error definitions are perhaps overkill? @@ +1133,5 @@ > } > > int WebrtcVideoConduit::SendRTCPPacket(int channel, const void* data, int len) > { > + CSFLogDebug(logTag, "%s : 0x%02x %d - channel %d , len %d, transmitting: %d", __FUNCTION__, Why dump the first byte of data? A comment explaining the intent would be helpful. @@ +1381,5 @@ > > +void WebrtcReceiveStats::ReceiveStateChange(const int aChannel, > + webrtc::VideoReceiveState aState) > +{ > + printf("*** New state for %d: %d (was %d)\n", aChannel, aState, mReceiveState); remove printf @@ +1386,5 @@ > +#ifdef MOZILLA_INTERNAL_API > + if (mFirstDecodeTime.IsNull()) { > + mFirstDecodeTime = TimeStamp::Now(); > + } > + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) { switch? @@ +1388,5 @@ > + mFirstDecodeTime = TimeStamp::Now(); > + } > + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) { > + // in a normal state > + if (aState != webrtc::kReceiveStateNormal && aState != webrtc::kReceiveStateInitial) { // paranoia Isn't receiving a state-change of kReceiveStateInitial at least grounds for an assert? Since I think you invent these states in the other patch, I'd prefer to see a firmer state transition model, perhaps with assertions instead of silent-running paranoia if()s. @@ +1396,5 @@ > + } > + } > + } else { > + // not in a normal state > + if (aState == webrtc::kReceiveStateNormal || aState == webrtc::kReceiveStateInitial) { // paranoia Same question here. @@ +1402,5 @@ > + if (mReceiveState == webrtc::kReceiveStatePreemptiveNACK) { > + mRecoveredBeforeLoss++; > + } else if (!mReceiveFailureTime.IsNull()) { // safety > + TimeDuration timeDelta = TimeStamp::Now() - mReceiveFailureTime; > + CSFLogError(logTag, "Video error duration: %u ms", (uint32_t) timeDelta.ToMilliseconds()); Maybe use static_cast<> here or %f ? @@ +1403,5 @@ > + mRecoveredBeforeLoss++; > + } else if (!mReceiveFailureTime.IsNull()) { // safety > + TimeDuration timeDelta = TimeStamp::Now() - mReceiveFailureTime; > + CSFLogError(logTag, "Video error duration: %u ms", (uint32_t) timeDelta.ToMilliseconds()); > + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_ERROR_RECOVERY_MS, timeDelta.ToMilliseconds()); Odd that this doesn't produce a warning. ToMilliseconds() returns double. Perhaps consider AccumulateTimeDelta or a cast? @@ +1424,5 @@ > + if (callDelta.ToSeconds() != 0) { > + CSFLogError(logTag, "Video recovery before error per min %f", > + (float) mRecoveredBeforeLoss/(callDelta.ToSeconds()/60)); > + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_RECOVERY_BEFORE_ERROR_PER_MIN, > + (float) mRecoveredBeforeLoss/(callDelta.ToSeconds()/60)); I think a stack temporary to reduce chance of repetition bugs would be nice here. @@ +1428,5 @@ > + (float) mRecoveredBeforeLoss/(callDelta.ToSeconds()/60)); > + CSFLogError(logTag, "Video recovery after error per min %f", > + (float) mRecoveredLosses/(callDelta.ToSeconds()/60)); > + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_RECOVERY_AFTER_ERROR_PER_MIN, > + (float) mRecoveredLosses/(callDelta.ToSeconds()/60)); same here @@ +1432,5 @@ > + (float) mRecoveredLosses/(callDelta.ToSeconds()/60)); > + CSFLogError(logTag, "Video error time percentage %f%%", > + (float) mTotalLossTime.ToSeconds()*100/(callDelta.ToSeconds())); > + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_ERROR_TIME_PERMILLE, > + (float) mTotalLossTime.ToSeconds()*1000/(callDelta.ToSeconds())); and here ::: toolkit/components/telemetry/Histograms.json @@ +5290,5 @@ > + "high": 1000, > + "n_buckets": 200, > + "description": "Number of losses recovered after error per min" > + }, > + "WEBRTC_VIDEO_ERROR_TIME_PERMILLE": { Description says this is time in recovery, so maybe include "RECOVERY" somewhere in the name?
Attachment #8436243 -
Flags: review?(jib) → review+
Assignee | ||
Comment 12•10 years ago
|
||
removed some no-longer-needed bits, carry forward r=jib
Assignee | ||
Updated•10 years ago
|
Attachment #8436242 -
Attachment is obsolete: true
Assignee | ||
Comment 13•10 years ago
|
||
Replaces gecko changes for receive state monitoring with CodecStatistics object use
Assignee | ||
Updated•10 years ago
|
Attachment #8436410 -
Flags: review+
Assignee | ||
Updated•10 years ago
|
Attachment #8436411 -
Flags: review?(jib)
Comment 14•10 years ago
|
||
Comment on attachment 8436411 [details] [diff] [review] Add receive state monitoring to webrtc CodecStatistics Review of attachment 8436411 [details] [diff] [review]: ----------------------------------------------------------------- r=me. Nits optional. ::: media/webrtc/signaling/src/media-conduit/CodecStatistics.cpp @@ +92,5 @@ > + * WaitingKey -> PreemptiveNACK > + * DecodingWithErrors -> PreemptiveNACK > + */ > + > + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) { Two states to keep track of in lots of if-statements. Maybe: switch(mReceiveState) { case kReceiveStateInitial: case kReceiveStateNormal: // in a normal state break; default: // no longer in a normal state } to break things up a bit? Also you're using namespace webrtc here, so you don't need webrtc:: @@ +94,5 @@ > + */ > + > + if (mReceiveState == webrtc::kReceiveStateNormal || mReceiveState == webrtc::kReceiveStateInitial) { > + // in a normal state > + if (aState != webrtc::kReceiveStateNormal && aState != webrtc::kReceiveStateInitial) { // paranoia Again not a fan of silent paranoia if() statements, but I like the transitions comment. Would prefer asserts to back it up though. @@ +102,5 @@ > + } > + } > + } else { > + // not in a normal state > + if (aState == webrtc::kReceiveStateNormal || aState == webrtc::kReceiveStateInitial) { // paranoia webrtc:: and paranoia @@ +112,5 @@ > + TimeDuration timeDelta = TimeStamp::Now() - mReceiveFailureTime; > + CSFLogError(logTag, "Video error duration: %u ms", > + static_cast<uint32_t>(timeDelta.ToMilliseconds())); > + Telemetry::Accumulate(Telemetry::WEBRTC_VIDEO_ERROR_RECOVERY_MS, > + static_cast<uint32_t>(timeDelta.ToMilliseconds())); Use AccumulateTimeDelta here. ::: toolkit/components/telemetry/Histograms.json @@ +5330,5 @@ > + "high": 10000, > + "n_buckets": 500, > + "description": "Time to recover from a video error in ms" > + }, > + "WEBRTC_VIDEO_RECOVERY_BEFORE_ERROR_PER_MIN": { This is also in the ReceiveStatisticsProxy patch in this bug, but I understand from IRC that this replaces that patch, so all good.
Attachment #8436411 -
Flags: review?(jib) → review+
Assignee | ||
Comment 15•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/460666c9b930 https://hg.mozilla.org/integration/mozilla-inbound/rev/0f82d42806cb
Blocks: webrtc_upstream_bugs
Comment 16•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/460666c9b930 https://hg.mozilla.org/mozilla-central/rev/0f82d42806cb
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
No longer blocks: webrtc_upstream_bugs
You need to log in
before you can comment on or make changes to this bug.
Description
•