Closed Bug 1207416 Opened 9 years ago Closed 9 years ago

Telemetry for time spent in different load states

Categories

(Core :: WebRTC, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: jesup, Assigned: jesup)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

We'd like to collect data on how much time in calls is spent in each load state.

We need to be a little careful that very short "test" calls don't bias the data too much.

I think the best way to report this is by the percentage spent in each state per call, with very short calls filtered out.
'short' call cutoff is set to 20 seconds
Attachment #8664604 - Flags: review?(gpascutto)
Comment on attachment 8664604 [details] [diff] [review]
Add Telemetry for time spent in different Load states in WebRTC

Review of attachment 8664604 [details] [diff] [review]:
-----------------------------------------------------------------

r- mostly because I want to see the answers to some questions.

More fundamentally, I'm concerned about the 20 seconds filtering. It's reasonable to assume that call duration will be linked to the user experience, and the user experience might be strongly linked to the load state. So by filtering short calls, you might be strongly biasing this data. More concretely, calls on overloaded systems might be dropped quickly, and we could erroneously conclude it doesn't happen a lot.

::: dom/media/systemservices/LoadManager.cpp
@@ +193,5 @@
>      if (mLoadMonitor) {
> +      // Record how long we spent in the final state for later Telemetry or display
> +      TimeStamp now = TimeStamp::Now();
> +      mTimeInState[mCurrentState] += (now - mLastStateChange).ToMilliseconds();
> +      mLastStateChange = now;

Does this statement have any effect?

@@ +196,5 @@
> +      mTimeInState[mCurrentState] += (now - mLastStateChange).ToMilliseconds();
> +      mLastStateChange = now;
> +
> +      float total = 0;
> +      for (size_t i = 0; i < sizeof(mTimeInState)/sizeof(mTimeInState[0]); i++) {

mfbt/ArrayUtils.h has MOZ_ARRAY_LENGTH

@@ +207,5 @@
> +                              (uint32_t) (mTimeInState[webrtc::CPULoadState::kLoadRelaxed]/total * 100));
> +        Telemetry::Accumulate(Telemetry::WEBRTC_LOAD_STATE_NORMAL,
> +                              (uint32_t) (mTimeInState[webrtc::CPULoadState::kLoadNormal]/total * 100));
> +        Telemetry::Accumulate(Telemetry::WEBRTC_LOAD_STATE_STRESSED,
> +                              (uint32_t) (mTimeInState[webrtc::CPULoadState::kLoadStressed]/total * 100));

I forgot the exact rules when we send Telemetry, but does gathering these values in Shutdown (because that's realistically where this happens now) leave any opportunity for them to actually ever get sent?

::: dom/media/systemservices/LoadManager.h
@@ +57,5 @@
>      // This protects access to the mObservers list, the current state, and
>      // pretty much all the other members (below).
>      Mutex mLock;
>      nsTArray<webrtc::CPULoadStateObserver*> mObservers;
>      webrtc::CPULoadState mCurrentState;

Maybe this should now be called mLastState for consistency.

@@ +60,5 @@
>      nsTArray<webrtc::CPULoadStateObserver*> mObservers;
>      webrtc::CPULoadState mCurrentState;
> +    TimeStamp mLastStateChange;
> +    // update if states in CPULoadState change
> +    float mTimeInState[static_cast<int>(webrtc::CPULoadState::kLoadStressed) + 1];

I'd generally strongly favor extending the enum with a kLoadFinal or kLoadInvalid value, and putting that here. If we ever add more load states you're going to have to remember this code.
Attachment #8664604 - Flags: review?(gpascutto) → review-
(In reply to Gian-Carlo Pascutto [:gcp] from comment #2)
> Comment on attachment 8664604 [details] [diff] [review]
> Add Telemetry for time spent in different Load states in WebRTC
> 
> Review of attachment 8664604 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> r- mostly because I want to see the answers to some questions.
> 
> More fundamentally, I'm concerned about the 20 seconds filtering. It's
> reasonable to assume that call duration will be linked to the user
> experience, and the user experience might be strongly linked to the load
> state. So by filtering short calls, you might be strongly biasing this data.
> More concretely, calls on overloaded systems might be dropped quickly, and
> we could erroneously conclude it doesn't happen a lot.

I could filter out say <5 second calls, and put 5-20 second calls in a different set of telemetry items.
Note that our Nightly and Aurora channels see a lot of test traffic, the call duration graphs are rather different than beta (or release would be).

> 
> ::: dom/media/systemservices/LoadManager.cpp
> @@ +193,5 @@
> >      if (mLoadMonitor) {
> > +      // Record how long we spent in the final state for later Telemetry or display
> > +      TimeStamp now = TimeStamp::Now();
> > +      mTimeInState[mCurrentState] += (now - mLastStateChange).ToMilliseconds();
> > +      mLastStateChange = now;
> 
> Does this statement have any effect?

Probably not; I was being cautious.

> @@ +196,5 @@
> > +      mTimeInState[mCurrentState] += (now - mLastStateChange).ToMilliseconds();
> > +      mLastStateChange = now;
> > +
> > +      float total = 0;
> > +      for (size_t i = 0; i < sizeof(mTimeInState)/sizeof(mTimeInState[0]); i++) {
> 
> mfbt/ArrayUtils.h has MOZ_ARRAY_LENGTH

Cool; I used to sprinkle ARRAY_SIZE() macros in my code pre-mozilla; I just was being lazy and didn't look it up.

> 
> @@ +207,5 @@
> > +                              (uint32_t) (mTimeInState[webrtc::CPULoadState::kLoadRelaxed]/total * 100));
> > +        Telemetry::Accumulate(Telemetry::WEBRTC_LOAD_STATE_NORMAL,
> > +                              (uint32_t) (mTimeInState[webrtc::CPULoadState::kLoadNormal]/total * 100));
> > +        Telemetry::Accumulate(Telemetry::WEBRTC_LOAD_STATE_STRESSED,
> > +                              (uint32_t) (mTimeInState[webrtc::CPULoadState::kLoadStressed]/total * 100));
> 
> I forgot the exact rules when we send Telemetry, but does gathering these
> values in Shutdown (because that's realistically where this happens now)
> leave any opportunity for them to actually ever get sent?

We run this whenever the last PeerConnection shuts down (verified this), so unless you hit the close box in-call it will get reported (and perhaps even then depending on when the peerconnection gets reaped).

> 
> ::: dom/media/systemservices/LoadManager.h
> @@ +57,5 @@
> >      // This protects access to the mObservers list, the current state, and
> >      // pretty much all the other members (below).
> >      Mutex mLock;
> >      nsTArray<webrtc::CPULoadStateObserver*> mObservers;
> >      webrtc::CPULoadState mCurrentState;
> 
> Maybe this should now be called mLastState for consistency.

Well, it is the CurrentState when it's running, so the name is correct.

> 
> @@ +60,5 @@
> >      nsTArray<webrtc::CPULoadStateObserver*> mObservers;
> >      webrtc::CPULoadState mCurrentState;
> > +    TimeStamp mLastStateChange;
> > +    // update if states in CPULoadState change
> > +    float mTimeInState[static_cast<int>(webrtc::CPULoadState::kLoadStressed) + 1];
> 
> I'd generally strongly favor extending the enum with a kLoadFinal or
> kLoadInvalid value, and putting that here. If we ever add more load states
> you're going to have to remember this code.

Yeah; I didn't want to touch the common upstream code, but probably it makes sense.  Too bad there's no ENUM_MAX_VALUE() macro...
interdiffs to follow
Attachment #8665003 - Flags: review?(gpascutto)
Attachment #8664604 - Attachment is obsolete: true
Attached patch interdiffSplinter Review
Comment on attachment 8665003 [details] [diff] [review]
Add Telemetry for time spent in different Load states in WebRTC

Review of attachment 8665003 [details] [diff] [review]:
-----------------------------------------------------------------

r+ but please change the enum.

::: dom/media/systemservices/LoadManager.h
@@ +59,5 @@
>      Mutex mLock;
>      nsTArray<webrtc::CPULoadStateObserver*> mObservers;
>      webrtc::CPULoadState mCurrentState;
> +    TimeStamp mLastStateChange;
> +    float mTimeInState[static_cast<int>(webrtc::kLoadLast) + 1];

You can get rid of the +1, see the pattern below.

::: media/webrtc/trunk/webrtc/common_types.h
@@ +793,2 @@
>  };
> +const CPULoadState kLoadLast = kLoadStressed;

Hmm?! The canonical way to do this is:

enum CPULoadState {
  kLoadRelaxed = 0,
  kLoadNormal,	
  kLoadStressed,
  kLoadLast,
};

Your pattern still breaks if you don't update it when states gets added (though the distance between use and declaration is lessened - but it's still unneeded).

::: toolkit/components/telemetry/Histograms.json
@@ +6401,5 @@
> +    "expires_in_version": "never",
> +    "kind": "linear",
> +    "high": "100",
> +    "n_buckets": "25",
> +    "description": "Percentage of time spent in the Relaxed load state in calls over 20 seconds."

It's 30 seconds now. (Same mistake repeats several times)
Attachment #8665003 - Flags: review?(gpascutto) → review+
> ::: media/webrtc/trunk/webrtc/common_types.h
> @@ +793,2 @@
> >  };
> > +const CPULoadState kLoadLast = kLoadStressed;
> 
> Hmm?! The canonical way to do this is:
> 
> enum CPULoadState {
>   kLoadRelaxed = 0,
>   kLoadNormal,	
>   kLoadStressed,
>   kLoadLast,
> };
> 
> Your pattern still breaks if you don't update it when states gets added
> (though the distance between use and declaration is lessened - but it's
> still unneeded).

Yeah; I shied away from that pattern because none of the other webrtc.org enums (at least common ones) use that pattern, and I want to upstream this tiny change.  But it's not a big deal, I'll change it.

> > +    "description": "Percentage of time spent in the Relaxed load state in calls over 20 seconds."
> 
> It's 30 seconds now. (Same mistake repeats several times)

Right
https://hg.mozilla.org/mozilla-central/rev/8f459c4b475d
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
No longer depends on: webrtc-telemetry
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: