Closed Bug 1440673 Opened 6 years ago Closed 6 years ago

Add the ability to report a summary of Telemetry events

Categories

(Toolkit :: Telemetry, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
mozilla61
Tracking Status
firefox61 --- fixed

People

(Reporter: mreid, Assigned: chutten)

References

Details

User Story

As a feature owner, I want an easy way to measure user engagement with various parts of the feature so that I know where to focus effort and evaluate risk.

Attachments

(12 files)

59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
59 bytes, text/x-review-board-request
Dexter
: review+
Details
Currently Telemetry supports instrumenting data collection using events which are reported in the "main" ping. These events can be used to easily measure engagement with various parts of the browser.

One drawback is that they quickly add up in terms of storage and transmission cost.

Instead of event data reporting being simply "on" or "off", we could introduce a "summary" option.

This would report scalar *counts* of events, rather than the full sequence of events.

Event data would be recorded as usual, but at report time, the array of event objects would be digested to the form of a keyed scalar for "event counts", with the key being
"<event category>.<event method>.<event object>"
and the value being the count of events of this type.​

So if event reporting is "on", we send the full sequence. If it is "off", we send no event data. If it is "summary", we send the scalar counts of events only.

This would allow efficiently answering questions like:
  What is the MAU and DAU for any given feature that is instrumented with events?
  What fraction of clients use UI feature X
  What is the level of engagement with feature X
(In reply to Mark Reid [:mreid] from comment #0)
> Event data would be recorded as usual, but at report time, [...]

Quick note:
This would have memory impact, we should probably not record the events into storage.
Priority: -- → P2
We need some more time to go through the client design implications.
I'll take the design aspect this iteration.
Assignee: nobody → chutten
Status: NEW → ASSIGNED
Priority: P2 → P1
Storage
-------
The way I see it, we have two options for storage:

Option 1 - Scalar taxonomy
We reserve a portion of the scalar namespace (say telemetry.event.counts.*) for "category_method_object" uint scalars.
Pros:
 - Plain scalars, which have better tooling than keyed scalars
Cons:
 - Bit of a pain to implement.
 - 40-character limit for combined `${category}_${method}_${object}` scalar name means we'll likely be truncating things (30 (category) + 20 (method) + 20 (object) + 2 (underscores) > 40)
 - the typical scalar name taxonomy recommends underscore delimiters which are likely to be used within category, method, and objects: results in unambiguous parsing.
 - '.' is allowed in a category, method, or object but is not permitted in scalar names

Option 2 - Keyed scalar
We define a single keyed uint scalar (say telemetry.event_counts) whose keys are `${category}|${method}|${object}` (or similar, using a non-period non-underscore delimiter).
Pros:
 - We're allowed any characters we want, so unambiguous parsing
 - Cheap to implement
 - 70-character limit for combined key string means we're unlikely to truncate (30 (category) + 20 (method) + 20 (object) + 2 (delimiters) = 72)
Cons:
 - Tooling's not as good (no automated alerts from cerberus+medusa, queries get a wee bit more difficult to write, TMO doesn't nicely display keyed things in its current state (shows only the most popular 4))
 - Limit of 100 keys per subsession means we have a limit of how many events we can summarize

Implementation
--------------
Regardless of storage, we're using scalars to count uses of Telemetry Events. This means we will be instrumenting RecordEvent to increment scalars even if that category's event recording is disabled. I do not think we will want to record the events themselves, even to throw them away later, but simply increment a scalar and move on. This will be release collection and will not expire.

We will want a little bit of introspective Telemetry on top of this as well. If we go with Option 1 we should consider reporting the number of different category+method+object tuples we map to the same scalar name. If we go with Option 2 we should consider reporting the number of category+method+object tuples we failed to summarize due to key exhaustion.

The original request was to control this through a preference. If we choose Option 1, I recommend we include a preference in case we wish to turn this off. There are ~27^70 possible scalars that could be automatically created should this go awry, and upstream resource requirements could expand as a result. If we choose Option 2, the 100-key limit and single scalar limit the impact of runaway use, so I don't think we will need a panic pref.

In terms of ping size impact, both options will introduce long strings and small numbers to the "main" ping. I do not anticipate this ballooning the "main" ping's size sufficiently to register in the budget dashboards or trigger "ping size exceeds limit" discarding conditions.

We will need to test this summarization, likely within existing Telemetry Events gtests and/or xpcshell tests.

We will need to update the Events documentation to include information on this summarization.

Recommendation
--------------
I recommend Option 2. It should prove quicker to implement, and most of its concerns are with tooling we plan on improving this year.

Target
------
Nightly 61 early enough to be uplifted to mid-Beta 60.

Future
------
:gfritzsche, :mreid - Do you have any questions, concerns, comments, clarifications, or criticisms? If you can get me this feedback before the Event Telemetry meeting tomorrow, I can present a plan at that time and start filing implementation bugs.
Flags: needinfo?(mreid)
Flags: needinfo?(gfritzsche)
Jan-Erik, with your fresh perspective, does this design make sense?
Flags: needinfo?(jrediger)
This looks good.
Two things here:
1) We can use an internal flag for recording the keyed scalars, so we can skip the key length limit. This can be "private" as we don't need to use the "public" API in Telemetry.h.
2) How does this affect dynamic/addon events?
3) We need to make sure search events data is not showing up in the aggregator.
4) Related to 2) & 3), we'll need to have conversations if we can show all dynamic events on the TMO dashboard. Are some of them non-public like when running search experiments?
Flags: needinfo?(gfritzsche)
How can i tell which process the event came from?
Option 2 sounds good to me!

Should we consider increasing the max length of keys to 72 characters to avoid possible truncation? (Note I saw Georg's suggestion after I wrote this - that sounds fine too).

Are the planned improvements to tooling around keyed scalars already captured by existing bugs?
Flags: needinfo?(mreid)
Option 2 indeed sounds doable.
I'd go with the slightly increased length limit as well to avoid truncation (given it's 2 characters only).

Do we have a rough idea how quickly we would hit the 100-item limit or is that to be found out once we have the data collected?
Flags: needinfo?(jrediger)
(In reply to Georg Fritzsche (slow to respond) [:gfritzsche] from comment #6)
> 1) We can use an internal flag for recording the keyed scalars, so we can
> skip the key length limit. This can be "private" as we don't need to use the
> "public" API in Telemetry.h.
(In reply to Mark Reid [:mreid] from comment #8)
> Should we consider increasing the max length of keys to 72 characters to
> avoid possible truncation? (Note I saw Georg's suggestion after I wrote this
> - that sounds fine too).
(In reply to Jan-Erik Rediger [:janerik] from comment #9)
> I'd go with the slightly increased length limit as well to avoid truncation
> (given it's 2 characters only).

Okay, okay, I get it! :D

> 2) How does this affect dynamic/addon events?

This treats them identically to static events.

> 3) We need to make sure search events data is not showing up in the
> aggregator.

Javaun's starting an email thread with Legal and BD to see if this is actually necessary.

> 4) Related to 2) & 3), we'll need to have conversations if we can show all
> dynamic events on the TMO dashboard. Are some of them non-public like when
> running search experiments?

You're right. At present Events has no expectation of publication. It would be surprising to suddenly have those event counts public.

(In reply to Georg Fritzsche (slow to respond) [:gfritzsche] from comment #7)
> How can i tell which process the event came from?

At present, you can't. This might not be a problem for most uses, but we can prepend the process and another delimiter to the key (since we'll be extending the key length anyway) to make it possible to tell.

(In reply to Mark Reid [:mreid] from comment #8)
> Are the planned improvements to tooling around keyed scalars already
> captured by existing bugs?

TMO's improvements are still in planning documents. Bugs will come later.

(In reply to Jan-Erik Rediger [:janerik] from comment #9)
> Do we have a rough idea how quickly we would hit the 100-item limit or is
> that to be found out once we have the data collected?

It'll be easier to find out post-facto, especially since we don't have a whole lot of static events coming in. I mean, I've counted them on the existing events dataset (https://sql.telemetry.mozilla.org/queries/51963/source) and the max there is under twenty... but at present we're hardly using Events at all.

I expect the use of Events will grow, and I'll be able to assume that this summary thing will be one of the reasons why :)

---

Thank you, all, for your prompt and helpful input!

I will proceed with Option 2 with three addenda:
1) Extend the key length for either all keyed scalars or just this keyed scalar to be just big enough to fit everything including the delimiters. I'd prefer to keep this added capability to just the events summary, but we'll see what's clearest when we get into the bits.
2) The key will now be `${process}|${category}|${method}|${object}` to identify from which process the event came. This is definitely not a concern for existing static events (search), but there's nothing saying it couldn't be important to dynamic events or future static events.
3) Forbid python_mozaggregator from aggregating or publishing (or both) the events summary scalar. We may be able to relax this requirement later, but for now it seems prudent to begin by limiting analysis to internal tools.
(In reply to Chris H-C :chutten from comment #4)
>  - Limit of 100 keys per subsession means we have a limit of how many events
> we can summarize

What are we protecting against with this limit? Do we need this?
Is this concern different from how we think about e.g. dynamic scalars?
(In reply to Chris H-C :chutten from comment #10)
> I will proceed with Option 2 with three addenda:

Can you summarize with an updated plan for easier reading?

> 2) The key will now be `${process}|${category}|${method}|${object}` to
> identify from which process the event came. This is definitely not a concern
> for existing static events (search), but there's nothing saying it couldn't
> be important to dynamic events or future static events.

We already have an existing internal naming pattern of "category#method#object" in `UniqueEventName()`. Should we re-use that pattern?
(In reply to Georg Fritzsche (slow to respond) [:gfritzsche] from comment #11)
> (In reply to Chris H-C :chutten from comment #4)
> >  - Limit of 100 keys per subsession means we have a limit of how many events
> > we can summarize
> 
> What are we protecting against with this limit? Do we need this?
> Is this concern different from how we think about e.g. dynamic scalars?

We're protecting against accidentally ballooning the size of "main" pings with too many keys. The idea being, I presume, that dynamic scalars will be sent by fewer users than All Of Them whereas scalars in Scalars.yaml will indeed be send by All Of Them.

The event summary falls into the All Of Them category, so having this protection is probably worth potential data loss (especially since this is "just" a summary)

(In reply to Georg Fritzsche (slow to respond) [:gfritzsche] from comment #12)
> (In reply to Chris H-C :chutten from comment #10)
> > I will proceed with Option 2 with three addenda:
> 
> Can you summarize with an updated plan for easier reading?

I'll put that in the next comment, sure.

> > 2) The key will now be `${process}|${category}|${method}|${object}` to
> > identify from which process the event came. This is definitely not a concern
> > for existing static events (search), but there's nothing saying it couldn't
> > be important to dynamic events or future static events.
> 
> We already have an existing internal naming pattern of
> "category#method#object" in `UniqueEventName()`. Should we re-use that
> pattern?

I'm happy to take nearly any letter-width, printing, one-byte delimiter that isn't a dot or a hyphen. '#' fits the bill, so I'll gladly use it instead of pipe.
THE PLAN
--------

Define a single keyed uint scalar to count (process, category, method, object) tuples that want to be recorded. These will be counted whether that category has been enabled to record or not. These will include `dynamic` scalars. These counts will be reported on all channels (opt-out).

This will necessitate a few extra things that might not be immediately apparent:
 - Extend scalar keys' character limit for either all keys scalars-wide or just these keys so we can fit the whole (process, category, method, object) tuple in a string with delimiters without truncation.
 - Forbid python_mozaggregator from aggregating this particular scalar, as events consumers presently do not expect public publication of their data. The data will still be available in ATMO and sql.tmo as other non-aggregated data is, so it'll still be very useful.
 - Count and report the number of event tuples we can't count because we've reached the "maximum number of keys (100)" limit. This can and should be aggregated by python_mozaggregator to enable regression detection (cerberus) and general "just looking at things on TMO to see if they're working" use cases.


...I think that's about it.
(In reply to Chris H-C :chutten from comment #13)
> (In reply to Georg Fritzsche (slow to respond) [:gfritzsche] from comment
> #11)
> > (In reply to Chris H-C :chutten from comment #4)
> > >  - Limit of 100 keys per subsession means we have a limit of how many events
> > > we can summarize
> > 
> > What are we protecting against with this limit? Do we need this?
> > Is this concern different from how we think about e.g. dynamic scalars?
> 
> We're protecting against accidentally ballooning the size of "main" pings
> with too many keys. The idea being, I presume, that dynamic scalars will be
> sent by fewer users than All Of Them whereas scalars in Scalars.yaml will
> indeed be send by All Of Them.

The current first stage of Project Savant (Firefox UI metrics) is looking to add >70 events, other events might get added in parallel (Shield, Lockbox, DevTools, ...).
I'm worried that this limit is too low to start with and we need to raise it (just for the events summary counts).
Dynamic events have the interesting property of being enabled just before they're recorded, so come to think of it they might not benefit from summarization. Is Savant going to dynamically-register, or are they static? 

What would you think if I made a further change to the plan and disallowed summarization of dynamic events, due to them always being enabled when recording? That would give us more space in the summary, regardless of what we decide wrt limits.
(In reply to Chris H-C :chutten from comment #16)
> Dynamic events have the interesting property of being enabled just before
> they're recorded, so come to think of it they might not benefit from
> summarization. Is Savant going to dynamically-register, or are they static? 

It's registering dynamically for the first iteration, but i assume this might become standard Firefox instrumentation and grow.

> What would you think if I made a further change to the plan and disallowed
> summarization of dynamic events, due to them always being enabled when
> recording? That would give us more space in the summary, regardless of what
> we decide wrt limits.

I think that's breaking the promise of "we'll always summarize your events".
The convenient pitch here is to always have your events summarized in the main ping, no matter whether we send them or on what ping.

The 100-key limit is currently in place for the standard keyed scalar usage.
For the event summary, keyed scalars are just a serialization/transport detail.
We should enable event usage and make it easy to use event summary counts, without thinking much about it.

If we need a limit, let's look at it from the perspective of "what's an acceptable upper bound for the ping size impact".
(In reply to Georg Fritzsche (away Mar 16 - 26) [:gfritzsche] from comment #17)
> If we need a limit, let's look at it from the perspective of "what's an
> acceptable upper bound for the ping size impact".

I argue we should have a limit. So let's crunch some numbers.
The scalar name is going to be at most 30 bytes.
Each key will be <process name length> + <delimiter length> + 72 (see comment #4) = 83 ("extension" is the longest process name at present)
Value size and punctuation... let's round up to 100 bytes.

100 keys is under 10k. 1000 keys is under 100k.

What would you like a limit to be? We have room: https://mzl.la/2FRHCGL

mreid: in your opinion, what would be a sensible and prudent (maximum) size increase per ping for this feature?
Flags: needinfo?(mreid)
(In reply to Chris H-C :chutten from comment #13)
> > > 2) The key will now be `${process}|${category}|${method}|${object}` to
> > > identify from which process the event came. This is definitely not a concern
> > > for existing static events (search), but there's nothing saying it couldn't
> > > be important to dynamic events or future static events.
> > 
> > We already have an existing internal naming pattern of
> > "category#method#object" in `UniqueEventName()`. Should we re-use that
> > pattern?
> 
> I'm happy to take nearly any letter-width, printing, one-byte delimiter that
> isn't a dot or a hyphen. '#' fits the bill, so I'll gladly use it instead of
> pipe.

One thing came to mind:
We don't need to put the process name into the key.
Instead we can put the keyed scalar in the right process payloads (payload.processes.{main,...}), matching the semantics we use elsewhere.
(In reply to Georg Fritzsche (away Mar 16 - 26) [:gfritzsche] from comment #19)
> One thing came to mind:
> We don't need to put the process name into the key.
> Instead we can put the keyed scalar in the right process payloads
> (payload.processes.{main,...}), matching the semantics we use elsewhere.

That'll require a deeper approach than what I was currently trying, but it would almost certainly be worth it.

Oh, and I bothered :mreid and others about limits at the Events Meeting and we collectively came to the decision that the limit should:
1) Be controlled by a (hidden) pref reported via the Environment (so that we have an off switch and reporting of when it's used)
2) Default to 500 (since 100 is too low, and we're really only trying to save infra while we patch the offender)
Flags: needinfo?(mreid)
Comment on attachment 8963315 [details]
bug 1440673 - Fix off-by-one errors in keyed scalars

https://reviewboard.mozilla.org/r/232216/#review237814

Great catch!
Attachment #8963315 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8963316 [details]
bug 1440673 - Allow scalar keys to be an extra 2 chars long

https://reviewboard.mozilla.org/r/232218/#review237816

::: toolkit/components/telemetry/TelemetryScalar.cpp:925
(Diff revision 1)
>        break;
>      case ScalarResult::KeyIsEmpty:
>        errorMessage.AppendLiteral(u" - The key must not be empty.");
>        break;
>      case ScalarResult::KeyTooLong:
> -      errorMessage.AppendLiteral(u" - The key length must be limited to 70 characters.");
> +      errorMessage.AppendLiteral(u" - The key length must be limited to 72 characters.");

At one point this could be refactored to use `nsPrintfCString` ([example](https://searchfox.org/mozilla-central/rev/7e663b9fa578d425684ce2560e5fa2464f504b34/toolkit/components/telemetry/TelemetryScalar.cpp#2442-2445)) and make this function less ugly. That "one point" is probably not today ;-) If you feel like trying that in this bug, feel free to... but you don't *have* to :)
Attachment #8963316 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8963317 [details]
bug 1440673 - Allow TelemetryScalar.h to be included in tests

https://reviewboard.mozilla.org/r/232220/#review237822
Attachment #8963317 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8963318 [details]
bug 1440673 - Summarize events to a keyed scalar

https://reviewboard.mozilla.org/r/232222/#review237836

::: toolkit/components/telemetry/TelemetryEvent.cpp:485
(Diff revision 1)
>      return RecordEventResult::Ok;
>    }
>  
> -  // Check whether the extra keys passed are valid.
> -  if (!CheckExtraKeysValid(*eventKey, extra)) {
> -    return RecordEventResult::InvalidExtraKey;
> +  // Count the number of times this event has been recorded, even if its
> +  // category does not have recording enabled.
> +  TelemetryScalar::SummarizeEvent(UniqueEventName(category, method, object),

Is there a particular reason why we can't simply use `TelemetryScalar::ScalarAdd` here? To save typing the scalar name and options every single time, we could create an helper function, if needed, in this module.

::: toolkit/components/telemetry/TelemetryScalar.cpp:2517
(Diff revision 1)
>    }
>    return NS_OK;
>  }
>  
>  /**
> + * Count in Scalars how many of which events were recorded. See bug 1440673

nit: would you mind documenting the parameters? I'm especially interested in aUniqueEventName, maybe let's mention we expect it to be in a particular format.

::: toolkit/components/telemetry/TelemetryScalar.cpp:2521
(Diff revision 1)
>  /**
> + * Count in Scalars how many of which events were recorded. See bug 1440673
> + */
> +void
> +TelemetryScalar::SummarizeEvent(const nsCString& aUniqueEventName,
> +                                ProcessID aProcessType, bool aDynamic)

I think we can drop the `bool aDynamic` part off the signature, see below.

::: toolkit/components/telemetry/TelemetryScalar.cpp:2529
(Diff revision 1)
> +  if (!XRE_IsParentProcess()) {
> +    return;
> +  }
> +
> +  StaticMutexAutoLock lock(gTelemetryScalarsMutex);
> +  ScalarKey scalarKey{static_cast<uint32_t>(ScalarID::TELEMETRY_EVENT_COUNTS), aDynamic};

`TELEMETRY_EVENT_COUNTS` is a "static"/"built-in" scalar, because it's defined in the `Scalars.yaml` file. For this reason, I think that `aDynamic` should rather be just `false`. This would be `true` only if we'd be trying to accumulate to a scalar that was registered dynamically, which is not true in this case.

Does this make sense?
Comment on attachment 8963319 [details]
bug 1440673 - Allow changing the max number of keys per-keyed-scalar

https://reviewboard.mozilla.org/r/232224/#review237840

::: toolkit/components/telemetry/TelemetryScalar.cpp:656
(Diff revision 1)
>    nsresult GetValue(nsTArray<KeyValuePair>& aValues) const;
>  
>    // To measure the memory stats.
>    size_t SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf);
>  
> +  // To permit more keys than normal

nit: missing trailing dot
Attachment #8963319 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8963320 [details]
bug 1440673 - Set the max number of event summary keys by pref

https://reviewboard.mozilla.org/r/232226/#review237842

::: toolkit/components/telemetry/TelemetryScalar.cpp:2553
(Diff revision 1)
>    }
>  
> +  static uint32_t sMaxEventSummaryKeys =
> +    Preferences::GetUint("toolkit.telemetry.maxEventSummaryKeys", 500);
> +
> +  // Set this each time as it may have been cleared and recreated between calls

Does this have any perf implication? I don't think we have such high frequency events at the moment, so this might not be an issue at all.

Is it worth registering for changes on the pref and only setting it when the value changes?
Comment on attachment 8963321 [details]
bug 1440673 - Test event summary scalar collection

https://reviewboard.mozilla.org/r/232228/#review237844

If we end up using `TelemetryScalar::ScalarAdd`, I would rather craete a new cpp file called `TelemetryEvents.cpp` and test summaries there, directly calling the `TelemetryEvent::RecordEvent` function. This has more to do with events than scalars (except for the key limit part, which I would leave in the scalars test code).

Since we're now in the multi-process era, but we just want to store event summaries in the parent process, do you think it's worth adding a browser test to make sure this works as expected? For an example, see [here](https://searchfox.org/mozilla-central/rev/7e663b9fa578d425684ce2560e5fa2464f504b34/toolkit/components/telemetry/tests/browser/browser_DynamicScalars.js#51-58). It's fine if you think it's not worth the effort.

::: toolkit/components/telemetry/tests/gtest/TestScalars.cpp:254
(Diff revision 1)
>      ASSERT_TRUE(keyedSnapshot.isUndefined()) << "No keyed scalar must be recorded";
>    }
>  #endif
>  }
> +
> +TEST_F(TelemetryTestFixture, ScalarEventSummary) {

Can we add a separate test for controlling the key limit through the pref?

::: toolkit/components/telemetry/tests/gtest/TestScalars.cpp:269
(Diff revision 1)
> +
> +  // Check the recorded value.
> +  JS::RootedValue scalarsSnapshot(cx.GetJSContext());
> +  GetScalarsSnapshot(true, cx.GetJSContext(), &scalarsSnapshot);
> +
> +  CheckKeyedUintScalar(kScalarName, kLongestEvent, cx.GetJSContext(), scalarsSnapshot, 1);

Since we want the summary to happen for events regardless of their active status, let's test that counting works for both enabled and disabled events.
Comment on attachment 8963318 [details]
bug 1440673 - Summarize events to a keyed scalar

https://reviewboard.mozilla.org/r/232222/#review237836

> Is there a particular reason why we can't simply use `TelemetryScalar::ScalarAdd` here? To save typing the scalar name and options every single time, we could create an helper function, if needed, in this module.

After discussing this with :chutten, turns out we need to define a custom function to make sure each summary gets assigned to the right process. This can't be done with `ScalarAdd`. Let's leave a comment in `SummarizeEvent` to make sure we don't forget that.
(In reply to Alessio Placitelli [:Dexter] from comment #29)
> Comment on attachment 8963316 [details]
> bug 1440673 - Allow scalar keys to be an extra 2 chars long
> 
> https://reviewboard.mozilla.org/r/232218/#review237816
> 
> ::: toolkit/components/telemetry/TelemetryScalar.cpp:925
> (Diff revision 1)
> >        break;
> >      case ScalarResult::KeyIsEmpty:
> >        errorMessage.AppendLiteral(u" - The key must not be empty.");
> >        break;
> >      case ScalarResult::KeyTooLong:
> > -      errorMessage.AppendLiteral(u" - The key length must be limited to 70 characters.");
> > +      errorMessage.AppendLiteral(u" - The key length must be limited to 72 characters.");
> 
> At one point this could be refactored to use `nsPrintfCString`
> ([example](https://searchfox.org/mozilla-central/rev/
> 7e663b9fa578d425684ce2560e5fa2464f504b34/toolkit/components/telemetry/
> TelemetryScalar.cpp#2442-2445)) and make this function less ugly. That "one
> point" is probably not today ;-) If you feel like trying that in this bug,
> feel free to... but you don't *have* to :)

Filed bug 1450098
Comment on attachment 8963320 [details]
bug 1440673 - Set the max number of event summary keys by pref

https://reviewboard.mozilla.org/r/232226/#review237842

> Does this have any perf implication? I don't think we have such high frequency events at the moment, so this might not be an issue at all.
> 
> Is it worth registering for changes on the pref and only setting it when the value changes?

I'm not sure I understand. High frequency events mean we'll be taking the static value of 500 and setting it to the same KeyedScalar over and over again... but setters are very cheap.

Do you mean the interaction with Preferences? That's done just once, the first time this function is called.
Comment on attachment 8963321 [details]
bug 1440673 - Test event summary scalar collection

https://reviewboard.mozilla.org/r/232228/#review237844

> Can we add a separate test for controlling the key limit through the pref?

The preference is just checked the once. Does our gtest harness restart the process for each test?

> Since we want the summary to happen for events regardless of their active status, let's test that counting works for both enabled and disabled events.

Hm, these tests are (presently) just unit tests for the TelemetryScalar API. 

Looks like I'm missing coverage for the TelemetryEvent portions, though. I can add that.
Comment on attachment 8963320 [details]
bug 1440673 - Set the max number of event summary keys by pref

https://reviewboard.mozilla.org/r/232226/#review238164
Attachment #8963320 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8964683 [details]
bug 1440673 - Test Event Summarization in xpcshell

https://reviewboard.mozilla.org/r/233374/#review238968


Code analysis found 2 defects in this patch:
 - 2 defects found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:47
(Diff revision 1)
> + * @param clearScalars - true if you want to clear the scalars
> + */
> +function checkEventSummary(summaries, clearScalars) {
> +  let scalars = Telemetry.snapshotKeyedScalars(OPTOUT, clearScalars);
> +  dump(JSON.stringify(summaries));
> +  for (let [process, [category, eObject, method, ...rest], count] of summaries) {

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:147
(Diff revision 1)
>    }
>  
> +  // Check that the events were summarized properly.
> +  let summaries = {};
> +  expected.forEach(({optout, event}) => {
> +    let [category, eObject, method, ...rest] = event;

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]
Comment on attachment 8964717 [details]
bug 1440673 - Test Event Summary's key limit pref

https://reviewboard.mozilla.org/r/233440/#review239024


Code analysis found 4 defects in this patch:
 - 4 defects found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:47
(Diff revision 1)
>   * @param clearScalars - true if you want to clear the scalars
>   */
>  function checkEventSummary(summaries, clearScalars) {
>    let scalars = Telemetry.snapshotKeyedScalars(OPTOUT, clearScalars);
>    dump(JSON.stringify(summaries));
>    for (let [process, [category, eObject, method, ...rest], count] of summaries) {

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:82
(Diff revision 1)
> +      methods: ["testMethod"],
> +      objects,
> +      record_on_release: true,
> +    }
> +  });
> +  for (object of objects) {

Error: 'object' is not defined. [eslint: no-undef]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:83
(Diff revision 1)
> +      objects,
> +      record_on_release: true,
> +    }
> +  });
> +  for (object of objects) {
> +    Telemetry.recordEvent("telemetry.test.dynamic", "testMethod", object);

Error: 'object' is not defined. [eslint: no-undef]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:185
(Diff revision 1)
>    }
>  
>    // Check that the events were summarized properly.
>    let summaries = {};
>    expected.forEach(({optout, event}) => {
>      let [category, eObject, method, ...rest] = event;

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]
Comment on attachment 8964683 [details]
bug 1440673 - Test Event Summarization in xpcshell

https://reviewboard.mozilla.org/r/233374/#review239026


Code analysis found 1 defect in this patch:
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:147
(Diff revision 2)
>    }
>  
> +  // Check that the events were summarized properly.
> +  let summaries = {};
> +  expected.forEach(({optout, event}) => {
> +    let [category, eObject, method, ...rest] = event;

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]
Comment on attachment 8964717 [details]
bug 1440673 - Test Event Summary's key limit pref

https://reviewboard.mozilla.org/r/233440/#review239028


Code analysis found 3 defects in this patch:
 - 3 defects found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:82
(Diff revision 2)
> +      methods: ["testMethod"],
> +      objects,
> +      record_on_release: true,
> +    }
> +  });
> +  for (object of objects) {

Error: 'object' is not defined. [eslint: no-undef]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:83
(Diff revision 2)
> +      objects,
> +      record_on_release: true,
> +    }
> +  });
> +  for (object of objects) {
> +    Telemetry.recordEvent("telemetry.test.dynamic", "testMethod", object);

Error: 'object' is not defined. [eslint: no-undef]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:185
(Diff revision 2)
>    }
>  
>    // Check that the events were summarized properly.
>    let summaries = {};
>    expected.forEach(({optout, event}) => {
>      let [category, eObject, method, ...rest] = event;

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]
Comment on attachment 8963318 [details]
bug 1440673 - Summarize events to a keyed scalar

https://reviewboard.mozilla.org/r/232222/#review239120
Attachment #8963318 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8964680 [details]
bug 1440673 - Permit snapshotting non-parent-process scalars

https://reviewboard.mozilla.org/r/233368/#review239124
Attachment #8964680 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8964681 [details]
bug 1440673 - Summarize dynamic events to a dynamic scalar

https://reviewboard.mozilla.org/r/233370/#review239128

This looks good, thanks! Can we please add some documentation about this new feature to the [events' docs](https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/events.html)? Let's call out what this is about, why, highlight that static/dynamic summaries go to different scalars and mention the pref to tweak the number of allowed summary keys. Let's have this as a separate commit!
Attachment #8964681 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8964682 [details]
bug 1440673 - Test that dynamic events are summarized to a dynamic scalar

https://reviewboard.mozilla.org/r/233372/#review239134
Attachment #8964682 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8964683 [details]
bug 1440673 - Test Event Summarization in xpcshell

https://reviewboard.mozilla.org/r/233374/#review239136

::: commit-message-3cb0f:5
(Diff revision 2)
> +bug 1440673 - Test Event Summarization in xpcshell r?Dexter
> +
> +DO NOT YET LAND
> +
> +Needs tests for dynamic events which will conflict with bug 1450690

Is this still the case, given that you added them in the gtest suite?
Attachment #8964683 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8964717 [details]
bug 1440673 - Test Event Summary's key limit pref

https://reviewboard.mozilla.org/r/233440/#review239138
Attachment #8964717 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8963321 [details]
bug 1440673 - Test event summary scalar collection

https://reviewboard.mozilla.org/r/232228/#review239142
Attachment #8963321 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8964683 [details]
bug 1440673 - Test Event Summarization in xpcshell

https://reviewboard.mozilla.org/r/233374/#review239136

> Is this still the case, given that you added them in the gtest suite?

The gtests cases test the TelemetryScalar portions. test_TelemetryEvents.js tests the TelemetryEvent portions.

I need coverage for the dynamic cases of the TelemetryEvent parts of the code. Shouldn't be long.
Comment on attachment 8964683 [details]
bug 1440673 - Test Event Summarization in xpcshell

https://reviewboard.mozilla.org/r/233374/#review239704


Code analysis found 1 defect in this patch:
 - 1 defect found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:147
(Diff revision 3)
>    }
>  
> +  // Check that the events were summarized properly.
> +  let summaries = {};
> +  expected.forEach(({optout, event}) => {
> +    let [category, eObject, method, ...rest] = event;

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]
Comment on attachment 8964717 [details]
bug 1440673 - Test Event Summary's key limit pref

https://reviewboard.mozilla.org/r/233440/#review239706


Code analysis found 3 defects in this patch:
 - 3 defects found by mozlint

You can run this analysis locally with:
 - `./mach lint path/to/file` (JS/Python)


If you see a problem in this automated review, please report it here: http://bit.ly/2y9N9Vx


::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:82
(Diff revision 3)
> +      methods: ["testMethod"],
> +      objects,
> +      record_on_release: true,
> +    }
> +  });
> +  for (object of objects) {

Error: 'object' is not defined. [eslint: no-undef]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:83
(Diff revision 3)
> +      objects,
> +      record_on_release: true,
> +    }
> +  });
> +  for (object of objects) {
> +    Telemetry.recordEvent("telemetry.test.dynamic", "testMethod", object);

Error: 'object' is not defined. [eslint: no-undef]

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:185
(Diff revision 3)
>    }
>  
>    // Check that the events were summarized properly.
>    let summaries = {};
>    expected.forEach(({optout, event}) => {
>      let [category, eObject, method, ...rest] = event;

Error: 'rest' is assigned a value but never used. [eslint: no-unused-vars]
Depends on: 1451779
Comment on attachment 8963318 [details]
bug 1440673 - Summarize events to a keyed scalar

Data Collection Review Request:


    What questions will you answer with this data?
How many of which events are recorded over the course of a subsession?
How quickly is an instrumented feature taken up by users?
How many events are we losing due to categories being disabled, or hitting the built-in event limit?
How often are users X? How many users X? (where X is an activity instrumented by Telemetry Events)

    Why does Mozilla need to answer these questions? Are there benefits for users? Do we need this information to address product or business requirements?
Our current Event Telemetry mechanism has a hard limit on the number of events it can record before it truncates. We need to know the shape of the data we're missing. We also don't always enable all event categories for all users, so knowing the shape of the events that aren't enabled is of value to make decisions.

    What alternative methods did you consider to answer these questions? Why were they not sufficient?
We could always for all users record all events, but that could become expensive as we anticipate an increase in the use of Event Telemetry in the coming quarters.

    Can current instrumentation answer these questions?
No. We can't know what we don't know, and our truncations and category disabling limit what we know.

    List all proposed measurements and indicate the category of data collection for each measurement, using the Firefox data collection categories on the Mozilla wiki.

telemetry.{dynamic_}event_counts keyed scalars | Category 2 | bug 1440673

    How long will this data be collected? Choose one of the following:

I want to permanently monitor this data. (:chutten)

    What populations will you measure?

All. No filters.

    If this data collection is default on, what is the opt-out mechanism for users?

Standard Telemetry Mechanism.

    Please provide a general description of how you will analyze this data.

Queries on sql.tmo graphing event counts over time to measure feature uptake, cross-sectional analyses of "how do users who X compare with users who don't?"... me, personally, I'm not going to analyze it often. I expect this to be useful to consumers of Event Telemetry.

    Where do you intend to share the results of your analysis?

I expect these analyses to be in the usual places: re:dash, maybe MDC... anywhere Event Telemetry analyses are published today.
Attachment #8963318 - Flags: review?(francois)
See Also: → 1451813
Blocks: 1451814
Comment on attachment 8963318 [details]
bug 1440673 - Summarize events to a keyed scalar

(In reply to Chris H-C :chutten from comment #77)

1) Is there or will there be **documentation** that describes the schema for the ultimate data set available publicly, complete and accurate?

Yes, Scalars.yaml.

2) Is there a control mechanism that allows the user to turn the data collection on and off?

Yes, telemetry setting.

3) If the request is for permanent data collection, is there someone who will monitor the data over time?**

Yes, :chutten.

4) Using the **[category system of data types](https://wiki.mozilla.org/Firefox/Data_Collection)** on the Mozilla wiki, what collection type of data do the requested measurements fall under?  **

Category 2.

5) Is the data collection request for default-on or default-off?

Default on, all channels.

6) Does the instrumentation include the addition of **any *new* identifiers** (whether anonymous or otherwise; e.g., username, random IDs, etc.  See the appendix for more details)?

No.

7) Is the data collection covered by the existing Firefox privacy notice?

Yes

8) Does there need to be a check-in in the future to determine whether to renew the data?

No, permanent.
Attachment #8963318 - Flags: review?(francois) → review+
Pushed by chutten@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/5881e2f7bdbb
Fix off-by-one errors in keyed scalars r=Dexter
https://hg.mozilla.org/integration/autoland/rev/48e639fb8f5d
Allow scalar keys to be an extra 2 chars long r=Dexter
https://hg.mozilla.org/integration/autoland/rev/f7ad01646cdb
Allow TelemetryScalar.h to be included in tests r=Dexter
https://hg.mozilla.org/integration/autoland/rev/f817a9fcc64b
Summarize events to a keyed scalar r=Dexter
https://hg.mozilla.org/integration/autoland/rev/66fa06d26606
Allow changing the max number of keys per-keyed-scalar r=Dexter
https://hg.mozilla.org/integration/autoland/rev/d2f3d55e6f05
Set the max number of event summary keys by pref r=Dexter
https://hg.mozilla.org/integration/autoland/rev/de0ee14920b7
Test event summary scalar collection r=Dexter
https://hg.mozilla.org/integration/autoland/rev/7fad52ffe8b4
Permit snapshotting non-parent-process scalars r=Dexter
https://hg.mozilla.org/integration/autoland/rev/eeeb53bcc761
Summarize dynamic events to a dynamic scalar r=Dexter
https://hg.mozilla.org/integration/autoland/rev/43f220ac60ae
Test that dynamic events are summarized to a dynamic scalar r=Dexter
https://hg.mozilla.org/integration/autoland/rev/eab606063a0d
Test Event Summarization in xpcshell r=Dexter
https://hg.mozilla.org/integration/autoland/rev/59150042cfc8
Test Event Summary's key limit pref r=Dexter
Can we request uplift on this?
Flags: needinfo?(chutten)
When I asked on Monday, :mreid and :Dexter preferred that it ride the trains.
Flags: needinfo?(chutten)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: