Closed Bug 1302663 Opened 8 years ago Closed 8 years ago

Implement Event Telemetry core

Categories

(Toolkit :: Telemetry, defect, P1)

defect
Points:
3

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
firefox52 --- fixed
firefox53 --- fixed

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

(Blocks 1 open bug, )

Details

(Whiteboard: [measurement:client])

Attachments

(6 files, 10 obsolete files)

863 bytes, text/plain
Details
4.71 KB, text/plain
Details
25.94 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
8.91 KB, patch
Dexter
: review+
Details | Diff | Splinter Review
50.31 KB, patch
froydnj
: review+
Details | Diff | Splinter Review
6.40 KB, patch
gfritzsche
: review+
Details | Diff | Splinter Review
For the event registry YAML format definition, we need a script as part of the Firefox build that:
* parses the YAML
* validates the contents
* generates headers with the data for the Firefox code

This would be similar to the Scalars.yaml and Histograms.json handling scripts.
Blocks: 1302664
Priority: P3 → P1
Assignee: nobody → gfritzsche
Points: --- → 3
Attachment #8800778 - Flags: feedback?(alessio.placitelli)
Attached file Example output: TelemetryEventEnums.h (obsolete) —
Attached file Example output: TelemetryEventData.h (obsolete) —
Comment on attachment 8800778 [details] [diff] [review]
Add script to generate headers with event data from Events.yaml

Review of attachment 8800778 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good from an high level view, I mostly pointed out python-style things.

In general, all our Python files should be indented using a 4-spaces indentation per level (see https://www.python.org/dev/peps/pep-0008/#indentation ). The parse_events.py file seems to use 2 spaces per block instead.

::: toolkit/components/telemetry/Events.yaml
@@ +9,5 @@
> +  expiry_date: never
> +  extra_keys:
> +    key1: This is just a test description.
> +    key2: This is another test description.
> +- methods: ["test_optout"]

nit: do you think that adding a space between "- methods" blocks would make the file more readable? I'm not sure if that's even a problem but... just raising the point :)

::: toolkit/components/telemetry/gen-event-data.py
@@ +106,5 @@
> +    print(file_header, file=output)
> +
> +    # Write the extra keys table.
> +    string_table = StringTable()
> +    extra_table = write_extra_table(events, output, string_table)

Can we add a |static_assert| here as we do for the string table? Would that be helpful?

::: toolkit/components/telemetry/parse_events.py
@@ +17,5 @@
> +
> +IDENTIFIER_PATTERN = r'^[a-zA-Z][a-zA-Z0-9_.]+[a-zA-Z0-9]$'
> +
> +def nice_type_name(t):
> +  if t is basestring:

I'm not sure if that works for unicode strings as well (or if it's intended to). To be safe, we could simply change |if t is basestring| to |if isinstance(t, basestring)|, which evaluates to true for both unicode and non-unicode strings. That's also clearer, IMHO.

@@ +24,5 @@
> +
> +def convert_to_cpp_identifier(s, sep):
> +  return string.capwords(s, sep).replace(sep, "")
> +
> +# This is a placeholder type for the checker below.

AFAIK, docstrings for a particular object/function should live within its scope and, in general, be wrapped by triple quotes """ [2].

So this becomes

class OneOf:
    """This is a placeholder type for the checker below.
    It signals that the checked value should match one of the following arguments ....
    """

Same for other function and class comments below.

[2] - https://www.python.org/dev/peps/pep-0257/#what-is-a-docstring

@@ +27,5 @@
> +
> +# This is a placeholder type for the checker below.
> +# It signals that the checked value should match one of the following arguments
> +# passed to the checker constructor.
> +class one_of:

Class names in pythons should use CapWords [1], so this should be OneOf.

[1] - https://www.python.org/dev/peps/pep-0008/#class-names

@@ +30,5 @@
> +# passed to the checker constructor.
> +class one_of:
> +  pass
> +
> +# This implements a convenience type checker to make the validation code more

doc nic: move this inside the object scope and use triple quotes

@@ +32,5 @@
> +  pass
> +
> +# This implements a convenience type checker to make the validation code more
> +# readable.
> +class checker:

s/checker/Checker ... or (nit) maybe something like TypeChecker/Validator?

Moreover, could this be a good candidate to be shared in "shaed_telemetry_utils.py" and also usd by parse_scalars.py? (if so, we can maybe take this to a separate bug)

@@ +33,5 @@
> +
> +# This implements a convenience type checker to make the validation code more
> +# readable.
> +class checker:
> +  # This takes 1-3 arguments, specifying the value type to check for.

doc nic: move this inside the __init__ scope and use triple quotes

@@ +91,5 @@
> +                             nice_type_name(self._args[1]),
> +                             k,
> +                             nice_type_name(type(x)))
> +
> +def type_check_event_fields(category, definition):

nit: missing doc

@@ +246,5 @@
> +
> +  event_list = []
> +
> +  # Events are defined in a fixed two-level hierarchy within the definition file.
> +  # The first level contains the group name, while the second level contains the

nit: I saw you are referring to the "group names" as "categories" in the YAML file. Is the latter the right name? If so, let's use it throughout this document as well to stay consistent.
Attachment #8800778 - Flags: feedback?(alessio.placitelli) → feedback+
Attachment #8800778 - Attachment is obsolete: true
We should at least do the core implementation in this bug as well to make reviewing easier:
* storage handling
* recording & snapshotting API
* adding events to the main ping
* testing through test-only events
Summary: Add script to generate headers with event data from Events.yaml → Implement Event Telemetry core
Attachment #8802115 - Attachment is obsolete: true
Open questions that i'd like feedback on:
- How to best handle ownership for gEventRecords & EventRecord instances?
  Do we have an array equivalent of nsClassHashtable?
- Do we have IDL syntax to directly pass string arrays from JS (without a separate count parameter)?
  Or do we have to resort to jsval for that and convert?

The C++ API will come in a future patch, as will adding the payload to pings and updating about:telemetry.
Attachment #8804718 - Flags: feedback?(nfroyd)
Attached file Example output: TelemetryEventData.h (obsolete) —
Attachment #8800781 - Attachment is obsolete: true
Comment on attachment 8804718 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

(In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> Created attachment 8804718 [details] [diff] [review]
> Part 2 - Implement Event Telemetry recording
> 
> Open questions that i'd like feedback on:
> - How to best handle ownership for gEventRecords & EventRecord instances?

I don't understand this question.  It looks like gEventRecords grows without bound, which sounds bad?

>   Do we have an array equivalent of nsClassHashtable?

I don't understand this question.

> - Do we have IDL syntax to directly pass string arrays from JS (without a
> separate count parameter)?
>   Or do we have to resort to jsval for that and convert?

We don't have IDL syntax to do what you want.  We can deal with the count parameter, or we can do the jsval thing.

I haven't really looked carefully at the patch, as I have no sense of what this is for, or what all this code is supposed to be doing.  I can look again once you provide some background.  The bug doesn't seem to have any information.

I would *strongly* encourage you to drop the internal_* naming convention and pass along |const StaticMutexAutoLock&| parameters, making it impossible to call supposed-to-be-locked functions without the mutex held.  It doesn't do a whole lot for the deadlock case you mention in the comment (then again, neither does the naming convention...), but it is much easier to understand what's going on if you're not intimately familiar with the Telemetry code and it's much safer to let the compiler verify your locking conventions for you.
Attachment #8804718 - Flags: feedback?(nfroyd)
(In reply to Nathan Froyd [:froydnj] from comment #11)
...
> I haven't really looked carefully at the patch, as I have no sense of what
> this is for, or what all this code is supposed to be doing.  I can look
> again once you provide some background.  The bug doesn't seem to have any
> information.

Sorry, let me add some context here.
This is part of the bug 1286606 dependency tree, for adding event recording to Firefox Telemetry:
https://docs.google.com/document/d/1hNuS9lUJMvMqgntZXbFA6xZBU9zBpQgo7x73-sXKRpI/

Part 1 adds a registry for those events and generates headers with event data from it.
This Part 2 adds the main event recording/storage/serialization in TelemetryEvent.cpp.

> (In reply to Georg Fritzsche [:gfritzsche] from comment #9)
> > Open questions that i'd like feedback on:
> > - How to best handle ownership for gEventRecords & EventRecord instances?
> 
> I don't understand this question.  It looks like gEventRecords grows without
> bound, which sounds bad?

Right, that's why the TODOs and questions are there in Telemetry (the other part is applying hard limits on storage size).
To handle deletions/ownership of EventRecord, the following options come to mind:
- manually delete elements when they are removed from gEventRecords, not so great
- adding ref-counting implementations for use with nsRefPtr or similar, which seems like overkill for internal use in the module
- i can't see a non-intrusive shared pointer in our code base
- nsClassHashtable actually manages the life-time of the contained values, but i can't see an equivalent for our arrays / lists

> I would *strongly* encourage you to drop the internal_* naming convention
> and pass along |const StaticMutexAutoLock&| parameters, making it impossible
> to call supposed-to-be-locked functions without the mutex held.  It doesn't
> do a whole lot for the deadlock case you mention in the comment (then again,
> neither does the naming convention...), but it is much easier to understand
> what's going on if you're not intimately familiar with the Telemetry code
> and it's much safer to let the compiler verify your locking conventions for
> you.

I don't particularly like switching to that format with TelemetryScalar.cpp & TelemetryHistogram.cpp using it now, but i see your point.
Maybe we should just start here and adopt this in the other modules later.
Flags: needinfo?(nfroyd)
(In reply to Georg Fritzsche [:gfritzsche] from comment #12)
> > I would *strongly* encourage you to drop the internal_* naming convention
> > and pass along |const StaticMutexAutoLock&| parameters, making it impossible
> > to call supposed-to-be-locked functions without the mutex held.  It doesn't
> > do a whole lot for the deadlock case you mention in the comment (then again,
> > neither does the naming convention...), but it is much easier to understand
> > what's going on if you're not intimately familiar with the Telemetry code
> > and it's much safer to let the compiler verify your locking conventions for
> > you.
> 
> I don't particularly like switching to that format with TelemetryScalar.cpp
> & TelemetryHistogram.cpp using it now, but i see your point.
> Maybe we should just start here and adopt this in the other modules later.

Julian, do you see specific reasons to not adopt that style?
Flags: needinfo?(jseward)
Comment on attachment 8804705 [details] [diff] [review]
Part 1 - Add script to generate headers with event data from Events.yaml

Review of attachment 8804705 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good with the below addressed. The only big change here is the indentation of the parse_events.py file, which is 2 spaces vs 4 spaces.

::: toolkit/components/telemetry/EventInfo.h
@@ +5,5 @@
> +
> +#ifndef TelemetryEventInfo_h__
> +#define TelemetryEventInfo_h__
> +
> +#include <ctime>

Is this include still needed?

::: toolkit/components/telemetry/gen-event-data.py
@@ +76,5 @@
> +        print("  {%d, %d, %d, %d, %d, %s}," %
> +                (string_table.stringIndex(e.category),
> +                 string_table.stringIndex(e.expiry_version),
> +                 extras[0],  # extra keys index
> +                 extras[1], # extra keys count

nit: align the comments

::: toolkit/components/telemetry/parse_events.py
@@ +18,5 @@
> +IDENTIFIER_PATTERN = r'^[a-zA-Z][a-zA-Z0-9_.]+[a-zA-Z0-9]$'
> +DATE_PATTERN = r'^[0-9]{4}-[0-9]{2}-[0-9]{2}$'
> +
> +def nice_type_name(t):
> +  if isinstance(t, basestring):

Ouch, it looks this file is still using 2 spaces indentation instead of 4 spaces :( Would you mind fixing that?

@@ +45,5 @@
> +    self._kind = kind
> +    self._args = args
> +
> +  def check(self, key, value):
> +    # TODO: Make error message useful by pointing to event entry index or file and line.

Was this TODO left intentionally? If we're planning to do this in a followup bug, would you please file a bug about it?

@@ +139,5 @@
> +  """A class representing one event."""
> +
> +  def __init__(self, category, definition):
> +    # TODO:
> +    # - expiry_version: regex check?

I see that you added this check, so we can remove this comment block.
Attachment #8804705 - Flags: review?(alessio.placitelli) → review+
(In reply to Georg Fritzsche [:gfritzsche] from comment #13)
> (In reply to Georg Fritzsche [:gfritzsche] from comment #12)
> > > I would *strongly* encourage you to drop the internal_* naming convention
> > > and pass along |const StaticMutexAutoLock&| parameters, making it impossible
> > > to call supposed-to-be-locked functions without the mutex held.  It doesn't
> > > do a whole lot for the deadlock case you mention in the comment (then again,
> > > neither does the naming convention...), but it is much easier to understand
> > > what's going on if you're not intimately familiar with the Telemetry code
> > > and it's much safer to let the compiler verify your locking conventions for
> > > you.
> > 
> > I don't particularly like switching to that format with TelemetryScalar.cpp
> > & TelemetryHistogram.cpp using it now, but i see your point.
> > Maybe we should just start here and adopt this in the other modules later.
> 
> Julian, do you see specific reasons to not adopt that style?

Maybe that would be a better thing to do, than use a naming convention.  My
reservation with it is as follows: passing a const StaticMutexAutoLock& everywhere
that needs the lock to be held, allows the compiler to verify that indeed it is held
when necessary.  But it does nothing for the inverse case: it can't verify that
you don't hold the lock when you should not.  The first case is important for
race avoidance, but the second case is important for deadlock avoidance, and
as we've seen, avoiding deadlocks in this game is difficult.

But on the whole -- maybe we should move to that style.  I have no particular
objection.

If anyone can think up a mechanism that allows the compiler to verify not
only the lock-held case but also the lock-not-held case, that would be ideal;
but I can't imagine how to do that.
Flags: needinfo?(jseward)
Flags: needinfo?(nfroyd)
Attachment #8800780 - Attachment is obsolete: true
Attachment #8804800 - Attachment is obsolete: true
Attachment #8804705 - Attachment is obsolete: true
Attachment #8805038 - Flags: review+
(In reply to Alessio Placitelli [:Dexter] from comment #14)
> > +  def check(self, key, value):
> > +    # TODO: Make error message useful by pointing to event entry index or file and line.
> 
> Was this TODO left intentionally? If we're planning to do this in a followup
> bug, would you please file a bug about it?

I'll skip this for now, we can get to it later if it's actually a problem.
Attachment #8804718 - Attachment is obsolete: true
Comment on attachment 8805039 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

@Alessio, can you review this for having sane Telemetry logic, sufficient test coverage, correct plumbing, correct refactoring?
@Nathan, do you have time for a C++ oriented review, focusing on TelemetryEvent.{h,cpp} & nsITelemetry.idl?

As mentioned before, this adds the C++ implementation for recording, storage & serialization of events in Telemetry per the Event Telemetry document [0].

The main change is adding TelemetryEvent.cpp & test_TelemetryEvent.js, the rest is plumbing, smaller refactoring for sharing code between modules, etc.

The TL;DR of what this is trying to achieve:
- events get registered in Events.yaml
- the python scripts in part 1 validate that and generate TelemetryEventEnums.h & TelemetryEventData.h (example output attached)
- recording of events happens through `nsITelemetry.recordEvent()` (implemented in part 2)
- snapshots of the recorded data can be taken through `nsITelemetry.snapshotEvents()` (and cleared too)
- the recorded data format is:
  [timestamp, category, method, object, stringValue, extraDictionary]
- the recorded events will be send with the main pings (in a future patch)

0: https://docs.google.com/document/d/1hNuS9lUJMvMqgntZXbFA6xZBU9zBpQgo7x73-sXKRpI/
Attachment #8805039 - Flags: review?(nfroyd)
Attachment #8805039 - Flags: review?(alessio.placitelli)
Blocks: 1313326
Blocks: 1313327
No longer blocks: 1302664
Comment on attachment 8805039 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

Review of attachment 8805039 [details] [diff] [review]:
-----------------------------------------------------------------

This looks good overall, is only missing a couple of edge cases in the test coverage. I've not taken a deep look at TelemetryEvent.cpp, as @nfroyd will take care of that. However, from a quick look I found |TelemetryEvent::RecordEvent| a bit too long. Maybe that can be split?

This is basically r+, but I'm holding back it and giving f+ as I'd love to take another look at the test file with the new additions.

::: toolkit/components/telemetry/TelemetryCommon.cpp
@@ +66,5 @@
>    return false;
>  }
>  
> +nsresult
> +MsSinceProcessStart(double* aResult)

Could you please add documentation? Like if we expect that to fail in some specific cases.

::: toolkit/components/telemetry/TelemetryCommon.h
@@ +7,5 @@
>  #define TelemetryCommon_h__
>  
>  #include "nsTHashtable.h"
>  #include "jsapi.h"
> +#include "nsIScriptError.h"

Why are you including this here? If there's a reason to, maybe we can remove it from TelemetryCommon.cpp

::: toolkit/components/telemetry/TelemetryEvent.h
@@ +8,5 @@
> +
> +#include "mozilla/TelemetryEventEnums.h"
> +
> +// This module is internal to Telemetry. It encapsulates Telemetry's
> +// scalar accumulation and storage logic. It should only be used by

nit: s/scalar/events

@@ +34,5 @@
> +void ClearEvents();
> +
> +size_t SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf);
> +
> +} // namespace TelemetryScalar

nit: TelemetryEvent

@@ +36,5 @@
> +size_t SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf);
> +
> +} // namespace TelemetryScalar
> +
> +#endif // TelemetryScalar_h__
\ No newline at end of file

nit: TelemetryEvent_h__

::: toolkit/components/telemetry/TelemetryScalar.cpp
@@ +27,5 @@
>  using mozilla::Telemetry::Common::AutoHashtable;
>  using mozilla::Telemetry::Common::IsExpiredVersion;
>  using mozilla::Telemetry::Common::CanRecordDataset;
>  using mozilla::Telemetry::Common::IsInDataset;
> +using mozilla::Telemetry::Common::LogToBrowserConsole;

Since we're no longer defining the console implementation here, would you please remove the related headers, if possible? Like

#include "nsIConsoleService.h"
#include "nsIScriptError.h"
#include "nsThreadUtils.h"

(these are the ones that you added in TelemetryCommon.cpp)

::: toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js
@@ +1,2 @@
> +/* Any copyright is dedicated to the Public Domain.
> +   http://creativecommons.org/publicdomain/zero/1.0/

This looks good overall, I think we should also cover:

- event storage limits (e.g. what happens when kMaxEventRecords is reached?)
- Unicode characters in strings
Attachment #8805039 - Flags: review?(alessio.placitelli) → feedback+
Attachment #8805036 - Attachment mime type: text/x-chdr → text/plain
Attachment #8805037 - Attachment mime type: text/x-chdr → text/plain
Comment on attachment 8805039 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

Review of attachment 8805039 [details] [diff] [review]:
-----------------------------------------------------------------

I don't understand the questions from comment 12 about ownership of EventRecords: it looks like we only have them owned in one place?  Unless you're asking whether we should setup different limits for event records per-category or something like that?  I have no particular opinion on that; I worry a little bit that every client of this code is never going to pass |clear=true| when snapshotting records, so they'll all be unwittingly stepping on each other's toes....

Some notes on the patch below.

::: toolkit/components/telemetry/TelemetryEvent.cpp
@@ +76,5 @@
> +
> +const uint32_t kEventCount = mozilla::Telemetry::EventID::EventCount;
> +// This is a special event id used to mark expired events, to make expiry checks
> +// faster at runtime.
> +const uint32_t kExpiredEventId = kEventCount + 1;

Maybe static_assert that kEventCount < kExpiredEventId, to make sure we didn't overflow?

@@ +84,5 @@
> +const uint32_t kMaxEventRecords = 10000;
> +
> +typedef nsDataHashtable<nsCStringHashKey, uint32_t> EventMapType;
> +typedef nsClassHashtable<nsCStringHashKey, nsCString> StringMap;
> +typedef nsTArray<Pair<nsCString, nsCString>> StringPairArray;

I dislike using Pair; is it really too much trouble to define our own struct here so we can have pleasant names throughout?

@@ +103,5 @@
> +    , mValue(value)
> +    , mExtra(extra)
> +  {}
> +
> +  EventRecord(const EventRecord& other)

Usually when you define the copy constructor, you want to define the copy assignment operator as well.  But I don't really see any copying of objects; it seems like everything could be done with move construction and assignment instead, so why not define those?  You ought to be able to just:

  EventRecord(EventRecord&&) = default;
  EventRecord& operator=(EventRecord&&) = default;

@@ +124,5 @@
> +  Maybe<nsCString> mValue;
> +  StringPairArray mExtra;
> +};
> +
> +// Implements the methods for EventInfo.

Thanks for doing a string table for these.

It seems like in many cases, you're just creating nsDependentString out of the return values below.  Maybe consider storing the lengths of the strings as well, and returning nsDependentCString from these methods, so you don't have to continually recalculate the length via strlen?

@@ +163,5 @@
> +size_t
> +EventRecord::SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf) const
> +{
> +  size_t n = 0;
> +  n += sizeof(this);

sizeof(*this), surely?

@@ +164,5 @@
> +EventRecord::SizeOfIncludingThis(mozilla::MallocSizeOf aMallocSizeOf) const
> +{
> +  size_t n = 0;
> +  n += sizeof(this);
> +  n += sizeof(mValue);

This is only measuring the size of the Maybe<nsCString>, not the actual string contained therein.  And you would be double-counting mValue, assuming that you really mean |sizeof(*this)| above.

@@ +165,5 @@
> +{
> +  size_t n = 0;
> +  n += sizeof(this);
> +  n += sizeof(mValue);
> +  n += mExtra.ShallowSizeOfExcludingThis(aMallocSizeOf);

Why not the strings contained in mExtra?

@@ +269,5 @@
> +  // Check whether we can record this event.
> +  const CommonEventInfo& common = gEventInfo[eventId].common_info;
> +  if (!CanRecordEvent(lock, common)) {
> +    return RecordEventResult::Ok;
> +  }

I think all these checks on the event would be better if we moved them out of here and before we did a lot of the processing of |value| and |extra| from JS in TelemetryEvent::RecordEvent.  That way if we can't record the event because it's expired or whatever, we don't waste a bunch of time converting things from JS.  The extra keys checks could also be moved out of here, so we're spending as little time as possible with the lock held.

@@ +274,5 @@
> +
> +  // Check whether the extra keys passed are valid.
> +  nsTHashtable<nsCStringHashKey> validExtraKeys;
> +  for (uint32_t i = 0; i < common.extra_count; ++i) {
> +    validExtraKeys.PutEntry(nsDependentCString(common.extra_key(i)));

Creating this hashtable on every event recording seems rather expensive.  How often do you expect events to be recorded?  How many extra keys are events expected to be defined to have, and/or passed in when they're recorded?

We could pre-calculate all of these tables when telemetry is initialized...which would also enable checking keys as we extract them from JS.

@@ +277,5 @@
> +  for (uint32_t i = 0; i < common.extra_count; ++i) {
> +    validExtraKeys.PutEntry(nsDependentCString(common.extra_key(i)));
> +  }
> +
> +  for (uint32_t i = 0; i < extra.Length(); ++i) {

You should use |uint32_t i = 0, len = extra.Length(); i < len; ++i| here to make it slightly more efficient.  Or perhaps just pass extra in as |const StringPairArray&|, so the compiler knows it's not going to change.

@@ +388,5 @@
> +  // Check value argument.
> +  if ((optional_argc > 0) && !aValue.isNull() && !aValue.isString()) {
> +    LogToBrowserConsole(nsIScriptError::warningFlag,
> +                        NS_LITERAL_STRING("Invalid type for value parameter."));
> +    return NS_OK;

Seems like returning an error would be more appropriate here...

@@ +401,5 @@
> +                          NS_LITERAL_STRING("Invalid string value for value parameter."));
> +      return NS_OK;
> +    }
> +
> +    value = mozilla::Some(nsCString(NS_ConvertUTF16toUTF8(str)));

Do you really need the extra nsCString here?  NS_ConvertUTF16toUTF8 is already an nsCString...or is that extra conversion to satisfy the compiler?

@@ +445,5 @@
> +        return NS_OK;
> +      }
> +
> +      extra.AppendElement(mozilla::MakePair(nsCString(NS_ConvertUTF16toUTF8(key)),
> +                                            nsCString(NS_ConvertUTF16toUTF8(str))));

Pretty sure you don't want the extra nsCString conversion here; you could just explicitly instantiate mozilla::MakePair<nsCString, nsCString>(...).

@@ +504,5 @@
> +      }
> +
> +      events.AppendElement(gEventRecords[i]);
> +      if (aClear) {
> +        gEventRecords.RemoveElementAt(i);

Please come up with a better way to do this that's not going to be O(N^2), like shifting elements down as you go, or recording which elements get removed and doing the shifting outside of the loop.

@@ +516,5 @@
> +  JS::RootedObject eventsArray(cx, JS_NewArrayObject(cx, events.Length()));
> +  if (!eventsArray) {
> +    return NS_ERROR_FAILURE;
> +  }
> +  aResult.setObject(*eventsArray);

Seems like it would be slightly better to do this after the loop below and you're sure that eventsArray has been fully constructed.

@@ +563,5 @@
> +      return NS_ERROR_FAILURE;
> +    }
> +
> +    // Add the optional extra dictionary.
> +    if (record.Extra().Length() < 1) {

record.Extra().IsEmpty(), please.

@@ +616,5 @@
> +  n += gEventRecords.ShallowSizeOfIncludingThis(aMallocSizeOf);
> +  n += gEventNameIDMap.ShallowSizeOfIncludingThis(aMallocSizeOf);
> +
> +  for (uint32_t i = 0; i < gEventRecords.Length(); ++i) {
> +    n += gEventRecords[i].SizeOfIncludingThis(aMallocSizeOf);

The guidelines given at:

https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Memory_reporting#Other_Considerations

Suggest that these function names are backwards from the usual convention: when you have a pointer, you want ->IncludingThis, but when you have an actual variable, you want .ExcludingThis.  In particular, the calls outside the loop are going to measure the object sizes of gEventRecords and gEventNameIDMap, which are not runtime-allocated memory, and this call here is going to measure the object sizes of gEventRecords[i], which you already measured with the ShallowSizeOf call above...
Attachment #8805039 - Flags: review?(nfroyd) → feedback+
(In reply to Nathan Froyd [:froydnj] from comment #23)
> I don't understand the questions from comment 12 about ownership of
> EventRecords: it looks like we only have them owned in one place? 

That was worrying about how to handle more dynamic life-time of EventRecords - with the second patch version i noticed that i probably don't even need to worry about this in the first place and rewrote the code accordingly.

> Unless
> you're asking whether we should setup different limits for event records
> per-category or something like that?  I have no particular opinion on that;
> I worry a little bit that every client of this code is never going to pass
> |clear=true| when snapshotting records, so they'll all be unwittingly
> stepping on each other's toes....

I am dropping the "extraction & clearing per category" because in this bug we are only concerned with recording events for the main ping.
That means Telemetry modules are responsible for snapshotting & clearing these.
I'm thinking that, when expanding this with future patches, we will use separate limits for those "builtin"/"main ping" events vs. the ones recorded for other pings.

> @@ +103,5 @@
> > +    , mValue(value)
> > +    , mExtra(extra)
> > +  {}
> > +
> > +  EventRecord(const EventRecord& other)
> 
> Usually when you define the copy constructor, you want to define the copy
> assignment operator as well.  But I don't really see any copying of objects;
> it seems like everything could be done with move construction and assignment
> instead, so why not define those?

Copying happens in `CreateSnapshots()`, because of the JS vs. Telemetry deadlock scenario:
* We need to extract events from storage while holding the lock; copying them to a temporary array
* With the lock not held anymore, we serialize the copied events to JS.

This is because the JS code can call into Telemetry, hence we can't hold the storage lock while calling into it.
Generally we approach functions dealing with JS in three phases:
* convert JS values to C++/Variant values
* get lock, access protected internals, drop lock
* convert outgoing values to JS

> @@ +124,5 @@
> > +  Maybe<nsCString> mValue;
> > +  StringPairArray mExtra;
> > +};
> > +
> > +// Implements the methods for EventInfo.
> 
> Thanks for doing a string table for these.
> 
> It seems like in many cases, you're just creating nsDependentString out of
> the return values below.  Maybe consider storing the lengths of the strings
> as well, and returning nsDependentCString from these methods, so you don't
> have to continually recalculate the length via strlen?

I'll take this to a follow-up bug?
If we change the string table generator, that will affect TelemetryHistogram.cpp & TelemetryScalar.cpp as well.

> @@ +269,5 @@
> > +  // Check whether we can record this event.
> > +  const CommonEventInfo& common = gEventInfo[eventId].common_info;
> > +  if (!CanRecordEvent(lock, common)) {
> > +    return RecordEventResult::Ok;
> > +  }
> 
> I think all these checks on the event would be better if we moved them out
> of here and before we did a lot of the processing of |value| and |extra|
> from JS in TelemetryEvent::RecordEvent.  That way if we can't record the
> event because it's expired or whatever, we don't waste a bunch of time
> converting things from JS.  The extra keys checks could also be moved out of
> here, so we're spending as little time as possible with the lock held.

This has the same background with calling into JS code causing deadlocks.
Aquiring the lock earlier just to check `CanRecordEvent()` etc. doesn't seem sensible.

> 
> @@ +274,5 @@
> > +
> > +  // Check whether the extra keys passed are valid.
> > +  nsTHashtable<nsCStringHashKey> validExtraKeys;
> > +  for (uint32_t i = 0; i < common.extra_count; ++i) {
> > +    validExtraKeys.PutEntry(nsDependentCString(common.extra_key(i)));
> 
> Creating this hashtable on every event recording seems rather expensive. 
> How often do you expect events to be recorded?  How many extra keys are
> events expected to be defined to have, and/or passed in when they're
> recorded?
> 
> We could pre-calculate all of these tables when telemetry is
> initialized...which would also enable checking keys as we extract them from
> JS.


> @@ +388,5 @@
> > +  // Check value argument.
> > +  if ((optional_argc > 0) && !aValue.isNull() && !aValue.isString()) {
> > +    LogToBrowserConsole(nsIScriptError::warningFlag,
> > +                        NS_LITERAL_STRING("Invalid type for value parameter."));
> > +    return NS_OK;
> 
> Seems like returning an error would be more appropriate here...

The consensus in bug 1311100 is that we will only throw to JS when attempting to use unknown probes (discussed there for histograms, but the same applies to scalars & events).
(In reply to Nathan Froyd [:froydnj] from comment #23)
> @@ +274,5 @@
> > +
> > +  // Check whether the extra keys passed are valid.
> > +  nsTHashtable<nsCStringHashKey> validExtraKeys;
> > +  for (uint32_t i = 0; i < common.extra_count; ++i) {
> > +    validExtraKeys.PutEntry(nsDependentCString(common.extra_key(i)));
> 
> Creating this hashtable on every event recording seems rather expensive. 
> How often do you expect events to be recorded?  How many extra keys are
> events expected to be defined to have, and/or passed in when they're
> recorded?
> 
> We could pre-calculate all of these tables when telemetry is
> initialized...which would also enable checking keys as we extract them from
> JS.

I'm worried about the memory impact of pre-allocating all of those.
The current planned limit is 20 extra keys, although i'm looking at the known use-cases to see if we can lower this further.
That would mean 20 key entries in hash tables for each event (and we could have hundreds of events registered).

I can't predict how often we will record events, but i would expect this to be much less common than histogram accumulations (because events are not aggregated, so the bandwidth & storage costs are much higher).

The expected use-cases i'm covering first would be recording from JS, e.g.:
* specific events in the lifetime of a document
* button clicks
* navigation to a website
* sync attempting/completing

Later - in bug 1313327 - i want to add a more efficient C++ API for platform use-cases that would skip those expensive extra key tests.
This implements the API, storage and serialization for Telemetry event recording.
Attachment #8805039 - Attachment is obsolete: true
Attachment #8806270 - Flags: review?(alessio.placitelli)
Attachment #8806271 - Flags: review?(alessio.placitelli)
Comment on attachment 8806269 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

* Addressed review comments (except where mentioned in comment 24 and comment 25).
* Dropped snapshots per category for now (to keep things simpler).
* Added length limit for string values (and truncating).

Interdiff:
https://bugzilla.mozilla.org/attachment.cgi?oldid=8805039&action=interdiff&newid=8806269&headers=1
Attachment #8806269 - Flags: review?(nfroyd)
Attachment #8806269 - Flags: review?(alessio.placitelli)
Comment on attachment 8806270 [details] [diff] [review]
Part 3 - Add events to the main ping

Review of attachment 8806270 [details] [diff] [review]:
-----------------------------------------------------------------

It looks good with the comment change below addressed.

::: toolkit/components/telemetry/TelemetrySession.jsm
@@ +1006,5 @@
>    },
>  
> +  getEvents: function(isSubsession, clearSubsession) {
> +    if (!isSubsession) {
> +      // We only support scalars for subsessions.

s/scalars/events
Attachment #8806270 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8806269 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

Review of attachment 8806269 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/TelemetryCommon.h
@@ +51,5 @@
>  bool IsInDataset(uint32_t aDataset, uint32_t aContainingDataset);
>  bool CanRecordDataset(uint32_t aDataset, bool aCanRecordBase, bool aCanRecordExtended);
>  
> +/**
> + * Return the number of seconds since process start using monotonic

s/seconds/milliseconds?

::: toolkit/components/telemetry/TelemetryEvent.cpp
@@ +125,5 @@
> +
> +  EventRecord& operator=(const EventRecord& other) = delete;
> +
> +  /*
> +  EventRecord& operator=(const EventRecord& other) {

drive-by: any reason why this block is here? Leftover?
Attachment #8806269 - Flags: review?(alessio.placitelli) → review+
Attachment #8806271 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8806269 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

Review of attachment 8806269 [details] [diff] [review]:
-----------------------------------------------------------------

Comments below.

::: toolkit/components/telemetry/TelemetryEvent.cpp
@@ +86,5 @@
> +const uint32_t kMaxEventRecords = 10000;
> +// Maximum length of any passed value string.
> +const uint32_t kMaxValueStringLength = 100;
> +// Maximum length of any string value in the extra dictionary.
> +const uint32_t kMaxExtraValueStringLength = 100;

These should be clearer on whether they are the maximum UTF16 string length, or the UTF8 byte sequence length.

@@ +249,5 @@
> +// The Name -> ID cache map.
> +EventMapType gEventNameIDMap(kEventCount);
> +
> +// The main event storage. Events are inserted here in recording order.
> +nsTArray<EventRecord> gEventRecords;

Static constructors, boo. :(

@@ +436,5 @@
> +    nsCString str = NS_ConvertUTF16toUTF8(jsStr);
> +    if (str.Length() > kMaxValueStringLength) {
> +      LogToBrowserConsole(nsIScriptError::warningFlag,
> +                          NS_LITERAL_STRING("Value parameter exceeds maximum string length, truncating."));
> +      str = Substring(str, 0, kMaxValueStringLength);

I think you probably want |str.SetLength()| here...but you also want some clever handling so that you don't cut the string right in the middle of a UTF8 character sequence...  You'll want to look into RewindToPriorUTF8Codepoint.

@@ +485,5 @@
> +      nsCString str = NS_ConvertUTF16toUTF8(jsStr);
> +      if (str.Length() > kMaxExtraValueStringLength) {
> +        LogToBrowserConsole(nsIScriptError::warningFlag,
> +                            NS_LITERAL_STRING("Extra value exceeds maximum string length, truncating."));
> +        str = Substring(str, 0, kMaxExtraValueStringLength);

Here too.

@@ +644,5 @@
> +{
> +  StaticMutexAutoLock locker(gTelemetryEventsMutex);
> +  size_t n = 0;
> +
> +  n += gEventRecords.ShallowSizeOfIncludingThis(aMallocSizeOf);

...ExcludingThis (*this lives in static storage, so there's no way it's allocated memory).
Attachment #8806269 - Flags: review?(nfroyd) → feedback+
(In reply to Nathan Froyd [:froydnj] from comment #32)
> @@ +249,5 @@
> > +// The Name -> ID cache map.
> > +EventMapType gEventNameIDMap(kEventCount);
> > +
> > +// The main event storage. Events are inserted here in recording order.
> > +nsTArray<EventRecord> gEventRecords;
> 
> Static constructors, boo. :(

We followed the same pattern in TelemetryScalar.cpp (gScalarNameIDMap, gScalarStorageMap, gKeyedScalarStorageMap).
Would you suggest a specific change here? Switch to heap-allocated storage?
This implements the API, storage and serialization for Telemetry event recording.
Attachment #8808561 - Flags: review?(nfroyd)
Attachment #8806269 - Attachment is obsolete: true
Attachment #8808561 - Flags: review?(nfroyd) → review+
Blocks: 1316810
Pushed by georg.fritzsche@googlemail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ac12dbbe7b63
Part 1 - Add script to generate headers with event data from Events.yaml. r=dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/ab64c55508aa
Part 2 - Implement Event Telemetry recording. r=froydnj,dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/574bf5f49185
Part 3 - Add events to the main ping. r=dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/a4471f64e67a
Part 4 - Add event rendering to about:telemetry. r=dexter
Backed out for eslint failure and leaks:

https://hg.mozilla.org/integration/mozilla-inbound/rev/557a7e72a15086747793eb5dd0642c4f149b0937
https://hg.mozilla.org/integration/mozilla-inbound/rev/215e84d751786dcf690f1a7511aa9806b2dc9a0b
https://hg.mozilla.org/integration/mozilla-inbound/rev/f20e3b45a15660fe9d5d719f33ec73178f9d0602
https://hg.mozilla.org/integration/mozilla-inbound/rev/89d19198a0de6d839a1b2b390f9fc50dc9aa78ef

ESlint failure: https://treeherder.mozilla.org/logviewer.html#?job_id=39117221&repo=mozilla-inbound
> toolkit/components/telemetry/tests/unit/test_TelemetryEvents.js:73:30 | 'expected' is already declared in the upper scope. (no-shadow)

Leaks (example): https://treeherder.mozilla.org/logviewer.html#?job_id=39118591&repo=mozilla-inbound
[task 2016-11-12T19:42:14.420533Z] 19:42:14     INFO - TEST-START | Shutdown
[task 2016-11-12T19:42:14.420571Z] 19:42:14     INFO - Passed:  2
[task 2016-11-12T19:42:14.420590Z] 19:42:14     INFO - Failed:  0
[task 2016-11-12T19:42:14.420616Z] 19:42:14     INFO - Todo:    0
[task 2016-11-12T19:42:14.420650Z] 19:42:14     INFO - Mode:    non-e10s
[task 2016-11-12T19:42:14.420685Z] 19:42:14     INFO - Slowest: 1108ms - chrome://mochitests/content/chrome/dom/base/test/test_copypaste.xul
[task 2016-11-12T19:42:14.420705Z] 19:42:14     INFO - SimpleTest FINISHED
[task 2016-11-12T19:42:14.420723Z] 19:42:14     INFO - TEST-INFO | Ran 1 Loops
[task 2016-11-12T19:42:14.420740Z] 19:42:14     INFO - SimpleTest FINISHED
[task 2016-11-12T19:42:15.342889Z] 19:42:15     INFO - --DOCSHELL 0x7f931707c000 == 5 [pid = 2374] [id = 3]
[task 2016-11-12T19:42:16.217620Z] 19:42:16     INFO - [2374] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /home/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 977
[task 2016-11-12T19:42:16.218831Z] 19:42:16     INFO - [2374] WARNING: 'NS_FAILED(RemovePermissionChangeObserver())', file /home/worker/workspace/build/src/dom/notification/Notification.cpp, line 669
[task 2016-11-12T19:42:16.266630Z] 19:42:16     INFO - --DOCSHELL 0x7f932b721000 == 4 [pid = 2374] [id = 1]
[task 2016-11-12T19:42:16.584930Z] 19:42:16     INFO - --DOCSHELL 0x7f93149d6000 == 3 [pid = 2374] [id = 5]
[task 2016-11-12T19:42:16.585901Z] 19:42:16     INFO - --DOCSHELL 0x7f931707f000 == 2 [pid = 2374] [id = 4]
[task 2016-11-12T19:42:16.586639Z] 19:42:16     INFO - --DOCSHELL 0x7f9309d49000 == 1 [pid = 2374] [id = 6]
[task 2016-11-12T19:42:16.586704Z] 19:42:16     INFO - --DOCSHELL 0x7f9320265800 == 0 [pid = 2374] [id = 2]
[task 2016-11-12T19:42:17.447320Z] 19:42:17     INFO - --DOMWINDOW == 17 (0x7f9320267000) [pid = 2374] [serial = 4] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.451742Z] 19:42:17     INFO - --DOMWINDOW == 16 (0x7f930c268000) [pid = 2374] [serial = 17] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.451831Z] 19:42:17     INFO - --DOMWINDOW == 15 (0x7f9311c1d800) [pid = 2374] [serial = 18] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_copypaste.xul]
[task 2016-11-12T19:42:17.451913Z] 19:42:17     INFO - --DOMWINDOW == 14 (0x7f9309d49800) [pid = 2374] [serial = 15] [outer = (nil)] [url = chrome://mochitests/content/chrome/dom/base/test/test_copypaste.xul]
[task 2016-11-12T19:42:17.451978Z] 19:42:17     INFO - --DOMWINDOW == 13 (0x7f9309d4f800) [pid = 2374] [serial = 16] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.452049Z] 19:42:17     INFO - --DOMWINDOW == 12 (0x7f930b679000) [pid = 2374] [serial = 13] [outer = (nil)] [url = http://mochi.test:8888/redirect.html]
[task 2016-11-12T19:42:17.467947Z] 19:42:17     INFO - --DOMWINDOW == 11 (0x7f932b721800) [pid = 2374] [serial = 1] [outer = (nil)] [url = resource://gre-resources/hiddenWindow.html]
[task 2016-11-12T19:42:17.468089Z] 19:42:17     INFO - --DOMWINDOW == 10 (0x7f932b722800) [pid = 2374] [serial = 2] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.468252Z] 19:42:17     INFO - --DOMWINDOW == 9 (0x7f9320266000) [pid = 2374] [serial = 3] [outer = (nil)] [url = chrome://browser/content/browser.xul]
[task 2016-11-12T19:42:17.468339Z] 19:42:17     INFO - --DOMWINDOW == 8 (0x7f931707d800) [pid = 2374] [serial = 6] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.468961Z] 19:42:17     INFO - --DOMWINDOW == 7 (0x7f9317080000) [pid = 2374] [serial = 7] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.469575Z] 19:42:17     INFO - --DOMWINDOW == 6 (0x7f93149d6800) [pid = 2374] [serial = 8] [outer = (nil)] [url = chrome://mochikit/content/harness.xul]
[task 2016-11-12T19:42:17.469796Z] 19:42:17     INFO - --DOMWINDOW == 5 (0x7f93147a0800) [pid = 2374] [serial = 9] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.472803Z] 19:42:17     INFO - --DOMWINDOW == 4 (0x7f93108c2800) [pid = 2374] [serial = 14] [outer = (nil)] [url = chrome://mochikit/content/harness.xul]
[task 2016-11-12T19:42:17.474084Z] 19:42:17     INFO - --DOMWINDOW == 3 (0x7f9314238800) [pid = 2374] [serial = 10] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.474730Z] 19:42:17     INFO - --DOMWINDOW == 2 (0x7f931423c000) [pid = 2374] [serial = 11] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.475337Z] 19:42:17     INFO - --DOMWINDOW == 1 (0x7f9314130800) [pid = 2374] [serial = 12] [outer = (nil)] [url = about:blank]
[task 2016-11-12T19:42:17.476466Z] 19:42:17     INFO - --DOMWINDOW == 0 (0x7f932f836000) [pid = 2374] [serial = 5] [outer = (nil)] [url = resource://gre-resources/hiddenWindow.html]
[task 2016-11-12T19:42:17.516839Z] 19:42:17     INFO - [2374] WARNING: '!compMgr', file /home/worker/workspace/build/src/xpcom/glue/nsComponentManagerUtils.cpp, line 63
[task 2016-11-12T19:42:17.517622Z] 19:42:17     INFO - [2374] WARNING: OOPDeinit() without successful OOPInit(): file /home/worker/workspace/build/src/toolkit/crashreporter/nsExceptionHandler.cpp, line 3488
[task 2016-11-12T19:42:17.521299Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.522463Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.525317Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.525606Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.526032Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.526598Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.527221Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.527319Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.527788Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.528198Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.528641Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.529063Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.529466Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.529736Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.530244Z] 19:42:17     INFO - [2374] WARNING: XPCOM objects created/destroyed from static ctor/dtor: file /home/worker/workspace/build/src/xpcom/base/nsTraceRefcnt.cpp, line 172
[task 2016-11-12T19:42:17.530587Z] 19:42:17     INFO - nsStringStats
[task 2016-11-12T19:42:17.530735Z] 19:42:17     INFO -  => mAllocCount:         103818
[task 2016-11-12T19:42:17.530987Z] 19:42:17     INFO -  => mReallocCount:         6895
[task 2016-11-12T19:42:17.531407Z] 19:42:17     INFO -  => mFreeCount:          103818
[task 2016-11-12T19:42:17.531564Z] 19:42:17     INFO -  => mShareCount:          89916
[task 2016-11-12T19:42:17.531772Z] 19:42:17     INFO -  => mAdoptCount:           7097
[task 2016-11-12T19:42:17.532009Z] 19:42:17     INFO -  => mAdoptFreeCount:       7097
[task 2016-11-12T19:42:17.532291Z] 19:42:17     INFO -  => Process ID: 2374, Thread ID: 140270744028992
[task 2016-11-12T19:42:17.584026Z] 19:42:17     INFO - TEST-INFO | Main app process: exit 0
[task 2016-11-12T19:42:17.584274Z] 19:42:17     INFO - runtests.py | Application ran for: 0:00:11.462610
[task 2016-11-12T19:42:17.584516Z] 19:42:17     INFO - zombiecheck | Reading PID log: /tmp/tmpvCRsUNpidlog
[task 2016-11-12T19:42:17.584804Z] 19:42:17     INFO - ==> process 2374 launched child process 2402
[task 2016-11-12T19:42:17.585123Z] 19:42:17     INFO - zombiecheck | Checking for orphan process with PID: 2402
[task 2016-11-12T19:42:17.585985Z] 19:42:17     INFO - Stopping web server
[task 2016-11-12T19:42:17.622257Z] 19:42:17     INFO - Stopping web socket server
[task 2016-11-12T19:42:17.642527Z] 19:42:17     INFO - Stopping ssltunnel
[task 2016-11-12T19:42:17.674965Z] 19:42:17     INFO - TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes
[task 2016-11-12T19:42:17.675397Z] 19:42:17     INFO - TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes
[task 2016-11-12T19:42:17.675873Z] 19:42:17     INFO - TEST-INFO | leakcheck | tab process: leak threshold set at 10000 bytes
[task 2016-11-12T19:42:17.676405Z] 19:42:17     INFO - TEST-INFO | leakcheck | geckomediaplugin process: leak threshold set at 20000 bytes
[task 2016-11-12T19:42:17.676829Z] 19:42:17     INFO - TEST-INFO | leakcheck | gpu process: leak threshold set at 0 bytes
[task 2016-11-12T19:42:17.677102Z] 19:42:17     INFO - 
[task 2016-11-12T19:42:17.677448Z] 19:42:17     INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 2374
[task 2016-11-12T19:42:17.677720Z] 19:42:17     INFO - 
[task 2016-11-12T19:42:17.678013Z] 19:42:17     INFO -      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2016-11-12T19:42:17.678334Z] 19:42:17     INFO -      |                                      | Per-Inst   Leaked|   Total      Rem|
[task 2016-11-12T19:42:17.678945Z] 19:42:17     INFO -    0 |TOTAL                                 |       33      120|  978462       15|
[task 2016-11-12T19:42:17.682436Z] 19:42:17     INFO - 1156 |nsStringBuffer                        |        8      112|  110713       14|
[task 2016-11-12T19:42:17.682801Z] 19:42:17     INFO - 1201 |nsTArray_base                         |        8        8|  245103        1|
[task 2016-11-12T19:42:17.683657Z] 19:42:17     INFO - 
[task 2016-11-12T19:42:17.684042Z] 19:42:17     INFO - nsTraceRefcnt::DumpStatistics: 1317 entries
[task 2016-11-12T19:42:17.684403Z] 19:42:17     INFO - TEST-INFO | leakcheck | default process: leaked 14 nsStringBuffer
[task 2016-11-12T19:42:17.684670Z] 19:42:17     INFO - TEST-INFO | leakcheck | default process: leaked 1 nsTArray_base
[task 2016-11-12T19:42:17.685110Z] 19:42:17  WARNING - TEST-UNEXPECTED-FAIL | leakcheck | default process: 120 bytes leaked (nsStringBuffer, nsTArray_base)
[task 2016-11-12T19:42:17.685528Z] 19:42:17     INFO - runtests.py | Running tests: end.
[task 2016-11-12T19:42:17.721620Z] 19:42:17     INFO - Buffered messages finished
[task 2016-11-12T19:42:17.722326Z] 19:42:17     INFO -  dir: toolkit/components/aboutmemory/tests
[task 2016-11-12T19:42:17.743545Z] 19:42:17     INFO -  Setting pipeline to PAUSED ...
[task 2016-11-12T19:42:17.744346Z] 19:42:17     INFO -  Pipeline is PREROLLING ...
[task 2016-11-12T19:42:17.745000Z] 19:42:17     INFO -  Pipeline is PREROLLED ...
[task 2016-11-12T19:42:17.745370Z] 19:42:17     INFO -  Setting pipeline to PLAYING ...
[task 2016-11-12T19:42:17.745673Z] 19:42:17     INFO -  New clock: GstSystemClock
[task 2016-11-12T19:42:17.778561Z] 19:42:17     INFO -  Got EOS from element "pipeline0".
[task 2016-11-12T19:42:17.779120Z] 19:42:17     INFO -  Execution ended after 33439320 ns.
[task 2016-11-12T19:42:17.779173Z] 19:42:17     INFO -  Setting pipeline to PAUSED ...
[task 2016-11-12T19:42:17.779861Z] 19:42:17     INFO -  Setting pipeline to READY ...
[task 2016-11-12T19:42:17.779928Z] 19:42:17     INFO -  Setting pipeline to NULL ...
[task 2016-11-12T19:42:17.779988Z] 19:42:17     INFO -  Freeing pipeline ...
[task 2016-11-12T19:42:17.864000Z] 19:42:17     INFO -  mozprofile.addons WARNING | Could not install /home/worker/workspace/build/tests/mochitest/extensions/mozscreenshots: [Errno 2] No such file or directory: '/home/worker/workspace/build/tests/mochitest/extensions/mozscreenshots/install.rdf'
[task 2016-11-12T19:42:18.040795Z] 19:42:18     INFO -  pk12util: PKCS12 IMPORT SUCCESSFUL
Flags: needinfo?(gfritzsche)
Pushed by georg.fritzsche@googlemail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/5c047b770037
Part 1 - Add script to generate headers with event data from Events.yaml. r=dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/64b95c8f9f12
Part 2 - Implement Event Telemetry recording. r=froydnj,dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/ae8ae9a484d3
Part 3 - Add events to the main ping. r=dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/0bd621a27696
Part 4 - Add event rendering to about:telemetry. r=dexter
Fixed these:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4a34dedf058a8afe11cf5e389cab1c9ed1cd7285&selectedJob=31342279

I ended up changing `gEventRecords` to be heap-allocated here because the nsTArray_base kept leaking with 8 bytes when it was file-static.
Flags: needinfo?(gfritzsche)
Pushed by georg.fritzsche@googlemail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/2af0b28f843b
Part 1 - Add script to generate headers with event data from Events.yaml. r=dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/190f6fc837b5
Part 2 - Implement Event Telemetry recording. r=froydnj,dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb4c5358f83a
Part 3 - Add events to the main ping. r=dexter
https://hg.mozilla.org/integration/mozilla-inbound/rev/58d95de5bd74
Part 4 - Add event rendering to about:telemetry. r=dexter
Whiteboard: [measurement:client] → [measurement:client] [measurement:client:uplift]
Comment on attachment 8805038 [details] [diff] [review]
Part 1 - Add script to generate headers with event data from Events.yaml

Approval Request Comment
[Feature/Bug causing the regression]: Event Telemetry.
[User impact if declined]: Data of Event Telemetry behavior reaches us later, delaying our decision making for the project.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes, bug 1302671.
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]:
The individual parts are:
* bug 1302663 - basic Telemetry implementation
* bug 1316810 - adjust event limits
* bug 1318284 - improve metrics that track effects on ping sending
* bug 1323628 - fix
* bug 1316281 - record search event in Telemetry
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It's validated on Nightly and the changes are selective and relatively well understood.
[String changes made/needed]: Only two string additions for about:telemetry in attachment 8806271 [details] [diff] [review].

This is part of an uplift need for the initial Event Telemetry [1][2] implementation to Firefox 52.
We want to move this to Beta a little faster to see somewhat realistic data of an example event (search) coming in and verify that it matches our expectations.

I validated that this behaves as expected on Nightly over the last two weeks in bug 1302671.

1: https://docs.google.com/document/d/1hNuS9lUJMvMqgntZXbFA6xZBU9zBpQgo7x73-sXKRpI/
2: https://wiki.mozilla.org/Event_Telemetry
Attachment #8805038 - Flags: approval-mozilla-aurora?
Comment on attachment 8806270 [details] [diff] [review]
Part 3 - Add events to the main ping

Approval Request Comment
... see comment 44.
Attachment #8806270 - Flags: approval-mozilla-aurora?
Comment on attachment 8806271 [details] [diff] [review]
Part 4 - Add event rendering to about:telemetry

See comment 44.
Comment on attachment 8806271 [details] [diff] [review]
Part 4 - Add event rendering to about:telemetry

Approval Request Comment
... see comment 44.
Attachment #8806271 - Flags: approval-mozilla-aurora?
Comment on attachment 8808561 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

Approval Request Comment
... see comment 44.
Attachment #8808561 - Flags: approval-mozilla-aurora?
(In reply to Georg Fritzsche [:gfritzsche] from comment #44)
> [String changes made/needed]: Only two string additions for about:telemetry
> in attachment 8806271 [details] [diff] [review].

At this point of the cycle I strongly advise to provide a patch with hard-coded strings for aurora.
Ok, i'm happy to change that.
Attachment #8806271 - Attachment is obsolete: true
Attachment #8806271 - Flags: approval-mozilla-aurora?
Comment on attachment 8820242 [details] [diff] [review]
Part 4 - Add event rendering to about:telemetry

Approval Request Comment
... see comment 44.

I udpated this patch to use hard-coded strings.
Attachment #8820242 - Flags: review+
Attachment #8820242 - Flags: approval-mozilla-aurora?
Comment on attachment 8805038 [details] [diff] [review]
Part 1 - Add script to generate headers with event data from Events.yaml

add event telemetry for aurora52
Attachment #8805038 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8806270 [details] [diff] [review]
Part 3 - Add events to the main ping

add event telemetry for aurora52
Attachment #8806270 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8808561 [details] [diff] [review]
Part 2 - Implement Event Telemetry recording

add event telemetry for aurora52
Attachment #8808561 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment on attachment 8820242 [details] [diff] [review]
Part 4 - Add event rendering to about:telemetry

add event telemetry for aurora52
Attachment #8820242 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Whiteboard: [measurement:client] [measurement:client:uplift] → [measurement:client]
Depends on: 1422035
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: