Closed Bug 1276190 Opened 8 years ago Closed 8 years ago

Add script to generate headers with scalar data from Scalars.yaml

Categories

(Toolkit :: Telemetry, defect, P1)

defect
Points:
2

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox49 --- affected
firefox50 --- fixed

People

(Reporter: gfritzsche, Assigned: Dexter)

References

(Blocks 1 open bug)

Details

(Whiteboard: [measurement:client])

Attachments

(4 files, 18 obsolete files)

1.70 KB, text/plain
Details
478 bytes, text/plain
Details
5.74 KB, patch
Dexter
: review+
benjamin
: feedback+
Details | Diff | Splinter Review
27.64 KB, patch
Dexter
: review+
Details | Diff | Splinter Review
We want to add a python script, similar to histogram_tools et al, that generates the headers with scalar probe information. We need: (1) a header with enums (2) a header with an array of scalar data (type, data-set, expiry, ...) (3) a header with a string table for scalar names (for enum <-> name lookups, could be in (2)) Lets start off Scalars.yaml for this with test probes for the initially supported types etc.
Priority: -- → P3
Whiteboard: [measurement:client]
Assignee: nobody → alessio.placitelli
Priority: P3 → P1
Attached patch bug1276190.patch (obsolete) — Splinter Review
This patch: - Adds the Scalars.yaml that defines the test scalars - Adds a script that generates the Scalar C++ enums - Adds a script that generates the Scalar C++ structs (incomplete)
Attached patch bug1276190.patch (obsolete) — Splinter Review
Attachment #8758347 - Attachment is obsolete: true
Attachment #8758766 - Flags: feedback?(gfritzsche)
Attached file Sample TelemetryScalarEnums.h (obsolete) —
Attached file Sample TelemetryScalarData.inc (obsolete) —
Comment on attachment 8758766 [details] [diff] [review] bug1276190.patch Review of attachment 8758766 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/Scalars.yaml @@ +1,2 @@ > +# Scalar Prototype > +test_scalar_prototype: We want to use a fixed two-level structure here: > section.name: > probe_name: > ... There should be no top-level probes. @@ +120,5 @@ > + kind: int > + notification_emails: > + - fake@email.address > + > +test.flat: Let's lead with a good example and group our test probes into the "telemetry.test" section. ::: toolkit/components/telemetry/gen-scalar-data.py @@ +47,5 @@ > + print("const ScalarInfo gScalars[] = {", file=output) > + for s in scalars: > + # We add both the scalar name and the expiration string to the strings > + # table. > + name_index = string_table.stringIndex(s.name) If we would want to match the enum-name here for searchability, we'd need to use upper-case + '_' for both. On the other hand, this would loose us the '.'-separated names in the serializations, which means we couldn't easily display the data hierarchically without looking at Scalars.yaml. Hm. ::: toolkit/components/telemetry/gen_utils.py @@ +1,1 @@ > +# This Source Code Form is subject to the terms of the Mozilla Public Can we name this better? @@ +6,5 @@ > +# scripts. > + > +from __future__ import print_function > + > +class StringTable: This needs commenting. ::: toolkit/components/telemetry/moz.build @@ +59,5 @@ > > GENERATED_FILES = [ > 'TelemetryHistogramData.inc', > 'TelemetryHistogramEnums.h', > + 'TelemetryScalarData.inc', This is a confusing approach, let's do this in a proper C++ style now: * define the scalar info structure in a separate header * let the script generate TelemetryScalarData.h which includes that header * let TelemetryScalar.cpp properly include TelemetryScalarData.h at the top ::: toolkit/components/telemetry/scalar_tools.py @@ +1,1 @@ > +# This Source Code Form is subject to the terms of the Mozilla Public Lets name this a bit more descriptively: parse_scalars.py, load_scalars.py, ...? @@ +5,5 @@ > +import re > +import yaml > + > +# The required and optional fields in a scalar type definition. > +REQUIRED_FIELDS = { With the fixed two-level structure can move this and the following definitions into the validation function. @@ +41,5 @@ > + def __init__(self, name, definition): > + # Set the name, so we don't need to pass it to the validation functions. > + self._name = name > + > + # Validating the scalar definition. Let's also validate the group and probe name: * group name should be alpha-numeric + '.', no leading digit * probe name should be alpha-numeric + '_', no leading digit @@ +87,5 @@ > + if not isinstance(definition[f], REQUIRED_FIELDS[f])] > + if len(wrong_field_types) > 0: > + raise TypeError(self._name + ' - wrong fields types: ' + ', '.join(wrong_field_types)) > + > + # If we have any optional field, make sure it has the correct type. We don't need to type-check optional fields separately, lets do one common type check after checking for required + optional properties. @@ +236,5 @@ > + # we assume this is a group and the scalar definitions are one level below. > + scalar_group = scalars[scalar_name] > + for sub_name in scalar_group: > + grouped_scalar_name = scalar_name + '.' + sub_name > + yield ScalarType(grouped_scalar_name, scalar_group[sub_name]) With the fixed two-level structure, we can keep the group and probe name seperately. We can join them when needed.
Attachment #8758766 - Flags: feedback?(gfritzsche)
Attached file Sample TelemetryScalarEnums.h (obsolete) —
Attachment #8759082 - Attachment is obsolete: true
Attached file Sample TelemetryScalarData.h (obsolete) —
Attachment #8759083 - Attachment is obsolete: true
Attached patch bug1276190.patch (obsolete) — Splinter Review
Attachment #8758766 - Attachment is obsolete: true
Attachment #8759215 - Flags: review?(gfritzsche)
Given PEP0008 [1], we should probably rename the "gen-scalar*.py" to "gen_scalar_*.py". > Modules should have short, all-lowercase names. Underscores can be used in the module name > if it improves readability. Python packages should also have short, all-lowercase names, > although the use of underscores is discouraged. [1] - https://www.python.org/dev/peps/pep-0008/#package-and-module-names
Attachment #8759213 - Attachment mime type: text/x-chdr → text/plain
Attachment #8759212 - Attachment mime type: text/x-chdr → text/plain
Comment on attachment 8759215 [details] [diff] [review] bug1276190.patch Review of attachment 8759215 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/Scalars.yaml @@ +1,1 @@ > +telemetry.test: We can add comments, so let's add one before this section to explain that it's about test probes (and that the probes in this section will not be submitted in pings outside of tests). @@ +1,3 @@ > +telemetry.test: > + # Scalar Prototype > + test_scalar_prototype: That becomes: TELEMETRY_TEST_TEST_... I think one TEST is enough. Let's not add a sample just for copy-n-paste here, there will be documentation soon and other entries to copy from. @@ +14,5 @@ > + - fake@email.address > + release_channel_collection: opt-in # Optional, "opt-in" (default) or "opt-out" > + > + # Test Scalar Types > + test_int_kind: We decided to start out with uint. @@ +21,5 @@ > + description: > > + This is a test int type with a really long description, maybe spanning even multiple > + lines, to just prove a point: everything works just fine. > + expires: never > + keyed: false This should be optional and default to false. @@ +24,5 @@ > + expires: never > + keyed: false > + kind: int > + notification_emails: > + - fake@email.address You can use our telemetry-client-dev address. @@ +100,5 @@ > + notification_emails: > + - fake@email.address > + release_channel_collection: opt-out > + > + test_flat: This seems redundant - left-over? ::: toolkit/components/telemetry/gen-scalar-data.py @@ +10,5 @@ > + > +import parse_scalars > +import sys > + > +# The banner/text at the top of the the... ? @@ +11,5 @@ > +import parse_scalars > +import sys > + > +# The banner/text at the top of the > +banner = """/* This file is auto-generated, see gen-scalar-data.py. */ This should have a disclaimer that it's only for internal use in TelemetryScalar.h. @@ +47,5 @@ > + print("const ScalarInfo gScalars[] = {", file=output) > + for s in scalars: > + # We add both the scalar name and the expiration string to the strings > + # table. > + name_index = string_table.stringIndex(s.name) This seems to use only probe_name, not group.name.probe_name @@ +53,5 @@ > + # Write the scalar info entry. > + write_scalar_info(s, output, name_index, exp_index) > + print("};", file=output) > + > + string_table_name = "gScalarsStringTable" Ditto on naming. ::: toolkit/components/telemetry/gen-scalar-enum.py @@ +24,5 @@ > + print("#ifndef mozilla_TelemetryScalarEnums_h", file=output); > + print("#define mozilla_TelemetryScalarEnums_h", file=output); > + print("namespace mozilla {", file=output) > + print("namespace Telemetry {", file=output) > + print("enum SCALAR_ID : uint32_t {", file=output) All upper case is for constants; let's use `ScalarID`. ::: toolkit/components/telemetry/parse_scalars.py @@ +10,5 @@ > +SCALAR_TYPES_MAP = { > + 'bool': 'nsITelemetry::SCALAR_BOOLEAN', > + 'flag': 'nsITelemetry::SCALAR_FLAG', > + 'int': 'nsITelemetry::SCALAR_COUNT', > + 'string': 'nsITelemetry::SCALAR_STRING' We decided to start out with `uint` and `string`. Lets not allow anything else. @@ +27,5 @@ > + # Validating the scalar definition. > + self.validate_types(definition) > + self.validate_values(definition) > + > + # Everything is ok, set the rest of the data. The following are all just copying things from `definition` plus some default values. Can't we just store `definition` and access it in the getters? @@ +56,5 @@ > + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$' > + if not re.match(chars_regxp, name): > + raise TypeError(name + ' should be alpha-numeric + \'{}\''.format(allowed_char_regexp)) > + > + # Don't allow leading digits, '.' or '_'. Also not trailing. @@ +96,5 @@ > + } > + > + # Concatenate the required and optional field definitions. > + ALL_FIELDS = \ > + {k: v for d in (REQUIRED_FIELDS, OPTIONAL_FIELDS) for k, v in d.iteritems()} Given two dicts a and b, the clearer pattern here is: c = a.copy() c.update(b) @@ +109,5 @@ > + if len(unknown_fields) > 0: > + raise KeyError(self._name + ' - unknown fields: ' + ', '.join(unknown_fields)) > + > + # Checks the type for all the required fields. > + wrong_field_types = [f for f in definition.keys() if not isinstance(definition[f], ALL_FIELDS[f])] Nit: this is confusingly named, it sounds like it contains the type names. @@ +111,5 @@ > + > + # Checks the type for all the required fields. > + wrong_field_types = [f for f in definition.keys() if not isinstance(definition[f], ALL_FIELDS[f])] > + if len(wrong_field_types) > 0: > + raise TypeError(self._name + ' - wrong fields types: ' + ', '.join(wrong_field_types)) histogram_tools.py tells us what the expected type is - let's do that here too. @@ +118,5 @@ > + list_fields = [f for f in definition if isinstance(definition[f], list)] > + for field in list_fields: > + broken_types = [v for v in definition[field] if not isinstance(v, LIST_FIELDS_CONTENT[field])] > + if len(broken_types) > 0: > + raise TypeError(self._name + ' - wrong fields types: ' + ', '.join(broken_types)) Can we drop a hint on the types? @@ +213,5 @@ > + """ Validate the expiration version and account for 'a1'. This function may change > + the 'expires' field in the definition. > + > + :param definition: the dictionary containing the scalar properties. > + :todo: this was taken from histogram_tools.py. Maybe we should share the code? Yes - make things pass the actual expiration value and make it shared - lets just expand shared_header_utils to shared_telemetry_utils? We will have similar sharing needs with the histogram code for the C++ parts. @@ +244,5 @@ > + raise Exception('Error parsing scalars in ' + filename + ': ' + e.message) > + > + # TODO: Should we make sure scalars_yaml is an OrderedDict? The would probably > + # violate the YAML spec (see http://pyyaml.org/ticket/29#comment:4). That said, > + # it seems to be important for us given the comments in histogram_tools.py. Let's visit that wen we actually run into it - right now we only parse from one file anyway. @@ +251,5 @@ > + # The first level contains the group name, while the second level contains the > + # probe name (e.g. "group.name: probe: ..."). > + for group_name in scalars: > + group = scalars[group_name] > + for probe_name in group: We should assert that `group` has at least one child. @@ +256,5 @@ > + # We found a scalar type. Go ahead and parse it. > + scalar_info = group[probe_name] > + yield ScalarType(group_name, probe_name, scalar_info) > + > +def load_scalars(filenames): We don't need to support loading from more than one file right. Let's just restrict this to one file and come back to it when actually needed.
Attachment #8759215 - Flags: review?(gfritzsche)
Attached file Sample TelemetryScalarData.h (obsolete) —
Attachment #8759213 - Attachment is obsolete: true
Attached file Sample TelemetryScalarEnums.h (obsolete) —
Attachment #8759212 - Attachment is obsolete: true
Attached patch bug1276190.patch (obsolete) — Splinter Review
Attachment #8759215 - Attachment is obsolete: true
Attachment #8760187 - Flags: review?(gfritzsche)
Comment on attachment 8760187 [details] [diff] [review] bug1276190.patch Review of attachment 8760187 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/gen-scalar-enum.py @@ +24,5 @@ > + print("#ifndef mozilla_TelemetryScalarEnums_h", file=output); > + print("#define mozilla_TelemetryScalarEnums_h", file=output); > + print("namespace mozilla {", file=output) > + print("namespace Telemetry {", file=output) > + print("enum Scalar_ID : uint32_t {", file=output) This should be an |enum class Scalar_ID|...
Attachment #8760185 - Attachment mime type: text/x-chdr → text/plain
Attachment #8760186 - Attachment mime type: text/x-chdr → text/plain
Depends on: 1278556
Comment on attachment 8760187 [details] [diff] [review] bug1276190.patch Review of attachment 8760187 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/Scalars.yaml @@ +1,1 @@ > +telemetry.test: Let's add an explanatory comment on what the file is for at the top (we will update it with documentation links later). As mentioned before, add an explanatory comment before this section, that this is for probes testing the Telemetry system. For probe coverage, i'm missing an expired and unexpired test probe below. @@ +6,5 @@ > + description: > > + This is a test uint type with a really long description, maybe spanning even multiple > + lines, to just prove a point: everything works just fine. > + expires: never > + kind: uint I think we don't need to keep this short here, so readability - e.g. `unsigned_int` - is probably preferred? @@ +19,5 @@ > + kind: string > + notification_emails: > + - telemetry-client-dev@mozilla.com > + > + keyed_uint: We don't want to support keyed histograms in the first stage, so lets not allow them for now. ::: toolkit/components/telemetry/gen-histogram-data.py @@ +13,5 @@ > import itertools > > banner = """/* This file is auto-generated, see gen-histogram-data.py. */ > """ > Nit: Let's clean out this character while we're here. @@ +42,3 @@ > static_assert(output, "sizeof(%s) <= UINT32_MAX" % strtab_name, > "index overflow") > Ditto. ::: toolkit/components/telemetry/gen-scalar-data.py @@ +30,5 @@ > + > + print(" {{ {}, {}, {}, {}, {} }},"\ > + .format(scalar.nsITelemetry_kind, name_index, expiration_index, > + scalar.dataset, > + "true" if scalar.keyed else "false"), Nit: Let's have each argument on a new line here, that will be more readable than the mixed style. @@ +68,5 @@ > + print(banner, file=output) > + print("#ifndef mozilla_TelemetryScalarData_h", file=output); > + print("#define mozilla_TelemetryScalarData_h", file=output); > + print("#include \"ScalarInfo.h\"", file=output); > + print("namespace {", file=output) Nit: Why not use two multi-line strings, prefix & suffix? ::: toolkit/components/telemetry/gen-scalar-enum.py @@ +24,5 @@ > + print("#ifndef mozilla_TelemetryScalarEnums_h", file=output); > + print("#define mozilla_TelemetryScalarEnums_h", file=output); > + print("namespace mozilla {", file=output) > + print("namespace Telemetry {", file=output) > + print("enum Scalar_ID : uint32_t {", file=output) Remember the C++ naming conventions: `enum class ScalarID ...` ::: toolkit/components/telemetry/parse_scalars.py @@ +41,5 @@ > + def check_name(name, allowed_char_regexp): > + # Check if we only have the allowed characters. > + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$' > + if not re.search(chars_regxp, name): > + raise TypeError(name + ' should be alpha-numeric + \'{}\''.format(allowed_char_regexp)) Nit: "must be" Also, let's change the format a bit to mention what kind of property failed the check, e.g.: "group name must be ..., got: X" @@ +45,5 @@ > + raise TypeError(name + ' should be alpha-numeric + \'{}\''.format(allowed_char_regexp)) > + > + # Don't allow leading digits, '.' or '_'. > + if re.search(r'(^[\d\._])|([\d\._])$', name): > + raise TypeError(name + ' should have a leading/trailing digit, a dot or underscore.') Nit: "must not have" Ditto on the format, e.g.: "group name must ..., got: X" @@ +70,5 @@ > + 'notification_emails': list # This contains strings. See LIST_FIELDS_CONTENT. > + } > + > + OPTIONAL_FIELDS = { > + 'keyed': bool, We don't support keyed scalars yet. @@ +102,5 @@ > + if len(wrong_type_names) > 0: > + raise TypeError(self._name + ' - ' + ', '.join(wrong_type_names)) > + > + # Check that data in the lists have the correct types. > + list_fields = [f for f in definition if isinstance(definition[f], list)] We need to enforce non-empty lists. @@ +116,5 @@ > + :param definition: the dictionary containing the scalar properties. > + :raises ValueError: if a scalar definition field contains an unexpected value. > + """ > + > + definition['expires'] = validate_expiration(definition['expires']) Overwriting this here is unexpected. This function is also not named clearly, it doesn't just validate. @@ +125,5 @@ > + raise ValueError(self._name + ' - unknown scalar kind: ' + scalar_kind) > + > + # Validate the collection policy. > + collection_policy = definition.get('release_channel_collection', 'opt-in') > + if collection_policy not in ['opt-in', 'opt-out']: Nit: Why do the check even if it's not present? @@ +171,5 @@ > + > + @property > + def keyed(self): > + """Return true if the scalar is keyed""" > + return self._definition.get('keyed', False) Not yet. @@ +246,5 @@ > + :param filenames: the YAML files containing the scalars definition. It will only > + load the first file. > + """ > + if len(filenames) > 1: > + raise Exception('We don\'t support loading from more than one file.') Why do we accept a list then? Let's just accept a single file name.
Attachment #8760187 - Flags: review?(gfritzsche)
Attached patch bug1276190.patch (obsolete) — Splinter Review
Thanks for the review, Georg.
Attachment #8760187 - Attachment is obsolete: true
Attachment #8761096 - Flags: review?(gfritzsche)
Attached file Sample TelemetryScalarData.h (obsolete) —
Attachment #8760185 - Attachment is obsolete: true
Attached file Sample TelemetryScalarEnums.h (obsolete) —
Attachment #8760186 - Attachment is obsolete: true
Attachment #8761096 - Flags: review?(gfritzsche)
Attachment #8761097 - Attachment is obsolete: true
Attachment #8761098 - Attachment is obsolete: true
Attached patch bug1276190.patch (obsolete) — Splinter Review
Changed "is" (wrong) to "==" when checking the policy.
Attachment #8761096 - Attachment is obsolete: true
Attachment #8761150 - Flags: review?(gfritzsche)
Comment on attachment 8761150 [details] [diff] [review] bug1276190.patch Review of attachment 8761150 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/Scalars.yaml @@ +1,2 @@ > +# This file contains a definition of the scalar probes that are available for use > +# within the client. "This file contains a definition of the scalar probes that are recorded in Telemetry. They are submitted with the "main" pings and can be inspected in about:telemetry." @@ +1,4 @@ > +# This file contains a definition of the scalar probes that are available for use > +# within the client. > + > +# The following section is for probes testing the Telemetry system. + "They will not be submitted in pings and are only used for testing." @@ +26,5 @@ > + expired: > + bug_numbers: > + - 1276190 > + description: This is an expired testing scalar; not meant to be touched. > + expires: 4.0a1 As mentioned, you also need to add an unexpired probe with a specific "expires" version (just put it far into the future). ::: toolkit/components/telemetry/gen-scalar-data.py @@ +67,5 @@ > + if len(filenames) > 1: > + raise Exception('We don\'t support loading from more than one file.') > + scalars = parse_scalars.load_scalars(filenames[0]) > + > + file_header = """\ For consistence, shouldn't we have `banner` next to these? Either move `banner` here or these up. @@ +81,5 @@ > + """ > + > + # Write the scalar data file. > + print(banner, file=output) > + print(dedent(file_header), file=output) Why do we need this and import `dedent` etc. when you just can directly make `file_header` contain what we want to print? ::: toolkit/components/telemetry/gen-scalar-enum.py @@ +21,5 @@ > + if len(filenames) > 1: > + raise Exception('We don\'t support loading from more than one file.') > + scalars = parse_scalars.load_scalars(filenames[0]) > + > + file_header = """\ Ditto on `dedent` and moving `banner`. ::: toolkit/components/telemetry/parse_scalars.py @@ +42,5 @@ > + def check_name(name, error_msg_prefix, allowed_char_regexp): > + # Check if we only have the allowed characters. > + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$' > + if not re.search(chars_regxp, name): > + raise TypeError(error_msg_prefix + ' name must be alpha-numeric. Got: \'{}\''.format(name)) Why not use "" and avoid escaping? Ditto below. @@ +51,5 @@ > + ' name must not have a leading/trailing digit, a dot or underscore. Got: \'{}\''\ > + .format(name)) > + > + check_name(group_name, "Group", r'\.') > + check_name(probe_name, "Probe", r'_') Nit: Randomly mixed "" & ''. @@ +114,5 @@ > + # Check the type of the list content. > + broken_types = ['{} must be {}'.format(v, LIST_FIELDS_CONTENT[field].__name__) \ > + for v in definition[field] if not isinstance(v, LIST_FIELDS_CONTENT[field])] > + if len(broken_types) > 0: > + raise TypeError(self._name + ' - ' + field + ': ' + ', '.join(broken_types)) This needs a more descriptive error message. We want to message that field F for probe P must only contain values of type T.
Attachment #8761150 - Flags: review?(gfritzsche)
Attached patch bug1276190.patch (obsolete) — Splinter Review
Attachment #8761150 - Attachment is obsolete: true
Attachment #8761348 - Flags: review?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] from comment #22) > Comment on attachment 8761150 [details] [diff] [review] > bug1276190.patch > @@ +26,5 @@ > > + expired: > > + bug_numbers: > > + - 1276190 > > + description: This is an expired testing scalar; not meant to be touched. > > + expires: 4.0a1 > > As mentioned, you also need to add an unexpired probe with a specific > "expires" version (just put it far into the future). Yeah, sorry, I thought you meant something else. > @@ +81,5 @@ > > + """ > > + > > + # Write the scalar data file. > > + print(banner, file=output) > > + print(dedent(file_header), file=output) > > Why do we need this and import `dedent` etc. when you just can directly make > `file_header` contain what we want to print? I did that to deal with indentation in the multi line strings. Now that I've moved file_* near banner at the top of the file, that's no longer needed. I've removed dedent.
Status: NEW → ASSIGNED
Points: --- → 2
One thing i missed before: We should enforce a maximum string length on group & probe names right away to avoid problems later.
Attachment #8761348 - Flags: review?(gfritzsche)
Attached patch bug1276190.patch (obsolete) — Splinter Review
Attachment #8761348 - Attachment is obsolete: true
Attachment #8761928 - Flags: review?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] from comment #25) > One thing i missed before: > We should enforce a maximum string length on group & probe names right away > to avoid problems later. Just added that. I've enforced a 40char length limit on group and probe names. This will allow us to fit the current longest key paths in the group name (e.g. "environment.system.gfx.adapter1") and all the current probe names (e.g. CACHE_SERVICE_LOCK_WAIT_MAINTHREAD_NSCACHEENTRYDESCRIPTOR_GETPREDICTEDDATASIZE is one of the longest probe names, with the group prefix taking a consistent chunk of its length).
Comment on attachment 8761928 [details] [diff] [review] bug1276190.patch Review of attachment 8761928 [details] [diff] [review]: ----------------------------------------------------------------- This looks good with the below fixed. Lets add proper in-tree documentation here in a separate patch that contains: * what this file is good for * how/when it is called * what it generates * the expected format * restrictions on the individual fields We should flag Benjamin for feedback on that specification and the initial Scalars.yaml from a data collection perspective. ::: toolkit/components/telemetry/parse_scalars.py @@ +42,5 @@ > + def check_name(name, error_msg_prefix, allowed_char_regexp): > + # Check if we only have the allowed characters. > + chars_regxp = r'^[a-zA-Z0-9' + allowed_char_regexp + r']+$' > + if not re.search(chars_regxp, name): > + raise TypeError(error_msg_prefix + " name must be alpha-numeric. Got: '{}'".format(name)) ValueError @@ +46,5 @@ > + raise TypeError(error_msg_prefix + " name must be alpha-numeric. Got: '{}'".format(name)) > + > + # Don't allow leading/trailing digits, '.' or '_'. > + if re.search(r'(^[\d\._])|([\d\._])$', name): > + raise TypeError(error_msg_prefix + ValueError @@ +52,5 @@ > + .format(name)) > + > + # Enforce a maximum length on group and probe names. > + MAX_NAME_LENGTH = 40 > + for n in [group_name, probe_name]: Nit: move this before `def check_name`. @@ +54,5 @@ > + # Enforce a maximum length on group and probe names. > + MAX_NAME_LENGTH = 40 > + for n in [group_name, probe_name]: > + if len(n) > MAX_NAME_LENGTH: > + raise TypeError("'{}' must be limited to {} characters"\ `ValueError`. Lets make this a more helpful message pattern: "Name X exceeds maximum name length of N characters." @@ +116,5 @@ > + list_fields = [f for f in definition if isinstance(definition[f], list)] > + for field in list_fields: > + # Check for empty lists. > + if len(definition[field]) == 0: > + raise TypeError(self._name + ' - must not be empty.') You need to mention the field name as well in the message. @@ +244,5 @@ > + > + for probe_name in group: > + # We found a scalar type. Go ahead and parse it. > + scalar_info = group[probe_name] > + yield ScalarType(group_name, probe_name, scalar_info) I'm not sure why this actually is a generator. I think it would be clearer to just make this function `load_scalars()` and make it build and return a list.
Attachment #8761928 - Flags: review?(gfritzsche) → review+
Attachment #8761928 - Attachment is obsolete: true
Attachment #8762063 - Flags: review+
Attached patch Part 2 - Add the docs (obsolete) — Splinter Review
Comment on attachment 8762099 [details] [diff] [review] Part 2 - Add the docs @Benjamin, would you mind giving a feedback on the docs from a data collection perspective? We've put in place (and documented) a limit on the length of the data in the string scalar: this is currently capped at 50 chars, giving back an error on longer strings. Do you have an opinion on the cap values from the data collection perspective? We could always bump the limit if needed.
Attachment #8762099 - Flags: review?(gfritzsche)
Attachment #8762099 - Flags: feedback?(benjamin)
Comment on attachment 8762099 [details] [diff] [review] Part 2 - Add the docs Review of attachment 8762099 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/docs/index.rst @@ +16,5 @@ > :maxdepth: 2 > > pings > common-ping > + scalars Nit: Lets move this after the environment. ::: toolkit/components/telemetry/docs/scalars.rst @@ +3,5 @@ > +======= > + > +Historically we started to overload our histogram mechanism to also collect scalar data, > +such as flag values, counts, labels and others. > +The scalar measurement types are the suggested way to collect that kind of scalar data. Add: "The serialized scalar data is submitted with the :doc:`main pings <main-ping>`." @@ +27,5 @@ > + probe: > + kind: int > + ... > + > +Groups and probes names need to follow a few rules: Nit: "Group and probe names". @@ +31,5 @@ > +Groups and probes names need to follow a few rules: > + > +- they cannot exceed 40 characters each; > +- group names must be alpha-numeric + '.', with no leading/trailing digit or '.'; > +- probe names must be alpha-numeric + '_', with no leading/trailing digit or '_' Lets make this ``.`` and ``_``. @@ +35,5 @@ > +- probe names must be alpha-numeric + '_', with no leading/trailing digit or '_' > + > +A probe can be defined as follows:: > + > + a_scalar: It might be confusing to not have the group name here. @@ +49,5 @@ > +--------------- > + > +- ``bug_numbers``: A list of unsigned integers representing the number of the bugs the probe was introduced in. > +- ``description``: A single or multi-line string describing the purpose of the probe. > +- ``expires``: The version number in which the scalar expires, e.g. "30"; a version number of type "N" and "N.0" is automatically converted to "N.0a1" in order to expire the scalar also in the development channels. A telemetry probe acting on an expired scalar will return an error. For scalars that never expire the value "never" can be used. ``never``? @@ +51,5 @@ > +- ``bug_numbers``: A list of unsigned integers representing the number of the bugs the probe was introduced in. > +- ``description``: A single or multi-line string describing the purpose of the probe. > +- ``expires``: The version number in which the scalar expires, e.g. "30"; a version number of type "N" and "N.0" is automatically converted to "N.0a1" in order to expire the scalar also in the development channels. A telemetry probe acting on an expired scalar will return an error. For scalars that never expire the value "never" can be used. > +- ``kind``: A string representing the scalar type. Allowed values are ``uint`` and ``string``. > +- ``notification_emails``: A list of email addresses to notify in case of regressions. "... in case of regressions or other issues." @@ +56,5 @@ > + > +Optional Fields > +--------------- > + > +- ``cpp_guard``: A string that gets inserted as an #ifdef directive around the automatically generated C++ declaration. This is typically used for platform-specific scalars, e.g.``ANDROID``. ``#ifdef``. Missing space before ``ANDROID``. @@ +58,5 @@ > +--------------- > + > +- ``cpp_guard``: A string that gets inserted as an #ifdef directive around the automatically generated C++ declaration. This is typically used for platform-specific scalars, e.g.``ANDROID``. > +- ``release_channel_collection``: This can be either ``opt-in`` (default) or ``opt-out``. With the former the scalar is submitted by default on pre-release channels; on the release channel only > +if the user opted into additional data collection. With the latter the scalar is submitted by default on release and pre-release channels, unless the user opted out. You need to remove the line break, the generated docs have this outside the list.
Attachment #8762099 - Flags: review?(gfritzsche) → review+
Comment on attachment 8762099 [details] [diff] [review] Part 2 - Add the docs >diff --git a/toolkit/components/telemetry/docs/scalars.rst b/toolkit/components/telemetry/docs/scalars.rst >+Required Fields >+--------------- >+ >+- ``bug_numbers``: A list of unsigned integers representing the number of the bugs the probe was introduced in. >+- ``description``: A single or multi-line string describing the purpose of the probe. The description should be primarily *what* data is collected and *when* the data is collected. Purpose/why is not necessary. >+- ``expires``: The version number in which the scalar expires, e.g. "30"; a version number of type "N" and "N.0" is automatically converted to "N.0a1" in order to expire the scalar also in the development channels. A telemetry probe acting on an expired scalar will return an error. For scalars that never expire the value "never" can be used. What does "a telemetry probe acting on an expired scalar will return an error" mean in practice? I hope that it doesn't actually throw JS exceptions: that's kinda what we don't want, so that we can expire metrics without requiring teams to remove the code that collected them. >+- ``notification_emails``: A list of email addresses to notify in case of regressions. "in case of regressions" is not correct yet. Alerts about expiring probes will be sent here. More importantly though, this is the email address that I (as data steward) will use to verify that probes are still useful.
Attachment #8762099 - Flags: feedback?(benjamin) → feedback-
Attachment #8762099 - Attachment is obsolete: true
Attachment #8764269 - Flags: review+
Comment on attachment 8764269 [details] [diff] [review] Part 2 - Add the docs Thanks for the feedback, Benjamin. Do you have any opinion on the length limit (50 chars) we enforce on the content for the scalar string type?
Attachment #8764269 - Flags: feedback?(benjamin)
Not really. From a data-steward perspective it's not critical to have any limit, but it's nice defense-in-depth and I'm sure you want it just to limit ping size.
Attachment #8764269 - Flags: feedback?(benjamin) → feedback+
As discussed over IRC, we can unfortunately have other, hidden services pulling histogram_tools.py out of mc. We don't want to break stuff while landing the scalar. As a short term workaround, I've restored histogram_tools.py to its original state, so it doesn't use shared_telemetry_utils.py. As a medium/longer term project and to discuss the issue, I've filed bug 1282098.
Attachment #8762063 - Attachment is obsolete: true
Attachment #8764958 - Flags: review+
Blocks: 1282098
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
No longer depends on: 1278556
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: