1401612 - Telemetry histogram validator should provide more than 1 error/warning/whatever at a time

Reporter

Description

•

7 years ago

STR:

1. try to add histogram
2. compile
3. error from the validator
4. fix error
5. compile
6. different error from the validator

(repeat steps 4-6 until you appease the validator)

Expected:

3. all errors from the validator
4. fix all the errors
5. compile
6. happy validator!


This is pretty annoying when trying to add a new histogram. We've added a lot of required properties, which then have whitelists for existing histograms, so just copy-pasting and modifying an existing histogram isn't really a guarantee to avoid validator issues.

Please can we make the compile/edit/recompile cycle less painful.

Chris H-C :chutten

Comment 1

•

7 years ago

There are a variety of things we could do to make this less painful:

1) We could refactor the errors into "immediately-fatal" and "eventually-fatal" and let it run through all of the "eventually-fatal" failures (missing fields, wrong-typed fields, etc) before kicking out of the process.
2) We could fix the build rules so `./mach build faster toolkit/components/telemetry/` would run the validator quickly and effectively after just changing one of the data definition files (Histograms.json, Scalars.yaml, Events.yaml)
3) We could introduce a validate script so that developers could run just that code from the command line like `python toolkit/components/telemetry/validate_all.py`

Mentor: chutten

status-firefox57: affected → wontfix

Priority: -- → P3

Whiteboard: [good second bug][lang=python]

Alessio Placitelli [:Dexter]

Comment 2

•

7 years ago

(In reply to Chris H-C :chutten from comment #1)
> 2) We could fix the build rules so `./mach build faster
> toolkit/components/telemetry/` would run the validator quickly and
> effectively after just changing one of the data definition files
> (Histograms.json, Scalars.yaml, Events.yaml)
> 3) We could introduce a validate script so that developers could run just
> that code from the command line like `python
> toolkit/components/telemetry/validate_all.py`

Both of these are already solved by running

> ./mach build toolkit/components/telemetry

The parsers for all the .json/.yaml files are the first things that get called before any real compilation happens. However, printing all the errors (1) seems like a sensible thing to do.

(In reply to :Gijs from comment #0)
> This is pretty annoying when trying to add a new histogram. We've added a
> lot of required properties, which then have whitelists for existing
> histograms, so just copy-pasting and modifying an existing histogram isn't
> really a guarantee to avoid validator issues.
> 
> Please can we make the compile/edit/recompile cycle less painful.

FWIW, there's a sample histogram with all the required properties in the documentation: https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html#declaring-a-histogram

Aditya Bharti [:adbugger]

Assignee

Comment 3

•

7 years ago

Hi, I would like to work on this bug.

I've done some work regarding telemtry before but I'm fairly new and some resources regarding histograms (apart from https://firefox-source-docs.mozilla.org/toolkit/components/telemetry/telemetry/collection/histograms.html) and telemetry in general would be much appreciated.

Apart from that, where do the compiler files reside? If it's ok to work on this, I'll start by looking there.

Alessio Placitelli [:Dexter]

Comment 4

•

7 years ago

(In reply to Aditya Bharti [:adbugger] from comment #3)
> Hi, I would like to work on this bug.
 
Hi Aditya! I've assigned the bug to you, thank you for your interest in it!
 
> Apart from that, where do the compiler files reside? If it's ok to work on
> this, I'll start by looking there.

We have 3 different Python parser, one for each of our data primitives: scalars, events and histograms.

The one for the scalars lives at [1], the other 2 live in the same directory and are called parse_histograms.py and parse_events.py.

The istructions for fixing this bug can be found in comment 1

[1] - https://searchfox.org/mozilla-central/source/toolkit/components/telemetry/parse_scalars.py

Assignee: nobody → adibhar97

Aditya Bharti [:adbugger]

Assignee

Comment 5

•

7 years ago

So, I was just thinking about how to go about this fix. What if, instead of raising the 'eventually fatal' errors, I keep appending the ParserError objects into an array. When an 'immediately fatal' error is hit, I print the error messages for all errors in array and then raise the final 'immediately fatal' error? 

Since this will require modifying almost all 'raise ParserError' calls, is it a good idea to delegate this responsibilty to the ParserError class itself? (The error array can be a class attribute.) That way, it will be applied to parse_histograms.py, parse_events.py and parse_scalars.py all in one go.

Does this seem like a sensible way to proceed?

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 6

•

7 years ago

Attached patch Initial method towards fix for 1401612 (obsolete) — Details — Splinter Review

Here's an initial version of what I have in mind. Added a wrong histogram in Histograms.json and changed the ParserError class. For now, I've only changed two 'raise's in parse_histograms.py check_name() function to illustrate what happens.

Here is the output of ./mach build toolkit/components/telemetry:

 0:00.15 /usr/bin/make -C /home/user/mozilla-central/obj-x86_64-pc-linux-gnu -j4 -s backend
 0:00.22 /usr/bin/make -C toolkit/components/telemetry -j4 -s
 0:00.30 TelemetryHistogramData.inc
 0:00.30 TelemetryHistogramEnums.h
 0:00.30 TelemetryScalarData.h
 0:00.30 TelemetryScalarEnums.h
 0:00.52 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:00.52 
 0:00.52 Error processing histograms:
 0:00.52 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.52 
 0:00.54 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:00.54 
 0:00.54 Error processing histograms:
 0:00.54 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.54 
 0:00.54 backend.mk:30: recipe for target 'TelemetryHistogramEnums.h' failed
 0:00.54 make[1]: *** [TelemetryHistogramEnums.h] Error 1
 0:00.54 make[1]: *** Deleting file 'TelemetryHistogramEnums.h'
 0:00.56 make[1]: *** Waiting for unfinished jobs....
 0:00.56 backend.mk:23: recipe for target 'TelemetryHistogramData.inc' failed
 0:00.56 make[1]: *** [TelemetryHistogramData.inc] Error 1
 0:00.56 make[1]: *** Deleting file 'TelemetryHistogramData.inc'
 0:00.72 /home/user/mozilla-central/config/rules.mk:432: recipe for target 'default' failed
 0:00.72 make: *** [default] Error 2

Both errors are printed to console. I'm not sure why all errors are printed twice though; that was the behaviour even before my changes.

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 7

•

7 years ago

Sorry, accidentally canceled the needinfo back there.

Flags: needinfo?(chutten)

Chris H-C :chutten

Comment 8

•

7 years ago

I wonder if there's code that is printing the error and then a default exception handler that is also printing the error when it gets it.

This seems like a reasonable approach. I'm interested to see what a full conversion might look like, and how it might behave.

Chris H-C :chutten

Updated

•

7 years ago

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 9

•

7 years ago

IF there a default exception handler, then shouldn't only one error get duplicated? Only one 'immediately fatal' error is raised, the others just get their messages printed. I'm not sure what's happening here but I'll keep investigating.

As of now, a full conversion will require removing all 'raise' statements from the parsers and replacing them with ParserError().handle_now/later(). The 'raise' statements must be removed because they will only raise the error they are given and not print the messages of the previous 'eventually fatal' errors in array. 

To minimize changes to code, I thought about overloading the 'raise' keyword for the ParserError class so that the eventual errors are also printed but it appears that 'raise' is built into the python interpreter itself and is not a class method. So I guess modifying all the raise statements in the parser files is the only way. Given the changes involved, do you want me to proceed in this direction?

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 10

•

7 years ago

Alright, so I think I know why each error is printed twice.

I changed the print statement in ParserError().handle_now() to: print(self.error + "handling now"). The output of ./mach build toolkit/components/telemetry is:
 0:00.51 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.handling now
 0:00.51 
 0:00.51 Error processing histograms:
 0:00.51 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.52 
 0:00.52 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.handling now
 0:00.52 
 0:00.52 Error processing histograms:
 0:00.52 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.53 
The same print function is being called twice; there is no other default exception handler.

Looking at these lines in the output:
 0:00.54 backend.mk:23: recipe for target 'TelemetryHistogramData.inc' failed    <--
 0:00.54 make[1]: *** [TelemetryHistogramData.inc] Error 1
 0:00.54 make[1]: *** Deleting file 'TelemetryHistogramData.inc'
 0:00.54 make[1]: *** Waiting for unfinished jobs....
 0:00.54 backend.mk:30: recipe for target 'TelemetryHistogramEnums.h' failed     <--
 0:00.54 make[1]: *** [TelemetryHistogramEnums.h] Error 1
 0:00.54 make[1]: *** Deleting file 'TelemetryHistogramEnums.h'

It appears there is a backend.mk makefile that is calling the parsers twice in the build process. Since the build uses 4 parallel jobs (make -j4),
 0:00.14 /usr/bin/make -C /home/user/mozilla-central/obj-x86_64-pc-linux-gnu -j4 -s backend
 0:00.22 /usr/bin/make -C toolkit/components/telemetry -j4 -s
the error on line 23 (in backend.mk) does not stop the build completely, and it waits for unfinished jobs, causing the parser to be called again on line 30 and printing the errors again.

On doing a `find . -type f -name backend.mk mozilla-central`, I found many backend.mk files generated during the build process in the obj-linux directory. I think the file of interest is at ./obj-x86_64-pc-linux-gnu/toolkit/components/telemetry/backend.mk . It definitely has the following lines:

export:: TelemetryHistogramData.inc
GARBAGE += TelemetryHistogramData.inc
EXTRA_MDDEPEND_FILES += TelemetryHistogramData.inc.pp
TelemetryHistogramData.inc: /home/user/mozilla-central/toolkit/components/telemetry/gen_histogram_data.py $(srcdir)/Histograms.json $(topsrcdir)/dom/base/UseCounters.conf $(topsrcdir)/dom/base/nsDeprecatedOperationList.h
	$(REPORT_BUILD)     [line 23]
	$(call py_action,file_generate,/home/user/mozilla-central/toolkit/components/telemetry/gen_histogram_data.py main TelemetryHistogramData.inc $(MDDEPDIR)/TelemetryHistogramData.inc.pp $(srcdir)/Histograms.json $(topsrcdir)/dom/base/UseCounters.conf $(topsrcdir)/dom/base/nsDeprecatedOperationList.h)

export:: TelemetryHistogramEnums.h
GARBAGE += TelemetryHistogramEnums.h
EXTRA_MDDEPEND_FILES += TelemetryHistogramEnums.h.pp
TelemetryHistogramEnums.h: /home/user/mozilla-central/toolkit/components/telemetry/gen_histogram_enum.py $(srcdir)/Histograms.json $(topsrcdir)/dom/base/UseCounters.conf $(topsrcdir)/dom/base/nsDeprecatedOperationList.h
	$(REPORT_BUILD)     [line 30]
	$(call py_action,file_generate,/home/user/mozilla-central/toolkit/components/telemetry/gen_histogram_enum.py main TelemetryHistogramEnums.h $(MDDEPDIR)/TelemetryHistogramEnums.h.pp $(srcdir)/Histograms.json $(topsrcdir)/dom/base/UseCounters.conf $(topsrcdir)/dom/base/nsDeprecatedOperationList.h)

So, yeah, I guess a build process using parallel jobs is the source of the errors being printed twice. Since that is not a bug, I guess the only question I have now is whether or not to proceed modifying the parse calls as per comment #9 . Not canceling needinfo.

Chris H-C :chutten

Comment 11

•

7 years ago

Excellent deduction of the cause of the duplicate error messages! It highlights that we are parsing Histograms.json twice for each build (once for .inc, the other for .h), which might result in an interesting direction for future build improvements.

Please go ahead with your store-then-throw approach. I look forward to seeing the full solution.

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 12

•

7 years ago

Attached patch Example of partially functioning fix: extended ParserError class (obsolete) — Details — Splinter Review

I have removed all raise ParserError statements from parse_histograms.py for now. However, some raise ValueError statements still exist which cannot be handled by the ParserError class.

Extended implementation of ParserError class. The old version would not have raised an error or printed any 'eventually fatal' messages unless there was an 'immediately fatal' error.

The problem I'm facing now is the case of 'eventually fatal' errors without any 'immediate' ones. The use of a sys.exitfunc in the parser file checks for any remaining 'eventually fatal' errors and prints them to console. However, the build does not stop and continues:

0:00.14 /usr/bin/make -C /home/user/mozilla-central/obj-x86_64-pc-linux-gnu -j4 -s backend
0:00.22 /usr/bin/make -C toolkit/components/telemetry -j4 -s
0:00.30 TelemetryEventData.h
0:00.30 TelemetryEventEnums.h
0:00.30 TelemetryHistogramData.inc
0:00.30 TelemetryHistogramEnums.h
0:00.30 TelemetryProcessData.h
0:00.49 TelemetryProcessEnums.h
0:00.53 TelemetryScalarData.h
0:00.66 TelemetryScalarEnums.h
0:00.70 Error in sys.exitfunc:
0:00.70 Error for histogram name "LABELS_WRONG_HISTOGRAM": can not start with "LABELS_".
0:00.70 Traceback (most recent call last):
0:00.70 File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 44, in handle_now
0:00.70 raise self
0:00.70 shared_telemetry_utils.ParserError: Checking for residual errors.
0:01.14 Error in sys.exitfunc:
0:01.14 Error for histogram name "LABELS_WRONG_HISTOGRAM": can not start with "LABELS_".
0:01.14 Traceback (most recent call last):
0:01.14 File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 44, in handle_now
0:01.14 raise self
0:01.14 shared_telemetry_utils.ParserError: Checking for residual errors.
0:01.27 Elapsed: 0.05s; From ../../dist/idl: Kept 1012 existing; Added/updated 0; Removed 0 files and 0 directories.
0:13.81 libtoolkit_components_telemetry.a.desc
0:16.69 libtelemetrytest.a.desc
0:17.15 /usr/bin/make -C browser/app -j4 -s
0:17.42 /usr/bin/make -C toolkit/components/build -j4 -s
0:17.70 /usr/bin/make -C toolkit/library -j4 -s
0:17.75 libxul_s.a.desc
0:17.75 libxul.so

There is an 'eventually fatal' ParserError being being printed, and a generic 'checking for residual errors' ParserError being thrown but it is not stopping the build for some reason. There are two options at this point:
1) Figure out how to make the build stop in this case when only 'eventually fatal' errors occur. In this case, I will take some time to study the build process in detail.
2) Leave it to the responsibility of the programmer to read and handle the case when only 'eventually fatal' errors occur; and continue with the build. Continuing with a possibly time-consuming build seems pointless when errors are there.

In any case, a careful segregation of errors into 'eventually' and 'immediately' fatal is required, regardless of the final implementation. For now, I need some advice on how to proceed.

I tried an approach where the class keeps a track of the number of ParserError objects instantiated and performs a check when the '__del__' function is called on the error objects. However, the __del__ function is not always called when the interpreter exits. Hence I resorted to the current sys.exitfunc approach.

I will need some help handling this case of only 'eventually fatal' errors.

Note that when at least one 'immediately fatal' error occurs, everything works smoothly; the 'eventual' errors are printed and the 'immediate' error is thrown, stopping the build.

Flags: needinfo?(chutten)

Chris H-C :chutten

Comment 13

•

7 years ago

Comment on attachment 8934497 [details] [diff] [review]
Example of partially functioning fix: extended ParserError class

Review of attachment 8934497 [details] [diff] [review]:
-----------------------------------------------------------------

Formatting and cleanup aside, this looks very promising. I think we should prefer handle_later in all but the worst cases to attempt to print as many errors as possible at once. Obviously, any error that results in a state that is unrecoverable must be handled immediately, but I think those cases should be few and far between.

Instead of sys.exitfunc (deprecated in Python 2.4) have you considered the atexit module? If you wanted, it looks as though you could use that to store the ParserErrors as atexit handlers instead of storing them in the type.

::: toolkit/components/telemetry/parse_histograms.py
@@ +11,4 @@
>  import sys
>  import shared_telemetry_utils as utils
>  
> +from shared_telemetry_utils import ParserError; sys.exitfunc = ParserError('Checking for residual errors.').handle_now

Maybe the "raise self" doesn't result in a non-zero exit value for the script?

@@ +288,5 @@
>          if self._strict_type_checks:
>              pattern = '^[a-z][a-z0-9_]+[a-z0-9]$'
>              if not re.match(pattern, name, re.IGNORECASE):
> +                ParserError('Error for histogram name "%s": name does not conform to "%s"' %
> +                                  (name, pattern)).handle_now()

This error should be fine to handle later, I think

@@ +310,2 @@
>                                 '\nSee: {}#expires-in-version')
> +                              .format(name, field, expiration, HISTOGRAMS_DOC_URL)).handle_now()

Can probably be handle_later

@@ +321,5 @@
>  
>          invalid = filter(lambda l: len(l) > MAX_LABEL_LENGTH, labels)
>          if len(invalid) > 0:
> +            ParserError('Label values for "%s" exceed length limit of %d: %s' %
> +                              (name, MAX_LABEL_LENGTH, ', '.join(invalid))).handle_now()

Can probably be handle_later

@@ +326,4 @@
>  
>          if len(labels) > MAX_LABEL_COUNT:
> +            ParserError('Label count for "%s" exceeds limit of %d' %
> +                              (name, MAX_LABEL_COUNT)).handle_now()

Can probably be handle_later

@@ +332,5 @@
>          # the label values to a strict pattern.
>          invalid = filter(lambda l: not re.match(CPP_IDENTIFIER_PATTERN, l, re.IGNORECASE), labels)
>          if len(invalid) > 0:
> +            ParserError('Label values for %s are not matching pattern "%s": %s' %
> +                              (name, CPP_IDENTIFIER_PATTERN, ', '.join(invalid))).handle_now()

Can probably be handle_later

@@ +345,5 @@
>          DOC_URL = HISTOGRAMS_DOC_URL + "#record-in-processes"
>  
>          if not rip:
> +            ParserError('Histogram "%s" must have a "%s" field:\n%s'
> +                              % (name, field, DOC_URL)).handle_now()

Should probably be handle_later, then we can get multiple "missing field" errors.

@@ +350,5 @@
>  
>          for process in rip:
>              if not utils.is_valid_process_name(process):
> +                ParserError('Histogram "%s" has unknown process "%s" in %s.\n%s' %
> +                                  (name, process, field, DOC_URL)).handle_now()

Can be handle_later

@@ +417,4 @@
>  
>          for field in ['alert_emails', 'bug_numbers']:
>              if field not in definition and name not in whitelists[field]:
> +                ParserError('New histogram "%s" must have a "%s" field.' % (name, field)).handle_now()

This can be handle_later, no?

@@ +421,4 @@
>              if field in definition and name in whitelists[field]:
>                  msg = 'Histogram "%s" should be removed from the whitelist for "%s" in ' \
>                        'histogram-whitelists.json.'
> +                ParserError(msg % (name, field)).handle_now()

We want this to be handle_later so it can tell us all the whitelists it can be removed from all at once.

@@ +494,5 @@
>          if not self._strict_type_checks:
>              return
>          for key in definition.iterkeys():
>              if key not in allowed_keys:
> +                ParserError('Key "%s" is not allowed for histogram "%s".' % (key, name)).handle_now()

handle_later will allow us to enumerate all the invalid keys at once

@@ +509,5 @@
>                      ' amounts of resources. Contact a Telemetry peer (e.g. in #telemetry)'
>                      ' if you think an exception ought to be made:\n'
>                      'https://wiki.mozilla.org/Modules/Toolkit#Telemetry'
>                      % self._name
> +                    ).handle_now()

Can probably be handle_later

@@ +569,5 @@
>  
>          value = definition.get('releaseChannelCollection', 'opt-in')
>          if value not in datasets:
> +            ParserError('Unknown value for releaseChannelCollection'
> +                              ' policy for histogram "%s".' % self._name).handle_now()

handle_later

@@ +581,4 @@
>      d = collections.OrderedDict()
>      for key, value in ordered_pairs:
>          if strict_type_checks and key in d:
> +            ParserError("Found duplicate key in Histograms file: %s" % key).handle_now()

handle_later

@@ +666,4 @@
>  
>          for (name, definition) in histograms.iteritems():
>              if name in all_histograms:
> +                ParserError('Duplicate histogram name "%s".' % name).handle_now()

Can probably be handle_later

Aditya Bharti [:adbugger]

Assignee

Comment 14

•

7 years ago

Attached patch Used atexit. Same problem as before. Also extended ParserError. (obsolete) — Details — Splinter Review

(In reply to Chris H-C :chutten from comment #13)

> Instead of sys.exitfunc (deprecated in Python 2.4) have you considered the
> atexit module? If you wanted, it looks as though you could use that to store
> the ParserErrors as atexit handlers instead of storing them in the type.
I used sys.exitfunc because I thought importing an extra module could be avoided. Did not know about the deprecation though. Have used atexit.register in this version of the patch. Except for the strict_type_check error in check_name function (for demonstration, output below), all other errors have been converted to handle_later.

> ::: toolkit/components/telemetry/parse_histograms.py
> @@ +11,4 @@
> >  import sys
> >  import shared_telemetry_utils as utils
> >  
> > +from shared_telemetry_utils import ParserError; sys.exitfunc = ParserError('Checking for residual errors.').handle_now
> 
> Maybe the "raise self" doesn't result in a non-zero exit value for the
> script?
When calling ParserError.handle_now() directly, the "raise self" raises the error and correctly stops the build. Output below:
 0:00.14 /usr/bin/make -C /home/user/mozilla-central/obj-x86_64-pc-linux-gnu -j4 -s backend
 0:00.22 /usr/bin/make -C toolkit/components/telemetry -j4 -s
 0:00.30 TelemetryHistogramData.inc
 0:00.30 TelemetryHistogramEnums.h
 0:00.46 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:00.46 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.46 
 0:00.46 Error processing histograms:
 0:00.46 Error for histogram name "LABELS_WRONG#HISTOGRAM": name does not conform to "^[a-z][a-z0-9_]+[a-z0-9]$"
 0:00.46 
 0:00.48 backend.mk:23: recipe for target 'TelemetryHistogramData.inc' failed
 0:00.48 make[1]: *** [TelemetryHistogramData.inc] Error 1
 0:00.48 make[1]: *** Deleting file 'TelemetryHistogramData.inc'
 0:00.48 make[1]: *** Waiting for unfinished jobs....
 0:00.48 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:00.48 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.48 
 0:00.48 Error processing histograms:
 0:00.48 Error for histogram name "LABELS_WRONG#HISTOGRAM": name does not conform to "^[a-z][a-z0-9_]+[a-z0-9]$"
 0:00.48 
 0:00.50 backend.mk:30: recipe for target 'TelemetryHistogramEnums.h' failed
 0:00.50 make[1]: *** [TelemetryHistogramEnums.h] Error 1
 0:00.50 make[1]: *** Deleting file 'TelemetryHistogramEnums.h'
 0:00.50 /home/user/mozilla-central/config/rules.mk:432: recipe for target 'default' failed
 0:00.50 make: *** [default] Error 2
All errors are correctly printed and the last 'immediately fatal' error is raised, stopping the build, so I don't think the "raise self" statement is the problem.

> @@ +288,5 @@
> >          if self._strict_type_checks:
> >              pattern = '^[a-z][a-z0-9_]+[a-z0-9]$'
> >              if not re.match(pattern, name, re.IGNORECASE):
> > +                ParserError('Error for histogram name "%s": name does not conform to "%s"' %
> > +                                  (name, pattern)).handle_now()
> 
> This error should be fine to handle later, I think
This error is handled now just for above output. Will change in final fix. Also, why define pattern again, when the global CPP_IDENTIFIER_PATTERN is the same?

> Instead of sys.exitfunc (deprecated in Python 2.4) have you considered the
> atexit module? If you wanted, it looks as though you could use that to store
> the ParserErrors as atexit handlers instead of storing them in the type.
The atexit.register provides the same functionality as sys.exitfunc and runs sys.exitfunc in the background. I tried registering a new atexit handler for each 'handle_later' error. The same problem persists, the errors are printed but the build doesn't stop, even if an error is raised. It appears the return value of any exit handler is ignored. So for now, added a classmethod, ParserError.exit_func, and registered as exit_handler in parse_histograms.py to handle all 'eventually fatal' errors upon exit. The output now is more readable than multiple exit handlers but the same problem persists; all the errors are still printed but the build doesn't stop:

 0:00.17 /usr/bin/make -C /home/user/mozilla-central/obj-x86_64-pc-linux-gnu -j4 -s backend
 0:00.24 /usr/bin/make -C toolkit/components/telemetry -j4 -s
 0:00.32 TelemetryEventData.h
 0:00.32 TelemetryEventEnums.h
 0:00.33 TelemetryHistogramData.inc
 0:00.33 TelemetryHistogramEnums.h
 0:00.33 TelemetryProcessData.h
 0:00.53 TelemetryProcessEnums.h
 0:00.55 TelemetryScalarData.h
 0:00.70 TelemetryScalarEnums.h
 0:00.72 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:00.72 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.72 Error for histogram name "LABELS_WRONG#HISTOGRAM": name does not conform to "^[a-z][a-z0-9_]+[a-z0-9]$"   <-- changed to handle later for this output
 0:00.72 Key "what_should_happen" is not allowed for histogram "LABELS_WRONG#HISTOGRAM".
 0:00.72 Error in atexit._run_exitfuncs:
 0:00.72 Traceback (most recent call last):
 0:00.72   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
 0:00.72     func(*targs, **kargs)
 0:00.72   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 54, in exit_func
 0:00.72     cls("Some errors occurred").handle_now()
 0:00.73   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 44, in handle_now
 0:00.73     raise self
 0:00.73 ParserError: Some errors occurred
 0:00.73 Error in sys.exitfunc:
 0:00.73 Traceback (most recent call last):
 0:00.73   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
 0:00.73     func(*targs, **kargs)
 0:00.73   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 54, in exit_func
 0:00.73     cls("Some errors occurred").handle_now()
 0:00.73   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 44, in handle_now
 0:00.73     raise self
 0:00.73 shared_telemetry_utils.ParserError: Some errors occurred
 0:01.15 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:01.15 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:01.15 Error for histogram name "LABELS_WRONG#HISTOGRAM": name does not conform to "^[a-z][a-z0-9_]+[a-z0-9]$"
 0:01.15 Key "what_should_happen" is not allowed for histogram "LABELS_WRONG#HISTOGRAM".
 0:01.15 Error in atexit._run_exitfuncs:
 0:01.15 Traceback (most recent call last):
 0:01.15   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
 0:01.15     func(*targs, **kargs)
 0:01.15   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 54, in exit_func
 0:01.15     cls("Some errors occurred").handle_now()
 0:01.15   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 44, in handle_now
 0:01.15     raise self
 0:01.15 ParserError: Some errors occurred
 0:01.15 Error in sys.exitfunc:
 0:01.15 Traceback (most recent call last):
 0:01.15   File "/usr/lib/python2.7/atexit.py", line 24, in _run_exitfuncs
 0:01.15     func(*targs, **kargs)
 0:01.15   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 54, in exit_func
 0:01.15     cls("Some errors occurred").handle_now()
 0:01.15   File "/home/user/mozilla-central/toolkit/components/telemetry/shared_telemetry_utils.py", line 44, in handle_now
 0:01.15     raise self
 0:01.15 shared_telemetry_utils.ParserError: Some errors occurred
 0:01.28 Elapsed: 0.05s; From ../../dist/idl: Kept 1012 existing; Added/updated 0; Removed 0 files and 0 directories.
 0:01.63 In file included from /home/user/mozilla-central/toolkit/components/telemetry/Telemetry.h:16:0,
 0:01.63                  from /home/user/mozilla-central/toolkit/components/telemetry/KeyedStackCapturer.h:9,
 0:01.63                  from /home/user/mozilla-central/toolkit/components/telemetry/KeyedStackCapturer.cpp:7:
 0:01.63 /home/user/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/TelemetryHistogramEnums.h:13:15: error: stray ‘#’ in program
 0:01.63    LABELS_WRONG#HISTOGRAM,
 0:01.63                ^
 0:01.83 In file included from /home/user/mozilla-central/toolkit/components/telemetry/Telemetry.h:16:0,
 0:01.83                  from /home/user/mozilla-central/toolkit/components/telemetry/Telemetry.cpp:51:
 0:01.83 /home/user/mozilla-central/obj-x86_64-pc-linux-gnu/dist/include/mozilla/TelemetryHistogramEnums.h:13:15: error: stray ‘#’ in program
 0:01.83    LABELS_WRONG#HISTOGRAM,
 0:01.83                ^ 
 
As per this, the atexit function runs and raises and error using the "raise self" statement which previously stopped the build. This time, the error when raised by the exitfunc (sys.exitfunc or atexit.register), it is ignored and the build continues, resulting in errors further on. Any other ideas on how to proceed? In the meantime, where can I find resources to understand the build process?

NOTES:
-----
1. Imported sys module in parser. All error messages are printed to sys.stderr now, instead of the normal stdout. Not directly related to this bug, but seemed appropriate because otherwise some errors would go to stdout and some errors (e.g. the final 'immediately fatal' one which is raised) would go to stderr.

2. In the 'strict_type_check' condition of the check_name function in parse_histograms.py, the pattern variable is defined again even though it is equal to the global CPP_IDENTIFIER_PATTERN. Not related to this bug so not touching, but seems like something that can be changed though.

Attachment #8933486 - Attachment is obsolete: true

Attachment #8934497 - Attachment is obsolete: true

Chris H-C :chutten

Comment 15

•

7 years ago

You are likely correct about the pattern/CPP_IDENTIFIER_PATTERN duplication. It was likely missed in a refactor some time ago. Please file a bug in Toolkit::Telemetry so that we don't forget about it.

You've been very patient with the oddities in our build. Thank you for that. I'm not sure why raising errors at exit isn't returning a non-0 exit code (which is what the build process uses to detect failed steps).

So let's go straight to the source, then. Instead of raise self, print the error being handled now and then sys.exit(1). I've tried this locally and it brings the build to a satisfying halt.

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 16

•

7 years ago

Attached patch Used sys.exit. Build halts successfullly now. (obsolete) — Details — Splinter Review

You are right. The build respects the return value from sys.exit. Output attached:

 0:00.16 /usr/bin/make -C /home/user/mozilla-central/obj-x86_64-pc-linux-gnu -j4 -s backend
 0:00.24 /usr/bin/make -C toolkit/components/telemetry -j4 -s
 0:00.31 TelemetryHistogramData.inc
 0:00.31 TelemetryHistogramEnums.h
 0:00.31 TelemetryProcessData.h
 0:00.31 TelemetryProcessEnums.h
 0:00.31 TelemetryScalarData.h
 0:00.52 TelemetryScalarEnums.h
 0:00.71 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:00.71 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.71 Error for histogram name "LABELS_WRONG#HISTOGRAM": name does not conform to "^[a-z][a-z0-9_]+[a-z0-9]$"
 0:00.71 Key "what_should_happen" is not allowed for histogram "LABELS_WRONG#HISTOGRAM".
 0:00.71 Exception KeyError: KeyError(140319735588608,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
 0:00.72 backend.mk:30: recipe for target 'TelemetryHistogramEnums.h' failed
 0:00.73 make[1]: *** [TelemetryHistogramEnums.h] Error 1
 0:00.73 make[1]: *** Deleting file 'TelemetryHistogramEnums.h'
 0:00.73 make[1]: *** Waiting for unfinished jobs....
 0:00.98 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
 0:00.98 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
 0:00.98 Error for histogram name "LABELS_WRONG#HISTOGRAM": name does not conform to "^[a-z][a-z0-9_]+[a-z0-9]$"
 0:00.98 Key "what_should_happen" is not allowed for histogram "LABELS_WRONG#HISTOGRAM".
 0:00.98 Exception KeyError: KeyError(140009284740864,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored
 0:00.99 backend.mk:23: recipe for target 'TelemetryHistogramData.inc' failed
 0:00.99 make[1]: *** [TelemetryHistogramData.inc] Error 1
 0:00.99 make[1]: *** Deleting file 'TelemetryHistogramData.inc'
 0:01.03 /home/user/mozilla-central/config/rules.mk:432: recipe for target 'default' failed
 0:01.03 make: *** [default] Error 2

I guess now cleanup and formatting is left before this turns into a proper patch. 

What exactly did you mean by cleanup?
Regarding formatting, eslint shows no errors or warnings on the changed files. Is there another linter that I have to use?

Also, what about parse_scalars.py and parse_events.py? They also use the ParserError class and can be changed to print multiple errors at a time. I guess that will be a different bug, if at all.

Attachment #8934828 - Attachment is obsolete: true

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 17

•

7 years ago

Attached patch Used sys.exit. Build halts successfullly now. [Uploaded wrong diff in previous attachment] (obsolete) — Details — Splinter Review

Accidentally uploaded wrong diff file in comment 16. Apologies for that.

Attachment #8935636 - Attachment is obsolete: true

Aditya Bharti [:adbugger]

Assignee

Comment 18

•

7 years ago

Filed bug 1424148 regarding the duplication of pattern and CPP_IDENTIFIER_PATTERN in parse_histograms.py.

Alessio Placitelli [:Dexter]

Comment 19

•

7 years ago

(In reply to Aditya Bharti [:adbugger] from comment #16)
> What exactly did you mean by cleanup?
> Regarding formatting, eslint shows no errors or warnings on the changed
> files. Is there another linter that I have to use?

For python files, flake8 should be used. You can run it via:

./mach lint -l flake8 toolkit/components/telemetry

Chris H-C :chutten

Comment 20

•

7 years ago

Like for the pattern/CPP_IDENTIFIER_PATTERN duplication, we'll need separate bugs to handle those files. Luckily bugs are cheap :)

Follow Alessio's advice to run the python linter, remove any commented-out code, and set the r? flag on the patch to :chutten, and we'll see about making this available for everyone to enjoy.

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Comment 21

•

7 years ago

Attached patch 1401612_v5.patch (obsolete) — Details — Splinter Review

Handling of 'only eventual errors' case:

0:00.16 /usr/bin/make -C /home/user/mozilla-central/obj-x86_64-pc-linux-gnu -j4 -s backend
0:00.23 /usr/bin/make -C toolkit/components/telemetry -j4 -s
0:00.31 TelemetryEventData.h
0:00.31 TelemetryEventEnums.h
0:00.31 TelemetryHistogramData.inc
0:00.31 TelemetryHistogramEnums.h
0:00.31 TelemetryProcessData.h
0:00.50 TelemetryProcessEnums.h
0:00.51 TelemetryScalarData.h
0:00.67 TelemetryScalarEnums.h
0:00.77 Error for histogram name "LABELS_WRONG#HISTOGRAM": "#" is not allowed.
0:00.77 Error for histogram name "LABELS_WRONG#HISTOGRAM":  can not start with "LABELS_".
0:00.77 Error for histogram name "LABELS_WRONG#HISTOGRAM": name does not conform to "^[a-z][a-z0-9_]+[a-z0-9]$"
0:00.77 Key "what_should_happen" is not allowed for histogram "LABELS_WRONG#HISTOGRAM".
0:00.77 Some errors occurred
0:00.77 Exception KeyError: KeyError(139650580150016,) in <module 'threading' from '/usr/lib/python2.7/threading.pyc'> ignored

Handling of case with an 'immediately fatal' error:

0:00.42 Traceback (most recent call last):
0:00.42   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
0:00.42 Traceback (most recent call last):
0:00.42   File "/usr/lib/python2.7/runpy.py", line 174, in _run_module_as_main
0:00.44     "__main__", fname, loader, pkg_name)
0:00.44     "__main__", fname, loader, pkg_name)
0:00.45   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
0:00.45   File "/usr/lib/python2.7/runpy.py", line 72, in _run_code
0:00.45     exec code in run_globals
0:00.45   File "/home/user/mozilla-central/python/mozbuild/mozbuild/action/file_generate.py", line 112, in <module>
0:00.45     exec code in run_globals
0:00.45   File "/home/user/mozilla-central/python/mozbuild/mozbuild/action/file_generate.py", line 112, in <module>
0:00.45     sys.exit(main(sys.argv[1:]))
0:00.45   File "/home/user/mozilla-central/python/mozbuild/mozbuild/action/file_generate.py", line 63, in main
0:00.45     sys.exit(main(sys.argv[1:]))
0:00.45     ret = module.__dict__[method](output, *args.additional_arguments)
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/gen_histogram_enum.py", line 49, in main
0:00.45   File "/home/user/mozilla-central/python/mozbuild/mozbuild/action/file_generate.py", line 63, in main
0:00.45     all_histograms = list(parse_histograms.from_files(filenames))
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/parse_histograms.py", line 704, in from_files
0:00.45     ret = module.__dict__[method](output, *args.additional_arguments)
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/gen_histogram_data.py", line 196, in main
0:00.45     histograms = list(parse_histograms.from_files(filenames))
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/parse_histograms.py", line 704, in from_files
0:00.45     yield Histogram(name, definition, strict_type_checks=strict_type_checks)
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/parse_histograms.py", line 131, in __init__
0:00.45     self.verify_attributes(name, definition)
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/parse_histograms.py", line 268, in verify_attributes
0:00.45     allowed_keys = table[kind]
0:00.45 KeyError    yield Histogram(name, definition, strict_type_checks=strict_type_checks)
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/parse_histograms.py", line 131, in __init__
0:00.45 : u'unsupported'
0:00.45     self.verify_attributes(name, definition)
0:00.45   File "/home/user/mozilla-central/toolkit/components/telemetry/parse_histograms.py", line 268, in verify_attributes
0:00.45     allowed_keys = table[kind]
0:00.45 KeyErrorUnknown kind "unsupported" for histogram "LABELS_WRONG#HISTOGRAM".
0:00.45 : u'unsupported'
0:00.45 Some errors occurred

More verbose but still informative.

With the wrong histogram removed, the build passes. Uploading patch. Commented code removed and files properly linted.

Attachment #8935638 - Attachment is obsolete: true

Attachment #8935999 - Flags: review?(chutten)

Chris H-C :chutten

Comment 22

•

6 years ago

Comment on attachment 8935999 [details] [diff] [review]
1401612_v5.patch

Review of attachment 8935999 [details] [diff] [review]:
-----------------------------------------------------------------

I apologize for the delay.

This patch looks good! One question to be answered or addressed, and then we're good to go, I think. Do you have access to run this on try, or would you like me to handle it?

::: toolkit/components/telemetry/parse_histograms.py
@@ +423,4 @@
>  
>          for field in ['alert_emails', 'bug_numbers']:
>              if field not in definition and name not in whitelists[field]:
> +                ParserError('New histogram "%s" must have a "%s" field.' %

Does this have to be handled immediately?

Attachment #8935999 - Flags: review?(chutten) → review+

Aditya Bharti [:adbugger]

Assignee

Comment 23

•

6 years ago

Attached patch 1401612_v6.patch — Details — Splinter Review

I guess you're right. With the build now halting successfully in case of eventual errors, handling this eventually makes sense too. Made the required change. Marked the previous patch obsolete. flake8 checks out.

I do not have try server access. I'll need your help to handle this one on the try server. For future work though, how do I get access?

Attachment #8935999 - Attachment is obsolete: true

Attachment #8936997 - Flags: review?(chutten)

Chris H-C :chutten

Comment 24

•

6 years ago

Comment on attachment 8936997 [details] [diff] [review]
1401612_v6.patch

Review of attachment 8936997 [details] [diff] [review]:
-----------------------------------------------------------------

The full description of what Try is and how to get access is on the wiki over here: https://wiki.mozilla.org/ReleaseEngineering/TryServer

The short summary is you need Level 1 Commit Access to access try, and then you need to do some configuration. You don't have to worry about this now if you don't want to. There are always plenty of people around to help :)

For now I have pushed your commit up to try so that we can watch treeherder as the various builds come back and tell us whether or not they succeeded: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d3c331f08ebabcbeca6d28c77d88586e27754c7f

If these all come back green, we'll mark this as checkin-needed and get it into the tree.

Attachment #8936997 - Flags: review?(chutten) → review+

Chris H-C :chutten

Comment 25

•

6 years ago

Builds clean on try. Let's get this in.

Keywords: checkin-needed

Pulsebot

Comment 26

•

6 years ago

Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/92dbf32cc352
Telemetry histogram validator now prints multiple error messages at a time. r=chutten

Keywords: checkin-needed

Aditya Bharti [:adbugger]

Assignee

Comment 27

•

6 years ago

Is it a good idea to file bugs regarding changing the parse_events.py and parse_scalars.py right after this check-in or should I wait for the status change from NEW to VERIFIED/RESOLVED?

Flags: needinfo?(chutten)

Dorel Luca [:dluca]

Comment 28

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/92dbf32cc352

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox59: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Chris H-C :chutten

Comment 29

•

6 years ago

You can do that (and start working on them, if you like :D ) whenever you'd like. I tend to file bugs as soon as I think of them, so I don't forget. Since they're related, you can put this bug's number (1401612) in the "See Also" field of these other bugs.

Flags: needinfo?(chutten)

Aditya Bharti [:adbugger]

Assignee

Updated

•

6 years ago

Updated

•

6 years ago

Initial method towards fix for 1401612 7 years ago Aditya Bharti [:adbugger] 2.66 KB, patch		Details \| Diff \| Splinter Review
Example of partially functioning fix: extended ParserError class 7 years ago Aditya Bharti [:adbugger] 16.71 KB, patch		Details \| Diff \| Splinter Review
Used atexit. Same problem as before. Also extended ParserError. 7 years ago Aditya Bharti [:adbugger] 17.00 KB, patch		Details \| Diff \| Splinter Review
Used sys.exit. Build halts successfullly now. 7 years ago Aditya Bharti [:adbugger] 16.99 KB, patch		Details \| Diff \| Splinter Review
Used sys.exit. Build halts successfullly now. [Uploaded wrong diff in previous attachment] 7 years ago Aditya Bharti [:adbugger] 17.04 KB, patch		Details \| Diff \| Splinter Review
1401612_v5.patch 7 years ago Aditya Bharti [:adbugger] 17.76 KB, patch	chutten : review+	Details \| Diff \| Splinter Review
1401612_v6.patch 6 years ago Aditya Bharti [:adbugger] 17.77 KB, patch	chutten : review+	Details \| Diff \| Splinter Review