Closed Bug 1337022 Opened 7 years ago Closed 7 years ago

Fix the regex in the Telemetry event parser

Categories

(Toolkit :: Telemetry, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox54 --- wontfix
firefox55 --- fixed

People

(Reporter: Dexter, Assigned: fedepad)

References

Details

(Whiteboard: [measurement:client])

Attachments

(1 file)

The current regular expression in the Telemetry events doc doesn't match [1] the provided example data (e.g. "ui").

> ^[:alpha:][:alnum:_.]+[:alnum:]$

We should fix the regex and update the schemas that live at [2]

[1] - https://gecko.readthedocs.io/en/latest/toolkit/components/telemetry/telemetry/collection/events.html
[2] - https://github.com/mozilla-services/mozilla-pipeline-schemas
Blocks: 1302666
Whiteboard: [measurement:client]
The regex in parse_events.py is too strict here.
We should change it and update the docs.
Priority: -- → P1
Summary: Fix the regex in the Telemetry event docs → Fix the regex in the Telemetry event parser
Priority: P1 → P2
Priority: P2 → P3
(In reply to Georg Fritzsche [:gfritzsche] [at workweek] from comment #1)
> The regex in parse_events.py is too strict here.
> We should change it and update the docs.

I think I have an idea on how to change minimally the "IDENTIFIER_PATTERN" in parse_events.py to match also on "ui" and update the docs.

But I have no idea about the schemas Alessio is talking....should also this patch supposed to take care of those too?

And who should I put as a reviewer after I push for review?

Thanks!
(In reply to Federico Padua (fedepad) from comment #2)
> (In reply to Georg Fritzsche [:gfritzsche] [at workweek] from comment #1)
> > The regex in parse_events.py is too strict here.
> > We should change it and update the docs.
> 
> I think I have an idea on how to change minimally the "IDENTIFIER_PATTERN"
> in parse_events.py to match also on "ui" and update the docs.
> 
> But I have no idea about the schemas Alessio is talking....should also this
> patch supposed to take care of those too?

The schemas are linked in comment 0. Once this patch is r+, we can work on a PR to update the schemas there.

You can put Georg or me as reviewers.
Assignee: nobody → federico_padua
Status: NEW → ASSIGNED
Hello! Seems like I figured it out. I propose this new regex: "^[a-z]+[a-z0-9]*[_.]?[a-z0-9]+$"

I've tested it:

import re
mylist = ["ui","u","ui_","_ui","ui8","8u","8","qwert","qw_ty","qw_8","qw.9m","u8u_8g","8u_g"]
for x in mylist:
 match = re.search("^[a-z]+[a-z0-9]*[_.]?[a-z0-9]+$",x)
 if match:
  print x + " is fine"
 else: 
   print x + " is not!"	

And the output is:

ui    is fine
u    is not!
ui_    is not!
_ui    is not!
ui8    is fine
8u    is not!
8      is not!
qwert    is fine
qw_ty    is fine
qw_8    is fine
qw.9m    is fine
u8u_8g    is fine
8u_g    is not!

If the fine outputs are eligible, I'd like to be assigned to this bug and submit a patch.
Hello! I propose this new regex: "^[a-z]+[a-z0-9]*[_.]?[a-z0-9]+$"

I've tested it:

import re
mylist = ["ui","u","ui_","_ui","ui8","8u","8","qwert","qw_ty","qw_8","qw.9m","u8u_8g","8u_g"]
for x in mylist:
 match = re.search("^[a-zA-Z]+[a-zA-Z0-9]*[_.]?[a-zA-Z0-9]+$",x)
 if match:
  print x + " is fine"
 else: 
   print x + " is not!"	

And the output is:

ui    is fine
u    is not!
ui_    is not!
_ui    is not!
ui8    is fine
8u    is not!
8      is not!
qwert    is fine
qw_ty    is fine
qw_8    is fine
qw.9m    is fine
u8u_8g    is fine
8u_g    is not!

If the fine outputs are eligible, I'd like to be assigned to this bug and submit a patch.
Flags: needinfo?(gfritzsche)
Flags: needinfo?(alessio.placitelli)
Hi Daria!

It looks Federico is already working on this bug. I'll follow up separately with a recommendation for a different bug.
Flags: needinfo?(gfritzsche)
Flags: needinfo?(alessio.placitelli)
Sunah, does the suggested regex (reported below) look sane to you?

> ^[:alpha:][:alnum:_.]*[:alnum:]$

The original one fails to validate the example identifiers reported in the docs (e.g. "ui"):

> ^[:alpha:][:alnum:_.]+[:alnum:]$
Flags: needinfo?(ssuh)
Makes sense to me -- parsing this in plain English for my own understanding: the old regex required an alphabetical character as the first character and an alphanumeric character as the last character, and *required* at least one more character in the middle that's alphanumeric or underscore or period. The new regex allows the absence of a middle character.

One thing to note is that the regex still requires the string to be at least 2 characters (which seems like an acceptable limitation to me, at least.)
Cheers, thanks Sunah!
Flags: needinfo?(ssuh)
Comment on attachment 8848474 [details]
Bug 1337022 - Fix the regex in the Telemetry event parser.

https://reviewboard.mozilla.org/r/121400/#review125546

Thanks Federico, sorry for the delay!
Attachment #8848474 - Flags: review?(alessio.placitelli) → review+
Comment on attachment 8848474 [details]
Bug 1337022 - Fix the regex in the Telemetry event parser.

https://reviewboard.mozilla.org/r/121400/#review125546

No problem! Trigger try builds just in case! And let me know if needs rebase...
Pushed by alessio.placitelli@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/c0a377462d47
Fix the regex in the Telemetry event parser. r=Dexter
(In reply to Federico Padua (fedepad) from comment #12)
> Comment on attachment 8848474 [details]
> Bug 1337022 - Fix the regex in the Telemetry event parser.
> 
> https://reviewboard.mozilla.org/r/121400/#review125546
> 
> No problem! Trigger try builds just in case! And let me know if needs
> rebase...

Pheew, no need for a rebased patch ;-) There's just one bit of work left to do.

Would you kindly file a PR against [1] to add the event id regex to the main ping schema [2] so that we can properly validate stuff that comes in?

Feel free to flag me as a reviewer there too.

[1] - https://github.com/mozilla-services/mozilla-pipeline-schemas
[2] - https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/telemetry/main.schema.json#L841
Flags: needinfo?(federico_padua)
Should be ready. 
So I just a need to open a PR on GitHub?
And with what title and commit message? I guess should be related to this but maybe not the same title, etc....

Let me know and I will open it soon!
Flags: needinfo?(federico_padua) → needinfo?(alessio.placitelli)
Yes, just open a PR on Github.

"Add regex validation to Telemetry events" could work as title/commit message. Feel free to add additional lines to the commit message as you consider appropriate.
Flags: needinfo?(alessio.placitelli)
https://hg.mozilla.org/mozilla-central/rev/c0a377462d47
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: