Closed Bug 1282098 Opened 3 years ago Closed Last year

Pull probe parsers out of m-c and into a separate package

Categories

(Toolkit :: Telemetry, defect, P1)

defect
Points:
1

Tracking

()

RESOLVED FIXED
mozilla64
Tracking Status
firefox64 --- fixed

People

(Reporter: Dexter, Assigned: Dexter)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [measurement:client])

Attachments

(1 file)

Bug 1276190 adds parse_scalars.py, a script that adds scalar support as histogram_tools.py [2] does with scalars. They have a few functions in common that were put in shared_telemetry_utils.py.

We know that a.t.m.o uses histogram_tools.py to produce aggregates, through [1]. Unfortunately, could be other hidden services fetching histogram_tools.py and using the exposed API. For this reason, we cannot make it depend on another file without breaking these services.

We need a smarter way to handle this kind of situations. We could, for example, package histogra_tools.py, parse_scalars.py and shared_telemetry_utils.py in a python package and fetch it using pip. Unfortunately, we're not sure this could be used within our build system.

[1] - https://github.com/mozilla/python_moztelemetry/
[2] - https://dxr.mozilla.org/mozilla-central/rev/d1102663db10b3d4b9358f3cf4e16b7c56902352/toolkit/components/telemetry/histogram_tools.py
Blocks: 1275517
Whiteboard: [measurement:client]
Depends on: 1276190
Priority: -- → P3
Blocks: 1201022
No longer blocks: 1275517
Priority: P3 → P2
We also have parse_events.py now.
This bug seems to have a few separate tasks, we should try to break it down.
Summary: Pull histogram_tools.py and parse_scalars.py out of mc and as a separate package → Pull probe parsers out of m-c and into a separate package
Priority: P2 → P3
Duplicate of this bug: 1302684
Assignee: nobody → alessio.placitelli
Priority: P3 → P1
Hey :gps, is there any way to make the build system download a Python package before the build starts?
We're considering moving the parsers [1] to a separate Python package, so that upstream Telemetry dependencies can depend on this rather than importing the parsers themselves.

I'm not sure downloading a fresh package out of a public repo is a good idea (for security), but I'm not an expert of the build system.

A fallback option would be to still create a package, but to mirror the code here [2] whenever a new version of the Telemetry parsers package is released. Does this make more sense? Can you see any drawback about this other than the Telemetry team (us:D) forgetting to update the version in-tree?

[1] - https://searchfox.org/mozilla-central/search?q=&case=false&regexp=false&path=telemetry%2Fparse_*.py
[2] - https://searchfox.org/mozilla-central/source/python/README
Flags: needinfo?(gps)
The Firefox build system should be hermetic and not make network requests. For Python dependencies, our policy is to vendor packages in the repo under third_party/python. It's worth noting we also have some Mozilla-authored Python packages that canonically live in mozilla-central and are published to PyPI. See every directory under testing/mozbase for example.

You can use `mach vendor python` to initiate a vendoring of a PyPI hosted package. See https://hg.mozilla.org/mozilla-central/rev/a566524dd127 for what other files you may want to change so the package is importable by the build system.

It's best to get a build peer review on any vendorings. But other people like ahal and davehunt can sign off on the vendoring bits that don't touch the build system. I've CCd them here in case they want to provide additional context.
Flags: needinfo?(gps)
(In reply to Gregory Szorc [:gps] from comment #4)
> The Firefox build system should be hermetic and not make network requests.
> For Python dependencies, our policy is to vendor packages in the repo under
> third_party/python. It's worth noting we also have some Mozilla-authored
> Python packages that canonically live in mozilla-central and are published
> to PyPI. See every directory under testing/mozbase for example.
> 
> You can use `mach vendor python` to initiate a vendoring of a PyPI hosted
> package. See https://hg.mozilla.org/mozilla-central/rev/a566524dd127 for
> what other files you may want to change so the package is importable by the
> build system.
> 
> It's best to get a build peer review on any vendorings. But other people
> like ahal and davehunt can sign off on the vendoring bits that don't touch
> the build system. I've CCd them here in case they want to provide additional
> context.

Thanks :gps, that's very helpful! Given that, I think a way forward could be:

- Create a Github repo for the parsers.
- Enable CI on that.
- Enable automatic deployment on pypi from the "master" branch.
- Vendor the package in m-c (this requires a build peer to sign off).

Georg, does this sound like a plan? Can you see any drawback here or can we move on with this?
Flags: needinfo?(gfritzsche)
That sounds solid to me.
Let's run this past some Python developer for feedback, maybe :jezdez?
Flags: needinfo?(gfritzsche)
Jannis, does the plan in comment 5 sound good to you? We're planning on putting Telemetry parser in a separate pypi package so that upstream consumers (Telemetry pipeline services) can easily depend on that. Anything we should watch out for?
Flags: needinfo?(jezdez)
(In reply to Alessio Placitelli [:Dexter] from comment #7)
> Jannis, does the plan in comment 5 sound good to you? We're planning on
> putting Telemetry parser in a separate pypi package so that upstream
> consumers (Telemetry pipeline services) can easily depend on that. Anything
> we should watch out for?

Sounds good to me in general, although with the caveat we should follow best practices for Python package structure and setup, similar to how we've done for python_moztelemetry and python_mozaggregator.

Can you clarify if the extracted code could/should live in those packages to reduce redundancy, since any new package is a long-term investement in maintenance?
Flags: needinfo?(jezdez)
(In reply to Jannis Leidel [:jezdez] from comment #8)
> (In reply to Alessio Placitelli [:Dexter] from comment #7)
> > Jannis, does the plan in comment 5 sound good to you? We're planning on
> > putting Telemetry parser in a separate pypi package so that upstream
> > consumers (Telemetry pipeline services) can easily depend on that. Anything
> > we should watch out for?
> 
> Sounds good to me in general, although with the caveat we should follow best
> practices for Python package structure and setup, similar to how we've done
> for python_moztelemetry and python_mozaggregator.

Yes, I completely agree on this!

> Can you clarify if the extracted code could/should live in those packages to
> reduce redundancy, since any new package is a long-term investement in
> maintenance?

As of now, a number of services (probe-scraper [1], python_moztelemetry [2], ...) rely on the parsers which live in mozilla-central. The parsers are currently simply copied over the tree, when any update occurs. However, this approach can sometimes cause issues: if we forget to keep the version in sync, things might fail (and have failed in the past, see bug 1430115 et al).

We expect the number of services relying on the parsers to grow in the future, and we really need a better way to reduce the redundancy and keep things in sync (or, at least, to notify for updates!). For this reason I think the cost of maintaining a new package is balanced out by the time we loose each time to sync up the different copies of the parser or fix breakages :)

[1] - https://github.com/mozilla/probe-scraper/
[2] - https://github.com/mozilla/python_moztelemetry/
(In reply to Alessio Placitelli [:Dexter] from comment #9)

> > Can you clarify if the extracted code could/should live in those packages to
> > reduce redundancy, since any new package is a long-term investement in
> > maintenance?
> 
> As of now, a number of services (probe-scraper [1], python_moztelemetry [2],
> ...) rely on the parsers which live in mozilla-central. The parsers are
> currently simply copied over the tree, when any update occurs. However, this
> approach can sometimes cause issues: if we forget to keep the version in
> sync, things might fail (and have failed in the past, see bug 1430115 et al).

Gotcha, don't forget to open tickets in those other services/packages to use the new dependency.

> We expect the number of services relying on the parsers to grow in the
> future, and we really need a better way to reduce the redundancy and keep
> things in sync (or, at least, to notify for updates!). For this reason I
> think the cost of maintaining a new package is balanced out by the time we
> loose each time to sync up the different copies of the parser or fix
> breakages :)
> 
> [1] - https://github.com/mozilla/probe-scraper/
> [2] - https://github.com/mozilla/python_moztelemetry/

Understood, glad you found this maintenance risk! Since naming things is hard, let's stay consistent and call the library python_mozparsers maybe?
Unless you are targeting developing on GitHub and getting contributors on GitHub, I'd keep the source of truth in mozilla-central and release to PyPI from mozilla-central. Just like what mozbase does. Otherwise you are just creating headaches around synchronizing changes, forcing releases / vendoring due to minor changes, etc. That can add up to a bit of overhead.
I think having it accessible on GitHub is one factor (i'm not sure how important of a factor it is though).
AFAICT, there are two main contributing parties here:
1) The Firefox Telemetry team, which does work in mozilla-central.
2) The Data platform team, which is not using mozilla-central and generally not set up for Firefox builds.

Maybe we should check back with the latter and confirm?
Hey Mark and Frank, we're considering releasing the Telemetry parsers (the various parser_*.py scripts that currently live in the mozilla-central repo) as a PyPi package. We have two options for doing that (see comment 5 and comment 11):

1) Move the parsers to a GitHub repo, use that as the main development repo, and only vendor (read = import) the package in mozilla-central. This would be beneficial to developers not working on mozilla-central, as it wouldn't be required to actually change something in the parsers (or fix bugs :D).
2) Keep the development in mozilla-central and just release the scripts as PyPi packages so that upstream consumers can rely on that. This wouldn't make it easy for contribution from people who don't have a Firefox build locally.

Do you feel like (1) would be beneficial to you? Can you suggest anyone else who might have an opinion on this?
Flags: needinfo?(mreid)
Flags: needinfo?(fbertsch)
From a quick discussion on IRC with Jan-Erik, there's another perspective to this:

> (github scenario, - option 1) if the parser is changed, but we forget to actually vendor the new version in time, then someone changes Histograms.yaml with incompatible data, we only catch that later when we update the vendored version; if it would be in-tree, we land that parser change and a change to histograms.yaml would fail

This makes me lean toward what :gps suggested: having this in tree and just pushed to pypi on updates.
(In reply to Alessio Placitelli [:Dexter] from comment #14)
> From a quick discussion on IRC with Jan-Erik, there's another perspective to
> this:
> 
> > (github scenario, - option 1) if the parser is changed, but we forget to actually vendor the new version in time, then someone changes Histograms.yaml with incompatible data, we only catch that later when we update the vendored version; if it would be in-tree, we land that parser change and a change to histograms.yaml would fail

Is that different from landing the changes in-tree?
Doesn't it just move adapting the Histograms.yaml to a later time instead of immediately?

On the package side, we need some form of backward compatibility to historic versions anyway.
There are also two more kinds of probe registries that potentially live outside of the Firefox tree:
- Mobile probe registries.
- Add-on probe registries (not yet supported, but possibly in the future for e.g. Shield studies).
Frank and I discussed this with the data platform team, and we would prefer the approach of keeping the code in mozilla-central and release to PyPI (Option 2 from Comment 13). This keeps the ownership/responsibility in Firefox, but makes it easier to manage dependencies from other projects. We don't see a need to contribute to that codebase outside of mozilla-central.
Flags: needinfo?(mreid)
Flags: needinfo?(fbertsch)
Priority: P1 → P2
(In reply to Mark Reid [:mreid] from comment #17)
> Frank and I discussed this with the data platform team, and we would prefer
> the approach of keeping the code in mozilla-central and release to PyPI
> (Option 2 from Comment 13). This keeps the ownership/responsibility in
> Firefox, but makes it easier to manage dependencies from other projects. We
> don't see a need to contribute to that codebase outside of mozilla-central.

I see, thanks for the feedback. Just to clarify: regardless of the location, the ownership of that would still be completely on the Telemetry team. In that regard, storage location and ownership are probably orthogonal problems IMHO.

(In reply to Georg Fritzsche [:gfritzsche] from comment #15)
> (In reply to Alessio Placitelli [:Dexter] from comment #14)
> > From a quick discussion on IRC with Jan-Erik, there's another perspective to
> > this:
> > 
> > > (github scenario, - option 1) if the parser is changed, but we forget to actually vendor the new version in time, then someone changes Histograms.yaml with incompatible data, we only catch that later when we update the vendored version; if it would be in-tree, we land that parser change and a change to histograms.yaml would fail
> 
> Is that different from landing the changes in-tree?
> Doesn't it just move adapting the Histograms.yaml to a later time instead of
> immediately?

Mh, you have a point here. If we're on github, update the parsers, and land that on m-c, we'd still run tests in-tree. So if there's any incompatible data (e.g. we reduce some limits for event strings), we would address that as we land the updated version of the parsers.
(In reply to Mark Reid [:mreid] from comment #17)
> Frank and I discussed this with the data platform team, and we would prefer
> the approach of keeping the code in mozilla-central and release to PyPI
> (Option 2 from Comment 13). This keeps the ownership/responsibility in
> Firefox, but makes it easier to manage dependencies from other projects. We
> don't see a need to contribute to that codebase outside of mozilla-central.

I agree that ownership is in Firefox, specifically the Firefox Telemetry team.
Where the code lives seems orthogonal, but it is up to our teams process then.
We'll need to talk about how mobile will fit in here, then proceed from there.
Priority: P2 → P1
Hey Frank, Mark! Given the ownership clarification in comment 19 and comment 18, would you still strongly advise against having a github repo?
Flags: needinfo?(mreid)
Flags: needinfo?(fbertsch)
I don't feel very strongly about it, I just don't see a ton of value in moving the canonical source out of mozilla-central. Regardless of where it is, the "backend" platform code will make use of the published package.

Having such a published package will definitely be useful, thanks!
Flags: needinfo?(mreid)
(In reply to Alessio Placitelli [:Dexter] from comment #20)
> Hey Frank, Mark! Given the ownership clarification in comment 19 and comment
> 18, would you still strongly advise against having a github repo?

Given the ownership question is resolved it seems fine to have it be wherever the owners desire! No strong opinion here either.
Flags: needinfo?(fbertsch)
Points: --- → 1
Blocks: 1486009
This patch moves the parsers' code to python_mozparsers, which is
the definitive name of the package on pypi. The data generation scripts
are adjusted accordingly. The README and setup files are added to
enable publishing on pypi.
Here's the package's page, as shown on the test PyPi instance: https://test.pypi.org/project/python-mozparsers/
Comment on attachment 9006589 [details]
Bug 1282098 - Pull probe parsers out of m-c and into a separate package. r?chutten,gfritzsche

Chris H-C :chutten has approved the revision.
Attachment #9006589 - Flags: review+
Pushed by aplacitelli@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/28fb503aacbc
Pull probe parsers out of m-c and into a separate package. r=chutten
https://hg.mozilla.org/mozilla-central/rev/28fb503aacbc
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla64
The published PyPi package lives at: https://pypi.org/project/python-mozparsers/
The guide to publish or update the PyPi package lives here: https://packaging.python.org/tutorials/packaging-projects/
Depends on: 1490227

(In reply to Alessio Placitelli [:Dexter] from comment #29)

The published PyPi package lives at:
https://pypi.org/project/python-mozparsers/

Just for the benefit of anyone else landing here, I think the new package location is:
https://pypi.org/project/mozparsers/

You need to log in before you can comment on or make changes to this bug.