Implement the 'object' metric type
Categories
(Data Platform and Tools :: Glean: SDK, task, P2)
Tracking
(Not tracked)
People
(Reporter: janerik, Assigned: janerik)
References
(Blocks 4 open bugs)
Details
Attachments
(5 files, 1 obsolete file)
Proposal: https://docs.google.com/document/d/1ov2txT70sO4u3m7rj-HQdcJLY7VC-Otx2pFiCA-a9fc/edit#heading=h.l4ya7hop8p4q
(public link)
A first rough PoC is available in https://github.com/mozilla/glean/pull/2489
with parser changes here: https://github.com/mozilla/glean_parser/pull/587
Far from done.
This will also need some work on the pipeline to properly ingest the data into a JSON column.
Assignee | ||
Updated•11 months ago
|
Assignee | ||
Comment 1•10 months ago
|
||
Assignee | ||
Comment 2•10 months ago
|
||
Assignee | ||
Comment 3•10 months ago
|
||
Using this bug as a big of a worklog, as this change requires multiple PRs to different projects and some coordination to land in full:
- The above PRs add the necessary changes to glean_parser and glean-core, implementing the type only for the RLB for now. They are in decent shape (minus some naming, tests and docs).
- I have a hacky patch compiling adding the object metric to m-c with a JS API; the C++ API is half-broken and needs some work.
- Patches to mozilla-schema-generator and mozilla-pipeline-schemas are easy.
- We probably need to touch jsonschema-transpiler to get us the right schema using a JSON column
- and then we need to modify ingestion to transform the data correctly
1-3 are nearing completion.
4 I will take a look at, I don't expect much work.
For 5 relud pointed me here: https://github.com/mozilla/gcp-ingestion/blob/5e33bdbf86da7fbee5699de32b6695aec0a093e8/ingestion-core/src/main/java/com/mozilla/telemetry/ingestion/core/transform/PubsubMessageToObjectNode.java#L328
That might be the trickiest part because I never touched that code but maybe I can rely on some help from relud.
Assignee | ||
Updated•10 months ago
|
Assignee | ||
Updated•10 months ago
|
Assignee | ||
Updated•9 months ago
|
Assignee | ||
Comment 4•9 months ago
|
||
For now this only provides a JavaScript API, as it's easier to generate
and to use (we just use JavaScript objects, validation is done by
(de)serialization on the Rust side).
No JOG support. Parent-process only right now.
Todo:
- The test API should return an object, not a JSON string
Assignee | ||
Comment 5•8 months ago
|
||
As called out I will need help with the related changes across the pipeline.
1. mozilla-pipeline-schemas
Added new metric type: https://github.com/badboy/mozilla-pipeline-schemas/commit/35b0870fbf304662d179d26dc934b1217663c73b
No further schema limits applied. However this breaks mozilla-schema-generator later.
Tried to fix this with: https://github.com/badboy/mozilla-pipeline-schemas/commit/10984502191a513957f552921dbc1d8adf0992b9
Note: this is wrong. type: any
is not valid jsonschema.
Help needed. Changes depend on next part.
2. mozilla-schema-generator
First: URLs from which to fetch schemas/repos is hard-coded, so I had to manually change that to test it against local checkouts.
MSG requires items
for type: array
items
then cannot be empty, as it requires types in size check: https://github.com/badboy/mozilla-schema-generator/blob/b696aec9a4f5423a141069612365af7c492e1e42/mozilla_schema_generator/schema.py#L111any
is not a valid jsonschema type, so this probably will break down the line?
I'm not sure how to handle this here.
Help needed.
3. jsonschema-transpiler
I have a PR up: https://github.com/mozilla/jsonschema-transpiler/pull/107
This breaks test, I can fix these.
But: I manually created what I think is the schema and tested it against that, so this might be wrong. We need to fix MPS and MSG first to generate a schema to test with, then fix the transpiler.
I can handle this when 1. and 2. are handled.
4. gcp-ingestion
I think it's already able to handle putting JSON into a JSON column.
Ihave no idea how to test it. And I don't have the dev environment for it set up either.
I went through the code:
- Ingestion starts basically here: https://github.com/mozilla/gcp-ingestion/blob/5e33bdbf86da7fbee5699de32b6695aec0a093e8/ingestion-core/src/main/java/com/mozilla/telemetry/ingestion/core/transform/PubsubMessageToObjectNode.java#L328
- Goes to
processField
- Ends up in
coerceSingleValueToBqType
- It checks against the table schema. Should drop to last
else
branch, which gets usOptional.of(o)
.- I think this is then fine for a colum of JSON type.
Help needed. Does not depend on the other parts. I can provide a sample payload and table schema.
Pinging arkadiusz and relud for help.
We can split the work in individual bugs for easier tracking if you want.
Comment 6•8 months ago
|
||
type: any
is not valid jsonschema.
MSG requires
items
fortype: array
I think a hint needs to be added to indicate the JSON type to mozilla-schema-generator and jsonschema-transpiler. format
seems like a good choice, but it is technically only a string property, so I'm not sure if it would break anything.
Assignee | ||
Comment 7•8 months ago
|
||
(In reply to Daniel Thorn [:relud] from comment #6)
type: any
is not valid jsonschema.MSG requires
items
fortype: array
I think a hint needs to be added to indicate the JSON type to mozilla-schema-generator and jsonschema-transpiler.
format
seems like a good choice, but it is technically only a string property, so I'm not sure if it would break anything.
That works and is relatively easy to implement. So far it doesn't seem to break anything.
Assignee | ||
Comment 8•8 months ago
|
||
I spoke too soon. It works with all our tools, but as far as I can tell is not proper jsonschema if I use format
with also type: [array, object]
.
Comment 9•7 months ago
|
||
Clearing NI as we discussed this in person (store object paths in json schemas and use a side input to jsonschema-transpiler to tell it where to stop parsing).
Assignee | ||
Comment 10•7 months ago
|
||
I have something working now.
First off branches and commits for all projects involved here.
- Fenix: https://github.com/mozilla-mobile/firefox-android/compare/main...badboy:firefox-android:object-metric-type-test
- branch: object-metric-type-test
- commit: 6fda51ef62d935f2d0ee5304fab955000cbbed52
- probe-scraper: https://github.com/mozilla/probe-scraper/compare/main...badboy:probe-scraper:object-metric-type-test
- branch: object-metric-type-test
- commit: a4bbb42be6c0d1ba0a21e4501ce0d0b1320835c2
- jsonschema-transpiler: https://github.com/mozilla/jsonschema-transpiler/pull/107
- branch: object-metric-type-json-column
- commit: 8450041a1df544877ba5bc546fab9c23ed3dd0d2
- mozilla-pipeline-schemas: https://github.com/mozilla-services/mozilla-pipeline-schemas/compare/main...badboy:mozilla-pipeline-schemas:object-metric-type-2
- branch: object-metric-type-2
- commit: b1973563667e2f091277f99c9798e26937239f66
- mozilla-schema-generator: https://github.com/mozilla/mozilla-schema-generator/compare/main...badboy:mozilla-schema-generator:object-type
- branch: object-type
- commit: 9f90876b23bd01e5af6b7e06c31101bec089dda1
Clone probe-scraper, checkout the mentioned commit, then run and host probe-scraper:
python3 -m probe_scraper.runner --dry-run --cache-dir tmp/cache --out-dir tmp/out --glean --glean-repo fenix --glean-repo glean-core
cd tmp/out && httplz -p 8000 # or: python3 -m http.server
Clone MSG, checkout the mentioned commit, then run mozilla-schema-generator:
rm -rf .probe_cache # just in case.
mozilla-schema-generator generate-glean-pings --out-dir out --repo org-mozilla-fenix --pretty --mps-branch object-metric-type-2
Install and run the transpiler (this is what bin/generate_commit
does)
cargo install --git https://github.com/mozilla/jsonschema-transpiler --branch object-metric-type-json-column
jsonschema-transpiler --resolve drop --type bigquery --normalize-case --force-nullable --tuple-struct out/org-mozilla-fenix/baseline/baseline.1.schema.json > out/org-mozilla-fenix/baseline/baseline.1.bq
Result:
$ rg -C3 crash.stack out/org-mozilla-fenix/metrics/metrics.1.bq
1436- {
1437- "description": "All threads' frame information.",
1438- "mode": "NULLABLE",
1439: "name": "crash_stack_threads",
1440- "type": "JSON"
1441- }
1442- ],
Success!?!
Assignee | ||
Comment 11•7 months ago
|
||
If the above changes will be accepted I think we need to go about it this way:
- new release of jsonschema-transpiler
- update jsonschema-transpiler in use in MSG
- include new metric type changes
- land MPS changes
- land updates to glean_parser
- get that into probe-scraper too
- land object metric type in Glean
Comment 12•7 months ago
|
||
Assignee | ||
Comment 13•7 months ago
|
||
Assignee | ||
Comment 14•7 months ago
|
||
Arkadiusz, can you take a look at https://bugzilla.mozilla.org/show_bug.cgi?id=1839640#c10 and see if that would be an acceptable way forward?
Assignee | ||
Comment 15•5 months ago
|
||
In the linked Jira issue we planned and executed a quick test if the pipeline correctly ingests data into JSON columns.
That test succeeded and it all works as expected, with no changes. That means we can move forward with the implementation parts.
Assignee | ||
Comment 16•4 months ago
|
||
Assignee | ||
Comment 17•4 months ago
•
|
||
Next steps:
- Get reviews on glean_parser / Glean. Those things can then land and we can prepare a release
- Update the m-c patch. It's working locally, mostly waiting for the above releases
- Land the jsonschema-transpiler patch and get that used in our tooling
- Land the MSG patch
- Land the MPS patch
- Start using object metrics!
1 & 2 can get rolling independently already.
3-5 need to land in that order then. As long as no one uses object metrics they are not risky.
Once landed we should do proper testing that it works all as expected.
Comment 18•4 months ago
|
||
Considering the pipeline ingestion side, the plan outlined in https://bugzilla.mozilla.org/show_bug.cgi?id=1839640#c11 looks good to me. To reiterate, steps are:
- Update jsonschema-transpiler to support object metric type (https://github.com/mozilla/jsonschema-transpiler/pull/107)
- Update mozilla-schema-generator to use the updated jsonschema-transpiler. I believe it's this PR: https://github.com/mozilla/mozilla-schema-generator/pull/257/files
- Update glean schema in mozilla-pipeline-schemas: https://github.com/mozilla-services/mozilla-pipeline-schemas/pull/793/files
Assignee | ||
Comment 19•4 months ago
|
||
Assignee | ||
Comment 20•4 months ago
|
||
Assignee | ||
Comment 21•4 months ago
|
||
Assignee | ||
Comment 22•4 months ago
|
||
Assignee | ||
Comment 23•3 months ago
|
||
badboy merged PR [mozilla/glean]: Bug 1839640 - Add new metric type: object (#2489) in 78e85ac.
This concludes most pieces in the SDK and pipeline.
I'm keeping this bug open as they are some dependent bugs, but those require m-c work first.
I will create followup bugs blocking this.
Updated•3 months ago
|
Assignee | ||
Updated•2 months ago
|
Assignee | ||
Comment 24•1 month ago
|
||
The bulk of the work landed. The remaining "Depends on" bugs can be tackled separately. This should unblock usage now.
Description
•