Closed Bug 1769579 Opened 3 years ago Closed 3 years ago

Update probe scraper for Mozilla Rally

Categories

(Data Platform and Tools :: Glean Platform, task, P1)

task

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aagarwal, Assigned: Dexter)

Details

Attachments

(5 files)

Metrics and pings definitions for Mozilla Rally have moved to a new monorepo. Also, the same set of definitions are now being used for both the Core Add-on and the Web Platform.

As such, the fields in the probe-scraper repositories.yaml should be updated:

canonical_app_name: Rally Core Add-on and Web Platform
app_description: |
The Rally Core Add-on and Web Platform orchestrate
the installation and the lifecycle of [Rally](https://rally.mozilla.org/)
studies.
url: https://github.com/mozilla-rally/rally
branch: main
metrics_files:
  - web-platform/glean/metrics.yaml
ping_files:
  - web-platform/glean/pings.yaml

(rest can remain as-is, from my understanding)

Flags: needinfo?(alessio.placitelli)
Flags: needinfo?(akomarzewski)
Assignee: nobody → alessio.placitelli
Flags: needinfo?(alessio.placitelli)
Priority: -- → P1
Flags: needinfo?(akomarzewski)

Looks like this introduced an incompatible schema change: https://workflow.telemetry.mozilla.org/log?dag_id=probe_scraper&task_id=mozilla_schema_generator&execution_date=2022-05-17T00%3A00%3A00%2B00%3A00

[2022-05-18 20:31:29,051] {{pod_launcher.py:149}} INFO - --- /app/validate_schema_evolution/da26f512f0f322e06a297ffa05f2cc51d3eabc72_base-revision/rally-core.demographics.1.txt
[2022-05-18 20:31:29,051] {{pod_launcher.py:149}} INFO - +++ /app/validate_schema_evolution/97fad6adf5de1a80deb777074b36a40037fbe520_generated-schemas/rally-core.demographics.1.txt
[2022-05-18 20:31:29,051] {{pod_launcher.py:149}} INFO - @@ -46,4 +46,2 @@
[2022-05-18 20:31:29,051] {{pod_launcher.py:149}} INFO - root.metrics.labeled_boolean.user_gender.[].value BOOL
[2022-05-18 20:31:29,051] {{pod_launcher.py:149}} INFO - -root.metrics.labeled_boolean.user_income.[].key STRING
[2022-05-18 20:31:29,051] {{pod_launcher.py:149}} INFO - -root.metrics.labeled_boolean.user_income.[].value BOOL
[2022-05-18 20:31:29,052] {{pod_launcher.py:149}} INFO - root.metrics.labeled_boolean.user_origin.[].key STRING

Hey folks,

looks like the rally-core-addon repository removed the user.income on July 2021 and started sending the user.exact_income. The monorepo metrics.yaml is now identical to the rally-core-addon one (minus one forward compatible change).

Why are we tripping on this now? How can we fix this? Is this because the monorepo did not retain the full history on the YAML files?

Flags: needinfo?(whd)
Flags: needinfo?(dthorn)

Is this because the monorepo did not retain the full history on the YAML files?

This is most likely the case. It looks like you were involved in a similar fix for regrets reporter so I'd recommend doing something similar:

https://github.com/mozilla-extensions/regrets-reporter/pull/28
https://github.com/mozilla/probe-scraper/pull/380
https://bugzilla.mozilla.org/show_bug.cgi?id=1744585

Flags: needinfo?(whd)
Flags: needinfo?(dthorn)

Hey Ashwin, Rob,

looks like that the history for the YAML files was not ported over to the monorepo. Unfortunately, to make this whole dance work, we need to carry over the history (or, rather, the commit-by-commit changes) for the two YAML files from the rally-core addon. Any chance you could kindly do it?

Then I could re-land the scraper changes.

Flags: needinfo?(rhelmer)
Flags: needinfo?(aagarwal)

(In reply to Alessio Placitelli [:Dexter] from comment #6)

Hey Ashwin, Rob,

looks like that the history for the YAML files was not ported over to the monorepo. Unfortunately, to make this whole dance work, we need to carry over the history (or, rather, the commit-by-commit changes) for the two YAML files from the rally-core addon. Any chance you could kindly do it?

Then I could re-land the scraper changes.

Yes we are planning to do this, will your tooling follow renames OK (e.g. git log --follow)? I know github has problems with them.

(In reply to Robert Helmer [:rhelmer] from comment #7)

(In reply to Alessio Placitelli [:Dexter] from comment #6)

Hey Ashwin, Rob,

looks like that the history for the YAML files was not ported over to the monorepo. Unfortunately, to make this whole dance work, we need to carry over the history (or, rather, the commit-by-commit changes) for the two YAML files from the rally-core addon. Any chance you could kindly do it?

Then I could re-land the scraper changes.

Yes we are planning to do this, will your tooling follow renames OK (e.g. git log --follow)? I know github has problems with them.

I'm not sure. But even something as simple as exporting the commits as patches are re-applying them in order would work for this. See comment 5.

I applied patches in this PR which is now merged: https://github.com/mozilla-rally/rally/pull/24

We should be ready to re-open the probe-scraper PR.

Flags: needinfo?(aagarwal) → needinfo?(alessio.placitelli)

probe scraper only checks commits in the linear history as returned by git log -- filename, and does not specify --follow, so https://github.com/mozilla-rally/rally/pull/24 was probably not sufficient because it needed to use a rebase merge (instead of a merge commit) to expose old versions of the file to probe scraper.

Flags: needinfo?(alessio.placitelli) → needinfo?(aagarwal)

Opened a new PR with a rebase merge: https://github.com/mozilla-rally/rally/pull/26

Flags: needinfo?(aagarwal)

Closed the PR; the repo should be ready now.

Flags: needinfo?(dthorn)
Flags: needinfo?(alessio.placitelli)
Flags: needinfo?(dthorn)

Hey :relud, looks like the PR from comment 13 was merged. Did it work? Can this be closed?

Flags: needinfo?(rhelmer)
Flags: needinfo?(dthorn)
Flags: needinfo?(alessio.placitelli)

FWIW I am not seeing the new "unenrollment" table in the analysis environment yet, but it's been less than 24 hours.

:Dexter looks like schema deploys worked, so yeah, this can be closed.

Ashwin: yeah, I expect, it will take until a few hours into tomorrow UTC before that is available.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(dthorn)
Resolution: --- → FIXED

Ashwin: yeah, I expect, it will take until a few hours into tomorrow UTC before that is available.

FWIW I am not seeing the new "unenrollment" table in the analysis environment yet, but it's been less than 24 hours.

Ashwin pinged me to get that PR merged before today's probe scraper run so the (intentional) issue here is that Rally schema updates do not automatically propagate, see https://mana.mozilla.org/wiki/pages/viewpage.action?pageId=124977458#Rally(akaPioneer)-Schemasupdatesdon'tautomaticallypropagatetoproduction.

I've deployed these tables to production and authorized them so you should have access to the tables via the analysis environment now.

I just looked in the analysis environment and I do indeed see the new "unenrollment" table.

However, I now also see an "onboarding" table that I don't think was there before. I looked through the commit history and it looks like an "onboarding" ping was introduced in this commit but subsequently removed in this commit and never re-added.

I'm guessing this is happening because that second commit where the "onboarding" ping was removed is also now being ignored by probe-scraper as per :relud's necessary patch. So the schema generation process never sees that ping get removed.

For the time being, I don't think it's a blocking issue for us to have an extraneous table (correct me if I'm wrong). But it would be good to get rid of it so as not to cause confusion. Let me know how we can proceed with that, including if I should file a new bug.

Flags: needinfo?(whd)
Flags: needinfo?(dthorn)
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

I'm guessing this is happening because that second commit where the "onboarding" ping was removed is also now being ignored by probe-scraper as per :relud's necessary patch. So the schema generation process never sees that ping get removed.

I haven't investigated this hypothesis closely but in order to remove this ping we will need to update the incompatibility allowlist in
https://github.com/mozilla/mozilla-schema-generator/commits/main/incompatibility-allowlist to allow the deletion to propagate.

For the time being, I don't think it's a blocking issue for us to have an extraneous table (correct me if I'm wrong).

This is correct.

But it would be good to get rid of it so as not to cause confusion. Let me know how we can proceed with that, including if I should file a new bug.

I'll leave :relud's NI so he can make a recommendation here.

An alternate (rally-specific) way we can hide the fact that this table exists is to exclude the authorized view from being published to the analysis project. This is similar to how we treat deletion-request pings. I'd recommend we simply delete the schemas and associated tables but this would be a low effort alternative.

Flags: needinfo?(whd)

(In reply to Ashwin Agarwal from comment #18)

I'm guessing this is happening because that second commit where the "onboarding" ping was removed is also now being ignored by probe-scraper as per :relud's necessary patch. So the schema generation process never sees that ping get removed.

normally we don't remove tables even when the ping is removed from glean, because we still accept data from every version of the application. If we want to delete that table anyway, we should ignore all commits where the ping exists in probe scraper and add it to the incompatibility allowlist like :whd mentioned

Flags: needinfo?(dthorn)

(In reply to Daniel Thorn [:relud] from comment #20)

normally we don't remove tables even when the ping is removed from glean, because we still accept data from every version of the application. If we want to delete that table anyway, we should ignore all commits where the ping exists in probe scraper and add it to the incompatibility allowlist like :whd mentioned

Ah ok that makes sense. I just ran some queries and it looks like the table is empty, so for us it makes sense to not have it in the analysis environment. I am okay with either deleting it or hiding it, :relud whatever you'd prefer?

Flags: needinfo?(dthorn)

let's delete it

Flags: needinfo?(dthorn)

The tables and views have been deleted.

Status: REOPENED → RESOLVED
Closed: 3 years ago3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: