Bug 1815432 Comment 2 Edit History

Yes, we would see duplicate metric errors in prod from this. 

After checking out [PR 551](https://github.com/mozilla/probe-scraper/pull/551) and applying [PR 558](https://github.com/mozilla/probe-scraper/pull/558) to fix duplicate metric checks, the following command shows approximately what would actually happen on a production run:

```sh
python3 -m probe_scraper.runner --dry-run --cache-dir tmp/cache --out-dir tmp/out --output-bucket=gs://probe-scraper-prod-artifacts --update --glean --glean-repo=firefox-desktop --glean-limit-date=$(date -d -1day +%F) --env=dev
```

---
(In reply to Jan-Erik Rediger [:janerik] from comment #0)
> The combination of --glean-limit-date, --update, etc. is not clear to me.

As seen in the example above, use `--update` with `--output-bucket=gs://probe-scraper-prod-artifacts` and `--glean-limit-date=$(date -d -1day +%F)` with `--glean-repo` (or `--glean-url`) to approximate how prod runs.

`--update` and `--output-bucket=gs://probe-scraper-prod-artifacts` ensure that previous results are pulled from the prod output bucket and "updated", instead of starting from scratch. Your normal credentials should be sufficient to read from the output bucket, but not write to it, and `--env=dev` (which is the default when not specified) ensures that you won't try to write to it.

`--glean-limit-date` with `--glean-repo` causes a shallow clone of the git repos, so that only commits on or after that UTC date are scraped. The effect is roughly that of batching all the push-mode updates that would have been sent in that period.
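
To make the "on or after that UTC date" cutoff concrete, here is a minimal sketch of the filtering idea. It is not probe-scraper's actual code: `default_limit_date` and `commits_in_window` are hypothetical helpers, and the commit list is made-up data. Note that `date -d -1day +%F` without `-u` uses the local time zone, so the sketch computes "yesterday" explicitly in UTC.

```python
from datetime import date, datetime, timedelta, timezone


def default_limit_date(now=None):
    """Return yesterday's date in UTC (hypothetical helper)."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (now.astimezone(timezone.utc) - timedelta(days=1)).date()


def commits_in_window(commits, limit_date):
    """Keep (sha, timestamp) pairs whose UTC commit date is on or after limit_date."""
    return [
        (sha, ts)
        for sha, ts in commits
        if ts.astimezone(timezone.utc).date() >= limit_date
    ]


# Made-up commits; timestamps must be timezone-aware.
example_commits = [
    ("aaa111", datetime(2023, 2, 6, 23, 59, tzinfo=timezone.utc)),
    ("bbb222", datetime(2023, 2, 7, 0, 0, tzinfo=timezone.utc)),
    # 20:00 at UTC-5 on Feb 6 is 01:00 UTC on Feb 7, so this one is kept.
    ("ccc333", datetime(2023, 2, 6, 20, 0, tzinfo=timezone(timedelta(hours=-5)))),
]

example_limit = default_limit_date(datetime(2023, 2, 8, 0, 30, tzinfo=timezone.utc))
kept = [sha for sha, _ in commits_in_window(example_commits, example_limit)]
print(example_limit.isoformat(), kept)
```

The third commit shows why the comparison is done after converting to UTC: its local date is before the limit, but its UTC date is not.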

---
(In reply to Jan-Erik Rediger [:janerik] from comment #0)
> So I would need someone to verify that the steps I ran are equivalent to what happens when we land all this.

The lack of `--update` and `--output-bucket` means the steps you ran wouldn't account for `gecko` still having crash metrics even after it no longer references `toolkit/components/crashes/metrics.yaml` in `repositories.yaml`.

Also, before [PR 558](https://github.com/mozilla/probe-scraper/pull/558) there would have incorrectly been no duplicate metric error in prod, which is why you had to go through extra steps to (correctly) trigger one.
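
The difference between a from-scratch run and an `--update` run can be sketched as follows. This is an illustration under stated assumptions, not probe-scraper's implementation: `merge_results` and `find_duplicates` are hypothetical helpers, the metric name is invented, and the sketch assumes the metrics file is now picked up via `firefox-desktop`.

```python
from collections import defaultdict


def merge_results(previous, scraped):
    """Union previously published metric names with newly scraped ones, per repo."""
    merged = {repo: set(names) for repo, names in previous.items()}
    for repo, names in scraped.items():
        merged.setdefault(repo, set()).update(names)
    return merged


def find_duplicates(metrics_by_repo):
    """Return {metric: sorted repos} for metrics defined in more than one repo."""
    owners = defaultdict(set)
    for repo, names in metrics_by_repo.items():
        for name in names:
            owners[name].add(repo)
    return {name: sorted(repos) for name, repos in owners.items() if len(repos) > 1}


# Hypothetical data: "crash.example_metric" stands in for a real crash metric.
previous = {"gecko": {"crash.example_metric"}}  # what the prod bucket still holds
scraped = {"gecko": set(), "firefox-desktop": {"crash.example_metric"}}

from_scratch = find_duplicates(scraped)
with_update = find_duplicates(merge_results(previous, scraped))
print(from_scratch)
print(with_update)
```

Without the previous results, `gecko` appears to have no crash metrics and nothing collides. Once the earlier output is merged in, the same metric is attributed to both repos, which is the situation a production run would hit.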
