Closed Bug 1529234 Opened 6 years ago Closed 6 years ago

Validate new pre-account ping

Categories

(Toolkit :: Telemetry, task, P1)

Points: 1

RESOLVED FIXED

People

(Reporter: janerik, Assigned: janerik)

References

Details

Once the pre-account ping has landed, we need to validate it.

  1. Is the interval as expected?
  2. Does it contain the right metrics?
Depends on: 1534730

[removed, that comment wasn't ready]

Assignee: nobody → jrediger
Type: enhancement → task
Points: --- → ?
Priority: P2 → P1
Points: ? → 1

Known issues: validation errors because of missing speedMhz. Will fix that in a PR to the schemas repository.

Validation notebook

Summary:
We have ~64k pings currently in (Is this low? high? I have no idea).
The majority of pings are with reason "shutdown" or "periodic". IMO that is as expected. We have a few logins and even fewer logouts.

80% of the pings contain the one scalar we collect.
Of the ones without the scalar, a majority are from short sessions (<10 minutes), so ... reasonable that they might just not have opened a URI?

For the durations:
We see a number of pings with negative durations (1.3% of all pings). This is worrisome and I filed bug 1545365.

We see a number of pings with durations over 48 hours, but only "shutdown" and "periodic" pings.
It's less than 1% and even less for builds after we switched from idle-daily to our own scheduler (though this might also be due to volume).
It's also all* from Windows, where sleep times are included, so it's a little bit expected (maybe we should eventually do something about that...)

(* except one single ping from a Linux machine)
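
The duration checks above can be sketched as a small classifier over session durations (an illustration only; the threshold and field semantics are assumptions, not the notebook's actual code):

```python
# Flag suspicious session durations: negative (bug 1545365 territory)
# or longer than 48 hours (mostly Windows, where sleep time is included).
SECONDS_48H = 48 * 60 * 60  # 172800 seconds

def classify_duration(seconds: int) -> str:
    """Bucket a ping's session duration for validation purposes."""
    if seconds < 0:
        return "negative"
    if seconds > SECONDS_48H:
        return "too-long"
    return "ok"
```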

:chutten, could you give this a look and double-check that my analysis makes sense? Did I miss anything or misinterpret the data? Do we need to fix anything besides the 2 bugs mentioned above?

Flags: needinfo?(chutten)

Context-free validation notes:

  1. Please pin the submission dates that you're using. Either use the same ones everywhere in WHERE clauses or parameterize them or restrict your view.
  2. Drop Cmd 12 in favour of Cmd 13
  3. I'd like to see if the scheduler refactor introduced a change in the distribution of negative durations
  4. No client_id checks? We could see how many pings per client per day we're receiving (maybe all the nonsense is from a single client?)
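
Item 1 could look something like a single parameterized view that every later cell selects from (a sketch only; the table and view names here are hypothetical, not the notebook's real ones):

```python
# Pin the analysis period in ONE place and reuse it, rather than
# repeating date literals in every WHERE clause.
SUBMISSION_START = "20190410"
SUBMISSION_END = "20190418"

def pings_view_sql(start: str = SUBMISSION_START, end: str = SUBMISSION_END) -> str:
    """SQL for the single view all later queries should select from."""
    return (
        "CREATE OR REPLACE TEMPORARY VIEW pre_account_pings AS "
        "SELECT * FROM pre_account_raw "  # hypothetical source table
        f"WHERE submission_date_s3 > '{start}' "
        f"AND submission_date_s3 < '{end}'"
    )
```

In a Spark notebook this would be run once (e.g. `spark.sql(pings_view_sql())`) and every subsequent command would query `pre_account_pings` without repeating the dates.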

(In reply to Jan-Erik Rediger [:janerik] from comment #2)

> Known issues: validation errors because of missing speedMhz. Will fix that in a PR to the schemas repository.

What impact does this have on your analysis?

> Summary:
> We have ~64k pings currently in (Is this low? high? I have no idea).

Taking a look at the number of "main" pings over the same builds and dates (buildid > 20190326, 20190410 < submission_date_s3 < 20190418) we see about 1M of them. Now, we don't send "main" pings at the same time as "pre-account" pings... but at the very least the number of "shutdown"-reason "main" pings (676K) should be within an order of magnitude of the number of "shutdown"-reason "pre-account" pings (51K) (minus the proportion of users who are logged in to FxA (just under 12%))...

I'm thinking this is rather (an order of magnitude) low. Enough lower that it might impact the entire analysis.

> The majority of pings are with reason "shutdown" or "periodic". IMO that is as expected. We have a few logins and even fewer logouts.

We could (and probably should) cross-compare with "sync" pings to see what the rates of logins/logouts were over the population and period under study.

> 80% of the pings contain the one scalar we collect.
> Of the ones without the scalar a majority are from short sessions (<10 minutes), so ... reasonable that they might just not have opened a URI?

Perhaps. It's still a little lower than I'd expect. We can cross-check between counts of total_uri_count-missing "pre-account"/"shutdown" and "main"/"shutdown" to ensure we're within an order of magnitude of correct.

> For the durations:
> We see a number of pings with negative durations (1.3% of all pings). This is worrisome and I filed bug 1545365.

Good choice. It's higher than we'd like. Though, given the overall low number of pings, we might find upon reexamination that the proportion is much smaller and that this subpopulation is unusually likely to contain weird pings.

> We see a number of pings with durations over 48 hours, but only "shutdown" and "periodic" pings.
> It's less than 1% and even less for builds after we switched from idle-daily to our own scheduler (though this might also be due to volume).
> It's also all* from Windows, where sleep times are included, so it's a little bit expected (maybe we should eventually do something about that...)
>
> (* except one single ping from a Linux machine)

Is this far outside the expected proportion of Linux users compared to Windows? 1% seems about right (though maybe it's higher on Nightly).

> :chutten, could you give this a look and double-check that my analysis makes sense? Did I miss anything or misinterpret the data? Do we need to fix anything besides the 2 bugs mentioned above?

One big problem, a couple of small omissions, and some possibilities for tightening up a future edition.
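
The order-of-magnitude sanity check works out roughly like this, using the counts quoted in this comment (the arithmetic is illustrative, not from the notebook):

```python
# Compare "shutdown"-reason main pings to "shutdown"-reason pre-account
# pings, discounting the just-under-12% of users logged in to FxA
# (who, per the comment above, shouldn't be sending pre-account pings).
main_shutdown = 676_000
pre_account_shutdown = 51_000
fxa_logged_in = 0.12

expected_pre_account = main_shutdown * (1 - fxa_logged_in)  # ~595k
shortfall = expected_pre_account / pre_account_shutdown     # ~11.7x
# A shortfall of more than 10x is "an order of magnitude low".
```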

Flags: needinfo?(chutten)

(In reply to Chris H-C :chutten from comment #3)

> 1. Please pin the submission dates that you're using. Either use the same ones everywhere in WHERE clauses or parameterize them or restrict your view.

Done. There is now one view, limited to the right dates.

> 2. Drop Cmd 12 in favour of Cmd 13

Uhm, damn. Now I've mixed and edited some of them. I think it was the "Percentage of pings with negative duration across all pings" that I should drop, keeping the one separated by reason?

> 3. I'd like to see if the scheduler refactor introduced a change in the distribution of negative durations

I think we don't have enough data for that (17k with negative durations before the change, 600k with negative durations after the change).

> 4. No client_id checks? We could see how many pings per client per day we're receiving (maybe all the nonsense is from a single client?)

We ... don't have a client ID, that's the whole point.

> > Known issues: validation errors because of missing speedMhz. Will fix that in a PR to the schemas repository.
>
> What impact does this have on your analysis?

> > Summary:
> > We have ~64k pings currently in (Is this low? high? I have no idea).
>
> Taking a look at the number of "main" pings over the same builds and dates (buildid > 20190326, 20190410 < submission_date_s3 < 20190418) we see about 1M of them. Now, we don't send "main" pings at the same time as "pre-account" pings... but at the very least the number of "shutdown"-reason "main" pings (676K) should be within an order of magnitude of the number of "shutdown"-reason "pre-account" pings (51K) (minus the proportion of users who are logged in to FxA (just under 12%))...
>
> I'm thinking this is rather (an order of magnitude) low. Enough lower that it might impact the entire analysis.

Turns out: merged doesn't mean deployed. I'll file a bug with the right team and try to get that more clearly documented (and maybe add a way to know what is deployed and running?).

We now have for the pre-account ping in the period 20190417 to 20190423:

  • 628k pings total
    • 523k reason=shutdown
    • 103k reason=periodic

For the main ping in the same period:

  • 546k reason=shutdown
  • 106k reason=daily
  • 228k reason=environment-change

> > The majority of pings are with reason "shutdown" or "periodic". IMO that is as expected. We have a few logins and even fewer logouts.
>
> We could (and probably should) cross-compare with "sync" pings to see what the rates of logins/logouts were over the population and period under study.

  • We saw 1.2k pre-account pings with reason "login".
  • For the same timeframe there were 1.8k sync events with why: login

That's at least within the same order of magnitude.

> > 80% of the pings contain the one scalar we collect.
> > Of the ones without the scalar a majority are from short sessions (<10 minutes), so ... reasonable that they might just not have opened a URI?
>
> Perhaps. It's still a little lower than I'd expect. We can cross-check between counts of total_uri_count-missing "pre-account"/"shutdown" and "main"/"shutdown" to ensure we're within an order of magnitude of correct.

  • For pre-account/shutdown we have a ratio of 78%/22% for having/missing the total_uri_count.
  • For main/shutdown we have a ratio of 76%/24% for having/missing the total_uri_count.

> > We see a number of pings with durations over 48 hours, but only "shutdown" and "periodic" pings.
> > It's less than 1% and even less for builds after we switched from idle-daily to our own scheduler (though this might also be due to volume).
> > It's also all* from Windows, where sleep times are included, so it's a little bit expected (maybe we should eventually do something about that...)
> >
> > (* except one single ping from a Linux machine)
>
> Is this far outside the expected proportion of Linux users compared to Windows? 1% seems about right (though maybe it's higher on Nightly).

Ah right, proportionally this is correct.
Apart from that, looking at the scheduler change is not meaningful; we just don't have enough data from before it (as mentioned above).
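
The "separated by reason" breakdown amounts to something like this (synthetic example rows; the real notebook operates on the pings view, not on Python lists):

```python
# Percentage of pings with negative duration, split by reason,
# rather than a single percentage across all pings.
from collections import Counter

pings = [  # synthetic example data
    {"reason": "shutdown", "duration": 1200},
    {"reason": "shutdown", "duration": -5},
    {"reason": "periodic", "duration": 86_400},
    {"reason": "login", "duration": 30},
]

totals = Counter(p["reason"] for p in pings)
negatives = Counter(p["reason"] for p in pings if p["duration"] < 0)
pct_negative = {r: 100.0 * negatives[r] / totals[r] for r in totals}
# e.g. {'shutdown': 50.0, 'periodic': 0.0, 'login': 0.0}
```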

Flags: needinfo?(chutten)

(In reply to Jan-Erik Rediger [:janerik] from comment #4)

> > 3. I'd like to see if the scheduler refactor introduced a change in the distribution of negative durations
>
> I think we don't have enough data for that (17k with negative durations before the change, 600k with negative durations after the change).

Ah well, preliminary counts look like it's a lower proportion, but it doesn't go away completely.

> > 4. No client_id checks? We could see how many pings per client per day we're receiving (maybe all the nonsense is from a single client?)
>
> We ... don't have a client ID, that's the whole point.

*facepalm* D'oh. I knew that; why didn't I think of that? I was too busy hoping we could explain it all away with a single, noisy client.

> > > Known issues: validation errors because of missing speedMhz. Will fix that in a PR to the schemas repository.
> >
> > What impact does this have on your analysis?
>
> > > Summary:
> > > We have ~64k pings currently in (Is this low? high? I have no idea).
> >
> > Taking a look at the number of "main" pings over the same builds and dates (buildid > 20190326, 20190410 < submission_date_s3 < 20190418) we see about 1M of them. Now, we don't send "main" pings at the same time as "pre-account" pings... but at the very least the number of "shutdown"-reason "main" pings (676K) should be within an order of magnitude of the number of "shutdown"-reason "pre-account" pings (51K) (minus the proportion of users who are logged in to FxA (just under 12%))...
> >
> > I'm thinking this is rather (an order of magnitude) low. Enough lower that it might impact the entire analysis.
>
> Turns out: merged doesn't mean deployed. I'll file a bug with the right team and try to get that more clearly documented (and maybe add a way to know what is deployed and running?).
>
> We now have for the pre-account ping in the period 20190417 to 20190423:
>
>   • 628k pings total
>     • 523k reason=shutdown
>     • 103k reason=periodic
>
> For the main ping in the same period:
>
>   • 546k reason=shutdown
>   • 106k reason=daily
>   • 228k reason=environment-change

Looking good.

> > > The majority of pings are with reason "shutdown" or "periodic". IMO that is as expected. We have a few logins and even fewer logouts.
> >
> > We could (and probably should) cross-compare with "sync" pings to see what the rates of logins/logouts were over the population and period under study.
>
>   • We saw 1.2k pre-account pings with reason "login".
>   • For the same timeframe there were 1.8k sync events with why: login
>
> That's at least within the same order of magnitude.

So few it's hard to say for certain, but it's not inconsistent with the hypothesis that both are tracking the same events.

> > > 80% of the pings contain the one scalar we collect.
> > > Of the ones without the scalar a majority are from short sessions (<10 minutes), so ... reasonable that they might just not have opened a URI?
> >
> > Perhaps. It's still a little lower than I'd expect. We can cross-check between counts of total_uri_count-missing "pre-account"/"shutdown" and "main"/"shutdown" to ensure we're within an order of magnitude of correct.
>
>   • For pre-account/shutdown we have a ratio of 78%/22% for having/missing the total_uri_count.
>   • For main/shutdown we have a ratio of 76%/24% for having/missing the total_uri_count.

This is not at all what I expected. It shows that pre-account is doing as well as main, but... wow. Hm. I didn't realize so few main pings contained that scalar.

> > > We see a number of pings with durations over 48 hours, but only "shutdown" and "periodic" pings.
> > > It's less than 1% and even less for builds after we switched from idle-daily to our own scheduler (though this might also be due to volume).
> > > It's also all* from Windows, where sleep times are included, so it's a little bit expected (maybe we should eventually do something about that...)
> > >
> > > (* except one single ping from a Linux machine)
> >
> > Is this far outside the expected proportion of Linux users compared to Windows? 1% seems about right (though maybe it's higher on Nightly).
>
> Ah right, proportionally this is correct.

Not so much any more. Only 5 too-long pings from Linux and ~2k from Windows is disproportionate. Might be a "bug 1535632 and friends"-type problem.

> Apart from that, looking at the scheduler change is not meaningful; we just don't have enough data from before it (as mentioned above).

Fair.


So it seems we now have enough data to say that negative durations are definitely a problem we should look into.

In addition, we have some outrageous claims of numbers of URIs and lengths of sessions which should at least be documented (in the in-tree docs?).

Aside from those, it looks like it's behaving well: the validation checks out.

Now, if you wanted to confirm that multi-store is working well, we're still missing the piece where we make sure the distribution of total_uri_count didn't change as reported by the "main" ping over the period when pre-account started being sent. For that we can just check TMO, which seems to report findings consistent with that hypothesis: https://mzl.la/2ZrCsYO
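
That missing multi-store check could be approximated as a before/after comparison of summary statistics for total_uri_count in "main" pings (synthetic samples below; in practice this is what the TMO dashboards show):

```python
# Compare the total_uri_count distribution before/after the pre-account
# ping shipped; a large shift in the median would suggest multi-store
# problems (double-counting or lost increments).
from statistics import median

before = [0, 2, 5, 7, 12, 30]  # synthetic samples
after = [0, 3, 5, 8, 11, 29]   # synthetic samples

shift = abs(median(after) - median(before))
unchanged = shift <= 1  # tolerate small drift between populations
```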

Flags: needinfo?(chutten)
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED