Closed Bug 1351396 Opened 7 years ago Closed 7 years ago

Validate the "new-profile" ping data for the Nightly channel

Categories

(Toolkit :: Telemetry, enhancement, P1)

RESOLVED FIXED
Tracking Status
firefox55 --- affected

People

(Reporter: Dexter, Assigned: Dexter)

References

Details

(Whiteboard: [measurement:client])

Once bug 1120370 lands we should make sure to validate the incoming data on our servers. We should at least check:

- That the install ping is received within a reasonable time after the profile creation.
- That the subsessionCounter behaves as expected (since we're on a fresh profile, this should be 1 for the ping and the profile)
- That the attribution codes are available
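
The three checks above could be sketched roughly as follows. This is only an illustration, not the analysis code: the ping layout used here (`environment.profile.creationDate` in days since the Unix epoch, `payload.subsessionCounter`, `environment.settings.attribution`), the `received_at` server timestamp and the two-day threshold are all assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)
# Assumed threshold: creationDate only has day granularity, so allow two days.
MAX_DELAY = timedelta(days=2)


def validate_new_profile_ping(ping, received_at):
    """Return the names of the failed checks (empty list if all pass).

    `ping` is a plain dict whose layout is an assumption of this sketch.
    """
    failures = []
    env = ping.get("environment", {})

    # 1. The ping should arrive within a reasonable time after profile creation.
    creation_days = env.get("profile", {}).get("creationDate")
    if creation_days is None:
        failures.append("missing-creation-date")
    else:
        delay = received_at - (EPOCH + timedelta(days=creation_days))
        if not timedelta(0) <= delay <= MAX_DELAY:
            failures.append("unreasonable-delay")

    # 2. On a fresh profile the subsession counter should be 1.
    if ping.get("payload", {}).get("subsessionCounter") != 1:
        failures.append("bad-subsession-counter")

    # 3. The attribution codes should be present and non-empty.
    if not env.get("settings", {}).get("attribution"):
        failures.append("missing-attribution")

    return failures
```

Returning the list of failed checks, rather than a single boolean, makes it easy to count how often each check fails across all pings.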
Blocks: 1351394
Priority: -- → P3
Whiteboard: [measurement:client]
Blocks: 1351397
No longer blocks: 1351397
Summary: Validate the "install" ping data for the Nightly channel → Validate the "new-profile" ping data for the Nightly channel
Priority: P3 → P1
Assignee: nobody → alessio.placitelli
Chris, could you do a quick sanity check of the gist at [1]? This is a work-in-progress analysis, not yet ready for a full-fledged review.

The conclusions so far:

- (bad) we might be failing to mark the 'new-profile' ping as sent and resending it for some clients;
- (bad) the profile creation date (in the environment) might be very, very wrong on some profiles (but we knew that already);
- (bad) the pingsender is failing often, on some machines, and the pings are being picked up again by Firefox and sent;
- (good) 70% of the 'new-profile' ping creation dates match exactly with the profile creationDate in the environment;


[1] - https://gist.github.com/Dexterp37/c4fbe5455ddbec4871d0f4154bbbef82
Flags: needinfo?(chutten)
(In reply to Alessio Placitelli [:Dexter] from comment #1)
> Chris, could you do a quick sanity check of the gist at [1]? This is a
> work-in-progress analysis, not yet ready for a full-fledged review.

Did you consider looking into whether the sessionLength of sessions sending new-profile on shutdown was, indeed, < 30min?

You use pct before you define it. (Also, you can multiply by 100.0 instead of casting to float, if you want.)
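
A small sketch of both points. The `pct` helper below is a hypothetical stand-in (the gist's own definition is not reproduced here): it is defined before its first use, and it multiplies by 100.0 so the division is floating-point without an explicit cast.

```python
def pct(part, total):
    """Return `part` as a percentage of `total`."""
    # Multiplying by 100.0 first makes the whole expression floating-point,
    # so no float() cast is needed (this matters under Python 2).
    return 100.0 * part / total


# The helper is defined above, before this first use.
duplicate_rate = pct(2, 1000)
```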

A duplicated docId rate of 0.2% is nice and low. (Compare with the 1% duplicate rate for main/crash pings, even before pingsender.) However, keep in mind that we drop duplicates at ingestion these days. Talk to :trink to get the "true" docId dupe rate for new-profile.

You should use the deduped-by-docid subset instead of subset when you're looking at the reasons (In [39]).

Also for In [39], order is important, so the order in which reduceByKey combines the values should be based on Timestamp.

Then you can map them to (reasonArray, 1) and countByKey() to get a more condensed representation of how this goes wrong.
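
To make the last three suggestions concrete, here is a minimal plain-Python sketch: dedupe by docId, order each client's pings by timestamp, then count identical reason sequences. It stands in for the Spark `reduceByKey`/`countByKey` steps rather than reproducing the notebook; the ping keys (`docId`, `clientId`, `timestamp`, `reason`) are assumed names.

```python
from collections import Counter, defaultdict


def reason_sequences(pings):
    """Count the ordered sequences of ping reasons seen per client.

    Each ping is a dict with the assumed keys docId, clientId,
    timestamp and reason.
    """
    # Keep one ping per docId (the earliest), i.e. the deduped subset.
    by_doc = {}
    for ping in sorted(pings, key=lambda p: p["timestamp"]):
        by_doc.setdefault(ping["docId"], ping)

    # Collect (timestamp, reason) pairs per client.
    per_client = defaultdict(list)
    for ping in by_doc.values():
        per_client[ping["clientId"]].append((ping["timestamp"], ping["reason"]))

    # Sort each client's pairs by timestamp so the sequence order is
    # deterministic, then count identical sequences across clients.
    return Counter(
        tuple(reason for _, reason in sorted(entries))
        for entries in per_client.values()
    )
```

Sorting explicitly avoids depending on the order in which values happen to be combined, which is the concern with an order-sensitive reduce.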

You say "Most of the duplicate pings are being sent at shutdown." but it is also true that most of the non-duplicate pings are sent at shutdown as well. Getting a proportion to compare with the 75/25 from In [23] would be illustrative.
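
One way to get that proportion, sketched in plain Python. It assumes that a "duplicate" means more than one new-profile ping from the same clientId, and that each ping is a dict with `clientId` and `reason` keys; both are assumptions of this example.

```python
from collections import Counter


def shutdown_share_by_group(pings):
    """Return the share of 'shutdown' pings for clients that sent a single
    ping ('single') and for clients that sent several ('multiple')."""
    pings_per_client = Counter(p["clientId"] for p in pings)
    totals = Counter()
    shutdowns = Counter()
    for p in pings:
        group = "multiple" if pings_per_client[p["clientId"]] > 1 else "single"
        totals[group] += 1
        if p["reason"] == "shutdown":
            shutdowns[group] += 1
    # Only groups that actually occur are reported, so no division by zero.
    return {group: shutdowns[group] / totals[group] for group in totals}
```

If the two shares are close, the claim about duplicates says little beyond the overall shutdown/startup split.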

Need a concluding statement above "Does the profileCreationDate match the date we received the pings?" to say something about how misbehaving clients' new-profile pings are twice as likely to be sent by pingsender than behaving clients'... but it's tricky, since pingsender only _gets_ one chance to send things, so this might not actually be as damning as we expect. After all, our definition of misbehaviour precludes pingsender from being most common.

> The conclusions so far:
> 
> - (bad) we might be failing to mark the 'new-profile' ping as sent and
> resending it for some clients;

Would this account for duped docids or duped client ids? Or both?

> - (bad) the profile creation date (in the environment) might be very, very
> wrong on some profiles (but we knew that already);

We did know that? Got a bug#?

> - (bad) the pingsender is failing often, on some machines, and the pings are
> being picked up again by Firefox and sent;

To be fair, the second half of that is a good thing. If we didn't see them being picked up again, we would have to worry about whether we're receiving all of the new-profile pings we generate.

> - (good) 70% of the 'new-profile' ping creation dates match exactly with the
> profile creationDate in the environment;

Even being a day off is probably okay (if, for example, it was sent by TelemetrySend instead of pingsender).
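
A tolerant comparison along those lines could look like the sketch below. It assumes the ping creation date is available as a `date` and the profile creationDate as days since the Unix epoch; the function name and the default one-day tolerance are choices made for this example.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)


def creation_dates_match(ping_created, profile_creation_days, tolerance_days=1):
    """Return True if the ping creation date is within `tolerance_days`
    of the profile creation date (given in days since the Unix epoch)."""
    profile_created = UNIX_EPOCH + timedelta(days=profile_creation_days)
    return abs((ping_created - profile_created).days) <= tolerance_days
```

Calling it with `tolerance_days=0` gives an exact-match check.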

All of your conclusions seem backed by results found in the analysis and the analysis appears correct (if a little under-populated). Once tidied and accepted at reports.tmo we can make plans about what to do about these findings.
Flags: needinfo?(chutten)
(In reply to Chris H-C :chutten from comment #2)
> (In reply to Alessio Placitelli [:Dexter] from comment #1)
> > Chris, could you do a quick sanity check of the gist at [1]? This is a
> > work-in-progress analysis, not yet ready for a full-fledged review.
> 
> Did you consider looking into whether the sessionLength of sessions sending
> new-profile on shutdown was, indeed, < 30min?

Good suggestion, I've just added this to the analysis at reports.tmo (https://github.com/mozilla/mozilla-reports/pull/65).

> You say "Most of the duplicate pings are being sent at shutdown." but it is
> also true that most of the non-duplicate pings are sent at shutdown as well.
> Getting a proportion to compare with the 75/25 from In [23] would be
> illustrative.

Good point. That doesn't look too bad compared to that number: we're at about 1%.

> > The conclusions so far:
> > 
> > - (bad) we might be failing to mark the 'new-profile' ping as sent and
> > resending it for some clients;
> 
> Would this account for duped docids or duped client ids? Or both?

Duped client ids.

> > - (bad) the profile creation date (in the environment) might be very, very
> > wrong on some profiles (but we knew that already);
> 
> We did know that? Got a bug#?

Heh, nope. Just realized that it happened when performing other analyses using the creationDate.
We had some dates way earlier than the Firefox release date or way ahead in the future.
I'm not sure if there's a bug for that, though, or how big of an issue that is.
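
For reference, a rough plausibility filter for such dates might look like this. It assumes creationDate is expressed in days since the Unix epoch and uses the Firefox 1.0 release (2004-11-09) as an assumed lower bound; the labels are made up for the example.

```python
from datetime import date, timedelta

UNIX_EPOCH = date(1970, 1, 1)
# Assumed lower bound: the Firefox 1.0 release date.
EARLIEST_PLAUSIBLE = date(2004, 11, 9)


def classify_creation_date(creation_days, today):
    """Label a profile creation date (days since the Unix epoch)."""
    try:
        created = UNIX_EPOCH + timedelta(days=creation_days)
    except OverflowError:
        # The value is so large or small that it is not a representable date.
        return "out-of-range"
    if created < EARLIEST_PLAUSIBLE:
        return "too-early"
    if created > today:
        return "in-future"
    return "plausible"
```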

> > - (bad) the pingsender is failing often, on some machines, and the pings are
> > being picked up again by Firefox and sent;
> 
> To be fair, the second half of that is a good thing. If we didn't see them
> being picked up again, we would have to worry about whether we're receiving
> all of the new-profile pings we generate.

Indeed. Moreover, looking again at the analysis, this doesn't seem to be a problem for us.

> All of your conclusions seem backed by results found in the analysis and the
> analysis appears correct (if a little under-populated). Once tidied and
> accepted at reports.tmo we can make plans about what to do about these
> findings.

Ok, the analysis is in review!
Saptarshi, a preliminary analysis of the 'new-profile' ping data coming from Nightly is available at https://github.com/mozilla/mozilla-reports/pull/65 !

Feel free to drop comments/leave your feedback there too!
Flags: needinfo?(sguha)
The analysis was reviewed and merged. It's available at [1]. The main points:

- the 'new-profile' pings are sent with the correct reason, 'startup', when sent on sessions of at least 30 minutes;
- the 'new-profile' pings are sent with the correct reason, 'shutdown', if the session terminates earlier than 30 minutes;
- we have a low duplicates rate, 0.2%;
- some clients are sending multiple 'new-profile' pings, even if they shouldn't; this could be due to crashes or some other bug in the client; however, we do not have enough data to support either of the two cases.
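
The reason rule described in the first two points can be written down as a small check. This is a sketch, not the client or analysis code; the function names are hypothetical and the session length is assumed to be in seconds.

```python
THRESHOLD_SECONDS = 30 * 60  # the 30 minute cutoff mentioned above


def expected_reason(session_length_seconds):
    """Return the reason a 'new-profile' ping should carry for a session."""
    if session_length_seconds >= THRESHOLD_SECONDS:
        return "startup"
    return "shutdown"


def reason_is_consistent(reason, session_length_seconds):
    """Return True if the reported reason matches the session length."""
    return reason == expected_reason(session_length_seconds)
```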

Closing this as fixed for now, with the objective of running the analysis again on Beta after the SF workweek (bug 1366819).

[1] - https://github.com/mozilla/mozilla-reports/blob/master/projects/newprofile_ping_nightly_validation.kp/knowledge.md
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Flags: needinfo?(sguha)