Put provenance attribution in `installation.first_seen` event
Categories
(Firefox :: Installer, task, P2)
Tracking
()
People
(Reporter: bytesized, Assigned: bytesized)
References
Details
(Whiteboard: [fidedi-ope])
Attachments
(1 file)
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-beta+
|
Details | Review |
Filing this as a followup bug to do the work described by this comment:
The [
installation.first_seen
] event will be on the "event" ping and the Scalars will be on the "new-profile" ping. You should be able to link the two by client_id (since there should only ever be one "new-profile" ping per client_id). There's no guarantee you'd get both -- I can think of circumstances where you'd end up with only one or the other -- but in the case of folks who stick around and aren't obviously robots with fifteen second initial sessions? I think you should have a decent chance of being able to link.But by all means dump the scalar info into the event extras to save yourself the hassle.
Remember, so long as we're not talking event-per-vsync sorts of things, optimize not for the size of the dataset: optimize for the length of time it takes to analyze the data.
Or, put another way: data that is too hard to use will not be used. We don't want to collect data that won't be used. So make it easy.
Updated•2 years ago
|
Assignee | ||
Comment 1•2 years ago
|
||
It's not actually clear to me that we can do this. Events appear to be limited to 10 keys in their extras
object. There already appear to be 10 keys in the extras object for installation.first_seen
:
version
build_id
admin_user
profdir_existed
from_msi
silent
default_path
install_existed
other_inst
other_msix_inst
Comment 2•2 years ago
|
||
(In reply to Robin Steuber (they/them) [:bytesized] from comment #1)
It's not actually clear to me that we can do this. Events appear to be limited to 10 keys in their
extras
object. There already appear to be 10 keys in the extras object forinstallation.first_seen
:
version
build_id
admin_user
profdir_existed
from_msi
silent
default_path
install_existed
other_inst
other_msix_inst
Oh dear. TIL. chutten, is there anything to be done here? Can we have another level of nesting to avoid these limits some way?
Assignee | ||
Updated•2 years ago
|
Comment 3•2 years ago
|
||
Nesting isn't supported because these string -> string maps are interpreted as that structure on ingestion.
(And I don't recommend encoding your own second-level nesting in the event extra values either. No one should write parsers in SQL).
With present technology I'd recommend having a second event with a similar name (maybe with a suffix like _ext
) with the extra extras. There's a possibility we might only receive one or the other event instead of both, but that should only happen if the "event" ping splits between them and one of the two pings fails to be received properly (ie, should basically never happen).
Otherwise or as well, we could look into raising the event extras limit (might just be a matter of raising the limits). Changes to Legacy Telemetry are still possible, after all. But we'd need to consider the ramifications. Glean has the same limit, after all, so we may not have yet had a proper think about the impact of large numbers of event extras. (e.g., should we change the event record trigger for flushing storage and submitting the ping?)
Assignee | ||
Comment 4•2 years ago
|
||
(In reply to Chris H-C :chutten from comment #3)
With present technology I'd recommend having a second event with a similar name (maybe with a suffix like
_ext
) with the extra extras. There's a possibility we might only receive one or the other event instead of both, but that should only happen if the "event" ping splits between them and one of the two pings fails to be received properly (ie, should basically never happen).
@nalexander does this sound like an improvement over the current situation? Should I get started on this?
Comment 5•2 years ago
•
|
||
(In reply to Robin Steuber (they/them) [:bytesized] from comment #4)
(In reply to Chris H-C :chutten from comment #3)
With present technology I'd recommend having a second event with a similar name (maybe with a suffix like
_ext
) with the extra extras. There's a possibility we might only receive one or the other event instead of both, but that should only happen if the "event" ping splits between them and one of the two pings fails to be received properly (ie, should basically never happen).@nalexander does this sound like an improvement over the current situation? Should I get started on this?
Argh -- a partially typed comment that I missed finishing. Anyway:
Thanks, :chutten.
:bytesized: yes, I think we'd better to do this, since I'm not having a lot of luck joining installation.first_seen
events to new-profile
pings in either direction.
Earlier I witnessed the reverse, lots (>50%) of new_profile
pings without installation.first_seen
events, over in https://bugzilla.mozilla.org/show_bug.cgi?id=1811374. (That is expected, prompting that ticket.) See https://sql.telemetry.mozilla.org/queries/90885/source?p_range=2022-12-01--2022-12-28.
But now I witness lots of installation.first_seen
events without corresponding new_profile
pings, perhaps >60%! See https://sql.telemetry.mozilla.org/queries/90886/source?p_range=2022-11-01--2022-11-29.
So yes, we'd better get an additional event, in the hopes that we'll get two events and be able to actually have coherent data from the single system. Thanks!
Assignee | ||
Updated•2 years ago
|
Assignee | ||
Comment 6•2 years ago
|
||
We can't add the provenance data to the installation.first_seen
extra data because it is already at its maximum number of keys. So instead we will add the installation.first_seen_prov_ext
event which will be sent at the same time as installation.first_seen
and will contain provenance attribution data in its extras object.
Assignee | ||
Comment 7•2 years ago
|
||
I confirmed with :chutten that since this telemetry sends precisely the same values that are sent by the code added in Bug 1815023, no additional data review is needed for this telemetry change.
Comment 9•2 years ago
|
||
bugherder |
Assignee | ||
Comment 10•2 years ago
|
||
Comment on attachment 9322883 [details]
Bug 1821189 - Associate provenance attribution with installation.first_seen
event r=nalexander!
Beta/Release Uplift Approval Request
- User impact if declined: This is a telemetry-only change. So there would be no impact to users, but it will negatively impact our ability to analyze what sort of attribution data is available in the Zone Identifier Alternate Data Stream on Windows.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Code changes are telemetry-only. And they are wrapped in
try/catch
blocks. So even if there is something wrong with the code, there should be no user-facing impact. - String changes made/needed: None
- Is Android affected?: No
Comment 11•2 years ago
|
||
Comment on attachment 9322883 [details]
Bug 1821189 - Associate provenance attribution with installation.first_seen
event r=nalexander!
Approved for 112.0b3
Comment 12•2 years ago
|
||
bugherder uplift |
Description
•