Preliminary ad hoc System Validation of Messaging System reinstrumentation
Categories
(Toolkit :: Telemetry, task)
People
(Reporter: chutten, Unassigned)
Details
Once the reinstrumentation of Messaging System reaches Nightly and begins sending data, we can start taking a casual ad hoc look at the data that's being received and conduct iterative system validation.
This should include (but is of course not limited to):
- Ensuring the descriptions are as complete and correct as they can be
- Ensuring we didn't miss any fields (showing up in invalid_nested_data or unknown_keys)
- Ensuring the data is of comparable shape and content to its predecessors in the messaging_system.* dataset namespace
Comment 1•2 years ago
Also including JSON parse errors
Comment 2•2 years ago (Reporter)
Some nuance on the JSON parse error: some event_context values are coming in that aren't JSON (they can be bare strings like "whats-new-panel" for the "undesired-events" messaging_system.ping_type). This appears to be deliberate and doesn't affect the ability of the basic messaging_system.event_context to report the value, so it isn't itself an error, even though the client is appropriately catching a ParseError.
Put another way: a non-zero value of messaging_system.event_context_parse_error is not sufficient evidence that the reinstrumentation is invalid. The rest of the ping payload needs to be examined to see whether it is a case like "undesired-events".
(( Note that the scope of this bug includes adding discoveries such as these to the metrics' descriptions or at least as Gleannotations to aid future analysts ))
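A minimal sketch of the triage described above, assuming the raw event_context arrives as a string. The function names and the `EXPECTED_BARE_STRING_PING_TYPES` allowlist are hypothetical and not part of the instrumentation; the only case confirmed in this bug is "undesired-events".

```python
import json

# Hypothetical allowlist: ping types known (per this bug) to send bare-string contexts.
EXPECTED_BARE_STRING_PING_TYPES = {"undesired-events"}


def classify_event_context(raw):
    """Return ("json", parsed) if raw is valid JSON, else ("bare_string", raw)."""
    try:
        return ("json", json.loads(raw))
    except json.JSONDecodeError:
        # A bare string such as whats-new-panel is not valid JSON,
        # but the raw value is still usable, so keep it.
        return ("bare_string", raw)


def needs_closer_look(ping_type, raw_context):
    """Flag parse failures only when the ping type is not expected to send bare strings."""
    kind, _ = classify_event_context(raw_context)
    return kind == "bare_string" and ping_type not in EXPECTED_BARE_STRING_PING_TYPES
```

With this shape, a parse error on an "undesired-events" ping is treated as expected, while the same error on any other ping type is surfaced for a manual look at the rest of the payload.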
Comment 3•2 years ago
Initial Exploration Notebook: https://colab.research.google.com/drive/1ieOZziL3QI8mj8xU-xByFDRJPYsBD8TV#scrollTo=kcU16XNrgmiz
This uses a sample of 5000 pings.
Some interesting findings:
First of all, we don't see data that is very obviously wrong. Always a good sign!
It is worth rethinking context parse error; I will investigate potential implications of the way we've chosen to handle it now that we see the percentage of cases that will error: https://bugzilla.mozilla.org/show_bug.cgi?id=1835151
That's mostly it on that front.
Moving on to fields:
Here are the events we're actually getting:
ASR_RS_NO_MESSAGES 4736
ASR_RS_ERROR 113
IMPRESSION 85
MOMENTS_PAGE_SET 20
CLICK_BUTTON 18
DISMISS 17
INDEXEDDB_OPEN_FAILED 3
TRANSACTION_FAILED 2
TARGETING_EXPRESSION_ERROR 2
SELECT_CHECKBOX 1
DISMISSED 1
SESSION_END 1
ENABLE 1
Name: string.messaging_system_event
While we aren't getting a huge percentage of pings containing attribution, we do see some!
%2528not%2Bset%2529 27
whatsnew 3
Name: string.messaging_system_attribution_campaign
%2528not%2Bset%2529 30
Name: string.messaging_system_attribution_content
mozillaci 21
mozorg 7
Name: string.messaging_system_attribution_dlsource
www.google.com 16
www.bing.com 11
firefox-browser 3
Name: string.messaging_system_attribution_source
chrome 16
edge 11
firefox 3
Name: string.messaging_system_attribution_ua
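The `%2528not%2Bset%2529` values above look like a percent-encoded string that was encoded a second time. That reading is an assumption, not something confirmed in this bug. A small sketch of how an analyst might decode such a value for display (the helper name `decode_twice` is hypothetical):

```python
from urllib.parse import unquote_plus


def decode_twice(value):
    """Undo two rounds of percent-encoding, treating '+' as a space on each pass."""
    # First pass turns %25 back into %, leaving a singly-encoded string.
    once = unquote_plus(value)
    # Second pass decodes the remaining escapes and the '+' separators.
    return unquote_plus(once)


print(decode_twice("%2528not%2Bset%2529"))
print(decode_twice("whatsnew"))
```

Values that were never encoded, such as `whatsnew`, pass through unchanged.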
Our ASR-specific recorded locale matches our expectations for Nightly locales!
Here's where the pings were actually coming from:
undesired-events 4856
moments 20
spotlight 20
whats-new-panel 7
cfr 6
infobar 2
Name: string.messaging_system_ping_type, dtype: int64
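As a quick consistency check, the two listings can be compared against each other and against the 5000-ping sample. The counts below are copied from the output above; the grouping in `ASSUMED_UNDESIRED_EVENTS` is an assumption about which event names travel on "undesired-events" pings, and the function name is hypothetical.

```python
SAMPLE_SIZE = 5000

EVENT_COUNTS = {
    "ASR_RS_NO_MESSAGES": 4736, "ASR_RS_ERROR": 113, "IMPRESSION": 85,
    "MOMENTS_PAGE_SET": 20, "CLICK_BUTTON": 18, "DISMISS": 17,
    "INDEXEDDB_OPEN_FAILED": 3, "TRANSACTION_FAILED": 2,
    "TARGETING_EXPRESSION_ERROR": 2, "SELECT_CHECKBOX": 1,
    "DISMISSED": 1, "SESSION_END": 1, "ENABLE": 1,
}

PING_TYPE_COUNTS = {
    "undesired-events": 4856, "moments": 20, "spotlight": 20,
    "whats-new-panel": 7, "cfr": 6, "infobar": 2,
}

# Assumption: these are the events reported with ping_type "undesired-events".
ASSUMED_UNDESIRED_EVENTS = {
    "ASR_RS_NO_MESSAGES", "ASR_RS_ERROR", "INDEXEDDB_OPEN_FAILED",
    "TRANSACTION_FAILED", "TARGETING_EXPRESSION_ERROR",
}


def cross_check():
    """Compare event totals, ping_type totals, and the sample size."""
    ping_types_total = sum(PING_TYPE_COUNTS.values())
    undesired = sum(EVENT_COUNTS[name] for name in ASSUMED_UNDESIRED_EVENTS)
    return {
        "events_total": sum(EVENT_COUNTS.values()),
        "ping_types_total": ping_types_total,
        "pings_without_listed_ping_type": SAMPLE_SIZE - ping_types_total,
        "undesired_events_match": undesired == PING_TYPE_COUNTS["undesired-events"],
    }


print(cross_check())
```

Under that assumption the event names account for every sampled ping, while the ping_type listing covers 4911 of them. Whether the remaining 89 pings have no ping_type or were simply left out of the listing is not stated here, so it would need checking in the notebook.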
Here's some useful context data. Note that we can tell who clicked primary and secondary buttons!
{"source":"secondary_button","page":"spotlight"}
"message-groups"
{"page":"about:firefoxview"}
{"page":"about:welcome"}
{"source":"primary_button","page":"about:firefoxview"}
{"source":"primary_button","page":"spotlight"}
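A rough sketch of how the button information could be pulled out of these context values, assuming each one is available as a raw string. `SAMPLE_CONTEXTS` repeats the distinct values listed above (not their frequencies), and `button_clicks` is a hypothetical helper.

```python
import json
from collections import Counter

# The distinct context values shown above, one entry each.
SAMPLE_CONTEXTS = [
    '{"source":"secondary_button","page":"spotlight"}',
    '"message-groups"',
    '{"page":"about:firefoxview"}',
    '{"page":"about:welcome"}',
    '{"source":"primary_button","page":"about:firefoxview"}',
    '{"source":"primary_button","page":"spotlight"}',
]


def button_clicks(contexts):
    """Count (page, source) pairs for contexts that are JSON objects with a source."""
    counts = Counter()
    for raw in contexts:
        try:
            parsed = json.loads(raw)
        except json.JSONDecodeError:
            continue  # bare, non-JSON strings carry no button information
        # JSON strings like "message-groups" parse fine but are not objects.
        if isinstance(parsed, dict) and "source" in parsed:
            counts[(parsed.get("page"), parsed["source"])] += 1
    return counts


for (page, source), n in sorted(button_clicks(SAMPLE_CONTEXTS).items()):
    print(page, source, n)
```

Run over the full sample rather than this list of distinct values, the same tally would give primary versus secondary click counts per page.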
We even observe some upgrades:
FX_MR_106_UPGRADE 18
Comment 4•2 years ago (Reporter)
Looking at all the data that's come in over the weekend, we have 0 unknown keys and 0 invalid nested data.
Of note is that I somehow specified that invalid_nested_data be sent on the "metrics" ping instead of the "messaging-system" ping. Whoops. (bug 1835656)
We might be able to call this ad hoc approach done and move on to building the system health dashboard. Dan, what do you think?
Comment 5•1 year ago (Reporter)
Dan and the group chatted about the ad hoc data validation at today's Messaging System PingCentre Reinstrumentation sync and he's happy to move on to building the system health dashboard.