Several performance timing distribution metrics contain unexpected histogram keys
Categories
(Data Platform and Tools :: Glean: SDK, defect, P1)
Tracking
(Not tracked)
People
(Reporter: esmyth, Assigned: mdroettboom)
Attachments
(3 files)
I found several cases of clients whose timing distribution metrics recorded histogram keys larger than the 600000000000 ns maximum. In a few cases, the key is larger than BigQuery's INT64 type can handle, and it broke a query.
SELECT
  client_info.client_id,
  CAST(key AS NUMERIC) AS histogram_key,
  SUM(value) AS histogram_value,
FROM `moz-fx-data-shared-prod.org_mozilla_firefox.metrics`
CROSS JOIN UNNEST(metrics.timing_distribution.gfx_content_paint_time.values)
WHERE DATE(submission_timestamp) >= '2020-08-18'
  AND DATE(submission_timestamp) < '2020-09-18'
GROUP BY 1, 2
HAVING histogram_value > 0
  AND histogram_key > 600000000000
ORDER BY histogram_key DESC
The following probes show similar issues:
performance_page_non_blank_paint
performance_time_response_start
performance_time_dom_interactive
performance_time_dom_content_loaded_start
performance_time_dom_content_loaded_end
performance_interaction_keypress_present_latency
geckoview_page_load_time
geckoview_page_reload_time
javascript_gc_slice_time
javascript_gc_mark_time
The performance_time_load_event_end distribution somehow includes the key 30370004h:
SELECT
  client_info.client_id,
  metrics.timing_distribution.performance_time_load_event_end.time_unit,
  key AS histogram_key,
  SUM(value) AS histogram_value,
FROM `moz-fx-data-shared-prod.org_mozilla_firefox.metrics`
CROSS JOIN UNNEST(metrics.timing_distribution.performance_time_load_event_end.values)
WHERE DATE(submission_timestamp) >= '2020-08-18'
  AND DATE(submission_timestamp) < '2020-09-18'
  AND NOT REGEXP_CONTAINS(key, r'^\d+$')
GROUP BY 1, 2, 3
ORDER BY histogram_key DESC
Comment 1 (Assignee) • 4 years ago
First, the maximum expected value is 6e17 (not 6e11) for metrics where the input is defined in ms, which is the case for many GeckoView metrics. With that, the query only returns a single invalid value, which is MAXINT64, all from a single client. The fact that it's a single client makes me think there is just something broken or unusual about that specific client.
Same for the second issue of outright garbage characters in the key -- that only happens for a single client.
I plan to (a) drill down on what might be special about these specific clients and (b) consider adding code at the edge schema to reject pings with these errors.
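As a rough illustration of the checks discussed above, the following Python sketch classifies a histogram key the way the two queries do. It is not the edge schema or Glean code. The names `BASE_MAX`, `NANOS_PER_UNIT`, `max_expected_key` and `classify_key` are hypothetical, and the scaling rule (the 6e11 cap multiplied by the number of nanoseconds in the metric's time unit) is an assumption inferred from the 6e11 vs. 6e17 figures in this comment.

```python
import re

# Cap quoted in the bug description: 600000000000 (6e11).
BASE_MAX = 600_000_000_000

# Assumption: the cap is applied in the metric's time unit while keys are
# stored in nanoseconds, so a millisecond metric ends up with 6e11 * 1e6 = 6e17.
NANOS_PER_UNIT = {
    "nanosecond": 1,
    "microsecond": 1_000,
    "millisecond": 1_000_000,
}

# Largest value BigQuery's INT64 can hold (MAXINT64).
INT64_MAX = 2**63 - 1


def max_expected_key(time_unit: str) -> int:
    """Largest histogram key expected for a metric with the given time unit."""
    return BASE_MAX * NANOS_PER_UNIT[time_unit]


def classify_key(key: str, time_unit: str) -> str:
    """Return 'ok', 'non_numeric', 'int64_overflow' or 'above_max' for a key."""
    # Same idea as NOT REGEXP_CONTAINS(key, r'^\d+$') in the second query.
    if not re.fullmatch(r"[0-9]+", key):
        return "non_numeric"
    value = int(key)
    if value > INT64_MAX:
        return "int64_overflow"
    if value > max_expected_key(time_unit):
        return "above_max"
    return "ok"


print(classify_key("700000000000", "nanosecond"))
print(classify_key("700000000000", "millisecond"))
print(classify_key("30370004h", "millisecond"))
```

Under these assumptions, a key of 700000000000 is out of range for a nanosecond metric but acceptable for a millisecond one, which is why most of the originally flagged keys disappear once the 6e17 limit is used. MAXINT64 itself still exceeds that limit, and a key such as 30370004h is rejected before any numeric comparison.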
Comment 2 (Assignee) • 4 years ago
This is the single ping found with non-numeric data in the timing_distribution keys, as extracted from the payload_bytes_decoded table. There doesn't seem to be anything else wrong with the file. Just some random event, perhaps?
Comment 3 • 4 years ago
Comment 4 (Assignee) • 4 years ago
Fix for the second half of the bug where there are outright invalid characters...
Comment 5 (Assignee) • 4 years ago
Closing this, as with the correct maximum used, it boils down to a single erroneous client.