Closed Bug 1686816 Opened 3 years ago Closed 3 years ago

Slow script probe is reporting unexpected values for the hang_duration

Categories

(Core :: Performance, defect)

Firefox 83
defect

Tracking

()

RESOLVED FIXED
86 Branch
Tracking Status
firefox86 --- fixed

People

(Reporter: esmyth, Assigned: Gijs)

Details

Attachments

(2 files)

Basic information

Steps to Reproduce:

SELECT DISTINCT
  REGEXP_EXTRACT(emv.value, r'[^0-9.]'),
  UNICODE(REGEXP_EXTRACT(emv.value, r'[^0-9.]'))
FROM `moz-fx-data-shared-prod.telemetry.events`
  LEFT JOIN UNNEST(event_map_values) AS emv
WHERE event_category = 'slow_script_warning'
  AND sample_id = 53
  AND submission_date >= DATE_SUB(current_date, INTERVAL 7 DAY)
  AND emv.key = 'hang_duration'
  AND REGEXP_EXTRACT(emv.value, r'[^0-9.]') IS NOT NULL

Expected Results:
hang_duration is a float representing the duration of the hang

Actual Results:
hang_duration is a string that can include characters like ,, /, ', R

Nika suggested that to fix this we should make the nsPrintfCString stringifier delegate to rust's standard library instead of snprintf, and that'd be an XPCOM change.

Component: Performance → XPCOM
Summary: Slow script probe is reporting unexpected values for the hang_duration → nsPrintfCString uses locale-adapted decimal separators when stringifying floats (was: Slow script probe is reporting unexpected values for the hang_duration)

... except it turns out doing that is really not straightforward. What would be easier is just not putting floats into this string. We'd lose sub-ms precision but given the hangs have to be >10s to count that probably doesn't matter.

Component: XPCOM → Performance
Summary: nsPrintfCString uses locale-adapted decimal separators when stringifying floats (was: Slow script probe is reporting unexpected values for the hang_duration) → Slow script probe is reporting unexpected values for the hang_duration
Assignee: nobody → gijskruitbosch+bugs
Status: NEW → ASSIGNED

uptime appears to suffer from the same issue as hang_duration.

wait_count and n_tab_deselect are reporting bad values, too (should be int).

(In reply to esmyth (please @ me on Slack/Riot) from comment #5)

uptime appears to suffer from the same issue as hang_duration.

This is kind of odd because it's not passed at all from the C++ case. The JS callsite uses string coercion, which should use .toString, which should always use . as a separator, AIUI (last sentence in this section on MDN , plus compare with Number.toLocaleString).

wait_count and n_tab_deselect are reporting bad values, too (should be int).

What kind of bad values? These keys also aren't passed from the C++ callsite. It does look like, if the calls happen during shutdown, https://searchfox.org/mozilla-central/rev/2a24205479519e70c0574929f45730d285141584/browser/modules/ProcessHangMonitor.jsm#664-669 might end up not having values for some of the properties, which would result in the strings "undefined" (and maybe "NaN"?) being in the data. Does that look like what's happening?

Flags: needinfo?(esmyth)

I just confirmed that the bad values for wait_count and n_tab_deselect are all occurrences of the string undefined

Flags: needinfo?(esmyth)

Unfortunately a meaningful automated test here is difficult because we're dealing with shutdown.

Pushed by jwein@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/00a42e2cf846
ensure we have useful values for waitCount and deselectCount to avoid 'undefined' showing up in telemetry, r=jaws
Pushed by gijskruitbosch@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/1780fe9f6446
use AppendFloat to fix floating point separator issues with the slow script warning probe in localized environments, r=mccr8
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 86 Branch
You need to log in before you can comment on or make changes to this bug.