Closed Bug 1522381 Opened 9 months ago Closed 9 months ago

Sync panel broken in nightly due to TelemetryUtils.jsm

Categories

(Firefox :: Preferences, defect, P1)

defect

Tracking

()

VERIFIED FIXED
Firefox 67
Tracking Status
firefox-esr60 --- wontfix
firefox65 --- wontfix
firefox66 --- verified
firefox67 --- verified

People

(Reporter: pauljt, Assigned: markh)

Details

(Keywords: regression)

Attachments

(1 file)

I'm seeing an issue that when you are logged in to sync on nightly on windows, the sync panel in preferences doesn't load. The error trace is something like:

TypeError: date is undefined[Learn More] TelemetryUtils.jsm:121:5
	truncateToHours resource://gre/modules/TelemetryUtils.jsm:121
	SyncTelemetryImpl resource://services-sync/telemetry.js:482
	<anonymous> resource://services-sync/telemetry.js:799
	<anonymous> resource://services-sync/service.js:31
	getter resource://services-sync/main.js:21
	ensureLoaded jar:file:///C:/Program Files/Nightly/omni.ja!/components/Weave.js:75
	init chrome://browser/content/preferences/in-content/sync.js:85
	init chrome://browser/content/preferences/in-content/preferences.js:43
	init_category_if_required chrome://browser/content/preferences/in-content/preferences.js:36
	gotoPref chrome://browser/content/preferences/in-content/preferences.js:166
	init_all chrome://browser/content/preferences/in-content/preferences.js:70
	_fireOnSelect chrome://global/content/elements/richlistbox.js:305
	selectItem chrome://global/content/elements/richlistbox.js:431
	onxblmousedown chrome://global/content/bindings/richlistbox.xml:125

From debugging this, it looks like the offending line istelemetry.js#480

let sessionStartDate = Services.startup.getStartupInfo().main;

For me, let sessionStartDate = Services.startup.getStartupInfo() returns an object that does NOT have a main property:

{process:"2107-04-18T20:00:31.345Z",start:"2019-01-24T04:32:27.973Z"}

This then gets passed to TelemetryUtils.jsm#120

So either that changed, or my system is borked. But Ive confirmed this on two different windows machines with different profiles and sync accounts. (havent checked extensions yet).

Note: you need to be logged into sync to see this issue.

From nsIAppStartup.idl#172

/** 
* Returns an object with main, process, firstPaint, sessionRestored properties.
* Properties may not be available depending on platform or application
*/
[implicit_jscontext] jsval getStartupInfo();

So not checking for undefined is the bug here I think.

Can you find a regression window?

Flags: needinfo?(ptheriault)
Priority: -- → P1

I can't reproduce this. I chatted with Paul in IRC and he will not have access to the machines until Monday, so we'll have a go at debugging this further.

Even though I can't reproduce it, Paul correctly points out that sync's telemetry code is unconditionally using an attribute which is explicitly documented as not always being present.

Dexter, in sync's telemetry.js we are calling TelemetryUtils.truncateToHours() to make the times less precise. This needs the session start time passed in, and we are obtaining that via Services.startup.getStartupInfo().main. The telemetry modules themselves have the session start time internally but doesn't expose it in a convenient way (and best I can tell doesn't even try to get it from getStartupInfo()). Can you recommend a way for us to obtain the session start time in a more reliable way?

Flags: needinfo?(alessio.placitelli)

(In reply to Mark Hammond [:markh] from comment #5)

Even though I can't reproduce it, Paul correctly points out that sync's telemetry code is unconditionally using an attribute which is explicitly documented as not always being present.

This is interesting, because as far as I can tell, the main should always be there. Apparently that's being recorded right at startup, if it wasn't recorded for some reason, the fixup is happening in the getStartupInfo implementation, and some telemetry data is being accumulated in the STARTUP_MEASUREMENT_ERRORS histogram.

Dexter, in sync's telemetry.js we are calling TelemetryUtils.truncateToHours() to make the times less precise. This needs the session start time passed in, and we are obtaining that via Services.startup.getStartupInfo().main. The telemetry modules themselves have the session start time internally but doesn't expose it in a convenient way (and best I can tell doesn't even try to get it from getStartupInfo()). Can you recommend a way for us to obtain the session start time in a more reliable way?

Within telemetry, the sessionStartDate ends up being a simple call to new Date, as you can verify by tracing back the calls from here.

I would personally file a bug in the appropriate component to mention that main is broken, sometimes. Then, depending on the triage/timing, decide on whether to use new Date.

Flags: needinfo?(alessio.placitelli)

Given telemetry uses new Date(), our attempt to use Services.startup.getStartupInfo().main so we have the same sessionStartDate as the main ping is never going to do what we want - so I'm adding a patch to unconditionally use new Date() in sync's telemetry too.

Assignee: nobody → markh
Flags: needinfo?(ptheriault)
Pushed by mhammond@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/db429d8a3715
don't rely on Services.startup.getStartupInfo().main for telemetry session start. r=tcsc
Status: NEW → RESOLVED
Closed: 9 months ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 67

AFAICT, this issue goes back away. Is this worth backporting to Beta?

Flags: needinfo?(markh)

Comment on attachment 9042351 [details]
Bug 1522381 - don't rely on Services.startup.getStartupInfo().main for telemetry session start. r?tcsc

Beta/Release Uplift Approval Request

Feature/Bug causing the regression

None

User impact if declined

In circumstances we don't quite understand, users may find about:preferences broken.

Is this code covered by automated tests?

Yes

Has the fix been verified in Nightly?

No

Needs manual test from QE?

No

If yes, steps to reproduce

List of other uplifts needed

None

Risk to taking this patch

Low

Why is the change risky/not risky? (and alternatives if risky)

While I haven't been able to reproduce the underlying issue causing the problem, the change is very small and I have high confidence the issue will be resolved.

String changes made/needed

Flags: needinfo?(markh)
Attachment #9042351 - Flags: approval-mozilla-beta?

Are you also planning to file a followup bug as suggested in comment 6? Or do you think it is not used (currently) anywhere else? Even if so it might be worth figuring out why it fails intermittently.

Flags: needinfo?(markh)

Paul, can you still reproduce the issue in Nightly now that the fix landed?

Flags: needinfo?(ptheriault)

Comment on attachment 9042351 [details]
Bug 1522381 - don't rely on Services.startup.getStartupInfo().main for telemetry session start. r?tcsc

Let's fix this for correctness. I'd still like some confirmation / verification of the fix though.

Attachment #9042351 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #13)

Are you also planning to file a followup bug as suggested in comment 6? Or do you think it is not used (currently) anywhere else? Even if so it might be worth figuring out why it fails intermittently.

As noted in comment 2, the documentation explicitly mentions that some of these fields might be missing "depending on platform or application", and given we don't quite know what causes the info to be missing in this specific case, I don't see an actionable bug here.

If Paul can reproduce the problem and we can determine that the info should be there in that case, it might be worth tracking in a bug, but for now, I don't think it is.

Flags: needinfo?(markh)

(In reply to Liz Henry (:lizzard) (use needinfo) from comment #14)

Paul, can you still reproduce the issue in Nightly now that the fix landed?

I can't, but I am also no longer able to find a computer/profile combination that results in a Services.startup.getStartupInfo() call which does NOT have the .main attribute. The change avoids this code path altogether though.

For good measure, I will audit all of the current uses of Services.startup.getStartupInfo() and file follow-up bugs if I find anything. I'll mark as resolved once Ive done this.

I don't see any other cases with the same pattern. Most involve addition/subtration so won't cause an error. In ActivityStream there is one potential issue [1] but I note there is already a //todo in the code to add error checking so Im not going to file a bug for that case (and its.process, not .main, so different to this issue).

[1] https://searchfox.org/mozilla-central/source/browser/components/newtab/AboutNewTabService.jsm#342

Status: RESOLVED → VERIFIED
Flags: needinfo?(ptheriault)

Hello,

We were unable to reproduce this issue on Windows 10x64 and Windows 7x64.

Marking this as verified fixed for 66 and 67 based on comment 19.

Feel free to re-open this issue if necessary .

You need to log in before you can comment on or make changes to this bug.