Closed Bug 1824084 Opened 2 years ago Closed 2 years ago

telemetry broken on OpenBSD

Categories

(Toolkit :: Telemetry, defect)

Unspecified
OpenBSD
defect

Tracking

()

RESOLVED FIXED
113 Branch
Tracking Status
firefox113 --- fixed

People

(Reporter: gaston, Assigned: gaston)

Details

Attachments

(1 file)

dunno if that's expected or not, but on OpenBSD even if enabled by default, telemetry seems broken at least on 112.

about:telemetry says Telemetry is collecting release data and upload is enabled. but about:telemetry#raw-payload-tab only shows 'open in the json viewer' which yields 'null'.

the browser console shows this:

Uncaught (in promise) 
Exception { name: "NS_ERROR_NOT_AVAILABLE", message: "Component returned failure code: 0x80040111 (NS_ERROR_NOT_AVAILABLE) [nsITelemetry.msSinceProcessStartExcludingSuspend]", result: 2147746065, filename: "resource://gre/modules/TelemetrySession.sys.mjs", lineNumber: 747, columnNumber: 0, data: null, stack: "getSessionPayload@resource://gre/modules/TelemetrySession.sys.mjs:747:28\ngetPayload@resource://gre/modules/TelemetrySession.sys.mjs:1086:17\ngetPayload@resource://gre/modules/TelemetrySession.sys.mjs:172:17\ngetCurrentPingData@resource://gre/modules/TelemetryControllerParent.sys.mjs:1175:43\ngetCurrentPingData@resource://gre/modules/TelemetryControllerParent.sys.mjs:169:17\n_updateCurrentPingData@chrome://global/content/aboutTelemetry.js:360:36\nupdate@chrome://global/content/aboutTelemetry.js:349:14\n", location: XPCWrappedNative_NoHelper }
TelemetrySession.sys.mjs:747

my wild guess would be about sandboxing preventing a timing operation to happen, but what do i know.. i think telemetry used to work some releases ago, but it seems it's also broken on 102esr.

:gaston, do you know if XP_LINUX is defined on OpenBSD's Firefox builds? If it's not (and we're not Mac or Windows) the timing code assumes we don't know how to do times.

Flags: needinfo?(landry)

(In reply to Chris H-C :chutten from comment #1)

:gaston, do you know if XP_LINUX is defined on OpenBSD's Firefox builds? If it's not (and we're not Mac or Windows) the timing code assumes we don't know how to do times.

nope, so now i understand why. sigh. Given that clock_gettime seems to be somewhat posix, i'll have to compare the behaviour with http://man.openbsd.org/clock_gettime... we've had CLOCK_BOOTIME since a while.

Flags: needinfo?(landry)

trying a rebuild of 112.0b5 with the obvious

--- mozglue/misc/Uptime.cpp.orig
+++ mozglue/misc/Uptime.cpp
@@ -75,7 +75,7 @@ Maybe<uint64_t> NowIncludingSuspendMs() {
   return Some(interrupt_time / kHNSperMS);
 }
 
-#elif defined(XP_LINUX)  // including Android
+#elif defined(XP_LINUX) || defined(XP_OPENBSD)  // including Android
 #  include <time.h>
 
 // Number of nanoseconds in a millisecond.

maybe that could be XP_UNIX ?

How does it behave on OpenBSD if the system is suspended? This is what's important here.

Flags: needinfo?(landry)

(In reply to Paul Adenot (:padenot) from comment #4)

How does it behave on OpenBSD if the system is suspended? This is what's important here.

the manpage above seems explicit about it:

CLOCK_BOOTTIME
    The uptime clock. Its absolute value is the time elapsed since the system booted.
    The clock begins at zero and advances continuously.
CLOCK_UPTIME
    The runtime clock. Its absolute value is the time elapsed since the system booted less any time the system was suspended.
    The clock begins at zero and advances while the system is not suspended.
Flags: needinfo?(landry)

That's great, looks like I found an old version and that wasn't that explicit.

(In reply to Landry Breuil (:gaston) from comment #3)

maybe that could be XP_UNIX ?

We can, but only because macOS is taken care of above that.

there was previous discussion about it in bug 1686405 which said solaris didnt implement CLOCK_BOOTIME at the time. petr, is it still the case ?

Flags: needinfo?(petr.sumbera)

and now i stumble upon bug 1686713 which has a bitrotten patch in https://phabricator.services.mozilla.com/D101748 .. so this one is a dupe and we could/should fix it better there ?

(In reply to Landry Breuil (:gaston) from comment #8)

there was previous discussion about it in bug 1686405 which said solaris didnt implement CLOCK_BOOTIME at the time. petr, is it still the case ?

It's still the same. No support for CLOCK_BOOTIME on Solaris.

Flags: needinfo?(petr.sumbera)

ok, thanks for confirming. With the quick patch from https://bugzilla.mozilla.org/show_bug.cgi?id=1824084#c3 i have working about:telemetry (including seeing it as json), so i think it's worth unbreaking it at least for BSDs.

according to https://man.freebsd.org/cgi/man.cgi?query=clock_gettime, for FreeBSD CLOCK_BOOTIME has this meaning:

     CLOCK_BOOTTIME
	     Starts at zero when the kernel boots and increments monotonically
	     in	SI seconds while the machine is	running.

so my understanding is that the behaviour is the same as for OpenBSD ? NetBSD doesnt seem to have CLOCK_BOOTIME per https://man.netbsd.org/clock_gettime.2. so i think that was the original reason for the ifdef dance in https://phabricator.services.mozilla.com/D101748.

should we use #elif defined(XP_LINUX) || defined(XP_OPENBSD) || defined(XP_FREEBSD) to make sure NetBSD & solaris still uses the fallback #else ? there's no XP_DRAGONFLY in https://searchfox.org/mozilla-central/source/build/moz.configure/init.configure#528

Sylvestre, since you were interested in https://phabricator.services.mozilla.com/D101748, any opinion ? should i dupe this bug with bug 1686713 and revive the phabricator review, or create a new one ?

Flags: needinfo?(sledru)

As you wish :)
I don't have strong opinions on this

Flags: needinfo?(sledru)

i've reread the code, and since there's a failsafe handling of not having CLOCK_BOOTIME defined in https://searchfox.org/mozilla-central/source/mozglue/misc/Uptime.cpp#98 i think we should be good with replacing XP_LINUX by XP_UNIX (which comes from NSPR), since NetBSD and Solaris will have a working NowExcludingSuspendMs with that, and still return Nothing() in NowIncludingSuspendMs. I think that'll be simpler than https://phabricator.services.mozilla.com/D101748.

Now that i'm rereading the manpages and freebsd's code and openbsd's code, i understand things aren't so simple, since Linux, OpenBSD and FreeBSD don't seem to agree on the BOOTTIME/UPTIME meaning.

FreeBSD's manpage says

Finally, CLOCK_BOOTTIME is an alias for CLOCK_UPTIME for compatibility with other systems.

while for OpenBSD's, the code in https://github.com/openbsd/src/blob/50603113d13b32b82b4569f48902052db016a195/sys/kern/kern_time.c#L119 and the manpage seems to mean that CLOCK_UPTIME should be used in NowExcludingSuspendMs and CLOCK_BOOTIME in NowIncludingSuspendMs. The more i think about it, the more it seems https://phabricator.services.mozilla.com/D101748 was right.

to check the values, in about:telemetry i'm checking browser.engagement.session_time_including_suspend and browser.engagement.session_time_excluding_suspend in the scalars/parent tree. (eg about:telemetry#scalars-tab_search=sus) With my current fix (adding || defined (XP_OPENBSD), i have the same values for both, even after a suspend/resume session.

Will recheck with an adaptation/update of the aforementioned bitrotten patch. Val, did you end up with the same analysis as me ?

Of course, just replacing XP_LINUX by XP_UNIX wont break build for any BSD platform and will enable telemetry, just with a single value not reporting 'correct' information for OpenBSD. It's a simpler patch, but should we strive for absolute correctness and do the #ifdef dance ?

Flags: needinfo?(val)

I think it would be great.
We can always revisit this later if the numbers are clearly wrong

should unbreak telemetry for BSDs and Solaris.

Assignee: nobody → landry
Status: NEW → ASSIGNED

on top of https://phabricator.services.mozilla.com/D173625 i'm testing the following in NowExcludingSuspendMs:

#ifdef XP_OPENBSD
  if (clock_gettime(CLOCK_UPTIME, &ts)) {
#else
  if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
#endif

(In reply to Landry Breuil (:gaston) from comment #18)

on top of https://phabricator.services.mozilla.com/D173625 i'm testing the following in NowExcludingSuspendMs:

#ifdef XP_OPENBSD
  if (clock_gettime(CLOCK_UPTIME, &ts)) {
#else
  if (clock_gettime(CLOCK_MONOTONIC, &ts)) {
#endif

with that chunk on 112.0b6 and restarting firefox then suspending/resuming, i get this:

browser.engagement.session_time_excluding_suspend 	49609
browser.engagement.session_time_including_suspend 	76725 

previously i had the same value for both metrics. So will push this chunk to phabricator on top of the previous diff.

(In reply to Landry Breuil (:gaston) from comment #15)

Will recheck with an adaptation/update of the aforementioned bitrotten patch. Val, did you end up with the same analysis as me ?

Nah… I wasn't entirely correct then. New understanding:

  • FreeBSD currently does not have a monotonic clock that includes suspend time. At all. I might look into fixing that sometime…
  • OpenBSD actually includes suspend time by default i.e. right into CLOCK_MONOTONIC, so CLOCK_BOOTTIME indeed means the same thing as Linux, but for that reason. OpenBSD introduced CLOCK_UPTIME to mean excluding. And the name makes sense: the uptime command should mean "the time spent actually up and not and suspend". btw: microtime(9) is the key to reading the code you linked.
  • macOS also does it the OpenBSD way (manpage; and this file we're modifying too), except there's no CLOCK_BOOTTIME alias and the excluding clock is only available as CLOCK_UPTIME_RAW, with the _RAW suffix

Conclusion: this is a complete mess

CLOCK_MONOTONIC CLOCK_BOOTTIME CLOCK_UPTIME
Linux excluding including
OpenBSD including including excluding
macOS including excluding (called …_RAW)
FreeBSD (as of early 2023) excluding == CLOCK_UPTIME, proposed change == CLOCK_MONOTONIC

So it seems that if I'm going to try to add a suspend-including monotonic clock to FreeBSD, it's going to have to match OpenBSD's behavior. Submitted the alias change to code review now, which should start a general discussion in general about adding the suspend-aware clock :)

So it should be safe to assume:

  • existence of CLOCK_UPTIME (and this check should be higher priority) implies it's the excluding one and CLOCK_MONOTONIC is including
  • existence of CLOCK_BOOTTIME implies it's the including one and CLOCK_MONOTONIC is excluding
Flags: needinfo?(val)

Thanks for double-checking everything everyone, I agree that this is a bit of a mess. I unfortunately don't have an OpenBSD or FreeBSD readily available for testing, but I'm around for reviewing and merging whatever maintainers for each system feel is the best.

Regarding comment 20, we're not going to switch macOS to using the libc layer, it's a lot slower, we use directly the kernel APIs. Last I measured, going through libc to make things somewhat neater was about 3 times slower, not that it matters here, but it does for regular time-stamping code.
Let's keep that outside of this already complex situation.

:gaston, your number look about right to me, I'm going to merge this patch for now.

:unrelentingtech, let me know what direction FreeBSD ends up going. I read the patch there, not sure if there's been a decision elsewhere or if we're waiting -- it's only been a few days, after all. We can quickly follow up and land a patch in Gecko to do whatever is best for you folks (maybe just send a patch or a diff if it's just a define or two).

Pushed by alissy@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fc4934c14a1f use linux codepaths for BSDs in uptime-related functions r=padenot
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: