Open Bug 1205567 Opened 9 years ago Updated 1 year ago

Include sleep time in Telemetry sessionLength and subsessionLength

Categories

(Toolkit :: Telemetry, defect, P4)

defect
Points:
1

Tracking

()

Tracking Status
firefox43 --- affected

People

(Reporter: gfritzsche, Assigned: padenot)

References

(Depends on 1 open bug)

Details

(Whiteboard: [measurement:client])

Per bug 1201480 and bug 1204823, sessionLength & subsessionLength behave inconsistently over platforms.

On Windows, they include the time the computer was put into sleep, on OSX (& apparently Linux) they don't.
We should make this behave consistently if we want to base measurements on this.
Depends on: 1204823
Depends on: 1201480
I assume we should be fine with having measurements based on "time since Firefox was started" (i.e. include sleep time).

That is the behavior of totalTime so far. activeTicks would give us "active browser usage" times, if we additionally needed "time of Firefox open and computer not in sleep" we are probably better off with adding that as a new measurement.

Any objections or concerns?
Flags: needinfo?(vladan.bugzilla)
Flags: needinfo?(rvitillo)
Flags: needinfo?(bcolloran)
Sounds good. We should add a new measurements for "time of Firefox open and computer not in sleep" as well.
Flags: needinfo?(rvitillo)
Yes, I agree:
- sessionLength and subsessionLength should report wall clock time on all platforms
- we need a new measurement for time-not-asleep
- we need to fix activeTicks bug 1107782
Flags: needinfo?(vladan.bugzilla)
Agree with Roberto and Vladan.
Flags: needinfo?(bcolloran)
Points: --- → 1
Priority: -- → P2
Whiteboard: [unifiedTelemetry] → [unifiedTelemetry][measurement:client]
Priority: P2 → P3
See Also: → 1205985
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #3)
> - we need a new measurement for time-not-asleep

This is bug 1205985.
See Also: → 1219315
Blocks: 1105864
Whiteboard: [unifiedTelemetry][measurement:client] → [measurement:client]
Priority: P3 → P4
See Also: → 1485692
See Also: → 1535632

https://github.com/python-trio/trio/issues/1586 has some details about clock behavior with respect to sleep on different platforms.

Using clock_gettime(CLOCK_MONOTONIC, ...) includes sleep time on MacOS and FreeBSD. Using CLOCK_BOOTTIME is alleged to include sleep time on Linux.

Summary: Make Telemetry sessionLength and subsessionLength behave consistently over platforms → Include sleep time in Telemetry sessionLength and subsessionLength

https://github.com/padenot/clock-test/blob/main/main.cpp has code for all platforms. The requirements are as such:

David, Haik, Jed, would you mind sanity checking the link above for your particular platforms? I've tested this on real machines multiple times with my kitchen timer, it seems to work well.

The goal here is to align all platforms: make macOS and Linux report the time including suspended time (so that it matches Windows), and then add another probe that will have the time without the suspended time, as comment 2 says (confirmed by folks in #telemetry). Android uses Glean, and the modifications there will come right after.

Flags: needinfo?(jld)
Flags: needinfo?(haftandilian)
Flags: needinfo?(dmajor)
Assignee: nobody → padenot

Windows looks good on both hardware and VM.

Flags: needinfo?(dmajor)

According to this helpful chart on Wikipedia, RHEL 6 is still supported under an extended support arrangement, and this would probably fail there (according to this table of software versions indicating the kernel is forked from 2.6.32). If the impact is only that we get incomplete telemetry from those systems, or if we fall back to not counting sleep time, that may not be a showstopper.

Otherwise, the Linux case looks okay to me. (I don't know that it needs to use floating-point arithmetic, but that's a minor detail.)

Flags: needinfo?(jld)

Discussed this with Paul on chat. Here's a 5 minute sleep on macOS Big Sur.

$ ./a.out 
Time is 68816049ms (awake time 68270213ms)
Put the computer to sleep, come back in some time..., and type 'y'y
Time is 69135794ms (awake time 68367205ms)
Program duration was 319745ms (awake time 96992ms)
Flags: needinfo?(haftandilian)

We ended up fixing this differently. We have two new telemetry probes that track the session time:

  • browser.engagement.session_time_including_suspend
  • browser.engagement.session_time_excluding_suspend

that return the duration of the session including or excluding the time the device was suspended, on all OSes, that can be used for computations accross OSes.

It was decided to not change the semantics of the existing probes to not unnecessarily break the continuity of the existing data.

The new probes are implemented stand-alone in a new file, Uptime.cpp, that uses the most modern system calls on all OSes, when available, with various fallbacks when they are not available, making the code trivial to understand, especially compared to our platform-specific TimeStamp implementations.

All in all, it might be time to close this and maybe to audit the queries that are in use to subtitute session length by one of the two new alternative probes. I think that in general, we're mostly interested in the time excluding suspend. chutten, Jan-Erik, any opinion on this?

Flags: needinfo?(jrediger)
Flags: needinfo?(chutten)

Note: this has implications that go beyond changing queries to use the new field. If any query or if the subsession length is used for any of the KPI-related work, this should be carefully evaluated before any substitution happens (and that's well beyond our team)

I think maybe we hold off on substituting the core values in Firefox Telemetry until and unless Data Science comes to us with a request to change it over. The no-sleep or all-sleep data's there now for those who wish to be consistent, but I don't know how many analyses rely on such precision.

Jan-Erik, what are your thoughts about how we measure time in Glean? FOG "just" uses glean-core's reckoning which I think uses its own logic around sleep.

Flags: needinfo?(chutten)

In Glean we use two different ways to get time:

  • For events we use time measurment including suspend time.
  • For timing distribution and timespan we use time::precise_time_ns and I actually currently do not know whether that includes or excludes suspend time. Something we should figure out and properly document.
Flags: needinfo?(jrediger)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.