Open Bug 1434680 Opened 7 years ago Updated 3 years ago

Negative session_duration reported in the session telemetry

Categories

(Firefox :: New Tab Page, defect, P3)

defect

Tracking

()

People

(Reporter: nanj, Unassigned)

References

Details

Attachments

(1 file)

The telemetry shows that this has affected 57 - 60 across all the channels, approx. 10% of the session pings in the release channel reported negative session_duration.
Here are some raw telemetry logs. { "addon_version": "2017.11.07.1100-7f4e3634", "session_duration": -7967354, "release_channel": "release", "perf": { "load_trigger_ts": 1517348815558.1025, "load_trigger_type": "menu_plus_or_keyboard", "topsites_first_painted_ts": 1517356761445.8562, "visibility_event_rcvd_ts": 1517356783786.0862 }, "timestamp": 1517356771355, "locale": "en-US", "action": "activity_stream_session", "version": "57.0.2", "date": "2018-01-30", "user_prefs": 7, "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.13; rv:57.0) Gecko/20100101 Firefox/57.0", "page": "about:newtab" } { "addon_version": "2017.11.07.1100-7f4e3634", "session_duration": -309102195, "release_channel": "release", "perf": { "load_trigger_ts": 1517045077341.914, "load_trigger_type": "menu_plus_or_keyboard", "topsites_first_painted_ts": 1517344424877.7678, "visibility_event_rcvd_ts": 1517354180710.688 }, "timestamp": 1517356771466, "locale": "en-US", "action": "activity_stream_session", "version": "57.0.4", "date": "2018-01-30", "user_prefs": 7, "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:57.0) Gecko/20100101 Firefox/57.0", "page": "about:newtab" } { "addon_version": "2018.01.04.0062-4997c81d", "session_duration": -427630, "release_channel": "release", "perf": { "is_preloaded": true, "is_prerendered": false, "visibility_event_rcvd_ts": 1517343239686.2334, "load_trigger_ts": 1517342807719.01, "topsites_icon_stats": { "rich_icon": 1, "no_image": 0, "screenshot_with_icon": 10, "screenshot": 0, "tippytop": 1 }, "topsites_first_painted_ts": 1517341854804.7534, "load_trigger_type": "menu_plus_or_keyboard" }, "timestamp": 1517356771665, "locale": "en-US", "action": "activity_stream_session", "version": "58.0.1", "date": "2018-01-30", "user_prefs": 51, "ua": "Mozilla/5.0 (Macintosh; Intel Mac OS X 10.10; rv:58.0) Gecko/20100101 Firefox/58.0", "page": "about:newtab" }
Flags: needinfo?(dmose)
It appears to be saying that the RemotePages is sending an unload event many seconds before the page has become visible (although it does appear to be have been made visible). This seems pretty unlikely... ;-) Looking at the data, I notice that the ping from 58.* has is_preloaded and is_prerendered info, which is helpful. A couple of questions: * are we still getting pings in 60 like this? * if so, would it be possible to get, say, 10 or 15 pings only from 58 and later?
Flags: needinfo?(dmose) → needinfo?(najiang)
Flags: needinfo?(najiang)
(In reply to Dan Mosedale (:dmose) from comment #2) > > * are we still getting pings in 60 like this? Yes, it remains in 60. See the attached pings. > * if so, would it be possible to get, say, 10 or 15 pings only from 58 and later? Yep, there are 30 pings in the attachment, * 1-10 from about:home (58.0.2 release) * 11-20 from about:newtab (58.0.2 release) * 21-30 from about:newtab (60.0a1)
Some interesting things I've noticed about the data in the JSON: * only one of the 10 about:home pings has first_window_opened as a trigger, the rest are "unexpected". Maybe this means that clicking the "about:home" or opening a second or third window are more likely to trigger the race? * it seems to happen with pretty much any combination of "prerender" and "preload" boolean settings * negative data here would theoretically mean that TelemetryFeed.endSession in the chrome is being called before the visibility event being set to visible in the content. The numbers claim that this is happening hundreds or thousands of seconds beforehand, which, by virtue of being so big, make it seem somewhat unlikely that the problem is just clock-skew between processes. It's not obvious how this could happen short of one or multiple bugs. We have a number of ugly heuristics in the code, and these could be removed by (at least partially) fixing bug 1425494. If that doesn't fix the problem, it would at least give us more confidence in the data we're using to diagnose, and it would give us more data to diagnose with. So I'm going to set this as depending on that bug.
Depends on: 1425494
(In reply to Dan Mosedale (:dmose) from comment #5) > Some interesting things I've noticed about the data in the JSON: > > * negative data here would theoretically mean that TelemetryFeed.endSession > in the chrome is being called before the visibility event being set to > visible in the content. The numbers claim that this is happening hundreds > or thousands of seconds beforehand, which, by virtue of being so big, make > it seem somewhat unlikely that the problem is just clock-skew between > processes. Agree. This sounds more like a code issue to me. Given that this has affected about 10% of sessions, also we're currently building dashboard to analyze session duration of AS. Let's dig it deeper to find the cause of this issue. Perhaps try to find out how to reproduce this issue first? Or revisit the related code path? I happened to see a negative session duration in my nightly a few weeks ago, but just couldn't produce it constantly.
Priority: -- → P3
See Also: → 1548648
Component: Activity Streams: Newtab → New Tab Page
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: