Closed Bug 1120982 Opened 5 years ago Closed 5 years ago

Redesign the main ping format (session ping)

Categories

(Firefox Health Report Graveyard :: Client: Desktop, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: gfritzsche, Assigned: gfritzsche)

References

Details

Attachments

(1 file, 5 obsolete files)

We are changing some of the main telemetry ping format, with the biggest change being around switching to submitting session-based.

We should properly specify the format and document it.
(In reply to Georg Fritzsche [:gfritzsche] from comment #0)
> with the biggest change being around switching to submitting session-based.

This should be: "with the biggest change being that we are adding environment data and breaking pings up at regular intervals, environment changes, ..."
Assignee: nobody → gfritzsche
Attached file Main ping format, v1 (obsolete) —
We didn't want to radically restructure the packet, but we made some changes to the |info| part by stripping every property except |reason| and |revision|. The others are already in the top level environment description. 

We also removed |clientID|, since it already is in the common ping data.
Reviewing payload.info:
 * previousBuildId - not needed due to updatePing?
 * revision - keep
 * reason - keep
 * platformBuildId - should move to the environment data
 * persona - should move to the environment
 * redundant to environment:
   * flashVersion
   * addons - remove TelemetryPing.setAddOns()!
   * gfxInfo
   * app*
   * locale
   * OS
   * sysInfo
   * experiment stuff

In payload.simpleMeasurements we have two things we should look into:
 * uptime - would be made redundant by totalTime from bug 1120992
 * the StartupInfo fields may be refactored via bug 1120992
Attachment #8549006 - Flags: feedback?(vdjeric)
Attachment #8549006 - Flags: feedback?(rvitillo)
Attachment #8549006 - Flags: feedback?(mreid)
Ok, apparently there are not clear opinions on what should live in the environment and what not, so we need to wait for the discussion to be finalized on bug 1069880 first.
Summary: Redesign the main ping format → Redesign the main ping format (session ping)
Additional data we need:
 * session start date (when the "subsession" started after possible ping splits on environment changes or 24h intervals)
 * random session id that identifies the whole user session over "subsessions"
 * maybe: timezone?
Attached file Main ping format, v2 (obsolete) —
Attachment #8549006 - Attachment is obsolete: true
Attachment #8549006 - Flags: feedback?(vdjeric)
Attachment #8549006 - Flags: feedback?(rvitillo)
Attachment #8549006 - Flags: feedback?(mreid)
Comment on attachment 8550283 [details]
Main ping format, v2

That's what changed in this revision:

* added sessionStartDate
* keeping previousBuildId because update ping is not blocking for 38
* added sessionId
* added timezoneOffset
* changed the version to 2
Attachment #8550283 - Flags: feedback?(vdjeric)
Attachment #8550283 - Flags: feedback?(rvitillo)
Attachment #8550283 - Flags: feedback?(mreid)
Attachment #8550283 - Flags: feedback?(benjamin)
Attachment #8550283 - Flags: feedback?(bcolloran)
Attached file Main ping format, v3 (obsolete) —
We're bumping the Main Ping version to 4, as other values have already been used.
The server converts the Histogram format when it is received, and updates the version field to 2 when they're saved in S3. Also, version = 3 was used for something needed for FxOS.
Attachment #8550283 - Attachment is obsolete: true
Attachment #8550283 - Flags: feedback?(vdjeric)
Attachment #8550283 - Flags: feedback?(rvitillo)
Attachment #8550283 - Flags: feedback?(mreid)
Attachment #8550283 - Flags: feedback?(benjamin)
Attachment #8550283 - Flags: feedback?(bcolloran)
Attachment #8550303 - Flags: feedback?(vdjeric)
Attachment #8550303 - Flags: feedback?(rvitillo)
Attachment #8550303 - Flags: feedback?(mreid)
Attachment #8550303 - Flags: feedback?(benjamin)
Attachment #8550303 - Flags: feedback?(bcolloran)
(In reply to Alessio Placitelli [:Dexter] from comment #8)
> Also, version = 3 was used for something needed for FxOS.

That was bug 987232.
Question: based on the design doc https://docs.google.com/a/mozilla.com/document/d/1IGpzsYGi_sq3YFQDAPyKOkU_BKvXAC95fZYA2i4ceVs/edit# it seems like the 'main ping' is used for both FHR *and* Telemetry-- is that right, or are there going to be separate Telemetry and FHR main pings? (Only one ping "main ping" mentioned in the design doc.)

If there is just one "main ping", what will differ between the FHR version and the Telemetry version? Presumably not all of those subfields are appropriate for FHR; which subfields will be included, and how will the currently existing FHR daily measurements be represented?

If, on the other hand, there will be separate FHR and Telemetry main pings, is there a bug for the FHR version?
Since there can be more than session fragment ping per day, the "sessionStartDate" needs to be a session start time so that the session fragments can be properly sequenced when we reassemble all of a client's pings on the server side.

I would recommend just expanding the ISO date to an ISO datetime "YYYY-MM-DDThh:mm:ssTZD" (getting timezones right is critical here, since we're splitting session fragments at local midnight -- http://www.w3.org/TR/NOTE-datetime).
(In reply to brendan c from comment #11)
> Since there can be more than session fragment ping per day, the
> "sessionStartDate" needs to be a session start time so that the session
> fragments can be properly sequenced when we reassemble all of a client's
> pings on the server side.
Going one step further, it might be useful to have a "session fragment sequence number" if strict ordering is important (then we're not at the mercy of the client clock).
(In reply to brendan c from comment #10)
> Question: based on the design doc
> https://docs.google.com/a/mozilla.com/document/d/
> 1IGpzsYGi_sq3YFQDAPyKOkU_BKvXAC95fZYA2i4ceVs/edit# it seems like the 'main
> ping' is used for both FHR *and* Telemetry-- is that right, or are there
> going to be separate Telemetry and FHR main pings? (Only one ping "main
> ping" mentioned in the design doc.)

There is only one main ping type with different contents depending on user settings.
 
> If there is just one "main ping", what will differ between the FHR version
> and the Telemetry version? Presumably not all of those subfields are
> appropriate for FHR; which subfields will be included, and how will the
> currently existing FHR daily measurements be represented?

Bug 1120369 will allow specifying different data sets for the histograms.
"FHR" will be the "base data set", "Telemetry" data will be the extended data set (opt-in on release, opt-out on pre-release).
Bug 1122062 is about reviewing the data for being in the right set.

(In reply to brendan c from comment #11)
> Since there can be more than session fragment ping per day, the
> "sessionStartDate" needs to be a session start time so that the session
> fragments can be properly sequenced when we reassemble all of a client's
> pings on the server side.
> 
> I would recommend just expanding the ISO date to an ISO datetime
> "YYYY-MM-DDThh:mm:ssTZD" (getting timezones right is critical here, since
> we're splitting session fragments at local midnight --
> http://www.w3.org/TR/NOTE-datetime).

The intention was to do it as per e.g. Date.toISOString() (YYYY-MM-DDTHH:mm:ss.sssZ).
A question i can not answer is: is the ease in readability worth the increased payload size etc.? I'm assuming yes from previous comments on the environment.

(In reply to Mark Reid [:mreid] from comment #12)
> (In reply to brendan c from comment #11)
> > Since there can be more than session fragment ping per day, the
> > "sessionStartDate" needs to be a session start time so that the session
> > fragments can be properly sequenced when we reassemble all of a client's
> > pings on the server side.
> Going one step further, it might be useful to have a "session fragment
> sequence number" if strict ordering is important (then we're not at the
> mercy of the client clock).

Shall we add this?
(In reply to Georg Fritzsche [:gfritzsche] [slow to respond until Jan 19] from comment #13)

> There is only one main ping type with different contents depending on user
> settings.
>  

> Bug 1120369 will allow specifying different data sets for the histograms.
> "FHR" will be the "base data set", "Telemetry" data will be the extended
> data set (opt-in on release, opt-out on pre-release).
> Bug 1122062 is about reviewing the data for being in the right set.

Thanks Georg. I hadn't seen those bugs yet


> The intention was to do it as per e.g. Date.toISOString()
> (YYYY-MM-DDTHH:mm:ss.sssZ).
> A question i can not answer is: is the ease in readability worth the
> increased payload size etc.? I'm assuming yes from previous comments on the
> environment.

Perfect. Yea, the readability it is worth it.


> > Going one step further, it might be useful to have a "session fragment
> > sequence number" if strict ordering is important (then we're not at the
> > mercy of the client clock).
> 
> Shall we add this?

It's a reasonable idea. If it's cheap to put in then I'd favor it. But my hunch is that the clock timestamp should be effective in the vast majority of cases (i.e., that even on systems with clock skew, the clock is likely skewed *consistently*), so I don't see it as a firm requirement if it's thought to be too costly in terms of effort to add it or characters added to the payload. I'm happy to be over-ridden by anyone who has strong feelings on the matter one way or the other.
Blocks: 1124212
(In reply to brendan c from comment #14)
> > > Going one step further, it might be useful to have a "session fragment
> > > sequence number" if strict ordering is important (then we're not at the
> > > mercy of the client clock).
> > 
> > Shall we add this?
> 
> It's a reasonable idea. If it's cheap to put in then I'd favor it. But my
> hunch is that the clock timestamp should be effective in the vast majority
> of cases (i.e., that even on systems with clock skew, the clock is likely
> skewed *consistently*), so I don't see it as a firm requirement if it's
> thought to be too costly in terms of effort to add it or characters added to
> the payload. I'm happy to be over-ridden by anyone who has strong feelings
> on the matter one way or the other.
For the record, I have no strong feelings - I just wasn't sure if strict ordering was important, or how we want to handle things like daylight savings time.
(In reply to Georg Fritzsche [:gfritzsche] from comment #3)
> Reviewing payload.info:
>  * previousBuildId - not needed due to updatePing?
>  * revision - keep
>  * reason - keep
>  * platformBuildId - should move to the environment data
>  * persona - should move to the environment
>  * redundant to environment:
>    * flashVersion
>    * addons - remove TelemetryPing.setAddOns()!

The environment now only contains active addon/plugin data. Bug 1124128 takes care the FHR addon provider migration, we should consider removing this when we get there.
(In reply to Mark Reid [:mreid] from comment #15)
> (In reply to brendan c from comment #14)
> > > > Going one step further, it might be useful to have a "session fragment
> > > > sequence number" if strict ordering is important (then we're not at the
> > > > mercy of the client clock).
> > > 
> > > Shall we add this?
> > 
> > It's a reasonable idea. If it's cheap to put in then I'd favor it. But my
> > hunch is that the clock timestamp should be effective in the vast majority
> > of cases (i.e., that even on systems with clock skew, the clock is likely
> > skewed *consistently*), so I don't see it as a firm requirement if it's
> > thought to be too costly in terms of effort to add it or characters added to
> > the payload. I'm happy to be over-ridden by anyone who has strong feelings
> > on the matter one way or the other.
> For the record, I have no strong feelings - I just wasn't sure if strict
> ordering was important, or how we want to handle things like daylight
> savings time.
Thought of another use case for this - ensuring that we actually have all the fragments of a session. Is that a compelling enough reason to add it?
How will this document be associated with the environment information?
The ping contains both the environment and the payload (bug 1120981).
The payload for the specific ping type "main" is the data discussed here. It contains mostly the "saved-session" equivalent data etc.
(In reply to Mark Reid [:mreid] from comment #17)
> (In reply to Mark Reid [:mreid] from comment #15)
> > (In reply to brendan c from comment #14)
> > > > > Going one step further, it might be useful to have a "session fragment
> > > > > sequence number" if strict ordering is important (then we're not at the
> > > > > mercy of the client clock).
> > > > 
> > > > Shall we add this?
> > > 
> > > It's a reasonable idea. If it's cheap to put in then I'd favor it. But my
> > > hunch is that the clock timestamp should be effective in the vast majority
> > > of cases (i.e., that even on systems with clock skew, the clock is likely
> > > skewed *consistently*), so I don't see it as a firm requirement if it's
> > > thought to be too costly in terms of effort to add it or characters added to
> > > the payload. I'm happy to be over-ridden by anyone who has strong feelings
> > > on the matter one way or the other.
> > For the record, I have no strong feelings - I just wasn't sure if strict
> > ordering was important, or how we want to handle things like daylight
> > savings time.
> Thought of another use case for this - ensuring that we actually have all
> the fragments of a session. Is that a compelling enough reason to add it?

I'd say it's easy enough to remove later, while we possibly can't reconstruct this later.
So lets add it.
(In reply to Georg Fritzsche [:gfritzsche] from comment #19)
> The ping contains both the environment and the payload (bug 1120981).
> The payload for the specific ping type "main" is the data discussed here. It
> contains mostly the "saved-session" equivalent data etc.

Ok, so this document replaces "payload: { ... }" in the common structure, and sets the "type" field accordingly.  Got it.
(In reply to Georg Fritzsche [:gfritzsche] from comment #20)
> > > > > > Going one step further, it might be useful to have a "session fragment
> > > > > > sequence number" if strict ordering is important (then we're not at the
> > > > > > mercy of the client clock).
> I'd say it's easy enough to remove later, while we possibly can't
> reconstruct this later.
> So lets add it.

I've thought about this more in the last couple days and I think something like it will be very helpful for data auditing. But particularly in the case of copied profiles/branching profiles, and other things like that, the most useful thing would actually be a pointer in the current session fragment ping to the id of the last session fragment ping. This should allow us to unambiguously reconstruct the chain of session fragments, but also to reconstruct the tree of session fragments in the case of branching profiles.

Also, if there is a "fragment number" it should be implemented as a counter that is cumulative across sessions, and not just local to a session. That is: if a particular session has four fragments, and the client has submitted 1000 session fragments previously, these new fragments should be labeled 1001-1004, not 1-4. This will give us one extra check to verify that the chain of session fragment pings we have collected is complete, which will be a nice extra assurance that we aren't suffering data loss.

(Verifying that session fragment chains look the way we expect them to is probably one of the first checks we'll want to run on the reconstructed data per clientId.)
Comment on attachment 8550303 [details]
Main ping format, v3

>{
>    version: 4,
>
>    info: {
>        previousBuildId: <string>,

What is this for?

Otherwise this looks fine.
Attachment #8550303 - Flags: feedback?(benjamin) → feedback+
It was introduced in bug 841028 to understand, in case of updates, what build users updated from.
I've been holding off on reviewing this patch while we debate the approach on the fhr-dev mailing list, but I think what you've proposed actually doesn't depend on the outcome of that discussion. I'll do a review tomorrow
1. In which section of a main ping will the system hardware information be reported? e.g. graphics card model. I don't see it in this propoosal or the common ping proposal in bug 1120981
2. We need to count session fragments starting with 0 at the beginning of any session. This will help with "stitching" fragments into sessions on the server. Session stitching will begin with the final fragment of a session and using that fragment we need to be able to know how many other (earlier) fragments we need to find from the same session.
A count starting from profile creation is not sufficient.
3. Using ISO timestamps for session start seems like a privacy leak. Do we really need that much precision?

ni? benjamin for opinion on question 3
Flags: needinfo?(benjamin)
(In reply to Vladan Djeric (:vladan) from comment #26)
> 1. In which section of a main ping will the system hardware information be
> reported? e.g. graphics card model. I don't see it in this propoosal or the
> common ping proposal in bug 1120981

That moved to the environment in bug 1069880.

> 2. We need to count session fragments starting with 0 at the beginning of
> any session. This will help with "stitching" fragments into sessions on the
> server. Session stitching will begin with the final fragment of a session
> and using that fragment we need to be able to know how many other (earlier)
> fragments we need to find from the same session.
> A count starting from profile creation is not sufficient.

Can you coordinate with Brendan what the best option is? See comment 12 to comment 22.
I think that comment 26 and 12 are asking for the same thing, which is a sequence number 0,1,2 for subsession pings with a larger session. We should do that.

Can you explain the concern about ISO timestamps? I don't think I understand the concern.
Flags: needinfo?(benjamin)
Flags: needinfo?(vdjeric)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #28)
> I think that comment 26 and 12 are asking for the same thing, which is a
> sequence number 0,1,2 for subsession pings with a larger session. We should
> do that.

Brendan has a different suggestion in comment 22. I'm not sure if we need both or which is more useful server-side.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #28)
> Can you explain the concern about ISO timestamps? I don't think I understand
> the concern.

We'd be collecting a record of a user's presence at their computer (every Firefox launch or environment split) with a second precision. Seems a bit off and unnecessary.

(In reply to Georg Fritzsche [:gfritzsche] from comment #29)
> Brendan has a different suggestion in comment 22. I'm not sure if we need
> both or which is more useful server-side.

I think there's value to FHR to having both. The persistent count across session will sometimes be incorrect (copied profiles, browser crashes, Firefox pruning old pings, etc) but I think that's why Brendan wants it in the first place.
Flags: needinfo?(vdjeric)
> > Brendan has a different suggestion in comment 22. I'm not sure if we need
> > both or which is more useful server-side.
> 
> I think there's value to FHR to having both. The persistent count across
> session will sometimes be incorrect (copied profiles, browser crashes,
> Firefox pruning old pings, etc) but I think that's why Brendan wants it in
> the first place.

FTR, the use case I'm talking about could be better served by adding a "previousPingId" field to the common ping ( https://bugzilla.mozilla.org/show_bug.cgi?id=1120981 ). This would allow unambiguous reconstruction of ping trees in cases of copying/vms/rollback etc. Having the data structured in this way would really help us get a handle on the frequency of these behaviors in our user population.

So I guess I would favor: session fragment numbering per session to serve Vladan's use case, and pointers to previous pings to serve the session chain/tree reconstruction and data auditing use case.
Comment on attachment 8550303 [details]
Main ping format, v3

r+ if main ping will contain the order of the fragment within a session, and sessions are linked to previous session(s) somehow. I'll defer to Ben on privacy implication of ISO dates.

Another thought.. for linking different sessions, a session counter per-profile would be more resilient than a prevSessionID.. cause the previous session could be missing and then the ordering of sessions is unclear. You could also do both prevSessionID and sessionCounter.
Attachment #8550303 - Flags: feedback?(vdjeric) → feedback+
(In reply to Vladan Djeric (:vladan) from comment #32)
> Another thought.. for linking different sessions, a session counter
> per-profile would be more resilient than a prevSessionID.. cause the
> previous session could be missing and then the ordering of sessions is
> unclear. You could also do both prevSessionID and sessionCounter.

prevSessionID is required for the case of profile branching. If it's just a counter and there are, say, ten distinct sessions that are number 100 in the session sequence, then it will not be clear which of those ten sessions a given session #101 follows from. To reconstruct the tree correctly, each node needs to be labeled.

Good point that a missing session will break the chain/tree... I guess I've naively been thinking that our updated submission policy (requiring an ack on submission) will prevent that, but there will always be anomalies and bugs, so having the absolute session fragment count would help to mitigate that problem (and will help us detect it more easily).

Re Vlad's point about timestamps-- I agree that this should be avoided if we don't need it, and the conversation has moved in the direction of a better solution to the original problem from comment 11. If we have:
-prevSessionId
-session fragment count per session
-session fragment count per client
then we can unambiguously reconstruct session chains and trees in the correct order, which was the reason I asked for the high res time stamp. I think that use case is much better served by these solutions, and date-level binning is sufficient for FHR data collection, so I am satisfied that we can drop the high res time stamp in favor of a simple date stamp.
(In reply to brendan c from comment #33)
> Re Vlad's point about timestamps-- I agree that this should be avoided if we
> don't need it, and the conversation has moved in the direction of a better
> solution to the original problem from comment 11. If we have:
> -prevSessionId
> -session fragment count per session
> -session fragment count per client
> then we can unambiguously reconstruct session chains and trees in the
> correct order, which was the reason I asked for the high res time stamp. I
> think that use case is much better served by these solutions, and date-level
> binning is sufficient for FHR data collection, so I am satisfied that we can
> drop the high res time stamp in favor of a simple date stamp.

We would presumably still want to know (sub)session lengths though, but for that a sessionDuration field or similar would be enough.
What precision do we need for that for metrics? Seconds?
Flags: needinfo?(bcolloran)
(In reply to brendan c from comment #33)
> solution to the original problem from comment 11. If we have:
> -prevSessionId
> -session fragment count per session
> -session fragment count per client
> then we can unambiguously reconstruct session chains and trees in the
> correct order

Ok, so we want all 3 added?
OS: Mac OS X → All
Hardware: x86 → All
Attached file Main ping format, v4 (obsolete) —
I think we're done here once the outstanding question is cleared up.
Attachment #8550303 - Attachment is obsolete: true
Attachment #8550303 - Flags: feedback?(rvitillo)
Attachment #8550303 - Flags: feedback?(mreid)
Attachment #8550303 - Flags: feedback?(bcolloran)
re comment 34 -- currently in FHR session duration is measured to seconds (clean/abortedActiveTime). Let's keep that level of resolution for each session fragment.

re comment 35 -- yea, i think all 3 of those and a simple datestamp should be ok for all of the needs i've heard.
Flags: needinfo?(bcolloran)
(In reply to brendan c from comment #37)
> re comment 35 -- yea, i think all 3 of those and a simple datestamp should
> be ok for all of the needs i've heard.

Can you expand on the datestamp?
Datestamp for what point in time?
I assume you mean with day resolution?
Flags: needinfo?(bcolloran)
Yes, like you say, I mean that we just need day resolution for the start of every session fragment, not second resolution. Sorry if I expressed it in a confusing way.

I think you already have it covered with--
sessionStartDate: <ISO Date>, // the date the session was started
Flags: needinfo?(bcolloran)
Attachment #8550303 - Flags: review+
(In reply to brendan c from comment #39)
> I think you already have it covered with--
> sessionStartDate: <ISO Date>, // the date the session was started

Ok, last question to be sure :)
Do you want the start date of the (browser) session? Or the start date of the subsession / session fragment?
Flags: needinfo?(bcolloran)
No prob :-)

We'll need the start date of each session fragment so that we can attribute activity that occurs during that session fragment (searches, active time,  etc) to a local calendar date. This will serve the kind of business questions FHR was designed to answer.

By using the other three mechanisms,
> -prevSessionId
> -session fragment count per session
> -session fragment count per client
we'll be able to reconstruct the entire browser session, so if we need the start date of the browser session for some reason we can take the date of the first session fragment in the reconstructed session, but this is not a typical requirement for FHR use cases.
Flags: needinfo?(bcolloran)
I'd like the session ping to have the start date of both the whole session and this subsession. That way when reconstructing a session, we have a bound on the date range to query.
Also, will it be clear from a session ping whether that ping is the last one of a session? That may be important for reconstruction purposes.
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #43)
> Also, will it be clear from a session ping whether that ping is the last one
> of a session? That may be important for reconstruction purposes.

We would tell this from the reason field. Right now we have "saved-session" as a reason which signifies shutdown.
We will have similar reasons (lets say e.g. "environment-change", "shutdown", ...) for the subsession payloads.
Attached file Main ping format, v5 (obsolete) —
Updated with:
* sessionStartDate
* subsessionStartDate
* more "reason" examples
Attachment #8557894 - Attachment is obsolete: true
Status: NEW → ASSIGNED
This seems finished then.
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Oops, I misstated my requirement-- for stitching together session fragments, we need session *fragment* ids and pointers between them, not just session IDs. So replace:

    sessionId: <uuid>,  // random session id, shared by subsessions
    previousSessionId: <uuid>, // session id of the previous session or null

with for example (I don't care about the labels):

    sessionId: <uuid>,  // random session id, shared by subsessions
    subsessionId: <uuid>,  // random subsession id
    previousSubsessionId: <uuid>, // subsession id of the previous subsession or null

The point here is to look for gaps and divergences in the set of actual pings sent by clients; for that we will need subsession granularity.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
(In reply to brendan c from comment #47)
> Oops, I misstated my requirement-- for stitching together session fragments,
> we need session *fragment* ids and pointers between them, not just session
> IDs. So replace:
> 
>     sessionId: <uuid>,  // random session id, shared by subsessions
>     previousSessionId: <uuid>, // session id of the previous session or null
> 
> with for example (I don't care about the labels):
> 
>     sessionId: <uuid>,  // random session id, shared by subsessions
>     subsessionId: <uuid>,  // random subsession id
>     previousSubsessionId: <uuid>, // subsession id of the previous
> subsession or null
> 
> The point here is to look for gaps and divergences in the set of actual
> pings sent by clients; for that we will need subsession granularity.

I don't really want to suggest adding even more fields, but won't we need both pointers to handle the case where the history diverges mid-session vs. the case where it diverges *between* sessions?  Is the mid-session divergence even important enough to add extra fields for? The next session will show the divergence anyways (two sessions with the same previous id).
If it diverges between sessions, we'll know because multiple session fragments in multiple distinct client sessions will point to a single final session fragment in a single client session. That is good enough for me.
I don't think I understand - wouldn't it just look like two independent sessions?
Ah, well the previousSubsessionId pointer must always point to the previous subsessionId even if that subsession occurred during a previous and independent client session.

So imagine we get the following three submissions from one clientId:

sessionId: 1
subsessionId: A
previousSubsessionId: Z

sessionId: 1
subsessionId: Q
previousSubsessionId: A

sessionId: 1
subsessionId: F
previousSubsessionId: A

We can see that the client histories diverged after subsession A. But the situation is the same if we get:

sessionId: 1
subsessionId: A
previousSubsessionId: Z

sessionId: 679
subsessionId: B
previousSubsessionId: A

sessionId: 42
subsessionId: F
previousSubsessionId: A

It is still clear that the client histories diverged after subsession A, even though subsession A was in session 1 and subsessions B and F are in subsequent sessions.

Indeed, I expect most of the history divergence to occur between sessions, but since subsessions are the smaller units from which full sessions are constructed, if we can reconstruct the complete subsession tree we get the corresponding session tree for free.

Does this make sense now? Lmk if not, I want to get this right...
Yes, makes perfect sense now.  When I read
>     previousSubsessionId: <uuid>, // subsession id of the previous subsession or null

I assumed that the value would be null for the first subsession of each session.  But really it should only be null for the first subsession of all time.

Can we change the doc string (and the behaviour!) to something like:
  previousSubsessionId: <uuid>, // subsession id of the previous subsession, even if it was in a different session
Attached file Main ping format, v6
Ok, done.
Attachment #8558104 - Attachment is obsolete: true
Status: REOPENED → RESOLVED
Closed: 5 years ago5 years ago
Resolution: --- → FIXED
Product: Firefox Health Report → Firefox Health Report Graveyard
You need to log in before you can comment on or make changes to this bug.