Closed Bug 885650 Opened 7 years ago Closed 2 years ago

Address session data bloat


(Firefox Health Report Graveyard :: Client: Android, defect, P1)



(Not tracked)



(Reporter: rnewman, Unassigned)



I have FHR data for my primary phone recorded since June 13.

My total current payload size -- after a week! -- is 15KB uncompressed in compact JSON. That gzips down to 3KB, which is great.

In those 8 days my phone has recorded 405 sessions. As a moderately heavy user, then, in 180 days I can expect about nine thousand sessions to be recorded.

At about 18 bytes per session payload, this is 164KB of uploaded data for raw, undecorated sessions alone, and a corresponding DB volume on the device.

(It'll probably be more, because some sessions have startup time data and are thus twice the size.)

This would be computed and uploaded each day. Assuming good 80% compression, each month we would use about 1MB of the user's data allowance _just to upload session data_ (not whole payloads) over and over again. That doesn't seem reasonable.
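As a back-of-envelope check, here is a sketch of that arithmetic using only the estimates quoted above (~405 sessions per 8 days, ~18 bytes per session, a 180-day lookback, 80% compression); none of these numbers are measurements beyond what this comment states:

```python
# Back-of-envelope check of the upload cost described above.
# All figures are the estimates from this bug comment, not measurements.
SESSIONS_PER_DAY = 405 / 8          # ~50 sessions/day observed
DAYS_RETAINED = 180                 # six-month lookback
BYTES_PER_SESSION = 18              # compact JSON, undecorated session

daily_payload = SESSIONS_PER_DAY * DAYS_RETAINED * BYTES_PER_SESSION
compressed = daily_payload * 0.20   # assume gzip achieves 80% compression
monthly_upload = compressed * 30    # the payload is re-uploaded every day

print(f"raw daily session payload: {daily_payload / 1024:.0f} KB")
print(f"monthly upload (compressed): {monthly_upload / 1024 / 1024:.2f} MB")
```

This lands at roughly 160 KB of raw session data per daily payload and just under 1 MB of compressed upload per month, consistent with the figures above.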

I propose aggregating session data beyond a short lookback period. We could easily roll up on a daily basis to, e.g.:

  For each termination type:
  * Count
  * Mean duration
  * Median duration
  * Count of cold boots, and corresponding aggregates for each startup time.

We can save space by, e.g., omitting the total (= count * mean).
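As an illustration only (not actual FHR client code, which is Java on Android), a minimal Python sketch of the proposed roll-up: group sessions by termination reason, then keep just count, mean, and median of the durations:

```python
# Illustrative sketch of the proposed daily roll-up (not FHR code):
# group sessions by termination reason ("r") and reduce the duration
# ("d") values to count / mean / median.
from statistics import mean, median
from collections import defaultdict

def roll_up(sessions):
    """sessions: list of dicts like {"r": "P", "d": 74, "sj": 431}."""
    by_reason = defaultdict(list)
    for s in sessions:
        by_reason[s["r"]].append(s["d"])
    return {
        reason: {"n": len(ds), "m": round(mean(ds)), "mdn": round(median(ds))}
        for reason, ds in by_reason.items()
    }

sessions = [{"r": "P", "d": 2}, {"r": "P", "d": 19},
            {"r": "P", "d": 25}, {"r": "P", "d": 74}]
print(roll_up(sessions))
```

The same reduction would apply to the startup-time fields; adding a standard deviation to each summary is a one-line change.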

Instead of:

"normal": [
"r": "P",
"d": 2,
"sj": 518
"r": "P",
"d": 19
"r": "P",
"d": 25
"r": "P",
"d": 74
"r": "P",
"d": 60
"r": "P",
"sg": 2272,
"d": 101,
"sj": 431
... and so on

You'd get:

"P": {
  "n": 22,       // 22 sessions
  "m": 74,       // Mean 74-second duration
  "mdn": 72      // Median
},
"sg": {
  "n": 3,        // 3 Gecko starts
  "m": 2412,     // Mean 2412 msec
  "mdn": 2272    // Median
},
"sj": {
  "n": 3,        // 3 Java activity starts
  "m": 480,
  "mdn": 444
}

The principal saving here is to turn those days with, e.g., 22 sessions -- 22 DB rows and 500 bytes plus row overheads -- into an essentially fixed-size daily 100-byte payload.

I'm open to additional suggestions (e.g., discarding old data altogether?), but I want to make sure we address this within the Firefox 25 release cycle -- that's still 12 weeks of data that users will be building up with no way to discard.
Flags: needinfo?(deinspanjer)
Priority: -- → P1
I've asked Saptarshi to chip in with some advice here.  At a bare minimum, we'd want to add stddev to the summary metrics.  Mean alone would not be enough.

Note that if we summarize, we will lose the ability to perform certain types of analysis, and that cost/benefit calculation is what I want to get input from Saptarshi on.
Flags: needinfo?(deinspanjer) → needinfo?(sguha)
Not sure what r,d,sg,sj mean, but were we to ever wish to see if
values of d and sg are related, aggregating would lose that ability. For example, suppose we had several documents like this (low values of sj associated with low values of d, and sj increases with d)

d: 10,20,68,68,70,80,90,10,20
sj: 103,120,250,270,280,290,110,103

We would not pick this up. 

I'm not sure if this is really required, so it depends on what sort of questions need to be answered.

However, at the very least rnewman's suggested format should have the standard deviation added to it.
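To illustrate the point above with made-up numbers (none of these values are real FHR data): raw per-session pairs let us estimate whether d and sj move together, while the summarized form keeps only per-field marginal statistics, from which that pairing cannot be recovered.

```python
# Illustrative only: the numbers below are invented, not FHR data.
# Raw per-session records support a correlation estimate between
# session duration (d) and Java startup time (sj); the proposed
# summary keeps only marginal aggregates and discards the pairing.
from statistics import mean, median, pstdev

def pearson(xs, ys):
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)
    return cov / (pstdev(xs) * pstdev(ys))

d  = [10, 20, 68, 70, 80, 90]
sj = [103, 120, 250, 280, 290, 310]

print(f"pearson r over raw pairs: {pearson(d, sj):.2f}")  # strong association

# The aggregated form retains only marginal statistics:
summary = {"d":  {"m": mean(d),  "mdn": median(d)},
           "sj": {"m": mean(sj), "mdn": median(sj)}}
print(summary)  # no way to recover the correlation from this
```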
Flags: needinfo?(sguha)
I should perhaps rephrase to give you some more flexibility in the format: given the choice between "we will automatically truncate all data older than M months/N hundred entries", and some form of session data aggregation that reduces the number of records kept by some significant amount, what kind of aggregation or truncation would you like?

There is no justification strong enough to keep ten thousand session entries over six months on a mobile device, so the only questions are (a) whether you care, or if we can just throw old data away, and (b) if you care, what aggregation process (and over what granularity) is appropriate?
(In reply to "Saptarshi Guha[:joy]" from comment #2)
> Not sure what r,d,sg,sj mean

r = reason for session ending. "P" = Pause.
sg = Gecko startup time (msec)
sj = Java startup time (msec)
d = session duration (sec)
> over six months on a mobile device, so the only questions are (a) whether
> you care, or if we can just throw old data away, and (b) if you care, what
> aggregation process (and over what granularity) is appropriate?

I have to agree! Again, it depends on the question.
I guess for mobile usage we would want
- days used and number of uses per day (whatever 'used' means)
- Q: for mobile do we need 6 months? The 6-month window comes from cases in which the behavior happens less frequently. I imagine that a mobile device and its apps are used much more frequently.
    - so, maybe 'day' usage, i.e. used on Day1 (n1 times, total usage t1), used on Day2 (n2 times, total usage t2), like you suggested
- and startup times from the last S sessions; S need not be all sessions from the last 180 days ...

I think your question is the most important one and will help us formulate questions quicker:

Q: What do we care about?
OK, so let's summarize some questions for jjensen:

* Do we want the same 6-month lookback for mobile as for desktop?
* Should that lookback be the same for every data type?
* What questions are we trying to answer for session data? Would it be acceptable to, say, keep raw session timing data for the past month, and before that keep daily counters for total engagement and some startup time metrics?

Bear in mind (as I'm sure you know) that we can do server-side analysis as we go, so it's not like we can't figure out trends for startup time improvements over longer-term scales.
Flags: needinfo?(jjensen)
Hi all, missed this earlier -- will confer with Saptarshi/others tomorrow and comment.
Flags: needinfo?(jjensen)
(In reply to John Jensen from comment #7)
> Hi all, missed this earlier -- will confer with Saptarshi/others tomorrow
> and comment.

Any update on this, John?
See Also: → 888052
This component is deprecated. Resolving this bug as incomplete, per :sdaswani.
Closed: 2 years ago
Resolution: --- → INCOMPLETE
Product: Firefox Health Report → Firefox Health Report Graveyard