Closed Bug 1252337 Opened 4 years ago Closed 4 years ago

Decide whether to gzip core telemetry pings

Categories: Firefox for Android :: General (defect); Platform: All, Android
Status: RESOLVED WORKSFORME
People: Reporter: mcomella, Assigned: mcomella
References: Blocks 1 open bug

Find out how large typical payloads are before and after gzipping, to determine whether gzipping saves enough bytes to be worth it (or whether it actually makes them larger!).

In my local build, I added a log statement that prints `HttpEntity.getContentLength` before and after the payload is gzipped. The content is a byte array holding the encoded string, and the method returns `this.content.length`: the length of the byte array, i.e., the number of bytes. I got the following results:

Galaxy S4 ---
03-01 11:32:03.728 D/lol     (11841): non-gzip: 215
03-01 11:32:03.738 D/lol     (11841): gzip: 196

03-01 11:32:23.853 D/lol     (11841): non-gzip: 215
03-01 11:32:23.853 D/lol     (11841): gzip: 197

Nexus 9 ---
03-01 11:34:49.289 D/lol     ( 3424): non-gzip: 218
03-01 11:34:49.300 D/lol     ( 3424): gzip: 197

03-01 11:35:09.897 D/lol     ( 3424): non-gzip: 195
03-01 11:35:09.902 D/lol     ( 3424): gzip: 181
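
To repeat this kind of before/after measurement outside of Fennec, here is a minimal plain-Java sketch that gzips a payload in memory with `java.util.zip.GZIPOutputStream` and prints both byte counts, in the spirit of the log lines above. The class name `GzipSizeCheck` and the sample JSON are made up for illustration; they are not the real core ping or the client's upload code, so the printed numbers will not match the ones above.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical stand-alone helper (not Fennec code): compares a payload's size
// before and after gzip, like the debug log lines above.
final class GzipSizeCheck {
    private GzipSizeCheck() {}

    // Compresses the whole input in memory and returns the gzip bytes.
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            // Not expected with an in-memory stream; rethrown unchecked for convenience.
            throw new UncheckedIOException(e);
        }
        // Closing the GZIPOutputStream above wrote the final deflate block and the trailer.
        return bos.toByteArray();
    }

    public static void main(String[] args) {
        // Made-up JSON for illustration; a real core ping has its own fields and values.
        String payload = "{\"v\":1,\"clientId\":\"00000000-0000-0000-0000-000000000000\","
                + "\"seq\":3,\"locale\":\"en-US\",\"os\":\"Android\"}";
        byte[] raw = payload.getBytes(StandardCharsets.UTF_8);
        System.out.println("non-gzip: " + raw.length);
        System.out.println("gzip: " + gzip(raw).length);
    }
}
```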

The caveat is that a proper analysis would:
  * look at payload sizes for real data, not my test profiles
  * test over a large number of payloads (for statistical significance)

However, assuming my results are consistent with other payloads (there shouldn't be much variation in length in real payloads), we learned:
  * Gzipping saves about 18 bytes per payload on average
  * Notably, the payloads did not increase in size after compression
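
The ~18-byte figure is the mean of the four before/after pairs logged above (savings of 19, 18, 21 and 14 bytes). A small sketch of that arithmetic; the class and method names are illustrative only.

```java
// Illustrative only: averages the byte savings from the four logged samples above.
final class SavingsAverage {
    private SavingsAverage() {}

    // Each row is {uncompressed bytes, gzipped bytes} as printed by the debug log.
    static final int[][] SAMPLES = {
        {215, 196}, {215, 197}, // Galaxy S4
        {218, 197}, {195, 181}, // Nexus 9
    };

    static double meanSavedBytes(int[][] samples) {
        int total = 0;
        for (int[] s : samples) {
            total += s[0] - s[1];
        }
        return (double) total / samples.length;
    }

    // True if any sample got larger after compression.
    static boolean anyGrew(int[][] samples) {
        for (int[] s : samples) {
            if (s[1] > s[0]) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        System.out.println("mean saved: " + meanSavedBytes(SAMPLES));  // mean saved: 18.0
        System.out.println("any payload grew: " + anyGrew(SAMPLES));   // any payload grew: false
    }
}
```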

According to Alessio's analysis [1], clients send a mean of 9 pings per day, so gzipping saves about 162 bytes per day, 5,022 bytes per month (31 days, assuming uniform Firefox usage), and 59,130 bytes per year (365 days), or ~59 kB.

Alessio also provided ping size stats (I believe these include headers and are measured without compression): the mean is 679 bytes. That equates to 6,111 bytes per day, 189,441 bytes per month, and 2,230,515 bytes per year (~2.23 MB), vs. 2,171,385 bytes (~2.17 MB) per year with compression.

That means compression saves us about 2.65% (18 of 679 bytes per ping), which seems insignificant to me.
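
The projection above is plain arithmetic on three inputs: 18 bytes saved per ping (measured above), and 9 pings per client per day and a 679-byte mean ping (both from Alessio's analysis). This sketch reproduces it, assuming a 31-day month, a 365-day year and uniform daily usage; the class and constant names are illustrative.

```java
import java.util.Locale;

// Back-of-the-envelope projection of gzip savings per client, using the figures above.
final class PingSavingsProjection {
    private PingSavingsProjection() {}

    static final int BYTES_SAVED_PER_PING = 18; // mean saving from the local measurements
    static final int PINGS_PER_DAY = 9;         // mean pings per client per day (Alessio's analysis)
    static final int MEAN_PING_BYTES = 679;     // mean uncompressed ping size (Alessio's analysis)
    static final int DAYS_PER_MONTH = 31;       // assumes uniform daily usage
    static final int DAYS_PER_YEAR = 365;

    static long savedPerDay() {
        return (long) BYTES_SAVED_PER_PING * PINGS_PER_DAY;
    }

    static long sentPerDay() {
        return (long) MEAN_PING_BYTES * PINGS_PER_DAY;
    }

    // The relative saving depends only on bytes per ping, not on the time window.
    static double savingsPercent() {
        return 100.0 * BYTES_SAVED_PER_PING / MEAN_PING_BYTES;
    }

    public static void main(String[] args) {
        System.out.println("saved per day:   " + savedPerDay());                  // 162
        System.out.println("saved per month: " + savedPerDay() * DAYS_PER_MONTH); // 5022
        System.out.println("saved per year:  " + savedPerDay() * DAYS_PER_YEAR);  // 59130
        System.out.println("sent per year:   " + sentPerDay() * DAYS_PER_YEAR);   // 2230515
        System.out.printf(Locale.ROOT, "relative saving: %.2f%%%n", savingsPercent()); // 2.65%
    }
}
```

Because the percentage is a per-ping ratio, changing the month or year length moves the absolute totals but not the 2.65% figure.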

The downsides of keeping compression in are:
  * CPU time
  * Device memory (we allocate a byte buffer the same size as the uncompressed payload, plus some objects)
  * Added code complexity, which could introduce bugs (negligible, because we need to do this for main pings anyway)

I haven't measured CPU time, but compression does happen in the background during the application's onStart (when the application is starting or reappearing), so it could contribute to UI stutter; it hasn't caused problems on my devices yet.
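
If someone does want a first number for the CPU cost, a rough starting point is timing the in-memory gzip in a loop. This is a sketch only: it measures wall-clock time with `System.nanoTime()` on whatever JVM runs it, which says little about a phone's CPU or Android's runtime, and the all-zero 200-byte payload is a stand-in for a real ping. Profiling on a device would be the real measurement. `GzipTiming` and its methods are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Rough wall-clock timing of an in-memory gzip (hypothetical helper, not Fennec code).
final class GzipTiming {
    private GzipTiming() {}

    // Each result length is stored here to make it harder for the JIT to drop the calls.
    static volatile int sink;

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bos.toByteArray();
    }

    // Mean nanoseconds per gzip call, measured after `warmup` untimed calls.
    static long meanNanosPerGzip(byte[] payload, int warmup, int iterations) {
        if (iterations <= 0) {
            throw new IllegalArgumentException("iterations must be positive");
        }
        for (int i = 0; i < warmup; i++) {
            sink = gzip(payload).length;
        }
        long start = System.nanoTime();
        for (int i = 0; i < iterations; i++) {
            sink = gzip(payload).length;
        }
        return (System.nanoTime() - start) / iterations;
    }

    public static void main(String[] args) {
        byte[] payload = new byte[200]; // stand-in for a ~200-byte ping (all zero bytes)
        System.out.println("mean ns per gzip: " + meanNanosPerGzip(payload, 1_000, 10_000));
    }
}
```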

Memory use is probably negligible: Android applications suffer more from interrupting GC cycles than from absolute memory use (extreme memory use notwithstanding), and we allocate and free this buffer only once, not repeatedly.

As such, I don't think either leaving it in or taking it out is clearly better. I say we use engineering/test time as our guide and change nothing. (Then again, we still need to verify that the pings appear correctly on the server: bug 1244861.)

[1]: https://gist.github.com/Dexterp37/e5176ac397b5ea37ad17

The above analysis assumes the ping contents stay the same.
Would adding session times, a few entries in "experiments", and maybe future fields like a creation date change the impact?

(In reply to Georg Fritzsche [:gfritzsche] from comment #2)
> The above analysis assumes the ping contents stay the same.
> Would adding session times, a few entries in "experiments", and maybe
> future fields like a creation date change the impact?

My naive expectation is that the benefits of compression will only grow as the data size increases. (Context: my assumption, which doesn't have much evidence behind it, is that below some small input size gzip makes the result larger rather than smaller, but above that size the gzipped output generally won't be larger than the input.) As such, I'd say let's leave the compression in.
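
One way to poke at this assumption is to find the point at which gzip starts saving bytes for a given kind of input. Every gzip stream carries a fixed 10-byte header and 8-byte trailer (RFC 1952) on top of the deflate data, so an input of 18 bytes or fewer can never shrink. The "never larger above some size" part only holds for compressible input such as JSON text: random or already-compressed bytes can come out larger at any size. The sketch repeats a made-up JSON-like fragment until the gzipped form is smaller; `GzipBreakEven` and the fragment are hypothetical.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: how much repeated input gzip needs before it saves bytes.
final class GzipBreakEven {
    private GzipBreakEven() {}

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bos.toByteArray();
    }

    // Smallest repeat count n (1..maxRepeats) for which n copies of `fragment`
    // gzip to strictly fewer bytes than their raw UTF-8 encoding; -1 if none do.
    static int smallestWinningRepeat(String fragment, int maxRepeats) {
        StringBuilder sb = new StringBuilder();
        for (int n = 1; n <= maxRepeats; n++) {
            sb.append(fragment);
            byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);
            if (gzip(raw).length < raw.length) {
                return n;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // Even empty input pays the fixed gzip framing (header + trailer).
        System.out.println("gzip(empty) = " + gzip(new byte[0]).length + " bytes");
        String fragment = "{\"key\":\"value\"},"; // made-up JSON-like fragment
        System.out.println("first winning repeat: " + smallestWinningRepeat(fragment, 100));
    }
}
```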

If we wanted to be more sure about this, we could:
  * Gzip on the client, compare the uncompressed and gzipped versions, and send whichever is smaller.
  * Run a regular analysis (once per release?) on the server-side data: how large were the compressed payloads when they were received, and how large are they after decompression? This could be useful in general, too, and be a good solution across 
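
The first option could look roughly like this sketch: compress, compare, and fall back to the raw bytes when gzip does not win. It assumes the server accepts both gzipped and uncompressed submissions and tells them apart by a `Content-Encoding: gzip` header; that is an assumption, not something established in this bug. `SmallerEncoding` and `Choice` are hypothetical names.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch of the "gzip, compare, send the smaller one" idea.
final class SmallerEncoding {
    private SmallerEncoding() {}

    // The chosen body plus whether it is gzipped; an uploader would add a
    // "Content-Encoding: gzip" header only when gzipped is true.
    static final class Choice {
        final byte[] body;
        final boolean gzipped;

        Choice(byte[] body, boolean gzipped) {
            this.body = body;
            this.gzipped = gzipped;
        }
    }

    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream bos = new ByteArrayOutputStream(input.length);
        try (GZIPOutputStream gz = new GZIPOutputStream(bos)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e); // not expected with an in-memory stream
        }
        return bos.toByteArray();
    }

    // Keeps the raw bytes on a tie, since they need no decoding on the server.
    static Choice choose(byte[] raw) {
        byte[] compressed = gzip(raw);
        return compressed.length < raw.length
                ? new Choice(compressed, true)
                : new Choice(raw, false);
    }
}
```

The trade-off is that the client always pays for one compression pass, even when it ends up sending the raw bytes.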

Given my naive expectation, I don't think it's worth my time to do this on the client, though I'm not sure how expensive it would be to do these analyses on the server.

Georg, do you have opinions on how to proceed here?
Flags: needinfo?(gfritzsche)

I don't think we should put too much time into this; I just wouldn't discard the compression based on the current minimal ping sizes.

Is the performance/memory impact something to worry about?
If not, then maybe we can just look at the size of slightly larger pings (with all the data we want in them, plus experiments data) and decide based on the compression ratio there.
Flags: needinfo?(gfritzsche)
(In reply to Georg Fritzsche [:gfritzsche] from comment #4)
> Is the performance/memory impact something to worry about?

* Memory: I'm not really concerned. In the current implementation, we allocate a buffer the same size as the payload (~200 bytes). For comparison, after loading a simple HTML page, my heap holds 693 byte-array allocations with a median size of 168 B, about the size of our payloads. That seems insignificant, particularly compared to some of our thumbnail allocations. If we needed to improve this, we could try adding chunked compression: bug 1249358.

* Performance: this is harder to measure. Compression runs on a lower-priority background thread, so on modern devices it shouldn't affect the UI much. If we start seeing performance issues when restarting Fennec, we can look into it then.
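
Both mitigations mentioned above can be combined in a rough plain-Java sketch: compression runs on a single minimum-priority daemon thread, and the payload is streamed through `GZIPOutputStream` with a small fixed buffer into the destination stream instead of being compressed into a separate payload-sized array. This is neither the change proposed in bug 1249358 nor Fennec's upload code; the class name, thread name and buffer size are made up. On Android, background work is more commonly de-prioritized with `android.os.Process.setThreadPriority(Process.THREAD_PRIORITY_BACKGROUND)` than with `Thread.setPriority`; the Java-only version is used here so the sketch runs on a desktop JVM.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

// Hypothetical sketch: low-priority background compression that streams through a
// fixed-size buffer instead of allocating a payload-sized output array.
final class BackgroundStreamingGzip {
    private BackgroundStreamingGzip() {}

    // A single daemon worker thread running at Java's minimum priority.
    static ExecutorService newLowPriorityExecutor() {
        return Executors.newSingleThreadExecutor(r -> {
            Thread t = new Thread(r, "ping-compress");
            t.setPriority(Thread.MIN_PRIORITY);
            t.setDaemon(true);
            return t;
        });
    }

    // Copies `in` through gzip into `out`, bufferSize bytes at a time.
    // Closing the gzip stream writes the trailer and also closes `out`.
    static void gzipStream(InputStream in, OutputStream out, int bufferSize) throws IOException {
        try (GZIPOutputStream gz = new GZIPOutputStream(out, bufferSize)) {
            byte[] buf = new byte[bufferSize];
            int n;
            while ((n = in.read(buf)) != -1) {
                gz.write(buf, 0, n);
            }
        }
    }

    // Runs gzipStream on the given executor; I/O errors surface through the future.
    static CompletableFuture<Void> gzipInBackground(ExecutorService executor,
            InputStream in, OutputStream out, int bufferSize) {
        return CompletableFuture.runAsync(() -> {
            try {
                gzipStream(in, out, bufferSize);
            } catch (IOException e) {
                throw new UncheckedIOException(e);
            }
        }, executor);
    }

    // Helper for checking results: inflates a complete gzip byte array.
    static byte[] gunzip(byte[] compressed) {
        try (GZIPInputStream in = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            byte[] buf = new byte[512];
            int n;
            while ((n = in.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
            return out.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

If the destination were an `HttpURLConnection` request body, the connection would also need `setChunkedStreamingMode` (or a fixed-length streaming mode); otherwise it buffers the complete request body in memory itself, which would undo the saving.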

To avoid spending more time, I'm calling this case closed.
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WORKSFORME

(To be explicit: we're leaving in the gzip implementation that landed in bug 1244861.)