Closed Bug 1244861 Opened 8 years ago Closed 8 years ago

gzip telemetry pings

Categories

(Firefox for Android Graveyard :: General, defect)

All
Android
defect
Not set
normal

Tracking

(firefox46 affected, firefox47 fixed, fennec47+)

RESOLVED FIXED
Firefox 47
Tracking Status
firefox46 --- affected
firefox47 --- fixed
fennec 47+ ---

People

(Reporter: mcomella, Assigned: mcomella)

References

Details

Attachments

(1 file)

I didn't do anything to compress the telemetry pings but apparently gzip json is expected by the server (though raw json can be read by the server and is actionable).

I was thinking maybe BaseResource just took care of this for me but the content-type of my ping (when stepping through the debugger) was "application/json" so it does not appear that way.

We should try to land this before we push it to release to save our users' data!
(In reply to Michael Comella (:mcomella) from comment #0)
> I was thinking maybe BaseResource just took care of this for me but the
> content-type of my ping (when stepping through the debugger) was
> "application/json" so it does not appear that way.

I don't know how the telemetry server is implemented, but shouldn't the content-type stay the same and just "Content-Encoding" set to "gzip" if the content is send compressed?
(In reply to Sebastian Kaspari (:sebastian) from comment #1)
> I don't know how the telemetry server is implemented, but shouldn't the
> content-type stay the same and just "Content-Encoding" set to "gzip" if the
> content is send compressed?

After doing some research, that's what I'm concluding – thanks.

It looks like this functionality is built into httpclientandroidlib as GzipCompressingEntity, but that we don't call it from anywhere:
  https://mxr.mozilla.org/mozilla-central/search?string=gzipcompressing&find=mobile%2Fandroid&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
  https://mxr.mozilla.org/mozilla-central/search?string=gzipcompress&find=mobile%2Fandroid&findi=&filter=^[^\0]*%24&hitlimit=&tree=mozilla-central
Just pass whatever entity type you want into `post`. You should just be able to set the Content-Encoding header, then wrap:

  resource.post(GzipCompressingEntity(jsonEntity(obj));
It's possible that the client will be smart enough to set the encoding header for you, too.
I did this, the one caveat is the GzipCompressingEntity returns -1 for the content length, causing the content-length header to be absent. This breaks on the gzip test server [1] but I haven't heard back yet if this is a problem on the telemetry servers proper.

[1]: https://github.com/vdjeric/gzipServer
Mark, does the server expect a 'content-length' header? I notice my http library does not provide a content-length when writing gzip'd content and the test gzip server (https://github.com/vdjeric/gzipServer) doesn't appear to handle this case.
Flags: needinfo?(mreid)
This build does not specify the 'Content-Length' header item as it is not built
into GzipCompressingEntity.

If this header is required by the telemetry server, we can create our entity to
get the content length and add it manually, however.

Review commit: https://reviewboard.mozilla.org/r/33133/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/33133/
Status: NEW → ASSIGNED
tracking-fennec: --- → 46+
Attachment #8714534 - Flags: review+
Comment on attachment 8714534 [details]
MozReview Request: Bug 1244861 - Gzip outgoing telemetry pings. r=rnewman

https://reviewboard.mozilla.org/r/33133/#review29881

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/net/BaseResource.java:510
(Diff revision 1)
> +    HttpEntity entity = jsonEntity(o);

I'd much rather have `shouldCompress` be a flag on the resource itself, and have `put`/`post` apply wrapping automatically further down the stack:

```
  BaseResource r = new BaseResource("http://...");
  r.shouldCompress = true;
  r.postBlocking(jsonObj);
```
  
But if you're horribly pressed for time, this will suffice.
(In reply to Michael Comella (:mcomella) from comment #6)
> Mark, does the server expect a 'content-length' header? I notice my http
> library does not provide a content-length when writing gzip'd content and
> the test gzip server (https://github.com/vdjeric/gzipServer) doesn't appear
> to handle this case.

via irc:

15:55 <~whd> mcomella: I don't see anywhere that we actually use the content-length header
15:56 <mcomella> ok, ty
15:56 <~whd> we keep it around in landfill / error messages probably for debug
15:56 <mcomella> whd: Once we get pings to the server, I'll switch over to gzip'd pings and make sure everything is still working properly
---

So I'll hold on this until bug 1244297 is completed.
Depends on: 1244297
Flags: needinfo?(mreid)
We should try to land this with the initial core ping implementation (bug 1205835).
tracking-fennec: 46+ → 45+
Alas, I tried to upload my updated patch to the telemetry server but I got an HTTP 411 error: length required.

Time to implement the Content-Length header.
This is not strictly necessary for bug 1205835 – it's just an optimization.
No longer blocks: 1205835
Depends on: 1205835
Comment on attachment 8714534 [details]
MozReview Request: Bug 1244861 - Gzip outgoing telemetry pings. r=rnewman

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/33133/diff/1-2/
Comment on attachment 8714534 [details]
MozReview Request: Bug 1244861 - Gzip outgoing telemetry pings. r=rnewman

Major revision to the previous patch so r?.
Attachment #8714534 - Flags: review+ → review?(rnewman)
Actually, the pings are small so I don't think we need to rush this – let's land in Aurora and let it ride the trains.
tracking-fennec: 45+ → 46+
Comment on attachment 8714534 [details]
MozReview Request: Bug 1244861 - Gzip outgoing telemetry pings. r=rnewman

https://reviewboard.mozilla.org/r/33133/#review31789

::: mobile/android/base/java/org/mozilla/gecko/telemetry/TelemetryUploadService.java:199
(Diff revision 2)
> +        resource.setShouldCompressUploadedEntity(true, false); // Telemetry servers don't support chunking.

```
resource.setShouldCompressBody(true);
resource.setShouldChunkUploads(false);
```

::: mobile/android/services/src/main/java/org/mozilla/gecko/sync/net/GzipNonChunkedCompressingEntity.java:90
(Diff revision 2)
> +                (int) unzippedContentLength : Integer.MAX_VALUE;

`MAX_VALUE` is 2,147,483,647 -- 2GB.

I would suggest checking the result of `getContentLength` in `initBuffer`, and throw an exception if it's beyond a reasonable range.

There is no way we are going to be able to allocate a >2GB output stream and a >2GB byte array, so we shouldn't see if we'll succeed with a mere 2GB stream!

Then you can just use a cast and the constructor directly in `initBuffer`; no need for this method.
Attachment #8714534 - Flags: review?(rnewman) → review+
https://reviewboard.mozilla.org/r/33133/#review31789

> ```
> resource.setShouldCompressBody(true);
> resource.setShouldChunkUploads(false);
> ```

One caveat is that chunking is only supported with gzip compression, which is why I combined the messages in the first place. However, `AbstractHttpEntity` has a method to set chunking that claims "chunking is only a hint" so I'm fine having a separate method.

> `MAX_VALUE` is 2,147,483,647 -- 2GB.
> 
> I would suggest checking the result of `getContentLength` in `initBuffer`, and throw an exception if it's beyond a reasonable range.
> 
> There is no way we are going to be able to allocate a >2GB output stream and a >2GB byte array, so we shouldn't see if we'll succeed with a mere 2GB stream!
> 
> Then you can just use a cast and the constructor directly in `initBuffer`; no need for this method.

Makes sense. I don't have great judgement for the maximum size upload `BaseResource` will try to make but I capped it at 10 MB.
NI self to request verification that we still receive valid telemetry data after this change.
Flags: needinfo?(michael.l.comella)
Transfer-Encoding: chunked is mandated for HTTP/1.1. If a server doesn't support chunked transfer in 2016, WTF.

Digging deeper, the test server at https://github.com/vdjeric/gzipServer/blob/master/gzipServer.py assumes there is a Content-Length request header, which means it isn't a fully compliant HTTP/1.1 server. https://tools.ietf.org/html/rfc7230#section-3.3.2 says that Content-Length *must* not occur if Transfer-Encoding is set. I /think/ Python's built-in HTTP server transparently supports Transfer-Encoding - the Telemetry test server is just making poor assumptions about the Content-Length header existing.

If you are expecting to send large'ish HTTP bodies, you should consider using chunked transfer because it will almost certainly use less memory. Without chunked transfer, you need to know the full size of the thing you are sending before starting to send it. That usually means reading a file size from disk or in your case buffering the gzipped data in memory.
https://hg.mozilla.org/mozilla-central/rev/f43052add92f
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 47
Finkle, can you verify data from the latest Nightly build is correctly appearing on the telemetry servers?

(In reply to Gregory Szorc [:gps] from comment #20)
> Transfer-Encoding: chunked is mandated for HTTP/1.1. If a server doesn't
> support chunked transfer in 2016, WTF.
> 
> Digging deeper, the test server at

My comment 11 refers to the actual telemetry server, not the test server:

(In reply to Michael Comella (:mcomella) from comment #11)
> Alas, I tried to upload [with gzip compression in] my updated patch to the
> telemetry server but I got an HTTP 411 error: length required.
> 
> Time to implement the Content-Length header.

There are layers of abstraction so I have not verified the exact headers sent – perhaps TRANSFER_ENCODING wasn't properly set? I called out to httpclientandroidlib to do the compression so you'd think this would just work but perhaps not.

> If you are expecting to send large'ish HTTP bodies,

What would be considered large'ish? At least to start, we're trying to keep these telemetry pings as small as possible.

This sounds like it could be useful to investigate when our telemetry pings get larger – I filed bug 1249358.
Flags: needinfo?(michael.l.comella) → needinfo?(mark.finkle)
NI self to uplift once verified.
Flags: needinfo?(michael.l.comella)
I started to try to figure out the telemetry system in order to verify my results but I spoke with ckarlof who mentioned that we're intending on making the pipeline easier to use and more self-service so I decided that figuring it out wasn't a worthwhile investment (especially considering how expensive context switching is).

Alessio, let me know if you don't have time for this (or you're not the right person for the job!), but can you verify we're still receiving data on nightly builds since this landed (comment 21)? Let me know if you need more specifics for the analysis (e.g. build ID).

Clear NI because there's no time left to uplift.
Flags: needinfo?(michael.l.comella) → needinfo?(alessio.placitelli)
Actually, we can uplift to 46 – NI self.
Flags: needinfo?(michael.l.comella)
(In reply to Michael Comella (:mcomella) from comment #24)
> Alessio, let me know if you don't have time for this (or you're not the
> right person for the job!), but can you verify we're still receiving data on
> nightly builds since this landed (comment 21)? Let me know if you need more
> specifics for the analysis (e.g. build ID).

I'm the right person for the job ;-) It looks like the pings are flowing in: https://gist.github.com/Dexterp37/c6805569d8cea2fa1da5

This notebook:

- checks that the Fennec Nightly builds generated from the 19th of February submit core data;
- breaks the submissions by build id and submission date

Michael, let me know if that answers your question (and if the build ids are correct).

Georg, would you mind reviewing the notebook?
Flags: needinfo?(alessio.placitelli) → needinfo?(gfritzsche)
This looks reasonable.
Note that for [27] and [32], you don't need key-value pair RDDs (you never use the value part).
Flags: needinfo?(gfritzsche)
(In reply to Michael Comella (:mcomella) from comment #24)
> I started to try to figure out the telemetry system in order to verify my
> results but I spoke with ckarlof who mentioned that we're intending on
> making the pipeline easier to use and more self-service so I decided that
> figuring it out wasn't a worthwhile investment (especially considering how
> expensive context switching is).

For what it's worth, the Spark analysis jobs are self-serve right now.
We will have other data analysis tools like SQL-query-based access coming up more self-serve in the future, but those will be looking at specific exported data sets.
(In reply to Alessio Placitelli [:Dexter] from comment #26)
> Michael, let me know if that answers your question (and if the build ids are
> correct).

Yep! Thanks Alessio!
Flags: needinfo?(michael.l.comella)
Flags: needinfo?(mark.finkle)
Comment on attachment 8714534 [details]
MozReview Request: Bug 1244861 - Gzip outgoing telemetry pings. r=rnewman

Approval Request Comment
[Feature/regressing bug #]: bug 1205835 landed telemetry v1 w/o gzip compression
[User impact if declined]: Users will be uploading ungzipped payloads. The change in payload size isn't too extreme (see bug 1252337) – several bytes per payload – but it can be useful as we add new fields to our pings.
[Describe test coverage new/current, TreeHerder]: Tested locally, verified server-side
[Risks and why]: Low. We add some boolean flags to see if we want to gzip and if we do, we run a wrap the existing element in a gzip entity I wrote which leans on the existing gzip implementation to do the heavy lifting. Essentially, I set up a buffer before doing any of the other operations, which involve calling out to the existing gzip implementation, all of which is fairly straightforward.
[String/UUID change made/needed]: None
Attachment #8714534 - Flags: approval-mozilla-aurora?
To answer your questions from IRC, my analysis [1] indicated we'd save users on average 57kB of data (which is probably over the network) a year, which is not significant versus the 2.05MB they'd be sending for the data overall. So we're not gaining a direct significant benefit by uplifting this, but it's a nice thing to do for users. Additionally, if we uplift any other fields (e.g. maybe bug 1249288?), this benefit will only increase.

[1]: https://bugzilla.mozilla.org/show_bug.cgi?id=1252337#c1
Comment on attachment 8714534 [details]
MozReview Request: Bug 1244861 - Gzip outgoing telemetry pings. r=rnewman

This is the last day for uplifts before the merge, and I would like to decrease risk for 46 beta 1 as best possible. Let's let this ride with 47 since there isn't a clear user benefit yet.
Attachment #8714534 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora-
To be explicit, I supported this decision over irc.
tracking-fennec: 46+ → 47+
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: