Closed Bug 807134 Opened 7 years ago Closed 7 years ago

Don't use Content-Encoding for FHR upload

Categories

(Mozilla Metrics :: Metrics Data Ping, defect)

defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED
Moved to JIRA

People

(Reporter: gps, Unassigned)

References

Details

(Whiteboard: [JIRA METRICS-1264])

I was talking with Brian Smith (Necko peer) about the FHR upload, and the subject of how to encode the data came up. After some discussion, we think it would be best for the encoding of the HTTP body to be communicated strictly via Content-Type rather than Content-Encoding.

The problem with Content-Encoding is that some HTTP stacks may treat it specially. Specifically, if we use Content-Encoding, an intermediary HTTP agent (such as a load balancer in Mozilla's network) may transparently decode the entity body. With tens of megabytes per second of compressed FHR data, that would likely be the wrong decision because of the hardware resources required to decompress the data.

If we use Content-Type to declare compression, only the application on the origin server (Bagheera for FHR) should be involved with reading the HTTP request body. It is in full control and gets to make all the calls.
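To make the concern concrete, here is a minimal Python sketch of a hypothetical intermediary. The function name `transparent_decoder` and its dict-of-headers interface are illustrative assumptions, not the behavior of any particular load balancer. An agent that honors Content-Encoding spends CPU inflating the body and rewrites the headers, while a body whose compression is declared only in Content-Type is forwarded as opaque bytes.

```python
import zlib


def transparent_decoder(headers: dict[str, str], body: bytes) -> tuple[dict[str, str], bytes]:
    """Hypothetical intermediary that honors Content-Encoding: deflate.

    Header names are matched case-sensitively here for brevity; real HTTP
    header names are case-insensitive.
    """
    headers = dict(headers)
    if headers.get("Content-Encoding", "").lower() == "deflate":
        # The agent pays the decompression cost and rewrites the headers.
        body = zlib.decompress(body)
        del headers["Content-Encoding"]
        headers["Content-Length"] = str(len(body))
    # Anything else, including a body whose compression is declared only
    # through Content-Type, passes through untouched.
    return headers, body


payload = zlib.compress(b'{"a": 1}')
print(transparent_decoder({"Content-Type": "application/json", "Content-Encoding": "deflate"}, payload))
print(transparent_decoder({"Content-Type": "application/x-zlib-json"}, payload))
```

In the second call only the origin application, which understands the media type, ever inflates the body.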

So, I'm requesting that we communicate the media type fully through the Content-Type request header and don't apply any content encodings via the standard HTTP mechanism. For the initial landing with compressed JSON, we could go with something like:

  application/zlib+json

I'm not sure if that is a proper media type. We could easily go the route of inventing our own x-prefixed media type:

  application/x-zlib-json

(zlib and gzip use the same compression algorithm, DEFLATE; gzip adds file header fields that I don't think are necessary and would just waste space.)
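A client following this proposal might compress the JSON with zlib and declare that only through the media type, with no Content-Encoding header. The Python sketch below uses only the standard library; `build_fhr_request`, `framing_overhead`, and the placeholder URL are illustrative assumptions, and `application/x-zlib-json` is one of the candidate types floated above rather than a settled choice. It also measures the framing difference: gzip wraps the DEFLATE stream in a 10-byte header and an 8-byte trailer, while zlib uses a 2-byte header and a 4-byte Adler-32 trailer.

```python
import gzip
import json
import urllib.request
import zlib

# Candidate media type from the proposal; compression is declared here only.
ZLIB_JSON_TYPE = "application/x-zlib-json"


def build_fhr_request(url: str, document: dict) -> urllib.request.Request:
    """Build (but do not send) a POST whose body is zlib-compressed JSON."""
    raw = json.dumps(document).encode("utf-8")
    return urllib.request.Request(
        url,
        data=zlib.compress(raw),
        headers={"Content-Type": ZLIB_JSON_TYPE},  # no Content-Encoding header
        method="POST",
    )


def framing_overhead(raw: bytes, level: int = 9) -> int:
    """Extra bytes gzip framing costs over zlib framing for the same input."""
    zlib_len = len(zlib.compress(raw, level))
    # mtime=0 keeps the gzip output deterministic.
    gzip_len = len(gzip.compress(raw, compresslevel=level, mtime=0))
    return gzip_len - zlib_len


req = build_fhr_request("https://example.invalid/submit", {"version": 1, "data": {}})
print(req.get_method(), req.headers)
print(framing_overhead(b'{"version": 1, "data": {}}'))
```

The host is a placeholder and the request is only constructed, never sent. The overhead is a fixed per-upload cost, small next to the payload itself.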
I believe the correct way to form this MIME type would be

  application/json+zlib

following the example of RFC 3023. Or even:

  application/fhr+json+zlib

to follow the example of application/rdf+xml, and extend it with the concept of "built on top of a third MIME type".
Target Milestone: Unreviewed → Moved to JIRA
Whiteboard: [JIRA METRICS-1264]
Hi folks -

Content-Encoding must be set in order for Bagheera to decompress the body. We have a backwards-compatibility dependency on this with the Telemetry project; it is a well-established API used across many services in Metrics.

Please consult Sec 14.11 of http://www.w3.org/Protocols/rfc2616/rfc2616-sec14.html
for background.

From the JIRA ticket:

 Xavier Stevens added a comment - 2012-11-12 09:48 AM

This would force us to replicate the functionality we already get for free with the Netty API. This seems unnecessary and our efforts would be better spent just making sure our load balancer doesn't decompress the content automatically. Please consult with network operations on the details of the load balancer.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
Just as an aside, the W3C doc is the best reference I had available.

We are increasing our documentation efforts. The current Bagheera docs do not cover the exact request you are making, but we can certainly add updating them to our to-do list.

In the meantime, the docs we do have on the Bagheera service are at:

https://mana.mozilla.org/wiki/pages/viewpageattachments.action?pageId=19660820&highlight=bagheera-data-flow.png#Metrics+Services+Cheat+Sheet-attachment-bagheera-data-flow.png

Again, for your purposes, they are probably not as useful as the W3C spec that Bagheera currently conforms to.
I'm not asking Bagheera to remove support for Content-Encoding. Instead, I'm requesting that Bagheera support data compression through an alternative means.

As mentioned in the initial comment, using Content-Encoding means that origin servers (and possibly intermediate agents) will almost certainly decompress data without question. For the overwhelming majority of services on the web, this makes sense. However, it's possible (hopefully unlikely) that we will not want this. The release of FHR could cause so much CPU load from decompression that we want to defer decompressing the entity body until after the request completes. If we are using Content-Encoding, we will likely lose this ability. Using Content-Type gives us a backdoor at the application level.
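A minimal sketch of that application-level backdoor, assuming a hypothetical handler that stores the compressed body as-is and a separate job that inflates it later. The names `handle_upload` and `drain_pending`, the in-memory queue, and the status codes are assumptions for illustration only; a real service would persist the bytes rather than hold them in memory.

```python
import json
import zlib
from collections import deque

ZLIB_JSON = "application/json+zlib"  # media type proposed in this bug

# Hypothetical store of (content_type, compressed_body) awaiting processing.
pending: deque = deque()


def handle_upload(content_type: str, body: bytes) -> int:
    """Hypothetical request handler: accept the body without inflating it."""
    if content_type != ZLIB_JSON:
        return 415  # Unsupported Media Type
    pending.append((content_type, body))  # keep the compressed bytes as-is
    return 202  # Accepted: processing happens later


def drain_pending() -> list:
    """Hypothetical post-request job: inflate and parse stored payloads."""
    docs = []
    while pending:
        _, body = pending.popleft()
        docs.append(json.loads(zlib.decompress(body)))
    return docs


print(handle_upload(ZLIB_JSON, zlib.compress(b'{"x": 1}')))
print(drain_pending())
```

Because the request path never calls `zlib.decompress`, the CPU cost can be scheduled separately, for example off-peak or on different machines.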

The request in this bug is about giving the Bagheera service more flexibility and control. If we do not implement this change, our exposure to CPU saturation by way of decompression is higher. Furthermore, if the load balancers ever perform the decompression themselves, there will be a significant cost to Mozilla because the cost per CPU core on a load balancer is much higher than the cost of a CPU core on a commodity server (assuming we are using specialized load balancer appliances).

While this won't block deployment of FHR, speaking from my experience of running services at scale, I highly encourage you to rethink the decision to WONTFIX.
For now I've added support for the MIME type "application/json+zlib", which is treated the same as Content-Encoding: deflate as long as Content-Encoding isn't specified.
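The rule described here can be sketched as follows. This is a Python illustration, not Bagheera's actual Netty-based code; `decode_body` is a hypothetical name, and only the deflate case is handled. HTTP's "deflate" content coding is the zlib format (RFC 1950), so `zlib.decompress` serves both paths.

```python
import zlib

ZLIB_JSON = "application/json+zlib"


def decode_body(headers: dict[str, str], body: bytes) -> bytes:
    """Hypothetical server-side decoding: Content-Encoding wins if present."""
    h = {k.lower(): v.strip().lower() for k, v in headers.items()}
    encoding = h.get("content-encoding")
    if encoding == "deflate":
        return zlib.decompress(body)
    # The media type is consulted only when no Content-Encoding is given.
    media_type = h.get("content-type", "").split(";")[0].strip()
    if encoding is None and media_type == ZLIB_JSON:
        return zlib.decompress(body)
    return body  # identity or unhandled coding: pass through unchanged


raw = b'{"a": 1}'
print(decode_body({"Content-Type": "application/json+zlib"}, zlib.compress(raw)))
```

Checking Content-Encoding first preserves the existing Telemetry behavior, so the new media type is purely additive.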
Resolution: WONTFIX → FIXED