Closed Bug 1129222 Opened 5 years ago Closed 5 years ago

Implement HTTP Edge Server

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Assigned: mreid)

Details

Implement the HTTP Edge Server spec as discussed on the dev-metrics-pipeline group.
Blocks: 1129179
Assignee: nobody → mreid
Duplicate of this bug: 1125430
Things that are not currently available in the Heka messages generated by HttpListenInput:
- URL Path
- URL Host
- Client IP (RemoteAddr)

Things we'd like in the "metadata" message that are not available:
- Request Duration
- Server response code
- Server response body (this is not terribly important)
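To make the two lists above concrete, here is a minimal sketch using only Go's standard net/http package, not Heka's plugin API. It shows where Path, Host, and RemoteAddr come from on a request, and how a wrapper could record the response code and request duration. The names requestMeta, statusRecorder, withMeta, and runOnce, and the URL used in runOnce, are hypothetical and exist only for illustration.

```go
package main

import (
	"fmt"
	"net/http"
	"net/http/httptest"
	"time"
)

// requestMeta is a hypothetical container for the fields listed above.
type requestMeta struct {
	Path, Host, RemoteAddr string
	Status                 int
	Duration               time.Duration
}

// statusRecorder wraps http.ResponseWriter to remember the status code.
type statusRecorder struct {
	http.ResponseWriter
	status int
}

func (s *statusRecorder) WriteHeader(code int) {
	s.status = code
	s.ResponseWriter.WriteHeader(code)
}

// withMeta wraps next and passes per-request metadata to sink afterwards.
func withMeta(next http.Handler, sink func(requestMeta)) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		// Default to 200, which net/http sends if WriteHeader is never called.
		rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
		next.ServeHTTP(rec, r)
		sink(requestMeta{
			Path:       r.URL.Path,
			Host:       r.Host,
			RemoteAddr: r.RemoteAddr,
			Status:     rec.status,
			Duration:   time.Since(start),
		})
	})
}

// runOnce sends one in-memory POST (made-up URL) through a handler that
// replies with the given status, and returns the captured metadata.
func runOnce(status int) requestMeta {
	var got requestMeta
	inner := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.WriteHeader(status)
	})
	h := withMeta(inner, func(m requestMeta) { got = m })
	req := httptest.NewRequest("POST", "http://edge.example/submit/example/abc", nil)
	h.ServeHTTP(httptest.NewRecorder(), req)
	return got
}

func main() {
	m := runOnce(http.StatusNotFound)
	fmt.Println(m.Path, m.Host, m.RemoteAddr, m.Status, m.Duration)
}
```

The response body is left out of the sketch, in line with the note that it is not terribly important.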
The Path, Host, and RemoteAddr fields were added with this PR:
https://github.com/mozilla-services/heka/pull/1328
The next challenge is that by default, the HttpListenInput splits incoming payloads into multiple messages. I think we need a new splitter that just reads to EOF for each request.
I initially thought I could use a simple workaround: create a splitter that never returns a record, then use "GetRemainingData" at the end to return a single record for the entire payload.

Unfortunately this doesn't work properly if the payload contains more than MAX_MESSAGE_SIZE bytes - I get an infinite loop.
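
For illustration only, the "read to EOF" idea can be sketched with Go's standard bufio.Scanner. This is not Heka's splitter interface (which is not shown in this bug), and splitAtEOF, readWholeBody, readWholeString, and the maxSize parameter are hypothetical names. The split function never emits a record until the reader is exhausted, and the size cap turns an oversized payload into an error rather than a loop.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"strings"
)

// splitAtEOF is a bufio.SplitFunc that emits nothing until the reader is
// exhausted, then returns everything buffered as a single record.
func splitAtEOF(data []byte, atEOF bool) (advance int, token []byte, err error) {
	if !atEOF {
		// Ask the Scanner for more data instead of emitting a partial record.
		return 0, nil, nil
	}
	if len(data) == 0 {
		return 0, nil, nil
	}
	return len(data), data, nil
}

// readWholeBody returns the whole body as one record. The body must be
// smaller than maxSize bytes; otherwise the Scanner stops with
// bufio.ErrTooLong.
func readWholeBody(r io.Reader, maxSize int) ([]byte, error) {
	initial := 64
	if maxSize < initial {
		// Scanner's limit is the larger of max and cap(buf), so keep the
		// initial buffer no bigger than the requested cap.
		initial = maxSize
	}
	s := bufio.NewScanner(r)
	s.Buffer(make([]byte, 0, initial), maxSize)
	s.Split(splitAtEOF)
	if s.Scan() {
		// Copy, because the Scanner owns and may reuse its buffer.
		return append([]byte(nil), s.Bytes()...), nil
	}
	return nil, s.Err()
}

// readWholeString is a small convenience wrapper for in-memory input.
func readWholeString(body string, maxSize int) (string, error) {
	b, err := readWholeBody(strings.NewReader(body), maxSize)
	return string(b), err
}

func main() {
	// Two newline-separated JSON objects still come back as one record.
	rec, err := readWholeString("{\"a\":1}\n{\"b\":2}\n", 1<<20)
	fmt.Printf("%q %v\n", rec, err)
}
```

An empty body yields no record and no error in this sketch; whether the edge server should treat that as a valid submission is a separate decision.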

Rob, can you recommend a good way to get the entire message body of the HttpListenInput as a single message?
Flags: needinfo?(rmiller)
As discussed in IRC, I'll implement an EOFSplitter that will resolve this issue.
Flags: needinfo?(rmiller)
Thanks!
We should be able to implement the full HTTP Edge spec in the new SandboxInput plugin, which is now available: https://github.com/mozilla-services/heka/pull/1340
For the short term, I'd like to use the HttpListenInput with the EOFSplitter to implement the majority of the desired spec.

It appears that implementing the full spec would take significant effort / refactoring of the HttpListenInput, and doing so may break backwards compatibility (namely the HTTP response codes and behaviour).

So I propose that we do a v1 with as much as we can reasonably get from the HttpListenInput, and aim for a v2 edge server based on the SandboxInput with a complete implementation of the spec.

Rob, Trink, thoughts?
Flags: needinfo?(rmiller)
Flags: needinfo?(mtrinkala)
+1
Flags: needinfo?(rmiller)
+1
Flags: needinfo?(mtrinkala)
The majority of the specification has been implemented using the HttpListenInput with a custom SandboxDecoder.

See details at:
https://github.com/mozilla-services/data-pipeline/blob/master/heka/sandbox/decoders/http_edge_decoder.lua

Items that were *not* implemented in the initial version:
- HTTP response codes for GET requests and POSTs to invalid namespaces. As I understand it, all requests currently return 200.
- The metadata message describing the message size, request duration, etc.
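
As a rough sketch of the first missing item, the status selection could look like the Go below. Everything specific here is an assumption rather than something taken from the spec: the /submit/<namespace>/... path layout, the knownNamespaces allow-list, and the choice of 405 for non-POST requests and 404 for unknown namespaces.

```go
package main

import (
	"fmt"
	"net/http"
	"strings"
)

// knownNamespaces is a hypothetical allow-list of accepted namespaces.
var knownNamespaces = map[string]bool{"telemetry": true}

// statusFor picks a response code for a request. The path layout
// /submit/<namespace>/... and the specific codes are assumptions.
func statusFor(method, path string) int {
	if method != http.MethodPost {
		return http.StatusMethodNotAllowed // assumed code for GET etc.
	}
	parts := strings.Split(strings.TrimPrefix(path, "/"), "/")
	if len(parts) < 2 || parts[0] != "submit" || !knownNamespaces[parts[1]] {
		return http.StatusNotFound // assumed code for invalid namespaces
	}
	return http.StatusOK
}

// edgeHandler replies with only the status code chosen by statusFor.
func edgeHandler(w http.ResponseWriter, r *http.Request) {
	w.WriteHeader(statusFor(r.Method, r.URL.Path))
}

func main() {
	fmt.Println(statusFor("GET", "/submit/telemetry/abc"))
	fmt.Println(statusFor("POST", "/submit/bogus/abc"))
	fmt.Println(statusFor("POST", "/submit/telemetry/abc"))
}
```

Keeping the decision in a pure function such as statusFor makes it easy to test without starting a server.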

A more complete implementation is tracked in Bug 1137424.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard