Closed
Bug 1369737
Opened 8 years ago
Closed 7 years ago
Add an ingestion endpoint to the edge server for the Pioneer project
Categories
(Data Platform and Tools :: General, enhancement, P1)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: trink, Assigned: whd)
References
Details
(Whiteboard: [SvcOps])
Full endpoint details are pending (they will be similar to telemetry). We have capacity to handle this data on the current edge, and because it will be fully encrypted we can share the existing infrastructure here.
Basic requirements:
1) new S3 landfill bucket to upload the raw encrypted submissions
2) new Kafka topic
3) POST request with JWE content
4) URI will hopefully be in the form of '/submit/pioneer/<docType>/<version>/<?documentId?>'
- docType is the JSON schema specification name (in this case heatmap)
- version is the JSON schema version of the encrypted content
- documentId is a unique identifier per submission for de-duplication
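The URI shape proposed above could be parsed roughly as follows. This is a minimal sketch, not the edge server's actual routing code: the function name `parse_pioneer_uri` and the regular expression are hypothetical, and treating documentId as optional is an assumption based on the `<?documentId?>` notation.

```python
import re
from typing import Optional

# Hypothetical pattern for the proposed URI shape. documentId is treated as
# optional because the proposal writes it as <?documentId?> (an assumption).
_PIONEER_URI = re.compile(
    r"^/submit/pioneer/(?P<docType>[^/]+)/(?P<version>[^/]+)"
    r"(?:/(?P<documentId>[^/]+))?/?$"
)


def parse_pioneer_uri(path: str) -> Optional[dict]:
    """Return the URI components as a dict, or None if the path does not match."""
    match = _PIONEER_URI.match(path)
    return match.groupdict() if match else None
```

A path without a documentId yields `None` for that key, so a caller can decide whether to reject the request or generate an identifier.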
Reporter
Comment 1•8 years ago
Victor, will you provide details about the client submission: URI, headers, HTTP method, and desired endpoint URL? Also, please attach a sample HTTP submission. Thanks.
Flags: needinfo?(vng)
Assignee
Updated•8 years ago
Whiteboard: [SvcOps]
Updated•8 years ago
Assignee: nobody → whd
Points: --- → 1
Priority: -- → P2
Comment 2•8 years ago
While Victor is still busy, I can provide some of those details.
The URL endpoint scheme suggested by trink looks good (POST request to /submit/...). There should be no extra HTTP headers, and the content type should be 'application/jose'. I've used a UUID1 for the documentId; something slightly different might be easier from JavaScript.
To test, an example would be (replacing <service> with the real DNS):
curl -i -H 'Content-Type: application/jose' https://<service>/submit/pioneer/heatmap/1/d71bc4a24b9611e78f2bf40f2431382e -d 'eyJhbGciOiJSU0EtT0FFUCIsImVuYyI6IkEyNTZHQ00iLCJ6aXAiOiJERUYifQ.Dg9BlOP_TPKT8LZAcOElpPMZw_qwqg1H5QlskQrMQ7MPS9UgHU28MaVSMYFhOw_VWD2NCxSLjP5Lx4LXwwyy0oF_j6N0AfCeHYI_T77zlX0Cg5Bw8pEGCpoWanLGGvAfuFzvYWiMg047bR8sgsLdeY-LkonRc5XWl5fbpSWjDwr4jr9nHMHp4d-PYXDrBEz_bmICfLtJSZLkTz6w8P-1WI0vTgxRab8BBlCSmruF-9HHcfNhh3VzemnVfIpb7K54RfIPhhBttGyy6P4XJBm4IOYLCL4Wa7OZagVkc3phWDcW21wMtVJliqABD6nQ3eyl97Zm23vEguFLE6baOYa6xQ.nHTR3GbELbs_yaxB.u2hfshoYoIXqVN5k-wak6daQcYPDc9WN8N8uCz9sz4jCTNyAI5UfzTJmiPhSAilID8e-F2reqQI0XSlQ53vNxaGqH3sDs23EMQYWbsW5i2WV_fTISaMTdEgh4V2YWOLvEUsk0yy8nyCDoOVptXFTVT-CyMi2FprJb4jEi4Lm8S3h8eXrJboXuBXuP7WxLpsVxsF1PiVZvK1F4zocqlB_3ZNSabPx9QRvToZM5BbguvRtBmPtF58C.LsX82ppr7z9l0b7tPGGRsw'
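The documentId in the curl example is a version-1 UUID written as 32 hex characters without dashes. A minimal sketch of producing such an identifier and the matching path, assuming Python on the client side purely for illustration (the helper names are hypothetical):

```python
import uuid


def new_document_id() -> str:
    """Return a version-1 UUID as 32 lowercase hex characters (no dashes)."""
    return uuid.uuid1().hex


def submission_path(doc_type: str, version: int, document_id: str) -> str:
    """Build the request path in the form used by the curl example."""
    return f"/submit/pioneer/{doc_type}/{version}/{document_id}"
```

A version-4 (random) UUID would fit the same path and may be the "slightly different" option that is easier to generate from JavaScript; that is a guess, not something the thread settles.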
Typical upload sizes are small, 1 KB to 10 KB. This assumes daily uploads of per-user data and a typical average of 70 URLs visited per user per day. The maximum upload size could reach 1 MB.
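The request properties described in this comment (POST, 'application/jose', bodies up to about 1 MB) could be checked along these lines. This is only a sketch: `check_submission` and `MAX_BODY_BYTES` are hypothetical names, and using 1 MiB as a hard cap is an assumption, since the comment gives it as an expected maximum rather than a limit.

```python
from typing import Mapping, Optional

# Assumed hard cap; the comment only says uploads "could reach" about 1 MB.
MAX_BODY_BYTES = 1024 * 1024


def check_submission(method: str, headers: Mapping[str, str], body: bytes) -> Optional[str]:
    """Return None if the request looks acceptable, else a short reason string."""
    if method.upper() != "POST":
        return "method must be POST"
    content_type = ""
    for name, value in headers.items():
        # Header names are case-insensitive; drop any parameters after ';'.
        if name.lower() == "content-type":
            content_type = value.split(";", 1)[0].strip().lower()
    if content_type != "application/jose":
        return "content type must be application/jose"
    if len(body) > MAX_BODY_BYTES:
        return "body too large"
    return None
```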
The body is a compact representation of a JWE payload, using zlib compression before encryption. It is a dot-separated combination of multiple base64url-encoded strings, so the body is ASCII-safe.
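To illustrate the compact form: a JWE compact serialization has five dot-separated segments (protected header, encrypted key, IV, ciphertext, authentication tag), and the first one decodes to a small JSON object without needing any key. The sketch below uses only the Python standard library; the function names are hypothetical, and `SAMPLE_HEADER_SEGMENT` is copied from the first segment of the sample submission in this bug.

```python
import base64
import json


def b64url_decode(segment: str) -> bytes:
    """Decode unpadded base64url, as used by JOSE compact serializations."""
    padding = "=" * (-len(segment) % 4)
    return base64.urlsafe_b64decode(segment + padding)


def jwe_protected_header(compact: str) -> dict:
    """Return the decoded protected header of a compact JWE string."""
    parts = compact.split(".")
    if len(parts) != 5:
        raise ValueError(f"expected 5 dot-separated segments, got {len(parts)}")
    return json.loads(b64url_decode(parts[0]))


# First segment of the sample submission body in this bug.
SAMPLE_HEADER_SEGMENT = "eyJhbGciOiJSU0EtT0FFUCIsImVuYyI6IkEyNTZHQ00iLCJ6aXAiOiJERUYifQ"
```

Decoding that segment gives `{"alg": "RSA-OAEP", "enc": "A256GCM", "zip": "DEF"}`, which is where the compression requirement shows up on the wire.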
Reporter
Comment 3•8 years ago
Doh, I thought I was done with the integration tests... adding the new zlib requirement
Reporter
Comment 4•8 years ago
The cjose lib does not support the optional zip header. Ideally we should patch the library as opposed to shoehorning it into the Lua wrapper.
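For reference, RFC 7516 defines the "zip" value "DEF" as DEFLATE (RFC 1951), which is a raw stream without the zlib header and checksum. A minimal sketch of the step that would have to run around encryption and decryption, written in Python rather than C or Lua purely for illustration; the function names are hypothetical, and the thread does not confirm which framing the client actually emits.

```python
import zlib


def deflate_raw(data: bytes) -> bytes:
    """Compress to a raw DEFLATE stream (negative wbits: no zlib header or trailer)."""
    compressor = zlib.compressobj(wbits=-15)
    return compressor.compress(data) + compressor.flush()


def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw DEFLATE stream, e.g. the plaintext recovered from a JWE."""
    return zlib.decompress(data, -15)
```

On the receiving side, `inflate_raw` would be applied to the decrypted plaintext only when the protected header carries `"zip": "DEF"`.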
Comment 5•8 years ago
:trink I'm not sure we can easily change this. The TelemetryController code in Firefox doesn't let me control much after the /submit/ in the URL when the submitExternalPing method is invoked.
All pings end up getting sent in the 'telemetry' namespace. The only field I have control over right now is the docType by setting aType.
http://searchfox.org/mozilla-central/source/toolkit/components/telemetry/TelemetryController.jsm#211
Can the endpoint parse the data properly if we set the docType using a pattern like 'pioneer_<study_name>'? So we would have different studies like 'pioneer_heatmap' or 'pioneer_study2'?
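The naming pattern asked about here would be straightforward to split on the receiving side. A minimal sketch, assuming the prefix is exactly 'pioneer_' and everything after it is the study name; `study_from_doc_type` is a hypothetical helper, not part of any existing pipeline code.

```python
from typing import Optional

# Assumed prefix, taken from the 'pioneer_<study_name>' pattern in the comment.
PIONEER_PREFIX = "pioneer_"


def study_from_doc_type(doc_type: str) -> Optional[str]:
    """Return the study name for a 'pioneer_<study_name>' docType, else None."""
    if doc_type.startswith(PIONEER_PREFIX) and len(doc_type) > len(PIONEER_PREFIX):
        return doc_type[len(PIONEER_PREFIX):]
    return None
```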
Flags: needinfo?(vng)
Comment 6•8 years ago
cc'ing :gfritzsche
Ok - so talking to trink - we want to get the 'namespace', 'id', and the list of dimensions specified when submitExternalPing is invoked. I'm writing a small patch to control the id - I can try to add the extra field specification at the same time.
The problem here is that the TelemetryController is not in a system addon - so this kind of modification would have to ride the trains for release.
The only way I can see this going out faster to users is to copy the code and embed new versions of the TelemetryController and TelemetrySend into a system addon. :gfritzsche - do you see any other way to get this out faster?
Flags: needinfo?(gfritzsche)
Comment 7•8 years ago
I'm not sure I understand the full context here.
Why can't we submit this data in a standard ping format instead of changing the upload/POST mechanism?
You don't need to change anything on the client when just using:
payload = {
  studyName: "...",
  // other properties ...
  data: "... your encrypted payload",
};
TelemetryController.submitExternalPing("pioneer", payload, {...options...});
Flags: needinfo?(gfritzsche)
Assignee
Updated•7 years ago
Status: NEW → ASSIGNED
OS: Windows 10 → Unspecified
Priority: P2 → P1
Hardware: x86_64 → Unspecified
Assignee
Comment 8•7 years ago
The edge routing portion of this landed last week. We have other bugs to track the processing of the data, so I'm calling this done.
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Updated•3 years ago
Component: Pipeline Ingestion → General