Include a reason/channel/version component in telemetry submission url

RESOLVED FIXED in mozilla24

Status

()

defect
RESOLVED FIXED
6 years ago
6 years ago

People

(Reporter: taras.mozilla, Assigned: froydnj)

Tracking

unspecified
mozilla24
x86_64
Windows 8
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [Telemetry:P1])

Attachments

(3 attachments)

Currently we store all telemetry data together. This is inefficient for map/reduce purposes. We plan to move to storing each channel/version in individual directories/files.

If we do that, we have to parse the packet to extract version/channel. This is expensive. The client should indicate this info via url to reduce load on server.
Mark, what url format should we use?
Flags: needinfo?(mreid)
(In reply to Taras Glek (:taras) from comment #0)
> If we do that, we have to parse the packet to extract version/channel. This
> is expensive. The client should indicate this info via url to reduce load on
> server.

For avoidance of doubt: when you say "info via url" you mean something like "info via a custom HTTP header", right?
Flags: needinfo?(taras.mozilla)
Summary: Include a channel/version component in telemetry submission url → Include a reason/channel/version component in telemetry submission url
(In reply to Nathan Froyd (:froydnj) from comment #2)
> (In reply to Taras Glek (:taras) from comment #0)
> > If we do that, we have to parse the packet to extract version/channel. This
> > is expensive. The client should indicate this info via url to reduce load on
> > server.
> 
> For avoidance of doubt: when you say "info via url" you mean something like
> "info via a custom HTTP header", right?

No i mean, the xhr URL eg http://data.mozilla...blah/idle-daily/nightly/23
Flags: needinfo?(taras.mozilla)
See comment 3 of bug 850491.  

Bagheera can be configured to accept wildcard-based topics, so we can make use of that.

That said, please do NOT change the submission URL until we've had a chance to verify Bagheera's behaviour and performance using the wildcard approach.
Flags: needinfo?(mreid)
(In reply to Mark Reid [:mreid] from comment #4)
> See comment 3 of bug 850491.  
> 
> Bagheera can be configured to accept wildcard-based topics, so we can make
> use of that.
> 
> That said, please do NOT change the submission URL until we've had a chance
> to verify Bagheera's behaviour and performance using the wildcard approach.

I'm still waiting for a concrete url proposal from you
Flags: needinfo?(mreid)
The wildcard approach likely won't apply cleanly to using HBase to store the data because we must create the hbase table schemas beforehand.  This is especially a problem where the channel name and version reported by the client is dynamic.  We could potentially end up with a lot of long tail misc channels or versions if we just arbitrarily created the tables as the requests came in.
Sorry for the long delay on this.

Here is a proposal for handling arbitrary partition data in the submission URL:
https://etherpad.mozilla.org/QT7KXZtI0V
Flags: needinfo?(mreid)
The initial code to implement partitioning in Bagheera is here:
https://github.com/mozilla-metrics/bagheera/pull/22

Please use the following partitions for the Telemetry client code:
http://<bagheera_host>/submit/telemetry/<id>/<reason>/<appName>/<appVersion>/<appUpdateChannel>/<appBuildID>
This is just a change because we're sending pings to more specific URLs now.
Later tests will add basic consistency checking for where we're sending pings.

We could add handlers for the specific URLs, but I think doing it this
way makes it easier to do the checking in the HTTP response handlers
that we are responding to the correct URL.
Attachment #764789 - Flags: review?(vdjeric)
What happens here is that we double-encode saved pings, first:

  { reason: ..., slug: ..., payload: <JSON.stringify'd ping info> }

and then JSON-encode that when we save it to disk.  This is inefficient for all
the escaping we will do of |payload|, but that's the way it is.

This scheme also causes problems, because all the information we need to determine
where to send the ping is hidden inside a JSON-ified string.  So we could either:

1. JSON.parse the payload every time we need to determine the submission URL; or
2. Don't JSON.stringfy the payload.

I thought the latter option was better.
Attachment #764794 - Flags: review?(vdjeric)
...and the patch we really wanted in the first place.
Attachment #764795 - Flags: review?(vdjeric)
Attachment #764789 - Flags: review?(vdjeric) → review+
Comment on attachment 764794 [details] [diff] [review]
part 2 - don't double-JSON-encode saved pings

+      if (typeof(ping.payload) == "string") {
+        ping.payload = JSON.parse(ping.payload);
+      }

this needs a comment indicating that this is for backward compat and that we should remove it sometime.
Comment on attachment 764795 [details] [diff] [review]
part 3 - send telemetry pings to new, partitioned URLs

+  let pathComponents = request.path.split("/").slice(3);
is good,
but this needs a check that the rest of the path components are what we expect
(In reply to Taras Glek (:taras) from comment #13)
> Comment on attachment 764795 [details] [diff] [review]
> part 3 - send telemetry pings to new, partitioned URLs
> 
> +  let pathComponents = request.path.split("/").slice(3);
> is good,
> but this needs a check that the rest of the path components are what we
> expect

nevermind this comment. I forgot how slice works.
Attachment #764794 - Flags: review?(vdjeric) → review+
Comment on attachment 764795 [details] [diff] [review]
part 3 - send telemetry pings to new, partitioned URLs

Taras: Afaict, only one of the pathComponents is currently being checked for correctness (the submission reason). We should probably check the rest of the submisison path as well.

Nathan, have you tested this patch with a localhost server? If not, I can send you a very simple script that dumps submitted Telemetry pings to stdout.
Attachment #764795 - Flags: review?(vdjeric) → review+
(In reply to Vladan Djeric (:vladan) from comment #15)
> Comment on attachment 764795 [details] [diff] [review]
> part 3 - send telemetry pings to new, partitioned URLs
> 
> Taras: Afaict, only one of the pathComponents is currently being checked for
> correctness (the submission reason). We should probably check the rest of
> the submisison path as well.

right, that should be addressed. Can do it in a followup patch, I don't wanna block this landing.
Comment on attachment 764789 [details] [diff] [review]
part 1 - make TelemetryPing tests use registerPrefixHandler

Review of attachment 764789 [details] [diff] [review]:
-----------------------------------------------------------------

::: toolkit/components/telemetry/tests/unit/test_TelemetryPing.js
@@ +78,5 @@
>  }
>  
>  function registerPingHandler(handler) {
> +  httpserver.registerPrefixHandler("/submit/telemetry/",
> +				   wrapWithExceptionHandler(handler));

There seem to be tabs here :(
You need to log in before you can comment on or make changes to this bug.