Closed
Bug 1071671
Opened 11 years ago
Closed 8 years ago
Implement a stats proxy for aggregating in task stats
Categories
(Taskcluster :: Workers, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlal, Unassigned)
Details
(Whiteboard: [docker-worker])
In _tasks_ I want to expose the ability to log metrics (in a scoped fashion!) so for our CI runs we can measure at a granular level what is happening...
The implementation should be very similar to the live logger and proxy: essentially we just need to forward UDP packets to statsd. The credentials should be exposed to this proxy, but not to the tasks.
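A minimal sketch of the forwarding idea, assuming nothing about the actual docker-worker implementation: the proxy namespaces each metric under the task before relaying the packet, so tasks can emit metrics without ever seeing the statsd address or credentials. The names `scoped_metric` and `forward` are hypothetical, not part of any taskcluster API.

```python
def scoped_metric(task_id, name, value, kind="c"):
    """Build a statsd-format UDP payload, namespaced under the task id.

    `kind` is the statsd metric type suffix ("c" counter, "ms" timer, ...).
    """
    return f"tasks.{task_id}.{name}:{value}|{kind}".encode()


def forward(sock, statsd_addr, packet):
    """Relay a packet to statsd. Only the proxy knows `statsd_addr`;
    the task side of the bridge never sees it."""
    sock.sendto(packet, statsd_addr)
```

A task would send `build.time:5|ms` to the proxy, and statsd would receive it scoped as `tasks.<taskId>.build.time:5|ms`.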
Independently I have bugs open for reporting metrics from mozharness and bash:
- https://gist.github.com/lightsofapollo/c839614df737f9db3c7c
- https://bugzilla.mozilla.org/show_bug.cgi?id=1071281
This is an area where I think we _could_ use influxdb, but it's harder to implement well because of the more structured format... The proxy could convert (or multiplex) the values between statsd/influx if we want to transition these metrics over.
Comment 1•11 years ago
So influxdb series are created dynamically, and they can have special characters in their name (see [1]).
Hence, requiring the scope `stats:put-points:<seriesName>` in order for a proxy of sorts to submit a point to
a series named `task-statistics/<seriesName>` would be perfectly valid.
Depending on how many measurements we want to do, this could perhaps be done with the existing proxy.
Imagine an API end-point on stats.taskcluster.net, looking like:
PUT /v1/points
[
  {
    name: "task-statistics/<seriesName>",
    ...
  },
  ...
]
Where the scope `stats:put-points:<seriesName>` is a required scope. This end-point would just authenticate and forward to influxdb. We could also make a proxy on docker-worker that caches the points and sends them all together.
Remark:
An api end-point like:
PUT /v1/point/<seriesName>?col1=val1&col2=val2&col3=...
Would also be possible to make, and an end-point like this would be very easy to use with CURL from a bash script or from python within mozharness.
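To make the single-point variant concrete, here is how a client could build such a request URL. This is purely illustrative: the `PUT /v1/point/<seriesName>` end-point is a proposal from this comment, not a real stats.taskcluster.net API, and `point_url` is a hypothetical helper.

```python
from urllib.parse import urlencode


def point_url(series, **columns):
    """Build the URL for the proposed single-point end-point,
    encoding each column=value pair as a query parameter."""
    base = "https://stats.taskcluster.net/v1/point/"
    return base + series + "?" + urlencode(columns)
```

A bash script could hit the same end-point with `curl -X PUT` on the resulting URL, which is what makes this shape attractive for mozharness and shell tasks.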
Anyways, my point is, this can definitely be done with influxdb, by putting scopes on the series name and prefixing all series containing task statistics with "task-statistics/".
References,
[1] http://influxdb.com/docs/v0.8/api/query_language.html
Comment 2•11 years ago
Actually, thinking about this, I realize that API end-points or UDP ports might not be the best solution.
Why not just declare a file? This will work when running things locally too...
Example:
{
  payload: {
    image: "...",
    command: ["...", "echo $A_JSON_LINE >> /my-stats-file"],
    statistics: [
      "/my-stats-file"
    ]
  }
}
In this case the task would write JSON blobs to /my-stats-file, so that it looks like:
/my-stats-file
{name: "<seriesName>", col1: "val1", col2: "val2", ...}\n
{name: "<seriesName>", col1: "val1", col2: "val2", ...}\n
...
Then when the docker container is terminated, docker-worker reads "/my-stats-file" from the container and submits the entries written to influxdb, while validating that the task has sufficient scopes to write to the specified series.
It's not like we really need the statistics from tasks to arrive in real time. It seems perfectly sane to me that they are submitted when the task is completed.
Furthermore, "/my-stats-file" can be copied out with "docker copy" when doing a local debug run, so it's easy to see the stats locally and debug any issues with them.
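A sketch of what that post-task step in docker-worker could look like, assuming the JSON-lines format above. The helper name and the exact-match scope check are simplifications (real taskcluster scope satisfaction supports `*` wildcards); points in series the task lacks scopes for are simply dropped here.

```python
import json


def parse_stats_file(text, allowed_scopes):
    """Parse the JSON-lines stats file written by a task, keeping only
    points whose series the task has scopes to write to.

    Sketch only: uses exact scope matching, not taskcluster's
    wildcard-aware scope satisfaction.
    """
    points = []
    for line in text.splitlines():
        if not line.strip():
            continue
        point = json.loads(line)
        if f"stats:put-points:{point['name']}" in allowed_scopes:
            points.append(point)
    return points
```

The surviving points would then be batched into a single influxdb submission when the container exits.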
--------------
Yet another, perhaps even more generic, solution would be for the task to upload an artifact called "statistics/influx-points.csv" and have an alternative route called "statistics.<seriesName>".
Then stats.taskcluster.net could listen for completed tasks with the routing key "route.statistics.#", fetch the artifact "statistics/influx-points.csv" from these tasks, and upload those points to influxdb.
This would be much more generic, and when we add a windows/osx/phone worker that isn't docker-worker based, this would still work, as long as the new worker supports exporting artifacts.
An additional benefit is that it would be easy to inspect "statistics/influx-points.csv" and look for bugs, as it is submitted as an artifact. (Note we can use an expiration date for artifacts, so we don't keep these around for years.)
The only downside is that errors, if garbage is inserted into "statistics/influx-points.csv", won't show up in task logs. We could report these errors as a special statistics series... But that's about it.
(But if we use CSV, we can just assume null if a row doesn't have enough values, and skip values if there are too many.)
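The pad-or-truncate rule for CSV rows can be sketched as follows; `rows_to_points` and the fixed column list are hypothetical, chosen just to show the lenient parsing described above.

```python
import csv
import io


def rows_to_points(csv_text, columns):
    """Turn CSV rows into point dicts: short rows are padded with None
    (influx null), extra values beyond the known columns are dropped."""
    points = []
    for row in csv.reader(io.StringIO(csv_text)):
        row = (row + [None] * len(columns))[: len(columns)]
        points.append(dict(zip(columns, row)))
    return points
```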
Remark: using the route "statistics.<seriesName>" prevents us from submitting to multiple series from a task. If we want that, we can easily architect our way around this limitation.
Reporter
Comment 3•10 years ago
More context: We want to use influxdb... I think the sane way to implement this is with a "proxy" (much like the TC proxy) which holds the credentials outside of the container and submits the stats on its behalf. We then link in a bridge (identical to what we do for the taskcluster-proxy), which carries no credentials, that the container can submit stats to.
Other notes:
- The proxy probably should force which database is written to.
- The proxy _could_ support other formats which may be easier to write to (like what http://opentsdb.net/overview.html does which would make submitting from bash easy)
Updated•10 years ago
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
Comment 4•10 years ago
We have introduced the concept of log prefixes for log entries within a task, such as:
[source:event] message
In this case we could log within a task:
[taskcluster-task:stat] payload
Then the logs would be piped to another stream that does something with the stat events, or at the very least outputs them to a separate log file so they can be acted upon independently.
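Splitting such prefixed entries out of a log stream could look like this sketch. The `[source:event] message` format is from the comment above; the regex and the `extract_stats` helper are illustrative, not part of docker-worker.

```python
import re

# Matches "[source:event] message" and captures the three parts.
PREFIX = re.compile(r"^\[([^:\]]+):([^\]]+)\] (.*)$")


def extract_stats(log_lines):
    """Yield the payload of every line tagged [taskcluster-task:stat],
    passing over untagged and differently tagged lines."""
    for line in log_lines:
        m = PREFIX.match(line)
        if m and m.group(1) == "taskcluster-task" and m.group(2) == "stat":
            yield m.group(3)
```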
Updated•10 years ago
Summary: docker-worker: Stats Proxy → Implement a stats proxy for aggregating in task stats
Updated•9 years ago
Whiteboard: [docker-worker]
Updated•9 years ago
Component: Docker-Worker → Worker
Comment 5•8 years ago
I think with some of the logging data we include in logs, as well as what we record to signalfx, we are in much better shape. There could be more specific bugs opened about what users would like to be recorded as stats, but I believe those should be filed as the need occurs.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Component: Worker → Workers