Closed Bug 1304723 Opened 8 years ago Closed 6 years ago

Design tc-stats-collector components to gather data related to AWS costs

Categories

(Taskcluster :: Operations and Service Requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: dustin, Unassigned)

References

Details

It would be great to have ongoing measures of some basic things like storage consumed, compute time, etc., preferably broken down by branch and platform.
Another interesting analysis might be push-level statistics: E2E, total compute, total space, maybe even total cost.  Better yet if we could display those on the push inspector and treeherder!


Fri Sep 23, 9:37:34 - dustin@mozilla.com                                                11% - 25 in progress

    7d2e37d82d82 DH try: -b o -p all -u all -t none                                         Linux opt 	  tc(B) ..
    2e46d0c12db6 Bug 1304679 - Box-model highlighter now highlights text nodes; r=gl        Linux64 opt   tc(B) ..

  2.8 days of compute time, 11.2 GB of artifacts stored, USD$5.19 total cost (est.)

(I totally made those figures up.. I have no idea what they would be!!)
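The mock-up above could be computed with a small aggregation over task records. A hypothetical sketch: the field names (`durationMs`, `artifactBytes`) and the per-hour rate are assumptions, not the real taskcluster schema or actual AWS pricing.

```javascript
// Hypothetical sketch of push-level rollups. Field names and the
// cost rate are made up for illustration.
const COST_PER_COMPUTE_HOUR = 0.09; // placeholder rate, not a real AWS price

function pushStats(tasks) {
  const totalMs = tasks.reduce((sum, t) => sum + t.durationMs, 0);
  const totalBytes = tasks.reduce((sum, t) => sum + t.artifactBytes, 0);
  return {
    computeHours: totalMs / 3.6e6,           // ms -> hours
    artifactGB: totalBytes / 1e9,            // bytes -> GB
    estCostUSD: (totalMs / 3.6e6) * COST_PER_COMPUTE_HOUR,
  };
}

// Example: two one-hour tasks with 1 GB of artifacts total
const stats = pushStats([
  {durationMs: 3.6e6, artifactBytes: 5e8},
  {durationMs: 3.6e6, artifactBytes: 5e8},
]);
console.log(stats); // { computeHours: 2, artifactGB: 1, estCostUSD: 0.18 }
```

The per-push numbers would presumably come from the queue's task and artifact APIs; the rollup itself is trivial once those are in hand.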
Summary: Design tc-stat-collector components to gather data related to AWS costs → Design tc-stats-collector components to gather data related to AWS costs
I think the first step here is to get this data into signalfx.  The data CloudHealth consumes is available from the cloudwatch API, so we can access it directly, given appropriate AWS credentials.
The cloudhealth API appears to be nonfunctional.

// assumes aws-sdk v2: const aws = require('aws-sdk');
let scanCloudwatchMetrics = async ({cfg}) => {
  // aws-sdk v2 takes a single options object; credentials belong in it,
  // not as a second constructor argument
  let cw = new aws.CloudWatch({region: 'us-west-2', credentials: cfg.aws.credentials});
  // .promise() resolves with the response data itself
  let res = await cw.listMetrics({Namespace: 'AWS/S3'}).promise();
  res.Metrics.forEach(met => {
    console.log(met);
  });

  let met = res.Metrics[0];
  let metres = await cw.getMetricStatistics({
    // JS Date months are 0-based, so month 9 is October
    StartTime: new Date(2016, 9, 20),
    EndTime: new Date(2016, 9, 22),
    Namespace: 'AWS/S3',
    MetricName: met.MetricName,
    Dimensions: met.Dimensions,
    Period: 3600,
    Statistics: ['Average'],
  }).promise();
  console.log(metres);
};

gives:

{ Namespace: 'AWS/S3',
  MetricName: 'NumberOfObjects',
  Dimensions:
   [ { Name: 'StorageType', Value: 'AllStorageTypes' },
     { Name: 'BucketName', Value: 'taskcluster-docker-registry-v2' } ] }
...
{ Namespace: 'AWS/S3',
  MetricName: 'NumberOfObjects',
  Dimensions: 
   [ { Name: 'StorageType', Value: 'AllStorageTypes' },
     { Name: 'BucketName', Value: 'test-bucket-for-any-garbage2' } ] }
{ ResponseMetadata: { RequestId: 'c82cf063-9c94-11e6-8b46-33c7964d5cf9' },
  Label: 'NumberOfObjects',
  Datapoints: [] }

which is the same result I get with the aws command-line tool: no datapoints.  But in the UI console, I can see datapoints for these values.  In all cases I'm using my IAM user, which has administrative access (and is able to list metrics, so that's not invalid).  The above all uses the same region, so no issue there.

http://docs.aws.amazon.com/AmazonS3/latest/dev/cloudwatch-monitoring.html suggests this should "just work".  I've found some forum postings suggesting that omitting the dimensions for a metric will not automatically aggregate across that dimension (that is, omitting the BucketName dimension would not aggregate across all buckets).  Still, regardless of the dimensions I pass, I get no data.  Regardless of the date range I pass, no data (some forum posts suggest that including the current time, or even the current date, in the date range causes no datapoints).

If I shrink the Period to something small enough, I get an error indicating that the call would return too many datapoints (thousands), and that I should change the period or start/end time.  Haha, very funny, you know perfectly well you'd still give me no datapoints, AWS.
With some help, I was able to get this working on the command line, at least.  I think the script above failed because it didn't provide `Unit: "Count"`, though I'm not certain: on the command line, omitting that option still worked:

$ aws --region us-west-2 cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name NumberOfObjects  --start-time 2016-10-20T00:00:00 --end-time 2016-10-27T00:00:00 --statistics Average --period 3600 --dimensions Name=BucketName,Value=taskcluster-public-artifacts Name=StorageType,Value=AllStorageTypes

$ aws --region us-west-2 cloudwatch get-metric-statistics --namespace AWS/S3 --metric-name BucketSizeBytes  --start-time 2016-10-20T00:00:00 --end-time 2016-10-27T00:00:00 --statistics Average --unit Bytes --period 3600 --dimensions Name=BucketName,Value=taskcluster-public-artifacts Name=StorageType,Value=StandardStorage
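For reference, here is roughly what those working calls would look like back in the SDK, with the request params pulled into a helper so they can be inspected in isolation. The helper name is mine; the bucket name and dates are the ones from the commands above.

```javascript
// Sketch of the working CLI call translated back to aws-sdk params.
// Unit is the suspect field from the debugging above: 'Bytes' for
// BucketSizeBytes, 'Count' for NumberOfObjects.
function bucketSizeParams(bucket, startTime, endTime) {
  return {
    Namespace: 'AWS/S3',
    MetricName: 'BucketSizeBytes',
    StartTime: startTime,
    EndTime: endTime,
    Period: 3600,
    Statistics: ['Average'],
    Unit: 'Bytes',
    Dimensions: [
      {Name: 'BucketName', Value: bucket},
      {Name: 'StorageType', Value: 'StandardStorage'},
    ],
  };
}

const params = bucketSizeParams(
  'taskcluster-public-artifacts',
  new Date(Date.UTC(2016, 9, 20)),   // JS months are 0-based: October
  new Date(Date.UTC(2016, 9, 27)));
console.log(params);

// Usage (needs aws-sdk v2 and credentials):
//   const cw = new aws.CloudWatch({region: 'us-west-2'});
//   const res = await cw.getMetricStatistics(params).promise();
```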
I still think this would be nice in general, but I'm not sure what it would look like.  And I'm busy making a mess of tc-stats-collector elsewhere..
Assignee: dustin → nobody
Component: General → Operations
Found in triage.

Cost Explorer allows us to view this at a macro-level. For larger projects, using separate AWS accounts would allow us to use Cost Explorer to provide some of that info.
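As a hedged sketch of what the macro-level view could look like programmatically: the Cost Explorer API (`ce:GetCostAndUsage`) can return cost broken down by service or tag. The helper name is mine; params only, since the real call needs aws-sdk and CE permissions.

```javascript
// Hypothetical params builder for a Cost Explorer query grouping
// unblended cost by service. Dates are YYYY-MM-DD strings.
function costByServiceParams(start, end) {
  return {
    TimePeriod: {Start: start, End: end},
    Granularity: 'DAILY',
    Metrics: ['UnblendedCost'],
    GroupBy: [{Type: 'DIMENSION', Key: 'SERVICE'}],
  };
}

const ceParams = costByServiceParams('2016-10-01', '2016-11-01');
console.log(ceParams);

// Usage (Cost Explorer is served from us-east-1):
//   const ce = new aws.CostExplorer({region: 'us-east-1'});
//   const res = await ce.getCostAndUsage(ceParams).promise();
```

Grouping by a cost-allocation tag instead of SERVICE is what would give the per-project breakdown; separate AWS accounts, as suggested above, make that split trivial.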

Dustin: does Cost Explorer get us far enough here?
Flags: needinfo?(dustin)
I think that's a better approach, yes.
Status: NEW → RESOLVED
Closed: 6 years ago
Flags: needinfo?(dustin)
Resolution: --- → WONTFIX
Component: Operations → Operations and Service Requests