Closed Bug 1303841 Opened 8 years ago Closed 8 years ago

Break down compute time by task category

Categories

(Taskcluster :: General, defect)


Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

Attachments

(1 file)

https://s3.amazonaws.com/taskcluster-bug1303153/tasks.csv
has a nice bunch of data regarding all tasks' durations, branches, workerTypes, tiers, and creation dates.

Goals:
 * identify what value we're getting from the move to bigger test instances, and the cost
 * identify classes of tasks that could be turned off with little user impact
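
For reference, something along these lines (pandas; the column names here -- duration in seconds, workerType, created -- are guesses at what tasks.csv actually contains) should reproduce the kind of per-month compute-day breakdowns below:

    import pandas as pd

    # Column names are assumed; adjust to whatever tasks.csv really uses.
    tasks = pd.read_csv("tasks.csv", parse_dates=["created"])

    # Per-task duration (assumed to be seconds) -> compute-days, bucketed by creation month.
    tasks["compute_days"] = tasks["duration"] / 86400.0
    tasks["month"] = tasks["created"].dt.to_period("M")

    # Compute-days per workerType per month, plus a row total, like the tables below.
    by_worker = (
        tasks.pivot_table(index="workerType", columns="month",
                          values="compute_days", aggfunc="sum")
        .assign(total=lambda df: df.sum(axis=1))
        .sort_values("total")
    )
    print(by_worker.round(0))

Filtering on a branch or platform column first gives the per-branch / per-platform slices.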
This is the breakdown of compute time by workerType, branch, and platform, only for combinations with >1000d of compute in the last year.
Worth noting, since it had me baffled for a while: at the end of May, we turned try expiration down to 14 days.  So most of the try jobs submitted since then have expired and been removed from the data table.  But not all -- lots of try jobs are based on very old commits.  So for example (limited to try):

            workerType total      | 15-09  15-10  15-11  15-12  16-01  16-02  16-03  16-04  16-05  16-06  16-07  16-08  16-09
          desktop-test 12899d     | ..     ..     ..     140d   851d   2527d  2701d  2520d  1550d  49d    25d    1d     2536d

so we started doing tests in earnest in December '15, and by early 2016 were doing roughly 2,500 compute-days per month.  There's missing data for the end of May, but September only has 14 days' data so far (actually probably more, since this is a merged dataset from a few downloads).
Limiting just to inbound, which has (and has always had) a 1-year expiration:

            workerType total    | 15-09  15-10  15-11  15-12  16-01  16-02  16-03  16-04  16-05  16-06  16-07  16-08  16-09 
    taskcluster-images 2d       | ..     ..     ..     ..     1h     1h     2h     ..     ..     3h     10h    8h     7h    
        flame-kk-2-sim 14d      | 5d     9d     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..    
        gecko-decision 34d      | 2d     3d     9d     2d     3d     3d     3d     4d     3d     ..     ..     3d     20h   
        flame-kk-0-sim 67d      | 25d    42d    ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..    
           mulet-debug 73d      | ..     ..     ..     ..     ..     2h     1d     1d     21h    ..     23d    31d    16d   
               win2012 109d     | ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     109d   ..     ..    
     gecko-3-b-win2012 120d     | ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     7d     113d  
       b2g-desktop-opt 126d     | 23d    37d    31d    22d    12d    ..     ..     ..     ..     ..     ..     ..     ..    
     b2g-desktop-debug 129d     | 24d    35d    33d    24d    13d    ..     ..     ..     ..     ..     ..     ..     ..    
        android-api-11 129d     | ..     33d    37d    29d    30d    ..     ..     ..     ..     ..     ..     ..     ..    
        flame-kk-1-sim 173d     | 73d    100d   ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..    
          spidermonkey 183d     | ..     ..     ..     ..     ..     ..     ..     11d    60d    61d    51d    ..     ..    
          emulator-ics 273d     | 25d    42d    37d    29d    41d    40d    30d    29d    ..     ..     ..     ..     ..    
     gecko-1-b-win2012 274d     | ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     53d    221d   ..    
    emulator-ics-debug 298d     | 27d    47d    43d    32d    46d    42d    31d    31d    ..     ..     ..     ..     ..    
           emulator-jb 319d     | 27d    47d    43d    34d    46d    43d    43d    37d    ..     ..     ..     ..     ..    
             mulet-opt 332d     | 19d    33d    25d    20d    30d    29d    29d    32d    24d    28d    24d    27d    13d   
          dbg-macosx64 337d     | 11d    34d    31d    21d    33d    36d    44d    40d    39d    4d     4d     23d    16d   
     emulator-jb-debug 345d     | 30d    54d    47d    35d    50d    46d    45d    39d    ..     ..     ..     ..     ..    
           emulator-kk 346d     | 28d    51d    46d    33d    46d    44d    38d    43d    17d    ..     ..     ..     ..    
     emulator-kk-debug 368d     | 30d    55d    48d    37d    51d    47d    39d    44d    17d    ..     ..     ..     ..    
          opt-macosx64 368d     | ..     16d    21d    15d    23d    29d    63d    62d    60d    9d     8d     37d    27d   
        android-api-15 397d     | ..     ..     ..     ..     12d    52d    42d    11d    4d     7d     80d    126d   63d   
            emulator-l 406d     | 32d    55d    49d    38d    55d    53d    54d    50d    21d    ..     ..     ..     ..    
      emulator-l-debug 435d     | 33d    61d    54d    41d    59d    57d    56d    52d    22d    ..     ..     ..     ..    
           opt-linux32 468d     | 6d     54d    42d    33d    46d    48d    53d    47d    54d    41d    34d    11d    ..    
       gecko-3-b-linux 487d     | ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     303d   184d  
           dbg-linux32 520d     | 8d     53d    49d    36d    51d    52d    60d    53d    60d    47d    40d    12d    ..    
       emulator-x86-kk 653d     | ..     86d    96d    73d    106d   99d    65d    94d    35d    ..     ..     ..     ..    
           dbg-linux64 965d     | 17d    49d    44d    34d    49d    51d    58d    54d    185d   174d   148d   86d    14d   
           opt-linux64 1252d    | 21d    76d    63d    49d    70d    72d    91d    86d    146d   256d   219d   90d    13d   
    desktop-test-large 3489d    | ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     69d    1743d  1677d 
              flame-kk 5305d    | 393d   813d   742d   582d   843d   659d   458d   153d   135d   124d   117d   184d   101d  
      b2gtest-emulator 15009d   | 1640d  5612d  422d   705d   1436d  4984d  20d    189d   ..     ..     ..     ..     ..    
               b2gtest 16637d   | 1310d  2318d  2175d  1574d  1721d  2197d  2050d  2397d  69d    78d    745d   2d     1d    
   desktop-test-xlarge 25165d   | ..     ..     ..     ..     3d     356d   618d   556d   1042d  1672d  8116d  8869d  3933d 
          desktop-test 45656d   | ..     ..     ..     110d   1032d  2383d  3686d  3358d  6978d  8539d  9110d  8117d  2342d 

So I don't see any dropoff in usage of desktop-test, despite desktop-test-xlarge now using about the same amount of compute time.

Adding the desktop-test* workerTypes together:

            workerType total    | 15-09  15-10  15-11  15-12  16-01  16-02  16-03  16-04  16-05  16-06  16-07  16-08  16-09 
    desktop-test-large 3489d    | ..     ..     ..     ..     ..     ..     ..     ..     ..     ..     69d    1743d  1677d 
   desktop-test-xlarge 25165d   | ..     ..     ..     ..     3d     356d   618d   556d   1042d  1672d  8116d  8869d  3933d 
          desktop-test 45656d   | ..     ..     ..     110d   1032d  2383d  3686d  3358d  6978d  8539d  9110d  8117d  2342d 
                   sum 74310d   | ..     ..     ..     110d   1035d  2739d  4304d  3914d  8020d  10211d 17295d 18729d 7952d

so that's a big jump through June, more than doubling from May to July.  Breaking down just the desktop-test* workerTypes and just on mozilla-inbound:

                                    platform total   | 15-12  16-01  16-02  16-03  16-04  16-05  16-06  16-07  16-08  16-09 
                   android-api-15-gradle/opt 156d    | ..     ..     ..     ..     ..     ..     ..     ..     96d    60d   
                         android-4-2-x86/opt 208d    | ..     ..     ..     ..     ..     ..     ..     ..     139d   69d   
                 android-4-3-armv7-api15/opt 5868d   | ..     ..     ..     ..     ..     ..     236d   2332d  2331d  969d  
                                linux64/asan 7362d   | ..     ..     ..     ..     ..     ..     529d   2434d  3045d  1355d 
                                 linux64/pgo 7839d   | ..     ..     ..     ..     ..     887d   2228d  1818d  2022d  884d  
                                 linux64/opt 10780d  | ..     ..     ..     ..     138d   2712d  2573d  2070d  2327d  960d  
               android-4-3-armv7-api15/debug 13079d  | ..     ..     ..     ..     ..     ..     499d   5333d  5054d  2194d 
                               linux64/debug 29018d  | 110d   1035d  2739d  4305d  3776d  4421d  4147d  3309d  3715d  1461d 

So I think we have the culprit: adding Android doubled our test load (I think the "more than double" is "double" plus normal growth in push rate).

That doesn't really answer either of the questions in comment 0, but it's an important observation.
https://github.com/taskcluster/taskcluster-queue/pull/119 can be useful in repeating this kind of analysis.
So looking just at linux64/debug testing on mozilla-inbound:

            workerType total   | 15-12  16-01  16-02  16-03  16-04  16-05  16-06  16-07  16-08  16-09
    desktop-test-large 1106d   | ..     ..     ..     ..     ..     ..     ..     20d    568d   518d
   desktop-test-xlarge 4131d   | ..     3d     356d   618d   541d   630d   503d   479d   701d   298d
          desktop-test 23781d  | 110d   1032d  2383d  3686d  3235d  3791d  3644d  2809d  2446d  645d
                 TOTAL 29018d  | 110d   1035d  2739d  4305d  3776d  4421d  4147d  3309d  3715d  1461d 

so, all things considered, using the new large and xlarge instance types has proven to be a savings in total compute time, although a minor one.  Note that they also move us off of a deprecated EC2 instance type that requires special PV AMIs, so there's a lot in favor of the move.

We don't really have good historical pricing data (AWS's spot price graphs are total fantasy), but looking at current spot prices, we're paying a max of $0.0186/hr for m1.medium, $0.0398/hr for m3.large, and $0.0868/hr for m3.xlarge.  Those prices are based on the provisioner's measurements with a constant factor applied (2x), so let's assume that the ratios approximate actual prices paid.  Scaling everything to m1.medium, that means we're paying (ignoring constant factors):

  m1.medium: 1 unit
  m3.large: 2.14 units
  m3.xlarge: 4.67 units
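
Spelling that out (the workerType-to-instance-type mapping -- desktop-test on m1.medium, -large on m3.large, -xlarge on m3.xlarge -- is implied by the numbers rather than stated, so treat it as an assumption):

    # Spot prices quoted above, USD/hr (these already include the 2x constant factor).
    spot = {"m1.medium": 0.0186, "m3.large": 0.0398, "m3.xlarge": 0.0868}

    # Relative cost per compute-hour, normalized to m1.medium.
    unit = {t: round(p / spot["m1.medium"], 2) for t, p in spot.items()}
    # unit == {"m1.medium": 1.0, "m3.large": 2.14, "m3.xlarge": 4.67}

    def cost_units(compute_days, instance_type):
        """Scale compute-days into m1.medium-equivalent cost units (not dollars)."""
        return compute_days * unit[instance_type]

    # e.g. the 16-08 column of the linux64/debug table above; the small differences
    # from the chart below are just rounding of the unit ratios.
    print(cost_units(568, "m3.large"),     # desktop-test-large  -> ~1216
          cost_units(701, "m3.xlarge"),    # desktop-test-xlarge -> ~3274
          cost_units(2446, "m1.medium"))   # desktop-test        -> 2446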

applying those ratios to the linux64/debug chart above (again we're ignoring constant factors, so this just shows relative cost, not dollar amounts or anything like that):

            workerType total   | 15-12  16-01  16-02  16-03  16-04  16-05  16-06  16-07  16-08  16-09
    desktop-test-large 2355    | ..     ..     ..     ..     ..     ..     ..     42     1209   1103
   desktop-test-xlarge 19250   | ..     13     1658   2879   2521   2935   2343   2232   3266   1388
          desktop-test 23781   | 110    1032   2383   3686   3235   3791   3644   2809   2446   645
                 TOTAL 45386   | 110    1045   4041   6565   5756   6726   5987   5083   6921   3136

conclusion: we're definitely spending more for these instance types, but it looks like it's around 20% (comparing March and August, which had similar total compute time), which isn't so bad given all of the advantages.
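
One comparison that lands in that ballpark (a guess at the arithmetic, using the totals from the two charts above):

    # March vs August 2016: similar total compute time (4305d vs 3715d) on
    # linux64/debug inbound tests, but different cost in m1.medium units.
    march  = 6565 / 4305   # ~1.52 units per compute-day
    august = 6921 / 3715   # ~1.86 units per compute-day
    print(round(august / march - 1, 2))   # ~0.22, i.e. a bit over 20% more per compute-day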

This also suggests that increasing capacity on the desktop-test-*large instances might get us a bit of a savings while still delivering results to developers faster.

So that answers the first question in comment 0.
As for the second question in comment 0, every way I slice this I don't see much that makes me wonder "why are we doing that??".  A few places to explore:

* if we're still building android debug/opt on buildbot, we should stop!
* flame-kk is using >100 compute days per month, which at $0.07/hr for m3.2xlarge comes to around $5k/mo.  That said, most b2g stuff has already been turned off in July so this is probably a cost worth carrying.

I think that's the end of my analysis for now.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED