taskcluster-auth: CI broken

RESOLVED FIXED

Status

RESOLVED FIXED
3 years ago
3 years ago

People

(Reporter: pmoore, Assigned: jonasfj)

Tracking

Details

(Reporter)

Updated

3 years ago
Blocks: 1240465
Assignee: jopsen → dustin
I'd like to get this to the point where it's green for PRs, but that's not something I can work on soon.  It will require a fake pulse, fake AWS, and support from azure-entities for fake SAS URLs.

Pete, if you think green on master is enough, please close.
Assignee: dustin → nobody
Flags: needinfo?(pmoore)
Eh, master's not green :(

It fails against the real Azure with what seems to be timeouts.
Assignee: nobody → dustin
Flags: needinfo?(pmoore)
  taskcluster-client Calling: deleteRole, retry: 4 +1ms
  superagent max redirects 5 +0ms
  superagent set Authorization "Hawk id="root", ts="1453841917", nonce="vB7Nos", mac="lRSkyLjLKyIJM/yQ6C84uVtaTkWPX+7KGA2glgRaGuw="" +0ms
  superagent DELETE http://localhost:60551/v1/roles/test-role-8 +1ms
  superagent timeout 30000ms DELETE http://localhost:60551/v1/roles/test-role-8 +0ms
  express:router dispatching DELETE /v1/roles/test-role-8 +5s
  express:router query  : /v1/roles/test-role-8 +0ms
  express:router expressInit  : /v1/roles/test-role-8 +0ms
  express:router logger  : /v1/roles/test-role-8 +0ms
  express:router trim prefix (/v1) from url /v1/roles/test-role-8 +0ms
  express:router router /v1 : /v1/roles/test-role-8 +0ms
  express:router dispatching DELETE /roles/test-role-8 +3s
  express:router textParser  : /v1/roles/test-role-8 +0ms
  body-parser:text skip empty body +5s
  express:router <anonymous>  : /v1/roles/test-role-8 +0ms
  express:router <anonymous>  : /v1/roles/test-role-8 +0ms
  azure:table Request: DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +2ms
  azure:table Error code: ResourceNotFound for GET /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +28ms
  base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":31.85746,"table":"TestRoles25","method":"getEntity","error":"ResourceNotFound"}) +29ms
  azure:table Error code: ResourceNotFound for DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +3s
  base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":2561.60694,"table":"TestRoles25","method":"deleteEntity","error":"ResourceNotFound"}) +3s
  base:exchanges Publishing message on exchange: exchange/jonasfj-debug/v1/role-deleted +0ms
  azure:table Request: GET /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +1ms
  superagent clear timeout DELETE http://localhost:60551/v1/roles/test-role-8 +1ms
  taskcluster-client Request error calling deleteRole (retrying), err: Error: socket hang up, JSON: Error: socket hang up +0ms
  base:stats NullDrain.addPoint(ResponseTimes, {"duration":2563.786103,"requestMethod":"delete","statusCode":200,"method":"deleteRole","component":"auth-tests","paramRoleId":"test-role-8"}) +1ms
  morgan log request +0ms
  base:stats NullDrain.addPoint(ExchangeReports, {"component":"auth-tests","process":"server","duration":23.322495,"routingKeys":1,"payloadSize":36,"exchange":"exchange/jonasfj-debug/v1/role-deleted","error":"false"}) +22ms
  azure:table Error code: ResourceNotFound for GET /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +29ms
  base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":30.501236,"table":"TestRoles25","method":"getEntity","error":"ResourceNotFound"}) +8ms
  azure:table Request: GET /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +3s
  taskcluster-client Calling: deleteRole, retry: 5 +1ms
  superagent max redirects 5 +0ms
  superagent set Authorization "Hawk id="root", ts="1453841922", nonce="qoFmaf", mac="XW9OFjRockpbU5ce0yPnFqm1FFGrQvUSJanpCOYlqBI="" +0ms
  superagent DELETE http://localhost:60551/v1/roles/test-role-8 +1ms
  superagent timeout 30000ms DELETE http://localhost:60551/v1/roles/test-role-8 +0ms
  express:router dispatching DELETE /v1/roles/test-role-8 +5s
  express:router query  : /v1/roles/test-role-8 +0ms
  express:router expressInit  : /v1/roles/test-role-8 +0ms
  express:router logger  : /v1/roles/test-role-8 +0ms
  express:router trim prefix (/v1) from url /v1/roles/test-role-8 +0ms
  express:router router /v1 : /v1/roles/test-role-8 +0ms
  express:router dispatching DELETE /roles/test-role-8 +3s
  express:router textParser  : /v1/roles/test-role-8 +0ms
  body-parser:text skip empty body +5s
  express:router <anonymous>  : /v1/roles/test-role-8 +0ms
  express:router <anonymous>  : /v1/roles/test-role-8 +0ms
  azure:table Request: DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +2ms
  azure:table Error code: ResourceNotFound for GET /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +25ms
  base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":29.163865,"table":"TestRoles25","method":"getEntity","error":"ResourceNotFound"}) +26ms
  azure:table Error code: ResourceNotFound for DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +3s
  base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":2587.096901,"table":"TestRoles25","method":"deleteEntity","error":"ResourceNotFound"}) +3s
  base:exchanges Publishing message on exchange: exchange/jonasfj-debug/v1/role-deleted +1ms
  azure:table Request: GET /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +1ms
  superagent clear timeout DELETE http://localhost:60551/v1/roles/test-role-8 +1ms
  taskcluster-client Request error calling deleteRole NOT retrying!, err: Error: socket hang up, JSON: Error: socket hang up +0ms
  1) indirect roles works (with 50 roles)
  mocha:runner run suite api (role logic) +8m
  base:stats NullDrain.addPoint(ResponseTimes, {"duration":2600.257307,"requestMethod":"delete","statusCode":200,"method":"deleteRole","component":"auth-tests","paramRoleId":"test-role-8"}) +12ms
  morgan log request +11ms
  base:stats NullDrain.addPoint(ExchangeReports, {"component":"auth-tests","process":"server","duration":32.148527,"routingKeys":1,"payloadSize":36,"exchange":"exchange/jonasfj-debug/v1/role-deleted","error":"false"}) +20ms
  mocha:runner finished running +27ms

they are actually socket hangups from the express server, after 3s.. I'm not totally sure how to read the debug time offsets -- I see "3s" twice, which suggests a total of 6s?  Anyway, worth noting that the superagent timeout is 3s.  Yet we're getting answers from Azure?

Jonas, is this obvious to you by any chance?
Flags: needinfo?(jopsen)
..and now I can't reproduce it.  However, I can confirm that debug() logs time offsets from the last message.  Something is really funny here.  Just in this interaction

  superagent DELETE http://localhost:60551/v1/roles/test-role-8 +1ms
  superagent timeout 30000ms DELETE http://localhost:60551/v1/roles/test-role-8 +0ms
..5 seconds pass..
  express:router dispatching DELETE /v1/roles/test-role-8 +5s
  express:router query  : /v1/roles/test-role-8 +0ms

I did notice, in trying to redirect the debug output to `tee` to get it to timestamp things, that half of the debug messages are to stdout and half to stderr, so there may be two sets of interleaved timings.

Anecdotally, watching the logs scroll by, it seems that the actual delay is after each 'base:stats NullDrain.addPoint' message, so the +3s log entries above.  I'm seeing +1s without failures.

It looks like there may also be pollution of the test tables; trying to bisect, I get

      AssertionError: entity.Version isn't 1
(Assignee)

Comment 6

3 years ago
> they are actually socket hangups from the express server, after 3s.. 
Ah, it's possible that we set timeout on the express server to be low..
Because we use keepAlive with taskcluster-client, and if we don't let the connection timeout,
it'll keep the test process alive forever. 

Example:
https://github.com/taskcluster/taskcluster-lib-testing/blob/b44e0ebf1f8b7e899aea70cded541758b9efcb07/src/mockauth.js#L200-L201

Here is the potential case:
https://github.com/taskcluster/taskcluster-auth/blob/master/test/helper.js#L66

Just saying maybe it's that? I'm not sure...
Flags: needinfo?(jopsen)
(Assignee)

Comment 7

3 years ago
Probably related to CPU usage... hopefully fixed in the current PR.
The last comments there still indicate Travis issues -- are those to do with the node upgrade, perhaps?
(Reporter)

Comment 9

3 years ago
Thanks guys for looking into this! =)
Assignee: dustin → jopsen
It's green now..
Status: NEW → RESOLVED
Last Resolved: 3 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.