Closed Bug 1240452 Opened 9 years ago Closed 9 years ago

taskcluster-auth: CI broken

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: pmoore, Assigned: jonasfj)

References

Details

Blocks: 1240465
Assignee: jopsen → dustin
I'd like to get this to the point where it's green for PRs, but that's not something I can work on soon. It will require a fake pulse, fake AWS, and support from azure-entities for fake SAS URLs. Pete, if you think green on master is enough, please close.
Assignee: dustin → nobody
Flags: needinfo?(pmoore)
Eh, master's not green :( It fails against the real Azure with what seems to be timeouts.
Assignee: nobody → dustin
Flags: needinfo?(pmoore)
taskcluster-client Calling: deleteRole, retry: 4 +1ms superagent max redirects 5 +0ms superagent set Authorization "Hawk id="root", ts="1453841917", nonce="vB7Nos", mac="lRSkyLjLKyIJM/yQ6C84uVtaTkWPX+7KGA2glgRaGuw="" +0ms superagent DELETE http://localhost:60551/v1/roles/test-role-8 +1ms superagent timeout 30000ms DELETE http://localhost:60551/v1/roles/test-role-8 +0ms express:router dispatching DELETE /v1/roles/test-role-8 +5s express:router query : /v1/roles/test-role-8 +0ms express:router expressInit : /v1/roles/test-role-8 +0ms express:router logger : /v1/roles/test-role-8 +0ms express:router trim prefix (/v1) from url /v1/roles/test-role-8 +0ms express:router router /v1 : /v1/roles/test-role-8 +0ms express:router dispatching DELETE /roles/test-role-8 +3s express:router textParser : /v1/roles/test-role-8 +0ms body-parser:text skip empty body +5s express:router <anonymous> : /v1/roles/test-role-8 +0ms express:router <anonymous> : /v1/roles/test-role-8 +0ms azure:table Request: DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +2ms azure:table Error code: ResourceNotFound for GET /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +28ms base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":31.85746,"table":"TestRoles25","method":"getEntity","error":"ResourceNotFound"}) +29ms azure:table Error code: ResourceNotFound for DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +3s base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":2561.60694,"table":"TestRoles25","method":"deleteEntity","error":"ResourceNotFound"}) +3s base:exchanges Publishing message on exchange: exchange/jonasfj-debug/v1/role-deleted +0ms azure:table Request: GET /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +1ms superagent clear timeout DELETE http://localhost:60551/v1/roles/test-role-8 +1ms taskcluster-client Request error calling deleteRole (retrying), err: Error: socket hang up, JSON: Error: socket hang up +0ms base:stats NullDrain.addPoint(ResponseTimes, {"duration":2563.786103,"requestMethod":"delete","statusCode":200,"method":"deleteRole","component":"auth-tests","paramRoleId":"test-role-8"}) +1ms morgan log request +0ms base:stats NullDrain.addPoint(ExchangeReports, {"component":"auth-tests","process":"server","duration":23.322495,"routingKeys":1,"payloadSize":36,"exchange":"exchange/jonasfj-debug/v1/role-deleted","error":"false"}) +22ms azure:table Error code: ResourceNotFound for GET /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +29ms base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":30.501236,"table":"TestRoles25","method":"getEntity","error":"ResourceNotFound"}) +8ms azure:table Request: GET /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +3s taskcluster-client Calling: deleteRole, retry: 5 +1ms superagent max redirects 5 +0ms superagent set Authorization "Hawk id="root", ts="1453841922", nonce="qoFmaf", mac="XW9OFjRockpbU5ce0yPnFqm1FFGrQvUSJanpCOYlqBI="" +0ms superagent DELETE http://localhost:60551/v1/roles/test-role-8 +1ms superagent timeout 30000ms DELETE http://localhost:60551/v1/roles/test-role-8 +0ms express:router dispatching DELETE /v1/roles/test-role-8 +5s express:router query : /v1/roles/test-role-8 +0ms express:router expressInit : /v1/roles/test-role-8 +0ms express:router logger : /v1/roles/test-role-8 +0ms express:router trim prefix (/v1) from url /v1/roles/test-role-8 +0ms express:router router /v1 : /v1/roles/test-role-8 +0ms express:router dispatching DELETE /roles/test-role-8 +3s express:router textParser : /v1/roles/test-role-8 +0ms body-parser:text skip empty body +5s express:router <anonymous> : /v1/roles/test-role-8 +0ms express:router <anonymous> : /v1/roles/test-role-8 +0ms azure:table Request: DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +2ms azure:table Error code: ResourceNotFound for GET /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +25ms base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":29.163865,"table":"TestRoles25","method":"getEntity","error":"ResourceNotFound"}) +26ms azure:table Error code: ResourceNotFound for DELETE /TestRoles25(PartitionKey='test-role-8',RowKey='role') on retry: 0 +3s base:stats NullDrain.addPoint(AzureTableOperations, {"component":"auth-tests","process":"server","duration":2587.096901,"table":"TestRoles25","method":"deleteEntity","error":"ResourceNotFound"}) +3s base:exchanges Publishing message on exchange: exchange/jonasfj-debug/v1/role-deleted +1ms azure:table Request: GET /TestRoles25(PartitionKey='test-role-8',RowKey='role'), retry: 0 +1ms superagent clear timeout DELETE http://localhost:60551/v1/roles/test-role-8 +1ms taskcluster-client Request error calling deleteRole NOT retrying!, err: Error: socket hang up, JSON: Error: socket hang up +0ms 1) indirect roles works (with 50 roles) mocha:runner run suite api (role logic) +8m base:stats NullDrain.addPoint(ResponseTimes, {"duration":2600.257307,"requestMethod":"delete","statusCode":200,"method":"deleteRole","component":"auth-tests","paramRoleId":"test-role-8"}) +12ms morgan log request +11ms base:stats NullDrain.addPoint(ExchangeReports, {"component":"auth-tests","process":"server","duration":32.148527,"routingKeys":1,"payloadSize":36,"exchange":"exchange/jonasfj-debug/v1/role-deleted","error":"false"}) +20ms mocha:runner finished running +27ms they are actually socket hangups from the express server, after 3s.. I'm not totally sure how to read the debug time offsets -- I see "3s" twice, which suggests a total of 6s? Anyway, worth noting that the superagent timeout is 3s. Yet we're getting answers from Azure? Jonas, is this obvious to you by any chance?
Flags: needinfo?(jopsen)
..and now I can't reproduce it. However, I can confirm that debug() logs time offsets from the last message. Something is really funny here. Just in this interaction superagent DELETE http://localhost:60551/v1/roles/test-role-8 +1ms superagent timeout 30000ms DELETE http://localhost:60551/v1/roles/test-role-8 +0ms ..5 seconds pass.. express:router dispatching DELETE /v1/roles/test-role-8 +5s express:router query : /v1/roles/test-role-8 +0ms I did notice, in trying to redirect the debug output to `tee` to get it to timestamp things, that half of the debug messages are to stdout and half to stderr, so there may be two sets of interleaved timings. Anecdotally, watching the logs scroll by, it seems that the actual delay is after each 'base:stats NullDrain.addPoint' message, so the +3s log entries above. I'm seeing +1s without failures. It looks like there may also be pollution of the test tables; trying to bisect, I get AssertionError: entity.Version isn't 1
> they are actually socket hangups from the express server, after 3s.. Ah, it's possible that we set timeout on the express server to be low.. Because we use keepAlive with taskcluster-client, and if we don't let the connection timeout, it'll keep the test process alive forever. Example: https://github.com/taskcluster/taskcluster-lib-testing/blob/b44e0ebf1f8b7e899aea70cded541758b9efcb07/src/mockauth.js#L200-L201 Here is the potential case: https://github.com/taskcluster/taskcluster-auth/blob/master/test/helper.js#L66 Just saying maybe it's that? I'm not sure...
Flags: needinfo?(jopsen)
Probably related to CPU usage... hopefully fixed in the current PR.
The last comments there still indicate Travis issues -- are those to do with the node upgrade, perhaps?
Thanks guys for looking into this! =)
Assignee: dustin → jopsen
It's green now..
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: Authentication → Services
You need to log in before you can comment on or make changes to this bug.