Closed Bug 1299814 Opened 9 years ago Closed 8 years ago

Intermittent [taskcluster:error] Task was aborted because states could not be created successfully. Error calling 'link' for taskclusterProxy : Failed to initialize taskcluster proxy service.

Categories

(Taskcluster :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: intermittent-failure, Whiteboard: [docker-link-failure][stockwell infra][fennec-scouting])

I'm guessing this shares a common cause or remediation with bug 1285090, which has a lot more stars.
Whiteboard: [docker-error-pulling]
Whiteboard: [docker-error-pulling] → [docker-link-failure]
No longer blocks: 1307493
Depends on: 1307493
:dustin, this error spiked recently. Can you take a stab at fixing it?
Flags: needinfo?(dustin)
Greg is already working on it in the blocker bug.
Flags: needinfo?(dustin)
A large majority of the ones I've seen over the last couple of days seem to be due to a 503 error returned from Heroku, which appears to be a request timeout. I've asked Jonas to take a look at determining the root cause. We do use a client which retries before failing completely.
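To illustrate the "retries before failing completely" behaviour mentioned above, here is a minimal sketch of retrying on 5xx responses with exponential backoff. It is not the code taskcluster-proxy runs (the stack trace in this bug shows that going through github.com/taskcluster/httpbackoff and github.com/cenkalti/backoff); the names withBackoff, getStatus and retryable are hypothetical, only the Go standard library is used, and treating every 5xx status as transient is an assumption of the sketch.

```go
package main

import (
	"errors"
	"fmt"
	"net/http"
	"time"
)

// errRetriesExhausted is returned when every attempt ended in a retryable failure.
var errRetriesExhausted = errors.New("retries exhausted")

// retryable reports whether an HTTP status is worth retrying.
// Treating all 5xx responses as transient is an assumption of this sketch.
func retryable(status int) bool {
	return status >= 500 && status <= 599
}

// withBackoff calls op up to maxAttempts times, doubling the delay after each
// retryable failure. op returns the HTTP status code, or an error for
// transport-level failures (which are retried as well).
func withBackoff(maxAttempts int, baseDelay time.Duration, op func() (int, error)) (int, error) {
	delay := baseDelay
	var lastStatus int
	var lastErr error
	for attempt := 1; attempt <= maxAttempts; attempt++ {
		lastStatus, lastErr = op()
		if lastErr == nil && !retryable(lastStatus) {
			return lastStatus, nil
		}
		if attempt < maxAttempts {
			time.Sleep(delay)
			delay *= 2
		}
	}
	if lastErr != nil {
		return lastStatus, fmt.Errorf("%w: %v", errRetriesExhausted, lastErr)
	}
	return lastStatus, fmt.Errorf("%w: last status %d", errRetriesExhausted, lastStatus)
}

// getStatus issues a single GET and returns only the status code.
func getStatus(client *http.Client, url string) (int, error) {
	resp, err := client.Get(url)
	if err != nil {
		return 0, err
	}
	defer resp.Body.Close()
	return resp.StatusCode, nil
}

func main() {
	// Simulated upstream: two 503 responses, then success.
	calls := 0
	status, err := withBackoff(5, 10*time.Millisecond, func() (int, error) {
		calls++
		if calls < 3 {
			return http.StatusServiceUnavailable, nil
		}
		return http.StatusOK, nil
	})
	fmt.Println(status, err, calls)
}
```

Against a real endpoint the closure would wrap getStatus, for example withBackoff(5, time.Second, func() (int, error) { return getStatus(client, url) }). A request timeout on the http.Client would probably be wanted too, so that a hung upstream counts as a failed attempt rather than blocking forever.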
Greg, as a note, this spiked yesterday. Not sure if this is a one-time spike or a larger issue.
I noticed we had a server-side taskcluster-proxy crash which caused the https://tools.taskcluster.net/task-inspector/#Q1DigDtfSc-vq55qXNPSNA/0 failure.

From https://papertrailapp.com/systems/686520052/events?highlight=765432606006349830&focus=765432606006349830:

unexpected fault address 0x0
fatal error: fault
[signal SIGSEGV: segmentation violation code=0x80 addr=0x0 pc=0x45bc79]

goroutine 11 [running]:
runtime.throw(0x6c8b59, 0x5)
	/usr/local/go/src/runtime/panic.go:566 +0x95 fp=0xc42018cc98 sp=0xc42018cc78
runtime.sigpanic()
	/usr/local/go/src/runtime/sigpanic_unix.go:27 +0x288 fp=0xc42018ccf0 sp=0xc42018cc98
runtime.memmove(0xc4201f9fe0, 0x0, 0x1e0d8ce2189f0a78)
	/usr/local/go/src/runtime/memmove_amd64.s:129 +0x1e9 fp=0xc42018ccf8 sp=0xc42018ccf0
reflect.typedslicecopy(0x667840, 0xc4201f9fe0, 0x4, 0x4, 0x0, 0xa3c1b19c4313e14f, 0x57f6bc6680176e91, 0x412935)
	/usr/local/go/src/runtime/mbarrier.go:303 +0x6a fp=0xc42018cd50 sp=0xc42018ccf8
reflect.Copy(0x680260, 0xc420080630, 0x197, 0x663a80, 0xc420202000, 0x97, 0x97)
	/usr/local/go/src/reflect/value.go:1873 +0x1e0 fp=0xc42018cdf0 sp=0xc42018cd50
encoding/asn1.parseField(0x680260, 0xc420080630, 0x197, 0xc42016737e, 0x1d, 0x153, 0x0, 0x0, 0x0, 0x0, ...)
	/usr/local/go/src/encoding/asn1/asn1.go:775 +0x1f6b fp=0xc42018d4d0 sp=0xc42018cdf0
encoding/asn1.parseField(0x6987c0, 0xc420080630, 0x199, 0xc42016737c, 0x40, 0x155, 0x0, 0x0, 0x0, 0x0, ...)
	/usr/local/go/src/encoding/asn1/asn1.go:856 +0x6c7 fp=0xc42018dbb0 sp=0xc42018d4d0
encoding/asn1.parseSequenceOf(0xc42016737c, 0x40, 0x155, 0x7edec0, 0x663700, 0x7edec0, 0x6987c0, 0x1, 0x1f4, 0x0, ...)
	/usr/local/go/src/encoding/asn1/asn1.go:563 +0x485 fp=0xc42018dcf0 sp=0xc42018dbb0
encoding/asn1.parseField(0x663700, 0xc4201fe538, 0x197, 0xc420167188, 0x234, 0x349, 0x1f0, 0x101, 0x0, 0xc4201fb2b8, ...)
	/usr/local/go/src/encoding/asn1/asn1.go:872 +0x1161 fp=0xc42018e3d0 sp=0xc42018dcf0
encoding/asn1.parseField(0x6b8580, 0xc4201fe318, 0x199, 0xc420167184, 0x34c, 0x34d, 0x0, 0x0, 0x0, 0x0, ...)
	/usr/local/go/src/encoding/asn1/asn1.go:856 +0x6c7 fp=0xc42018eab0 sp=0xc42018e3d0
encoding/asn1.parseField(0x6a2240, 0xc4201fe300, 0x199, 0xc420167180, 0x350, 0x351, 0x0, 0x0, 0x0, 0x0, ...)
	/usr/local/go/src/encoding/asn1/asn1.go:856 +0x6c7 fp=0xc42018f190 sp=0xc42018eab0
encoding/asn1.UnmarshalWithParams(0xc420167180, 0x350, 0x351, 0x65ab40, 0xc4201fe300, 0x0, 0x0, 0x411e68, 0x300, 0x6a2240, ...)
	/usr/local/go/src/encoding/asn1/asn1.go:995 +0x14f fp=0xc42018f268 sp=0xc42018f190
encoding/asn1.Unmarshal(0xc420167180, 0x350, 0x351, 0x65ab40, 0xc4201fe300, 0xc42019bb21, 0xc4201eb0b0, 0xc4201eb080, 0xc42019bb21, 0x1)
	/usr/local/go/src/encoding/asn1/asn1.go:988 +0x72 fp=0xc42018f2d8 sp=0xc42018f268
crypto/x509.ParseCertificate(0xc420167180, 0x350, 0x351, 0xb, 0xc42019bf01, 0x3efeb)
	/usr/local/go/src/crypto/x509/x509.go:1193 +0x95 fp=0xc42018f350 sp=0xc42018f2d8
crypto/x509.(*CertPool).AppendCertsFromPEM(0xc420165a40, 0xc42019bfb9, 0x3efeb, 0x3f1eb, 0x431a4)
	/usr/local/go/src/crypto/x509/cert_pool.go:108 +0x126 fp=0xc42018f3a8 sp=0xc42018f350
crypto/x509.loadSystemRoots(0xc42018f4c8, 0xc42018f4d0, 0x50ca8d)
	/usr/local/go/src/crypto/x509/root_unix.go:31 +0x22b fp=0xc42018f490 sp=0xc42018f3a8
crypto/x509.initSystemRoots()
	/usr/local/go/src/crypto/x509/root.go:21 +0x26 fp=0xc42018f4c8 sp=0xc42018f490
sync.(*Once).Do(0x821488, 0x6edc60)
	/usr/local/go/src/sync/once.go:44 +0xdb fp=0xc42018f500 sp=0xc42018f4c8
crypto/x509.systemRootsPool(0x0)
	/usr/local/go/src/crypto/x509/root.go:16 +0x39 fp=0xc42018f520 sp=0xc42018f500
crypto/x509.(*Certificate).Verify(0xc420075680, 0xc420119820, 0x15, 0xc4201659b0, 0x0, 0xed02b1870, 0x11a6452a, 0x804da0, 0x0, 0x0, ...)
	/usr/local/go/src/crypto/x509/verify.go:247 +0x666 fp=0xc42018f770 sp=0xc42018f520
crypto/tls.(*clientHandshakeState).doFullHandshake(0xc42018fe08, 0xc42016c3c0, 0x59)
	/usr/local/go/src/crypto/tls/handshake_client.go:300 +0x221f fp=0xc42018fbf0 sp=0xc42018f770
crypto/tls.(*Conn).clientHandshake(0xc420166a80, 0x6ee660, 0xc420166b88)
	/usr/local/go/src/crypto/tls/handshake_client.go:228 +0xfd1 fp=0xc42018fec0 sp=0xc42018fbf0
crypto/tls.(*Conn).Handshake(0xc420166a80, 0x0, 0x0)
	/usr/local/go/src/crypto/tls/conn.go:1260 +0x1b8 fp=0xc42018ff30 sp=0xc42018fec0
net/http.(*Transport).dialConn.func3(0xc420166a80, 0xc42000d780, 0xc42016c300)
	/usr/local/go/src/net/http/transport.go:1033 +0x2f fp=0xc42018ff78 sp=0xc42018ff30
runtime.goexit()
	/usr/local/go/src/runtime/asm_amd64.s:2086 +0x1 fp=0xc42018ff80 sp=0xc42018ff78
created by net/http.(*Transport).dialConn
	/usr/local/go/src/net/http/transport.go:1038 +0xb4f

goroutine 1 [select]:
net/http.(*Transport).getConn(0xc42000a1e0, 0xc420119800, 0x0, 0xc42000d100, 0x5, 0xc420119820, 0x19, 0x0, 0x0, 0x8059c0)
	/usr/local/go/src/net/http/transport.go:890 +0x9cc
net/http.(*Transport).RoundTrip(0xc42000a1e0, 0xc42000aff0, 0xc42000a1e0, 0x0, 0xc400000000)
	/usr/local/go/src/net/http/transport.go:367 +0x307
net/http.send(0xc42000aff0, 0x7e5dc0, 0xc42000a1e0, 0x0, 0x0, 0x0, 0x8, 0xc4200334d8, 0xc420020538)
	/usr/local/go/src/net/http/client.go:256 +0x15f
net/http.(*Client).send(0xc420033770, 0xc42000aff0, 0x0, 0x0, 0x0, 0xc420020538, 0x0, 0x1)
	/usr/local/go/src/net/http/client.go:146 +0x102
net/http.(*Client).doFollowingRedirects(0xc420033770, 0xc42000aff0, 0x6ee190, 0x3, 0x1, 0x0)
	/usr/local/go/src/net/http/client.go:528 +0x5e5
net/http.(*Client).Do(0xc420033770, 0xc42000aff0, 0x0, 0x0, 0x10)
	/usr/local/go/src/net/http/client.go:184 +0x1ea
github.com/taskcluster/taskcluster-client-go.(*ConnectionData).Request.func1(0xc420033800, 0x45c033, 0x58992170, 0x9acfaa2, 0xc420033820)
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/taskcluster-client-go/http.go:87 +0x32f
github.com/taskcluster/httpbackoff.(*Client).Retry.func1(0xc420047800, 0x411e68)
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/httpbackoff/httpbackoff.go:86 +0x6c
github.com/cenkalti/backoff.RetryNotify(0xc420049ce0, 0x7e8180, 0xc420047800, 0x6ede30, 0xc41fffdc42, 0xc4200339d0)
	/home/jonasfj/Mozilla/go/src/github.com/cenkalti/backoff/retry.go:32 +0x3f
github.com/taskcluster/httpbackoff.(*Client).Retry(0x803830, 0xc4200477a0, 0x60, 0x6acb40, 0x1, 0xc4200477a0)
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/httpbackoff/httpbackoff.go:125 +0x21b
github.com/taskcluster/httpbackoff.Retry(0xc4200477a0, 0xc4200477a0, 0x0, 0x0, 0x0)
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/httpbackoff/httpbackoff.go:139 +0x37
github.com/taskcluster/taskcluster-client-go.(*ConnectionData).Request(0xc4201194a0, 0x821418, 0x0, 0x0, 0x6c8557, 0x3, 0xc4201194c0, 0x1c, 0x0, 0x0, ...)
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/taskcluster-client-go/http.go:93 +0x198
github.com/taskcluster/taskcluster-client-go.(*ConnectionData).APICall(0xc4201194a0, 0x0, 0x0, 0x6c8557, 0x3, 0xc4201194c0, 0x1c, 0x65b7c0, 0xc4200b4300, 0x0, ...)
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/taskcluster-client-go/http.go:139 +0x129
github.com/taskcluster/taskcluster-client-go/queue.(*Queue).Task(0xc420033e70, 0x7fff4b3e0df5, 0x16, 0xc420033de8, 0x1, 0xc420048f01)
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/taskcluster-client-go/queue/queue.go:89 +0x180
main.main()
	/home/jonasfj/Mozilla/go/src/github.com/taskcluster/taskcluster-proxy/main.go:92 +0x954

goroutine 5 [chan receive]:
net/http.(*Transport).dialConn(0xc42000a1e0, 0x7ea100, 0xc4200123c8, 0x0, 0xc42000d100, 0x5, 0xc420119820, 0x19, 0x0, 0x0, ...)
	/usr/local/go/src/net/http/transport.go:1039 +0xb91
net/http.(*Transport).getConn.func4(0xc42000a1e0, 0x7ea100, 0xc4200123c8, 0xc420164330, 0xc4200479e0)
	/usr/local/go/src/net/http/transport.go:885 +0x78
created by net/http.(*Transport).getConn
	/usr/local/go/src/net/http/transport.go:887 +0x3a1
This spiked sharply starting last night and continuing this morning. Can we document what happened here?
Flags: needinfo?(garndt)
Spot checking about a dozen of these, the failures are largely concentrated around retrieving the task definition when using the taskclusterProxy. Yesterday we were having a lot of timeouts within taskcluster-queue, which have since been addressed. These failures align with the period when we were having timeouts. While investigating this, we did identify two things that should be addressed (but were not the root cause of the timeouts): https://bugzilla.mozilla.org/show_bug.cgi?id=1338611 and https://bugzilla.mozilla.org/show_bug.cgi?id=1338630
Flags: needinfo?(garndt)
Whiteboard: [docker-link-failure] → [docker-link-failure][stockwell infra]
There was an uptick in Aurora failures, but not enough to get me to investigate further.
Whiteboard: [docker-link-failure][stockwell infra] → [docker-link-failure][stockwell infra][fennec-scouting]
Going to close this out: the last failure was last month, and we have since implemented better retry mechanisms.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME