Closed Bug 716256 Opened 14 years ago Closed 14 years ago

Frequent talos failures on multiple trees starting ~7am 2012-01-07 MVT

Categories

(Testing :: Talos, defect)

defect
Not set
critical

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 716326

People

(Reporter: emorley, Unassigned)

Details

Talos tp has been failing much more frequently since ~7am 2012-01-07 MVT.

edmorley: jmaher: any idea what could have broken things at 7am MVT today?
jmaher: edmorley: no idea; I suspect there was either a hiccup on the network
jmaher: or maybe some issues with disks on certain slaves
edmorley: jmaher: multiple machines, trees and for several hours
jmaher: edmorley: hmm
edmorley: https://tbpl.mozilla.org/?jobname=talos%20tp and https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=talos%20tp (plus obviously press down arrow x a few)
jmaher: odd that we are getting crashes and timeouts
jmaher: edmorley: so this is the first instance I see on inbound: https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=talos%20tp&rev=757a48403154
jmaher: and that is on osx 64 bit
jmaher: edmorley: but the frequency of that is pretty high; wonder if the responsiveness stuff is causing it to crash; that has been live for almost 2 months, but we had troubles initially with osx crashes
jmaher: edmorley: it all sounds like a slave issue
jmaher: or a drive that serves the files got corrupted

Examples:
* https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=talos%20tp&rev=879883efec3c
* https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=talos%20tp&rev=1a4ef8ec3f5a
* https://tbpl.mozilla.org/?tree=Mozilla-Inbound&jobname=talos%20tp&rev=757a48403154
* https://tbpl.mozilla.org/?jobname=talos%20tp&rev=ff517d1a0c4a
* https://tbpl.mozilla.org/?jobname=talos%20tp&rev=5a446202be5f
Summary: Frequent talos tp failures on multiple trees starting ~7am 2012-01-07 MVT → Frequent talos failures on multiple trees starting ~7am 2012-01-07 MVT
Blocks: 438871
Whiteboard: [orange]
Depends on: 716326
I only see two things in there: bug 716326, where something broke on talos-r4-snow-007 and it chewed up 30 jobs, and a big cluster of bug 664371, where we hang after the log line "loaded http://localhost/page_load_test/tp5/goo.ne.jp/goo.ne.jp/index.html (next: http://localhost/page_load_test/tp5/alipay.com/www.alipay.com/index.html)", which from what alice told me long ago means that we have actually finished loading the goo.ne.jp page, and we are loading the alipay.com page.

Far and away the most likely two things are that something snuck through in the process of turning it into a local-only page, and it hits the network for something which intermittently fails in a nasty way, or, like the classic zombocom episode, it intermittently gets busted DNS and a DNS lookup from something like prefetching gets stuck and hangs the browser.

Either way, it's going to be someone with access to the pageset who determines what's actually happening, and either way it should have considerably higher priority than it has ever gotten, since it's entirely reasonable to expect that a perf test which intermittently hits clusters of hangs also intermittently hits clusters of slows. If I landed something that regressed tp, I'd just sit and argue and delay for as long as possible, waiting to see whether my "regression" just disappeared when either the buildfarm DNS server or alipay's DNS or the buildfarm network or alipay's servers got fixed. That sort of utter lack of faith in a test is... not ideal.

Are there any that I missed noticing, which are both not talos-r4-snow-007 and are not a tp5 run which hung after the "(next: http://localhost/page_load_test/tp5/alipay.com/www.alipay.com/index.html)" line?
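For anyone wanting to repeat that triage over a batch of logs, the two-bucket split described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function name `classify_failure`, the idea that the slave name is available separately, and the assumption that page-load lines start with "loaded " are all hypothetical and not taken from the talos code; only the slave name and the alipay "(next: ...)" text come from this bug.

```python
from typing import List

# Both constants are quoted from this bug's comments.
BAD_SLAVE = "talos-r4-snow-007"
ALIPAY_NEXT = (
    "(next: http://localhost/page_load_test/tp5/"
    "alipay.com/www.alipay.com/index.html)"
)


def classify_failure(slave: str, log_lines: List[str]) -> str:
    """Sort one failed run into the two buckets described in the comment.

    Assumption: page-load progress lines start with "loaded ", and a hang
    shows up as the alipay "(next: ...)" line being the last such line.
    """
    # Anything that ran on the broken slave is attributed to bug 716326.
    if slave == BAD_SLAVE:
        return "bug 716326"

    # Otherwise look at the last page-load line that made it into the log.
    loaded = [
        line.strip() for line in log_lines if line.lstrip().startswith("loaded ")
    ]
    if loaded and loaded[-1].endswith(ALIPAY_NEXT):
        return "bug 664371"

    # Neither pattern matched: this is the kind of run the question asks about.
    return "unclassified"
```

Any run that comes back as "unclassified" would be one that is neither talos-r4-snow-007 nor a tp5 hang after the alipay line, so it would need a look by hand.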
(In reply to Phil Ringnalda (:philor) from comment #7)
> Are there any that I missed noticing, which are both not talos-r4-snow-007
> and are not a tp5 run which hung after the "(next:
> http://localhost/page_load_test/tp5/alipay.com/www.alipay.com/index.html)"
> line?

I don't believe so, your analysis looks correct. Think this was just a case of more-caffeine required on my part, with accidentally including the alipay logs, which then confused my results in comment 0. Thanks for making sense of it :-)

Going to dupe to bug 716326.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → DUPLICATE
No longer blocks: 438871
No longer depends on: 716326
Whiteboard: [orange]