Closed
Bug 363292
Opened 19 years ago
Closed 17 years ago
win32 tinderbox hangs with connection::get_buffer() unexpected error from select: 10093
Categories
(Release Engineering :: General, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: philor, Unassigned)
Details
Around 19:00 2006-12-08 both Firefox Windows tinderboxes stopped building (or at least stopped showing up in showbuilds.cgi). Nobody seemed to be around in #build last night or this morning, and nobody in #developers seemed to have an interest in sheriffing, though 12 hours without Windows builds seems to me like a tree-closer; ccing a few of the most likely to decide about that instead.
It may just be one tinderbox, since the second one just does perf tests on the other's builds (and if there aren't any builds, it probably won't do anything).
Comment 2•19 years ago
It's been long enough for them to completely drop off the page by now.
Comment 3•19 years ago
Shouldn't the tree be closed at this point?
Comment 5•19 years ago
(In reply to comment #1)
> It may just be one tinderbox, since the second one just does perf tests on the
> other's builds (and if there aren't any builds, it probably won't do anything).
That is correct, test-only machines only test when the tinderbox status is "success" and a new file has been published to the hourly dir.
fx-win32-tbox has been restarted and is reporting now, closing the bug.
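The gating rule described in this comment can be sketched roughly as follows. This is only an illustration in Python, not the actual tinderbox code: the function name `should_run_tests`, its parameters, and the use of file modification times to detect a newly published build are all assumptions made for the example.

```python
from typing import Optional


def should_run_tests(build_status: str,
                     latest_build_mtime: Optional[float],
                     last_tested_mtime: Optional[float]) -> bool:
    """Decide whether a test-only machine should start a test run.

    Hypothetical helper: times are seconds since the epoch, and
    latest_build_mtime is None when nothing has been published yet.
    """
    if build_status != "success":
        # The build machine did not report success, so there is nothing to test.
        return False
    if latest_build_mtime is None:
        # No file has been published to the hourly directory.
        return False
    if last_tested_mtime is None:
        # Nothing has been tested before, so any published build is new.
        return True
    # Only test when the published build is newer than the last one tested.
    return latest_build_mtime > last_tested_mtime
```

Under this rule a hung build machine silently stalls the test machine as well, which matches the behavior reported above.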
Status: NEW → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 6•19 years ago
Unfortunately, it stopped again.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Reporter | ||
Comment 7•19 years ago
And, curiously, at the moment sea-win32-tbox has been pretending to build SeaMonkey for seven hours, sb-win32-tbox last built Calendar 12 hours ago, moz180-win32-tbox and tb180-win32-tbox both stopped more than 12 hours ago, and fxexp-win32-tbox stopped trying to build XULRunner more than 24 hours ago. Unless I'm forgetting something, Thunderbird trunk and the 1.8 branch are the only things with working Windows tinderboxes.
Comment 8•19 years ago
They are all exhibiting the same problem as earlier, looping:
connection::get_buffer() unexpected error from select: 10093
connection::get_buffer() unexpected error: 10093
I'll go through and restart them. I expect it to happen again, since it seems to be happening on all of our trunk ref VMs (and happened again after I restarted fx/fxexp this afternoon), so we should start looking for the root cause.
Comment 9•19 years ago
(In reply to comment #8)
> They are all exhibiting the same problem as earlier, looping:
>
> connection::get_buffer() unexpected error from select: 10093
> connection::get_buffer() unexpected error: 10093
Here is a breakdown:
* fx-win32-tbox was looping the errors above
* sea-win32-tbox, moz180-win32-tbox, tb180-win32-tbox do not have extra logging on; apparently stopped between builds, reason unknown (possibly same as above)
* sb-win32-tbox seems fine
Restarted tinderbox on all.
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Comment 10•19 years ago
Comment 11•19 years ago
There we go again...
The builds have been gone for 5 hours now.
Comment 12•19 years ago
(In reply to comment #8)
> They are all exhibiting the same problem as earlier, looping:
>
> connection::get_buffer() unexpected error from select: 10093
> connection::get_buffer() unexpected error: 10093
>
> I'll go through and restart them, I'm expecting it to happen again at this
> point since it seems to be happening on all of our trunk ref VMs (and happened
> again after a restart of fx/fxexp by me this afternoon) so we should start
> looking for the root cause.
Did this happen on the testing tinderboxen, or on the build tinderboxen?
I ask because if it's the former, then those aren't really ref VMs, correct? If they are, then I'll help investigate the problem. If they aren't, then we should try to test this condition with the new testing ref VMs which, to my knowledge, don't yet exist.
Comment 13•19 years ago
(In reply to comment #8)
> They are all exhibiting the same problem as earlier, looping:
>
> connection::get_buffer() unexpected error from select: 10093
> connection::get_buffer() unexpected error: 10093
>
> I'll go through and restart them, I'm expecting it to happen again at this
> point since it seems to be happening on all of our trunk ref VMs (and happened
> again after a restart of fx/fxexp by me this afternoon) so we should start
> looking for the root cause.
Just for the record, this error appears to be coming from blat:
http://www.google.com/codesearch?hl=en&q=+connection::get_buffer+show:Udf_fs4xs4M:NqTyq18_Bfs:abj9dJt_89g&sa=N&cd=4&ct=rc&cs_p=http://www.enterprisesecuritysolutions.net/files/checksyslog.zip&cs_f=checksyslog/blat194/GENSOCK/gensock.cpp#a0
And the select error is WSANOTINITIALISED (Winsock error 10093: no successful WSAStartup call has been made):
http://support.ipswitch.com/kb/WSK-19980714-EM37.htm
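To make the code in the log lines easier to read, here is a small Python sketch that pulls the number out of a `connection::get_buffer()` error line and maps it to its Winsock name. Code 10093 is `WSANOTINITIALISED` in Winsock's own spelling; the helper name `decode_error_line`, the regular expression, and the short lookup table are assumptions for illustration and are not part of blat or tinderbox.

```python
import re
from typing import Optional

# A few Winsock error codes and their symbolic names (from winsock2.h).
WINSOCK_ERRORS = {
    10054: "WSAECONNRESET",
    10060: "WSAETIMEDOUT",
    10061: "WSAECONNREFUSED",
    10093: "WSANOTINITIALISED",  # no successful WSAStartup call yet
}

# Matches both variants of the message seen in this bug.
ERROR_LINE = re.compile(
    r"connection::get_buffer\(\) unexpected error(?: from select)?: (\d+)"
)


def decode_error_line(line: str) -> Optional[str]:
    """Return the Winsock name for the code in a log line, or None if no match."""
    match = ERROR_LINE.search(line)
    if match is None:
        return None
    code = int(match.group(1))
    return WINSOCK_ERRORS.get(code, f"unknown Winsock error {code}")
```

For example, passing either of the two looping lines quoted above returns `"WSANOTINITIALISED"`, which points at socket-library initialization rather than at the network itself.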
Comment 14•19 years ago
(In reply to comment #12)
> (In reply to comment #8)
> > They are all exhibiting the same problem as earlier, looping:
> >
> > connection::get_buffer() unexpected error from select: 10093
> > connection::get_buffer() unexpected error: 10093
> >
> > I'll go through and restart them, I'm expecting it to happen again at this
> > point since it seems to be happening on all of our trunk ref VMs (and happened
> > again after a restart of fx/fxexp by me this afternoon) so we should start
> > looking for the root cause.
>
> Did this happen on the testing tinderboxen, or on the build tinderboxen?
>
> I ask because if it's the former, then those aren't really ref VMs, correct? If
> they are, then I'll help investigate the problem. If they aren't, then we
> should try to test this condition with the new testing ref VMs which, to my
> knowledge, don't yet exist.
The build machines; see comment 9.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
| Reporter | ||
Comment 15•19 years ago
Dropping the severity, since we've now made it through two open-tree weekends without any Windows perf data, and it doesn't seem to have blocked anyone's development and/or testing work.
Severity: blocker → normal
Comment 16•19 years ago
Uh... No. This IS a blocker, and the tree should NOT be open until this bug is resolved. If it's open now, it needs to be closed.
Just because people suck and check in when we don't have perf metrics doesn't mean doing that is OK; if they can't figure out that a box is missing (because that's hard to realize with tinderbox) we have to close the tree to make it clear that checking in is not allowed.
That said, the machines are up, no? Why is this bug still open?
Severity: normal → blocker
Comment 17•19 years ago
We have better monitoring in place now, so build@mozilla.org and #build are notified as builds fall off the tinderbox page.
It looks like the root cause of this particular problem may be blat, so I am going to change the summary of this bug to reflect that. Please open a new bug if you see any more tinderbox problems, and we can either point back to this one if it turns out to be the cause or investigate new issues that come up.
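The kind of check behind "notified as builds fall off the tinderbox page" could look something like the Python sketch below. It is not the monitoring that was actually deployed: the function name `stale_builders`, the two-hour threshold, and the idea of feeding it a mapping of builder name to last report time are all assumptions for illustration.

```python
from datetime import datetime, timedelta
from typing import Dict, List


def stale_builders(last_report: Dict[str, datetime],
                   now: datetime,
                   max_age: timedelta = timedelta(hours=2)) -> List[str]:
    """Return the names (sorted) of builders whose last report is older than max_age."""
    return sorted(name for name, seen in last_report.items()
                  if now - seen > max_age)


# Example with made-up report times.
now = datetime(2006, 12, 9, 7, 0)
reports = {
    "fx-win32-tbox": datetime(2006, 12, 8, 19, 0),   # silent for 12 hours
    "sb-win32-tbox": datetime(2006, 12, 9, 6, 30),   # reported recently
}
print(stale_builders(reports, now))
```

A real notifier would send the resulting list to a mailing list or IRC channel; that part is left out here because the delivery mechanism is not described in this bug.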
Summary: Windows Firefox tinderboxes have gone AWOL → win32 tinderbox hangs with connection::get_buffer() unexpected error from select: 10093
Updated•19 years ago
Assignee: rhelmer → build
Severity: blocker → major
Status: REOPENED → NEW
Updated•18 years ago
Priority: -- → P3
Updated•18 years ago
Severity: major → normal
| Reporter | ||
Comment 18•18 years ago
Don't suppose anyone happens to remember exactly when it was that someone got pissed off about this, and updated blat, and on which machine(s)? I seem to be filing on tb-win32-tbox a fair bit, but haven't filed on fx-win32-tbox in quite a while (which could just be more aggressive monitoring getting to its hangs before I do).
Updated•18 years ago
Assignee: build → nobody
QA Contact: mozpreed → build
Comment 19•18 years ago
Has this recurred recently?
I haven't had to unstick a machine from this condition myself in ~6 months, but server-ops may be getting paged before I even notice it.
Moving to MozillaBuild has helped us a bit here, but we still have lots of Windows tinderboxes running cygwin+blat. I'd like to close this, but am reluctant to do so if the issue is still lurking.
Comment 20•17 years ago
Seems likely this was fixed by reference platform upgrades.
Status: NEW → RESOLVED
Closed: 19 years ago → 17 years ago
Resolution: --- → FIXED
| Assignee | ||
Updated•12 years ago
Product: mozilla.org → Release Engineering