Closed
Bug 458939
Opened 17 years ago
Closed 16 years ago
buildbotcustom.steps.test.GraphServerPost needs to use Twisted's http library instead of urlopen()
Categories
(Release Engineering :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bhearsum, Assigned: bhearsum)
Details
Attachments
(1 file, 2 obsolete files)
|
4.81 KB,
patch
|
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
The moz2 Buildbot hung twice today because it couldn't submit results to the graph server. This caused the tree to burn, forced us to restarted release builds, and required lots of manually intervention on windows slaves because of hung processes.
Like I mentioned in the original bug, we really need to be using Twisted's http library here.
https://bugzilla.mozilla.org/show_bug.cgi?id=433710#c25
https://bugzilla.mozilla.org/show_bug.cgi?id=458937
| Assignee | ||
Updated•17 years ago
|
Component: Release Engineering → Release Engineering: Future
| Assignee | ||
Comment 2•16 years ago
|
||
This bug needs to get resolved soon. Again we had something up with graphs.m.o hang the master.
Severity: major → critical
Component: Release Engineering: Future → Release Engineering
Comment 3•16 years ago
|
||
A quick fix here would be to add an appropriate socket timeout value and then catch the exception that occurs on timeout. This is how talos deal with graph server issues - it catches the timeout and then reports an appropriate error.
In the case of build machines it could report the error and then continue with normal operation.
| Assignee | ||
Comment 4•16 years ago
|
||
Unfortunately, urlopen doesn't have a 'timeout' option, and I don't think we want to use socket.setdefaulttimeout() - that would effect the master as a whole (and probably cause an implosion).
If no one else takes this by the new year, I guess I will.
| Assignee | ||
Updated•16 years ago
|
Assignee: nobody → bhearsum
Priority: -- → P3
| Assignee | ||
Comment 5•16 years ago
|
||
I'm testing a fix for this today.
Status: NEW → ASSIGNED
Priority: P3 → P2
| Assignee | ||
Comment 6•16 years ago
|
||
Okay, here's a non-blocking version of this BuildStep. Each getPage Deferred has a callback which does the TinderboxPrint, and an errback to detect failure. The DeferredList fires once all of the getPaged Deferreds have finished, and does the final log printing and determines overall success or failure.
I had to add flunkOnFailure=True to make sure this step burns the overall build.
I couldn't figure out a way to cause the HTTP posts to hang, so I added '127.0.0.1 graphs-stage.mozilla.org' to test the errbacks and failure code. Unfortunately, this means that I couldn't 100% confirm that the hang is gone, but this code is non-blocking, so I'm pretty confident about it.
Attachment #358220 -
Flags: review?(l10n)
| Assignee | ||
Updated•16 years ago
|
Attachment #358220 -
Flags: review?(anodelman)
Updated•16 years ago
|
Attachment #358220 -
Flags: review?(l10n) → review+
Comment 7•16 years ago
|
||
Comment on attachment 358220 [details] [diff] [review]
GraphServerPost, nonblocking
r=me. Two nits, I'd use
if "RETURN:" in line:
instead of
if line.find("RETURN:") > -1:
The latter is too perlish ;-)
A bit more substance has:
I'd move from self.addCompleteLog() in postFinished() to self.addStdout in doTinderboxPrint(). That gives you incremental logs as the posts succeed. And you don't have to carry around .summary.
Nice to see us moving over to POST on the way, did you make sure that the graphserver takes those allright? At least it looks like that used to be GET requests.
Updated•16 years ago
|
Attachment #358220 -
Flags: review?(anodelman) → review+
Comment 8•16 years ago
|
||
Comment on attachment 358220 [details] [diff] [review]
GraphServerPost, nonblocking
This looks fine to me as well, has it been running on stage at all?
| Assignee | ||
Comment 9•16 years ago
|
||
Thanks for the nits Axel, I'll address them and re-post.
Alice, I've run this on staging-master in both the leak test and codesighs scenario without issue.
| Assignee | ||
Comment 10•16 years ago
|
||
Okay, I've addressed your nits here Axel. I thought about trying to get rid of postFinished, but I don't think I can without bailing after the first error, which doesn't seem ideal. I think it's cleaner to set overall step status here, anyways.
Attachment #358220 -
Attachment is obsolete: true
Attachment #358391 -
Flags: review?(l10n)
Comment 11•16 years ago
|
||
Comment on attachment 358391 [details] [diff] [review]
GraphServerPost, nonblocking, nits addressed
and yet another nit, in postFailed, I'd self.stdio.addStderr instead of addStdout. Didn't occur to me that we can change that one on the first round of review.
r=me.
Attachment #358391 -
Flags: review?(l10n) → review+
| Assignee | ||
Comment 12•16 years ago
|
||
Attachment #358391 -
Attachment is obsolete: true
| Assignee | ||
Comment 13•16 years ago
|
||
Comment on attachment 358408 [details] [diff] [review]
use addStderr in postFailed
changeset: 180:e43363144d85
Attachment #358408 -
Flags: checked‑in+
| Assignee | ||
Comment 14•16 years ago
|
||
This changed landed without issue.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•