Closed
Bug 476208
Opened 16 years ago
Closed 16 years ago
have build machines send data to graph server in new graph server format.
Categories
(Release Engineering :: General, defect, P2)
Release Engineering
General
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: anodelman, Assigned: anodelman)
References
Details
Attachments
(3 files, 8 obsolete files)
4.07 KB,
patch
|
rdoherty
:
review+
anodelman
:
checked-in+
|
Details | Diff | Splinter Review |
3.39 KB,
patch
|
bhearsum
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
7.48 KB,
patch
|
bhearsum
:
review+
bhearsum
:
checked-in+
|
Details | Diff | Splinter Review |
The new graph server requires a new format for sending data, the builders should comply with this format.
We should roll this out in staging to get a decent amount of testing done. When the new graph server goes into production we can wait a few weeks to ensure that everything is working correctly and then switch to sending data to it.
I'm assuming that we will not require data migration or double sending of data (ie, have the builders sending results to both old/new graph servers for a limited set of time) for this as we don't have much history and the baseline can be generated over a single run.
Assignee | ||
Updated•16 years ago
|
Assignee: nobody → anodelman
Priority: -- → P2
Assignee | ||
Comment 1•16 years ago
|
||
Assignee | ||
Comment 2•16 years ago
|
||
bhearsum - can you take a quick look over this, as you touched this code just recently?
The only block that I'm hitting is that I'm going to require the sourcestamp for the build that was testing to send along to the graph server. Is that information available easily somewhere? Also, would it be possible to get the 'real' buildid instead of the current solution that involves just taking a timestamp from localtime (so, not related to the build but ends up related to the time the test was run).
Comment 3•16 years ago
|
||
You can parse both SourceStamp and BuildID out of application.ini
Assignee | ||
Comment 4•16 years ago
|
||
Can I get a directory where the ini would be located from where this graph sending code is? ie, is it a build property that I could access?
Assignee | ||
Comment 5•16 years ago
|
||
Missing here is the post_file.py file - it's the same as that used by talos. We could check in a second copy for each of use or we could pull from a single source to avoid duplication.
Attachment #360201 -
Attachment is obsolete: true
Attachment #360843 -
Flags: review?(bhearsum)
Assignee | ||
Comment 6•16 years ago
|
||
Attachment #360844 -
Flags: review?(bhearsum)
Assignee | ||
Comment 7•16 years ago
|
||
Attachment #360851 -
Flags: review?(rdoherty)
Assignee | ||
Comment 8•16 years ago
|
||
This will require allowing the cpu_speed column to have NULLs - is that acceptable?
Attachment #360851 -
Attachment is obsolete: true
Attachment #360854 -
Flags: review?(rdoherty)
Attachment #360851 -
Flags: review?(rdoherty)
Comment 9•16 years ago
|
||
Comment on attachment 360844 [details] [diff] [review]
changes to mozilla stage to pass correct variables to buildbotcustom
>diff -r 9e0527a80b00 mozilla2-staging/master-main.cfg
>--- a/mozilla2-staging/master-main.cfg Tue Feb 03 14:32:48 2009 -0500
>+++ b/mozilla2-staging/master-main.cfg Thu Feb 05 18:03:35 2009 -0800
>@@ -200,7 +200,7 @@
> stageBasePath=STAGE_BASE_PATH,
> graphServer=GRAPH_SERVER,
> graphSelector=GRAPH_SELECTOR,
>- graphBranch=branch['major_version'],
>+ graphBranch=branch['waterfall_name'],
No need to add a new parameter for this, it's already in config.py as branch['tinderbox_tree']
Attachment #360844 -
Flags: review?(bhearsum) → review-
Comment 10•16 years ago
|
||
Comment on attachment 360843 [details] [diff] [review]
update buildbotcustom to handle new graph server send protocol
This code isn't non-blocking. I don't want to put us in another situation where graph server or network problems hang Buildbot.
You should be able to use getPage just as before. I did some playing and it looks like this will work:
fh = open(filename)
data = fh.read()
fh.close()
content_type, body = encode_multipart_formdata(
[('key', 'value')],
[('filename', filename, data)]
)
headers = {'Content-Type': content_type, 'Content-Length': str(len(data))}
d = getPage(self.graphurl, timeout=self.timeout, method='POST',
headers=headers, postdata=body)
The deferreds should fire exactly the same as before with this code. You can probably get rid of sendAndRemoveFile, and you'll have to move the deletion of the tempfile to postFinished, you can pass it in like this:
d1.addCallback(self.postFinished, filename)
...
def postFinished(self, results, filename)
Does the above make sense?
Attachment #360843 -
Flags: review?(bhearsum) → review-
Updated•16 years ago
|
Attachment #360854 -
Flags: review?(rdoherty) → review-
Comment 11•16 years ago
|
||
Comment on attachment 360854 [details] [diff] [review]
add build tests/machines to staging graph server db (take 2)
Changing the column to null is fine, just change it in sql/schema.sql and make it part of this patch and I'll r+ it.
Assignee | ||
Comment 12•16 years ago
|
||
bhearsum -
The waterfall_name reflects the name of the waterfall that the itself would be associated with (ie, branch). For the all the staging boxes the tinderbox_tree is 'MozillaTest', but I require them to be 'Firefox', 'Firefox3.1' and 'TraceMonkey' depending on the type of build. Maybe calling it waterfall_name isn't quite appropriate. Would be more comfortable with branchLongName?
This is due to an effort in the graph server to get away from numbers (1.9, 1.9.0, 1.9.1, etc) in favor of waterfall branch names - it's seen as an easier to find the information.
Assignee | ||
Comment 13•16 years ago
|
||
Change to schema as requested.
Attachment #360854 -
Attachment is obsolete: true
Attachment #360983 -
Flags: review?(rdoherty)
Comment 14•16 years ago
|
||
Comment on attachment 360983 [details] [diff] [review]
[Checked in]add build tests/machines to staging graph server db (now with schema update)
Looks good!
Attachment #360983 -
Flags: review?(rdoherty) → review+
Comment 15•16 years ago
|
||
(In reply to comment #12)
> bhearsum -
> The waterfall_name reflects the name of the waterfall that the itself would be
> associated with (ie, branch). For the all the staging boxes the tinderbox_tree
> is 'MozillaTest', but I require them to be 'Firefox', 'Firefox3.1' and
> 'TraceMonkey' depending on the type of build. Maybe calling it waterfall_name
> isn't quite appropriate. Would be more comfortable with branchLongName?
>
> This is due to an effort in the graph server to get away from numbers (1.9,
> 1.9.0, 1.9.1, etc) in favor of waterfall branch names - it's seen as an easier
> to find the information.
Can we use repo_path instead, or is that too ugly? repo_path is more accurate, anyways, as evidenced by all staging branches sharing a tinderbox_tree.
Assignee | ||
Comment 16•16 years ago
|
||
Uses non-blocking getPage.
Attachment #360843 -
Attachment is obsolete: true
Attachment #361366 -
Flags: review?(bhearsum)
Assignee | ||
Comment 17•16 years ago
|
||
Attachment #360844 -
Attachment is obsolete: true
Attachment #361368 -
Flags: review?(bhearsum)
Assignee | ||
Comment 18•16 years ago
|
||
Comment on attachment 360983 [details] [diff] [review]
[Checked in]add build tests/machines to staging graph server db (now with schema update)
197:eb1cb26499d4
Attachment #360983 -
Attachment description: add build tests/machines to staging graph server db (now with schema update) → [Checked in]add build tests/machines to staging graph server db (now with schema update)
Attachment #360983 -
Flags: checked‑in+ checked‑in+
Assignee | ||
Comment 19•16 years ago
|
||
bhearsum - looking over the results on MozillaTest, the links that I'm generating aren't being added to the logs correctly. Used to be the case that we could get things to properly print on stage but not on production. Now it looks like we can't get graph links post anywhere.
Comment 20•16 years ago
|
||
Comment on attachment 361368 [details] [diff] [review]
changes to mozilla stage to pass correct variables to buildbotcustom (take 2)
>diff -r 9e0527a80b00 mozilla2-staging/config.py
>--- a/mozilla2-staging/config.py Tue Feb 03 14:32:48 2009 -0500
>+++ b/mozilla2-staging/config.py Mon Feb 09 14:40:13 2009 -0800
>@@ -16,8 +16,8 @@
> AUS2_USER = 'cltbld'
> AUS2_HOST = 'staging-stage.build.mozilla.org'
> DOWNLOAD_BASE_URL = 'ftp://ftp.mozilla.org/pub/mozilla.org/firefox'
>-GRAPH_SERVER = 'graphs-stage.mozilla.org'
>-GRAPH_SELECTOR = 'server'
>+GRAPH_SERVER = 'graphs-stage2.mozilla.org'
I guess this is the new style graph server?
Attachment #361368 -
Flags: review?(bhearsum) → review+
Comment 21•16 years ago
|
||
Comment on attachment 361366 [details] [diff] [review]
[Backed out]update buildbotcustom to handle new graph server send protocol (take 2)
Much nicer - thanks!
Attachment #361366 -
Flags: review?(bhearsum) → review+
Comment 22•16 years ago
|
||
(In reply to comment #19)
> bhearsum - looking over the results on MozillaTest, the links that I'm
> generating aren't being added to the logs correctly. Used to be the case that
> we could get things to properly print on stage but not on production. Now it
> looks like we can't get graph links post anywhere.
That sucks. If we need to fix it for this ping me later and maybe can track it down...
Comment 23•16 years ago
|
||
Are these patches going to fix the current GraphServerPost failures in staging. Eg:
Cannot find input stream
File "/var/www/html/graphs2/server/pyfomatic/collect.py", line 241, in handleRequest
raise ImproperFormatException("Cannot find input stream")
Assignee | ||
Comment 24•16 years ago
|
||
That looks like attempting data in the old graph server format to the new graph server - so yes, it should fix that.
Assignee | ||
Comment 25•16 years ago
|
||
Double checked, and the build machines and tests have been correctly added to the production graph server db. Rolling this out should only be a question of checking in and pushing the code.
Comment 26•16 years ago
|
||
Comment on attachment 361366 [details] [diff] [review]
[Backed out]update buildbotcustom to handle new graph server send protocol (take 2)
>+ def constructFile(self, machine, testname, branch, sourcestamp, buildid, date, val):
>+ info_format = "%s,%s,%s,%s,%s,%s\n"
>+ filename = tempfile.mktemp()
>+ tmpf = open(filename, "w")
>+ tmpf.write("START\n")
>+ tmpf.write("AVERAGE\n")
>+ tmpf.write(info_format % (machine, testname, branch, sourcestamp, buildid, date))
>+ tmpf.write("%.2f\n" % (float(val)))
>+ tmpf.write("END")
>+ tmpf.flush()
>+ tmpf.close()
>+ return filename
>+
Can we change this so we're not creating and deleting these temporary files all the time? Is there a need for this information to be stored in the file? Below, you're immediately reading the entire file into memory and then deleting the file, so it's probably better to just return a string with all the above data.
> def start(self):
> self.changes = self.build.allChanges()
> self.timestamp = int(self.step_status.build.getTimes()[0])
>- self.buildid = strftime("%Y%m%d%H%M", localtime(self.timestamp))
>+ self.buildid = self.getProperty('buildid')
>+ self.sourcestamp = self.getProperty('sourcestamp')
> self.testresults = self.getProperty('testresults')
> self.stdio = self.addLog('stdio')
> self.error = False
>@@ -379,9 +399,15 @@
> deferreds = []
> for res in self.testresults:
> testname, testlongname, testval, prettyval = res
>- params = urlencode({'branchid': self.buildid, 'value': str(testval).strip(string.letters), 'testname': testlongname, 'tbox' : self.resultsname, 'type' : "continuous", 'time' : self.timestamp, 'branch' : self.branch})
>- d = getPage(self.graphurl, timeout=self.timeout, method='POST',
>- postdata=params)
>+ testval = str(testval).strip(string.letters)
>+ filename = self.constructFile(self.resultsname, testlongname, self.branch, self.sourcestamp, self.buildid, self.timestamp, testval)
>+ fh = open(filename)
>+ file_data = fh.read()
>+ fh.close()
>+ os.remove(filename)
>+ content_type, body = post_file.encode_multipart_formdata([("key", "value")], [("filename", filename, file_data)])
>+ headers = {'Content-Type': content_type, 'Content-Length' : str(len(file_data))}
>+ d = getPage(self.graphurl, timeout=self.timeout, method='POST', headers=headers, postdata=body)
> d.addCallback(self.doTinderboxPrint, testlongname, testname,
> prettyval)
> d.addErrback(lambda x: self.postFailed(testlongname))
Assignee | ||
Comment 27•16 years ago
|
||
Updated patch to reflect currently reality. This looks like during migration we are going to break the ability to thunderbird and seamonkey to send data to graphs-stage...
How can we get in contact with them to make the transition smooth?
Attachment #361368 -
Attachment is obsolete: true
Attachment #383603 -
Flags: review?(bhearsum)
Updated•16 years ago
|
Attachment #383603 -
Attachment is patch: true
Attachment #383603 -
Attachment mime type: application/octet-stream → text/plain
Comment 28•16 years ago
|
||
(In reply to comment #27)
> Created an attachment (id=383603) [details]
> changes to pass correct variables to buildbotcustom (take 3)
>
> Updated patch to reflect currently reality. This looks like during migration
> we are going to break the ability to thunderbird and seamonkey to send data to
> graphs-stage...
Well afaik SeaMonkey doesn't send any stats to graphs-stage.
Thunderbird stats seemed to have stopped around May time. As I was expecting it to be much earlier than that I'm not too fussed about breaking it.
What I would like to do is to work with you so that we can migrate to the new code (graphs-new?) and get the right names for everything (we currently use comm-central and other things on graph-stage; the graphs-new version looks much better).
> How can we get in contact with them to make the transition smooth?
Make sure we're cc'ed and/or ping us on #maildev
![]() |
||
Comment 29•16 years ago
|
||
(In reply to comment #28)
> Well afaik SeaMonkey doesn't send any stats to graphs-stage.
Right, and we're waiting for the new graphs server to be fully working before enabling that, see bug 492406. I the buildbotcustom factories send the data in the new format, we're safe for that once we can move forward on it.
Updated•16 years ago
|
Attachment #383603 -
Flags: review?(bhearsum) → review+
Comment 30•16 years ago
|
||
Comment on attachment 383603 [details] [diff] [review]
[Backed out]changes to pass correct variables to buildbotcustom (take 3)
I don't know much about the new graph server, but this is what Talos uses, so it sounds good to me!
Assignee | ||
Comment 31•16 years ago
|
||
Comment on attachment 361366 [details] [diff] [review]
[Backed out]update buildbotcustom to handle new graph server send protocol (take 2)
changeset: 358:393344231abf
Attachment #361366 -
Attachment description: update buildbotcustom to handle new graph server send protocol (take 2) → [Checked in]update buildbotcustom to handle new graph server send protocol (take 2)
Attachment #361366 -
Flags: checked‑in+ checked‑in+
Assignee | ||
Comment 32•16 years ago
|
||
Comment on attachment 383603 [details] [diff] [review]
[Backed out]changes to pass correct variables to buildbotcustom (take 3)
changeset: 1270:55793cf7f0c9
Attachment #383603 -
Attachment description: changes to pass correct variables to buildbotcustom (take 3) → [Checked in]changes to pass correct variables to buildbotcustom (take 3)
Attachment #383603 -
Flags: checked‑in+ checked‑in+
Assignee | ||
Updated•16 years ago
|
Attachment #361366 -
Attachment description: [Checked in]update buildbotcustom to handle new graph server send protocol (take 2) → [Backed out]update buildbotcustom to handle new graph server send protocol (take 2)
Attachment #361366 -
Flags: checked‑in+ checked‑in+ → checked‑in- checked‑in-
Assignee | ||
Updated•16 years ago
|
Attachment #383603 -
Attachment description: [Checked in]changes to pass correct variables to buildbotcustom (take 3) → [Backed out]changes to pass correct variables to buildbotcustom (take 3)
Attachment #383603 -
Flags: checked‑in+ checked‑in+ → checked‑in- checked‑in-
Assignee | ||
Comment 33•16 years ago
|
||
Graph server posting failed after applying patches, backed out and currently sending to graphs-old.
Assignee | ||
Comment 34•16 years ago
|
||
Attachment #383603 -
Attachment is obsolete: true
Attachment #389840 -
Flags: review?(bhearsum)
Assignee | ||
Comment 35•16 years ago
|
||
Attachment #361366 -
Attachment is obsolete: true
Attachment #389841 -
Flags: review?(bhearsum)
Assignee | ||
Comment 36•16 years ago
|
||
Patches run green on stage, I've also added better error handling so that we can diagnose graph server failures quickly.
Comment 37•16 years ago
|
||
Comment on attachment 389840 [details] [diff] [review]
buildbot-configs updates to support new graph server
>diff -r 0a44baf04cbb mozilla2-staging/master-main.cfg
>--- a/mozilla2-staging/master-main.cfg Tue Jul 21 21:16:36 2009 +0200
>+++ b/mozilla2-staging/master-main.cfg Tue Jul 21 18:25:25 2009 -0700
>@@ -298,7 +298,7 @@
> stageBasePath=STAGE_BASE_PATH,
> graphServer=GRAPH_SERVER,
> graphSelector=GRAPH_SELECTOR,
>- graphBranch=branch['major_version'],
>+ graphBranch=branch['tinderbox_tree'],
Using tinderbox_tree here doesn't work very well in staging, but we can fix that later.
Attachment #389840 -
Flags: review?(bhearsum) → review+
Updated•16 years ago
|
Attachment #389840 -
Flags: checked‑in+ checked‑in+
Comment 38•16 years ago
|
||
Comment on attachment 389840 [details] [diff] [review]
buildbot-configs updates to support new graph server
2e154a2f8c5e
Updated•16 years ago
|
Attachment #389841 -
Flags: checked‑in+ checked‑in+
Comment 39•16 years ago
|
||
Comment on attachment 389841 [details] [diff] [review]
buildbotcustom updates to support new graph server
2dad9a4ec70e
Comment 40•16 years ago
|
||
Comment on attachment 389841 [details] [diff] [review]
buildbotcustom updates to support new graph server
>@@ -549,6 +567,24 @@
> warnOnFailure=True,
> haltOnFailure=True
> )
>+ self.addStep(SetProperty,
>+ command=['python', 'build/config/printconfigsetting.py',
>+ 'build/%s/dist/bin/application.ini' % self.objdir,
>+ 'App', 'BuildID'],
>+ property='buildid',
>+ workdir='.',
>+ description=['getting', 'buildid'],
>+ descriptionDone=['got', 'buildid']
>+ )
>+ self.addStep(SetProperty,
>+ command=['python', 'build/config/printconfigsetting.py',
>+ 'build/%s/dist/bin/application.ini' % self.objdir,
>+ 'App', 'SourceStamp'],
>+ property='sourcestamp',
>+ workdir='.',
>+ description=['getting', 'sourcestamp'],
>+ descriptionDone=['got', 'sourcestamp']
>+ )
I don't think you need to do this a second time here.
This patch looks fine otherwise. I'll check it in with this nit fixed during the downtime today.
Attachment #389841 -
Flags: review?(bhearsum) → review+
Comment 41•16 years ago
|
||
Alice, can you try adding this code in staging, it should get us better logs as to why graph server posts fail.
To reproduce set GRAPH_SELECTOR to 'server/collect.cgi' (instead of '/server/collect.cgi')
diff --git a/steps/test.py b/steps/test.py
--- a/steps/test.py
+++ b/steps/test.py
@@ -402,22 +402,24 @@
found = True
if not found:
self.stdio.addStderr("results not added, response: \n" + contents)
raise Exception("graph server did not add results successfully")
except Exception, e:
self.stdio.addStderr(str(e))
raise
- def postFailed(self, testlongname):
+ def postFailed(self, testlongname, res=None):
# This function is called when getPage() fails and simply sets
# self.error = True so postFinished knows that something failed.
self.error = True
self.stdio.addStderr('\nEncountered error when trying to post %s\n' % \
testlongname)
+ if res:
+ self.stdio.addStderr(str(res))
def constructString(self, machine, testname, branch, sourcestamp, buildid, date, val):
info_format = "%s,%s,%s,%s,%s,%s\n"
str = ""
str += "START\n"
str += "AVERAGE\n"
str += info_format % (machine, testname, branch, sourcestamp, buildid, date)
str += "%.2f\n" % (float(val))
@@ -439,17 +441,17 @@
testname, testlongname, testval, prettyval = res
testval = str(testval).strip(string.letters)
data = self.constructString(self.resultsname, testlongname, self.branch, self.sourcestamp, self.buildid, self.timestamp, testval)
content_type, body = post_file.encode_multipart_formdata([("key", "value")], [("filename", "data", data)])
headers = {'Content-Type': content_type, 'Content-Length' : str(len(data))}
d = getPage(self.graphurl, timeout=self.timeout, method='POST', headers=headers, postdata=body)
d.addCallback(self.doTinderboxPrint, testlongname, testname,
prettyval)
- d.addErrback(lambda x: self.postFailed(testlongname))
+ d.addErrback(lambda res: self.postFailed(testlongname, res))
deferreds.append(d)
# Now, once *everything* has finished we need to tell Buildbot
# that this step is complete.
dl = DeferredList(deferreds)
dl.addCallback(self.postFinished)
def postFinished(self, results):
Assignee | ||
Comment 42•16 years ago
|
||
I'm seeing the leak/bloat results on the new graph server, so data is successfully being sent. Going to call this done. Any further improvements can be split into new bugs.
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Updated•12 years ago
|
Product: mozilla.org → Release Engineering
You need to log in
before you can comment on or make changes to this bug.
Description
•