Closed
Bug 321404
Opened 19 years ago
Closed 19 years ago
Tinderbox build graphs are returning error 500
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: bugzilla-mozilla-20000923, Assigned: morgamic)
Details
Every tinderbox build graph I try to view is failing to load the image, for example: http://build-graphs.mozilla.org/graph/query.cgi?testname=xulwinopen&tbox=comet.mozilla.org&autoscale=1&days=7&avg=1&showpoint=2005:12:23:13:28:09,453 The image on that page is getting a 500 Internal Server Error instead of anything useful, even though the data itself appears to be available.
Comment 1•19 years ago
That's on purpose. The script has issues.
Comment 2•19 years ago (Reporter)
Oh great, another security issue killing off a useful developer tool. Wheee.
Updated•19 years ago
OS: Windows Server 2003 → All
Hardware: PC → All
Updated•19 years ago
Assignee: server-ops → justdave
Comment 3•19 years ago
*** Bug 323007 has been marked as a duplicate of this bug. ***
Comment 4•19 years ago
Is there anything I can do to help get this moving? Without the graphs it's very difficult to spot perf regressions without watching tinderbox continuously...
Comment 5•19 years ago
*** Bug 323749 has been marked as a duplicate of this bug. ***
Updated•19 years ago
Severity: normal → major
Comment 6•19 years ago
This is a major problem for Firefox development. These graphs are very important. Is there anything I can do to help?
Updated•19 years ago
Severity: major → blocker
Comment 7•19 years ago
It's unacceptable for this to be unfixed for so long. I'm escalating now. /be
Comment 8•19 years ago
In particular, this bug should have higher priority than anything to do with news.mozilla.org. /be
Comment 9•19 years ago
Morgamic - can you take this?
Comment 10•19 years ago (Assignee)
I can give it a shot. Is there a staging environment for Tinderbox I can use and/or get access to in order to test patches on graph/query.cgi?
Comment 12•19 years ago (Assignee)
It looks like some of the regexp matches were causing a die where the script should have just set the respective parameter to '', and there were a couple of cases with missing semicolons.

That said, even if I fix the syntax errors and update these regexps to allow for a null param, the majority of the time I get no graph. Digging revealed what gnuplot gives back for the passed $cmds:

gnuplot> reset
gnuplot> set term png color
Terminal type set to 'png'
Options are 'small color'
gnuplot> set output "/tmp/gnuplot.9270"
gnuplot> set title "comet.mozilla.org xulwinopen"
gnuplot> set key graph 0.1,0.95 reverse spacing .75 width -18
gnuplot> set linestyle 1 lt 3 lw 1 pt 7 ps .5
gnuplot> set linestyle 2 lt 3 lw 1 pt 7 ps 1
gnuplot> set linestyle 3 lt 3 lw 1
gnuplot> set linestyle 4 lt 8 lw 1 pt 7 ps 3
gnuplot> set data style points
gnuplot> set timefmt "%Y:%m:%d:%H:%M:%S"
gnuplot> set xdata time
gnuplot> set xrange ["2006:01:19:12:36:11" : "2006:01:26:12:36:11"]
gnuplot> set yrange [ 0 : ]
gnuplot> set ylabel "xulwinopen (ms)"
gnuplot> set timestamp "Generated: %d/%b/%y %H:%M" 0,-1
gnuplot> set nokey
gnuplot> set grid
gnuplot> plot "db/xulwinopen/comet.mozilla.org" using 1:2 with lines, "db/xulwinopen/point.9270" using 1:2 with points ls 4, "db/xulwinopen/comet.mozilla.org_avg" using 1:2 with lines ls 3
all points undefined!

This is an example: the "all points undefined" error occurs when there is no data associated with the given test/machine/date range. So this leads me to wonder whether there is a problem with how data gets synced to axolotl from the build systems. Does anybody know what would cause the delivery of this data to be interrupted?
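
For the parameter handling described in this comment (a missing parameter becoming '' instead of triggering a die), a rough sketch may help. It is written in Python purely for illustration and is not the Perl in query.cgi; the function name clean_param and the example patterns are assumptions, not taken from the scripts.

```python
import re


def clean_param(raw, pattern):
    """Return a validated parameter, '' when it is absent, or raise on bad input."""
    # A missing or empty parameter is not an error: fall back to ''.
    if raw is None or raw == "":
        return ""
    # Anything that is present must match the whole pattern.
    if re.fullmatch(pattern, raw) is None:
        raise ValueError(f"unexpected parameter value: {raw!r}")
    return raw


# Hypothetical patterns, chosen only for this example.
testname = clean_param("xulwinopen", r"[\w.-]+")
days = clean_param(None, r"\d+")  # absent parameter, becomes ''
```

The point of the split is that only a present-but-malformed value is treated as fatal, so a request that simply omits an optional parameter still produces a graph.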
Status: NEW → ASSIGNED
Comment 13•19 years ago (Assignee)
Upon further investigation, I learned that Tinderbox accesses axolotl via HTTP. When a build fires off, it accesses a URL, defined by $tmpurl, that points to collect.cgi in graph/. For example:

my $tmpurl = "http://$Settings::results_server/graph/collect.cgi";
$tmpurl .= "?value=$value&data=$data_plus_co_time&testname=$testname&tbox=$tbox";

What this means is that in order for build information to be properly inserted as a datapoint in the build graph datafile (/db/$tbox), collect.cgi needs to:
1) be accessible over HTTP by the build machine
2) work correctly

In our case, we know that access was not the problem, because the build IP space is allowed access to the graph/ directory on axolotl.

In the second case, it was apparent that the patch meant to fix security holes in *.cgi (particularly for sanitizing GET parameters and opening files properly in rawdata.cgi and graph.cgi to disallow malformed inputs and/or injection) caused a syntax error on the last line of the cleansing for collect.cgi (see attachment 206844):

> +die "Unexpected value for parameter 'data' supplied"
> +  unless $data =~ /^(?:\d+:?)*$/

My conclusion is that because of this syntax error, data has not been updated for build graphs from the date the patch was applied (probably Jan 10th) until now. This would explain the errors in gnuplot that report missing data and/or "all points undefined".

The plan for resolving this:
1) Update/fix *.cgi to allow for null inputs, and re-verify all scripts to ensure there are no parse/syntax errors
2) Verify that build data is once again being inserted into /graph/db/$tbox
3) Document Tinderbox's dependency on graph/collect.cgi so this doesn't happen in the future

Thoughts?
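
As an illustration of the request described in this comment and of the 'data' check quoted from the patch, here is a small Python sketch. It is not the Perl used by Tinderbox or collect.cgi; the function names, the host results.example.org, and the sample values are assumptions made up for the example. Only the four parameter names and the regular expression come from the comment.

```python
import re
from urllib.parse import urlencode, urlsplit, parse_qs

# Same pattern as the check quoted from the patch.
DATA_RE = re.compile(r"^(?:\d+:?)*$")


def build_collect_url(server, value, data, testname, tbox):
    """Build a collect.cgi URL with properly escaped query parameters."""
    query = urlencode(
        {"value": value, "data": data, "testname": testname, "tbox": tbox}
    )
    return f"http://{server}/graph/collect.cgi?{query}"


def data_is_valid(data):
    """True when 'data' is empty or made of digit groups separated by colons."""
    return DATA_RE.match(data) is not None


# Hypothetical host and values, for illustration only.
url = build_collect_url(
    "results.example.org", "453", "2005:12:23:13:28:09",
    "xulwinopen", "comet.mozilla.org",
)
params = parse_qs(urlsplit(url).query)
```

Note that the pattern already accepts an empty string, because the group may repeat zero times. So the pattern on its own would not reject a null 'data' value; the failure described here was in the script not parsing at all.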
Comment 14•19 years ago
So in other words performance data is not actually being collected and hasn't been since Jan 10? If so we should probably close the tree, especially since tinderbox itself seems to lose such data after some time in my experience... Is there any way we can re-scrape the tinderbox logs since data collection stopped to restore the data?
Comment 15•19 years ago (Assignee)
It should be possible to scrape the Apache logs for past data from the build systems and create entries retroactively to restore lost data. Someone will look into doing this soon, and hopefully we can get the data back this way. :) BTW, the graphs should be back up, pending a review on the patch for bug 321234.
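
The comment does not say how the logs would be scraped. One possible approach, sketched in Python under the assumption that the access log records the full request line (as in the Common Log Format), is to pull the collect.cgi query parameters back out of each logged request. The function name and the sample log line are hypothetical.

```python
import re
from urllib.parse import urlsplit, parse_qs

# Matches the request line of a logged GET to collect.cgi.
REQUEST_RE = re.compile(r'"GET (\S*/graph/collect\.cgi\?\S*) HTTP/[\d.]+"')

# The four parameters named in comment 13.
FIELDS = ("value", "data", "testname", "tbox")


def recover_datapoints(log_lines):
    """Yield the collect.cgi parameters found in each matching log line."""
    for line in log_lines:
        match = REQUEST_RE.search(line)
        if match is None:
            continue
        params = parse_qs(urlsplit(match.group(1)).query)
        # Skip requests that are missing any of the four parameters.
        if all(name in params for name in FIELDS):
            yield {name: params[name][0] for name in FIELDS}


# Hypothetical log line, for illustration only.
sample = (
    '10.0.0.1 - - [20/Jan/2006:12:36:11 -0800] '
    '"GET /graph/collect.cgi?value=453&data=2006:01:20:12:36:11'
    '&testname=xulwinopen&tbox=comet.mozilla.org HTTP/1.0" 500 612'
)
points = list(recover_datapoints([sample, "unrelated line"]))
```

The sketch deliberately ignores the response status, since requests made while collect.cgi was failing are exactly the ones that would need to be replayed.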
Comment 16•19 years ago
This has been re-enabled. We are working to get back historical data...should have it back in the next few days.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Comment 17•19 years ago
The historical data should be back for this bug. I was not, however, able to retrieve the average data. I didn't look into this much; it may be easy to do.
Comment 18•19 years ago
Just having the red graph is probably fine unless the other is dead-easy. Thank you for fixing!
Comment 19•19 years ago (Assignee)
Jeremy > *!
Comment 20•19 years ago
Is bug 321234 going to be opened and/or the fixes checked into CVS?
Comment 21•19 years ago
(In reply to comment #20)
> Is bug 321234 going to be opened and/or the fixes checked into CVS?

Yes, as soon as the webtools/security team reviews the code for CVS check-in.
Comment 22•19 years ago
So... I have some bad news. :( The graphs are up and there is data up to the middle of the day on Jan 27 (so up to comment 16 or comment 17 on this bug). None of the data from after that has made it into the graphs. Should I file a separate bug on this?
Comment 23•19 years ago
As pointed out in comment 22, collect.cgi apparently isn't working at all now... Going to need oremj's magic to fill in the blanks again after we get it working, too.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 24•19 years ago
(In reply to comment #23)
> As pointed out in comment 22, collect.cgi apparently isn't working at all
> now...
>
> Going to need oremj's magic to fill in the blanks again after we get it
> working, too.

Is it really broken, or were file permissions possibly altered by the fill-in-the-blanks process? It was working right up until the approximate time that the blanks got filled in, from what I can see. Just something to check.
Comment 25•19 years ago
Very possible that this is a permission problem from when I patched... I fixed the permissions; we'll see if the data starts appearing again.
Comment 26•19 years ago
Permissions were the problem. I'll fill in the missing data and reset the permissions and it should be all fixed.
Comment 27•19 years ago (Assignee)
Graphs look good right now -- resolving. If anybody notices strange or completely nonsensical graph information, please holler.
Status: REOPENED → RESOLVED
Closed: 19 years ago → 19 years ago
Resolution: --- → FIXED
Updated•9 years ago
Product: mozilla.org → mozilla.org Graveyard