Closed
Bug 183753
Opened 22 years ago
Closed 20 years ago
can't fork duplicates.cgi: Bad file descriptor at collectstats.pl
Categories
(Bugzilla :: Reporting/Charting, defect)
Tracking
()
RESOLVED
FIXED
Bugzilla 2.18
People
(Reporter: jean_seb, Assigned: glob)
References
Details
(Whiteboard: [needed for Win32bz])
Attachments
(4 files)
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0) Build Identifier: I'm on a Windows install, so this may be a known issue, or an irrelevant one... When I run collectstats.pl, the collect_stats runs, but when it gets to this line : # Generate a static RDF file containing the default view of the duplicates data. open(CGI, "REQUEST_METHOD=GET QUERY_STRING=ctype=rdf duplicates.cgi |") || die "can't fork duplicates.cgi: $!"; it dies : Uncaught exception from user code: can't fork duplicates.cgi: Bad file descriptor at collectstats.pl line 77. I believe that's around line 56 in the current CVS version, but I have applied the regenerate patch (bug 80157) and made some other modifications, so it's at line 77 now... I don't know if there's something else that needs to be done to open a pipe on Windows. I've never used that in Perl, my experience is rather limited. If someone wants to try and fix this, I can try things out for you on my installation, which works beautifully except for this. Reproducible: Always Steps to Reproduce: 1. run 'perl collectstats.pl' from the command line Actual Results: Uncaught exception from user code: can't fork duplicates.cgi: Bad file descriptor at collectstats.pl line 77. The stats are generated OK, but anything that should happen from that line on in the script doesn't happen. ("Generate a static RDF file containing the default view of the duplicates data.") Expected Results: I never had it succeed, so I don't know what the purpose of that is exactly. The script should at least run without the error... I have no dupes in my bugs database.
Comment 1•22 years ago
|
||
This is myk's code. I'm not really sure why we geenarte a static rdf version - we don't do a static html version, for example.
Assignee: gerv → myk
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [needed for Win32bz]
Reporter | ||
Comment 2•22 years ago
|
||
For now I've commented out all lines after the collect_stats call and associated loop (this means the lines that "Generate a static RDF file "...) so that collectstats.pl can run in a nightly task on my machine without errors. I can help finding out what the problem is, but since no one seems to know why those lines exist for now, I find it acceptable to keep them out of the way.
Comment 3•22 years ago
|
||
The static RDF version is for better performance when running duplicates.xul, and b.m.o does generate a static HTML version, but when Gerv reviewed duplicates.xul he suggested that the static RDF version be generated in collectstats.pl to obviate the need for yet another cron job. This makes a lot of sense, and actually I should have added static HTML generation to collectstats.pl at the same time.
Reporter | ||
Comment 4•22 years ago
|
||
I agree, it makes sense. Was this tested on a Windows installation? If not, then maybe ActivePerl doesn't support that way of opening pipes. I'll check the docs. But if it was tested, then it must be my machine. Either way, I don't have any experience with this style of open. Input anyone? General note : I'm not questioning BZ's programming style in general, but since the script is called collectstats.pl, if it does anything other than collect stats, shouldn't it be very well documented? Just a little comment saying "Generate a static RDF file containing the default view of the duplicates data." doesn't say much. At least not for me. On the other hand, maybe collectstats.pl could call another script, which would generate the RDF data, so that you could still keep only one cron job while keeping collectstats.pl specialized in what it needs to do... But that would add yet another file to the root BZ directory. So maybe just commenting the "RDF generation" better would be a good middle ground.
Reporter | ||
Comment 5•22 years ago
|
||
OK, here's what I've come up with so far : a) The command in the open does 3 things : set REQUEST_METHOD=GET, set QUERY_STRING=ctype=rdf, then call duplicates.cgi. On Windows, the shell command separator is '&', and you need to prefix the env variable names with 'set'. I don't know how we can make that platform independent, though... b) Using a small test program that does an open similar to the one in collectstats.pl, I can verify that it should work (or there is a way to make it work). If someone is interested, I can post these test scripts here. But the short version is that one of them prints the value of REQUEST_METHOD and QUERY_STRING, and the other one sets those variables to recognizable values and then opens the first one, all that with a pipe open as in collectstats.pl. And it works. c) Even when I put '&'s and replace './duplicates.cgi' with 'perl -wT duplicates.cgi', collectstats.pl still gives the same error message. But the same command from the command line works fine (prints XML output from duplicates.cgi to stdout). That leads me to think about the error message I am getting from collectstats.pl. Why would it say "Bad file descriptor"? What does that mean? Could all of this be related to the fact that I need to run duplicates.cgi with perl explicitly (the shebang line is ignored unless you're running in Apache), and the system path is removed to prevent taint mode problems, so it can't find the perl binary? (similar to bug #129401) Sorry for spamming you all with this, but I find it interesting to try to help fix this bug, even if it might only affect me. And when we've found the solution, I'll be interested to see how it can be made to be platform- independent (or almost so).
Comment 6•22 years ago
|
||
There are a lot of problems with this working on windows. collectstats.pl should be for collecting stats - if we want a 'things to run nightly script', then thats separate. That script should run w/o loading globals.pl (so that we don't kill $::ENV{PATH}), and simply use redirection to capture output. We could require IO::Run, although thats a bit heavyweight for our needs.
Reporter | ||
Comment 7•22 years ago
|
||
OK, I can try to make a patch to fix this. Does anyone have an objection to me making a new file, called maybe 'nightlyjobs.pl', which would call 'collectstats.pl' and (hypothetically) 'generaterdfdata.pl', and whatever else we might want to set as nightly jobs in the future? I think that would be the right way to make one cron job that does multiple things. Also, what would be the right way to go about setting the 2 env variables before calling duplicates.cgi? Maybe saving the old values (if any) from %ENV, then setting the values we want in %ENV, and later resetting the old values? That would probably work on all platforms, whereas setting them and calling duplicates.cgi on the same command line would seem to require two different command lines for Windows and Unix...
Comment 8•22 years ago
|
||
What env variables do you want to save?
Reporter | ||
Comment 9•22 years ago
|
||
I just meant that if REQUEST_METHOD and QUERY_STRING have values before collectstats.pl is called, they should be saved before being overwritten and restored once we've done what we need to do, shouldn't they? I agree, that's proabably overzealous, since that will probably never happen. But that's the current behavior, since system() pushes the environment on the system stack, inherits the old process' environment, and once the called process ends, the environment is popped back to what it was... I just wanted to emulate that instead of overwriting variables that were not set by me. Still trying to find time to make a patch for this. Shouldn't be too long.
Reporter | ||
Comment 10•22 years ago
|
||
I started bashing on this... I just have a problem : has Bugzilla started using a portable way of calling other perl scripts? Last time I installed on Windows, I still had to change lots of "./blah.cgi" calls to "perl blah.cgi" to make it work on Windows. Is there a portable way of getting this to work? (also, it's not a system(), it's actually open("./blah.cgi |")...) Other than that, the split is done, and works without errors on my Windows installation (using "perl duplicates.cgi" for lack of another idea for now). So if we can find a way of getting that to work, this bug should be history.
Reporter | ||
Comment 11•22 years ago
|
||
Ok, here it is. Obviously, the first file is just a patch to remove lines in collectstats.pl, which will be put in the second file, almost verbatim. The main reason this is done, as discussed earlier, is because calling duplicates.cgi to generate rdf data does not require taint mode, and thus the second file can keep the path intact. The added benefit is that it separates the two tasks, removing from collectstats.pl something that does not "collect stats". The new question becomes, should the calculate_dupes function (still in collectstats.pl) be separated too, or could it be argued that it's more closely related to collecting stats than the generation of RDF data? See two following files.
Reporter | ||
Comment 12•22 years ago
|
||
The filename, generate_rdf.pl, is subject to approval of course. The only change needed to make it work on Windows is - open(CGI, "./duplicates.cgi |") + open(CGI, "perl -wT duplicates.cgi |") which is one of the required changes that is noted in the Win32 installation instructions, because it must be made to other files as well. Setting values in $ENV is probably more portable than prefixing the command line with them... This will probably work on both Windows and Unix.
Reporter | ||
Comment 13•22 years ago
|
||
Last one, this runs collectstats.pl and generate_rdf.pl. This should now replace collectstats.pl in the cron. Once again, the filename is subject to approval. Once again, change needed to work on Windows is to replace "./" by "perl " (no -wT this time, that was only needed so that duplicates.cgi wouldn't complain that it's "too late for -T") Let me know what you think about these 3 files, especially A) if it should all work on Unix. I think it should. B) if this is the right approach. IMO it is, but you might argue it adds too many new files... C) if I should move calculate_dupes to another file too, or maybe rename generate_rdf.pl to process_duplicates.pl and put it in there, since they both work on duplicates. Note that I put Myk and I as contributors in generate_rdf.pl since it was Myk's code, as stated above. If I need to move others there too, please tell me.
Comment 14•22 years ago
|
||
Is it really necessary to break this out into a separate script? It seems pretty related to collectstats.pl to me... after all, the RDF file is just another representation of the statistics collected by the script.
Reporter | ||
Comment 15•22 years ago
|
||
Is it really? I thought it was a representation of the duplicates data (which are surely stats, but not the same type of stats that collectstats is mandated to collect). I discussed it above, and bbaetz for one seemed to agree that collectstats should be just for collecting stats (see comment #6 above). Another good reason to break it in two is that generating RDF data calls duplicates.cgi, and on Windows, system() needs a valid path. But the rest of collectstats needs globals.pl, which clobbers the path. So separating the two makes collectstats work and the RDF data can be generated in another script that doesn't use globals.pl. Of course, if bug #129401 and bug #136156 are fixed, this point is moot. Anyways, if you don't like this solution, just invalidate my attachments. I was merely trying to give a possible solution.
Reporter | ||
Comment 16•22 years ago
|
||
And for the record, I agree that it's going a bit far to separate the two tasks on the basis of semantics alone... For me, the major point was that the call to duplicates.cgi would fail because there is no valid path for the system() call on Windows. But the run_nightly_jobs.pl principle could stay, if there are ever any other things we need to do nightly. In that sense, collectstats.pl should not be the cron job, run_nightly_jobs.pl should be. My understanding was that the RDF generation code was just 'dumped' into collectstats.pl because that was the script running from cron at the time (your own comment #3), and you didn't want to put another script in the cron. That's why I suggested this solution. Just trying to explain why I did what I did. It's all up to you of course.
Comment 17•22 years ago
|
||
>Is it really? I thought it was a representation of the duplicates data (which
>are surely stats, but not the same type of stats that collectstats is mandated
>to collect).
duplicates data is exactly the data that collectstats.pl collects.
Reporter | ||
Comment 18•22 years ago
|
||
No, it's _part_ of the data that collectstats.pl collects. # fields: DATE|NEW|ASSIGNED|REOPENED|UNCONFIRMED|RESOLVED|VERIFIED|CLOSED|FIXED|INVALID|WO NTFIX|LATER|REMIND|DUPLICATE|WORKSFORME|MOVED But I agree, it's related. So how do you suggest we proceed? Ignore my attachments and wait for bug #129401 and bug #136156 to land (which would fix this)? The reason I did what I did is that I suggested a way of dealing with this (comment #7) and asked if anyone objected, and no one answered in almost a month. So I went ahead as I thought was best. As I said, if you don't agree, just tell me and I'll forget it.
Comment 19•22 years ago
|
||
Is this related to bug 187861? I don't think it's exactly the same thing, but the same fix will likely fix both.
Reporter | ||
Comment 20•22 years ago
|
||
Well, you noted in bug 187861 that bug 124174 would probably fix it, which is not the case for this. The error message noted in the summary happens because on Windows, the shell (command.com for Win9x/ME, cmd.exe for WinNT/2000/XP) needs to be in the path for system() to work, and globals.pl zaps the path for taint mode reasons. So I think it's more closely related to the two bugs I put in "depends on" above.
Comment 21•22 years ago
|
||
Breaking up the scripts seems like overkill in this situation unless there are other reasons to do it than just working around the problem. The only other reason I can think of is to make it easier to manage nightly tasks, and I'm of two minds about that. I agreed with Gerv to add the RDF generation to collectstats.pl because it makes sense there, but in general I think we'll be better off with multiple cron jobs instead of aggregating nightly tasks into a script, since it gives us more flexibility for when to run each job. On the other hand, a nightly tasks script doesn't prevent us from doing multiple cron jobs when they make sense and might make it easier for some installations (particularly those without root access to a machine that need administrators to add cron jobs for them). Anyone else got an opinion?
Comment 22•22 years ago
|
||
I continue to maintain that we should have a single cron job that fires every 15 minutes (or once an hour) and the script within bugzilla that's fired off by it should handle the scheduling of what gets run when. This would allow the admin to set times and check off which scripts to run from a web page.
Comment 23•21 years ago
|
||
This works for me Microsoft Windows 2000 [Version 5.00.2195] Bugzilla 2.17.4 Perl v5.8.0 built for cygwin-multi-64int What Bugzilla version are you reporting this for?
Reporter | ||
Comment 24•21 years ago
|
||
As I stated in bug 143490, I don't think Cygwin should be used for Bugzilla. Of course all these bugs will work for you, since Cygwin emulates a Unix environment. But requiring that all users should install a Unix emulation to run Bugzilla is a bit much. Users who want Unix will install Linux. If we're on Windows, that's because we want/need it, and these bugs exist to make Bugzilla work on Windows, not on Unix-over-Windows. Please test with ActiveState Perl if you want to try to resolve these. That's the standard for Bugzilla.
Updated•21 years ago
|
Target Milestone: --- → Bugzilla 2.18
Comment 25•21 years ago
|
||
works for me on cygwin
Comment 26•20 years ago
|
||
Comment on attachment 110805 [details] (very) simple script that runs the previous two >system("./collectstats.pl"); >system("./generate_rdf.pl"); This still requires modification on Windows, the idea was to get it to run without people having to fiddle with it. :) How about we do it like this: system("$^X collectstats.pl"); system("$^X generate_rdf.pl"); The $^X variable contains the path to the currently running Perl executable, and in theory that should work on both *nix and Win32.
Attachment #110805 -
Flags: review-
Comment 27•20 years ago
|
||
Comment on attachment 110804 [details] Code previously from collectstats.pl >open(CGI, "./duplicates.cgi |") > || die "can't fork duplicates.cgi: $!"; Same here as on the other file, let's use $^X instead of assuming direct execution so it'll run the current perl.
Attachment #110804 -
Flags: review-
Comment 28•20 years ago
|
||
Comment on attachment 110802 [details] [diff] [review] Patch for collectstats.pl nothing in particular wrong with this patch except that it's bitrotted. And as mentioned in previous comments, I think we decided that separating this into different scripts is beyond the scope of this bug. The changes you made to that chunk of code look good, with the additional changes I pointed out, and can be done in place within the existing collectstats.pl for now.
Attachment #110802 -
Flags: review-
Comment 29•20 years ago
|
||
*** Bug 239971 has been marked as a duplicate of this bug. ***
Updated•20 years ago
|
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
Assignee | ||
Comment 30•20 years ago
|
||
also fixes a taint problem
Attachment #148837 -
Flags: review?
Comment 31•20 years ago
|
||
Comment on attachment 148837 [details] [diff] [review] simpler patch An interesting approach. The patch works on both Win32 (AS Perl 5.8.1) and Linux, although I'm slightly confused on _why_ exactly does the piping work here. This is different from our standard open "-|" approach, but that hasn't been of relevance before. I need to think this through when I'm better awake. Anyway, that's irrelevant -- the patch doesn't seem to cause harmful side-effects, so I'm inclined to accept it. r=jouni, but let's have another review from somebody who's intimately familiar with this stuff. Gerv?
Attachment #148837 -
Flags: review?(gerv)
Attachment #148837 -
Flags: review?
Attachment #148837 -
Flags: review+
Comment 32•20 years ago
|
||
Comment on attachment 148837 [details] [diff] [review] simpler patch for the record, that's insanely similar to the code that *used* to be there, which got removed because it was too easy to potentially cause security problems that we couldn't otherwise detect that way.
Comment 33•20 years ago
|
||
Comment on attachment 148837 [details] [diff] [review] simpler patch I know nothing about this stuff :-) Myk might be your man. Gerv
Attachment #148837 -
Flags: review?(gerv) → review?(myk)
Comment 34•20 years ago
|
||
(In reply to comment #32) > (From update of attachment 148837 [details] [diff] [review]) > for the record, that's insanely similar to the code that *used* to be there, > which got removed because it was too easy to potentially cause security > problems that we couldn't otherwise detect that way. What code are you referring to?
Comment 35•20 years ago
|
||
(In reply to comment #34) > What code are you referring to? OK, nevermind. That's the same code that's already here. :) Everyplace else we did that kind of stuff it got replaced with a forked exec(), because the piped opens are too easy to put bad variables in.
Assignee | ||
Comment 36•20 years ago
|
||
i'd like to see this bug fixed for 2.18
No longer depends on: 129401
Flags: blocking2.18?
Comment 37•20 years ago
|
||
We'd all like it to, but it's not blocking. I'll gladly accept a reviewed patch for it prior to release if it's ready before then though. (approval is what you want to request if you think it's ready now, but I still see a pending review request)
Flags: blocking2.18? → blocking2.18-
Assignee | ||
Comment 38•20 years ago
|
||
myk .. any chance of getting a review on this please?
Comment 39•20 years ago
|
||
Comment on attachment 148837 [details] [diff] [review] simpler patch Works for me. r=myk
Attachment #148837 -
Flags: review?(myk)
Assignee | ||
Comment 41•20 years ago
|
||
i don't have cvs access, can someone please check this in for me? hopefully it's ok to request blocking2.18 now.
Flags: blocking2.18- → blocking2.18?
Comment 42•20 years ago
|
||
Done: Checking in collectstats.pl; /cvsroot/mozilla/webtools/bugzilla/collectstats.pl,v <-- collectstats.pl new revision: 1.37; previous revision: 1.36 done
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Flags: blocking2.18?
Resolution: --- → FIXED
Target Milestone: Bugzilla 2.20 → Bugzilla 2.18
Updated•12 years ago
|
QA Contact: matty_is_a_geek → default-qa
You need to log in
before you can comment on or make changes to this bug.
Description
•