Closed Bug 183753 Opened 22 years ago Closed 20 years ago

can't fork duplicates.cgi: Bad file descriptor at collectstats.pl

Categories

(Bugzilla :: Reporting/Charting, defect)

x86
Windows 2000
defect
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 2.18

People

(Reporter: jean_seb, Assigned: glob)

References

Details

(Whiteboard: [needed for Win32bz])

Attachments

(4 files)

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0)
Build Identifier: 

I'm on a Windows install, so this may be a known issue, or an irrelevant one... 
When I run collectstats.pl, the collect_stats runs, but when it gets to this 
line:

# Generate a static RDF file containing the default view of the duplicates data.
open(CGI, "REQUEST_METHOD=GET QUERY_STRING=ctype=rdf duplicates.cgi |")
  || die "can't fork duplicates.cgi: $!";

it dies:

Uncaught exception from user code:
        can't fork duplicates.cgi: Bad file descriptor at collectstats.pl line 
77.

I believe that's around line 56 in the current CVS version, but I have applied 
the regenerate patch (bug 80157) and made some other modifications, so it's at 
line 77 now...

I don't know if there's something else that needs to be done to open a pipe on 
Windows. I've never used that in Perl, my experience is rather limited. If 
someone wants to try and fix this, I can try things out for you on my 
installation, which works beautifully except for this.

Reproducible: Always

Steps to Reproduce:
1. run 'perl collectstats.pl' from the command line

Actual Results:  
Uncaught exception from user code:
        can't fork duplicates.cgi: Bad file descriptor at collectstats.pl line 
77.

The stats are generated OK, but anything that should happen from that line on 
in the script doesn't happen. ("Generate a static RDF file containing the 
default view of the duplicates data.")

Expected Results:  
I never had it succeed, so I don't know what the purpose of that is exactly. 
The script should at least run without the error...

I have no dupes in my bugs database.
This is myk's code.

I'm not really sure why we generate a static RDF version - we don't do a static
HTML version, for example.
Assignee: gerv → myk
Status: UNCONFIRMED → NEW
Ever confirmed: true
Whiteboard: [needed for Win32bz]
For now I've commented out all lines after the collect_stats call and 
associated loop (this means the lines that "Generate a static RDF file "...) so 
that collectstats.pl can run in a nightly task on my machine without errors. I 
can help finding out what the problem is, but since no one seems to know why 
those lines exist for now, I find it acceptable to keep them out of the way.
The static RDF version is for better performance when running duplicates.xul,
and b.m.o does generate a static HTML version, but when Gerv reviewed
duplicates.xul he suggested that the static RDF version be generated in
collectstats.pl to obviate the need for yet another cron job.  This makes a lot
of sense, and actually I should have added static HTML generation to
collectstats.pl at the same time.
I agree, it makes sense. Was this tested on a Windows installation?

If not, then maybe ActivePerl doesn't support that way of opening pipes. I'll 
check the docs.

But if it was tested, then it must be my machine.

Either way, I don't have any experience with this style of open. Input anyone?

General note: I'm not questioning BZ's programming style in general, but since 
the script is called collectstats.pl, if it does anything other than collect 
stats, shouldn't that be very well documented? Just a little comment 
saying "Generate a static RDF file containing the default view of the 
duplicates data." doesn't say much. At least not to me.

On the other hand, maybe collectstats.pl could call another script, which would 
generate the RDF data, so that you could still keep only one cron job while 
keeping collectstats.pl specialized in what it needs to do... But that would 
add yet another file to the root BZ directory. So maybe just commenting 
the "RDF generation" better would be a good middle ground.
OK, here's what I've come up with so far:

a) The command in the open does three things: set REQUEST_METHOD=GET, set 
QUERY_STRING=ctype=rdf, then call duplicates.cgi. On Windows, the shell command 
separator is '&', and you need to prefix the environment variable assignments 
with 'set'. I don't know how we can make that platform-independent, though...
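To illustrate the difference, here is one (hypothetical) way of branching on the platform; $^O holds the name of the OS Perl was built for, and the two command strings below follow the cmd.exe and Bourne shell conventions described above:

```perl
# Sketch only: build the pipe command differently per platform.
# $^O is "MSWin32" for ActivePerl on Windows; "linux", "darwin", etc. elsewhere.
my $cmd;
if ($^O eq 'MSWin32') {
    # cmd.exe syntax: 'set' assignments chained with '&'
    $cmd = 'set REQUEST_METHOD=GET& set QUERY_STRING=ctype=rdf& perl duplicates.cgi';
} else {
    # Bourne shell syntax: inline environment assignments before the command
    $cmd = 'REQUEST_METHOD=GET QUERY_STRING=ctype=rdf ./duplicates.cgi';
}
open(CGI, "$cmd |") || die "can't fork duplicates.cgi: $!";
```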

b) Using a small test program that does an open similar to the one in 
collectstats.pl, I can verify that it should work (or there is a way to make it 
work). If someone is interested, I can post these test scripts here. But the 
short version is that one of them prints the value of REQUEST_METHOD and 
QUERY_STRING, and the other one sets those variables to recognizable values and 
then opens the first one, all that with a pipe open as in collectstats.pl. And 
it works.
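For the record, a hypothetical reconstruction of what those two test scripts might look like (the filenames are invented):

```perl
# child.pl - prints the two environment variables it sees
print "REQUEST_METHOD=$ENV{REQUEST_METHOD}\n";
print "QUERY_STRING=$ENV{QUERY_STRING}\n";
```

```perl
# parent.pl - sets recognizable values, then pipe-opens child.pl,
# the same style of open as in collectstats.pl
$ENV{REQUEST_METHOD} = 'TESTMETHOD';
$ENV{QUERY_STRING}   = 'TESTQUERY';
open(CHILD, "perl child.pl |") || die "can't fork child.pl: $!";
print while <CHILD>;
close(CHILD);
```

If parent.pl prints the TESTMETHOD/TESTQUERY values back, the pipe-open itself is working.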

c) Even when I put '&'s and replace './duplicates.cgi' with 'perl -wT 
duplicates.cgi', collectstats.pl still gives the same error message. But the 
same command from the command line works fine (prints XML output from 
duplicates.cgi to stdout).

That leads me to think about the error message I am getting from 
collectstats.pl. Why would it say "Bad file descriptor"? What does that mean?

Could all of this be related to the fact that I need to run duplicates.cgi with 
perl explicitly (the shebang line is ignored unless you're running under 
Apache), and that the system path is removed to prevent taint-mode problems, so 
it can't find the perl binary? (similar to bug #129401)

Sorry for spamming you all with this, but I find it interesting to try to help 
fix this bug, even if it might only affect me. And when we've found the 
solution, I'll be interested to see how it can be made to be platform-
independent (or almost so).
There are a lot of problems with this working on windows.

collectstats.pl should be for collecting stats - if we want a 'things to run
nightly' script, then that's separate. That script should run without loading
globals.pl (so that we don't kill $::ENV{PATH}), and simply use redirection to
capture output. We could require IPC::Run, although that's a bit heavyweight for
our needs.
OK, I can try to make a patch to fix this.

Does anyone have an objection to me making a new file, called 
maybe 'nightlyjobs.pl', which would call 'collectstats.pl' and 
(hypothetically) 'generaterdfdata.pl', and whatever else we might want to set 
as nightly jobs in the future? I think that would be the right way to make one 
cron job that does multiple things.

Also, what would be the right way to go about setting the 2 env variables 
before calling duplicates.cgi? Maybe saving the old values (if any) from %ENV, 
then setting the values we want in %ENV, and later resetting the old values? 
That would probably work on all platforms, whereas setting them and calling 
duplicates.cgi on the same command line would seem to require two different 
command lines for Windows and Unix...
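One sketch of the save-and-restore idea, using Perl's local, which saves the current values and restores them automatically when the enclosing block exits:

```perl
{
    # 'local' stashes the old values (or the fact that they were unset)
    # and restores them when this block is left, however it is left.
    local $ENV{REQUEST_METHOD} = 'GET';
    local $ENV{QUERY_STRING}   = 'ctype=rdf';

    open(CGI, "./duplicates.cgi |") || die "can't fork duplicates.cgi: $!";
    my @rdf = <CGI>;
    close(CGI);
}
# Here REQUEST_METHOD and QUERY_STRING are back to their previous state.
```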
What env variables do you want to save?
I just meant that if REQUEST_METHOD and QUERY_STRING have values before 
collectstats.pl is called, they should be saved before being overwritten and 
restored once we've done what we need to do, shouldn't they?

I agree, that's probably overzealous, since that will probably never happen. 
But that's the current behavior of system(): the child process inherits a copy 
of the caller's environment, and once the called process ends, the caller's 
environment is unchanged... I just wanted to emulate that instead of 
overwriting variables that were not set by me.

Still trying to find time to make a patch for this. Shouldn't be too long.
I started bashing on this... 

I just have a problem: has Bugzilla started using a portable way of calling 
other Perl scripts? Last time I installed on Windows, I still had to change 
lots of "./blah.cgi" calls to "perl blah.cgi" to make it work on Windows. Is 
there a portable way of getting this to work? (Also, it's not a system(), it's 
actually open("./blah.cgi |")...)

Other than that, the split is done, and works without errors on my Windows 
installation (using "perl duplicates.cgi" for lack of another idea for now). So 
if we can find a way of getting that to work, this bug should be history.
OK, here it is. Obviously, the first file is just a patch to remove lines from
collectstats.pl, which will be put in the second file almost verbatim. The
main reason for this, as discussed earlier, is that calling duplicates.cgi to
generate RDF data does not require taint mode, and thus the second file can
keep the path intact.

The added benefit is that it separates the two tasks, removing from
collectstats.pl something that does not "collect stats". The new question
becomes, should the calculate_dupes function (still in collectstats.pl) be
separated too, or could it be argued that it's more closely related to
collecting stats than the generation of RDF data?

See two following files.
The filename, generate_rdf.pl, is subject to approval of course.

The only change needed to make it work on Windows is
- open(CGI, "./duplicates.cgi |")
+ open(CGI, "perl -wT duplicates.cgi |")
which is one of the required changes that is noted in the Win32 installation
instructions, because it must be made to other files as well.

Setting values in %ENV is probably more portable than prefixing the command
line with them... This should work on both Windows and Unix.
Last one, this runs collectstats.pl and generate_rdf.pl. This should now
replace collectstats.pl in the cron. Once again, the filename is subject to
approval.

Once again, the change needed to work on Windows is to replace "./" with
"perl " (no -wT this time; that was only needed so that duplicates.cgi
wouldn't complain that it's "too late for -T").

Let me know what you think about these 3 files, especially 
A) if it should all work on Unix. I think it should.
B) if this is the right approach. IMO it is, but you might argue it adds too
many new files...
C) if I should move calculate_dupes to another file too, or maybe rename
generate_rdf.pl to process_duplicates.pl and put it in there, since they both
work on duplicates.

Note that I listed Myk and myself as contributors in generate_rdf.pl since it
was Myk's code, as stated above. If I need to add others there too, please tell
me.
Is it really necessary to break this out into a separate script?  It seems
pretty related to collectstats.pl to me... after all, the RDF file is just
another representation of the statistics collected by the script.
Is it really? I thought it was a representation of the duplicates data (which 
are surely stats, but not the same type of stats that collectstats is mandated 
to collect).

I discussed it above, and bbaetz for one seemed to agree that collectstats 
should be just for collecting stats (see comment #6 above).

Another good reason to break it in two is that generating RDF data calls 
duplicates.cgi, and on Windows, system() needs a valid path. But the rest of 
collectstats needs globals.pl, which clobbers the path. So separating the two 
makes collectstats work and the RDF data can be generated in another script 
that doesn't use globals.pl. Of course, if bug #129401 and bug #136156 are 
fixed, this point is moot.

Anyways, if you don't like this solution, just invalidate my attachments. I was 
merely trying to give a possible solution.
And for the record, I agree that it's going a bit far to separate the two tasks 
on the basis of semantics alone... For me, the major point was that the call to 
duplicates.cgi would fail because there is no valid path for the system() call 
on Windows. But the run_nightly_jobs.pl principle could stay, if there are ever 
any other things we need to do nightly. In that sense, collectstats.pl should 
not be the cron job, run_nightly_jobs.pl should be.

My understanding was that the RDF generation code was just 'dumped' into 
collectstats.pl because that was the script running from cron at the time (your 
own comment #3), and you didn't want to put another script in the cron. That's 
why I suggested this solution.

Just trying to explain why I did what I did. It's all up to you of course.
>Is it really? I thought it was a representation of the duplicates data (which 
>are surely stats, but not the same type of stats that collectstats is mandated 
>to collect).

duplicates data is exactly the data that collectstats.pl collects.
No, it's _part_ of the data that collectstats.pl collects.

# fields: 
DATE|NEW|ASSIGNED|REOPENED|UNCONFIRMED|RESOLVED|VERIFIED|CLOSED|FIXED|INVALID|WONTFIX|LATER|REMIND|DUPLICATE|WORKSFORME|MOVED

But I agree, it's related.

So how do you suggest we proceed? Ignore my attachments and wait for bug 
#129401 and bug #136156 to land (which would fix this)? The reason I did what I 
did is that I suggested a way of dealing with this (comment #7) and asked if 
anyone objected, and no one answered in almost a month. So I went ahead as I 
thought was best. As I said, if you don't agree, just tell me and I'll forget 
it.
Depends on: 129401, 136156
Is this related to bug 187861?  I don't think it's exactly the same thing, but
the same fix will likely fix both.
Well, you noted in bug 187861 that bug 124174 would probably fix it, which is 
not the case for this. The error message noted in the summary happens because 
on Windows, the shell (command.com for Win9x/ME, cmd.exe for WinNT/2000/XP)
needs to be in the path for system() to work, and globals.pl zaps the path for 
taint mode reasons.

So I think it's more closely related to the two bugs I put in "depends on" 
above.
Breaking up the scripts seems like overkill in this situation unless there are
other reasons to do it than just working around the problem.  The only other
reason I can think of is to make it easier to manage nightly tasks, and I'm of
two minds about that.

I agreed with Gerv to add the RDF generation to collectstats.pl because it makes
sense there, but in general I think we'll be better off with multiple cron jobs
instead of aggregating nightly tasks into a script, since it gives us more
flexibility for when to run each job.

On the other hand, a nightly tasks script doesn't prevent us from doing multiple
cron jobs when they make sense and might make it easier for some installations
(particularly those without root access to a machine that need administrators to
add cron jobs for them).

Anyone else got an opinion?
I continue to maintain that we should have a single cron job that fires every 15
minutes (or once an hour) and the script within bugzilla that's fired off by it
should handle the scheduling of what gets run when.  This would allow the admin
to set times and check off which scripts to run from a web page.
This works for me
Microsoft Windows 2000 [Version 5.00.2195]
Bugzilla 2.17.4
Perl v5.8.0 built for cygwin-multi-64int

What Bugzilla version are you reporting this for?
As I stated in bug 143490, I don't think Cygwin should be used for Bugzilla. 

Of course all these bugs will work for you, since Cygwin emulates a Unix 
environment. But requiring that all users should install a Unix emulation to 
run Bugzilla is a bit much. Users who want Unix will install Linux. If we're on 
Windows, that's because we want/need it, and these bugs exist to make Bugzilla 
work on Windows, not on Unix-over-Windows.

Please test with ActiveState Perl if you want to try to resolve these. That's 
the standard for Bugzilla.
Target Milestone: --- → Bugzilla 2.18
Works for me on Cygwin.
Comment on attachment 110805 [details]
(very) simple script that runs the previous two

>system("./collectstats.pl");
>system("./generate_rdf.pl");

This still requires modification on Windows, the idea was to get it to run
without people having to fiddle with it. :)

How about we do it like this:

system("$^X collectstats.pl");
system("$^X generate_rdf.pl");

The $^X variable contains the path to the currently running Perl executable,
and in theory that should work on both *nix and Win32.
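As a further sketch, the list form of system() would also sidestep the shell entirely (assuming the scripts live in the current directory), avoiding the quoting and separator differences between platforms:

```perl
# List form of system(): no shell is involved, so there are no
# platform-specific command separators or quoting rules to worry about.
system($^X, 'collectstats.pl') == 0
    or warn "collectstats.pl failed: $?";
system($^X, 'generate_rdf.pl') == 0
    or warn "generate_rdf.pl failed: $?";
```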
Attachment #110805 - Flags: review-
Comment on attachment 110804 [details]
Code previously from collectstats.pl

>open(CGI, "./duplicates.cgi |")
>  || die "can't fork duplicates.cgi: $!";

Same here as on the other file, let's use $^X instead of assuming direct
execution so it'll run the current perl.
Attachment #110804 - Flags: review-
Comment on attachment 110802 [details] [diff] [review]
Patch for collectstats.pl

nothing in particular wrong with this patch except that it's bitrotted.  And as
mentioned in previous comments, I think we decided that separating this into
different scripts is beyond the scope of this bug.  The changes you made to
that chunk of code look good, with the additional changes I pointed out, and
can be done in place within the existing collectstats.pl for now.
Attachment #110802 - Flags: review-
*** Bug 239971 has been marked as a duplicate of this bug. ***
Target Milestone: Bugzilla 2.18 → Bugzilla 2.20
Attached patch: simpler patch (Splinter Review)
also fixes a taint problem
Attachment #148837 - Flags: review?
Assignee: myk → bugzilla
Comment on attachment 148837 [details] [diff] [review]
simpler patch

An interesting approach. The patch works on both Win32 (AS Perl 5.8.1) and
Linux, although I'm slightly confused about _why_ exactly the piping works
here. This is different from our standard open "-|" approach, but that hasn't
been of relevance before. I need to think this through when I'm more awake.

Anyway, that's irrelevant -- the patch doesn't seem to cause harmful
side-effects, so I'm inclined to accept it. r=jouni, but let's have another
review from somebody who's intimately familiar with this stuff. Gerv?
Attachment #148837 - Flags: review?(gerv)
Attachment #148837 - Flags: review?
Attachment #148837 - Flags: review+
Comment on attachment 148837 [details] [diff] [review]
simpler patch

for the record, that's insanely similar to the code that *used* to be there,
which got removed because it was too easy to potentially cause security
problems that we couldn't otherwise detect that way.
Comment on attachment 148837 [details] [diff] [review]
simpler patch

I know nothing about this stuff :-) Myk might be your man.

Gerv
Attachment #148837 - Flags: review?(gerv) → review?(myk)
(In reply to comment #32)
> (From update of attachment 148837 [details] [diff] [review])
> for the record, that's insanely similar to the code that *used* to be there,
> which got removed because it was too easy to potentially cause security
> problems that we couldn't otherwise detect that way.

What code are you referring to?
(In reply to comment #34)
> What code are you referring to?

OK, never mind.  That's the same code that's already here. :)  Everywhere else
we did that kind of thing, it got replaced with a forked exec(), because piped
opens make it too easy to let bad variables in.
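For reference, the forked-exec pattern being described looks roughly like the sketch below (note that Perl's fork is only emulated on Win32, which is part of what this bug is about, so this is not presented as the fix):

```perl
# open "-|" forks; in the parent, CGI reads the child's STDOUT.
my $pid = open(CGI, '-|');
defined $pid or die "can't fork: $!";

if ($pid == 0) {
    # Child: set up the CGI environment, then replace ourselves
    # with the script - no shell ever sees these values.
    $ENV{REQUEST_METHOD} = 'GET';
    $ENV{QUERY_STRING}   = 'ctype=rdf';
    exec($^X, '-T', 'duplicates.cgi') or die "can't exec duplicates.cgi: $!";
}

my @rdf = <CGI>;   # Parent: read the generated RDF output
close(CGI);
```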
Status: NEW → ASSIGNED
I'd like to see this bug fixed for 2.18.
No longer depends on: 129401
Flags: blocking2.18?
We'd all like it to, but it's not blocking.  I'll gladly accept a reviewed patch
for it prior to release if it's ready before then though.  (approval is what you
want to request if you think it's ready now, but I still see a pending review
request)
Flags: blocking2.18? → blocking2.18-
Myk, any chance of getting a review on this, please?
Comment on attachment 148837 [details] [diff] [review]
simpler patch

Works for me. r=myk
Attachment #148837 - Flags: review?(myk)
Low risk; good value for Windows installations.

a=myk
Flags: approval+
I don't have CVS access; can someone please check this in for me?

Hopefully it's OK to request blocking2.18 now.
Flags: blocking2.18- → blocking2.18?
Done:

Checking in collectstats.pl;
/cvsroot/mozilla/webtools/bugzilla/collectstats.pl,v  <--  collectstats.pl
new revision: 1.37; previous revision: 1.36
done
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Flags: blocking2.18?
Resolution: --- → FIXED
Target Milestone: Bugzilla 2.20 → Bugzilla 2.18
QA Contact: matty_is_a_geek → default-qa