Closed Bug 413851 Opened 17 years ago Closed 12 years ago

add csv output format to requests

Categories

(Bugzilla :: Attachments & Requests, enhancement)

3.1.2
enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 5.0

People

(Reporter: timeless, Assigned: gerv)

References

()

Details

Attachments

(1 file, 10 obsolete files)

I'd like csv output so i can get some basic statistics in excel.

steps:
1. load
https://bugzilla.mozilla.org/request.cgi?action=queue&requester=&product=&type=all&requestee=&component=&group=requestee&format=csv

expected results:
a. honor requester=, product=, type=, requestee=, component=
b. use group for "sorting" and to control column order
c. output should be in csv.

Requestee,Requester,Flag,Bug,Attachment,Created
"","Max Kanat-Alexander <mkanat@bugzilla.org>",review,"319084: number of files indexed by lxr is retrievable from find.","204996: skip first line of file index",""
...
Attached patch this seems to work (obsolete) — Splinter Review
Assignee: attach-and-request → timeless
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Attachment #299110 - Flags: review?
Attachment #299110 - Flags: review? → review-
Comment on attachment 299110 [details] [diff] [review]
this seems to work

>Index: request.cgi
>===================================================================
>@@ -1,4 +1,4 @@
>-#!/usr/bin/perl -wT
>+#!/usr/bin/perl5.10.0 -wT

No.

>@@ -45,8 +45,32 @@
>+# Generate a reasonable filename for the user agent to suggest to the user
>+# when the user saves the bug list.  Uses the name of the remembered query
>+# if available.  We have to do this now, even though we return HTTP headers 
>+# at the end, because the fact that there is a remembered query gets 
>+# forgotten in the process of retrieving it.

Please fix this comment. It is't true when copied as is.

>+$filename =~ s/\\/\\\\/g; # escape backslashes
>+$filename =~ s/"/\\"/g; # escape quotes

Are these needed? If so, what about blanks?

>+if ($format->{'extension'} eq "csv") {

Maybe instead check against ctype param as extension might differ.

>Index: template/en/default/request/queue.csv.tmpl
>===================================================================

This template doesn't pass tests:

t/008filter..........NOK 243
#   Failed test '(en/default) template/en/default/request/queue.csv.tmpl has unfiltered directives:
#   42: colsepchar
#   42: column_headers.$column +
#   49: colsepchar
#   54: request.type
#   58: request.status
#   62: request.bug_id
#   62:+ request.bug_summary
#   67: request.attach_id
#   67:+ request.attach_summary
#   74: request.requestee
#   78: request.requester

>@@ -0,0 +1,83 @@
>+[%# The contents of this file are subject to the Mozilla Public
>+  # License Version 1.1 (the "License"); you may not use this file

Nit: You should use latest boilerplate for new files. One that atleast includes BEGIN/END LICENSE BLOCK comments.

>+  # The Initial Developer of the Original Code is Netscape Communications
>+  # Corporation. Portions created by Netscape are
>+  # Copyright (C) 1998 Netscape Communications Corporation. All
>+  # Rights Reserved.

I doubt this is correct.

>+  # Contributor(s): Myk Melez <myk@mozilla.org>

Really?

>+[% PROCESS "global/field-descs.none.tmpl" %]

I can't see any reason to include this.

>+      "category"   => "Product/Component"    } %]

This doesn't have matching display_category block so trying to display this column crashes Bugzilla. This happens, for example, when Group By is set to Product/Component.

Note that this problem doesn't affect HTML output as it never tries to display Product/Component as a column. It's only used in group heading.

>+           group_value     = ""

Doesn't look like this variable is needed.
Attached patch Patch v.1 (obsolete) — Splinter Review
Oops - just independently reinvented this before remembering this bug!

This is a slightly simpler patch which does the job. I think we need this for BMO, so we can monitor how well we are doing at reducing the number of old outstanding reviews, but let's upstream it first.

Gerv
Assignee: timeless → gerv
Attachment #575485 - Flags: review?(dkl)
Comment on attachment 575485 [details] [diff] [review]
Patch v.1

Review of attachment 575485 [details] [diff] [review]:
-----------------------------------------------------------------

Few comments.

::: request.cgi
@@ +52,2 @@
>  
> +print $cgi->header(-type => $format->{'ctype'});

Need to also set the content_disposition so that it gives it a proper filename. Otherwise, it asks to save it as request.cgi. Code stole from buglist.cgi.

my @time = localtime(time());
my $date = sprintf "%04d-%02d-%02d", 1900+$time[5],$time[4]+1,$time[3];
my $filename = "requests-$date.$format->{extension}";
my $disposition = "attachment; filename=\"$filename\"";

print $cgi->header(-type                => $format->{ctype},
                   -content_disposition => $disposition);

::: template/en/default/request/queue.csv.tmpl
@@ +23,5 @@
> +[% USE Bugzilla %]
> +[% cgi = Bugzilla.cgi %]
> +
> +[% column_headers = {
> +      "type"            => "Flag" ,

Nit: two space indentation (see global/field-descs.none.tmpl)

@@ +25,5 @@
> +
> +[% column_headers = {
> +      "type"            => "Flag" ,
> +      "status"          => "Status" ,
> +      "bug_summary"     => "$terms.Bug" ,

"bug_summary" => "$terms.Bug Summary",

@@ +27,5 @@
> +      "type"            => "Flag" ,
> +      "status"          => "Status" ,
> +      "bug_summary"     => "$terms.Bug" ,
> +      "bug_id"          => "$terms.Bug ID" ,
> +      "attach_summary"  => "Attachment" ,

"attach_summary" => "Attachment Description"

@@ +34,5 @@
> +      "requestee"       => "Requestee" ,
> +      "created"         => "Created" ,
> +      "category"        => "Product: Component" } %]
> +
> +[% DEFAULT display_columns = ["requester", "requestee", "type", "status", "bug_id", "bug_summary", "attach_id", "attach_summary", "created", "category"] %]

Nit: line too long, break up to less than 80 chars.

@@ +48,5 @@
> +      [% FOREACH column = display_columns %]
> +          [% IF column == 'created' %]
> +              [% request.$column FILTER time %][% ',' IF NOT loop.last() %]
> +          [% ELSE %]
> +              [% request.$column FILTER csv %][% ',' IF NOT loop.last() %]

Nit: Move [% ',' IF NOT loop.last() %] outside of the IF/ELSE conditions so it is only needed once.
Attachment #575485 - Flags: review?(dkl) → review-
(In reply to David Lawrence [:dkl] from comment #4)
> Need to also set the content_disposition so that it gives it a proper
> filename. 

We don't want to do that as an attachment unconditionally, otherwise the HTML version will be tagged for download too!

If we use content-disposition: inline, it turns out Firefox offers the download anyway. So I think that's the right thing to do.

Gerv
Attached patch Patch v.2 (obsolete) — Splinter Review
Try this. I've created a new function for making content dispositions, and refactored all the call sites (apart from attachment download, which works a bit differently) to use it.

Gerv
Attachment #299110 - Attachment is obsolete: true
Attachment #575485 - Attachment is obsolete: true
Attachment #576724 - Flags: review?(dkl)
Comment on attachment 576724 [details] [diff] [review]
Patch v.2

Review of attachment 576724 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me and works as expected. r=dkl
Attachment #576724 - Flags: review?(dkl) → review+
Requesting approval. Seems fairly low-risk for 4.2, but I don't know what criteria we are using.

Gerv
Flags: approval?
Flags: approval4.2?
Comment on attachment 576724 [details] [diff] [review]
Patch v.2

Review of attachment 576724 [details] [diff] [review]:
-----------------------------------------------------------------

I'm going to let LpSolit approve this, because he owns flags. However, I do have a few comments on the patch:

::: Bugzilla/Util.pm
@@ +735,5 @@
>      }
>      return $encoding;
>  }
>  
> +sub make_content_disposition {

This should be a method in Bugzilla::CGI, not a function in Bugzilla::Util.

@@ +740,5 @@
> +    my ($type, $pre, $ext) = @_;
> +
> +    my @time = localtime(time());
> +    my $date = sprintf "%04d-%02d-%02d", 1900+$time[5], $time[4]+1, $time[3];
> +    my $filename = "$pre-$date.$ext";

Ah, why aren't you just using format_time?

@@ +744,5 @@
> +    my $filename = "$pre-$date.$ext";
> +    
> +    $filename =~ s/\s/_/g; # Remove whitespace to avoid HTTP header tampering
> +    $filename =~ s/\\/\\\\/g; # escape backslashes
> +    $filename =~ s/"/\\"/g; # escape quotes

Is there not some other safe method of doing this, perhaps some function that exists somewhere in one of our libraries or elsewhere in our code? We must be doing this elsewhere.
But I think we shouldn't take it on 4.2, sorry. It really is an enhancement and we really are trying to narrow down what goes in on that branch now.
Flags: approval4.2? → approval4.2-
(In reply to Max Kanat-Alexander from comment #9)
> This should be a method in Bugzilla::CGI, not a function in Bugzilla::Util.

I'm about to attach a new patch which moves it to there, but it makes buglist.cgi somewhat less neat, I think. I'd appreciate your comments.

> Ah, why aren't you just using format_time?

Because the code I pinched wasn't :-) Fixed.

> Is there not some other safe method of doing this, perhaps some function
> that exists somewhere in one of our libraries or elsewhere in our code? We
> must be doing this elsewhere.

A quick look doesn't turn anything up. And I'm not sure what you'd call this sort of escaping anyway. It's not even a general HTTP header escaping...

Gerv
Attached patch Patch v.2a (obsolete) — Splinter Review
Attachment #577544 - Flags: review?(mkanat)
New patch attached, cancelling review.
Flags: approval?
the current patch causes the following warning on mod_perl when httpd is restarted:

> Variable "$format" will not stay shared at /data/www/bugzilla.mozilla.org/request.cgi line 330
Attached patch Patch v.3 (obsolete) — Splinter Review
I don't have mod_perl... but this should fix it.

Gerv
Attachment #577544 - Attachment is obsolete: true
Attachment #581951 - Flags: review?(glob)
Attachment #577544 - Flags: review?(mkanat)
Comment on attachment 581951 [details] [diff] [review]
Patch v.3

r=glob  mod_perl no longer warns :)
Attachment #581951 - Flags: review?(glob) → review+
Flags: approval?
Attachment #576724 - Attachment is obsolete: true
Comment on attachment 581951 [details] [diff] [review]
Patch v.3

>=== modified file 'Bugzilla/CGI.pm'

>+use DateTime;

Ah no, don't use DateTime here. Loading this module is slow, see bug 696537. As you don't want the time but only the date, the current code to get the date is fine and faster.
Attachment #581951 - Flags: review-
Flags: approval?
Blocks: 713325
Attached patch Patch v.4 (obsolete) — Splinter Review
Version without use of DateTime.

Gerv
Attachment #581951 - Attachment is obsolete: true
Attachment #587714 - Flags: review?(LpSolit)
No longer blocks: 713325
Comment on attachment 587714 [details] [diff] [review]
Patch v.4

>=== modified file 'Bugzilla/CGI.pm'

>+    if ($self->{'_content_disp'}) {
>+        unshift(@_, '-Content_Disposition' => $self->{'_content_disp'});
>+    }

We use lowercase everywhere, so it should be -content_disposition for consistency.


>+    $filename =~ s/\\/\\\\/g; # escape backslashes
>+    $filename =~ s/"/\\"/g; # escape quotes

I know this code is already present in Bugzilla, but I wonder if we shouldn't also convert \ and " into _. I think that's safer than escaping them (also, I'm not sure how the OS behaves with such characters).



>=== modified file 'buglist.cgi'

> sub _close_standby_message {
>+    my ($contenttype, $disposition, $cd_prefix, $extension, $serverpush) = @_;

What means "cd" in $cd_prefix??


>     if ($serverpush) {
>         print $cgi->multipart_end();
>-        print $cgi->multipart_start(-type                => $contenttype,
>-                                    -content_disposition => $disposition);
>+        print $cgi->multipart_start($contenttype);
>     }

Your patch is breaking buglist.cgi. No when I run a query, I see

 Set-Cookie: LASTORDER=bug_id; path=/bugzilla/; expires=Fri, 01-Jan-2038 00:00:00 GMT

as the first line. So this line appears in the body instead of in the header.



>=== added file 'template/en/default/request/queue.csv.tmpl'

>+[% USE Bugzilla %]
>+[% cgi = Bugzilla.cgi %]

Remove these lines. You don't use cgi.foo anywhere in this template.


>+  [% FOREACH request = requests %]
>+      [% FOREACH column = display_columns %]
>+          [% IF column == 'created' %]
>+              [% request.$column FILTER time %]
>+          [% ELSE %]
>+              [% request.$column FILTER csv %]
>+          [% END %][% ',' IF NOT loop.last() %]
>+      [% END %]
>+
>+  [% END %]

Please fix the indentation. In templates, the indentation is 2 whitespaces, not 4. Also, remove the newline between these two [% END %].
Also, for the requester and requestee, you must use |FILTER email| as we don't want to disclose full email addresses to logged out users.

Also, I see no link pointing to the CSV form, and so this feature will remain hidden and unknown from everybody but you. Please add a link.
Attachment #587714 - Flags: review?(LpSolit) → review-
Attached patch Patch v.5 (obsolete) — Splinter Review
Review comments addressed. I changed "\" to "_" but left double quotes alone; I'm pretty sure every modern OS can handle double quotes in filenames.

Gerv
Attachment #587714 - Attachment is obsolete: true
Attachment #611836 - Flags: review?(LpSolit)
Comment on attachment 611836 [details] [diff] [review]
Patch v.5

>=== modified file 'Bugzilla/CGI.pm'

>+sub set_dated_content_disp {

>+    $filename =~ s/"/\\"/g; # escape quotes

I checked, and double quotes are not allowed in filenames, together with many other characters, see http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.85%29.aspx#file_and_directory_names

I propose that we convert all the characters that Windows doesn't allow in filenames to avoid any problem (I suspect redirection characters < and > would be evil on Linux too).


>+    
>+    my $disposition = "$type; filename=\"$filename\"";
>+    
>+    $self->{'_content_disp'} = $disposition;

Nit: please remove trailing whitespaces on empty lines.



>=== modified file 'buglist.cgi'

>-        print $cgi->multipart_start(-type                => $contenttype,
>-                                    -content_disposition => $disposition);
>+        print $cgi->multipart_start($contenttype);

multipart_start() expects a hash as argument, not a scalar. This is the reason why

 Set-Cookie: LASTORDER=bug_id; path=/bugzilla/; expires=Fri, 01-Jan-2038 00:00:00 GMT

is displayed in the body of the page. Writing (-type => $contenttype) as it is currently fixes the problem.



>=== added file 'template/en/default/request/queue.csv.tmpl'

>+        [% request.$column FILTER time %]

You must also call FILTER csv after FILTER time.


>=== modified file 'template/en/default/request/queue.html.tmpl'

>+  <a href="request.cgi?[% urlquerypart %]&ctype=csv">(view entire list as CSV)</a>

You must call FILTER html for urlquerypart. Also, remove the parens. They do not look nice in the HTML page.
Attachment #611836 - Flags: review?(LpSolit) → review-
(In reply to Frédéric Buclin from comment #21)
> I checked, and double quotes are not allowed in filenames, together with
> many other characters, see
> http://msdn.microsoft.com/en-us/library/windows/desktop/aa365247%28v=vs.
> 85%29.aspx#file_and_directory_names
> 
> I propose that we convert all the characters that Windows doesn't allow in
> filenames to avoid any problem (I suspect redirection characters < and >
> would be evil on Linux too).

We need to either do this completely or not at all. Option A) is that we find the set of characters which are permitted in filenames on all major platforms, and restrict it to that. (In which case, why not restrict the filename on upload rather than download?) Option B) is that we let the web browser and OS handle those problems. After all, people can do a "File | Save As" on a link like:

<a href="http://www.foo.com/fred|wilma.txt">Download Me</a>

and the browser and OS between them have to stop something bad happening. So they must have mechanisms for doing so.

IMO, the aim of this sanitization is merely to stop people breaking the HTTP response, not to make sure the filename is one which is acceptable to the target system. We need to escape quotes because they are the delimiter, and escape backslashes because they are the escape character. We remove whitespace to avoid HTTP response splitting, although arguably that restriction is too broad - we should allow " ", which is a valid filename character on many platforms.

But I don't see a need to do more than that.

> Also, remove the parens. They do
> not look nice in the HTML page.

They are simply to match the parens around "(view as bug list)", which is the text directly above, and which appears at the bottom of each section.

I have fixed all the other issues you raised - thank you.

Gerv
Attached patch Patch v.6 (obsolete) — Splinter Review
The CGI.pm part of the patch is larger because I ran it through my editor's "remove trailing whitespace" function.

Gerv
Attachment #611836 - Attachment is obsolete: true
Attachment #618626 - Flags: review?(LpSolit)
Attached patch Patch v.7 (obsolete) — Splinter Review
Drat - forgot to save one file.

Gerv
Attachment #618626 - Attachment is obsolete: true
Attachment #618627 - Flags: review?(LpSolit)
Attachment #618626 - Flags: review?(LpSolit)
Comment on attachment 618627 [details] [diff] [review]
Patch v.7

As asked on IRC some weeks ago, unrelated whitespaces changes should be removed to not confuse the history of the files.
Attachment #618627 - Flags: review?(LpSolit) → review-
Attached patch Patch v.8 (obsolete) — Splinter Review
Whitespace changes removed. Hope it's all OK now :-)

Gerv
Attachment #618627 - Attachment is obsolete: true
Attachment #624731 - Flags: review?(LpSolit)
Comment on attachment 624731 [details] [diff] [review]
Patch v.8

>=== added file 'template/en/default/request/queue.csv.tmpl'

>+[% PROCESS global/variables.none.tmpl %]

This template is now PRE-PROCESS'ed in Template->create() itself, see bug 696256, meaning that we no longer call this template ourselves. Please remove this call.


+[% column_headers = {
+  "type"            => "Flag" ,
+  "status"          => "Status" ,

Ideally, you should get strings from field_descs.$foo instead if hardcoding them here. I think most of the fields here are available in field_descs.



>+[% DEFAULT display_columns = ["requester", "requestee", "type", "status",

You specify a default list, but there is no way to override it. You must pass display_columns to the template so that we can set it from the URL.



>=== modified file 'template/en/default/request/queue.html.tmpl'

>+  <a href="request.cgi?[% urlquerypart FILTER html %]&ctype=csv">(view entire list as CSV)</a>

You must escape & -> &amp;


Otherwise looks good and works fine.
Attachment #624731 - Flags: review?(LpSolit) → review-
Attached patch Patch v.9Splinter Review
Review comments addressed. Instead of making the columns properly changeable, I just made them fixed. You need all the data for most use cases anyway.

Gerv
Attachment #624731 - Attachment is obsolete: true
Attachment #688337 - Flags: review?(LpSolit)
Comment on attachment 688337 [details] [diff] [review]
Patch v.9

>=== modified file 'Bugzilla/CGI.pm'

>+    if ($self->{'_content_disp'}) {
>+        unshift(@_, '-content_disposition' => $self->{'_content_disp'});
>+    }

The 2nd half of the header() method is about security headers. It's a bit weird to put your code about content disposition between x_frame_options and x_xss_protection. Please move your code right after the definition of the type header. That seems a bit more logical to me.



>=== added file 'template/en/default/request/queue.csv.tmpl'

>+[% PROCESS global/field-descs.none.tmpl %]

Did you test your patch? The code doesn't compile if you don't quote "global/field-descs.none.tmpl":

Precompiling templates...file error - parse error - request/queue.csv.tmpl line 8: unexpected token (-)
  [% PROCESS global/field-descs.none.tmpl %]


>+[% column_headers = {
>+  "type"            => "Flag" ,
>+  "status"          => field_descs.bug_status ,
>+  "bug_summary"     => field_descs.short_desc ,
>+  "bug_id"          => field_descs.bug_id ,
>+  "attach_summary"  => "Attachment Description" ,
>+  "attach_id"       => "Attachment ID" ,
>+  "requester"       => "Requester" ,
>+  "requestee"       => "Requestee" ,
>+  "created"         => "Created" ,

Please remove the useless whitespace before the commas.



>=== modified file 'template/en/default/request/queue.html.tmpl'

>+  <a href="request.cgi?[% urlquerypart FILTER html %]&amp;ctype=csv">(view entire list as CSV)</a>

I still think parens do not look nice here. You could remove them around (view as bug list) too, for consistency.


r=LpSolit with these comments addressed.
Attachment #688337 - Flags: review?(LpSolit) → review+
Flags: approval+
Keywords: relnote
OS: Windows XP → All
Hardware: x86 → All
Target Milestone: --- → Bugzilla 5.0
Committed revision 8543.

Gerv
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
(In reply to Gervase Markham [:gerv] from comment #30)
> Committed revision 8543.
> 
> Gerv

In the future, please paste the *full* commit output. See bug 817486, comment #5 as an example.
Committing to: bzr+ssh://bzr.mozilla.org/bugzilla/trunk/                       
modified buglist.cgi
modified request.cgi
modified template/en/default/request/queue.html.tmpl
modified report.cgi
added template/en/default/request/queue.csv.tmpl
modified Bugzilla/CGI.pm
Committed revision 8543.

Gerv
Added to relnotes for 5.0rc1.
Keywords: relnote
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: