Closed Bug 760562 Opened 12 years ago Closed 12 years ago

Integrate TypeSniffer into core codebase (auto-detect attachment MIME type)

Categories

(Bugzilla :: Attachments & Requests, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED
Bugzilla 4.4

People

(Reporter: gerv, Assigned: selsky)

References

()

Details

Attachments

(1 file, 6 obsolete files)

The TypeSniffer extension:
http://bzr.mozilla.org/bugzilla/extensions/typesniffer/trunk
has proved its worth on b.m.o. In bug 622455 comment 8 and on IRC, LpSolit has suggested we integrate it into trunk as an optional feature (i.e. the module is optional, and things work as they do now if it's not installed).

Related: bug 85132.

Gerv
Severity: normal → enhancement
Attached patch v1 (obsolete) — Splinter Review
Assignee: attach-and-request → selsky
Status: NEW → ASSIGNED
Attachment #630821 - Flags: review?(LpSolit)
Attached patch v2 (obsolete) — Splinter Review
Merged in updates from bug 762965.
Attachment #630821 - Attachment is obsolete: true
Attachment #630821 - Flags: review?(LpSolit)
Attachment #633222 - Flags: review?(LpSolit)
Comment on attachment 633222 [details] [diff] [review]
v2

I didn't test your patch yet, but it looks good. One major thing to fix first, though.


>=== modified file 'Bugzilla/Attachment.pm'

>+    if (Bugzilla->feature('typesniffer')) {
>+        import File::MimeInfo::Magic;
>+        import IO::Scalar;
>+
>+        # If we have autodetected application/octet-stream from the Content-Type
>+        # header, let's have a better go using a sniffer.
>+        if (Bugzilla->input_params->{contenttypemethod} eq 'autodetect' &&
>+            $params->{mimetype} eq 'application/octet-stream') {

To not waste cycles into loading File::MimeInfo::Magic and IO::Scalar, we should reverse the order in which we check stuff:

 if (Bugzilla->input_params->{contenttypemethod} eq 'autodetect'
     && $params->{mimetype} eq 'application/octet-stream'
     && Bugzilla->feature('typesniffer'))
 {

i.e. we only load these modules when we know we have something to do. Also, I'm not sure it makes sense to call 'import' when we aren't importing anything explicitly. But I may be wrong.


>+            else {
>+                # CGI.pm sends us an Fh that isn't actually an IO::Handle, but
>+                # has a method for getting an actual handle out of it.
>+                if (!$fh->isa('IO::Handle')) {

Nit: combine else + if into a single elsif.


>+            if (defined $mimetype) {
>+                $params->{mimetype} = $mimetype;
>+            }

Nit: could simply be written as: $params->{mimetype} = $mimetype if $mimetype;

Now the reason I deny review is because this code should be in _check_content_type(), not in _check_data(). This code is really about the MIME type, not about the data itself.


>=== modified file 'template/en/default/setup/strings.txt.pl'

>+    feature_typesniffer       => 'Sniff MIME types of attachments',

Shouldn't "types" be singular, as each attachment has only one type?
Attachment #633222 - Flags: review?(LpSolit) → review-
Keywords: relnote
Target Milestone: --- → Bugzilla 4.4
Attached patch v3 (obsolete) — Splinter Review
> To not waste cycles into loading File::MimeInfo::Magic and IO::Scalar, we should
> reverse the order in which we check stuff:

Fixed.

> i.e. we only load these modules when we know we have something to do. Also, I'm
> not sure it makes sense to call 'import' when we aren't importing anything
> explicitly. But I may be wrong.

We need to call import since the mimetype() function is exported by default.  But IO::Scalar can definitately be required instead of imported.  Should I explicitly call File::MimeInfo::Magic::mimetype() to avoid the call to import?

> Nit: combine else + if into a single elsif.

Fixed.

> Nit: could simply be written as: $params->{mimetype} = $mimetype if $mimetype;

Fixed.

> Now the reason I deny review is because this code should be in
> _check_content_type(), not in _check_data(). This code is really about the MIME
> type, not about the data itself.

Fixed.

> Shouldn't "types" be singular, as each attachment has only one type?

Fixed.

I also confirmed that all tests in t/ pass.
Attachment #633222 - Attachment is obsolete: true
Attachment #633258 - Flags: review?(LpSolit)
Comment on attachment 633258 [details] [diff] [review]
v3

>=== modified file 'Bugzilla/Attachment.pm'

>+    if (Bugzilla->input_params->{contenttypemethod} eq 'autodetect' &&
>+        $content_type eq 'application/octet-stream' &&
>+        Bugzilla->feature('typesniffer')) {

Nit: write && at the beginning of lines, not at the end, to follow our guidelines.


>+        import File::MimeInfo::Magic;

To make it clearer where mimetype() comes from, I think you should write:

          import File::MimeInfo::Magic qw(mimetype);


>+        my $fh = $params->{data};
>+        if (!ref($fh)) {
>+            $fh = new IO::Scalar \$fh;

I'm not sure it's a good idea to reuse $fh to store the IO::Scalar object. I suspect setting $fh to a reference to itself is going to generate a loop. Just to make sure we don't cause any trouble, use another variable name to store the IO::Scalar object.


>+            $fh = $fh->handle;

Same here.


>+        my $mimetype = mimetype($fh);
>+        $content_type = $mimetype if $mimetype;

While playing with your patch, I noted that mimetype() sets the content type of patches to text/x-patch. The problem is that the browser will try to download the patch instead of diplaying it inline, and we cannot use the PatchReader tool. As we already have code to take this problem into account in get_content_type(), we should refactor the code a bit to reuse it here. That's the reason of my r-. Otherwise your patch looks good.
Attachment #633258 - Flags: review?(LpSolit) → review-
(In reply to Frédéric Buclin from comment #5)
> I'm not sure it's a good idea to reuse $fh to store the IO::Scalar object. I
> suspect setting $fh to a reference to itself is going to generate a loop.
> Just to make sure we don't cause any trouble, use another variable name to
> store the IO::Scalar object.

This comment surprises me. $foo = some_function($foo) is a pretty common programming idiom; if Perl couldn't deal with it, there would be a lot of broken code out there! Every use of "+=" or "=~" effectively does this.

Gerv
(In reply to Gervase Markham [:gerv] from comment #6)
> This comment surprises me. $foo = some_function($foo) is a pretty common
> programming idiom

$foo = some_function($foo) is pretty common, $foo = some_function(\$foo) is not. I don't want to fall into something similar to bug 660502, see attachment 535919 [details] [diff] [review] there. So why take the risk?
Attached patch v4 (obsolete) — Splinter Review
> $foo = some_function($foo) is pretty common, $foo = some_function(\$foo) is
> not. I don't want to fall into something similar to bug 660502, see
> attachment 535919 [details] [diff] [review] there. So why take the risk?

In bug 660502, $tmp is only used in trunk.  In the other branches, the only fix was closing the file handle.  How come only trunk got that part of the fix?

I've addressed the other nits.
Attachment #633258 - Attachment is obsolete: true
Attachment #635098 - Flags: review?(LpSolit)
(In reply to Matt Selsky [:selsky] from comment #8)
> In bug 660502, $tmp is only used in trunk.  In the other branches, the only
> fix was closing the file handle.  How come only trunk got that part of the
> fix?

Because the code is not the same in trunk and 4.2.
Attached patch v5 (obsolete) — Splinter Review
I've addressed the reassigned reference.
Attachment #635098 - Attachment is obsolete: true
Attachment #635098 - Flags: review?(LpSolit)
Attachment #636012 - Flags: review?(LpSolit)
Comment on attachment 636012 [details] [diff] [review]
v5

>=== modified file 'Bugzilla/Attachment.pm'

>+            my $tmp = new IO::Scalar \$fh;
>+            $fh = $tmp;

This trick doesn't really help. Let's write "$fh = new ...." as you initially did for now, and we will see if this causes any problem in the future.


>+    if ($content_type =~ m{text/x-(?:diff|patch)}) {
>+        if (ref($invocant)) {

This code has two problems:
- First, a user can no longer set the MIME type to text/x-diff or text/x-patch manually as this code will alter it in all cases;
- This code should only be triggered for new attachments.

Said differently, this code must only be triggered for newly uploaded attachments and only if the method is autodetect, which is how Bugzilla behaves currently.
Attachment #636012 - Flags: review?(LpSolit) → review-
Attached patch v6 (obsolete) — Splinter Review
OK, I undid the trick.

Also, I only override text/x-patch, etc when autodetect is set.  This should also cover the case of non-newly uploaded attachments since that parameter won't be set.
Attachment #636012 - Attachment is obsolete: true
Attachment #638226 - Flags: review?(LpSolit)
I also uploaded a trivial patch to File::BaseDir (https://rt.cpan.org/Public/Bug/Display.html?id=41744) to avoid the following warning message:

WARNING: HOME is not set, using root: /
Comment on attachment 638226 [details] [diff] [review]
v6

>=== modified file 'Bugzilla/Attachment.pm'

>+    # Make sure patches are viewable in the browser
>+    if (Bugzilla->input_params->{contenttypemethod} eq 'autodetect'
>+        && $content_type =~ m{text/x-(?:diff|patch)}) {
>+        if (ref($invocant)) {
>+            $invocant->set_is_patch(1);
>+        }
>+        else {
>+            $params->{ispatch} = 1;
>+        }

As you are looking at ref($invocant) anyway, this code should be:

  if (!ref($invocant)
      && Bugzilla->input_params->{contenttypemethod} eq 'autodetect'
      && $content_type =~ m{text/x-(?:diff|patch)})
  {
      $params->{ispatch} = 1;
      ...
  }


Also note that due to https://rt.cpan.org/Public/Bug/Display.html?id=41744, a warning is thrown in the web server error log everytime we load File::MimeInfo::Magic. This is a bit annoying, and probably we should define $ENV{HOME} ourselves to make it happy. Something like:

    # $ENV{HOME} must be defined when using File::MimeInfo::Magic,
    # see https://rt.cpan.org/Public/Bug/Display.html?id=41744.
    local $ENV{HOME} = File::Spec->rootdir();


r=LpSolit with these problems fixed. Could you upload a new patch, please?
Attachment #638226 - Flags: review?(LpSolit) → review+
(In reply to Frédéric Buclin from comment #14)
>     local $ENV{HOME} = File::Spec->rootdir();

Even better, in case $ENV{HOME} is defined:

    local $ENV{HOME} = $ENV{HOME} || File::Spec->rootdir();
Flags: approval?
I'm still getting the same warning with $ENV{HOME} being set directly above the import of File::MimeInfo::Magic.  Also, do I need to handle the case of $invocant being a reference when setting "ispatch"?
(In reply to Matt Selsky [:selsky] from comment #16)
> I'm still getting the same warning with $ENV{HOME} being set directly above
> the import of File::MimeInfo::Magic.

That's because the error isn't thrown when calling import(), but when calling Bugzilla->feature(). So you must define "local $ENV{HOME}" before calling it.


  Also, do I need to handle the case of
> $invocant being a reference when setting "ispatch"?

If $invocant is a reference, then you are editing an existing attachment, and we shouldn't alter what the user typed. So the code simply needs to be like what I typed in comment 14. If $invocant is a reference, do nothing.
Attached patch v7Splinter Review
Also, I added a check for Bugzilla->input_params->{contenttypemethod} being defined, in case the user clicks the "ispatch" checkbox.
Attachment #638226 - Attachment is obsolete: true
Attachment #638339 - Flags: review?(LpSolit)
Comment on attachment 638339 [details] [diff] [review]
v7

>=== modified file 'Bugzilla/Attachment.pm'

>+    local $ENV{HOME} ||= File::Spec->rootdir();

This won't work. It must be:

  local $ENV{HOME} = $ENV{HOME} || File::Spec->rootdir();

Because the first syntax will always fall back to rootdir().


Otherwise looks good. Thanks for the patch. r=LpSolit
Attachment #638339 - Flags: review?(LpSolit) → review+
Flags: approval? → approval+
I did the fix on checkin.

Committing to: bzr+ssh://lpsolit%40gmail.com@bzr.mozilla.org/bugzilla/trunk/
modified Bugzilla/Attachment.pm
modified Bugzilla/Install/Requirements.pm
modified template/en/default/setup/strings.txt.pl
Committed revision 8284.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Added to relnotes for 4.4.
Keywords: relnote
Blocks: 852445
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: