Last Comment Bug 328628 - Attachment filenames having UTF8 characters in them are rendered incorrectly
: Attachment filenames having UTF8 characters in them are rendered incorrectly
Status: RESOLVED FIXED
:
Product: Bugzilla
Classification: Server Software
Component: Attachments & Requests (show other bugs)
: unspecified
: All All
: -- minor (vote)
: Bugzilla 3.4
Assigned To: Max Kanat-Alexander
: default-qa
:
Mentors:
: 481821 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2006-02-26 01:32 PST by Dmitriy Lujecskiy
Modified: 2013-08-09 22:33 PDT (History)
13 users (show)
mkanat: approval+
mkanat: approval3.4+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
File name in Russian characters (1.17 KB, application/octet-stream)
2006-02-26 01:34 PST, Dmitriy Lujecskiy
no flags Details
QP encode file name of attachment (1.55 KB, patch)
2006-03-30 01:09 PST, Renat Sabitov
wurblzap: review-
glob: review+
Details | Diff | Splinter Review
QP encode file name of attachment v.2 (1.59 KB, patch)
2006-06-09 04:14 PDT, Renat Sabitov
wurblzap: review+
Details | Diff | Splinter Review
Patch 1.2 (974 bytes, patch)
2007-08-15 06:51 PDT, Marc Schumann [:Wurblzap]
glob: review+
LpSolit: review-
Details | Diff | Splinter Review
Patch 1.3 (1.52 KB, patch)
2007-09-09 15:36 PDT, Marc Schumann [:Wurblzap]
no flags Details | Diff | Splinter Review
v4 (1.02 KB, patch)
2009-08-19 15:17 PDT, Max Kanat-Alexander
wurblzap: review-
Details | Diff | Splinter Review
v5 (1.10 KB, patch)
2009-09-29 23:56 PDT, Max Kanat-Alexander
wurblzap: review+
Details | Diff | Splinter Review

Description Dmitriy Lujecskiy 2006-02-26 01:32:21 PST
User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MyIE2; .NET CLR 1.1.4322)
Build Identifier: 2.21.1+(download by CVS from 2006-02-20)

If try download attacment file created with Russian filename then bad name

Reproducible: Always

Steps to Reproduce:
1.Download attachment for this bug
2.
3.

Actual Results:  
Bad file name for save
Comment 1 Dmitriy Lujecskiy 2006-02-26 01:34:30 PST
Created attachment 213229 [details]
File name in Russian characters
Comment 2 Dmitriy Lujecskiy 2006-02-26 01:48:37 PST
This Bugzila server work good.

I am use server with configuration in bug 328013.

Comment 3 Dmitriy Lujecskiy 2006-02-26 02:38:34 PST
I am get HTTP header as

Content-disposition: inline; filename="Текстовый документ.xxx"
Content-Type: application/octet-stream; name="Текстовый документ.xxx"


it is not rfc2231
Comment 4 Renat Sabitov 2006-03-30 01:09:25 PST
Created attachment 216729 [details] [diff] [review]
QP encode file name of attachment

Non-ascii characters in file names must be base64 or qp encoded just like subject in mail. See patch. Changes in encode_qp_words are made because first word don't have space before.
Comment 5 Max Kanat-Alexander 2006-05-16 11:18:45 PDT
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

I don't know anything about this code; probably best to ask Wurblzap or glob.
Comment 6 Byron Jones ‹:glob› [PTO until 2016-10-10] 2006-05-16 18:01:42 PDT
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

r=glob by inspection
Comment 7 Marc Schumann [:Wurblzap] 2006-06-09 03:27:40 PDT
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

This doesn't work if the first word is 7bit-clean and the second is not. You probably need to move the if($first) part outside if(!is_7bit_clean($word)).

Nit 1: you're appending '_' to $qp_head in every loop -- it might be easier to put the if($first) part after the push, and move the "else" part of if($first) into the "then" part. (This means that you need to merge $qp_head = '=UTF-8?Q?' with my $qp_head, of course.)

Nit 2: line breaks at curly braces should be as shown in the example at http://www.bugzilla.org/docs/developer.html#perl-style.
Comment 8 Renat Sabitov 2006-06-09 04:14:05 PDT
Created attachment 224989 [details] [diff] [review]
QP encode file name of attachment v.2

Thank you, Marc!

Please review new version of patch
Comment 9 Marc Schumann [:Wurblzap] 2006-06-09 04:32:30 PDT
Comment on attachment 224989 [details] [diff] [review]
QP encode file name of attachment v.2

That was quick. r=wurblzap by inspection.

Do you want to sign off on the updated patch, glob?
Comment 10 Frédéric Buclin 2006-10-12 03:26:05 PDT
Marc, Byron, what's new about this patch/bug?
Comment 11 Vitaly Fedrushkov 2006-11-17 04:00:39 PST
Confirmed and patched (the wrong way) independently while working on Bugzilla-ru project.
Comment 12 Vitaly Fedrushkov 2007-05-30 21:57:11 PDT
Confirmed in 3.0 for Internet Explorer 7.

Target milestone missed...
Comment 13 Marc Schumann [:Wurblzap] 2007-08-15 06:51:30 PDT
Created attachment 276768 [details] [diff] [review]
Patch 1.2

Unrotted.
Comment 14 Byron Jones ‹:glob› [PTO until 2016-10-10] 2007-08-15 17:55:43 PDT
Comment on attachment 276768 [details] [diff] [review]
Patch 1.2

r=glob
Comment 15 Frédéric Buclin 2007-08-17 16:57:49 PDT
Comment on attachment 276768 [details] [diff] [review]
Patch 1.2

This patch crashes Firefox when viewing an attachment with a name such as bug0öäüöüéàè-cvs.diff
Comment 16 Marc Schumann [:Wurblzap] 2007-08-17 22:35:17 PDT
Confirming the crash using the file name Frédéric used. I found out what's so special about this file name. It worked for me with others; the difference is it's resulting encoded length. It turns out that encode('MIME-Q') inserts a line break and a blank whenever the resulting string length exceeds a certain limit (probably 80 or 76 or something). With this, Bugzilla produces a multiple-line HTTP header entry, like this:

  Content-Type: text/plain; name="=?UTF-8?Q?bug=200=C3=B6=C3=A4=C3=BC?=
   =?UTF-8?Q?=2Dcvs?=.diff"; charset=UTF-8

I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be able to parse this and says "malformed header from script. Bad header=?UTF-8?Q?cvs?" (error 500). It s complaining about the beginning of the second line makes my current working theory to be that Apache is not able to cope with multiple-line header entries. Let's see whether I can find a way around this.
Comment 17 Vitaly Fedrushkov 2007-08-20 01:01:06 PDT
Attachment 213229 [details] may be used for testing.  Its name is just "text file" (in Russian) but again long enough when encoded.
Comment 18 Vitaly Fedrushkov 2007-08-20 01:20:24 PDT
(In reply to comment #16)

> I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be
> able to parse this and says "malformed header from script.

RFC 2616 section 2.2:

QUOTE

The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>
UNQUOTE

As far as I can read English, it is hard to tell whether Apache considered "message parser" in the above context.  RFC 2047 is not mentioned elsewhere.

IMHO we should try to obey the Postel's principle, and generate headers safely parseable by non-liberal Apache...
Comment 19 Marc Schumann [:Wurblzap] 2007-09-09 15:36:16 PDT
Created attachment 280276 [details] [diff] [review]
Patch 1.3

Ok, next round. I didn't trust MIME::encode_qp to do everything all right in UTF-8 and non-UTF-8 cases (plus it doesn't add necessary escape sequences on its own), so I stuck to Encode::encode, fixing its output.

I can't say it's a really elegant solution; if somebody comes up with a better one, I'll be happy to accept it. If not, I'd say we should take it for now. It works with
- short ASCII file names
- long ASCII file names
- short non-ASCII UTF-8 file names
- long non-ASCII UTF-8 file names

“Short” means Encode::encode doesn't line-wrap the file name. “Long” means it does. I successfully tested both cases of wrapping, inside of an escaped sequence and outside of it.
Comment 20 Frédéric Buclin 2007-10-01 05:25:48 PDT
Aren't long lines cut with normal ASCII filenames?
Comment 21 Frédéric Buclin 2007-10-08 06:53:06 PDT
Look at the discussion in bug 374424. It seems that using Encode is the wrong way to go. Why not using MIME::Words, see the patch there?
Comment 22 Max Kanat-Alexander 2007-10-08 17:27:04 PDT
FWIW, I'm actually trying to get rid of the dependency on MIME-tools, which also pulls in MailTools and lots of other stuff that's not needed by Bugzilla.

If you can pull the necessary code out of MIME::Words and put it in Bugzilla::Util, that would also be fine.
Comment 23 Frédéric Buclin 2007-11-04 14:10:43 PST
Comment on attachment 280276 [details] [diff] [review]
Patch 1.3

Canceling review till last comments are answered.
Comment 24 Max Kanat-Alexander 2008-01-11 16:51:34 PST
I suspect this bug needs an entirely new trunk patch now that bug 363153 is fixed.
Comment 25 Frédéric Buclin 2008-12-07 06:48:56 PST
The Bugzilla 3.0 branch is now locked to security bugs and dataloss fixes only. This bug doesn't fit into one of these two categories and is retargetted to 3.2 as part of a mass-change. To catch bugmails related to this mass-change, use lts081207 in your email client filter.
Comment 26 Frédéric Buclin 2008-12-29 09:26:44 PST
I hope this bug summary is clearer.
Comment 27 Frédéric Buclin 2009-03-06 00:12:37 PST
*** Bug 481821 has been marked as a duplicate of this bug. ***
Comment 28 Max Kanat-Alexander 2009-08-18 15:37:27 PDT
Comment on attachment 280276 [details] [diff] [review]
Patch 1.3

Okay, somebody should review this. Probably me.
Comment 29 Max Kanat-Alexander 2009-08-19 15:17:53 PDT
Created attachment 395423 [details] [diff] [review]
v4

Okay, this is a pretty simple two-line fix that fixes it for me.
Comment 30 Max Kanat-Alexander 2009-09-16 17:24:09 PDT
Hey Wurblzap. Any chance of getting a review on this two-line patch some time soon?
Comment 31 Marc Schumann [:Wurblzap] 2009-09-16 22:45:38 PDT
(In reply to comment #30)
> Hey Wurblzap. Any chance of getting a review on this two-line patch some time
> soon?

Yeah... Actually, today or tomorrow might do it.
Comment 32 Marc Schumann [:Wurblzap] 2009-09-24 13:14:35 PDT
Comment on attachment 395423 [details] [diff] [review]
v4

While the approach is way better, I don't think we should take advantage of undocumented features. My Encode-2.35 doesn't have docs on the bpl variable.

If there's a newer version with appropriate docs, though, I'll be happy; I tested, and it works well for me.

The bytes per line (which bpl probably means) value seems arbitrary to me and justifies a code comment. It should probably be the maximally possible length of an encoded filename; definitely not less. According to our schema, the maximum filename length is 100 characters. Afaik every UTF-8 character can be represented by no more than four bytes. Each byte, when encoded, takes up three ASCII characters (=xx). So we're at 100×4×3=1200. Then there's some header and trailer.
Comment 33 Max Kanat-Alexander 2009-09-24 14:52:24 PDT
(In reply to comment #32)
> (From update of attachment 395423 [details] [diff] [review])
> While the approach is way better, I don't think we should take advantage of
> undocumented features. My Encode-2.35 doesn't have docs on the bpl variable.

  We already are using it in Bugzilla::Mailer--we talked to the maintainer of the module and I believe that he recommended it.
Comment 34 Marc Schumann [:Wurblzap] 2009-09-25 00:47:07 PDT
All right. Turns out bug 411544 comment 6 wasn't all that far off then :)

So let's go with it. I'm sure you'll agree a code comment pointing to bug 411544 comment 21 would be good (for Mailer.pm, too, but that's a different story). Fix its value, too, and I'd say we're set.
Comment 35 Marc Schumann [:Wurblzap] 2009-09-25 01:13:06 PDT
Hrm... It seems that values above 998 violate RFC 2822 (bug 374424 comment 15 quotes RFC 2822). Does RFC 2822 apply here at all, or does it apply to email only?
Comment 36 Vitaly Fedrushkov 2009-09-25 02:29:36 PDT
(In reply to comment #35)
> Does RFC 2822 apply here at all, or does it apply to email only?

RFC 2822 is about email.  It explicitly refers to RFC 2045 and RFC 2047 when it comes to MIME.

As clarified in RFC 2616:

QUOTE
19.4.7 MHTML and Line Length Limitations

   HTTP implementations which share code with MHTML [45] implementations
   need to be aware of MIME line length limitations. Since HTTP does not
   have this limitation, HTTP does not fold long lines. MHTML messages
   being transported by HTTP follow all conventions of MHTML, including
   line length limitations and folding, canonicalization, etc., since
   HTTP transports all message-bodies as payload (see section 3.7.2) and
   does not interpret the content or any MIME header lines that might be
   contained therein.
UNQUOTE

In this bug we're doing HTTP, so we don't have to fold anything, from RFC standpoint.  What we may fear (and avoid, to obey Postel's principle) is some broken shared MHTML/HTTP _client_ implementation which had read 19.4.7 backwards and did actually impose some limit to any MIME encoded data...
Comment 37 Max Kanat-Alexander 2009-09-29 23:56:36 PDT
Created attachment 403717 [details] [diff] [review]
v5

Okay, modified BPL and the comment.
Comment 38 Marc Schumann [:Wurblzap] 2009-09-30 00:21:29 PDT
Comment on attachment 403717 [details] [diff] [review]
v5

All right
Comment 39 Max Kanat-Alexander 2009-09-30 01:53:37 PDT
tip:

Checking in attachment.cgi;
/cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v  <--  attachment.cgi
new revision: 1.161; previous revision: 1.160
done

3.4:

Checking in attachment.cgi;
/cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v  <--  attachment.cgi
new revision: 1.153.2.2; previous revision: 1.153.2.1
done
Comment 40 Michael Knudson 2010-12-15 09:19:46 PST
I'm still experiencing trouble with this on Bugzilla 3.6, depending on what browser is being used.

Firefox works okay, but when I click on the Russian attachment above in IE (including IE8), it loads a page of garbage.  If I go to "save as", the filename shows up as:
=_UTF-8_Q_=D0=A2=D0=B5=D0=BA=D1=81=D1=82=D0=BE=D0=B2=D1=8B=D0=B9=20=D0=B4=D0=BE=D0=BA=D1=83=D0=BC=D0=B5=D0=BD=D1=82_=.htm

(I believe the .htm comes from the encoded stream being too long, so it lost the original extension.)

I have the same issue as the guy who posted bug 481821 where my Japanese users are trying to upload files with kanji in the filename.  IE is really dominant in Japan, so a solution that works on multiple browsers would be helpful.
Comment 41 Marc Schumann [:Wurblzap] 2010-12-16 03:51:51 PST
Bugs will only get reopened if the checked-in patch needs to be backed out. That's not the case here.

You're probably best off if you file a new bug for this, if there isn't already one, and refer to it from here in a comment.

I'm seeing the effect, too, btw :)
Comment 42 Michael Knudson 2010-12-16 17:40:58 PST
All of the related bugs I could find were marked as duplicates of this one.  Since I think its still an issue, I have opened up a new bug report at bug 619847.

Sorry for the extra email.
Comment 43 rosannah.vargas 2013-08-09 22:28:37 PDT
Kool
Comment 44 rosannah.vargas 2013-08-09 22:33:00 PDT
Comment on attachment 403717 [details] [diff] [review]
v5

>Index: attachment.cgi
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v
>retrieving revision 1.160
>diff -u -r1.160 attachment.cgi
>--- attachment.cgi	28 Sep 2009 17:24:16 -0000	1.160
>+++ attachment.cgi	30 Sep 2009 06:55:54 -0000
>@@ -52,6 +52,8 @@
> use Bugzilla::Token;
> use Bugzilla::Keyword;
> 
>+use Encode qw(encode);
>+
> # For most scripts we don't make $cgi and $template global variables. But
> # when preparing Bugzilla for mod_perl, this script used these
> # variables in so many subroutines that it was easier to just
>@@ -318,6 +320,11 @@
>     $filename =~ s/\\/\\\\/g; # escape backslashes
>     $filename =~ s/"/\\"/g; # escape quotes
> 
>+    # Avoid line wrapping done by Encode, which we don't need for HTTP
>+    # headers. See discussion in bug 328628 for details.
>+    local $Encode::Encoding{'MIME-Q'}->{'bpl'} = 10000;
>+    $filename = encode('MIME-Q', $filename);
>+
>     my $disposition = Bugzilla->params->{'allow_attachment_display'} ? 'inline' : 'attachment';
> 
>     print $cgi->header(-type=>"$contenttype; name=\"$filename\"",

Note You need to log in before you can comment on or make changes to this bug.