The default bug view has changed. See this FAQ.

Attachment filenames having UTF8 characters in them are rendered incorrectly

RESOLVED FIXED in Bugzilla 3.4

Status

()

Bugzilla
Attachments & Requests
--
minor
RESOLVED FIXED
11 years ago
4 years ago

People

(Reporter: Dmitriy Lujecskiy, Assigned: Max Kanat-Alexander)

Tracking

unspecified
Bugzilla 3.4
Bug Flags:
approval +
approval3.4 +

Details

Attachments

(2 attachments, 5 obsolete attachments)

(Reporter)

Description

11 years ago
User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MyIE2; .NET CLR 1.1.4322)
Build Identifier: 2.21.1+(download by CVS from 2006-02-20)

If try download attacment file created with Russian filename then bad name

Reproducible: Always

Steps to Reproduce:
1.Download attachment for this bug
2.
3.

Actual Results:  
Bad file name for save
(Reporter)

Comment 1

11 years ago
Created attachment 213229 [details]
File name in Russian characters
(Reporter)

Comment 2

11 years ago
This Bugzila server work good.

I am use server with configuration in bug 328013.

(Reporter)

Comment 3

11 years ago
I am get HTTP header as

Content-disposition: inline; filename="Текстовый документ.xxx"
Content-Type: application/octet-stream; name="Текстовый документ.xxx"


it is not rfc2231

Comment 4

11 years ago
Created attachment 216729 [details] [diff] [review]
QP encode file name of attachment

Non-ascii characters in file names must be base64 or qp encoded just like subject in mail. See patch. Changes in encode_qp_words are made because first word don't have space before.
Attachment #216729 - Flags: review?

Updated

11 years ago
Attachment #216729 - Flags: review? → review?(mkanat)
(Assignee)

Comment 5

11 years ago
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

I don't know anything about this code; probably best to ask Wurblzap or glob.
Attachment #216729 - Flags: review?(mkanat) → review?(wurblzap)
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

r=glob by inspection
Attachment #216729 - Flags: review+
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

This doesn't work if the first word is 7bit-clean and the second is not. You probably need to move the if($first) part outside if(!is_7bit_clean($word)).

Nit 1: you're appending '_' to $qp_head in every loop -- it might be easier to put the if($first) part after the push, and move the "else" part of if($first) into the "then" part. (This means that you need to merge $qp_head = '=UTF-8?Q?' with my $qp_head, of course.)

Nit 2: line breaks at curly braces should be as shown in the example at http://www.bugzilla.org/docs/developer.html#perl-style.
Attachment #216729 - Flags: review?(wurblzap) → review-
Assignee: attach-and-request → srr
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Target Milestone: --- → Bugzilla 2.24

Comment 8

11 years ago
Created attachment 224989 [details] [diff] [review]
QP encode file name of attachment v.2

Thank you, Marc!

Please review new version of patch
Attachment #216729 - Attachment is obsolete: true

Updated

11 years ago
Attachment #224989 - Flags: review?(wurblzap)

Updated

11 years ago
Status: NEW → ASSIGNED
Comment on attachment 224989 [details] [diff] [review]
QP encode file name of attachment v.2

That was quick. r=wurblzap by inspection.

Do you want to sign off on the updated patch, glob?
Attachment #224989 - Flags: review?(wurblzap)
Attachment #224989 - Flags: review?(bugzilla)
Attachment #224989 - Flags: review+

Comment 10

11 years ago
Marc, Byron, what's new about this patch/bug?

Comment 11

11 years ago
Confirmed and patched (the wrong way) independently while working on Bugzilla-ru project.

Comment 12

10 years ago
Confirmed in 3.0 for Internet Explorer 7.

Target milestone missed...
Created attachment 276768 [details] [diff] [review]
Patch 1.2

Unrotted.
Attachment #224989 - Attachment is obsolete: true
Attachment #276768 - Flags: review?(bugzilla)
Attachment #224989 - Flags: review?(bugzilla)
Comment on attachment 276768 [details] [diff] [review]
Patch 1.2

r=glob
Attachment #276768 - Flags: review?(bugzilla) → review+
Flags: approval?
Flags: approval3.0?

Comment 15

10 years ago
Comment on attachment 276768 [details] [diff] [review]
Patch 1.2

This patch crashes Firefox when viewing an attachment with a name such as bug0öäüöüéàè-cvs.diff
Attachment #276768 - Flags: review-

Updated

10 years ago
Flags: approval?
Flags: approval3.0?
Confirming the crash using the file name Frédéric used. I found out what's so special about this file name. It worked for me with others; the difference is it's resulting encoded length. It turns out that encode('MIME-Q') inserts a line break and a blank whenever the resulting string length exceeds a certain limit (probably 80 or 76 or something). With this, Bugzilla produces a multiple-line HTTP header entry, like this:

  Content-Type: text/plain; name="=?UTF-8?Q?bug=200=C3=B6=C3=A4=C3=BC?=
   =?UTF-8?Q?=2Dcvs?=.diff"; charset=UTF-8

I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be able to parse this and says "malformed header from script. Bad header=?UTF-8?Q?cvs?" (error 500). It s complaining about the beginning of the second line makes my current working theory to be that Apache is not able to cope with multiple-line header entries. Let's see whether I can find a way around this.

Comment 17

10 years ago
Attachment 213229 [details] may be used for testing.  Its name is just "text file" (in Russian) but again long enough when encoded.

Comment 18

10 years ago
(In reply to comment #16)

> I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be
> able to parse this and says "malformed header from script.

RFC 2616 section 2.2:

QUOTE

The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>
UNQUOTE

As far as I can read English, it is hard to tell whether Apache considered "message parser" in the above context.  RFC 2047 is not mentioned elsewhere.

IMHO we should try to obey the Postel's principle, and generate headers safely parseable by non-liberal Apache...
Created attachment 280276 [details] [diff] [review]
Patch 1.3

Ok, next round. I didn't trust MIME::encode_qp to do everything all right in UTF-8 and non-UTF-8 cases (plus it doesn't add necessary escape sequences on its own), so I stuck to Encode::encode, fixing its output.

I can't say it's a really elegant solution; if somebody comes up with a better one, I'll be happy to accept it. If not, I'd say we should take it for now. It works with
- short ASCII file names
- long ASCII file names
- short non-ASCII UTF-8 file names
- long non-ASCII UTF-8 file names

“Short” means Encode::encode doesn't line-wrap the file name. “Long” means it does. I successfully tested both cases of wrapping, inside of an escaped sequence and outside of it.
Assignee: srr → wurblzap
Attachment #276768 - Attachment is obsolete: true
Attachment #280276 - Flags: review?(LpSolit)

Comment 20

10 years ago
Aren't long lines cut with normal ASCII filenames?

Comment 21

10 years ago
Look at the discussion in bug 374424. It seems that using Encode is the wrong way to go. Why not using MIME::Words, see the patch there?
(Assignee)

Comment 22

10 years ago
FWIW, I'm actually trying to get rid of the dependency on MIME-tools, which also pulls in MailTools and lots of other stuff that's not needed by Bugzilla.

If you can pull the necessary code out of MIME::Words and put it in Bugzilla::Util, that would also be fine.

Comment 23

10 years ago
Comment on attachment 280276 [details] [diff] [review]
Patch 1.3

Canceling review till last comments are answered.
Attachment #280276 - Flags: review?(LpSolit)
(Assignee)

Comment 24

9 years ago
I suspect this bug needs an entirely new trunk patch now that bug 363153 is fixed.

Comment 25

8 years ago
The Bugzilla 3.0 branch is now locked to security bugs and dataloss fixes only. This bug doesn't fit into one of these two categories and is retargetted to 3.2 as part of a mass-change. To catch bugmails related to this mass-change, use lts081207 in your email client filter.
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2

Comment 26

8 years ago
I hope this bug summary is clearer.
Summary: Attachment File name in utf8 → Attachment filenames having UTF8 characters in them are rendered incorrectly

Updated

8 years ago
Duplicate of this bug: 481821
(Assignee)

Comment 28

8 years ago
Comment on attachment 280276 [details] [diff] [review]
Patch 1.3

Okay, somebody should review this. Probably me.
Attachment #280276 - Flags: review?(mkanat)
(Assignee)

Comment 29

8 years ago
Created attachment 395423 [details] [diff] [review]
v4

Okay, this is a pretty simple two-line fix that fixes it for me.
Assignee: wurblzap → mkanat
Attachment #280276 - Attachment is obsolete: true
Attachment #395423 - Flags: review?(wurblzap)
Attachment #280276 - Flags: review?(mkanat)
(Assignee)

Comment 30

8 years ago
Hey Wurblzap. Any chance of getting a review on this two-line patch some time soon?
(In reply to comment #30)
> Hey Wurblzap. Any chance of getting a review on this two-line patch some time
> soon?

Yeah... Actually, today or tomorrow might do it.
Attachment #395423 - Flags: review?(wurblzap) → review-
Comment on attachment 395423 [details] [diff] [review]
v4

While the approach is way better, I don't think we should take advantage of undocumented features. My Encode-2.35 doesn't have docs on the bpl variable.

If there's a newer version with appropriate docs, though, I'll be happy; I tested, and it works well for me.

The bytes per line (which bpl probably means) value seems arbitrary to me and justifies a code comment. It should probably be the maximally possible length of an encoded filename; definitely not less. According to our schema, the maximum filename length is 100 characters. Afaik every UTF-8 character can be represented by no more than four bytes. Each byte, when encoded, takes up three ASCII characters (=xx). So we're at 100×4×3=1200. Then there's some header and trailer.
(Assignee)

Comment 33

8 years ago
(In reply to comment #32)
> (From update of attachment 395423 [details] [diff] [review])
> While the approach is way better, I don't think we should take advantage of
> undocumented features. My Encode-2.35 doesn't have docs on the bpl variable.

  We already are using it in Bugzilla::Mailer--we talked to the maintainer of the module and I believe that he recommended it.
All right. Turns out bug 411544 comment 6 wasn't all that far off then :)

So let's go with it. I'm sure you'll agree a code comment pointing to bug 411544 comment 21 would be good (for Mailer.pm, too, but that's a different story). Fix its value, too, and I'd say we're set.
Hrm... It seems that values above 998 violate RFC 2822 (bug 374424 comment 15 quotes RFC 2822). Does RFC 2822 apply here at all, or does it apply to email only?

Comment 36

8 years ago
(In reply to comment #35)
> Does RFC 2822 apply here at all, or does it apply to email only?

RFC 2822 is about email.  It explicitly refers to RFC 2045 and RFC 2047 when it comes to MIME.

As clarified in RFC 2616:

QUOTE
19.4.7 MHTML and Line Length Limitations

   HTTP implementations which share code with MHTML [45] implementations
   need to be aware of MIME line length limitations. Since HTTP does not
   have this limitation, HTTP does not fold long lines. MHTML messages
   being transported by HTTP follow all conventions of MHTML, including
   line length limitations and folding, canonicalization, etc., since
   HTTP transports all message-bodies as payload (see section 3.7.2) and
   does not interpret the content or any MIME header lines that might be
   contained therein.
UNQUOTE

In this bug we're doing HTTP, so we don't have to fold anything, from RFC standpoint.  What we may fear (and avoid, to obey Postel's principle) is some broken shared MHTML/HTTP _client_ implementation which had read 19.4.7 backwards and did actually impose some limit to any MIME encoded data...
(Assignee)

Comment 37

8 years ago
Created attachment 403717 [details] [diff] [review]
v5

Okay, modified BPL and the comment.
Attachment #395423 - Attachment is obsolete: true
Attachment #403717 - Flags: review?(wurblzap)
(Assignee)

Updated

8 years ago
Target Milestone: Bugzilla 3.2 → Bugzilla 3.4
Comment on attachment 403717 [details] [diff] [review]
v5

All right
Attachment #403717 - Flags: review?(wurblzap) → review+
(Assignee)

Updated

8 years ago
Flags: approval3.4+
Flags: approval+
(Assignee)

Comment 39

8 years ago
tip:

Checking in attachment.cgi;
/cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v  <--  attachment.cgi
new revision: 1.161; previous revision: 1.160
done

3.4:

Checking in attachment.cgi;
/cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v  <--  attachment.cgi
new revision: 1.153.2.2; previous revision: 1.153.2.1
done
Status: ASSIGNED → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED

Comment 40

6 years ago
I'm still experiencing trouble with this on Bugzilla 3.6, depending on what browser is being used.

Firefox works okay, but when I click on the Russian attachment above in IE (including IE8), it loads a page of garbage.  If I go to "save as", the filename shows up as:
=_UTF-8_Q_=D0=A2=D0=B5=D0=BA=D1=81=D1=82=D0=BE=D0=B2=D1=8B=D0=B9=20=D0=B4=D0=BE=D0=BA=D1=83=D0=BC=D0=B5=D0=BD=D1=82_=.htm

(I believe the .htm comes from the encoded stream being too long, so it lost the original extension.)

I have the same issue as the guy who posted bug 481821 where my Japanese users are trying to upload files with kanji in the filename.  IE is really dominant in Japan, so a solution that works on multiple browsers would be helpful.
Bugs will only get reopened if the checked-in patch needs to be backed out. That's not the case here.

You're probably best off if you file a new bug for this, if there isn't already one, and refer to it from here in a comment.

I'm seeing the effect, too, btw :)

Comment 42

6 years ago
All of the related bugs I could find were marked as duplicates of this one.  Since I think its still an issue, I have opened up a new bug report at bug 619847.

Sorry for the extra email.

Comment 43

4 years ago
Kool

Comment 44

4 years ago
Comment on attachment 403717 [details] [diff] [review]
v5

>Index: attachment.cgi
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v
>retrieving revision 1.160
>diff -u -r1.160 attachment.cgi
>--- attachment.cgi	28 Sep 2009 17:24:16 -0000	1.160
>+++ attachment.cgi	30 Sep 2009 06:55:54 -0000
>@@ -52,6 +52,8 @@
> use Bugzilla::Token;
> use Bugzilla::Keyword;
> 
>+use Encode qw(encode);
>+
> # For most scripts we don't make $cgi and $template global variables. But
> # when preparing Bugzilla for mod_perl, this script used these
> # variables in so many subroutines that it was easier to just
>@@ -318,6 +320,11 @@
>     $filename =~ s/\\/\\\\/g; # escape backslashes
>     $filename =~ s/"/\\"/g; # escape quotes
> 
>+    # Avoid line wrapping done by Encode, which we don't need for HTTP
>+    # headers. See discussion in bug 328628 for details.
>+    local $Encode::Encoding{'MIME-Q'}->{'bpl'} = 10000;
>+    $filename = encode('MIME-Q', $filename);
>+
>     my $disposition = Bugzilla->params->{'allow_attachment_display'} ? 'inline' : 'attachment';
> 
>     print $cgi->header(-type=>"$contenttype; name=\"$filename\"",
You need to log in before you can comment on or make changes to this bug.