Closed Bug 328628 Opened 18 years ago Closed 15 years ago

Attachment filenames having UTF8 characters in them are rendered incorrectly

Categories

(Bugzilla :: Attachments & Requests, defect)

defect
Not set
minor

Tracking

()

RESOLVED FIXED
Bugzilla 3.4

People

(Reporter: bugzilla.10.deep125, Assigned: mkanat)

References

Details

Attachments

(2 files, 5 obsolete files)

User-Agent:       Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MyIE2; .NET CLR 1.1.4322)
Build Identifier: 2.21.1+(download by CVS from 2006-02-20)

If try download attacment file created with Russian filename then bad name

Reproducible: Always

Steps to Reproduce:
1.Download attachment for this bug
2.
3.

Actual Results:  
Bad file name for save
This Bugzila server work good.

I am use server with configuration in bug 328013.

I am get HTTP header as

Content-disposition: inline; filename="Текстовый документ.xxx"
Content-Type: application/octet-stream; name="Текстовый документ.xxx"


it is not rfc2231
Non-ascii characters in file names must be base64 or qp encoded just like subject in mail. See patch. Changes in encode_qp_words are made because first word don't have space before.
Attachment #216729 - Flags: review?
Attachment #216729 - Flags: review? → review?(mkanat)
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

I don't know anything about this code; probably best to ask Wurblzap or glob.
Attachment #216729 - Flags: review?(mkanat) → review?(wurblzap)
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

r=glob by inspection
Attachment #216729 - Flags: review+
Comment on attachment 216729 [details] [diff] [review]
QP encode file name of attachment

This doesn't work if the first word is 7bit-clean and the second is not. You probably need to move the if($first) part outside if(!is_7bit_clean($word)).

Nit 1: you're appending '_' to $qp_head in every loop -- it might be easier to put the if($first) part after the push, and move the "else" part of if($first) into the "then" part. (This means that you need to merge $qp_head = '=UTF-8?Q?' with my $qp_head, of course.)

Nit 2: line breaks at curly braces should be as shown in the example at http://www.bugzilla.org/docs/developer.html#perl-style.
Attachment #216729 - Flags: review?(wurblzap) → review-
Assignee: attach-and-request → srr
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Target Milestone: --- → Bugzilla 2.24
Thank you, Marc!

Please review new version of patch
Attachment #216729 - Attachment is obsolete: true
Attachment #224989 - Flags: review?(wurblzap)
Status: NEW → ASSIGNED
Comment on attachment 224989 [details] [diff] [review]
QP encode file name of attachment v.2

That was quick. r=wurblzap by inspection.

Do you want to sign off on the updated patch, glob?
Attachment #224989 - Flags: review?(wurblzap)
Attachment #224989 - Flags: review?(bugzilla)
Attachment #224989 - Flags: review+
Marc, Byron, what's new about this patch/bug?
Confirmed and patched (the wrong way) independently while working on Bugzilla-ru project.
Confirmed in 3.0 for Internet Explorer 7.

Target milestone missed...
Attached patch Patch 1.2 (obsolete) — Splinter Review
Unrotted.
Attachment #224989 - Attachment is obsolete: true
Attachment #276768 - Flags: review?(bugzilla)
Attachment #224989 - Flags: review?(bugzilla)
Comment on attachment 276768 [details] [diff] [review]
Patch 1.2

r=glob
Attachment #276768 - Flags: review?(bugzilla) → review+
Flags: approval?
Flags: approval3.0?
Comment on attachment 276768 [details] [diff] [review]
Patch 1.2

This patch crashes Firefox when viewing an attachment with a name such as bug0öäüöüéàè-cvs.diff
Attachment #276768 - Flags: review-
Flags: approval?
Flags: approval3.0?
Confirming the crash using the file name Frédéric used. I found out what's so special about this file name. It worked for me with others; the difference is it's resulting encoded length. It turns out that encode('MIME-Q') inserts a line break and a blank whenever the resulting string length exceeds a certain limit (probably 80 or 76 or something). With this, Bugzilla produces a multiple-line HTTP header entry, like this:

  Content-Type: text/plain; name="=?UTF-8?Q?bug=200=C3=B6=C3=A4=C3=BC?=
   =?UTF-8?Q?=2Dcvs?=.diff"; charset=UTF-8

I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be able to parse this and says "malformed header from script. Bad header=?UTF-8?Q?cvs?" (error 500). It s complaining about the beginning of the second line makes my current working theory to be that Apache is not able to cope with multiple-line header entries. Let's see whether I can find a way around this.
Attachment 213229 [details] may be used for testing.  Its name is just "text file" (in Russian) but again long enough when encoded.
(In reply to comment #16)

> I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be
> able to parse this and says "malformed header from script.

RFC 2616 section 2.2:

QUOTE

The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14].

       TEXT           = <any OCTET except CTLs,
                        but including LWS>
UNQUOTE

As far as I can read English, it is hard to tell whether Apache considered "message parser" in the above context.  RFC 2047 is not mentioned elsewhere.

IMHO we should try to obey the Postel's principle, and generate headers safely parseable by non-liberal Apache...
Attached patch Patch 1.3 (obsolete) — Splinter Review
Ok, next round. I didn't trust MIME::encode_qp to do everything all right in UTF-8 and non-UTF-8 cases (plus it doesn't add necessary escape sequences on its own), so I stuck to Encode::encode, fixing its output.

I can't say it's a really elegant solution; if somebody comes up with a better one, I'll be happy to accept it. If not, I'd say we should take it for now. It works with
- short ASCII file names
- long ASCII file names
- short non-ASCII UTF-8 file names
- long non-ASCII UTF-8 file names

“Short” means Encode::encode doesn't line-wrap the file name. “Long” means it does. I successfully tested both cases of wrapping, inside of an escaped sequence and outside of it.
Assignee: srr → wurblzap
Attachment #276768 - Attachment is obsolete: true
Attachment #280276 - Flags: review?(LpSolit)
Aren't long lines cut with normal ASCII filenames?
Look at the discussion in bug 374424. It seems that using Encode is the wrong way to go. Why not using MIME::Words, see the patch there?
FWIW, I'm actually trying to get rid of the dependency on MIME-tools, which also pulls in MailTools and lots of other stuff that's not needed by Bugzilla.

If you can pull the necessary code out of MIME::Words and put it in Bugzilla::Util, that would also be fine.
Comment on attachment 280276 [details] [diff] [review]
Patch 1.3

Canceling review till last comments are answered.
Attachment #280276 - Flags: review?(LpSolit)
I suspect this bug needs an entirely new trunk patch now that bug 363153 is fixed.
The Bugzilla 3.0 branch is now locked to security bugs and dataloss fixes only. This bug doesn't fit into one of these two categories and is retargetted to 3.2 as part of a mass-change. To catch bugmails related to this mass-change, use lts081207 in your email client filter.
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2
I hope this bug summary is clearer.
Summary: Attachment File name in utf8 → Attachment filenames having UTF8 characters in them are rendered incorrectly
Comment on attachment 280276 [details] [diff] [review]
Patch 1.3

Okay, somebody should review this. Probably me.
Attachment #280276 - Flags: review?(mkanat)
Attached patch v4 (obsolete) — Splinter Review
Okay, this is a pretty simple two-line fix that fixes it for me.
Assignee: wurblzap → mkanat
Attachment #280276 - Attachment is obsolete: true
Attachment #395423 - Flags: review?(wurblzap)
Attachment #280276 - Flags: review?(mkanat)
Hey Wurblzap. Any chance of getting a review on this two-line patch some time soon?
(In reply to comment #30)
> Hey Wurblzap. Any chance of getting a review on this two-line patch some time
> soon?

Yeah... Actually, today or tomorrow might do it.
Attachment #395423 - Flags: review?(wurblzap) → review-
Comment on attachment 395423 [details] [diff] [review]
v4

While the approach is way better, I don't think we should take advantage of undocumented features. My Encode-2.35 doesn't have docs on the bpl variable.

If there's a newer version with appropriate docs, though, I'll be happy; I tested, and it works well for me.

The bytes per line (which bpl probably means) value seems arbitrary to me and justifies a code comment. It should probably be the maximally possible length of an encoded filename; definitely not less. According to our schema, the maximum filename length is 100 characters. Afaik every UTF-8 character can be represented by no more than four bytes. Each byte, when encoded, takes up three ASCII characters (=xx). So we're at 100×4×3=1200. Then there's some header and trailer.
(In reply to comment #32)
> (From update of attachment 395423 [details] [diff] [review])
> While the approach is way better, I don't think we should take advantage of
> undocumented features. My Encode-2.35 doesn't have docs on the bpl variable.

  We already are using it in Bugzilla::Mailer--we talked to the maintainer of the module and I believe that he recommended it.
All right. Turns out bug 411544 comment 6 wasn't all that far off then :)

So let's go with it. I'm sure you'll agree a code comment pointing to bug 411544 comment 21 would be good (for Mailer.pm, too, but that's a different story). Fix its value, too, and I'd say we're set.
Hrm... It seems that values above 998 violate RFC 2822 (bug 374424 comment 15 quotes RFC 2822). Does RFC 2822 apply here at all, or does it apply to email only?
(In reply to comment #35)
> Does RFC 2822 apply here at all, or does it apply to email only?

RFC 2822 is about email.  It explicitly refers to RFC 2045 and RFC 2047 when it comes to MIME.

As clarified in RFC 2616:

QUOTE
19.4.7 MHTML and Line Length Limitations

   HTTP implementations which share code with MHTML [45] implementations
   need to be aware of MIME line length limitations. Since HTTP does not
   have this limitation, HTTP does not fold long lines. MHTML messages
   being transported by HTTP follow all conventions of MHTML, including
   line length limitations and folding, canonicalization, etc., since
   HTTP transports all message-bodies as payload (see section 3.7.2) and
   does not interpret the content or any MIME header lines that might be
   contained therein.
UNQUOTE

In this bug we're doing HTTP, so we don't have to fold anything, from RFC standpoint.  What we may fear (and avoid, to obey Postel's principle) is some broken shared MHTML/HTTP _client_ implementation which had read 19.4.7 backwards and did actually impose some limit to any MIME encoded data...
Attached patch v5Splinter Review
Okay, modified BPL and the comment.
Attachment #395423 - Attachment is obsolete: true
Attachment #403717 - Flags: review?(wurblzap)
Target Milestone: Bugzilla 3.2 → Bugzilla 3.4
Comment on attachment 403717 [details] [diff] [review]
v5

All right
Attachment #403717 - Flags: review?(wurblzap) → review+
Flags: approval3.4+
Flags: approval+
tip:

Checking in attachment.cgi;
/cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v  <--  attachment.cgi
new revision: 1.161; previous revision: 1.160
done

3.4:

Checking in attachment.cgi;
/cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v  <--  attachment.cgi
new revision: 1.153.2.2; previous revision: 1.153.2.1
done
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
I'm still experiencing trouble with this on Bugzilla 3.6, depending on what browser is being used.

Firefox works okay, but when I click on the Russian attachment above in IE (including IE8), it loads a page of garbage.  If I go to "save as", the filename shows up as:
=_UTF-8_Q_=D0=A2=D0=B5=D0=BA=D1=81=D1=82=D0=BE=D0=B2=D1=8B=D0=B9=20=D0=B4=D0=BE=D0=BA=D1=83=D0=BC=D0=B5=D0=BD=D1=82_=.htm

(I believe the .htm comes from the encoded stream being too long, so it lost the original extension.)

I have the same issue as the guy who posted bug 481821 where my Japanese users are trying to upload files with kanji in the filename.  IE is really dominant in Japan, so a solution that works on multiple browsers would be helpful.
Bugs will only get reopened if the checked-in patch needs to be backed out. That's not the case here.

You're probably best off if you file a new bug for this, if there isn't already one, and refer to it from here in a comment.

I'm seeing the effect, too, btw :)
All of the related bugs I could find were marked as duplicates of this one.  Since I think its still an issue, I have opened up a new bug report at bug 619847.

Sorry for the extra email.
Kool
Comment on attachment 403717 [details] [diff] [review]
v5

>Index: attachment.cgi
>===================================================================
>RCS file: /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v
>retrieving revision 1.160
>diff -u -r1.160 attachment.cgi
>--- attachment.cgi	28 Sep 2009 17:24:16 -0000	1.160
>+++ attachment.cgi	30 Sep 2009 06:55:54 -0000
>@@ -52,6 +52,8 @@
> use Bugzilla::Token;
> use Bugzilla::Keyword;
> 
>+use Encode qw(encode);
>+
> # For most scripts we don't make $cgi and $template global variables. But
> # when preparing Bugzilla for mod_perl, this script used these
> # variables in so many subroutines that it was easier to just
>@@ -318,6 +320,11 @@
>     $filename =~ s/\\/\\\\/g; # escape backslashes
>     $filename =~ s/"/\\"/g; # escape quotes
> 
>+    # Avoid line wrapping done by Encode, which we don't need for HTTP
>+    # headers. See discussion in bug 328628 for details.
>+    local $Encode::Encoding{'MIME-Q'}->{'bpl'} = 10000;
>+    $filename = encode('MIME-Q', $filename);
>+
>     my $disposition = Bugzilla->params->{'allow_attachment_display'} ? 'inline' : 'attachment';
> 
>     print $cgi->header(-type=>"$contenttype; name=\"$filename\"",
You need to log in before you can comment on or make changes to this bug.