Closed
Bug 328628
Opened 18 years ago
Closed 15 years ago
Attachment filenames having UTF8 characters in them are rendered incorrectly
Categories
(Bugzilla :: Attachments & Requests, defect)
Bugzilla
Attachments & Requests
Tracking
()
RESOLVED
FIXED
Bugzilla 3.4
People
(Reporter: bugzilla.10.deep125, Assigned: mkanat)
References
Details
Attachments
(2 files, 5 obsolete files)
1.17 KB,
application/octet-stream
|
Details | |
1.10 KB,
patch
|
Wurblzap
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; MyIE2; .NET CLR 1.1.4322) Build Identifier: 2.21.1+(download by CVS from 2006-02-20) If try download attacment file created with Russian filename then bad name Reproducible: Always Steps to Reproduce: 1.Download attachment for this bug 2. 3. Actual Results: Bad file name for save
Reporter | ||
Comment 1•18 years ago
|
||
Reporter | ||
Comment 2•18 years ago
|
||
This Bugzila server work good. I am use server with configuration in bug 328013.
Reporter | ||
Comment 3•18 years ago
|
||
I am get HTTP header as Content-disposition: inline; filename="Текстовый документ.xxx" Content-Type: application/octet-stream; name="Текстовый документ.xxx" it is not rfc2231
Comment 4•18 years ago
|
||
Non-ascii characters in file names must be base64 or qp encoded just like subject in mail. See patch. Changes in encode_qp_words are made because first word don't have space before.
Attachment #216729 -
Flags: review?
Updated•18 years ago
|
Attachment #216729 -
Flags: review? → review?(mkanat)
Assignee | ||
Comment 5•18 years ago
|
||
Comment on attachment 216729 [details] [diff] [review] QP encode file name of attachment I don't know anything about this code; probably best to ask Wurblzap or glob.
Attachment #216729 -
Flags: review?(mkanat) → review?(wurblzap)
Comment on attachment 216729 [details] [diff] [review] QP encode file name of attachment r=glob by inspection
Attachment #216729 -
Flags: review+
Comment 7•18 years ago
|
||
Comment on attachment 216729 [details] [diff] [review] QP encode file name of attachment This doesn't work if the first word is 7bit-clean and the second is not. You probably need to move the if($first) part outside if(!is_7bit_clean($word)). Nit 1: you're appending '_' to $qp_head in every loop -- it might be easier to put the if($first) part after the push, and move the "else" part of if($first) into the "then" part. (This means that you need to merge $qp_head = '=UTF-8?Q?' with my $qp_head, of course.) Nit 2: line breaks at curly braces should be as shown in the example at http://www.bugzilla.org/docs/developer.html#perl-style.
Attachment #216729 -
Flags: review?(wurblzap) → review-
Updated•18 years ago
|
Assignee: attach-and-request → srr
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
Target Milestone: --- → Bugzilla 2.24
Comment 8•18 years ago
|
||
Thank you, Marc! Please review new version of patch
Attachment #216729 -
Attachment is obsolete: true
Updated•18 years ago
|
Attachment #224989 -
Flags: review?(wurblzap)
Updated•18 years ago
|
Status: NEW → ASSIGNED
Comment 9•18 years ago
|
||
Comment on attachment 224989 [details] [diff] [review] QP encode file name of attachment v.2 That was quick. r=wurblzap by inspection. Do you want to sign off on the updated patch, glob?
Attachment #224989 -
Flags: review?(wurblzap)
Attachment #224989 -
Flags: review?(bugzilla)
Attachment #224989 -
Flags: review+
Comment 10•18 years ago
|
||
Marc, Byron, what's new about this patch/bug?
Comment 11•18 years ago
|
||
Confirmed and patched (the wrong way) independently while working on Bugzilla-ru project.
Comment 12•17 years ago
|
||
Confirmed in 3.0 for Internet Explorer 7. Target milestone missed...
Comment 13•17 years ago
|
||
Unrotted.
Attachment #224989 -
Attachment is obsolete: true
Attachment #276768 -
Flags: review?(bugzilla)
Attachment #224989 -
Flags: review?(bugzilla)
Comment 14•17 years ago
|
||
Comment on attachment 276768 [details] [diff] [review] Patch 1.2 r=glob
Attachment #276768 -
Flags: review?(bugzilla) → review+
Updated•17 years ago
|
Flags: approval?
Flags: approval3.0?
Comment 15•17 years ago
|
||
Comment on attachment 276768 [details] [diff] [review] Patch 1.2 This patch crashes Firefox when viewing an attachment with a name such as bug0öäüöüéàè-cvs.diff
Attachment #276768 -
Flags: review-
Updated•17 years ago
|
Flags: approval?
Flags: approval3.0?
Comment 16•17 years ago
|
||
Confirming the crash using the file name Frédéric used. I found out what's so special about this file name. It worked for me with others; the difference is it's resulting encoded length. It turns out that encode('MIME-Q') inserts a line break and a blank whenever the resulting string length exceeds a certain limit (probably 80 or 76 or something). With this, Bugzilla produces a multiple-line HTTP header entry, like this: Content-Type: text/plain; name="=?UTF-8?Q?bug=200=C3=B6=C3=A4=C3=BC?= =?UTF-8?Q?=2Dcvs?=.diff"; charset=UTF-8 I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be able to parse this and says "malformed header from script. Bad header=?UTF-8?Q?cvs?" (error 500). It s complaining about the beginning of the second line makes my current working theory to be that Apache is not able to cope with multiple-line header entries. Let's see whether I can find a way around this.
Comment 17•17 years ago
|
||
Attachment 213229 [details] may be used for testing. Its name is just "text file" (in Russian) but again long enough when encoded.
Comment 18•17 years ago
|
||
(In reply to comment #16) > I'd say this conforms to section 4.2 of RFC 2616, but Apache seems not to be > able to parse this and says "malformed header from script. RFC 2616 section 2.2: QUOTE The TEXT rule is only used for descriptive field contents and values that are not intended to be interpreted by the message parser. Words of *TEXT MAY contain characters from character sets other than ISO-8859-1 [22] only when encoded according to the rules of RFC 2047 [14]. TEXT = <any OCTET except CTLs, but including LWS> UNQUOTE As far as I can read English, it is hard to tell whether Apache considered "message parser" in the above context. RFC 2047 is not mentioned elsewhere. IMHO we should try to obey the Postel's principle, and generate headers safely parseable by non-liberal Apache...
Comment 19•17 years ago
|
||
Ok, next round. I didn't trust MIME::encode_qp to do everything all right in UTF-8 and non-UTF-8 cases (plus it doesn't add necessary escape sequences on its own), so I stuck to Encode::encode, fixing its output. I can't say it's a really elegant solution; if somebody comes up with a better one, I'll be happy to accept it. If not, I'd say we should take it for now. It works with - short ASCII file names - long ASCII file names - short non-ASCII UTF-8 file names - long non-ASCII UTF-8 file names “Short” means Encode::encode doesn't line-wrap the file name. “Long” means it does. I successfully tested both cases of wrapping, inside of an escaped sequence and outside of it.
Assignee: srr → wurblzap
Attachment #276768 -
Attachment is obsolete: true
Attachment #280276 -
Flags: review?(LpSolit)
Comment 20•17 years ago
|
||
Aren't long lines cut with normal ASCII filenames?
Comment 21•17 years ago
|
||
Look at the discussion in bug 374424. It seems that using Encode is the wrong way to go. Why not using MIME::Words, see the patch there?
Assignee | ||
Comment 22•17 years ago
|
||
FWIW, I'm actually trying to get rid of the dependency on MIME-tools, which also pulls in MailTools and lots of other stuff that's not needed by Bugzilla. If you can pull the necessary code out of MIME::Words and put it in Bugzilla::Util, that would also be fine.
Comment 23•17 years ago
|
||
Comment on attachment 280276 [details] [diff] [review] Patch 1.3 Canceling review till last comments are answered.
Attachment #280276 -
Flags: review?(LpSolit)
Assignee | ||
Comment 24•16 years ago
|
||
I suspect this bug needs an entirely new trunk patch now that bug 363153 is fixed.
Comment 25•16 years ago
|
||
The Bugzilla 3.0 branch is now locked to security bugs and dataloss fixes only. This bug doesn't fit into one of these two categories and is retargetted to 3.2 as part of a mass-change. To catch bugmails related to this mass-change, use lts081207 in your email client filter.
Target Milestone: Bugzilla 3.0 → Bugzilla 3.2
Comment 26•15 years ago
|
||
I hope this bug summary is clearer.
Summary: Attachment File name in utf8 → Attachment filenames having UTF8 characters in them are rendered incorrectly
Assignee | ||
Comment 28•15 years ago
|
||
Comment on attachment 280276 [details] [diff] [review] Patch 1.3 Okay, somebody should review this. Probably me.
Attachment #280276 -
Flags: review?(mkanat)
Assignee | ||
Comment 29•15 years ago
|
||
Okay, this is a pretty simple two-line fix that fixes it for me.
Assignee: wurblzap → mkanat
Attachment #280276 -
Attachment is obsolete: true
Attachment #395423 -
Flags: review?(wurblzap)
Attachment #280276 -
Flags: review?(mkanat)
Assignee | ||
Comment 30•15 years ago
|
||
Hey Wurblzap. Any chance of getting a review on this two-line patch some time soon?
Comment 31•15 years ago
|
||
(In reply to comment #30) > Hey Wurblzap. Any chance of getting a review on this two-line patch some time > soon? Yeah... Actually, today or tomorrow might do it.
Updated•15 years ago
|
Attachment #395423 -
Flags: review?(wurblzap) → review-
Comment 32•15 years ago
|
||
Comment on attachment 395423 [details] [diff] [review] v4 While the approach is way better, I don't think we should take advantage of undocumented features. My Encode-2.35 doesn't have docs on the bpl variable. If there's a newer version with appropriate docs, though, I'll be happy; I tested, and it works well for me. The bytes per line (which bpl probably means) value seems arbitrary to me and justifies a code comment. It should probably be the maximally possible length of an encoded filename; definitely not less. According to our schema, the maximum filename length is 100 characters. Afaik every UTF-8 character can be represented by no more than four bytes. Each byte, when encoded, takes up three ASCII characters (=xx). So we're at 100×4×3=1200. Then there's some header and trailer.
Assignee | ||
Comment 33•15 years ago
|
||
(In reply to comment #32) > (From update of attachment 395423 [details] [diff] [review]) > While the approach is way better, I don't think we should take advantage of > undocumented features. My Encode-2.35 doesn't have docs on the bpl variable. We already are using it in Bugzilla::Mailer--we talked to the maintainer of the module and I believe that he recommended it.
Comment 34•15 years ago
|
||
All right. Turns out bug 411544 comment 6 wasn't all that far off then :) So let's go with it. I'm sure you'll agree a code comment pointing to bug 411544 comment 21 would be good (for Mailer.pm, too, but that's a different story). Fix its value, too, and I'd say we're set.
Comment 35•15 years ago
|
||
Hrm... It seems that values above 998 violate RFC 2822 (bug 374424 comment 15 quotes RFC 2822). Does RFC 2822 apply here at all, or does it apply to email only?
Comment 36•15 years ago
|
||
(In reply to comment #35) > Does RFC 2822 apply here at all, or does it apply to email only? RFC 2822 is about email. It explicitly refers to RFC 2045 and RFC 2047 when it comes to MIME. As clarified in RFC 2616: QUOTE 19.4.7 MHTML and Line Length Limitations HTTP implementations which share code with MHTML [45] implementations need to be aware of MIME line length limitations. Since HTTP does not have this limitation, HTTP does not fold long lines. MHTML messages being transported by HTTP follow all conventions of MHTML, including line length limitations and folding, canonicalization, etc., since HTTP transports all message-bodies as payload (see section 3.7.2) and does not interpret the content or any MIME header lines that might be contained therein. UNQUOTE In this bug we're doing HTTP, so we don't have to fold anything, from RFC standpoint. What we may fear (and avoid, to obey Postel's principle) is some broken shared MHTML/HTTP _client_ implementation which had read 19.4.7 backwards and did actually impose some limit to any MIME encoded data...
Assignee | ||
Comment 37•15 years ago
|
||
Okay, modified BPL and the comment.
Attachment #395423 -
Attachment is obsolete: true
Attachment #403717 -
Flags: review?(wurblzap)
Assignee | ||
Updated•15 years ago
|
Target Milestone: Bugzilla 3.2 → Bugzilla 3.4
Comment 38•15 years ago
|
||
Comment on attachment 403717 [details] [diff] [review] v5 All right
Attachment #403717 -
Flags: review?(wurblzap) → review+
Assignee | ||
Updated•15 years ago
|
Flags: approval3.4+
Flags: approval+
Assignee | ||
Comment 39•15 years ago
|
||
tip: Checking in attachment.cgi; /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v <-- attachment.cgi new revision: 1.161; previous revision: 1.160 done 3.4: Checking in attachment.cgi; /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v <-- attachment.cgi new revision: 1.153.2.2; previous revision: 1.153.2.1 done
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Comment 40•13 years ago
|
||
I'm still experiencing trouble with this on Bugzilla 3.6, depending on what browser is being used. Firefox works okay, but when I click on the Russian attachment above in IE (including IE8), it loads a page of garbage. If I go to "save as", the filename shows up as: =_UTF-8_Q_=D0=A2=D0=B5=D0=BA=D1=81=D1=82=D0=BE=D0=B2=D1=8B=D0=B9=20=D0=B4=D0=BE=D0=BA=D1=83=D0=BC=D0=B5=D0=BD=D1=82_=.htm (I believe the .htm comes from the encoded stream being too long, so it lost the original extension.) I have the same issue as the guy who posted bug 481821 where my Japanese users are trying to upload files with kanji in the filename. IE is really dominant in Japan, so a solution that works on multiple browsers would be helpful.
Comment 41•13 years ago
|
||
Bugs will only get reopened if the checked-in patch needs to be backed out. That's not the case here. You're probably best off if you file a new bug for this, if there isn't already one, and refer to it from here in a comment. I'm seeing the effect, too, btw :)
Comment 42•13 years ago
|
||
All of the related bugs I could find were marked as duplicates of this one. Since I think its still an issue, I have opened up a new bug report at bug 619847. Sorry for the extra email.
Comment 43•11 years ago
|
||
Kool
Comment 44•11 years ago
|
||
Comment on attachment 403717 [details] [diff] [review] v5 >Index: attachment.cgi >=================================================================== >RCS file: /cvsroot/mozilla/webtools/bugzilla/attachment.cgi,v >retrieving revision 1.160 >diff -u -r1.160 attachment.cgi >--- attachment.cgi 28 Sep 2009 17:24:16 -0000 1.160 >+++ attachment.cgi 30 Sep 2009 06:55:54 -0000 >@@ -52,6 +52,8 @@ > use Bugzilla::Token; > use Bugzilla::Keyword; > >+use Encode qw(encode); >+ > # For most scripts we don't make $cgi and $template global variables. But > # when preparing Bugzilla for mod_perl, this script used these > # variables in so many subroutines that it was easier to just >@@ -318,6 +320,11 @@ > $filename =~ s/\\/\\\\/g; # escape backslashes > $filename =~ s/"/\\"/g; # escape quotes > >+ # Avoid line wrapping done by Encode, which we don't need for HTTP >+ # headers. See discussion in bug 328628 for details. >+ local $Encode::Encoding{'MIME-Q'}->{'bpl'} = 10000; >+ $filename = encode('MIME-Q', $filename); >+ > my $disposition = Bugzilla->params->{'allow_attachment_display'} ? 'inline' : 'attachment'; > > print $cgi->header(-type=>"$contenttype; name=\"$filename\"",
You need to log in
before you can comment on or make changes to this bug.
Description
•