Closed
Bug 388723
Opened 17 years ago
Closed 16 years ago
Bugzilla 3.0 does not wrap Japanese comment text
Categories
(Bugzilla :: User Interface, defect)
Tracking
()
RESOLVED
FIXED
Bugzilla 3.0
People
(Reporter: mozilla-bugzilla, Assigned: mkanat)
References
Details
(Keywords: intl, regression)
Attachments
(2 files, 1 obsolete file)
4.77 KB,
patch
|
himorin
:
review+
glob
:
review+
|
Details | Diff | Splinter Review |
3.90 KB,
patch
|
glob
:
review+
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.11 (KHTML, like Gecko) Version/3.0.2 Safari/522.12 Build Identifier: We are in the process of upgrading from Bugzilla 2.18 to Bugzilla 3.0. We have both English and Japanese users. We set up a test installation of Bugzilla 3.0, imported our data (7 years worth), used the contrib/recode.pl script to convert it to UTF8, and switched Bugzilla to use UTF8. While testing out Bugzilla 3.0, we found that Bugzilla no longer wraps Japanese text in newly submitted comments. I assume this was introduced by the changes in bug 11901, which was committed in Bugzilla 2.20. This is preventing us from upgrading to Bugzilla 3.0. I will add another comment in Japanese to demonstrate. Reproducible: Always Here is the output from checksetup.pl for our version info: ---------------------------------------- * This is Bugzilla 3.0+ on perl 5.8.8 * Running on FreeBSD 6.1-RELEASE-p14 FreeBSD 6.1-RELEASE-p14 <snip> Checking perl modules... Checking for CGI (v2.93) ok: found v3.15 Checking for TimeDate (v2.21) ok: found v2.22 Checking for DBI (v1.41) ok: found v1.53 Checking for PathTools (v0.84) ok: found v3.24 Checking for Template-Toolkit (v2.12) ok: found v2.16 Checking for Email-Send (v2.00) ok: found v2.183 Checking for Email-MIME-Modifier (any) ok: found v1.441 Checking available perl DBD modules... Checking for DBD-Pg (v1.45) not found Checking for DBD-mysql (v2.9003) ok: found v4.001 The following Perl modules are optional: Checking for GD (v1.20) ok: found v2.35 Checking for Template-GD (any) ok: found v1.56 Checking for Chart (v1.0) ok: found v2.4.1 Checking for GDGraph (any) ok: found v1.4308 Checking for GDTextUtil (any) ok: found v0.86 Checking for XML-Twig (any) not found Checking for MIME-tools (v5.406) ok: found v5.420 Checking for libwww-perl (any) ok: found v2.033 Checking for PatchReader (v0.9.4) not found Checking for PerlMagick (any) ok: found v6.3.2 Checking for perl-ldap (any) not found Checking for SOAP-Lite (any) not found Checking for HTML-Parser (v3.40) ok: found v3.56 Checking for HTML-Scrubber (any) ok: found v0.08 Checking for Email-MIME-Attachment-Stripper (any) not found Checking for Email-Reply (any) not found Checking for mod_perl (v1.999022) not found Checking for CGI (v3.11) ok: found v3.15 Checking for Apache-DBI (v0.96) not found <snip> Checking for DBD-mysql (v2.9003) ok: found v4.001 Checking for MySQL (v4.1.2) ok: found v4.1.22 Checking for GraphViz (any) ok: found ------------------------------
英語だときちんと改行してくれますが、日本語で改行しないとWindow幅がどんどんひろくなってしまいますねー。これは前にもあった問題だったかもしれませんがどーだったのか良く覚えてないですねー。見ている環境はとりあえずFirefoxだけです。
Comment 2•17 years ago
|
||
i think it comes from the specification of Text::Wrap (cannot handle multibyte/ multi-width UTF-8 chars) workaround (used in our Bugzilla-jp : http://bugzilla.mozilla.gr.jp) is * disable Text::Wrap handlers * insert wrap="hard" to all textareas # i think i've heard that current bugzilla.org's not targetting 'truely i18n'... # (so, i have no hope to handle Japanese with original bugzilla distribution..) # so, many many problems will be occur to use in multi-byte UTF-8 languages :)
Thank you for the info, Shimono. Bugzilla-jp http://bugzilla.mozilla.gr.jp displays that it is running version 3.0-ja3. I can only see a link to the diff files for 2.20-ja at ftp://ftp.mozilla.gr.jp/Bugzilla-ja/2.20-ja/. Where can I find the diffs for 3.0-ja3? I have found Text::WrapI18N http://search.cpan.org/~kubota/Text-WrapI18N-0.06/WrapI18N.pm that has "support for multibyte, fullwidth, and combining characters and languages without whitespaces between words". Maybe this could be used in place of Text::Wrap?
Assignee | ||
Comment 4•17 years ago
|
||
Bugzilla does aim to have complete i18n support. You have demonstrated this bug very nicely. :-) Technically this is a regression of bug 282349. I'll investigate Text::WrapI18N.
Status: UNCONFIRMED → NEW
Depends on: 282349
Ever confirmed: true
Keywords: intl,
regression
Target Milestone: --- → Bugzilla 3.0
Version: unspecified → 2.20
Assignee | ||
Comment 5•17 years ago
|
||
Text::WrapI18N hasn't been updated since 2003 (though that might be OK) and doesn't support $Text::Wrap::huge, which would make it difficult to drop-in as a replacement for Text::Wrap. (It also doesn't support some other variables that we might need.) I'd be interested to see what Text::Wrap does if the string actually has the utf8 bit set in perl. (Maybe it does the exact same thing, but I have no idea.) (In order to do this, we have to have the various DBD drivers return data tagged as utf8, which we don't do yet, but is totally possible. We have a bug for it somewhere, but I'm not finding it.)
Assignee: ui → mkanat
Assignee | ||
Comment 6•17 years ago
|
||
The bug that I couldn't find was bug 363153.
Comment 7•17 years ago
|
||
I don't trust Text::WrapI18N too much as it depends on /usr/bin/locale.
Assignee | ||
Comment 8•17 years ago
|
||
(In reply to comment #7) > I don't trust Text::WrapI18N too much as it depends on /usr/bin/locale. Ah, okay. Yeah, that wouldn't work on Windows.
Comment 9•17 years ago
|
||
(In reply to comment #3) > Bugzilla-jp http://bugzilla.mozilla.gr.jp displays that it is running version > 3.0-ja3. I can only see a link to the diff files for 2.20-ja at > ftp://ftp.mozilla.gr.jp/Bugzilla-ja/2.20-ja/. Where can I find the diffs for > 3.0-ja3? ah,,, sorry.. i don't have enough time to update that page.. (even updating static html :p) please check out from http://svn.mozilla.gr.jp/ > I have found Text::WrapI18N > http://search.cpan.org/~kubota/Text-WrapI18N-0.06/WrapI18N.pm that has "support > for multibyte, fullwidth, and combining characters and languages without > whitespaces between words". Maybe this could be used in place of Text::Wrap? WrapI18N is first written for debian d-i (as my knowledge). i've failed to change Wrap to WrapI18N on bug-ja 2.20-ja code that i couldn't work WrapI18N on our server CentOS. (In reply to comment #5) > I'd be interested to see what Text::Wrap does if the string actually has the > utf8 bit set in perl. (Maybe it does the exact same thing, but I have no idea.) > (In order to do this, we have to have the various DBD drivers return data > tagged as utf8, which we don't do yet, but is totally possible. We have a bug > for it somewhere, but I'm not finding it.) current version of DBD drivers for mysql, pgsql can do that. (set utf8 flag) i've installed 'utf8' flag, but i had a serious bug related on CGI.pm
Assignee | ||
Comment 10•17 years ago
|
||
(In reply to comment #9) Hey himorin. I just checked in a patch to the Bugzilla CVS HEAD that sets the utf8 bit on correctly on strings. Could you tell me what Text::Wrap does now? I suspect it still won't work. I think we have two options: 1) We could set $Text::Wrap::break appropriately, somehow. There might be a good Unicode regex that would work. 2) Possibly look into using Unicode::Wrap?
Comment 11•17 years ago
|
||
As I noted on the ML, Unicode::Wrap is not a direct replacement, and might have performance problems "This module can be slow. It's a pure-Perl implementation that goes through an expensive classification process per character." This said Text::WrapI18N apparently only does CJK, also the documentation says "The texts have to be written in locale encoding."
Comment 12•17 years ago
|
||
(In reply to comment #10) > Hey himorin. I just checked in a patch to the Bugzilla CVS HEAD that sets the > utf8 bit on correctly on strings. Could you tell me what Text::Wrap does now? I > suspect it still won't work. yeah, it *still* doesn't work. see. http://bugzilla-trunk.test.mozilla.gr.jp/show_bug.cgi?id=5952 # today (20071127 1130JST)'s cvs head > I think we have two options: > > 1) We could set $Text::Wrap::break appropriately, somehow. There might be a > good Unicode regex that would work. sorry, i have no idea for good unicode (utf-8 byte sequence) regex.. of cource, we can cut when first bit (8bit) is set,, but it's ignoring some unicode features. # like http://services.mozilla.gr.jp/viewsvn/trunk/I18N.pm?revision=111&view=markup The most difficult thing, i think, is about composite griph. > 2) Possibly look into using Unicode::Wrap? i agree with comment #11
Reporter | ||
Comment 13•17 years ago
|
||
My company is very anxious for a fix for this issue (we're stuck at Bugzilla 2.18 because of it). It sounds like Text:Wrap, Text::WrapI18N, and Unicode::Wrap each have their own issues. Are there any other options? How can we help? (We're coders, but our perl experience is limited to simple customization changes to Bugzilla.)
Assignee | ||
Comment 14•17 years ago
|
||
(In reply to comment #13) > How can we help? (We're coders, but our perl experience is limited to simple > customization changes to Bugzilla.) Hrm. You could do a detailed performance analysis of Unicode::Wrap vs. Text::Wrap, that would help.
Comment 15•16 years ago
|
||
Couldn't install Unicode::Wrap 0.0.3 on perl 5.8.5.. It seems some bugs are in version 0.0.3? (refer http://cpantesters.perl.org/show/Unicode-Wrap.html#Unicode-Wrap-0.03, too.) # But could install 0.0.1 hmm. What's the other solutions? Of cource, i think we can include some perl module like Wrap.pm (of Unicode::Wrap) in util, technically.. Or, change the template for showing bugs from <pre> to <div> or something?
Assignee | ||
Comment 16•16 years ago
|
||
Just so you know, TASCAM has contracted Everything Solved (my company) to provide a solution for this bug, for them. It might be specific to their needs (Japanese and English), or it might be generally applicable. Either way I'll attach it here.
Assignee: ui → mkanat
Assignee | ||
Comment 17•16 years ago
|
||
Okay, here is a tip patch that handles this problem as far as I can see. Eseentially, what it does is use CSS 3 and custom browser CSS to make preformatted text wrap correctly. This wraps Chinese, Japense, and Korean correctly, while Bugzilla itself still does the wrapping of space-containing languages internally using Text::Wrap. Originally, when writing this patch, I removed wrap_comment and just had the browser do all the wrapping. However, this wouldn't work in browsers like lynx. It also made Bugzilla pretty dependent upon modern browsers just to display correctly--for example, there's a good chance that the code wouldn't have worked at all in Safari 2.x. I'd actually love to remove wrap_comment and let the browser do all the work, but Safari was really a problem. I don't want to exclude Safari 1.0 users, even, really, from seeing correctly-wrapped comments.
Attachment #298595 -
Flags: review?(shimono)
Attachment #298595 -
Flags: review?(LpSolit)
Assignee | ||
Updated•16 years ago
|
Attachment #298595 -
Flags: review?(wurblzap)
Attachment #298595 -
Flags: review?(bugzilla)
Attachment #298595 -
Flags: review?(LpSolit)
Assignee | ||
Comment 18•16 years ago
|
||
By the way, I did a tremendous amount of research for this bug. Unicode::Wrap had a bug since perl 5.8.4, and even when I fixed that bug, the module didn't pass its own tests. I spent a bit of time trying to fix it, and didn't get anywhere, probably because I don't understand UAX#14 (the Unicode Standard Annex 14--the specification for line-breaking in all languages, which is huge and too long for me to read and re-implement as part of this bug). Text::Wrap::I18N was from Debian originally, I think, and so of course is dependent upon /usr/bin/locale. There's a module called Text::ChaSen that can process Japanese text into "phrases" or "sentences", but it's dependent upon a third-party library called ChaSen that has to be compiled and installed, from my understanding. Also, its documentation is not in English. Finally, the solution would have been limited to Japanese in any case. The nice advantage of browser-side wrapping is that if browsers improve their UAX#14 compliance in the future, Bugzilla gets that functionality automatically. The only thing this patch doesn't fix is emails. That would have required one of the above server-side solutions, or I could alternately make Bugzilla wrap_hard for emails, but that would break mail clients' URL highlighting for comments.
Status: NEW → ASSIGNED
Assignee | ||
Comment 19•16 years ago
|
||
Oh, I realized that quoted text was being wrapped, incorrectly. This patch fixes it.
Attachment #298595 -
Attachment is obsolete: true
Attachment #298604 -
Flags: review?(wurblzap)
Attachment #298604 -
Flags: review?(shimono)
Attachment #298595 -
Flags: review?(wurblzap)
Attachment #298595 -
Flags: review?(shimono)
Attachment #298595 -
Flags: review?(bugzilla)
Comment 20•16 years ago
|
||
Comment on attachment 298604 [details] [diff] [review] v2 (tip) 1. I think we can remove 'FILTER wrap_comment' from templates. 2. Shouldn't we change the comment display width from >+.bz_comment_text { >+ width: 50em; >+} to something like 'width: xx%;'?
Attachment #298604 -
Flags: review?(shimono) → review-
Assignee | ||
Comment 21•16 years ago
|
||
(In reply to comment #20) > (From update of attachment 298604 [details] [diff] [review]) > 1. I think we can remove 'FILTER wrap_comment' from templates. Did you read my comments above? > 2. Shouldn't we change the comment display width from > >+.bz_comment_text { > >+ width: 50em; > >+} > to something like 'width: xx%;'? No, comments have a fixed width.
Comment 22•16 years ago
|
||
(In reply to comment #21) > > (From update of attachment 298604 [details] [diff] [review] [details]) > > 1. I think we can remove 'FILTER wrap_comment' from templates. > > Did you read my comments above? Yeah. If you want to support Safari or something which don't support CSS 3 features, I think this bug do not resolved the patch above, as you said. In that case, we should make some module using UAX 14. (Or 'remove' PRE formatting and use DIV with fixed fonts.) # I forgot to test with long texts which include us-ascii and the others mixed. > > 2. Shouldn't we change the comment display width from > > >+.bz_comment_text { > > >+ width: 50em; > > >+} > > to something like 'width: xx%;'? > > No, comments have a fixed width. Ok. At least, i think we might change this to match with the hard wrapping for us-ascii.
Comment 23•16 years ago
|
||
(In reply to comment #22) > > > (From update of attachment 298604 [details] [diff] [review] [details] [details]) > > > 1. I think we can remove 'FILTER wrap_comment' from templates. > > > > Did you read my comments above? > > Yeah. > If you want to support Safari or something which don't support CSS 3 features, > I think this bug do not resolved the patch above, as you said. In that case, we > should make some module using UAX 14. (Or 'remove' PRE formatting and use DIV > with fixed fonts.) mm, i totally forgot about users only using english... For them, current wrap_comment system might be the most suitable one.
Assignee | ||
Comment 24•16 years ago
|
||
(In reply to comment #22) > we should make some module using UAX 14. That's really easy to *say*. Have you read UAX 14? Do *you* want to fix Unicode::Wrap? > # I forgot to test with long texts which include us-ascii and the others mixed. I tested it. > At least, i think we might change this to match with the hard wrapping for > us-ascii. I didn't do that on purpose, because it could possibly re-wrap the already-wrapped us-ascii comments, and "em" is not a precise measure. (It depends on fonts installed, so you have to leave some headroom.) (In reply to comment #23) > mm, i totally forgot about users only using english... Or *any* non-CJK language.
Comment 25•16 years ago
|
||
(In reply to comment #24) > > we should make some module using UAX 14. > > That's really easy to *say*. Have you read UAX 14? Do *you* want to fix > Unicode::Wrap? I've read, and i've selected to set 'wrap=hard' on html/textarea rather than to write some code for wrapping for our bugzilla site.. # UAX#14 was one problem, and the "composite glyph" was another... > > # I forgot to test with long texts which include us-ascii and the others mixed. > > I tested it. thx. > > At least, i think we might change this to match with the hard wrapping for > > us-ascii. > > I didn't do that on purpose, because it could possibly re-wrap the > already-wrapped us-ascii comments, and "em" is not a precise measure. (It > depends on fonts installed, so you have to leave some headroom.) > > (In reply to comment #23) > > mm, i totally forgot about users only using english... > > Or *any* non-CJK language. Thank you for teaching that. And, now, we (japanese mozilla contributors) are discussing about this patch. Could you wait it? (for more "a few" days.)
Assignee | ||
Comment 26•16 years ago
|
||
(In reply to comment #25) > And, now, we (japanese mozilla contributors) are discussing about this patch. > Could you wait it? (for more "a few" days.) Yeah, sure.
Comment 27•16 years ago
|
||
(In reply to comment #26) > > And, now, we (japanese mozilla contributors) are discussing about this patch. > > Could you wait it? (for more "a few" days.) > > Yeah, sure. As talking with -ja contributors, i add r+ for this patch. # If this happens with only CJK, it might be ok to do with simple sprintf for email? (of cource, if we ignore about composite gryphs)
Updated•16 years ago
|
Attachment #298604 -
Flags: review- → review+
Assignee | ||
Comment 28•16 years ago
|
||
Comment on attachment 298604 [details] [diff] [review] v2 (tip) Glob, could you give this a quick look-over to make sure it all seems sane to you?
Attachment #298604 -
Flags: review?(wurblzap) → review?(bugzilla)
Assignee | ||
Comment 29•16 years ago
|
||
(In reply to comment #27) > # If this happens with only CJK, it might be ok to do with simple sprintf for > email? (of cource, if we ignore about composite gryphs) That would be possible in a separate bug. Do you have a list of all the CJK codepoints?
Comment 30•16 years ago
|
||
Comment on attachment 298604 [details] [diff] [review] v2 (tip) the css to force word wrapping looks correct. it seems odd that we've got the IE fixes in a separate file, while all the other browser specific fixes are in global.css. ah well :) i'm going to r+ this, however.. >+.bz_comment_text { >+ width: 50em; >+} > /* For bug fields */ > .uneditable_textarea { >+ width: 30em; > } any reason for these being different widths?
Attachment #298604 -
Flags: review?(bugzilla) → review+
Assignee | ||
Comment 31•16 years ago
|
||
(In reply to comment #30) > any reason for these being different widths? Yeah, uneditable textareas show up in the editing form, so they're shorter. They're actually wrapped at 60 chars instead of 80.
Assignee | ||
Comment 32•16 years ago
|
||
I still need to backport this to 3.0, but I want to approve it for 3.2 to keep it in our radar.
Flags: approval+
Assignee | ||
Comment 33•16 years ago
|
||
Here's the same patch for 3.0. I tested it and it seems to be fine.
Attachment #301973 -
Flags: review?(bugzilla)
Comment 34•16 years ago
|
||
Comment on attachment 301973 [details] [diff] [review] v2 (3.0) r=glob
Attachment #301973 -
Flags: review?(bugzilla) → review+
Assignee | ||
Updated•16 years ago
|
Flags: approval3.0+
Assignee | ||
Comment 35•16 years ago
|
||
Thanks for all the help and reviews, everybody! :-) tip: Checking in skins/standard/IE-fixes.css; /cvsroot/mozilla/webtools/bugzilla/skins/standard/IE-fixes.css,v <-- IE-fixes.css new revision: 1.2; previous revision: 1.1 done Checking in skins/standard/global.css; /cvsroot/mozilla/webtools/bugzilla/skins/standard/global.css,v <-- global.css new revision: 1.45; previous revision: 1.44 done Checking in template/en/default/attachment/midair.html.tmpl; /cvsroot/mozilla/webtools/bugzilla/template/en/default/attachment/midair.html.tmpl,v <-- midair.html.tmpl new revision: 1.2; previous revision: 1.1 done Checking in template/en/default/bug/comments.html.tmpl; /cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/comments.html.tmpl,v <-- comments.html.tmpl new revision: 1.36; previous revision: 1.35 done Checking in template/en/default/bug/process/midair.html.tmpl; /cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/process/midair.html.tmpl,v <-- midair.html.tmpl new revision: 1.22; previous revision: 1.21 done Checking in template/en/default/pages/linked.html.tmpl; /cvsroot/mozilla/webtools/bugzilla/template/en/default/pages/linked.html.tmpl,v <-- linked.html.tmpl new revision: 1.10; previous revision: 1.9 done 3.0: Checking in skins/standard/IE-fixes.css; /cvsroot/mozilla/webtools/bugzilla/skins/standard/IE-fixes.css,v <-- IE-fixes.css new revision: 1.1.2.1; previous revision: 1.1 done Checking in skins/standard/global.css; /cvsroot/mozilla/webtools/bugzilla/skins/standard/global.css,v <-- global.css new revision: 1.31.2.3; previous revision: 1.31.2.2 done Checking in template/en/default/bug/comments.html.tmpl; /cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/comments.html.tmpl,v <-- comments.html.tmpl new revision: 1.28.2.1; previous revision: 1.28 done Checking in template/en/default/bug/process/midair.html.tmpl; /cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/process/midair.html.tmpl,v <-- midair.html.tmpl new revision: 1.19.2.1; previous revision: 1.19 done Checking in template/en/default/pages/linked.html.tmpl; /cvsroot/mozilla/webtools/bugzilla/template/en/default/pages/linked.html.tmpl,v <-- linked.html.tmpl new revision: 1.8.8.1; previous revision: 1.8 done
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Comment 36•16 years ago
|
||
Note that Firefox now throws tons of warnings in the Error console about these two lines: white-space: -pre-wrap; /* Opera 4-6 */ white-space: -o-pre-wrap; /* Opera 7 */ It seems it doesn't know what to do with them.
Comment 37•16 years ago
|
||
sorry that i forgot this. (In reply to comment #29) > > # If this happens with only CJK, it might be ok to do with simple sprintf for > > email? (of cource, if we ignore about composite gryphs) > > That would be possible in a separate bug. Do you have a list of all the CJK > codepoints? you can use scripts or block identifier in regexp. http://www.unicode.org/Public/UNIDATA/Scripts.txt http://www.unicode.org/Public/UNIDATA/Blocks.txt like $string =~ /\p{InCJKSymbolsAndPunctuation}/;
Assignee | ||
Comment 38•16 years ago
|
||
(In reply to comment #37) > you can use scripts or block identifier in regexp. > http://www.unicode.org/Public/UNIDATA/Scripts.txt > http://www.unicode.org/Public/UNIDATA/Blocks.txt > > like $string =~ /\p{InCJKSymbolsAndPunctuation}/; Okay, file this as a separate bug and CC me on it. I think that there might be some perl-version limitation on using those block names (like it might require 5.8.x > 5.8.1 or something), but that could work, yeah.
Comment 39•16 years ago
|
||
(In reply to comment #38) > Okay, file this as a separate bug and CC me on it. I think that there might > be some perl-version limitation on using those block names Not sure there is a limitation: http://www.unix.org.ua/orelly/perl/prog3/ch05_04.htm
You need to log in
before you can comment on or make changes to this bug.
Description
•