Closed Bug 388723 Opened 17 years ago Closed 16 years ago

Bugzilla 3.0 does not wrap Japanese comment text

Categories

(Bugzilla :: User Interface, defect)

2.20
defect
Not set
major

Tracking

()

RESOLVED FIXED
Bugzilla 3.0

People

(Reporter: mozilla-bugzilla, Assigned: mkanat)

References

Details

(Keywords: intl, regression)

Attachments

(2 files, 1 obsolete file)

User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.11 (KHTML, like Gecko) Version/3.0.2 Safari/522.12
Build Identifier: 

We are in the process of upgrading from Bugzilla 2.18 to Bugzilla 3.0.  We have both English and Japanese users.  We set up a test installation of Bugzilla 3.0, imported our data (7 years worth), used the contrib/recode.pl script to convert it to UTF8, and switched Bugzilla to use UTF8.

While testing out Bugzilla 3.0, we found that Bugzilla no longer wraps Japanese text in newly submitted comments.  I assume this was introduced by the changes in bug 11901, which was committed in Bugzilla 2.20.

This is preventing us from upgrading to Bugzilla 3.0.

I will add another comment in Japanese to demonstrate.



Reproducible: Always




Here is the output from checksetup.pl for our version info:
----------------------------------------
* This is Bugzilla 3.0+ on perl 5.8.8
* Running on FreeBSD 6.1-RELEASE-p14 FreeBSD 6.1-RELEASE-p14 <snip>

Checking perl modules...
Checking for             CGI (v2.93)   ok: found v3.15 
Checking for        TimeDate (v2.21)   ok: found v2.22 
Checking for             DBI (v1.41)   ok: found v1.53 
Checking for       PathTools (v0.84)   ok: found v3.24 
Checking for Template-Toolkit (v2.12)   ok: found v2.16 
Checking for      Email-Send (v2.00)   ok: found v2.183 
Checking for Email-MIME-Modifier (any)     ok: found v1.441 

Checking available perl DBD modules...
Checking for          DBD-Pg (v1.45)    not found 
Checking for       DBD-mysql (v2.9003) ok: found v4.001 

The following Perl modules are optional:
Checking for              GD (v1.20)   ok: found v2.35 
Checking for     Template-GD (any)     ok: found v1.56 
Checking for           Chart (v1.0)    ok: found v2.4.1 
Checking for         GDGraph (any)     ok: found v1.4308 
Checking for      GDTextUtil (any)     ok: found v0.86 
Checking for        XML-Twig (any)      not found 
Checking for      MIME-tools (v5.406)  ok: found v5.420 
Checking for     libwww-perl (any)     ok: found v2.033 
Checking for     PatchReader (v0.9.4)   not found 
Checking for      PerlMagick (any)     ok: found v6.3.2 
Checking for       perl-ldap (any)      not found 
Checking for       SOAP-Lite (any)      not found 
Checking for     HTML-Parser (v3.40)   ok: found v3.56 
Checking for   HTML-Scrubber (any)     ok: found v0.08 
Checking for Email-MIME-Attachment-Stripper (any)      not found 
Checking for     Email-Reply (any)      not found 
Checking for        mod_perl (v1.999022)  not found 
Checking for             CGI (v3.11)   ok: found v3.15 
Checking for      Apache-DBI (v0.96)    not found 

<snip>

Checking for       DBD-mysql (v2.9003) ok: found v4.001 
Checking for           MySQL (v4.1.2)  ok: found v4.1.22
Checking for        GraphViz (any)     ok: found
------------------------------
英語だときちんと改行してくれますが、日本語で改行しないとWindow幅がどんどんひろくなってしまいますねー。これは前にもあった問題だったかもしれませんがどーだったのか良く覚えてないですねー。見ている環境はとりあえずFirefoxだけです。
i think it comes from the specification of Text::Wrap (cannot handle multibyte/
multi-width UTF-8 chars)
workaround (used in our Bugzilla-jp : http://bugzilla.mozilla.gr.jp) is
* disable Text::Wrap handlers
* insert wrap="hard" to all textareas

# i think i've heard that current bugzilla.org's not targetting 'truely i18n'...
# (so, i have no hope to handle Japanese with original bugzilla distribution..)
# so, many many problems will be occur to use in multi-byte UTF-8 languages :)
Thank you for the info, Shimono.

Bugzilla-jp http://bugzilla.mozilla.gr.jp displays that it is running version 3.0-ja3.  I can only see a link to the diff files for 2.20-ja at ftp://ftp.mozilla.gr.jp/Bugzilla-ja/2.20-ja/.  Where can I find the diffs for 3.0-ja3?

I have found Text::WrapI18N http://search.cpan.org/~kubota/Text-WrapI18N-0.06/WrapI18N.pm that has "support for multibyte, fullwidth, and combining characters and languages without whitespaces between words".  Maybe this could be used in place of Text::Wrap?
Bugzilla does aim to have complete i18n support. You have demonstrated this bug very nicely. :-)

Technically this is a regression of bug 282349.

I'll investigate Text::WrapI18N.
Status: UNCONFIRMED → NEW
Depends on: 282349
Ever confirmed: true
Keywords: intl, regression
Target Milestone: --- → Bugzilla 3.0
Version: unspecified → 2.20
Text::WrapI18N hasn't been updated since 2003 (though that might be OK) and doesn't support $Text::Wrap::huge, which would make it difficult to drop-in as a replacement for Text::Wrap. (It also doesn't support some other variables that we might need.)

I'd be interested to see what Text::Wrap does if the string actually has the utf8 bit set in perl. (Maybe it does the exact same thing, but I have no idea.) (In order to do this, we have to have the various DBD drivers return data tagged as utf8, which we don't do yet, but is totally possible. We have a bug for it somewhere, but I'm not finding it.)
Assignee: ui → mkanat
The bug that I couldn't find was bug 363153.
I don't trust Text::WrapI18N too much as it depends on /usr/bin/locale.
(In reply to comment #7)
> I don't trust Text::WrapI18N too much as it depends on /usr/bin/locale.

  Ah, okay. Yeah, that wouldn't work on Windows.
(In reply to comment #3)
> Bugzilla-jp http://bugzilla.mozilla.gr.jp displays that it is running version
> 3.0-ja3.  I can only see a link to the diff files for 2.20-ja at
> ftp://ftp.mozilla.gr.jp/Bugzilla-ja/2.20-ja/.  Where can I find the diffs for
> 3.0-ja3?

ah,,, sorry.. i don't have enough time to update that page.. (even updating 
static html :p)
please check out from http://svn.mozilla.gr.jp/

> I have found Text::WrapI18N
> http://search.cpan.org/~kubota/Text-WrapI18N-0.06/WrapI18N.pm that has "support
> for multibyte, fullwidth, and combining characters and languages without
> whitespaces between words".  Maybe this could be used in place of Text::Wrap?

WrapI18N is first written for debian d-i (as my knowledge).
i've failed to change Wrap to WrapI18N on bug-ja 2.20-ja code that i couldn't
work WrapI18N on our server CentOS.

(In reply to comment #5)
> I'd be interested to see what Text::Wrap does if the string actually has the
> utf8 bit set in perl. (Maybe it does the exact same thing, but I have no idea.)
> (In order to do this, we have to have the various DBD drivers return data
> tagged as utf8, which we don't do yet, but is totally possible. We have a bug
> for it somewhere, but I'm not finding it.)

current version of DBD drivers for mysql, pgsql can do that. (set utf8 flag)
i've installed 'utf8' flag, but i had a serious bug related on CGI.pm
Assignee: mkanat → ui
Depends on: bz-utf8
(In reply to comment #9)
Hey himorin. I just checked in a patch to the Bugzilla CVS HEAD that sets the utf8 bit on correctly on strings. Could you tell me what Text::Wrap does now? I suspect it still won't work.

I think we have two options:

1) We could set $Text::Wrap::break appropriately, somehow. There might be a good Unicode regex that would work.

2) Possibly look into using Unicode::Wrap?
As I noted on the ML, Unicode::Wrap is not a direct replacement, and might have performance problems "This module can be slow. It's a pure-Perl implementation that goes through an expensive classification process per character." 

This said Text::WrapI18N apparently only does CJK, also the documentation says "The texts have to be written in locale encoding."
(In reply to comment #10)
> Hey himorin. I just checked in a patch to the Bugzilla CVS HEAD that sets the
> utf8 bit on correctly on strings. Could you tell me what Text::Wrap does now? I
> suspect it still won't work.

yeah, it *still* doesn't work.
see. http://bugzilla-trunk.test.mozilla.gr.jp/show_bug.cgi?id=5952
# today (20071127 1130JST)'s cvs head

> I think we have two options:
> 
> 1) We could set $Text::Wrap::break appropriately, somehow. There might be a
> good Unicode regex that would work.

sorry, i have no idea for good unicode (utf-8 byte sequence) regex..
of cource, we can cut when first bit (8bit) is set,, but it's ignoring some unicode features.
# like http://services.mozilla.gr.jp/viewsvn/trunk/I18N.pm?revision=111&view=markup

The most difficult thing, i think, is about composite griph.

> 2) Possibly look into using Unicode::Wrap?

i agree with comment #11
My company is very anxious for a fix for this issue (we're stuck at Bugzilla 2.18 because of it).

It sounds like Text:Wrap, Text::WrapI18N, and Unicode::Wrap each have their own issues.  Are there any other options?

How can we help?  (We're coders, but our perl experience is limited to simple customization changes to Bugzilla.)
(In reply to comment #13)
> How can we help?  (We're coders, but our perl experience is limited to simple
> customization changes to Bugzilla.)

  Hrm. You could do a detailed performance analysis of Unicode::Wrap vs. Text::Wrap, that would help.
Couldn't install Unicode::Wrap 0.0.3 on perl 5.8.5.. It seems some bugs are in version 0.0.3? (refer http://cpantesters.perl.org/show/Unicode-Wrap.html#Unicode-Wrap-0.03, too.)
# But could install 0.0.1

hmm. What's the other solutions?
Of cource, i think we can include some perl module like Wrap.pm (of Unicode::Wrap) in util, technically..

Or, change the template for showing bugs from <pre> to <div> or something?
Just so you know, TASCAM has contracted Everything Solved (my company) to provide a solution for this bug, for them. It might be specific to their needs (Japanese and English), or it might be generally applicable. Either way I'll attach it here.
Assignee: ui → mkanat
Attached patch v1 (tip) (obsolete) — Splinter Review
Okay, here is a tip patch that handles this problem as far as I can see.

Eseentially, what it does is use CSS 3 and custom browser CSS to make preformatted text wrap correctly. This wraps Chinese, Japense, and Korean correctly, while Bugzilla itself still does the wrapping of space-containing languages internally using Text::Wrap.

Originally, when writing this patch, I removed wrap_comment and just had the browser do all the wrapping. However, this wouldn't work in browsers like lynx. It also made Bugzilla pretty dependent upon modern browsers just to display correctly--for example, there's a good chance that the code wouldn't have worked at all in Safari 2.x.

I'd actually love to remove wrap_comment and let the browser do all the work, but Safari was really a problem. I don't want to exclude Safari 1.0 users, even, really, from seeing correctly-wrapped comments.
Attachment #298595 - Flags: review?(shimono)
Attachment #298595 - Flags: review?(LpSolit)
Attachment #298595 - Flags: review?(wurblzap)
Attachment #298595 - Flags: review?(bugzilla)
Attachment #298595 - Flags: review?(LpSolit)
By the way, I did a tremendous amount of research for this bug.

Unicode::Wrap had a bug since perl 5.8.4, and even when I fixed that bug, the module didn't pass its own tests. I spent a bit of time trying to fix it, and didn't get anywhere, probably because I don't understand UAX#14 (the Unicode Standard Annex 14--the specification for line-breaking in all languages, which is huge and too long for me to read and re-implement as part of this bug).

Text::Wrap::I18N was from Debian originally, I think, and so of course is dependent upon /usr/bin/locale.

There's a module called Text::ChaSen that can process Japanese text into "phrases" or "sentences", but it's dependent upon a third-party library called ChaSen that has to be compiled and installed, from my understanding. Also, its documentation is not in English. Finally, the solution would have been limited to Japanese in any case.

The nice advantage of browser-side wrapping is that if browsers improve their UAX#14 compliance in the future, Bugzilla gets that functionality automatically.

The only thing this patch doesn't fix is emails. That would have required one of the above server-side solutions, or I could alternately make Bugzilla wrap_hard for emails, but that would break mail clients' URL highlighting for comments.
Status: NEW → ASSIGNED
Attached patch v2 (tip)Splinter Review
Oh, I realized that quoted text was being wrapped, incorrectly. This patch fixes it.
Attachment #298595 - Attachment is obsolete: true
Attachment #298604 - Flags: review?(wurblzap)
Attachment #298604 - Flags: review?(shimono)
Attachment #298595 - Flags: review?(wurblzap)
Attachment #298595 - Flags: review?(shimono)
Attachment #298595 - Flags: review?(bugzilla)
Comment on attachment 298604 [details] [diff] [review]
v2 (tip)

1. I think we can remove 'FILTER wrap_comment' from templates.

2. Shouldn't we change the comment display width from 
>+.bz_comment_text {
>+     width: 50em;
>+}
to something like 'width: xx%;'?
Attachment #298604 - Flags: review?(shimono) → review-
(In reply to comment #20)
> (From update of attachment 298604 [details] [diff] [review])
> 1. I think we can remove 'FILTER wrap_comment' from templates.

  Did you read my comments above?

> 2. Shouldn't we change the comment display width from 
> >+.bz_comment_text {
> >+     width: 50em;
> >+}
> to something like 'width: xx%;'?

  No, comments have a fixed width.
(In reply to comment #21)
> > (From update of attachment 298604 [details] [diff] [review] [details])
> > 1. I think we can remove 'FILTER wrap_comment' from templates.
> 
>   Did you read my comments above?

Yeah.
If you want to support Safari or something which don't support CSS 3 features, I think this bug do not resolved the patch above, as you said. In that case, we should make some module using UAX 14. (Or 'remove' PRE formatting and use DIV with fixed fonts.)

# I forgot to test with long texts which include us-ascii and the others mixed. 

> > 2. Shouldn't we change the comment display width from 
> > >+.bz_comment_text {
> > >+     width: 50em;
> > >+}
> > to something like 'width: xx%;'?
> 
>   No, comments have a fixed width.

Ok.
At least, i think we might change this to match with the hard wrapping for us-ascii.
(In reply to comment #22)
> > > (From update of attachment 298604 [details] [diff] [review] [details] [details])
> > > 1. I think we can remove 'FILTER wrap_comment' from templates.
> > 
> >   Did you read my comments above?
> 
> Yeah.
> If you want to support Safari or something which don't support CSS 3 features,
> I think this bug do not resolved the patch above, as you said. In that case, we
> should make some module using UAX 14. (Or 'remove' PRE formatting and use DIV
> with fixed fonts.)

mm, i totally forgot about users only using english...
For them, current wrap_comment system might be the most suitable one.
(In reply to comment #22)
> we should make some module using UAX 14.

  That's really easy to *say*. Have you read UAX 14? Do *you* want to fix Unicode::Wrap?

> # I forgot to test with long texts which include us-ascii and the others mixed. 

  I tested it.

> At least, i think we might change this to match with the hard wrapping for
> us-ascii.

  I didn't do that on purpose, because it could possibly re-wrap the already-wrapped us-ascii comments, and "em" is not a precise measure. (It depends on fonts installed, so you have to leave some headroom.)

(In reply to comment #23)
> mm, i totally forgot about users only using english...

  Or *any* non-CJK language.
(In reply to comment #24)
> > we should make some module using UAX 14.
> 
>   That's really easy to *say*. Have you read UAX 14? Do *you* want to fix
> Unicode::Wrap?

I've read, and i've selected to set 'wrap=hard' on html/textarea rather than to write some code for wrapping for our bugzilla site..
# UAX#14 was one problem, and the "composite glyph" was another...

> > # I forgot to test with long texts which include us-ascii and the others mixed. 
> 
>   I tested it.
thx.

> > At least, i think we might change this to match with the hard wrapping for
> > us-ascii.
> 
>   I didn't do that on purpose, because it could possibly re-wrap the
> already-wrapped us-ascii comments, and "em" is not a precise measure. (It
> depends on fonts installed, so you have to leave some headroom.)
> 
> (In reply to comment #23)
> > mm, i totally forgot about users only using english...
> 
>   Or *any* non-CJK language.

Thank you for teaching that.

And, now, we (japanese mozilla contributors) are discussing about this patch. Could you wait it? (for more "a few" days.)
(In reply to comment #25)
> And, now, we (japanese mozilla contributors) are discussing about this patch.
> Could you wait it? (for more "a few" days.)

  Yeah, sure.
(In reply to comment #26)
> > And, now, we (japanese mozilla contributors) are discussing about this patch.
> > Could you wait it? (for more "a few" days.)
> 
>   Yeah, sure.

As talking with -ja contributors, i add r+ for this patch.

# If this happens with only CJK, it might be ok to do with simple sprintf for email? (of cource, if we ignore about composite gryphs)
Attachment #298604 - Flags: review- → review+
Comment on attachment 298604 [details] [diff] [review]
v2 (tip)

Glob, could you give this a quick look-over to make sure it all seems sane to you?
Attachment #298604 - Flags: review?(wurblzap) → review?(bugzilla)
(In reply to comment #27)
> # If this happens with only CJK, it might be ok to do with simple sprintf for
> email? (of cource, if we ignore about composite gryphs)

  That would be possible in a separate bug. Do you have a list of all the CJK codepoints?
Comment on attachment 298604 [details] [diff] [review]
v2 (tip)

the css to force word wrapping looks correct.  it seems odd that we've got the IE fixes in a separate file, while all the other browser specific fixes are in global.css.  ah well :)

i'm going to r+ this, however..

>+.bz_comment_text {
>+     width: 50em;
>+}

> /* For bug fields */
> .uneditable_textarea {
>+    width: 30em;
> }

any reason for these being different widths?
Attachment #298604 - Flags: review?(bugzilla) → review+
(In reply to comment #30)
> any reason for these being different widths?  

  Yeah, uneditable textareas show up in the editing form, so they're shorter. They're actually wrapped at 60 chars instead of 80.
I still need to backport this to 3.0, but I want to approve it for 3.2 to keep it in our radar.
Flags: approval+
Attached patch v2 (3.0)Splinter Review
Here's the same patch for 3.0. I tested it and it seems to be fine.
Attachment #301973 - Flags: review?(bugzilla)
Comment on attachment 301973 [details] [diff] [review]
v2 (3.0)

r=glob
Attachment #301973 - Flags: review?(bugzilla) → review+
Flags: approval3.0+
Thanks for all the help and reviews, everybody! :-)

tip:

Checking in skins/standard/IE-fixes.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/IE-fixes.css,v  <--  IE-fixes.css
new revision: 1.2; previous revision: 1.1
done
Checking in skins/standard/global.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/global.css,v  <--  global.css
new revision: 1.45; previous revision: 1.44
done
Checking in template/en/default/attachment/midair.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/attachment/midair.html.tmpl,v  <--  midair.html.tmpl
new revision: 1.2; previous revision: 1.1
done
Checking in template/en/default/bug/comments.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/comments.html.tmpl,v  <--  comments.html.tmpl
new revision: 1.36; previous revision: 1.35
done
Checking in template/en/default/bug/process/midair.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/process/midair.html.tmpl,v  <--  midair.html.tmpl
new revision: 1.22; previous revision: 1.21
done
Checking in template/en/default/pages/linked.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/pages/linked.html.tmpl,v  <--  linked.html.tmpl
new revision: 1.10; previous revision: 1.9
done

3.0:
Checking in skins/standard/IE-fixes.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/IE-fixes.css,v  <--  IE-fixes.css
new revision: 1.1.2.1; previous revision: 1.1
done
Checking in skins/standard/global.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/global.css,v  <--  global.css
new revision: 1.31.2.3; previous revision: 1.31.2.2
done
Checking in template/en/default/bug/comments.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/comments.html.tmpl,v  <--  comments.html.tmpl
new revision: 1.28.2.1; previous revision: 1.28
done
Checking in template/en/default/bug/process/midair.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/process/midair.html.tmpl,v  <--  midair.html.tmpl
new revision: 1.19.2.1; previous revision: 1.19
done
Checking in template/en/default/pages/linked.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/pages/linked.html.tmpl,v  <--  linked.html.tmpl
new revision: 1.8.8.1; previous revision: 1.8
done
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Note that Firefox now throws tons of warnings in the Error console about these two lines:

 white-space: -pre-wrap;     /* Opera 4-6 */
 white-space: -o-pre-wrap;   /* Opera 7 */

It seems it doesn't know what to do with them.
sorry that i forgot this.

(In reply to comment #29)
> > # If this happens with only CJK, it might be ok to do with simple sprintf for
> > email? (of cource, if we ignore about composite gryphs)
> 
>   That would be possible in a separate bug. Do you have a list of all the CJK
> codepoints?

you can use scripts or block identifier in regexp.
http://www.unicode.org/Public/UNIDATA/Scripts.txt
http://www.unicode.org/Public/UNIDATA/Blocks.txt

like $string =~ /\p{InCJKSymbolsAndPunctuation}/;
(In reply to comment #37)
> you can use scripts or block identifier in regexp.
> http://www.unicode.org/Public/UNIDATA/Scripts.txt
> http://www.unicode.org/Public/UNIDATA/Blocks.txt
> 
> like $string =~ /\p{InCJKSymbolsAndPunctuation}/;

  Okay, file this as a separate bug and CC me on it. I think that there might be some perl-version limitation on using those block names (like it might require 5.8.x > 5.8.1 or something), but that could work, yeah.
Blocks: 423564
Blocks: 425606
Blocks: 425636
(In reply to comment #38)
>   Okay, file this as a separate bug and CC me on it. I think that there might
> be some perl-version limitation on using those block names

Not sure there is a limitation: http://www.unix.org.ua/orelly/perl/prog3/ch05_04.htm
Blocks: 434008
You need to log in before you can comment on or make changes to this bug.