Last Comment Bug 388723 - Bugzilla 3.0 does not wrap Japanese comment text
: Bugzilla 3.0 does not wrap Japanese comment text
Status: RESOLVED FIXED
: intl, regression
Product: Bugzilla
Classification: Server Software
Component: User Interface (show other bugs)
: 2.20
: All All
: -- major with 1 vote (vote)
: Bugzilla 3.0
Assigned To: Max Kanat-Alexander
: default-qa
Mentors:
Depends on: 282349 bz-utf8
Blocks: 423564 425606 425636 434008
  Show dependency treegraph
 
Reported: 2007-07-18 20:31 PDT by Colin
Modified: 2008-05-19 16:48 PDT (History)
5 users (show)
mkanat: approval+
mkanat: approval3.0+
See Also:
QA Whiteboard:
Iteration: ---
Points: ---


Attachments
v1 (tip) (4.69 KB, patch)
2008-01-22 17:02 PST, Max Kanat-Alexander
no flags Details | Diff | Splinter Review
v2 (tip) (4.77 KB, patch)
2008-01-22 17:34 PST, Max Kanat-Alexander
shimono: review+
glob: review+
Details | Diff | Splinter Review
v2 (3.0) (3.90 KB, patch)
2008-02-07 12:39 PST, Max Kanat-Alexander
glob: review+
Details | Diff | Splinter Review

Description Colin 2007-07-18 20:31:40 PDT
User-Agent:       Mozilla/5.0 (Macintosh; U; Intel Mac OS X; en) AppleWebKit/522.11 (KHTML, like Gecko) Version/3.0.2 Safari/522.12
Build Identifier: 

We are in the process of upgrading from Bugzilla 2.18 to Bugzilla 3.0.  We have both English and Japanese users.  We set up a test installation of Bugzilla 3.0, imported our data (7 years worth), used the contrib/recode.pl script to convert it to UTF8, and switched Bugzilla to use UTF8.

While testing out Bugzilla 3.0, we found that Bugzilla no longer wraps Japanese text in newly submitted comments.  I assume this was introduced by the changes in bug 11901, which was committed in Bugzilla 2.20.

This is preventing us from upgrading to Bugzilla 3.0.

I will add another comment in Japanese to demonstrate.



Reproducible: Always




Here is the output from checksetup.pl for our version info:
----------------------------------------
* This is Bugzilla 3.0+ on perl 5.8.8
* Running on FreeBSD 6.1-RELEASE-p14 FreeBSD 6.1-RELEASE-p14 <snip>

Checking perl modules...
Checking for             CGI (v2.93)   ok: found v3.15 
Checking for        TimeDate (v2.21)   ok: found v2.22 
Checking for             DBI (v1.41)   ok: found v1.53 
Checking for       PathTools (v0.84)   ok: found v3.24 
Checking for Template-Toolkit (v2.12)   ok: found v2.16 
Checking for      Email-Send (v2.00)   ok: found v2.183 
Checking for Email-MIME-Modifier (any)     ok: found v1.441 

Checking available perl DBD modules...
Checking for          DBD-Pg (v1.45)    not found 
Checking for       DBD-mysql (v2.9003) ok: found v4.001 

The following Perl modules are optional:
Checking for              GD (v1.20)   ok: found v2.35 
Checking for     Template-GD (any)     ok: found v1.56 
Checking for           Chart (v1.0)    ok: found v2.4.1 
Checking for         GDGraph (any)     ok: found v1.4308 
Checking for      GDTextUtil (any)     ok: found v0.86 
Checking for        XML-Twig (any)      not found 
Checking for      MIME-tools (v5.406)  ok: found v5.420 
Checking for     libwww-perl (any)     ok: found v2.033 
Checking for     PatchReader (v0.9.4)   not found 
Checking for      PerlMagick (any)     ok: found v6.3.2 
Checking for       perl-ldap (any)      not found 
Checking for       SOAP-Lite (any)      not found 
Checking for     HTML-Parser (v3.40)   ok: found v3.56 
Checking for   HTML-Scrubber (any)     ok: found v0.08 
Checking for Email-MIME-Attachment-Stripper (any)      not found 
Checking for     Email-Reply (any)      not found 
Checking for        mod_perl (v1.999022)  not found 
Checking for             CGI (v3.11)   ok: found v3.15 
Checking for      Apache-DBI (v0.96)    not found 

<snip>

Checking for       DBD-mysql (v2.9003) ok: found v4.001 
Checking for           MySQL (v4.1.2)  ok: found v4.1.22
Checking for        GraphViz (any)     ok: found
------------------------------
Comment 1 Colin 2007-07-18 20:32:37 PDT
英語だときちんと改行してくれますが、日本語で改行しないとWindow幅がどんどんひろくなってしまいますねー。これは前にもあった問題だったかもしれませんがどーだったのか良く覚えてないですねー。見ている環境はとりあえずFirefoxだけです。
Comment 2 A. Shimono [:himorin] 2007-07-18 20:44:57 PDT
i think it comes from the specification of Text::Wrap (cannot handle multibyte/
multi-width UTF-8 chars)
workaround (used in our Bugzilla-jp : http://bugzilla.mozilla.gr.jp) is
* disable Text::Wrap handlers
* insert wrap="hard" to all textareas

# i think i've heard that current bugzilla.org's not targetting 'truely i18n'...
# (so, i have no hope to handle Japanese with original bugzilla distribution..)
# so, many many problems will be occur to use in multi-byte UTF-8 languages :)
Comment 3 Colin 2007-07-19 01:20:51 PDT
Thank you for the info, Shimono.

Bugzilla-jp http://bugzilla.mozilla.gr.jp displays that it is running version 3.0-ja3.  I can only see a link to the diff files for 2.20-ja at ftp://ftp.mozilla.gr.jp/Bugzilla-ja/2.20-ja/.  Where can I find the diffs for 3.0-ja3?

I have found Text::WrapI18N http://search.cpan.org/~kubota/Text-WrapI18N-0.06/WrapI18N.pm that has "support for multibyte, fullwidth, and combining characters and languages without whitespaces between words".  Maybe this could be used in place of Text::Wrap?
Comment 4 Max Kanat-Alexander 2007-07-19 04:34:09 PDT
Bugzilla does aim to have complete i18n support. You have demonstrated this bug very nicely. :-)

Technically this is a regression of bug 282349.

I'll investigate Text::WrapI18N.
Comment 5 Max Kanat-Alexander 2007-07-19 04:50:01 PDT
Text::WrapI18N hasn't been updated since 2003 (though that might be OK) and doesn't support $Text::Wrap::huge, which would make it difficult to drop-in as a replacement for Text::Wrap. (It also doesn't support some other variables that we might need.)

I'd be interested to see what Text::Wrap does if the string actually has the utf8 bit set in perl. (Maybe it does the exact same thing, but I have no idea.) (In order to do this, we have to have the various DBD drivers return data tagged as utf8, which we don't do yet, but is totally possible. We have a bug for it somewhere, but I'm not finding it.)
Comment 6 Max Kanat-Alexander 2007-07-19 05:03:49 PDT
The bug that I couldn't find was bug 363153.
Comment 7 Frédéric Buclin 2007-07-19 05:16:31 PDT
I don't trust Text::WrapI18N too much as it depends on /usr/bin/locale.
Comment 8 Max Kanat-Alexander 2007-07-19 05:21:11 PDT
(In reply to comment #7)
> I don't trust Text::WrapI18N too much as it depends on /usr/bin/locale.

  Ah, okay. Yeah, that wouldn't work on Windows.
Comment 9 A. Shimono [:himorin] 2007-07-19 09:12:18 PDT
(In reply to comment #3)
> Bugzilla-jp http://bugzilla.mozilla.gr.jp displays that it is running version
> 3.0-ja3.  I can only see a link to the diff files for 2.20-ja at
> ftp://ftp.mozilla.gr.jp/Bugzilla-ja/2.20-ja/.  Where can I find the diffs for
> 3.0-ja3?

ah,,, sorry.. i don't have enough time to update that page.. (even updating 
static html :p)
please check out from http://svn.mozilla.gr.jp/

> I have found Text::WrapI18N
> http://search.cpan.org/~kubota/Text-WrapI18N-0.06/WrapI18N.pm that has "support
> for multibyte, fullwidth, and combining characters and languages without
> whitespaces between words".  Maybe this could be used in place of Text::Wrap?

WrapI18N is first written for debian d-i (as my knowledge).
i've failed to change Wrap to WrapI18N on bug-ja 2.20-ja code that i couldn't
work WrapI18N on our server CentOS.

(In reply to comment #5)
> I'd be interested to see what Text::Wrap does if the string actually has the
> utf8 bit set in perl. (Maybe it does the exact same thing, but I have no idea.)
> (In order to do this, we have to have the various DBD drivers return data
> tagged as utf8, which we don't do yet, but is totally possible. We have a bug
> for it somewhere, but I'm not finding it.)

current version of DBD drivers for mysql, pgsql can do that. (set utf8 flag)
i've installed 'utf8' flag, but i had a serious bug related on CGI.pm
Comment 10 Max Kanat-Alexander 2007-11-22 23:43:25 PST
(In reply to comment #9)
Hey himorin. I just checked in a patch to the Bugzilla CVS HEAD that sets the utf8 bit on correctly on strings. Could you tell me what Text::Wrap does now? I suspect it still won't work.

I think we have two options:

1) We could set $Text::Wrap::break appropriately, somehow. There might be a good Unicode regex that would work.

2) Possibly look into using Unicode::Wrap?
Comment 11 Jean-Marc Desperrier 2007-11-23 14:49:10 PST
As I noted on the ML, Unicode::Wrap is not a direct replacement, and might have performance problems "This module can be slow. It's a pure-Perl implementation that goes through an expensive classification process per character." 

This said Text::WrapI18N apparently only does CJK, also the documentation says "The texts have to be written in locale encoding."
Comment 12 A. Shimono [:himorin] 2007-11-26 19:15:30 PST
(In reply to comment #10)
> Hey himorin. I just checked in a patch to the Bugzilla CVS HEAD that sets the
> utf8 bit on correctly on strings. Could you tell me what Text::Wrap does now? I
> suspect it still won't work.

yeah, it *still* doesn't work.
see. http://bugzilla-trunk.test.mozilla.gr.jp/show_bug.cgi?id=5952
# today (20071127 1130JST)'s cvs head

> I think we have two options:
> 
> 1) We could set $Text::Wrap::break appropriately, somehow. There might be a
> good Unicode regex that would work.

sorry, i have no idea for good unicode (utf-8 byte sequence) regex..
of cource, we can cut when first bit (8bit) is set,, but it's ignoring some unicode features.
# like http://services.mozilla.gr.jp/viewsvn/trunk/I18N.pm?revision=111&view=markup

The most difficult thing, i think, is about composite griph.

> 2) Possibly look into using Unicode::Wrap?

i agree with comment #11
Comment 13 Colin 2007-12-06 09:35:43 PST
My company is very anxious for a fix for this issue (we're stuck at Bugzilla 2.18 because of it).

It sounds like Text:Wrap, Text::WrapI18N, and Unicode::Wrap each have their own issues.  Are there any other options?

How can we help?  (We're coders, but our perl experience is limited to simple customization changes to Bugzilla.)
Comment 14 Max Kanat-Alexander 2007-12-06 09:42:05 PST
(In reply to comment #13)
> How can we help?  (We're coders, but our perl experience is limited to simple
> customization changes to Bugzilla.)

  Hrm. You could do a detailed performance analysis of Unicode::Wrap vs. Text::Wrap, that would help.
Comment 15 A. Shimono [:himorin] 2008-01-10 02:36:34 PST
Couldn't install Unicode::Wrap 0.0.3 on perl 5.8.5.. It seems some bugs are in version 0.0.3? (refer http://cpantesters.perl.org/show/Unicode-Wrap.html#Unicode-Wrap-0.03, too.)
# But could install 0.0.1

hmm. What's the other solutions?
Of cource, i think we can include some perl module like Wrap.pm (of Unicode::Wrap) in util, technically..

Or, change the template for showing bugs from <pre> to <div> or something?
Comment 16 Max Kanat-Alexander 2008-01-10 07:06:44 PST
Just so you know, TASCAM has contracted Everything Solved (my company) to provide a solution for this bug, for them. It might be specific to their needs (Japanese and English), or it might be generally applicable. Either way I'll attach it here.
Comment 17 Max Kanat-Alexander 2008-01-22 17:02:06 PST
Created attachment 298595 [details] [diff] [review]
v1 (tip)

Okay, here is a tip patch that handles this problem as far as I can see.

Eseentially, what it does is use CSS 3 and custom browser CSS to make preformatted text wrap correctly. This wraps Chinese, Japense, and Korean correctly, while Bugzilla itself still does the wrapping of space-containing languages internally using Text::Wrap.

Originally, when writing this patch, I removed wrap_comment and just had the browser do all the wrapping. However, this wouldn't work in browsers like lynx. It also made Bugzilla pretty dependent upon modern browsers just to display correctly--for example, there's a good chance that the code wouldn't have worked at all in Safari 2.x.

I'd actually love to remove wrap_comment and let the browser do all the work, but Safari was really a problem. I don't want to exclude Safari 1.0 users, even, really, from seeing correctly-wrapped comments.
Comment 18 Max Kanat-Alexander 2008-01-22 17:17:13 PST
By the way, I did a tremendous amount of research for this bug.

Unicode::Wrap had a bug since perl 5.8.4, and even when I fixed that bug, the module didn't pass its own tests. I spent a bit of time trying to fix it, and didn't get anywhere, probably because I don't understand UAX#14 (the Unicode Standard Annex 14--the specification for line-breaking in all languages, which is huge and too long for me to read and re-implement as part of this bug).

Text::Wrap::I18N was from Debian originally, I think, and so of course is dependent upon /usr/bin/locale.

There's a module called Text::ChaSen that can process Japanese text into "phrases" or "sentences", but it's dependent upon a third-party library called ChaSen that has to be compiled and installed, from my understanding. Also, its documentation is not in English. Finally, the solution would have been limited to Japanese in any case.

The nice advantage of browser-side wrapping is that if browsers improve their UAX#14 compliance in the future, Bugzilla gets that functionality automatically.

The only thing this patch doesn't fix is emails. That would have required one of the above server-side solutions, or I could alternately make Bugzilla wrap_hard for emails, but that would break mail clients' URL highlighting for comments.
Comment 19 Max Kanat-Alexander 2008-01-22 17:34:39 PST
Created attachment 298604 [details] [diff] [review]
v2 (tip)

Oh, I realized that quoted text was being wrapped, incorrectly. This patch fixes it.
Comment 20 A. Shimono [:himorin] 2008-01-24 03:44:14 PST
Comment on attachment 298604 [details] [diff] [review]
v2 (tip)

1. I think we can remove 'FILTER wrap_comment' from templates.

2. Shouldn't we change the comment display width from 
>+.bz_comment_text {
>+     width: 50em;
>+}
to something like 'width: xx%;'?
Comment 21 Max Kanat-Alexander 2008-01-24 13:05:46 PST
(In reply to comment #20)
> (From update of attachment 298604 [details] [diff] [review])
> 1. I think we can remove 'FILTER wrap_comment' from templates.

  Did you read my comments above?

> 2. Shouldn't we change the comment display width from 
> >+.bz_comment_text {
> >+     width: 50em;
> >+}
> to something like 'width: xx%;'?

  No, comments have a fixed width.
Comment 22 A. Shimono [:himorin] 2008-01-24 15:44:11 PST
(In reply to comment #21)
> > (From update of attachment 298604 [details] [diff] [review] [details])
> > 1. I think we can remove 'FILTER wrap_comment' from templates.
> 
>   Did you read my comments above?

Yeah.
If you want to support Safari or something which don't support CSS 3 features, I think this bug do not resolved the patch above, as you said. In that case, we should make some module using UAX 14. (Or 'remove' PRE formatting and use DIV with fixed fonts.)

# I forgot to test with long texts which include us-ascii and the others mixed. 

> > 2. Shouldn't we change the comment display width from 
> > >+.bz_comment_text {
> > >+     width: 50em;
> > >+}
> > to something like 'width: xx%;'?
> 
>   No, comments have a fixed width.

Ok.
At least, i think we might change this to match with the hard wrapping for us-ascii.
Comment 23 A. Shimono [:himorin] 2008-01-24 15:47:40 PST
(In reply to comment #22)
> > > (From update of attachment 298604 [details] [diff] [review] [details] [details])
> > > 1. I think we can remove 'FILTER wrap_comment' from templates.
> > 
> >   Did you read my comments above?
> 
> Yeah.
> If you want to support Safari or something which don't support CSS 3 features,
> I think this bug do not resolved the patch above, as you said. In that case, we
> should make some module using UAX 14. (Or 'remove' PRE formatting and use DIV
> with fixed fonts.)

mm, i totally forgot about users only using english...
For them, current wrap_comment system might be the most suitable one.
Comment 24 Max Kanat-Alexander 2008-01-24 15:53:11 PST
(In reply to comment #22)
> we should make some module using UAX 14.

  That's really easy to *say*. Have you read UAX 14? Do *you* want to fix Unicode::Wrap?

> # I forgot to test with long texts which include us-ascii and the others mixed. 

  I tested it.

> At least, i think we might change this to match with the hard wrapping for
> us-ascii.

  I didn't do that on purpose, because it could possibly re-wrap the already-wrapped us-ascii comments, and "em" is not a precise measure. (It depends on fonts installed, so you have to leave some headroom.)

(In reply to comment #23)
> mm, i totally forgot about users only using english...

  Or *any* non-CJK language.
Comment 25 A. Shimono [:himorin] 2008-01-28 09:40:03 PST
(In reply to comment #24)
> > we should make some module using UAX 14.
> 
>   That's really easy to *say*. Have you read UAX 14? Do *you* want to fix
> Unicode::Wrap?

I've read, and i've selected to set 'wrap=hard' on html/textarea rather than to write some code for wrapping for our bugzilla site..
# UAX#14 was one problem, and the "composite glyph" was another...

> > # I forgot to test with long texts which include us-ascii and the others mixed. 
> 
>   I tested it.
thx.

> > At least, i think we might change this to match with the hard wrapping for
> > us-ascii.
> 
>   I didn't do that on purpose, because it could possibly re-wrap the
> already-wrapped us-ascii comments, and "em" is not a precise measure. (It
> depends on fonts installed, so you have to leave some headroom.)
> 
> (In reply to comment #23)
> > mm, i totally forgot about users only using english...
> 
>   Or *any* non-CJK language.

Thank you for teaching that.

And, now, we (japanese mozilla contributors) are discussing about this patch. Could you wait it? (for more "a few" days.)
Comment 26 Max Kanat-Alexander 2008-01-28 16:39:00 PST
(In reply to comment #25)
> And, now, we (japanese mozilla contributors) are discussing about this patch.
> Could you wait it? (for more "a few" days.)

  Yeah, sure.
Comment 27 A. Shimono [:himorin] 2008-02-03 00:26:09 PST
(In reply to comment #26)
> > And, now, we (japanese mozilla contributors) are discussing about this patch.
> > Could you wait it? (for more "a few" days.)
> 
>   Yeah, sure.

As talking with -ja contributors, i add r+ for this patch.

# If this happens with only CJK, it might be ok to do with simple sprintf for email? (of cource, if we ignore about composite gryphs)
Comment 28 Max Kanat-Alexander 2008-02-03 13:42:22 PST
Comment on attachment 298604 [details] [diff] [review]
v2 (tip)

Glob, could you give this a quick look-over to make sure it all seems sane to you?
Comment 29 Max Kanat-Alexander 2008-02-03 13:45:46 PST
(In reply to comment #27)
> # If this happens with only CJK, it might be ok to do with simple sprintf for
> email? (of cource, if we ignore about composite gryphs)

  That would be possible in a separate bug. Do you have a list of all the CJK codepoints?
Comment 30 Byron Jones ‹:glob› 2008-02-03 15:56:44 PST
Comment on attachment 298604 [details] [diff] [review]
v2 (tip)

the css to force word wrapping looks correct.  it seems odd that we've got the IE fixes in a separate file, while all the other browser specific fixes are in global.css.  ah well :)

i'm going to r+ this, however..

>+.bz_comment_text {
>+     width: 50em;
>+}

> /* For bug fields */
> .uneditable_textarea {
>+    width: 30em;
> }

any reason for these being different widths?
Comment 31 Max Kanat-Alexander 2008-02-03 15:57:52 PST
(In reply to comment #30)
> any reason for these being different widths?  

  Yeah, uneditable textareas show up in the editing form, so they're shorter. They're actually wrapped at 60 chars instead of 80.
Comment 32 Max Kanat-Alexander 2008-02-03 15:58:31 PST
I still need to backport this to 3.0, but I want to approve it for 3.2 to keep it in our radar.
Comment 33 Max Kanat-Alexander 2008-02-07 12:39:41 PST
Created attachment 301973 [details] [diff] [review]
v2 (3.0)

Here's the same patch for 3.0. I tested it and it seems to be fine.
Comment 34 Byron Jones ‹:glob› 2008-02-07 15:51:19 PST
Comment on attachment 301973 [details] [diff] [review]
v2 (3.0)

r=glob
Comment 35 Max Kanat-Alexander 2008-02-08 15:20:59 PST
Thanks for all the help and reviews, everybody! :-)

tip:

Checking in skins/standard/IE-fixes.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/IE-fixes.css,v  <--  IE-fixes.css
new revision: 1.2; previous revision: 1.1
done
Checking in skins/standard/global.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/global.css,v  <--  global.css
new revision: 1.45; previous revision: 1.44
done
Checking in template/en/default/attachment/midair.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/attachment/midair.html.tmpl,v  <--  midair.html.tmpl
new revision: 1.2; previous revision: 1.1
done
Checking in template/en/default/bug/comments.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/comments.html.tmpl,v  <--  comments.html.tmpl
new revision: 1.36; previous revision: 1.35
done
Checking in template/en/default/bug/process/midair.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/process/midair.html.tmpl,v  <--  midair.html.tmpl
new revision: 1.22; previous revision: 1.21
done
Checking in template/en/default/pages/linked.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/pages/linked.html.tmpl,v  <--  linked.html.tmpl
new revision: 1.10; previous revision: 1.9
done

3.0:
Checking in skins/standard/IE-fixes.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/IE-fixes.css,v  <--  IE-fixes.css
new revision: 1.1.2.1; previous revision: 1.1
done
Checking in skins/standard/global.css;
/cvsroot/mozilla/webtools/bugzilla/skins/standard/global.css,v  <--  global.css
new revision: 1.31.2.3; previous revision: 1.31.2.2
done
Checking in template/en/default/bug/comments.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/comments.html.tmpl,v  <--  comments.html.tmpl
new revision: 1.28.2.1; previous revision: 1.28
done
Checking in template/en/default/bug/process/midair.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/bug/process/midair.html.tmpl,v  <--  midair.html.tmpl
new revision: 1.19.2.1; previous revision: 1.19
done
Checking in template/en/default/pages/linked.html.tmpl;
/cvsroot/mozilla/webtools/bugzilla/template/en/default/pages/linked.html.tmpl,v  <--  linked.html.tmpl
new revision: 1.8.8.1; previous revision: 1.8
done
Comment 36 Frédéric Buclin 2008-02-28 11:42:39 PST
Note that Firefox now throws tons of warnings in the Error console about these two lines:

 white-space: -pre-wrap;     /* Opera 4-6 */
 white-space: -o-pre-wrap;   /* Opera 7 */

It seems it doesn't know what to do with them.
Comment 37 A. Shimono [:himorin] 2008-03-03 07:35:14 PST
sorry that i forgot this.

(In reply to comment #29)
> > # If this happens with only CJK, it might be ok to do with simple sprintf for
> > email? (of cource, if we ignore about composite gryphs)
> 
>   That would be possible in a separate bug. Do you have a list of all the CJK
> codepoints?

you can use scripts or block identifier in regexp.
http://www.unicode.org/Public/UNIDATA/Scripts.txt
http://www.unicode.org/Public/UNIDATA/Blocks.txt

like $string =~ /\p{InCJKSymbolsAndPunctuation}/;
Comment 38 Max Kanat-Alexander 2008-03-03 15:28:52 PST
(In reply to comment #37)
> you can use scripts or block identifier in regexp.
> http://www.unicode.org/Public/UNIDATA/Scripts.txt
> http://www.unicode.org/Public/UNIDATA/Blocks.txt
> 
> like $string =~ /\p{InCJKSymbolsAndPunctuation}/;

  Okay, file this as a separate bug and CC me on it. I think that there might be some perl-version limitation on using those block names (like it might require 5.8.x > 5.8.1 or something), but that could work, yeah.
Comment 39 Frédéric Buclin 2008-04-07 07:42:38 PDT
(In reply to comment #38)
>   Okay, file this as a separate bug and CC me on it. I think that there might
> be some perl-version limitation on using those block names

Not sure there is a limitation: http://www.unix.org.ua/orelly/perl/prog3/ch05_04.htm

Note You need to log in before you can comment on or make changes to this bug.