Closed Bug 7969 Opened 25 years ago Closed 17 years ago

need dictionary based Thai line breaker and intergrate into line/word breaker

Categories

(Core :: Internationalization, defect, P3)

defect

Tracking

()

RESOLVED WONTFIX
Future

People

(Reporter: ftang, Assigned: smontagu)

References

(Blocks 2 open bugs)

Details

(Keywords: helpwanted, intl)

Attachments

(8 files, 17 obsolete files)

8.38 KB, patch
Details | Diff | Splinter Review
211.95 KB, application/octet-stream
Details
26.39 KB, text/html
Details
5.89 KB, patch
Details | Diff | Splinter Review
5.60 KB, patch
jshin1987
: review-
Details | Diff | Splinter Review
7.83 KB, patch
Details | Diff | Splinter Review
8.60 KB, patch
Details | Diff | Splinter Review
21.14 KB, patch
Details | Diff | Splinter Review
Status: NEW → ASSIGNED
Target Milestone: M10
Target Milestone: M10 → M15
change to M15 since this is post beta
Whiteboard: Help Wanted
Keywords: helpwanted
Whiteboard: Help Wanted
mark it M20
Target Milestone: M15 → M20
QA Contact: teruko → ftang
line / work breaking related issue. Reassign to shanjian and cc erik/nhotta
Assignee: ftang → shanjian
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
mark it future
Target Milestone: M20 → Future
Line breaking issue is big issue for Thai. Browser without Line breaking 
(Netscape 4) display text worse than IE5 (with line-breaking) so much.
This also impontant in Word Processing too. (should be in composer too).
What will you all do with this bug?
We need help to resolve this problem. Could you provide us help in this area? 
I can explain the current code to you if necessary.
Keywords: intl
for a thai dictionary for wordbreaker, i think samphan@thai.net can help you.
What is the current status of this issue ?
Here's the patch for dictionary-based Thai line breaker for Mozilla.
The work is not finished and have been tested on Linux only.
There're some problems with it such as :-
- On Linux the dictionary will be loaded as "/usr/share/inthai/inthai.lex"
- On other platform it'll be loaded as "inthai.lex" so it must be in 
  current directory! See mozilla/intl/inthai/src/PackedTrie.cpp

Set your browser to "http://opensource.thai.net" if you want some test data in
Thai.
- cd to mozilla/.. and tar -xzvf libinthai-moz.tar.gz
  The library will be installed in mozilla/intl/inthai
- Copy the dictionary file mozilla/intl/inthai/data/inthai.lex to
  /usr/share/inthai for Linux or the executable directory for Win.
  (you must mkdir /usr/share/inthai first).
Sorry, I forgot to tell that the above patch is made on mozilla 0.9.8 source.
How does the functionality of this patch compare to the functionality of
Mozilla's CTL(=Complex Text Layout) extension ?
as of my knowledge,

CTL provide the functionalities of:
- render Thai characters (cell-base, 4 levels of characters in 1 cell)
- cell-base caret travelling, delete, backspace
- cell-base highlight selection

There's NO wordbreak functionality in CTL (at least, for now).

Prabhat may better clarify this.
please do not attach application/octect-stream type in bugzilla attachment. 
Blocks: thai
I looked at this patch. With some minor changes we could integrate this 
to co-exist with current CTL code. One issue for discussion could be if
anyone in intl team is planning to integrate ICU (which i believe also
provides good thai word-breaking support). Could someone in intl team 
comment on this?

I'd love to hear about using ICU in mozilla. IMHO, this will solve many problems.

> ICU (which i believe also provides good thai word-breaking support)

Thai dictionary-based word-breaker in ICU is not good enough. People from
IBM/Sun Thailand who use/test ICU with Thai language can tell you that. The
problems come from :-
1) Greedy, longest-matching algorithm use in ICU can't handle ambiguous word
sequences at all (it was mentioned as over-kill). Modern algorithms in Thai use
maximal-matching algorithm which use dynamic programming technic to handle
ambiguous sequences efficiently.
2) ICU algorithm don't handle unknown words (words not in the dictionary) very
well. Unknown words happend a lot in real Thai texts so it's *very* serious.
3) It's not been tested enough with Thai so there may be cases that cause error.
One case I can think of is the use of '.' in Thai to denote abbreviations. The
'.' could be threated as part of Thai word or as punctuation (some use it to
denote end of sentences) depend on context and dictionary. The dictionary should
also contain list of abbreviations for this to work.

On the other hand, IMO, ICU's BreakIterator API is the way to go. Libinthai
actually adopted BreakIterator API from the beginning. The interface you see in
my Mozilla patch is the result of painfully adapting the stateful BreakIterator
interface to Mozilla's stateless breaker API. Internally libinthai is ICU
compatible. I think Mozilla should consider using BreakIterator API directly in
the future.
> The work is not finished and have been tested on Linux only.
> There're some problems with it such as :-
> - On Linux the dictionary will be loaded as "/usr/share/inthai/inthai.lex"
> - On other platform it'll be loaded as "inthai.lex" so it must be in 
>   current directory! See mozilla/intl/inthai/src/PackedTrie.cpp

The dependency on the dictionary data file (inthai.lex) in libinthai has been
solved. The (default) main dictionary is compiled-in as a module as in ICU.
We're working on (re)building libinthai on Windows. Now libinthai also has
spelling checking/correction functionality too (spelling check and word breaking
for Thai are the two sides of the same coin).

hi samphan - Please post the latest patch and library with dictionary dependency
removed so that i can get our thai team to verify this. 
thanks,
prabhat
The new libinthai can be d/l at
ftp://opensource.thai.net/pub/office-tle/libinthai_0.9.tar.gz
I haven't look at the patches for a while but will do ASAP if you want to.
BTW, what's the current plan?
1) change current code to use ICU's BreakIterator API and
1.1) go directly to ICU now
1.2) use libinthai until Thai BreakIterator in ICU is ok
2) use libinthai with its mozilla specific API as a temporaly solution

After all, please note that this particular patch (from above) fix the 'bug'
that prevent word-breaking for Thai eventhough the rule-based code is already
check-in (mozilla/intl/lwbrk/src/rulebrk.c) and used to work in old versions.

diff -ur tmp/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp
dev/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp
--- tmp/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp	Wed Jan 16 07:03:16 2002
+++ dev/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp	Wed Feb 13 18:06:52 2002
@@ -234,7 +233,8 @@
   return ((0x1100 <= (u) && (u) <= 0x11ff) ||
           (0x2e80 <= (u) && (u) <= 0xd7ff) ||
           (0xf900 <= (u) && (u) <= 0xfaff) ||
-          (0xff00 <= (u) && (u) <= 0xffff) );
+          (0xff00 <= (u) && (u) <= 0xffff) ) ||
+          (0x0e00 <= (u) && (u) <= 0x0e5f);
 }
 
 static inline int
Simplest solution for line break will be to allow separation after every Thai
Character
A better solution will be to separate before
characters in Thai Code page 874: 0xE0 to 0xE4
or Unicode: 0x0E40 to 0x0E44
Best will be to have a complete algorithm for word separation as e.g. in MS
Internet Explorer.
This patch is just for reviewing. It is for mozilla/firefox to use ICU for 
line breaking. It works on both Linux and Win32. The problem with it is that
- you need ICU installed in the target machine
- to build it is quite ad-hoc
  - on linux you have to put ICU in /usr/lib
  - on win32 you need to add icu/lib path to the LIB variable
Anyway, integrating a lite version of ICU into the tree will solve this problem.
Other than that it works quite well. The patched firefox has been released
and being used by many people.

A flaw in the patch, nsJISx4051LineBreaker::BreakInBetween() is not implemented.
It simply do this :-

if((CLASS_THAI == c1) && (CLASS_THAI == c2))
{
   *oCanBreak = 0;
}
else
{
   *oCanBreak = GetPair(c1,c2);
}

I'm not sure if this is ok but I guess that BreakInBetween() is only need for
CJK (cause the patch work for Thai) and I can't think of a way to implement 
BreakInBetween() using ICU. 
Any comment?
This is the new patch that fix a bug that make some space missing when it is at
the end of line in HTML source. It simply use
DictionaryBasedBreakIterator::CreateWordInstance instead of CreateLineInstance.

Many people are using the patched firefox on Linux. I also distribute a setup
program on Windows that contain patched firefox and ICU together so Windows
users can start using it.
Attachment #172558 - Attachment is obsolete: true
Blocks: 257935
shanjian is no longer working on mozilla for 2 years and these bugs are still
here. Mark them won't fix. If you want to reopen it, find a good owner first. 
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → WONTFIX
Mass Re-open of Frank Tangs Won't fix debacle. Spam is his responsibility not my own
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
Mass Re-assinging Frank Tangs old bugs that he closed won't fix and had to be
re-open. Spam is his fault not my own
Assignee: shanjian → nobody
Status: REOPENED → NEW
Assignee: nobody → smontagu
QA Contact: ftang → amyy
On Windows, I built ICU with --enable-static and modified the source as shown
in the next patch. Then I statically link the ICU binaries with the patched
firefox (which is built statically with default --enable-optimize). The result
firefox.exe with Thai line-breaking feature is only 328K bigger.
This proofs that it may be possible to include ICU in Mozilla/Firefox w/o too
much size penalty. ICU size can be reduced a lot. And static linking will pull
only the code we need from ICU.
Attachment #173704 - Attachment is obsolete: true
Attachment #186465 - Flags: review?(jshin1987)
This patch remove a lot of stuffs. I make sure only that Thai
DictionaryBasedBreakIterator is working.
Attachment #186467 - Flags: review?(jshin1987)
I've just build the original firefox source with the same MOZCONFIG/setting I
used for the modified source. The result firefox.exe is a little bit smaller
than Mozilla.org official build. So I recompute the actual size that my ICU add
to firefox.exe. The modified firefox.exe is actually 528K bigger.
Comment on attachment 186465 [details] [diff] [review]
Patch to use ICU for Thai line-breaking, statically link on Windows

Thanks for your work, but introducing a new dependency (although only at the
compilation time) is not under my 'jurisdiction'. Perhaps, we need to figure
out a better way to make use of ICU (including a subset of them in our tree may
be an option).
Attachment #186465 - Flags: review?(jshin1987)
1) Is there any other possible use of ICU in Mozilla now? There'll be more
chance to include ICU in the tree if there're many uses of it.

2) Is it possible to introduce a configure option, e.g. --with-icu[=icu-prefix],
to enable the use of ICU at compile time? This should not interfere with the
normal build while providing the option to use ICU in Mozilla (anywhere).
(In reply to comment #30)
> 1) Is there any other possible use of ICU in Mozilla now? There'll be more
> chance to include ICU in the tree if there're many uses of it.

There may be many if we decide to use ICU, but ...
 
> 2) Is it possible to introduce a configure option, e.g. --with-icu[=icu-prefix],
> to enable the use of ICU at compile time?
 
That's another possibility. Perhaps, that's a good interim solution. Can you
make a patch to implement that? 
I don't know how yet but I'll try.
I've looked at similar stuff in configure.in.
I split the patch into two parts. The patch to configure.in (add --with-icu) is
filed as another bug #299324 which this bug depends. This patch now use ICU for
Thai line-breaking only if configure with --with-icu flag (which will #define
ENABLE_ICU).
Attachment #186465 - Attachment is obsolete: true
Attachment #187900 - Flags: review?(jshin1987)
Depends on: 299324
Comment on attachment 187900 [details] [diff] [review]
Patch to optionally use ICU for Thai line breaking

Sorry for the delay.
We have to keep the current Thai line breaker if ENABLE_ICU is NOT defined,
don't we?
Attachment #187900 - Flags: review?(jshin1987) → review-
Comment on attachment 186467 [details] [diff] [review]
Patch to reduce the size of ICU for Thai line-breaking

This is not releavant at least for now
Attachment #186467 - Flags: review?(jshin1987)
> We have to keep the current Thai line breaker if ENABLE_ICU is NOT defined,
don't we?

No. The old rule-based Thai word-breaking code should be removed. It doesn't get
called anyway. nsJISx4501LineBreaker.cpp w/o the new patch only process CJK now
(it used to called the Thai rule-based code). That's why Thai word-breaking is
currently not working.
You will need to install libthai (http://libthai.sf.net/) on your system for
the Thai word breaking function to work.

The code loads libthai.so dynamically so if there is no libthai in the system,
it will do nothing. The benefit is that the patch is small and doesn't hurt
other users who don't speak Thai. For Thai users, they just have to install
libthai in their system and the vanilla version Mozilla/Firefox should work
immediately, no need to re-compile in order to have Thai support.

The code uses dlopen() (tested on Linux Debian unstable +Firefox-1.0.6), to be
more cross-platform, we might switch to GLib's GModule? I have included an
experimental code in the patch but had a problem with Makefile/INCLUDE stuff so
didn't test it yet. 

For Windows, instead of libthai, we can use the word breaking function provided
by Uniscribe? I have not much idea on this so Windows guru could you help?
The ability to break Thai words of web browsers is really needed for Thai 
users. The libthai-based patch above (modified from Samphan's ICU patch, 
tested) is a solution to add this support to Mozilla/Firefox. It loads 
libthai.so (http://libthai.sf.net/), if exists in the system, dynamically, so 
no need to do ./configure --enable-something and/or recompile Mozilla/Firefox 
for every new releases. 
 
Not sure if Mozilla will eventually go for ICU/Pango/... in the future but for 
the time being, is a solution like this acceptable?  
Will someone in Mozilla read this thread/bug?
*** Bug 257935 has been marked as a duplicate of this bug. ***
I have another idea. Since incorporating ICU into Mozilla code base seems to be impossible. Is it possible to write an extension for the job? 

The extension will do this :-
1) It'll implement the interface nsILineBreaker, using ICU for Thai line breaking
2) It'll register the new interface in the service NS_LWBRK_CONTRACTID, so nsDocument::GetLineBreaker() will get the new implementation from the extension instead.

I know nothing about XPCOM.
- Is this possible? Is it possible to replace the interface implementation from an extension?

What do you think? The way to go?
An updated version of libthai patch. It uses NSPR instead of dlopen() to load libthai according to the information here:
http://www.mozilla.org/projects/nspr/reference/html/prlink.html#24072
Thank you Jean-Marc Desperrier for poiting this out.

What is the proper way to seach for libthai.so from default paths specified in /etc/ld.so.conf or LD_LIBRARY_PATH? Now I just hard-code it to load from /usr/lib.

pattara
Without looking into all details of the bug, does
http://developer.mozilla.org/en/docs/Using_Dependent_Libraries_In_Extension_Components
help? Maybe not, but still putting the reference here.
(In reply to comment #42)

> What is the proper way to seach for libthai.so from default paths specified in
> /etc/ld.so.conf or LD_LIBRARY_PATH? Now I just hard-code it to load from
> /usr/lib.

If you use PR_LoadLibrary, I think you don't have to do anything. NSPR does the job for you afaik. 
Comment on attachment 194910 [details] [diff] [review]
Patch to add  Thai word break support to Mozilla/Firefox using libthai

obsoleted by the libthai NSPR patch
Attachment #194910 - Attachment is obsolete: true
Hi,

Months have passed. The patch is there. The patch does fix the problem. But it is left there. 

Just wondering how long will it take until Mozilla/Firefox can break Thai words correctly. Who is the right contact point for this? Where is the right place to post? If not here then what is this place for?

Does Mozilla/Firefox usually accept contribution like this from users? Is it the same for other bugs or this is the special case that nobody seems to care at all? Is it a technical problem or an administrative one? If the patch doesn't conform to some guidelines please let us know what to fix. 

Just need some responses. Please at least show me that Mozilla is an Open Source project with open collaboration from its users. Otherwise how should I tell my users when things like Thai word-breaking don't work and don't seem to get fixed in any near future?

Pattara
Attachment #211242 - Flags: review?(jshin1987)
Attached patch Updated libthai patch (obsolete) — Splinter Review
Updated libthai patch as modified when submitted for inclusion in Ubuntu dapper. It now loads libthai.so.0 (with library version info specified) instead of just libthai.so, to guarantee ABI. Besides, the debug messages are now put only if DEBUG is defined.
The code lacks platform ifdefs to boot.
Re Windows and ICU, going the extension way may be a good idea. A post to
mozilla.dev.platform and .i18n may be good here (.platform is probably more
important than .i18n).
Any idea about the mac? jshin, is any of this related to the indic problems we 
have?
(In reply to comment #48)
> The code lacks platform ifdefs to boot.

you mean something like

#ifdef LINUX

[...the added part...]

#endif

?

If not could you give me an example or point to some URLs?

Would you like to help test the patch?

> Re Windows and ICU, going the extension way may be a good idea. A post to

Windows has Uniscribe, so IMHO we should just use it (i don't know how).

> mozilla.dev.platform and .i18n may be good here (.platform is probably more
> important than .i18n).
> Any idea about the mac? jshin, is any of this related to the indic problems we 
> have?

I think the final solution which should work on all platforms is to use Pango. I heard Theppitak is working on it. But meanwhile, for the sake of Thai Linux users, temporary-but-works-and-doesn't-hurt-anybody solution like libthai should not be ignored.
Attached patch Rewritten LibThai Patch (obsolete) — Splinter Review
I've rewritten the LibThai patch. It's against MOZILLA_1_7_BRANCH, but I think still applies to MOZILLA_1_8_BRANCH as well (but not HEAD, of course).

In this patch, I split the libthai stuffs into a separate nsLibThaiLineBreaker class and call it from nsJISx4501LineBreaker at appropriate places. (At first, I was about to add a case in nsLWBreakerFimp factory, but it seemed all the callers passed a blank parameter. So, having more than one implementation is just impossible.)

Also, a preprocessor is added so that libthai is called only on Unix platforms, although making libthai available on Windows is also possible with cygwin/mingw. But that can be added later.
Attachment #218666 - Attachment is obsolete: true
OK. I managed to rediff the rewritten libthai patch against 1.8 branch, by using debian's xulrunner 1.8.0.1 pristine source.
Attachment #220497 - Attachment is obsolete: true
Related info: I've opened Bug #336959 for Pango/Uniscribe approach to line breaking in Trunk.
As another alternative, libthai dependency may be better for deployment in some distributions that also ship libthai.
Attachment #220509 - Flags: review?(jshin1987)
Attachment #221289 - Flags: review?(jshin1987)
I'm really sorry for dragging my feet here. Would you please make a new patch against the trunk instead of 1.8 branch? (note that nsILineBreaker is deCOMtaminated so that you need to make some changes). Without baking in the trunk for a while, it can't be landed in 1.8 branch (if it's not too late). BTW, why don't you use 'cvs diff' to make your patch? 
> BTW, why don't you use 'cvs diff' to make your patch?

Because new files were introduced, and without CVS write permission,
I couldn't 'cvs add' them before diff-ing. They would thus have been 
lost in the patch.

Regarding working with trunk or branch, it's quite a long story.
In short, it's a mix of results from Thai team's timing and from distro 
contacts which forward the bug back upstream. So, I lose the focus on
trunk. Now we seem to miss all the plans, anyway.

Sorry for the wrongly targeted patch. I shall rework it against trunk 
soon.
(In reply to comment #55)
> > BTW, why don't you use 'cvs diff' to make your patch?
> 
> Because new files were introduced, and without CVS write permission,
> I couldn't 'cvs add' them before diff-ing. They would thus have been 
> lost in the patch.

Ok. It doesn't matter much, but in a sense it may be more convenient for you  to upload two files (one is a patch against existing files obtained with 'cvs diff' and the other contains new files).
 
> Regarding working with trunk or branch, it's quite a long story.
....
> trunk. Now we seem to miss all the plans, anyway.

Sorry I'm partly to blame ....
OK. I have reworked the patch against trunk, using the configure switch instead of PR_LoadLibrary. Using the switch makes the solution available on other platforms than XP_UNIX as well, provided that libthai is made available.

This patch is adjusted a little bit from previous ones. The existing rule-based Thai word segmentation in rulebrk.c is added back as the fallback when libthai is not linked.

Two additional files are coming.
Attachment #220509 - Attachment is obsolete: true
Attachment #221289 - Attachment is obsolete: true
Attachment #220509 - Flags: review?(jshin1987)
Attachment #221289 - Flags: review?(jshin1987)
Attachment #230603 - Flags: review?(jshin1987)
The PR_LoadLibrary thing was actually better... You don't need to rebuild mozilla to enable/disable libthai support...
Attached file intl/lwbrk/src/nsLibThaiLineBreaker.h (obsolete) —
Attachment #230604 - Flags: review?(jshin1987)
Attachment #230607 - Flags: review?(jshin1987)
Mike Hommey (#58) wrote:

> The PR_LoadLibrary thing was actually better... You don't need to rebuild
> mozilla to enable/disable libthai support...

One drawback is that library names and directories for various platforms will have to be listed, right?

With --enable switch, you turn it on only if libthai is available on your platform, eliminating explicit conditions for different platforms in the code.

Any other points I miss?
Wouldn't the availability of libthai depend on installation instead of platform?
(In reply to comment #62)

Yes, libthai availability depends on installation. But its location and name depends on platform.

From my old patch in attachment #220509 [details] [diff] [review]:

+static const char* LibraryPaths[] = {
+  "/usr/local/lib",
+  "/usr/lib",
+  nsnull
+};
+
+nsLibThaiLineBreaker::nsLibThaiLineBreaker()
+  : mLibThai (nsnull)
+{
+  for (const char** ppLibPath = LibraryPaths; *ppLibPath; ++ppLibPath) {
+    char* pFullName = PR_GetLibraryName (*ppLibPath, "libthai.so.0");
+    if (pFullName) {
+      mLibThai = PR_LoadLibrary (pFullName);
+      PR_FreeLibraryName (pFullName);
+      if (mLibThai)
+        break;
+    }
+  }
+}

Yes, I can list all possible directories (/usr/lib, /usr/local/lib, c:/windows,
c:/Program\ Files/Firefox, etc.) and library names (libthai.so.0, libthai-0.dll, etc.) if this approach is chosen. Just feeling it's a bit awkward.

On the other hand, in distro maintainer's point of view, it might be convenient not to rebuild the whole suite to have libthai support.

This morning, I got another improvement idea: let's encapsulate all the mess and fallbacks for Thai in a class, say nsThaiLineBreaker. And the code in nsJISx4501LineBreaker would become a little bit cleaner. I will post the patch soon after testing it. But I will not switch back to PR_LoadLibrary yet at the moment, until we reach some decision.
Attachment #230603 - Flags: review?(jshin1987)
Attachment #230604 - Flags: review?(jshin1987)
Attachment #230607 - Flags: review?(jshin1987)
Let me reiterate what i told you in the bugreport you did on debian: put it in a component. Then, the component can be linked against libthai without problem. It won't be loaded if libthai is not present at runtime.
I know. And my reply was that the approach was currently not supported in the code base. Currently, JISx4501LineBreaker is practically the sole LineBreaker ever created to handle Unicode string in general. There is no string analysis in Pango's manner which allows multiple engines to be created for individual scripts yet.
Ahem, I decide to cancel the idea about the nsThaiLineBreaker encapsulation. Leaving the fallback in nsJISx4501LineBreaker.cpp should be more beneficial for componentization in the future. The new component just does its job. And the fallback is there on its absence. Isn't that good?

Anyway, I've added some caching to nsLibThaiLineBreaker to improve its performance. I'm posting the new version here.
Attachment #230604 - Attachment is obsolete: true
Attachment #230607 - Attachment is obsolete: true
Re: the path of libthai
How about adding a pref. entry (to be set in about:config) to point to the list of candidate paths (with the reasonable *platform-dependent* values set by default)? On Windows, we need to do a little more than that ... This is not a very clean solution, but there's a precedent of a similar approach (for libfreetype), I think.
I agree with using about:config. Could you suggest some sample code?
Can we put "[line-breaking]" in "Status Whiteboard" filed ?

(just as Bug #157967 "Make Gecko interoperate better with advanced typography systems such as ATSUI, Uniscribe, Pango & STSF")
Meta bug:

Bug #206152 [meta] line breaking bugs
(In reply to comment #69)
> This is not a very clean solution, but there's a precedent of a similar approach (for libfreetype), I think.

This is indeed very unclean, and FWIW, the libfreetype thing is useless nowadays...

(In reply to comment #73)
> the libfreetype thing is useless nowadays...

I know that at least as well as anybody else here.

Adding a pref. entry can be considered if there's no better way than hard-coding the library path or assuming the build-time availability to be the same as the run-time availability. (the latter will work if firefox is built for , say, a Thai linux distribution)

In reply to comment #70)
> I agree with using about:config. Could you suggest some sample code?

See
http://lxr.mozilla.org/seamonkey/source/gfx/src/freetype/nsFreeType.cpp#498

for a sample.  
(In reply to comment #74)
> Adding a pref. entry can be considered if there's no better way than
> hard-coding the library path or assuming the build-time availability to be the
> same as the run-time availability. (the latter will work if firefox is built
> for , say, a Thai linux distribution)

There's also the possibility to do like with the mozgnome component: if the gnome libs are present at build time, build the component, and if they are not at run time, the component won't be loaded (the latter doesn't require anything, the component loading system just can't load a component when its dependencies are not here ; the former is not hard to do).
(In reply to comment #75)

All right, that would be perfect, as both gecko and libthai will then rely on component system to get connected, rather than hard-coding or low-level configuration. Can we make that interface Thai-specific for the moment, to be called from nsJISx4501LineBreaker for Thai chunks?

Being Thai-specific means only Thai Unicode range is supported for the moment, until a proper Pango-like text analysis is implemented.

I know nothing about XPCOM, either, anyway.
I managed to work out the mozlibthai component.
Attachment #230603 - Attachment is obsolete: true
Attachment #230731 - Attachment is obsolete: true
Attachment #230732 - Attachment is obsolete: true
Hmm... It seems splitting the word breaker as a component has solved many issues. Now I have adjusted the patch so that:

- nsJISx4501LineBreaker _always_ tries to load a component of generic nsIThaiLineBreaker interface, with rule-based fallback on its absence. (That is, it's not guarded by libthai build condition any more.)

- The mozlibthai component installs itself as an implementation of nsIThaiLineBreaker interface, without direct link to libi18n.so. It will be built only if libthai is available at configure time, and --disable-libthai option is not set.

This way, distributions can build firefox with libthai support and split the mozlibthai component as a companion package, without enforcing dependency on non-Thai users. Meanwhile, a separate language pack is also possible on Windows in the same manner. (Thanks to Mike Hommey for mentioning mozgnome example.)

We can generalize the interface for more South East Asian languages later. It's still Thai-specific for the moment.
May I request for a review?
Attachment #230985 - Attachment is obsolete: true
Attachment #231067 - Flags: review?(jshin1987)
Attachment #230986 - Attachment is obsolete: true
Attachment #231068 - Flags: review?(jshin1987)
I have some updates for the last two patches. They happened to work by accident on my system, due to some trace left between the builds. Now I have moved the IDL into the component's directory, so that the .xpt file gets built and installed. Besides, the code is also adjusted a little bit.
Attachment #231067 - Attachment is obsolete: true
Attachment #231067 - Flags: review?(jshin1987)
Attachment #241552 - Flags: review?(jshin1987)
Attachment #231068 - Attachment is obsolete: true
Attachment #231068 - Flags: review?(jshin1987)
Attachment #241553 - Flags: review?(jshin1987)
Note: To test the patches, you need libthai, which is available at:
  ftp://linux.thai.net/pub/thailinux/software/libthai
or libthai-dev package in Debian/Ubuntu.
jshin, can you review the latest patches?  

simon, I wonder if you could help out and review to if jshin doesn't have cycles to review.

thanks
Blocks: 299423
While caching up recent changes in trunk to update the patch, and weighing on alternatives, I think another way might be simpler: just replace nsJIS4051LineBreaker with full Pango break support.

I have updated the patch in Bug 336959 (Attachment 260003 [details] [diff]). And the performance seems now acceptable. So, please also consider such option.
According to my tests, Thai line breaking looks fixed (or at least very strongly enhanced) since the checkin of bug 255990. The recent checkin of bug 336959 did not change anything to the behavior.

If the current behavior is correct, maybe this bug can be closed ?

NB : For info, I created the following test page to help test the line breaking behavior for thai : http://jmdesp.free.fr/i18n/linebreak/LBThai.html
Is there a test in the testsuite already to check for that ?
Bug 336959 is for linux only, we still need help for windows and mac.
AFAIK, bug 255990 was about applying line breaks to ASCII chars. Probably, the formerly-bypassed rule-based Thai line breaker got called as a side-effect of the patch. But that's not what we need here, after all. We need more precise approch like dictionary-based breaker.

Bug 336959 makes Thai chunks be analyzed by Pango breaker, which in turn handles Thai line breaking with dictionary-based engine--what we try to achieve in this bug.

But it's only for Unix platforms. We still need more works on Windows and Mac.
(In reply to comment #90)
> Bug 336959 makes Thai chunks be analyzed by Pango breaker, which in turn
> handles Thai line breaking with dictionary-based engine--what we try to achieve
> in this bug.
> 
> But it's only for Unix platforms. We still need more works on Windows and Mac.
> 

Thep, though I see that using pango would be elegant for linux, why don't we try to land something like what we currently have in ubuntu?

Alexander, it would clearly help the general audience if you'd made a more precise statement on what ubuntu currently has.

Anyway, as probably said elsewhere, too, but at least on http://weblogs.mozillazine.org/roc/archives/2007/07/status_3.html, we went for using pango on linux, "and modern Pangos can hook up to libthai to make this happen." Thus I'm not sure why we would need to do additional stuff for linux/unix.
Alexander meaned my another patch currently shipped with Ubuntu. It calls libthai without Pango in between, in an XPCOM component. In other words, the last patches proposed in this bug, as inspired by Mike's suggestion.

Why not? Well, many reasons:

- You can see it has got no response so far, while the Pango patch gets reviewed and committed. But that's probably a non-scientific reason. :)

- It's roc's suggestion, which I agree at some degree. It's just simpler, while allowing supports for other languages like Lao and Khmer immediately when their Pango works are ready. (But availibility on other platforms is an affordable trade-off.)

- My patch here may still require more adjustments to be checked-in. The component communicates with nsJISx4501LineBreaker via nsIThaiLineBreaker interface, which, as suggested by its name, is still Thai-specific. A Pango-like layer will still be needed to separately analyze Lao and Khmer in the future. Actually, while I like to actually have such layer in Mozilla, the change may require architectural changes, which seem to be too huge for me to work alone without any comment to the initially proposed patch.

- Packaging is another issue. While libthai is available on Debian and Ubuntu, and probably other Linux distros as well, it's not available on Windows and Mac yet. The introduced external dependency may still require further re-arrangement, both for libthai itself and Mozilla's build system.

So, as Thai line breaking is now available in most platforms, simply exploiting them appears to be a shorter path.
So we need two additionnal bugs blocking this one :
- Use Uniscribe breaker to analyze Thai chunks under Windows
- Use ATSUI breaker to analyze Thai chunks under Mac OS
And if those two are done this bug will be fully solved.

It will be easier to get that in that it would have been to add libthai as an additional dependency under linux.
Oups,  I meant "than to add libthai as an additional dependency under windows and Mac OS"
Bug 389520 has been filed for Mac.
Bug #390048 has been filed for Windows.
Since Bug #336959 (Linux), Bug #389520 (Mac) and Bug #390048 (Windows) are all fixed and checked-in, the current dependent Bug #299324 (ICU) approach is obsolete. Could we change the dependent list so that this bug will get fixed?
No longer depends on: 299324
In fact, it seems to me this means this bug is now fixed ? 
Can we close as fixed and thus get rid of another 4 digit bug :-) ?
(In reply to comment #100)
> In fact, it seems to me this means this bug is now fixed ? 
> Can we close as fixed and thus get rid of another 4 digit bug :-) ?

Agreed.
If I understand the last few comments correctly, the correct resolution here is WONTFIX, since Thai line breaking has been fixed by other means.
Status: NEW → RESOLVED
Closed: 19 years ago17 years ago
Resolution: --- → WONTFIX
Attachment #241552 - Flags: review?(jshin1987)
Comment on attachment 241552 [details] [diff] [review]
Patch to lwbrk so it calls mozlibthai component at appropriate places

clear outstanding reviews
Attachment #211242 - Flags: review?(jshin1987)
Attachment #241553 - Flags: review?(jshin1987)
You need to log in before you can comment on or make changes to this bug.