336959 - Line Breaking with Pango/Uniscribe

Assignee

Description

•

18 years ago

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20060419 Epiphany/2.14 Firefox/2.0a1
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.8) Gecko/20060419 Epiphany/2.14 Firefox/2.0a1

As pango has already been used for rendering, it's also possible to use the pango_break() facility for a generic line breaking interface. This may be a sensible way for some solution like libthai [http://libthai.sourceforge.net] to be plugged in through Pango framework, along with solutions for other languages in the future.

To fullfill this approach on Windows, where Pango is rarely available, Uniscribe is just a natural choice.

As a result, this will make Gecko attempt to utilize line breaking API's that are already available on the platforms it runs on, in addition to its built-in engines.

Reproducible: Always

Theppitak Karoonboonyanan

Assignee

Comment 1

•

18 years ago

Attached patch An experimental Pango patch (obsolete) — Details — Splinter Review

I propose this patch (against Trunk) as a proof of concept. To try it, you need libthai and pango-libthai packages from http://libthai.sourceforge.net, or libthai0 and pango-libthai if you are using Debian sid/etch or Ubuntu dapper.

Philip Withnall (unavailable)

Updated

•

18 years ago

Status: UNCONFIRMED → NEW

Ever confirmed: true

Theppitak Karoonboonyanan

Assignee

Comment 2

•

18 years ago

Attached patch Experimental Pango patch, with missing change added (obsolete) — Details — Splinter Review

Oops. I forgot to include a change in a Makefile in previous patch. This one should work.

Attachment #221147 - Attachment is obsolete: true

Arthit Suriyawongkul

Comment 3

•

18 years ago

related: Bug 157967 - Make Gecko interoperate better with advanced typography systems such as ATSUI, Uniscribe, Pango & STSF

Theppitak Karoonboonyanan

Assignee

Comment 4

•

18 years ago

Known problem: performance is comparably worse than when using built-in engines, such as what proposed in bug 7969. This is because the itemization task is done twice, once in nsJISx4501LineWrapper, and again in nsPangoLineWrapper (for non-western and non-CJK languages). So, the overhead is doubled.

Arthit Suriyawongkul

Comment 5

•

18 years ago

Can we put "[line-breaking]" in "Status Whiteboard" filed ?

(just as Bug #157967 "Make Gecko interoperate better with advanced typography systems such as ATSUI, Uniscribe, Pango & STSF")

Arthit Suriyawongkul

Updated

•

18 years ago

Blocks: line-breaking

Arthit Suriyawongkul

Updated

•

18 years ago

Blocks: thai

Theppitak Karoonboonyanan

Assignee

Comment 6

•

17 years ago

Attached patch Updated patch, with reduced steps (obsolete) — Details — Splinter Review

I have adjusted the patch to reduce steps for performance's sake:

- Repleace nsJISx4051LineBreaker when Pango is enabled, instead of tapping
  calls from it, to remove duplicated tasks of scripts analysis in both
  engines. I also believe Pango's capabilities are sophisticated enough for
  the replacement.

- Simply use pango_get_log_attrs(), instead of two-fold pango_itemize() and 
  pango_break(). This reduces one more step, and simplifies logical attributes
  accesses a bit.

- Add GetJISx4051Breaks() method, according to recent change in nsILineBreaker.

With pango 1.16 built with libthai support (to be available soon in Debian
experimental and probably Ubuntu feisty+1), you won't need pango-libthai
to test Thai line breaking any more.

Performance seems acceptable now with this patch.

Attachment #221202 - Attachment is obsolete: true

Axel Hecht [:Pike]

Comment 7

•

17 years ago

roc, how would this relate to your textrun work? If at all, just guessing here.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 8

•

17 years ago

The new textframe code uses only GetJISx4051Breaks from nsILineBreaker (via content/base/src/nsLineBreaker.cpp). You can build it by setting MOZ_ENABLE_NEW_TEXTFRAME = 1 in layout/generic/Makefile.in. (nsILineBreaker's old methods are still used by the content serializer and some editor functions but we could probably easily fix that.) You probably should enable the new textframe and do your work with it because that's what will ship in 1.9.

The new textframe should actually work better for you in the end because it is designed to handle languages like Thai that require unbounded context for linebreaking. Basically nsLineBreaker gets access to the entire paragraph (sometimes less, when the text is broken up by non-text elements such as form elements or images). The current implementation first breaks things up by spaces and then if the text contains CJK characters it calls GetJISx4051Breaks to break up a run of non-spaces. I think we should expand that check in nsLineBreaker to include characters from other languages like Thai that break between non-spaces. Then your patch here should work well. We should probably rename GetJISx4051Breaks though. If this approach works for you then I think we should land your patch soon.

It's a bit unfortunate that we're doing Pango itemization in gfxPangoFontGroup for measurement and rendering --- in many cases this will mean itemization is happening twice for the same string. The problem is that the strings we send down for rendering aren't always the same as the strings you want to use for line breaking. In particular the strings for line breaking can be longer because the strings for rendering stop at font changes and direction changes. If this turns out to be a performance issue for you I think we should probably look at some kind of caching scheme for the results of itemization so if we're lucky enough to see the same string in both places we can reuse the Pango itemization.

Theppitak Karoonboonyanan

Assignee

Comment 9

•

17 years ago

Thanks for the hints. I have tried the new text frame with my Pango patch as you suggested. It works well with my Thai sample text, by a quick hack on IS_CJK_CHAR() in content/base/src/nsLineBreaker.cpp to cover Thai range (We want a better change than this, of course):

Index: nsLineBreaker.cpp
===================================================================
RCS file: /cvsroot/mozilla/content/base/src/nsLineBreaker.cpp,v
retrieving revision 1.3
diff -u -r1.3 nsLineBreaker.cpp
--- nsLineBreaker.cpp   24 Mar 2007 11:07:36 -0000      1.3
+++ nsLineBreaker.cpp   30 Mar 2007 03:50:09 -0000
@@ -60,7 +60,8 @@
 static inline int
 IS_CJK_CHAR(PRUnichar u)
 {
-  return (0x1100 <= u && u <= 0x11ff) ||
+  return (0x0e01 <= u && u <= 0x0e5f) ||
+         (0x1100 <= u && u <= 0x11ff) ||
          (0x2e80 <= u && u <= 0xd7ff) ||
          (0xf900 <= u && u <= 0xfaff) ||
          (0xff00 <= u && u <= 0xffef);

Performance improves a lot!  Judging from the time it takes to render a large off-line sample page, it now feels like openning normal documents with ordinary text editors, compared to a long pause with 100% CPU load in the old text frame. When I previously said the performance was acceptable, I meant average pages. But with new text frame, it applies to large pages as well.

However, I find it hangs on some certain real page, with the console message:

*** glibc detected *** free(): invalid next size (fast): 0x0000000001f175c0 ***

But that may not be related to this bug, as it occurs with or without the Pango patch. It deserves another bug instead. If you like to try, the page in problem is: http://linux.thai.net/planet/ (It's a dynamic page with constant updates, so the bug may depend on when it's viewed.)

Theppitak Karoonboonyanan

Assignee

Comment 10

•

17 years ago

Attached patch Updated patch, fixing a flaw in range checking (obsolete) — Details — Splinter Review

I've found a little flaw in my previous patch about range checking in iterator. It's never triggered the tests so far, but it should be fixed anyway.

Attachment #260003 - Attachment is obsolete: true

Theppitak Karoonboonyanan

Assignee

Comment 11

•

17 years ago

(In reply to comment #10)
> It's never triggered the tests so far, but it should be fixed anyway.

I mean, it's never triggered "in" the tests so far.

Karl Tomlinson (:karlt)

Comment 12

•

17 years ago

It looks like pango_get_log_attrs sets up mLogAttrs with one PangoLogAttr per character, not necessarily one per UTF16 code.  These don't correspond if there are surrogates present in the UTF16.

mAttrsLen can be determined with something like g_utf8_strlen(text, 0) + 1, but matching up indices between arrays is going to be a little more fun.

Bug #378695 is related to this issue.

Theppitak Karoonboonyanan

Assignee

Comment 13

•

17 years ago

Attached patch Updated patch to handle UTF-16 properly (obsolete) — Details — Splinter Review

Pretty fun. I have updated the patch so it iterates UTF-16 strings with UTF16CharEnumerator. Thanks for your comment.

Attachment #260102 - Attachment is obsolete: true

Theppitak Karoonboonyanan

Assignee

Comment 14

•

17 years ago

Comment on attachment 263111 [details] [diff] [review]
Updated patch to handle UTF-16 properly

May I request for a review?

Attachment #263111 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

17 years ago

-  return (0x1100 <= u && u <= 0x11ff) ||
+  return (0x0e01 <= u && u <= 0x0e5f) ||

Could you add some comments indicating which characters these ranges correspond to?

+ * The Initial Developer of the Original Code is
+ * Netscape Communications Corporation.
+ * Portions created by the Initial Developer are Copyright (C) 1998
+ * the Initial Developer. All Rights Reserved.

You should change this :-)

With the new text frame, nsPangoLineBreaker::GetJISx4051Breaks is the only method that really matters. (We should rename it, actually, to just GetBreaks ... if you want to do a patch for that, that'd be great! :-) ) I think you could simplify the code to make Next, Prev and BreakInBetween be implemented in terms of that method. Use nsAutoTArray to allocate buffers so you can avoid heap allocations in the common case, like this:

  nsAutoTArray<PRPackedBool,200> buffer;
  if (!buffer.AppendElements(N))
    return ...; // out of memory
  ... buffer.Elements() ...

If you do that, I don't think you need nsPangoLogAttrList. You can just write the iteration inline since there will only be one iteration.

This code doesn't handle 0 bytes in the string, because pango_break stops at nulls. You need an outer loop over the bytes, like here:
http://lxr.mozilla.org/seamonkey/source/gfx/thebes/src/gfxPangoFonts.cpp#1056
The same issue probably arises with g_utf16_to_utf8 etc. Maybe use NS_ConvertUTF16toUTF8 instead.

+  for (nsPangoLogAttrList::Iterator i = logAttrs[0]; *i; ++i)
+  {
+    const PRUnichar* q = p;
+    UTF16CharEnumerator::NextChar(&p, aText + aLen, &err);
+    if (err)
+      break;
+    while (q < p - 1)
+      aBreakBefore[q++ - aText] = PR_FALSE;
+    aBreakBefore[q - aText] = (*i)->is_line_break;
+  }

I think there's a better way to write this loop. Treat it as a while loop where you keep an index into the logAttrs array and a index to the current PRUnichar. You just need to increment the current PRUnichar index an extra time when you see a low surrogate. You don't need to use UTF16CharEnumerator. Also I think this code is putting the line break opportunity in the middle of the surrogate pair, but it should go before the surrogate pair.

+  pango_get_log_attrs(text, -1, -1, pango_context_get_language(pPangoContext),

So this is basically using the default language, right? Would there be any benefit to passing more accurate language information? We do have some and we could change the API to pass it in.

Theppitak Karoonboonyanan

Assignee

Comment 16

•

17 years ago

Attached patch Patch with rearranged code (obsolete) — Details — Splinter Review

Thank you very much for your instructive comments. In this patch, I have tried to follow all, except for the method renaming and the last question regarding Pango language parameter. Yes, source looks much simpler so far. :-)

Attachment #263111 - Attachment is obsolete: true

Attachment #263111 - Flags: review?(roc)

Theppitak Karoonboonyanan

Assignee

Comment 17

•

17 years ago

(In reply to comment #15)

> With the new text frame, nsPangoLineBreaker::GetJISx4051Breaks is the only
> method that really matters. (We should rename it, actually, to just GetBreaks
> ... if you want to do a patch for that, that'd be great! :-) )

That would mean patching many other sources. Maybe, it's better done in another patch?

> +  pango_get_log_attrs(text, -1, -1, pango_context_get_language(pPangoContext),
> 
> So this is basically using the default language, right? Would there be any
> benefit to passing more accurate language information? We do have some and we
> could change the API to pass it in.

That should help reduce unnecessary map building. However, this reminds me of using pango_language_get_default() and getting rid of dependency on GDK. I'll adjust the patch soon, to get rid of unnecessary CXXFLAGS and LDFLAGS in some places. And when the API is changed, the call can then be immediately replaced with the passed in info.

Theppitak Karoonboonyanan

Assignee

Comment 18

•

17 years ago

Attached patch Patch with GDK dependency removed (obsolete) — Details — Splinter Review

Attachment #265483 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 19

•

17 years ago

> Maybe, it's better done in another patch?

Yes, definitely.

+  if (!breakState.AppendElements(aLen + 1))

This doesn't need to be + 1 ... we won't fill in more than aLen elements

+    if (breakState[++aPos])

... and this tests breakState[aLen], but it shouldn't

+      if (NS_IS_LOW_SURROGATE(aText[u16Offset++]))
+        aBreakBefore[u16Offset++] = PR_FALSE;

I'd prefer the updates to u16Offset to be done in their own statements ... it's clearer.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 20

•

17 years ago

Masayuki, how do you feel about Pango's line breaking for CJK?

Theppitak Karoonboonyanan

Assignee

Comment 21

•

17 years ago

(In reply to comment #19)
> +  if (!breakState.AppendElements(aLen + 1))
> 
> This doesn't need to be + 1 ... we won't fill in more than aLen elements

So, we won't cover the case when the last character is a word end, which can be wrapped after? That's why Pango needs len + 1 elements of PangoLogAttr. Well, but if we have different convention here, I'll fix it.

> +    if (breakState[++aPos])
> 
> ... and this tests breakState[aLen], but it shouldn't

This is related to above point. Same question.

> +      if (NS_IS_LOW_SURROGATE(aText[u16Offset++]))
> +        aBreakBefore[u16Offset++] = PR_FALSE;
> 
> I'd prefer the updates to u16Offset to be done in their own statements ... it's
> clearer.

Fine. I can break it into series of statements in the next patch.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

17 years ago

> So, we won't cover the case when the last character is a word end, which can be
> wrapped after?

GetJIS4051Breaks only fills in aLen elements. Also, nsLineBreaker doesn't look at the first element. GetJIS4051Breaks is only supposed to compute breaks *inside* the whitespace-delimited "word". Breaks at the ends will be handled by nsLineBreaker. (There almost always will be break opportunities before and after the "word", except when the word is next to the start or end of a paragraph, or a replaced element such as an image.)

When Next() gets to the end of the text, I don't think we should return "yes, you can break after this text" ... we don't know what the next character is, it could be ZWNJ, right? We should always return NS_LINEBREAKER_NEED_MORE_TEXT.

Theppitak Karoonboonyanan

Assignee

Comment 23

•

17 years ago

Attached patch Patch with no access to extra break pos — Details — Splinter Review

OK. So, this is the updated patch.

Attachment #265490 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 24

•

17 years ago

Comment on attachment 265499 [details] [diff] [review]
Patch with no access to extra break pos

great! I think we should go ahead.

Attachment #265499 - Flags: superreview+

Attachment #265499 - Flags: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 25

•

17 years ago

Checked in! Thanks very much!!!

I'll close this bug. If you (or someone else) wants to do Uniscribe or ATSUI in a similar way, that can be a separate bug.

Status: NEW → RESOLVED

Closed: 17 years ago

Resolution: --- → FIXED

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

17 years ago

Reopening. I backed the patch out because it busted Tinderbox. pango_language_get_default was introduced in Pango 1.16 which is too new for us.

Should we go for the GDK dependency or is there another way to get a good default language?

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Theppitak Karoonboonyanan

Assignee

Comment 27

•

17 years ago

Can we just get current locale and convert it with pango_language_from_string()?

Is there a safe API in Mozilla to get current locale?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 28

•

17 years ago

We don't want page rendering to depend on the user's locale settings.

We have language information in the page, but language can vary across characters, even characters in the same word.

Maybe we could use pango_language_from_string("en"), if that does the right thing? Or maybe just pass in an empty string "" if that gives us a usable language?

Theppitak Karoonboonyanan

Assignee

Comment 29

•

17 years ago

I think the language information in the page is closest to what we need. pango_get_log_attr() does analyze script/language for every character, too. The passed-in parameter is just the default value.

Yes, we can temporarily use pango_language_from_string("en") to fix tinderbox, and then adjust (and rename) the API to get that info later.

Theppitak Karoonboonyanan

Assignee

Comment 30

•

17 years ago

Attached patch Patch for old Pango — Details — Splinter Review

Re-submit the patch, with pango_language_get_default() replaced.

Axel Hecht [:Pike]

Comment 31

•

17 years ago

I'm with roc here, we really shouldn't make our rendering depend on the user settings. That would sound like a regional hack to our rendering behaviour, and as such, something we should really only do if we have a well-known benefit.

Page language would be closer, but still put us into the area where we can't reliably test what layout other users are going to see for very similar pages. And I have met people doing studies on the decleration habits of lang="", most pages don't have it, and of those pages that do, most are wrong. So it's not going to be a constructive hint in the first place.

As a side question, should we cache the value of pango_get_language_from_string("en") outside the tight loop?

Theppitak Karoonboonyanan

Assignee

Comment 32

•

17 years ago

(In reply to comment #31)
> I'm with roc here, we really shouldn't make our rendering depend on the user
> settings. That would sound like a regional hack to our rendering behaviour, and
> as such, something we should really only do if we have a well-known benefit.

I'm not against roc regarding the locale dependency issue, either. The use of locale was just from my first thought to write some code as pango_language_get_default() replacement. (Internally, it determines language from system's LC_CTYPE.)

Basing on locale is just a heuristic. For example, chance is that user will open web pages of his/her own language, in case the script can be used to write more than one language. But that's just a guess after all, in case we have no other choice.

> Page language would be closer, but still put us into the area where we can't
> reliably test what layout other users are going to see for very similar pages.
> And I have met people doing studies on the decleration habits of lang="", most
> pages don't have it, and of those pages that do, most are wrong. So it's not
> going to be a constructive hint in the first place.

Although it's the expected behavior? When the page author finds the page is line-wrapped improperly, a natural guess is to set the language, just like usual experience from using word processors, isn't it? I think it would be a surprise to get no difference when setting the right language to the page.

> As a side question, should we cache the value of
> pango_get_language_from_string("en") outside the tight loop?

Umm.. It may depend on the decision to depend on the page language. If we do, that caching would be removed soon anyway. Besides, I think pango_language_from_string() also does some caching already.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 33

•

17 years ago

We might as well make pango_language_from_string get cached in a static. So,

  static PangoLanguage* language = pango_language_from_string("en");

With that change, let's reland the patch.

The problem with using the page language is that every DOM element can have a different language. So the text passed into the linebreaker can be composed of different languages. Does Pango have a way to handle that?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 34

•

17 years ago

I hope the language we choose here is not very important ... for the CJK and Thai cases where Pango is being invoked, I expect Pango to normally just ignore the language and figure things out based on the Unicode character ranges in the text. We shouldn't add the complexity of passing down DOM language information unless we have a testcase where we actually need it.

Theppitak Karoonboonyanan

Assignee

Comment 35

•

17 years ago

Before we really add the cache, let me remind again that pango_language_from_string() also caches this with a hash table.

Is it worth doing the cache again? Probably, because "en" is a very short string. Passing through hashing may not help much.

Regarding Pango's processing of languages/scripts in strings, it's like this, from my understanding:

The passed-in language determines a map of engines. The script characterized by Unicode character range then determines the engine to load from the language's map. Note the distinction between language and script here. German and French can have different maps, although they are both written with Latin script, for example. Then, for scripts that are not in the language map, there is another "default language" map for each script.

So, the difference between passing a fixed language "en" and the value from the real context here is that only "default language" for each character will be assumed in the former case, while "language-specific" engines will be invoked in the latter.

Yes, for CJK and Thai, that difference doesn't matter. Because Thai script is only used to write Thai language, and CJK scripts, despite some characters sharing among different languages, seem to have common rules as reflected in JISx4051. Just to keep in mind that our decision here is based on some assumptions.

Theppitak Karoonboonyanan

Assignee

Comment 36

•

17 years ago

Attached patch Patch with PangoLanguage caching — Details — Splinter Review

Updated patch, with PangoLanguage caching added.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 37

•

17 years ago

+  static PangoLanguage* language = nsnull;
+  if (nsnull == language)
+    language = pango_language_from_string("en");

You can just write
  static PangoLanguage* language = pango_language_from_string("en");

It will only happen once.

Axel Hecht [:Pike]

Comment 38

•

17 years ago

To get my thinking right here, does pango support syllabification? Hopefully picking the right of the two  translations of Silbentrennung. What I mean is, breaking the line in the middle of a word? Then I'd get the importance of feeding in the right language.

That doesn't change the sad fact that most authors wouldn't think about adding language to their pages in the first place, or even fix their CMS to not send lang="en". :-(

Anyway, we don't support syllabification at all, so that's not our problem. The lack of language declaration is probably a good argument to not start supporting it, too.

Theppitak Karoonboonyanan

Assignee

Comment 39

•

17 years ago

(In reply to comment #37)
> +  static PangoLanguage* language = nsnull;
> +  if (nsnull == language)
> +    language = pango_language_from_string("en");
> 
> You can just write
>   static PangoLanguage* language = pango_language_from_string("en");
> 
> It will only happen once.

Err.. I just thought of portability when writing. I have a habit of not initializing static variables with non-load-time data.

Theppitak Karoonboonyanan

Assignee

Comment 40

•

17 years ago

(In reply to comment #38)
> To get my thinking right here, does pango support syllabification? Hopefully
> picking the right of the two  translations of Silbentrennung. What I mean is,
> breaking the line in the middle of a word? Then I'd get the importance of
> feeding in the right language.

My argument does not apply to European languages, of course. The German and French case was just for illustration's sake. There can be many combinations of languages and scripts in other regions of the world. Pango's design seems to address such conditions.

> That doesn't change the sad fact that most authors wouldn't think about adding
> language to their pages in the first place, or even fix their CMS to not send
> lang="en". :-(
> 
> Anyway, we don't support syllabification at all, so that's not our problem. The
> lack of language declaration is probably a good argument to not start
> supporting it, too.

Right. Our current assumption is valid at some degree. What I mean to say is that there can be cases still unknown to us regarding languages of the world.

Theppitak Karoonboonyanan

Assignee

Comment 41

•

17 years ago

Attached patch Patch initializing static var directly — Details — Splinter Review

All right. I give up my portability concern.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

17 years ago

Attachment #265807 - Flags: superreview+

Attachment #265807 - Flags: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 42

•

17 years ago

I'm not aware of any system we build on where this was ever a problem. I vaguely recall some system in the past may have had locking problems in a multithreaded context but this will only be called on the main thread anyway.

Theppitak Karoonboonyanan

Assignee

Comment 43

•

17 years ago

My concern may apply to old systems, where compilers/loaders are primitive. Some compiler may even reject such initialization. To me, the if-block pattern of initialization is commonplace in other projects' codes. And I think it's safe to do so. Besides portability issue, performance also counts. The extra comparison with zero is not so expensive, while lazy initializations in many places can sum up to reduced load time, as unnecessary initializations are not done. So, it has become my habit.

But as the call is not so significant here, I'm OK with either way. Please go on.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 44

•

17 years ago

checked in again.

Status: REOPENED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → FIXED

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 45

•

17 years ago

> Some compiler may even reject such initialization.

The C++ standard requires this to work, so those compilers would be badly broken.

> lazy initializations in many places can sum up to reduced load time

This is lazy initialization. The C++ standard says that the initialization happens the first time the program reaches the line where the static variable is declared.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 46

•

17 years ago

Backed out. This broke several tinderbox tests, perhaps because we haven't switched over to new-textframe yet...

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 47

•

17 years ago

Looks like the nsTextTransformer self-tests were failing on fxdbug-linux-tbox:

 ###!!! ASSERTION: Illegal value (length > position): 'aLen > aPos', file /builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/intl/lwbrk/src/nsPangoLineBreaker.cpp, line 102
  ###!!! ASSERTION: Illegal value (length > position): 'aLen > aPos', file /builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/intl/lwbrk/src/nsPangoLineBreaker.cpp, line 102
  ###!!! ASSERTION: Illegal value (length > position): 'aLen > aPos', file /builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/intl/lwbrk/src/nsPangoLineBreaker.cpp, line 102
  nsTextTransformer: self test 6 failed
  nsTextTransformer: self test 7 failed
  ###!!! ASSERTION: Illegal value (length > position): 'aLen > aPos', file /builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/intl/lwbrk/src/nsPangoLineBreaker.cpp, line 102
  ###!!! ASSERTION: Illegal value (length > position): 'aLen > aPos', file /builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/intl/lwbrk/src/nsPangoLineBreaker.cpp, line 102
  ###!!! ASSERTION: Illegal value (length > position): 'aLen > aPos', file /builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/intl/lwbrk/src/nsPangoLineBreaker.cpp, line 102
  nsTextTransformer: self test 8 failed
  nsTextTransformer: self test 9 failed
  nsTextTransformer: self test 10 failed
  nsTextTransformer: self test 11 failed
  nsTextTransformer: self test 12 failed
  nsTextTransformer: self test 13 failed
  nsTextTransformer: self test 14 failed
  ###!!! ABORT: file /builds/tinderbox/Fx-Trunk-Debug/Linux_2.6.9-42.ELsmp_Depend/mozilla/layout/generic/nsTextTransformer.cpp, line 1794

Theppitak Karoonboonyanan

Assignee

Comment 48

•

17 years ago

Line 102 is in nsPangoLineBreaker::Prev(). Probably, aLen == aPos are passed? Does changing the assertion to aLen >= aPos help?

Moreover, I find an excessive assertion (aLen > aPos) in nsPangoLineBreaker::GetJISx4051Breaks(). That should be removed as well.

Theppitak Karoonboonyanan

Assignee

Comment 49

•

17 years ago

Attached patch Patch with modified assertions — Details — Splinter Review

It should be valid for Prev() to pass aPos == aLen. And aPos is undefined in GetJISx4051Breaks().

Also, I wonder if Pango's is_line_break logical attribute matches what the self tests assumes.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 50

•

17 years ago

If you make a debug build with the old textframe, then run it with a breakpoint in the nsTextTransformer self tests, you should be able to verify that they run and pass.

Theppitak Karoonboonyanan

Assignee

Comment 51

•

17 years ago

For test 6, with text = "Œnce xy\tzy",
 - Pango's is_line_break gives "Œnce |xy\t|zy"
 - the self test expects "Œnce| |xy|\t|zy"
where '|' represents line break position. Is that really expected for line breaking?

Theppitak Karoonboonyanan

Assignee

Comment 52

•

17 years ago

Attached patch Patch with line break condition adjusted (obsolete) — Details — Splinter Review

If so, the fix could be simply replacing "is_line_break" with "is_line_break || is_word_end".

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 53

•

17 years ago

The Pango behaviour is closer to what we expect with the new textframe. If test 6 is the only test that fails, we should just comment it out. This code will not be used after we turn on the new textframe anyway.

Theppitak Karoonboonyanan

Assignee

Comment 54

•

17 years ago

Not only test 6, the adjustment quiets all failed tests (6-14).

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 55

•

17 years ago

Isn't that adjustment wrong, though? Isn't is_line_break exactly what we want?

Theppitak Karoonboonyanan

Assignee

Comment 56

•

17 years ago

I think so. I'm just not sure about difference in conventions between Pango and Mozilla codes here. If Pango's result is preferred, let's obsolete my last patch and change the self tests instead.

Theppitak Karoonboonyanan

Assignee

Comment 57

•

17 years ago

Do you think it's still worth fixing the expected values of the self tests? Or would that cause failure on non-pango builds anyway, and the tests should just be commented out?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 58

•

17 years ago

I think we should just not check this in until new textframe is turned on and nsTextTransformer is removed. That should be soon.

Theppitak Karoonboonyanan

Assignee

Comment 59

•

17 years ago

Comment on attachment 265957 [details] [diff] [review]
Patch with line break condition adjusted

Obsolete the patch.

Attachment #265957 - Attachment is obsolete: true

Jean-Marc Desperrier

Comment 60

•

17 years ago

IS_CJK_CHAR really should be renamed to something like IS_COMPLEX_LINEBREAK_CHAR. A misleading function name will lead to problems sooner or later.

Arthit Suriyawongkul

Comment 61

•

17 years ago

(In reply to comment #60)
> IS_CJK_CHAR really should be renamed to something like
> IS_COMPLEX_LINEBREAK_CHAR. A misleading function name will lead to problems
> sooner or later.

agreed.

Theppitak Karoonboonyanan

Assignee

Comment 62

•

17 years ago

(In reply to comment #60)
> IS_CJK_CHAR really should be renamed to something like
> IS_COMPLEX_LINEBREAK_CHAR. A misleading function name will lead to problems
> sooner or later.

That would also mean renaming nsLineBreaker::mCurrentWordContainsCJK as well as some local variables, or it would leave doubts to code readers how come they are related.

But mCurrentWordContainsComplexLinebreakChar is a very long name. Would there be a meaningful shorter name?

Theppitak Karoonboonyanan

Assignee

Comment 63

•

17 years ago

Attached patch Summarized changes to current trunk (obsolete) — Details — Splinter Review

Functions and variables renaming; assertion fixes; initial PangoLanguage.

Theppitak Karoonboonyanan

Assignee

Comment 64

•

17 years ago

Attached patch Summarized changes to current trunk (obsolete) — Details — Splinter Review

Oops. I forgot other backed out parts.

Attachment #267096 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 65

•

17 years ago

I would just call it mCurrentWordIsComplex and IS_COMPLEX_CHAR. Throw in some comments to explain what "complex" means in this context.

Theppitak Karoonboonyanan

Assignee

Comment 66

•

17 years ago

Attached patch Updated patch — Details — Splinter Review

I see you've already changed that in trunk. So, I update the summarized patch against trunk accordingly.

Attachment #267098 - Attachment is obsolete: true

Theppitak Karoonboonyanan

Assignee

Comment 67

•

17 years ago

Comment on attachment 268080 [details] [diff] [review]
Updated patch

Now that the new text layout has been enabled in trunk for a while, I hope it's time to check-in.

Attachment #268080 - Flags: review?(roc)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 68

•

17 years ago

now I'm working on bug 255990 which breaks an URL to multi words.

This patch can break the URL to several words? If so, we should not use this approach. Because there are no standard specs for line breaking. Therefore, we should care for compatibility with other browsers (especially WinIE).

And the current "our" line breaking spac is not JIS X 4051. based on it, but we customized the character classes.

Therefore, this patch breaks compatibility with Linux and others. It is not good for us, it will make to confuse the bug trackers.

I think that the native line breaker should be only used for poor part like Thai.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 69

•

17 years ago

And note that we need to handle the non-natural languages for line breaking. e.g., URL, file path and fragment of code. We cannot use JIS X 4051 and UAX#14 directly.

Theppitak Karoonboonyanan

Assignee

Comment 70

•

17 years ago

(In reply to comment #68)
> now I'm working on bug 255990 which breaks an URL to multi words.
> 
> This patch can break the URL to several words? If so, we should not use this
> approach. Because there are no standard specs for line breaking. Therefore, we
> should care for compatibility with other browsers (especially WinIE).

Breaking URL requires your patch in bug 255990 so nsPangoLineBreaker gets called for the long continuous Latin chunk. After that, Pango's line breaker will break the URL based on UAX #14.

> And the current "our" line breaking spac is not JIS X 4051. based on it, but we
> customized the character classes.
> 
> Therefore, this patch breaks compatibility with Linux and others. It is not
> good for us, it will make to confuse the bug trackers.
> 
> I think that the native line breaker should be only used for poor part like
> Thai.

Fine. I wasn't aware of your attempt to enhance nsJISx4051LineBreaker when filing this bug. For me, Pango's line breaking is acceptable, as it has UAX #14 as its base, and I assumed its coverage can replace nsJISx4051LineBreaker functionality.

Well, if compatibility really matters here for line breaking behavior, and if we have enough time before FF3, your approach may be the right thing to do. With this, I have some issues:

A. Customization of some Thai characters needs to be added in bug 255990. (Some Thai characters prohibit line breaks before them, even with leading space, for example.)

B. Only call Thai line break routine for Thai chunks. This can be done either via platform's line breakers (Pango, Uniscribe, ATSUI), or by calling Thai library (libthai) directly. (Currently, it provides implementation for Thai line breaking in Pango.) For the latter choice, LibThai source is available under LGPL. (http://libthai.sf.net/) So, it's possible to compile it on other platforms than Linux as well. This could bypass Pango analysis layer and gain a little performance. But build process would need to be rearranged somehow to achieve that.

What do you think?

(For me, it means getting back to bug 7969, with bug 255990 combined.)

Theppitak Karoonboonyanan

Assignee

Comment 71

•

17 years ago

Note, however, that Thai line breaking is very much in need. You can see how long we have been waiting for bug 7969 to be fixed. This single bug has hindered all Firefox promotions in Thailand for years. So, if extending nsJISx4051LineBreaker takes too long, may I request for the Pango patch to be chosen for FF3?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 72

•

17 years ago

I think that UAX #14 may not be best choice for web page rendering. Because it is not specified in CSS specs that UAs should use UAX #14. E.g., The normal paragraphs which has enough width for the content can be rendered as naturally for many people if we use UAX #14. But HTML has table which cell's width is auto in most times. If the many characters can break the line, there are too narrow cells than WinIE. For inhibiting this issue, we need near behavior of IE. So, it's not UAX #14 even if WinIE uses UAX #14 :-(

However, I think that the usability of Gecko should be preferred than compatibility for WinIE. Therefore, I agree that nsJISx4051LineBreaker uses native libraries in several language. (And I hope to land it before Fx3 too :-))

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 73

•

17 years ago

Here's an idea for how to handle this:
-- define a set of "difficult" characters (let's just say Thai for now)
-- in nsJISx4051LineBreaker::GetJISx4051Breaks, if the string contains any difficult characters, then call the platform API (say Pango) on the whole string and use the platform breakability values to determine whether there is a break opportunity before or after each "difficult" character. The break results for positions not adjacent to a "difficult" character remain controlled by the built-in breaking code.

How does that sound? Will it work well for Thai?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 74

•

17 years ago

I think that it is a best solution for now.

Now, I fixed bug 255990. It breaks |nsPangoLineBreaker|, sorry. (I added a method to the |nsILineBreaker|, but I didn't change |nsPangoLineBreaker|.

If we use comment 73's solution, do we need to use nsPangoLineBreaker? Should we implement more simple modules for it? But nsPangoLineBreaker may be needed by some users who build themselves. (e.g., they may want more performance than compatibility with WinIE.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 75

•

17 years ago

I don't think we need to support using Pango for everything. If someone *really* wants to do that they can patch their own build however they like.

What I suggest is to create a native platform linebreaker interface in gfxPlatform.h (because we already use Pango etc in gfx). It really only needs one method, taking a Unicode string and an array of booleans and filling in the array. Then we can call that from intl's linebreaker when difficult characters are detected.

Theppitak Karoonboonyanan

Assignee

Comment 76

•

17 years ago

So, it seems line breaking compatibility inside Thai chunks is not an issue here, when it's handled by platform breakers. (I don't think it matters, either.)

If so, let's go on with that scheme.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 77

•

17 years ago

Great! Can you write it? :-)

Theppitak Karoonboonyanan

Assignee

Comment 78

•

17 years ago

I now have a first version, which does not build, because intl is built in lower tier (necko) than gfx (gecko). So, gfx headers are not available yet when building intl. Would you have some suggestion? Can it be moved?

Theppitak Karoonboonyanan

Assignee

Comment 79

•

17 years ago

Attached patch First untested patch for nsJISx4501LineBreaker-gfx — Details — Splinter Review

OK. I try moving intl from necko to gecko, and it builds. This is a first, untested patch. I can't test it now because current trunk build segfaults on load.

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 80

•

17 years ago

I hope that you use -u8p for readability of the patch :-)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 81

•

17 years ago

I don't think we can move intl from necko to gecko. Sorry.

How about making the platform linebreaker interface its own C++ interface, XPCOM compatible by inheriting from nsISupports, put the interface header file in xpcom/system, and make the implementation an XPCOM service?

Theppitak Karoonboonyanan

Assignee

Comment 82

•

17 years ago

(In reply to comment #80)
> I hope that you use -u8p for readability of the patch :-)

OK. I'll do that next time. Thanks.
 
(In reply to comment #81)
> I don't think we can move intl from necko to gecko. Sorry.
> 
> How about making the platform linebreaker interface its own C++ interface,
> XPCOM compatible by inheriting from nsISupports, put the interface header file
> in xpcom/system, and make the implementation an XPCOM service?

And put the platform implementations in, say, intl/lwbrk/components?

Theppitak Karoonboonyanan

Assignee

Comment 83

•

17 years ago

(In reply to comment #82)

> And put the platform implementations in, say, intl/lwbrk/components?

Probably, it's better built in toolkit tier under, say, toolkit/system, rather than hacked to get back into intl/lwbrk/components again after gecko tier. I'll prepare another patch based on this.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 84

•

17 years ago

How about putting the component in widget or gfx, where we already have platform specific code? In gfx we already use Pango so that's probably not a bad idea.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 85

•

17 years ago

Actually no. All this messing around is not worth it. Just put the Pango code in its own file in intl/lwbrk/src and use ifdef MOZ_ENABLE_PANGO to control whether it gets built or whether a stub implementation that does nothing gets used instead.

Theppitak Karoonboonyanan

Assignee

Comment 86

•

17 years ago

Yeah, I was about to ask about this option. Besides code complexity, I worried about the chance of missing Pango component in some installation even if Pango is enabled. Performance implication of too many abstractions in tight loop is also another issue.

So, let's get back.

Theppitak Karoonboonyanan

Assignee

Comment 87

•

17 years ago

Umm.. OTOH, while working on bug 7969, I also used to have a wish that there had been some pluggable interface for Thai line breakers. Samphan proposed in bug 7969 comment 41 to replace, rather than extend, nsJISx4501LineBreaker with his engine in an extension just because it lacked such generic channel. My initial patch in this bug was also based on this reason.

That may be one supporting reason, if any, for componentization, as there are many Thai line break engines available, in addition to platform breakers.

Not to push this approach, though. Just FYI.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 88

•

17 years ago

That plugging could happen at the Pango level, couldn't it?

Theppitak Karoonboonyanan

Assignee

Comment 89

•

17 years ago

Yes, if Pango is intalled on *all* platforms.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 90

•

17 years ago

Oh, I thought you were talking about plugging in multiple Thai line breakers that are already available on the platform.

Why would we ever want to ship multiple Thai line breakers ourselves?

Theppitak Karoonboonyanan

Assignee

Comment 91

•

17 years ago

No, it's not about shipping it ourselves. I mean extensibility with extensions. In fact, there have been at least 3-4 custom solutions available while waiting for bug 7969. Most Thai users do not use mozilla.org-released Firefox as is, nor binary packages provided by distros [with Ubuntu as an exception], as the lack of proper line breaking rendered it unusable. And the need was so high that many developers came up with their own solutions on Linux, Windows, and Mac. Some users may still prefer their own favorite one even if Pango support is shipped with standard mozilla.org release, esp. for non-Unix platforms.

I cannot guess what would happen after Pango (or Uniscribe/ATSUI) support is included upstream. Some may not like the quality it provides, esp. when they cannot do anything with it for the cases of Uniscribe/ATSUI. Some may try to push their solutions upstream after that, or they may just continue overriding nsJISx4501LineBreaker, who knows? And if it's the case, extensibility for Thai line breaker can minimize the incompatibility.

But I can't speak for them, anyway. So I said I didn't try to push it. We can go on with hard-wiring and wait for further reports.

Theppitak Karoonboonyanan

Assignee

Comment 92

•

17 years ago

Attached patch untested patch using make flag — Details — Splinter Review

This patch uses MOZ_ENABLE_PANGO as make flag to choose Thai line breaker implementation. It falls back to rulebrk.c for non-Pango builds.

Patch still untested, because trunk default build still segfaults on start up.

Theppitak Karoonboonyanan

Assignee

Comment 93

•

17 years ago

masayuki, note on JISx4051 table: I think it doesn't make sense to have Thai characters breakable with all classes. For example, we shouldn't break between Thai character and quotation mark, exclamation mark, question mark, and parentheses. It makes more sense to treat it like Latin by default. (This includes no breaking between Thai characters, too. So it still makes some sense even without the "special handling".)

In above patch, I modified the loops with the Thai breakability assumption removed.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 94

•

17 years ago

> trunk default build still segfaults on start up.

It does? What bug is this?

Theppitak Karoonboonyanan

Assignee

Comment 95

•

17 years ago

On my Debian sid amd64 box:

$ dist/bin/firefox
dist/bin/run-mozilla.sh: line 131: 18712 Segmentation fault      "$prog" ${1+"$@"}

Normally, I "make install" it before running, but this time, it complains:

/home/thep/cvs/mozilla_cvs/mozilla/config/nsinstall -t -m 644 xrecore.h nsXULAppAPI.h /home/foxy/include/firefox-3.0a7pre/xulapp
/home/thep/cvs/mozilla_cvs/mozilla/config/nsinstall -t -m 644 _xpidlgen/xulapp.xpt /home/foxy/lib/firefox-3.0a7pre/components
/home/thep/cvs/mozilla_cvs/mozilla/config/nsinstall -t -m 644 nsINativeAppSupport.idl nsIXULRuntime.idl /home/foxy/share/idl/firefox-3.0a7pre
/home/thep/cvs/mozilla_cvs/mozilla/config/nsinstall -t -m 644 _xpidlgen/nsINativeAppSupport.h _xpidlgen/nsIXULRuntime.h /home/foxy/include/firefox-3.0a7pre/xulapp
/home/thep/cvs/mozilla_cvs/mozilla/config/nsinstall -R -m 644  /home/foxy/lib/firefox-3.0a7pre
usage: /home/thep/cvs/mozilla_cvs/mozilla/config/nsinstall [-C cwd] [-L linkprefix] [-m mode] [-o owner] [-g group]
                                                           [-DdltR] file [file ...] directory
make[3]: *** [install] Error 2
make[3]: Leaving directory `/home/thep/cvs/mozilla_cvs/mozilla/toolkit/xre'
make[2]: *** [install] Error 2
make[2]: Leaving directory `/home/thep/cvs/mozilla_cvs/mozilla/toolkit'
make[1]: *** [install] Error 2
make[1]: Leaving directory `/home/thep/cvs/mozilla_cvs/mozilla'
make: *** [install] Error 2

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 96

•

17 years ago

Can you get a stack trace for that crash? Or a regression range? You should file a bug in any case.

Theppitak Karoonboonyanan

Assignee

Comment 97

•

17 years ago

Oddly enough, when I use --enable-debug to get more info, it doesn't crash! But the "make install" problem remains.

Anyway, I can now run it from the uninstalled copy. And the patch works.

Theppitak Karoonboonyanan

Assignee

Comment 98

•

17 years ago

Attached patch updated patch using make flag, modifying jis x4051 table (obsolete) — Details — Splinter Review

In this patch, I also modify JIS x4051 pair table for Thai, so that it's not broken before closing parenthesis and after openning parenthesis, etc. With this, Thai punctuation marks and digits are also classified in separation from alphabets.

Theppitak Karoonboonyanan

Assignee

Comment 99

•

17 years ago

(In reply to comment #97)
> Oddly enough, when I use --enable-debug to get more info, it doesn't crash! But
> the "make install" problem remains.

To be sure, I just "make clean" and rebuild everything again. It now doesn't crash with or without --enable-debug. So, there must have been something wrong in my working copy in former builds.

But the "make install" problem should be real. I'll file a bug soon.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 100

•

17 years ago

"make install" is hardly ever used, I'm not surprised it doesn't work.

There's no point in keeping Next, Prev and BreakInBetween working with their separate implementations. BreakInBetween is never used anymore, just delete it from the interface (and rev the UUID). I suggest you replace Prev and Next with implementations that just select the space-delimited word containing the requested aPos, call GetJISx4051Breaks on that word, and compute their results from the returned boolean array. These are only called by stuff like content serialization now, where performance is not an issue as long as it's still O(N). At some point we should get rid of the callers to Prev and Next, and have them use nsLineBreaker.h instead, but that's probably more work than you want to do right now.

-     c = CLASS_THAI;
+     if (0x0E50 <= u && u <= 0x0E59) {
+       c = 6;  // Thai digits
+     } else {
+       switch (u)
+       {
+         case 0x0E3F: // Thai Currency Symbol Baht
+	   c = 0; break;
+         case 0x0E2F: // Thai Paiyan Noi
+         case 0x0E46: // Thai Mai Yamok
+         case 0x0E5A: // Thai Angkhan Khu
+         case 0x0E5B: // Thai Khomut
+           c = 1; break;
+         case 0x0E4F: // Thai Fongman
+           c = 8; break;

Could this be done better using GETCLASSFROMTABLE and a new table?

+      breakState.AppendElements(end - cur);

Check for OOM here.

+nsGetComplexLineBreaks(const PRUnichar* aText, PRUint32 aLength,
+                       PRPackedBool* aBreakBefore);

Call this NS_GetComplexLineBreaks, otherwise it looks like a type to me.

+      while (end < aLength && CLASS_THAI == GetClass(aChars[end]))
+        ++end;
+      for (PRUint32 i = cur; i < end; ++i)
+        aBreakBefore[i] = breakState[i - cur];

I think we generally try to use {} around any statement-blocks that aren't break, continue or return.

+      cur = end;

I think this is wrong. 'cur' gets incremented after this and aBreakBefore[end] won't be set. You probably want to set cur to end - 1 instead.

I think this code sets aBreakBefore[cur] to always be false, right? Should aBreakBefore[cur] and aBreakBefore[end] be set to the class-based break values, or be taken from the complex line breaker? If you want to use the complex line breaker for those values, then I think you should pass the whole string to the complex line breaker so it has all the context it might need. In this case you'll want to make sure we only call the complex breaker once.

Instead of making this stuff Thai-specific, should we rename CLASS_THAI to CLASS_COMPLEX and similarly for other names? Aren't there other languages that need similar processing, like Khmer and Lao? (We don't have to support those now but if we can make it easy to support them later without doing much extra work, that'd be nice.)

Theppitak Karoonboonyanan

Assignee

Comment 101

•

17 years ago

(In reply to comment #100)

> There's no point in keeping Next, Prev and BreakInBetween working with their
> separate implementations. BreakInBetween is never used anymore, just delete it
> from the interface (and rev the UUID).

I'll do so in next patch.

> I suggest you replace Prev and Next with
> implementations that just select the space-delimited word containing the
> requested aPos, call GetJISx4051Breaks on that word, and compute their results
> from the returned boolean array. These are only called by stuff like content
> serialization now, where performance is not an issue as long as it's still
> O(N). At some point we should get rid of the callers to Prev and Next, and have
> them use nsLineBreaker.h instead, but that's probably more work than you want
> to do right now.

Oh, yes. I tried to be less intrusive in last patch. But the change to Prev() and Next() you suggest is logical.

> -     c = CLASS_THAI;
> +     if (0x0E50 <= u && u <= 0x0E59) {
> +       c = 6;  // Thai digits
> +     } else {
> +       switch (u)
> +       {
> +         case 0x0E3F: // Thai Currency Symbol Baht
> +          c = 0; break;
> +         case 0x0E2F: // Thai Paiyan Noi
> +         case 0x0E46: // Thai Mai Yamok
> +         case 0x0E5A: // Thai Angkhan Khu
> +         case 0x0E5B: // Thai Khomut
> +           c = 1; break;
> +         case 0x0E4F: // Thai Fongman
> +           c = 8; break;
> 
> Could this be done better using GETCLASSFROMTABLE and a new table?

This would involve changing intl/lwbrk/tools/jisx4501class.txt, right? I'll try that.

> +      breakState.AppendElements(end - cur);
> 
> Check for OOM here.

OK. Probably, it should fallback to somthing logical rather than just return from the function? (It's void function, and no way to return the status.)

> +nsGetComplexLineBreaks(const PRUnichar* aText, PRUint32 aLength,
> +                       PRPackedBool* aBreakBefore);
> 
> Call this NS_GetComplexLineBreaks, otherwise it looks like a type to me.

OK.

> +      while (end < aLength && CLASS_THAI == GetClass(aChars[end]))
> +        ++end;
> +      for (PRUint32 i = cur; i < end; ++i)
> +        aBreakBefore[i] = breakState[i - cur];
> 
> I think we generally try to use {} around any statement-blocks that aren't
> break, continue or return.

OK, including this as well, right?

       while (cur > begin)
         if (breakState[--cur - begin])
           return cur;

> +      cur = end;
> 
> I think this is wrong. 'cur' gets incremented after this and aBreakBefore[end]
> won't be set. You probably want to set cur to end - 1 instead.

You're right. It should have been end - 1.

> I think this code sets aBreakBefore[cur] to always be false, right? Should
> aBreakBefore[cur] and aBreakBefore[end] be set to the class-based break values,
> or be taken from the complex line breaker? If you want to use the complex line
> breaker for those values, then I think you should pass the whole string to the
> complex line breaker so it has all the context it might need. In this case
> you'll want to make sure we only call the complex breaker once.

With cur = end - 1 fix above, aBreakBefore[end] should be set in next round. And for aBreakBefore[cur], it should be already set by this code before entering Thai chunk:

    if (cur > 0) {
      NS_ASSERTION(CLASS_COMPLEX != lastClass || CLASS_COMPLEX != cl,
                   "Loop should have prevented two adjacent Thai chars here");
      allowBreak = GetPair(lastClass, cl);
    } else {
      allowBreak = PR_FALSE;
    }
    aBreakBefore[cur] = allowBreak;

Note that it's forced to be PR_FALSE only at string beginning. (This may require Prev() and Next() to set it themselves before calling GetJISx4051Breaks() if I modify it as described above.)

> Instead of making this stuff Thai-specific, should we rename CLASS_THAI to
> CLASS_COMPLEX and similarly for other names? Aren't there other languages that
> need similar processing, like Khmer and Lao? (We don't have to support those
> now but if we can make it easy to support them later without doing much extra
> work, that'd be nice.)

That's fine. But note that this implies that we will enforce a requirement of a middle layer like Pango to separate Thai/Lao/Khmer texts before calling appropriate language engines. But we can imitate that by a separate breaker wrapper if actually in need. (I'm talking about the lack of supports for such languages in Windows and Mac, due to their small market sizes. So, Uniscribe/ATSUI may not be sufficient for them, and some other solutions may be in need.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 102

•

17 years ago

(In reply to comment #101)
> (In reply to comment #100)
> > Could this be done better using GETCLASSFROMTABLE and a new table?
> 
> This would involve changing intl/lwbrk/tools/jisx4501class.txt, right? I'll
> try that.

You'll need to modify http://mxr.mozilla.org/seamonkey/source/intl/lwbrk/tools/anzx4501.pl too (see where it prints out the arrays).

> > +      breakState.AppendElements(end - cur);
> > 
> > Check for OOM here.
> 
> OK. Probably, it should fallback to somthing logical rather than just return
> from the function? (It's void function, and no way to return the status.)

Yeah, fill the array in with either true or false, whichever makes more sense to you.

> > I think we generally try to use {} around any statement-blocks that aren't
> > break, continue or return.
> 
> OK, including this as well, right?
> 
>        while (cur > begin)
>          if (breakState[--cur - begin])
>            return cur;

No, that's a return.

> With cur = end - 1 fix above, aBreakBefore[end] should be set in next round.

Right.

> And for aBreakBefore[cur], it should be already set by this code before
> entering Thai chunk:
> 
>     if (cur > 0) {
>       NS_ASSERTION(CLASS_COMPLEX != lastClass || CLASS_COMPLEX != cl,
>                    "Loop should have prevented two adjacent Thai chars here");
>       allowBreak = GetPair(lastClass, cl);
>     } else {
>       allowBreak = PR_FALSE;
>     }
>     aBreakBefore[cur] = allowBreak;

Nope, because your "i" loop is overwriting it. You probably want to start your i loop at cur+1.

> > Instead of making this stuff Thai-specific, should we rename CLASS_THAI to
> > CLASS_COMPLEX and similarly for other names? Aren't there other languages
> > that need similar processing, like Khmer and Lao? (We don't have to support
> > those now but if we can make it easy to support them later without doing
> > much extra work, that'd be nice.)
> 
> That's fine. But note that this implies that we will enforce a requirement of > a middle layer like Pango to separate Thai/Lao/Khmer texts before calling
> appropriate language engines. But we can imitate that by a separate breaker
> wrapper if actually in need. (I'm talking about the lack of supports for such
> languages in Windows and Mac, due to their small market sizes. So,
> Uniscribe/ATSUI may not be sufficient for them, and some other solutions may
> be in need.)

Yeah. We can write platform-specific NS_GetComplexLineBreaks implementations that do whatever is necessary for the platform.

Theppitak Karoonboonyanan

Assignee

Comment 103

•

17 years ago

Attached patch Updated patch, rewriting Prev(), Next(); using class table (obsolete) — Details — Splinter Review

Patch updated according to above comments. Classes for Lao, which is also in 0EXX range, are also added.

Attachment #272399 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 104

•

17 years ago

+           (0x0e01 <= u && u <= 0x0e5f) || // Thai

Is this now Thai+Lao?

We should make GetClass entirely table driven. We could have a 256-entry table that just maps the high byte of the character to a class table. (Each entry could be one byte, indexing an array of class tables.) I think we'd need 5 more class tables as well as the pointer table but that's only 5*128+256 = 896 more bytes. We don't need to do that in this bug though.

You could share the implementation of Next and Prev in a static function that takes a direction flag (I like to make it an integer which is either -1 or 1). The loop over the breakState array could be written

  do {
    aPos += aDirection;
  } while (aPos >= begin && aPos < end && !breakState[aPos - begin]);

+    // out of memory, behave as if there were no complex line breaker
+    for (PRUint32 i = 0; i < aLength; ++i)
+      aBreakBefore[i] = IS_SPACE(aText[i]);

I wouldn't try this hard just for out-of-memory. Just set aBreakBefore to all PR_FALSE. Anyway, IS_SPACE can't be true because IS_COMPLEX is true for all characters passed in here, right?

Nearly there! Just one more iteration required I hope :-)

Theppitak Karoonboonyanan

Assignee

Comment 105

•

17 years ago

(In reply to comment #104)
> +           (0x0e01 <= u && u <= 0x0e5f) || // Thai
> 
> Is this now Thai+Lao?

Umm.. Yes, partially for Lao, by the classes info just added for completeness. Let's pass Lao in for the superficial support, then.

> We should make GetClass entirely table driven. We could have a 256-entry table
> that just maps the high byte of the character to a class table. (Each entry
> could be one byte, indexing an array of class tables.) I think we'd need 5 more
> class tables as well as the pointer table but that's only 5*128+256 = 896 more
> bytes. We don't need to do that in this bug though.

Right. I'm afraid it would need regression test for CJK as well, which I'm not so sure I can handle it well.

> You could share the implementation of Next and Prev in a static function that
> takes a direction flag (I like to make it an integer which is either -1 or 1).
> The loop over the breakState array could be written
> 
>   do {
>     aPos += aDirection;
>   } while (aPos >= begin && aPos < end && !breakState[aPos - begin]);

Oops. My loop condition in Prev() "while (aPos > 0)" was actually wrong. :P

However, since aPos and begin are unsigned, I think it can only be "aPos > begin", or we risk underflow on decrementing when begin == 0. Besides, breakState[aPos - begin] is always false when aPos == begin from GetJISx4051Breaks() assumption. So, when aPos reaches zero, it can be word start or really unbreakable. We should return NS_LINEBREAKER_NEED_MORE_TEXT in such case. But if aPos reaches non-zero begin, it's chunk start and thus breakable.

The shared function can't be static, though, I think. We need to call GetJISx4051Breaks() method.

> +    // out of memory, behave as if there were no complex line breaker
> +    for (PRUint32 i = 0; i < aLength; ++i)
> +      aBreakBefore[i] = IS_SPACE(aText[i]);
> 
> I wouldn't try this hard just for out-of-memory. Just set aBreakBefore to all
> PR_FALSE. Anyway, IS_SPACE can't be true because IS_COMPLEX is true for all
> characters passed in here, right?

I tried to decouple it by generalizing it. But, yes, it could be assumed there is no space passed in.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 106

•

17 years ago

> However, since aPos and begin are unsigned

You can convert them to signed...

> So, when aPos reaches zero, it can be word
> start or really unbreakable.

Right, so at the end you need an extra conditional to decide what result to return.

> The shared function can't be static, though, I think. We need to call
> GetJISx4051Breaks() method.

Sure OK.

Theppitak Karoonboonyanan

Assignee

Comment 107

•

17 years ago

With various constraints, the loop looks quite ugly:

    while (aDirection > 0 || aPos > begin) {
      aPos += aDirection;
      if (aPos >= end || breakState[aPos - begin])
        break;
    }

The constraints are:

a) (aPos > begin) needs to be checked before decrementing, to prevent unsigned wrap-around. This converts the loop from do-while to while-do.

b) (aPos == end) can be true in case of Prev(), while (aPos == begin) can also be true for Next(). So, mere (aPos > begin) loop condition will break the edge case for Next(), and checking (aPos < end - 1) before incrementing will break the edge case for Prev(). The former introduces (aDirection > 0 ||) to the loop, and the latter tears the (aPos >= end) loop condition to be done after incrementing.

If above loop is still readable to you, I'll go on with it.

Theppitak Karoonboonyanan

Assignee

Comment 108

•

17 years ago

(In reply to comment #106)
> > However, since aPos and begin are unsigned
> 
> You can convert them to signed...

Hmm.. The function returns PRInt32 anyway. So, we don't need to worry about the range. OK. I'll retry with that.

Theppitak Karoonboonyanan

Assignee

Comment 109

•

17 years ago

Attached patch updated patch, with WordMove() shared implementation — Details — Splinter Review

Attachment #272488 - Attachment is obsolete: true

Theppitak Karoonboonyanan

Assignee

Updated

•

17 years ago

Attachment #272597 - Attachment is patch: true

Attachment #272597 - Attachment mime type: text/x-patch → text/plain

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 110

•

17 years ago

Comment on attachment 272597 [details] [diff] [review]
updated patch, with WordMove() shared implementation

+  for (end = aPos + 1; end < aLen && !IS_SPACE(aText[end]); ++end)

Cast aLen to PRInt32(aLen) here to avoid a gcc warning.

Otherwise looks good. Attach the final patch and get someone to land it for you. I will if no-one else is able.

That person should CVS remove nsPangoLineBreaker.h/.cpp too.

Attachment #272597 - Flags: superreview+

Attachment #272597 - Flags: review+

Theppitak Karoonboonyanan

Assignee

Comment 111

•

17 years ago

Attached patch updated patch with warnings resolved (obsolete) — Details — Splinter Review

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 112

•

17 years ago

wait, it looks like that the prev/next can break ASCII words even if the text doesn't have CJK characters?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 113

•

17 years ago

see bug 344816. we should not break ASCII words in Serializers.

Theppitak Karoonboonyanan

Assignee

Comment 114

•

17 years ago

I'm not sure I understand the problem. You mean Prev() and Next() should try to analyze complex line breaks only when at least one CJK/complex character is found in the string?

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 115

•

17 years ago

Yes, we should not break the lines in Prev/Next if they doesn't have CJK/complext charcters. But logically, this approach is wrong. I think that we should add a param for GetJISx4051Breaks like my patch for bug 344816.

I want to hear the roc's suggestion.

Theppitak Karoonboonyanan

Assignee

Comment 116

•

17 years ago

The change not to break non-CJK/complex texts should be simple: just borrow the fall back code in WordMove() when it's the case. If GetJISx4051Breaks() is not used by serializers, no need to touch it, I think.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 117

•

17 years ago

right; in WordMove(), after you've computed begin and end (or possibly while you're computing them), scan over the word and see if any characters are IS_CJK_CHAR. If none are, don't allow breaking in the word and don't call GetJISx4051Breaks. GetJISx4051Breaks does not need to change.

One thing I realized is that the term "complex" is overused; we use it in nsLineBreaker.h to mean any character requiring nsILineBreaker (pseudo-JIS4051) processing. With this patch we're using it again to mean characters that are too complex for JIS4051 and require platform-supplied language-specific processing (Thai/Lao). Theppitak, I bet you're tired of making changes to this patch --- sorry --- but when you fix the WordMove() issue, could you rename the use of "complex" here to something else? How about CLASS_EXTENDED, NS_GetExtendedLineBreaks etc?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 118

•

17 years ago

> scan over the word and see if any characters are IS_CJK_CHAR

That would give us compatibility with existing behaviour, but you could also check for IS_COMPLEX_CHAR/IS_EXTENDED_CHAR if you want the serializer to start breaking Thai/Lao.

Theppitak Karoonboonyanan

Assignee

Comment 119

•

17 years ago

In fact, I share some mind with Masayuki that being explicit can make things clearer. I wonder if the test cases in Bug 255990 are addressed with this complex chars detection. (I saw some Korean characters in URLs in attachment 156435 [details].)

An idea I get after rethinking is to reopen Bug 344816 and adjust Masayuki's patch after closing this bug. Is that sound?

Regarding the term "complex", well, I think the precise meaning of CTL doesn't include CJK. What's really "complex" are languages like Thai, Lao, Khmer, and even Arabic. But if you insist on changing it, I can do it in another patch, of course. (I'll not be tired, until it's committed. Because I'll continue being much more tired afterwards if it isn't.)

Masayuki Nakano [:masayuki] (he/him)(JST, +0900)(away 4/27 - 5/6)

Comment 120

•

17 years ago

(In reply to comment #119)
> An idea I get after rethinking is to reopen Bug 344816 and adjust Masayuki's
> patch after closing this bug. Is that sound?

I worry that the things that the URLs are broken at pasting block the testing (or daily use :-) ) of tb nightly builds.

Theppitak Karoonboonyanan

Assignee

Comment 121

•

17 years ago

If so, I should follow roc's comment for compatibility with existing behavior. You may address it later by reopenning Bug 344816 if you still like to add the extra argument.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 122

•

17 years ago

> If so, I should follow roc's comment for compatibility with existing behavior.
> You may address it later by reopenning Bug 344816 if you still like to add the
> extra argument.

I agree.

Ok, don't change the COMPLEX names for now.

Theppitak Karoonboonyanan

Assignee

Comment 123

•

17 years ago

Attached patch updated patch with complex line break avoided for non-JISx4051 chars — Details — Splinter Review

Attachment #272599 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 124

•

17 years ago

Comment on attachment 272730 [details] [diff] [review]
updated patch with complex line break avoided for non-JISx4051 chars

OK let's land this!

Attachment #272730 - Flags: superreview+

Attachment #272730 - Flags: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 125

•

17 years ago

checked in!

Status: REOPENED → RESOLVED

Closed: 17 years ago → 17 years ago

Resolution: --- → FIXED

Kai Engert (:KaiE:)

Comment 126

•

17 years ago

This patch breaks compilation on linux x86_64 with the following error message:

/usr/bin/ld: ../lwbrk/src/liblwbrk_s.a(nsPangoBreaker.o): relocation R_X86_64_PC32 against `pango_language_from_string' can not be used when making a shared object; recompile with -fPIC

Theppitak Karoonboonyanan

Assignee

Comment 127

•

17 years ago

That's weird, because I was using Debian sid amd64 when preparing & testing the patch..

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 128

•

17 years ago

What sort of build, Kai? static? libxul?

Kai Engert (:KaiE:)

Comment 129

•

17 years ago

SeaMonkey, debug, not static, not libxul

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../obj-sm-debug
mk_add_options MOZ_CO_PROJECT=suite
ac_add_options --enable-application=suite
ac_add_options --enable-extensions=default,tasks
ac_add_options --disable-tests

ac_add_options --enable-default-toolkit=cairo-gtk2
ac_add_options --enable-pango
ac_add_options --enable-svg
ac_add_options --enable-canvas

ac_add_options --disable-elf-dynstr-gc

ac_add_options --without-system-nspr
ac_add_options --without-system-nss
ac_add_options --without-system-jpeg
ac_add_options --without-system-zlib
ac_add_options --without-system-png

ac_add_options --enable-debug
ac_add_options --enable-optimize=no

mk_add_options MOZ_MAKE_FLAGS=-j3


Hmm, maybe I should no longer use cairo-gtk2?
I forgot why I selected that in the past.
Let me try gtk2.

Kai Engert (:KaiE:)

Comment 130

•

17 years ago

checking if app-specific confvars.sh exists... /home/kaie/moz/head-312/mozilla/suite/confvars.sh
configure: error: Toolkit must be cairo-gtk2.
*** Fix above errors and then restart with               "make -f client.mk build"
make[1]: *** [configure] Error 1

(not currently active) Ted Mielczarek

Updated

•

17 years ago

Depends on: 384172

(not currently active) Ted Mielczarek

Comment 131

•

17 years ago

You probably need to add the pango headers to config/system-headers.  See, for example, bug 385692.

(not currently active) Ted Mielczarek

Comment 132

•

17 years ago

Attached patch add pango/pango-break.h to system-headers [checked in] — Details — Splinter Review

Presumably this will fix it, but I didn't do any testing.

Attachment #273111 - Flags: review?(roc)

Kai Engert (:KaiE:)

Comment 133

•

17 years ago

Your patch doesn't seem to help me:

gmake[6]: Entering directory `/home/kaie/moz/head-312/obj-sm-debug/intl/lwbrk/src'
nsSampleWordBreaker.cpp
nsJISx4501LineBreaker.cpp
nsSemanticUnitScanner.cpp
gmake[6]: *** No rule to make target `nsPangoBreaker.o', needed by `liblwbrk_s.a'.  Stop.
gmake[6]: *** Waiting for unfinished jobs....
/home/kaie/moz/head-312/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp:45:30: error: nsComplexBreaker.h: No such file or directory
/home/kaie/moz/head-312/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp: In member function ‘virtual void nsJISx4051LineBreaker::GetJISx4051Breaks(const PRUnichar*, PRUint32, PRPackedBool*)’:
/home/kaie/moz/head-312/mozilla/intl/lwbrk/src/nsJISx4501LineBreaker.cpp:517: error: ‘NS_GetComplexLineBreaks’ was not declared in this scope
gmake[6]: *** [nsJISx4501LineBreaker.o] Error 1

(not currently active) Ted Mielczarek

Comment 134

•

17 years ago

Hm, it should address the problem in comment 126, I don't know what's up with the error you just pasted.

Kai Engert (:KaiE:)

Comment 135

•

17 years ago

(In reply to comment #134)
> Hm, it should address the problem in comment 126, I don't know what's up with
> the error you just pasted.
> 

It's a new error that seems to be introduced by your patch.

When I compile trunk as is, I get the error I reported in comment 126.

When I compile trunk with the patch proposed in comment 132, I get the error from comment 133.

When I back out the patch originally checked in for this bug, my build completes ok.

(not currently active) Ted Mielczarek

Comment 136

•

17 years ago

Ah, okay.  Maybe I should actually try compiling on this x86_64 system I have sitting right here.  :)

(not currently active) Ted Mielczarek

Comment 137

•

17 years ago

Hm, I can't reproduce any build bustage on my x86_64 build.

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 138

•

17 years ago

Comment on attachment 273111 [details] [diff] [review]
add pango/pango-break.h to system-headers [checked in]

That patch did fix it for me (I wrote the same patch before seeing this), but I needed to clobber nsPangoBreaker.o to get it to take effect.

Attachment #273111 - Flags: superreview+

Attachment #273111 - Flags: review+

(not currently active) Ted Mielczarek

Comment 139

•

17 years ago

Comment on attachment 273111 [details] [diff] [review]
add pango/pango-break.h to system-headers [checked in]

Checked in.  Turns out I have gcc 4.1.2, which is why I couldn't reproduce this.

Attachment #273111 - Attachment description: add pango/pango-break.h to system-headers → add pango/pango-break.h to system-headers [checked in]

Attachment #273111 - Flags: review?(roc)

Theppitak Karoonboonyanan

Assignee

Comment 140

•

17 years ago

Attached patch Patch adding a missing return — Details — Splinter Review

I just find that a return line is missing in my last patch. In nsPangoBreaker.cpp, the out-of-memory fallback should return immediately when it's done.

Kai Engert (:KaiE:)

Comment 141

•

17 years ago

(In reply to comment #138)
> That patch did fix it for me (I wrote the same patch before seeing this), but I
> needed to clobber nsPangoBreaker.o to get it to take effect.

Confirming, now it works for me, too. Thanks.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

17 years ago

Attachment #273382 - Flags: superreview+

Attachment #273382 - Flags: review+

Theppitak Karoonboonyanan

Assignee

Comment 142

•

17 years ago

Bug 389520 has been filed for Mac.

Keng Susumpow

Comment 143

•

17 years ago

Bug #390048 has been filed for Window

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

17 years ago

Attachment #268080 - Flags: review?(roc)

Theppitak Karoonboonyanan

Assignee

Comment 144

•

17 years ago

Attached patch Patch with pre-memset() and then return on failure — Details — Splinter Review

While the last patch is not checked-in yet, I update it for consistency with the patches proposed in the other two bugs.

Theppitak Karoonboonyanan

Assignee

Comment 145

•

17 years ago

Comment on attachment 274594 [details] [diff] [review]
Patch with pre-memset() and then return on failure

To really close this bug, could either this patch or Attachment #273382 [details] [diff] be checked-in?

Attachment #274594 - Flags: review?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

17 years ago

Attachment #274594 - Flags: superreview+

Attachment #274594 - Flags: review?(roc)

Attachment #274594 - Flags: review+

Attachment #274594 - Flags: approval1.9?

Damon Sicore (:damons)

Updated

•

17 years ago

Attachment #274594 - Flags: approval1.9? → approval1.9+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 146

•

17 years ago

checked in attachment 274594 [details] [diff] [review]. Thanks!!!

Reed Loden [:reed]

Updated

•

17 years ago

Assignee: smontagu → thep

Reed Loden [:reed]

Updated

•

17 years ago

QA Contact: amyy → i18n

An experimental Pango patch 18 years ago Theppitak Karoonboonyanan 20.87 KB, patch		Details \| Diff \| Splinter Review
Experimental Pango patch, with missing change added 18 years ago Theppitak Karoonboonyanan 21.46 KB, patch		Details \| Diff \| Splinter Review
Updated patch, with reduced steps 17 years ago Theppitak Karoonboonyanan 13.93 KB, patch		Details \| Diff \| Splinter Review
Updated patch, fixing a flaw in range checking 17 years ago Theppitak Karoonboonyanan 13.94 KB, patch		Details \| Diff \| Splinter Review
Updated patch to handle UTF-16 properly 17 years ago Theppitak Karoonboonyanan 15.44 KB, patch		Details \| Diff \| Splinter Review
Patch with rearranged code 17 years ago Theppitak Karoonboonyanan 13.64 KB, patch		Details \| Diff \| Splinter Review
Patch with GDK dependency removed 17 years ago Theppitak Karoonboonyanan 12.96 KB, patch		Details \| Diff \| Splinter Review
Patch with no access to extra break pos 17 years ago Theppitak Karoonboonyanan 13.21 KB, patch	roc : review+ roc : superreview+	Details \| Diff \| Splinter Review
Patch for old Pango 17 years ago Theppitak Karoonboonyanan 13.21 KB, patch		Details \| Diff \| Splinter Review
Patch with PangoLanguage caching 17 years ago Theppitak Karoonboonyanan 13.28 KB, patch		Details \| Diff \| Splinter Review
Patch initializing static var directly 17 years ago Theppitak Karoonboonyanan 13.23 KB, patch	roc : review+ roc : superreview+	Details \| Diff \| Splinter Review
Patch with modified assertions 17 years ago Theppitak Karoonboonyanan 13.17 KB, patch		Details \| Diff \| Splinter Review
Patch with line break condition adjusted 17 years ago Theppitak Karoonboonyanan 13.19 KB, patch		Details \| Diff \| Splinter Review
Summarized changes to current trunk 17 years ago Theppitak Karoonboonyanan 5.89 KB, patch		Details \| Diff \| Splinter Review
Summarized changes to current trunk 17 years ago Theppitak Karoonboonyanan 9.61 KB, patch		Details \| Diff \| Splinter Review
Updated patch 17 years ago Theppitak Karoonboonyanan 6.24 KB, patch		Details \| Diff \| Splinter Review
First untested patch for nsJISx4501LineBreaker-gfx 17 years ago Theppitak Karoonboonyanan 13.05 KB, patch		Details \| Diff \| Splinter Review
untested patch using make flag 17 years ago Theppitak Karoonboonyanan 21.32 KB, patch		Details \| Diff \| Splinter Review
updated patch using make flag, modifying jis x4051 table 17 years ago Theppitak Karoonboonyanan 24.12 KB, patch		Details \| Diff \| Splinter Review
Updated patch, rewriting Prev(), Next(); using class table 17 years ago Theppitak Karoonboonyanan 34.20 KB, patch		Details \| Diff \| Splinter Review
updated patch, with WordMove() shared implementation 17 years ago Theppitak Karoonboonyanan 34.11 KB, patch	roc : review+ roc : superreview+	Details \| Diff \| Splinter Review
updated patch with warnings resolved 17 years ago Theppitak Karoonboonyanan 34.13 KB, patch		Details \| Diff \| Splinter Review
updated patch with complex line break avoided for non-JISx4051 chars 17 years ago Theppitak Karoonboonyanan 34.50 KB, patch	roc : review+ roc : superreview+	Details \| Diff \| Splinter Review
add pango/pango-break.h to system-headers [checked in] 17 years ago (not currently active) Ted Mielczarek 551 bytes, patch	dbaron : review+ dbaron : superreview+	Details \| Diff \| Splinter Review
Patch adding a missing return 17 years ago Theppitak Karoonboonyanan 826 bytes, patch	roc : review+ roc : superreview+	Details \| Diff \| Splinter Review
Patch with pre-memset() and then return on failure 17 years ago Theppitak Karoonboonyanan 1.10 KB, patch	roc : review+ roc : superreview+ damons : approval1.9+	Details \| Diff \| Splinter Review