Search string in Japanese doesn't match exactly

RESOLVED FIXED

Status

()

Core
Internationalization
RESOLVED FIXED
15 years ago
13 years ago

People

(Reporter: Koike Kazuhiko, Assigned: Jungshik Shin)

Tracking

({intl})

Trunk
x86
All
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 3 obsolete attachments)

(Reporter)

Description

15 years ago
Search string in Japanese doesn't match exactly if there is a linebreak
before kinsoku characters.

Original report in Bugzilla-jp:
http://bugzilla.mozilla.gr.jp/show_bug.cgi?id=2565

Testcase:
http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=1083

Patch:
http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=1105

Related: bug 135323
(Reporter)

Comment 1

15 years ago
Shanjian, can you review the patch by saito?

Comment 2

15 years ago
I am a little confused. Could you explain the problem and how your patch fix it
in English?  thanks.

Updated

15 years ago
Keywords: intl
QA Contact: ruixu → kasumi
(Reporter)

Comment 3

15 years ago
Steps to reproduce:

(1) Open <http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=1083>.
(2) Open "Find in This Page" dialog and enter "hairetsu" in Japanese.
(3) Push Find button twice.

"retsu no" is highlighted instead of "hairetsu" in the first paragraph.
If the paragraph doesn't have linebreaks in it, String search works fine.
Linebreaks are deleted in rendering, but aren't deleted for string search.

I missed the saito's second patch.
http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=1104

His patch deletes linebreaks for string search.

Comment 4

15 years ago
It supplements.
The kinsoku means Japanese hyphenation, however, a hyphen character does not
generate.
In a peculiar to Japanese, there is usually a character which is not placed at
the head of a line.
In the layout of a character, it means moving to the head of a line without the
Japanese character of several characters taking a break.

layout/html/base/src/nsTextFrame.cpp
-        if (1 == wordLen && contentLen == 2 && IS_CJ_CHAR(*bp)) {
+        if ((contentLen - wordLen) == 1 && IS_CJ_CHAR(*bp)) {

The above and a changed part mean that wordLen may become two or more.

Comment 5

15 years ago
It supplements also about the second patch.
http://bugzilla.mozilla.gr.jp/showattachment.cgi?attach_id=1104

When searching a string, in order to search to the original text containing the
new-line code between Kanji characters, the string cannot be searched.

However, it will not approve of this patch, since it is copying using new
operator.

Comment 6

15 years ago
A patch attachment 113902 [details] [diff] [review] of Bug 156369 has an effect on this ploblem.

Comment 7

13 years ago
I think both roy and me are off mozilla for more than 2 years. If these bugs are
still here now, I think the real stauts is 'won't fix'. If you want to reopen
it, please find a new owner for it first. 
Status: NEW → RESOLVED
Last Resolved: 13 years ago
Resolution: --- → WONTFIX

Comment 8

13 years ago
Mass Reassign Please excuse the spam
Assignee: tetsuroy → nobody

Comment 9

13 years ago
Mass Re-opening Bugs Frank Tang Closed on Wensday March 02 for no reason, all
the spam is his fault feel free to tar and feather him
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---

Comment 10

13 years ago
Reassigning Franks old bugs to Jungshik Shin for triage - Sorry for spam
Assignee: nobody → jshin1987
Status: REOPENED → NEW

Comment 11

13 years ago
Created attachment 177093 [details] [diff] [review]
patch

Updated

13 years ago
Attachment #177093 - Flags: review?(jshin1987)

Comment 12

13 years ago
jshin, I asked you to review. Please refer to comment #4 and comment #5 for the
reasons of the changes, that is, taking Japanese hyphenation into the changes of
Bug 135323 and ignoring one return code between kanjis for string-search.

Comment 13

13 years ago
Comment on attachment 177093 [details] [diff] [review]
patch

Sorry, I found a mistake.
Attachment #177093 - Attachment is obsolete: true
Attachment #177093 - Flags: review?(jshin1987)

Comment 14

13 years ago
Created attachment 177107 [details] [diff] [review]
patch

Comment 15

13 years ago
Created attachment 177109 [details] [diff] [review]
patch
Attachment #177107 - Attachment is obsolete: true

Updated

13 years ago
Attachment #177109 - Flags: review?(jshin1987)
(Assignee)

Updated

13 years ago
Attachment #177109 - Flags: superreview?(jst)
Attachment #177109 - Flags: review?(jshin1987)
Attachment #177109 - Flags: review?(akkzilla)

Comment 16

13 years ago
The nsFind.cpp test looks safe.  Can someone please run through the tests at
http://www.mozilla.org/quality/browser/front-end/testcases/xpapps-gui/find-on-page/find-in-page.html
and make sure it doesn't introduce any regressions?  (I'd check it myself, but
my devel machine is down with a bad motherboard and I'm leaving town tomorrow.)
 But it looks like this shouldn't affect anything but CJ characters, so I don't
expect regressions.  r=akkana on that part if someone runs through the tests.

Unfortunately I'm not familiar with the nsTextFrame.cpp code being modified, and
it'll be weeks before I could try to figure it out.  You should probably get
someone else to review that change.  This change particularly made me wonder:
-          i = contentLen;
+          i = 2;

If you can't find anyone else to review that file, I can look at it in a few
weeks if you explain what the changes are doing, particularly looping back from
2 instead of from contentLen.  The other change block after that makes more
sense, but I'm not clear why you left the first while (--i >= 0) in place while
introducing a second one (or is that just diff being wonky?)
Comment on attachment 177109 [details] [diff] [review]
patch

Looks right to me, I'll r=, but I think dbaron should have a look at at least
the nsTextFrame.cpp changes before this goes in. Requesting his sr...
Attachment #177109 - Flags: superreview?(jst)
Attachment #177109 - Flags: superreview?(dbaron)
Attachment #177109 - Flags: review?(akkzilla)
Attachment #177109 - Flags: review+
I've looked at the nsTextFrame.cpp change a whole bunch of times, and I still
have no idea what it's changing, or, for that matter, what the code that it's
modifying does.  Care to explain?

Comment 19

13 years ago
I think that the mapped index of the contents needs to match with the index of
the displaied texts. In a following example, the mapped index of |sp| is equal
to the mapped index of the next kanji character if this patch is applied.

contents         kk sp kk kk kk  (|kk| means a kanji, |sp| means a white space)
displaied texts  kk kk kk kk

mapped index of the contents
                 kk sp kk kk kk
                  0  1  1  2  3

|kk| is obtained by the first call of GetNextWord and |kkkkkk| is obtained by
the next call of it. |sp| is skipped by the second call. |kkkkkk| means that
these characters are continuing and it cannot divide with reason of Japanese
hyphenation.
(Assignee)

Comment 20

13 years ago
(In reply to comment #19)

> contents         kk sp kk kk kk  (|kk| means a kanji, |sp| means a white space)
> displaied texts  kk kk kk kk
> 
> mapped index of the contents
>                  kk sp kk kk kk
>                   0  1  1  2  3

Without the patch, 'kk sp kk sp(or other word-delimeters) .....' can be handled,
but 'kk sp kk kk kk ...' cannot be handled. Am I right? 

Comment 21

13 years ago
> Without the patch, 'kk sp kk sp(or other word-delimeters) .....' can be handled,

It can be handled.

> but 'kk sp kk kk kk ...' cannot be handled. Am I right? 

When |kkkkkk| can not be divided with reason of Japanese hyphenation, it can not
be handled.

Comment 22

13 years ago
> contents         kk sp kk kk kk  (|kk| means a kanji, |sp| means a white space)

I correct a mistake.&#12288;|sp| means a '\n'.
The skipped character is one '\n' between CJ characters but not white space.

Updated

13 years ago
Depends on: 289857

Comment 23

13 years ago
Comment on attachment 177109 [details] [diff] [review]
patch

Reported problem will be solved by the fix of Bug 289857. However, the problem
which cannot search CJ characters containing '\n' still remains. I will post a
patch except for changes to nsTextFrame::PrepareUnicodeText.
Attachment #177109 - Flags: superreview?(dbaron)

Comment 24

13 years ago
Created attachment 181498 [details] [diff] [review]
patch

patch to skip a '\n' between CJ characters in order to search only displayed CJ
characters.
Attachment #177109 - Attachment is obsolete: true
Attachment #181498 - Flags: superreview?(jst)
Attachment #181498 - Flags: review?(jst)
Comment on attachment 181498 [details] [diff] [review]
patch

r+sr=jst
Attachment #181498 - Flags: superreview?(jst)
Attachment #181498 - Flags: superreview+
Attachment #181498 - Flags: review?(jst)
Attachment #181498 - Flags: review+

Comment 26

13 years ago
Comment on attachment 181498 [details] [diff] [review]
patch

In searching text, this patch affect only a '\n' with CJ characters, I think
that this patch is low-risk.
Attachment #181498 - Flags: approval1.8b2?

Comment 27

13 years ago
Comment on attachment 181498 [details] [diff] [review]
patch

a=asa
Attachment #181498 - Flags: approval1.8b2? → approval1.8b2+

Comment 28

13 years ago
Johnny, this patch was approved for 1.8b2. Could you check in to the trunk?

Comment 29

13 years ago
jst, could you check in to the trunk and mark FIXED?
(Assignee)

Comment 30

13 years ago
landed on the trunk

Checking in embedding/components/find/src/nsFind.cpp;
/cvsroot/mozilla/embedding/components/find/src/nsFind.cpp,v  <--  nsFind.cpp
new revision: 1.32; previous revision: 1.31
Status: NEW → RESOLVED
Last Resolved: 13 years ago13 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.