Closed Bug 1213589 Opened 4 years ago Closed 4 years ago

[TSF] Converting Hangul to Hanja with MS Korean IME converts unexpected range of fixed string

Categories

(Core :: DOM: UI Events & Focus Handling, defect)

42 Branch
x86
Windows
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla45
Tracking Status
firefox41 --- wontfix
firefox42 --- wontfix
firefox43 --- wontfix
firefox44 --- wontfix
firefox45 --- fixed
firefox-esr38 --- unaffected

People

(Reporter: hong620, Assigned: masayuki)

References

(Blocks 3 open bugs, )

Details

(Keywords: inputmethod, Whiteboard: See bug 1229641 for the remaining edge cases)

Attachments

(10 files, 4 obsolete files)

12.43 KB, patch
smaug
: review+
Details | Diff | Splinter Review
30.25 KB, patch
smaug
: review+
Details | Diff | Splinter Review
3.40 KB, patch
smaug
: review+
Details | Diff | Splinter Review
40.14 KB, patch
smaug
: review+
Details | Diff | Splinter Review
5.59 KB, patch
smaug
: review+
Details | Diff | Splinter Review
7.40 KB, patch
Details | Diff | Splinter Review
96.20 KB, patch
smaug
: review+
Details | Diff | Splinter Review
100.94 KB, patch
smaug
: review+
Details | Diff | Splinter Review
10.64 KB, patch
smaug
: review+
Details | Diff | Splinter Review
9.06 KB, patch
Details | Diff | Splinter Review
User Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:41.0) Gecko/20100101 Firefox/41.0
Build ID: 20150929144111

Steps to reproduce:

type 안녕하세요
line-breaking
안
and push Hanja/Kanji Convert key from MS Korean IME.



Actual results:

IME's convert function is activated on wrong position focus.

http://i.imgur.com/roE6ScN.png
(FF 42)


Expected results:

convert function should be focus to second line's '안'

http://i.imgur.com/Ib9zLza.png
(Win7 notepad)
same as bug 1208043, malfunction is only affect in using <p></p> for lin-breaking sites.

[Naver], [Daum], [Dcinside], [Parkoz] and [Ruliweb] = bug exposed

[Gmail], [Google Groups], [Bugzilla], [Reddit] and [4chan] = no bug, just fine


when you typed in after line-breaking, MS IME's focus should be in second line's text.

but in FF41/FF42, it's focusing first line's text.
I tested it in Daum Mail with WYSIWIG HTML editor. I reproduced same bug in Windows 10(x64) Firefox 43.0a2 and Firefox 41.0.1.

I typed like below.
----
안녕하세요
안
---- (d, k, s, s, u, d, g, k, t, p, d, y, <enter key>, d, k, s)

The source is like below.
----
안녕하세요<p>안<br></p><br><p><br></p>
----

It seems like TSR's problem because e10s doesn't affect this bug.
This should be in my queue. But perhaps, we cannot take this bug into 42 since it's too late for taking risk.
Flags: needinfo?(masayuki)
Assignee: nobody → masayuki
Status: UNCONFIRMED → ASSIGNED
Component: Untriaged → Widget: Win32
Depends on: 1208043
Ever confirmed: true
Flags: needinfo?(masayuki)
Keywords: inputmethod
OS: Unspecified → Windows
Product: Firefox → Core
Hardware: Unspecified → x86
Summary: Text Services Framework make malfunction in Korean IME's Hanja Focusing → [TSF] Converting Hangul to Hanja with MS Korean IME converts unexpected range of fixed string
Hmm, I don't understand what happens. When typying ControlRight key, the previous Hangul character is committed. Then, restarting composition with odd range suddenly. I have no idea why MS Korean IME does specify such range...
Blocks: 1214973
If I don't insert a line break and press Ctrl at the end of "안녕하세요안", I see same result on Wordpad (TSF-aware) but not so on Notepad (TSF-unaware). Although, I'm not sure if the behavior on TSF-aware application is as you expected, but we should return \r\n at the end of block level element.
This is a bug of ContentEventHandler.
Component: Widget: Win32 → Event Handling
I think that the fix of this bug is risky due to changing ContentEventHandler behavior. So, we should give up this for 42.
also, Hanja/Kanji Convert key focus in single 'consonant' letter is not proper working on those former <p> line-break type boards in new TSF.

ex: type ㅁ and push conver key

http://i.imgur.com/jLWmznj.png
http://i.imgur.com/TLYQC0E.png

Win7 Notepad and Bugzilla input Form = Working fine

but in those tested korean input forms,

'single consonant letter Hanja/Kanji Covert' is sometimes works.
but while in some certain line-break, that's not works.

'single consonant letter Hanja/Kanji Covert' is need for Special Characters like ※ ― + Ⅲ 【 ㎨ Φ ⅔ ⒢ … ETC

i dunno how to test and check the cause.. :(
i think some codeable korean languagstic volunteer would be helped in this issues.

combining letters like hanguls, realization in IME software part is too professional complexity parts.
The patches for this bug become much risky. We can land them only on mozilla-central. So, as far as possible, we should fix this in this month (i.e., fixed in 44).
Unfortunately, we don't have much time to fix this bug completely even on 44 since I'm still struggling with last part of the series of my patches.

However, fortunately, even without the last part, probably, users won't realize there is this bug in daily use. See the test case:
http://jsfiddle.net/d_toybox/ztae99vf/

The first set and second set work fine to me with patched build but the last set still won't work well. But I believe that the last case is non-realistic case.

I'll prepare to request reviews without the last part.
Whiteboard: [leave open]
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f6dd84820ec6

I hit this assertion if ContentEventHandler inserts \n at each open tag of block level elements or replaced inline elements:
https://treeherder.mozilla.org/logviewer.html#?job_id=12848796&repo=try
>  23:53:22     INFO -  71 INFO TEST-START | dom/browser-element/mochitest/test_browserElement_inproc_CopyPaste.html
>  23:53:45     INFO -  INFO | automation.py | Application ran for: 0:04:16.844904
>  23:53:45     INFO -  INFO | zombiecheck | Reading PID log: /tmp/tmpc6QDx5pidlog
>  23:53:45     INFO -  /data/tombstones does not exist; tombstone check skipped
>  23:53:46  WARNING -  PROCESS-CRASH | java-exception | java.lang.IllegalArgumentException: invalid selection notification range at org.mozilla.gecko.GeckoEditable.onSelectionChange(GeckoEditable.java:910)

Does this mean the offset and length of NOTIFY_IME_OF_SELECTION_CHANGE are wrong? I wrote same assertion in TSFTextStore, but I cannot reproduce this. Any idea?
Flags: needinfo?(nchen)
(In reply to Masayuki Nakano (:masayuki) (Mozilla Japan) from comment #12)
> 
> Does this mean the offset and length of NOTIFY_IME_OF_SELECTION_CHANGE are
> wrong? I wrote same assertion in TSFTextStore, but I cannot reproduce this.
> Any idea?

I think the selection offsets are correct, but the offsets from NOTIFY_IME_OF_TEXT_CHANGE are wrong. Android code relies on NOTIFY_IME_OF_TEXT_CHANGE to know what the current text is. The selection offsets can become invalid if the text change offsets are wrong.
Flags: needinfo?(nchen)
Thank you, Jim. Good point. I'll check it tomorrow.
Sigh... Jim's advice (bug 1217669 comment 11) is very helpful to me to understand what happens. Thank you, Jim. However, I meet other bugs. nsContentIterator does not work fine if it's created when mutation observer is running. (It doesn't assume that child->GetParent()->IndexOf(child) never returns -1, but it's false!!)

Even after I hacked about this, I see a lot of NS_ASSERTION(). Additionally, I'm going to go TAPC until Wed. So, probably, I cannot fix this bug in 44.
No longer depends on: 1224101
The cause of this bug is our ContentEventHandler's flatten text generating rule.

We ignore all element nodes except <br> when generating flatten text from DOM tree. Therefore, for example, <p>ab</p><p>a</p> is converted to "aba". At trying to convert with MS-IME for Korean, it looks for the conversion target from caret position to start of the content or previous line breaker. Therefore, we can fix this bug if we redesign ContentEventHandler's rule to both open tags and closes tags of block level elements cause line breaks. I.e., if the result is "\nab\n\na\n", MS-IME for Korean stops looking for the conversion target at the line breaker before the second "a" when caret is after the second "a". Then, we can get same result with other TSF-aware application.

As I said in comment 10, I have patches which converts only open tags to line breakers. This fixes most cases of this bug, this is enough for Korean users in daily use.

First of all, we need to clean up ContentEventHandler for making the main patches simpler.

This patch makes low level methods in ContentEventHandler restrict the usage. 

* GetNativeTextLength() should work with text node.
* GetBRLength() should be renamed to GetNativeTextLengthBefore() since it computes native text length before the element.

Then, GetTextLength() becomes simpler.
Attachment #8686487 - Flags: review?(bugs)
This patch doesn't change any behavior. Just cleaning up ContentEventHandler.

* Make GenerateFlatTextContent() a member of ContentEventHandler.
* Rename GetFlatTextOffsetOfRange() to GetFlatTextLengthInRange() and GetFlatTextLengthBefore(). The new names must be clearer than GetFlatTextOffsetOfRange() what they do.
* Passing pairs of node and offset in its parent is messy. Therefore, this patch creates NodePosition struct for GetFlatTextLengthInRange().

Note that it doesn't make sense to use nsRange instead because nsRange isn't a good option for temporary object due to refcountable class and its behavior depends on DOM specs. NodePosition can represent only one position with additional information (to be added by following patch).
Attachment #8686495 - Flags: review?(bugs)
This is not smart change, but I have no better idea.

When nsIMutationObserver::ContentRemoved() is called, IMEContentObserver tries to compute the flatten text length of removed nodes. The most ancestor removed node is already removed from its parent's child node list but it still *thinks* it's still in current document. According to the explanation of nsIMutationObserver::ContentRemoved(), this is expected timing.

In this case, nsContentIterator has some bugs if Init(nsIDOMRange*) is called since the nsRange instance refers the root node which can be computed from removing node but the root node cannot access to the removing node from its parent.

For avoiding such case, we need to initialize nsContentIterator with the removing node as root content. Fortunately, in this case, the node of start position and end position are always same. And the start offset is always 0 and the end offset is always its child count. So, we can do iter->Init(aStartPosition.mNode). This is hacky but enough safe for now and the simplest fix of existing bug.
Attachment #8686509 - Flags: review?(bugs)
For inserting line breakers at open tags, we need to use pre order content iterator at generating flatten text and computing flatten text length. This change is the reason why we need to fix bug 1215798 and bug 1215816.
Attachment #8686511 - Flags: review?(bugs)
Before generating line breakers at each open tag, we need to sort out current rules and clean up the implementation.

I wrote the rules into the patch. Please check it. With this patch, all todo_is() are fixed, i.e., making the result of all existing tests consistent.
Attachment #8686513 - Flags: review?(bugs)
This is the main fix.

With this patch, [parent, index of node] and [node, 0] have different meaning. The former doesn't include the open tag of the node. The latter include open tag of the node.

Therefore, this change makes GetFlatTextLengthInRange() need to check if specified position includes open tag with NodePosition::mAfterOpenTag. This patch hides the flag with clearer name method and NodePositionBefore class.

The test changes in window_composition_text_querycontent.xul isn't changed the testing range. E.g., [<p>foo</p>] was tested, changed to <p>[foo]</p>. Following patch adds new test for checking the behavior at handling open tag.
Attachment #8686516 - Flags: review?(bugs)
Just adding new tests. But I found odd result at #11. Following patch fixes the bug.
Attachment #8686517 - Flags: review?(bugs)
If there is only inline level elements in contenteditable node, clicking in the element causes setting caret at [contenteditable node, 0]. So, eSetSelection event should behave same as so.
Attachment #8686519 - Flags: review?(bugs)
Although, I could've separated the preparation patches (part.1 - part.3) to smaller patches...

I struggled with this bug for 4 weeks. Therefore, I reviewed the complicated implementation deeply. So, my explanation of my patches are not enough to you. If so, let me know. I'll try to explain more.
Comment on attachment 8686516 [details] [diff] [review]
part.6 ContentEventHandler should insert line breaks at open tag of elements except non-replaced inline elements

Do we really want to rely on the element names here?

Have you asked roc about this? He might have some ideas for text handling, especially line breaks.
(In reply to Olli Pettay [:smaug] from comment #46)
> Comment on attachment 8686516 [details] [diff] [review]
> part.6 ContentEventHandler should insert line breaks at open tag of elements
> except non-replaced inline elements
> 
> Do we really want to rely on the element names here?

Yeah, I think so. In the testcase reported, we can test with "안녕하세요" this line too. When caret is at the end, "안녕" are converted by MS-IME for Korean at pressing ControlRight key.

Even if "하세" is bold text in Wordpad (Wordpad is a simple TSF-aware application bundled to Windows), we can get same result.

So, if we insert line breaks for inline elements too (in this case, <b></b>), MS-IME for Korean behaves different.

Another example, if a part of a Western language's word is colored by <span> or something, the word will be separated by line breaks if we insert line breaks at inline elements. Then, user cannot search the visible word as a word with dictionary look up service of Mac OS X.

> Have you asked roc about this? He might have some ideas for text handling,
> especially line breaks.

No, roc, do you have some idea? We should insert line breaks at least for block level elements for separating words/lines same as the rendering result when ContentEventHnadler generates flatten text. Then, MS-IME for Korean and Mac OS X's dictionary look up service work more naturally.

The reason why we don't check actual display CSS property is, when the display is changed, we need to send NOTIFY_IME_OF_TEXT_CHANGE but computing the range is too difficult since I have no idea to listen to the changed range from IMEContentObserver. Anyway, even if this is possible, we should put off to do that since current patches are enough big and risky. And not so important for usual rich text editor.
(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #47) 
> > Do we really want to rely on the element names here?
> 
> Yeah, I think so. In the testcase reported, we can test with "안녕하세요" this
> line too. When caret is at the end, "안녕" are converted by MS-IME for Korean
> at pressing ControlRight key.
> 
> Even if "하세" is bold text in Wordpad (Wordpad is a simple TSF-aware
> application bundled to Windows), we can get same result.
> 
> So, if we insert line breaks for inline elements too (in this case,
> <b></b>), MS-IME for Korean behaves different.
Oh, I didn't mean that. I meant that we should look at the layout information, not
element names


> The reason why we don't check actual display CSS property is, when the
> display is changed, we need to send NOTIFY_IME_OF_TEXT_CHANGE but computing
> the range is too difficult since I have no idea to listen to the changed
> range from IMEContentObserver. Anyway, even if this is possible, we should
> put off to do that since current patches are enough big and risky. And not
> so important for usual rich text editor.
Well, I don't see that layout approach being more risky, but would be conceptually right.
I wonder if we actually need to listen to the layout changes. What if we still just listen for the dom changes but when
dealing with line breaks, we use the layout information?
>> The reason why we don't check actual display CSS property is, when the
>> display is changed, we need to send NOTIFY_IME_OF_TEXT_CHANGE but computing
>> the range is too difficult since I have no idea to listen to the changed
>> range from IMEContentObserver. Anyway, even if this is possible, we should
>> put off to do that since current patches are enough big and risky. And not
>> so important for usual rich text editor.
> Well, I don't see that layout approach being more risky, but would be conceptually right.
> I wonder if we actually need to listen to the layout changes. What if we still just listen for the
> dom changes but when dealing with line breaks, we use the layout information?

If we could use layout information for generating flatten text, it'd be the best. In that case, IMEContentObserver should insert line breaks before and after creating block context. So, if an element becomes to create new block context or not to create that, line breaks become necessary or unnecessary. Therefore, it's very difficult to use layout information from IMEContentObserver...
I'm still missing something here.
How often do we need to create the flatten text?

The issue is that using just element names may lead to rather wrong behavior if one has styled those
element to be block level.

If we need to get updated when layout tree is reflowed, adding something to PresShell::DidDoReflow
might work quite well. Or just use nsIDocShell::addWeakReflowObserver
(it is a bit odd that the observers are in docshell, but that seems to be on purpose
https://bugzilla.mozilla.org/show_bug.cgi?id=453650#c31)
(In reply to Olli Pettay [:smaug] from comment #50)
> I'm still missing something here.
> How often do we need to create the flatten text?

It depends on IME. But much a lot of times. Especially, in e10s mode, we don't support synchronous query content events, therefore, ContentCacheInChild queries content every selection, content and position change notification and copy the result to ContentCacheInParent.

> The issue is that using just element names may lead to rather wrong behavior
> if one has styled those
> element to be block level.

Yes, you're right. But I think that it doesn't cause serious bugs.

For example, even if <li> is specified |display: inline;|, each contents of <li> elements are separated in semantics. So, perhaps, web designer does NOT make they rendered as a line.

Another example is, even if <span> is specified |display: block;|, |display: inline-block;| or |display: list-item;|. In such case, the <span> element contents is included in the line even if the rendering result doesn't look like same line. However, I don't think that this case occurs a lot. Especially in rich text editors. Therefore, I don't worry about this case.

> If we need to get updated when layout tree is reflowed, adding something to
> PresShell::DidDoReflow
> might work quite well.

Hmm, even in the method, there is no information about the modified range...

> Or just use nsIDocShell::addWeakReflowObserver
> (it is a bit odd that the observers are in docshell, but that seems to be on
> purpose
> https://bugzilla.mozilla.org/show_bug.cgi?id=453650#c31)

Currently, IMEContentObserver already uses it for notifying IME of position change. However, it causes a lot of unnecessary position change notifications because the methods of nsIReflowObserver doesn't have the detail of the changed range. (The methods only have timestamp)


I believe that mismatching actual display value and element's default display value is not so major case and even if we meet such case, the odd behavior is not more serious than this bug.
I test this bug one more time in https://jsfiddle.net/d_toybox/ztae99vf/ . Thanks to Masayuki-san, I can test this bug simply.

Unfortunaly, there is bug now but in some case, I can't reproduce this bug. At "안녕하세요\n안", When I add space and remove space after '안' at first line and push Hanja key(R-Ctrl), '안' in second line is converted to hanja. However, if I left the test page more than one min, bug is appeared.

I tested this bug  more cases. I did all test on https://jsfiddle.net/d_toybox/ztae99vf/ : 
1. "안녕하세요.\n안" ('.' is added) : When I push Hanja key after 안 of second line, 안 in second line is changed to hanja. This is same cases when I tested "안녕하세요안" in Wordpad.
2. "안녕하세요\n안심했어요" : When I push Hanja key after '심' letter in second line, 안녕 in first line is changed to hanja. I expected that 안심 will be changed to 安心.

3. "안녕하세요 안심했어요" (added space) : As I expected, 안심 is changed to Hanja.


Also, I found New case of bug. I think this bug is not only firefox's problem, but MS-IME's problem.

1. "안녕하세요안심했어요" (no line-break letter) : When I push Hanja key after '심' letter, '안녕' is changed to hanja. I think 안심 should be changed to Hanja. At wordpad, '안녕' is changed to hanja as I DIDN'T expected. This can be MS-IME's bug or just my misunderstand of MS-IME's hanja function.
(In reply to Maesil from comment #52)
> Also, I found New case of bug. I think this bug is not only firefox's
> problem, but MS-IME's problem.
> 
> 1. "안녕하세요안심했어요" (no line-break letter) : When I push Hanja key after '심'
> letter, '안녕' is changed to hanja. I think 안심 should be changed to Hanja. At
> wordpad, '안녕' is changed to hanja as I DIDN'T expected. This can be MS-IME's
> bug or just my misunderstand of MS-IME's hanja function.

Yeah, as far as I tested, MS-IME for Korean looks for converting range from previous character of caret position and to first \n or something. In other words, there are only Hangul characters, MS-IME for Korean ignores Korean words, this looks odd behavior. I think that we should report this to Microsoft, but I cannot explain about that since I don't understand Korean language. If you have an environment running Windows 10, you can use Windows Feedback app for reporting the MS-IME's bug.
(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #51)
> > The issue is that using just element names may lead to rather wrong behavior
> > if one has styled those
> > element to be block level.
> 
> Yes, you're right. But I think that it doesn't cause serious bugs.
let me ask this way. Would it cause harm if we didn't do element name based check where the patch is doing it, but
check there whether the relevant nsIFrame is inline?
(In reply to Olli Pettay [:smaug] from comment #54)
> (In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #51)
> > > The issue is that using just element names may lead to rather wrong behavior
> > > if one has styled those
> > > element to be block level.
> > 
> > Yes, you're right. But I think that it doesn't cause serious bugs.
> let me ask this way. Would it cause harm if we didn't do element name based
> check where the patch is doing it, but
> check there whether the relevant nsIFrame is inline?

Yes. If the frame is changed from inline to block, a line break is inserted at its open tab. Unless we can notify IME of this change with the changed range, IME continues to query content with offset in old contents (i.e., doesn't include the line break), but the query result is computed with the new inserted line break.

This means that when IME tries to query contents, the result can be unexpected for the IME. In such case, we know some IME stop working in TSF mode on Windows, are confused and start to convert with wrong range's string on Mac.

Note that I'd like to use style information if it's possible (i.e., we can solve to compute the changed range issue). However, I don't have idea to do that. Checking element name is already used by editor (see nsHTMLEditor::NodeIsBlockStatic()). Therefore, I think that this approach is enough at least for now.
Editor ends up using ParserService in the background to get the list of "block" elements. Could we at least use that so that we don't have yet another list of elements which one needs to remember to modify when new elements are added.


(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #55)
> Yes. If the frame is changed from inline to block, a line break is inserted
> at its open tab. Unless we can notify IME of this change with the changed
> range, IME continues to query content with offset in old contents (i.e.,
> doesn't include the line break), but the query result is computed with the
> new inserted line break.
but doesn't the issue happen with the element name based approach too?
Say, you have <span> and then some js does span.style.display = "block";


> Note that I'd like to use style information if it's possible (i.e., we can
> solve to compute the changed range issue). 
still wondering if we could just after reflow check whether nsIFrame objects of the range the IMEContentObserver 
is interested in have changed.
But if that is not enough, fine, we could use the information from parserservice for now (or is there some reason to not use parserservice?).
(In reply to Olli Pettay [:smaug] from comment #56)
> Editor ends up using ParserService in the background to get the list of
> "block" elements. Could we at least use that so that we don't have yet
> another list of elements which one needs to remember to modify when new
> elements are added.

As far as I know, nsHTMLEditor currently use nsHTMLEditor::NodeIsBlockStatic() to check if an element is block:
http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/nsHTMLEditor.cpp#736

This is NOT using nsIParseService. It directly checks the tag name.

However, I agree with making common utilities but looks like that nsIParserService needs two virtual calls per checking an element. We must not be able to ignore the cost of a lot of virtual calls with big document. So, I think that new utilities should create tables for each element to reduce a lot of virtual calls.

Anyway, I'd like to put off doing that to another bug. This bug must be a big problem for Korean users but we cannot uplift the patch due to risk management. So, the most important thing is, we need to fix this bug on m-c ASAP.

> (In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #55)
> > Yes. If the frame is changed from inline to block, a line break is inserted
> > at its open tab. Unless we can notify IME of this change with the changed
> > range, IME continues to query content with offset in old contents (i.e.,
> > doesn't include the line break), but the query result is computed with the
> > new inserted line break.
> but doesn't the issue happen with the element name based approach too?
> Say, you have <span> and then some js does span.style.display = "block";

So, if we don't use layout information, this case won't be problem since current approach checks only DOM tree. Any DOM tree changes are observed with nsIMutationObserver.

> > Note that I'd like to use style information if it's possible (i.e., we can
> > solve to compute the changed range issue). 
> still wondering if we could just after reflow check whether nsIFrame objects
> of the range the IMEContentObserver 
> is interested in have changed.

Do you mean that we should check whole primary frames under mRootContent of IMEContentObserver? If so, I don't think that it's reasonable because the nodes may be too much number of elements in rich text editors.

> But if that is not enough, fine, we could use the information from
> parserservice for now (or is there some reason to not use parserservice?).

As I said above, I disagree with using parseservice for now. Next month has all hands in Orland, Christmas vacation of US/Europe people and endd year (and new year) vacation of Japan. So, it's not usual month, we may not have enough time to work on this. I believe that we should fix this bug in these 2 weeks (before the all hands) for landing them to m-c before next merge (45). If we missed that, next merge would be 25th, January. But I don't think we have much time in the cycle too.
And also, nsIParserService doesn't have enough methods for ContentEventHandler. ContentEventHandler needs to check should insert line break for ruby related elements even though they are phrase elements in the definition. And it cannot check if it's a replace element or not like <input>, <textarea>, <img> etc. So, anyway, we need more time to implement them.

Additionally, currently, there is a suggestion, nsHTMLEditor should check actual style (bug 1207086). If nsHTMLEditor will do so, anyway, ContentEventHandler needs to have independent method which is in the patch.
(In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #57)
> As far as I know, nsHTMLEditor currently use
> nsHTMLEditor::NodeIsBlockStatic() to check if an element is block:
> http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/nsHTMLEditor.
> cpp#736
> 
> This is NOT using nsIParseService. It directly checks the tag name.
See also http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/nsHTMLEditor.cpp#763

> Anyway, I'd like to put off doing that to another bug. This bug must be a
> big problem for Korean users but we cannot uplift the patch due to risk
> management. So, the most important thing is, we need to fix this bug on m-c
> ASAP.
I'm just worried that this ends up fixing a subset of the issue. 
> > but doesn't the issue happen with the element name based approach too?
> > Say, you have <span> and then some js does span.style.display = "block";
> 
> So, if we don't use layout information, this case won't be problem since
> current approach checks only DOM tree. Any DOM tree changes are observed
> with nsIMutationObserver.
Now I'm a bit lost. Don't we end up handling new lines in wrong way if you treat
<span> as inline, in case it is actually block?

> I believe that we should fix this bug in these
> 2 weeks (before the all hands) for landing them to m-c before next merge
> (45). If we missed that, next merge would be 25th, January. But I don't
> think we have much time in the cycle too.
ok, fine, though, I still wonder why we aren't using parserservice here, when editor is using it.
(In reply to Olli Pettay [:smaug] from comment #59)
> (In reply to Masayuki Nakano [:masayuki] (Mozilla Japan) from comment #57)
> > As far as I know, nsHTMLEditor currently use
> > nsHTMLEditor::NodeIsBlockStatic() to check if an element is block:
> > http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/nsHTMLEditor.
> > cpp#736
> > 
> > This is NOT using nsIParseService. It directly checks the tag name.
> See also
> http://mxr.mozilla.org/mozilla-central/source/editor/libeditor/nsHTMLEditor.
> cpp#763

Oh...

> > Anyway, I'd like to put off doing that to another bug. This bug must be a
> > big problem for Korean users but we cannot uplift the patch due to risk
> > management. So, the most important thing is, we need to fix this bug on m-c
> > ASAP.
> I'm just worried that this ends up fixing a subset of the issue. 

Yes. But at least for now, we should fix this bug in usual cases.

> > > but doesn't the issue happen with the element name based approach too?
> > > Say, you have <span> and then some js does span.style.display = "block";
> > 
> > So, if we don't use layout information, this case won't be problem since
> > current approach checks only DOM tree. Any DOM tree changes are observed
> > with nsIMutationObserver.
> Now I'm a bit lost. Don't we end up handling new lines in wrong way if you
> treat
> <span> as inline, in case it is actually block?

As I said in comment 51, if inline elements creates block context with display: block; or something, the behavior becomes odd for users. However, I still think that it's edge case.

> > I believe that we should fix this bug in these
> > 2 weeks (before the all hands) for landing them to m-c before next merge
> > (45). If we missed that, next merge would be 25th, January. But I don't
> > think we have much time in the cycle too.
> ok, fine, though, I still wonder why we aren't using parserservice here,
> when editor is using it.

nsIParserService::IsBlock() is useful but not enough for ContentEventHandler. And I really worry about its performance since ContentEventHandler may list up a lot of elements per query. I don't know why nsIPrarserService::IsBlock() needs to take element id instead of tag name directly.
Comment on attachment 8686487 [details] [diff] [review]
part.1 Make ContentEventHandler::GetTextLength() and GetNativeTextLength() called only with a text node

I don't yet understand the name GetNativeTextLengthBefore
Attachment #8686487 - Flags: review?(bugs) → review+
Comment on attachment 8686495 [details] [diff] [review]
part.2 Clean up GenerateFlatTextContent(), GelerateFlatFontRanges() and GetFlatTextOffsetOfRange() of ContentEventHandler

Hard to review. Hopefully I'm not missing any functionality changes here.
Attachment #8686495 - Flags: review?(bugs) → review+
Comment on attachment 8686509 [details] [diff] [review]
part.3 ContentEventHandler::GetFlatTextLengthInRange() should handle specially when it's called by nsIMutationObserver::ContentRemoved()

I wonder if IMEContentObserver should just update all its caches at the end of microtask if ContentAppended/Inserted/Removed or AttributeChanged or AttributeWill change has been called.
Or maybe update even less often - just before RefreshDriver tick or so.
I'm rather worried about the performance here, but I guess someone should
profile this code. Looks like the nsIMutationObserver code is originally from bug 88831, and I doubt its performance has ever really been measured.
But anyhow, that all is about some other bug.




>+  // This may be called for retrieving the text of removed nodes.  Even in this
>+  // case, the node thinks it's still in the tree because UnbindFromTree() will
>+  // be called after here.  However, the node was already removed from the
>+  // array of children of its parent.  So, be careful to handle this case.
>+  if (aIsRemovingNode) {
>+    DebugOnly<nsIContent*> parent = aStartPosition.mNode->GetParent();
>+    NS_ASSERTION(parent && parent->IndexOf(aStartPosition.mNode) == -1,
>+      "At removing the node, the node shouldn't be in the array of children "
>+      "of its parent");
>+    NS_ASSERTION(aStartPosition.mNode == aEndPosition.mNode,
>+      "At removing the node, start and end node should be same");
>+    NS_ASSERTION(aStartPosition.mOffset == 0,
>+      "When the node is being removed, the start offset should be 0");
>+    NS_ASSERTION(static_cast<uint32_t>(aEndPosition.mOffset) ==
>+                   aEndPosition.mNode->GetChildCount(),
>+      "When the node is being removed, the end offset should be child count");
>+    iter = NS_NewContentIterator();
>+    nsresult rv = iter->Init(aStartPosition.mNode);
>     if (NS_WARN_IF(NS_FAILED(rv))) {
>       return rv;
>     }
Please use MOZ_ASSERTs here and not NS_ASSERTION.


>   // Get the flatten text length in the range.
>+  // @param aIsRemovingNode     Should be true only when this is called from
>+  //                            nsIMutationObserver::ContentRemoved().
This needs some more explanation what aStartPosition/aEndPosition are pointing to then
and that aRootContent is still the root in the document.


>@@ -968,23 +968,22 @@ IMEContentObserver::ContentRemoved(nsIDo
>     offset = mStartOfRemovingTextRangeCache.mFlatTextLength;
>   }
> 
>   // get offset at the end of the deleted node
>   uint32_t textLength = 0;
>   if (aChild->IsNodeOfType(nsINode::eTEXT)) {
>     textLength = ContentEventHandler::GetNativeTextLength(aChild);
>   } else {
>-    uint32_t nodeLength =
>-      std::max(static_cast<int32_t>(aChild->GetChildCount()), 1);
>+    uint32_t nodeLength = static_cast<int32_t>(aChild->GetChildCount());
>     rv = ContentEventHandler::GetFlatTextLengthInRange(
>                                 NodePosition(aChild, 0),
>                                 NodePosition(aChild, nodeLength),
So this works also in the common case where nodeLenth is 0, right? Even in case the removed node is <br>?
(I'm trying to understand why the previous code had that max(..., 1))
This question answered and minor issues fixed, r+
Attachment #8686509 - Flags: review?(bugs) → review+
Attachment #8686511 - Flags: review?(bugs) → review+
Comment on attachment 8686513 [details] [diff] [review]
part.5 Redesign the rules to create range in ContentEventHandler::SetRangeFromFlatTextOffset()

># HG changeset patch
># User Masayuki Nakano <masayuki@d-toybox.com>
># Parent  b3b7c7136dd8c72997d3376cc9d217cf660d79df
>Bug 1213589 part.5 Redesign the rules to create range in ContentEventHandler::SetRangeFromFlatTextOffset() r=
>
>diff --git a/dom/events/ContentEventHandler.cpp b/dom/events/ContentEventHandler.cpp
>--- a/dom/events/ContentEventHandler.cpp
>+++ b/dom/events/ContentEventHandler.cpp
>@@ -35,16 +35,48 @@ namespace mozilla {
> 
> using namespace dom;
> using namespace widget;
> 
> /******************************************************************/
> /* ContentEventHandler                                            */
> /******************************************************************/
> 
>+// NOTE
>+//
>+// ContentEventHandler *creates* ranges as following rules:
>+// 1. Start of range:
>+//   1.1. When text node is start of a range, start node is the text node and
>+//        start offset is any number between 0 and the length of the text.
>+//   1.2. When before an element node is start of a range, start node is parent
>+//        of the element and start offset is the element's index in the parent.
>+//   1.3. When after an empty element node is start of a range, start node is
>+//        parent of the element and start offset is the element's index in the
>+//        parent + 1.
>+//   1.4. When start of a non-empty element is start of a range, start node is
>+//        the element and start offset is 0.
Had to read the code to understand what 1.2 and 1.4 mean. 
[<element> and <element>[ were useful, and I think they should be used in these comments too, 
also in other cases.



>+//   1.5. When start of a range is out of bounds, start node is the root node
>+//        and start offset is 0.
I wonder why this, and why not root, root.childNodes().length


>+// 2. End of range:
>+//   2.1. When text node is end of a range, end node is the text node and
>+//        end offset is any number between 0 and the length of the text.
>+//   2.2. When before an element node is end of a range, end node is previous
>+//        node causing text.
previous node of what? 'before an element node' is already previous node of some element.
end node is the last text node in the siblings? last element which has text nodes as descendants?
And in the code this is just
+      if (endOffset == offset) {
+        // Rule #2.2: ]<element>
+        MOZ_ASSERT(endOffset != offset,
+                   "This should've already been handled at previous node");
+        return NS_ERROR_FAILURE;
+      }
so nothing is done there. I'm missing something here.

>+//   2.4. When end of a range is out of bounds, end node is the root node and
>+//        end offset is number of the children.
ok, so 1.5 + 2.4 means that out of bounds range selects everything. I wonder why that.


>+// ContentEventHandler *treats* ranges as following additional rules:
>+// 1. When the start node is an element node which doesn't have children,
>+//    it includes its start (i.e., includes its open tag).
what does this mean?

>+// 2. When the end node is an element node which doesn't have children,
>+//    it includes the end (i.e., includes its end tag except empty element).
and this.
How are the open and end tags used?


>   bool startSet = false;
>   for (; !iter->IsDone(); iter->Next()) {
>     nsINode* node = iter->GetCurrentNode();
>     if (NS_WARN_IF(!node)) {
>       break;
>     }
>-    if (!node->IsContent()) {
>+    if (node == mRootContent || !node->IsContent()) {
>       continue;
>     }
I don't understand this change.
Attachment #8686513 - Flags: review?(bugs) → review-
Comment on attachment 8686516 [details] [diff] [review]
part.6 ContentEventHandler should insert line breaks at open tag of elements except non-replaced inline elements

>+ContentEventHandler::ShouldBreakLineBefore(nsIContent* aContent,
>+                                           nsINode* aRootNode)
>+{
>+  // We don't need to append linebreak at the start of the root element.
>+  if (aContent == aRootNode) {
>+    return false;
>+  }
>+
>+  if (!aContent->IsElement()) {
>+    return false;
>+  }
No need for this check since IsHTMLElement and
IsAnyOFHTMLElements check whether the node is actually an element.

>+
>+  if (aContent->IsHTMLElement(nsGkAtoms::br)) {
>+    return IsContentBR(aContent);
>+  }
>+
>+  // Note that ideally, we should refer the style of the primary frame of
>+  // aContent for deciding if it's an inline.  However, it's difficult
>+  // IMEContentObserver to notify IME of text change caused by style change.
>+  // Therefore, currently, we should check only from the tag for now.
>+  // XXX How to test if the element is an unknown element?
I think you need to add IID to HTMLUnknownElement and after IsAnyOfHTMLElements check QI to that.
That would prevent Custom elements to be handled here, though, Custom elements may have any kind of 
rendering. (so we probably need to fix this setup before enabling Custom Elements by default)

non-html elements should be handled the same way as unknown html elements, right?
Attachment #8686516 - Flags: review?(bugs) → review-
Attachment #8686519 - Flags: review?(bugs) → review+
Comment on attachment 8686517 [details] [diff] [review]
part.7 Add new testcases to runSetSelectionEventTest() and runQueryTextContentEventTest() for checking the behavior with open tag

mostly rs+
Attachment #8686517 - Flags: review?(bugs) → review+
(In reply to Olli Pettay [:smaug] from comment #63)
> Comment on attachment 8686509 [details] [diff] [review]
> part.3 ContentEventHandler::GetFlatTextLengthInRange() should handle
> specially when it's called by nsIMutationObserver::ContentRemoved()
> 
> I wonder if IMEContentObserver should just update all its caches at the end
> of microtask if ContentAppended/Inserted/Removed or AttributeChanged or
> AttributeWill change has been called.
> Or maybe update even less often - just before RefreshDriver tick or so.
> I'm rather worried about the performance here, but I guess someone should
> profile this code. Looks like the nsIMutationObserver code is originally
> from bug 88831, and I doubt its performance has ever really been measured.
> But anyhow, that all is about some other bug.

Honestly, IMEContentObserver's performance must be still bad in some cases but as far as we've reported, some of the worst scenario are inserting or removing large text cases. For that I added the cache mechanism in bug 496360 only for them.

> >   // Get the flatten text length in the range.
> >+  // @param aIsRemovingNode     Should be true only when this is called from
> >+  //                            nsIMutationObserver::ContentRemoved().
> This needs some more explanation what aStartPosition/aEndPosition are
> pointing to then
> and that aRootContent is still the root in the document.

Additional explanation is added by part.6. I keep current explanation in this patch.

> >@@ -968,23 +968,22 @@ IMEContentObserver::ContentRemoved(nsIDo
> >     offset = mStartOfRemovingTextRangeCache.mFlatTextLength;
> >   }
> > 
> >   // get offset at the end of the deleted node
> >   uint32_t textLength = 0;
> >   if (aChild->IsNodeOfType(nsINode::eTEXT)) {
> >     textLength = ContentEventHandler::GetNativeTextLength(aChild);
> >   } else {
> >-    uint32_t nodeLength =
> >-      std::max(static_cast<int32_t>(aChild->GetChildCount()), 1);
> >+    uint32_t nodeLength = static_cast<int32_t>(aChild->GetChildCount());
> >     rv = ContentEventHandler::GetFlatTextLengthInRange(
> >                                 NodePosition(aChild, 0),
> >                                 NodePosition(aChild, nodeLength),
> So this works also in the common case where nodeLenth is 0, right? Even in
> case the removed node is <br>?
> (I'm trying to understand why the previous code had that max(..., 1))
> This question answered and minor issues fixed, r+

Yes, this is the special case, aIsRemovingNode == true. In this case, GetFlatTextLengthInRange() initializes nsContentIterator as aChild is the root.
(In reply to Olli Pettay [:smaug] from comment #64)
> Comment on attachment 8686513 [details] [diff] [review]
> part.5 Redesign the rules to create range in
> ContentEventHandler::SetRangeFromFlatTextOffset()
> >+//   1.5. When start of a range is out of bounds, start node is the root node
> >+//        and start offset is 0.
> I wonder why this, and why not root, root.childNodes().length

Oh, yes. This is wrong. The code does right thing, though. (But this behavior is changed by part.8)

> >+// 2. End of range:
> >+//   2.1. When text node is end of a range, end node is the text node and
> >+//        end offset is any number between 0 and the length of the text.
> >+//   2.2. When before an element node is end of a range, end node is previous
> >+//        node causing text.
> previous node of what? 'before an element node' is already previous node of
> some element.

Hmm, I meant 'before and element node' is ']<element>'. So, I meant the previous causing text is a node which is iterated before the element node and caused some text.

> end node is the last text node in the siblings? last element which has text
> nodes as descendants?

The former, but not enough.

For example,

1. |text node]|<br>
2. <p>|text node]|</p><br>
3. <p>]<span></span></p><br>

Like this, when end of current node in the loop with nsContentIterator is the end of the range, the end of the range should be decided in this time.  In other words, the node immediately after the end of range shouldn't be handled in the loop (the loop should have finished).

> And in the code this is just
> +      if (endOffset == offset) {
> +        // Rule #2.2: ]<element>
> +        MOZ_ASSERT(endOffset != offset,
> +                   "This should've already been handled at previous node");
> +        return NS_ERROR_FAILURE;
> +      }
> so nothing is done there. I'm missing something here.

Therefore, if we hit this case, we have a bug. So, we need to to nothing here since this path shouldn't run actually. This is necessary for debug (with MOZ_ASSERT) and explicitly indicates an impossible case for other developers. (When they are reading this code and if there is no this block, they may realize the missing case and they must try to look for a bug to hit the case. But they'll fail to find such case because it never occurs.)

> >+//   2.4. When end of a range is out of bounds, end node is the root node and
> >+//        end offset is number of the children.
> ok, so 1.5 + 2.4 means that out of bounds range selects everything. I wonder
> why that.

So, explanation of 1.5 is wrong...

> >+// ContentEventHandler *treats* ranges as following additional rules:
> >+// 1. When the start node is an element node which doesn't have children,
> >+//    it includes its start (i.e., includes its open tag).
> what does this mean?

If start of the specified range is {<br>, 0}, the range should include the line break caused by <br>.

> >+// 2. When the end node is an element node which doesn't have children,
> >+//    it includes the end (i.e., includes its end tag except empty element).
> and this.
> How are the open and end tags used?

The end tag case isn't good example for now because the patches won't implement to insert line breaks at end tags of block level elements. Same as above, if end of the specified range is {<br>, 0}, the line break caused by the <br> should be included.

These cases are special cases if the caller doesn't use its parent element node at representing start or end of range at empty element node.

> >   bool startSet = false;
> >   for (; !iter->IsDone(); iter->Next()) {
> >     nsINode* node = iter->GetCurrentNode();
> >     if (NS_WARN_IF(!node)) {
> >       break;
> >     }
> >-    if (!node->IsContent()) {
> >+    if (node == mRootContent || !node->IsContent()) {
> >       continue;
> >     }
> I don't understand this change.

mRootContent shouldn't cause any text. Therefore, checking with the following complicated logic doesn't make sense. This change guarantees that mRooctContent never causes any text even if it's a <br> element (e.g., <br contenteditable>).
(In reply to Olli Pettay [:smaug] from comment #65)
> Comment on attachment 8686516 [details] [diff] [review]
> part.6 ContentEventHandler should insert line breaks at open tag of elements
> except non-replaced inline elements
> 
> >+ContentEventHandler::ShouldBreakLineBefore(nsIContent* aContent,
> >+                                           nsINode* aRootNode)
> >+{
> >+  // We don't need to append linebreak at the start of the root element.
> >+  if (aContent == aRootNode) {
> >+    return false;
> >+  }
> >+
> >+  if (!aContent->IsElement()) {
> >+    return false;
> >+  }
> No need for this check since IsHTMLElement and
> IsAnyOFHTMLElements check whether the node is actually an element.

Indeed.

> >+
> >+  if (aContent->IsHTMLElement(nsGkAtoms::br)) {
> >+    return IsContentBR(aContent);
> >+  }
> >+
> >+  // Note that ideally, we should refer the style of the primary frame of
> >+  // aContent for deciding if it's an inline.  However, it's difficult
> >+  // IMEContentObserver to notify IME of text change caused by style change.
> >+  // Therefore, currently, we should check only from the tag for now.
> >+  // XXX How to test if the element is an unknown element?
> I think you need to add IID to HTMLUnknownElement and after
> IsAnyOfHTMLElements check QI to that.
> That would prevent Custom elements to be handled here, though, Custom
> elements may have any kind of 
> rendering. (so we probably need to fix this setup before enabling Custom
> Elements by default)

Thanks!! I'll do that in new patch.

> non-html elements should be handled the same way as unknown html elements,
> right?

Yes. I think so. So, if the element node isn't an HTML element, this should return false.


Thank you for your review for such complicated patches! If I could've write them simpler, it'd be better though...
Including improvements of the explanation of GetFlatTextLengthInRange(). And the HTMLUnknownElement part is moved to part.9.
Attachment #8686516 - Attachment is obsolete: true
Attachment #8692339 - Flags: review?(bugs)
smaug: ping (we have only 4 days for this!)
Flags: needinfo?(bugs)
Attachment #8692340 - Flags: review?(bugs) → review+
Comment on attachment 8692337 [details] [diff] [review]
part.5 Redesign the rules to create range in ContentEventHandler::SetRangeFromFlatTextOffset()

Ah, bugzilla's diff actually works here generating useful interdiff.
Flags: needinfo?(bugs)
Attachment #8692337 - Flags: review?(bugs) → review+
Comment on attachment 8692339 [details] [diff] [review]
part.6 ContentEventHandler should insert line breaks at open tag of elements except non-replaced inline elements

But looks like bugzilla's diff doesn't like this one.
Could you perhaps provide an interdiff?
This is massive patch so hard to see what has changed from the previous version.

(or if not, I'll download both patches and run interdiff)
Attached patch Diff of part.6 (obsolete) — Splinter Review
Is this enough to you?
Attached patch Diff of part.6Splinter Review
Oh, this must be minimum diff.
Attachment #8693884 - Attachment is obsolete: true
Flags: needinfo?(bugs)
Flags: needinfo?(bugs)
Attachment #8692339 - Flags: review?(bugs) → review+
https://hg.mozilla.org/integration/mozilla-inbound/rev/cd5fdb842f905c8afccf35efa4dbb95d6981c675
Bug 1213589 part.1 Make ContentEventHandler::GetTextLength() and GetNativeTextLength() called only with a text node r=smaug

https://hg.mozilla.org/integration/mozilla-inbound/rev/d045e3b436223420edaf2933fb281d4f6cc464a3
Bug 1213589 part.2 Clean up GenerateFlatTextContent(), GelerateFlatFontRanges() and GetFlatTextOffsetOfRange() of ContentEventHandler r=smaug

https://hg.mozilla.org/integration/mozilla-inbound/rev/95efe6bad896355799aaaacea515f003c0c14f7e
Bug 1213589 part.3 ContentEventHandler::GetFlatTextLengthInRange() should handle specially when it's called by nsIMutationObserver::ContentRemoved() r=smaug

https://hg.mozilla.org/integration/mozilla-inbound/rev/afae79e8f7d2cbd4a0cdd18d8309053d4d8c8173
Bug 1213589 part.5 Redesign the rules to create range in ContentEventHandler::SetRangeFromFlatTextOffset() r=smaug

https://hg.mozilla.org/integration/mozilla-inbound/rev/23e3c6bf330f3e77834c2c08f40642a9731885ad
Bug 1213589 part.6 ContentEventHandler should insert line breaks at open tag of elements except non-replaced inline elements r=smaug

https://hg.mozilla.org/integration/mozilla-inbound/rev/be5d2f1ae122666135bbb4550bdc53f50643040d
Bug 1213589 part.7 Add new testcases to runSetSelectionEventTest() and runQueryTextContentEventTest() for checking the behavior with open tag r=smaug

https://hg.mozilla.org/integration/mozilla-inbound/rev/5b3f4ce479a5092965a56802193a7343d7008e63
Bug 1213589 part.8 When there are no nodes causing text, ContentEventHandler should set start of the editor root to start of the range r=smaug

https://hg.mozilla.org/integration/mozilla-inbound/rev/4434a5a96375d3ffe6e5cc05cab67b675b3772c1
Bug 1213589 part.9 ContentEventHandler::ShouldBreakLineBefore() should return false if the content is unknown HTML element r=smaug
Thank you very much, smaug!

And sorry for Korean users this fix won't be fixed until 45 since this is a problem of the design of internal module and the changes are too risky for uplifting.

I'll try to fix remaining problems in new bug because the number of comments are already 81, it's too long.
Whiteboard: [leave open]
I filed bug 1229641 for the close tag issue. Perhaps, I can work on it in Q1, 2016 but not so high priority because the case must be really rare.
Whiteboard: See bug 1229641 for the remaining edge cases
Duplicate of this bug: 1214973
Blocks: 1229991
Component: Event Handling → User events and focus handling
You need to log in before you can comment on or make changes to this bug.