Implement Blink-compat whitespace normalizer
Categories
(Core :: DOM: Editor, task, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox79 | --- | fixed |
People
(Reporter: masayuki, Assigned: masayuki)
References
(Depends on 1 open bug, Blocks 3 open bugs)
Details
Attachments
(5 files, 1 obsolete file)
Our whitespace normalization is not based on clear rules. It replaces an ASCII whitespace or an NBSP when it detects not acceptable. Therefore, the result of whitespace sequence may be different, depending on typing order of space key for example. Additionally, our editor touches DOM tree a lot of times if there are some unexpected situation in whitespace sequences. This causes our behavior slow. And for getting the range to delete, we need to consider all ranges before touching the DOM tree.
Therefore, we need a new normalization rules. Unfortunately, our rules are completely different from Blink and WebKit. And their behavior is also different. But fortunately, Blink's behavior is enough reasonable and Blink has the biggest market share. So, we should take Blink's whitespace normalization rules as far as possible.
On the other hand, the change is too risky especially under COVID-19 situation. Therefore, we should ship it latter, but for enabling beforeinput
event by default in Nightly channel, I need to implement it as soon as possible. So, binary size and the code become bigger though, we should have both normalization code and make them switchable with a pref.
Comment 1•4 years ago
|
||
(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #0)
Our whitespace normalization is not based on clear rules. It replaces an ASCII whitespace or an NBSP when it detects not acceptable. Therefore, the result of whitespace sequence may be different, depending on typing order of space key for example. Additionally, our editor touches DOM tree a lot of times if there are some unexpected situation in whitespace sequences. This causes our behavior slow. And for getting the range to delete, we need to consider all ranges before touching the DOM tree.
Therefore, we need a new normalization rules. Unfortunately, our rules are completely different from Blink and WebKit. And their behavior is also different. But fortunately, Blink's behavior is enough reasonable and Blink has the biggest market share. So, we should take Blink's whitespace normalization rules as far as possible.
On the other hand, the change is too risky especially under COVID-19 situation. Therefore, we should ship it latter, but for enabling
beforeinput
event by default in Nightly channel, I need to implement it as soon as possible. So, binary size and the code become bigger though, we should have both normalization code and make them switchable with a pref.
I completely agree that normalizing on Blink's behavior is the right path, and having it behind a pref mitigates the risk. Once available, we should also consider a staged rollout of this feature.
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 2•4 years ago
|
||
Callers of WSRunObject::GetASCIIWhitespacesBounds()
may want to scan only
previous or next whitespaces. Therefore, we can split it to save creation
cost of EditorDOMPointInText
.
Additionally, this makes them scan whitespace sequence in a text node until
hitting its end for avoiding to use expensive API of WSRunScanner
.
Assignee | ||
Comment 3•4 years ago
|
||
This patch makes the method not smart intentionally because we need only the
range without DOM mutation.
Depends on D77986
Assignee | ||
Comment 4•4 years ago
|
||
It's difficult to explain the rules of Gecko's traditional whitespace
normalization code because it may put an NBSP before modifying the DOM tree
and it may toggle specific points after handling top-level edit sub-action
from an ASCII whitespace or an NBSP to the other if it's needed. For example,
when you type whitespaces, all whitespaces except the last one are replaced
with NBSPs, but when you insert a whitespace between NBSPs, it's allowed,
not replaced with NBSP.
On the other hand, Blink uses clearer than Gecko's rules:
- Basically, whitespace sequence should be pairs of an NBSP and an ASCII
whitespace. - If inserting a whitespace at end of hard line, it's replaced with an NBSP.
- If preceding text node ends with an NBSP, current text node can start with
an ASCII whitespace.
In this patch, HTMLEditor::DeleteTextAndNormalizeSurroundingWhitespaces()
is a semi-public method for the normalizer users. It takes a range as 2
EditorDOMPointInText
as deleting text range. And if the range follows
and/or following whitespaces, its helper method,
HTMLEditor::ExtendRangeToDeleteForNormalizingWhitespaceSequence()
, extends
the range and offers normalized whitespaces to replace the range (The reason
why we should do this complicated thing is, modifying DOM tree a lot may
increase security risk and get worse performance. Therefore, we should
modify text once per text node in the range). Note that the helper method
also computes minimized range as far as possible because if all whitespaces
are replaced every time, a lot of ReplaceTextTransaction
will have too
long text.
Unfortunately, I couldn't split this big patch to multiple parts because
it's impossible to check whether every new method is reasonable and correct
without other new methods.
Finally, this new behavior is disabled by default due to super risky change
and we're still under COVID-19 situation. However, it's enabled in all WPT
tests under editing
and mochitests under editor/libeditor/tests
for
checking the new behavior which needs to be shipped before beforeinput
.
Depends on D77987
Updated•4 years ago
|
Assignee | ||
Comment 5•4 years ago
|
||
This patch adds new tentative WPTs which test Chrome's behavior at modifying
around white spaces. As mentioned in the test files, these tests just check
compatibility with Chrome, not suggesting any standards.
Depends on D77987
Assignee | ||
Comment 6•4 years ago
|
||
I realized that there is no word "whitespace" in formal English. This patch
replaces it with "white-space" in comments, and change method names to use
"WhiteSpace".
Depends on D78654
Assignee | ||
Comment 7•4 years ago
|
||
This patch tries to implement Blink-compat white-space normalizer for
HTMLEditor
.
It's difficult to list up our traditional white-space normalization rules
because WSRunObject
touches white space sequence only when there is not
acceptable case, e.g., an ASCII white-spaces will be adjacent to another
one, and replaces only unacceptable white-space only. Therefore, whether
white-space sequence may start with either an ASCII white-space or an NBSP.
On the other hand, Blink and WebKit makes white-space sequence always
starts with an NBSP or an ASCII white-space (unfortunately, they behave
differently!). So, for web-compat, we should simulate Blink's behavior
because either behavior is reasonable but Blink have more market share.
This patch simply adds new white-space normalization path for the new one,
and it's switchable with a pref, and still disabled by default.
The other reason why we should do this is, our traditional white-space
normalizer touches the DOM a lot of times per edit action, and the timing
is both before and after touches the DOM tree. Therefore, it's difficult
to compute actual deleting range for InputEvent.getTargetRanges()
and
touching a lot of times causes running mutation event listeners a lot and
creates a lot of transaction class instances. So, new one have a lot of
merits:
- Improve web-compat
- Improve the peformance
- Improve the security
- Improve the footprint (but this is now worse then traditional one)
- Simplify the implementation
The new normalizer is mostly implemented with only 3 HTMLEditor
methods.
One is HTMLEditor::DeleteTextAndNormalizeSurroundingWhiteSpaces()
. This is
semi-public method for the edit action handlers. This takes a range with
2 EditorDOMPoinInText
to delete the range simply. This also replaces
surrounding white-space sequence if necessary. For inserting text case,
this method also handles only white-space normalization when it's called
with collapsed range, i.e., same EditorDOMPointInText
. This tries to use
RepaceTextWithTransaction()
as far as possible to reduce creation cost of
transaction classes and the footprint.
Another one is HTMLEditor::ExtendRangeToDeleteWithNormalizingWhiteSpaces()
.
This tries to extend the given range to normalize surrounding white-spaces.
This is currently not optimized for footprint because this may include
white-spaces which do not need to be replaced. This optimization should be
done before shipping, but for now, enabling InputEvent.getTargetRanges()
in
Nightly channel is more important. So that it should be done in a follow-up
bug.
The other is HTMLEditor::GenerateWhitepaceSequence()
. This creates
normalized white-space sequence with surrounding character information.
For keeping this method simple as far as possible, we shouldn't optimize
the range of generation even in follow-ups.
Finally, the white-space sequence is not tested in mochitests, so that we
can enable this new normalizer when we run mochitests under
editor/libeditor/tests
. However, WPT has some tests. We should keep
them running with current normalizer for checking regression. Instead,
we should enable the pref only for the new WPT added by the previous patch.
Depends on D78655
Assignee | ||
Comment 8•4 years ago
|
||
FYI: The last patch is not so risky for landing to Nightly because it creates new path which won't run with default settings.
Comment 12•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/289c293af80b
https://hg.mozilla.org/mozilla-central/rev/9df3c183e402
https://hg.mozilla.org/mozilla-central/rev/f644f853e276
https://hg.mozilla.org/mozilla-central/rev/2a1cceae6e2b
https://hg.mozilla.org/mozilla-central/rev/8e0e2c27bc4b
Assignee | ||
Updated•4 years ago
|
Description
•