Closed Bug 1642594 Opened 4 years ago Closed 4 years ago

Implement Blink-compat whitespace normalizer

Categories

(Core :: DOM: Editor, task, P2)

task

Tracking

()

RESOLVED FIXED
mozilla79
Tracking Status
firefox79 --- fixed

People

(Reporter: masayuki, Assigned: masayuki)

References

(Depends on 1 open bug, Blocks 3 open bugs)

Details

Attachments

(5 files, 1 obsolete file)

Our whitespace normalization is not based on clear rules. It replaces an ASCII whitespace or an NBSP when it detects not acceptable. Therefore, the result of whitespace sequence may be different, depending on typing order of space key for example. Additionally, our editor touches DOM tree a lot of times if there are some unexpected situation in whitespace sequences. This causes our behavior slow. And for getting the range to delete, we need to consider all ranges before touching the DOM tree.

Therefore, we need a new normalization rules. Unfortunately, our rules are completely different from Blink and WebKit. And their behavior is also different. But fortunately, Blink's behavior is enough reasonable and Blink has the biggest market share. So, we should take Blink's whitespace normalization rules as far as possible.

On the other hand, the change is too risky especially under COVID-19 situation. Therefore, we should ship it latter, but for enabling beforeinput event by default in Nightly channel, I need to implement it as soon as possible. So, binary size and the code become bigger though, we should have both normalization code and make them switchable with a pref.

(In reply to Masayuki Nakano [:masayuki] (he/him)(JST, +0900) from comment #0)

Our whitespace normalization is not based on clear rules. It replaces an ASCII whitespace or an NBSP when it detects not acceptable. Therefore, the result of whitespace sequence may be different, depending on typing order of space key for example. Additionally, our editor touches DOM tree a lot of times if there are some unexpected situation in whitespace sequences. This causes our behavior slow. And for getting the range to delete, we need to consider all ranges before touching the DOM tree.

Therefore, we need a new normalization rules. Unfortunately, our rules are completely different from Blink and WebKit. And their behavior is also different. But fortunately, Blink's behavior is enough reasonable and Blink has the biggest market share. So, we should take Blink's whitespace normalization rules as far as possible.

On the other hand, the change is too risky especially under COVID-19 situation. Therefore, we should ship it latter, but for enabling beforeinput event by default in Nightly channel, I need to implement it as soon as possible. So, binary size and the code become bigger though, we should have both normalization code and make them switchable with a pref.

I completely agree that normalizing on Blink's behavior is the right path, and having it behind a pref mitigates the risk. Once available, we should also consider a staged rollout of this feature.

Callers of WSRunObject::GetASCIIWhitespacesBounds() may want to scan only
previous or next whitespaces. Therefore, we can split it to save creation
cost of EditorDOMPointInText.

Additionally, this makes them scan whitespace sequence in a text node until
hitting its end for avoiding to use expensive API of WSRunScanner.

This patch makes the method not smart intentionally because we need only the
range without DOM mutation.

Depends on D77986

It's difficult to explain the rules of Gecko's traditional whitespace
normalization code because it may put an NBSP before modifying the DOM tree
and it may toggle specific points after handling top-level edit sub-action
from an ASCII whitespace or an NBSP to the other if it's needed. For example,
when you type whitespaces, all whitespaces except the last one are replaced
with NBSPs, but when you insert a whitespace between NBSPs, it's allowed,
not replaced with NBSP.

On the other hand, Blink uses clearer than Gecko's rules:

  • Basically, whitespace sequence should be pairs of an NBSP and an ASCII
    whitespace.
  • If inserting a whitespace at end of hard line, it's replaced with an NBSP.
  • If preceding text node ends with an NBSP, current text node can start with
    an ASCII whitespace.

In this patch, HTMLEditor::DeleteTextAndNormalizeSurroundingWhitespaces()
is a semi-public method for the normalizer users. It takes a range as 2
EditorDOMPointInText as deleting text range. And if the range follows
and/or following whitespaces, its helper method,
HTMLEditor::ExtendRangeToDeleteForNormalizingWhitespaceSequence(), extends
the range and offers normalized whitespaces to replace the range (The reason
why we should do this complicated thing is, modifying DOM tree a lot may
increase security risk and get worse performance. Therefore, we should
modify text once per text node in the range). Note that the helper method
also computes minimized range as far as possible because if all whitespaces
are replaced every time, a lot of ReplaceTextTransaction will have too
long text.

Unfortunately, I couldn't split this big patch to multiple parts because
it's impossible to check whether every new method is reasonable and correct
without other new methods.

Finally, this new behavior is disabled by default due to super risky change
and we're still under COVID-19 situation. However, it's enabled in all WPT
tests under editing and mochitests under editor/libeditor/tests for
checking the new behavior which needs to be shipped before beforeinput.

Depends on D77987

Attachment #9153718 - Attachment is obsolete: true

This patch adds new tentative WPTs which test Chrome's behavior at modifying
around white spaces. As mentioned in the test files, these tests just check
compatibility with Chrome, not suggesting any standards.

Depends on D77987

I realized that there is no word "whitespace" in formal English. This patch
replaces it with "white-space" in comments, and change method names to use
"WhiteSpace".

Depends on D78654

This patch tries to implement Blink-compat white-space normalizer for
HTMLEditor.

It's difficult to list up our traditional white-space normalization rules
because WSRunObject touches white space sequence only when there is not
acceptable case, e.g., an ASCII white-spaces will be adjacent to another
one, and replaces only unacceptable white-space only. Therefore, whether
white-space sequence may start with either an ASCII white-space or an NBSP.
On the other hand, Blink and WebKit makes white-space sequence always
starts with an NBSP or an ASCII white-space (unfortunately, they behave
differently!). So, for web-compat, we should simulate Blink's behavior
because either behavior is reasonable but Blink have more market share.

This patch simply adds new white-space normalization path for the new one,
and it's switchable with a pref, and still disabled by default.

The other reason why we should do this is, our traditional white-space
normalizer touches the DOM a lot of times per edit action, and the timing
is both before and after touches the DOM tree. Therefore, it's difficult
to compute actual deleting range for InputEvent.getTargetRanges() and
touching a lot of times causes running mutation event listeners a lot and
creates a lot of transaction class instances. So, new one have a lot of
merits:

  1. Improve web-compat
  2. Improve the peformance
  3. Improve the security
  4. Improve the footprint (but this is now worse then traditional one)
  5. Simplify the implementation

The new normalizer is mostly implemented with only 3 HTMLEditor methods.

One is HTMLEditor::DeleteTextAndNormalizeSurroundingWhiteSpaces(). This is
semi-public method for the edit action handlers. This takes a range with
2 EditorDOMPoinInText to delete the range simply. This also replaces
surrounding white-space sequence if necessary. For inserting text case,
this method also handles only white-space normalization when it's called
with collapsed range, i.e., same EditorDOMPointInText. This tries to use
RepaceTextWithTransaction() as far as possible to reduce creation cost of
transaction classes and the footprint.

Another one is HTMLEditor::ExtendRangeToDeleteWithNormalizingWhiteSpaces().
This tries to extend the given range to normalize surrounding white-spaces.
This is currently not optimized for footprint because this may include
white-spaces which do not need to be replaced. This optimization should be
done before shipping, but for now, enabling InputEvent.getTargetRanges() in
Nightly channel is more important. So that it should be done in a follow-up
bug.

The other is HTMLEditor::GenerateWhitepaceSequence(). This creates
normalized white-space sequence with surrounding character information.
For keeping this method simple as far as possible, we shouldn't optimize
the range of generation even in follow-ups.

Finally, the white-space sequence is not tested in mochitests, so that we
can enable this new normalizer when we run mochitests under
editor/libeditor/tests. However, WPT has some tests. We should keep
them running with current normalizer for checking regression. Instead,
we should enable the pref only for the new WPT added by the previous patch.

Depends on D78655

FYI: The last patch is not so risky for landing to Nightly because it creates new path which won't run with default settings.

Pushed by masayuki@d-toybox.com: https://hg.mozilla.org/integration/autoland/rev/289c293af80b part 1: Split `WSRunObject::GetASCIIWhitespacesBounds()` and redesign them r=m_kato https://hg.mozilla.org/integration/autoland/rev/9df3c183e402 part 2: Make `WSRunObject::ReplaceASCIIWhitespacesWithOneNBSP()` take range of collapsible ASCII whitespaces instead of a position in it r=m_kato https://hg.mozilla.org/integration/autoland/rev/f644f853e276 part 3: Add tentative WPT tests which test compatibility with Chrome r=m_kato https://hg.mozilla.org/integration/autoland/rev/2a1cceae6e2b part 4: Stop using "whitespace" in under libeditor r=m_kato https://hg.mozilla.org/integration/autoland/rev/8e0e2c27bc4b part 5: Implement first version of new white-space normalizer which simulates Blink's one r=m_kato
Created web-platform-tests PR https://github.com/web-platform-tests/wpt/pull/24145 for changes under testing/web-platform/tests
Upstream web-platform-tests status checks passed, PR will merge once commit reaches central.
Upstream PR merged by moz-wptsync-bot
Regressions: 1645983
Regressions: 1648564
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: