Closed Bug 1391538 Opened 2 years ago Closed 2 years ago

nsTextFragment for the text node in <input> or <textarea> should not store text as single byte characters even if all characters are less than U+0100

Categories

(Core :: DOM: Core & HTML, enhancement)

enhancement
Not set

Tracking

()

RESOLVED FIXED
mozilla57
Tracking Status
firefox57 --- fixed

People

(Reporter: masayuki, Assigned: masayuki)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

nsTextFragment is the storage of nsGenericDOMDataNode (nsTextNode). It stores text as single byte characters if all characters are less than 256. However, this packing makes damage of dynamical change of text nodes in web apps.

I *guess* that we don't need such saving anymore, however, at least in this bug, we should disable the single byte mode for text nodes in <input> and <textarea>.
Sounds reasonable to disable the memory optimization for input and textarea. I guess for any native anonymous content?
nsTextFragment methods could take a param to tell whether mIs2b mode should be forced and then
textnode implementation just passes information whether it is native anonymous or something like that.


(Removing the !mIs2b mode everywhere is way more controversial. It might increase memory usage quite a bit.)
(In reply to Olli Pettay [:smaug] from comment #1)
> Sounds reasonable to disable the memory optimization for input and textarea.
> I guess for any native anonymous content?

I have no idea. What's for example? (Anyway, should be changed in other bugs.)

> nsTextFragment methods could take a param to tell whether mIs2b mode should
> be forced and then
> textnode implementation just passes information whether it is native
> anonymous or something like that.

I add nsTextFrame::MarkAsMaybeModifiedFrequently() and calling it before calling SetText() or SetData(). I think that's easier to understand what is specifying rather than bool argument of constructor.

> (Removing the !mIs2b mode everywhere is way more controversial. It might
> increase memory usage quite a bit.)

Yeah, but it's only Western websites and no Emojis nor Unicode hyphens, spaces etc. I wonder the scan cost of adding text may make users feel worse than saving memory usage since if web content loads an image, it'll eats memory more than what we're saving in nsTextFragment.
(In reply to Masayuki Nakano [:masayuki] (JST, +0900) from comment #3)
> (In reply to Olli Pettay [:smaug] from comment #1)
> > Sounds reasonable to disable the memory optimization for input and textarea.
> > I guess for any native anonymous content?
> 
> I have no idea. What's for example? (Anyway, should be changed in other
> bugs.)
I mean since the content of <input> and <textarea> is native anonymous content, and it is fast to get data about a node being native anonymous, all such nodes can probably behave the same way.


> 
> > nsTextFragment methods could take a param to tell whether mIs2b mode should
> > be forced and then
> > textnode implementation just passes information whether it is native
> > anonymous or something like that.
> 
> I add nsTextFrame::MarkAsMaybeModifiedFrequently() and calling it before
> calling SetText() or SetData(). I think that's easier to understand what is
> specifying rather than bool argument of constructor.
Hmm. And what does MarkAsMaybeModifiedFrequently do? Do we have spare bits somewhere?

> Yeah, but it's only Western websites and no Emojis nor Unicode hyphens,
> spaces etc.
It is also about whitespace
(In reply to Olli Pettay [:smaug] from comment #4)
> (In reply to Masayuki Nakano [:masayuki] (JST, +0900) from comment #3)
> > (In reply to Olli Pettay [:smaug] from comment #1)
> > > Sounds reasonable to disable the memory optimization for input and textarea.
> > > I guess for any native anonymous content?
> > 
> > I have no idea. What's for example? (Anyway, should be changed in other
> > bugs.)
> I mean since the content of <input> and <textarea> is native anonymous
> content, and it is fast to get data about a node being native anonymous, all
> such nodes can probably behave the same way.

Ah, okay.

> > > nsTextFragment methods could take a param to tell whether mIs2b mode should
> > > be forced and then
> > > textnode implementation just passes information whether it is native
> > > anonymous or something like that.
> > 
> > I add nsTextFrame::MarkAsMaybeModifiedFrequently() and calling it before
> > calling SetText() or SetData(). I think that's easier to understand what is
> > specifying rather than bool argument of constructor.
> Hmm. And what does MarkAsMaybeModifiedFrequently do? Do we have spare bits
> somewhere?

Yeah, it sets new flag with nsINode::SetFlags() and nsTextFragment::SetTo() callers sets bool result of nsINode::GetFlags() to new argument.

> > Yeah, but it's only Western websites and no Emojis nor Unicode hyphens,
> > spaces etc.
> It is also about whitespace

If you say about shared storage of whitespaces in nsTextFragment, I think that they can be made to char16_t* if we do it.
Comment on attachment 8899020 [details]
Bug 1391538 - nsTextFragment for text nodes in <input> or <textarea> shouldn't store text as single byte characters even if all characters are less than U+0100

https://reviewboard.mozilla.org/r/170342/#review175570

Comments fixed, r+.


Not about this bug, but we might want to measure how often JS wants the data of text nodes, and perhaps mark such text nodes with NS_MAYBE_MODIFIED_FREQUENTLY.
Also storing the data using StringBuffer might make sense.

::: dom/base/nsTextFragment.cpp:334
(Diff revision 1)
>  nsTextFragment::Append(const char16_t* aBuffer, uint32_t aLength, bool aUpdateBidi)
>  {
>    // This is a common case because some callsites create a textnode
>    // with a value by creating the node and then calling AppendData.
>    if (mState.mLength == 0) {
> -    return SetTo(aBuffer, aLength, aUpdateBidi);
> +    return SetTo(aBuffer, aLength, aUpdateBidi, mState.mIs2b);

Why this way?
Couldn't you just add aForce2b param to Append and then the only caller of this method would use it.
That would be at least easier to follow.

::: editor/libeditor/EditorBase.cpp:4741
(Diff revision 1)
> +already_AddRefed<nsTextNode>
> +EditorBase::CreateTextNode(nsIDocument& aDocument,
> +                           const nsAString& aData)
> +{
> +  RefPtr<nsTextNode> text = aDocument.CreateEmptyTextNode();
> +  text->MarkAsMaybeModifiedFrequently();printf("%p, Marked!\n", text.get());

You have some left over printf here. Remove it.
Attachment #8899020 - Flags: review?(bugs) → review+
Comment on attachment 8899020 [details]
Bug 1391538 - nsTextFragment for text nodes in <input> or <textarea> shouldn't store text as single byte characters even if all characters are less than U+0100

https://reviewboard.mozilla.org/r/170342/#review175570

> Why this way?
> Couldn't you just add aForce2b param to Append and then the only caller of this method would use it.
> That would be at least easier to follow.

Ah, I misunderstood about nsTextFragment::AppendTo(). Okay, I can do it simply. (I thought that a lot of AppendTo() calls Append().)

> You have some left over printf here. Remove it.

Oops! Thank you for your nice catch!
Pushed by masayuki@d-toybox.com:
https://hg.mozilla.org/integration/autoland/rev/346b38526bdc
nsTextFragment for text nodes in <input> or <textarea> shouldn't store text as single byte characters even if all characters are less than U+0100 r=smaug
https://hg.mozilla.org/mozilla-central/rev/346b38526bdc
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Hmm, today's Nightly, I see 2ms in nsTextFragment.
https://perfht.ml/2wmO18s

> mozilla::EditorBase::SetTextImpl(mozilla::dom::Selection &,nsAString const &,mozilla::dom::Text &)
> nsGenericDOMDataNode::SetData(nsAString const &)
> nsGenericDOMDataNode::SetTextInternal(unsigned int,unsigned int,char16_t const *,unsigned int,bool,CharacterDataChangeInfo::Details *)
> nsTextFragment::SetTo(char16_t const *,int,bool,bool)
> mozilla::CheckedInt<unsigned int>::operator*=<unsigned __int64>(unsigned __int64)

and

> nsGenericDOMDataNode::SetTextInternal(unsigned int,unsigned int,char16_t const *,unsigned int,bool,CharacterDataChangeInfo::Details *)
> nsTextFragment::SetTo(char16_t const *,int,bool,bool)
> nsTextFragment::UpdateBidiFlag(char16_t const *,unsigned int)
> HasRTLChars(char16_t const *,unsigned int)

I should struggle with them. Perhaps, we can skip something for the both case.
Note that this may not just affect memory.  At one point we had a layout fast-path for ASCII-only text that was triggered only in the m1b textfragment case.  I don't know whether we still have that.
FWIW, I did some <textarea> profiling with lots of text and couldn't immediately see 2b usage related slowness when pasting or modifying the text.
(and need to disable spellcheck in such testing, since it is still too slow and happening too often. I'll file some bugs.)
Except that HasRTLChars does show up.
Depends on: 1394719
You need to log in before you can comment on or make changes to this bug.