Open Bug 1849302 Opened 1 years ago Updated 1 year ago

Implement faster converter from String/nsAtom to nsHTMLTag

Categories

(Core :: DOM: Editor, task, P3)

task

Tracking

()

ASSIGNED

People

(Reporter: masayuki, Assigned: masayuki)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3])

No description provided.

I applied that patch locally and profiled, and it actually makes things quite a bit slower. I hadn't expected that.

I profiled by running https://shell--speedometer-preview.netlify.app/?suite=Editor-TipTap&iterationCount=100#summary; I used the regular Gecko profiler, with the "No marker stacks" feature checked.

I also counted the frequency of elements in invocations to ContentEventHandler::ShouldBreakLineBefore, using counts:

20670802 counts
(  1) 10029600 (48.5%, 48.5%): tagName: strong
(  2)  9470200 (45.8%, 94.3%): tagName: p
(  3)   439200 ( 2.1%, 96.5%): tagName: br
(  4)   320300 ( 1.5%, 98.0%): tagName: h2
(  5)   206400 ( 1.0%, 99.0%): tagName: h3
(  6)    75200 ( 0.4%, 99.4%): tagName: h1
(  7)    74700 ( 0.4%, 99.7%): tagName: em
(  8)    34600 ( 0.2%, 99.9%): tagName: li
(  9)    11500 ( 0.1%,100.0%): tagName: blockquote
( 10)     9100 ( 0.0%,100.0%): tagName: ul

In the end, the fastest approach was to check for nsGkAtoms::strong and nsGkAtoms::p explicitly before going the generic path.

Profiles:
https://share.firefox.dev/3OyMb9W (184 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange): base
https://share.firefox.dev/44ila0i (195 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange): with emilio's patch (shouldn't be slower, this is probably just noise)
https://share.firefox.dev/44nVQq4 (226 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange): with patch from comment 1
https://share.firefox.dev/3OK4Q2C (203 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange): with nsHTMLTags::CaseSensitiveStringBytesToId(char16ptr_t aBytes, size_t aLength) and ArrayEqual
https://share.firefox.dev/3YG2UwQ (199 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange): with the nested switch copied into ShouldBreakLineBefore, and return true on all switch cases of block elements
https://share.firefox.dev/3P5lfAs (138 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange): with up-front p and strong checking

Here's what I ended up with:

/* static */
bool ContentEventHandler::ShouldBreakLineBefore(const nsIContent& aContent,
                                                const Element* aRootElement) {
  // We don't need to append linebreak at the start of the root element.
  if (&aContent == aRootElement) {
    return false;
  }

  // If it's not an HTML element (including other markup language's elements),
  // we shouldn't insert like break before that for now.  Becoming this is a
  // problem must be edge case.  E.g., when ContentEventHandler is used with
  // MathML or SVG elements.
  if (!aContent.IsHTMLElement()) {
    return false;
  }

  // Note that ideally, we should refer the style of the primary frame of
  // aContent for deciding if it's an inline.  However, it's difficult
  // IMEContentObserver to notify IME of text change caused by style change.
  // Therefore, currently, we should check only from the tag for now.
  nsAtom* tagName = aContent.NodeInfo()->NameAtom();

  // Check some common elements first.
  if (tagName == nsGkAtoms::strong || tagName == nsGkAtoms::em) {
    return false;
  }
  if (tagName == nsGkAtoms::p) {
    return true;
  }
  if (tagName == nsGkAtoms::br) {
    // If the element is <br>, we need to check if the <br> is caused by web
    // content.  Otherwise, i.e., it's caused by internal reason of Gecko,
    // it shouldn't be exposed as a line break to flatten text.
    return IsContentBR(aContent);
  }

  // Order by common-ness
  // TODO: find some real-world data
  return tagName == nsGkAtoms::div || tagName == nsGkAtoms::h1 ||
         tagName == nsGkAtoms::h2 || tagName == nsGkAtoms::h3 ||
         tagName == nsGkAtoms::h4 || tagName == nsGkAtoms::h5 ||
         tagName == nsGkAtoms::h6 || tagName == nsGkAtoms::li ||
         tagName == nsGkAtoms::ul || tagName == nsGkAtoms::ol ||
         tagName == nsGkAtoms::td || tagName == nsGkAtoms::th ||
         tagName == nsGkAtoms::tr || tagName == nsGkAtoms::q ||
         tagName == nsGkAtoms::dd || tagName == nsGkAtoms::dl ||
         tagName == nsGkAtoms::dt || tagName == nsGkAtoms::hr ||
         tagName == nsGkAtoms::rb || tagName == nsGkAtoms::rp ||
         tagName == nsGkAtoms::rt || tagName == nsGkAtoms::col ||
         tagName == nsGkAtoms::dir || tagName == nsGkAtoms::img ||
         tagName == nsGkAtoms::map || tagName == nsGkAtoms::nav ||
         tagName == nsGkAtoms::pre || tagName == nsGkAtoms::rtc ||
         tagName == nsGkAtoms::wbr || tagName == nsGkAtoms::xmp ||
         tagName == nsGkAtoms::area || tagName == nsGkAtoms::base ||
         tagName == nsGkAtoms::body || tagName == nsGkAtoms::form ||
         tagName == nsGkAtoms::head || tagName == nsGkAtoms::html ||
         tagName == nsGkAtoms::link || tagName == nsGkAtoms::main ||
         tagName == nsGkAtoms::menu || tagName == nsGkAtoms::meta ||
         tagName == nsGkAtoms::nobr || tagName == nsGkAtoms::ruby ||
         tagName == nsGkAtoms::slot || tagName == nsGkAtoms::text ||
         tagName == nsGkAtoms::aside || tagName == nsGkAtoms::audio ||
         tagName == nsGkAtoms::embed || tagName == nsGkAtoms::frame ||
         tagName == nsGkAtoms::image || tagName == nsGkAtoms::input ||
         tagName == nsGkAtoms::label || tagName == nsGkAtoms::meter ||
         tagName == nsGkAtoms::param || tagName == nsGkAtoms::style ||
         tagName == nsGkAtoms::table || tagName == nsGkAtoms::tbody ||
         tagName == nsGkAtoms::tfoot || tagName == nsGkAtoms::thead ||
         tagName == nsGkAtoms::title || tagName == nsGkAtoms::track ||
         tagName == nsGkAtoms::video || tagName == nsGkAtoms::applet ||
         tagName == nsGkAtoms::button || tagName == nsGkAtoms::canvas ||
         tagName == nsGkAtoms::center || tagName == nsGkAtoms::dialog ||
         tagName == nsGkAtoms::figure || tagName == nsGkAtoms::footer ||
         tagName == nsGkAtoms::header || tagName == nsGkAtoms::hgroup ||
         tagName == nsGkAtoms::iframe || tagName == nsGkAtoms::keygen ||
         tagName == nsGkAtoms::legend || tagName == nsGkAtoms::object ||
         tagName == nsGkAtoms::option || tagName == nsGkAtoms::output ||
         tagName == nsGkAtoms::script || tagName == nsGkAtoms::search ||
         tagName == nsGkAtoms::select || tagName == nsGkAtoms::source ||
         tagName == nsGkAtoms::address || tagName == nsGkAtoms::article ||
         tagName == nsGkAtoms::bgsound || tagName == nsGkAtoms::caption ||
         tagName == nsGkAtoms::comment || tagName == nsGkAtoms::details ||
         tagName == nsGkAtoms::listing || tagName == nsGkAtoms::marquee ||
         tagName == nsGkAtoms::newline || tagName == nsGkAtoms::noembed ||
         tagName == nsGkAtoms::picture || tagName == nsGkAtoms::section ||
         tagName == nsGkAtoms::summary || tagName == nsGkAtoms::basefont ||
         tagName == nsGkAtoms::colgroup || tagName == nsGkAtoms::datalist ||
         tagName == nsGkAtoms::fieldset || tagName == nsGkAtoms::frameset ||
         tagName == nsGkAtoms::multicol || tagName == nsGkAtoms::noframes ||
         tagName == nsGkAtoms::noscript || tagName == nsGkAtoms::optgroup ||
         tagName == nsGkAtoms::progress || tagName == nsGkAtoms::textarea ||
         tagName == nsGkAtoms::_template || tagName == nsGkAtoms::plaintext ||
         tagName == nsGkAtoms::blockquote || tagName == nsGkAtoms::figcaption;
  // tagName == nsGkAtoms::entity ||
  // tagName == nsGkAtoms::markupDecl ||
  // tagName == nsGkAtoms::whitespace ||
  // tagName == nsGkAtoms::doctypeDecl ||
  // tagName == nsGkAtoms::instruction;

  // If the element is unknown element, we shouldn't insert line breaks
  // before it since unknown elements should be ignored.
  // return false;
}
Whiteboard: [sp3]

Did any of those comparisons pan out? I'm curious what your plans for this bug are.

Flags: needinfo?(masayuki)

I think that the conversion is generally used, and the hack is only a bit faster within current code base and current tag names. I don't disagree with adding the frequent tags path in ContentEventHandler, but I think that we should not touch the conversion code itself.

Flags: needinfo?(masayuki)

That sounds good to me.

You need to log in before you can comment on or make changes to this bug.