Implement faster converter from String/nsAtom to nsHTMLTag
Categories
(Core :: DOM: Editor, task, P3)
Tracking
()
People
(Reporter: masayuki, Assigned: masayuki)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
Assignee | ||
Comment 1•1 years ago
|
||
Comment 2•1 years ago
|
||
I applied that patch locally and profiled, and it actually makes things quite a bit slower. I hadn't expected that.
I profiled by running https://shell--speedometer-preview.netlify.app/?suite=Editor-TipTap&iterationCount=100#summary; I used the regular Gecko profiler, with the "No marker stacks" feature checked.
I also counted the frequency of elements in invocations to ContentEventHandler::ShouldBreakLineBefore
, using counts
:
20670802 counts
( 1) 10029600 (48.5%, 48.5%): tagName: strong
( 2) 9470200 (45.8%, 94.3%): tagName: p
( 3) 439200 ( 2.1%, 96.5%): tagName: br
( 4) 320300 ( 1.5%, 98.0%): tagName: h2
( 5) 206400 ( 1.0%, 99.0%): tagName: h3
( 6) 75200 ( 0.4%, 99.4%): tagName: h1
( 7) 74700 ( 0.4%, 99.7%): tagName: em
( 8) 34600 ( 0.2%, 99.9%): tagName: li
( 9) 11500 ( 0.1%,100.0%): tagName: blockquote
( 10) 9100 ( 0.0%,100.0%): tagName: ul
In the end, the fastest approach was to check for nsGkAtoms::strong
and nsGkAtoms::p
explicitly before going the generic path.
Profiles:
https://share.firefox.dev/3OyMb9W (184 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange
): base
https://share.firefox.dev/44ila0i (195 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange
): with emilio's patch (shouldn't be slower, this is probably just noise)
https://share.firefox.dev/44nVQq4 (226 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange
): with patch from comment 1
https://share.firefox.dev/3OK4Q2C (203 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange
): with nsHTMLTags::CaseSensitiveStringBytesToId(char16ptr_t aBytes, size_t aLength)
and ArrayEqual
https://share.firefox.dev/3YG2UwQ (199 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange
): with the nested switch copied into ShouldBreakLineBefore
, and return true
on all switch cases of block elements
https://share.firefox.dev/3P5lfAs (138 samples in MaybeNotifyIMEOfAddedTextDuringDocumentChange
): with up-front p
and strong
checking
Here's what I ended up with:
/* static */
bool ContentEventHandler::ShouldBreakLineBefore(const nsIContent& aContent,
const Element* aRootElement) {
// We don't need to append linebreak at the start of the root element.
if (&aContent == aRootElement) {
return false;
}
// If it's not an HTML element (including other markup language's elements),
// we shouldn't insert like break before that for now. Becoming this is a
// problem must be edge case. E.g., when ContentEventHandler is used with
// MathML or SVG elements.
if (!aContent.IsHTMLElement()) {
return false;
}
// Note that ideally, we should refer the style of the primary frame of
// aContent for deciding if it's an inline. However, it's difficult
// IMEContentObserver to notify IME of text change caused by style change.
// Therefore, currently, we should check only from the tag for now.
nsAtom* tagName = aContent.NodeInfo()->NameAtom();
// Check some common elements first.
if (tagName == nsGkAtoms::strong || tagName == nsGkAtoms::em) {
return false;
}
if (tagName == nsGkAtoms::p) {
return true;
}
if (tagName == nsGkAtoms::br) {
// If the element is <br>, we need to check if the <br> is caused by web
// content. Otherwise, i.e., it's caused by internal reason of Gecko,
// it shouldn't be exposed as a line break to flatten text.
return IsContentBR(aContent);
}
// Order by common-ness
// TODO: find some real-world data
return tagName == nsGkAtoms::div || tagName == nsGkAtoms::h1 ||
tagName == nsGkAtoms::h2 || tagName == nsGkAtoms::h3 ||
tagName == nsGkAtoms::h4 || tagName == nsGkAtoms::h5 ||
tagName == nsGkAtoms::h6 || tagName == nsGkAtoms::li ||
tagName == nsGkAtoms::ul || tagName == nsGkAtoms::ol ||
tagName == nsGkAtoms::td || tagName == nsGkAtoms::th ||
tagName == nsGkAtoms::tr || tagName == nsGkAtoms::q ||
tagName == nsGkAtoms::dd || tagName == nsGkAtoms::dl ||
tagName == nsGkAtoms::dt || tagName == nsGkAtoms::hr ||
tagName == nsGkAtoms::rb || tagName == nsGkAtoms::rp ||
tagName == nsGkAtoms::rt || tagName == nsGkAtoms::col ||
tagName == nsGkAtoms::dir || tagName == nsGkAtoms::img ||
tagName == nsGkAtoms::map || tagName == nsGkAtoms::nav ||
tagName == nsGkAtoms::pre || tagName == nsGkAtoms::rtc ||
tagName == nsGkAtoms::wbr || tagName == nsGkAtoms::xmp ||
tagName == nsGkAtoms::area || tagName == nsGkAtoms::base ||
tagName == nsGkAtoms::body || tagName == nsGkAtoms::form ||
tagName == nsGkAtoms::head || tagName == nsGkAtoms::html ||
tagName == nsGkAtoms::link || tagName == nsGkAtoms::main ||
tagName == nsGkAtoms::menu || tagName == nsGkAtoms::meta ||
tagName == nsGkAtoms::nobr || tagName == nsGkAtoms::ruby ||
tagName == nsGkAtoms::slot || tagName == nsGkAtoms::text ||
tagName == nsGkAtoms::aside || tagName == nsGkAtoms::audio ||
tagName == nsGkAtoms::embed || tagName == nsGkAtoms::frame ||
tagName == nsGkAtoms::image || tagName == nsGkAtoms::input ||
tagName == nsGkAtoms::label || tagName == nsGkAtoms::meter ||
tagName == nsGkAtoms::param || tagName == nsGkAtoms::style ||
tagName == nsGkAtoms::table || tagName == nsGkAtoms::tbody ||
tagName == nsGkAtoms::tfoot || tagName == nsGkAtoms::thead ||
tagName == nsGkAtoms::title || tagName == nsGkAtoms::track ||
tagName == nsGkAtoms::video || tagName == nsGkAtoms::applet ||
tagName == nsGkAtoms::button || tagName == nsGkAtoms::canvas ||
tagName == nsGkAtoms::center || tagName == nsGkAtoms::dialog ||
tagName == nsGkAtoms::figure || tagName == nsGkAtoms::footer ||
tagName == nsGkAtoms::header || tagName == nsGkAtoms::hgroup ||
tagName == nsGkAtoms::iframe || tagName == nsGkAtoms::keygen ||
tagName == nsGkAtoms::legend || tagName == nsGkAtoms::object ||
tagName == nsGkAtoms::option || tagName == nsGkAtoms::output ||
tagName == nsGkAtoms::script || tagName == nsGkAtoms::search ||
tagName == nsGkAtoms::select || tagName == nsGkAtoms::source ||
tagName == nsGkAtoms::address || tagName == nsGkAtoms::article ||
tagName == nsGkAtoms::bgsound || tagName == nsGkAtoms::caption ||
tagName == nsGkAtoms::comment || tagName == nsGkAtoms::details ||
tagName == nsGkAtoms::listing || tagName == nsGkAtoms::marquee ||
tagName == nsGkAtoms::newline || tagName == nsGkAtoms::noembed ||
tagName == nsGkAtoms::picture || tagName == nsGkAtoms::section ||
tagName == nsGkAtoms::summary || tagName == nsGkAtoms::basefont ||
tagName == nsGkAtoms::colgroup || tagName == nsGkAtoms::datalist ||
tagName == nsGkAtoms::fieldset || tagName == nsGkAtoms::frameset ||
tagName == nsGkAtoms::multicol || tagName == nsGkAtoms::noframes ||
tagName == nsGkAtoms::noscript || tagName == nsGkAtoms::optgroup ||
tagName == nsGkAtoms::progress || tagName == nsGkAtoms::textarea ||
tagName == nsGkAtoms::_template || tagName == nsGkAtoms::plaintext ||
tagName == nsGkAtoms::blockquote || tagName == nsGkAtoms::figcaption;
// tagName == nsGkAtoms::entity ||
// tagName == nsGkAtoms::markupDecl ||
// tagName == nsGkAtoms::whitespace ||
// tagName == nsGkAtoms::doctypeDecl ||
// tagName == nsGkAtoms::instruction;
// If the element is unknown element, we shouldn't insert line breaks
// before it since unknown elements should be ignored.
// return false;
}
Assignee | ||
Comment 3•1 years ago
|
||
Thank you, although it's a hack for specific benchmark though... Yea, <strong>
, <em>
and <b>
(and <i>
>?) are widely used by users in the real use. So, it could be useful for users too.
Without the hack, but including more optimizations of the big switch
.
https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=ff05b6e8ed833c5ecf9a9ae3306b2930d2d4d7d0&newProject=try&newRevision=02d851c0e4f9f37eb6c386517e87c424543cf74a&page=1&framework=13
Assignee | ||
Comment 4•1 years ago
|
||
Other tries:
Make nsHTMLTags
resolve short tag names without its hash tables (3 characters or less). The result might be a bit faster in Editor-CodeMirror.
Make the short tag name target is 4 or less. Then, the result seems worse or nothing is changed.
Then, tried to reduce the number of switch
statement in the worst cases. Then, the result might be better than the previous one.
Finally, the comparing of the results is:
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=726574bae9f65fa232ba90b08d99affe3d725c61&originalSignature=4586009&newSignature=4586009&framework=13&application=firefox&originalRevision=c022e2c405124ed666f4cd558b51c2b3cb798226&page=1
Updated•1 years ago
|
Updated•1 years ago
|
Comment 5•1 years ago
|
||
Did any of those comparisons pan out? I'm curious what your plans for this bug are.
Assignee | ||
Comment 6•1 years ago
|
||
I think that the conversion is generally used, and the hack is only a bit faster within current code base and current tag names. I don't disagree with adding the frequent tags path in ContentEventHandler
, but I think that we should not touch the conversion code itself.
Comment 7•1 years ago
|
||
That sounds good to me.
Description
•