Open
Bug 1376323
Opened 7 years ago
Updated 2 years ago
Speed up StripCRLF()
Categories
(Core :: XPCOM, enhancement, P3)
Core
XPCOM
Tracking
()
REOPENED
People
(Reporter: ehsan.akhgari, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
2.52 KB,
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Here is a perf profile of nsACString::StripTaggedASCII(). This function shows up under StripCRLF() calls in HTMLInputElement::SanitizeValue(). │ void ▒ │ nsTSubstring_CharT::StripTaggedASCII(const ASCIIMaskArray& aToStrip) ▒ │ { ▒ 6.06 │ push %rbp ▒ 3.03 │ mov %rsp,%rbp ▒ │ push %r15 ▒ 1.52 │ push %r14 ◆ │ push %rbx ▒ │ push %rax ▒ │ mov %rsi,%rbx ▒ 1.52 │ mov %rdi,%r14 ▒ │ if (mLength == 0) { ▒ │ mov 0x8(%r14),%r15d ▒ │ test %r15d,%r15d ▒ │ ↓ je 6c ▒ │ _ZN9nsAString13EnsureMutableEj(): ▒ │ if (mFlags & (F_FIXED | F_OWNED)) { ▒ │ mov 0xc(%r14),%eax ▒ │ test $0x18,%al ▒ │ ↓ je 77 ▒ │ _ZN9nsAString16StripTaggedASCIIERKSt5arrayIbLm128EE(): ▒ │ ▒ │ if (!EnsureMutable()) { ▒ │ AllocFailed(mLength); ▒ │ } ▒ │ ▒ │ char_type* to = mData; ▒ │21: mov (%r14),%rsi ▒ │ char_type* from = mData; ▒ │ char_type* end = mData + mLength; ▒ │ mov %r15d,%eax ▒ │ lea (%rsi,%rax,2),%rcx ▒ │ mov %rsi,%rax ▒ │ ↓ jmp 40 ▒ │ uint32_t theChar = (uint32_t)*from++; ▒ │ // Replacing this with a call to ASCIIMask::IsMasked ▒ │ // regresses performance somewhat, so leaving it inlined. ▒ │ if (!mozilla::ASCIIMask::IsMasked(aToStrip, theChar)) { ▒ │ // Not stripped, copy this char. ▒ │ *to++ = (char_type)theChar; ▒ 3.03 │30: mov %di,(%rax) ▒ 7.58 │ add $0x2,%rax ▒ │ } ▒ 1.52 │ add $0x2,%rdx ▒ 3.03 │ mov %rdx,%rsi ▒ 1.52 │ xchg %ax,%ax ▒ 4.55 │40: mov %rsi,%rdx ▒ │ while (from < end) { ▒ 1.52 │ cmp %rcx,%rdx ▒ │ ↓ jae 5d ▒ │ uint32_t theChar = (uint32_t)*from++; ▒ 7.58 │ movzwl (%rdx),%edi ▒ 4.55 │ cmp $0x7f,%rdi ▒ │ ↑ ja 30 ▒ 9.09 │ lea 0x2(%rdx),%rsi ▒ │ if (!mozilla::ASCIIMask::IsMasked(aToStrip, theChar)) { ▒ 6.06 │ cmpb $0x0,(%rbx,%rdi,1) ▒ 28.79 │ ↑ jne 40 ▒ 4.55 │ ↑ jmp 30 ▒ │ } ▒ │ *to = char_type(0); // add the null ▒ 1.52 │5d: movw $0x0,(%rax) ▒ │ mLength = to - mData; ▒ 1.52 │ sub (%r14),%rax ▒ │ shr %rax ▒ │ mov %eax,0x8(%r14) ▒ │ } ▒ 1.52 │6c: add $0x8,%rsp ▒ │ pop %rbx ▒ │ pop %r14 ▒ │ pop %r15 ▒ │ pop %rbp ▒ │ ← retq ▒ See how we are stalling all of our time on two dependent loads in the body of our hot loop where the CPU is sitting around waiting for the memory fetches to arrive. With two levels of loop unrolling this should get to optimimum performance. I wrote a patch to unroll this loop, and here is the impact when running the following tests. export MOZ_XRE_DIR=/path/to/objdir/dist/bin export MOZ_RUN_GTEST=1 export GTEST_FILTER=Strings.PerfStripWhitespace:Strings.PerfStripCRLF # These two tests examine StripTaggedASCII() Before: ehsan@ehsan-precision:~/moz/src.1347035$ ./mach run 0:00.09 /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/dist/bin/firefox -no-remote -profile /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/tmp/scratch_user Running GTest tests... Note: Google Test filter = Strings.PerfStripWhitespace:Strings.PerfStripCRLF [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Strings [ RUN ] Strings.PerfStripWhitespace PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripWhitespace", "value": 105074, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripWhitespace (105 ms) [ RUN ] Strings.PerfStripCRLF PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripCRLF", "value": 119844, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripCRLF (119 ms) [----------] 2 tests from Strings (225 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (225 ms total) [ PASSED ] 2 tests. Finished running GTest tests. ehsan@ehsan-precision:~/moz/src.1347035$ ./mach run 0:00.09 /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/dist/bin/firefox -no-remote -profile /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/tmp/scratch_user Running GTest tests... Note: Google Test filter = Strings.PerfStripWhitespace:Strings.PerfStripCRLF [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Strings [ RUN ] Strings.PerfStripWhitespace PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripWhitespace", "value": 104696, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripWhitespace (104 ms) [ RUN ] Strings.PerfStripCRLF PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripCRLF", "value": 124698, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripCRLF (124 ms) [----------] 2 tests from Strings (229 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (229 ms total) [ PASSED ] 2 tests. Finished running GTest tests. ehsan@ehsan-precision:~/moz/src.1347035$ ./mach run 0:00.08 /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/dist/bin/firefox -no-remote -profile /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/tmp/scratch_user Running GTest tests... Note: Google Test filter = Strings.PerfStripWhitespace:Strings.PerfStripCRLF [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Strings [ RUN ] Strings.PerfStripWhitespace PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripWhitespace", "value": 100907, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripWhitespace (101 ms) [ RUN ] Strings.PerfStripCRLF PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripCRLF", "value": 116099, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripCRLF (116 ms) [----------] 2 tests from Strings (217 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (217 ms total) [ PASSED ] 2 tests. Finished running GTest tests. After: ehsan@ehsan-precision:~/moz/src.1347035$ ./mach run 0:00.09 /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/dist/bin/firefox -no-remote -profile /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/tmp/scratch_user Running GTest tests... Note: Google Test filter = Strings.PerfStripWhitespace:Strings.PerfStripCRLF [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Strings [ RUN ] Strings.PerfStripWhitespace PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripWhitespace", "value": 64004, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripWhitespace (64 ms) [ RUN ] Strings.PerfStripCRLF PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripCRLF", "value": 76056, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripCRLF (76 ms) [----------] 2 tests from Strings (140 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (140 ms total) [ PASSED ] 2 tests. Finished running GTest tests. ehsan@ehsan-precision:~/moz/src.1347035$ ./mach run 0:00.09 /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/dist/bin/firefox -no-remote -profile /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/tmp/scratch_user Running GTest tests... Note: Google Test filter = Strings.PerfStripWhitespace:Strings.PerfStripCRLF [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Strings [ RUN ] Strings.PerfStripWhitespace PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripWhitespace", "value": 65946, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripWhitespace (66 ms) [ RUN ] Strings.PerfStripCRLF PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripCRLF", "value": 76946, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripCRLF (77 ms) [----------] 2 tests from Strings (143 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (143 ms total) [ PASSED ] 2 tests. Finished running GTest tests. ehsan@ehsan-precision:~/moz/src.1347035$ ./mach run 0:00.09 /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/dist/bin/firefox -no-remote -profile /home/ehsan/moz/src.1347035/obj-ff-opt.noindex/tmp/scratch_user Running GTest tests... Note: Google Test filter = Strings.PerfStripWhitespace:Strings.PerfStripCRLF [==========] Running 2 tests from 1 test case. [----------] Global test environment set-up. [----------] 2 tests from Strings [ RUN ] Strings.PerfStripWhitespace PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripWhitespace", "value": 67360, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripWhitespace (67 ms) [ RUN ] Strings.PerfStripCRLF PERFHERDER_DATA: {"framework": {"name": "platform_microbench"}, "suites": [{"name": "Strings", "subtests": [{"name": "PerfStripCRLF", "value": 78263, "lowerIsBetter": true}]}]} [ OK ] Strings.PerfStripCRLF (78 ms) [----------] 2 tests from Strings (145 ms total) [----------] Global test environment tear-down [==========] 2 tests from 1 test case ran. (146 ms total) [ PASSED ] 2 tests. Finished running GTest tests.
Reporter | ||
Updated•7 years ago
|
Blocks: Speedometer_V2
Reporter | ||
Comment 1•7 years ago
|
||
Attachment #8881307 -
Flags: review?(nfroyd)
Comment 2•7 years ago
|
||
Comment on attachment 8881307 [details] [diff] [review] Unroll the loop in nsTSubstring_CharT::StripTaggedASCII() in order to speed up StripCRLF() Review of attachment 8881307 [details] [diff] [review]: ----------------------------------------------------------------- That's quite a speedup! To be clear, the claim is that we're stalling on 1) the load from the original string and 2) the load from the mask array?
Attachment #8881307 -
Flags: review?(nfroyd) → review+
Reporter | ||
Comment 3•7 years ago
|
||
That's correct.
Pushed by eakhgari@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a5c0a74876ae Unroll the loop in nsTSubstring_CharT::StripTaggedASCII() in order to speed up StripCRLF(); r=froydnj
Comment 5•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a5c0a74876ae
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 6•7 years ago
|
||
backed out for causing bug 1379578
Status: RESOLVED → REOPENED
status-firefox56:
fixed → ---
Flags: needinfo?(ehsan)
Resolution: FIXED → ---
Backout by cbook@mozilla.com: https://hg.mozilla.org/mozilla-central/rev/715eae7152cc Backed out changeset a5c0a74876ae for causing bug 1379578
Updated•7 years ago
|
Target Milestone: mozilla56 → ---
Reporter | ||
Comment 8•7 years ago
|
||
Nathan, sorry I dropped the ball on this... For a summary of what changed on Talos as a result of this landing, please see https://treeherder.mozilla.org/perf.html#/alerts?id=7555 When I came back from vacation I did a few Talos pushes based on the idea of disabling this optimization on 32-bit architectures by setting the unrolling factor to 1. The rationale for that is that we don't care about 32-bit anywhere besides Win32, and we don't care about it all that much even on Windows for our users on Win64 who would use 64-bit builds. Are you OK with that general approach? Should I invest more time in this? The alternative would be to give up on this and move on, since I have no idea how to have our cake and eat it on both 32 and 64-bit architectures.
Flags: needinfo?(ehsan) → needinfo?(nfroyd)
Comment 9•7 years ago
|
||
I think giving up on unrolling on win32 (and linux32, I guess?) is fine. We may want to keep the optimization on ARM--we have more registers there and can probably live with an unrolling factor of 2. But if trying to tweak things for ARM is too much, I think having an unroll factor of 2 on 64-bit and not unrolling on 32-bit would be fine.
Flags: needinfo?(nfroyd)
Reporter | ||
Comment 10•7 years ago
|
||
Do we run gtests on Android? It seems that we don't... I'm now thinking how to assess what to do on ARM...
Updated•7 years ago
|
Priority: -- → P3
Comment 11•7 years ago
|
||
Just a note if we get back to this, the tests we have don't really do what they intend to do [1]. `StripCRLF` is an inline operation that mutates the string, so after the first call to strip there will be no characters to strip for the remaining 19,999 iterations. We're effectively testing an edge case of no CRLF in the string (which is an okay test, but not what we intend to do currently). We should really fix the tests and possibly add a few more stress tests before we start relying on them for perf comparisons. [1] http://searchfox.org/mozilla-central/rev/1c4da216e00ac95b38a3f236e010b31cdfaae03b/xpcom/tests/gtest/TestStrings.cpp#1312-1342
Reporter | ||
Updated•5 years ago
|
Assignee: ehsan → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•