Closed
Bug 1385181
Opened 7 years ago
Closed 7 years ago
Redundant copying when return from HashTable::lookupForAdd()
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla57
People
(Reporter: ting, Assigned: ting)
References
(Blocks 1 open bug)
Details
Attachments
(3 files, 3 obsolete files)
I saw this when profile Speedometer Ember test on Nightly, the assembly when HashTable::lookupForAdd() returns [1]: mov qword ptr [rbp-0x79], rdi mov dword ptr [rbp-0x71], r15d movaps xmm0, xmmword ptr [rbp-0x79] movdqa xmmword ptr [rbp-0x79], xmm0 test rdi, rdi jz 0x1808e1b7c rbp-0x79 is the address of the local variable |p| in SnapshotWriter::add() on stack [2]. The first two mov instructions are the constructor of AddPtr initializing |entry_| and |keyHash|, but the following movaps and movdqa are redundant though I am not sure why they're there, it seems to me that NRVO doesn't kick in. I don't see the same pattern for the Nightly on Linux, so I assume this is Windows only. [1] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/public/HashTable.h#1776-1777 [2] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/src/jit/Snapshots.cpp#684
Assignee | ||
Updated•7 years ago
|
Assignee: nobody → janus926
Status: NEW → ASSIGNED
Assignee | ||
Comment 1•7 years ago
|
||
Assignee | ||
Comment 2•7 years ago
|
||
Assignee | ||
Comment 3•7 years ago
|
||
Probably this is not about NRVO because they're inlined. I've talked to :chia-hung and he thinks this may be because of the timing that compiler does different optimizations. With the patch, not only movaps/movdqa but also the two mov are gone from my local build.
Attachment #8891221 -
Flags: review?(luke)
Comment 4•7 years ago
|
||
Comment on attachment 8891221 [details] [diff] [review] patch v1 Review of attachment 8891221 [details] [diff] [review]: ----------------------------------------------------------------- It's surprising that this is necessary; seems like a "bug" in the optimization. Anyway, nice job!
Attachment #8891221 -
Flags: review?(luke) → review+
Pushed by tchou@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/e94dceac8090 Alter HashTable::lookupForAdd() to remove a redundant copy when return. r=luke
Assignee | ||
Comment 6•7 years ago
|
||
For the record, VTune shows that the movaps/movdqa takes ~170ms on my desktop for the Speedometer Ember test.
Comment 7•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/e94dceac8090
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Comment 8•7 years ago
|
||
Ting, do you think we should add a comment in the code here? This seems non-obvious to the casual reader and I could see it getting changed back accidentally, etc. Maybe just something like: // Note, the following line has a subtle optimization to avoid excess copying. See bug 1385181.
Flags: needinfo?(janus926)
Comment 9•7 years ago
|
||
This might have caused bug 1386011.
Assignee | ||
Comment 10•7 years ago
|
||
Unfortunately, this simple patch seems trigger a bug in gcc [1], which is fixed in 6.0 but we're using 4.9.3. I've asked :KWierso to help back it out. [1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70499
Assignee | ||
Comment 11•7 years ago
|
||
I'll add comment later when reland.
Assignee | ||
Comment 12•7 years ago
|
||
The odd is this only happens to Linux x64 CCov opt build.
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(janus926)
Assignee | ||
Comment 13•7 years ago
|
||
Debug build using gcc-5 has the same error as bug 1386011, gcc-6 is fine.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5588a5f33bbc
Status: RESOLVED → REOPENED
Flags: needinfo?(janus926)
Resolution: FIXED → ---
Target Milestone: mozilla56 → ---
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(janus926)
Comment 15•7 years ago
|
||
If it was a Windows-only optimization, perhaps you can add an #ifdef and file a bug to remove it when we update gcc?
Assignee | ||
Comment 16•7 years ago
|
||
Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8b5edbf3ed9a4a337f176973d684fb708e9dee6
Attachment #8891221 -
Attachment is obsolete: true
Assignee | ||
Comment 17•7 years ago
|
||
Attachment #8892398 -
Attachment is obsolete: true
Attachment #8892708 -
Flags: review?(luke)
Comment 18•7 years ago
|
||
Have you tested this with clang? clang also defines __GNUC__ IIRC.
Comment 19•7 years ago
|
||
You should probably use MOZ_GCC_VERSION_AT_LEAST() here.
Comment 20•7 years ago
|
||
(In reply to Ting-Yu Chou [:ting] from comment #6) This is starting to look a bit gnarly so, before r+'ing, some basic due diligence questions: - Is this a measurable speedup? - The comment says this is an optimization for MSVC 2015; does 2017 do better? > For the record, VTune shows that the movaps/movdqa takes ~170ms on my > desktop for the Speedometer Ember test. What % of total time is that?
Assignee | ||
Comment 21•7 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #18) > Have you tested this with clang? clang also defines __GNUC__ IIRC. Thanks for reminding. The clang 3.9 that |mach bootstrap| installed doesn't output error for the patched code for the code coverage, normal, and debug builds on my Ubuntu. (In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #19) > You should probably use MOZ_GCC_VERSION_AT_LEAST() here. Fixed. (In reply to Luke Wagner [:luke] from comment #20) > (In reply to Ting-Yu Chou [:ting] from comment #6) > This is starting to look a bit gnarly so, before r+'ing, some basic due > diligence questions: > - Is this a measurable speedup? VTune lists CodeGeneratorShared::encodeAllocation() in the top 5 of CPU time, and the lookupForAdd() in SnapshotWriter::add() [2] is the hottest spot, so yes. > - The comment says this is an optimization for MSVC 2015; does 2017 do > better? Just checked. No, 2017 outputs similar assembly, I fixed the comment. > > For the record, VTune shows that the movaps/movdqa takes ~170ms on my > > desktop for the Speedometer Ember test. > > What % of total time is that? The test runs for ~60second, so it's about 0.2%. [1] http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/js/src/jit/Snapshots.cpp#684
Attachment #8892708 -
Attachment is obsolete: true
Attachment #8892708 -
Flags: review?(luke)
Attachment #8893244 -
Flags: review?(luke)
Comment 22•7 years ago
|
||
Comment on attachment 8893244 [details] [diff] [review] patch v4 Review of attachment 8893244 [details] [diff] [review]: ----------------------------------------------------------------- Fair enough, thanks for the answers.
Attachment #8893244 -
Flags: review?(luke) → review+
Assignee | ||
Comment 23•7 years ago
|
||
Non-gcc builds are failed because the macro MOZ_GCC_VERSION_AT_LEAST() is not defined, so I switched back to check __GNUC__. I'll reland once the tree is open.
Comment 24•7 years ago
|
||
Pushed by tchou@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/516c01f62d84 Avoid excess copying when return from HashTable::lookupForAdd(). r=luke
Comment 25•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/516c01f62d84
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in
before you can comment on or make changes to this bug.
Description
•