Redundant copying when return from HashTable::lookupForAdd()

RESOLVED FIXED in Firefox 57

Status

()

Core
JavaScript Engine
RESOLVED FIXED
22 days ago
14 days ago

People

(Reporter: ting, Assigned: ting)

Tracking

(Depends on: 1 bug, Blocks: 1 bug)

Trunk
mozilla57
Unspecified
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 ?, firefox57 fixed)

Details

Attachments

(3 attachments, 3 obsolete attachments)

(Assignee)

Description

22 days ago
I saw this when profile Speedometer Ember test on Nightly, the assembly when HashTable::lookupForAdd() returns [1]:

  mov qword ptr [rbp-0x79], rdi
  mov dword ptr [rbp-0x71], r15d
  movaps xmm0, xmmword ptr [rbp-0x79]
  movdqa xmmword ptr [rbp-0x79], xmm0
  test rdi, rdi
  jz 0x1808e1b7c

rbp-0x79 is the address of the local variable |p| in SnapshotWriter::add() on stack [2]. The first two mov instructions are the constructor of AddPtr initializing |entry_| and |keyHash|, but the following movaps and movdqa are redundant though I am not sure why they're there, it seems to me that NRVO doesn't kick in.

I don't see the same pattern for the Nightly on Linux, so I assume this is Windows only.

[1] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/public/HashTable.h#1776-1777
[2] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/src/jit/Snapshots.cpp#684
(Assignee)

Updated

22 days ago
Assignee: nobody → janus926
Status: NEW → ASSIGNED
(Assignee)

Comment 1

22 days ago
Created attachment 8891211 [details]
original_asm.txt
(Assignee)

Comment 2

22 days ago
Created attachment 8891212 [details]
patched_asm.txt
(Assignee)

Comment 3

22 days ago
Created attachment 8891221 [details] [diff] [review]
patch v1

Probably this is not about NRVO because they're inlined. I've talked to :chia-hung and he thinks this may be because of the timing that compiler does different optimizations.

With the patch, not only movaps/movdqa but also the two mov are gone from my local build.
Attachment #8891221 - Flags: review?(luke)

Comment 4

22 days ago
Comment on attachment 8891221 [details] [diff] [review]
patch v1

Review of attachment 8891221 [details] [diff] [review]:
-----------------------------------------------------------------

It's surprising that this is necessary; seems like a "bug" in the optimization.  Anyway, nice job!
Attachment #8891221 - Flags: review?(luke) → review+

Comment 5

19 days ago
Pushed by tchou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e94dceac8090
Alter HashTable::lookupForAdd() to remove a redundant copy when return. r=luke
(Assignee)

Comment 6

19 days ago
For the record, VTune shows that the movaps/movdqa takes ~170ms on my desktop for the Speedometer Ember test.
https://hg.mozilla.org/mozilla-central/rev/e94dceac8090
Status: ASSIGNED → RESOLVED
Last Resolved: 19 days ago
status-firefox56: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56

Comment 8

19 days ago
Ting, do you think we should add a comment in the code here?  This seems non-obvious to the casual reader and I could see it getting changed back accidentally, etc.  Maybe just something like:

// Note, the following line has a subtle optimization to avoid excess copying.  See bug 1385181.
Flags: needinfo?(janus926)
This might have caused bug 1386011.
Depends on: 1386011
(Assignee)

Comment 10

18 days ago
Unfortunately, this simple patch seems trigger a bug in gcc [1], which is fixed in 6.0 but we're using 4.9.3. I've asked :KWierso to help back it out.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70499
(Assignee)

Comment 11

18 days ago
I'll add comment later when reland.
(Assignee)

Comment 12

18 days ago
The odd is this only happens to Linux x64 CCov opt build.
(Assignee)

Updated

18 days ago
Flags: needinfo?(janus926)
(Assignee)

Comment 13

18 days ago
Debug build using gcc-5 has the same error as bug 1386011, gcc-6 is fine.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5588a5f33bbc
Status: RESOLVED → REOPENED
status-firefox56: fixed → ?
Flags: needinfo?(janus926)
Resolution: FIXED → ---
Target Milestone: mozilla56 → ---
(Assignee)

Updated

18 days ago
Flags: needinfo?(janus926)
If it was a Windows-only optimization, perhaps you can add an #ifdef and file a bug to remove it when we update gcc?
(Assignee)

Comment 16

18 days ago
Created attachment 8892398 [details] [diff] [review]
patch v2

Try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=e8b5edbf3ed9a4a337f176973d684fb708e9dee6
Attachment #8891221 - Attachment is obsolete: true
(Assignee)

Comment 17

17 days ago
Created attachment 8892708 [details] [diff] [review]
patch v3
Attachment #8892398 - Attachment is obsolete: true
Attachment #8892708 - Flags: review?(luke)
Have you tested this with clang?  clang also defines __GNUC__ IIRC.
You should probably use MOZ_GCC_VERSION_AT_LEAST() here.

Comment 20

16 days ago
(In reply to Ting-Yu Chou [:ting] from comment #6)
This is starting to look a bit gnarly so, before r+'ing, some basic due diligence questions:
 - Is this a measurable speedup?
 - The comment says this is an optimization for MSVC 2015; does 2017 do better?

> For the record, VTune shows that the movaps/movdqa takes ~170ms on my
> desktop for the Speedometer Ember test.

What % of total time is that?
(Assignee)

Comment 21

16 days ago
Created attachment 8893244 [details] [diff] [review]
patch v4

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #18)
> Have you tested this with clang?  clang also defines __GNUC__ IIRC.

Thanks for reminding. The clang 3.9 that |mach bootstrap| installed doesn't output error for the patched code for the code coverage, normal, and debug builds on my Ubuntu.

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #19)
> You should probably use MOZ_GCC_VERSION_AT_LEAST() here.

Fixed.

(In reply to Luke Wagner [:luke] from comment #20)
> (In reply to Ting-Yu Chou [:ting] from comment #6)
> This is starting to look a bit gnarly so, before r+'ing, some basic due
> diligence questions:
>  - Is this a measurable speedup?

VTune lists CodeGeneratorShared::encodeAllocation() in the top 5 of CPU time, and the lookupForAdd() in SnapshotWriter::add() [2] is the hottest spot, so yes.

>  - The comment says this is an optimization for MSVC 2015; does 2017 do
> better?

Just checked. No, 2017 outputs similar assembly, I fixed the comment.

> > For the record, VTune shows that the movaps/movdqa takes ~170ms on my
> > desktop for the Speedometer Ember test.
> 
> What % of total time is that?

The test runs for ~60second, so it's about 0.2%.

[1] http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/js/src/jit/Snapshots.cpp#684
Attachment #8892708 - Attachment is obsolete: true
Attachment #8892708 - Flags: review?(luke)
Attachment #8893244 - Flags: review?(luke)

Comment 22

16 days ago
Comment on attachment 8893244 [details] [diff] [review]
patch v4

Review of attachment 8893244 [details] [diff] [review]:
-----------------------------------------------------------------

Fair enough, thanks for the answers.
Attachment #8893244 - Flags: review?(luke) → review+
(Assignee)

Comment 23

15 days ago
Non-gcc builds are failed because the macro MOZ_GCC_VERSION_AT_LEAST() is not defined, so I switched back to check __GNUC__. I'll reland once the tree is open.

Comment 24

15 days ago
Pushed by tchou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/516c01f62d84
Avoid excess copying when return from HashTable::lookupForAdd(). r=luke
https://hg.mozilla.org/mozilla-central/rev/516c01f62d84
Status: REOPENED → RESOLVED
Last Resolved: 19 days ago14 days ago
status-firefox57: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.