Redundant copying when return from HashTable::lookupForAdd()

RESOLVED FIXED in Firefox 57

Status

()

defect
RESOLVED FIXED
2 years ago
2 years ago

People

(Reporter: ting, Assigned: ting)

Tracking

(Blocks 1 bug)

Trunk
mozilla57
Unspecified
Windows
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox56 ?, firefox57 fixed)

Details

Attachments

(3 attachments, 3 obsolete attachments)

Assignee

Description

2 years ago
I saw this when profile Speedometer Ember test on Nightly, the assembly when HashTable::lookupForAdd() returns [1]:

  mov qword ptr [rbp-0x79], rdi
  mov dword ptr [rbp-0x71], r15d
  movaps xmm0, xmmword ptr [rbp-0x79]
  movdqa xmmword ptr [rbp-0x79], xmm0
  test rdi, rdi
  jz 0x1808e1b7c

rbp-0x79 is the address of the local variable |p| in SnapshotWriter::add() on stack [2]. The first two mov instructions are the constructor of AddPtr initializing |entry_| and |keyHash|, but the following movaps and movdqa are redundant though I am not sure why they're there, it seems to me that NRVO doesn't kick in.

I don't see the same pattern for the Nightly on Linux, so I assume this is Windows only.

[1] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/public/HashTable.h#1776-1777
[2] http://searchfox.org/mozilla-central/rev/ad093e98f42338effe2e2513e26c3a311dd96422/js/src/jit/Snapshots.cpp#684
Assignee

Updated

2 years ago
Assignee: nobody → janus926
Status: NEW → ASSIGNED
Assignee

Comment 1

2 years ago
Posted file original_asm.txt
Assignee

Comment 2

2 years ago
Posted file patched_asm.txt
Assignee

Comment 3

2 years ago
Posted patch patch v1 (obsolete) — Splinter Review
Probably this is not about NRVO because they're inlined. I've talked to :chia-hung and he thinks this may be because of the timing that compiler does different optimizations.

With the patch, not only movaps/movdqa but also the two mov are gone from my local build.
Attachment #8891221 - Flags: review?(luke)
Comment on attachment 8891221 [details] [diff] [review]
patch v1

Review of attachment 8891221 [details] [diff] [review]:
-----------------------------------------------------------------

It's surprising that this is necessary; seems like a "bug" in the optimization.  Anyway, nice job!
Attachment #8891221 - Flags: review?(luke) → review+

Comment 5

2 years ago
Pushed by tchou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e94dceac8090
Alter HashTable::lookupForAdd() to remove a redundant copy when return. r=luke
Assignee

Comment 6

2 years ago
For the record, VTune shows that the movaps/movdqa takes ~170ms on my desktop for the Speedometer Ember test.
https://hg.mozilla.org/mozilla-central/rev/e94dceac8090
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Ting, do you think we should add a comment in the code here?  This seems non-obvious to the casual reader and I could see it getting changed back accidentally, etc.  Maybe just something like:

// Note, the following line has a subtle optimization to avoid excess copying.  See bug 1385181.
Flags: needinfo?(janus926)
This might have caused bug 1386011.
Depends on: 1386011
Assignee

Comment 10

2 years ago
Unfortunately, this simple patch seems trigger a bug in gcc [1], which is fixed in 6.0 but we're using 4.9.3. I've asked :KWierso to help back it out.

[1] https://gcc.gnu.org/bugzilla/show_bug.cgi?id=70499
Assignee

Comment 11

2 years ago
I'll add comment later when reland.
Assignee

Comment 12

2 years ago
The odd is this only happens to Linux x64 CCov opt build.
Assignee

Updated

2 years ago
Flags: needinfo?(janus926)
Assignee

Comment 13

2 years ago
Debug build using gcc-5 has the same error as bug 1386011, gcc-6 is fine.
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5588a5f33bbc
Status: RESOLVED → REOPENED
Flags: needinfo?(janus926)
Resolution: FIXED → ---
Target Milestone: mozilla56 → ---
Assignee

Updated

2 years ago
Flags: needinfo?(janus926)
If it was a Windows-only optimization, perhaps you can add an #ifdef and file a bug to remove it when we update gcc?
Assignee

Comment 17

2 years ago
Posted patch patch v3 (obsolete) — Splinter Review
Attachment #8892398 - Attachment is obsolete: true
Attachment #8892708 - Flags: review?(luke)

Comment 18

2 years ago
Have you tested this with clang?  clang also defines __GNUC__ IIRC.

Comment 19

2 years ago
You should probably use MOZ_GCC_VERSION_AT_LEAST() here.
(In reply to Ting-Yu Chou [:ting] from comment #6)
This is starting to look a bit gnarly so, before r+'ing, some basic due diligence questions:
 - Is this a measurable speedup?
 - The comment says this is an optimization for MSVC 2015; does 2017 do better?

> For the record, VTune shows that the movaps/movdqa takes ~170ms on my
> desktop for the Speedometer Ember test.

What % of total time is that?
Assignee

Comment 21

2 years ago
Posted patch patch v4Splinter Review
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #18)
> Have you tested this with clang?  clang also defines __GNUC__ IIRC.

Thanks for reminding. The clang 3.9 that |mach bootstrap| installed doesn't output error for the patched code for the code coverage, normal, and debug builds on my Ubuntu.

(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog, Away Aug 7-8) from comment #19)
> You should probably use MOZ_GCC_VERSION_AT_LEAST() here.

Fixed.

(In reply to Luke Wagner [:luke] from comment #20)
> (In reply to Ting-Yu Chou [:ting] from comment #6)
> This is starting to look a bit gnarly so, before r+'ing, some basic due
> diligence questions:
>  - Is this a measurable speedup?

VTune lists CodeGeneratorShared::encodeAllocation() in the top 5 of CPU time, and the lookupForAdd() in SnapshotWriter::add() [2] is the hottest spot, so yes.

>  - The comment says this is an optimization for MSVC 2015; does 2017 do
> better?

Just checked. No, 2017 outputs similar assembly, I fixed the comment.

> > For the record, VTune shows that the movaps/movdqa takes ~170ms on my
> > desktop for the Speedometer Ember test.
> 
> What % of total time is that?

The test runs for ~60second, so it's about 0.2%.

[1] http://searchfox.org/mozilla-central/rev/f0e4ae5f8c40ba742214e89aba3f554da0b89a33/js/src/jit/Snapshots.cpp#684
Attachment #8892708 - Attachment is obsolete: true
Attachment #8892708 - Flags: review?(luke)
Attachment #8893244 - Flags: review?(luke)
Comment on attachment 8893244 [details] [diff] [review]
patch v4

Review of attachment 8893244 [details] [diff] [review]:
-----------------------------------------------------------------

Fair enough, thanks for the answers.
Attachment #8893244 - Flags: review?(luke) → review+
Assignee

Comment 23

2 years ago
Non-gcc builds are failed because the macro MOZ_GCC_VERSION_AT_LEAST() is not defined, so I switched back to check __GNUC__. I'll reland once the tree is open.

Comment 24

2 years ago
Pushed by tchou@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/516c01f62d84
Avoid excess copying when return from HashTable::lookupForAdd(). r=luke
https://hg.mozilla.org/mozilla-central/rev/516c01f62d84
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
You need to log in before you can comment on or make changes to this bug.