Closed Bug 1058631 Opened 10 years ago Closed 10 years ago

Some small nursery GC performance improvements

Status:

RESOLVED FIXED

Milestone:

mozilla34

People

(Reporter: jandem, Assigned: jandem)

Attachments

(2 files)

Part 1 - Assorted tweaks, 10 years ago, Jan de Mooij [:jandem], 7.06 KB, patch, terrence: review+
Part 2 - Specialize memcpy, 10 years ago, Jan de Mooij [:jandem], 2.80 KB, patch

Jan de Mooij [:jandem]

Assignee

Description

10 years ago

I'll attach some patches that make the micro-benchmark below at least 10% faster on OS X. They also help on Linux and Windows, though the win appears to be smaller there, especially on Linux.

The patches also win a few hundred points each on Octane earley-boyer.

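function f() {
    var prevO = null;
    for (var i = 0; i < 100000000; i++) {
        // Allocate a small object on every iteration.
        var o = {x: 1, y: 2, z: 3, prev: prevO};
        // Keep every 8th object reachable through the prev chain.
        if (i % 8 === 0)
            prevO = o;
    }
}
var t = new Date;
f();
print(new Date - t);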

Jan de Mooij [:jandem]

Assignee

Comment 1

10 years ago

Attached patch: Part 1 - Assorted tweaks

* Clang was not inlining Nursery::moveSlotsToTenured and Nursery::moveElementsToTenured into their only caller, so I added MOZ_ALWAYS_INLINE to force inlining.

* Flag CrashAtUnhandlableOOM as MOZ_NORETURN, to tell the compiler that calls to it are cold and never return.

* Some minor changes to avoid a few dereferences. These are unlikely to matter much, but they are straightforward, so I left them in.

This patch improves the micro-benchmark in comment 0 from 1604 to 1502 ms on OS X 32-bit.
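To illustrate the first two tweaks, here is a minimal C++ sketch. It is not the patch itself: SKETCH_ALWAYS_INLINE and all function names ending in "Sketch" are hypothetical stand-ins, and the attribute spellings are assumed to approximate what MOZ_ALWAYS_INLINE and MOZ_NORETURN expand to on common compilers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <cstdlib>

// Hypothetical stand-in for MOZ_ALWAYS_INLINE (assumption, not the real macro).
#if defined(__GNUC__) || defined(__clang__)
#  define SKETCH_ALWAYS_INLINE inline __attribute__((always_inline))
#elif defined(_MSC_VER)
#  define SKETCH_ALWAYS_INLINE __forceinline
#else
#  define SKETCH_ALWAYS_INLINE inline
#endif

// [[noreturn]] plays the role of MOZ_NORETURN here: the compiler may treat
// calls to this function as never returning and lay the call out off the hot path.
[[noreturn]] void crashAtUnhandlableOOMSketch(const char* reason) {
    std::fprintf(stderr, "unhandlable OOM: %s\n", reason);
    std::abort();
}

// A small helper with a single caller; the attribute asks the compiler to
// inline it even when its own heuristics would not.
SKETCH_ALWAYS_INLINE void moveSlotsSketch(int* dst, const int* src, std::size_t n) {
    for (std::size_t i = 0; i < n; i++)
        dst[i] = src[i];
}

// The only caller of moveSlotsSketch. The returned buffer is owned by the
// caller and must be released with std::free.
int* moveToTenuredSketch(const int* src, std::size_t n) {
    assert(src != nullptr);
    // Allocate at least one element so that a null result always means failure.
    int* dst = static_cast<int*>(std::malloc((n ? n : 1) * sizeof(int)));
    if (!dst)
        crashAtUnhandlableOOMSketch("moveToTenuredSketch");
    moveSlotsSketch(dst, src, n);
    return dst;
}
```

Whether forcing inlining helps depends on the compiler and the call site, so a change like this is worth measuring rather than assuming.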

Attachment #8479080 - Flags: review?(terrence)

Lars T Hansen [:lth]

Comment 2

10 years ago

Since you're micro-optimizing: PodCopy, used in the copy operations, is not clever for most slot arrays; in fact it is fairly strange. For 128 elements or more, PodCopy becomes a single memcpy. For fewer than 128 elements, it becomes a loop with one memcpy call per element.

Of course it's possible that a good C++ compiler does something clever with that.
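The behavior described above can be sketched roughly as follows. This is an illustration of the description in this comment, not the actual PodCopy source; podAssignSketch and podCopySketch are hypothetical names.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Copy a single element with a memcpy whose size is a compile-time constant.
template <typename T>
inline void podAssignSketch(T* dst, const T* src) {
    static_assert(std::is_trivially_copyable<T>::value, "plain data only");
    std::memcpy(dst, src, sizeof(T));
}

// Below the 128-element threshold, copy element by element; at or above it,
// issue one variable-size memcpy for the whole array.
template <typename T>
inline void podCopySketch(T* dst, const T* src, std::size_t nelem) {
    assert(nelem == 0 || (dst != nullptr && src != nullptr));
    if (nelem < 128) {
        for (const T* end = src + nelem; src < end; ++src, ++dst)
            podAssignSketch(dst, src);
    } else {
        std::memcpy(dst, src, nelem * sizeof(T));
    }
}
```

The per-element call looks expensive on paper, but because its size is constant a compiler is free to lower it to plain loads and stores, which is the "something clever" the comment alludes to.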

Terrence Cole [:terrence]

Comment 3

10 years ago

Comment on attachment 8479080
Part 1 - Assorted tweaks

Review of attachment 8479080:
-----------------------------------------------------------------

Nice!

Attachment #8479080 - Flags: review?(terrence) → review+

Jan de Mooij [:jandem]

Assignee

Comment 4

10 years ago

Attached patch: Part 2 - Specialize memcpy

Clang was emitting a call to a generic memcpy function. This patch adds an AllocKind switch statement so that each of the memcpy calls has a constant "size" argument, and the compiler can do something smarter.

For instance, for the FINALIZE_OBJECT0_BACKGROUND switch-case, Clang now emits the following instead of a memcpy call:

	movsd	(%ebx), %xmm0
	movsd	8(%ebx), %xmm1
	movsd	%xmm1, 8(%esi)
	movsd	%xmm0, (%esi)
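As a rough sketch of the idea (not the patch itself), the switch below gives every memcpy call a compile-time-constant size. AllocKindSketch, its three kinds, and the byte sizes are hypothetical; only the 16-byte case is chosen to match the two 8-byte moves shown above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

// Hypothetical allocation kinds with illustrative sizes in bytes.
enum class AllocKindSketch { Object0, Object2, Object4 };

inline std::size_t sizeOfKindSketch(AllocKindSketch kind) {
    switch (kind) {
      case AllocKindSketch::Object0: return 16;
      case AllocKindSketch::Object2: return 32;
      case AllocKindSketch::Object4: return 48;
    }
    return 0;
}

// Generic version: the size is only known at run time, so the compiler
// may emit a call to the library memcpy.
inline void copyGenericSketch(void* dst, const void* src, AllocKindSketch kind) {
    std::memcpy(dst, src, sizeOfKindSketch(kind));
}

// Specialized version: each case passes a constant size, which lets the
// compiler replace the call with a few fixed-width moves.
inline void copySpecializedSketch(void* dst, const void* src, AllocKindSketch kind) {
    assert(dst != nullptr && src != nullptr);
    switch (kind) {
      case AllocKindSketch::Object0: std::memcpy(dst, src, 16); break;
      case AllocKindSketch::Object2: std::memcpy(dst, src, 32); break;
      case AllocKindSketch::Object4: std::memcpy(dst, src, 48); break;
    }
}
```

Both functions copy the same bytes; the difference is only in what the compiler can see at each call. The trade-off is an extra branch on the kind, which may or may not pay for itself on a given architecture.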

Attachment #8479123 - Flags: review?(terrence)

Jan de Mooij [:jandem]

Assignee

Comment 5

10 years ago

Part 2 also wins about 100 ms on the comment 0 micro-benchmark, 1502 -> 1408 ms on OS X 32-bit.

Jan de Mooij [:jandem]

Assignee

Comment 6

10 years ago

(In reply to Lars T Hansen [:lth] from comment #2)
> Since you're micro-optimizing: PodCopy, used in the copy operations, is not
> clever for most slot arrays, in fact it is fairly strange: For 128 elements
> or more PodCopy becomes a single memcpy.  For less than 128 elements it
> becomes a loop plus one memcpy call per element.
> 
> Of course it's possible that a good C++ compiler does something clever with
> that.

Yeah, I think it's assuming that C++ compilers do something clever with small constant-size memcpy but not with variable-size memcpy (or at least not all compilers).

Emanuel Hoogeveen [:ehoogeveen]

Comment 7

10 years ago

I do know that the < 128 elements thing was based on performance measurements; Waldo probably remembers more of the details.

Jan de Mooij [:jandem]

Assignee

Comment 8

10 years ago

Comment on attachment 8479123
Part 2 - Specialize memcpy

I see a small regression with this patch on OS X x64 and Linux x64. I'm not sure why it behaves differently on x64, but given that, the patch is probably not worth the complexity.

Attachment #8479123 - Flags: review?(terrence)

Jan de Mooij [:jandem]

Assignee

Comment 9

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/0556ceb562e3

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 10

10 years ago

https://hg.mozilla.org/mozilla-central/rev/0556ceb562e3

Status: ASSIGNED → RESOLVED

Closed: 10 years ago

Resolution: --- → FIXED

Target Milestone: --- → mozilla34

Categories

(Core :: JavaScript: GC, defect)
