Closed Bug 571332 Opened 14 years ago Closed 14 years ago

jemalloc - avoiding the null check in the free method for non-huge allocations

Categories

(Core :: Memory Allocator, enhancement)

x86
Linux
enhancement
Not set
normal

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files)

Currently the implementation of the free function in jemalloc essentially does the following:

if (ptr != NULL) {
	chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
	if (chunk != ptr) {
		arena_dalloc(chunk->arena, chunk, ptr);
	} else {
		huge_dalloc(ptr);
	}
}

The initial null check can be avoided for non-huge allocations, given that (arena_chunk_t *)CHUNK_ADDR2BASE(NULL) == NULL on every platform known to me. The idea is therefore to reorganize the above code as:

chunk = (arena_chunk_t *)CHUNK_ADDR2BASE(ptr);
if (chunk != ptr) {
	assert(ptr != NULL);
	arena_dalloc(chunk->arena, chunk, ptr);
} else if (ptr != NULL) {
	huge_dalloc(ptr);
}

This way only huge allocations would bear the penalty of the NULL check.
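
The dispatch above can be sketched in isolation. This is a minimal illustration, not jemalloc code: the SKETCH_* names, the 1 MiB chunk size, and the sketch_free_path helper are assumptions made for the example, and it relies on NULL converting to address 0. It only reports which branch would be taken and frees nothing.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 1 MiB chunks. Real jemalloc derives its mask from the
 * configured chunk size. */
#define SKETCH_CHUNK_SIZE ((uintptr_t)1 << 20)
#define SKETCH_CHUNK_MASK (SKETCH_CHUNK_SIZE - 1)

/* Round an address down to the start of its chunk, in the style of
 * CHUNK_ADDR2BASE. Address 0 rounds down to 0. */
#define SKETCH_ADDR2BASE(p) ((void *)((uintptr_t)(p) & ~SKETCH_CHUNK_MASK))

enum sketch_path { SKETCH_NOOP, SKETCH_ARENA, SKETCH_HUGE };

/* Report which branch the reorganized free() would take for ptr. */
static enum sketch_path sketch_free_path(const void *ptr)
{
	const void *chunk = SKETCH_ADDR2BASE(ptr);

	if (chunk != ptr) {
		/* ptr is not chunk-aligned, so it cannot be NULL. */
		assert(ptr != NULL);
		return SKETCH_ARENA;
	} else if (ptr != NULL) {
		/* Chunk-aligned and non-NULL: a huge allocation. */
		return SKETCH_HUGE;
	}
	/* free(NULL) falls through without doing anything. */
	return SKETCH_NOOP;
}
```

For example, sketch_free_path(NULL) yields SKETCH_NOOP, a chunk-aligned address such as 0x300000 yields SKETCH_HUGE, and 0x300020 yields SKETCH_ARENA without ever testing for NULL outside the assertion.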
Attached patch v1
Besides moving the NULL check, the patch also changes idalloc and free to use CHUNK_ADDR2OFFSET instead of CHUNK_ADDR2BASE for optimal performance.
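
To illustrate why the offset form can replace the base form, here is a small sketch of the two tests side by side. The helper names and the 1 MiB chunk size are assumptions for the example and are not taken from the patch. Both predicates answer "is this pointer inside a chunk rather than at its start?"; the offset version needs only a mask and a test against zero, whereas the base version masks and then compares against the original pointer, which is presumably where the saving comes from.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 1 MiB chunks, as in a typical jemalloc configuration. */
#define SKETCH_CHUNK_MASK (((uintptr_t)1 << 20) - 1)

/* Base-style test: round down to the chunk start, then compare with ptr. */
static int sketch_in_arena_by_base(const void *ptr)
{
	uintptr_t addr = (uintptr_t)ptr;
	return (addr & ~SKETCH_CHUNK_MASK) != addr;
}

/* Offset-style test: a non-zero offset within the chunk means the pointer
 * is not chunk-aligned. */
static int sketch_in_arena_by_offset(const void *ptr)
{
	return ((uintptr_t)ptr & SKETCH_CHUNK_MASK) != 0;
}
```

The two functions agree for every address, including NULL, so the non-huge path can be selected with the cheaper offset test.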
The test case uses the rdtsc instruction to measure how long a call to free takes for a 32-byte allocation.

To run it, save it to the directory containing the jemalloc source (memory/jemalloc) and compile on Linux with GCC using:

gcc -o x -O3 -std=c99 -Wall -DMOZ_MEMORY -DMOZ_MEMORY_LINUX -DMOZ_MEMORY_SIZEOF_PTR_2POW=3 -DNDEBUG -fstrict-aliasing -fomit-frame-pointer jemalloc.c x.c -lpthread

To build a 32-bit executable, use:

gcc -m32 -o x -O3 -std=c99 -Wall -DMOZ_MEMORY -DMOZ_MEMORY_LINUX -DMOZ_MEMORY_SIZEOF_PTR_2POW=3 -DNDEBUG -fstrict-aliasing -fomit-frame-pointer jemalloc.c x.c -lpthread
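
The attached test case is not reproduced here. As a rough idea of this kind of measurement, the following sketch times a batch of free calls on 32-byte allocations. Everything in it is an assumption for illustration: the function name, the batch size, and the fallback to clock() on non-x86 targets (where the result is in clock ticks rather than cycles). It measures whichever free is linked into the program.

```c
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#if defined(__i386__) || defined(__x86_64__)
#include <x86intrin.h> /* __rdtsc() on GCC and Clang */
static uint64_t sketch_ticks(void) { return __rdtsc(); }
#else
/* Fallback for non-x86 targets: coarse clock() ticks, not CPU cycles. */
static uint64_t sketch_ticks(void) { return (uint64_t)clock(); }
#endif

#define SKETCH_BATCH 1024

/* Average ticks per free() call over one batch of 32-byte allocations. */
static double sketch_ticks_per_free(void)
{
	static void *ptrs[SKETCH_BATCH];
	size_t i;

	/* Allocate outside the timed region so only free() is measured. */
	for (i = 0; i < SKETCH_BATCH; i++) {
		ptrs[i] = malloc(32);
		if (ptrs[i] == NULL)
			abort();
	}

	uint64_t start = sketch_ticks();
	for (i = 0; i < SKETCH_BATCH; i++)
		free(ptrs[i]);
	uint64_t end = sketch_ticks();

	return (double)(end - start) / SKETCH_BATCH;
}
```

Calling sketch_ticks_per_free() repeatedly and recording both the average and the minimum would give figures of the same kind as those reported in this comment, though absolute values depend on the machine and the allocator linked in.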

With the patch I see the following results on my Intel(R) Core(TM) i5 CPU M 520  @ 2.40GHz laptop:

i686 executable:

before the patch: 
cycles per loop iteration: average=74.2 min=72.5
after the patch:
cycles per loop iteration: average=71.6 min=69.8

or a roughly 3% speedup of the hot free call.

x86_64 executable:
before the patch: 
cycles per loop iteration: average=61.8 min=60.6
after the patch:
cycles per loop iteration: average=61.3 min=59.9

or a roughly 1% speedup of the hot free call.
Attachment #450645 - Flags: review?(jasone)
Comment on attachment 450645 (v1)

This change seems unlikely to have a measurable impact on Firefox performance, but it certainly won't hurt anything.
Attachment #450645 - Flags: review?(jasone) → review+
http://hg.mozilla.org/tracemonkey/rev/2e14a43ef3db
Whiteboard: fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/2e14a43ef3db
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED