Closed Bug 516241 Opened 11 years ago Closed 11 years ago

Does XPT_ArenaMalloc really need to use calloc?


(Core :: XPCOM, defect)

Not set





(Reporter: joelr, Unassigned)


(Whiteboard: [ts])

Do we really want to zero-fill a new arena?

Switching to malloc will skip the memset and, perhaps, a few other things below. I don't know how big of an optimization this is, will run Talos Ts and report.

23898898       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           -> XPT_NewArena(0x200, 0x8, 0xE90630)		
23898904       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             -> calloc(0x1, 0x18, 0xBFFFEB28)		
23898908       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               -> malloc_zone_calloc(0x255F000, 0x1, 0x18)		
23898917       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 -> szone_calloc(0x255F000, 0x1, 0x18)		
23898921      26      4                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 <- szone_calloc = 77
23898924       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 -> szone_malloc_should_clear(0x255F000, 0x1, 0x18)		
23898926       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   -> _spin_lock(0x2562500, 0x38100D0, 0xBFFFEA48)		
23898929       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     -> spin_unlock(0x2562500, 0x38100D0, 0xBFFFEA48)		
23898931      22      2                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     <- spin_unlock = 10
23898935       .      .                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     -> memset(0x260D0D0, 0x0, 0x20)
My apologies! 

-> XPT_NewArena(0x200, 0x8, 0xE90630)		
-> calloc(0x1, 0x18, 0xBFFFEB28)		
-> malloc_zone_calloc(0x255F000, 0x1, 0x18)		
-> szone_calloc(0x255F000, 0x1, 0x18)		
<- szone_calloc = 77
-> szone_malloc_should_clear(0x255F000, 0x1, 0x18)		
-> _spin_lock(0x2562500, 0x38100D0, 0xBFFFEA48)		
-> spin_unlock(0x2562500, 0x38100D0, 0xBFFFEA48)		
<- spin_unlock = 10
-> memset(0x260D0D0, 0x0, 0x20)		
Whiteboard: [ts]
Also applies to  XPT_ArenaMalloc.
XPT_NewArena uses calloc to zero the fields. It's just 24 bytes of memory so the savings from not using calloc won't be large. We'll need to initialize first, next, space and, perhaps, name to 0 anyway. XPT_ArenaMalloc is a different thing entirely since it grabs blocks of 1K (in my testing).
Also, I don't understand the alignment bit. If we are really trying to use aligned memory the using calloc is not foolproof, e.g. on Snow Leopard

posix_memalign: err=0, ptr=100180, align=0
malloc: ptr=1002b0, align=48
calloc: ptr=1003b0, align=48

Notice that pointers returned from malloc and calloc are not aligned at a 128-byte boundary. Code below. Unfortunately, posix_memalign only works on Snow Leopard.

#include <stdio.h>
#include <stdlib.h>

int main(int argc, char** argv)
	void *ptr1= 0;
	void *ptr2= 0;
	void *ptr3= 0;

	int err = posix_memalign(&ptr1, 128, 250);
	ptr2 = malloc(250);
	ptr3 = calloc(250 / 128, 128);
	printf("posix_memalign: err=%d, ptr=%lx, align=%ld\n",
		err, (unsigned long)ptr1, (unsigned long)ptr1 % 128);
	printf("malloc: ptr=%lx, align=%ld\n", 
		(unsigned long)ptr2, (unsigned long)ptr2 % 128);
	printf("calloc: ptr=%lx, align=%ld\n", 
		(unsigned long)ptr3, (unsigned long)ptr3 % 128);
	return 0;
Summary: Does XPT_NewArena really need to use calloc? → Does XPT_ArenaMalloc really need to use calloc?
Simply replacing calloc with malloc in XPT_ArenaMalloc crashes promptly. I have a hunch that the code relies on arena pointers being pre-initialized to 0 by calloc.
bsmedberg:joelr: do you have evidence that the calloc is hurting? It really doesn't sound like something that would take a lot of time
bsmedberg:there's no IO, and arena pages are roughly page-sized, right?
joelr:bsmedberg: i don't have evidence that it's hurting, not at all. i just see it in the startup path a lot.
bsmedberg:"see it" in what way?
joelr:bsmedberg: there's no IO but arena pages are not page-sized. they are 1k blocks whereas pages are 4k
joelr:bsmedberg: i'm dtracing startup, all functions, that's how i see it
bsmedberg:it's less than a page, then... you really shouldn't be having memory-thrashing issues with the calloc
bsmedberg:joelr: you mean it takes time, or its just a callcount?
bsmedberg:I'm a little skeptical that the arena actually helps at all, but that's a different matter.
joelr:bsmedberg: call count, i suppose. i'm also skeptical that arena helps at all since mac osx has it's own memory allocation optimizations
bsmedberg:And the xpt code has no tests, so unles you have a measured performance problem I suggest not touching it.
joelr:bsmedberg: i see (tests)... well, i'll look into it a little bit more since i want to measure a difference vs malloc (no zero-fill). it does crash with malloc at the moment so i won't be looking into it too much.
bsmedberg:you'd have to basically zero-fill when you allocate items from the arena
joelr:bsmedberg: because those items expect to be zero-filled?
bsmedberg:seems so, if you're crashing with malloc...
bsmedberg:joelr: ok, well, I'm saying "you have no evidence this bug is worth fixing, please move on"
Closed: 11 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.