The "pretty" js raytracer runs slower -- when it runs at all -- with JIT on than with JIT off. But if I oprofile it, it's spending 97.3% of its wall time in the linux kernel, not the application at all. Here are the top few offenders: samples % image name symbol name 2172 40.0516 vmlinux-debug-2.6.22-14-generic get_page_from_freelist 849 15.6555 vmlinux-debug-2.6.22-14-generic native_flush_tlb_single 246 4.5362 vmlinux-debug-2.6.22-14-generic __handle_mm_fault 191 3.5220 vmlinux-debug-2.6.22-14-generic bad_page 117 2.1575 vmlinux-debug-2.6.22-14-generic mca_nmi_hook 110 2.0284 vmlinux-debug-2.6.22-14-generic strncpy_from_user 99 1.8256 libc-2.6.1.so (no symbols) 85 1.5674 vmlinux-debug-2.6.22-14-generic __pagevec_lru_add 80 1.4752 vmlinux-debug-2.6.22-14-generic die_nmi 65 1.1986 vmlinux-debug-2.6.22-14-generic kunmap_atomic 46 0.8482 vmlinux-debug-2.6.22-14-generic __alloc_pages That sounds suspiciously like mmap(), though I haven't been able to conclusively demonstrate that yet: I think oprofile's NMI frame crawler doesn't work so well in the kernel, I can't seem to get stacks out of it in there, only userspace. When I break on mmap in the debugger I get a lot of calls coming from jemalloc via array resizing in js (presumably on-trace in the raytracer) and also js arena allocation, and gfx/canvas buffer allocation. Every break I've seen is either a 1mb or 2mb mmap(), even when the incoming allocation is *much* smaller (eg. a 11kb malloc() appears to provoke a 1mb mmap()). I don't wish to leap to the conclusion it's all solvable at the jemalloc level, but I wonder whether a simple cache of recent chunk allocations -- say the last 100ms or so worth of them -- would give it a chance to recycle them without hitting up the kernel. Perhaps this should be switched to component: jemalloc? I thought it'd be a place to start. Obviously there's something spooky going on here.
More details: strace -c (trimmed to the memory-related calls) paints a less conclusive picture.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 10.41    0.000827           0      2141           madvise
  0.57    0.000045           0       801           mmap2
  0.15    0.000012           0      2013           mprotect
  0.00    0.000000           0         1           brk
  0.00    0.000000           0       471           munmap

Possibly it's the madvise traffic (breaking in gdb, that comes from chunk_dealloc_reserve in jemalloc), but even that seems less likely at only ~2000 calls. I wonder if my oprofile symbols are wrong. Maybe this is a total phantom bug. Don't get too preoccupied investigating it yet.
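For context on why it's madvise rather than munmap that dominates there: jemalloc's reserve path keeps the address range mapped and just tells the kernel the backing pages can be discarded. A rough sketch of that pattern (not the actual chunk_dealloc_reserve code; the flag choice differs by platform) would be:

/* Illustrative only: drop the physical pages behind a chunk while keeping
 * the virtual mapping reserved for later reuse. */
#include <sys/mman.h>

#define CHUNK_SIZE (1024 * 1024)

void chunk_release_pages(void *chunk) {
    /* On Linux, MADV_DONTNEED discards the pages immediately (BSDs would
     * use MADV_FREE).  The mapping survives, so touching the chunk again
     * later costs page faults and fresh page allocations -- which could be
     * consistent with the get_page_from_freelist / tlb-flush samples above,
     * if those samples are to be believed. */
    madvise(chunk, CHUNK_SIZE, MADV_DONTNEED);
}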
Total false alarm: misconfigured oprofile. My fault. I should have known something was up with those low event counts. Never mind.