TM: Raytracer spends a lot of time shuffling pages around in the kernel

Status: RESOLVED INVALID
Component: Core :: JavaScript Engine
Reporter: graydon
Assignee: graydon
Version: Trunk
Platform: x86 Linux
Description

9 years ago
The "pretty" js raytracer runs slower -- when it runs at all -- with JIT on than with JIT off. But if I oprofile it, it's spending 97.3% of its wall time in the linux kernel, not the application at all. Here are the top few offenders:

samples  %        image name               symbol name
2172     40.0516  vmlinux-debug-2.6.22-14-generic get_page_from_freelist
849      15.6555  vmlinux-debug-2.6.22-14-generic native_flush_tlb_single
246       4.5362  vmlinux-debug-2.6.22-14-generic __handle_mm_fault
191       3.5220  vmlinux-debug-2.6.22-14-generic bad_page
117       2.1575  vmlinux-debug-2.6.22-14-generic mca_nmi_hook
110       2.0284  vmlinux-debug-2.6.22-14-generic strncpy_from_user
99        1.8256  libc-2.6.1.so            (no symbols)
85        1.5674  vmlinux-debug-2.6.22-14-generic __pagevec_lru_add
80        1.4752  vmlinux-debug-2.6.22-14-generic die_nmi
65        1.1986  vmlinux-debug-2.6.22-14-generic kunmap_atomic
46        0.8482  vmlinux-debug-2.6.22-14-generic __alloc_pages

That sounds suspiciously like mmap(), though I haven't been able to demonstrate it conclusively yet: I think oprofile's NMI frame crawler doesn't work so well in the kernel, since I can't seem to get stacks out of it there, only in userspace.

When I break on mmap in the debugger, I see a lot of calls coming from jemalloc via array resizing in js (presumably on-trace in the raytracer), and also from js arena allocation and gfx/canvas buffer allocation. Every break I've seen is either a 1mb or a 2mb mmap(), even when the incoming allocation is *much* smaller (e.g. an 11kb malloc() appears to provoke a 1mb mmap()).
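For illustration, here's a minimal sketch of chunk-granular allocation -- not jemalloc's actual code, just the general shape of it -- showing why a small malloc() can surface as a whole-chunk mmap(): the allocator reserves memory from the kernel in fixed-size chunks and carves small requests out of them in userspace.

#include <stddef.h>
#include <sys/mman.h>

#define CHUNK_SIZE (1u << 20)   /* 1mb, matching the mmaps seen in gdb */

static char  *cur_chunk;        /* chunk currently being carved up */
static size_t cur_avail;        /* bytes left in the current chunk */

/* Hypothetical allocator core: small requests are served out of the
 * current chunk; only when it runs dry does the kernel get involved,
 * and then always for a whole chunk at once. (Alignment, error
 * handling, and requests larger than a chunk are all elided.) */
void *chunk_alloc(size_t n)
{
    if (n > cur_avail) {
        cur_chunk = mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (cur_chunk == MAP_FAILED)
            return NULL;
        cur_avail = CHUNK_SIZE;
    }
    void *p = cur_chunk + (CHUNK_SIZE - cur_avail);
    cur_avail -= n;
    return p;
}

So an 11kb request that happens to land when the current chunk is exhausted shows up to the kernel as a fresh 1mb mmap().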

I don't wish to leap to the conclusion that it's all solvable at the jemalloc level, but I wonder whether a simple cache of recent chunk allocations -- say, the last 100ms or so worth of them -- would give it a chance to recycle chunks without hitting up the kernel.
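A rough sketch of that idea follows; the chunk size, slot count, and 100ms TTL are all made-up parameters, and this is not anything jemalloc actually does. Freed chunks sit in a small cache and get recycled before the allocator falls back to mmap():

#include <stdlib.h>
#include <sys/mman.h>
#include <time.h>

#define CHUNK_SIZE   (1u << 20)
#define CACHE_SLOTS  16
#define CACHE_TTL_MS 100        /* the "last 100ms or so" window */

struct cached_chunk {
    void           *addr;
    struct timespec freed_at;
};

static struct cached_chunk cache[CACHE_SLOTS];
static int cache_len;

static long ms_since(const struct timespec *t)
{
    struct timespec now;
    clock_gettime(CLOCK_MONOTONIC, &now);
    return (now.tv_sec - t->tv_sec) * 1000
         + (now.tv_nsec - t->tv_nsec) / 1000000;
}

/* On chunk free: stash the chunk instead of returning it to the kernel. */
void chunk_release(void *addr)
{
    if (cache_len < CACHE_SLOTS) {
        cache[cache_len].addr = addr;
        clock_gettime(CLOCK_MONOTONIC, &cache[cache_len].freed_at);
        cache_len++;
    } else {
        munmap(addr, CHUNK_SIZE);   /* cache full: really free it */
    }
}

/* On chunk alloc: recycle a recently freed chunk if one is still fresh. */
void *chunk_acquire(void)
{
    while (cache_len > 0) {
        struct cached_chunk c = cache[--cache_len];
        if (ms_since(&c.freed_at) <= CACHE_TTL_MS)
            return c.addr;          /* reuse without touching mmap() */
        munmap(c.addr, CHUNK_SIZE); /* stale: let the kernel have it back */
    }
    return mmap(NULL, CHUNK_SIZE, PROT_READ | PROT_WRITE,
                MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
}

The LIFO order keeps the most recently freed chunk on top, where it's most likely still warm in the TLB and page tables, and stale entries are only unmapped lazily on the next allocation.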

Perhaps this should be switched to the jemalloc component? This seemed like a reasonable place to start. Obviously there's something spooky going on here.

Comment 1

9 years ago
More details: strace -c (trimmed to the memory-related calls) paints a less conclusive picture.

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
 10.41    0.000827           0      2141           madvise
  0.57    0.000045           0       801           mmap2
  0.15    0.000012           0      2013           mprotect
  0.00    0.000000           0         1           brk
  0.00    0.000000           0       471           munmap

The madvise traffic is a possible culprit (breaking in gdb, those calls come from chunk_dealloc_reserve in jemalloc), but even that seems unlikely at only ~2000 calls. I wonder if my oprofile symbols are wrong. Maybe this is a total phantom bug. Don't get too preoccupied investigating it yet.
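For context, the madvise() pattern behind that call looks roughly like the following (illustrative only, not jemalloc's actual code): the mapping is kept, but the kernel is told it may reclaim the pages, so reuse costs minor page faults rather than fresh mmap()s -- consistent with the __handle_mm_fault samples above.

#include <assert.h>
#include <string.h>
#include <sys/mman.h>

int main(void)
{
    size_t len = 1u << 20;
    char *chunk = mmap(NULL, len, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(chunk != MAP_FAILED);

    memset(chunk, 0xab, len);          /* dirty the pages */

    /* "Deallocate" to a reserve: the mapping stays valid, but the
     * kernel is free to reclaim the physical pages behind it. */
    madvise(chunk, len, MADV_DONTNEED);

    /* Reuse: no new mmap() needed, but each touched page now takes a
     * minor fault to pull in a fresh zero page. */
    assert(chunk[0] == 0);
    return 0;
}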

Comment 2

9 years ago
Total false alarm: misconfigured oprofile. My fault; I should have known something was up with those low event counts. Never mind.
Status: NEW → RESOLVED
Last Resolved: 9 years ago
Resolution: --- → INVALID

Updated

9 years ago
No longer blocks: 463478