Closed Bug 501515 Opened 16 years ago Closed 12 years ago

Performance bottleneck for object creation in js_NewGCThing function

Categories

(Core :: JavaScript Engine, defect)

All
macOS
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: wagnerg, Assigned: dmandelin)

Attachments

(1 file)

User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5
Build Identifier:

For each object allocation, the js_NewGCThing function is called. Profiling shows that about 50% of the time in this function is spent on a division in the inlined function IsGCThresholdReached.

Reproducible: Always
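For illustration only, here is a minimal sketch of how a per-allocation threshold check that divides could be replaced by a comparison against a cached value. The struct, field names, and the exact form of the check are assumptions made for this example; they are not taken from the SpiderMonkey source. For unsigned integers, a positive factor, and no overflow, `bytes / factor >= k` is equivalent to `bytes >= k * factor`, so the product can be computed once after each GC.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical heap counters; names are illustrative, not SpiderMonkey's. */
typedef struct {
    size_t bytes;           /* bytes currently allocated */
    size_t last_bytes;      /* bytes still live after the last GC */
    size_t trigger_factor;  /* allowed growth in percent, must be > 0 */
    size_t trigger_bytes;   /* cached threshold, refreshed after each GC */
} HeapStats;

/* Assumed original shape: two divisions on every allocation. */
bool threshold_reached_div(const HeapStats *h) {
    return h->bytes / h->trigger_factor >= h->last_bytes / 100;
}

/* Call once whenever last_bytes or trigger_factor changes (e.g. after GC). */
void refresh_trigger(HeapStats *h) {
    h->trigger_bytes = (h->last_bytes / 100) * h->trigger_factor;
}

/* Same decision as threshold_reached_div, but only a compare per allocation. */
bool threshold_reached_cached(const HeapStats *h) {
    return h->bytes >= h->trigger_bytes;
}
```

With `last_bytes = 1000` and `trigger_factor = 300`, both versions report the threshold as reached once `bytes` is 3000 or more.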
Assignee: general → dmandelin
Blocks: 501186
Based on this, I tried to speed up the new-object microbenchmark by altering or removing the IsGCThresholdReached test. But I couldn't get any speedup by changing it, and only 1% by removing it entirely.

So I did a bit of testing of my own. On my machine, the original benchmark runs in 1400ms. Shark shows about 50% of samples in system calls, mostly page faults, mmap, and munmap. Shark shows most of the rest of the samples in js_NewGCThing. By direct measurement, I found that almost all the time is spent in js_NewGCThing. Thus, the system calls and page faults are ultimately generated by js_NewGCThing. I also found that GC runs 4 times, and about 300ms total is spent in js_GC.

Breakdown:

Activity        Time Spent   % of total time
All             1400 ms      100%
js_NewGCThing   1400 ms      100%
paging/mmap      700 ms       50%
js_GC            300 ms       20%
(the rest)       400 ms       30%

Keep in mind we are 3x slower than WebKit on this microbenchmark. Thus, in order to catch up, we *must* reduce the time spent in MM system calls by at least 40%, presumably by using fewer pages.
We might do better if the page allocator could allocate more than one 4K page at once. x86 has the machinery to have larger-than-4K pages, and I would assume on modern OSes mmap will use that machinery if you provide the right arguments.
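As a rough sketch of the "more than one 4K page at once" idea, the following maps a multi-page chunk with a single mmap call and hands out 4K pages from it. It does not use hardware large pages; it only batches ordinary pages to reduce the number of system calls. The type and function names (`PagePool`, `pool_alloc_page`) and the chunk size of 16 pages are assumptions for this example, and a real allocator would also have to track chunks so they can be unmapped.

```c
#define _DEFAULT_SOURCE  /* expose MAP_ANONYMOUS on glibc */
#include <stddef.h>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  /* older BSD / OS X spelling */
#endif

enum { ARENA_PAGE_SIZE = 4096, CHUNK_PAGES = 16 };  /* 64K chunks (assumed) */

/* Hypothetical pool that carves 4K pages out of larger mmap'ed chunks. */
typedef struct {
    char  *base;        /* start of the current chunk, NULL if none yet */
    size_t next_page;   /* index of the next unused page in the chunk */
    size_t mmap_calls;  /* number of mmap calls made so far */
} PagePool;

/* Returns one zero-filled 4K page, or NULL if mmap fails.
   Exhausted chunks are not tracked here, so they are never unmapped. */
void *pool_alloc_page(PagePool *p) {
    if (p->base == NULL || p->next_page == CHUNK_PAGES) {
        void *mem = mmap(NULL, (size_t)CHUNK_PAGES * ARENA_PAGE_SIZE,
                         PROT_READ | PROT_WRITE,
                         MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        if (mem == MAP_FAILED)
            return NULL;
        p->base = mem;
        p->next_page = 0;
        p->mmap_calls++;
    }
    return p->base + (p->next_page++) * ARENA_PAGE_SIZE;
}
```

Starting from a zero-initialized `PagePool`, requesting 32 pages this way issues 2 mmap calls instead of 32.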
I bet WebKit has TCMalloc wired up so that it bypasses the OS X zone allocator. It would be interesting to see the numbers on Linux.
Could we get a testcase here?
The loop in this testcase should not trigger a GC; otherwise we have to compile everything again, and the results are hard to compare. With the previous patch from Andreas, which removes the first division, and with the division in IsGCThresholdReached also removed, the execution time drops from 125ms to 115ms.
WebKit allocates 64K pages with vm_map(...) on Darwin. For the previous testcase, they call this function 4 times and start the GC 173 times. We allocate 16K pages, call the mmap function 2034 times, and don't call the GC at all.
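To put those counts side by side, here is a small back-of-the-envelope helper. It assumes every call maps exactly one chunk of the stated size and that nothing is unmapped in between, which the numbers above do not confirm; the function names are made up for this example.

```c
#include <stddef.h>

/* Total KiB mapped if each call maps one chunk of chunk_kib KiB. */
size_t mapped_kib(size_t calls, size_t chunk_kib) {
    return calls * chunk_kib;
}

/* Calls needed to map total_kib KiB in chunks of chunk_kib KiB (rounded up). */
size_t calls_needed(size_t total_kib, size_t chunk_kib) {
    return (total_kib + chunk_kib - 1) / chunk_kib;
}
```

Under those assumptions, 2034 calls at 16K map 32544 KiB (about 32 MB), while 4 calls at 64K map 256 KiB. Mapping the same 32544 KiB in 64K chunks would still take 509 calls, so the difference in call counts comes mostly from WebKit collecting often and reusing memory rather than from chunk size alone.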
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED