Closed
Bug 501515
Opened 16 years ago
Closed 12 years ago
Performance bottleneck for object creation in js_NewGCThing function
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
FIXED
People
(Reporter: wagnerg, Assigned: dmandelin)
References
Details
Attachments
(1 file: 211 bytes, application/x-javascript)
User-Agent: Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5
Build Identifier:
For each object allocation, the js_NewGCThing function is called.
Profiling shows that about 50% of the time in this function is used by a division in the inlined function IsGCThresholdReached.
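As a rough illustration of how a per-allocation division can be taken off the hot path, here is a minimal sketch. It assumes the check has the general shape "bytes / factor >= lastBytes / 100"; the struct, its field names, and all three functions are hypothetical and are not taken from the SpiderMonkey source. The idea is to recompute the threshold once after each GC and compare against the cached value on every allocation.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical GC bookkeeping; names are illustrative, not SpiderMonkey's.
struct GCCounters {
    std::size_t bytes = 0;          // bytes currently allocated from the GC heap
    std::size_t lastBytes = 0;      // bytes that survived the last GC
    std::size_t triggerFactor = 1;  // allowed growth in percent (300 means 3x)
    std::size_t triggerBytes = 0;   // cached threshold, refreshed after each GC
};

// Assumed shape of the original check: a division by a runtime value
// on every allocation.
inline bool thresholdReachedSlow(const GCCounters& c) {
    return c.bytes / c.triggerFactor >= c.lastBytes / 100;
}

// Recompute the cached threshold. Intended to run once per GC,
// not once per allocation. Overflow is ignored in this sketch.
inline void updateTrigger(GCCounters& c) {
    assert(c.triggerFactor > 0);
    c.triggerBytes = (c.lastBytes / 100) * c.triggerFactor;
}

// Hot path: a single comparison, no division.
inline bool thresholdReachedFast(const GCCounters& c) {
    return c.bytes >= c.triggerBytes;
}
```

For unsigned integers and a positive factor, floor(b / f) >= q holds exactly when b >= q * f, so the two checks agree as long as updateTrigger has been called since lastBytes or triggerFactor last changed.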
Reproducible: Always
Updated•16 years ago (Assignee)
Assignee: general → dmandelin
Comment 1•16 years ago (Assignee)
Based on this, I tried to speed up the new-object microbenchmark by altering or removing the IsGCThresholdReached test. Altering it gave no speedup, and removing it entirely gave only 1%.
So I did a bit of testing of my own. On my machine, the original benchmark runs in 1400ms. Shark shows about 50% of samples in system calls, mostly page faults, mmap, and munmap. Shark shows most of the rest of the samples in js_NewGCThing.
By direct measurement, I found that almost all the time is spent in js_NewGCThing. Thus, the system calls and page faults are ultimately generated by js_NewGCThing. I also found that GC runs 4 times, and about 300ms total is spent in js_GC. Breakdown:
Activity         Time spent   % of total time
All              1400 ms      100%
js_NewGCThing    1400 ms      100%
paging/mmap       700 ms       50%
js_GC             300 ms       20%
(the rest)        400 ms       30%
Keep in mind we are 3x slower than WebKit on this microbenchmark. Thus, in order to catch up, we *must* reduce the time spent in MM system calls by at least 40%, presumably by using fewer pages.
Comment 2•16 years ago
We might do better if the page allocator could allocate more than one 4K page at once. x86 has the machinery to have larger-than-4K pages, and I would assume on modern OSes mmap will use that machinery if you provide the right arguments.
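A related and simpler way to cut the number of system calls is to request a larger region per mmap call and carve it into arenas in user space. The sketch below shows that idea only; it does not request hardware large pages, which needs platform-specific flags not shown here. The Chunk class and its method names are hypothetical, and the 64K and 4K sizes in the usage note are example values.

```cpp
#include <cassert>
#include <cstddef>
#include <sys/mman.h>

#ifndef MAP_ANONYMOUS
#define MAP_ANONYMOUS MAP_ANON  // older BSD-derived headers use this spelling
#endif

// Hypothetical chunk that hands out fixed-size arenas from one mapped region,
// so one mmap call serves many arena requests.
class Chunk {
public:
    Chunk(std::size_t chunkBytes, std::size_t arenaBytes)
        : size_(chunkBytes), arena_(arenaBytes) {
        assert(arenaBytes > 0 && chunkBytes % arenaBytes == 0);
        void* p = mmap(nullptr, size_, PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
        base_ = (p == MAP_FAILED) ? nullptr : static_cast<char*>(p);
    }
    ~Chunk() {
        if (base_) munmap(base_, size_);
    }
    Chunk(const Chunk&) = delete;
    Chunk& operator=(const Chunk&) = delete;

    // Returns the next unused arena, or nullptr once the chunk is exhausted
    // (or if the mapping failed).
    void* nextArena() {
        if (!base_ || used_ + arena_ > size_) return nullptr;
        void* a = base_ + used_;
        used_ += arena_;
        return a;
    }

private:
    std::size_t size_;
    std::size_t arena_;
    std::size_t used_ = 0;
    char* base_ = nullptr;
};
```

With `Chunk chunk(64 * 1024, 4 * 1024);`, sixteen calls to `chunk.nextArena()` succeed after a single mmap, and the seventeenth returns nullptr.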
Comment 3•16 years ago
I bet WebKit has TCMalloc wired up so that it bypasses the OS X zone allocator. It would be interesting to see the numbers on Linux.
Comment 4•16 years ago
Could we get a testcase here?
Comment 5•16 years ago (Reporter)
The loop in this test case should not trigger a GC; otherwise we have to compile everything again and the results are hard to compare. With the previous patch from Andreas that removes the first division, plus removing the division in IsGCThresholdReached, the execution time drops from 125ms to 115ms.
Comment 6•16 years ago (Reporter)
On Darwin, WebKit allocates 64K pages with vm_map(...).
For the previous testcase, they call this function 4 times.
Furthermore, they start the GC 173 times.
We allocate 16K pages, call the mmap function 2034 times and don't call the GC at all.
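To put the call counts in perspective, here is a small back-of-the-envelope helper. It assumes each of our mmap calls maps exactly 16K and that nothing is freed or reused in between; the function name is hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Number of mapping calls needed to cover totalBytes using chunks of
// chunkBytes (ceiling division), assuming no memory is freed or reused.
inline std::size_t mappingCalls(std::size_t totalBytes, std::size_t chunkBytes) {
    assert(chunkBytes > 0);
    return (totalBytes + chunkBytes - 1) / chunkBytes;
}
```

Under those assumptions, 2034 calls of 16K come to 32544K in total, and `mappingCalls(2034 * 16 * 1024, 64 * 1024)` gives 509. So a 64K chunk size alone would cut the number of calls by about a factor of four; getting down to WebKit's 4 calls also depends on the 173 GC runs reclaiming memory for reuse.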
Updated•12 years ago
Status: UNCONFIRMED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED