Closed
Bug 771737
Opened 13 years ago
Closed 12 years ago
Much slower performance on run-length encoding testcase
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: bzbarsky, Unassigned)
References
(Depends on 2 open bugs, )
Details
(Whiteboard: [js:t])
Attachments
(1 file)
|
16.38 KB,
text/plain
|
Details |
On http://www.timo-ernst.net/misc/riabench-start/runlengthencode/javascript/ I see about 700ms in nightly, and about 100ms in Chrome.
Profiler shows SlowCall stub calls to js_str_charAt. Why are we failing to inline that?
Also, it looks like we do a whole bunch of rope flattening under there....
This is the one test in http://www.tomshardware.com/reviews/windows-7-chrome-20-firefox-13-opera-12,3228-7.html where we do very poorly.
| Reporter | ||
Comment 1•13 years ago
|
||
| Reporter | ||
Comment 2•13 years ago
|
||
Profiling with Shark works better than Cleopatra here.
40% vm_fault
12% munmap
15% memcpy when flattening under charAt
9% script compilation (!)
Do we end up doing lots of partial flattening and therefore reallocating a ton?
| Reporter | ||
Comment 3•13 years ago
|
||
We keep creating long (~60KB) strings and then calling charAt() on them. About 5000 times.
Depends on: 615290
Comment 4•13 years ago
|
||
From irc: there seems to be a vicious loop of:
text = ...
for (...) {
... = text.charAt(...)
text = text.substr(...) + stuff + more stuff + text.substr(...)
}
Which is O(n^2) due to the flattening on each iteration. There are two general string optimizations I've been wanting for a yearthat would together avoid the O(n^2) behavior:
- bug 615290 would be to have charAt work directly on the rope. This could even be done easily from jit code since the lengths can be used to do a simple binary search
- bug 615280l would allow substr to produce a dependent string without flattening the rope
The benchmark shows that v8/jsc/opera are all >3x better than FF and IE; my question is whether they all do these optimizations or is there something simpler. I'll try to dig into one of them when I get a chance.
Comment 5•13 years ago
|
||
Ok, I dug in and I think I understand the slowdown. There are several things working against us in this benchmark, #1 isn't easily actionable, but the others are.
1. v8 is actually doing roughly the same O(n^2) string work we are: they spend 33% of their insns purely in memcpy whereas we spend 45%. The difference is that, since they have the optional ascii representation, they are copying half as many chars.
2. The 10% in compilation insns is caused by the roughly 25 GCs that occur due to the high-level of allocation. Brian: do you think we can tweak our throw-away-all-jit-code on GC heuristics to have some minimum time below which it won't release jit code?
3. 6% of our insns are creating 78K temporary NumberObjects so that GetPropertyOperation can call toString on them. This may be contributing to the number of GCs. This seems like a pretty reasonable thing to fix in the VM; from the callgrind graph, it seems v8 also handles this in a C++ IC path.
Adding a dual ascii/unicode string representation might be a good idea, but it's no small task. OTOH, the two bugs in comment 4 aren't too hard and I believe would make us the fastest in the pack (unless there is some hidden source of flattening I'm missing), so that is the quickest way I see to get faster on this.
Comment 6•13 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #5)
> 2. The 10% in compilation insns is caused by the roughly 25 GCs that occur
> due to the high-level of allocation. Brian: do you think we can tweak our
> throw-away-all-jit-code on GC heuristics to have some minimum time below
> which it won't release jit code?
This wouldn't be hard to do and I don't think it would backfire, but it's a bandaid and I agree pursuing the comment 4 bugs is the best approach. If these strings aren't too big then generational GC would also help.
Updated•13 years ago
|
Whiteboard: [js:t]
Comment 7•12 years ago
|
||
Chrome 31
Time elapsed: 109 ms
Nightly 28.0a1 (2013-12-06)
Time elapsed: 11 ms
!!!
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•