Much slower performance on run-length encoding testcase

Status: RESOLVED WORKSFORME
Type: defect
Reported: 7 years ago
Last updated: 5 years ago
Reporter: bzbarsky (Unassigned)
Depends on: 3 bugs
Tracking flags: not tracked
Whiteboard: [js:t]
Attachments: 1
On http://www.timo-ernst.net/misc/riabench-start/runlengthencode/javascript/ I see about 700ms in nightly, and about 100ms in Chrome.

Profiler shows SlowCall stub calls to js_str_charAt.  Why are we failing to inline that?

Also, it looks like we do a whole bunch of rope flattening under there....

This is the one test in http://www.tomshardware.com/reviews/windows-7-chrome-20-firefox-13-opera-12,3228-7.html where we do very poorly.

Posted file: Shell testcase
Profiling with Shark works better than Cleopatra here.

40% vm_fault
12% munmap
15% memcpy when flattening under charAt
9% script compilation (!)

Do we end up doing lots of partial flattening and therefore reallocating a ton?
We keep creating long (~60KB) strings and then calling charAt() on them.  About 5000 times.
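
The flattening cost is easy to reproduce outside the benchmark. A hypothetical shell-style reduction (sizes and iteration counts are invented here, not taken from the attached testcase):

```javascript
// Hypothetical reduction of the pattern the profile shows: build a long
// string by repeated concatenation (a rope), then call charAt() on it,
// which forces a flatten.  Sizes and iteration counts are illustrative,
// not taken from the attached testcase.
function stress(iterations, chunks) {
  var piece = new Array(1001).join("x");     // 1000 chars of 'x'
  var hits = 0;
  for (var i = 0; i < iterations; i++) {
    var s = "";
    for (var j = 0; j < chunks; j++) {
      s += piece;                            // builds up a rope lazily
    }
    if (s.charAt(s.length - 1) === "x") {    // charAt flattens the rope
      hits++;
    }
  }
  return hits;
}
```

Every trip through the outer loop builds a ~60KB rope and then flattens it for a single character read, which is where the memcpy and vm_fault time would go.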
Depends on: 615290
Depends on: 615280
From IRC: there seems to be a vicious loop of:

  text = ...
  for (...) {
     ... = text.charAt(...)
     text = text.substr(...) + stuff + more stuff + text.substr(...)
  }

Which is O(n^2) due to the flattening on each iteration.  There are two general string optimizations I've been wanting for a year that would together avoid the O(n^2) behavior:
  - bug 615290 would have charAt work directly on the rope.  This could even be done easily from jit code, since the child lengths can be used to do a simple binary search
  - bug 615280 would allow substr to produce a dependent string without flattening the rope
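
The bug 615290 idea can be sketched with a toy rope. This is an illustrative model only, not SpiderMonkey's actual JSRope layout:

```javascript
// Toy rope model -- an illustrative sketch, not SpiderMonkey's actual
// JSRope layout.  Leaves are plain strings; interior nodes cache the
// combined length.
function Rope(left, right) {
  this.left = left;                    // string or Rope
  this.right = right;                  // string or Rope
  this.length = left.length + right.length;
}

// charAt that walks the rope instead of flattening it: compare the index
// against the left child's length and descend into one side.
function ropeCharAt(node, i) {
  while (node instanceof Rope) {
    if (i < node.left.length) {
      node = node.left;
    } else {
      i -= node.left.length;
      node = node.right;
    }
  }
  return node.charAt(i);               // leaf: ordinary string charAt
}
```

Because every node caches its length, each lookup is O(depth) with no copying, versus the O(n) memcpy a flatten incurs.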

The benchmark shows that v8/jsc/opera are all >3x better than FF and IE; my question is whether they all do these optimizations or whether there is something simpler.  I'll try to dig into one of them when I get a chance.
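
For reference, here are the quadratic pattern and a linear script-side alternative side by side. The actual benchmark source isn't reproduced in this bug, so the `<count><char>` output format and the function names are assumptions:

```javascript
// The style from the IRC snippet: `text` is rebuilt every iteration, so
// each concatenation creates a rope that the next charAt()/substr()
// call must flatten -- O(n) copying per iteration, O(n^2) overall.
function encodeByRewriting(text) {
  var out = "";
  while (text.length > 0) {
    var c = text.charAt(0);
    var run = 1;
    while (run < text.length && text.charAt(run) === c) {
      run++;
    }
    out += run + c;
    text = text.substr(run);           // re-creates the string each time
  }
  return out;
}

// Linear alternative: index into the original string and collect output
// pieces in an array, joining once at the end.
function encodeByIndexing(text) {
  var parts = [];
  for (var i = 0; i < text.length; ) {
    var c = text.charAt(i);
    var run = 1;
    while (i + run < text.length && text.charAt(i + run) === c) {
      run++;
    }
    parts.push(run + c);
    i += run;
  }
  return parts.join("");
}
```

The two bugs above would make the first style cheap without any script changes; the second style avoids the problem entirely by never rebuilding the input.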
Ok, I dug in and I think I understand the slowdown.  There are several things working against us in this benchmark; #1 isn't easily actionable, but the others are.

1. v8 is actually doing roughly the same O(n^2) string work we are: they spend 33% of their insns purely in memcpy whereas we spend 45%.  The difference is that, since they have the optional ascii representation, they are copying half as many chars.

2. The 10% of insns spent in compilation is caused by the roughly 25 GCs that occur due to the high level of allocation.  Brian: do you think we can tweak our throw-away-all-jit-code-on-GC heuristic to have some minimum time below which it won't release jit code?

3. 6% of our insns are creating 78K temporary NumberObjects so that GetPropertyOperation can call toString on them.  This may be contributing to the number of GCs.  This seems like a pretty reasonable thing to fix in the VM; from the callgrind graph, it seems v8 also handles this in a C++ IC path.
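
To illustrate the boxing described in point 3: a property access on a primitive number behaves as if a temporary Number wrapper object were created to perform the lookup. This naive sketch makes the allocation explicit; real engines (including the IC path mentioned) avoid it:

```javascript
// Illustration of the boxing described above.  A property access on a
// primitive number conceptually allocates a temporary wrapper object
// just to look the property up; a naive VM path does this per access.
function naiveGetProperty(value, name) {
  var wrapper = Object(value);         // temporary NumberObject
  return wrapper[name];
}
```

Here `naiveGetProperty(255, "toString")` resolves to `Number.prototype.toString`; in the benchmark's hot loop each such wrapper would be a fresh GC allocation, which is how 78K of them pile up and feed the GC pressure in point 2.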

Adding a dual ascii/unicode string representation might be a good idea, but it's no small task.  OTOH, the two bugs in comment 4 aren't too hard and I believe would make us the fastest in the pack (unless there is some hidden source of flattening I'm missing), so that is the quickest way I see to get faster on this.
(In reply to Luke Wagner [:luke] from comment #5)
> 2. The 10% of insns spent in compilation is caused by the roughly 25 GCs
> that occur due to the high level of allocation.  Brian: do you think we
> can tweak our throw-away-all-jit-code-on-GC heuristic to have some
> minimum time below which it won't release jit code?

This wouldn't be hard to do and I don't think it would backfire, but it's a bandaid and I agree pursuing the comment 4 bugs is the best approach.  If these strings aren't too big then generational GC would also help.

Updated 7 years ago:
Depends on: 771865

Updated 7 years ago:
Depends on: 771866
Whiteboard: [js:t]

Comment 7

5 years ago
Chrome 31
Time elapsed: 109 ms

Nightly 28.0a1 (2013-12-06)
Time elapsed: 11 ms

!!!
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME