Closed Bug 771737 Opened 12 years ago Closed 11 years ago

Much slower performance on run-length encoding testcase

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 3 open bugs,
URL
)

Details

(Whiteboard: [js:t])

Attachments

(1 file)

Shell testcase 12 years ago Boris Zbarsky [:bzbarsky] 16.38 KB, text/plain		Details

Boris Zbarsky [:bzbarsky]

Reporter

Description

•

12 years ago

On http://www.timo-ernst.net/misc/riabench-start/runlengthencode/javascript/ I see about 700ms in nightly, and about 100ms in Chrome.

Profiler shows SlowCall stub calls to js_str_charAt.  Why are we failing to inline that?

Also, it looks like we do a whole bunch of rope flattening under there....

This is the one test in http://www.tomshardware.com/reviews/windows-7-chrome-20-firefox-13-opera-12,3228-7.html where we do very poorly.

Boris Zbarsky [:bzbarsky]

Reporter

Comment 1

•

12 years ago

Attached file Shell testcase — Details

Boris Zbarsky [:bzbarsky]

Reporter

Comment 2

•

12 years ago

Profiling with Shark works better than Cleopatra here.

40% vm_fault
12% munmap
15% memcpy when flattening under charAt
9% script compilation (!)

Do we end up doing lots of partial flattening and therefore reallocating a ton?

Boris Zbarsky [:bzbarsky]

Reporter

Comment 3

•

12 years ago

We keep creating long (~60KB) strings and then calling charAt() on them.  About 5000 times.

Depends on: 615290

Boris Zbarsky [:bzbarsky]

Reporter

Updated

•

12 years ago

Depends on: 615280

Luke Wagner [:luke]

Comment 4

•

12 years ago

From irc: there seems to be a vicious loop of:

  text = ...
  for (...) {
     ... = text.charAt(...)
     text = text.substr(...) + stuff + more stuff + text.substr(...)
  }

Which is O(n^2) due to the flattening on each iteration.  There are two general string optimizations I've been wanting for a yearthat would together avoid the O(n^2) behavior:
  - bug 615290 would be to have charAt work directly on the rope.  This could even be done easily from jit code since the lengths can be used to do a simple binary search
  - bug 615280l would allow substr to produce a dependent string without flattening the rope

The benchmark shows that v8/jsc/opera are all >3x better than FF and IE; my question is whether they all do these optimizations or is there something simpler.  I'll try to dig into one of them when I get a chance.

Luke Wagner [:luke]

Comment 5

•

12 years ago

Ok, I dug in and I think I understand the slowdown.  There are several things working against us in this benchmark, #1 isn't easily actionable, but the others are.

1. v8 is actually doing roughly the same O(n^2) string work we are: they spend 33% of their insns purely in memcpy whereas we spend 45%.  The difference is that, since they have the optional ascii representation, they are copying half as many chars.

2. The 10% in compilation insns is caused by the roughly 25 GCs that occur due to the high-level of allocation.  Brian: do you think we can tweak our throw-away-all-jit-code on GC heuristics to have some minimum time below which it won't release jit code?

3. 6% of our insns are creating 78K temporary NumberObjects so that GetPropertyOperation can call toString on them.  This may be contributing to the number of GCs.  This seems like a pretty reasonable thing to fix in the VM; from the callgrind graph, it seems v8 also handles this in a C++ IC path.

Adding a dual ascii/unicode string representation might be a good idea, but it's no small task.  OTOH, the two bugs in comment 4 aren't too hard and I believe would make us the fastest in the pack (unless there is some hidden source of flattening I'm missing), so that is the quickest way I see to get faster on this.

Brian Hackett [Laid off!]

Comment 6

•

12 years ago

(In reply to Luke Wagner [:luke] from comment #5)
> 2. The 10% in compilation insns is caused by the roughly 25 GCs that occur
> due to the high-level of allocation.  Brian: do you think we can tweak our
> throw-away-all-jit-code on GC heuristics to have some minimum time below
> which it won't release jit code?

This wouldn't be hard to do and I don't think it would backfire, but it's a bandaid and I agree pursuing the comment 4 bugs is the best approach.  If these strings aren't too big then generational GC would also help.

Luke Wagner [:luke]

Updated

•

12 years ago

Depends on: 771865

Luke Wagner [:luke]

Updated

•

12 years ago

Depends on: 771866

David Mandelin [:dmandelin]

Updated

•

12 years ago

Blocks: WebJSPerf

David Mandelin [:dmandelin]

Updated

•

12 years ago

Whiteboard: [js:t]

Guilherme Lima

Comment 7

•

11 years ago

Chrome 31
Time elapsed: 109 ms

Nightly 28.0a1 (2013-12-06)
Time elapsed: 11 ms

!!!

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Quick Search

Much slower performance on run-length encoding testcase

Categories

(Core :: JavaScript Engine, defect)

Tracking

()

People

(Reporter: bzbarsky, Unassigned)

References

(Depends on 3 open bugs,
URL
)

Details

(Whiteboard: [js:t])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Comment 1

Comment 2

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Updated

Updated

Updated

Updated

Comment 7

Attachment

General

Description

File Name

Content Type