Much slower performance on run-length encoding testcase

Status: RESOLVED WORKSFORME
Type: defect
Reported: 7 years ago
Last updated: 5 years ago
Reporter: bzbarsky (Unassigned)
Depends on: 3 bugs
Tracking flags: not tracked
Whiteboard: [js:t]
Attachments: 1
On http://www.timo-ernst.net/misc/riabench-start/runlengthencode/javascript/ I see about 700ms in nightly, and about 100ms in Chrome.

Profiler shows SlowCall stub calls to js_str_charAt.  Why are we failing to inline that?

Also, it looks like we do a whole bunch of rope flattening under there....

This is the one test in http://www.tomshardware.com/reviews/windows-7-chrome-20-firefox-13-opera-12,3228-7.html where we do very poorly.

Posted file: Shell testcase
Profiling with Shark works better than Cleopatra here.

40% vm_fault
12% munmap
15% memcpy when flattening under charAt
9% script compilation (!)

Do we end up doing lots of partial flattening and therefore reallocating a ton?
We keep creating long (~60KB) strings and then calling charAt() on them.  About 5000 times.
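
The flattening cost is easy to reproduce outside the benchmark. A hypothetical shell-style reduction (sizes and iteration counts are invented here, not taken from the attached testcase):

```javascript
// Hypothetical reduction of the pattern the profile shows: build a long
// string by repeated concatenation (a rope), then call charAt() on it,
// which forces a flatten.  Sizes and iteration counts are illustrative,
// not taken from the attached testcase.
function stress(iterations, chunks) {
  var piece = new Array(1001).join("x");     // 1000 chars of 'x'
  var hits = 0;
  for (var i = 0; i < iterations; i++) {
    var s = "";
    for (var j = 0; j < chunks; j++) {
      s += piece;                            // builds up a rope lazily
    }
    if (s.charAt(s.length - 1) === "x") {    // charAt flattens the rope
      hits++;
    }
  }
  return hits;
}
```

Every trip through the outer loop builds a ~60KB rope and then flattens it for a single character read, which is where the memcpy and vm_fault time would go.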
Depends on: 615290
Depends on: 615280
From IRC: there seems to be a vicious loop of:

  text = ...
  for (...) {
     ... = text.charAt(...)
     text = text.substr(...) + stuff + more stuff + text.substr(...)
  }

Which is O(n^2) due to the flattening on each iteration.  There are two general string optimizations I've been wanting for a year that would together avoid the O(n^2) behavior:
  - bug 615290 would have charAt work directly on the rope.  This could even be done easily from jit code, since the child lengths can be used to do a simple binary search
  - bug 615280 would allow substr to produce a dependent string without flattening the rope
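
The bug 615290 idea can be sketched with a toy rope. This is an illustrative model only, not SpiderMonkey's actual JSRope layout:

```javascript
// Toy rope model -- an illustrative sketch, not SpiderMonkey's actual
// JSRope layout.  Leaves are plain strings; interior nodes cache the
// combined length.
function Rope(left, right) {
  this.left = left;                    // string or Rope
  this.right = right;                  // string or Rope
  this.length = left.length + right.length;
}

// charAt that walks the rope instead of flattening it: compare the index
// against the left child's length and descend into one side.
function ropeCharAt(node, i) {
  while (node instanceof Rope) {
    if (i < node.left.length) {
      node = node.left;
    } else {
      i -= node.left.length;
      node = node.right;
    }
  }
  return node.charAt(i);               // leaf: ordinary string charAt
}
```

Because every node caches its length, each lookup is O(depth) with no copying, versus the O(n) memcpy a flatten incurs.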

The benchmark shows that v8/jsc/opera are all >3x better than FF and IE; my question is whether they all do these optimizations or whether there is something simpler.  I'll try to dig into one of them when I get a chance.
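
For reference, here are the quadratic pattern and a linear script-side alternative side by side. The actual benchmark source isn't reproduced in this bug, so the `<count><char>` output format and the function names are assumptions:

```javascript
// The style from the IRC snippet: `text` is rebuilt every iteration, so
// each concatenation creates a rope that the next charAt()/substr()
// call must flatten -- O(n) copying per iteration, O(n^2) overall.
function encodeByRewriting(text) {
  var out = "";
  while (text.length > 0) {
    var c = text.charAt(0);
    var run = 1;
    while (run < text.length && text.charAt(run) === c) {
      run++;
    }
    out += run + c;
    text = text.substr(run);           // re-creates the string each time
  }
  return out;
}

// Linear alternative: index into the original string and collect output
// pieces in an array, joining once at the end.
function encodeByIndexing(text) {
  var parts = [];
  for (var i = 0; i < text.length; ) {
    var c = text.charAt(i);
    var run = 1;
    while (i + run < text.length && text.charAt(i + run) === c) {
      run++;
    }
    parts.push(run + c);
    i += run;
  }
  return parts.join("");
}
```

The two bugs above would make the first style cheap without any script changes; the second style avoids the problem entirely by never rebuilding the input.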
Ok, I dug in and I think I understand the slowdown.  There are several things working against us in this benchmark; #1 isn't easily actionable, but the others are.

1. v8 is actually doing roughly the same O(n^2) string work we are: they spend 33% of their insns purely in memcpy whereas we spend 45%.  The difference is that, since they have the optional ascii representation, they are copying half as many chars.

2. The 10% of insns spent in compilation is caused by the roughly 25 GCs that occur due to the high level of allocation.  Brian: do you think we can tweak our throw-away-all-jit-code-on-GC heuristic to have some minimum time below which it won't release jit code?

3. 6% of our insns are creating 78K temporary NumberObjects so that GetPropertyOperation can call toString on them.  This may be contributing to the number of GCs.  This seems like a pretty reasonable thing to fix in the VM; from the callgrind graph, it seems v8 also handles this in a C++ IC path.
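
To illustrate the boxing described in point 3: a property access on a primitive number behaves as if a temporary Number wrapper object were created to perform the lookup. This naive sketch makes the allocation explicit; real engines (including the IC path mentioned) avoid it:

```javascript
// Illustration of the boxing described above.  A property access on a
// primitive number conceptually allocates a temporary wrapper object
// just to look the property up; a naive VM path does this per access.
function naiveGetProperty(value, name) {
  var wrapper = Object(value);         // temporary NumberObject
  return wrapper[name];
}
```

Here `naiveGetProperty(255, "toString")` resolves to `Number.prototype.toString`; in the benchmark's hot loop each such wrapper would be a fresh GC allocation, which is how 78K of them pile up and feed the GC pressure in point 2.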

Adding a dual ascii/unicode string representation might be a good idea, but it's no small task.  OTOH, the two bugs in comment 4 aren't too hard and I believe would make us the fastest in the pack (unless there is some hidden source of flattening I'm missing), so that is the quickest way I see to get faster on this.
(In reply to Luke Wagner [:luke] from comment #5)
> 2. The 10% of insns spent in compilation is caused by the roughly 25 GCs
> that occur due to the high level of allocation.  Brian: do you think we
> can tweak our throw-away-all-jit-code-on-GC heuristic to have some
> minimum time below which it won't release jit code?

This wouldn't be hard to do and I don't think it would backfire, but it's a bandaid and I agree pursuing the comment 4 bugs is the best approach.  If these strings aren't too big then generational GC would also help.

Updated 7 years ago:
Depends on: 771865

Updated 7 years ago:
Depends on: 771866
Whiteboard: [js:t]

Comment 7

5 years ago
Chrome 31
Time elapsed: 109 ms

Nightly 28.0a1 (2013-12-06)
Time elapsed: 11 ms

!!!
Status: NEW → RESOLVED
Last Resolved: 5 years ago
Resolution: --- → WORKSFORME