Open Bug 1326193 Opened 8 years ago Updated 2 years ago

Inline code to compare strings

Categories

(Core :: JavaScript Engine: JIT, defect, P3)

defect

Tracking

()

People

(Reporter: h4writer, Unassigned, Mentored)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(3 files)

We don't have machine code to compare two strings. We currently test the quick cases in our JIT and go to a C++ call to do the other cases. This adds some serious overhead for those cases and we can do the testing in assembly directly which would increase our performance on string compares: > function test(a, b) { > return a == b; > } > > var start = new Date(); > var b = true; > for (var i=0; i<100000000; i++) { > var a = "fdsqfdsq"+(i%2) > b = test(a, "fdsqfdsq0"); > } > > print(b); > print(new Date() - start); Takes 3700ms locally. With unintelligent code to inline the code in assembly I get 3000ms. But there is still a lot of possible optimizations here. Testing multiple bytes simultaneous. Iterating from left to right. Using architecture specific instructions. Supporting 16bit characters ... I'm leaving this open for anybody wanting to take a dive into a small assembly snippet and optimizing it for speed.
Attached patch Example patchSplinter Review
Blocks: sm-js-perf
Mentor: hv1989
Priority: -- → P3
Keywords: perf
I will take this if you don't mind. Feel free to unassign me for whatever reason :)
Assignee: nobody → mau
I know that Mauricio is already assigned on this bug, but since there is no updates for months I've tried something based on the Hannes patch. The idea is to check if it remains less than 4 characters to compare. In that case, each character is tested individually. If there are more than 4 characters to compare, we take a "fast path" which tests 4 characters in the same instruction (it also avoids a load). In those both cases, strings are compared from left to right. There is an obvious mistake in the patch since the VM segfault when the jit code is called on a string which is less than 4 characters (Hey, IonMonkey internals learner here ;-)). For instance, a loop which tests strings with 2 or 3 characters will segfault around the 10th iteration (after the jit code is generated). I'm not sure if the mistake is in the code I wrote or if it's more conceptual. Anyways, I'm curious to see if the general idea leads to a real improvement... Some ideas? Cheers!
I ended up not having enough time to work on this. Sorry for hogging this bug! Changing assignee per comment 3.
Assignee: mau → colin
Comment on attachment 8885138 [details] [diff] [review] String comaraison optimization based on Hannes patch Review of attachment 8885138 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/MacroAssembler.cpp @@ +1380,5 @@ > + Register temp = regs.takeAny(); // Points to the chars of the left string. > + Register temp2 = regs.takeAny(); // Points to the chars of the right string. > + Register temp3 = regs.takeAny(); // Specific left character to compare and > + // reused as comparison destination. > + Register temp4 = regs.takeAny(); // Specific right character to compare. This code requests at least 7 registers which is some luxury that we do not have on x86, with the profiler enabled. @@ +1385,5 @@ > + > + push(temp); > + push(temp2); > + push(temp3); > + push(temp4); Use the capitalized version of this function, same for pop. @@ +1395,5 @@ > + > + // Loop from left to right and compare characters > + bind(&testEqual); > + branch32(Assembler::Equal, result, Imm32(0), &equal); > + branch32(Assembler::GreaterThanOrEqual, result, Imm32(4), &fastPath); This is basically re-implementing memcmpn, I suggest you create a dedicated memcmpn function as part of the MacroAssembler, with multiple loop entries to avoid unused conditionals/code to make shorter loops (which are better optimized within the CPUs). And comment the code with the following code: while (result > 4) { if (*(uint32_t*)lhs != *(uint32_t*)rhs) { result = false; goto end; } lhs += 4; rhs += 4; } while (result > 0) { if (*(uint8_t*)lhs != *(uint8_t*)rhs) { result = false; goto end; } lhs += 1; rhs += 1; }
Hi Nicolas, thanks for the review! > ::: js/src/jit/MacroAssembler.cpp > @@ +1380,5 @@ > > + Register temp = regs.takeAny(); // Points to the chars of the left string. > > + Register temp2 = regs.takeAny(); // Points to the chars of the right string. > > + Register temp3 = regs.takeAny(); // Specific left character to compare and > > + // reused as comparison destination. > > + Register temp4 = regs.takeAny(); // Specific right character to compare. > > This code requests at least 7 registers which is some luxury that we do not > have on x86, with the profiler enabled. Right. By the way, what happens if more registers than the actual number of general registers provided by the processor are used? > @@ +1395,5 @@ > > + > > + // Loop from left to right and compare characters > > + bind(&testEqual); > > + branch32(Assembler::Equal, result, Imm32(0), &equal); > > + branch32(Assembler::GreaterThanOrEqual, result, Imm32(4), &fastPath); > > This is basically re-implementing memcmpn, I suggest you create a dedicated > memcmpn function as part of the MacroAssembler, with multiple loop entries > to avoid unused conditionals/code to make shorter loops I'm not sure to understand what do you mean by "multiple loop entries"? Here is a new version of the patch, with a dedicated memcmpn function. I tried to make shorter loops. For the "slow path", I don't know how to make it without two registers... Would it be possible to avoid them by using a bitmask?
FWIW, I think we should be careful with this: especially for longer strings it will likely be faster to do the compare in C++ due to optimized memcmp implementations (or auto vectorization) using SIMD instructions...

The bug assignee didn't login in Bugzilla in the last 7 months.
:sdetar, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: colin → nobody
Flags: needinfo?(sdetar)
Flags: needinfo?(sdetar)
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: