presently, to test if a js::Value holds an integer, we compare its upper 32 bits with 0xffff001, which on X86 requires a 5 byte compare instruction, and on ARM requires 3 separate instructions (effectively, load 0xffff0000 into a temp; mask in 0x1; do comparison), If we shift the range of tags to be 0xffffff80-0xffffffff, then all of the tags are representable as an 8 bit sign extended value, which can be tested on x86 in a single 2 byte instruction, and on arm in a single 32 byte instruction. so it looks like it should be a win all around.
Created attachment 548615 [details] [diff] [review] Shift the range of tags for js::Value's x86 should immediately get all of the benefits of the new encoding, whereas on ARM, I needed to touch the macro assembler so it emits single instruction cmn (compare negative) for compares against negative numbers. Looks like the answer is "Yes". preliminary results are: ============================================================================= ** TOTAL **: 1.054x as fast 17495.0ms +/- 0.8% 16605.9ms +/- 0.2% significant ============================================================================= v8: 1.054x as fast crypto: 1.040x as fast raytrace: 1.105x as fast richards: 1.008x as fast splay: *1.050x as slow* deltablue: 1.119x as fast earley-boyer: 1.045x as fast regexp: 1.018x as fast ============================================================================= ** TOTAL **: 1.114x as fast 3813.4ms +/- 0.1% 3423.6ms +/- 0.1% significant ============================================================================= 3d: 1.099x as fast cube: 1.098x as fast morph: 1.050x as fast raytrace: 1.122x as fast bitops: 1.073x as fast 3bit-bits-in-byte: 1.059x as fast nsieve-bits: 1.062x as fast bitwise-and: - bits-in-byte: 1.109x as fast math: 1.24x as fast cordic: 1.009x as fast partial-sums: 1.45x as fast spectral-norm: 1.085x as fast regexp: *1.130x as slow* dna: *1.130x as slow* string: 1.081x as fast tagcloud: 1.075x as fast unpack-code: 1.113x as fast validate-input: 1.089x as fast base64: *1.015x as slow* fasta: 1.059x as fast access: 1.146x as fast nbody: 1.26x as fast binary-trees: 1.066x as fast fannkuch: 1.149x as fast nsieve: 1.071x as fast date: 1.154x as fast format-tofte: 1.142x as fast format-xparb: 1.167x as fast controlflow: 1.114x as fast recursive: 1.114x as fast crypto: 1.109x as fast md5: 1.111x as fast aes: 1.112x as fast sha1: 1.096x as fast I am admittedly curious about base64, dna, and splay. The only reason I can think of for the slowdown is that we take slightly longer to generate cmp's in some cases.
(In reply to comment #2) > x86 should immediately get all of the benefits of the new encoding, whereas > on ARM, I needed to touch the macro assembler so it emits single instruction > cmn (compare negative) for compares against negative numbers. > In case it was not obvious, the patch includes these changes; the ARM work is done > Looks like the answer is "Yes". > preliminary results are: these results are for ARM/Linux, running on the pegatron on my desk. I have not tested x86 yet, nor android/more modern ARM chips
Comment on attachment 548615 [details] [diff] [review] Shift the range of tags for js::Value's Since I don't have commit access yet, if you could push this to the try server, then to a repo, that would be incredibly useful (or if you can get someone else to do it...)
Comment on attachment 548615 [details] [diff] [review] Shift the range of tags for js::Value's Luke, could you take a look at the tag changes?
Comment on attachment 548615 [details] [diff] [review] Shift the range of tags for js::Value's Looks good to me. I'm a bit surprised this is all it took ;-). Per your request, I pushed to try; you can watch the action unfold live: http://tbpl.mozilla.org/?tree=Try&rev=67a24913f7d2
Created attachment 549413 [details] [diff] [review] do the same thing as before, but work on x86 *cough* I only changed the complexity of the checks on 32 bit platforms, and x86 had hard coded the offset between "here is the first byte of the compare instruction" and "here is the last byte of the branch". Which the space saving patch changed. It looks like this was the only hard coded offset that was changed (the other one is to check the shape of an object), so the hard coded offset of 12 has been lowered to 9.
Watch the action unfold: http://tbpl.mozilla.org/?tree=Try&rev=4dbde781da2f
Passing comment, you'll want to remove trailing whitespace in the patch.
Comment on attachment 549413 [details] [diff] [review] do the same thing as before, but work on x86 That is very very green.