TM: merge tamarin-redux to tracemonkey

Status

RESOLVED FIXED
Priority: P1
Severity: normal
Reported: 10 years ago
Last modified: 10 years ago

People

(Reporter: graydon, Assigned: gal)

Tracking

Version: Trunk
Bug Flags:
in-testsuite -
in-litmus -

Attachments

(1 attachment, 8 obsolete attachments)

(Reporter)

Description

10 years ago
Created attachment 341029 [details] [diff] [review]
merge tamarin-redux to mozilla-central

Here is a preliminary patch that merges the tamarin-redux branch to today's mozilla-central tip (specifically revision aa4d3083995f). The result appears to work as far as the tests I know how to run, and can function in a browser with the content JIT turned on. Any feedback on what you see here would be welcome; I'm happy to revise this extensively, as it's a little messy and reaches quite deep.
(Assignee)

Updated

10 years ago
Attachment #341029 - Flags: review?(gal)
(Assignee)

Updated

10 years ago
Attachment #341029 - Flags: review?(danderson)
(Reporter)

Comment 1

10 years ago
Created attachment 341358 [details] [diff] [review]
patch moved to tracemonkey branch and refreshed for today's tip

Fresh patch, same general content. The only mochitest failures are ones I see on m-c too; very likely my own desktop. Still bringing up x64 and ARM test VMs; no testing on them yet.
Attachment #341029 - Attachment is obsolete: true
Attachment #341358 - Flags: review?(gal)
Attachment #341029 - Flags: review?(gal)
Attachment #341029 - Flags: review?(danderson)
(Assignee)

Comment 2

10 years ago
Doesn't apply against my tip; it fails on Nativei386.cpp. Could you re-patch that? It doesn't look trivial, so I don't want to touch it.
(Reporter)

Comment 3

10 years ago
Created attachment 341364 [details] [diff] [review]
newer and fresher still

Apparently the previous patch, with its delicious extra formatting, doesn't enjoy applying to end-of-today's tip. Try again!
Attachment #341358 - Attachment is obsolete: true
Attachment #341358 - Flags: review?(gal)

Comment 4

10 years ago
Looks good from a quick rundown of the diff; AMD64 will have problems, but I can clean it up after landing. We should retest not having FASTCALL on x86 as well, with all those calling-convention changes.
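
For reference, a minimal sketch of what a FASTCALL annotation typically looks like on 32-bit x86 (a hypothetical macro and function for illustration, not the actual TraceMonkey definition):

    // Hypothetical illustration only: on 32-bit x86, fastcall passes the first
    // two integer-sized arguments in ECX and EDX rather than on the stack,
    // which is why toggling it interacts with the calling-convention changes
    // mentioned above.
    #if defined(_MSC_VER)
    #  define FASTCALL __fastcall
    #elif defined(__GNUC__) && defined(__i386__)
    #  define FASTCALL __attribute__((fastcall))
    #else
    #  define FASTCALL /* convention not applicable on this target */
    #endif

    int FASTCALL addTwo(int a, int b) { return a + b; }  // a in ECX, b in EDX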
(Assignee)

Comment 5

10 years ago
About a 60 ms total perf loss on SunSpider (SS).

With old nanojit:

t/3d-cube.js
interp: 96 94 94 94 94 
jit: 40 40 40 40 40 
jit factor: 2.35
t/3d-morph.js
interp: 83 84 83 83 83 
jit: 30 29 29 29 29 
jit factor: 2.86
t/3d-raytrace.js
interp: 96 97 96 95 96 
jit: 41 41 41 41 42 
jit factor: 2.28
t/access-binary-trees.js
interp: 39 40 40 39 40 
jit: 39 40 40 40 40 
jit factor: 1.00
t/access-fannkuch.js
interp: 127 127 126 127 127 
jit: 67 67 67 68 68 
jit factor: 1.86
t/access-nbody.js
interp: 108 108 108 108 108 
jit: 31 30 31 30 30 
jit factor: 3.60
t/access-nsieve.js
interp: 38 38 38 38 39 
jit: 13 12 12 12 12 
jit factor: 3.25
t/bitops-3bit-bits-in-byte.js
interp: 37 37 37 38 37 
jit: 2 2 2 2 2 
jit factor: 18.50
t/bitops-bits-in-byte.js
interp: 58 58 58 58 58 
jit: 9 10 10 10 9 
jit factor: 6.44
t/bitops-bitwise-and.js
interp: 54 54 54 54 54 
jit: 4 4 3 4 4 
jit factor: 13.50
t/bitops-nsieve-bits.js
interp: 68 69 68 68 69 
jit: 24 23 24 23 23 
jit factor: 3.00
t/controlflow-recursive.js
interp: 33 33 33 33 33 
jit: 32 33 33 32 33 
jit factor: 1.00
t/crypto-aes.js
interp: 56 56 55 56 56 
jit: 34 34 34 34 35 
jit factor: 1.60
t/crypto-md5.js
interp: 41 41 41 42 42 
jit: 23 24 23 24 24 
jit factor: 1.75
t/crypto-sha1.js
interp: 42 42 42 42 42 
jit: 9 9 9 9 9 
jit factor: 4.66
t/date-format-tofte.js
interp: 113 113 114 114 114 
jit: 111 110 111 111 110 
jit factor: 1.03
t/date-format-xparb.js
interp: 96 95 95 95 96 
jit: 103 103 104 103 103 
jit factor: .93
t/math-cordic.js
interp: 98 98 98 98 99 
jit: 20 21 21 20 21 
jit factor: 4.71
t/math-partial-sums.js
interp: 91 91 91 91 91 
jit: 14 13 14 14 13 
jit factor: 7.00
t/math-spectral-norm.js
interp: 51 51 51 51 51 
jit: 8 8 8 8 7 
jit factor: 7.28
t/regexp-dna.js
interp: 226 226 226 227 227 
jit: 227 227 227 227 227 
jit factor: 1.00
t/string-base64.js
interp: 41 41 41 41 41 
jit: 15 16 16 15 16 
jit factor: 2.56
t/string-fasta.js
interp: 107 106 106 106 106 
jit: 75 75 76 76 76 
jit factor: 1.39
t/string-tagcloud.js
interp: 113 112 113 112 113 
jit: 106 106 106 106 106 
jit factor: 1.06
t/string-unpack-code.js
interp: 148 147 148 147 148 
jit: 149 149 150 149 149 
jit factor: .99
t/string-validate-input.js
interp: 61 61 61 62 61 
jit: 42 42 42 43 42 
jit factor: 1.45

New nanojit:

t/3d-cube.js
interp: 95 94 94 94 94 
jit: 44 44 44 46 45 
jit factor: 2.08
t/3d-morph.js
interp: 86 86 86 86 86 
jit: 30 30 30 30 30 
jit factor: 2.86
t/3d-raytrace.js
interp: 95 94 94 94 94 
jit: 43 43 43 43 43 
jit factor: 2.18
t/access-binary-trees.js
interp: 38 38 37 38 38 
jit: 39 39 38 39 38 
jit factor: 1.00
t/access-fannkuch.js
interp: 122 123 123 123 122 
jit: 78 78 77 78 77 
jit factor: 1.58
t/access-nbody.js
interp: 101 101 101 101 102 
jit: 31 32 31 31 31 
jit factor: 3.29
t/access-nsieve.js
interp: 38 38 38 38 38 
jit: 14 15 14 14 14 
jit factor: 2.71
t/bitops-3bit-bits-in-byte.js
interp: 37 37 38 38 37 
jit: 4 4 4 5 4 
jit factor: 9.25
t/bitops-bits-in-byte.js
interp: 58 58 58 58 58 
jit: 14 13 14 14 13 
jit factor: 4.46
t/bitops-bitwise-and.js
interp: 54 54 54 54 54 
jit: 6 6 5 5 6 
jit factor: 9.00
t/bitops-nsieve-bits.js
interp: 69 69 68 68 68 
jit: 26 26 27 27 26 
jit factor: 2.61
t/controlflow-recursive.js
interp: 34 34 34 34 34 
jit: 33 33 33 34 34 
jit factor: 1.00
t/crypto-aes.js
interp: 56 56 55 56 55 
jit: 36 37 37 37 37 
jit factor: 1.48
t/crypto-md5.js
interp: 41 41 41 41 41 
jit: 26 27 26 26 26 
jit factor: 1.57
t/crypto-sha1.js
interp: 43 42 42 43 42 
jit: 10 10 10 10 10 
jit factor: 4.20
t/date-format-tofte.js
interp: 115 116 116 115 116 
jit: 114 113 113 113 114 
jit factor: 1.01
t/date-format-xparb.js
interp: 96 97 97 97 97 
jit: 104 105 104 104 104 
jit factor: .93
t/math-cordic.js
interp: 92 93 93 92 93 
jit: 23 23 24 23 23 
jit factor: 4.04
t/math-partial-sums.js
interp: 88 89 89 89 88 
jit: 14 14 15 15 15 
jit factor: 5.86
t/math-spectral-norm.js
interp: 46 45 46 46 46 
jit: 9 8 9 9 9 
jit factor: 5.11
t/regexp-dna.js
interp: 228 229 229 227 228 
jit: 228 232 228 229 229 
jit factor: .99
t/string-base64.js
interp: 51 51 51 51 51 
jit: 17 17 16 17 17 
jit factor: 3.00
t/string-fasta.js
interp: 107 107 107 107 108 
jit: 78 78 79 78 78 
jit factor: 1.38
t/string-tagcloud.js
interp: 117 113 113 114 114 
jit: 106 107 106 106 106 
jit factor: 1.07
t/string-unpack-code.js
interp: 148 148 149 149 148 
jit: 151 151 150 150 151 
jit factor: .98
t/string-validate-input.js
interp: 63 62 63 63 63 
jit: 43 43 43 43 43 
jit factor: 1.46
(Assignee)

Comment 6

10 years ago
Looks like the StackFilter isn't working properly. Try running trace.js:

--------------------------------------- end exit block SID 0
    sti sp[0] = add2
                   mov 0(esi),eax             eax(add2) ecx(state) ebx(add1) esi(sp)
                   mov edi,esi                eax(add2) ecx(state) ebx(add1) esi(sp)
    sti sp[-8] = add2
                   mov -8(edi),eax            eax(add2) ecx(state) ebx(add1) edi(sp)
                   mov esi,edi                eax(add2) ecx(state) ebx(add1) edi(sp)
    sti sp[0] = add2
                   mov 0(esi),eax             eax(add2) ecx(state) ebx(add1) esi(sp)
                   mov edi,esi                eax(add2) ecx(state) ebx(add1) esi(sp)
    sti sp[8] = 5000
                   mov 8(edi),5000            eax(add2) ecx(state) ebx(add1) edi(sp)
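
For illustration only (not nanojit's actual StackFilter): the filter's job is to drop a stack store when a later store hits the same slot before any load of it, which is exactly what the duplicated "sti sp[0] = add2" above should have triggered. A deliberately simplified, self-contained model:

    // Simplified model of dead stack-store filtering; offsets and values are
    // made up to mirror the dump above.
    #include <cstdio>
    #include <map>
    #include <vector>

    struct StackStore { int offset; const char* value; };

    // Keep only the last store to each sp-relative slot within a block that
    // has no intervening loads.
    std::vector<StackStore> filterDeadStores(const std::vector<StackStore>& block) {
        std::map<int, size_t> lastStoreAt;
        for (size_t i = 0; i < block.size(); ++i)
            lastStoreAt[block[i].offset] = i;
        std::vector<StackStore> kept;
        for (size_t i = 0; i < block.size(); ++i)
            if (lastStoreAt[block[i].offset] == i)   // earlier stores to this slot are dead
                kept.push_back(block[i]);
        return kept;
    }

    int main() {
        // sp[0] is stored twice with no load in between, as in the dump.
        std::vector<StackStore> block = {
            {0, "add2"}, {-8, "add2"}, {0, "add2"}, {8, "5000"}
        };
        for (const StackStore& s : filterDeadStores(block))
            std::printf("sti sp[%d] = %s\n", s.offset, s.value);
        return 0;
    }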
No longer blocks: 456607
(Assignee)

Comment 7

10 years ago
Created attachment 341377 [details] [diff] [review]
patch with stack filter enabled

The generated code is still slightly slower. What are the 3 parameters loaded at the top of each fragment? param1/2/3?
Attachment #341364 - Attachment is obsolete: true
(Assignee)

Comment 8

10 years ago
                   sub esp,24                
compiling trunk 0x303940 T1
    param1 = param 0 ebx
        spill param1
                   mov -4(ebp),ebx            ebx(param1)
    param2 = param 1 esi
        spill param2
                   mov -8(ebp),esi            esi(param2)
    param3 = param 2 edi
        spill param3
                   mov -12(ebp),edi           edi(param3)
    state = param 0 ecx
    sp = ld state[0]
                   mov edx,0(ecx)             ecx(state)
    ld1 = ld sp[-16]
(Assignee)

Comment 9

10 years ago
So I guess the register allocator is now used to save the callee-saved registers. Unfortunately this happens on trace, so at the end of the trace we restore them and the jump back to the start of trace (SOT) spills them again. Not good.
Assignee: general → graydon
Severity: enhancement → normal
Priority: -- → P1
(Assignee)

Comment 10

10 years ago
I guess we can try to use the new branch instructions, but that doesn't entirely work:

I create a label at the top of the tree (when fragment->root == fragment, i.e. the first fragment).
    
    if (fragment->root == fragment)
        treeInfo->sot = lir->ins0(LIR_label);

And then at each tail I try to jump to it:

    lir->insBranch(LIR_j, NULL, treeInfo->sot);
    if (fragment == fragment->root) {
        fragment->lastIns = lir->insGuard(LIR_loop, lir->insImm(1), exit);
    } else {
        fragment->lastIns = lir->insGuard(LIR_x, lir->insImm(1), exit);
    }
    compile(fragmento);

Note that we still need the lastIns stuff. We can't make the LIR_j the last ins (why?).

Also, this doesn't seem to compute the right label:

    j -> label1
        Loop j -> label1
                   jmp 0x0                   

Maybe nanojit only supports downward branches? Any ideas, graydon?
(Assignee)

Comment 11

10 years ago
Also, I had to disable the stack filter again. It seems to misbehave in the presence of jumps. We definitely need to work that out if we want to use jumps on trace.

Comment 12

10 years ago
The assembler does support branches in both directions, but it has no smarts for loop-carried registers, so when a backwards branch is encountered during codegen, we require all registers to be empty (not ideal).

I don't know what's going on with StackFilter.

This is what I'd expect to see if you want to use LIR_j instead of LIR_loop for loop back edges and move the callee-saved reg instructions out of the loop, without changing anything else:

start
param 0,1 // ebx
param 1,1 // esi
param 2,1 // edi
label
param 0,0 // ecx
param 1,0 // edx, not sure if you use this
...
[update state]
j -> label

The code in [update state] would be whatever is there now. For a non-recursive loop, sp/rp shouldn't need to move, assuming the stack is balanced, so it would just be stores of loop-variant variables, like the loop counter.

If this starts to get too hard, then it's possible to not emit the callee-saved params, and change the prolog/epilog code back to explicitly push/pop those registers.

Yet a third option is to make use of a stub for transitioning between interp and trace and have the stub handle the callee-saved regs. Then you have a simple prolog and no need to save any regs.

Comment 13

10 years ago
In Assembler.h, pending_lives should be LIST_NonGCObjects

Comment 14

10 years ago
I have your patch re-merged back into redux, with all the new x64, ARM, explicit-free code, etc. My first step was a bulk copy, so the lion's share of the merging work (done by you, thank you) is hopefully behind us.

Some comments:

- ExprFilter removed the optimization for cmov(const ? x : y) => x or y. Any idea why?

- I re-added the const folding for LIR_add/sub/mul, guarded by ifdefs. I think if we simply have these not fold in the case of overflow, we could be fine?

- I re-added the code guarded by PERFM and VTUNE. I will clean up the ifdef names so we can leave the code in place and TM can disable it easily.

- The array inside LInsHashSet needs to be zero'd upon alloc. We were calling gc->Alloc(size, GC::kZero); your patch removed the kZero. Since you aren't crashing horribly, I am assuming your gc->Alloc() maps to calloc()? Anyway, I added the flag back.

- (Maybe for danderson) why the new +5 in AMD64's underrunProtect()?

More after another review of my diffs. Since your patch hasn't landed yet, I'm anticipating a second round of (smaller!) merges after this.
(Assignee)

Comment 15

10 years ago
I think the previous cmov code didn't fold if both sides were constant. I tried to fix that.
(Reporter)

Comment 16

10 years ago
- Wrt. the ExprFilter cmov optimization: I don't see it being removed in the patch I'm looking at. This is in ExprFilter::ins2(), around line 772? I still see this code:

	if (oprnd1->isconst()) {
	    // const ? x : y => return x or y depending on const
	    return oprnd1->constval() ? oprnd2->oprnd1() : oprnd2->oprnd2();
	}

- Wrt. const folding on the add/sub/mul, I don't know what you mean by ifdef'ing them in: why conditionally compile? Does the ifdef-guarded code check for overflow? It seems to me that any correct version of constant folding there must do so. If you've added code to do so, why ifdef at all?

- Wrt. PERFM / VTUNE, yeah, I was spotty and inconsistent about which of them I deleted and which I left in. I think I initially wanted to minimize the patch landing on tracemonkey, and later wanted to minimize the patch landing on your side instead. I've no personal preference how this goes.

- Wrt. kZero, yes, our fake "gc" falls through to calloc, I checked on this. It also doesn't accept a second param at all. Perhaps for future merge simplicity we should make it do so, and ignore it. Or even support it! malloc is faster than calloc, after all.

- Wrt. +5 in underrunProtect(), I have no idea.
(Assignee)

Comment 17

10 years ago
I would really prefer it if we could properly initialize fields in nanojit instead of relying on calloc.

Comment 18

10 years ago
(In reply to comment #17)
> I would really prefer if we can properly initialize fields in nanojit instead
> of using calloc.

LInsHashSet's array needs to be zero'd, since it's a hashtable and 0 = empty. For other structures, agreed.
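
A tiny sketch of why that matters (hypothetical structure, not LInsHashSet itself): an open-addressing table whose probe loop treats a zero entry as "empty" has to start from zeroed storage, or the probes walk over garbage.

    // Hypothetical open-addressing set; the calloc'd backing store means every
    // slot starts out as 0 (= empty), so the probe loop terminates correctly.
    #include <cstdint>
    #include <cstdlib>

    struct TinyPtrSet {
        static const size_t kCap = 64;            // fixed power-of-two capacity, no growth
        const void** m_list;

        TinyPtrSet()  : m_list((const void**)calloc(kCap, sizeof(void*))) {}
        ~TinyPtrSet() { free((void*)m_list); }

        size_t findSlot(const void* p) const {
            size_t i = ((uintptr_t)p >> 3) & (kCap - 1);
            while (m_list[i] != 0 && m_list[i] != p)   // 0 == empty ends the probe
                i = (i + 1) & (kCap - 1);
            return i;
        }
        void add(const void* p)            { m_list[findSlot(p)] = p; }
        bool contains(const void* p) const { return m_list[findSlot(p)] == p; }
    };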

Comment 19

10 years ago
(In reply to comment #16)
> - Wrt. the ExprFilter cmov optimization: 

Aha, I missed the case near the top that covered it. Fixed.

> - Wrt. const folding on the add/sub/mul

The ifdef'd code doesn't check for overflow, but it should, and therefore doesn't need to be ifdef'd. Will fix.


> - Wrt. kZero, yes, our fake "gc" falls through to calloc, I checked on this.

I'll leave it as Alloc(size, kZero); maybe TM can add the flags (default=0) support for compatibility with MMgc::GC's Alloc() API?
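
A minimal sketch of that compatibility shim (hypothetical, assuming the fake "gc" from comment 16 just wraps malloc/calloc):

    // Hypothetical shim, not the actual TraceMonkey code: accept an MMgc-style
    // flags argument (default 0) so calls like gc->Alloc(size, GC::kZero)
    // compile unchanged. kZero requests zeroed memory; everything else falls
    // through to plain malloc.
    #include <cstdlib>

    struct FakeGC {
        enum AllocFlags { kZero = 1 };

        void* Alloc(size_t size, int flags = 0) {
            return (flags & kZero) ? calloc(1, size) : malloc(size);
        }
        void Free(void* p) { free(p); }
    };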

The underrunProtect()+5 wasn't needed after all, but we have more fixes coming in that area that will subsume the tweaked code.

Background: underrunProtect() needs to make sure the next instruction to be written still fits on the current page; it only needs to add padding for the size of the page header. When a new page is allocated, a jump is written from the new page to the old one, and as long as the new page is larger than the size of the jump instruction, we're good.
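
A rough, self-contained sketch of that page-fit check (the sizes, field names, and jump emission are assumptions, not nanojit's actual code; real code pages would also need to be executable):

    // Conceptual model: code is emitted downwards into fixed-size pages, so
    // before writing `bytes` more we check that we won't run into the page
    // header; if we would, switch to a fresh page and emit a jump linking the
    // new code to what was already generated on the old page.
    #include <cstdint>
    #include <cstdlib>

    const size_t kPageSize   = 4096;
    const size_t kHeaderSize = 16;     // room reserved for per-page bookkeeping

    struct CodeWriter {
        uint8_t* pageStart;            // bottom of the current code page
        uint8_t* nIns;                 // next instruction slot; moves downward

        uint8_t* newPage() { return (uint8_t*)malloc(kPageSize); }
        void emitJump(uint8_t* target) { (void)target; nIns -= 5; }  // pretend a rel32 jmp is 5 bytes

        void underrunProtect(size_t bytes) {
            if ((size_t)(nIns - pageStart) < kHeaderSize + bytes) {
                uint8_t* oldCode = nIns;           // code already emitted on the old page
                pageStart = newPage();
                nIns = pageStart + kPageSize;      // start writing at the top of the new page
                emitJump(oldCode);                 // new page jumps to the old code
            }
        }
    };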
(Assignee)

Updated

10 years ago
Summary: merge tamarin-redux to tracemonkey → TM: merge tamarin-redux to tracemonkey
(Reporter)

Comment 20

10 years ago
Created attachment 342178 [details] [diff] [review]
use LIR_j to a label after the params, rather than LIR_loop

This patch replaces the LIR_loop with LIR_j and passes trace tests. Not sure about full browser, or how to measure remaining perf perturbation. Will look further tomorrow. 

(The trick was that you have to jump to *after* the second set of param opcodes, not between the callee-save params and caller-passed params. If you jump between the two sets, you get a failure to reload spilled members of the second set that don't live through the whole trace: namely the state variable, in many traces. I am not entirely sure why this is so...)

Comment 21

10 years ago
LIR_loop has a SideExit parameter for VM-specific stuff, but LIR_j doesn't. Gal hinted this could be a problem; is it? Because if not, LIR_loop can probably go away entirely.
(Reporter)

Comment 22

10 years ago
Created attachment 342363 [details] [diff] [review]
updated merge patch

Updated patch. Includes:

  - refresh to tracemonkey rev 4dd36c3e0cdb, after bug 458735
  - LIR_j jumping to label after params
  - Dummy LIR_x after LIR_j to make StackFilter behave
  - StackFilter enabled in both places, ignoring LIR_j and LIR_label
  - Integrated fix-in-progress to bug 458431
  - Integrated edwsmith's proposed fix to make RegAlloc::usepri and active both
    be LastReg+1 in length

Appears to work on linux-x86. Sampling 50 runs of the benchmark script gives a very slight performance regression. Haven't isolated that yet. Anyone else want to give it a spin, confirm/deny? Have not checked other platforms yet.

without patch:  count 50, sum 83632, [1532,1812], avg 1672 +/- 62 (3.742223%)
   with patch:  count 50, sum 85219, [1587,1825], avg 1704 +/- 58 (3.455970%)
Attachment #341377 - Attachment is obsolete: true
Attachment #342178 - Attachment is obsolete: true
(Assignee)

Comment 23

10 years ago
I definitely see the perf regression too: 1285 -> 1312.
(Assignee)

Comment 24

10 years ago
The perf regression is mild. I would not be opposed to landing this now and then analyzing the cause; otherwise the patch will go stale again.
(Assignee)

Comment 25

10 years ago
Brendan points out that we might want to hold off on landing until we merge with m-c, since we don't want to regress m-c's perf.

Comment 26

10 years ago
We should figure out the cause of the perf regression, and fix it or make a compensating perf-win fix elsewhere.

No going backward.

/be
(Reporter)

Comment 27

10 years ago
I concur; I didn't mean to suggest otherwise. We just need to work out what's causing it. On to the details, then: breaking the benchmark down into individual tests, it appears to be a general cost increase (the same increase on all tests), not a pathological case. So I compared pre- and post-patch variants of the smallest test, bitops-bitwise-and.js. The results appear, to my untrained eye, to point a finger at the new register allocator: it keeps fewer live registers at several points, and as a result winds up with 4 more inter-register shuffling instructions inside the loop. Do I misread? Here's a screenshot that, I think, makes it pretty clear:

http://venge.net/graydon/bitwise-and.png

Left is the current TM tip; right is with the redux patch. Any insights or possible solutions, Edwin? I'm also curious about all the NOPs at the entry, but that is probably less important.
(Reporter)

Comment 28

10 years ago
Further update: I spent the afternoon mucking with oprofile to try to confirm that these few movs are the source of the grief, and found that at least bitwise-and was losing a little more to codegen issues (page-growth strategy in particular) than this. But fixing that only fixed 3 or 4 of the benchmarks. I focused on the next one (3bits) and found that, if I artificially cranked up the iteration count of the innermost loops, I could clearly find the generated code -- samples occurring *on* this code page -- running about 2-3% slower. And on disassembly, it's the same pattern: extra inter-register shuffles, less register occupancy, ~10% more code emitted. I can try to confirm that it's not a loss to branch mispredicts or cache misses or such, but the most obvious culprit still looks like these not-so-useful insns:

http://venge.net/graydon/3bit-bits-in-byte.png

So er, I actually don't know how to fix this. Register allocators are sort of deep combinatorial magic. Any suggestions?

Comment 29

10 years ago
The NOPs at entry are emitted in genPrologue to align the method address on a 16-byte boundary. It's very dumb; Intel does publish multi-byte NOP instructions that would be smarter, and the whole alignment thing is potentially not a win anyway. I would be fine with an option switch somewhere to disable it.
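
A minimal sketch of that padding computation (assumed 16-byte alignment; a hypothetical helper, not genPrologue itself):

    // Number of single-byte NOPs (0x90) needed so that `entry` lands on a
    // 16-byte boundary; multi-byte NOP encodings would cover the same padding
    // with fewer instructions.
    #include <cstddef>
    #include <cstdint>

    size_t nopPaddingFor(uintptr_t entry, size_t alignment = 16) {
        return (alignment - (entry % alignment)) % alignment;
    }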

Looking at register alloc now.
(Reporter)

Comment 30

10 years ago
Created attachment 342646 [details] [diff] [review]
update to patch

Thanks to a quick turnaround on the register-allocation bug from Edwin, this patch (freshened to today's tip) is now down to about a 1% (on some tests 2%) regression from baseline. I'm trying to further isolate the remainder, but it's getting trickier, since the generated code actually reads better with the patch than without; there are a couple fewer memory references within the loop. It just runs slightly slower. Odd. I'm trying a variety of hardware performance counters to see if anything shows up; so far, no dice. Feel free to test perf on your own systems and see if it's just something odd about my setup.
Attachment #342363 - Attachment is obsolete: true
(Assignee)

Updated

10 years ago
Depends on: 459537
(Assignee)

Comment 31

10 years ago
Created attachment 342754 [details] [diff] [review]
updated patch against tip
Attachment #342646 - Attachment is obsolete: true
(Assignee)

Comment 32

10 years ago
After fixing 459537 we now crash with NJ2 on trace-tests in the decay-loop testcase. However, we don't crash if the test is run individually. This looks like a reproducible memory corruption bug:


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00401f0f
0x0013bd6a in nanojit::Assembler::onPage (this=0x803600, where=0x263000 "\017\037@", exitPages=false) at nanojit/Assembler.cpp:383
383				page = page->next;
(gdb) bt
#0  0x0013bd6a in nanojit::Assembler::onPage (this=0x803600, where=0x263000 "\017\037@", exitPages=false) at nanojit/Assembler.cpp:383
#1  0x0013bdbe in nanojit::Assembler::pageValidate (this=0x803600) at nanojit/Assembler.cpp:393
#2  0x0014bb84 in nanojit::Assembler::gen (this=0x803600, reader=0xbfffcd3c, loopJumps=@0xbfffcdbc) at nanojit/Assembler.cpp:1730
#3  0x0014bd69 in nanojit::Assembler::assemble (this=0x803600, frag=0x38f640, loopJumps=@0xbfffcdbc) at nanojit/Assembler.cpp:862
#4  0x0015b643 in nanojit::compile (assm=0x803600, triggerFrag=0x38f640) at nanojit/LIR.cpp:2061
#5  0x001222a5 in TraceRecorder::compile (this=0x3902e0, fragmento=0x3005e0) at jstracer.cpp:1939
#6  0x001225c1 in TraceRecorder::closeLoop (this=0x3902e0, fragmento=0x3005e0) at jstracer.cpp:1976
#7  0x0012428b in js_CloseLoop (cx=0x3010f0) at jstracer.cpp:2514
#8  0x00137c43 in js_RecordLoopEdge (cx=0x3010f0, r=0x3902e0, inlineCallCount=@0xbfffdb48) at jstracer.cpp:2530
#9  0x001382f5 in js_MonitorLoopEdge (cx=0x3010f0, inlineCallCount=@0xbfffdb48) at jstracer.cpp:2839
#10 0x00068a1b in js_Interpret (cx=0x3010f0) at jsinterp.cpp:3696
#11 0x000991b0 in js_Execute (cx=0x3010f0, chain=0x257000, script=0x81e000, down=0x0, flags=0, result=0x0) at jsinterp.cpp:1550
#12 0x00018804 in JS_ExecuteScript (cx=0x3010f0, obj=0x257000, script=0x81e000, rval=0x0) at jsapi.cpp:4982
#13 0x0000236e in Process (cx=0x3010f0, obj=0x257000, filename=0xbffffa0c "trace-test.js", forceTTY=0) at js.cpp:277
#14 0x00007bee in ProcessArgs (cx=0x3010f0, obj=0x257000, argv=0xbffff910, argc=2) at js.cpp:575
#15 0x00008d64 in main (argc=2, argv=0xbffff910, envp=0xbffff91c) at js.cpp:3989
(Assignee)

Comment 33

10 years ago
We run all of SS in debug mode, but in opt mode crypto-sha1 fails. Without sha1, NJ2 is now within 10 ms of NJ1's time (however, it still consistently seems a tad slower).
(Assignee)

Comment 34

10 years ago
Created attachment 342759 [details] [diff] [review]
working patch, still minimally slower (10ms-ish)
Assignee: graydon → gal
Attachment #342754 - Attachment is obsolete: true
Status: NEW → ASSIGNED
(Assignee)

Comment 35

10 years ago
I was not invoking underrunProtect in the new alignment code that generates wide NOPs. Also, we no longer try to align loop labels; it doesn't seem to help.
(Assignee)

Updated

10 years ago
Attachment #342759 - Flags: review?(danderson)
Attachment #342759 - Flags: review?(danderson) → review+
(Assignee)

Comment 36

10 years ago
I think this is ready to go in, but maybe we should do it after the merge tonight.
(Assignee)

Comment 37

10 years ago
Now a slight speedup (-2ms).

http://hg.mozilla.org/tracemonkey/rev/53072c29a4fe
(Assignee)

Updated

10 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Flags: in-testsuite-
Flags: in-litmus-