TM: merge tamarin-redux to tracemonkey

Status

RESOLVED FIXED
Priority: P1
Severity: normal
Reported: 10 years ago
Last modified: 10 years ago

People

(Reporter: graydon, Assigned: gal)

Tracking

Version: Trunk
Bug Flags:
in-testsuite -
in-litmus -

Attachments

(1 attachment, 8 obsolete attachments)

(Reporter)

Description

10 years ago
Created attachment 341029 [details] [diff] [review]
merge tamarin-redux to mozilla-central

Here is a preliminary patch that merges the tamarin-redux branch to today's mozilla-central tip (specifically revision aa4d3083995f). The result appears to work as far as the tests I know how to run, and can function in a browser with the content JIT turned on. Any feedback on what you see here would be welcome; I'm happy to revise this extensively, as it's a little messy and reaches quite deep.
(Assignee)

Updated

10 years ago
Attachment #341029 - Flags: review?(gal)
(Assignee)

Updated

10 years ago
Attachment #341029 - Flags: review?(danderson)
(Reporter)

Comment 1

10 years ago
Created attachment 341358 [details] [diff] [review]
patch moved to tracemonkey branch and refreshed for today's tip

Fresh patch, same general content. The only mochitest failures are ones I see on m-c too; very likely my own desktop. Still bringing up x64 and ARM test VMs; no testing on them yet.
Attachment #341029 - Attachment is obsolete: true
Attachment #341358 - Flags: review?(gal)
Attachment #341029 - Flags: review?(gal)
Attachment #341029 - Flags: review?(danderson)
(Assignee)

Comment 2

10 years ago
Doesn't apply against my tip; it fails on Nativei386.cpp. Could you re-patch that? It doesn't look trivial, so I don't want to touch it.
(Reporter)

Comment 3

10 years ago
Created attachment 341364 [details] [diff] [review]
newer and fresher still

Apparently the previous patch, with its delicious extra formatting, doesn't enjoy applying to end-of-today's tip. Try again!
Attachment #341358 - Attachment is obsolete: true
Attachment #341358 - Flags: review?(gal)

Comment 4

10 years ago
Looks good from a quick rundown of the diff; AMD64 will have problems, but I can clean it up after landing. We should retest not having FASTCALL on x86 as well, with all those calling-convention changes.
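
For reference, a minimal sketch of what a FASTCALL annotation typically looks like on 32-bit x86 (a hypothetical macro and function for illustration, not the actual TraceMonkey definition):

    // Hypothetical illustration only: on 32-bit x86, fastcall passes the first
    // two integer-sized arguments in ECX and EDX rather than on the stack,
    // which is why toggling it interacts with the calling-convention changes
    // mentioned above.
    #if defined(_MSC_VER)
    #  define FASTCALL __fastcall
    #elif defined(__GNUC__) && defined(__i386__)
    #  define FASTCALL __attribute__((fastcall))
    #else
    #  define FASTCALL /* convention not applicable on this target */
    #endif

    int FASTCALL addTwo(int a, int b) { return a + b; }  // a in ECX, b in EDX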
(Assignee)

Comment 5

10 years ago
About a 60 ms total perf loss on SunSpider (SS).

With old nanojit:

t/3d-cube.js
interp: 96 94 94 94 94 
jit: 40 40 40 40 40 
jit factor: 2.35
t/3d-morph.js
interp: 83 84 83 83 83 
jit: 30 29 29 29 29 
jit factor: 2.86
t/3d-raytrace.js
interp: 96 97 96 95 96 
jit: 41 41 41 41 42 
jit factor: 2.28
t/access-binary-trees.js
interp: 39 40 40 39 40 
jit: 39 40 40 40 40 
jit factor: 1.00
t/access-fannkuch.js
interp: 127 127 126 127 127 
jit: 67 67 67 68 68 
jit factor: 1.86
t/access-nbody.js
interp: 108 108 108 108 108 
jit: 31 30 31 30 30 
jit factor: 3.60
t/access-nsieve.js
interp: 38 38 38 38 39 
jit: 13 12 12 12 12 
jit factor: 3.25
t/bitops-3bit-bits-in-byte.js
interp: 37 37 37 38 37 
jit: 2 2 2 2 2 
jit factor: 18.50
t/bitops-bits-in-byte.js
interp: 58 58 58 58 58 
jit: 9 10 10 10 9 
jit factor: 6.44
t/bitops-bitwise-and.js
interp: 54 54 54 54 54 
jit: 4 4 3 4 4 
jit factor: 13.50
t/bitops-nsieve-bits.js
interp: 68 69 68 68 69 
jit: 24 23 24 23 23 
jit factor: 3.00
t/controlflow-recursive.js
interp: 33 33 33 33 33 
jit: 32 33 33 32 33 
jit factor: 1.00
t/crypto-aes.js
interp: 56 56 55 56 56 
jit: 34 34 34 34 35 
jit factor: 1.60
t/crypto-md5.js
interp: 41 41 41 42 42 
jit: 23 24 23 24 24 
jit factor: 1.75
t/crypto-sha1.js
interp: 42 42 42 42 42 
jit: 9 9 9 9 9 
jit factor: 4.66
t/date-format-tofte.js
interp: 113 113 114 114 114 
jit: 111 110 111 111 110 
jit factor: 1.03
t/date-format-xparb.js
interp: 96 95 95 95 96 
jit: 103 103 104 103 103 
jit factor: .93
t/math-cordic.js
interp: 98 98 98 98 99 
jit: 20 21 21 20 21 
jit factor: 4.71
t/math-partial-sums.js
interp: 91 91 91 91 91 
jit: 14 13 14 14 13 
jit factor: 7.00
t/math-spectral-norm.js
interp: 51 51 51 51 51 
jit: 8 8 8 8 7 
jit factor: 7.28
t/regexp-dna.js
interp: 226 226 226 227 227 
jit: 227 227 227 227 227 
jit factor: 1.00
t/string-base64.js
interp: 41 41 41 41 41 
jit: 15 16 16 15 16 
jit factor: 2.56
t/string-fasta.js
interp: 107 106 106 106 106 
jit: 75 75 76 76 76 
jit factor: 1.39
t/string-tagcloud.js
interp: 113 112 113 112 113 
jit: 106 106 106 106 106 
jit factor: 1.06
t/string-unpack-code.js
interp: 148 147 148 147 148 
jit: 149 149 150 149 149 
jit factor: .99
t/string-validate-input.js
interp: 61 61 61 62 61 
jit: 42 42 42 43 42 
jit factor: 1.45

New nanojit:

t/3d-cube.js
interp: 95 94 94 94 94 
jit: 44 44 44 46 45 
jit factor: 2.08
t/3d-morph.js
interp: 86 86 86 86 86 
jit: 30 30 30 30 30 
jit factor: 2.86
t/3d-raytrace.js
interp: 95 94 94 94 94 
jit: 43 43 43 43 43 
jit factor: 2.18
t/access-binary-trees.js
interp: 38 38 37 38 38 
jit: 39 39 38 39 38 
jit factor: 1.00
t/access-fannkuch.js
interp: 122 123 123 123 122 
jit: 78 78 77 78 77 
jit factor: 1.58
t/access-nbody.js
interp: 101 101 101 101 102 
jit: 31 32 31 31 31 
jit factor: 3.29
t/access-nsieve.js
interp: 38 38 38 38 38 
jit: 14 15 14 14 14 
jit factor: 2.71
t/bitops-3bit-bits-in-byte.js
interp: 37 37 38 38 37 
jit: 4 4 4 5 4 
jit factor: 9.25
t/bitops-bits-in-byte.js
interp: 58 58 58 58 58 
jit: 14 13 14 14 13 
jit factor: 4.46
t/bitops-bitwise-and.js
interp: 54 54 54 54 54 
jit: 6 6 5 5 6 
jit factor: 9.00
t/bitops-nsieve-bits.js
interp: 69 69 68 68 68 
jit: 26 26 27 27 26 
jit factor: 2.61
t/controlflow-recursive.js
interp: 34 34 34 34 34 
jit: 33 33 33 34 34 
jit factor: 1.00
t/crypto-aes.js
interp: 56 56 55 56 55 
jit: 36 37 37 37 37 
jit factor: 1.48
t/crypto-md5.js
interp: 41 41 41 41 41 
jit: 26 27 26 26 26 
jit factor: 1.57
t/crypto-sha1.js
interp: 43 42 42 43 42 
jit: 10 10 10 10 10 
jit factor: 4.20
t/date-format-tofte.js
interp: 115 116 116 115 116 
jit: 114 113 113 113 114 
jit factor: 1.01
t/date-format-xparb.js
interp: 96 97 97 97 97 
jit: 104 105 104 104 104 
jit factor: .93
t/math-cordic.js
interp: 92 93 93 92 93 
jit: 23 23 24 23 23 
jit factor: 4.04
t/math-partial-sums.js
interp: 88 89 89 89 88 
jit: 14 14 15 15 15 
jit factor: 5.86
t/math-spectral-norm.js
interp: 46 45 46 46 46 
jit: 9 8 9 9 9 
jit factor: 5.11
t/regexp-dna.js
interp: 228 229 229 227 228 
jit: 228 232 228 229 229 
jit factor: .99
t/string-base64.js
interp: 51 51 51 51 51 
jit: 17 17 16 17 17 
jit factor: 3.00
t/string-fasta.js
interp: 107 107 107 107 108 
jit: 78 78 79 78 78 
jit factor: 1.38
t/string-tagcloud.js
interp: 117 113 113 114 114 
jit: 106 107 106 106 106 
jit factor: 1.07
t/string-unpack-code.js
interp: 148 148 149 149 148 
jit: 151 151 150 150 151 
jit factor: .98
t/string-validate-input.js
interp: 63 62 63 63 63 
jit: 43 43 43 43 43 
jit factor: 1.46
(Assignee)

Comment 6

10 years ago
Looks like the StackFilter isn't working properly. Try running trace.js:

--------------------------------------- end exit block SID 0
    sti sp[0] = add2
                   mov 0(esi),eax             eax(add2) ecx(state) ebx(add1) esi(sp)
                   mov edi,esi                eax(add2) ecx(state) ebx(add1) esi(sp)
    sti sp[-8] = add2
                   mov -8(edi),eax            eax(add2) ecx(state) ebx(add1) edi(sp)
                   mov esi,edi                eax(add2) ecx(state) ebx(add1) edi(sp)
    sti sp[0] = add2
                   mov 0(esi),eax             eax(add2) ecx(state) ebx(add1) esi(sp)
                   mov edi,esi                eax(add2) ecx(state) ebx(add1) esi(sp)
    sti sp[8] = 5000
                   mov 8(edi),5000            eax(add2) ecx(state) ebx(add1) edi(sp)
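
For illustration only (not nanojit's actual StackFilter): the filter's job is to drop a stack store when a later store hits the same slot before any load of it, which is exactly what the duplicated "sti sp[0] = add2" above should have triggered. A deliberately simplified, self-contained model:

    // Simplified model of dead stack-store filtering; offsets and values are
    // made up to mirror the dump above.
    #include <cstdio>
    #include <map>
    #include <vector>

    struct StackStore { int offset; const char* value; };

    // Keep only the last store to each sp-relative slot within a block that
    // has no intervening loads.
    std::vector<StackStore> filterDeadStores(const std::vector<StackStore>& block) {
        std::map<int, size_t> lastStoreAt;
        for (size_t i = 0; i < block.size(); ++i)
            lastStoreAt[block[i].offset] = i;
        std::vector<StackStore> kept;
        for (size_t i = 0; i < block.size(); ++i)
            if (lastStoreAt[block[i].offset] == i)   // earlier stores to this slot are dead
                kept.push_back(block[i]);
        return kept;
    }

    int main() {
        // sp[0] is stored twice with no load in between, as in the dump.
        std::vector<StackStore> block = {
            {0, "add2"}, {-8, "add2"}, {0, "add2"}, {8, "5000"}
        };
        for (const StackStore& s : filterDeadStores(block))
            std::printf("sti sp[%d] = %s\n", s.offset, s.value);
        return 0;
    }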
No longer blocks: 456607
(Assignee)

Comment 7

10 years ago
Created attachment 341377 [details] [diff] [review]
patch with stack filter enabled

The generated code is still slightly slower. What are the 3 parameters loaded at the top of each fragment? param1/2/3?
Attachment #341364 - Attachment is obsolete: true
(Assignee)

Comment 8

10 years ago
                   sub esp,24                
compiling trunk 0x303940 T1
    param1 = param 0 ebx
        spill param1
                   mov -4(ebp),ebx            ebx(param1)
    param2 = param 1 esi
        spill param2
                   mov -8(ebp),esi            esi(param2)
    param3 = param 2 edi
        spill param3
                   mov -12(ebp),edi           edi(param3)
    state = param 0 ecx
    sp = ld state[0]
                   mov edx,0(ecx)             ecx(state)
    ld1 = ld sp[-16]
(Assignee)

Comment 9

10 years ago
So I guess the register allocator is now used to save the callee-saved registers. Unfortunately this happens on trace, so at the end of the trace we restore them and the jump back to the start of trace (SOT) spills them again. Not good.
Assignee: general → graydon
Severity: enhancement → normal
Priority: -- → P1
(Assignee)

Comment 10

10 years ago
I guess we can try to use the new branch instructions, but that doesn't entirely work:

I create a label at the top of the tree (when fragment->root == fragment, i.e. the first fragment).
    
    if (fragment->root == fragment)
        treeInfo->sot = lir->ins0(LIR_label);

And then at each tail I try to jump to it:

    lir->insBranch(LIR_j, NULL, treeInfo->sot);
    if (fragment == fragment->root) {
        fragment->lastIns = lir->insGuard(LIR_loop, lir->insImm(1), exit);
    } else {
        fragment->lastIns = lir->insGuard(LIR_x, lir->insImm(1), exit);
    }
    compile(fragmento);

Note that we still need the lastIns stuff. We can't make the LIR_j the last ins (why?).

Also, this doesn't seem to compute the right label:

    j -> label1
        Loop j -> label1
                   jmp 0x0                   

Maybe nanojit only supports downward branches? Any ideas, graydon?
(Assignee)

Comment 11

10 years ago
Also, I had to disable the stack filter again. It seems to misbehave in the presence of jumps. We definitely need to work that out if we want to use jumps on trace.

Comment 12

10 years ago
The assembler does support branches in both directions, but it has no smarts for loop-carried registers, so when a backwards branch is encountered during codegen, we require all registers to be empty (not ideal).

I don't know what's going on with StackFilter.

This is what I'd expect to see if you want to use LIR_j instead of LIR_loop for loop back edges and move the callee-saved reg instructions out of the loop, without changing anything else:

start
param 0,1 // ebx
param 1,1 // esi
param 2,1 // edi
label
param 0,0 // ecx
param 1,0 // edx, not sure if you use this
...
[update state]
j -> label

The code in [update state] would be whatever is there now. For a non-recursive loop, sp/rp shouldn't need to move, assuming the stack is balanced, so it would just be stores of loop-variant variables, like the loop counter.

If this starts to get too hard, then it's possible to not emit the callee-saved params, and change the prolog/epilog code back to explicitly push/pop those registers.

Yet a third option is to make use of a stub for transitioning between interp and trace and have the stub handle the callee-saved regs. Then you have a simple prolog and no need to save any regs.

Comment 13

10 years ago
In Assembler.h, pending_lives should be LIST_NonGCObjects

Comment 14

10 years ago
I have your patch re-merged back into redux, with all the new x64, ARM, explicit-free code, etc. My first step was a bulk copy, so the lion's share of the merging work (done by you, thank you) is hopefully behind us.

Some comments:

- ExprFilter removed the optimization for cmov(const ? x : y) => x or y. Any idea why?

- I re-added the const folding for LIR_add/sub/mul, guarded by ifdefs. I think if we simply have these not fold in the case of overflow, we could be fine?

- I re-added the code guarded by PERFM and VTUNE. I will clean up the ifdef names so we can leave the code in place and TM can disable it easily.

- The array inside LInsHashSet needs to be zero'd upon alloc. We were calling gc->Alloc(size, GC::kZero); your patch removed the kZero. Since you aren't crashing horribly, I am assuming your gc->Alloc() maps to calloc()? Anyway, I added the flag back.

- (Maybe for danderson) why the new +5 in AMD64's underrunProtect()?

More after another review of my diffs. Since your patch hasn't landed yet, I'm anticipating a second round of (smaller!) merges after this.
(Assignee)

Comment 15

10 years ago
I think the previous cmov code didn't fold if both sides were constant. I tried to fix that.
(Reporter)

Comment 16

10 years ago
- Wrt. the ExprFilter cmov optimization: I don't see it being removed in the patch I'm looking at. This is in ExprFilter::ins2(), around line 772? I still see this code:

	if (oprnd1->isconst()) {
	    // const ? x : y => return x or y depending on const
	    return oprnd1->constval() ? oprnd2->oprnd1() : oprnd2->oprnd2();
	}

- Wrt. const folding on the add/sub/mul, I don't know what you mean by ifdef'ing them in: why conditionally compile? Does the ifdef-guarded code check for overflow? It seems to me that any correct version of constant folding there must do so. If you've added code to do so, why ifdef at all?

- Wrt. PERFM / VTUNE, yeah, I was spotty and inconsistent about which of them I deleted and which I left in. I think I initially wanted to minimize the patch landing on tracemonkey, and later wanted to minimize the patch landing on your side instead. I've no personal preference how this goes.

- Wrt. kZero, yes, our fake "gc" falls through to calloc, I checked on this. It also doesn't accept a second param at all. Perhaps for future merge simplicity we should make it do so, and ignore it. Or even support it! malloc is faster than calloc, after all.

- Wrt. +5 in underrunProtect(), I have no idea.
(Assignee)

Comment 17

10 years ago
I would really prefer it if we could properly initialize fields in nanojit instead of relying on calloc.

Comment 18

10 years ago
(In reply to comment #17)
> I would really prefer if we can properly initialize fields in nanojit instead
> of using calloc.

LInsHashSet's array needs to be zero'd, since it's a hashtable and 0 = empty. For other structures, agreed.
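
A tiny sketch of why that matters (hypothetical structure, not LInsHashSet itself): an open-addressing table whose probe loop treats a zero entry as "empty" has to start from zeroed storage, or the probes walk over garbage.

    // Hypothetical open-addressing set; the calloc'd backing store means every
    // slot starts out as 0 (= empty), so the probe loop terminates correctly.
    #include <cstdint>
    #include <cstdlib>

    struct TinyPtrSet {
        static const size_t kCap = 64;            // fixed power-of-two capacity, no growth
        const void** m_list;

        TinyPtrSet()  : m_list((const void**)calloc(kCap, sizeof(void*))) {}
        ~TinyPtrSet() { free((void*)m_list); }

        size_t findSlot(const void* p) const {
            size_t i = ((uintptr_t)p >> 3) & (kCap - 1);
            while (m_list[i] != 0 && m_list[i] != p)   // 0 == empty ends the probe
                i = (i + 1) & (kCap - 1);
            return i;
        }
        void add(const void* p)            { m_list[findSlot(p)] = p; }
        bool contains(const void* p) const { return m_list[findSlot(p)] == p; }
    };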

Comment 19

10 years ago
(In reply to comment #16)
> - Wrt. the ExprFilter cmov optimization: 

Aha, I missed the case near the top that covered it. Fixed.

> - Wrt. const folding on the add/sub/mul

The ifdef'd code doesn't check for overflow, but it should, and therefore doesn't need to be ifdef'd. Will fix.


> - Wrt. kZero, yes, our fake "gc" falls through to calloc, I checked on this.

I'll leave it as Alloc(size, kZero); maybe TM can add the flags (default=0) support for compatibility with MMgc::GC's Alloc() API?
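
A minimal sketch of that compatibility shim (hypothetical, assuming the fake "gc" from comment 16 just wraps malloc/calloc):

    // Hypothetical shim, not the actual TraceMonkey code: accept an MMgc-style
    // flags argument (default 0) so calls like gc->Alloc(size, GC::kZero)
    // compile unchanged. kZero requests zeroed memory; everything else falls
    // through to plain malloc.
    #include <cstdlib>

    struct FakeGC {
        enum AllocFlags { kZero = 1 };

        void* Alloc(size_t size, int flags = 0) {
            return (flags & kZero) ? calloc(1, size) : malloc(size);
        }
        void Free(void* p) { free(p); }
    };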

The underrunProtect()+5 wasn't needed after all, but we have more fixes coming in that area that will subsume the tweaked code.

Background: underrunProtect() needs to make sure the next instruction to be written still fits on the current page; it only needs to add padding for the size of the page header. When a new page is allocated, a jump is written from the new page to the old one, and as long as the new page is larger than the size of the jump instruction, we're good.
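
A rough, self-contained sketch of that page-fit check (the sizes, field names, and jump emission are assumptions, not nanojit's actual code; real code pages would also need to be executable):

    // Conceptual model: code is emitted downwards into fixed-size pages, so
    // before writing `bytes` more we check that we won't run into the page
    // header; if we would, switch to a fresh page and emit a jump linking the
    // new code to what was already generated on the old page.
    #include <cstdint>
    #include <cstdlib>

    const size_t kPageSize   = 4096;
    const size_t kHeaderSize = 16;     // room reserved for per-page bookkeeping

    struct CodeWriter {
        uint8_t* pageStart;            // bottom of the current code page
        uint8_t* nIns;                 // next instruction slot; moves downward

        uint8_t* newPage() { return (uint8_t*)malloc(kPageSize); }
        void emitJump(uint8_t* target) { (void)target; nIns -= 5; }  // pretend a rel32 jmp is 5 bytes

        void underrunProtect(size_t bytes) {
            if ((size_t)(nIns - pageStart) < kHeaderSize + bytes) {
                uint8_t* oldCode = nIns;           // code already emitted on the old page
                pageStart = newPage();
                nIns = pageStart + kPageSize;      // start writing at the top of the new page
                emitJump(oldCode);                 // new page jumps to the old code
            }
        }
    };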
(Assignee)

Updated

10 years ago
Summary: merge tamarin-redux to tracemonkey → TM: merge tamarin-redux to tracemonkey
(Reporter)

Comment 20

10 years ago
Created attachment 342178 [details] [diff] [review]
use LIR_j to a label after the params, rather than LIR_loop

This patch replaces the LIR_loop with LIR_j and passes trace tests. Not sure about full browser, or how to measure remaining perf perturbation. Will look further tomorrow. 

(The trick was that you have to jump to *after* the second set of param opcodes, not between the callee-save params and caller-passed params. If you jump between the two sets, you get a failure to reload spilled members of the second set that don't live through the whole trace: namely the state variable, in many traces. I am not entirely sure why this is so...)

Comment 21

10 years ago
LIR_loop has a SideExit parameter for VM-specific stuff, but LIR_j doesn't. Gal hinted this could be a problem; is it? Because if not, LIR_loop can probably go away entirely.
(Reporter)

Comment 22

10 years ago
Created attachment 342363 [details] [diff] [review]
updated merge patch

Updated patch. Includes:

  - refresh to tracemonkey rev 4dd36c3e0cdb, after bug 458735
  - LIR_j jumping to label after params
  - Dummy LIR_x after LIR_j to make StackFilter behave
  - StackFilter enabled in both places, ignoring LIR_j and LIR_label
  - Integrated fix-in-progress to bug 458431
  - Integrated edwsmith's proposed fix to make RegAlloc::usepri and active both
    be LastReg+1 in length

Appears to work on linux-x86. Sampling 50 runs of the benchmark script gives a very slight performance regression. Haven't isolated that yet. Anyone else want to give it a spin, confirm/deny? Have not checked other platforms yet.

without patch:  count 50, sum 83632, [1532,1812], avg 1672 +/- 62 (3.742223%)
   with patch:  count 50, sum 85219, [1587,1825], avg 1704 +/- 58 (3.455970%)
Attachment #341377 - Attachment is obsolete: true
Attachment #342178 - Attachment is obsolete: true
(Assignee)

Comment 23

10 years ago
I definitely see the perf regression too: 1285 -> 1312.
(Assignee)

Comment 24

10 years ago
The perf regression is mild. I would not be opposed to landing this now and then analyzing the cause; otherwise the patch will go stale again.
(Assignee)

Comment 25

10 years ago
Brendan points out that we might want to hold off on landing until we merge with m-c, since we don't want to regress m-c's perf.

Comment 26

10 years ago
We should figure out the cause of the perf regression, and fix it or make a compensating perf-win fix elsewhere.

No going backward.

/be
(Reporter)

Comment 27

10 years ago
I concur; I didn't mean to suggest otherwise. We just need to work out what's causing it. On to the details, then: breaking the benchmark down into individual tests, it appears to be a general cost increase (the same increase on all tests), not a pathological case. So I compared pre- and post-patch variants of the smallest test, bitops-bitwise-and.js. The results appear, to my untrained eye, to point a finger at the new register allocator: it keeps fewer live registers at several points, and as a result winds up with 4 more inter-register shuffling instructions inside the loop. Do I misread? Here's a screenshot that, I think, makes it pretty clear:

http://venge.net/graydon/bitwise-and.png

Left is the current TM tip; right is with the redux patch. Any insights or possible solutions, Edwin? I'm also curious about all the NOPs at the entry, but that is probably less important.
(Reporter)

Comment 28

10 years ago
Further update: I spent the afternoon mucking with oprofile to try to confirm that these few movs are the source of the grief, and found that at least bitwise-and was losing a little more to codegen issues (page-growth strategy in particular) than this. But fixing that only fixed 3 or 4 of the benchmarks. I focused on the next one (3bits) and found that, if I artificially cranked up the iteration count of the innermost loops, I could clearly find the generated code -- samples occurring *on* this code page -- running about 2-3% slower. And on disassembly, it's the same pattern: extra inter-register shuffles, less register occupancy, ~10% more code emitted. I can try to confirm that it's not a loss to branch mispredicts or cache misses or such, but the most obvious culprit still looks like these not-so-useful insns:

http://venge.net/graydon/3bit-bits-in-byte.png

So er, I actually don't know how to fix this. Register allocators are sort of deep combinatorial magic. Any suggestions?

Comment 29

10 years ago
The NOPs at entry are emitted in genPrologue to align the method address on a 16-byte boundary. It's very dumb; Intel does publish multi-byte NOP instructions that would be smarter, and the whole alignment thing is potentially not a win anyway. I would be fine with an option switch somewhere to disable it.
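
A minimal sketch of that padding computation (assumed 16-byte alignment; a hypothetical helper, not genPrologue itself):

    // Number of single-byte NOPs (0x90) needed so that `entry` lands on a
    // 16-byte boundary; multi-byte NOP encodings would cover the same padding
    // with fewer instructions.
    #include <cstddef>
    #include <cstdint>

    size_t nopPaddingFor(uintptr_t entry, size_t alignment = 16) {
        return (alignment - (entry % alignment)) % alignment;
    }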

Looking at register alloc now.
(Reporter)

Comment 30

10 years ago
Created attachment 342646 [details] [diff] [review]
update to patch

Thanks to a quick turnaround on the register-allocation bug from Edwin, this patch (freshened to today's tip) is now down to about a 1% (on some tests 2%) regression from baseline. I'm trying to further isolate the remainder, but it's getting trickier, since the generated code actually reads better with the patch than without; there are a couple fewer memory references within the loop. It just runs slightly slower. Odd. I'm trying a variety of hardware performance counters to see if anything shows up; so far, no dice. Feel free to test perf on your own systems and see if it's just something odd about my setup.
Attachment #342363 - Attachment is obsolete: true
(Assignee)

Updated

10 years ago
Depends on: 459537
(Assignee)

Comment 31

10 years ago
Created attachment 342754 [details] [diff] [review]
updated patch against tip
Attachment #342646 - Attachment is obsolete: true
(Assignee)

Comment 32

10 years ago
After fixing 459537 we now crash with NJ2 on trace-tests in the decay-loop testcase. However, we don't crash if the test is run individually. This looks like a reproducible memory corruption bug:


Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_INVALID_ADDRESS at address: 0x00401f0f
0x0013bd6a in nanojit::Assembler::onPage (this=0x803600, where=0x263000 "\017\037@", exitPages=false) at nanojit/Assembler.cpp:383
383				page = page->next;
(gdb) bt
#0  0x0013bd6a in nanojit::Assembler::onPage (this=0x803600, where=0x263000 "\017\037@", exitPages=false) at nanojit/Assembler.cpp:383
#1  0x0013bdbe in nanojit::Assembler::pageValidate (this=0x803600) at nanojit/Assembler.cpp:393
#2  0x0014bb84 in nanojit::Assembler::gen (this=0x803600, reader=0xbfffcd3c, loopJumps=@0xbfffcdbc) at nanojit/Assembler.cpp:1730
#3  0x0014bd69 in nanojit::Assembler::assemble (this=0x803600, frag=0x38f640, loopJumps=@0xbfffcdbc) at nanojit/Assembler.cpp:862
#4  0x0015b643 in nanojit::compile (assm=0x803600, triggerFrag=0x38f640) at nanojit/LIR.cpp:2061
#5  0x001222a5 in TraceRecorder::compile (this=0x3902e0, fragmento=0x3005e0) at jstracer.cpp:1939
#6  0x001225c1 in TraceRecorder::closeLoop (this=0x3902e0, fragmento=0x3005e0) at jstracer.cpp:1976
#7  0x0012428b in js_CloseLoop (cx=0x3010f0) at jstracer.cpp:2514
#8  0x00137c43 in js_RecordLoopEdge (cx=0x3010f0, r=0x3902e0, inlineCallCount=@0xbfffdb48) at jstracer.cpp:2530
#9  0x001382f5 in js_MonitorLoopEdge (cx=0x3010f0, inlineCallCount=@0xbfffdb48) at jstracer.cpp:2839
#10 0x00068a1b in js_Interpret (cx=0x3010f0) at jsinterp.cpp:3696
#11 0x000991b0 in js_Execute (cx=0x3010f0, chain=0x257000, script=0x81e000, down=0x0, flags=0, result=0x0) at jsinterp.cpp:1550
#12 0x00018804 in JS_ExecuteScript (cx=0x3010f0, obj=0x257000, script=0x81e000, rval=0x0) at jsapi.cpp:4982
#13 0x0000236e in Process (cx=0x3010f0, obj=0x257000, filename=0xbffffa0c "trace-test.js", forceTTY=0) at js.cpp:277
#14 0x00007bee in ProcessArgs (cx=0x3010f0, obj=0x257000, argv=0xbffff910, argc=2) at js.cpp:575
#15 0x00008d64 in main (argc=2, argv=0xbffff910, envp=0xbffff91c) at js.cpp:3989
(Assignee)

Comment 33

10 years ago
We run all of SS in debug mode, but in opt mode crypto-sha1 fails. Without sha1, NJ2 is now within 10 ms of NJ1's time (however, it still consistently seems a tad slower).
(Assignee)

Comment 34

10 years ago
Created attachment 342759 [details] [diff] [review]
working patch, still minimally slower (10ms-ish)
Assignee: graydon → gal
Attachment #342754 - Attachment is obsolete: true
Status: NEW → ASSIGNED
(Assignee)

Comment 35

10 years ago
I was not invoking underrunProtect in the new alignment code that generates wide NOPs. Also, we no longer try to align loop labels; it doesn't seem to help.
(Assignee)

Updated

10 years ago
Attachment #342759 - Flags: review?(danderson)
Attachment #342759 - Flags: review?(danderson) → review+
(Assignee)

Comment 36

10 years ago
I think this is ready to go in, but maybe we should do it after the merge tonight.
(Assignee)

Comment 37

10 years ago
Now a slight speedup (-2ms).

http://hg.mozilla.org/tracemonkey/rev/53072c29a4fe
(Assignee)

Updated

10 years ago
Status: ASSIGNED → RESOLVED
Last Resolved: 10 years ago
Resolution: --- → FIXED

Updated

10 years ago
Flags: in-testsuite-
Flags: in-litmus-