Closed Bug 628350 Opened 13 years ago Closed 6 years ago

Mops performance improvement on x86

Tracking

(Not tracked)

Status:

RESOLVED WONTFIX

People

(Reporter: sold2, Assigned: sold2)

Attachments

(1 file, 7 obsolete files)

Initial implementation of segmented mops, 13 years ago, sold2, 854.98 KB, patch
Initial implementation of segmented mops, 13 years ago, sold2, 885.61 KB, patch
Initial implementation of segmented mops - diff, 13 years ago, sold2, 211.65 KB, patch
thread_local.h small fix, 13 years ago, sold2, 344 bytes, patch
Initial implementation, 13 years ago, sold2, 218.46 KB, patch
Initial implementation, 13 years ago, sold2, 218.49 KB, patch
Initial implementation, 13 years ago, sold2, 219.04 KB, patch
Segmented mops sketch, 13 years ago, sold2, 258.16 KB, patch (sold2: feedback?)

sold2

Assignee

Description

13 years ago

User-Agent:       Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 6.0; Trident/4.0; GTB6.6; Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1) ; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; InfoPath.2; .NET CLR 3.5.21022; .NET CLR 3.5.30729; .NET CLR 3.0.30729; .NET4.0C; .NET4.0E)
Build Identifier: tamarin-redux-c72c9b1c20a9

As of the current implementation, mops (domainMemory load/store instructions, sXX/lXX) carry a significant overhead: the array's base address and size are fetched from memory, then the offset is bounds-checked. The JIT makes only a shallow attempt to eliminate redundancies, so these costs are more or less per-mop. On x86 it's possible to eliminate these steps entirely, reducing each mop to a single MOV instruction while still maintaining based addressing and bounds checking, using some segmentation tricks.
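For readers unfamiliar with the current code path, here is a rough C++ model of the per-mop work described above. It is only a sketch: the real JIT emits LIR, not C++, and the `DomainMemory` struct and the `si32_checked`/`li32_checked` helpers are hypothetical names, not Tamarin's.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Hypothetical stand-in for the state a JIT'd mop consults. In the VM the
// base and size live in memory and the generated code reloads them.
struct DomainMemory {
    uint8_t* base;
    uint32_t size;
};

// One si32 today: load base, load size, compare-and-branch, then store.
void si32_checked(const DomainMemory* dm, uint32_t offset, int32_t value) {
    uint8_t* base = dm->base;           // load #1: array base
    uint32_t size = dm->size;           // load #2: array size
    if (size < 4 || offset > size - 4)  // all 4 bytes must be in range
        throw std::range_error("RangeError: domainMemory index out of range");
    std::memcpy(base + offset, &value, sizeof value);  // the actual store
}

// The matching li32 pays for the same two loads and the same check.
int32_t li32_checked(const DomainMemory* dm, uint32_t offset) {
    uint8_t* base = dm->base;
    uint32_t size = dm->size;
    if (size < 4 || offset > size - 4)
        throw std::range_error("RangeError: domainMemory index out of range");
    int32_t value;
    std::memcpy(&value, base + offset, sizeof value);
    return value;
}
```

Every access pays for two loads plus a compare-and-branch before touching memory; the segmented approach aims to fold all of that into the single MOV.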

Reproducible: Always

sold2

Assignee

Comment 1

13 years ago

There needs to be a way to distinguish between a mop load/store and any other form of load/store. Would adding a new access set do? (e.g. ACCSET_DOMAIN; mops currently use ACCSET_OTHER.) As far as I can tell there shouldn't be any aliasing issues here (if anything, it can only improve aliasing analysis).
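A minimal sketch of why a dedicated access set helps, assuming access sets are bitmasks and two accesses may alias only when their sets overlap. `ACCSET_DOMAIN` is the name proposed in this comment; the bit values and the `mayAlias`/`isMop` helpers are illustrative stand-ins, not nanojit's actual definitions.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative access-set bits: each load/store is tagged with the
// region(s) of memory it may touch. Values are made up for this sketch.
using AccSet = uint32_t;
constexpr AccSet ACCSET_VARS   = 1u << 0;  // e.g. local variable slots
constexpr AccSet ACCSET_OTHER  = 1u << 1;  // catch-all (where mops live today)
constexpr AccSet ACCSET_DOMAIN = 1u << 2;  // proposed: domainMemory mops only

// Two accesses may alias only if their access sets overlap.
constexpr bool mayAlias(AccSet a, AccSet b) { return (a & b) != 0; }

// A backend can recognize a mop purely from its access set.
constexpr bool isMop(AccSet a) { return a == ACCSET_DOMAIN; }

// Tagged ACCSET_OTHER, a mop conflicts with every other catch-all access...
static_assert(mayAlias(ACCSET_OTHER, ACCSET_OTHER), "mop vs. unrelated store");
// ...tagged ACCSET_DOMAIN, it conflicts only with other mops.
static_assert(!mayAlias(ACCSET_DOMAIN, ACCSET_OTHER), "no false dependency");
static_assert(mayAlias(ACCSET_DOMAIN, ACCSET_DOMAIN), "mops stay ordered");
```

This is the sense in which a separate set can only help aliasing analysis: it removes false dependencies without hiding any real ones.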

Rick Reitmaier

Comment 2

13 years ago

(In reply to comment #1)
> As far as I can tell there shouldn't be any
> aliasing issues here (it can only improve aliasing analysis, if anything)

True, this is a good first step and should provide the isolation you need. You may want to open a new bug for each independent piece of work (e.g. access set, segmentation) and then use this bug as a 'tracker' (i.e. the 'depends on:' field of this bug is filled with the other bug numbers).

Re: the segmentation tricks, I'm not sure what you have in mind, but just a gentle reminder that one of Tamarin's uses is as an embedded component of a plug-in that relies on the hosting application for resources.

Steven Johnson

Comment 3

13 years ago

IMHO we should consider changing the behavior for future SWF versions: instead of range-checking, constrain the memory size to a power of two and trim the offset using a bitwise AND.
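A minimal sketch of what this proposal would look like, assuming single-byte accesses only and a size of 2^log2Size bytes with log2Size below 32. `MaskedMemory` is a hypothetical name; this models the idea, not any existing Tamarin code.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Sketch of the proposal: domain memory sized to a power of two, with the
// offset trimmed by a bitwise AND instead of range-checked.
// Assumes log2Size < 32.
class MaskedMemory {
public:
    explicit MaskedMemory(uint32_t log2Size)
        : mask_((uint32_t(1) << log2Size) - 1),
          bytes_(std::size_t(1) << log2Size, 0) {}

    // No compare, no branch: AND the offset, then access.
    void si8(uint32_t offset, uint8_t v) { bytes_[offset & mask_] = v; }
    uint8_t li8(uint32_t offset) const { return bytes_[offset & mask_]; }

private:
    uint32_t mask_;
    std::vector<uint8_t> bytes_;
};

// Usage with 16 bytes (log2Size = 4):
//   MaskedMemory m(4);
//   m.si8(19, 9);  // 19 & 15 == 3: quietly writes byte 3, no RangeError
```

The usage comment shows the trade-off debated in the following comments: the branch disappears, but an out-of-range offset silently wraps instead of raising an error.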

Edwin Smith

Comment 4

13 years ago

(In reply to comment #0)

> On x86 it's possible to eliminate these steps entirely, reducing mops
> to a single MOV instruction, while still maintaining based addressing and
> bounds checking, using some segmentation tricks.

Could you elaborate on what the segmentation tricks are?

sold2

Assignee

Comment 5

13 years ago

@Rick:
> True, this is a good first step and should provide the isolation you need.

Any reason not to use this for non-x86 builds too, then?

@Rick & Edwin:
"segmentation tricks" was a bad choice of words, obviously :-P
It's actually pretty basic. Each segment (descriptor) has a base address and a limit. Memory accesses through the segment are zero-based and automatically adjusted to the correct address, and any out-of-bounds access raises a segmentation fault. Those are exactly the ingredients of the current implementation.
Basically, a segment is set up that spans the domainMemory array, and any access violations to this segment are caught and redirected to the RangeError thunk.
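The semantics can be modelled in portable C++ to make them concrete. This is not how the patch works (the real thing relies on the CPU's segment limit check and an OS-level fault handler); `Segment`, `SegmentFault` and `li32OrRangeError` are hypothetical names, and the limit is assumed to be inclusive with byte granularity, as for an expand-up data segment.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <stdexcept>

// Software model of a data-segment descriptor: the CPU adds `base` to every
// offset and faults if any accessed byte lies beyond `limit`.
struct SegmentFault : std::runtime_error {
    SegmentFault() : std::runtime_error("segment limit violation") {}
};

struct Segment {
    uint8_t* base;   // address of domainMemory[0]
    uint32_t limit;  // highest valid byte offset (inclusive)

    uint8_t* translate(uint32_t offset, uint32_t width) const {
        // Overflow-free: offset <= limit is established before subtracting.
        if (width == 0 || offset > limit || width - 1 > limit - offset)
            throw SegmentFault();
        return base + offset;
    }
};

// In the patch a mop becomes one segment-relative MOV; here that is
// translate() followed by a plain copy.
int32_t li32(const Segment& seg, uint32_t offset) {
    int32_t v;
    std::memcpy(&v, seg.translate(offset, 4), sizeof v);
    return v;
}

// Stand-in for the fault handler: map the fault to the VM's RangeError path.
int32_t li32OrRangeError(const Segment& seg, uint32_t offset, bool& rangeError) {
    rangeError = false;
    try {
        return li32(seg, offset);
    } catch (const SegmentFault&) {
        rangeError = true;  // real code would jump to the RangeError thunk
        return 0;
    }
}
```

The point of the hardware version is that the check in `translate()` costs nothing in the generated code: the CPU performs it as part of the MOV, and only the rare out-of-bounds case pays for the fault.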

sold2

Assignee

Comment 6

13 years ago

(In reply to comment #3)
> IMHO we should consider changing the behavior for future SWF versions: instead
> of range-checking, constrain the memory to a power-of-two and trim the offset
> using a bitwise and.

This would only spare you the possible branching, that's all. And since you get nasty "edge cases" too (reading 2 bytes, one byte from the end), it would probably be even more expensive.
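To make the edge case concrete: with masking, the first byte of a 2-byte read at the last offset is in range, but the second byte is not. One mitigation, the safety buffer suggested in the follow-up comment, is to over-allocate by (widest access - 1) bytes. A sketch, assuming the widest mop is 8 bytes (a 64-bit float); `PaddedMaskedMemory` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Power-of-two memory plus a safety buffer. Masking keeps the *first* byte
// of an access in range; padding by (kMaxWidth - 1) bytes keeps the *last*
// byte inside the allocation too, so no bounds check is needed.
class PaddedMaskedMemory {
public:
    static constexpr uint32_t kMaxWidth = 8;  // assumed widest mop: 64-bit float

    explicit PaddedMaskedMemory(uint32_t log2Size)
        : mask_((uint32_t(1) << log2Size) - 1),
          bytes_((std::size_t(1) << log2Size) + kMaxWidth - 1, 0) {}

    void si16(uint32_t offset, uint16_t v) {
        std::memcpy(&bytes_[offset & mask_], &v, sizeof v);
    }
    // At offset size-1 the second byte comes from the padding, so the
    // result is not "wrapped" data; it only has to be memory-safe.
    uint16_t li16(uint32_t offset) const {
        uint16_t v;
        std::memcpy(&v, &bytes_[offset & mask_], sizeof v);
        return v;
    }
    std::size_t allocatedBytes() const { return bytes_.size(); }

private:
    uint32_t mask_;
    std::vector<uint8_t> bytes_;
};
```

With 16 usable bytes the allocation is 23 bytes, and a 2-byte store at offset 15 touches byte 15 plus one padding byte instead of running off the end.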

sold2

Assignee

Comment 7

13 years ago

(In reply to comment #6)
> And since you get nasty "edge cases" too
> (reading 2 bytes, one byte from the end), it would
> probably be even more expensive.

Rereading my comment, I figure this is probably irrelevant, since the result of such an access doesn't need to be well-defined anyway. You could just augment the array with a safety buffer. Still, this only gets rid of the branching, and maybe some of the arithmetic; we still need to get the array's base & size from memory. For how long can they be reused? Basic blocks, assuming no register pressure? I mean, right now, even this trivial code:

loop:
	pushbyte 0;
	dup;
	si32;
jump loop;

translates into a read of base & size plus a bounds check on every iteration.

Not to mention that simply trimming the offset would turn what is now a visible bug in user code into a quiet bug.

Btw, I'm going to take Rick's advice and create new bugs for sub-issues. Hope this doesn't cause too much noise.

Steven Johnson

Comment 8

13 years ago

(In reply to comment #6)
> This would only spare you the possible branching, that's all. 

But that would still be a win on architectures without segmentation tricks (i.e., pretty much everything except x86).

sold2

Assignee

Comment 9

13 years ago

(In reply to comment #8)
> (In reply to comment #6)
> > This would only spare you the possible branching, that's all. 
> But that would still be a win on architectures without segmentation tricks
> (i.e., pretty much everything except x86).

Oh yeah, forgot about those :)
Anyway, just keep in mind that this still comes at a price: you open the door to hard-to-find bugs in user code (an out-of-bounds pointer quietly wraps around and corrupts some data structure, etc.).

sold2

Assignee

Updated

13 years ago

Depends on: 628776

sold2

Assignee

Comment 10

13 years ago

Where does VMThread belong? Is it an integral part of AVM?
It's in the 'vmbase' directory right next to things like 'atom', but in the VS project tree it's under 'shell'.
I need a mutex. Can I use it in 'avmplus'?

Edwin Smith

Comment 11

13 years ago

Things in vmbase are meant as infrastructure usable by MMgc and avmplus, so yeah, fair game. They aren't shell-only things.

sold2

Assignee

Comment 12

13 years ago

Is there a way to retrieve the current DomainEnv from within CodegenLIR at JIT time? In central, Domain was accessible through pool->domain, but in redux most of what used to be in Domain has been moved to DomainEnv, which seems to be retrieved at run time.

Edwin Smith

Comment 13

13 years ago

The same piece of JIT'd code will be used for varying DomainEnvs, hence the compiled code should load the address to use at run time. This was a bug in central, and it's been a while since that branch was updated.
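A simplified sketch of that shape, in which the domainMemory base and size are read through the environment on every call rather than captured when the method is compiled. `MethodEnv` and `DomainEnv` here are stripped-down stand-ins that only borrow their names from the thread; their fields, `DomainMemoryRef` and `compiledLi32` are invented for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Stripped-down stand-ins; only the names echo the thread.
struct DomainMemoryRef { uint8_t* base; uint32_t size; };
struct DomainEnv { DomainMemoryRef mem; };   // one per domain
struct MethodEnv { DomainEnv* domainEnv; };

// Shape of the generated code: nothing about domainMemory is baked in at
// JIT time. Base and size are loaded through the env on every call, so the
// same compiled body works for any DomainEnv.
int32_t compiledLi32(const MethodEnv* env, uint32_t offset, bool& rangeError) {
    const DomainMemoryRef& m = env->domainEnv->mem;  // run-time load
    rangeError = m.size < 4 || offset > m.size - 4;
    if (rangeError) return 0;
    int32_t v;
    std::memcpy(&v, m.base + offset, sizeof v);
    return v;
}
```

Calling `compiledLi32` with two environments that point at different buffers returns each buffer's own data, which is exactly what a constant baked in at JIT time could not do.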

sold2

Assignee

Comment 14

13 years ago

Ok, I got some numbers, and they're not very exciting. I'm able to cut up to about 15% of the execution time, on and off (the initial tests on central gave far more dramatic results).
Anyway, the tests I'm running are very simplistic and might not be a good indication. What do you use to test mops performance? There doesn't seem to be anything relevant in test/performance.
Also, it might be worth testing this on other machines. The code generated for non-segmented mops isn't that different between central and redux (but the performance is, at least in my case), which suggests the numbers might be processor-sensitive (e.g. branch-prediction depth, etc.). If anyone's willing to give this a try on their machine, let me know.

sold2

Assignee

Comment 15

13 years ago

I ran a few more tests, only slightly more challenging, and I'm getting over a 40% reduction, so the earlier ones were definitely not very indicative.
Anyway, where do I take it from here? I'm waiting on some sort of feedback.

Steven Johnson

Comment 16

13 years ago

Step one would probably be to attach a patch to this bug and request feedback from someone (e.g. Edwin, me, or anyone else commenting so far).

Rick Reitmaier

Comment 17

13 years ago

If you haven't already come across these pages, more details can be found here https://developer.mozilla.org/en/Tamarin and here https://developer.mozilla.org/en/Hacking_Mozilla#Life_Cycle_of_a_Patch

sold2

Assignee

Comment 18

13 years ago

I will upload what I have so far, but it's by no means complete. There are still some loose ends, and I'm going to need some design clarifications to make progress (the sub-issue I posted a while ago, some of which is irrelevant by now, got no response. I don't want to be a nag, but I only have limited time to invest in this).
If anyone else wants to take over the code, that's fine too.

Steven Johnson

Comment 19

13 years ago

(In reply to comment #18)
> I will upload what I got so far, but it's by no means complete. There are still
> some loose ends, and I'm going to need some design clarifications to make
> progress

That's ok, sketches are fine.

> (the sub-issue I posted a while ago, some of it irrelevant by now, got
> no response. Don't want to be a nagger, but I only got limited time to invest
> in this).

What was the sub-issue? I don't seem to see it.

sold2

Assignee

Comment 20

13 years ago

> What was the sub-issue? I don't seem to see it.

It's the 'depends on' bug (628776). Sorry if I was too quiet about it; I thought changes to that field triggered notifications.

sold2

Assignee

Comment 21

13 years ago

Attached patch Initial implementation of segmented mops (obsolete)

sold2

Assignee

Comment 22

13 years ago

If you wonder why the patch is so big, it's because it includes an assembler used to build the performance test. It's not like there's tons of code :)

sold2

Assignee

Updated

13 years ago

Attachment #509563 - Attachment is obsolete: true

sold2

Assignee

Comment 23

13 years ago

Attached patch Initial implementation of segmented mops (obsolete)

Synced code with revision bfbc6d260f40, and added an alternative backend for segmentation_interface.

sold2

Assignee

Comment 24

13 years ago

Attached patch Initial implementation of segmented mops - diff (obsolete)

sold2

Assignee

Comment 25

13 years ago

I tried to run the mops acceptance tests on linux with the original code, straight from the repository, and it segfaults. Is this a known issue?

Rick Reitmaier

Comment 26

13 years ago

(In reply to comment #25)
> linux segfaults. Is this a known issue?

News to me; can you send me the details on platform and testcase.

sold2

Assignee

Comment 27

13 years ago

(In reply to comment #26)
> News to me; can you send me the details on platform and testcase.

I'm getting this on Ubuntu 10.04, inside a VM: a simple release build of bfbc6d260f40, running test/acceptance/mops/mops.abc_ without any special flags.

Stack trace:

#0  avmplus::ExceptionFrame::beginCatch (this=0xb7e4e020) at core/Exception.cpp:188
#1  0x080fd77e in avmplus::interpBoxed (env=0xb7ec0140, _argc=0, _atomv=0xbfffecb8) at core/Interpreter.cpp:3233
#2  0x080f6d9c in avmplus::BaseExecMgr::invokeInterpNoCoerce (env=0xb7ec0140, argc=0, args=0xbfffecb8) at core/exec.cpp:812
#3  avmplus::BaseExecMgr::initInvokeInterpNoCoerce (env=0xb7ec0140, argc=0, args=0xbfffecb8) at core/exec.cpp:191
#4  0x080b0e14 in avmplus::MethodEnv::coerceEnter (this=0xb7e03020, code=..., start=0, toplevel=0xb7e6c040, ninit=0x0, codeContext=0xb7e060b8, apiVersion=avmplus::kApiVersion_SWF_13) at core/MethodEnv-inlines.h:137
#5  avmplus::AvmCore::callScriptEnvEntryPoint (this=0xb7e03020, code=..., start=0, toplevel=0xb7e6c040, ninit=0x0, codeContext=0xb7e060b8, apiVersion=avmplus::kApiVersion_SWF_13) at core/AvmCore.cpp:720
#6  avmplus::AvmCore::handleActionPool (this=0xb7e03020, code=..., start=0, toplevel=0xb7e6c040, ninit=0x0, codeContext=0xb7e060b8, apiVersion=avmplus::kApiVersion_SWF_13) at core/AvmCore.cpp:800
#7  avmplus::AvmCore::handleActionBlock (this=0xb7e03020, code=..., start=0, toplevel=0xb7e6c040, ninit=0x0, codeContext=0xb7e060b8, apiVersion=avmplus::kApiVersion_SWF_13) at core/AvmCore.cpp:847
#8  0x080702af in avmshell::ShellCore::handleArbitraryExecutableContent (this=0xb7e03020, settings=..., code=..., filename=0xbffff52a "test/acceptance/mops/mops.abc_") at shell/ShellCore.cpp:542
#9  0x080705f5 in avmshell::ShellCore::evaluateFile (this=0xb7e03020, settings=..., filename=0xbffff52a "test/acceptance/mops/mops.abc_") at shell/ShellCore.cpp:519
#10 0x0806cf69 in avmshell::Shell::singleWorkerHelper (shell=0xb7e03020, settings=...) at shell/avmshell.cpp:219
#11 0x0806d498 in avmshell::Shell::singleWorker (settings=...) at shell/avmshell.cpp:178
#12 0x0806dc9a in avmshell::Shell::run (argc=2, argv=0xbffff354) at shell/avmshell.cpp:146
#13 0x0807dc55 in main (argc=2, argv=0xbffff354) at shell/avmshellUnix.cpp:112

Windows / Mac are doing fine.

sold2

Assignee

Comment 28

13 years ago

Attached patch thread_local.h small fix (obsolete)

There was a small mistake in thread_local.h; this fixes it. It should be applied on top of the main patch (or the change can just be made manually).

sold2

Assignee

Updated

13 years ago

Attachment #510733 - Attachment is patch: true

sold2

Assignee

Comment 29

13 years ago

Attached patch Initial implementation (obsolete)

Fixed formatting (tabs to spaces, CRLFs), added a new segfault_handler, and turned on the fast-mops features in avmshell-features.h.

Attachment #510624 - Attachment is obsolete: true

Attachment #510625 - Attachment is obsolete: true

Attachment #510733 - Attachment is obsolete: true

sold2

Assignee

Comment 30

13 years ago

Attached patch Initial implementation (obsolete)

Small fix to segfault_handler to make Mac happy.

Attachment #511088 - Attachment is obsolete: true

sold2

Assignee

Comment 31

13 years ago

Attached patch Initial implementation (obsolete)

A few fixes to thread_local

Attachment #511212 - Attachment is obsolete: true

sold2

Assignee

Comment 32

13 years ago

If anyone has gotten to look into this yet, or is willing to, let me know so I can request feedback.

Steven Johnson

Comment 33

13 years ago

I don't think anyone has looked at it yet; I've been pretty tied up recently, but hopefully will be able to take a look in the next week or so.

sold2

Assignee

Updated

13 years ago

Attachment #511744 - Attachment is obsolete: true

sold2

Assignee

Comment 34

13 years ago

Obsoleted the last patch. Anyway, it's pointless to keep uploading updates if no one looks at them. When anybody has time to give feedback, let me know and I'll upload the most recent patch, so we'll be in sync.

Steven Johnson

Comment 35

13 years ago

Yeah, sorry for the delayed response. Please keep pinging us if you don't get feedback.

Felix S. Klock II [:pnkfelix, :fklock]

Comment 36

13 years ago

(In reply to comment #34)
> Obsoleted the last patch. Anyway, it's useless to keep uploading updates if no
> one looks at it. When anybody gets time to give feedback, let me know and I'll
> upload the most recent patch, so we'll be in sync.

Presumably at some point you'll stop updating the code or move on to a different task; I'd recommend you upload your patch at that point.

Dan Smith

Updated

13 years ago

Assignee: nobody → sold2

Flags: flashplayer-qrb+

Priority: -- → P3

sold2

Assignee

Comment 37

13 years ago

(In reply to comment #36)
> Presumably at some point you'll stop updating the code or move on to a
> different task; I'd recommend you upload your patch at that point.

Sorry for the delay; I haven't forgotten about this. I'll try to upload it sometime next week.

sold2

Assignee

Comment 38

13 years ago

Attached patch Segmented mops sketch

This patch is pretty big. It should have really been divided into smaller patches, but just to get something going...
Overall, this is pretty stable, but probably needs some refinements.

Attachment #523438 - Flags: feedback?

Sylvestre Ledru [:Sylvestre]

Comment 39

6 years ago

Tamarin isn't maintained anymore. WONTFIX remaining bugs.

Status: UNCONFIRMED → RESOLVED

Closed: 6 years ago

Resolution: --- → WONTFIX
