Closed Bug 462228 Opened 11 years ago Closed 11 years ago

Need code to patch trace-JITed code (all trees) to force loop exits

Categories

(Core :: JavaScript Engine, defect, critical)

defect
Not set
critical

Tracking

()

RESOLVED FIXED
mozilla1.9.1b2

People

(Reporter: brendan, Assigned: graydon)

References

Details

Attachments

(3 files, 3 obsolete files)

Andreas's write-up of what needs to happen, in a patch based on the forthcoming new patch for bug 453157:

- Re-enable the generation of side-exit stubs for LIR_loop in nanojit (currently disabled as optimization to save a few bytes of code t the loop tail).
- Add a method to TraceMonitor that walks the entire code cache (Fragmento) and for every fragment that ends with LIR_loop (lastIns->isOp(LIR_loop)) patch the jump destination (LIR_loop->record()->jmp+1) to point to the side exit stub (target) instead of the loop header.
- Add another method to restore all LIR_loop jumps (see the way LIR_loop is patched to fragEntry initially).

Sayrer volunteered Graydon for this -- we are all grateful!

/be
Note: this ability to patch all potentially executed traces must work when executed from an arbitrary thread.
Do we still get the guarantee that the destination trace will be suspended?
Apparently there is nothing like PR_SuspendThread. Presumably it means that the code cannot assume thread suspension, isn't it?
We were talking about storing atomically and racing yesterday. Is that still a good idea?

Cc'ing wtc for PR_Suspend*Thread* tips.

/be
NSPR doesn't provide PR_SuspendThread because if you suspend a
thread that holds lock, it may cause the program to deadlock.
This is why Java's Thread.suspend method is deprecated.

NSPR provides PR_SuspendAll, which suspends all the threads
known to NSPR.  Netscape's port of Sun Java VM used to use
PR_SuspendAll and PR_ResumeAll for garbage collection.
However, after OJI, PR_SuspendAll and PR_ResumeAll haven't
been used for a long time, so I don't know if they still work.
Lets shoot for racing code write. I will write up some code to verify that this works on x86.
Attached patch a sketch to gather feedback (obsolete) — Splinter Review
After several misunderstandings of the meaning of guards vs. side exits (thankfully elucidated by dvander), here's an initial sketch of what I think is required. 

More or less. The actual choice of instruction-patching function is probably wrong, as I've not tested it, and there are two, and I've forgotten which one does which flavour of jumps yet. And I'm of half a mind to try to consolidate the two flavours, as in the patch for bug 461489.

Anyway, meanwhile: is this the sort of thing you had in mind? Any glaring mistakes or misunderstandings at this point? I'll carry on with it tomorrow if not.

(Wrt. suspending threads: I think on modern Intel machines we can rely on an atomic word-write, if we have proper alignment and such. Not sure if we get that on ARM or elsewhere.)
#include <pthread.h>
#include <stdio.h>
#include <sys/types.h>
#include <sys/mman.h>

static void* jmp = 0x1e84;

static int started = 0;
static int done = 0;

void y() { }

void* victim(void* threadid)
{
  //jmp = &&label4;
label:
  if (done)
    goto label3;
  started = 1;
label2:
  y(); y(); y(); y(); y(); y(); 
  y(); y(); y(); y(); y(); y(); 
  y(); y(); y(); y(); y(); y(); 
  y(); y(); y(); y(); y(); y(); 
  y(); y(); y(); y(); y(); y(); 
  y(); y(); y(); y(); y(); y(); 
  y(); y(); y(); y(); y(); y(); 
label4:
  goto label;
label3:
  printf("done=%d\n", done);
  return NULL;
}

int main (int argc, char *argv[])
{
  pthread_t t;
  int ret = pthread_create(&t, NULL, victim, (void*)t);
  if (ret) {
    printf("ERROR return code from pthread_create() is %d\n", ret);
    exit(-1);
  }
  while (!started)
    ;

  ret = mprotect(((unsigned)victim) & ~(0xfff), 4096, PROT_READ | PROT_WRITE | PROT_EXEC);
  printf("mprotect = %d\n", ret);

  int* addr = (int*)(((char*)jmp)+2);
  printf("addr=%p\n", addr);

  *addr = 0; /* skip over the jump */
  
  int i;
  for (i = 0; i < 1000000; ++i)
    ;

  done = 1;
  pthread_exit(NULL);
}

Result:

mprotect = 0
addr=0x1e86
done=0

The 2nd thread exits with done == 0, which means we successfully patched the branch to fall through to the next op.

This should work on arm as well since there we get an even better guarantee (we can fiddle with the code almost arbitrarily until we flush the icache).
Here is a test run with a nop in between to force an unaligned address:

mprotect = 0
addr=0x1e83
done=0

The jump is at 0x1e83, the 4-byte offset field we overwrite at 0x1e85.
(In reply to comment #7)
> (Wrt. suspending threads: I think on modern Intel machines we can rely on an
> atomic word-write, if we have proper alignment and such. Not sure if we get
> that on ARM or elsewhere.)

My worry is how to ensure that all traces are patched. For example, what happens when a new trace is being constructed when the patching is started? It seems that some locking plus a flag to ensure that is necessary.
Yeah, we'll need to lock the fragmento, and prevent new traces from forming, for the duration of the watchdog execution. Or some such combination; it'll be a little subtle to get right.
(In reply to comment #11)
> Yeah, we'll need to lock the fragmento, and prevent new traces from forming,
> for the duration of the watchdog execution. Or some such combination; it'll be
> a little subtle to get right.

I think we need some kind an invariant that from a particular moment all the existing and future traces comes with the patched bytecode when they start execution.
fragmento is single threaded. Either we are recording and we will see that we shouldn't enter a tree or we are executing a tree and we can't be recording
This first patch is a clean-up and correction of some hunks that showed up over in bug 461489. It just merges the two functions that know how to patch jump instructions into a single one, and generalizes the code for emitting jumps, to carry a flag indicating whether they must be (patchable) long jumps. Neither is strictly necessary but I think they're both good cleanups, and the next patch I'm about to submit here depends on the merged patcher function.
Attachment #345824 - Flags: review?(edwsmith)
This tidied-up version of the initial sketch patches the code properly, and uses a single "swap" function to alternate between jump-to-loop and jump-to-exit.
Attachment #345639 - Attachment is obsolete: true
Attachment #345825 - Flags: review?(gal)
Not for review, but if you want to test: this is a little baby test that uses SIGALRM to disconnect a running loop after 1s, and then re-enable it on trace exit. Apply with the 2 patches above and run a script with a long-running loop, to verify that it works.
- Why do we need so many additional fields in GuardStruct? We know the target (fragEntry of fragment->root), and guardrecord->exit->target tells us the target of the loop exit stub. 
- TIMEOUT_EXIT seems to be intended badly.
- Setting a fake ->target seems really hacky. I think LIR_loop and LIR_guard should just be treated the same, except that for LIR_loop we manually patch them over once we hit the fragEntry point (going up the trace in reverse) or immediately if we are on a secondary path.
- The swap interface is brittle IMO. We should have separate connect and disconnect functions.
No, in the current code nobody stores either of the correct addresses.

The fragEntry field you're suggesting of fragment->root is not the loop-jump target. That's the target for transferring in from some other fragment that needs its sp adjusted. In the non-root loop edges, those are the same. In root loop-edges, those two differ: the loop-jump target is a few bytes later. It's determined as the value SOT in endAssembly, and patched into all the loopJump records, then discarded. Nobody stores it.

Likewise, guardRecord->exit->target is not the jump-to-exit-stub target address. It's the fragment the exit stub jumps *to* after the stub itself runs. It's returned from asm_leave_trace, then from asm_exit, and immediately recycled as an operand to JMP / asm_branch, then discarded. Nobody stores it.

In both cases, neither of these jump-target addresses are stored anywhere I can find in the current code, so I had to add fields that store them.

Setting a fake ->target *is* really hacky. I don't like it either. I think you are oversimplifying by saying that LIR_loop and LIR_guard can be "treated the same", but I'll have a go at removing this hacky-ness this afternoon and further merging what can be merged between them. Will post an updated patch here.

The swap interface was a last minute thing and easily changed back to a disconnect / reconnect pair. It just seemed to make the patch less intrusive, but I don't mind either way. I predict you'll like the other form even less, but I'll switch back in the next patch and you can decide :)
Attached patch update to patchSplinter Review
This variant removes the hack of setting exit->target to null, in favour of predicating the stub generation on the current exit code. Reads better, you were right. I also moved the loop-entry point (a few insns after the frag entry point) into a dedicated field of each frag, renamed 'patchEntry' to 'fragEntry' since it refers to the same location, and cleaned up some of the code that does loop patching at the end of assembly.
Attachment #345825 - Attachment is obsolete: true
Attachment #346159 - Flags: review?(gal)
Attachment #345825 - Flags: review?(gal)
Oh, and of course, since I split the "swap" interface into a disconnect/reconnect pair, the example patch needed updating too. Here it is.
Attachment #345828 - Attachment is obsolete: true
Comment on attachment 346161 [details] [diff] [review]
update to example patch using the new interface

nit: jmpToTarget is the address of the actual jmp instruction, whereas jmpToStub is the address of the stub. Thats a little confusing. How about "jmp" and "stub"?
Attachment #346161 - Flags: review+
Attachment #346161 - Flags: review+
Attachment #346159 - Flags: review?(gal) → review+
Attachment #345824 - Flags: review?(edwsmith) → review?(rreitmai)
(In reply to comment #21)
> (From update of attachment 346161 [details] [diff] [review])
> nit: jmpToTarget is the address of the actual jmp instruction, whereas
> jmpToStub is the address of the stub. Thats a little confusing. How about "jmp"
> and "stub"?

No, they're both addresses of jmp instructions. The stub entry address is stubEntry.
Comment on attachment 345824 [details] [diff] [review]
a prelude to the actual work

- Should we rename nPatchBranch to asm_adjustBranch()  ? 

- Is anyone still using 'was' should we remove it entirely?
Attachment #345824 - Flags: review?(rreitmai) → review+
(In reply to comment #23)
> (From update of attachment 345824 [details] [diff] [review])
> - Should we rename nPatchBranch to asm_adjustBranch()  ? 
> 
> - Is anyone still using 'was' should we remove it entirely?

The first variant of the next patch in this bug (the one using swap()) used the 'was' field. And the logging functions in the current variant. I'd just as soon keep it there, helps when debugging.
(In reply to comment #13)
> fragmento is single threaded. Either we are recording and we will see that we
> shouldn't enter a tree or we are executing a tree and we can't be recording

This is too simplistic a view, I think. The fragmento belongs to a single 'main' thread, but the timeout thread is, by definition, a different thread.

So if we have the timeout thread commence walking through the fragmento disconnecting loops, the main thread may (either naturally, or due to disconnection) exit from a trace and race against it, modifying the fragmento as well.

I think the timeout thread and the main thread will have to synchronize a little.
These actually landed last week in tracemonkey; forgot to update with the revision IDs: 863f06764089 and 040e5ac010b1.
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Flags: in-testsuite-
Flags: in-litmus-
You need to log in before you can comment on or make changes to this bug.