Closed Bug 461073 Opened 13 years ago Closed 13 years ago

Nanojit mem failures cause crashes

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect)

defect
Not set
major

Tracking

(Not tracked)

VERIFIED FIXED

People

(Reporter: stejohns, Assigned: rreitmai)

Details

Attachments

(1 file, 3 obsolete files)

Assembler::pageAlloc attempts to allocate another page for the JIT to use; if it can't, it just returns the first one in the list and sets the error condition to OutOMem. 

The caller is supposed to check the error condition, but if the failure happens from underrunProtect(), the page returned will be stomped on before the error condition is ever detected (see all the macros in Nativei386.h for example)... 

Since this page is probably in use in a previously existing fragment, you now have corrupted code. (If said fragment is exdecuting higher up in the current callstack then you get especially amusing failures.)
Assignee: nobody → rreitmai
Status: NEW → ASSIGNED
The fix is to allocate a page strictly for use during out of memory situations.  Unfortunately this has a couple of side-effects; 
  (1) one page per Assembler is essentially wasted 
  (2) the allocation of the page occurs during Assembler construction, 
      which then triggers Fragmento.pagesGrow().  Meaning that pagesGrow() is
      called earlier than previously and if the Fragmento is not used (for 
      whatever reason) we'll be consuming more memory.  This was not the case 
      previously. 

An alternate solution would be to force each compile to happen on a new page, guaranteeing never to collide with another method, but this might lead to excessive waste.

Or we could perform checks prior to the generation of each instruction and abort if we don't have enough memory, but this would require more extensive changes and possibly result in greater overhead.  BTW, the original intent for code generation was to call underrunProtect() only once per iteration of gen() passing it the largest number of bytes that could be generated in one pass.  But this would most likely prove too fragile. 

So maybe an '_err=None' check in underrunProtect() might be the better route if we're going to be calling it for each instruction anyway.
It would certainly be easier to detect errors, since we most likely want to clear out the page pointers along with _nIns and _nExitIns.

Thoughts?
Attachment #344393 - Flags: review?(edwsmith)
Attachment #344393 - Flags: review?(stejohns)
Comment on attachment 344393 [details] [diff] [review]
ver1 - applies against redux change 1021

I don't think we can afford to waste a page, or afford to waste space by having 1 page per method.  (unless we shrink the page size a lot, which will add lots more jumps).  

Here's a proposal that doesn't waste page space and doesn't require more frequent OOM checks.

Remember _nIns when we start compiling each new method. (_nSave = _nIns)  if a page allocation fails, instead of reverting to the top of page, revert to the saved ptr (_nSave = _nIns).  when we succeed, reset _nIns to the new page top, so a later failure still wraps around on the same page without jumping back to the previous page where _nSave began.
Attachment #344393 - Flags: review?(edwsmith) → review-
Attachment #344393 - Flags: review?(stejohns)
Yes I was initially thinking of something similar but there is an issue when you're near the top of page (TOP).  Let's say _nSave starts at TOP+7 bytes what happens if you need 8B for the next instruction?

Although now thinking about it again, I guess if we ensured that we always have sufficient space for the largest possible instruction, then the idea will work.
right, if we start out with (say) underrunProtect(32) or something > than any other call to underrunProtect, *and* check OOM first, then we're all set.
I've re-branded the title to encompass all mem issues with NJ.  Right now, we have the original issue and a similar issue in the LirBufWriter. 

Patches for both (which most likely will follow the same policy) will be forthcoming.
Summary: Nanojit pageAlloc failure mode causes crashes → Nanojit mem failures cause crashes
TraceMonkey calls clearFrags if Fragmento/Assembler ever returns OOM, so I don't think we hit this problem.  We do hit cases where LirBuffer runs out of memory during recording, and the filters assert trying to form references.  This doesn't result in a crash in optimized builds though, we just get a corrupt LIR stream that gets thrown out quickly.
The patch includes a number of fixes mostly related to how
out of memory conditions are handled by Lir and Assembler.

It is posted here in a pre-liminary form in order to generate
feedback.  I will post a follow-up patch(s) which contain a
trimmed down, submit-ready version once we've agreed
on most of the details.

Also, it would be great of the TM team could take this patch 
and validate it against their latest (incl.  64b machines); 
danderson, you there :)

On to what has changed...

LirBuffer has been modified to provide advance warning of
out of memory (OOM) conditions.

A new page is allocated LIR_BUF_THRESHOLD instructions
prior to reaching the end of page.  If the page allocation fails,
call to outOmem() will return true.  The buffer can still be 
safely written to during during this period but it is assumed
the higher level code will catch this condition and handle
it appropriately as writing LIR_BUF_THRESHOLD instructions
past this point will cause a crash.

This opportunity was also taken to re-factor the code for 
LirBufWriter making it more platform agnostic.
- All non-LInsp data in the instruction stream is now managed
  through structures that overlay the memory region.
- prepFor() was added to replace the multiple 
  ensureReferenceable() calls for each instruction.
- insCall() was also modified somewhat in that the 
  arguments are now stored growing downwards from 
  the position of the pseudo instruction LirCallIns.  

CodegenLIR now has LirBuffer checks at the granularity
of each emitXXX() call that is exposed publicly.  This seemed
like a reasonable approach since a client could potentially 
call at this level indefinitely.  If we want to reduce the frequency
of these checks then we'd have to push the check up into the
verifier.
 
Assembler OOM handling has also changed.  The variable
_startingIns was added and contains the location at which
the assembler began writing code for the current 
begin/assem/end sequence.   If an OOM condition occurs
the assembler will reset the current instruction pointer
to _startingIns, effectively overwriting the code that has
been generated.  This allows the assembler to produce
code indefinitely (and without error) until the upper layers 
have noticed the error and respond accordingly.

The constant LARGEST_UNDERRUN_PROT was added
and needs to be set to a platform specific value that is 
equal to or greater than the number of NIns written for
the largest possible instruction.  i.e.  you cannot write 
more than this number of NIns to the buffer for each 
call to underrunProtect().

- ignore the Fragmento change, that was for lo-mem testing.
- ignore the xcode project change , 10.4 merge fun


TODO
- arm/thumb/ppc changes for underrun protect 
- 64b differences still exist (e.g. insCall), can these be abstracted out?
- insCall argument access should be modified to use int32_t* with
  shifts, since some embedded platforms may not allow byte level ptrs.
- makeReachable() has spref/rpref optimizations in it?  Shouldn't
  this code exist somewhere else?  
- LirBufWriter.ins0() has insParam() calls in it.  Any reason to keep it here?
Attachment #344393 - Attachment is obsolete: true
if LIR_call args are growing downwards, how can the buffer be read in reverse? won't the args be encountered before the call in that case?

doing OOM checks at each emitXXX is fine except for the few that internally loop and generate some variable amount of code.  prolog() comes to mind, as well as the emitter for OP_lookupswitch.  those cases need additional checks.

Since we signal OOM LIR_BUF_THRESHOLD before the end of the last page, I don't think we need to support any wraparound capability, and therefore don't need _startingIns.  Asserts should be enough.

the insParam setup code in LirBufWriter.ins0() could/should be moved into a utility function, perhaps in LirWriter.  It will be up to the LIR generator to call the function if desired.  (turns out, TM doesn't desire these).
> args growing downwards

To be clearer;  the old format was
(as an e.g) :

[ arg0 arg1 ...       ]
[      argN           ]
[      CallInfo*      ]
[     LIR_call        ]

new format is :

[          argN       ]
[ ...       arg1 arg0 ]
[      CallInfo*      ]
[     LIR_call        ]

Note only the arg's are reversed.  The arg's
are currently packed in an array of words (4Bytes)
which prefaces the CallInfo pointer and the LIR_call.
The general order has not changed just the placement 
of args within the array.


> loop and generate some variable amount of code

Ok, i'll check into those cases and add the checks.


> _startingIns

Assembler was not changed to use the threshold, it
still uses the overwrite technique.  Seeing as the 
output of the assembler is write-only, it didn't 
seem necessary to apply the threshold technique there.
Do you think there's any advantage to doing so?  

> insParam setup code

Agreed it needs to be moved.  How about emitStart() 
in CodegenLIR since it really is specfiic to the type
of code you wish to generate.

> ...

Also speaking of specific, what about sp/rp refs inside of 
LirBuffer and LirBufWriter.  This seem highly specific to 
our use.  

I haven't tried it out yet, but what about a more general
technique where we keep a map of LInsp -> tramp. If we
can reach the tramp we use it , otherwise we create a 
new tramp and update the map.

It would cost a hashmap lookup for each parameter that 
is > 8bits away, but we'd end up doing more processing 
by creating a tramp in the stream anyway.    BTW, was
this a large memory savings?  It seems like it wouldn't
help performance at all.
(In reply to comment #9)
> > _startingIns
> 
> Assembler was not changed to use the threshold, it
> still uses the overwrite technique.  Seeing as the 
> output of the assembler is write-only, it didn't 
> seem necessary to apply the threshold technique there.
> Do you think there's any advantage to doing so?  

Nope, I stand corrected, I was thinking of LIR generation only.

> Agreed it needs to be moved.  How about emitStart() 
> in CodegenLIR since it really is specfiic to the type
> of code you wish to generate.

Sounds good

> Also speaking of specific, what about sp/rp refs inside of 
> LirBuffer and LirBufWriter.  This seem highly specific to 
> our use.  
> 
> I haven't tried it out yet, but what about a more general
> technique where we keep a map of LInsp -> tramp. If we
> can reach the tramp we use it , otherwise we create a 
> new tramp and update the map.
>
> It would cost a hashmap lookup for each parameter that 
> is > 8bits away, but we'd end up doing more processing 
> by creating a tramp in the stream anyway.    BTW, was
> this a large memory savings?  It seems like it wouldn't
> help performance at all.

It saved a lot of LIR code size, yes.  something like 10-15% iirc, certianly enough to tip the scales clearly in favor of a 8bit refs within a 32bit instruction, compared to 16bit refs in a 64bit instr.  Ive also noodled around with having a kind of "recent instruction" map with lower overhead, but haven't found anything compelling as of yet.

If there are particular expressions that are often referenced, other than sp/rp, then this could be a win.  There were very few such expressions and at the time I wrote that code, so it wasn't worth the heavy-handed map.

Both tracemonkey and tamarin-central have something sp-like, so having two hardcoded vars has worked well so far.  we can treat this as a separate optimization.
Attached patch ver3 - applied to redux 1130 (obsolete) — Splinter Review
Patch that resolves all @todo issues and removes #ifdef's
Attachment #347011 - Attachment is obsolete: true
Attachment #347674 - Flags: review?(edwsmith)
Comment on attachment 347674 [details] [diff] [review]
ver3 - applied to redux 1130

bugs:

* ln 1145, looks like && !outOMem() is misplaced, its in the init expression for n instead of the for loop condition.  also ln 1196, 1253, 1265

* indentation looks a bit corrupt in LIR.cpp, should be 4-spaces indent, no tabs

* in LirBufReader.read(), i->callInsWords() replaced with inline math, why?  encapsulation  would be good to keep unless it's a performance hit (inline not working?)

* intentional changes to project.pbxproj?

not a bug, but could be:

* in CodegenLIR.localSet(), check for outOMem before the first store, instead of the second store.
Attachment #347674 - Flags: review?(edwsmith) → review-
Comment on attachment 347674 [details] [diff] [review]
ver3 - applied to redux 1130

marking r+, no re-review required after fixes are made.
Attachment #347674 - Flags: review- → review+
could we change outOmem into OutOfMem our outOfMem?
(In reply to comment #14)
> could we change outOmem into OutOfMem our outOfMem?

+1 vote for "outOfMem" -- IIRC all other methods in NJ are initial-lowercase
yikes!  Good catch with the outOMem() calls.  

- re: callInsWords(), no real preference either way, but maintaining the abstraction seems like a good idea, i'll add it back.

- all my diffs contain project.pbxproj changes, but I revert them prior to push.

- outOMem changed.

I will re-post the patch, since I'm finding an issue(s) on win32 that will most likely require changes.
msvc does slightly different packing than gcc, so using a union to extract
directly from the LIR buffer does not work nicely on windows.

Will revert to using casting such as '*(double*)l->v' and post a patch after acceptance tests.
Cleaned up.  Still contains xcode proj diffs that will be removed and also contains tabs. 

Getting hg to convert tab to spaces is a bit of a pain.  hg diff -t doesn't work and piping to expand -t 4 gives results that are off by 1.  I'll play around with it a bit more to see what can be done.
Attachment #347674 - Attachment is obsolete: true
Attachment #347891 - Flags: review?(edwsmith)
tabs in the diff aren't a problem, I was referring to tabs in the modified files, i think LIR.cpp was the one I noticed.
Comment on attachment 347891 [details] [diff] [review]
ver4 - applied to redux 1137

How about 
  s/outOMem/outOfMem/g
before pushing?
Attachment #347891 - Flags: review?(edwsmith) → review+
pushed as f7a0041cd0c5 (1104).  Tabs unfortunately still remain as there was no simple way to convert them without odd off-by-one side-effects.   This shouldn't happen in the future as my tool-chain is now fully space-friendly.

Also, ended up removing the check in localSet().  I wasn't convinced that it is needed and it seemed like a fair amount of overhead for a single invocation from CodegenLIR.  

If we find that we do need to protect against this case, we should probably make localSet() invisible to the verifier and instead go through an intermediary that performs the check.

Also forgot to mention that asm_output was modified to be fully variadic (i.e. ...), apparently this gets around a compiler bug on sun and since there didn't seem to be any negatives to doing so, the change was made.

While on the topic of native.h , setting outputAddr=true (rather than false, which is the default) in the last line of asm_output() will result in verbose output displaying the address of each machine instruction.  This is handy for debugging the back-end.
marked fixed
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Resolved fixed engineering / work item that has been pushed.  Setting status to verified.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.