Closed Bug 528857 Opened 12 years ago Closed 12 years ago

nanojit: mismanagement of name lifetimes with TMFLAGS=assembly?

Categories

(Core :: JavaScript Engine, defect)

x86
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: n.nethercote, Assigned: n.nethercote)

Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey)

Attachments

(2 files, 1 obsolete file)

I'm seeing some screwy behaviour with TMFLAGS=assembly.  Names get associated with instructions for the debug output.  AFAICT memory holding LIR is being recycled -- eg. different instructions in different fragments that are compiled at different times may end up at the same address in memory.  However, the LirNameMap seems to be staying live across the compilation of these different fragments, and so inappropriate names are being given to instructions in the second fragment.

I've attached some output that shows this.  First, I turned off addName(), which manually names some interesting instructions, because it just confuses things.  'xt69' is the instruction name of interest.  It appears in a fragment on line 3881:

  0x9fc5d90: xt69: xt eq136 -> pc=0x9febb2f imacpc=(nil) sp+56 rp+0 (GuardID=094)

'xt69' is an appropriate name for a LIR_xt instruction.  'xt69' appears again a couple of fragments later:

  0x9fc5d90: xt69 = int 12

I added some code to print the address of LIR_xt and LIR_int instructions, and you can see it's the same address, ie. the memory has been recycled.  But the LirNameMap seemingly hasn't been recycled, so 'xt69' is still associated with addres 0x9fc5d90 even thought the LIR instruction at that location has changed, and we use it inappropriately.

Graydon, you understand the allocator lifetimes... any idea what's going on here?  The LirNameMap is using 'alloc' as its allocator, whereas the LirBuffer is using 'tempAlloc'.  I thought they should use the same one, because the lifetime of a LIR fragment should match the lifetime of its LirNameMap.  So I tried changing LirNameMap to use 'tempAlloc' but I got crashes, and Valgrind diagnoses bazillions of use-after-free errors.  Then I tried changing LirBuffer to use 'alloc' and the problem goes away, but that extends the LirBuffer's lifetime inappropriately, I think.
Attached file Valgrind output
Here's Valgrind output from when I used tempAlloc for LirNameMap.  I don't understand it, it seems like the freeing should be occurring after compile() is called in TraceRecorder::closeLoop().
FWIW, this bug is impeding my progress in some other areas.  For example, bug 505662 involves removing operandCount[], which is currently only used when printing liveness information.  The obvious way to test a patch for that bug is to compare the TMFLAGS=liveness output before and after operandCount[] is removed;  there should be no change.  But because the TMFLAGS output is bogus, that doesn't work, and there are all sorts of random differences.

What I ended up doing was changing LirBuffer to use 'alloc' instead of 'tempAlloc' as comment 0 mentions.  With that in place I found I had no differences before and after, ie. that the patch was good.

This will come up repeatedly for me, as I have other bugs where I'm cleaning up Nanojit in ways that shouldn't change the generated code at all (eg. bug 513615 and all those it depends on) and so TMFLAGS pre vs. post comparisons are the best way to test.  While I do have a workaround it sucks that the existing TMFLAGS output cannot be trusted.
Attached patch patch (obsolete) — Splinter Review
For understanding why using tempAlloc for the LirNameMap doesn't work, the
relevant code is in ~TraceRecorder:

    /* Purge the tempAlloc used during recording. */
    tempAlloc().reset();
    traceMonitor->lirbuf->clear();

The first line clears all the tempAlloc memory.  The second line resets the
LirBuffer and allocates a new chunk (with tempAlloc) -- not exactly obvious
behaviour for a function called 'clear()'.  But the LirNameMap hasn't been
updated, yet the memory its using has been pulled out from underneath it.
Which causes problems later.

Really, the current storage approach for LirBuffer/LirNameMap/LabelMap is just
weird.  They logically belong within TraceRecorder, but it's TraceMonitor
that holds them.  This patch attempts to fix the problem.  It:

- Moves LirBuffer and LabelMap into TraceRecorder (and also RegExpNativeCompiler).

- This requires making the LabelMap in LirNameMap public, because the
  LabelMap in TraceMonitor was public.

To me, this makes the relevant code simpler.  And it seems to work --
trace-tests passes as before -- but all this memory allocation stuff is very
new to me and should be checked carefully.  Esp.  the jsregexp.cpp stuff,
which is even more opaque to me than the jstracer.cpp stuff.
Assignee: general → nnethercote
Attachment #414454 - Flags: review?(graydon)
Attachment #414454 - Flags: review?(dmandelin)
Comment on attachment 414454 [details] [diff] [review]
patch

I really don't know much about nanojit memory management, so my review doesn't count for much, but it makes sense to me and is a lot clearer than what came before.
Attachment #414454 - Flags: review?(dmandelin) → review+
Thanks, dmandelin.  I thought you might know about the jsregexp.cpp changes.  The patch is more about TM memory management than NJ memory management, although the line between the two is not clear-cut.
Comment on attachment 414454 [details] [diff] [review]
patch

Hm. Ok, I ... almost like this! It's giving me an uneasy feeling that I tried this already and it didn't work (or possibly regressed performance a lot?) but I'm not able to really see how that would occur, reading the patch. Maybe circumstances have changed. Give it a thorough testing (and check perf) but if nothing shows up, go for it. 

The one concrete change I'd also ask for is to try to keep the lirbuf pointer in the recorder const. You should be able to initialize it in the initializer-list for the recorder, I think. And possibly to change the newly-public labelmap member of the lirbuf to a public accessor for a private member. Just to avoid exposing more of the lirbuf to mutation than necessary.
Attachment #414454 - Flags: review?(graydon) → review+
(Apologies also for dragging my ass on this bug, it's been a kinda silly week but this *was* the top of my "actually fixing code" list. I just haven't got to actually fixing much code.)
(In reply to comment #6)
> 
> The one concrete change I'd also ask for is to try to keep the lirbuf pointer
> in the recorder const. You should be able to initialize it in the
> initializer-list for the recorder, I think.

Done.  I was able to do it for RegExpNativeCompiler too.

> And possibly to change the
> newly-public labelmap member of the lirbuf to a public accessor for a private
> member. Just to avoid exposing more of the lirbuf to mutation than necessary.

I don't like that 'labels' is public either, but having a get method won't make a difference because it would get a pointer to the LabelMap through which you can make modifications.

I have some ideas for changes elsewhere in nanojit that will allow 'labels' to be made private again.  Maybe I'll try to get those done before I land this so that this change isn't necessary.
Attached patch patch, v2Splinter Review
Minor changes to the first patch, in particular, it expands on the comments describing the different allocators.
Attachment #414454 - Attachment is obsolete: true
Attachment #415550 - Flags: review?(graydon)
Attachment #415550 - Flags: review?(graydon) → review+
Landed the tiny NJ-specific part:

http://hg.mozilla.org/projects/nanojit-central/rev/cb855a65f046
Whiteboard: fixed-in-nanojit
Comment 11 was the merge of the NJ-specific part.  This is the TM-specific part:

http://hg.mozilla.org/tracemonkey/rev/c7f2d32dc2f8
Whiteboard: fixed-in-nanojit → fixed-in-nanojit, fixed-in-tracemonkey
http://hg.mozilla.org/mozilla-central/rev/4c7acf580717
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.