Closed Bug 638627 Opened 9 years ago Closed 9 years ago

Properly handle failures to patch guards due to non-32-bit offsets

Categories

(Core :: JavaScript Engine, defect)

x86_64
macOS
defect
Not set

Tracking


RESOLVED FIXED
Tracking Status
blocking2.0 --- .x+

People

(Reporter: brendan, Assigned: njn)

References


Details

(Whiteboard: fixed-in-nanojit, fixed-in-tracemonkey)

Attachments

(2 files, 1 obsolete file)

David poked around a bit, but without a lot more effort it wasn't clear what had been traced. Here's what he and I got, more or less:

(gdb) p $.u.i.script
$32 = (JSScript *) 0x1cdf19090
(gdb) p $.filename
$33 = 0x1c595af2d "http://processingjs.nihongoresources.com/bezierinfo/resources/processing-1042.js"
(gdb) p $32.length
$34 = 109
(gdb) p cx.regs.pc
$35 = (jsbytecode *) 0x1cdf191c0 "?"
(gdb) p cx.regs.pc - $32.code
$36 = 80
(gdb) p (JSOp)*cx.regs.pc
$37 = JSOP_TRACE
(gdb) p js_Disassemble(cx, $32, 0, __stdoutp)
No symbol "js_Disassemble" in current context.
(gdb) call js_Disassemble1(cx, $32, 0, __stdoutp)
No symbol "js_Disassemble1" in current context.
(gdb) x/109b $32.code
0x1cdf19170:	0x5a	0x00	0x00	0x01	0x3e	0x54	0x00	0x00
0x1cdf19178:	0x5e	0x5c	0x57	0x00	0x00	0x51	0xd4	0x00
0x1cdf19180:	0x00	0x00	0x00	0x57	0x00	0x01	0x51	0x06
0x1cdf19188:	0x00	0x30	0xe5	0x00	0x00	0x39	0x00	0x01
0x1cdf19190:	0x56	0x00	0x01	0x54	0x00	0x00	0x3a	0x00
0x1cdf19198:	0x02	0x51	0x56	0x00	0x00	0xbb	0x00	0x02
0x1cdf191a0:	0x56	0x00	0x01	0x3a	0x00	0x01	0x51	0x56
0x1cdf191a8:	0x00	0x01	0x55	0x00	0x00	0x51	0xd5	0x00
0x1cdf191b0:	0x01	0x00	0x00	0x57	0x00	0x01	0x51	0x56
0x1cdf191b8:	0x00	0x01	0x08	0xff	0xd0	0x06	0x00	0x16
0x1cdf191c0:	0xe5	0x00	0x01	0x56	0x00	0x00	0xbb	0x00
0x1cdf191c8:	0x03	0x3a	0x00	0x00	0x54	0x00	0x00	0x36
0x1cdf191d0:	0x00	0x04	0x51	0x56	0x00	0x00	0xdf	0x3e
0x1cdf191d8:	0x16	0x08	0xff	0xe7	0xc5
(gdb) p (JSOp)0x5a
$38 = JSOP_NEWARRAY
(gdb) p (JSOp)0x1 
$39 = JSOP_PUSH
(gdb) p (JSOp)0x3e
$40 = JSOP_ZERO
(gdb) p (JSOp)0x54
$41 = JSOP_GETARG
(gdb) p (JSOp)0x5e
$42 = JSOP_INITELEM
(gdb) p (JSOp)0x5c
$43 = JSOP_ENDINIT
(gdb) p (JSOp)0x57
$44 = JSOP_SETLOCAL
(gdb) p (JSOp)0x51
$45 = JSOP_POP
(gdb) p (JSOp)0xd4
$46 = JSOP_GETARGPROP
(gdb) p (JSOp)0x51
$47 = JSOP_POP
(gdb) p (JSOp)0x6 
$48 = JSOP_GOTO
(gdb) x/b $32.code+80
0x1cdf191c0:	0xe5
(gdb) p (JSOp)0x16
$49 = JSOP_GT
(gdb) p (JSOp)0x56
$50 = JSOP_GETLOCAL
(gdb) p (JSOp)0x54
$51 = JSOP_GETARG

Hope to reproduce on the same page (which is in my session store).

/be
David, can you take this now that it is probably understood? I'll let you know as soon as I get the controlled crash signature, but developing a patch ASAP seems good in case we respin.

For everyone's info: this bug is killing my Firefox 4 RC and tracemonkey tip dogfood builds, based on my magic session store + OS state.

/be
blocking2.0: --- → ?
If njn or gal should take, please do.

/be
Taking, unless Andreas wants to. The problem is that nanojit's CodeAlloc can allocate pages >2GB apart, and nPatchBranch() just Assert(0)'s this case. So we can patch a guard with a bogus offset.
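For background, the reach limit here comes from x86-64 rel32 branches: the jump displacement must fit in a signed 32-bit field relative to the next instruction, which is the same check nanojit's isS32 makes. A minimal sketch of that check (illustrative name, not the actual nanojit code):

```cpp
#include <cstdint>

// Sketch of the range check nPatchBranch has to make: an x86-64 rel32
// jump can only reach targets within a signed 32-bit displacement of
// the next instruction. Illustrative, not the actual nanojit isS32.
static bool fits_in_s32(int64_t offset) {
    return offset == (int64_t)(int32_t)offset;
}
```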
Assignee: general → dvander
Status: NEW → ASSIGNED
I would prefer a fix where we try to allocate all code pages within the same 2GB on 64-bit systems.
Re: blocking status (reproduced at Brendan's behest): OSX (10.6) is the only OS to have the crash and ExecuteTree is the #17 b12 OSX crasher compared to #120 on Windows.  Similarly, filtering for all 2.0 crashes, ExecuteTree is #17 on OS X vs. #247 on Windows.

Amusing/related comments:
"I was trying to open 24 tabs at once from bookmark folder in addition to the ~80 I already had open. My bad."
"I was trying to see how many tabs i could open, Vwalah it crashed........ :("
(In reply to comment #5)
> I would prefer a fix where we try to allocate all code pages within the same
> 2gb on 64-bit systems.

I would prefer a fast fix so I can use Firefox 4!

/be
Attached patch patchSplinter Review
NativeX64.cpp:nPatchBranch() does check whether the offset fits in 32 bits, and if it doesn't, it sets an error flag.  The real problem is that the calls to assm->patch() in jstracer.cpp don't check that error flag.

This patch adds such checking in the simplest way possible -- it checks after the assm->patch() calls and immediately propagates false/ARECORD_STOP upwards.  I'm not certain that this is safe in all cases, i.e. whether early exits from the involved functions could leave behind inconsistent state or something.  JoinPeers() and joinEdgesToEntry() are my main concerns, as they don't currently have any early exits, whereas the other affected functions (closeLoop() and endLoop()) already have early exits.

Patch passes trace tests on linux32 and linux64.
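The propagation pattern the patch adds can be sketched like this (a simplified model with illustrative names, not the actual jstracer.cpp code):

```cpp
#include <cassert>

// Simplified model: the assembler records an error when a patch's
// displacement won't fit in 32 bits, and the caller must check the
// flag after each patch attempt and bail out instead of continuing.
enum AssmError { None, BranchTooFar };

struct Assembler {
    AssmError err = None;
    void patch(long long offset) {
        // stands in for nPatchBranch() setting the error flag
        if (offset != (long long)(int)offset)
            err = BranchTooFar;
    }
    AssmError error() const { return err; }
};

// Mimics the pattern added after the assm->patch() calls: propagate
// failure upwards (false / ARECORD_STOP) as soon as patching fails.
bool patchGuard(Assembler& assm, long long offset) {
    assm.patch(offset);
    if (assm.error() != None)
        return false;
    return true;
}
```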
Attachment #518261 - Flags: review?(dvander)
Attachment #518261 - Flags: feedback?(brendan)
Attached patch NJ patch (obsolete) — Splinter Review
We should get rid of that assert in the Nanojit code too, as it's pretty clear that this is happening legitimately, and so if it fails in a debug build it'll just be annoying.
Attachment #518262 - Flags: review?(edwsmith)
Comment on attachment 518261 [details] [diff] [review]
patch

Looks good to me. dvander is better at saying what cleanup might be needed, if any, when bailing from joinEdgesToEntry and JoinPeers. I hope nothing needs to be trashed, since we didn't join.

/be
Attachment #518261 - Flags: feedback?(brendan) → feedback+
> Looks good to me.

The f? request was intended more along the lines of try-it-and-see-if-it-stops-crashing-for-you :)
I would not block the RC or final on this. This seems like a good candidate for 4.0.1. This only affects 64-bit Mac OS X users with a ton of open tabs -- so many that we generate code past the 2GB boundary. I have never seen this crash in the wild, and neither have many others. If we do 4.0.1 it might bite a few, but the many get to enjoy 4.0 in the meantime. Let's fix this immediately on trunk though so brendan can use FF4 again (his session restore state captured this bug).
Comment on attachment 518261 [details] [diff] [review]
patch

Looks good. We seem to propagate a stop everywhere with this. I would love a proper fix though eventually.
Attachment #518261 - Flags: review+
Comment on attachment 518262 [details] [diff] [review]
NJ patch

So we can land it. We answered the question the assert was trying to answer.
Attachment #518262 - Flags: review+
(In reply to comment #13)
>
> I would love a proper fix though eventually.

Presumably by "proper fix" you mean this?

> I would prefer a fix where we try to allocate all code pages within the same
> 2gb on 64-bit systems.

How would you do that?
No, a proper fix would be to ensure that all JIT code pages (for both JITs even) reside within the same 2GB window. It will require some mmap magic.
Comment on attachment 518262 [details] [diff] [review]
NJ patch


>         // Guards can result in a valid branch being patched again later, so don't assert
>         // that the old value is poison.
>-        if (!isS32(target - next)) {
>+        if (!isS32(target - next))
>             setError(BranchTooFar);
>-            NanoAssert(0);  // assert because we'd like to know if this ever happens
>-        }

We must return here, not just continue, because then we'll have clobbered the working patch with one that could crash.
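The early-return fix can be sketched like this (a simplified model, not the real NativeX64.cpp): only write the rel32 field when the new displacement is in range, so an out-of-range patch leaves the previously-working branch intact.

```cpp
#include <cstdint>

// Simplified model of the v2 fix: bail out before writing the patch
// when the displacement won't fit, so the working patch is not
// clobbered with one that could crash. Names are illustrative.
enum { OK = 0, BRANCH_TOO_FAR = 1 };

static int patch_rel32(int32_t* slot, int64_t target, int64_t next) {
    int64_t disp = target - next;
    if (disp != (int64_t)(int32_t)disp)
        return BRANCH_TOO_FAR;      // leave *slot untouched
    *slot = (int32_t)disp;
    return OK;
}
```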
Comment on attachment 518261 [details] [diff] [review]
patch

If JoinPeers fails to do something, the worst that'll happen is we could record the same tree twice, so this seems totally fine. We could consider trashing the trees if we cared.

To fix this bug though, Nanojit can't perform the patch unless it knows the patch will succeed. Propagating the error and checking the assm result is actually optional.
Attachment #518261 - Flags: review?(dvander) → review+
Attached patch NJ patch, v2Splinter Review
dvander says we have to prevent the code from being bogusly patched, so this patch does that by returning early.  It also checks if the optional second patching is out-of-range.
Attachment #518262 - Attachment is obsolete: true
Attachment #518262 - Flags: review?(edwsmith)
Attachment #518283 - Flags: review?(dvander)
(In reply to comment #12)
> I would not block rc or final on this. This seems like a good candidate for
> 4.0.1. This only affects 64-bit macosx users with a ton of open tabs, so much
> that we generate code past the 2GB boundary.

It varies. Sometimes I don't have as many tabs but OS uptime is higher and the bug bites. This comment is too assertive about stuff we don't know. Argue based on crash stats instead.

/be
Attachment #518283 - Flags: review?(dvander) → review+
(In reply to comment #16)
> No, a proper fix would be to ensure that all JIT code pages (for both JITs
> even) reside within the same 2GB window. It will require some mmap magic.

I don't believe this is feasible, since the only way to make it work
is to use MAP_FIXED, and MAP_FIXED requires you to know the entire
existing map status of the process in order that you can guarantee to
choose an address range which has no current mapping.  And even then
you're racing against other threads doing mmap.  IOW, MAP_FIXED is
essentially unusable.
Maybe if you had a long list of strange addresses and tried picking one out of there it would work, but that sounds scarily unreliable. Linux and Windows can give you addresses in the bottom 2GB range on x64, and Mac's mmap() does a linear best-fit search, so something might be feasible.

I tried making JM emit far jumps and it was really gross. At least there, I think we should work toward having a 2GB code region and being able to move code memory.
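One non-MAP_FIXED variant of that idea is to pass mmap an advisory hint near the existing code region and then verify where the mapping actually landed, falling back if it is out of range. A sketch under those assumptions (illustrative names, not Mozilla code):

```cpp
#include <cstdint>
#include <cstddef>
#include <sys/mman.h>

// Allocate near `hint` without MAP_FIXED: the hint is advisory, so
// the caller must check that the result landed within a signed 32-bit
// displacement of the hint and give up (or retry) otherwise.
void* alloc_near(void* hint, size_t len) {
    void* p = mmap(hint, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return nullptr;
    int64_t delta = (char*)p - (char*)hint;
    if (delta != (int64_t)(int32_t)delta) {   // outside +/-2GB: give up
        munmap(p, len);
        return nullptr;
    }
    return p;
}
```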
One other nasty possibility is to mmap a 2GB chunk at startup,
mprotect it to be no-access, and then incrementally mprotect +wx
bits of it as they are needed.  It's pretty sucky in that it requires
the kernel to be able to overcommit VM, but it does work.
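That reserve-then-commit scheme can be sketched as follows (the reservation is shrunk here so the example is cheap to run; the real thing would reserve the full 2GB window and commit pages with PROT_EXEC):

```cpp
#include <cstddef>
#include <sys/mman.h>

// Reserve one large no-access region up front, then flip page
// permissions on demand. The PROT_NONE reservation consumes address
// space but relies on the kernel not committing memory for it.
static const size_t kReserveSize = 16u << 20;   // 16MB stand-in for 2GB

static char* reserve_code_window() {
    void* p = mmap(nullptr, kReserveSize, PROT_NONE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    return p == MAP_FAILED ? nullptr : (char*)p;
}

static int commit_chunk(char* base, size_t off, size_t len) {
    // the real JIT would use PROT_READ | PROT_WRITE | PROT_EXEC
    return mprotect(base + off, len, PROT_READ | PROT_WRITE);
}
```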

> I tried making JM emit far jumps and it was really gross.

  movabsq $imm64, %tmpreg; jmp *%tmpreg   ?

  == REX.W 0xB8 imm64 ; FF E6  -- 12 bytes if %tmpreg <= %rdi

vs short form

  jmp $imm32  ;  nopw 0x0(%rax,%rax,1) (7 byte nop)

Hmm, adds 7 bytes to all patchable jumps and uses a register.
The 7-byte nopw will presumably disappear in the insn decoder
so is pretty much free, so the perf hazard comes from the
extra icache pressure and the loss of a register.
Also you can't do that with conditional jumps. What I tried in JM (and had done this in the original Nanojit X64 port) was reserve a word at the bottom of the page for every might-be-patched jump. Then when patching, if you overflowed the range, you'd make the jump indirect, give it the next word, and fill in the full address. But it was a huge amount of code motion and added a lot of verbosity.
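A data-model sketch of that reserved-word scheme (all names illustrative; the real version patches machine code in place, not a struct):

```cpp
#include <cstdint>

// Each patchable jump reserves a 64-bit slot. When a rel32 patch
// would overflow, store the full target in the slot and mark the
// jump as indirect (jmp through the reserved word).
struct PatchableJump {
    int32_t  rel32;     // short-form displacement
    uint64_t slot;      // reserved word for the far target
    bool     indirect;  // true once converted to an indirect jump
};

static void patch_jump(PatchableJump* j, uint64_t target, uint64_t next) {
    int64_t disp = (int64_t)(target - next);
    if (disp == (int64_t)(int32_t)disp) {
        j->rel32 = (int32_t)disp;
        j->indirect = false;
    } else {
        j->slot = target;       // far case: go through the reserved word
        j->indirect = true;
    }
}
```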
Will fix in 4.0.1
blocking2.0: ? → .x+
Summary: Crash in unmapped trace-jitted code from this URL → Properly handle failures to patch guards due to non-32-bit offsets
Assignee: dvander → nnethercote