Closed Bug 596755 Opened 11 years ago Closed 11 years ago

Re-enable YARR on ARM

Categories

(Core :: JavaScript Engine, defect)

ARM
All
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
fennec 2.0b2+ ---

People

(Reporter: dougt, Assigned: cdleary)

References

Details

(Keywords: crash, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 10 obsolete files)

740 bytes, patch
jbramley
: review+
Details | Diff | Splinter Review
25.91 KB, patch
vlad
: review+
Details | Diff | Splinter Review
+++ This bug was initially created as a clone of Bug #596669 +++



This is a bug to track re-enabling YARR
Summary: Fennec Android crash in jsstr.cpp:ReplaceCallback → Re-enable YARR on ARM
OS: Mac OS X → All
Hardware: x86 → ARM
Assignee: nobody → general
Severity: blocker → critical
Component: General → JavaScript Engine
QA Contact: general → general
Assignee: general → cdleary
The replace that I was shown as failing was of the form:

    "app.update.lastUpdateTime.%ID%".replace(/%ID%/, "test")

I saw it crash when I built the JS shell with --enable-debug --disable-optimize yesterday, but couldn't reproduce that crash when I rebuilt today. I never saw this failure in the JS shell debug build. All the regexp shell-based jsreftests pass under both debug and optimized builds.

This was all being tested on a Pegatron box with http://hg.mozilla.org/mozilla-central/rev/b9ff2a9339e2 reverted. I gave the Pegatron to somebody else so I'm loading L4T ( http://developer.download.nvidia.com/tegra/docs/linux_for_tegra_quickstart_20100810_10_7_2.pdf ) on a Tegra board now.
The tegra board is flashed for Linux boot and I prepped a drive with the Linux4Tegra contents, but the VGA signal isn't supplying anything and it doesn't seem to make it to a ssh-able point in the boot process (it never requests a dhcp address from my router).

Need to hunt down a the daughterboard with the SATA UART dealie so I can debug the boot process tomorrow.
Status: NEW → ASSIGNED
I think Bear found the VGA port was unreliable, and saw success after switching to the HDMI port.
After setting up my Tegra2/Android board properly, updating the wiki a few times, and finding |__android_log_print| / |adb shell setprop log.redirect-stdio true| I'm making good forward progress -- much better than marking things as volatile and praying that the compiler doesn't optimize them much. :-)

I've tracked down the offending regexp for the early browser crasher and took a peek at the generated code. I'll do some real analysis tomorrow morning and update with the crux of the problem.

I/JS      ( 9345): Executing for pair count: 2
I/JS      ( 9345): Source: ^([^\.]+\.[0-9]+[a-z]*).*
I/JS      ( 9345): Input:  2.0b1pre
I/ActivityManager( 4552): Process org.mozilla.fennec (pid 9345) has died.

Many thanks to #mobile guys for helping me out along the way!
I thought that the asm was clobbering memory in some unholy way, which it may still be doing, but adding these opt checks has made the problem go away. Attaching for future reference.
Attached file YARR executable region. (obsolete) —
I'm weirded out by the UNDEFINED instruction stuff here -- I'm thinking the disassembly dialect is set to thumb but we're producing code with the ARMv7Assembler. Will check that out and also get the JaegerSpew instruction stream going off __android_log_print.
Attached file YARR executable region. (obsolete) —
`set arm force-mode arm` did the trick and looks better.
Attachment #479121 - Attachment is obsolete: true
Attached file RegExp activity. (obsolete) —
Unsurprisingly, when I add new assertion code to check for things the source of the error changes in unpredictable ways. Somewhat unfortunately, remote debugging doesn't seem to like the |masm.breakpoint()| instruction -- I don't get anything triggered on the client side.

So, before I step through and observe every load instruction, I'm trying to running a similar trace through the js shell under valgrind-for-arm, which jbramley pointed out the ubuntu PPA for. Got that running on the pegatron box now from the debs here: http://ppa.launchpad.net/linaro-maintainers/overlay/ubuntu/pool/main/v/valgrind/ so I'll see what it turns up.
Attached file Valgrind output on opt build. (obsolete) —
Okay, things get weirder when I look at valgrind results. The valgrind under opt indicates bad things about the strncpy that's being performed in ReplaceCallback, but when I try to abort under those conditions the platform is fine, but valgrind *does* abort. That could either indicate a toolchain bug or, I would think more likely, a bug in the current implementation of this valgrind-for-arm.

Moral of the story is that, without valgrind, I just have to step through the YARR blob and check the address of every write isn't shooting holes in weird parts of memory.

I'm visiting family on the east coast until Monday, at which point I'll go the instruction-by-instruction route. (Will attach the debug valgrind and the offending script after this file.) The ARM boxes I've been using will be on my desk in the meantime, if someone is feeling brave.
Attachment #479264 - Attachment description: Valgrind output on debug build. → Valgrind output on opt build.
tracking-fennec: 2.0b1+ → 2.0b2+
Chris, just so as to rule this out: are you 110% sure there isn't some
kind of D-I cache coherence problem here?  AIUI, YARR uses the
assembler in js/src/assembler, and presumably if that didn't do 
D-I cache syncing properly then JM wouldn't work at all.  Is that
correct?
(In reply to comment #12)
> Chris, just so as to rule this out: are you 110% sure there isn't some
> kind of D-I cache coherence problem here?  AIUI, YARR uses the
> assembler in js/src/assembler, and presumably if that didn't do 
> D-I cache syncing properly then JM wouldn't work at all.  Is that
> correct?

Yes, that's correct. Synchronization is done by ExecutableAllocator::cacheFlush. The methodjit calls that from Compiler::finishThisUp, but I can't see a corresponding call from YARR. (Not directly anyway.)

That would certainly explain why things sometimes work and sometimes don't. Thanks Julian!

Oh, you also have to call forceFlushConstantPool, as finishThisUp does. (It's implicit in executableCopy, I think, but finishThisUp needs to do other stuff in between and expects executableCopy to not change the size of the code buffer.)
Attached patch patchSplinter Review
I think I see a valid path which calls ExecutableAllocator::cacheFlush for all jitted regexps:
 RegexGenerator::compile -> LinkBuffer::performFinalization -> cacheFlush

However, I can't find any call to forceFlushConstantPool.  I also can't find any mention of forceFlushConstantPool in WebKit... how do they get by without this?  Regardless, the attached patch sticks in the call where I think it belongs.  Jacob: does this look right?  (I don't have any environment setup to actually test this, so I'm kindof shooting in the dark.)

Also, it would be good to factor this detail into the Assembler so that it happens automatically and this type of bug can't happen.
Attachment #480181 - Flags: review?(Jacob.Bramley)
(In reply to comment #14)
> Created attachment 480181 [details] [diff] [review]
> patch
> 

Tried this, turned YARR on, and it still crashes.
(In reply to comment #14)
> Created attachment 480181 [details] [diff] [review]

I suspect (with somewhat flimsy evidence) that the method jit is not
doing the proper sync when doing PIC/MIC patching.

With the patch shown below on current m-c, it's possible to see when
an explicit cacheFlush is done (and V is notified).  For a small test
case, this shows a few relatively large address ranges being
invalidated, which I'd guess are jitted method bodies:

  cacheFlush 0x4037000 102
  cacheFlush 0x4033000 105
  cacheFlush 0x4033070 35
  cacheFlush 0x4033098 105
  cacheFlush 0x4037068 4259
  cacheFlush 0x4038110 3067
  cacheFlush 0x4038d18 687

But even with that happening, when I ask valgrind --smc-check=all to
look out for any other D-I incoherence, it reports these very short
(probably single instruction) address ranges as stale

  INVAL BY CHECK: 0x40381C8 7
  INVAL BY CHECK: 0x40381CF 6
  INVAL BY CHECK: 0x4038D57 7
  INVAL BY CHECK: 0x4038D5E 6
  INVAL BY CHECK: 0x4038E33 7
  INVAL BY CHECK: 0x4038E3A 6
  INVAL BY CHECK: 0x40381C8 7
  INVAL BY CHECK: 0x4038350 10

which strike me as what I'd expect to see for PIC/MIC patching.  If
they had been properly invalidated by calling cacheFlush then I would
not expect to see them, so this says to me they lack a cacheFlush call.

This is on x86_64-linux.  We don't need to run on ARM to investigate
D-I coherence lossage.  FTR, what I am running is:

/space2/sewardj/MOZ/MC-02-10-2010/js/src/D64/shell/js -m -e \
  "const platform='linux2'; const libdir='/space2/sewardj/MOZ/MC-02-10-2010/js/src/trace-test/lib/';" \
  -f /space2/sewardj/MOZ/MC-02-10-2010/js/src/trace-test/lib/prolog.js
  -f /space2/sewardj/MOZ/MC-02-10-2010/js/src/trace-test/tests/arguments/args-createontrace.js





Initial patch:

diff --git a/js/src/assembler/jit/ExecutableAllocator.h b/js/src/assembler/jit/ExecutableAllocator.h
--- a/js/src/assembler/jit/ExecutableAllocator.h
+++ b/js/src/assembler/jit/ExecutableAllocator.h
@@ -60,16 +60,26 @@ extern "C" __declspec(dllimport) void Ca
 #if ENABLE_ASSEMBLER_WX_EXCLUSIVE
 #define PROTECTION_FLAGS_RW (PROT_READ | PROT_WRITE)
 #define PROTECTION_FLAGS_RX (PROT_READ | PROT_EXEC)
 #define INITIAL_PROTECTION_FLAGS PROTECTION_FLAGS_RX
 #else
 #define INITIAL_PROTECTION_FLAGS (PROT_READ | PROT_WRITE | PROT_EXEC)
 #endif
 
+// Embed no-op macros that let Valgrind work with the JIT.
+#ifdef MOZ_VALGRIND
+#  define JS_VALGRIND
+#endif
+#ifdef JS_VALGRIND
+#  include <valgrind/valgrind.h>
+#elif !defined(VALGRIND_DISCARD_TRANSLATIONS)
+#  define VALGRIND_DISCARD_TRANSLATIONS(addr, szB)
+#endif
+
 namespace JSC {
 
 // Something included via windows.h defines a macro with this name,
 // which causes the function below to fail to compile.
 #ifdef _MSC_VER
 # undef max
 #endif
 
@@ -223,18 +233,22 @@ public:
     }
 #else
     static void makeWritable(void*, size_t) {}
     static void makeExecutable(void*, size_t) {}
 #endif
 
 
 #if WTF_CPU_X86 || WTF_CPU_X86_64
-    static void cacheFlush(void*, size_t)
+    static void cacheFlush(void* start, size_t len)
     {
+fprintf(stderr, "cacheFlush %p %lu\n", start, len);
+        (void)start;
+        (void)len;
+        VALGRIND_DISCARD_TRANSLATIONS(start, len);
     }
 #elif WTF_CPU_MIPS
     static void cacheFlush(void* code, size_t size)
     {
 #if WTF_COMPILER_GCC && (GCC_VERSION >= 40300)
 #if WTF_MIPS_ISA_REV(2) && (GCC_VERSION < 40403)
         int lineSize;
         asm("rdhwr %0, $1" : "=r" (lineSize));
As an experiment, I made the destructor for RepatchBuffer call
cacheFlush on the range covered by the buffer:


diff --git a/js/src/assembler/assembler/RepatchBuffer.h b/js/src/assembler/assembler/RepatchBuffer.h
--- a/js/src/assembler/assembler/RepatchBuffer.h
+++ b/js/src/assembler/assembler/RepatchBuffer.h
@@ -65,16 +67,22 @@ public:
         if (mprot)
             ExecutableAllocator::makeWritable(m_start, m_size);
     }
 
     ~RepatchBuffer()
     {
         if (mprot)
             ExecutableAllocator::makeExecutable(m_start, m_size);
+        // Force D-I sync for the entire address range in the repatch
+        // buffer.  This might be far more than is really required
+        // (so is a performance hazard) and assumes that the patched
+        // area falls entirely within [m_start, +m_size) (so is a
+        // potential correctness hazard).
+        ExecutableAllocator::cacheFlush(m_start, m_size);
     }
 
     void relink(CodeLocationJump jump, CodeLocationLabel destination)
     {
         MacroAssembler::repatchJump(jump, destination);
     }
 
     void relink(CodeLocationCall call, CodeLocationLabel destination)



That gets rid of most, but not all the observed D-I incoherence
("INVAL BY CHECK:" messages in previous comment) running trace-tests.

But it doesn't get rid of all of them, because sometimes a RepatchBuffer
is used to patch code outside of its stated range.  Here's one place
in MonoIC.cpp:

    void patchInlinePath(JSScript *script, JSObject *obj)
    {
        /* Very fast path. */
        uint8 *start = (uint8 *)ic.funGuard.executableAddress();
        JSC::RepatchBuffer repatch(start - 32, 64);

        ic.fastGuardedObject = obj;

        repatch.repatch(ic.funGuard, obj);
        repatch.relink(ic.funGuard.jumpAtOffset(ic.hotJumpOffset),
                       JSC::CodeLocationLabel(script->ncode));

        JaegerSpew(JSpew_PICs, "patched CALL path %p (obj: %p)\n", start, ic.fastGuardedObject);
    }

The repatch.relink call causes accesses outside the stated range
'(start - 32, 64)' hence the invalidate-cache-in-~RepatchBuffer
strategy fails.  Changing 64 to 128 here stopped it complaining,
 but I'm sure that's not really a correct fix.

Another such case is in PolyIC.cpp,
PICStubCompiler::patchInline(const Shape *shape)
The repatcher created thusly:

253:         PICRepatchBuffer repatcher(pic, pic.fastPathStart);

does not cover the access it is later used for:

281: #elif defined JS_PUNBOX64
282:         repatcher.repatch(pic.storeBack.dataLabel32AtOffset(
                               SETPROP_INLINE_STORE_VALUE), offset);



At this point I gave up trying to fix things, because

(1) I don't know if the strategy of invalidate-cache-in-~RepatchBuffer
    is the right thing to do, and

(2) if (1) is the right thing to do, then all uses of RepatchBuffer
    in {Mono,Poly}IC.cpp and wherever else, need to be checked to
    ensure they fall inside the bounds stated at repatcher-construction
    time.


So I defer to those who know this stuff better than me.
(In reply to comment #16)

> [...] We don't need to run on ARM to investigate D-I coherence lossage.

Scratch that.  The ARM code in MJ and the assembler (and generated
code/data layout etc) is sufficiently different from the x86/x86_64
cases, that we will need to independently verify that invalidation on
ARM is working properly.
(In reply to comment #17)

> (2) if (1) is the right thing to do, then all uses of RepatchBuffer
>     in {Mono,Poly}IC.cpp and wherever else, need to be checked to
>     ensure they fall inside the bounds stated at repatcher-construction
>     time.

RepatchBuffer::RepatchBuffer uses the stated range to ensure that
the area we're about to patch is writable.  So if we're using it
inadvertantly to patch areas outside the stated range, doesn't that
put MJ at risk of segfaulting?  Assuming of course that we're
switching between r-x when running and rw- when patching, and not
merely having code pages be permanently rwx.
Julian - good catch, can you file a bug on the MonoIC patching?

AFAIK, we're not actually flipping rwx bits in JM. The WebKit assembler has that disabled unless it's compiling for the iphone or something.

Even so, we should be protecting the right regions if we do flip it on.
(In reply to comment #14)
> However, I can't find any call to forceFlushConstantPool.  I also can't find
> any mention of forceFlushConstantPool in WebKit... how do they get by without
> this?

I think it's done by executableCopy(ExecutablePool*). JM uses executableCopy(void*), which assumes that the buffer has already been flushed. I added the forceFlushConstantPool method as a kind of hack to get this working. Because we allocate one code pool and copy the fast and slow paths into consecutive regions in the same pool, the existing Macro Assembler API doesn't allow us to use the same mechanism as WebKit.

For example, we cannot call masm.size() (or stubcc.masm.size()) until we've flushed the constant pool, but without forceFlushConstantPool, we can't measure the code size.

Much of the problem here is that JM uses Macro Assembler quite differently to how WebKit does. We have a lot of low-level additions that allow fiddling with the generated code, but WebKit abstracts everything under Macro Assembler. Their system is very neat for their usage, but doesn't facilitate ICs very easily, and makes it tricky to play tricks like assembling fast and slow paths separately.
(In reply to comment #16)
> I suspect (with somewhat flimsy evidence) that the method jit is not
> doing the proper sync when doing PIC/MIC patching.

Your evidence certainly points to this, but it _should_ work. Only MICs are active on ARM at the moment, and they only patch literal-pool entries. (See patchPointerInternal in ARMAssembler.h for the actual patching code.) Literal-pool entries don't require cache maintenance after patching.
(In reply to comment #19)
> RepatchBuffer::RepatchBuffer uses the stated range to ensure that
> the area we're about to patch is writable.  So if we're using it
> inadvertantly to patch areas outside the stated range, doesn't that
> put MJ at risk of segfaulting?

It could indeed, and this is something that I have noticed in the past but haven't had time to investigate. As dvander said, it doesn't actually make any difference right now.
Comment on attachment 480181 [details] [diff] [review]
patch

It should work, but it's clearly not what is causing our crash as Michael still sees it. I'm giving r+ anyway because it belongs there.
Attachment #480181 - Flags: review?(Jacob.Bramley) → review+
(In reply to comment #22)
> (In reply to comment #16)
> > I suspect (with somewhat flimsy evidence) that the method jit is not
> > doing the proper sync when doing PIC/MIC patching.
> 
> Your evidence certainly points to this, but it _should_ work. Only MICs are
> active on ARM at the moment, and they only patch literal-pool entries. (See
> patchPointerInternal in ARMAssembler.h for the actual patching code.)
> Literal-pool entries don't require cache maintenance after patching.

That fits what I'm seeing.  I re-ran the experiment described in
comment #16 on ARM-Linux overnight, and didn't see any D-I
incoherence.  So I'm afraid that's all a red herring as far as this
bug is concerned.  Apologies for the noise.
(In reply to comment #25)
> Apologies for the noise.

No, thank you for trying to do my work for me. :-)

I've got a nice assertion firing on a capture group index of 8 when the input string is length 6 -- stepping through the blob now and reading up on ARM asm as I go.
I got curious and confirmed that the debug fennec build doesn't crash in the same way. Assuming we emit the same JIT'd code in both opt and debug, it has to be something about the way that the JIT'd code interacts with the rest of the world in opt.
Quick update: right now I'm just stepping through the emitted code to double check correctness and being particularly suspicious of the area where the emitted code returns back to the compiled code.

I was thinking that it could be worthwhile to confirm that the emitted code is "the same" in debug and opt builds, which will be possible if the pre-finalized code buffers don't have any jump targets patched.
Found some definite stack corruption using volatile global vars -- abort at:

340         JS_ALWAYS_ASSERT(matchItemCountBefore == matchItemCountAfter);

(gdb) p matchItemCountBefore
$60 = 4
(gdb) p matchItemCountAfter
$61 = 1230121070

The stack pointer itself appears to be fine.
The compiler is not generating code to preserve local reg $r8 across our nice little weirdness:

return JS_EXTENSION((reinterpret_cast<RegexJITCode>(m_ref.m_code.executableAddress()))(input, start, length, output));

Fiddling...
r8 is implicitly clobbered by the ARM macroassembler (it has a special designation as ARMRegisters::S1) for operations where it doesn't have an implicit addressing mode. Saving it away on entry fixed the problem. I'll post a patch that merges this along with other upstream changes.

(On further investigation, this was also fixed in JSC a while back. I'll file a followup to get automated notifications of relevant changes to JSC going.)
Just pushed to try.
Attachment #481108 - Flags: review?(gal)
That was stale (this has the fixed moz-form WTF_CPU macros).
Attachment #479115 - Attachment is obsolete: true
Attachment #479126 - Attachment is obsolete: true
Attachment #479224 - Attachment is obsolete: true
Attachment #479264 - Attachment is obsolete: true
Attachment #479265 - Attachment is obsolete: true
Attachment #479266 - Attachment is obsolete: true
Attachment #481076 - Attachment is obsolete: true
Attachment #481108 - Attachment is obsolete: true
Attachment #481108 - Flags: review?(gal)
Looks like it's probably okay on try, but will check what the failures are tomorrow: http://hg.mozilla.org/try/rev/710c204c4888
Yup, looks like the failures on try are unrelated.
Attachment #481110 - Attachment is obsolete: true
Attachment #481305 - Flags: review?(sayrer)
Was this pushed to mozilla-central in a recent TM merge? If so, is it FIXED?
http://hg.mozilla.org/mozilla-central/rev/2965216f9953
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.