Closed Bug 506693 Opened 15 years ago Closed 11 years ago

SELinux is preventing JIT from changing memory segment access

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
blocking2.0 --- -
status2.0 --- wanted
status1.9.2 --- ?
status1.9.1 --- ?

People

(Reporter: dveditz, Unassigned)

References

Details

Attachments

(2 files, 21 obsolete files)

177.32 KB, patch
Details | Diff | Splinter Review
265.76 KB, patch
Details | Diff | Splinter Review
Jan Lieskovsky from the Red Hat Security Response Team reported the following to Mozilla Security via mail:

While using the Firefox 3.5 version in Fedora 11
(firefox-3.5.1-1.fc11), we noticed the following
security problem, related to Firefox 3.5's new
JavaScript Just-In-Time (JIT) compiler (SELinux setroubleshoot message):

The problem:
===========

<cite>
Summary:

SELinux is preventing firefox from changing a writable memory segment
executable.

Detailed Description:

[SELinux is in permissive mode, the operation would have been denied but was
permitted due to permissive mode.]

The firefox application attempted to change the access protection of memory
(e.g., allocated using malloc). This is a potential security problem.
Applications should not be doing this. Applications are sometimes coded
incorrectly and request this permission. The SELinux Memory Protection Tests
(http://people.redhat.com/drepper/selinux-mem.html) web page explains how to
remove this requirement. If firefox does not work and you need it to work, you
can configure SELinux temporarily to allow this access until the application is
fixed. Please file a bug report
(http://bugzilla.redhat.com/bugzilla/enter_bug.cgi) against this package.

Allowing Access:

If you trust firefox to run correctly, you can change the context of the
executable to execmem_exec_t. "chcon -t execmem_exec_t
'/usr/lib/firefox-3.5/firefox'". You must also change the default file context
files on the system in order to preserve them even on a full relabel. "semanage
fcontext -a -t execmem_exec_t '/usr/lib/firefox-3.5/firefox'"

Fix Command:

chcon -t execmem_exec_t '/usr/lib/firefox-3.5/firefox'
</cite>

Analysis and argumentation:
===========================

Further investigation / review indicated that this is a potential security
issue related to the way the new Just-In-Time compiler is designed.
Here is the argumentation from Ulrich Drepper related to this issue:

<snip>
It's simply not acceptable that the program which is most susceptible to the
problems the Internet exposes a machine to is the one which doesn't follow the
rules for secure programming.  If the new JIT requires writable and executable
memory it either has to be rewritten (have two mappings for the same data: one
writable, one executable) or it has to be disabled. 
</snip>

Conclusion:
===========

This leads to the following proposed solution -- if the newest Mozilla Firefox 3.5
JavaScript JIT compiler needs to map the relevant memory area for both writing and
execution, it should do that via two separate mappings. As Ulrich
already pointed out, the current implementation is prone to security attacks,
as demonstrated in the wild by:

    https://bugzilla.mozilla.org/show_bug.cgi?id=503286
    (CVE-2009-2477 CVE-2009-2478 CVE-2009-2479)

Separating the 'write' and 'exec' memory mappings into two separate requests
could prevent the occurrence of such exploits / flaws in the future.

  Ulrich also recommends / provides sample code showing how this problem can be
  solved in a secure way:
    http://people.redhat.com/drepper/selinux-mem.html

Could you have a further look into this issue and fix / reimplement the relevant
parts of Mozilla Firefox 3.5's JIT compiler to address this deficiency?
Whiteboard: [sg:investigate]
The basic SELinux problem is that you can't map writable and executable at the same time, correct?

I don't see any reason this needs to be private: it's well known.
Could I ask for adding stransky@redhat.com to the CC list so he can see this?
(In reply to comment #2)
> Could I ask for adding stransky@redhat.com to the CC list so he can see this?

Done.
blocking1.9.1: ? → ---
status1.9.1: --- → ?
I'll note that I agree this should not be private.

Also, this bug will block Fedora 12 which will deny execmem by default, in SELinux enforcing mode.  We had to revert that feature temporarily to be able to ship our alpha, but the intention is to ship it for final.
Anyone there who can help with a patch to the JIT to get the better behaviour?
Assignee: general → gal
I think graydon's merge work will help this, but I assume we have to fix this for 3.5.x so we are going to have to patch the old fragmento. I have a patch for r/o code fragments somewhere in bugzilla.
Depends on: 473872
Yeah. I'm hesitant to accept Ulrich's reasoning on this. The theory is that someone who wants to attack us might be able to pull off said attack by finding a bug and exploiting it to write to executable memory (this much I agree with) but is then *not* going to be able to pull off the exact same attack when it's done via "dual mapping", writing their attack into the W mapping and then convincing the program to run it via the X mapping.

The argument rests on the belief that "twice random = more random". It's a nice thought, but it's also a lot of work to implement, and I don't see anything other than wishful thinking motivating it. Just this notion that "since it's more complicated, it'll be secure". If that were true, we could just throw in a few more randomized hashtables in the middle of our page-mapping data structures and declare ourselves secure.
I think we have a unique opportunity with traces that we can actually implement a read-only code cache, since traces are specialized at recording time and don't require on-the-fly patching like PICs do. The previous patch was pretty expensive because we used to have 4k pages and we had to set and revoke the execute bit for each page individually. With the new code allocator that should be less of an issue. What I still don't know how to solve is how to address concurrent code cache access. As long we keep one code cache per thread though, that should be fine.
Dan, I would like to unhide the bug to get more feedback. I am pretty sure it's widely known that our code cache is writable and executable, and I don't think this bug contains information that warrants hiding it.
For what it's worth, I agree completely with comment 7.  Executing dynamically generated code is what can get you into trouble; you still have that with the proposed workaround.

I also agree completely with comment 9.  Are there any other serious JIT engines which actually do the two-views-of-one-page hackaround?  I'm not aware of any, which makes me think anyone malicious is going to expect precisely what we do without having looked, and it's not exactly hard to look and verify that assumption in any case.
Group: core-security
Let me say one thing first:

getting writable and executable memory in any form is nowadays a privilege, not a right.

I think even today's rules as implemented in SELinux are too loose and should be tightened even further.

In this light those who insist on having writable memory have the obligation to do everything possible to justify the trust.  Reading Graydon's comments does exactly the opposite.

Of course using the double-mapping scheme doesn't solve the problem 100%.  Only disallowing writable&executable memory completely does it.  If this is what you prefer, fine, it's easy enough to implement in the OS.

But this is not in the best interest of everybody and therefore we have to mitigate the risk with all means.  The double mapping does help, as simple math tells you.  If an attacker can guess the address of a mapping with, let's say, 5% probability (very high), then guessing both addresses at the same time succeeds only 0.25% of the time.  That's a significant reduction.

If only one address is needed, an attacker can write position-independent code which just has to be injected.  On x86-64 we have an appropriate addressing mode (relative to %rip).  Using this makes it far too easy to write exploits.  If the writable region is elsewhere, this becomes much harder.

It is no extravagant request/demand to fix this.  It is just common sense.  Firefox is *by far* the most problematic application, security-wise.  5 out of 7 critical advisories for RHEL5.3 were for firefox (see Mark Cox's blog).  *Anything* that can help here, reducing the severity level or whatever, must be done.

The changes also shouldn't be that problematic, unless the code is truly chaotic:

- change buffer management to keep two buffers aligned in size
- wrap all places where code/data is injected
- do some pointer arithmetic before the actual poking

If there are ways in which we (= Red Hat) can help let any of us know.  We'll see what can be done.
Personally I am shocked security folks are floating the double mapped memory hack. It seems like false security through obscurity to me, with very little randomization, even if we accept your math above. In my little academic ivory tower, executable and writable bits shouldn't co-exist, and the kernel should enforce this. I would prefer implementing this strategy in ff, but unfortunately it seems to have a pretty significant performance cost (ballpark 5% last time I measured).

The root cause for this overhead is the fact that dynamically typed programming languages like JS require frequent code cache updates, either because code is rewritten on the fly (PICs, V8 says hello) or in case of TraceMonkey a lot of specialized read-only code fragments are assembled in very quick succession, interleaved with actual code execution. Flipping the read/write/exec bits every time we update the code cache seems impractical. The syscall takes a small eternity, not to mention the havoc that causes at the hardware level (TLBs and what not).

An additional complication I ran into is concurrent code generation. I am playing around with a background compiler thread, and revoking the execute bit becomes extremely tricky without tripping the foreground thread that runs code from the same code cache.

Both frequent code cache updates and concurrent updates would be overhead-free with the proposed double-mapped page approach, so I am willing to go that route as long as there is buy-in from our security folks that this is actually a valid fix for the writable code cache issue.

As for an actual implementation, it's not as easy as the 1. 2. 3. in the above post. Our code cache is chunked, and native code fragments can span multiple regions in multiple chunks. Figuring out for each region what the local mapping to the sibling page(s) is will not be trivial, especially if we want the offset to differ for each chunk (which we want for randomization, but also because address space fragmentation will likely not always allow getting additional pages at a constant offset).

Ed, what is your take on this?
(In reply to comment #12)
> The root cause for this overhead is the fact that dynamically typed programming
> languages like JS require frequent code cache updates, either because code is
> rewritten on the fly (PICs, V8 says hello) or in case of TraceMonkey a lot of
> specialized read-only code fragments are assembled in very quick succession,
> interleaved with actual code execution. Flipping the read/write/exec bits every
> time we update the code cache seems impractical.

Nobody suggests this.  With two mappings you can always have a writable segment.  It seems you haven't even tried to understand what I described in the page linked in the original report.
(In reply to comment #13)
> Nobody suggests this.  With two mappings you can always have a writable
> segment.  It seems you haven't even tried to understand what I described in the
> page linked in the original report.

*Andreas* suggests this, since as you say the current different-mapping limitation is a mitigation, not a solution to the problem.  It would actually solve the problem if we didn't have any memory that was concurrently mapped executable and writable (via one or multiple address regions, the attacker probably doesn't care), but unfortunately the performance characteristics of making such changes frequently aren't really viable.

It seems you haven't tried to understand the comment to which you replied?

Thanks for your offer of help -- I think that Graydon and Andreas can point you to the appropriate code, and I know they'd be willing to review a patch.
In case this is helpful for you (to identify the particular code parts),
this was reported in the relevant Fedora bug BZ#509945:

<cite>
This is where xulrunner traces back when executable/writable mappings are
disallowed:

Program received signal SIGSEGV, Segmentation fault.
nanojit::LirBufWriter::ins0 (op=<value optimized out>, this=<value optimized
out>, this=<value optimized out>, 
    op=<value optimized out>) at nanojit/LIR.cpp:315
315                     l->initOpcode(op);
Missing separate debuginfos, use: debuginfo-install firefox-3.5-2.fc12.i586
(gdb) bt
#0  0x00a3890d in nanojit::LirBufWriter::ins0 (op=<value optimized out>,
this=<value optimized out>, 
    this=<value optimized out>, op=<value optimized out>) from
/usr/lib/xulrunner-1.9.1/libmozjs.so
#1  0x00a0202f in RegExpNativeCompiler::compile (this=0xbfffa26c,
cx=0xb1ba2c00) at jsregexp.cpp:2411
#2  0x009f94aa in CompileRegExpToNative (re=<value optimized out>, cx=<value
optimized out>, fragment=<value optimized out>)
    at jsregexp.cpp:2475
#3  GetNativeRegExp (re=<value optimized out>, cx=<value optimized out>,
fragment=<value optimized out>)
    at jsregexp.cpp:2510
#4  MatchRegExp (re=<value optimized out>, cx=<value optimized out>,
fragment=<value optimized out>) at jsregexp.cpp:3922
#5  js_ExecuteRegExp (re=<value optimized out>, cx=<value optimized out>,
fragment=<value optimized out>)
    at jsregexp.cpp:4090
#6  0x009fff92 in regexp_exec_sub (cx=0xb1ba2c00, obj=<value optimized out>,
argc=1, argv=0xb15cd0c0, test=1, 
    rval=0xb15cd0b8) at jsregexp.cpp:4889
#7  0x00a00062 in regexp_test (cx=0xb1ba2c00, argc=1, vp=0xb15cd0b8) at
jsregexp.cpp:4911
#8  0x009c8d99 in js_Interpret (cx=0xb1ba2c00) at jsinterp.cpp:5147

Just debuginfo-install xulrunner and you'll see several instances of such
mappings:

/usr/src/debug/xulrunner-1.9.1/mozilla-1.9.1/js/src/nanojit/Assembler.cpp
/usr/src/debug/xulrunner-1.9.1/mozilla-1.9.1/js/src/nanojit/Assembler.h
/usr/src/debug/xulrunner-1.9.1/mozilla-1.9.1/js/src/nanojit/Nativei386.cpp
/usr/src/debug/xulrunner-1.9.1/mozilla-1.9.1/js/src/nanojit/avmplus.h

And a rather funny comment as well:

    #elif defined AVMPLUS_UNIX
            /**
             * Don't use normal heap with mprotect+PROT_EXEC for executable
code.
             * SELinux and friends don't allow this.
             */
            return mmap(NULL,
                        pages * kNativePageSize,
                        PROT_READ | PROT_WRITE | PROT_EXEC,
                        MAP_PRIVATE | MAP_ANON,
                        -1,
                        0);
    #else  

</cite>

Cc-ed also original Fedora reporter Lubomir Rintel to this bug.
Thanks, we know where the regions are coming from, it's all quite centralized. 

I wish I could say Ulrich's response "shocks" me, but it's what I knew we'd get here. I still don't think this will make us one iota safer: heapspray through the writable mapping (of the sort we were just exploited by) will still work, and %rip-relative addressing is a red herring (you can build PIC code if you can find a single GOT-derived address in a register or on the stack). But that's beside the point.

The plain fact is that one can't win an argument like this with Ulrich, so we really only have two options: do what he says, or close the bug and/or maintain a perpetual disagreement with him. I don't have the energy for the latter. It'll be less work to just accept his preference and implement it. Let's put this on the "post merge" list of feature work on the code allocator and be done with it.
I'm working on a patch based on the double-mapped pages. The page allocating/altering seems to work but I expect the code is generated as position dependent and generated relatively to the write-pages. Can you point me where I should look when I want to generate the code relatively to the execution pages?
(In reply to comment #17)
> I'm working on a patch based on the double-mapped pages. The page
> allocating/altering seems to work but I expect the code is generated as
> position dependent and generated relatively to the write-pages. Can you point
> me where I should look when I want to generate the code relatively to the
> execution pages?

This is going to be a long effort. Look in the backends, such as nanojit/Nativei386.cpp and its associated header nanojit/Nativei386.h. See, for example, the macro JMP, which composes a pc-relative jump instruction. This is based on the distance to its target, but its target may not be in the same contiguous chunk of memory / page. So if you're going to have each writable page correspond to an executable page, all such arithmetic is going to have to change to go indirectly through a mapping table that transforms "addresses in a writable page" to "addresses in an executable page".

This is why I said "it'll be a lot of work". The code might be position-independent (or it might be possible to *make* it position-independent) but it's certainly not the case that all code referencing into or out of page X is contained within page X. Or even "multi-page chunk X". Instructions are written repeatedly into tiny bits of memory scattered all over the CodeAlloc's pages. You're going to be doing a lot of book-keeping to map between each writable bit and its executable partner.
Attached patch WIP (obsolete) — Splinter Review
First work in progress version, applies to 1.9.1.12 and passes trace-test.js. Should work fine (just need to change the home dir at GCHeap::Alloc()).

A short description&discussion:

- Each Page has attached its executable mirror (exec pointer at PageHeader)

- write_to_exec() translates any address in a write page to the corresponding executable page. Page verification is managed by the PAGE_SIGNATURE hack (should we traverse the page list here? disable the check? or something else?)

- The calcOffset() macro calculates the offset in executable space. Note that the source address has to be decremented: some code is generated from the end of the page, so the effective jump offset is actually calculated relative to the next (but unrelated) page. So we have to differentiate the source & target addresses.

- CALL (c->_address) is not translated; it is supposed to be in the code segment only. Do we even use calls into generated code?

- CALLr(c,r) is ignored for now.

- GCHeap::Alloc() has to map some sane tmp file.
Assignee: gal → stransky
Why is the PAGE_SIGNATURE hack necessary? Is that intended for debugging only? Why would we ever touch a page we don't have to translate?
(In reply to comment #20)
> Why is the PAGE_SIGNATURE hack necessary? Is that intended for debugging only?
> Why would we ever touch a page we don't have to translate?

I used it for debugging, and because I don't know how the code is composed and whether all translated addresses / jump targets are from our page space (e.g. destinations of CALL()). It can be removed if we don't translate any foreign addresses.

From my current experience (run trace-test.js) the PAGE_SIGNATURE check is not necessary.
(In reply to comment #20)
> Why is the PAGE_SIGNATURE hack necessary? Is that intended for debugging only?
> Why would we ever touch a page we don't have to translate?

btw. The PAGE_SIGNATURE check could be simplified to check whether the executable page points to itself:

assert(pageExec->exec == pageExec);

Another option is to add a reference to the write page to the page header, so we can provide full exec <=> write translation (and checks) like so:

struct PageHeader
{
  struct Page *next;
  struct Page *write;
  struct Page *exec;
};

assert(pageWrite->write == pageWrite);
pageExec = pageWrite->exec;
assert(pageExec->exec == pageExec);

Anyway, I'd like to hear whether you agree with the overall concept. If so, I'll provide a polished version.
(In reply to comment #19)
> Created an attachment (id=399047) [details]
> WIP
> 
> First work in progress version, applies to 1.9.1.12 and passes trace-test.js.
> Should work fine (just need to change home dir at GCHeap::Alloc().

You're working on old code (perhaps 3.5 release series?)

We replaced the entire allocation system after that release, and it is no longer based on single-page allocations or a Fragmento. Please base your work on trunk, unless you're just hoping to backport to 3.5.
Oops, I just learned that the new allocator is not even on 1.9.2 / 3.6 branch yet. So we're ... still in transition. But shifting 1.9.2 to the new allocator is proposed. May or may not happen. Trunk definitely has switched, as has Adobe's code, and it's the interface we're hoping to use for the future.
Martin, I am looking through your patch. I will give you some feedback based on it. As Graydon pointed out we reworked the code allocator path, so we would have to port to that. 3.6 will be based off the old code afaik.
Looks like the new code allocator _will_ land for 3.6, so this makes reworking the patch for the new CodeAlloc even more important.
(In reply to comment #23)
> We replaced the entire allocation system after that release, and it is no
> longer based on single-page allocations or a Fragmento. Please base your work
> on trunk, unless you're just hoping to backport to 3.5.

Yes, we (Red Hat) really want it for Fedora 12/ff3.5. But I will definitely work on a patch for the trunk too.
Attached patch WIP v2 (for trunk) (obsolete) — Splinter Review
Here is the updated trunk version. 

CodeAlloc::freePage(), CodeAlloc::allocPage() manage the physical memory allocation and CodeAlloc::getPage(NIns* p) returns appropriate page (CodeList *) for p. Address translation is realised by CodeList::toExec().

(btw. If pagesPerAlloc = 1 we can drop the page traversal at CodeAlloc::getPage() and just round up the address like in the first patch)
Ping, any feedback here?
Sorry for the delay. I will review the patch.
Comment on attachment 402318 [details] [diff] [review]
WIP v2 (for trunk)

> 
> #ifdef NANOJIT_IA32
>     void Assembler::patch(SideExit* exit, SwitchInfo* si)
>     {
>         for (GuardRecord* lr = exit->guards; lr; lr = lr->next) {
>             Fragment *frag = lr->exit->target;
>             NanoAssert(frag->fragEntry != 0);
>-            si->table[si->index] = frag->fragEntry;
>+            si->table[si->index] = TO_EXEC(frag->fragEntry);


We are trying to get rid of macro magic in NJ. If there is no strong reason to use a macro here, please don't.

>         underrunProtect(si->count * sizeof(NIns*) + 20);
>         _nIns = reinterpret_cast<NIns*>(uintptr_t(_nIns) & ~(sizeof(NIns*) - 1));
>         for (uint32_t i = 0; i < si->count; ++i) {
>             _nIns = (NIns*) (((uint8*) _nIns) - sizeof(NIns*));
>-            *(NIns**) _nIns = target;
>+            *(NIns**) _nIns = TO_EXEC(target);
>         }

Why not adjust this outside the loop?

>+nanojit::CodeAlloc::allocCodeChunk(size_t nbytes, void *&exec) {
>+    char tmpfname[] = "/home/komat/.execmemXXXXXX";

What's the final plan for this? /tmp or env[HOME]? Are there any security implications here? Reliability issues? What if the file exists at startup? Make sure you understand all aspects of this (because I am not sure I do).

> #elif defined AVMPLUS_UNIX
>     // fixme: __clear_cache is a libgcc feature, test for libgcc or gcc
>     void CodeAlloc::flushICache(CodeList* &blocks) {
>         for (CodeList *b = blocks; b != 0; b = b->next) {
>+            // TODO -> translate to exec space?

Yeah, we should translate here for sure.

>             __clear_cache((char*)b->start(), (char*)b->start()+b->size());
>         }
>     }
> #endif // AVMPLUS_MAC && NANOJIT_PPC
> 
>     void CodeAlloc::addBlock(CodeList* &blocks, CodeList* b) {
>         b->next = blocks;
>         blocks = b;
>     }
> 

>+        /** return true if just this block contains p */
>+        bool contains(NIns* p, uintptr_t blockSize)  { return uintptr_t(p) >= uintptr_t(start()) && 
>+                                                              uintptr_t(p) <  uintptr_t(start()) + blockSize; }

Style. Use
foo
{
}          

>+
>+        bool hasExecPage() { return write != exec; }
>+
>+        /** converts address to exec page */
>+        NIns* toExec(NIns *p) 
>+        { 
>+            if(!p || !hasExecPage())
>+                return(p);

Can these cases really happen? Might be better to assert here and only hand in useful conversion requests.

>+
>+            assert(p >= write->start());
>+            return (NIns*)(uintptr_t(exec) + (uintptr_t(p) - uintptr_t(write)));
>+        }
>+
>+        /** converts address to write page */
>+        NIns* toWrite(NIns *p) 
>+        { 
>+           if(!p || !hasExecPage())
>+                return(p);

Dito here.

>+
>+            assert(p >= exec->start());
>+            return (NIns*)(uintptr_t(write) + (uintptr_t(p) - uintptr_t(exec)));
>+        }
>+   };
> 

Looks promising. Thanks for working on this. The next round we should ask someone from Adobe to review the patch.
> >         underrunProtect(si->count * sizeof(NIns*) + 20);
> >         _nIns = reinterpret_cast<NIns*>(uintptr_t(_nIns) & ~(sizeof(NIns*) - 1));
> >         for (uint32_t i = 0; i < si->count; ++i) {
> >             _nIns = (NIns*) (((uint8*) _nIns) - sizeof(NIns*));
> >-            *(NIns**) _nIns = target;
> >+            *(NIns**) _nIns = TO_EXEC(target);
> >         }
> 
> Why not adjust this outside the loop?

Oh, sure, it's just overlooked.

> >+nanojit::CodeAlloc::allocCodeChunk(size_t nbytes, void *&exec) {
> >+    char tmpfname[] = "/home/komat/.execmemXXXXXX";
> 
> Whats the final plan for this? /tmp or env[HOME]? Are there any security
> implications here? Reliability issues? What if the file exists at startup? Make
> sure you understand all aspects of this (because I am not sure I do).

Me neither but http://gcc.gnu.org/viewcvs/trunk/libffi/src/closures.c?revision=150042&view=markup contains some ideas how to manage it. And I believe Ulrich can help here too.

> >+
> >+        bool hasExecPage() { return write != exec; }
> >+
> >+        /** converts address to exec page */
> >+        NIns* toExec(NIns *p) 
> >+        { 
> >+            if(!p || !hasExecPage())
> >+                return(p);
> 
> Can these cases really happen? Might be better to assert here and only hand in
> useful conversion requests.

Sometimes offsets are NULL. And !hasExecPage() happens when the double-page design is not enabled (win and other arches).

> Looks promising. Thanks for working on this. The next round we should ask
> someone from Adobe to review the patch.

Thanks. If you agree with the overall concept, I'll move toward the x86_64 implementation and the correct tmpfname[].
(In reply to comment #32)
> Me neither but
> http://gcc.gnu.org/viewcvs/trunk/libffi/src/closures.c?revision=150042&view=markup
> contains some ideas how to manage it. And I believe Ulrich can help here too.

The libffi code is the best possible solution.  It covers all the bases.  There are systems where home cannot be used for mmap and others where /tmp is mounted with noexec.  There must be one place where mmap(PROT_EXEC) is possible and this code should find it.  I suggest you implement the same for moz.
We should make sure the patch does not cause any overhead for systems where this won't be enabled.
Attached patch v3 (obsolete) — Splinter Review
Third version, includes the dynamic file allocation, i386 only (for now).
Attachment #402318 - Attachment is obsolete: true
Attachment #405024 - Flags: review?(gal)
Comment on attachment 405024 [details] [diff] [review]
v3

>-    {
>+    {        

Bogus tab at the end.

>+        // TODO -> Do we need to adjust something here for the separated read/write pages?
>         for (CodeList *b = blocks; b != 0; b = b->next)
>             VALGRIND_DISCARD_TRANSLATIONS(b->start(), b->size());

Lets get some feedback from Nick or Julian on this.


>+        /** convert address to exec page */
>+        NIns* toExec(NIns *p) 
>+        { 
>+            if(!p || !hasExecPage())
>+                return p;

Why is this not an assert? How can an already executable page end up here?

>+        { 
>+           if(!p || !hasExecPage())
>+                return p;
>+

Dito.

Lets get Ed to take a look too. I am pretty sure Adobe will want 0 overhead for this on Win32 if we don't use it there, so we would have to provide empty toWrite toExec wrappers for all platforms where this isn't going to be used.
Attachment #405024 - Flags: review?(edwsmith)
(In reply to comment #36)
> 
> >+        // TODO -> Do we need to adjust something here for the separated read/write pages?
> >         for (CodeList *b = blocks; b != 0; b = b->next)
> >             VALGRIND_DISCARD_TRANSLATIONS(b->start(), b->size());
> 
> Lets get some feedback from Nick or Julian on this.

I don't know this code, but it looks reasonable.  In general, if you generate code over the top of previously executed code, Valgrind needs to be told about the affected memory region via VALGRIND_DISCARD_TRANSLATIONS.  And that's what the above code seems to do.
Attached patch v4 (obsolete) — Splinter Review
Should address the comments. Added different toExec/calcOffset methods for unsupported platforms and fixed the valgrind notification.
Attachment #405024 - Attachment is obsolete: true
Attachment #406206 - Flags: review?(gal)
Attachment #405024 - Flags: review?(gal)
Attachment #405024 - Flags: review?(edwsmith)
Attached patch v5 (obsolete) — Splinter Review
Ah, hg diff ignores new files so the v4 is missing the MemMap class. Attaching a correct patch now.
Attachment #406206 - Attachment is obsolete: true
Attachment #406207 - Flags: review?(gal)
Attachment #406206 - Flags: review?(gal)
Attachment #406207 - Flags: review?(rreitmai)
Attachment #406207 - Flags: review?(edwsmith)
You should add this to your .hgrc:

# Use git diffs for binary diffing, emulate -p
[diff]
git = 1
showfunc = true
Ping - any update here?
Preliminary comments

* I can't find anything obviously wrong with the patch.  it looks like the invariant throughout Assembler is that code pointers point to writeable memory, and at every point we need the executable address, we use toExec().  cool... a final review will be needed just to make sure we don't miss a spot.

one way to mitigate that risk is to use a different pointer type for pointers to executable memory vs pointers to writeable memory.

* it's also important that the machinery for double-mapping memory be encapsulated (all the code with pathnames in it is clearly machine specific).  I think that's been done but the patch attached doesn't preserve filenames.

* the working version of toExec has lots of code in it, which worries me.  it could slow down the assembler by a lot.  I think the CodeList data structure should only track writeable memory (no twins) and have just one field, the offset to the executable memory.  The Assembler tracks the current pages being used, so the fast path through Assembler::toExec should be:  { return p + exec_offset } where exec_offset is computed once and only updated when we allocate new pages.

(its value also needs to be swapped in the swapCodeChunks() function when we swap codeStart/exitStart, etc).

is this feasible?  do we ever have to do anything with CodeList structures when starting solely from an executable address (i.e. I think all the calculations start with write addresses)?

If the fast path is { p + offset }, then in non-double-mapped builds offset can be a constant 0 and we don't need as many ifdefs.
Attachment #406207 - Flags: review?(edwsmith) → review-
Thanks for the comments!

> one way to mitigate that risk is to use a different pointer type for pointers
> to executable memory vs pointers to writeable memory.

Do you mean something like NIns* and NInsExec* ?

> * It's also important that the machinery for double-mapping memory be
> encapsulated (all the code with pathnames in it is clearly machine specific).
> I think that's been done, but the patch attached doesn't preserve filenames.
 
I'm not sure what you mean by "pathnames" here...

> * the working version of toExec has lots of code in it, which worries me.  it
> could slow down the assembler by a lot.  I think the CodeList data structure
> should only track writeable memory (no twins) and have just one field, the
> offset to the executable memory.  The Assembler tracks the current pages being
> used, so the fast path through Assembler::toExec should be:  { return p +
> exec_offset } where exec_offset is computed once and only updated when we
> allocate new pages.
> [....] 

Okay, I'll try to move the address translation to a higher level.
Let's divide the patch into smaller pieces. This one contains the secure memory allocation for Linux systems.
Attachment #417089 - Flags: review?(gal)
Here is a patch with the requested changes:

- The offset to the current executable pages has been moved to Assembler::_execOffset / Assembler::_exitExecOffset; they are set up by Assembler::codeAlloc().

- Assembler::toExec() simplification.

- CodeList::toExec has been removed completely; page twins have been
replaced by execOffset.

- Assembler::swapCodeChunks() update (i386 only for now)

I haven't done the NIns->NInsExec change for now.
Attachment #406207 - Attachment is obsolete: true
Attachment #417098 - Flags: review?(edwsmith)
Attachment #406207 - Flags: review?(rreitmai)
Attachment #406207 - Flags: review?(gal)
(In reply to comment #43)
> Thanks for the comments!
> 
> > one way to mitigate that risk is to use a different pointer type for pointers
> > to executable memory vs pointers to writeable memory.
> 
> Do you mean something like NIns* and NInsExec* ?

yes

> > * It's also important that the machinery for double-mapping memory be
> > encapsulated (all the code with pathnames in it is clearly machine specific).
> > I think that's been done, but the patch attached doesn't preserve filenames.
> 
> I'm not sure what do you mean with "pathnames" here...

I was referring to the code that creates the temporary file, unlinks it, then maps it twice.  You've already factored it out of CodeAlloc, which addressed my concern.

> > * the working version of toExec has lots of code in it, which worries me.  it
> > could slow down the assembler by a lot.  I think the CodeList data structure
> > should only track writeable memory (no twins) and have just one field, the
> > offset to the executable memory.  The Assembler tracks the current pages being
> > used, so the fast path through Assembler::toExec should be:  { return p +
> > exec_offset } where exec_offset is computed once and only updated when we
> > allocate new pages.
> > [....] 
> 
> Okay, will try move the address translation to higher level.
Does freeCodeChunk() need to take the exec address as an argument as well?  Seems like if allocCodeChunk returns two addresses then freeCodeChunk should free (munmap) them both at the same time.

In the patch for bug 460993, I'm moving protect calls out of CodeAlloc and into a host API; that way the host controls how/when page protections are set.  If we extend the API to handle addr and exec in each call, would it clean things up?

My last concern is still that we could spend too much compile time traversing data structures in calcOffset().  Some benchmark results would alleviate the concern.
(In reply to comment #47)
> my last concern is still that we could spend too much compile time traversing
> data structures in calcOffset().  Some benchmark results would alleviate the
> concern.

Which benchmark/CPU combination would you like to see? I did some SunSpider runs
(from http://www2.webkit.org/perf/sunspider-0.9/sunspider-driver.html) on an Intel Core 2 box (2 GHz) and there was no difference between the patched and unpatched versions.
Here is a SunSpider sample, run on Core 2, trunk, debug build, optimizations disabled.

Patched:
http://www2.webkit.org/perf/sunspider-0.9/sunspider-results.html?%7B%223d-cube%22:%5B516,513,514,516,509%5D,%223d-morph%22:%5B87,86,86,88,85%5D,%223d-raytrace%22:%5B349,349,352,354,348%5D,%22access-binary-trees%22:%5B156,157,157,157,155%5D,%22access-fannkuch%22:%5B272,255,256,256,263%5D,%22access-nbody%22:%5B130,138,131,130,134%5D,%22access-nsieve%22:%5B48,49,47,48,48%5D,%22bitops-3bit-bits-in-byte%22:%5B2,2,2,2,3%5D,%22bitops-bits-in-byte%22:%5B12,12,12,12,13%5D,%22bitops-bitwise-and%22:%5B4,4,4,4,4%5D,%22bitops-nsieve-bits%22:%5B79,82,78,79,79%5D,%22controlflow-recursive%22:%5B19,19,19,18,19%5D,%22crypto-aes%22:%5B219,217,215,219,264%5D,%22crypto-md5%22:%5B99,99,98,102,99%5D,%22crypto-sha1%22:%5B37,38,37,37,37%5D,%22date-format-tofte%22:%5B674,678,671,674,672%5D,%22date-format-xparb%22:%5B315,318,315,317,316%5D,%22math-cordic%22:%5B31,31,31,31,31%5D,%22math-partial-sums%22:%5B37,36,38,37,36%5D,%22math-spectral-norm%22:%5B25,25,25,25,25%5D,%22regexp-dna%22:%5B122,119,120,119,123%5D,%22string-base64%22:%5B106,106,106,105,107%5D,%22string-fasta%22:%5B476,479,473,473,497%5D,%22string-tagcloud%22:%5B514,512,515,518,517%5D,%22string-unpack-code%22:%5B650,642,661,645,648%5D,%22string-validate-input%22:%5B343,361,360,351,355%5D%7D

Not patched:
http://www2.webkit.org/perf/sunspider-0.9/sunspider-results.html?%7B%223d-cube%22:%5B511,511,516,505,517%5D,%223d-morph%22:%5B85,85,84,82,85%5D,%223d-raytrace%22:%5B350,345,345,345,345%5D,%22access-binary-trees%22:%5B160,158,157,158,161%5D,%22access-fannkuch%22:%5B250,251,264,264,250%5D,%22access-nbody%22:%5B127,125,126,122,129%5D,%22access-nsieve%22:%5B47,47,48,50,47%5D,%22bitops-3bit-bits-in-byte%22:%5B2,2,2,2,2%5D,%22bitops-bits-in-byte%22:%5B13,13,13,12,13%5D,%22bitops-bitwise-and%22:%5B3,3,3,3,3%5D,%22bitops-nsieve-bits%22:%5B79,80,79,79,78%5D,%22controlflow-recursive%22:%5B19,19,18,18,19%5D,%22crypto-aes%22:%5B214,214,217,214,212%5D,%22crypto-md5%22:%5B98,98,97,96,97%5D,%22crypto-sha1%22:%5B36,36,36,37,36%5D,%22date-format-tofte%22:%5B664,665,663,665,662%5D,%22date-format-xparb%22:%5B310,309,309,311,307%5D,%22math-cordic%22:%5B31,33,31,31,31%5D,%22math-partial-sums%22:%5B36,36,36,36,37%5D,%22math-spectral-norm%22:%5B25,25,25,25,25%5D,%22regexp-dna%22:%5B119,123,121,121,118%5D,%22string-base64%22:%5B106,105,105,108,108%5D,%22string-fasta%22:%5B499,500,498,497,497%5D,%22string-tagcloud%22:%5B506,517,515,511,512%5D,%22string-unpack-code%22:%5B651,634,637,636,633%5D,%22string-validate-input%22:%5B353,354,350,359,354%5D%7D

Total results:
--------------
Patched: 5335.2ms +/- 0.7%
Not patched: 5288.4ms +/- 0.2%

As for calcOffset() and the traversal of allocated pages: there are usually only 2-3 physical pages allocated, so the page search is pretty fast. I could implement a page cache, but it looks pointless for now.

Unfortunately I can't test the patch from bug 460993; protect-impl.patch does not apply to the Mozilla tree.
I did a small optimization to CodeAlloc::toExec()/CodeAlloc::calcOffset():

+    NIns* CodeAlloc::toExec(NIns* p, CodeList* page) {
+        if(!p) {
+            return NULL; 
+        }
+
+        if (!page)
+            page = getPage(p);
+
+        assert(page != NULL);
+
+        NIns *out = (NIns*)(uintptr_t(p) + page->execOffset);
+   
+        if (verbose)
+            avmplus::AvmLog("toExec write %p -> exec %p page = %d\n", p, out, page);
+       
+        return out;
+    }
+
+    uintptr_t CodeAlloc::calcOffset(NIns* target, NIns* source) {


+        CodeList* target_page = getPage(target);
+        CodeList* source_page = getPage(source);
+
+        uintptr_t offset;
+           
+        if(target_page == source_page) {
+            // both addresses are from one continuous memory block
+            offset = uintptr_t(target) - (uintptr_t(source) + sizeof(NIns));
+        }
+        else {
+            // different pages - run the translation machinery
+            uintptr_t t = uintptr_t(toExec(target, target_page));
+            uintptr_t s = uintptr_t(toExec(source, source_page));
+            assert(t != 0 || s != 0);
+       
+            offset = t - (s + sizeof(NIns));
+        }
+      
+        if (verbose)
+            avmplus::AvmLog("calcOffset target %p, source %p -> offset %p\n", target, source+1, offset);
+          
+        return offset;
+    }
BTW, with this small optimization in CodeAlloc::toExec()/CodeAlloc::calcOffset(), CodeAlloc::toExec() can take a page argument, so when it's called from calcOffset(), getPage() is computed only once.
Ping. Is there anything else you need?
Comment on attachment 417098 [details] [diff] [review]
write/exec code with requested changes

I don't see anything else glaringly wrong.  In Tamarin we decided to go with a more heavy-handed page protection scheme (flip between RX/RW with no dual mapping), but if TM wants this scheme to facilitate faster patching, it should be doable.  Again, the key is not impacting performance or maintainability when single-mapped code pages are wanted by the host VM.
Attachment #417098 - Flags: review?(gal)
Attachment #417098 - Flags: review?(edwsmith)
Attachment #417098 - Flags: review+
(In reply to comment #53)
> (From update of attachment 417098 [details] [diff] [review])
> In Tamarin we decided to go with a
> more heavy handed page protection scheme (flip between RX/RW with no dual
> mapping), but if TM wants this scheme to facilitate faster patching, it should
> be doable.

Ehm, switching between RW and RX is almost equally bad and also isn't allowed by SELinux.  If the program can do this then the bad guy can craft some code to do this as well (return-to-libc sequences to call into the mprotect wrappers in libc).

Once a page is marked writable it cannot be allowed to become executable again.  That's the only safe policy.
Comment on attachment 417098 [details] [diff] [review]
write/exec code with requested changes


>+// TODO -> when write/exec pages are enabled, c->_address has to be in exec code segment

What's the follow-up for this?

I recommend renaming CodeAlloc::toExec to something else. There is already a toExec function that does a very different mapping (without walking all heap blocks).

I am still not a big fan of the approach, but Ulrich seems to make a good point about RW<>RX mapping changes using return-to-libc, which somewhat sinks my favorite approach (disable X while emitting code).

I wonder what the next steps are. Windows support? Should we try to settle on one mechanism together with Adobe?
Attachment #417098 - Flags: review?(gal) → review+
(In reply to comment #55)
> (From update of attachment 417098 [details] [diff] [review])
> 
> >+// TODO -> when write/exec pages are enabled, c->_address has to be in exec code segment
> 
> What's the follow-up for this?

I'm going to put an assert here. From what I see, c->_address is always in external executable code space (libraries and so on), but we should catch it if the design changes.

> I recommend renaming CodeAlloc::toExec to something else. There is already a
> toExec function that does a very different mapping (without walking all heap
> blocks).

Will do.

> I am still not a big fan of the approach, but Ulrich seems to make a good point
> about RW<>RX mapping changes using return-to-libc, which somewhat sinks my
> favorite approach (disable X while emitting code).
> 
> I wonder what the next steps are. Windows support? Should we try to settle on
> one mechanism together with Adobe?

I've been doing a merge with the latest trunk, but it seems to contain the RW<->RX mapping change already. So what is the proposal here? Shall I integrate them together, or do you propose to remove it?
Attached patch exec patch v2 (obsolete) — Splinter Review
An updated patch with:

- new NInsExec in some places
- freeCodeChunk() has two arguments now (write&exec)
- CodeAlloc::toExec()/CodeAlloc::calcOffset() optimization
- c->_address is checked by isExec()
- CodeAlloc::toExec() changed to CodeAlloc::getExec()

I hope it addresses all the remarks. It supports i386 for now; I'd like to have a decision about the general approach, and then I can add support for other arches (x86_64/PPC).
Attachment #417098 - Attachment is obsolete: true
Attachment #426675 - Flags: review?(gal)
Attachment #426675 - Flags: review?(edwsmith)
> I've been doing a merge with latest trunk but it seems to contain the RW<>RX
> mapping change already. So what is the proposal here? Shall I integrate them
> together? Or do you suppose to remove it?

The intent with the RX<->RW implementation was to make it easy for it to do nothing, based on VM implementations of allocCodeChunk, etc.  For example, in Tamarin it's controlled by an ifdef, and in TraceMonkey the protection calls are no-ops (memory is mmap()ed as RWX and later munmap()ed with no intermediate permission changes).

One general question about dual-mapping: once we're done writing to the RW region, and assuming no patching, we could unmap it, right?  Any experience with this, or recommendations?
(In reply to comment #58)
> One general question about dual-mapping: Once we're done writing to the RW
> region, and assuming no patching, we could unmap it, right?  any experience
> with this, or recommendations?

If you never intend to write to the region again then the writable mapping can indeed be unmapped.  But is this the case?  It is terribly expensive to unmap everything just to create a new pair of mappings later.  But if this is how it is normally done then by all means, unmap the writable mapping.
Attached patch exec patch v3 (obsolete) — Splinter Review
Patch with markBlockWrite() routines; it's compatible with the latest trunk.
Attachment #426675 - Attachment is obsolete: true
Attachment #426949 - Flags: review?(gal)
Attachment #426949 - Flags: review?(edwsmith)
Attachment #426675 - Flags: review?(gal)
Attachment #426675 - Flags: review?(edwsmith)
Attached patch exec patch v.4 - i386 & x86_64 (obsolete) — Splinter Review
This patch adds x86_64 support. It has no extra hacks in the 64-bit part, so sometimes the offset is calculated twice.

Plus this patch has a safe version of the calcOffset routine: you can pass any kind of pointer (from exec/write/external space) and the pointer is translated only if a relevant page is found. It's useful for the CALL() instructions and the like.
Attachment #426949 - Attachment is obsolete: true
Attachment #426949 - Flags: review?(gal)
Attachment #426949 - Flags: review?(edwsmith)
Attachment #427321 - Flags: review?(gal)
Attachment #427321 - Flags: review?(edwsmith)
(In reply to comment #55)
> I wonder what the next steps are. Windows support? Should we try to settle on
> one mechanism together with Adobe?

So what is the current status here? Is there any agreement between Adobe and Mozilla about a unified approach?
Whiteboard: [sg:investigate]
Attachment #427321 - Flags: review?(gal) → review?(nnethercote)
Attachment #417089 - Flags: review?(gal) → review?(nnethercote)
Depends on: 555033
Comment on attachment 417089 [details] [diff] [review]
separated patch for secure memory allocation

Big nit: the whole file uses two space indents.  NJ uses four spaces.


>+    if (statfs ("/selinux", &sfs) >= 0 && (unsigned int)sfs.f_type == 0xf97cff8cU)
>+      return true;

What is 0xf97cff8cU?  Is there a name for that constant?


>+    f = fopen ("/proc/mounts", "r");
>+    if (f == NULL)
>+      return false;
>+    while (getline(&buf, &len, f) >= 0) {
>+      char *p = strchr(buf, ' ');
>+      if (p == NULL)
>+        break;
>+      p = strchr(p + 1, ' ');
>+      if (p == NULL)
>+        break;
>+      if (strncmp(p + 1, "selinuxfs ", 10) != 0) {
>+        free(buf);
>+        fclose(f);
>+        return true;
>+      }

A brief comment explaining the format of /proc/mounts would help here.


>+     writable and exexutable filesystem.  */

Nit: s/exexutable/executable/


>+    assert(!execFileReady());    

We use NanoAssert() in Nanojit.  And nanojit/VMPI.h has local versions of various other functions.  We might need to use them, although if this is Linux-only code I'm not sure if it's necessary.  Ed?


>+    /* Map the file for writting */

Nit: s/writting/writing/


>+    /* Open a map file */
>+    if (!execFileReady()) {
>+      pthread_mutex_lock(&fileMutex);
>+      bool ret = openExecFile();
>+      pthread_mutex_unlock(&fileMutex);
>+      if(!ret) {
>+        return false;
>+      }
>+    }
>+    
>+    /* Map memory */
>+    return mapExec(length, write, exec);
>+  }

mapExec() doesn't look thread-safe to me.  Why isn't there a lock protecting it?


>+    /** Our curent position inside loader table */

Nit: s/curent/current/
(In reply to comment #64)

> We use NanoAssert() in Nanojit.  And nanojit/VMPI.h has local versions of
> various other functions.  We might need to use them, although if this is
> Linux-only code I'm not sure if it's necessary.  Ed?

Not strictly necessary... In platform-specific code it's okay to use APIs that are known to be what you want on that platform, but unrelated calls should use portable APIs when possible (e.g., use NanoAssert anyway).

Shaver and I discussed this and we would like to try this across all platforms (no need to block the patch on that), so let's land this and file bugs on Windows and Mac support.

I will hold off with the alternative approach in bug 555033.
Depends on: 555111, 555112
(In reply to comment #66)
> Shaver and I discussed this and we would like to try this across all platforms

In that case all the "is this SELinux?" stuff can be removed.
Comment on attachment 427321 [details] [diff] [review]
exec patch v.4 - i386 & x86_64

In general, AFAICT the patch works as intended, i.e. it implements double
mapping.  As for whether this is what we want, I'll leave that up to others
to decide.

Martin, can you update the patch?  It's bit-rotted a bit and I'd like to run
some tests and timings.

Detailed comments below.

----

First of all, 8 lines of context would help with this patch -- put
"diff=-p -U 8 " in the "[defaults]" section of your .hgrc file.


We now have a MIPS backend that will need to be modified.


>+            uintptr_t   _execOffset;            // offset to executable code page for current normal code chunk
>+            uintptr_t   _exitExecOffset;        // offset to executable code page for current exit code chunk

There is no big comment explaining the whole separate-write-and-exec-pages
idea.  I'd like to see one, IMO it's quite non-obvious.  This would be a
good place for it.  A reference to this bug would be good.


>+            // write to exec page translations
>+            NInsExec*   toExec(NIns *p)                           { return (NInsExec*)(uintptr_t(p) + _execOffset); }
>+
>+#ifdef AVMPLUS_UNIX    
>+            uintptr_t   calcOffset(NIns *target, NIns *source)    { return _codeAlloc.calcOffset(target, source); }
>+#else      
>+            uintptr_t   calcOffset(NIns *target, NIns *source)    { return uintptr_t(target) - uintptr_t(source); }
>+#endif     

Nit: standard whitespace usage, please.

Non-nit: I'd like CodeAlloc::calcOffset() to exist in the non-AVMPLUS_UNIX
code, ie. have the #ifdef inside CodeAlloc::calcOffset() rather than here.


>+    NInsExec* CodeAlloc::getExec(NIns* p, CodeList* page) {
>+        if (!p || !page) {
>+            return p; 
>+        }

This should instead assert (!p && !page).


>+    NInsExec* CodeAlloc::getExec(NIns* p) {
>+        return(getExec(p,getPage(p)));
>+    }

This function appears to be dead, please remove it.


>+    uintptr_t CodeAlloc::calcOffset(NIns* target, NIns* source) {
>+
>+        // We don't expect that
>+        assert(source != NULL);

Is that comment truncated?


>+ 
>+        // Source address can lay on unmapped/forein page

Nit: s/forein/foreign/


>+        // (for instance exit code with JMP at the end)
>+        // Because offsets are calculated for jumps,
>+        // calculate it relatively to the jmp instructions (source-1)
>+        // and then shift it back.
>+        source--;
>+
>+        CodeList* target_page = getPage(target);
>+        CodeList* source_page = getPage(source);
>+
>+        uintptr_t offset;
>+           
>+        if(target_page == source_page) {
>+            // both adresses are from one continous memory block

Nit: s/adresses/addresses/

>+            offset = uintptr_t(target) - (uintptr_t(source) + sizeof(NIns));
>+        }
>+        else {
>+            // different pages - run the translation machinery
>+            uintptr_t t = uintptr_t(getExec(target, target_page));
>+            uintptr_t s = uintptr_t(getExec(source, source_page));
>+            assert(t != 0 || s != 0);
>+       
>+            offset = t - (s + sizeof(NIns));
>+        }
>+      
>+        if (verbose)
>+            avmplus::AvmLog("calcOffset target %p, source %p -> offset %p\n", target, source+1, offset);
>+          
>+        return offset;
>+    }
>+
>+#endif // AVMPLUS_UNIX

That looks like it could be slow.  What used to be just a pointer difference
now involves two calls to getPage(), each of which scans the CodeList.
Maybe the two calls could be combined into one that looks for both pages in
tandem?  Or maybe it's not that important.
Attached patch linux allocation v2 (obsolete) — Splinter Review
Fixed indentation, typos, and mutex handling; removed the SELinux check.
Attachment #417089 - Attachment is obsolete: true
Attachment #435577 - Flags: review?(nnethercote)
Attachment #417089 - Flags: review?(nnethercote)
Attached patch exec patch v5 (obsolete) — Splinter Review
Thanks for the review. Here's an updated patch, with fixed typos/comments and a getPages() routine which translates two pointers in one loop.
Attachment #427321 - Attachment is obsolete: true
Attachment #435614 - Flags: review?(nnethercote)
Attachment #427321 - Flags: review?(nnethercote)
Attachment #427321 - Flags: review?(edwsmith)
Attachment #399047 - Attachment is obsolete: true
> >+    NInsExec* CodeAlloc::getExec(NIns* p, CodeList* page) {
> >+        if (!p || !page) {
> >+            return p; 
> >+        }
> 
> This should instead assert (!p && !page).

No, we can't assert here because it's a valid situation: p is NULL if the target in calcOffset is NULL, and a NULL page means that p comes from outside our address space (libraries and so on), so we're not going to translate it.
Comment on attachment 435577 [details] [diff] [review]
linux allocation v2

>+    /* Map in a writable and executable chunk of memory if possible.
>+       Failing that, fall back to mapExec.  */
>+    bool MemMap::map(size_t length, void *&write, void *&exec)
>+    {
>+        write = NULL;
>+        exec = NULL;
>+
>+        pthread_mutex_lock(&fileMutex);
>+
>+        /* Open a map file */
>+        if (!execFileReady()) {
>+            if (!openExecFile()) {
>+                return false;
>+            }
>+        }
>+        /* Map memory */
>+        bool ret = mapExec(length, write, exec);
>+
>+        pthread_mutex_unlock(&fileMutex);
>+
>+        return ret;
>+    }

You haven't unlocked the mutex on the 'return false' exit path.


>+        /* Return a file descriptor of a temporary zero-sized file in a
>+           writable and exexutable filesystem.  */

Nit: s/exexutable/executable/

r=me with those fixed.
Attachment #435577 - Flags: review?(nnethercote) → review+
Hmm, on memmap.cpp I get this warning from GCC:

../nanojit/memmap.cpp: In constructor ‘nanojit::MemMap::MemMap()’:
../nanojit/memmap.cpp:319: warning: extended initializer lists only available with -std=c++0x or -std=gnu++0x
../nanojit/memmap.cpp:319: warning: extended initializer lists only available with -std=c++0x or -std=gnu++0x

The relevant line is this:

        fileMutex = PTHREAD_MUTEX_INITIALIZER;

I don't know how to fix that.

Also:

../nanojit/memmap.cpp: In member function ‘bool nanojit::MemMap::openExecFile()’:
../nanojit/memmap.cpp:247: warning: ignoring return value of ‘int ftruncate(int, __off_t)’, declared with attribute warn_unused_result
../nanojit/memmap.cpp: In member function ‘bool nanojit::MemMap::mapExec(size_t, void*&, void*&)’:
../nanojit/memmap.cpp:274: warning: ignoring return value of ‘int ftruncate(int, __off_t)’, declared with attribute warn_unused_result
../nanojit/memmap.cpp:284: warning: ignoring return value of ‘int ftruncate(int, __off_t)’, declared with attribute warn_unused_result

That can be fixed with:

  int dummy = ftruncate(...);
  (void)dummy;

The second line is needed to avoid a warning about 'dummy' being unused.
Also, it doesn't compile on Mac:

In file included from ../nanojit/avmplus.cpp:50:
../nanojit/memmap.h:65: error: ‘pthread_mutex_t’ does not name a type

I tried adding pthread.h to memmap.h but that results in this:

Undefined symbols:
  "nanojit::MemMap::MemMap()", referenced from:
      __static_initialization_and_destruction_0(int, int)in avmplus.o
  "nanojit::MemMap::~MemMap()", referenced from:
      ___tcf_0 in avmplus.o
  "nanojit::MemMap::map(unsigned long, void*&, void*&)", referenced from:
      nanojit::CodeAlloc::allocCodeChunk(unsigned long, void*&)in avmplus.o
ld: symbol(s) not found

And I haven't tried Windows.
Comment on attachment 435614 [details] [diff] [review]
exec patch v5

Still no NativeMIPS.cpp changes; it's not crucial, but it would be nice to have them before landing (even if untested).


>+    NInsExec* CodeAlloc::getExec(NIns* p, CodeList* page) {
>+        // There are two cases when we don't translate a given pointer:
>+        //
>+        // * p == NULL    - calcOffset() has been called with target = NULL.
>+        //                  
>+        // * page == NULL - calcOffset() has been called with target or source
>+        //                  from outside of our dual-mapped address space.
>+        if (!p || !page) {
>+            return p; 
>+        }
>+
> [...]
>+
>+            // different pages - run the translation machinery
>+            uintptr_t t = uintptr_t(getExec(target, targetPage));
>+            uintptr_t s = uintptr_t(getExec(source, sourcePage));
>+            assert(t != 0 || s != 0);

We never call getExec() with p==NULL;  if we did, this assertion would fail.
So I'd like the !p assertion within getExec(), where it's more obvious.

I ran the TM trace-tests on i386/Linux and X64/Linux and they all passed.
I ran timings on i386/Linux on SS and V8 and saw no noticeable change.
(In reply to comment #73)
> ../nanojit/memmap.cpp: In constructor ‘nanojit::MemMap::MemMap()’:
> ../nanojit/memmap.cpp:319: warning: extended initializer lists only available
> with -std=c++0x or -std=gnu++0x
> ../nanojit/memmap.cpp:319: warning: extended initializer lists only available
> with -std=c++0x or -std=gnu++0x
> 
> The relevant line is this:
> 
>         fileMutex = PTHREAD_MUTEX_INITIALIZER;

Use

   pthread_mutex_init(&fileMutex, NULL);


PTHREAD_MUTEX_INITIALIZER and the other initializer macros are only meant to
initialize global variables statically.
Attached patch memmap v3 (obsolete) — Splinter Review
Fixed the mutex return path, pthread initialization, and ftruncate return values (I hope). Which compiler arguments do you use? With -std=c++0x or -std=gnu++0x I don't get the warning messages about ftruncate (gcc 4.4.3 20100127 / Fedora 12). As for Mac, I don't have such a platform available, but I should be able to find a box with Windows.
Attachment #435577 - Attachment is obsolete: true
Attachment #435844 - Flags: review?(nnethercote)
(In reply to comment #75)
> >+    NInsExec* CodeAlloc::getExec(NIns* p, CodeList* page) {
> >+        // There are who cases when we don't translate given pointer:
> >+        //
> >+        // * p == NULL    - calcOffset() has been called with target = NULL.
> >+        //                  
> >+        // * page == NULL - calcOffset() has been called with target or source
> >+        //                  from outside of our dual-mapped address space.
> >+        if (!p || !page) {
> >+            return p; 
> >+        }
> >+
> > [...]
> >+
> >+            // different pages - run the translation machinery
> >+            uintptr_t t = uintptr_t(getExec(target, targetPage));
> >+            uintptr_t s = uintptr_t(getExec(source, sourcePage));
> >+            assert(t != 0 || s != 0);
> 
> We never call getExec() with p==NULL;  if we did this assertion would fail.
> So I'd like the !p assertion within getExec() where it's more obvious.

Yes, we do. Here is the backtrace if I put NanoAssert(p); in CodeAlloc::getExec():

#0  NanoAssertFail () at ./nanojit/avmplus.cpp:75
#1  0x081b7f6b in nanojit::CodeAlloc::getExec (this=0x826cb48, p=0x0, page=0x0) at ./nanojit/CodeAlloc.cpp:167
#2  0x081b8045 in nanojit::CodeAlloc::calcOffset (this=0x826cb48, target=0x0, source=0xb72ccfe2 "")
    at ./nanojit/CodeAlloc.cpp:207
#3  0x081b6bcc in nanojit::Assembler::calcOffset(unsigned char*, unsigned char*) ()
#4  0x081b314b in nanojit::Assembler::gen (this=0x827dc04, reader=0x82d4f0c) at ./nanojit/Assembler.cpp:1548
#5  0x081b225e in nanojit::Assembler::assemble (this=0x827dc04, frag=0x8281eac, reader=0x82d4f0c)
    at ./nanojit/Assembler.cpp:1039
#6  0x081b1b66 in nanojit::Assembler::compile (this=0x827dc04, frag=0x8281eac, alloc=..., optimize=true, labels=0x8375bdc)
    at ./nanojit/Assembler.cpp:921
#7  0x081766c4 in js::TraceRecorder::compile (this=0x8288530) at jstracer.cpp:4255

It's because:

./nanojit/Assembler.cpp:1548, i386
1538	                    else {
1539	                        // backwards jump
1540	                        handleLoopCarriedExprs(pending_lives);
1541	                        if (!label) {
1542	                            // save empty register state at loop header
1543	                            _labels.add(to, 0, _allocator);
1544	                        }
1545	                        else {
1546	                            intersectRegisterState(label->regs);
1547	                        }
(gdb) 
1548	                        JMP(0); <<<< here we come
1549	                        _patches.put(_nIns, to);
1550	                    }
1551	                    break;
Attached patch exec v6 (obsolete) — Splinter Review
With an attempt at basic MIPS support, untested.
Attachment #435614 - Attachment is obsolete: true
Attachment #435867 - Flags: review?(nnethercote)
Attachment #435614 - Flags: review?(nnethercote)
Can you update to the latest TraceMonkey tip?  Also, you've updated configure.in but you need to update js/src/configure.in.  Thanks.

Once I have a cleanly applying patch I'll try to work out the problem on Mac.
What do you mean by "TraceMonkey tip"? This patch applies cleanly to the latest trunk.
#82: SpiderMonkey development usually happens against the hg.mozilla.org/tracemonkey tree and is then merged into mozilla-central every few days.
Attachment #435844 - Attachment is obsolete: true
Attachment #435844 - Flags: review?(nnethercote)
Attachment #435867 - Attachment is obsolete: true
Attachment #436162 - Flags: review?(nnethercote)
Attachment #435867 - Flags: review?(nnethercote)
Attached patch a different approach, v1, broken (obsolete) — Splinter Review
I worked out the Mac build problem.  In the configure.in file you were only compiling memmap.cpp if it was a Linux machine, but in the C++ code the use of memmap.cpp was decided by AVMPLUS_UNIX, which is true on both Linux and Mac.  Rather than conditionally including memmap.cpp I changed things to always include memmap.cpp, but its entire contents are guarded by "#ifdef AVMPLUS_UNIX".

I did some measurements on Mac, there was a noticeable slowdown, something like 2--3%.  It's not yet clear to me how much of that is due to extra syscalls and how much is due to the extra work looking for pages and offsets.

And that got me thinking.  I can see two ways to clearly improve the patch, and I've attached an in-progress patch that does these things:

- Properly distinguish writable instructions from executable instructions.  We have the types NIns and NInsExec but they are equivalent so there's no meaningful checking.  My attached patch renames NInsExec as NInsX and changes it to a struct containing a single field of the same size.  This makes NIns and NInsX equivalent in terms of layout, but different types.  (It also fixes some problems with the debug output, where the addresses shown were a mix of writable and executable instructions.)

- Record NInsX values instead of NIns values, where appropriate, rather than reconstituting them from the NIns values later on.  This allows all the calcOffset() and getPages() stuff to be completely avoided.

The patch compiles on i386 but I'm currently getting a segfault, I screwed something up.  I really think this is the right way to go, and I'm happy to take it from here, Martin, but I won't have a chance to work on it again until the Tuesday after Easter.
Assignee: stransky → nnethercote
Status: NEW → ASSIGNED
I've found only one bug in the patch so far, missing "execOffset = b->execOffset;" in CodeAlloc::alloc() when availblocks == false. But it still crashes right after start, the exec address seems to be broken.
Depends on: 557483
Depends on: 557705
Depends on: 557991
Attached patch a different approach, v2, works (obsolete) — Splinter Review
This patch is similar in spirit to my previous one but it actually works.
Changes:

- Only working for i386 and X64 so far.  Getting i386 working (along with
  the arch-neutral changes) was difficult but getting X64 working after that
  was very easy -- it worked first go after I fixed all the compile errors.
  So hopefully the other backends won't be too hard.

- NIns is split into NInsW and NInsX.  NInsW is the same as the old NIns,
  just a uint8_t on i386.  NInsX is structurally equivalent, but uses a
  union, which means that you cannot mix NInsW and NInsX.  This is a very
  good thing.  Lots of the new NInsW/NInsX variables now have 'W' or 'X'
  suffixes which clarifies what they are.  Some places need one or the
  other, some need both.

- I split Assembler::_patches into two, one part for normal branches, and
  one for jtbl branches.  Much clearer.  BTW, LIR_jtbl doesn't appear to be
  generated by TR or TM.  Why do we have it?  It's probably broken (even
  before my changes). 

- I overhauled memmap.{h,cpp}:
  - There are now fewer methods and fewer data members
  - The control flow is greatly simplified, esp. for mount points -- no need
    for the 'repeat' field in the LoaderTable
  - For the first mapping, it used to do a test mapping of one page, and if
    that succeeded, threw that away and then did a second mapping of the
    right size.  Now the test mapping is of the right size and if it
    succeeds it's used directly.  This avoids one mapping.

  Martin, I'd appreciate it if you could look over these changes and see
  that I preserved the behaviour.

- There are some tiny TM-only changes, I didn't bother separating them.

Some questions:

- In flushICache() for 64-bit PPC Mac, should the sys_dcache_flush() call
  use 'startW' instead of 'startX'?

- I'm seeing very slight changes to the code generated by TM for SunSpider
  -- in two or three of them some chunk-linking jumps are occurring
  slightly earlier than before.  This is because CodeList has an extra
  field (execOffset), so the space available for code in each chunk is
  reduced by one word.

- allocCodeChunk() is supposed to always succeed, but several (all?) of
  the implementations of it can fail!  This situation predates my patch.

- I think memmap/MemMap is a pretty bad name for the new file and class.
  I'd prefer something like "MapWX", indicating the dual-mapping.

I've tested it on {i386,X64}/{Linux,Mac}.  Further testing will involve
porting it to Tamarin and testing with the Mozilla try servers.  I'm seeing
maybe a 1% slowdown on SunSpider, and a negligible slowdown on V8.
Attachment #436430 - Attachment is obsolete: true
Attachment #437801 - Flags: feedback?(edwsmith)
Comment on attachment 437801 [details] [diff] [review]
a different approach, v2, works

BTW, I've got a little more polishing to do, but I thought I'd put it up for feedback now.
Attachment #437801 - Flags: feedback?(stransky)
The memmap change looks fine, I wonder why I messed with the one-page test mapping. But IMO if mapExec() fails to map executable segment on a given file, you should close the file and try to find another one...not just quit. Some file systems can be mounted as r/w only. So something like:

        int i = 0;
        for(i = 0; i < (sizeof(loaders) / sizeof(loaders[0])); i++) {
            NanoAssert(execFile == FILE_ERROR);

            execFile = (this->*loaders[i].loader)(loaders[i].arg);
            if (execFile != FILE_ERROR) {
                if (mapExec(length, tmpW, tmpX)) {
                    memW = tmpW;
                    memX = tmpX;
                    return true;
                } else {
                    close(execFile);
                    // don't pretend we have a valid execFile
                    execFile = FILE_ERROR;
                }
            }
        }
(In reply to comment #90)
> But IMO if mapExec() fails to map executable segment on a given file,
> you should close the file and try to find another one...not just quit. Some
> file systems can be mounted as r/w only. So something like:

Indeed.  A common, good recommendation is to mount /tmp with noexec (in Linux-speak).  If /tmp were the only place to look for such a mapping, it wouldn't work.  The code should look in other places such as the user's home dir etc.  The list should be configurable.
Does Linux (glibc, mayhap) not have a way to set up this sort of split mapping that can follow whatever the system's preferences are?  Given that it is described as such a critical security improvement, it seems strange to me that there's no support beyond "try a lot of filesystem places and do your own mmap trickery".  Is that really what Linux ISVs are all supposed to do if they generate code (increasingly common for scripting languages, which are themselves increasingly common)?  Surely one of the JVMs that is shipped has already solved this problem as well?

njn: how about TwinMapping as the class name, or TwinMappingWX?  MapWX I would expect to be w+x, and therefore AN ABOMINATION.
www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf

I'm skeptical this patch matters in a world where "Leon" has a "typewriter" (see Dion's slides).

/be
> - I split Assembler::_patches into two, one part for normal branches, and
>   one for jtbl branches.  Much clearer.  BTW, LIR_jtbl doesn't appear to be
>   generated by TR or TM.  Why do we have it?  It's probably broken (even
>   before my changes). 

TR generates LIR_jtbl when compiling the switch construct, and it works, but i just
noticed a stack overflow bug somewhere near printer->formatIns().  
I'll investigate and post a fix.
(In reply to comment #93)
> www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf
> 
> I'm skeptical this patch matters in a world where "Leon" has a "typewriter"
> (see Dion's slides).

at least, nanojit does constant folding of xor.  (the paper references flash's old jit, which nanojit replaces).  still, thats not necessarily enough to weaken jit spraying of PIC code.
(In reply to comment #90)
> The memmap change looks fine, I wonder why I messed with the one-page test
> mapping. But IMO if mapExec() fails to map executable segment on a given file,
> you should close the file and try to find another one...not just quit. Some
> file systems can be mounted as r/w only. So something like:
> 
>         int i = 0;
>         for(i = 0; i < (sizeof(loaders) / sizeof(loaders[0])); i++) {
>             NanoAssert(execFile == FILE_ERROR);
> 
>             execFile = (this->*loaders[i].loader)(loaders[i].arg);
>             if (execFile != FILE_ERROR) {
>                 if (mapExec(length, tmpW, tmpX)) {
>                     memW = tmpW;
>                     memX = tmpX;
>                     return true;
>                 } else {
>                     close(execFile);
>                     // don't pretend we have a valid execFile
>                     execFile = FILE_ERROR;
>                 }
>             }
>         }

That code is better and I'll use it, but I don't think it addresses your main point.  IIUC you're talking about the mount case... in my new code if we can successfully open a file on one of the mounts but then fail to map from it, it won't try another mount.  I assume your old code did but the control flow is too complicated for me to tell for sure.

Making this work again will require some fiddling.
(In reply to comment #91)
> (In reply to comment #90)
> > But IMO if mapExec() fails to map executable segment on a given file,
> > you should close the file and try to find another one...not just quit. Some
> > file systems can be mounted as r/w only. So something like:
> 
> Indeed.  A common, good recommendation is to mount /tmp with noexec (in
> Linux-speak).  If /tmp would be the only place to look for such a mapping to
> exist it wouldn't work.  The code should look in other places such as the
> user's home dir etc.  The list should be configurable.

Yes.  If you read the patch you'll see this:

    LoaderTable MemMap::loaders[] = {
        {(FileLoader) (&MemMap::openFileEnv), "TMPDIR"},
        {(FileLoader) (&MemMap::openFileDir), "/tmp"},
        {(FileLoader) (&MemMap::openFileDir), "/var/tmp"},
        {(FileLoader) (&MemMap::openFileDir), "/dev/shm"},
        {(FileLoader) (&MemMap::openFileEnv), "HOME"},
#ifdef HAVE_MNTENT_H
        {(FileLoader) (&MemMap::openFileMnt), "/etc/mtab"},
        {(FileLoader) (&MemMap::openFileMnt), "/proc/mounts"},
#endif // HAVE_MNTENT_H
    };

which should give you a good idea of what's being tried.

(In reply to comment #92)
> Does Linux (glibc, mayhap) not have a way to set up this sort of split mapping
> that can follow whatever the system's preferences are?  Given that it is
> described as such a critical security improvement, it seems strange to me that
> there's no support beyond "try a lot of filesystem places and do your own mmap
> trickery".  Is that really what Linux ISVs are all supposed to do if they
> generate code (increasingly common for scripting languages, which are
> themselves increasingly common)?

It's a good point.  The example at http://people.redhat.com/drepper/selinux-mem.html just tries the home directory, then adds this not-very-useful comment:

"The only requirement for this code to work is that the filesystem on which the file is created must allow execution. Some sites might (wisely) decide to mount filesystems like /tmp and /var/tmp with the noexec option.  One can either detect this ... or actively look for directory with appropriate permissions."

Having said that, if we want to do this on Mac and Windows we can't rely on any glibc magic anyway.

I wonder how useful the mount point attempts are -- how often will the first five attempts fail?  It makes me nervous because the mount code will be used very rarely, which increases its likelihood of being broken.

Another question: is it ok to use pthread_mutex_lock() et al inside Mozilla code?  Or do we have to indirect it to some NSPR thing?
Martin, seems like changing this:

        if (!(hasmntopt(&mnt, "ro") ||
              hasmntopt(&mnt, "noexec") || access(mnt.mnt_dir, W_OK)))

to this:

        if (!(hasmntopt(&mnt, "ro") ||
              hasmntopt(&mnt, "noexec") || access(mnt.mnt_dir, R_OK|W_OK|X_OK)))

might be better? Or maybe the R_OK|X_OK check is subsumed by the hasmntopt() checks?
(In reply to comment #95)
> (In reply to comment #93)
> > www.semantiscope.com/research/BHDC2010/BHDC-2010-Paper.pdf
> > 
> > I'm skeptical this patch matters in a world where "Leon" has a "typewriter"
> > (see Dion's slides).
> 
> at least, nanojit does constant folding of xor.  (the paper references flash's
> old jit, which nanojit replaces).  still, thats not necessarily enough to
> weaken jit spraying of PIC code.

Before we take this patch, I want us to be clear on the security benefits we're going to get. Please, no sermonizing on "privileges" and "rights".

What attacks does this patch prevent, and which ones does it leave open? Let's make the trade-offs extremely clear, so we have a record of our decision, if nothing else.
This version fixes the problems with giving up too early in the mount cases.

Todo list:
- Test it in the browser (I've only done shell testing so far)
- Test it on Windows (I've only done Linux and Mac so far)
- Decide if pthreads can be used as-is
- Decide if __builtin_alloca can be used
- Port it to Tamarin, check it works there
Attachment #436161 - Attachment is obsolete: true
Attachment #436162 - Attachment is obsolete: true
Attachment #437801 - Attachment is obsolete: true
Attachment #438016 - Flags: feedback?(stransky)
Attachment #437801 - Flags: feedback?(stransky)
Attachment #437801 - Flags: feedback?(edwsmith)
Attachment #436161 - Flags: review?(nnethercote)
Attachment #436162 - Flags: review?(nnethercote)
(In reply to comment #98)
> Martin, seems like changing this:
> 
>         if (!(hasmntopt(&mnt, "ro") ||
>               hasmntopt(&mnt, "noexec") || access(mnt.mnt_dir, W_OK)))
> 
> to this:
> 
>         if (!(hasmntopt(&mnt, "ro") ||
>               hasmntopt(&mnt, "noexec") || access(mnt.mnt_dir,
> R_OK|W_OK|X_OK)))
> 
> might be better? Or maybe the R_OK|X_OK check is subsumed by the
> hasmntopt() checks?

man access(2) says:

"access() may not work correctly on NFS file systems with UID mapping enabled, because UID mapping is  done on the server and hidden from the client, which checks permissions." 

so I think the idea is to check general accessibility by access() and the execute flags by hasmntopt().
(In reply to comment #100)
> - Decide if __builtin_alloca can be used

MemMap::openDirFileAndMapExec()/__builtin_alloca() is called only when a mapfile is created, and the file creation takes some time, so IMO malloc() would serve just as well here.
I just saw this on the Linux mmap man page:

  The use of MAP_ANONYMOUS in conjunction  with  MAP_SHARED
  is only supported on Linux since kernel 2.4.

If we are able to rely on MAP_ANONYMOUS|MAP_SHARED working (on all AVMPLUS_UNIX platforms, which includes Linux and Mac) then we could greatly simplify memmap.cpp -- no need to look through $TMPDIR, /tmp, /var/tmp, etc, to find a file that can be marked executable.  Also, it might be faster because we wouldn't need the ftruncate() syscalls.
(In reply to comment #103)
> 
> If we are able to rely on MAP_ANONYMOUS|MAP_SHARED working (on all AVMPLUS_UNIX
> platforms, which includes Linux and Mac) then we could greatly simplify
> memmap.cpp -- no need to look through $TMPDIR, /tmp, /var/tmp, etc, to find a
> file that can be marked executable.  Also, it might be faster because we
> wouldn't need the ftruncate() syscalls.

Is this capability available on SELinux?
(In reply to comment #104)
> 
> Is this capability available on SELinux?

Looking more closely, I think not.  The full excerpt from the man page:

MAP_ANONYMOUS
    The  mapping  is not backed by any file; its contents are initialized to
    zero.  The fd and offset arguments are ignored; however, some
    implementations require fd to be -1  if  MAP_ANONYMOUS  (or MAP_ANON) is
    specified, and portable applications should ensure this.  The use of
    MAP_ANONYMOUS in conjunction with MAP_SHARED is only supported on
    Linux since kernel 2.4.

Also, it doesn't make much sense logically -- without the underlying file,
how do you tie two mmapped regions together?
(In reply to comment #105)

The MAP_ANONYMOUS|MAP_SHARED mapping works fine with SELinux (at least on Fedora 12) but the pages are not linked together, so a change in the writable page is not propagated to the executable one. IMHO MAP_SHARED here just means that the memory can be shared.
Here's the v3 version, synced with the latest tracemonkey trunk. I've updated all arches, although I can check i386 & x86_64 only. Is there anything else I can help with to move it forward?
Attachment #453033 - Flags: review?(nnethercote)
Attachment #438016 - Flags: feedback?(stransky)
Thanks for the update.

(In reply to comment #107)
> Is there anything else I can help
> with to move it forward?

Answers to the questions in comment 99 would help.  Thanks.
(In reply to comment #99)
> What attacks does this patch prevent, and which ones does it leave open? Let's
> make the trade-offs extremely clear, so we have a record of our decision, if
> nothing else.

When SELinux is enabled the flash plug-in can run in its own sandbox (as OOP) so we don't have to care about it.

The issue is the browser itself: even with this patch, an attacker who can (somehow) write his code to the writable page could then execute it, even with SELinux enabled.

Let's assume an attacker somehow delivers his code to the JIT R/W page and tries to execute it. He may:

- write some specific constant to the R/W page and search for it in all executable pages. But we can mmap the executable pages as executable only, without read permission.

- read /proc/PID/maps and look for mapped twins there. For instance I have there:

7ffff7fda000-7ffff7fea000 rw-s 00020000 08:01 3358809                    /tmp/jitU8GOD2 (deleted)
7ffff7fea000-7ffff7ffa000 --xs 00020000 08:01 3358809                    /tmp/jitU8GOD2 (deleted)

where "/tmp/jitU8GOD2 (deleted)" is the temporary file used for mmap, so it gives a clear 7ffff7fda000 => 7ffff7fea000 mapping. But again, access to the process map file can be disabled by SELinux.

So with this small tweak (map the executable memory as executable only, without read permission) I don't see any way an attacker could find an executable page.
blocking2.0: --- → ?
status1.9.2: --- → ?
status2.0: --- → ?
Hey Nick, what's the status of this?
(In reply to comment #110)
> Hey nick, whats the status of this?

The code works, someone needs to decide if it's a good idea and should be landed.  Eg. see comment 99.
(In reply to comment #111)
> The code works, someone needs to decide if it's a good idea and should be
> landed.  Eg. see comment 99.

Well, who can make such a decision?
(In reply to comment #109)
> Let's assume an attacker somehow delivers its code to the JIT R/W page and
> tries to execute it. He may:
>
> [...]

Dumb question: what stops the attacker from reading the same execOffset field that TraceMonkey itself uses to find the other mapping?
(In reply to comment #113)
> Dumb question: what stops the attacker from reading the same execOffset field
> that TraceMonkey itself uses to find the other mapping?

Nothing. But the attacker has to search for page header and if the executable page is executable only, he can't confirm the right executable address.
(In reply to comment #114)
> (In reply to comment #113)
> > Dumb question: what stops the attacker from reading the same execOffset field
> > that TraceMonkey itself uses to find the other mapping?
> 
> Nothing. But the attacker has to search for page header and if the executable
> page is executable only, he can't confirm the right executable address.

See comment 93 and please read the paper. I think this "attacker has to search" point is meaningless.

/be
I read it three times.

> When this executable page is entered at a known offset (0x6A, for the 
> given example code below) will execute stage-0 shellcode that marks the 
> rest of the page RWX and copies the next stage of shellcode 
> from an ActionScript string that can be modified 
> before compilation.

- SELinux does not allow changing page permissions. So once you allocate a page as X, you can't change it to RWX. 
- How do you manage to launch the compiled script at such an offset?

> In the end, it's really just a pointer into the heap. 
> To try and find a use for this pointer, we must understand the Flash heap and
> some of the details of Windows memory architecture.

How is this related to Linux/SELinux? And if there is any relevance (leaving unused executable segments on the heap, for instance) we can easily fix that, can't we?

I'm not a security expert but I didn't see anything relevant in the paper. If you think I'm wrong please provide the details.
An interesting read:

http://www.zdnet.com/blog/security/questions-for-pwn2own-hacker-charlie-miller/2941

"It’s really simple. Safari on the Mac is easier to exploit.  The things that Windows do to make it harder (for an exploit to work), Macs don’t do.  Hacking into Macs is so much easier. You don’t have to jump through hoops and deal with all the anti-exploit mitigations you’d find in Windows.

It’s more about the operating system than the (target) program.  Firefox on Mac is pretty easy too.  The underlying OS doesn’t have anti-exploit stuff built into it.

With my Safari exploit, I put the code into a process and I know exactly where it’s going to be.  There’s no randomization. I know when I jump there, the code is there and I can execute it there.  On Windows, the code might show up but I don’t know where it is.  Even if I get to the code, it’s not executable.  Those are two hurdles that Macs don’t have."

... do we really want to weaken the whole browser just because of one small part which fails to follow modern security standards?
Martin, Charlie was talking about lack of ASLR. This bug does nothing of the kind.

Re comment 116 -- return-oriented programming means you don't need to write code, you let the JIT write code and then find interesting "instructions" (not the ones generated) that can be hooked together by tail jumps and returns. If this is (as I contend) the threat to defend against, I don't see how this bug's patch helps.

/be
(In reply to comment #118)
>Re comment 116 -- return-oriented programming means you don't need 
> to write code, you let the JIT write code and then find 
> interesting "instructions" (not the ones generated) that 
> can be hooked together by tail jumps and returns. If this 
> is (as I contend) the threat to defend against, I don't see 
> how this bug's patch helps.

With this patch and SELinux enabled, you can't switch the pages to RWX, so you have to generate the initial code from those crafted/switched instructions. 

If you want to load any code from an external source, it has to be smaller than the executable page and you have to calculate the proper address in the RW page. (You have to load it somewhere, right? And not into the executable page your code is executed from.)

And if you want to generate the interpage tail jumps and returns that you're referencing, the exploit has to translate the address from write to executable space in the same way the JIT does.

I'm not convinced that an exploit generated from malformed constants can find the page header, read execOffset, recalculate the target, find and load the exploit, execute it, and manage to calculate jumps/returns.

To recap, the exploit scenario with this patch would be:

1) create the crafted JIT code
2) execute it at a given offset (how?)
3) the malformed code finds the execOffset of its page (I don't know how, but let's assume it's possible) or calls toExec() or calcOffset() and calculates the writable twin (how?)
4) loads prepared code from an external source (it has to fit in the executable page; actually it can't rewrite its currently running code, so it may be even smaller)
5) fixes the return point (by execOffset/toExec()/calcOffset()) and launches the main exploit.

BTW, you can see now why I'm talking about "attacker has to search for page header" -- the execOffset is located there.

And we can still make it harder. Those steps would have an impact on performance (frequent memory allocation/deallocation, frequent interpage jumps) so they could be considered optional:

- divide the code into more independent segments. An attacker will be more restricted by the maximum exploit size and will have less space for the write -> executable page recalculation.

- aggressively free unused memory segments and allocate only what we need. An attacker would not find another space where he can load and execute code.
See Also: → 506693
See Also: 506693
Ping. Any update here?
I really don't think this blocks... it didn't block the last release either, so it's not a regression.
blocking2.0: ? → -
Attached patch JM WIP/i386 (obsolete) — Splinter Review
First version of the double-page patch for JM. It still crashes when run outside of gdb (it passes all js trace tests inside gdb on i386). 

I tried another approach: to make it as small/simple as possible and translate the addresses at the lowest level. Performance could be improved by a page cache (I'll do speed tests when I sort out the crash outside of gdb).

I think the same approach could be applied to tracemonkey so that both engines would share the same ExecutableTranslator service.
Attachment #488986 - Flags: feedback?(nnethercote)
Attachment #488986 - Flags: feedback?(brendan)
Comment on attachment 488986 [details] [diff] [review]
JM WIP/i386

Ask a JM lead hacker for review.

/be
Attachment #488986 - Flags: feedback?(brendan) → feedback?(dvander)
If the performance effects are not indistinguishable from noise, a minimally #ifdef'ed patch to let SELinux turn this on would be better -- provided that side of the ifdefs will see testing and use.

/be
Attached patch JM WIP v.2 (obsolete) — Splinter Review
Fixed offset calculation, added memory management for allocated pages, added page cache, range check, page translation statistics. 

It's missing fixes in the trap handler (it's not redirected to the correct return address) and fails the trap-* jit_tests.
Attachment #488986 - Attachment is obsolete: true
Attachment #490903 - Flags: feedback?(nnethercote)
Attachment #490903 - Flags: feedback?(dvander)
Attachment #488986 - Flags: feedback?(nnethercote)
Attachment #488986 - Flags: feedback?(dvander)
Attachment #490903 - Flags: feedback?(nnethercote)
Blocks: 574119
This one combines the patches for nanojit and methodjit, uses a shared memory allocator (MemoryManager), and the dual mapping can be controlled by the --enable-dual-mapping configure switch.

From my measurements, the performance difference between dual- and single-page builds is ~10ms for the SpiderMonkey test with an optimized build (~290ms single, ~300ms dual on an i5 box).
Attachment #453033 - Attachment is obsolete: true
Attachment #490903 - Attachment is obsolete: true
Attachment #497240 - Flags: review?
Attachment #490903 - Flags: feedback?(dvander)
Attachment #453033 - Flags: review?(nnethercote)
Attachment #497240 - Flags: review? → review?(dvander)
Assignee: nnethercote → general
Status: ASSIGNED → NEW
Comment on attachment 497240 [details] [diff] [review]
nanojit & methodjit patch

Canceling old r?s.
Attachment #497240 - Flags: review?(dvander)
Was the security patch applied? If not: why not?
I do not see why this issue depends on bug 555111 and bug 555112.

But those issues seem to somehow depend on this one. Was there no progress regarding this issue due to circular issue dependencies? (an issue deadlock?)
The patches in this bug were very large and it wasn't clear whether there was actually any security benefit (see comment #7, comment #16) to justify the maintenance effort. By now these patches would not be applicable anyway since we have entirely new JITs.

The best path forward for JIT security is sandboxed content processes, which is in progress for both Desktop Firefox and Firefox OS. Sandboxing will greatly reduce the threat of JIT exploits, and it's unlikely any other solution will be as effective. For Desktop, it's part of the Electrolysis effort:
 [1] https://wiki.mozilla.org/FoxInABox
 [2] https://wiki.mozilla.org/Electrolysis
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WONTFIX