Closed Bug 679840 Opened 8 years ago Closed 8 years ago

unsafe linking concern in CodeAlloc.cpp CodeList class

Categories

(Core Graveyard :: Nanojit, defect)

x86_64
Windows 7
defect
Not set

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: trbaker, Assigned: rreitmai)

References

Details

(Whiteboard: [sg:low stepping-stone], fixed-in-nanojit, fixed-in-tamarin)

Attachments

(2 files, 2 obsolete files)

Reported by partner

File: air\code\third_party\avmplus\nanojit\CodeAlloc.cpp
Line: 167, 199

1) Description
The CodeList class in nanoJIT is not unlinked safely

2) Impact
Memory corruption bugs in nanoJIT could be exploited to cause code execution more easily.

3) Minimum Steps to reproduce

4) Recommendation
Safe unlinking be implemented

Note:
Achieving code execution using this method was mentioned during a talk at a
security conference (BlackHat), so there is public knowledge of this issue.

From blackhat 2011 preso on page 16
http://www.matasano.com/research/Attacking_Clientside_JIT_Compilers.pdf
CodeAlloc class
• Handles allocating JIT pages that will hold code
• Allocates memory RWX
CodeList class
• Inline meta-data for tracking the location of code chunks within JIT pages
• *Next *Lower *Terminator pointers at static offsets
• Creates a doubly linked list of JIT pages
• Overwriting these will give you an arbitrary 4 byte write
• Similar to the original heap unlink attacks
Summary: unsafe linking concern in codealloc.cpp CodeList class → unsafe linking concern in CodeAlloc.cpp CodeList class
Can you give more detail about what an attack would look like?  I'm not familiar with the "original heap unlink attacks".
Chris: I know the Matasano guys gave us a head's up on their BlackHat talk, do we already have a bug that covers this aspect?
(In reply to Nicholas Nethercote [:njn] from comment #1)
> Can you give more detail about what an attack would look like?  I'm not
> familiar with the "original heap unlink attacks".

I don't have a testcase.  I'm created this bug to make sure everyone is aware of the blackhat talk and in hopes that bug can be duped out.  If people aren't aware of the issue and have a fix, it warrants investigation.

Felix forwared this via email:

https://bugzilla.mozilla.org/show_bug.cgi?id=620667

This is not the same bug at all (its in MMgc) but it has a similar flavor.  It was uncovered by a partner security audit.  The dialogue in the comments may be of general interest to those involved.
See Also: → 620667
Assignee: nobody → rreitmai
Flags: flashplayer-qrb+
As far as I know, there is nothing about the JIT or this particular case that makes it any more of an issue than every other instance of doubly-linked lists in either the VM or the host application.  In general, the conventional idiom for unlinking an element from a doubly-linked list allows an attacker to write an arbitrary memory location with a selected value if he is able to overwrite the list header pointers before the unlinking occurs, say, by exploiting a missing bounds check or dangling pointer.  An alternative unlinking idiom checks the structural invariant of the doubly-linked list that will be invalidated by such corruption, and is a recommended best practice for security.  So-called "unsafe" linking is not a vulnerability in itself, but makes it much easier to turn other vulnerabilities into useful attacks.

A concise summary of heap unlink attacks and their mitigation can be found here:

http://blogs.technet.com/b/srd/archive/2009/05/26/safe-unlinking-in-the-kernel-pool.aspx
Whiteboard: [sg:low stepping-stone]
Attached patch Linked-list consistency checks (obsolete) — Splinter Review
This patch adds a couple of classical consistency checks for doubly-linked lists in order to mitigate certain heap corruption attacks, aka "safe unlinking".  The code here is doing more, however, than simply managing a list of well-formed blocks -- in some cases, we are splitting blocks, making a new block of part of its memory allocation.  I've identified another place where a less obvious attack of the same kind is possible, which will be the subject of a followup comment.

Of greater concern is that none of the relevant code is actually executed during a run of the Tamarin acceptance suite.  It is unclear how the freelist management code is tested at all -- we don't free any code blocks in Tamarin.
The essence of the "unsafe unlinking" attack is that if an attacker can manage to scribble over an object at p, then subsequent execution of 'p->foo->bar = p->baz'  will have the effect of '*location = value', where location == p->foo + k and value == p->baz.  Code following this pattern is used to unlink elements of a doubly-linked list, so the attacker can control the assignment of an arbitrary value of his choice to an arbitrary location when the unlink occurs.

Some generalizations are also vulnerable, such as expressions of the form (p->foo + k1)->bar = p->baz + k2, and many similar variants.

In CodeAlloc::alloc(), I find the following code, with my comments prefixed by '###':

    // break block into 2 pieces, returning the lower portion to the free list
    CodeList* higher = b->higher;
    b->end = (NIns*) ( (uintptr_t)b->end - consume );
    CodeList* b1 = b->higher;
    //### Note that b->higher and b->end are overlaid.
    //### With respect to field values at entry:
    //###     b1 == b->higher - consume
    //###     higher == b->higher
    higher->lower = b1;    //### b->higher->lower = (b->higher - consume)
    b1->higher = higher;   //### (b->higher - consume)->higher = b->higher
    b1->lower = b;         //### (b->higher - consume)->lower = b
    //### (b->higher - consume)->terminator = b->terminator   VULNERABLE
    b1->terminator = b->terminator; 
    NanoAssert(b->size() > minAllocSize);
    addBlock(availblocks, b);  // put back the rest of the block

It appears that the last assignment can be exploited.  There previously-posted patch makes a few consistency checks here, but the logic is not so clear-cut as for the forward-link/backward-link check.

In CodeAlloc::addRemainder(), there are also manipulations of the block metadata that are suspicious, but we are splitting blocks and creating new ones, not just unlinking, just as in the snippet above.  Note that code of the form p1->foo->bar = p2->baz would be vulnerable if p2 = p1 + k.  During block splitting, when the pointers are calcuated memory offsets, we end up with scenarios that are much less cut-and-dried than simple list unlinking.
Attachment #560454 - Attachment is patch: true
Trivial fix to previous patch.

I've been able to exercise all of the assertions with a randomized driver that attempts to simulate something approximating the way we would use the allocator if in fact we frequently reclaimed compiled code.  The driver is a dirty hack at the moment, but will end up as one of our self-tests.
Attachment #560454 - Attachment is obsolete: true
Attachment #560513 - Flags: review?(edwsmith)
Note that the instances of vulnerabilities that I observed and attempt to mitigate involve the 'higher', 'lower', 'end', and 'terminator' pointers.  The Matasano report does not mention 'higher', and instead implicates 'next', which is used only for singly-threaded lists.  I strongly suspect that the report was based on a very superficial analysis, based merely on the presence of double-linking in the code, but review should take this into account.
Minor corrections.
Attachment #560755 - Attachment is obsolete: true
Attachment #560993 - Flags: review?(edwsmith)
(In reply to William Maddox from comment #6)
>     //### (b->higher - consume)->terminator = b->terminator   VULNERABLE
>     b1->terminator = b->terminator; 

The claim that this statement is vulnerable turns on the potential predictability of 'consume', which is not actually a constant.  Going beyond the simplest case of unlinking seems to be a slippery slope, and I'm doubting that the checks intended to protect this assignment are warranted.  Review comments are welcome.
Attachment #560513 - Flags: review?(nnethercote)
Attachment #560993 - Flags: review?(nnethercote)
Attachment #560513 - Flags: review?(edwsmith) → review+
Comment on attachment 560993 [details] [diff] [review]
Selftest for code allocation/deallocation (v2)

It would be better if the selftest didn't depend on a random number, it makes
troubleshooting harder.  But its not a fatal problem.
Attachment #560993 - Flags: review?(edwsmith) → review+
(In reply to Edwin Smith from comment #12)
> Comment on attachment 560993 [details] [diff] [review] [diff] [details] [review]
> Selftest for code allocation/deallocation (v2)
> 
> It would be better if the selftest didn't depend on a random number, it makes
> troubleshooting harder.  But its not a fatal problem.

It would be easy enough to fix the seed if the test were to be used for debugging,
but the randomized nature of the test should result in improved coverage over time as more runs are accumulated.
Pushed to TR:

http://hg.mozilla.org/tamarin-redux/rev/850f81df5862

Pushed directly to TR due to Flash Player schedule concerns.  The fix still needs to go into nanojit-central.  It appears that 1) VMPI_abort() is not defined in nanojit/VMPI.h, though it is supplied by the Tamarin host.  I'd like to get Nick's input on the best way to handle this.
Comment on attachment 560513 [details] [diff] [review]
Linked-list consistency checks

Review of attachment 560513 [details] [diff] [review]:
-----------------------------------------------------------------

::: nanojit/CodeAlloc.cpp
@@ +59,5 @@
>      static const int pagesPerAlloc = 16;
>  #endif
>  
> +    // Sanity checks that should remain enabled in release builds.
> +    #define CHECK(cond) do { NanoAssert(cond); if (!(cond)) VMPI_abort(); } while(0)

"CHECK" isn't a great name.  Maybe "ABORT_UNLESS"?
Attachment #560513 - Flags: review?(nnethercote) → review+
Attachment #560993 - Flags: review?(nnethercote) → review+
(In reply to William Maddox from comment #14)
> It appears that 1) VMPI_abort() is not
> defined in nanojit/VMPI.h, though it is supplied by the Tamarin host.  I'd
> like to get Nick's input on the best way to handle this.

Can we just add

  #define VMPI_abort abort

to nanojit/VMPI.h?  That's how most of the VMPI_* functions are defined.
As this is merely hardening to a hypothetical attack that's been publicly disclosed, I've submitted the fix to nanojit-central.  There's no reason that this bug needs to remain classified, though I'll let Trevor do the honors.

http://hg.mozilla.org/projects/nanojit-central/rev/41f7df0b8cb0
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Trevor: please declassify, the code is in TR via nanojit-central.
Reopening bug as the testcase is asserting. I have rolled back to the change that introduced this testcase into tamarin-redux (6615:850f81df5862) and confirmed that it is asserting. The reason that this has never been seen before is that the selftest was committed without rebuilding the cpp code.

cd extensions
$AVM selftest.abc -- *.st
<rebuild shell>
$AVM -Dselftests=nanojit,codealloc


#4  0x000000010019a42a in nanojit::CodeAlloc::alloc (this=0x7fff5fbfebb0, start=@0x7fff5fbfeb78, end=@0x7fff5fbfeb70, byteLimit=128) at ../nanojit/CodeAlloc.cpp:145
#5  0x0000000100059983 in avmplus::ST_nanojit_codealloc::CodeAllocDriver::makeCodeList (this=0x1011e0050, index=6, alloc=@0x7fff5fbfebb0, code=@0x1011e1088) at ../shell/../extensions/SelftestExec.cpp:3978
#6  0x0000000100059bf9 in avmplus::ST_nanojit_codealloc::CodeAllocDriver::run (this=0x1011e0050, iterations=20000) at ../shell/../extensions/SelftestExec.cpp:4024
#7  0x0000000100059ce0 in avmplus::ST_nanojit_codealloc::ST_nanojit_codealloc::test0 (this=0x100609200) at ../shell/../extensions/SelftestExec.cpp:4069
#8  0x0000000100059d41 in avmplus::ST_nanojit_codealloc::ST_nanojit_codealloc::run (this=0x100609200, n=0) at ../shell/../extensions/SelftestExec.cpp:4048
#9  0x0000000100056b76 in avmplus::SelftestRunner::run (this=0x1011df048, component_glob=0x7fff5fbff200 "nanojit", category_glob=0x7fff5fbff208 "codealloc", name_glob=0x0) at ../shell/../extensions/Selftest.cpp:124
#10 0x0000000100056f0e in avmplus::selftests (core=0x101023008, component_glob=0x7fff5fbff200 "nanojit", category_glob=0x7fff5fbff208 "codealloc", name_glob=0x0) at ../shell/../extensions/Selftest.cpp:58
#11 0x0000000100043c4d in avmshell::ShellCore::executeSelftest (this=0x101023008, settings=@0x7fff5fbff150) at ../shell/ShellCore.cpp:385
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Duplicate of this bug: 702200
Despite the missing rebuild, this code worked fine in local testing during development, on 32-bit MacOSX.  It appears that the problem is specific to 64-bit platforms, where it had indeed never been tested.  Due to the larger pointers, the block headers are larger, increasing the size of the minimum usable chunk of memory.
(In reply to William Maddox from comment #23)
> http://hg.mozilla.org/tamarin-redux/rev/466ff1f49594
> 
> Fix for wordcode builds.

Bill this selftest never completes when run on android debug builds. By never I mean that it takes at least more than 20 minutes, that is the amount of time the build system waits for it to complete... I only waited about 5 minutes locally.
Whiteboard: [sg:low stepping-stone] → [sg:low stepping-stone], fixed-in-nanojit, fixed-in-tamarin
http://hg.mozilla.org/tamarin-redux/rev/2b6f0238be8d

Run fewer iterations on non-desktop debug builds in the hope of avoiding
runtime pathologies.  Runtime is quadratic in debug builds due to calls to
sanity_check().
Status: REOPENED → RESOLVED
Closed: 8 years ago8 years ago
Resolution: --- → FIXED
Product: Core → Core Graveyard
Group: core-security
You need to log in before you can comment on or make changes to this bug.