Closed Bug 601522 Opened 14 years ago Closed 3 years ago

detecting big stack frame that can make native stack overflow exploitable

Categories

(Developer Infrastructure :: Source Code Analysis, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: igor, Unassigned)

References

Details

(Keywords: sec-want, Whiteboard: [sg:want P3])

Attachments

(2 files)

On Linux as a defense against a stack overflow the kernel adds a guard page at the stack limit address. After that page stacks for other threads or heap data may live. Thus a stack overflow bug could be potentially exploitable if the code allocates a large block of memory on the stack and does not touch all its CPU pages it immediately. 

For example, a code like

char buffer[10000];

for (int i = 0; i != 10000; ++i)
    buffer[i] = data[i];

could lead to an exploit if it is called near the stack limit. Given that the stack grows downwards the above loop may write to the stack on some other thread or heap before hitting the guard page. But at that moment another thread could be damaged enough to launch an exploit.

As a defense against that it would be nice to have a static analysis that detects that we do not have stack frames that are bigger then a CPU page in size. 

The analysis should also flag any code containing alloca calls as unsafe until proven otherwise.
Whiteboard: [sg:want P3]
We cant detect stack frame sizes statically. This sounds like the halting problem.
(In reply to comment #1)
> We cant detect stack frame sizes statically. This sounds like the halting
> problem.

Compilers compute stack frame sizes ahead of time, so this is not the halting problem. It may be hairier than one might think, though.

/be
(In reply to comment #2)
> (In reply to comment #1)
> > We cant detect stack frame sizes statically. This sounds like the halting
> > problem.
> 
> Compilers compute stack frame sizes ahead of time, so this is not the halting
> problem. It may be hairier than one might think, though.
> 
> /be

I think I misunderstood the problem. If you just need a single stack frame, that should be relatively easy to get out of gcc.
Is alloca used in Firefox anywhere?  Other than that the stack size should be statically computable, the hard part is getting that info out of GCC.  If that's hard then adding up the sizes of all locals might work to a first approximation (could be greater than the real value due to different locals sharing space, or smaller due to alignment stuff).
(In reply to comment #4)
> Is alloca used in Firefox anywhere?

Quote from Igor: "jstracer.cpp still uses alloca for example"

See bug 589103 comment #11.
(In reply to comment #4)
> Is alloca used in Firefox anywhere?

<http://mxr.mozilla.org/mozilla-central/ident?i=alloca> gives the following:
* Several in libffi
* Several in libvorbis
js/src/jstracer.cpp:4005
  LIns** map = (LIns**)alloca(sizeof(LIns*) * length);
js/src/jstracer.cpp:3724
  (JSValueType*)alloca(sizeof(JSValueType) * length)); 
nsprpub/pr/src/md/os2/os2io.c:188
  char *copy = (char *)alloca(len); 
nsprpub/pr/src/md/os2/os2misc.c:300
  char *copy = (char *)alloca(len); 
toolkit/components/alerts/src/mac/growl/CFGrowlAdditions.c:291
  char *cwdBytes = alloca(PATH_MAX); 
toolkit/components/alerts/src/mac/growl/CFGrowlAdditions.c:394
  .outName = destName ? NULL : (struct HFSUniStr255 *)(destName = alloca(sizeof(struct HFSUniStr255))),
modules/lib7z/LZMASDK/CPP/7zip/Compress/LZMA_Alone/LzmaBench.cpp:523
  alloca(encoder->AllocaSize);
modules/lib7z/LZMASDK/CPP/7zip/Compress/LZMA_Alone/LzmaBench.cpp:536
  alloca(decoder->AllocaSize); 

In addition, sqlite appears to use it in a #define at least, as does gfx/angle/generated/glslang_tab.cpp. That looks to be it for alloca usages.


> If
> that's hard then adding up the sizes of all locals might work to a first
> approximation (could be greater than the real value due to different locals
> sharing space, or smaller due to alignment stuff).

There's also the arguments that get pushed to the stack, so you'd need to know max(|stack args|) over all called functions (depending on whose stack frame you wish to attribute those to).
The stack frame size information is, I believe, in PDB files for MSVC.  I'm trying off and on to figure out how to extract it, since we could then statically detect frames that are beyond a certain stack threshold.  I presume we could do the same with DWARF info with gcc?

(Just adding locals doesn't really work in the presence of PGO, as we found recently.)
(In reply to comment #7)
> The stack frame size information is, I believe, in PDB files for MSVC.  I'm
> trying off and on to figure out how to extract it, since we could then
> statically detect frames that are beyond a certain stack threshold.  I presume
> we could do the same with DWARF info with gcc?

MSVC has the Debug Interface Access for grabbing stuff out of PDB files. I've not tried it though. It also appears that the IDiaFrameData has a method, get_maxStack, which appears to be what you are referring to.

The dwarf documentation is a bit denser to go to into, but an objdump --dwarf doesn't indicate any information that looks suspiciously like a frame size. Well, actually, a better way to put it is that the frame size information embedded in the DWARF would require a fair amount of computation: it doesn't appear to be a simple attribute like the PDB file has.
The documented interfaces for DIA are not sufficient: IDiaFrameData requires a "live" frame, it can't operate simply on a binary.  I'm in lukewarm pursuit of the truth, though.

jimb did the CFI work needed for breakpad to work on dwarf in the absence of a frame pointer, it might be a good start there.

I really think that static analysis on the source is a dead end, because of unpredictable inlining behaviour.
gcc has -Wframe-larger-than= as of 4.4 which works but which gives a slightly smaller frame size than is correct (more below). alloca and variable length arrays are also not considered.

There's also -fstack-check which apparently inserts code for runtime stack checking (might be worth looking into).

I was also messing with the gcc sources and it turns out there is a legacy mode, -fstack-check=generic, which does a better job of calculating the frame size. (Compare http://gcc.gnu.org/viewcvs/trunk/gcc/reload1.c?revision=164733&view=markup#l1267 to the calculation done by -Wframe-larger-than: http://gcc.gnu.org/viewcvs/trunk/gcc/final.c?revision=164751&view=markup#l1565)

I don't know why gcc isn't using the more accurate calculation but I've attached a patch (against 4_5-branch tip) that prints out the more accurate frame size. In my tests so far it's matched the object code frame size (not counting alloca). I'm unclear if this accounts for inline calls that increase the frame size, if such things exist/are significant though.

I've yet to run a mozilla build but I can post some results in a bit.
Attached file frames >= 512 bytes
this is from a non opt, non debug build (rev 76b7f615bab9). From what I've checked the results seem ok.

On my linux x86 box the max cache size is 4096 kb. Maybe we should just turn on -fstack-size=524288 on debug builds?
(In reply to comment #11)
> Created attachment 480690 [details]
> frames >= 512 bytes

add_hidden_capitalized_word: 196648
get_clen_and_captype: 131112
parse_file: 73948
decode_flags: 65596
getline: 65580
ReadFileToString: 65572
NativeFrameCleaner: 65556
Close: 49276

So we can have over 100K frames and the usage in many of them often do not touch the high part of the buffer. This is really bad and allows to turn native stack overflow DOS into exploits not only on Linux. I make this bug security sensitive to try to limit that information leakage.
Group: core-security
Happened to be talking to Brendan about this trying to figure out a way to turn the parser into an (incremental) coroutine -- I found STACK_CHECK_MAX_FRAME_SIZE ( http://gcc.gnu.org/onlinedocs/gccint/Stack-Checking.html ) in addition to some of the things Ehren saw.

If we can figure out the maximum frame size reachable from an arbitrary node in the call graph (call it FRAME_SIZE_LIMIT) and assume there are no reachable alloca, then we can mmap a guard chunk of memory (-rwx) >= FRAME_SIZE_LIMIT to run the recursive descent parser out of heap space with stack overflow safety.

It'd be a real pain to perform dynamic checks of the %sp bump amounts in the prologue of every procedure reachable from the parser, so a static way of reliably extracting this information would rock my socks.

Of course, to check yourself properly you would have to 1) run the static analysis then 2) update the mmap size constant and 3) iterate to the point where the used constant is sufficient -- ideally changes to mmap constants won't change frame sizes. :-)
Group: core-security
Product: Core → Firefox Build System
I believe protections against this are best done via compiler-provided stack probing. I believe LLVM supports this (https://github.com/rust-lang/rust/pull/42816), not sure if we enable it via clang though?

LLVM has been working on -fstack-clash-protection, as of mid February it landed then got reverted for test failures.

Added in bug 1588710

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Product: Firefox Build System → Developer Infrastructure
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: