Closed Bug 1135609 Opened 10 years ago Closed 8 years ago

Reading and writing below the stack pointer when running Flash game

Categories

(External Software Affecting Firefox Graveyard :: Flash (Adobe), defect)

x86
All
defect
Not set
normal

Tracking

(firefox-esr45 fixed, firefox-esr52 fixed)

RESOLVED FIXED
Tracking Status
firefox-esr45 --- fixed
firefox-esr52 --- fixed

People

(Reporter: jseward, Unassigned, NeedInfo)

References

()

Details

(Keywords: sec-high, sec-vector, Whiteboard: ADBE 3944206)

Attachments

(5 files)

No description provided.
Observed with: x86_64-linux (flash 11.2.202.442) x86-linux (flash 11.2.202.442) x86-win32 (flash NPSWF32_16_0_0_305) STR: http://www.y8.com/games/3d_valet_parking wait till "100" in wibbly progress bar Click CONTINUE Click START on the sign Click "> CONTINUE" "Loading" on a green background displayed Approximately 160 million memory accesses, both reads and writes, are then observed in the area roughly 4000 bytes to 300 bytes below SP. At least for the x86_64-ELF and x86-ELF ABIs these are definitely not allowed. For win32-x86 ABI I don't know, but I suspect they are also not allowed. I suspect, though I can't show for sure, that these happen in code generated by the flashplayer's JIT(s). They only happen in the plugin-container process wrapped around the flash player. Most of the stacks are missing, presumably due to being in JIT generated code, but some do lead back into the flashplayer, eg Invalid write of size 8 at 0x4109F15F: ??? by 0x3F: ??? by 0x2E35000F: ??? by 0x1: ??? by 0x2E33824F: ??? by 0x40DC4E35: ??? by 0x2E2B001F: ??? by 0x2E30700F: ??? by 0x2E33830F: ??? by 0x303: ??? by 0x2E35000F: ??? by 0x1: ??? by 0xFF: ??? by 0x1360B217: ??? (in /home/sewardj/.mozilla/plugins/libflashplayer.so) by 0x1360DD04: ??? (in /home/sewardj/.mozilla/plugins/libflashplayer.so) by 0x1371BC28: ??? (in /home/sewardj/.mozilla/plugins/libflashplayer.so) by 0x317E807529: start_thread (/usr/src/debug/glibc-2.20/nptl/pthread_create.c:310) by 0x317E50079C: clone (/usr/src/debug////////glibc-2.20/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:109) Address 0x3182f920 is on thread 11's stack 536 bytes below stack pointer
Invalid accesses on x86_64-linux
Invalid accesses on x86-linux
Invalid accesses on x86-win32
Julian, did you get these results with Valgrind? Whatever you used, please give (or refer to) detailed instructions on how to set it up, so that Adobe can reproduce these results. Do you have a way to test in "real" Windows :-) That is not in Wine?
> Do you have a way to test in "real" Windows :-) That is not in Wine? Now I realize the answer is "no", since Valgrind isn't available for Windows.
(In reply to Steven Michaud from comment #6) > Now I realize the answer is "no", since Valgrind isn't available for Windows. Unfortunately so. I can document what I did with Wine if that helps. I should point out, I did the Linux test runs as a cross-check, and got what looks suspiciously like the same invalid accesses (comments 2 and 3 above). That raises my confidence that it's a real problem, since there are fewer Valgrind-unknowns for native Linux runs compared to going via Wine.
This is ADBE 3944206
AppVerifier on Windows might give you some Windows-equivalent feedback, just FYI.
I also have Windows Address-Sanitizer builds that we can probably share if you're interested at looking at Windows with ASAN.
Component: Plug-ins → Flash (Adobe)
Keywords: sec-high
Product: Core → Plugins
Whiteboard: ADBE 3944206
Version: Trunk → unspecified
Still present with current latest (11.2.202.451) on x86_64-linux. I didn't test any other targets.
So I tried running [Windows Firefox 37.0.2 + FP 17_0_r0_188] in [wine-1.7.38] with [valgrind-3.10.1] on [Ubuntu 14.04 LTS x64], but ran into the following fatal error whenever the "--trace-children=yes" option is passed into valgrind. valgrind: the 'impossible' happened: Unsupported arch_prtctl option This is the valgrind command-line I used: valgrind -v --demangle=no --suppressions=${HOME}/Desktop/valgrind_ff_flash.sup --trace-children=yes --show-reachable=yes --error-limit=no --track-origins=yes --smc-check=all-non-file --vex-iropt-register-updates=allregs-at-mem-access --show-mismatched-frees=no wine "${HOME}/.wine/drive_c/Program Files (x86)/Mozilla Firefox/firefox.exe" http://www.y8.com/games/3d_valet_parking Without the "--trace-children=yes" option, Firefox will launch successfully but with that we're not able to trace system calls made by Flash Player, thereby not reproducing the bug. Is there anything else I can do to circumvent this error? Or a way to repro the issue without using Wine?
Per comment 1, you don't need Wine at all. I could reproduce it on 32- and 64-bit x86 linux. That is, with a Linux build of Firefox using a Linux build of the Flash player, running on Valgrind.
Thanks. I was sidetracked from the Wine/Windows conversations earlier. With Linux Flash Player 11.2.202.460 running in a nightly build of Firefox, I can see the "Invalid write of size 8 or 4" reports from Valgrind.
Jeromie, any updates on Adobe's side for this issue?
Flags: needinfo?(jeclark)
No, the security queue is pretty deep, and issues on the more populous platforms and Firefox stabilization efforts have been given priority. Flash NPAPI Linux support is an ancient branch that we provide out of contractual obligations, the only reason that we're even considering it is because it's potentially exploitable. We also publish current versions of PPAPI Flash Player for Linux browsers that support PPAPI. In parallel, we're looking at doing additional work on Linux that may render this completely moot. It's too early to talk about that publicly, but I'd be happy to share more offline. I've been waiting to see where the decisions land, as I have a preferred choice on what to do, should options be available. The bug *has* lingered longer than I'd like, and I'm pushing hard to get it dealt with in our P+1 (July) release, as it looks like the ongoing decision-making process is going to drag on for a while longer.
Flags: needinfo?(jeclark)
(In reply to Jeromie Clark from comment #16) > The bug *has* lingered longer than I'd like, and I'm pushing hard to get it > dealt with in our P+1 (July) release, as it looks like the ongoing > decision-making process is going to drag on for a while longer. Hey Jeromie, was there any progress on this decision? Did a fix go out, or are we still waiting?
Flags: needinfo?(jeclark)
It's bumped out to the subsequent release (Quint) again. It's assigned to a developer, but timing was impacted by a combination of Adobe's July shutdown, some related extended vacations by key folks, and the HackingTeam madness that coincided.
Flags: needinfo?(jeclark)
We believe that this issue is fixed in Flash Player 11.2.r202.518; however, the game itself requires a higher version of Flash Player, so we're having trouble reproducing the crash. We hit an assert and safely shut down because of the SWF version incompatibility. We'd appreciate your confirmation that the bug is fixed since it looks like you can reliably reproduce this in your test environment. I'd be happy to provide a build to try out.
Flags: needinfo?(jseward)
(In reply to Jeromie Clark from comment #19) I can certainly give it a try. I just tried .508 (the most recent public release) and the bug is still present. Can you get me a .tar.gz build for x86_64-linux? Then I'll test it.
Flags: needinfo?(jseward)
Flags: needinfo?(jeclark)
Group: core-security → core-security-release
Oh, maybe that version number *was* right. Vacation brain. I probably looked at a stale page with the version numbers. I'll drop you a note with a private build in a minute.
Flags: needinfo?(jeclark)
I see ==29836==ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed: 0x00000195e030 in thread T0 on Fedora 22 with today's opt asan 64 bit build with Flash 11.2.202.508-release. I couldn't reproduce on RHEL6 though.
Attached file 11.2.202.530-3.log
Sorry for the delay but I have been experiencing issues with my display on Fedora. With Flash 11.2.202.530 x86_64 with xorg-x11-drv-nouveau.x86_64 1:1.0.11-2.fc22 I can reproduce: ERROR: AddressSanitizer: attempting free on address which was not malloc()-ed When I boot with ACPI=off and access the machine via VNC I can not reproduce this.
The previous comment was on Fedora 22 with Kernel 4.1.6. I also tried this on an ESXi VM RHEL6 2.6.32-573.3.1 and could not reproduce. Tomcat: do you have a Linux system (not a vm and not over vnc) that you could test this with? jeclark: do you mind if I pass the 11.2.202.530 to Tomcat?
Flags: needinfo?(jeclark)
Flags: needinfo?(cbook)
I tested 11.2.202.530 on Valgrind and the invalid accesses are still present.
@bclary - you're welcome to share the binary internally @jseward - Can you tell us a little more about your config? We were reproducing this using Ubuntu 14.04.2 LTS x64 with FP 11.2.r202.46, but we're not seeing it now. We also aren't seeing the failing log output we'd expect, like the "nnn bytes below stack pointer" diagnostic with "Invalid write of size 4 or 8".
Flags: needinfo?(jeclark) → needinfo?(jseward)
(In reply to Jeromie Clark from comment #26) > @jseward - Can you tell us a little more about your config? mozilla-central, mozconfig as shown below. Fedora 21, x86_64. gcc (GCC) 4.9.2 20150212 (Red Hat 4.9.2-6) libflashplayer.so 11.2.202.530, 64-bit. Valgrind, trunk or 3.11.0 (released today) Start up like this: /path/to/valgrind --num-transtab-sectors=24 --px-default=allregs-at-mem-access \ --px-file-backed=unwindregs-at-mem-access --fair-sched=yes \ --smc-check=all-non-file --trace-children=yes -v \ ./ff-Og-linux64/dist/bin/firefox-bin -P dev -no-remote 2>&1 \ | tee logfile --trace-children=yes is important, so as to get V to follow into the child process wrapped around the Flash player. You need to be able to find something like this in the log: ==26129== Command: /home/sewardj/MOZ/MC-CURR/ff-Og-linux64/dist/bin/plugin-container /home/sewardj/.mozilla/plugins/libflashplayer.so -appdir /home/sewardj/MOZ/MC-CURR/ff-Og-linux64/dist/bin/browser 25960 plugin Let it start up (3-4 mins on fast machine) and go idle; watch with /bin/top. Then follow the STR in comment #1. Reproduces reliably for me. ------------ MOZCONFIG ----------- . $topsrcdir/browser/config/mozconfig mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/ff-Og-linux64 ac_add_options --enable-tests ac_add_options --enable-optimize="-g -Og" ac_add_options --enable-debug-symbols ac_add_options --enable-valgrind ac_add_options --enable-profiling ac_add_options --enable-elf-hack ac_add_options --disable-crashreporter ## Shouldn't really need this ac_add_options --disable-jemalloc mk_add_options MOZ_MAKE_FLAGS="-j4" mk_add_options AUTOCLOBBER=1
Flags: needinfo?(jseward)
I am still seeing these errors in 11.2.202.540.
Flags: needinfo?(cbook)
Julian, can you try in the latest Flash Player? If it's still an issue, we'll circle back with Jeromie/Adobe.
Flags: needinfo?(jseward)
Still present in 11.2.202.569 on 64 bit Linux. I can look in more detail on Monday at the generated code and see if that yields any information.
Flags: needinfo?(jseward)
I managed to catch this with gdb and valgrind, with 11.2.202.569, on 64 bit Linux. It looks like a bogus frame setup in some kind of vectorised routine. How valgrind complains is: ==31534== Invalid write of size 8 ==31534== at 0x2A3F2BB9D074: ??? ==31534== by 0x31: ??? ==31534== by 0x57DB0A8000F: ??? ==31534== Address 0x2243b6c0 is on thread 17's stack ==31534== 648 bytes below stack pointer The routine begins with a pretty normal looking prologue, apart from the redundant rex prefixes, which are harmless. 0x00002a3f2bb9d034: rex.W push %rbp 0x00002a3f2bb9d036: rex.W push %rbx 0x00002a3f2bb9d038: rex.WB push %r12 0x00002a3f2bb9d03a: rex.WB push %r13 0x00002a3f2bb9d03c: rex.WB push %r14 0x00002a3f2bb9d03e: rex.WB push %r15 0x00002a3f2bb9d040: mov %rsp,%rbp So far, so good. Then it continues (immediately) 0x00002a3f2bb9d043: sub $0x208,%rbp RBP is now 520 bytes below RSP, and since the redzone is 128 bytes, it is 392 bytes below the redzone. IOW it points 392 bytes into a non-allowed area. There then follow quite a number of RBP-relative accesses, all of which involve negative offsets from RBP, hence are even further into the invalid area. The first is: => 0x00002a3f2bb9d074: mov %rax,-0x80(%rbp) The offsets make sense. This access is at -128 + -520 relative to RSP, that is, at -648(RSP), as the valgrind message says.
Hi Jeromie, can you get someone to take a look at this?
Flags: needinfo?(jeclark)
This bug is opened and assigned, and we're able to reproduce it. I do not have an ETA for a fix at this time.
Flags: needinfo?(jeclark)
I have tried different ways to run the test and analyze the log, I think the most annoying part is the error about "below stack pointer". I did try to disable AS3JIT, and also tried to used Debug build or Release-with-symbols build to run the test, but there is no stack information related to "below stack pointer" errors. The typical log output that seems to relate the "below stack pointer" error to flashplayer is the similar to what Vincent presents in his tests, like: >>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>> ==76627== Invalid write of size 8 ==76627== at 0x3DFEE81EF181: ??? ==76627== by 0xB: ??? ==76627== by 0x3A41CB3A100F: ??? ==76627== Address 0x33b137a0 is on thread 13's stack ==76627== 552 bytes below stack pointer ... And Thread 13 that's being referenced in that message has Flash Player on the stack, like so: ==76627== Thread 13: ==76627== Conditional jump or move depends on uninitialised value(s) ==76627== at 0x22B9137B: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x22B91C37: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x22BAE8F6: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x22BAFDF3: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x22FA803F: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x226FA50E: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x223FE5CA: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x22879FBB: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x2287A50D: ??? (in /home/vilee/.mozilla/plugins/libflashplayer.so) ==76627== by 0x5040554: start_thread (in /usr/lib64/libpthread-2.21.so) ==76627== by 0xA2ACB9C: clone (in /usr/lib64/libc-2.21.so) <<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<< I think it does not prove for sure that this error is caused by flashplayer. My speculation is that this might be caused by the compiler, or some other system libraries, since there is not JITed code, all are interpreted. Most other flash related errors are GC specific, so I don't think they are real issues. There are a few error reported against third_party/swiftshader, they are about member variables not initialized, but it does not seem to be very critical here. Is there any other good ways to pinpoint where the "below stack pointer" error comes from?
Hey Jeromie, any update?
Flags: needinfo?(jeclark)
As my latest update, I am not convinced that the "below stack pointer" error comes from flash player. Correct me if I am wrong on this, otherwise I would rather treat it as not flash player's bug.
I am still seeing this with 11.2.202.632 on x86_64-linux. Here is a disassembly of the function in question. It is consistent with the analysis in comment 31, except that the stack offsets have changed a bit -- in particular "sub $0x208,%rbp" has become "sub $0x1e8,%rbp". That doesn't invalidate the analysis. A couple of other observations: * This is JIT generated code. If it had come from a file-backed mapping, Valgrind (and, probably, GDB) would have said so. (Or at least, it is presented in a non-file-backed mapping. Maybe it is a copy of a piece of AOT generated code that has been copied into a non-file-backed mapping.) * After the function is a bunch of int3 padding, which I assume are "safe" values should execution run off the end for any reason. * The function entry point is at 0x00000e690d666010, 16 bytes past the page start point. The instructions in the 16 bytes prior to that are nonsense, and, I suspect, a 16 byte data area -- perhaps some kind of descriptor for the code that follows. * There are a lot of SIMD (SSE2+) instructions in the routine. It looks like quite good quality code. Curiously it only uses xmm0 .. xmm7 despite the fact that there are 16 xmm registers available on x86_64. * The code has been assembled by an assembler that (harmlessly) adds redundant REX prefixes. I'm pretty sure that that is not the GNU assembler.
Looking at the disassembly, this cannot be produced by SpiderMonkey JITs, as we do not have the "emms" instruction in our vocabulary. (nor any alias of it)
(In reply to Nicolas B. Pierron [:nbp] from comment #38) xzhang, is there any JIT in the Flashplayer that could have produced the code shown in comment 37?
Flags: needinfo?(xzhang)
(In reply to xzhang from comment #34) > There are a few error reported against third_party/swiftshader, they are > about member variables not initialized, but it does not seem to be very > critical here. This is unrelated to the main issue here, but neverthelese: please, just initialise these member variables, even if you think it is not necessary. * it is quite difficult to argue that branches based on undefined values are harmless. Then can easily take the code down an unintended path and lead to a crash. * making the code Valgrind clean(er) makes it easier to see new errors.
(In reply to Julian Seward [:jseward] from comment #40) > (In reply to xzhang from comment #34) > > There are a few error reported against third_party/swiftshader, they are > > about member variables not initialized, but it does not seem to be very > > critical here. > > This is unrelated to the main issue here, but neverthelese: please, just > initialise these member variables, even if you think it is not necessary. > > * it is quite difficult to argue that branches based on undefined values > are harmless. Then can easily take the code down an unintended path > and lead to a crash. > > * making the code Valgrind clean(er) makes it easier to see new errors. Sure, we can do that.
(In reply to Julian Seward [:jseward] from comment #39) > (In reply to Nicolas B. Pierron [:nbp] from comment #38) > xzhang, is there any JIT in the Flashplayer that could have produced > the code shown in comment 37? I will look into that.
the "below stack pointer" error is indeed triggered by SwiftShader JIT in flash player. SwiftShader is a software renderer for Stage3D. The suspicious assembly code is like below (from the previous attachment): 0x00000e690d666040: mov %rsp,%rbp 0x00000e690d666043: sub $0x1e8,%rbp ... 0x00000e690d666074: mov %rax,-0x80(%rbp) what happens is that SwiftShader adjust rbp (sub $0x1e8,%rbp) to make room for the current shader function's frame + 8 xmm registers of extra room + 8 (for alignment). One curious point is that "rsp" is usually adjusted to establish the current stack frame, and "rbp" is used to access local variables, arguments, but this JIT implementation seems taking a different approach. I have tried to modify the JIT code assembler to make "rsp" the real lower bound of the stack, and use "rbp" to access other local variables, and transfer data between xmm registers, but it didn't work very well, there were always some weird crash happened, for example, xmm register access requires memory address to 16-byte aligned, etc. I think it should be possible to figure out all the issues related to changing "rsp/rbp" to the regular use, but it will be quite risk, because the JIT engine seems to be designed that way specifically. SwiftShader is basically to convert vertex/fragment shader code into SSE assembly code to do the rendering, it looks like the shader code is flattened into JIT code without further function calls. I guess this might be the reason that the JIT code does not adjust "rsp" to establish a new stack frame boundary. If the above analysis is correct, should we still consider this "below stack pointer" a security error? I guess valgrind monitors stack access by comparing with "rsp" register to capture issues like "below stack pointer", is that correct?
(In reply to xzhang from comment #43) > If the above analysis is correct, should we still consider this "below stack > pointer" a security error? I guess valgrind monitors stack access by > comparing with "rsp" register to capture issues like "below stack pointer", > is that correct? Yes, correct. Whether or not this is a security problem I can't say. What I can say is that reading and writing below -127(RSP) on Linux-x86_64 (actually, on any ELF based x86_64 platform) is disallowed by the amd64-ELF ABI. Applications must regard the area at -128(RSP) and below as changing at any time. For example, if a signal is delivered to that thread while this code is executing, then the stack frame may well get corrupted. Or, the kernel may decide to unmap the page that ends at -128(RSP) -- it is allowed to -- and so any access in that area will cause a segfault. The same is true for x86-ELF (x86-linux, etc) and arm-ELF (I think), except the limit is zero, not -128. That is, any access below the stack pointer is illegal. I have no idea what the situation is on Windows, but I wouldn't be surprised to find that something similar is true.
Thanks Julian, I will continue to seek solution for this issue then.
Excellent! Thanks.
I have tested a local fix for this issue, it seems working well. What I am trying to do is to adjust esp/rsp to the proper position so that the right stack frame is allocated. The only concern I have is actually the amd64-ELF ABI. It says: >>>>>>>> The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone. <<<<<<<< My understanding is that a leaf function is ok to use the red zone for temporary use, and this is what the JIT code is doing, seems like it is valid for the below-stack-point access. This is an important point that needs to make clear, otherwise my fix won't pass peer review, and it can't get into the production branch.
(In reply to xzhang from comment #47) > My understanding is that a leaf function is ok to use the red zone for > temporary use, and this is what the JIT code is doing, seems like it is > valid for the below-stack-point access. This seems correct to me. To be clear, my understanding of the ABI is as follows: (1) any function may use the area -127(rsp) .. -1(rsp) for its own purposes (2) no function may access at -128(rsp) or a lower address As a consequence of (1), we also have (3) if f calls g, f must assume that any data in the area -127(rsp) .. -1(rsp) is undefined (that is, has been overwritten by g) after the call These are the rules that the amd64-linux port of Valgrind has been checking for some years now.
Thanks for the clarification, I will add it to my change description.
The fix has been submitted into our release branch.
Flags: needinfo?(jeclark)
This was submitted to Adobe's release branch five months ago. Julian, can you see if it reproduces still?
Flags: needinfo?(jseward)
I can't reproduce this any more with 24.0.0.194 on x86_64-linux. So I think this can be closed.
Flags: needinfo?(jseward)
Weekly sec-high triage: it seems to be solved per comment 52, so closing the bug as fixed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Group: core-security-release
Product: External Software Affecting Firefox → External Software Affecting Firefox Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: