Closed Bug 557771 Opened 15 years ago Closed 15 years ago

TestPoisonArea.cpp only builds on arm, ppc, x86, x86_64 and sparc

Categories

(Core :: Layout, defect)

Other
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla2.0b8

People

(Reporter: glandium, Assigned: glandium)

Details

Attachments

(1 file, 1 obsolete file)

As per $SUMMARY, only TestPoisonArea only supports these architectures. Either a failure to build it should not fail the entire build or building it should be made conditional to the target platform (with the risk that when a new platform support is added to the cpp file, the Makefile is not modified accordingly). What would you prefer ?
Isn't any platform on which it doesn't build missing important security capabilities of the platform?
I'm not sure, but it doesn't seem nsPresArena.cpp is platform specific. Only the test is.
(In reply to comment #1) > Isn't any platform on which it doesn't build missing important security > capabilities of the platform? No, except maybe yes. TestPoisonArea.cpp fails to compile on any but the above list of architectures because it only knows how to fill a page with return instructions on those architectures. That is only necessary for one of the three tests that it does, and if we pass the other two we have excellent odds that we *would* pass the third if we could run it. The actual poisoning code in nsPresArena.cpp doesn't need to know this. But the reason I made that a hard build error, instead of skipping the "execute" test, is that I wanted people doing ports to other architectures to stop a minute and think about whether any of the frame poisoning code's *other* unportable assumptions (documented in the long comment at the top of TestPoisonArea.cpp) are going to break.
Attached patch Preliminary patch (obsolete) — Splinter Review
I gave a round on all architectures I have direct access to and came up with this patch for a few of them, namely alpha, hppa, mips (not mipsel) and s390. I also gathered some information about the poison preferred address on various architectures, maybe some other values should be tried as preferred address on some of these, especially when it uses the consolation prize ; note all the results come from a source with the patch from bug 563610 applied : alpha: INFO | negative control allocated at 0x0000020000028000 INFO | positive control allocated at 0x000002000002c000 INFO | poison area assumed at 0x7ffffffff0dea000 armel: INFO | negative control allocated at 0x4001f000 INFO | positive control allocated at 0x40021000 INFO | poison area allocated at 0x40022000 (consolation prize) amd64: INFO | negative control allocated at 0x00007fa6b18f0000 INFO | positive control allocated at 0x00007fa6b18ee000 INFO | poison area assumed at 0x7ffffffff0dea000 hppa: INFO | negative control allocated at 0x4000a000 INFO | positive control allocated at 0x4000c000 INFO | poison area allocated at 0xf0dea000 (preferred addr) i386: INFO | negative control allocated at 0xf77d9000 INFO | positive control allocated at 0xf77d7000 INFO | poison area allocated at 0xf0dea000 (preferred addr) mips: INFO | negative control allocated at 0x2aacd000 INFO | positive control allocated at 0x2aace000 INFO | poison area allocated at 0x2aacf000 (consolation prize) powerpc: INFO | negative control allocated at 0x48023000 INFO | positive control allocated at 0x48025000 INFO | poison area allocated at 0x48026000 (consolation prize) s390: INFO | negative control allocated at 0x77fde000 INFO | positive control allocated at 0x77fdc000 INFO | poison area allocated at 0x77fdb000 (consolation prize) sparc: INFO | negative control allocated at 0xf7f16000 INFO | positive control allocated at 0xf7f12000 INFO | poison area allocated at 0xf0dea000 (preferred addr) All these are with a Linux kernel. I will shortly also treat mipsel, but I need to setup a qemu instance first. ia64 can't work with the same approach, because there is no direct function call. IOW, calling opaddr() will not execute what is at opaddr, but will read a function pointer and a function descriptor there. powerpc64 is similar to that but I don't have access to a ppc64 machine (while I do for ia64). These would probably need a totally different approach, but I first need to check what exactly the poison thing is trying to achieve or prevent. I also tried on some systems with a GNU userland but a non-linux kernel, which didn't show the same patterns as with a linux userland. I haven't checked too deeply why some failures are different than on linux (illegal instruction or bus error) yet: kfreebsd-i386: INFO | negative control allocated at 0x282df000 INFO | positive control allocated at 0x282e1000 INFO | poison area assumed at 0xf0dea000 (preferred addr) TEST-PASS | reading negative control TEST-PASS | executing negative control TEST-PASS | writing negative control TEST-PASS | reading positive control | Bus error TEST-PASS | executing positive control | Bus error TEST-PASS | writing positive control | Bus error TEST-PASS | reading poison area | Segmentation fault TEST-PASS | executing poison area | Segmentation fault TEST-PASS | writing poison area | Segmentation fault kfreebsd-amd64: INFO | negative control allocated at 0x0000000800635000 INFO | positive control allocated at 0x0000000800637000 INFO | poison area assumed at 0x7ffffffff0dea000 TEST-PASS | reading negative control TEST-PASS | executing negative control TEST-PASS | writing negative control TEST-PASS | reading positive control | Bus error TEST-PASS | executing positive control | Bus error TEST-PASS | writing positive control | Bus error TEST-PASS | reading poison area | Bus error TEST-PASS | executing poison area | Bus error TEST-PASS | writing poison area | Bus error hurd-i386: TestPoisonArea.cpp:(.text+0x8d): warning: warning: madvise is not implemented and will always fail INFO | negative control allocated at 0x01026000 INFO | positive control allocated at 0x01028000 INFO | poison area assumed at 0xf0dea000 (preferred addr) TEST-PASS | reading negative control TEST-PASS | executing negative control TEST-PASS | writing negative control TEST-PASS | reading positive control | Segmentation fault TEST-PASS | executing positive control | Segmentation fault TEST-PASS | writing positive control | Segmentation fault TEST-PASS | reading poison area | Illegal instruction TEST-PASS | executing poison area | Illegal instruction TEST-PASS | writing poison area | Illegal instruction
Attachment #443327 - Flags: review?(zweinberg)
Attachment #443327 - Flags: review?(zweinberg) → review+
Comment on attachment 443327 [details] [diff] [review] Preliminary patch This patch is OK, and I'll also pre-approve further patches that add more RETURN_INSTR #defines and/or tweak the #if chain, as long as they make no other code changes. Could you repeat your collection of preferred addresses with the patch from bug 563610 backed out? I would like to know if the same hang happens on any other Linux architecture, and I strongly suspect some of your 'consolation prize' results are going to turn into 'preferred addr' if we don't reject ENOMEM. Re non-Linux kernels, don't worry about "Bus error" versus "Segmentation fault"; those are effectively the same thing. "Illegal instruction" signals, however, need further investigation (my guess is that Hurd is generating the wrong signal code there, but it does need to be checked). The goal of the "execute" test is to cause the CPU to fetch an instruction from the target area, so for ia64 and ppc64 the Right Thing would be to set up a *valid* (accessible) function descriptor whose destination PC address is in the target area. For ia64 you'll also need to deal with machine instructions being 128 rather than 32 bits long. (I honestly don't care if we have perfect test coverage on ia64, so if it's too much hassle just disable that test on ia64.)
I'm wondering how such a test for ia64 would be relevant to what nsPresArena.cpp is trying to prevent. As I don't have access to bug 497495, I can only guess from the code, which I haven't had time to check yet.
You should be able to see bug 497495 now.
You were right, all those getting the consolation prize actually get: INFO | poison area assumed at 0xf0dea000 (preferred addr), except armel, where the program freezes when jumping in the poison area. The mips RETURN_INSTR define should work on mipsel, but I get a behaviour that doesn't make sense when I try in qemu: I get SIGILL, while the instruction is correctly written at the location the jump is made to, gdb is able to tell me it's jr ra, the program counter register points exactly at the right place, and a minimalist version of the program works (only mmap()ing, writing the same value, and jumping). I'll also check what is happening on hurd, and see what I can do for ia64. Thanks for the Cc on bug 497495, that was an interesting read.
This patch adds mipsel and ia64 to the previous one, both need more than 32 bits for the return instruction, so I introduced a RETURN_INSTR_TYPE macro for the type of the RETURN_INSTR macro. And as function calls needed to be wrapped for ia64 (I would have done ppc64 too if i had access to a ppc64 machine), I introduced a JumpTo wrapper function that does the right thing on ia64. The ia64 part /should/ work on ia64 winnt, but I can't test that.
Attachment #443327 - Attachment is obsolete: true
Attachment #443588 - Flags: review?(zweinberg)
As for the Hurd, it looks like it goes SIGILL when trying to read, write or execute addresses above 0x80000000... (at least, when they are not mapped, I didn't try to map some pages there) Below 0x80000000, it goes SIGSEGV.
I won't be able to look at this till next week.
Attachment #443588 - Flags: review?(zweinberg) → review+
Comment on attachment 443588 [details] [diff] [review] Patch with more architectures Looks good; some minor nits ... >+ * For architectures where this is not true, fiddling with RETURN_INSTR_TYPE >+ * can be enough. s/can/might/ >+#elif defined __mips >+#define RETURN_INSTR 0x03e00008 /* jr ra */ >+ >+#ifdef __MIPSEL >+/* On mipsel, jr ra needs to be followed by a nop. >+ 0x03e00008 as a 64 bits integer just does that */ >+#define RETURN_INSTR_TYPE uint64_t >+#endif Why is __MIPSEL uppercase when all the others are lowercase? Also, presumably the nop is needed to fill a delay slot, but then why isn't it necessary on plain mips? I think I'd be more comfortable if you made both of them put a nop after the jr ra. >+#elif defined __s390__ >+#define RETURN_INSTR 0x07fe0000 /* br %r14 */ Why the surrounded-by-underscores form for this one? >+#ifdef __ia64 >+ struct func_call { >+ uintptr_t func; >+ uintptr_t gp; >+ } call = { opaddr, }; { opaddr, 0 } please, just to be clear.
(In reply to comment #8) > You were right, all those getting the consolation prize actually get: > INFO | poison area assumed at 0xf0dea000 (preferred addr), except armel, where > the program freezes when jumping in the poison area. Can you take this up with the ARM kernel people and find out why we get a freeze instead of a segfault? And follow up in bug 563610. If you can't do that, I can try to find time to do it but my time is kinda overbooked. (In reply to comment #10) > As for the Hurd, it looks like it goes SIGILL when trying to read, write or > execute addresses above 0x80000000... (at least, when they are not mapped, I > didn't try to map some pages there) > Below 0x80000000, it goes SIGSEGV. For Hurd, we probably *should* treat an ENOSYS or E(OP)NOTSUP(P) errno code from madvise as "don't use that address," because that means the system call *never* does anything useful. It might also be good to mention this to the Hurd maintainers but I wouldn't call it a priority.
(In reply to comment #13) > Can you take this up with the ARM kernel people and find out why we get a > freeze instead of a segfault? And follow up in bug 563610. If you can't do > that, I can try to find time to do it but my time is kinda overbooked. As I said there, it doesn't freeze everywhere, as I don't have direct access to the hardware, nor root access to the system, I can't change the kernel myself, but that is, in the end, likely to be a sporadic issue in the kernel: 2.6.31rc7 (iirc) freezed, an 2.6.34 rc doesn't and 2.6.26 doesn't either. Unfortunately, the 3 tests have been done on three different machines. Anyways, it means the sky is not as dark as i thought. > (In reply to comment #10) > > As for the Hurd, it looks like it goes SIGILL when trying to read, write or > > execute addresses above 0x80000000... (at least, when they are not mapped, I > > didn't try to map some pages there) > > Below 0x80000000, it goes SIGSEGV. > > For Hurd, we probably *should* treat an ENOSYS or E(OP)NOTSUP(P) errno code > from madvise as "don't use that address," because that means the system call > *never* does anything useful. I'll check if madvise really returns something useful on Hurd. > It might also be good to mention this to the Hurd maintainers but I wouldn't > call it a priority. Will do.
> >+#elif defined __mips > >+#define RETURN_INSTR 0x03e00008 /* jr ra */ > >+ > >+#ifdef __MIPSEL > >+/* On mipsel, jr ra needs to be followed by a nop. > >+ 0x03e00008 as a 64 bits integer just does that */ > >+#define RETURN_INSTR_TYPE uint64_t > >+#endif > > Why is __MIPSEL uppercase when all the others are lowercase? Because __mipsel isn't defined, sadly. > Also, presumably > the nop is needed to fill a delay slot, but then why isn't it necessary on > plain mips? I actually don't know why this is. It turns out that on mipsel, I get SIGILL when executing the zone filled with jr ra only. That doesn't happen on mips. > I think I'd be more comfortable if you made both of them put a nop > after the jr ra. > > >+#elif defined __s390__ > >+#define RETURN_INSTR 0x07fe0000 /* br %r14 */ > > Why the surrounded-by-underscores form for this one? Because __s390 isn't defined, sadly.
(In reply to comment #14) > > For Hurd, we probably *should* treat an ENOSYS or E(OP)NOTSUP(P) errno code > > from madvise as "don't use that address," because that means the system call > > *never* does anything useful. > > I'll check if madvise really returns something useful on Hurd. I'm guessing it always returns ENOSYS from the warning at link time that you quoted - | TestPoisonArea.cpp:(.text+0x8d): warning: warning: madvise is not | implemented and will always fail > > It might also be good to mention this to the Hurd maintainers but I wouldn't > > call it a priority. > > Will do. I'd consider the SIGILL/SIGSEGV inconsistency to be more important than a working madvise. (In reply to comment #15) > > Also, presumably > > the nop is needed to fill a delay slot, but then why isn't it necessary on > > plain mips? > > I actually don't know why this is. It turns out that on mipsel, I get SIGILL > when executing the zone filled with jr ra only. That doesn't happen on mips. Maybe newer MIPS hardware objects to a branch in the delay slot? (just guessing here, it doesn't matter that much) The macro inconsistencies are unfortunate, but I suppose there's nothing to do about it.
Assignee: nobody → mh+mozilla
Comment on attachment 443588 [details] [diff] [review] Patch with more architectures Let's already land this version, that adds support for alpha, hppa, ia64, mips, mipsel and s390. I'll file separate bugs for s390 (it appears that while the test works, it fails when run by make) and hurd.
Attachment #443588 - Flags: approval2.0?
Attachment #443588 - Flags: approval2.0? → approval2.0+
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla2.0b8
Product: Core → Core Graveyard
Component: Layout: Misc Code → Layout
Product: Core Graveyard → Core
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: