Closed Bug 1273462 Opened 8 years ago Closed 8 years ago

Consider marking fully used pages of AssemblerBuffer as read-only to detect heap corruption

Categories

(Core :: JavaScript Engine: JIT, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla49
Tracking Status
firefox49 --- fixed

People

(Reporter: ehoogeveen, Assigned: ehoogeveen)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 4 obsolete files)

Since bug 1271165 is still waiting for review, I decided to try Jan's original suggestion from bug 1124397 and make as many pages of AssemblerBuffer non-writable as possible. The patches in this bug attempt to do that by interposing a new class, PageProtectingVector, which duplicates a small set of functions from mozilla::Vector.

A PageProtectingVector can only grow, and every time a full page of its elements are used, that page is marked as read-only. PageProtectingVector takes care of unprotecting and reprotecting its pages when the underlying vector needs to grow, and allows for a minimal set of pages to be briefly unprotected so they can be modified.
I split this out because I think it's a nice simplification regardless. I used a couple of templates to get around type casts, since in the end we need reinterpret_cast<>s anyway.

Using append instead of growByUninitialized + memcpy makes administration in the next patch simpler, and I think it's less of a footgun anyway.
Attachment #8753319 - Flags: review?(jdemooij)
The meat of this bug, as described in the first comment. This protects as aggressively as possible, only unprotecting the set of pages is absolutely needs to. I'm only asking for feedback because performance comparisons on Talos show that this consistently regresses Kraken on OSX by 3-4%, probably because it's unprotecting and reprotecting for every write that happens out of order. Oddly, Windows seems mostly unaffected, and even Linux (which uses the same mprotect calls as OSX) seems less affected.

I tried moving the unprotect-reprotect calls in patchCall, patchThunk, setNextJump and linkJump up to their callers, unprotecting all pages outside of for loops, but this didn't seem to help performance at all. Some of the callers are deeply embedded inside codegen, and I just don't know enough about how the JITs are structured to move the unprotect calls further up the call chain.

As a result, PageProtectingVector::disableProtection and PageProtectingVector::enableProtection are currently unused. These disable or re-enable protection for the entire buffer, they're just looking for a good place to do this and prevent the fine-grained unprotect-reprotect dance.
Attachment #8753323 - Flags: feedback?(jdemooij)
The actual performance comparison:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=cb6e6f2bb2e5&newProject=try&newRevision=c31d95b5cc01&framework=1&showOnlyImportant=0&showOnlyConfident=1

The tabpaint and tp5o regressions may be real as well, but the Kraken regression seems more consistent (though it was larger on Linux in an earlier comparison, oddly).
Fixed up PageProtectingVector::setContainingRegion to work with larger regions, and fixed up some outdated comments. Currently the patch only calls unprotectRegion and reprotectRegion on 4-byte regions, so this doesn't change behavior, but the logic was still wrong (and it's actually somewhat simpler now).
Attachment #8753323 - Attachment is obsolete: true
Attachment #8753323 - Flags: feedback?(jdemooij)
Attachment #8753364 - Flags: feedback?(jdemooij)
Since we discussed only enabling this for certain kinds of scripts on IRC, I figured I should confirm that this has negligible overhead when disabled. The Windows testing machines aren't playing ball, but Linux and OSX seem to confirm that we could land this infrastructure disabled by default without regressing anything:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=cb6e6f2bb2e5&newProject=try&newRevision=b2c1b85327ad&framework=1

(that ts_paint opt e10s regression on osx-10-10 *must* be spurious, it wasn't even that big with the mprotect calls *enabled*)
This version disables the protection by default so we can experiment with enabling it for certain kinds of compilations. Since it just adds the infrastructure, I'm putting it up for review this time (I also reordered things so it can land before bug 1271165).
Attachment #8753364 - Attachment is obsolete: true
Attachment #8753364 - Flags: feedback?(jdemooij)
Attachment #8753790 - Flags: review?(jdemooij)
Comment on attachment 8753319 [details] [diff] [review]
Part 1: Simplify and refactor AssemblerBuffer a bit to make it easier to replace mozilla::Vector.

Review of attachment 8753319 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/x86-shared/AssemblerBuffer-x86-shared.h
@@ +72,5 @@
> +    {
> +        template<typename T>
> +        void sizedAppendUnchecked(T value, size_t size)
> +        {
> +            m_buffer.infallibleAppend(reinterpret_cast<unsigned char*>(&value), size);

Can you make sizedAppendUnchecked MOZ_ALWAYS_INLINE? I'm a bit afraid some compilers may use a loop now (IIRC all our compilers optimized the memcpy calls to a single move instruction).

Maybe you can check what MSVC emits?
Attachment #8753319 - Flags: review?(jdemooij) → review+
Comment on attachment 8753790 [details] [diff] [review]
Part 2 v1.3: Add infrastructure to mark all fully used pages of AssemblerBuffer's vector as read-only (disabled by default).

Review of attachment 8753790 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for working on this and sorry for the delay. I'm a bit torn: it might help us track down this bug, but it also adds a bunch of overhead/complexity compared to a plain Vector.

Maybe we can use this temporarily on Nightly and after a few weeks we can change the AssemblerBuffer to use a plain Vector again?

::: js/src/ds/PageProtectingVector.h
@@ +32,5 @@
> +{
> +    size_t pageSize;
> +    size_t pageMask;
> +
> +    size_t offsetToPage;

Please add a brief description here. Unlike the other members it wasn't immediately obvious.

@@ +74,5 @@
> +            protectedBytes = 0;
> +        }
> +    }
> +
> +    MOZ_MUST_USE bool anyProtected(size_t first, size_t last) {

Nit: as this is just a bool query, MOZ_MUST_USE seems unnecessary/confusing.

@@ +78,5 @@
> +    MOZ_MUST_USE bool anyProtected(size_t first, size_t last) {
> +        return protectedBytes && last >= offsetToPage && first < offsetToPage + protectedBytes;
> +    }
> +
> +    void setContainingRegion(size_t first, size_t last, uintptr_t& addr, size_t& size) {

Convention is to use pointers for out params, because it's clearer at the call site.

@@ +120,5 @@
> +    };
> +
> +  public:
> +    explicit PageProtectingVector(AllocPolicy policy = AllocPolicy())
> +        : pageSize(gc::SystemPageSize())

Nit: these lines should be indented with 2 spaces, and comma usually goes at the end.

@@ +197,5 @@
> +            guard.emplace(this);
> +        return vector.reserve(size);
> +    }
> +
> +    template<typename U> void infallibleAppend(const U* values, size_t size) {

Nit: MOZ_ALWAYS_INLINE?

::: js/src/gc/Memory.h
@@ +43,5 @@
>  
>  void* TestMapAlignedPagesLastDitch(size_t size, size_t alignment);
>  
>  void ProtectPages(void* p, size_t size);
> +void ReadOnlyPages(void* p, size_t size);

Maybe use a verb, ProtectPagesReadOnly?
Attachment #8753790 - Flags: review?(jdemooij) → review+
(In reply to Jan de Mooij [:jandem] from comment #8)
> Thanks for working on this and sorry for the delay. I'm a bit torn: it might
> help us track down this bug, but it also adds a bunch of overhead/complexity
> compared to a plain Vector.
> 
> Maybe we can use this temporarily on Nightly and after a few weeks we can
> change the AssemblerBuffer to use a plain Vector again?

I'll address the review comments later, but to just quickly reply: I think the actual overhead added by this is low (I'll check the code that gets generated, as with part 1), but in general I agree. The only things preventing this from truly being a drop-in replacement (letting you enable/disable it with a one-liner) are the enableBufferProtection/disableBufferProtection/unprotectDataRegion/reprotectDataRegion definitions that all the classes need.

Do you think we should leave those in (but make the underlying ones in AssemblerBuffer do nothing), or take them out again too? It's unfortunate that we can't just add them where they're actually needed, but I can't see a way around that. The only other option I see would be to instead use a single function to expose the underlying vector on each level (or change the data() functions do to that), and then operate on those. Do you think that would be preferable? We could leave that in without lying about what it does.
(In reply to Jan de Mooij [:jandem] from comment #7)
> Can you make sizedAppendUnchecked MOZ_ALWAYS_INLINE? I'm a bit afraid some
> compilers may use a loop now (IIRC all our compilers optimized the memcpy
> calls to a single move instruction).
> 
> Maybe you can check what MSVC emits?

It turns out everything gets inlined, so I checked 4 callsites. Before the changes, the code MSVC emits is:

In movb_im(int32_t imm, int32_t offset, RegisterID base):
    mov rax,qword ptr [rbx+10h]
    mov rcx,qword ptr [rbx+8]
    mov byte ptr [rcx+rax],bpl
    inc qword ptr [rbx+10h]

In movw_im(int32_t imm, int32_t offset, RegisterID base):
    add qword ptr [rbx+8],2
    mov rcx,qword ptr [rbx+8]
    mov rax,qword ptr [rbx]
    mov word ptr [rcx+rax-2],bp

In movl_i32m(int32_t imm, int32_t offset, RegisterID base):
    add qword ptr [rbx+10h],4
    mov rcx,qword ptr [rbx+10h]
    mov rax,qword ptr [rbx+8]
    mov dword ptr [rcx+rax-4],ebp

In movq_i64r(int64_t imm, RegisterID dst):
    add qword ptr [rbx+10h],8
    mov rcx,qword ptr [rbx+10h]
    mov rax,qword ptr [rbx+8]
    mov qword ptr [rax+rcx-8],rsi

After part 1, everything is still inlined, and movb_im, movl_i32m and movq_i64r use the same amount of instructions (albeit reordered slightly), but movw_im uses:

    mov   rcx,qword ptr [rbx+8]
    movzx eax,bp
    add   rcx,qword ptr [rbx]
    mov   dword ptr [rsp+48h],ebp
    mov   word ptr [rcx],ax
    add   qword ptr [rbx+8],2

I'm not sure why it adds the movzx; it doesn't do anything special for the 8-bit case. I tried putting the size parameter into the template, but that made no difference (I do think it looks a bit nicer though).

After part 2, things no longer get inlined. Forcing it to inline by sprinkling MOZ_ALWAYS_INLINE about, it generates the following code for movl_i32m:

    mov rax,qword ptr [r14+28h]  
    add rax,qword ptr [r14+30h]  
    mov dword ptr [rax],ebp  
    add qword ptr [r14+30h],4  

    add qword ptr [r14+20h],4   # Bump PageProtectingVector::unprotectedBytes
    cmp byte ptr [r14+140h],0   # See if protection is enabled
    mov rax,qword ptr [r14+20h] # Load PageProtectingVector::unprotectedBytes
    je  07FF654C3FDD3h          # Move on if protection is disabled
    cmp rax,qword ptr [r14]     # Compare against PageProtectingVector::pageSize
    jl  07FF654C3FDD3h          # Move on if unprotectedBytes < pageSize

    mov rbx,qword ptr [r14+8]   # Load PageProtectingVector::pageMask
    mov rcx,qword ptr [r14+10h] # Add PageProtectingVector::offsetToPage
    not rbx                     # Invert the page mask
    add rcx,qword ptr [r14+18h] # Add PageProtectingVector::protectedBytes
    and rbx,rax                 # Set rbx to the number of bytes to protect
    add rcx,qword ptr [r14+28h] # Add PageProtectingVector::vector.begin()
    mov rdx,rbx
    call js::gc::MakePagesReadOnly
    sub qword ptr [r14+20h],rbx # Adjust PageProtectingVector::unprotectedBytes
    add qword ptr [r14+18h],rbx # Adjust PageProtectingVector::protectedBytes

Maybe the last block would be nicer out-of-line, but that seems to require a separate function marked as MOZ_NEVER_INLINE. The generated code looks good to me though.
Carrying forward r+.

(In reply to Jan de Mooij [:jandem] from comment #7)
> Can you make sizedAppendUnchecked MOZ_ALWAYS_INLINE?

Done, more details in comment #10. I assume we can overlook that single extra instruction in the 16-bit case ;)
Attachment #8753319 - Attachment is obsolete: true
Attachment #8757487 - Flags: review+
Thanks for the reviews! Carrying forward r+.

(In reply to Jan de Mooij [:jandem] from comment #8)
> > +    size_t offsetToPage;
> 
> Please add a brief description here. Unlike the other members it wasn't
> immediately obvious.

Done. I also added an even longer comment for unprotectedBytes, explaining why it's signed. I think that particular bit of cleverness was warranted, since it simplified things a fair bit, but it deserves an explanation.

> > +    MOZ_MUST_USE bool anyProtected(size_t first, size_t last) {
> 
> Nit: as this is just a bool query, MOZ_MUST_USE seems unnecessary/confusing.

OK, makes sense. I wasn't sure if it was going to be policy to use MOZ_MUST_USE everywhere, but this is just a private method anyway.

> > +    void setContainingRegion(size_t first, size_t last, uintptr_t& addr, size_t& size) {
> 
> Convention is to use pointers for out params, because it's clearer at the
> call site.

OK, done. Having to use *ptr = value for outparams always looks a bit.. C-y to me, but I suppose it's better to be explicit.

> > +    explicit PageProtectingVector(AllocPolicy policy = AllocPolicy())
> > +        : pageSize(gc::SystemPageSize())
> 
> Nit: these lines should be indented with 2 spaces, and comma usually goes at
> the end.

Oh, hmm, place I based this off originally must have used a different style. Fixed.

> > +    template<typename U> void infallibleAppend(const U* values, size_t size) {
> 
> Nit: MOZ_ALWAYS_INLINE?

Done; needed too, at least for MSVC. I had to sprinkle MOZ_ALWAYS_INLINE around a few other places as well. I doubt the increase in code size will be a concern (the next best thing requires MOZ_NEVER_INLINE on an out-of-line definition and only saves 9 instructions - didn't seem worth it).

> > +void ReadOnlyPages(void* p, size_t size);
> 
> Maybe use a verb, ProtectPagesReadOnly?

Ended up going with MakePagesReadOnly. I was trying for something that fit in with the existing function names, but it didn't really work out :)
Attachment #8753790 - Attachment is obsolete: true
Attachment #8757491 - Flags: review+
Try run with a one-liner on top of part 2 to actually enable the protection everywhere:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=7ce356935214

This will actually land disabled with the protection disabled (and shouldn't add much overhead), but seeing it work on try gives me confidence that I didn't miss anything.
Status: NEW → ASSIGNED
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/dbd9ceba3964
https://hg.mozilla.org/mozilla-central/rev/9a76ad279eaa
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla49
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #2)
> I tried moving the unprotect-reprotect calls in patchCall, patchThunk,
> setNextJump and linkJump up to their callers, unprotecting all pages outside
> of for loops, but this didn't seem to help performance at all. Some of the
> callers are deeply embedded inside codegen, and I just don't know enough
> about how the JITs are structured to move the unprotect calls further up the
> call chain.

Incidentally, I just hit these two unprotect-reprotect calls; they happen inside a loop with millions of elements so they are doubling overall wasm compilation time for Unity-sized apps.  I understand the code enough that I should be able to hoist them outside the loop but next time, feel free to ping me :)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: