Closed Bug 1332594 Opened 3 years ago Closed 3 years ago

Detect poison in AssemblerBuffer resulting from calling realloc

Categories

(Core :: JavaScript Engine, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla54
Tracking Status
firefox53 --- affected
firefox54 --- fixed

People

(Reporter: ehoogeveen, Assigned: ehoogeveen)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 6 obsolete files)

From bug 1124397, it looks like realloc itself is causing part of our buffer to be poisoned. Let's check and make sure.
This changes ProtectedReallocPolicy to check for poison and other shenanigans resulting from realloc, and also turns off the other protections of PageProtectingVector (aside from the buffer overflow assertion). I made it so that we fall back to normal realloc in case of OOM since these extra allocations can be pretty big.

It'd be nice to get this into 53 since the crash rate on Aurora is a lot higher (though I suppose having the current accidental workaround would be kind of nice for Aurora users!), but it doesn't matter that much as crash stats won't tell us exactly what's going wrong.
Attachment #8828763 - Flags: review?(jdemooij)
The previous patch was too slow. This removes the poison check after realloc (which didn't add much) and makes the check before it less extreme. We still compare the new buffer to the original. This patch also switches AssemblerBuffer back to using plain mozilla::Vector, but with the new alloc policy.

With this, performance more or less matches the status quo (only doing this for large buffers didn't affect performance much - the bigger the buffer, the larger the overhead).
Attachment #8828763 - Attachment is obsolete: true
Attachment #8828763 - Flags: review?(jdemooij)
Attachment #8828784 - Flags: review?(jdemooij)
Comment on attachment 8828784 [details] [diff] [review]
v2 - Check for poison and other corruption during realloc.

Review of attachment 8828784 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good. I wonder if it would be useful to store the buffer size etc somewhere on the stack, so we can get it from crash dumps, but we can always do that later when we know we're crashing here.

::: js/src/ds/PageProtectingVector.h
@@ +559,5 @@
> +
> +        // Check for the poison pattern every so often.
> +        const uint8_t* oldAddrBytes = reinterpret_cast<const uint8_t*>(oldAddr);
> +        for (size_t j, i = 0; i < bytes; i += 1024) {
> +            for (j = 0; j < 16 && oldAddrBytes[i + j] == PoisonPattern; ++j);

We need to ensure i + j stays in bounds right? Could change |j < 16| to |j < 16 && i + j < bytes|. Or change |i < bytes| in the outer loop to |i + 16 < bytes|. Or something.

@@ +579,5 @@
> +
> +        const uint8_t* newAddrBytes = reinterpret_cast<const uint8_t*>(newAddr);
> +        const uint8_t* tmpAddrBytes = reinterpret_cast<const uint8_t*>(tmpAddr);
> +        for (size_t i = 0; i < bytes; ++i) {
> +            if (MOZ_UNLIKELY(newAddrBytes[i] != tmpAddrBytes[i])) {

Maybe use PodEqual here:

if (!mozilla::PodEqual(tmpAddrBytes, newAddrBytes, bytes)) {
    if (oldAddr == newAddr)
    ...    
}

PodEqual will use memcmp for larger buffers so it might also be a bit faster (assuming the compiler doesn't auto-vectorize this loop).
Attachment #8828784 - Flags: review?(jdemooij) → review+
Thanks!

(In reply to Jan de Mooij [:jandem] from comment #3)
> Looks good. I wonder if it would be useful to store the buffer size etc
> somewhere on the stack, so we can get it from crash dumps, but we can always
> do that later when we know we're crashing here.

If this *is* corruption in realloc I'm not sure how much more that would tell us. But we can worry about that once we know for sure.

> We need to ensure i + j stays in bounds right? Could change |j < 16| to |j <
> 16 && i + j < bytes|. Or change |i < bytes| in the outer loop to |i + 16 <
> bytes|. Or something.

Oops, wrote that a little too quickly. Fixed.

> Maybe use PodEqual here:

Done. Looks a little cleaner too :)
Attachment #8828784 - Attachment is obsolete: true
Attachment #8828812 - Flags: review+
Hmm, getting some failures on try with the poison pattern scan. I think that's because we only set mozjemalloc's opt_junk on debug builds, so if we reserve a larger buffer before using most of the previous one there might be random poison and other stuff in the buffer we're using.

The poison detection was just a sanity check anyway, so let's just get rid of it.
Attachment #8828812 - Attachment is obsolete: true
Attachment #8828855 - Flags: review+
I don't know what's up with the TraceLogger assertion failure, but I can't imagine it's related to this patch. Otherwise green: https://treeherder.mozilla.org/#/jobs?repo=try&revision=6eae3a3f83072e0273e137e32b69357dea4e7886
Status: NEW → ASSIGNED
Keywords: checkin-needed
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/209d492c3ec5
Check for poison and other corruption during realloc. r=jandem
Keywords: checkin-needed
How strange! Okay, this requires a debug build and --tbpl to reproduce, specifically "--ion-eager --ion-offthread-compile=off". With that the assertion at [1] fails.

The text for id is "script self-hosted:3481:0"
The text for prev is "script c:\\Users\\Emanuel\\firefox\\mozilla-central\\js\\src\\jit-test\\tests\\tracelogger\\bug1302417.js:13:24"

I have no idea what this is testing or why it would fail with this patch.

Hannes, this patch adds extra allocations (using js_pod_malloc), but all we use them for is to compare two buffers. In the event of OOM we should fall back to the normal logic, so I wouldn't expect it to make a difference there, although it might temporarily increase the total amount of memory used. Any idea why it could cause this assertion to fail?

[1] https://dxr.mozilla.org/mozilla-central/rev/3cedab21a7e65e6a1c4c2294ecfb5502575a46e3/js/src/vm/TraceLogging.cpp#576
Flags: needinfo?(hv1989)
If we fail to create the events for ion, we need to forcefully disable it and fail the tracelogger. We cannot recover TL anymore. We bake nullptr in. As a result when executing that script we will not report the start and stop events. As a result our log cannot be correct. Lets be obvious about it and fail Tracelogger (with message if TLOPTIONS=Errors is set).
Flags: needinfo?(hv1989)
Attachment #8831109 - Flags: review?(bbouvier)
Comment on attachment 8831109 [details] [diff] [review]
Silently fail the logger when not-recoverable

Review of attachment 8831109 [details] [diff] [review]:
-----------------------------------------------------------------

r=me with the nit addressed, but please let's open a new bug (or reuse the refactoring one) and let's do not derail this one (since it makes tracking flags and assignee totally inconsistent).

::: js/src/jit/CodeGenerator.cpp
@@ +9857,5 @@
>              Assembler::PatchDataWithValueCheck(CodeLocationLabel(code, patchableTraceLoggers_[i]),
>                                                 ImmPtr(logger),
>                                                 ImmPtr(nullptr));
>          }
> +        TraceLoggerSilentFail(logger, "OOM when trying to patch ion log events.");

Shouldn't this be in the else branch?
Attachment #8831109 - Flags: review?(bbouvier) → review+
Updated patch.
Attachment #8831109 - Attachment is obsolete: true
Attachment #8831123 - Flags: review+
Thanks! I ran all the jit-tests locally with that patch applied and --tbpl, and all of them pass now (some tests fail with --slow, but the same test fails with and without the patches in this bug).

If you'd like to land this that would be great (I need to request something higher than level 1 access sometime..), but note that bbouvier asked for attachment 8831109 [details] [diff] [review] to go in its own bug.
Pushed by hv1989@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/740e1005d4b6
TraceLogger: Get --disable-trace-logging working again, r=bbouvier
Comment on attachment 8831123 [details] [diff] [review]
Silently fail the logger when not-recoverable

Landed in bug 1334426
Attachment #8831123 - Attachment is obsolete: true
(In reply to Pulsebot from comment #14)
> Pushed by hv1989@gmail.com:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/740e1005d4b6
> TraceLogger: Get --disable-trace-logging working again, r=bbouvier

Hmmm. Somehow I managed to add the wrong bug id on that patch. Very sorry!
Pushed by hv1989@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ecd98c460060
Check for poison and other corruption during realloc. r=jandem
I don't know if this is real or if we have to add a suppression. Is there a way to coax Valgrind into telling us how long it thinks the allocated region was? Getting a jemalloc callstack might be too much to ask since the realloc has already happened at this point.
Flags: needinfo?(emanuel.hoogeveen)
Keywords: leave-open
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #19)
> I don't know if this is real or if we have to add a suppression. Is there a
> way to coax Valgrind into telling us how long it thinks the allocated region
> was? Getting a jemalloc callstack might be too much to ask since the realloc
> has already happened at this point.

Maybe the problem is that not all of the Vector's memory has been initialized when we resize it (because length < capacity), so there will be some uninitialized bytes at the end...
Ah, good point. So the same problem as what I described in comment #5, except we don't actually care that the bytes are uninitialized (so long as they match before and after). Still, probably better to just initialize them than to just add a suppression and potentially miss other problems.
Sorry about the size of this patch, but this is mostly code removal. PageProtectingVector seemed the logical place to add this initialization, but it was bothering me how bloated PageProtectingVector was becoming, so I tore out the reentrance guard and the poison check. The actual code to add initialization is very small.
Attachment #8831711 - Flags: review?(jdemooij)
This patch didn't really change, so carrying forward r+. With part 1 the Valgrind issue seems to be gone, as expected: https://treeherder.mozilla.org/#/jobs?repo=try&revision=5ca86f3673c7978f332e6afbb1ece6dc3bcc0615
Attachment #8828855 - Attachment is obsolete: true
Attachment #8831712 - Flags: review+
Attachment #8831711 - Flags: review?(jdemooij) → review+
Wow, that was fast :) Thanks! Oh, I forgot to mention, I also fixed up some small style issues in AssemblerBuffer-x86.h that were bothering me. It still uses a nonstandard style, but at least it's consistent now.
By the way, locally this improves single threaded WASM baseline compilation performance on the test I've been using from ~710ms to ~650ms, where just part 1 improves it to ~630ms (so the check adds ~20ms).
Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/1064fae59e5d
Part 1: Simplify PageProtectingVector and make it initialize new buffers. r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/51901165b259
Part 2: Check AssemblerBuffer for corruption during realloc. r=jandem
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/1064fae59e5d
https://hg.mozilla.org/mozilla-central/rev/51901165b259
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla54
You need to log in before you can comment on or make changes to this bug.