Closed Bug 1535194 Opened 5 years ago Closed 5 years ago

Silent overflow in diffB during far jump setup leads to branch-to-wild-location

Categories

(Core :: JavaScript: WebAssembly, defect, P2)

ARM
All
defect

Tracking

()

RESOLVED FIXED
mozilla68
Tracking Status
firefox-esr60 67+ fixed
firefox65 --- wontfix
firefox66 --- wontfix
firefox67 + fixed
firefox68 + fixed

People

(Reporter: gkw, Assigned: lth)

References

(Blocks 1 open bug)

Details

(5 keywords, Whiteboard: [jsbugmon:][adv-main67+][adv-esr60.7+])

Attachments

(2 files, 1 obsolete file)

Attached file 504.zip (obsolete) —

The following testcase crashes on mozilla-central revision 85d3d2603b8f (build with --enable-debug --enable-more-deterministic --enable-simulator=arm and other 32-bit build parameters, run with --fuzzing-safe --no-threads --no-baseline --no-ion --wasm-compiler=baseline):

See attachment.

Backtrace:

#0 js::jit::Simulator::instructionDecode (this=0xf6b3d000, instr=0x3140183c) at js/src/jit/arm/Simulator-arm.cpp:4670
#1 0x57f12c6c in js::jit::Simulator::execute<false> (this=<optimized out>) at js/src/jit/arm/Simulator-arm.cpp:4750
#2 js::jit::Simulator::callInternal (this=0xf6b3d000, entry=0x346afe28) at js/src/jit/arm/Simulator-arm.cpp:4828
#3 0x57f131ff in js::jit::Simulator::call (this=0xf6b3d000, entry=0x346afe28, argument_count=2) at js/src/jit/arm/Simulator-arm.cpp:4915
#4 0x5829b58d in js::wasm::Instance::callExport (this=0xf6b73b00, cx=0xf6b1c800, funcIndex=46326, args=...) at js/src/wasm/WasmInstance.cpp:1579
#5 0x582b2635 in WasmCall (cx=<optimized out>, argc=0, vp=0xf6587068) at js/src/wasm/WasmJS.cpp:1433
/snip

For detailed crash information, see attachment.

Setting s-s as a start as I am not sure if this is simulator-specific.

autobisectjs shows this is probably related to the following changeset:

The first bad revision is:
changeset: https://hg.mozilla.org/mozilla-central/rev/c474d2c881f5
user: Lars T Hansen
date: Thu Jan 31 08:50:38 2019 +0100
summary: Bug 1524178 - Enable lock-free 8-byte atomics for ARM+ARM64 simulators. r=jseward

Lars, is bug 1524178 a likely regressor?

Blocks: 1524178
Flags: needinfo?(lhansen)
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
Flags: needinfo?(lhansen)

I can repro the crash.

Please be more specific re command line arguments when specifying the STR. The actual command line is this: "--fuzzing-safe --no-threads --no-baseline --no-ion --wasm-compiler=baseline w504-out.wrapper w504-out.wasm", specifically the wasm file must be present as a command line argument.

Looks like an unpatched call:

  ...
  0x4b2b6830  e59d9048       ldr r9, [sp, #+72]
  0x4b2b6834  eb800000       bl -33554424 -> 0x492b683c

Note the bogus relative address in the bl. The target is not in memory and we crash when trying to fetch at the target address.

Definitely baseline code, both because Ion has been disabled and because there's a baseline prologue a hundred instructions earlier.

I think what's going on here is a bug in the way we handle far calls.

We're in ModuleGenerator.cpp in linkCallSites(). We are looking at a CallSiteDesc::Func where the callee (target) has already been compiled, and is more than 20MB away from the caller, and the callee is not yet in the map of callFarJumps.

So we create a far jump island with farJumpWithPatch() and this code is correct, and then finally we patch the address of this island into the original call.

The problem is that there is so much code in this module that the island is more than 32MB away from the caller that is going to be patched, so in MacroAssembler::patchCall() in MacroAssembler-arm.cpp we overflow the difference and end up with a zero value for our bl instruction.

The difference computation is in diffB() in IonAssemblerBuffer.h, and it strikes me as a footgun that it just returns nothing if the computation overflows, but I don't know if there are other uses that require this.

(This was not regressed by bug 1524178, I think; at most that patch would have moved code around somewhat so that this problem is surfaced.)

Now it's possible that this situation shouldn't happen because we should be sure to insert the far jump islands earlier, but we only do the necessary checking per-task (in ModuleGenerator::finishTask() in WasmGenerator.cpp), and with --no-threads there is only one task so we only do it at the end of compilation, which is too late. And even doing it per-task in a concurrent setting, frankly, it's possible that certain kinds of code (exceptionally large functions, say, or just large functions on architectures with more limited branch distances) will run into this bug. Need to investigate that further.

Of course threads are always enabled in release settings and the bug is probably ARM-only (among Tier 1 platforms anyway), so it doesn't strike me as a high severity problem, but we should fix it.

OS: Linux → All
Priority: -- → P2
Summary: Crash [@ js::jit::Simulator::instructionDecode] → Silent overflow in diffB during far jump setup leads to branch-to-wild-location

Still running some tests, but so far there seems to be no ill effects of replacing "return BOffImm()" with "MOZ_CRASH()", and that would at least close the security hole.

The overflow check was introduced in bug 1300550, where callers of diffB() would check whether the returned offset was invalid and then do something sensible on a case-by-case basis. In our case, patchCall() does not do anything sensible, so maybe that's the problem. patchCall() is not the only code to use diffB() this way, as_b() also does.

We were missing error checks at two points. In one case an error
return is meaningful; in another case it is not, as the problem should
have been guarded against at a higher level by emitting far jump
islands soon enough during pasteup of compiled code.

So this shouldn't happen because we check whether we need to insert far jump islands before we merge in a new task's resulting code:
https://searchfox.org/mozilla-central/source/js/src/wasm/WasmGenerator.cpp#769-777

I think what must be happening is that task->output.bytes.length() is itself bigger than the jump range, so even this check is insufficient.

Now ultimately we have to fail if a single function generates more code than the jump range and I assume we have a max-size check against that somewhere that makes this fail (otherwise plain old intra-function branches would start having range failures).

One thing I noticed is that in our batching logic:
https://searchfox.org/mozilla-central/source/js/src/wasm/WasmGenerator.cpp#853-876
we are unconditionally adding a new function to a task's inputs before checking whether the task is too big. So my theory of what's happening here is that we have a task that already has a bunch of bytecode and then we add on a gigantor function that itself doesn't go over the limit (otherwise we would've OOMed) but when appended to task->output.bytes.length(), we oom.

Could you check whether that is indeed the case?

If so, I think the right fix would be:

  • check batch size before appending the new function (in compileFuncDef)
  • because bytecode size isn't a hard bound on machine code size, add an OOM check if task->output.bytes.length() is too big (judged by InRange(0, task->output.bytes.length()))

I will go after the batch size bug too, but I am going to try to land this small patch since it guards decisively against the problem.

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

Security Approval Request

  • How easily could an exploit be constructed based on the patch?: It takes a leap of imagination to go from the patch to the problem. And then the problem is going to be hard to exploit because it's hard to control the layout of the machine code; the machine code pool is somewhat chaotic and compilation is parallel on multicore (== all) systems. Any reliable exploit will probably be system-class dependent at best (eg number and speed of CPUs and concurrent workload).
  • Do comments in the patch, the check-in comment, or tests included in the patch paint a bulls-eye on the security problem?: No
  • Which older supported branches are affected by this flaw?: ESR 60
  • If not all supported branches, which bug introduced the flaw?: None
  • Do you have backports for the affected branches?: The patch applies directly to ESR60
  • If not, how different, hard to create, and risky will they be?:
  • How likely is this patch to cause regressions; how much testing does it need?: It is completely unlikely to cause regressions and should need no further testing.

Should also uplift to beta but not critical to do anything about release at this point since we're close to merge day, IMO.

Attachment #9050992 - Flags: sec-approval?

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

This will still make it to Beta67 without any extra effort if it lands on m-c today. Please just go ahead and land it. Bonus points for including a test :)

Flags: needinfo?(lhansen)
Attachment #9050992 - Flags: sec-approval?

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

Oh sorry, I misread the deps of this bug and thought this was 67-only. Nevermind me then about not needing sec-approval and including tests!

Flags: needinfo?(lhansen)
Attachment #9050992 - Flags: sec-approval?
Blocks: 1536036
Blocks: 1536039

(In reply to Luke Wagner [:luke] from comment #8)

Could you check whether that is indeed the case?

Filed bug 1536039 and am tracking concerns of this nature generally under bug 1536036.

Blocks: 1536105

(In reply to Luke Wagner [:luke] from comment #8)

If so, I think the right fix would be:

  • check batch size before appending the new function (in compileFuncDef)
  • because bytecode size isn't a hard bound on machine code size, add an OOM check if task->output.bytes.length() is too big (judged by InRange(0, task->output.bytes.length()))

We're using that InRange check a couple of places (in as_b and as_bl in the ARM assembler), but it's a crude tool, and I think that if we're going to enforce a buffer size limit we should put this limit into the buffer code so that we OOM when we try to make the buffer larger than we can handle.

Even more so, using BOffImm::InRange(current_length) as the assembler does is a more lenient test than InRange(0, current_length) because the former uses a 32MB limit, which is what the branches can handle, but the latter uses a 20MB limit, more appropriate for the far jump islands. But really I think we should cut the buffer for individual compilations off at 32MB (for ARM) and instead investigate whether we can improve the batching logic to allow extremely large programs to work so long as no one function exceeds the buffer size. See comments on bug 1535482.

(Still haven't performed the investigation you asked about; coming.)

(In reply to Lars T Hansen [:lth] from comment #14)

(Still haven't performed the investigation you asked about; coming.)

Should sec-approval wait for this investigation?

(In reply to Al Billings [:abillings] from comment #15)

(In reply to Lars T Hansen [:lth] from comment #14)

(Still haven't performed the investigation you asked about; coming.)

Should sec-approval wait for this investigation?

No, that's not necessary or desirable, this patch is sufficient on its own to prevent any security problem, the rest of the work is about resource management and is ongoing, see bug 1535482 if you're intensely curious.

In fact this bug is now blocking a bunch of other bugs, so the sooner we can get it landed the better.

Blocks: 1535482

Al, any hope of having a resolution here soonish? This bug is now blocking other work.

Flags: needinfo?(abillings)

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

sec-approval+ for trunk.

Flags: needinfo?(abillings)
Attachment #9050992 - Flags: sec-approval? → sec-approval+

Ryan, given that the sec-approval request already stated that this wants an uplift to esr60, do I need to request that specially?

Flags: needinfo?(ryanvm)

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: None
  • User impact if declined: Risk of security exploit on 32-bit ARM - it's conceivable that an attacker will be able to construct a program that leaves a branch instruction target uninitialized in such a way that the program can be made to jump to an address it should not jump to. It's conceivable, though unlikely, that the attacker can use this for an exploit.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very local patch that clearly inserts two missing error checks and propagate errors, along other paths that already have error propagation.
  • String changes made/needed:
Attachment #9050992 - Flags: approval-mozilla-beta?

(In reply to Lars T Hansen [:lth] from comment #20)

Ryan, given that the sec-approval request already stated that this wants an
uplift to esr60, do I need to request that specially?

Yes, sec-approval is independent from uplift approval.

Flags: needinfo?(ryanvm)

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration:
  • User impact if declined: Risk of security exploit on 32-bit ARM - it's conceivable that an attacker will be able to construct a program that leaves a branch instruction target uninitialized in such a way that the program can be made to jump to an address it should not jump to. It's conceivable, though unlikely, that the attacker can use this for an exploit.
  • Fix Landed on Version: 68
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Very local patch that clearly inserts two missing error checks and propagate errors, along other paths that already have error propagation.
  • String or UUID changes made by this patch:
Attachment #9050992 - Flags: approval-mozilla-esr60?
Group: javascript-core-security → core-security-release
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

Uplift approved for 67 beta 9, thanks.

Attachment #9050992 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Flags: qe-verify-

Comment on attachment 9050992 [details]
Bug 1535194 - Always check error return from BufferOffset::diffB. r?luke

Fix for sec-critical issue, OK for uplift to ESR 60.7.

Attachment #9050992 - Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Whiteboard: [jsbugmon:] → [jsbugmon:][adv-main67+][adv-esr60.7+]
Attached file 504.tar.xz

Compile with --enable-debug --enable-more-deterministic --enable-simulator=arm and other 32-bit configure flags, run with --fuzzing-safe --no-threads --no-baseline --no-ion --wasm-compiler=baseline w504-out.wrapper w504-out.wasm and on m-c rev 9bad25ccc8d0 , here's a further reduced testcase (for future reference) using wasm-reduce.

Attachment #9050862 - Attachment is obsolete: true
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: