Closed Bug 1721020 Opened 3 years ago Closed 10 months ago

Rewrite stack probes to update stack pointer first

Categories

(Core :: JavaScript Engine, task, P3)

task

Tracking

()

RESOLVED DUPLICATE of bug 1839669

People

(Reporter: sourc7, Assigned: jseward)

References

(Blocks 1 open bug)

Details

(Whiteboard: [reporter-external] [client-bounty-form] [verif?])

Attachments

(3 files)

Attached file testcase.html

After initialize a lot of new variable to null, then multiple times call new String to trigger GC? (probably), then after call array[x] = true the Valgrind show Invalid write of size 4 from JS stack.

Interestingly I got different Valgrind JS stack output when changing the array[x] = value value to another JS code (e.g. string).

When I run Firefox valgrind build from fuzzfetch, the Valgrind show invalid write of size 4 stack address with only a question mark ??? which doesn't provide clues to the root cause. However when compiling Firefox valgrind build with system clang, I got insightful stack output showing that Invalid write of size 4 is from JS stack.

When finding the regression I found Valgrind report Invalid write of size 4 from JS stack after checking out to commit Unify stack touch logic across all platforms.

Steps to reproduce:

  1. Install Valgrind on Linux
  2. Open Terminal
  3. Run python3 -m pip install fuzzfetch to install Mozilla Fuzzfetch
  4. Run fuzzfetch --valgrind to download Firefox Valgrind build
  5. Open Firefox Valgrind directory
  6. Run Valgrind with command valgrind --trace-children=yes --smc-check=all-non-file --vex-iropt-register-updates=allregs-at-mem-access --show-mismatched-frees=no --read-inline-info=yes firefox
  7. Visit attached testcase.html
  8. After page load is finish, Valgrind will show output "Invalid write of size 4"

Valgrind JS stack:

==3612== Thread 1 Web Content:
==3612== Invalid write of size 4
==3612==    at 0x147C0777148D: ???
==3612==    by 0x4F85D8F: ???
==3612==    by 0xFCEFFFF: slotForIndex (dist/include/mozilla/HashTable.h:1734)
==3612==    by 0xFCEFFFF: lookup<mozilla::detail::HashTable<mozilla::HashMapEntry<js::ScriptSourceChunk, mozilla::UniquePtr<void, JS::FreePolicy> >, mozilla::HashMap<js::ScriptSourceChunk, mozilla::UniquePtr<void, JS::FreePolicy>, js::ScriptSourceChunkHasher, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::ForNonAdd> (dist/include/mozilla/HashTable.h:1750)
==3612==    by 0xFCEFFFF: readonlyThreadsafeLookup (dist/include/mozilla/HashTable.h:2074)
==3612==    by 0xFCEFFFF: lookup (dist/include/mozilla/HashTable.h:2079)
==3612==    by 0xFCEFFFF: lookup (dist/include/mozilla/HashTable.h:236)
==3612==    by 0xFCEFFFF: mozilla::Utf8Unit const* js::UncompressedSourceCache::lookup<mozilla::Utf8Unit>(js::ScriptSourceChunk const&, js::UncompressedSourceCache::AutoHoldEntry&) (js/src/vm/JSScript.cpp:1881)
==3612==    by 0x1FFEFFBE3F: ???
==3612==    by 0x1FFEFFBDEF: ???
==3612==    by 0x147C0777154E: ???
==3612==    by 0x1042: ???
==3612==    by 0x2857C1E51317: ???
==3612==  Address 0x1ffeffb540 is on thread 1's stack
==3612==  2048 bytes below stack pointer
Flags: sec-bounty?
Attached file valgrind.full.log
Group: firefox-core-security → javascript-core-security
Component: Security → JavaScript Engine
Product: Firefox → Core

Matthew, as this might be a regression from an old patch of yours (bug 1488763), maybe you could take a look? Thanks.

Flags: needinfo?(mgaudet)

Writes at exactly 2048 bytes are generally stack probes of some kind (that is, deliberately compiled in), I have observed in the past. That said, I've run Fx (an m-c build) on V quite a lot recently (and, also, Mochitests) and I don't recall seeing any such reports.

So, definitely we do intentional stack probes at 2048 byte intervals on all platforms.

Is the reason we see this on this test case just unluckiness triggering heuristics for valgrind? Or is this a legitimate issue because we've missed some precondition for touching the stack that we ought to have.

Could this be more of a Valgrind configuration issue than anything else?

Flags: needinfo?(mgaudet) → needinfo?(jseward)

We currently do stack probes prior to moving the stack pointer:

    masm.touchFrameValues(numStackValues, scratch, framePtr);
    masm.mov(rsp, framePtr);

    // Reserve space for locals and stack values.
    Register valuesSize = regs.takeAny();
    masm.mov(numStackValues, valuesSize);
    masm.shll(Imm32(3), valuesSize);
    masm.subPtr(valuesSize, rsp);

I think the important question here is whether there's anything wrong with that decision. My expectation is that a) it's fine, b) this is a false positive in Valgrind, and c) we could consider refactoring this code to avoid the false positive, but I'm not 100% confident.

For reference, it looks like Rust's stack probing code modifies the stack pointer in the loop, then resets it afterwards.

I'll investigate. I have run both the JS shell and the browser quite a bit on V in
the past month or so, and I'm mystified as to why I haven't seen any such complaints.

Flags: needinfo?(jseward)

assigning to Julian for the investigation in comment 6

Assignee: nobody → jseward
Severity: -- → N/A
Priority: -- → P1

I can reproduce this, using the supplied test case, on m-c
586586:3881437c0949.

In the attachment is a disassembly of the 256 bytes surrounding the offending
instruction. The instruction is part of the following loop

   // loop setup
   0x000013dd6a2a346e:  mov    %rsp,%rbp
   0x000013dd6a2a3471:  mov    %rdx,%rax
   0x000013dd6a2a3474:  shl    $0x3,%rax
   0x000013dd6a2a3478:  sub    %rax,%rbp
   0x000013dd6a2a347b:  mov    %rsp,%rax
   0x000013dd6a2a347e:  sub    $0x800,%rax

   // loop body
   0x000013dd6a2a3484:  cmp    %rbp,%rax
   0x000013dd6a2a3487:  jb     0x13dd6a2a349b
=> 0x000013dd6a2a348d:  movl   $0x0,(%rax)  <--- the offending insn
   0x000013dd6a2a3493:  sub    $0x800,%rax
   0x000013dd6a2a3499:  jmp    0x13dd6a2a3484

   // post-loop
   0x000013dd6a2a349b:  mov    %rsp,%rbp

which, as Iain points out, looks a lot like what
MacroAssembler::touchFrameValues() in jit/MacroAssembler.cpp would generate.

I don't think this is a bug in Gecko -- it looks quite deliberate to me. I
also wouldn't call it a bug in Valgrind, since, per the ELF x86_64 ABI, any
access at or below -128(%rsp) is illegal, so V is correct to report it.

On that last ABI-related point, it might be cleaner to use the use the Rust
scheme as linked to in comment 5. It strikes me that -- even if we think we
know what we're doing -- accessing memory in a place that, at least, the ELF
x86_64 ABI says is illegal, is pretty dubious, because we can't guarantee that
the Linux kernel isn't relying on the ABI-specified requirement in some way we
don't understand.

I'd say the Rust way -- move %rsp down incrementally, so that all of the probe
accesses are at 0(%rsp) or above, hence, per the ABI, legal, might be safer.

In any case, I would vote for closing as INVALID or at least significant
de-prioritisation of this bug.

(In reply to Julian Seward [:jseward] (PTO, back on 3 Aug 2021) from comment #8)

I don't think this is a bug in Gecko -- it looks quite deliberate to me. I
also wouldn't call it a bug in Valgrind, since, per the ELF x86_64 ABI, any
access at or below -128(%rsp) is illegal, so V is correct to report it.

Ah, this is the piece I was missing! Thanks.

I think this is a real bug, and we should fix it (probably by matching Rust's approach), but it's not P1 and I don't believe it's security sensitive. I'm going to open it up and adjust the priority.

Group: javascript-core-security
Priority: P1 → P3
Summary: Valgrind: Invalid write of size 4 from JS stack → Rewrite stack probes to update stack pointer first

we should fix it (probably by matching Rust's approach)

I should perhaps add: x86_64-linux is relatively speaking a minority platform,
so a perhaps stronger reason to fix it is because of arm64-android and
arm32-android: in both cases I believe the relevant ELF ABI does not allow
accesses at any address below the stack pointer.

Is this a "perf" key word bug?

No, this shouldn't have any measurable impact on performance.

Flags: sec-bounty? → sec-bounty-
Status: NEW → RESOLVED
Closed: 10 months ago
Duplicate of bug: 1839669
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: