Closed Bug 1721020 Opened 3 years ago Closed 10 months ago

Rewrite stack probes to update stack pointer first

Tracking

()

Status:

RESOLVED DUPLICATE of bug 1839669

People

(Reporter: sourc7, Assigned: jseward)

References

(Blocks 1 open bug)

Details

(Whiteboard: [reporter-external] [client-bounty-form] [verif?])

Attachments

(3 files)

testcase.html 3 years ago Irvan Kurniawan (:sourc7) 4.29 KB, text/html		Details
valgrind.full.log 3 years ago Irvan Kurniawan (:sourc7) 45.24 KB, text/x-log		Details
Disassembly per comment 8 3 years ago Julian Seward [:jseward] 3.68 KB, text/plain		Details

Irvan Kurniawan (:sourc7)

Reporter

Description

•

3 years ago

Attached file testcase.html — Details

After initialize a lot of new variable to null, then multiple times call new String to trigger GC? (probably), then after call array[x] = true the Valgrind show Invalid write of size 4 from JS stack.

Interestingly I got different Valgrind JS stack output when changing the array[x] = value value to another JS code (e.g. string).

When I run Firefox valgrind build from fuzzfetch, the Valgrind show invalid write of size 4 stack address with only a question mark ??? which doesn't provide clues to the root cause. However when compiling Firefox valgrind build with system clang, I got insightful stack output showing that Invalid write of size 4 is from JS stack.

When finding the regression I found Valgrind report Invalid write of size 4 from JS stack after checking out to commit Unify stack touch logic across all platforms.

Steps to reproduce:

Install Valgrind on Linux
Open Terminal
Run python3 -m pip install fuzzfetch to install Mozilla Fuzzfetch
Run fuzzfetch --valgrind to download Firefox Valgrind build
Open Firefox Valgrind directory
Run Valgrind with command valgrind --trace-children=yes --smc-check=all-non-file --vex-iropt-register-updates=allregs-at-mem-access --show-mismatched-frees=no --read-inline-info=yes firefox
Visit attached testcase.html
After page load is finish, Valgrind will show output "Invalid write of size 4"

Valgrind JS stack:

==3612== Thread 1 Web Content:
==3612== Invalid write of size 4
==3612==    at 0x147C0777148D: ???
==3612==    by 0x4F85D8F: ???
==3612==    by 0xFCEFFFF: slotForIndex (dist/include/mozilla/HashTable.h:1734)
==3612==    by 0xFCEFFFF: lookup<mozilla::detail::HashTable<mozilla::HashMapEntry<js::ScriptSourceChunk, mozilla::UniquePtr<void, JS::FreePolicy> >, mozilla::HashMap<js::ScriptSourceChunk, mozilla::UniquePtr<void, JS::FreePolicy>, js::ScriptSourceChunkHasher, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::ForNonAdd> (dist/include/mozilla/HashTable.h:1750)
==3612==    by 0xFCEFFFF: readonlyThreadsafeLookup (dist/include/mozilla/HashTable.h:2074)
==3612==    by 0xFCEFFFF: lookup (dist/include/mozilla/HashTable.h:2079)
==3612==    by 0xFCEFFFF: lookup (dist/include/mozilla/HashTable.h:236)
==3612==    by 0xFCEFFFF: mozilla::Utf8Unit const* js::UncompressedSourceCache::lookup<mozilla::Utf8Unit>(js::ScriptSourceChunk const&, js::UncompressedSourceCache::AutoHoldEntry&) (js/src/vm/JSScript.cpp:1881)
==3612==    by 0x1FFEFFBE3F: ???
==3612==    by 0x1FFEFFBDEF: ???
==3612==    by 0x147C0777154E: ???
==3612==    by 0x1042: ???
==3612==    by 0x2857C1E51317: ???
==3612==  Address 0x1ffeffb540 is on thread 1's stack
==3612==  2048 bytes below stack pointer

Flags: sec-bounty?

Irvan Kurniawan (:sourc7)

Reporter

Comment 1

•

3 years ago

Attached file valgrind.full.log — Details

:Gijs (he/him)

Updated

•

3 years ago

Group: firefox-core-security → javascript-core-security

Component: Security → JavaScript Engine

Product: Firefox → Core

Andrew McCreight [:mccr8]

Comment 2

•

3 years ago

Matthew, as this might be a regression from an old patch of yours (bug 1488763), maybe you could take a look? Thanks.

Flags: needinfo?(mgaudet)

Julian Seward [:jseward]

Assignee

Comment 3

•

3 years ago

Writes at exactly 2048 bytes are generally stack probes of some kind (that is, deliberately compiled in), I have observed in the past. That said, I've run Fx (an m-c build) on V quite a lot recently (and, also, Mochitests) and I don't recall seeing any such reports.

Matthew Gaudet (he/him) [:mgaudet]

Comment 4

•

3 years ago

So, definitely we do intentional stack probes at 2048 byte intervals on all platforms.

Is the reason we see this on this test case just unluckiness triggering heuristics for valgrind? Or is this a legitimate issue because we've missed some precondition for touching the stack that we ought to have.

Could this be more of a Valgrind configuration issue than anything else?

Flags: needinfo?(mgaudet) → needinfo?(jseward)

Iain Ireland [:iain]

Comment 5

•

3 years ago

We currently do stack probes prior to moving the stack pointer:

    masm.touchFrameValues(numStackValues, scratch, framePtr);
    masm.mov(rsp, framePtr);

    // Reserve space for locals and stack values.
    Register valuesSize = regs.takeAny();
    masm.mov(numStackValues, valuesSize);
    masm.shll(Imm32(3), valuesSize);
    masm.subPtr(valuesSize, rsp);

I think the important question here is whether there's anything wrong with that decision. My expectation is that a) it's fine, b) this is a false positive in Valgrind, and c) we could consider refactoring this code to avoid the false positive, but I'm not 100% confident.

For reference, it looks like Rust's stack probing code modifies the stack pointer in the loop, then resets it afterwards.

Julian Seward [:jseward]

Assignee

Comment 6

•

3 years ago

I'll investigate. I have run both the JS shell and the browser quite a bit on V in
the past month or so, and I'm mystified as to why I haven't seen any such complaints.

Flags: needinfo?(jseward)

Daniel Veditz [:dveditz]

Comment 7

•

3 years ago

assigning to Julian for the investigation in comment 6

Assignee: nobody → jseward

Steven DeTar [:sdetar]

Updated

•

3 years ago

Severity: -- → N/A

Priority: -- → P1

Julian Seward [:jseward]

Assignee

Comment 8

•

3 years ago

I can reproduce this, using the supplied test case, on m-c
586586:3881437c0949.

In the attachment is a disassembly of the 256 bytes surrounding the offending
instruction. The instruction is part of the following loop

   // loop setup
   0x000013dd6a2a346e:  mov    %rsp,%rbp
   0x000013dd6a2a3471:  mov    %rdx,%rax
   0x000013dd6a2a3474:  shl    $0x3,%rax
   0x000013dd6a2a3478:  sub    %rax,%rbp
   0x000013dd6a2a347b:  mov    %rsp,%rax
   0x000013dd6a2a347e:  sub    $0x800,%rax

   // loop body
   0x000013dd6a2a3484:  cmp    %rbp,%rax
   0x000013dd6a2a3487:  jb     0x13dd6a2a349b
=> 0x000013dd6a2a348d:  movl   $0x0,(%rax)  <--- the offending insn
   0x000013dd6a2a3493:  sub    $0x800,%rax
   0x000013dd6a2a3499:  jmp    0x13dd6a2a3484

   // post-loop
   0x000013dd6a2a349b:  mov    %rsp,%rbp

which, as Iain points out, looks a lot like what
MacroAssembler::touchFrameValues() in jit/MacroAssembler.cpp would generate.

I don't think this is a bug in Gecko -- it looks quite deliberate to me. I
also wouldn't call it a bug in Valgrind, since, per the ELF x86_64 ABI, any
access at or below -128(%rsp) is illegal, so V is correct to report it.

On that last ABI-related point, it might be cleaner to use the use the Rust
scheme as linked to in comment 5. It strikes me that -- even if we think we
know what we're doing -- accessing memory in a place that, at least, the ELF
x86_64 ABI says is illegal, is pretty dubious, because we can't guarantee that
the Linux kernel isn't relying on the ABI-specified requirement in some way we
don't understand.

I'd say the Rust way -- move %rsp down incrementally, so that all of the probe
accesses are at 0(%rsp) or above, hence, per the ABI, legal, might be safer.

In any case, I would vote for closing as INVALID or at least significant
de-prioritisation of this bug.

Julian Seward [:jseward]

Assignee

Comment 9

•

3 years ago

Attached file Disassembly per comment 8 — Details

Iain Ireland [:iain]

Comment 10

•

3 years ago

(In reply to Julian Seward [:jseward] (PTO, back on 3 Aug 2021) from comment #8)

I don't think this is a bug in Gecko -- it looks quite deliberate to me. I
also wouldn't call it a bug in Valgrind, since, per the ELF x86_64 ABI, any
access at or below -128(%rsp) is illegal, so V is correct to report it.

Ah, this is the piece I was missing! Thanks.

I think this is a real bug, and we should fix it (probably by matching Rust's approach), but it's not P1 and I don't believe it's security sensitive. I'm going to open it up and adjust the priority.

Group: javascript-core-security

Priority: P1 → P3

Summary: Valgrind: Invalid write of size 4 from JS stack → Rewrite stack probes to update stack pointer first

Julian Seward [:jseward]

Assignee

Comment 11

•

3 years ago

we should fix it (probably by matching Rust's approach)

I should perhaps add: x86_64-linux is relatively speaking a minority platform,
so a perhaps stronger reason to fix it is because of arm64-android and
arm32-android: in both cases I believe the relevant ELF ABI does not allow
accesses at any address below the stack pointer.

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

3 years ago

Blocks: sm-jits

Worcester12345

Comment 12

•

3 years ago

Is this a "perf" key word bug?

Iain Ireland [:iain]

Comment 13

•

3 years ago

No, this shouldn't have any measurable impact on performance.

Daniel Veditz [:dveditz]

Updated

•

2 years ago

Flags: sec-bounty? → sec-bounty-

Iain Ireland [:iain]

Updated

•

10 months ago

Status: NEW → RESOLVED

Closed: 10 months ago

Duplicate of bug: 1839669

Resolution: --- → DUPLICATE

You need to log in before you can comment on or make changes to this bug.