Rewrite stack probes to update stack pointer first
Categories
(Core :: JavaScript Engine, task, P3)
Tracking
()
People
(Reporter: sourc7, Assigned: jseward)
References
(Blocks 1 open bug)
Details
(Whiteboard: [reporter-external] [client-bounty-form] [verif?])
Attachments
(3 files)
After initialize a lot of new variable to null
, then multiple times call new String
to trigger GC? (probably), then after call array[x] = true
the Valgrind show Invalid write of size 4
from JS stack.
Interestingly I got different Valgrind JS stack output when changing the array[x] = value
value to another JS code (e.g. string).
When I run Firefox valgrind build from fuzzfetch, the Valgrind show invalid write of size 4 stack address with only a question mark ??? which doesn't provide clues to the root cause. However when compiling Firefox valgrind build with system clang, I got insightful stack output showing that Invalid write of size 4
is from JS stack.
When finding the regression I found Valgrind report Invalid write of size 4
from JS stack after checking out to commit Unify stack touch logic across all platforms.
Steps to reproduce:
- Install Valgrind on Linux
- Open Terminal
- Run
python3 -m pip install fuzzfetch
to install Mozilla Fuzzfetch - Run
fuzzfetch --valgrind
to download Firefox Valgrind build - Open Firefox Valgrind directory
- Run Valgrind with command
valgrind --trace-children=yes --smc-check=all-non-file --vex-iropt-register-updates=allregs-at-mem-access --show-mismatched-frees=no --read-inline-info=yes firefox
- Visit attached testcase.html
- After page load is finish, Valgrind will show output "Invalid write of size 4"
Valgrind JS stack:
==3612== Thread 1 Web Content:
==3612== Invalid write of size 4
==3612== at 0x147C0777148D: ???
==3612== by 0x4F85D8F: ???
==3612== by 0xFCEFFFF: slotForIndex (dist/include/mozilla/HashTable.h:1734)
==3612== by 0xFCEFFFF: lookup<mozilla::detail::HashTable<mozilla::HashMapEntry<js::ScriptSourceChunk, mozilla::UniquePtr<void, JS::FreePolicy> >, mozilla::HashMap<js::ScriptSourceChunk, mozilla::UniquePtr<void, JS::FreePolicy>, js::ScriptSourceChunkHasher, js::SystemAllocPolicy>::MapHashPolicy, js::SystemAllocPolicy>::ForNonAdd> (dist/include/mozilla/HashTable.h:1750)
==3612== by 0xFCEFFFF: readonlyThreadsafeLookup (dist/include/mozilla/HashTable.h:2074)
==3612== by 0xFCEFFFF: lookup (dist/include/mozilla/HashTable.h:2079)
==3612== by 0xFCEFFFF: lookup (dist/include/mozilla/HashTable.h:236)
==3612== by 0xFCEFFFF: mozilla::Utf8Unit const* js::UncompressedSourceCache::lookup<mozilla::Utf8Unit>(js::ScriptSourceChunk const&, js::UncompressedSourceCache::AutoHoldEntry&) (js/src/vm/JSScript.cpp:1881)
==3612== by 0x1FFEFFBE3F: ???
==3612== by 0x1FFEFFBDEF: ???
==3612== by 0x147C0777154E: ???
==3612== by 0x1042: ???
==3612== by 0x2857C1E51317: ???
==3612== Address 0x1ffeffb540 is on thread 1's stack
==3612== 2048 bytes below stack pointer
Reporter | ||
Comment 1•3 years ago
|
||
Updated•3 years ago
|
Comment 2•3 years ago
|
||
Matthew, as this might be a regression from an old patch of yours (bug 1488763), maybe you could take a look? Thanks.
Assignee | ||
Comment 3•3 years ago
|
||
Writes at exactly 2048 bytes are generally stack probes of some kind (that is, deliberately compiled in), I have observed in the past. That said, I've run Fx (an m-c build) on V quite a lot recently (and, also, Mochitests) and I don't recall seeing any such reports.
Comment 4•3 years ago
|
||
So, definitely we do intentional stack probes at 2048 byte intervals on all platforms.
Is the reason we see this on this test case just unluckiness triggering heuristics for valgrind? Or is this a legitimate issue because we've missed some precondition for touching the stack that we ought to have.
Could this be more of a Valgrind configuration issue than anything else?
Comment 5•3 years ago
|
||
We currently do stack probes prior to moving the stack pointer:
masm.touchFrameValues(numStackValues, scratch, framePtr);
masm.mov(rsp, framePtr);
// Reserve space for locals and stack values.
Register valuesSize = regs.takeAny();
masm.mov(numStackValues, valuesSize);
masm.shll(Imm32(3), valuesSize);
masm.subPtr(valuesSize, rsp);
I think the important question here is whether there's anything wrong with that decision. My expectation is that a) it's fine, b) this is a false positive in Valgrind, and c) we could consider refactoring this code to avoid the false positive, but I'm not 100% confident.
For reference, it looks like Rust's stack probing code modifies the stack pointer in the loop, then resets it afterwards.
Assignee | ||
Comment 6•3 years ago
|
||
I'll investigate. I have run both the JS shell and the browser quite a bit on V in
the past month or so, and I'm mystified as to why I haven't seen any such complaints.
Comment 7•3 years ago
|
||
assigning to Julian for the investigation in comment 6
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
I can reproduce this, using the supplied test case, on m-c
586586:3881437c0949.
In the attachment is a disassembly of the 256 bytes surrounding the offending
instruction. The instruction is part of the following loop
// loop setup
0x000013dd6a2a346e: mov %rsp,%rbp
0x000013dd6a2a3471: mov %rdx,%rax
0x000013dd6a2a3474: shl $0x3,%rax
0x000013dd6a2a3478: sub %rax,%rbp
0x000013dd6a2a347b: mov %rsp,%rax
0x000013dd6a2a347e: sub $0x800,%rax
// loop body
0x000013dd6a2a3484: cmp %rbp,%rax
0x000013dd6a2a3487: jb 0x13dd6a2a349b
=> 0x000013dd6a2a348d: movl $0x0,(%rax) <--- the offending insn
0x000013dd6a2a3493: sub $0x800,%rax
0x000013dd6a2a3499: jmp 0x13dd6a2a3484
// post-loop
0x000013dd6a2a349b: mov %rsp,%rbp
which, as Iain points out, looks a lot like what
MacroAssembler::touchFrameValues() in jit/MacroAssembler.cpp would generate.
I don't think this is a bug in Gecko -- it looks quite deliberate to me. I
also wouldn't call it a bug in Valgrind, since, per the ELF x86_64 ABI, any
access at or below -128(%rsp) is illegal, so V is correct to report it.
On that last ABI-related point, it might be cleaner to use the use the Rust
scheme as linked to in comment 5. It strikes me that -- even if we think we
know what we're doing -- accessing memory in a place that, at least, the ELF
x86_64 ABI says is illegal, is pretty dubious, because we can't guarantee that
the Linux kernel isn't relying on the ABI-specified requirement in some way we
don't understand.
I'd say the Rust way -- move %rsp down incrementally, so that all of the probe
accesses are at 0(%rsp) or above, hence, per the ABI, legal, might be safer.
In any case, I would vote for closing as INVALID or at least significant
de-prioritisation of this bug.
Assignee | ||
Comment 9•3 years ago
|
||
Comment 10•3 years ago
|
||
(In reply to Julian Seward [:jseward] (PTO, back on 3 Aug 2021) from comment #8)
I don't think this is a bug in Gecko -- it looks quite deliberate to me. I
also wouldn't call it a bug in Valgrind, since, per the ELF x86_64 ABI, any
access at or below -128(%rsp) is illegal, so V is correct to report it.
Ah, this is the piece I was missing! Thanks.
I think this is a real bug, and we should fix it (probably by matching Rust's approach), but it's not P1 and I don't believe it's security sensitive. I'm going to open it up and adjust the priority.
Assignee | ||
Comment 11•3 years ago
|
||
we should fix it (probably by matching Rust's approach)
I should perhaps add: x86_64-linux is relatively speaking a minority platform,
so a perhaps stronger reason to fix it is because of arm64-android and
arm32-android: in both cases I believe the relevant ELF ABI does not allow
accesses at any address below the stack pointer.
Comment 12•3 years ago
|
||
Is this a "perf" key word bug?
Comment 13•3 years ago
|
||
No, this shouldn't have any measurable impact on performance.
Updated•2 years ago
|
Updated•10 months ago
|
Description
•