crash entering into URL bar on ppc64le
Categories
(Core :: JavaScript Engine, task, P1)
Tracking
()
People
(Reporter: spectre, Assigned: spectre)
References
Details
Attachments
(2 files, 12 obsolete files)
47 bytes,
text/x-phabricator-request
|
lizzard
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
47 bytes,
text/x-phabricator-request
|
lizzard
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-esr68+
|
Details | Review |
Entering any text into the URL bar causes a crash in current opt (-O3
non-debug) builds on ppc64le from trunk. This seems to be completely unrelated to bug 1512162. Brief backtrace follows, full crash signature provided.
@tcampbell, does any of this look familiar? Any places we should start looking? This smells like some sort of memory barrier issue. Interestingly, -O3
but with debug on works fine.
% hg bisect --bad
The first bad revision is:
changeset: 482748:552b13ce8016
user: Ted Campbell <tcampbell@mozilla.com>
date: Mon Jul 15 05:44:14 2019 +0000
summary: Bug 1565556 - Remove JSScript::hasTryNotes/ScopeNotes/ResumeOffsets. r=jandem
Thread 1 "firefox" received signal SIGSEGV, Segmentation fault.
0x00007ffff02a7040 in js::gc::Cell::storeBuffer (this=0x7fffffffa868)
at /home/spectre/src/mozilla-central/js/src/gc/Cell.h:281
281 return chunk()->trailer.storeBuffer;
[...]
#0 0x00007ffff02a7040 in js::gc::Cell::storeBuffer() const
(this=0x7fffffffa868)
at /home/spectre/src/mozilla-central/js/src/gc/Cell.h:281
#1 0x00007ffff02a7040 in js::InternalBarrierMethods<JS::Value>::postBarrier(JS::Value*, JS::Value const&, JS::Value const&)
(next=..., prev=..., vp=0x7fffffffa5f0)
at /home/spectre/src/mozilla-central/js/src/gc/Barrier.h:352
#2 0x00007ffff02a7040 in js::WriteBarriered<JS::Value>::post(JS::Value const&, JS::Value const&) (next=..., prev=..., this=0x7fffffffa5f0)
at /home/spectre/src/mozilla-central/js/src/gc/Barrier.h:456
#3 0x00007ffff02a7040 in js::HeapPtr<JS::Value>::HeapPtr(JS::Value const&)
(v=..., this=0x7fffffffa5f0)
at /home/spectre/src/mozilla-central/js/src/gc/Barrier.h:595
#4 0x00007ffff02a7040 in js::OrderedHashMap<js::HashableValue, js::HeapPtr<JS::Value>, js::HashableValue::Hasher, js::ZoneAllocPolicy>::Entry::Entry<JS::Handle<JS::Value> >(js::HashableValue const&, JS::Handle<JS::Value>&&)
(v=..., k=..., this=0x7fffffffa598)
at /home/spectre/src/mozilla-central/js/src/ds/OrderedHashTable.h:768
#5 0x00007ffff02a7040 in js::OrderedHashMap<js::HashableValue, js::HeapPtr<JS::Value>, js::HashableValue::Hasher, js::ZoneAllocPolicy>::put<JS::Handle<JS::Value> >(js::HashableValue const&, JS::Handle<JS::Value>&&)
(value=..., key=..., this=0x7fffba215ab0)
at /home/spectre/src/mozilla-central/js/src/ds/OrderedHashTable.h:811
#6 0x00007ffff02a7040 in js::MapObject::set_impl(JSContext*, JS::CallArgs const&) (cx=0x7fffd9849000, args=...)
at /home/spectre/src/mozilla-central/js/src/builtin/MapObject.cpp:785
#7 0x00007ffff02a7d18 in JS::CallNonGenericMethod<&js::MapObject::is, &js::MapObject::set_impl>(JSContext*, JS::CallArgs const&)
(args=..., cx=<optimized out>)
at /home/spectre/src/mozilla-central/js/src/vm/NativeObject.h:1425
#8 0x00007ffff02a7d18 in js::MapObject::set(JSContext*, unsigned int, JS::Value*) (cx=<optimized out>, argc=<optimized out>, vp=<optimized out>)
at /home/spectre/src/mozilla-central/js/src/builtin/MapObject.cpp:796
#9 0x00007ffff021d43c in CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), JS::CallArgs const&)
(args=..., native=0x7ffff02a7be0 <js::MapObject::set(JSContext*, unsigned int, JS::Value*)>, cx=0x7fffd9849000)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/CallArgs.h:284
#10 0x00007ffff021d43c in js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct) (cx=0x7fffd9849000, args=..., construct=<optimized out>) at /home/spectre/src/mozilla-central/js/src/vm/Interpreter.cpp:539
#11 0x00007ffff020f9a0 in js::CallFromStack(JSContext*, JS::CallArgs const&) (args=..., cx=<optimized out>) at /home/spectre/src/mozilla-central/js/src/vm/Interpreter.cpp:3084
#12 0x00007ffff020f9a0 in Interpret(JSContext*, js::RunState&) (cx=0x7fffd9849000, state=...) at /home/spectre/src/mozilla-central/js/src/vm/Interpreter.cpp:3084
Assignee | ||
Comment 1•5 years ago
|
||
A simple-minded backout crashes with the same signature.
Comment 2•5 years ago
|
||
That's a bit surprising to me. Some quick googling doesn't show any sort of weird size_t/uintptr_t definitions for ppc64le and 64-bit little-endian is our best tested configuration.
Looking at the backtrace, it shows the crash occurs when 'Cell*' is 0x7fffffffa868 which is almost certainly non-sense. The GC allocated Cells must be inside 1MB chunk allocations and then we use pointer tricks to find the chunk trailer from a given Cell *.
Between Frame #4 and #5 the 'this' pointer for the same js::OrderedHashMap changes. Start your investigation there and see if you can determine why the code or the compile managed to screw that up.
Comment 3•5 years ago
|
||
Oops.. That is OrderedHashMap vs OrderedHashMap::Entry. It still seems worth poking in that area.
Looking at the definition of OrderedHashMap there are definitely some code smells about strict C++ object model issues. There might be some funny business of move-references being abused. That might explain why we try and do GC-related operations on a stack address. Will follow up with GC / C++ knowledgeable folks.
Assignee | ||
Comment 4•5 years ago
|
||
Building at -O2
does seem to mask it, which does suggest there is some corner case or maybe even subtle UB that gets tickled (and that the offending revision probably just unmasked some problem that was already there).
Assignee | ||
Comment 5•5 years ago
|
||
I tried poking around some corners but didn't get much traction since I have to confess I don't understand this code very well. In the short term as a bandaid we could adopt a deoptimization solution temporarily (as we did in bug 1512162, even though this isn't the same root cause), but I wasn't able to get the scope much smaller than gc/Barrier.h
. Dan, does this fix the crash for you? Are you able to minimize the scope of the #pragma
any further? It seemed on my system I needed to have the headers covered by the #pragma
too but I'm loathe to add it to gc/Cell.h
also if I can avoid it.
Comment 6•5 years ago
|
||
Looking at this a little closer, it doesn't seem like the stack-copy/move-reference thing I thought might be happening is actually happening. The JS::Value (which is a union of double/int/Cell*, etc) seems to think things are Object/String but the address is a stack address. There should not be any cases where a JS::Value points to a stack so this seems like more problems similar to Bug 1512162.
One thing that does seem a bit suspicious in js/src/ds/OrderedHashTable.h is that we really are supposed to be using the return values of placement-new instead of the original buffer pointer. I'd be surprised if this changes the results but it might be interesting.
What version of GCC is being used for the ppc64le builds?
Assignee | ||
Comment 7•5 years ago
|
||
What lines in OrderedHashTable.h
do you mean, specifically? As for the compiler,
% gcc --version
gcc (GCC) 9.1.1 20190503 (Red Hat 9.1.1-1)
Copyright (C) 2019 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
Assignee | ||
Comment 8•5 years ago
|
||
I assume you meant something like this. No, it didn't seem to make any difference to the crash or signature.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 9•5 years ago
|
||
In the absence of a clear explanation why, I will submit the bandaid fix so that at least it can be "unbroken" a la bug 1512162. Maybe it really is a compiler bug but I can't reduce it enough to figure it out.
Assignee | ||
Comment 10•5 years ago
|
||
Assignee | ||
Comment 11•5 years ago
|
||
@tcampbell, please let me know if I should redirect the review.
Assignee | ||
Comment 12•5 years ago
|
||
Thanks! (clearing n-i, marking leave-open so I can continue to work on finding a better solution)
Comment 13•5 years ago
|
||
Sorry about that. I apparently got distracted before hitting the submit problem.
The placement-new patch was what I had in mind. Thanks for ruling it out. I'm not sure what may be upsetting compilers still.
I'd be curious if scoping down to Value.h still solves the problem. In particular, the bitsFromDouble and setDouble methods.
Comment 14•5 years ago
|
||
https://bugzilla.mozilla.org/show_bug.cgi?id=1312488 is something we've run into in the past but the Value code has changed a bit this year and is no longer a union.
Assignee | ||
Comment 15•5 years ago
|
||
Thanks, I'll look into that further. It has some commonalities with bug 1512162. I have to figure out how bug 1551313 broke the build first, though.
Comment 16•5 years ago
|
||
Pushed by btara@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4559e60f4dde
Temporary compiler workaround for crashes on ppc64le. r=tcampbell
Comment 17•5 years ago
|
||
bugherder |
Assignee | ||
Comment 18•5 years ago
|
||
I reduced it down to this block, pretty much the mutators. (I intentionally pulled the tree before the prior workaround landed.) Does anything jump out at you from these functions? It looks like they all manipulate asBits_
. Thanks for your help on this since I don't know this code well.
Assignee | ||
Comment 19•5 years ago
|
||
Just for laughs, I changed no-inline-functions
to no-strict-aliasing
and that, also, fixes it. Is there something odd about that big union that it doesn't like?
Assignee | ||
Comment 20•5 years ago
|
||
@tcampbell: Okay, I'm at the point I'm going to need a little guidance. With no-strict-aliasing
I was able to reduce the scope down to toTag()
. I proved this was the bad function by moving the scope to another function and watching it crash, then move it back and it didn't. I then looked at the assembly by throwing a trap
instruction at the function itself. With strict aliasing on, the code at PC looks like this:
0x00007ffff01da104 <+276>: b 0x7ffff01da11c <js::NativeObject::initializeSlotRange(unsigned int, unsigned int)+300>
// end of prior basic block
0x00007ffff01da108 <+280>: nop
0x00007ffff01da10c <+284>: nop
0x00007ffff01da110 <+288>: addi r29,r29,8
0x00007ffff01da114 <+292>: cmpld r29,r23
0x00007ffff01da118 <+296>: bge 0x7ffff01da1fc <js::NativeObject::initializeSlotRange(unsigned int, unsigned int)+524>
0x00007ffff01da11c <+300>: addi r8,r30,1
0x00007ffff01da120 <+304>: mr r24,r30
0x00007ffff01da124 <+308>: std r20,0(r29)
0x00007ffff01da128 <+312>: clrldi r30,r8,32
=> 0x00007ffff01da12c <+316>: trap
0x00007ffff01da130 <+320>: ld r9,0(r29)
0x00007ffff01da134 <+324>: rldicl r10,r9,17,47
Without strict aliasing (i.e., no-strict-aliasing
), it looks like this -- which is the entirety of the function:
=> 0x00007ffff01b3110 <+0>: trap
0x00007ffff01b3114 <+4>: ld r3,0(r3)
0x00007ffff01b3118 <+8>: rldicl r3,r3,17,47
0x00007ffff01b311c <+12>: blr
You wouldn't have to know PowerPC assembly to see that the generated code (for toTag
at least) is pretty much the same: it's taking asBits_
and rotating it. The only difference is that with strict aliasing gcc
seems to assume it knows enough to inline the function, and that's why no-inline-functions
also (in a wider scope) fixes the issue. (Btw, I don't understand why it's storing r20
to 0(r29)
and then immediately reading that value; that does suggest a compiler bug when it could simply have done mr r9,r20
. The trap
wouldn't have caused it to spill because other instructions didn't spill their own registers.)
As you can see from the WIP above I tried a whole bunch of ways of rewriting the function and either got a compiler error or it still crashed. However, slapping __attribute__((noinline,noclone))
as a function decorator also solves the problem without having to use any #pragma
s. That's the current working solution.
What I need to know: is there another way to rewrite this? Or do you think the attribute solution is the best option? I'll prep a patch either way to backout the temporary bandaid and put in this better idea, whichever it happens to be.
Comment 21•5 years ago
|
||
You mention no-strict-aliasing, but SpiderMonkey should be compiled with -fno-strict-aliasing by default. Did you turn that off? Or why isn't that sufficient to stop this?
Comment 22•5 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 23•5 years ago
|
||
No, I didn't turn it off. When I run gmake
in obj.../js/src
, I see it in the command line options to gcc
but the pragma
is clearly changing the generated code even with it already "off." I don't have an explanation for this; should I just write this off as a weird side effect of messing with the compiler?
Again, the idea would be pretty much just
private:
+#if (__GNUC__ && __linux__ && __PPC64__ && _LITTLE_ENDIAN)
+__attribute__((noinline,noclone))
+#endif
JSValueTag toTag() const {
Would that be acceptable? Like I say, the generated code doesn't quite make sense in the inlined case, so maybe it's a compiler bug after all.
Comment 24•5 years ago
|
||
Thanks for continuing to investigate.
I'd agree that the attribute is preferable to the pragma (particularly on the barrier file). The strict-aliasing thing is certainly weird and as Jan says, firefox generally is far from supporting -fstrict-aliasing still and our options should turn it off.
There is still some C++ badness in Value.h and I will attach a patch that turns JS::Value from a union to a struct that should be a lot more defined by the letter of the language. Can you test this patch and see if it helps your case? I would need to do some slight cleanup before landing, but the general approach should be fine for all platforms.
Comment 25•5 years ago
|
||
Does this patch help at all? It removes the most egregious type-punning code and tries to be closer to pedantic c++.
Assignee | ||
Comment 26•5 years ago
|
||
That looks like it would avoid a lot of the weirdness. I'm out of town on business until Friday but I will try it first thing when I get back.
Comment 27•5 years ago
|
||
I get an immediate crash during startup with the JS::Value patch applied (and Cameron's workaround removed)
#0 0x00007fffb3898d48 in __libc_signal_restore_set (set=0x7ffff6311c38) at ../sysdeps/unix/sysv/linux/internal-signals.h:84
#1 0x00007fffb3898d48 in raise (sig=<optimized out>) at ../sysdeps/unix/sysv/linux/raise.c:48
#2 0x00007fffa80f56fc in nsProfileLock::FatalSignalHandler(int, siginfo_t*, void*) (signo=<optimized out>, info=<optimized out>, context=0x7ffff6311f20)
at /mnt/dan/firefox.git/toolkit/profile/nsProfileLock.cpp:169
#3 0x00007fffa8ab80fc in js::UnixExceptionHandler(int, siginfo_t*, void*) (signum=<optimized out>, info=0x7ffff6312c98, context=0x7ffff6311f20)
at /mnt/dan/firefox.git/js/src/ds/MemoryProtectionExceptionHandler.cpp:273
#4 0x00007fffb39004e8 in <signal handler called> () at arch/powerpc/kernel/vdso64/sigtramp.S
#5 0x00007fffa03b5b58 in JS::Value::setObject(JSObject&) (this=<optimized out>, obj=...) at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:450
#6 0x00007fffa857b810 in JS::Value::setObjectOrNull(JSObject*) (arg=<optimized out>, this=<optimized out>)
at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:501
#7 0x00007fffa857b810 in JS::ObjectOrNullValue (obj=<optimized out>) at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:1045
#8 0x00007fffa857b810 in js::EnvironmentObject::initEnclosingEnvironment(JSObject*) (this=0x3027bd22e040, enclosing=<optimized out>)
at /mnt/dan/firefox.git/js/src/vm/EnvironmentObject.h:277
#9 0x00007fffa85589c4 in js::LexicalEnvironmentObject::createTemplateObject(JSContext*, JS::Handle<js::Shape*>, JS::Handle<JSObject*>, js::gc::InitialHeap) (cx=<optimized out>, shape=...,
shape@entry=..., enclosing=..., heap=heap@entry=js::gc::TenuredHeap) at /mnt/dan/firefox.git/js/src/vm/EnvironmentObject.cpp:928
#10 0x00007fffa855aafc in js::LexicalEnvironmentObject::createGlobal(JSContext*, JS::Handle<js::GlobalObject*>) (cx=<optimized out>, global=...)
at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/js/RootingAPI.h:1213
#11 0x00007fffa85af230 in js::GlobalObject::createInternal(JSContext*, JSClass const*) (cx=0x7fff9d249000, clasp=<optimized out>)
at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/js/RootingAPI.h:1213
#12 0x00007fffa8844a70 in JSRuntime::createSelfHostingGlobal(JSContext*) (cx=0x7fff9d249000) at /mnt/dan/firefox.git/js/src/vm/SelfHosting.cpp:2664
#13 0x00007fffa8867e58 in JSRuntime::initSelfHosting(JSContext*) (this=<optimized out>, cx=<optimized out>) at /mnt/dan/firefox.git/js/src/vm/SelfHosting.cpp:2775
#14 0x00007fffa8b1d578 in JS::InitSelfHostedCode(JSContext*) (cx=0x7fff9d249000) at /mnt/dan/firefox.git/js/src/jsapi.cpp:434
#15 0x00007fffa1b88280 in nsXPConnect::InitStatics() () at /mnt/dan/firefox.git/js/xpconnect/src/nsXPConnect.cpp:141
#16 0x00007fffa1b20470 in xpcModuleCtor() () at /mnt/dan/firefox.git/js/xpconnect/src/XPCModule.cpp:11
#17 0x00007fffa6bd9fdc in nsLayoutModuleInitialize() () at /mnt/dan/firefox.git/layout/build/nsLayoutModule.cpp:107
#18 0x00007fffa033db64 in nsComponentManagerImpl::Init() (this=0x7fff9eac0300) at /mnt/dan/firefox.git/xpcom/components/nsComponentManager.cpp:493
#19 0x00007fffa03d9cbc in NS_InitXPCOM(nsIServiceManager**, nsIFile*, nsIDirectoryServiceProvider*)
(aResult=0x7fffb0050a80, aBinDirectory=<optimized out>, aAppFileLocationProvider=<optimized out>) at /mnt/dan/firefox.git/xpcom/build/XPCOMInit.cpp:444
#20 0x00007fffa8110318 in ScopedXPCOMStartup::Initialize() (this=0x7fffb0050a80) at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/nsCOMPtr.h:839
#21 0x00007fffa8122cec in XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) (this=0x7ffff6313848,
this@entry=0x7ffff6313868, argc=argc@entry=4, argv=argv@entry=0x7ffff6314ed8, aConfig=...) at /mnt/dan/firefox.git/obj-powerpc64le-unknown-linux-gnu/dist/include/mozilla/UniquePtr.h:308
#22 0x00007fffa81235e0 in XRE_main(int, char**, mozilla::BootstrapConfig const&) (argc=<optimized out>, argv=0x7ffff6314ed8, aConfig=...)
at /mnt/dan/firefox.git/toolkit/xre/nsAppRunner.cpp:4800
#23 0x00007fffa81336a8 in mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) (this=<optimized out>, argc=<optimized out>, argv=<optimized out>, aConfig=...)
at /mnt/dan/firefox.git/toolkit/xre/Bootstrap.cpp:45
#24 0x000000012bbf7bf4 in do_main(int, char**, char**) (argc=<optimized out>, argv=0x7ffff6314ed8, envp=<optimized out>) at /mnt/dan/firefox.git/browser/app/nsBrowserApp.cpp:218
#25 0x000000012bbf7d24 in main(int, char**, char**) (argc=<optimized out>, argv=0x7ffff6314ed8, envp=0x7ffff6314f00) at /mnt/dan/firefox.git/browser/app/nsBrowserApp.cpp:300
Comment 28•5 years ago
|
||
(It isn't at all obvious in Bugzilla, but the Signature is for things that correspond to https://crash-stats.mozilla.org/ )
Comment 29•5 years ago
|
||
Thanks for testing that out Dan!
Can either of you let me know the build steps you used, or see if your same build process / distro / compiler also reproduces on x64. The types of failures we are seeing don't seem very ppc specific and I wonder if this is a cross-platform UB issue that GCC is biting us on.
Comment 30•5 years ago
|
||
This is on Fedora 30 for ppc64le, gcc-9.2.1-1.fc30.ppc64le and the .mozconfig
used for ./mach build
on today's trunk is
export CC=/usr/bin/gcc
export CXX=/usr/bin/g++
export PATH=$HOME/.cargo/bin:$PATH
mk_add_options MOZ_MAKE_FLAGS="-j32"
ac_add_options --enable-application=browser
ac_add_options --enable-optimize="-Og"
ac_add_options --enable-debug
ac_add_options --disable-release
ac_add_options --disable-tests
ac_add_options --enable-linker=bfd
Assignee | ||
Comment 31•5 years ago
|
||
I'm heading back to my home base and should be able to analyse this a little further this evening.
Assignee | ||
Comment 32•5 years ago
•
|
||
Approximately the same as Dan's, F30, gcc 9.2.1-1. My optimize flags are -Og -mcpu=power9
for a debug build but that shouldn't affect this.
I see the startup crash, but it's actually an assertion: Assertion failure: &toObject() == &obj, at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:450
. The backtrace is the same as Dan's, namely,
#0 0x00007fffe3e35f0c in JS::Value::setObject(JSObject&)
(this=<optimized out>, obj=...)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:450
#1 0x00007fffebfe99a8 in JS::Value::setObjectOrNull(JSObject*)
(arg=<optimized out>, this=<optimized out>)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:501
#2 0x00007fffebfe99a8 in JS::ObjectOrNullValue (obj=<optimized out>)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:1045
#3 0x00007fffebfe99a8 in js::EnvironmentObject::initEnclosingEnvironment(JSObject*) (this=0x21e38f52e040, enclosing=<optimized out>)
at /home/spectre/src/mozilla-central/js/src/vm/EnvironmentObject.h:277
#4 0x00007fffebfc4e00 in js::LexicalEnvironmentObject::createTemplateObject(JSContext*, JS::Handle<js::Shape*>, JS::Handle<JSObject*>, js::gc::InitialHeap)
(cx=<optimized out>, shape=...,
shape@entry=..., enclosing=..., heap=heap@entry=js::gc::TenuredHeap)
at /home/spectre/src/mozilla-central/js/src/vm/EnvironmentObject.cpp:928
This actually occurs with or without my prior workaround.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 33•5 years ago
|
||
I read the instruction wrong, so I'm redoing the analysis. In the function prologue r4
which is obj
gets moved to r30
. (r3
is this
.)
0x00007fffe3e35c20 <+0>: addis r2,r12,4020
0x00007fffe3e35c24 <+4>: addi r2,r2,3296
0x00007fffe3e35c28 <+8>: mflr r0
0x00007fffe3e35c2c <+12>: std r0,16(r1)
0x00007fffe3e35c30 <+16>: std r30,-16(r1)
0x00007fffe3e35c34 <+20>: std r31,-8(r1)
0x00007fffe3e35c38 <+24>: stdu r1,-48(r1)
0x00007fffe3e35c3c <+28>: mr r31,r1
=> 0x00007fffe3e35c40 <+32>: mr r30,r4
The shift and store section looks like this (there are trap
instructions between the three lines in setObject
), essentially the entire body of setObjectNoCheck
.
=> 0x00007fffe3e35d04 <+228>: trap
0x00007fffe3e35d08 <+232>: li r9,-1
0x00007fffe3e35d0c <+236>: mr r10,r30
0x00007fffe3e35d10 <+240>: rldimi r10,r9,49,0
0x00007fffe3e35d14 <+244>: std r10,0(r3)
0x00007fffe3e35d18 <+248>: trap
0x00007fffe3e35d1c <+252>: bl 0x7fffe3e35a8c <JS::Value::toObject() const+8>
r9
gets loaded with all one bits. The object is loaded into r10
as a mask target. It shifts the all-ones left 49 bit positions and then inserts it into r10
with another mask of all one-bits and logical-ORs it with the result. It then stores it into asBits_
. The assertion follows the second trap
.
(gdb) set $pc+=4
(gdb) si
0x00007fffe3e35d0c 457 asBits_ = bitsFromTagAndPayload(JSVAL_TAG_OBJECT, PayloadType(obj));
(gdb) i reg r9
r9 0xffffffffffffffff 18446744073709551615
(gdb) si
0x00007fffe3e35d10 457 asBits_ = bitsFromTagAndPayload(JSVAL_TAG_OBJECT, PayloadType(obj));
(gdb) i reg r10 r30
r10 0x2f9b43a2d060 52343901180000
r30 0x2f9b43a2d060 52343901180000
(gdb) si
0x00007fffe3e35d14 in JS::Value::setObjectNoCheck (obj=0x2f9b43a2d060,
this=0x7fffffffc5c0)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:457
457 asBits_ = bitsFromTagAndPayload(JSVAL_TAG_OBJECT, PayloadType(obj));
(gdb) i reg r10
r10 0xfffe2f9b43a2d060 18446233467657310304
(gdb) si
JS::Value::setObject (this=0x7fffffffc5c0, obj=...)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:451
451 __asm__("trap\n");
(gdb) x/16dx $r3
0x7fffffffc5c0: 0x43a2d060 0xfffe2f9b 0x98542100 0x2358c2e9
[...]
0x00007fffe3e35d20 in JS::Value::setObject (this=0xfffe2f9b43a32f9c, obj=...)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:452
452 MOZ_ASSERT(&toObject() == &obj);
(gdb) disas $pc, $pc+0x30
Dump of assembler code from 0x7fffe3e35d20 to 0x7fffe3e35d50:
=> 0x00007fffe3e35d20 <JS::Value::setObject(JSObject&)+256>: nop
0x00007fffe3e35d24 <JS::Value::setObject(JSObject&)+260>: cmpd r30,r3
0x00007fffe3e35d28 <JS::Value::setObject(JSObject&)+264>: bne 0x7fffe3e35d8c <JS::Value::setObject(JSObject&)+364>
0x00007fffe3e35d2c <JS::Value::setObject(JSObject&)+268>: addi r1,r31,48
0x00007fffe3e35d30 <JS::Value::setObject(JSObject&)+272>: ld r0,16(r1)
0x00007fffe3e35d34 <JS::Value::setObject(JSObject&)+276>: mtlr r0
0x00007fffe3e35d38 <JS::Value::setObject(JSObject&)+280>: ld r30,-16(r1)
0x00007fffe3e35d3c <JS::Value::setObject(JSObject&)+284>: ld r31,-8(r1)
0x00007fffe3e35d40 <JS::Value::setObject(JSObject&)+288>: blr
0x00007fffe3e35d44 <JS::Value::setObject(JSObject&)+292>: li r9,0
0x00007fffe3e35d48 <JS::Value::setObject(JSObject&)+296>: b 0x7fffe3e35cfc <JS::Value::setObject(JSObject&)+220>
0x00007fffe3e35d4c <JS::Value::setObject(JSObject&)+300>: li r5,448
End of assembler dump.
(gdb) i reg r3
r3 0xfffe2f9b43a32f9c 18446233467657334684
(gdb) i reg r30
r30 0x2f9b43a2d060 52343901180000
r3
is the return value from toObject()
but it still looks like a Value
, not an object pointer, and the comparison fails. It doesn't even look like the right value.
Assignee | ||
Comment 34•5 years ago
|
||
@tcampbell, I guess I'm not understanding what toPointer()
does. Is that an actual object pointer? It's returning what looks like a Value
complete with tag intact. gdb
session:
r30 0x7f3b82d060 546459275360
(gdb) i reg r10
r10 0xfffe007f3b82d060 18446181670215405664
(gdb) cont
Continuing.
Thread 1 "firefox" hit Breakpoint 1, 0x00007fffe3e35a8c in JS::Value::toObject
(this=0x7fffffffc5c0)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:728
728 JSObject& toObject() const {
(gdb) i reg r3
r3 0x7fffffffc5c0 140737488340416
(gdb) x/16dx $r3
0x7fffffffc5c0: 0x3b82d060 0xfffe007f 0xf74b2900 0x73b631b9 <<< asBits_ in the first two words, OK.
0x7fffffffc5d0: 0x3b82e040 0x0000007f 0xffffc5e0 0x00007fff
0x7fffffffc5e0: 0xffffc670 0x00007fff 0x00000000 0x00000000
0x7fffffffc5f0: 0xebfc5d28 0x00007fff 0xffffc600 0x00007fff
(gdb) disp/i $pc
1: x/i $pc
=> 0x7fffe3e35a8c <JS::Value::toObject() const+8>: std r31,-8(r1)
(gdb) si
0x00007fffe3e35a90 728 JSObject& toObject() const {
1: x/i $pc
=> 0x7fffe3e35a90 <JS::Value::toObject() const+12>: stdu r1,-48(r1)
(gdb)
0x00007fffe3e35a94 728 JSObject& toObject() const {
1: x/i $pc
=> 0x7fffe3e35a94 <JS::Value::toObject() const+16>: mr r31,r1
(gdb)
JS::Value::isObject (this=0x7fffffffc5c0)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:729
729 MOZ_ASSERT(isObject());
1: x/i $pc
=> 0x7fffe3e35a98 <JS::Value::toObject() const+20>: ld r9,0(r3)
(gdb)
0x00007fffe3e35a9c 729 MOZ_ASSERT(isObject());
1: x/i $pc
=> 0x7fffe3e35a9c <JS::Value::toObject() const+24>: rldicl r10,r9,17,47
(gdb) i reg r9
r9 0xfffe007f3b82d060 18446181670215405664
(gdb) si
0x00007fffe3e35aa0 729 MOZ_ASSERT(isObject());
1: x/i $pc
=> 0x7fffe3e35aa0 <JS::Value::toObject() const+28>: li r8,-1
(gdb)
0x00007fffe3e35aa4 729 MOZ_ASSERT(isObject());
1: x/i $pc
=> 0x7fffe3e35aa4 <JS::Value::toObject() const+32>: rlwinm r8,r8,0,15,29
(gdb)
0x00007fffe3e35aa8 729 MOZ_ASSERT(isObject());
1: x/i $pc
=> 0x7fffe3e35aa8 <JS::Value::toObject() const+36>: cmpld r10,r8
(gdb)
0x00007fffe3e35aac 729 MOZ_ASSERT(isObject());
1: x/i $pc
=> 0x7fffe3e35aac <JS::Value::toObject() const+40>:
bgt 0x7fffe3e35af4 <JS::Value::toObject() const+112>
(gdb)
JS::Value::toObject (this=0x7fffffffc5c0)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:616
616 return asBits_ >= JSVAL_SHIFTED_TAG_OBJECT;
1: x/i $pc
=> 0x7fffe3e35ab0 <JS::Value::toObject() const+44>: lis r10,-3
(gdb)
0x00007fffe3e35ab4 616 return asBits_ >= JSVAL_SHIFTED_TAG_OBJECT;
1: x/i $pc
=> 0x7fffe3e35ab4 <JS::Value::toObject() const+48>: ori r10,r10,65535
(gdb)
0x00007fffe3e35ab8 616 return asBits_ >= JSVAL_SHIFTED_TAG_OBJECT;
1: x/i $pc
=> 0x7fffe3e35ab8 <JS::Value::toObject() const+52>: rldicr r10,r10,32,31
(gdb)
0x00007fffe3e35abc 616 return asBits_ >= JSVAL_SHIFTED_TAG_OBJECT;
1: x/i $pc
=> 0x7fffe3e35abc <JS::Value::toObject() const+56>: oris r10,r10,65535
(gdb)
0x00007fffe3e35ac0 616 return asBits_ >= JSVAL_SHIFTED_TAG_OBJECT;
1: x/i $pc
=> 0x7fffe3e35ac0 <JS::Value::toObject() const+60>: ori r10,r10,65535
(gdb)
0x00007fffe3e35ac4 616 return asBits_ >= JSVAL_SHIFTED_TAG_OBJECT;
1: x/i $pc
=> 0x7fffe3e35ac4 <JS::Value::toObject() const+64>: cmpld r9,r10
(gdb)
0x00007fffe3e35ac8 616 return asBits_ >= JSVAL_SHIFTED_TAG_OBJECT;
1: x/i $pc
=> 0x7fffe3e35ac8 <JS::Value::toObject() const+68>:
ble 0x7fffe3e35b3c <JS::Value::toObject() const+184>
(gdb)
731 MOZ_ASSERT((asBits_ & detail::ValueGCThingPayloadMask) != 0);
1: x/i $pc
=> 0x7fffe3e35acc <JS::Value::toObject() const+72>: ld r3,0(r3)
(gdb) si
0x00007fffe3e35ad0 731 MOZ_ASSERT((asBits_ & detail::ValueGCThingPayloadMask) != 0);
1: x/i $pc
=> 0x7fffe3e35ad0 <JS::Value::toObject() const+76>: clrldi. r9,r3,17
(gdb) i reg r3
r3 0xfffe007f3b82d060 18446181670215405664
(gdb) si
0x00007fffe3e35ad4 731 MOZ_ASSERT((asBits_ & detail::ValueGCThingPayloadMask) != 0);
1: x/i $pc
=> 0x7fffe3e35ad4 <JS::Value::toObject() const+80>:
beq 0x7fffe3e35b84 <JS::Value::toObject() const+256>
(gdb) si
732 MOZ_ASSERT((asBits_ & 0x7) == 0);
1: x/i $pc
=> 0x7fffe3e35ad8 <JS::Value::toObject() const+84>: andi. r9,r3,7
(gdb)
0x00007fffe3e35adc 732 MOZ_ASSERT((asBits_ & 0x7) == 0);
1: x/i $pc
=> 0x7fffe3e35adc <JS::Value::toObject() const+88>:
bne 0x7fffe3e35bcc <JS::Value::toObject() const+328>
(gdb)
JS::Value::toPointer<JSObject> (tag=JSVAL_TAG_OBJECT, this=<optimized out>)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:734
734 return *toPointer<JSObject>(JSVAL_TAG_OBJECT);
which is really return reinterpret_cast<T*>(uintptr_t(asBits_ ^ tag));
1: x/i $pc
=> 0x7fffe3e35ae0 <JS::Value::toObject() const+92>: xoris r3,r3,1
(gdb) i reg r3
r3 0xfffe007f3b82d060 18446181670215405664
(gdb) si
0x00007fffe3e35ae4 in JS::Value::toObject (this=<optimized out>)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:734
734 return *toPointer<JSObject>(JSVAL_TAG_OBJECT);
1: x/i $pc
=> 0x7fffe3e35ae4 <JS::Value::toObject() const+96>: xori r3,r3,65532
(gdb) i reg r3
r3 0xfffe007f3b83d060 18446181670215471200
(gdb) si
0x00007fffe3e35ae8 734 return *toPointer<JSObject>(JSVAL_TAG_OBJECT);
1: x/i $pc
=> 0x7fffe3e35ae8 <JS::Value::toObject() const+100>: addi r1,r31,48
(gdb) i reg r3
r3 0xfffe007f3b832f9c 18446181670215430044
(gdb)
In return reinterpret_cast<T*>(uintptr_t(asBits_ ^ tag));
, is that XOR supposed to mask off the tag? This is only doing it on the lower half.
Assignee | ||
Comment 35•5 years ago
|
||
I figured it out. toPointer
should be
template <typename T>
T* toPointer(JSValueTag tag) const {
#if defined(JS_NUNBOX32)
return reinterpret_cast<T*>(uintptr_t(asBits_));
#elif defined(JS_PUNBOX64)
// Note: the 'Spectre mitigations' comment at the top of this class
// explains why we use XOR here and in other to* methods.
uint64_t shiftedTag = (uint64_t)tag << JSVAL_TAG_SHIFT;
return reinterpret_cast<T*>(uintptr_t(asBits_ ^ shiftedTag));
#endif
}
With this in, though, now I get our old friend assertion when typing in the URL bar: Assertion failure: isDouble(), at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:426
I'll work on this some more.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 36•5 years ago
|
||
This is mostly informational, but this snapshot (overlaid on tip) makes a working browser with both -O3
and -Og
. In fact, I was able to even remove bug 1512162's workaround as well.
Besides the fix above, this also sets JS_NONCANONICAL_HARDWARE_NAN
. The assertion didn't like the sign bit being set on the NaN generated by XPConnect, which was otherwise perfectly valid. This may have been the whole problem, so I'm going to make a test build with just that change and see if that fixes things too.
Assignee | ||
Comment 37•5 years ago
|
||
Yes, simply setting JS_NONCANONICAL_HARDWARE_NAN
seems sufficient, at least right now (tested on both -Og
and -O3
). It just seems wacky such a "minor" thing would manifest in this manner. Since this would only affect ppc64le
, this should be safe for beta, too. I will post a backout of the workaround and the "easy" fix.
I do like the simpler Value.h
though, it gets rid of a lot of hard to follow stuff.
Comment 38•5 years ago
|
||
Nice work with investigations. I'll try and give a bit more context for this Value encoding stuff and look over this again on Monday.
Good catch on the tag-vs-shiftTag bug. I must have grabbed the wrong grabbed the wrong version of patch. Oops.
The JS::Value type is a uint64_t (on all platforms) that can encode double, int, JSObject*, JSString*, etc. On 64-bit platforms we take advantage of the fact that in practice for platforms we support that addresses have the high-bits set to zero. We can then use these bits to tag which type of value is actually stored. The toPointer method tries to clear these encoded tag bits to get back a usable pointer. Due to spectre mitigations (as well as some other nice failsafe benefits) we use XOR to toggle the expected bits instead of using AND masking.
The JS_NONCANONICAL_HARDWARE_NAN is needed on platforms that do not generate the same bit-level encoding of NaN as X86/ARM. (IEEE-754 allows any non-zero mantissa to mean NaN, but spidermonkey expects only a limited number). There is currently a startup check for the must obvious mismatches in [1]. SpiderMonkey should be able to handle NaN with sign bit set or cleared since that does come up on X86/ARM inside the JITs.
If the JS_NONCANONICAL_HARDWARE_NAN fix is sufficient, that would be great. I also see in [2] that another project was running into NaN issues running some webassembly related thing on POWER8.
[1] https://searchfox.org/mozilla-central/rev/4bec213c1a534329bd6fd97f00c0fcfbd60e26ec/js/src/vm/Initialization.cpp#75-84
[2] https://github.com/WebAssembly/wabt/pull/1046
Comment 39•5 years ago
|
||
Good to hear. I'll take the simpler Value.h into another bug and let it ride the trains. Enabling JS_NONCANONICAL_HARDWARE_NAN for just ppc64le seems like a reasonable thing to still get into beta.
Assignee | ||
Comment 40•5 years ago
|
||
Yes, those in [2] are the hex representations of the NaNs I saw in both bug 1512162 and this one: 0xfff80000
(and 0xfff8000000000000
) and, originally, 0xffc00000
(and 0xffc0000000000000
). However, I don't get a failure in CheckCanonicalNaN
. Should we be asserting? Are those NaNs that would bug out?
In the meantime, the beta-suitable patch is coming up. If that sticks, I'll probably do a backout for bug 1512162 also after.
Assignee | ||
Comment 41•5 years ago
|
||
Assignee | ||
Comment 42•5 years ago
|
||
Correction: the NaN was fff9800000000000
. That is greater than JSVAL_SHIFTED_TAG_MAX_DOUBLE
, so that's why it asserts. Still, it seems like we should have failed CheckCanonicalNaN
.
Comment 43•5 years ago
|
||
[Background] In IEEE-754 Floating point, the fraction for a NaN is 52-bits long. If this is all 0, then we have +/- Infinity and if non-zero we are a NaN. Many architectures use the most-significant bit of this fraction to indicate if this is a signalling or non-signally NaN (with polarity varying between architectures). For ARM/X86, the canonical NaN generated for operations such as Inf - Inf has a fraction that is 8_0000_0000_0000. The SpiderMonkey Value representation is designed around this being 'normal'. For platforms such as SPARC where the NaN is different and we don't have JIT support, the JS_NONCANONICAL_HARDWARE_NAN check is sufficient to normalize NaNs in the places that C++ would care.
The value FFF9_8000_0000_0000 is actually the representation of UndefinedValue and much more likely to be a compile-time bug than the hardware generating it. That is the value that the default Value() constexpr constructor.
This is reminding me a lot of Bug 1312488. We no longer have the same work around that we used there, but we could try reintroducing it by changing the implementation of 'setDouble' to have |*this = Value::fromDouble(d);|.
Assignee | ||
Comment 44•5 years ago
|
||
That makes more sense. However, with this patch,
void setDouble(double d) {
- asBits_ = bitsFromDouble(d);
- MOZ_ASSERT(isDouble());
+ *this = Value::fromDouble(d);
+ if (!isDouble()) {
+ fprintf(stderr, "TRAPPED: %lx %f\n", asBits_, d);
+ __asm__("trap\n");
+ }
}
(which I assume is what you mean), I see TRAPPED: fff9800000000000 -nan
, so unless you meant somewhere else that doesn't seem to be what's gone wrong. Is there a way to trap an uninitialized Value()
, or is that allowed/supported?
Assignee | ||
Comment 45•5 years ago
|
||
Some debugging.
[Switching to thread 1 (Thread 0x7ffff7ff4f50 (LWP 2118))]
#0 JS::Value::setDouble (d=<optimized out>, this=<optimized out>)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:460
460 __asm__("trap\n");
(gdb) bt 5
#0 0x00007fffe55c8e2c in JS::Value::setDouble(double)
(d=<optimized out>, this=<optimized out>)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:460
#1 0x00007fffe55c8e2c in JS::Value::setNumber(double)
(this=<optimized out>, d=<optimized out>)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/Value.h:528
#2 0x00007fffe55bf288 in js::MutableWrappedPtrOperations<JS::Value, JS::MutableHandle<JS::Value> >::setNumber(double) (d=<optimized out>, this=0x7fffffff9fe0)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/RootingAPI.h:669
#3 0x00007fffe55bf288 in XPCConvert::NativeData2JS(JSContext*, JS::MutableHandle<JS::Value>, void const*, nsXPTType const&, nsID const*, unsigned int, nsresult*)
(cx=<optimized out>, d=..., s=0x7fffffffa218, type=..., iid=<optimized out>, arrlen=<optimized out>, pErr=0x7fffffffa05c)
at /home/spectre/src/mozilla-central/js/xpconnect/src/XPCConvert.cpp:115
#4 0x00007fffe562ae1c in CallMethodHelper::GatherAndConvertResults()
(this=0x7fffffffa188)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/RootingAPI.h:1239
(More stack frames follow...)
(gdb) up 2
#2 0x00007fffe55bf288 in js::MutableWrappedPtrOperations<JS::Value, JS::MutableHandle<JS::Value> >::setNumber (d=<optimized out>, this=0x7fffffff9fe0)
at /home/spectre/src/mozilla-central/obj-powerpc64le-unknown-linux-gnu/dist/include/js/RootingAPI.h:669
669 DECLARE_NONPOINTER_MUTABLE_ACCESSOR_METHODS(*ptr);
(gdb) p d
$1 = <optimized out>
(gdb) up 1
#3 XPCConvert::NativeData2JS (cx=<optimized out>, d=..., s=0x7fffffffa218,
type=..., iid=<optimized out>, arrlen=<optimized out>, pErr=0x7fffffffa05c)
at /home/spectre/src/mozilla-central/js/xpconnect/src/XPCConvert.cpp:115
115 d.setNumber(*static_cast<const float*>(s));
(gdb) p d
$2 = {<js::MutableHandleBase<JS::Value, JS::MutableHandle<JS::Value> >> = {<js::MutableWrappedPtrOperations<JS::Value, JS::MutableHandle<JS::Value> >> = {<js::WrappedPtrOperations<JS::Value, JS::MutableHandle<JS::Value> >> = {<No data fields>}, <No data fields>}, <No data fields>}, ptr = 0x7fffffffa0f0}
(gdb) x/16lx s
0x7fffffffa218: 0xffcc0000 0x00000000 0x00000000 0x00000000
0x7fffffffa228: 0x01000048 0x00000000 0xffffa260 0x00007fff
0x7fffffffa238: 0xe3e655f0 0x00007fff 0xe3e65620 0x00007fff
0x7fffffffa248: 0xf3976900 0x00007fff 0xffffa280 0x00007fff
It looks like the float started as 0xffcc0000
(I assume that is indeed a valid NaN) but somehow ends up as undefined on the other end.
Assignee | ||
Comment 46•5 years ago
|
||
Some more debugging. I put a debugging fprintf
into setNumber
(as called by XPConnect). This is pretty spammy but I do see, in normal operation,
setNumber: 4090853333333333 1057.300000
setNumber: 7ff8000000000000 nan
setNumber: 4090c4cccccccccd 1073.200000
setNumber: 7ff0000000000000 inf
setNumber: 7ff0000000000000 inf
which suggests that those are generated correctly. But, at the point of the assertion,
setNumber: fff9800000000000 -nan
TRAPPED: fff9800000000000 -nan
So, I guess it still is some sort of miscompilation, since the value shouldn't change.
Assignee | ||
Comment 47•5 years ago
|
||
Going back to my debugging for bug 1512162, I get up to this point where the Interpreter calls, ultimately, nsPrefBranch::GetFloatPrefWithDefault(const char* aPrefName, float aDefaultValue, uint8_t aArgc, float* aRetVal)
, based on ["autoFill.stddevMultiplier", [0.0, "getFloatPref"]]
in browser/components/urlbar/UrlbarPrefs.jsm
.
(gdb) f 10
#10 0x00007fffebd95f24 in Interpret (cx=<optimized out>,
cx@entry=0x7fffd1d49000, state=...)
at /home/spectre/src/mozilla-central/js/src/vm/Interpreter.cpp:3084
3084 if (!CallFromStack(cx, args)) {
(gdb) p state
$16 = (js::RunState &) @0x7fffffffabf8: {kind_ = js::RunState::Invoke,
script_ = {<js::RootedBase<JSScript*, JS::Rooted<JSScript*> >> = {<js::MutableWrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<js::WrappedPtrOperations<JSScript*, JS::Rooted<JSScript*> >> = {<No data fields>}, <No data fields>}, <No data fields>}, stack = 0x7fffd1d49030, prev = 0x7fffffffb2e0,
ptr = 0x29260e29c4c0}}
(gdb) x/32lx state.script_.stack
0x7fffd1d49030: 0xffffa8f0 0x00007fff 0xffffa8d8 0x00007fff
0x7fffd1d49040: 0xffffa8a8 0x00007fff 0x00000000 0x00000000
0x7fffd1d49050: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffd1d49060: 0xffffa3f0 0x00007fff 0xffffa7e8 0x00007fff
0x7fffd1d49070: 0xffffa170 0x00007fff 0xffffad80 0x00007fff
0x7fffd1d49080: 0x00000000 0x00000000 0x00000000 0x00000000
0x7fffd1d49090: 0xd0bba800 0x00007fff 0xd1d4c000 0x00007fff
0x7fffd1d490a0: 0xffdfe541 0x00007fff 0xffe00d41 0x00007fff
(gdb) x/4lx 0x00007fffffffa8f0
0x7fffffffa8f0: 0xd1d49030 0x00007fff 0xffffa7b8 0x00007fff
(gdb) x/4lx 0x00007fffffffa8d8
0x7fffffffa8d8: 0xd1d49038 0x00007fff 0xffffaf10 0x00007fff
(gdb) p args
$17 = {<JS::detail::CallArgsBase<JS::detail::IncludeUsedRval>> = {
argv_ = 0x7fffa972f550, argc_ = 2, constructing_ = false,
ignoresReturnValue_ = false, wantUsedRval_ = {
usedRval_ = false}}, <No data fields>}
(gdb) x/32lx args.argv_
0x7fffa972f550: 0x57be3960 0xfffb0f7d 0x00000000 0xfff88000
0x7fffa972f560: 0x00000000 0xfff98000 0x00000000 0xfff90000
0x7fffa972f570: 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd
0x7fffa972f580: 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd
0x7fffa972f590: 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd
0x7fffa972f5a0: 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd
0x7fffa972f5b0: 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd
0x7fffa972f5c0: 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd 0xcdcdcdcd
This is already wrong because the default value should be 0.0
, not a NaN. But what should the call args look like? I see 0xfff8800000000000
which I assume is probably corresponding to aArgc
. The 0xfff9800000000000
value is already present, so I'm assuming the call args are on the stack in LIFO order. By the time XPConnect is invoked, the wrong value is already being passed to it.
Comment 48•5 years ago
|
||
The default value for a js::Value that hasn't been filled in yet is 0xfff9800000000000 / UndefinedValue so it isn't surprising to see it in a few places in CallArgs which includes things like the 'this' pointer. That said, this may have been miscompiled earlier and we are just pushing it around.
I've managed to kick off a build in QEMU and if it finishes before tomorrow, I should be able to poke around in GDB myself and save some round-trips. Using gcc-9 on my normal x86_64 build didn't seem to turn up anything obvious.
One possible thing to try if you can is to look at [1] and replace that with a mozilla::BitwiseCast<uint64_t> or just a bare memcpy. This code is definitely in violation of C++ aliasing rules, and is unique to ppc64 targets. It was also last looked at more than a decade ago so it is entirely possible as modern GCC got more clever that this slipped though.
Comment 49•5 years ago
|
||
@tcampbell, if access to a real ppc hw would help, let me know.
Assignee | ||
Comment 50•5 years ago
|
||
I don't think that would cause this because it would only be triggered if the number of float arguments overflowed (there's only one here and tons of volatile float registers available), but just to be sure, I tried
if (!s->IsIndirect() && s->type == nsXPTType::T_DOUBLE) {
if (i < FPR_COUNT)
fpregs[i] = s->val.d;
else
- *(double *)d = s->val.d;
+ memcpy(reinterpret_cast<char*>(d), &(s->val.d), sizeof(s->val.d));
and got the same result. I agree it should be corrected and I will do this in a followup but I don't think that's the bug here.
Comment 51•5 years ago
|
||
Good point about number of floats. I think next step is I'll take Dan up on his offer (-> email) and poke around later this week.
Comment 52•5 years ago
|
||
@tcampbell, you should have an email with details, if not, please check spam folder.
Updated•5 years ago
|
Comment 53•5 years ago
|
||
Using the server that Dan provided access to, I am able to reproduce the problem.
This is not a miscompilation and is actually a decade on XPCOM bug. The gist of the problem is that when passing floating point arguments we should use the first-available floating pointer register and not the i-th floating point register.
I'll put a patch up this week. I was able to revert the patches here and on Bug 1512162 and successfully type into URL and load reddit.
Assignee | ||
Comment 54•5 years ago
|
||
Wow! Thanks for doing that!
That might be ABI-specific, though. On 32-bit and I think 64BE, it should still be i-th, so maybe this should be specific to 64LE. I should have caught that while looking at the code, though.
Comment 55•5 years ago
|
||
Ted, that's an awesome news.
Comment 56•5 years ago
|
||
Cameron, you seem to have more familiarity with this stuff so maybe you can work out the patches here? I realize I'm not familiar with the different ABI changes here.
Files in mozilla-central/xpcom/reflect/xptcall/md need to be tweaked. The X64 platform is an example to use: https://searchfox.org/mozilla-central/source/xpcom/reflect/xptcall/md/unix/xptcinvoke_x86_64_unix.cpp
Assignee | ||
Comment 57•5 years ago
|
||
Yes, I can do that. What was the change you made, though, just so we're looking at the same block of code?
Comment 58•5 years ago
|
||
This fixes the issue for me by tweaking the XPCOM glue code. It should probably better follow the x86_64 pattern for simplicity. One area I was very unclear about was the 'if (i > = 7)' line.
The other area of uncertainty for me is what the ABI is for ppc, and ppc64be. The original ppc glue has similar looking code.
Assignee | ||
Comment 59•5 years ago
|
||
The plain _ppc_
ones are 32-bit. We should leave those alone, because I know they're correct (I use them on TenFourFox). It looks like your change didn't touch those, which is fine.
Unfortunately, the _ppc64_
ones are used for both BE and LE. I need to look at the ABI spec again to make sure I get this right.
Assignee | ||
Comment 60•5 years ago
|
||
It looks like from comparing v1 and v2 ABIs that register selection is the same, so the change should work for both (most BE systems are v1). I'm writing up the patch right now and will test it. What's amazing is that this issue escaped notice for so long.
Comment 61•5 years ago
|
||
My patch isn't quite right. It appears by ABI that a float arg advances gp index, but not vice-versa. Worth looking at what GCC does for some examples. The failing code that started this crash was the signature of [1].
Assignee | ||
Comment 62•5 years ago
|
||
Yep, I've already accounted for that. I'm trying to work around the profiler compile issues, which is slowing down my rebuild.
Updated•5 years ago
|
Assignee | ||
Comment 63•5 years ago
|
||
I think it works. With all the hackarounds backed out and the profiler compile fixes in, I no longer get assertions in debug, and the build works fine in opt. Before I unwind all the fixes in the tree, is this what you meant by "following the x86_64 pattern"? I tried to harmonize them as much as possible without requiring additional changes in the assembly stubs. If this looks fine, I'll turn it into a Phabricator patch.
Comment 64•5 years ago
|
||
Comment 65•5 years ago
|
||
I wonder if we should use the opportunity to normalize the xptcstubs_ppc64_linux.cpp to better match the x86_64 version as well. The PrepareAndDispatch can be almost identical except for [1] where we would also need to increment nr_gpr.
Assignee | ||
Comment 66•5 years ago
|
||
Oh, wow, that's wrong too. I'll work on that today.
Comment 67•5 years ago
|
||
Sorry, I linked to the x86_64 as an example. The ppc64 code actually looks correct thanks to Bug 1498938. My proposal is simply to make it similar to X86_64 so it is more likely to be maintained correctly.
Assignee | ||
Comment 68•5 years ago
|
||
Oh, I see it incrementing the GPR count at the end. Yes, that's certainly less clear.
Assignee | ||
Comment 69•5 years ago
|
||
I'm reworking the original one. There's a bug in it that's causing weird font sizes that goes away when I revert it.
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 70•5 years ago
|
||
Fixed. I had a typo in the float section that I missed.
Assignee | ||
Comment 71•5 years ago
•
|
||
And here is the stubs one. Does this look good to you as well? I also made a tiny change to the assembly (not here, but I will put in the final patch) to hoist the mtlr
a little earlier, which since this gets called a lot, should help smooth things a bit.
Assuming so, do you want all the changes together, including the backouts for bug 1512162 and the temporary fix here, or should I make those separate Phabricator submissions?
Assignee | ||
Comment 72•5 years ago
|
||
(I'll fix the spacing on the URLs, I just noticed I forgot to do that in the original file.)
Comment 73•5 years ago
|
||
Updated•5 years ago
|
Comment 74•5 years ago
|
||
My advice for final phabricator patches is to make Part 1 to fix up the xptc* files and fix the bug, and then add part 2 to revert the work-arounds. It will make history easier to follow in 5 years :)
(We should also add froydnj as a blocking reviewer on phabricator)
Assignee | ||
Comment 75•5 years ago
|
||
Assignee | ||
Comment 76•5 years ago
|
||
Depends on D46421
Assignee | ||
Comment 77•5 years ago
|
||
I forgot to put the comment in, but I think it's addressed by the example, isn't it?
Comment 78•5 years ago
|
||
For the record, no crashes observed with the last 2 patches from Cameron applied on trunk.
Assignee | ||
Comment 79•5 years ago
|
||
@msirringhaus, you're probably in the best position to verify on big-endian ppc64 based on your comment in bug 1511604 (I'd ask @awilfox but that account appears to be disabled). Please note the above patches for reference and advise right away if a follow up patch is required for BE. I don't have any BE ppc64 systems running Linux or *BSD, but I have preserved the semantics, so it should "just work."
Comment 81•5 years ago
|
||
We (SUSE) actually only do ppc64le builds. Our big endian platform is s390x, which is unaffected by this changeset, AFAIK.
In openSUSE I can at least build for ppc64, but I don't have a test-machine available either.
I'll ask around to see if I can find something.
I did try out your patches for ppc64le on 68esr though, which seems to works fine.
Assignee | ||
Comment 82•5 years ago
|
||
OK, thanks. Let me know if you find a ppc64 build you can play with. Correct, s390x is unaffected by this specific problem. Thanks for also testing on 68ESR.
In the meantime, I'll ask for check-in and then we can start moving it up to beta and ESR once it sticks on central.
Comment 83•5 years ago
|
||
Pushed by tcampbell@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/4887ef7af824
Fix and tune xptcall for ppc64le, and harmonize with x86_64's. r=tcampbell,froydnj
https://hg.mozilla.org/integration/autoland/rev/2d6979f57d6a
part 2 - remove now unnecessary workarounds for ppc64le in js/src/gc and js/xpconnect/src. r=tcampbell
Comment 84•5 years ago
|
||
bugherder |
Updated•5 years ago
|
Updated•5 years ago
|
Comment 85•5 years ago
|
||
Comment on attachment 9093808 [details]
Bug 1576303 - Fix and tune xptcall for ppc64le, and harmonize with x86_64's. r?tcampbell,froydnj
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: This PPC64-only patch fixes the a bug that causes browser to never start.
- User impact if declined: PPC64(LE) builds on Fedora/SUSE/?? will not be usable.
- Fix Landed on Version: 71
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Changes are only in ppc64 specific files. These are not-part-of-the-build for any other platform.
- String or UUID changes made by this patch:
Comment 86•5 years ago
|
||
Comment on attachment 9093809 [details]
Bug 1576303 part 2 - remove now unnecessary workarounds for ppc64le in js/src/gc and js/xpconnect/src. r?tcampbell,froydnj
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: In Bug 1512162 and Bug 1576303, we uplifted workarounds to ESR68 for PPC64 builds. This patch reverts both of them in favour of "Bug 1576303 - Fix and tune xptcall for ppc64le, and harmonize with x86_64's" which is being requested to uplift. These changes where only #ifdefs that apply to PPC64 configuration. It is easy to remove them so we should remove these workarounds from ESR68 and remain aligned with m-c.
- User impact if declined: Unnecessary work-around remain in ESR68 that don't exist anywhere else. This patch isn't strictly needed to fix any bug though.
- Fix Landed on Version: 71
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): #ifdef for tier-3 platform only. We are reverting other uplifts to esr68 in favour of the original code.
- String or UUID changes made by this patch:
Assignee | ||
Comment 87•5 years ago
|
||
I'm going to ask for this on beta too, since the Barrier.h
workaround is there.
Assignee | ||
Comment 88•5 years ago
|
||
Comment on attachment 9093808 [details]
Bug 1576303 - Fix and tune xptcall for ppc64le, and harmonize with x86_64's. r?tcampbell,froydnj
Beta/Release Uplift Approval Request
- User impact if declined: Poorer performance and possibly crashes on ppc64le depending on optimization level. The issue also potentially affects BE ppc64 though we haven't seen any clear reports yet.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Strictly, unbelievably, completely provably NPOTB: the xpconnect pieces are strictly for ppc64, and the other parts just back out ppc64le specific workarounds that were never part of any Tier-1 build.
- String changes made/needed: none
Assignee | ||
Updated•5 years ago
|
Comment 89•5 years ago
|
||
Comment on attachment 9093808 [details]
Bug 1576303 - Fix and tune xptcall for ppc64le, and harmonize with x86_64's. r?tcampbell,froydnj
Removes a workaround, NPOTB. Let's uplift for beta 9.
Updated•5 years ago
|
Comment 91•5 years ago
|
||
Comment on attachment 9093808 [details]
Bug 1576303 - Fix and tune xptcall for ppc64le, and harmonize with x86_64's. r?tcampbell,froydnj
Approved for 68.2esr also.
Updated•5 years ago
|
Comment 92•5 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-esr68/rev/4db8c12af314
https://hg.mozilla.org/releases/mozilla-esr68/rev/31276424f433
Updated•5 years ago
|
Comment 93•5 years ago
|
||
bugherder uplift |
Comment 94•5 years ago
|
||
I tried the mozilla-esr68 branch on a PPC64LE machine using the STR from this bug and the browser no longer crashes. Closing as fixed.
Updated•5 years ago
|
Assignee | ||
Comment 95•5 years ago
|
||
Yes, sorry, I forgot to remove the keyword.
Description
•