Closed Bug 1253216 Opened 4 years ago Closed 4 years ago

Fedora 24 - crash with --disabled-ion

Categories

(Core :: JavaScript Engine: JIT, defect)

45 Branch
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
firefox48 --- fixed

People

(Reporter: stransky, Assigned: lth)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Because of bug 1245783 - JS engine crash when build by gcc6 - I need to disable ion jit in Fedora 24. 

But when build with --disabled-ion firefox crashes at AtomicOperations-none.h. So there's no way how to run FF on Fedora 24 right now.

Looks like parts of JIT code is used even when JIT is disabled.

93	template<typename T>
94	inline T
95	js::jit::AtomicOperations::loadSafeWhenRacy(T* addr)
96	{
97	    MOZ_CRASH();
98	}

sample of bt:
#0  0x00007fffea06df80 in js::jit::AtomicOperations::loadSafeWhenRacy<unsigned char>(unsigned char*) (addr=<optimized out>)
    at /home/komat/CVS/firefox/firefox-45.0/firefox-45.0/js/src/jit/none/AtomicOperations-none.h:97
#1  0x00007fffea075825 in js::TypedArrayObject::getElement(unsigned int) (addr=...)
    at /home/komat/CVS/firefox/firefox-45.0/firefox-45.0/js/src/jit/AtomicOperations.h:219
#2  0x00007fffea075825 in js::TypedArrayObject::getElement(unsigned int) (obj=0x7fffc3b077a0, index=0)
    at /home/komat/CVS/firefox/firefox-45.0/firefox-45.0/js/src/vm/TypedArrayObject.cpp:684
#3  0x00007fffea075825 in js::TypedArrayObject::getElement(unsigned int) (tarray=0x7fffc3b077a0, index=0)
    at /home/komat/CVS/firefox/firefox-45.0/firefox-45.0/js/src/vm/TypedArrayObject.cpp:960
#4  0x00007fffea075825 in js::TypedArrayObject::getElement(unsigned int) (this=this@entry=0x7fffc3b077a0, index=index@entry=0) at /home/komat/CVS/firefox/firefox-45.0/firefox-45.0/js/src/vm/TypedArrayObject.cpp:1778


comes from:

679	    static const NativeType
680	    getIndex(JSObject* obj, uint32_t index)
681	    {
682	        TypedArrayObject& tarray = obj->as<TypedArrayObject>();
683	        MOZ_ASSERT(index < tarray.length());
684	        return jit::AtomicOperations::loadSafeWhenRacy(tarray.viewDataEither().cast<NativeType*>() + index);
685	    }
The PPC64 people had the same problem, and there's a specific hack in place for that that you could perhaps adapt.  Take a look at the nasty #ifdef cluster at the bottom of js/src/jit/AtomicOperations.h.

I'm a little torn about whether it makes sense to make this hack the default for a --disable-ion build, because technically the code that is included for ppc64 needs to be ppc64-specific, it can't be portable code.
Many architectures simply use non-atomic code because Bug 1208663 isn't resolved.  Why does AtomicOperations-none.h calls MOZ_CRASH()?
(In reply to Makoto Kato [:m_kato] from comment #2)
> Many architectures simply use non-atomic code because Bug 1208663 isn't
> resolved.  Why does AtomicOperations-none.h calls MOZ_CRASH()?

That the other architectures use racy operations is a known bug, as you say, but it is never right for the "none" code to use the racy implementations.  The atomic operations are always platform/compiler specific, even when compiling with --disable-ion.  (The operations are in the jit/ directory because they have to be compatible with whatever the jit is doing, even when the jit is "none".)
Should the include at the bottom of AtomicOperations.h be based on the target platform marco (CPU_ARCH?)
instead of JS_CODEGEN_XXXX

Will the other implemenations work for non-ion builds on their respective platforms?
(In reply to Steve Singer (:stevensn) from comment #4)
> Should the include at the bottom of AtomicOperations.h be based on the target platform marco (CPU_ARCH?) instead of JS_CODEGEN_XXXX

I don't think so.  If the JIT is enabled then it's the JIT's atomics that these functions must be compatible with.  JS_CODEGEN_X86 might reasonably have different code than CPU_ARCH==x86 with JS_CODEGEN_NONE.  For example, for JS_CODEGEN_X86 we may end up generating the code for these functions dynamically at startup, using the JIT, while for JS_CODEGEN_NONE on x86 we'd either have to use assembler or -- this being a tier-3 platform at that point and nobody caring very much -- just use racy C++ code and hope for the best, as you've done on PPC64.

It seems possible that after the JS_CODEGEN_XXX cases there could be cases based on CPU_ARCH nested inside the JS_CODEGEN_NONE case, though, which would clean up the current mess and really seems most reasonable.

> Will the other implemenations work for non-ion builds on their respective platforms?

If I understand your question correctly I think my first paragraph above answers it: not necessarily.  (And I don't want to commit to it in any case.)
Something like this, maybe - at least it creates a clear place for other third-tier platforms to hook in.  Note, I also changed one of the PPC64 macro names, it was '__PPC64_' with one trailing underscore, which I assume is wrong.
Attachment #8726670 - Flags: feedback?(steve)
Assignee: nobody → lhansen
Status: NEW → ASSIGNED
(In reply to Lars T Hansen [:lth] from comment #6)
> Created attachment 8726670 [details] [diff] [review]
> clean up the atomic ops ifdef nest
> 
> Something like this, maybe - at least it creates a clear place for other
> third-tier platforms to hook in.  Note, I also changed one of the PPC64
> macro names, it was '__PPC64_' with one trailing underscore, which I assume
> is wrong.

please consider __aarch64__ (ARM 64bit), too.  You know, we don't set JS_CODEGEN_ARM64 for target=aarch64.  So Firefox for Linux/aarch64 always crashes by this issue since it uses interpreter mode only
Attachment 8726670 [details] [diff] works fine on ppc.  Yes I think the missing _ was a mistake.

Someone still has to write non-jit versions of the atomics for x86/64 , arm etc...
Attachment #8726670 - Flags: feedback?(steve) → feedback+
(In reply to Steve Singer (:stevensn) from comment #8)
> Attachment 8726670 [details] [diff] works fine on ppc.  Yes I think the
> missing _ was a mistake.

OK, good.

> Someone still has to write non-jit versions of the atomics for x86/64 , arm
> etc...

Yes, but it was always going to be that way :)  The point of this little patch was just to make the structure more obvious.
(In reply to Makoto Kato [:m_kato] from comment #7)
> (In reply to Lars T Hansen [:lth] from comment #6)
> > Created attachment 8726670 [details] [diff] [review]
> > clean up the atomic ops ifdef nest
> > 
> > Something like this, maybe - at least it creates a clear place for other
> > third-tier platforms to hook in.  Note, I also changed one of the PPC64
> > macro names, it was '__PPC64_' with one trailing underscore, which I assume
> > is wrong.
> 
> please consider __aarch64__ (ARM 64bit), too.  You know, we don't set
> JS_CODEGEN_ARM64 for target=aarch64.  So Firefox for Linux/aarch64 always
> crashes by this issue since it uses interpreter mode only

Yes, same problem.  Also the same problem for Fedora on (I assume) some x86/x64 variant, which started this thread.

At the moment, --disable-ion platforms are basically completely unsupported by Mozilla, so far as I know.  I'd almost be inclined not to add the requested support to mozilla-central, as we did for PPC64 (which I would then propose to remove again), but to instead require downstream repos to maintain a patch for AtomicOperations.h in order to implement their platform atomics for non-ion builds.  The reason of course is that the atomics code in question cannot be portable, and I don't want to give the impression that it is.  But maybe that's too harsh.

Jason, Jan - opinions on how much we should be integrating target-specific fixes for no-Ion platforms?
Flags: needinfo?(jorendorff)
Flags: needinfo?(jdemooij)
> I'm a little torn about whether it makes sense to make this hack the default for a --disable-ion build,
> because technically the code that is included for ppc64 needs to be ppc64-specific, it can't be
> portable code.

Martin, I agree with Lars at least this far: Firefox can't ship on Fedora 26 with Ion disabled - it's not a tested build configuration, as you've discovered. It would be a massive security and stability risk to ship that to users.

That said, since we do have it, I'd be OK taking such a patch that gets a --disable-ion browser basically working, if it helps you out in the short term. But I think maybe that is not the case.

The right way forward is for us to fix bug 1245783.
Flags: needinfo?(jorendorff)
Flags: needinfo?(jdemooij)
Comment on attachment 8726670 [details] [diff] [review]
clean up the atomic ops ifdef nest

OK, let's try to land this then, to pave the path.
Attachment #8726670 - Flags: review?(jorendorff)
Comment on attachment 8726670 [details] [diff] [review]
clean up the atomic ops ifdef nest

Review of attachment 8726670 [details] [diff] [review]:
-----------------------------------------------------------------

OK.
Attachment #8726670 - Flags: review?(jorendorff) → review+
(In reply to Jason Orendorff [:jorendorff] from comment #11)
> The right way forward is for us to fix bug 1245783.

Okay, I'll look into this. We don't intent to ship FF with disabled JIT on primary arches in Fedora. 

But what about second arches, ppc/s390 and so? Those are built with disabled JIT with default, correct? So do you basically say that those builds are risky?
I'm a bit confused by Lars' comment on the tier-3 architectures being too risky.

If there are specific threading risks to the ppc implementation of atomics then please point them out and we should try to fix them.

If your concern is about platform specific code for non tier-1 platforms them I'm a bit confused. The definition of a tier-3 platform (https://developer.mozilla.org/en/docs/Supported_build_configurations) is that they are community maintained and unsupported by mozilla.  Many of the tier-3 platforms (ppc and the *bsd's in particular) have auto-builders that build mozilla central and are pretty good about submitting fixes as things break.

We have other places with platform specific non-portable code in the codebase. (xpt, ipc, etc..) Sometimes they break and those of us in the community try are best to fix them.  Sometimes issues take a bit longer to find/fix than we'd like but we try our best given that most of us do this in our spare time.

It sounds like the issue with gcc6 is a bug in gcc (that I assume gcc will eventually fix) so I can see why the considerations would be different.
(In reply to Martin Stránský from comment #14)
> But what about second arches, ppc/s390 and so? Those are built with disabled
> JIT with default, correct? So do you basically say that those builds are
> risky?

The no-JIT configuration is not something we (actively) test, support, and ship. So yes, it *might* have buggy or unsafe code. For tier 3 Firefox builds, like ppc or OS/2, that's probably acceptable: not many users, these platforms are more-or-less unsupported anyway, and there's more platform-specific code to worry about elsewhere.
(In reply to Jan de Mooij [:jandem] from comment #16)
> (In reply to Martin Stránský from comment #14)
> > But what about second arches, ppc/s390 and so? Those are built with disabled
> > JIT with default, correct? So do you basically say that those builds are
> > risky?
> 
> The no-JIT configuration is not something we (actively) test, support, and
> ship.

AFAIK, we actually kind of test it, with jit tests, which iterate with plenty of different configurations, each of which disables more or less things.
(In reply to Mike Hommey [:glandium] from comment #17)
> > The no-JIT configuration is not something we (actively) test, support, and
> > ship.
> 
> AFAIK, we actually kind of test it, with jit tests, which iterate with
> plenty of different configurations, each of which disables more or less
> things.

True, but that's still a 'normal' build, it just disables the JITs at runtime. The configure flag, on the other hand, does not compile the JIT backend at all (it uses the fake 'none' backend).
(In reply to Steve Singer (:stevensn) from comment #15)

> If there are specific threading risks to the ppc implementation of atomics
> then please point them out and we should try to fix them.

There are not any specific risks, that I am aware of.  (The "risky" remark was probably Jason.)  When the atomics landed they brought with them the need for many new back-end functions, but only few existing back-end functions were changed.  (The functions that load from and store to TypedArray were changed, so that they could provide support for Atomics.store and Atomics.load.)  If there's any risk at all it's that those changes were not adopted in the ppc back-end, but that's a tier-3 problem.  There's no problem with the ifdef nest that landed in mozilla-central.

As Jan said, the concern is only that the "none" platform is not supported by us - we use the "none" target only to check that we don't make JIT assumptions outside the JIT - and my question is only around whether we should go out of our way to make sure the "none" platform is runnable.
(In reply to Martin Stránský from comment #14)
> Okay, I'll look into this. We don't intent to ship FF with disabled JIT on
> primary arches in Fedora.

Great. That was what I was worried about.

> But what about second arches, ppc/s390 and so? Those are built with disabled
> JIT with default, correct? So do you basically say that those builds are
> risky?

I just meant shipping code that isn't being actively tested on tier-1 platforms would be a bad risk.
On tier-3 platforms, obviously, there's no choice. There aren't any known issues (except this one you found, obviously). But we aren't looking.
https://hg.mozilla.org/mozilla-central/rev/69f237c2cf91
Status: ASSIGNED → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Blocks: 1257055
You need to log in before you can comment on or make changes to this bug.