Closed Bug 1871969 Opened 6 months ago Closed 4 months ago

signal 4 on freebsd arm64 due to missing cache synchronisation

Categories

(Core :: JavaScript Engine, defect, P3)

Firefox 120
defect

Tracking

()

RESOLVED FIXED
124 Branch
Tracking Status
firefox-esr115 --- wontfix
firefox122 --- wontfix
firefox123 --- fixed
firefox124 --- fixed

People

(Reporter: jsm, Assigned: nbp)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(4 files)

User Agent: Mozilla/5.0 (X11; FreeBSD amd64; rv:120.0) Gecko/20100101 Firefox/120.0

Steps to reproduce:

Run firefox > 115 i.e versions with write protected code off (i.e w+x pages and no call to mprotect between write and exec) for content processes, on freebsd Arm64. Jit code kills firefox with signal 4. https://searchfox.org/mozilla-central/rev/08899071a2c8a573ac47ac632869bb92269b3ec3/js/src/jit/arm64/vixl/MozCpu-vixl.cpp#113 add a defined FreeBSD here to also execute the cache sync code on FreeBSD. see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271081#c12 to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271081#c12#c14

Actual results:

signal 4

Expected results:

jit code should be ready to execute

The Bugbug bot thinks this bug should belong to the 'Core::JavaScript Engine' component, and is moving the bug to that component. Please correct in case you think the bot is wrong.

Component: Untriaged → JavaScript Engine
Product: Firefox → Core

add a defined FreeBSD here to also execute the cache sync code on FreeBSD

Why is that case limited to specific OSes at all? The #else block below is:

#else
  // If the host isn't AArch64, we must be using the simulator, so this function
  // doesn't have to do anything.
  USE(address, length);

so this code would be buggy for anything that is __aarch64__ and not one of the explicitly listed OSes.

Nicolas, thoughts on resolution here?

Blocks: sm-jits
Severity: -- → S3
Flags: needinfo?(nicolas.b.pierron)
Priority: -- → P3

(In reply to jsm from comment #0)

Run firefox > 115 i.e versions with write protected code off (i.e w+x pages and no call to mprotect between write and exec) for content processes, on freebsd Arm64. Jit code kills firefox with signal 4. https://searchfox.org/mozilla-central/rev/08899071a2c8a573ac47ac632869bb92269b3ec3/js/src/jit/arm64/vixl/MozCpu-vixl.cpp#113 add a defined FreeBSD here to also execute the cache sync code on FreeBSD. see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271081#c12 to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271081#c12#c14

Thanks a lot to the FreeBSD community and you. This would have been harder to investigate otherwise.
Your patch to MozCPU-vixl.cpp sounds right, as well as your analysis on the fact that mprotect was hiding the lack of support.

On ARM the instruction cache has to be invalidated manually when we update the content, as otherwise it might keep a stale cache of what was previously seen as instructions, even if data was properly locked.

Firefox recently moved away from W^X, back to W&X, for performance reasons. After a long debate with the security team, this security mitigation no longer provides what it was meant for given the usage of JIT Spraying.

However, I am not sure for the need of MAP_FIXED in gc/Memory.cpp , I would expect our GC code to support this case correctly without MAP_FIXED.
I will forward the question To Jon, in case another bug has to be opened for the GC case.

Jesper (jsm), do you want to author the patch in Firefox as well? Or should I redo the patch?

(In reply to Ed Maste [:emaste] from comment #2)

add a defined FreeBSD here to also execute the cache sync code on FreeBSD

Why is that case limited to specific OSes at all?

This is because we are not able to test on all existing OSes, and probably a lack of trust that all OSes would behave the same. Especially given that Mac and Windows already require their own implementation.

Flags: needinfo?(jsm)
Flags: needinfo?(jcoppeard)

(In reply to Nicolas B. Pierron [:nbp] from comment #4)

(In reply to Ed Maste [:emaste] from comment #2)

add a defined FreeBSD here to also execute the cache sync code on FreeBSD

Why is that case limited to specific OSes at all?

This is because we are not able to test on all existing OSes, and probably a lack of trust that all OSes would behave the same. Especially given that Mac and Windows already require their own implementation.

OK. Maybe it could be made explicit though:

#elif defined(__aarch64__)
        #if (defined(__linux__) || defined(__android__))
                ...
        #else
                #error OS-specific cache invalidation needed!
        #endif
#else
  // If the host isn't AArch64, we must be using the simulator, so this function
  // doesn't have to do anything.
  USE(address, length);
#endif

Mainly I'd like to avoid falling into the case with the host isn't AArch64 comment, when the host actually is AArch64.

Firefox recently moved away from W^X, back to W&X, for performance reasons. After a long debate with the security team, this security mitigation no longer provides what it was meant for given the usage of JIT Spraying.

If you have it handy could I trouble you for more info on the change (e.g. is there is a public thread discussing it, wiki page etc.)? FreeBSD has a knob to enforce W^X policy (off by default), and a way to tag binaries as requiring WX mappings (NT_FREEBSD_FEATURE_CTL ELF note). I'd like to look at a patch (both in our ports collection and for upstream) to handle this, and would like to include a reference.

(In reply to Ed Maste [:emaste] from comment #5)

Firefox recently moved away from W^X, back to W&X, for performance reasons. After a long debate with the security team, this security mitigation no longer provides what it was meant for given the usage of JIT Spraying.

If you have it handy could I trouble you for more info on the change (e.g. is there is a public thread discussing it, wiki page etc.)? FreeBSD has a knob to enforce W^X policy (off by default), and a way to tag binaries as requiring WX mappings (NT_FREEBSD_FEATURE_CTL ELF note). I'd like to look at a patch (both in our ports collection and for upstream) to handle this, and would like to include a reference.

The change happened in Bug 1835876 (https://hg.mozilla.org/integration/autoland/rev/9d31a829691c)
And the good news is that the behavior can be restored in about:config (user.js) to turn javascript.options.content_process_write_protect_code back to true if needed.

(In reply to Nicolas B. Pierron [:nbp] from comment #4)

(In reply to jsm from comment #0)

Run firefox > 115 i.e versions with write protected code off (i.e w+x pages and no call to mprotect between write and exec) for content processes, on freebsd Arm64. Jit code kills firefox with signal 4. https://searchfox.org/mozilla-central/rev/08899071a2c8a573ac47ac632869bb92269b3ec3/js/src/jit/arm64/vixl/MozCpu-vixl.cpp#113 add a defined FreeBSD here to also execute the cache sync code on FreeBSD. see https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271081#c12 to https://bugs.freebsd.org/bugzilla/show_bug.cgi?id=271081#c12#c14

Thanks a lot to the FreeBSD community and you. This would have been harder to investigate otherwise.
Your patch to MozCPU-vixl.cpp sounds right, as well as your analysis on the fact that mprotect was hiding the lack of support.

On ARM the instruction cache has to be invalidated manually when we update the content, as otherwise it might keep a stale cache of what was previously seen as instructions, even if data was properly locked.

Firefox recently moved away from W^X, back to W&X, for performance reasons. After a long debate with the security team, this security mitigation no longer provides what it was meant for given the usage of JIT Spraying.

However, I am not sure for the need of MAP_FIXED in gc/Memory.cpp , I would expect our GC code to support this case correctly without MAP_FIXED.

Yes the MAP_FIXED solves to run with aslr on FreeBSD and is another story I have to investegate further as mentioned in the downstream bug. As you say it is unrelated to jit crashing with signal 4.

I will forward the question To Jon, in case another bug has to be opened for the GC case.

Jesper (jsm), do you want to author the patch in Firefox as well? Or should I redo the patch?

I would like to be the author, I might be able to attach a patch tomorrow (I am in UTC+1 ) . I like the suggestion of Ed Maste to come up with a compile time error if the OS falls through to not implemented cache invalidation code.

(In reply to Ed Maste [:emaste] from comment #2)

add a defined FreeBSD here to also execute the cache sync code on FreeBSD

Why is that case limited to specific OSes at all?

This is because we are not able to test on all existing OSes, and probably a lack of trust that all OSes would behave the same. Especially given that Mac and Windows already require their own implementation.

Flags: needinfo?(jsm)

(In reply to Ed Maste [:emaste] from comment #5)

Mainly I'd like to avoid falling into the case with the host isn't AArch64 comment, when the host actually is AArch64.

I think this is a regression from the OpenBSD fix in bug 1672619. The patch there fixed the code to compile on OpenBSD by disabling the cache invalidation code if not Linux or Android.

We should at least add something like:

#elif defined(__aarch64__)
# error ...
#else
...
Keywords: regression
Regressed by: 1672619

I have excluded OpenBSD from the error message, because the cache code is neither needed or able to compile there. The patch is for tip[0], but I have only tested on 122 build2 since I do not have easy access to newer nss on FreeBSD.

changeset: 771854:7235102b55e9
bookmark: autoland
tag: tip
user: moz-wptsync-bot <wptsync@mozilla.com>
date: Thu Jan 18 18:58:17 2024 +0000
summary: Bug 1875346 - [wpt-sync] Update web-platform-tests to d46f1aec78700b30ec633165cd51af2fd90bf3b6, a=testonly

Comment on attachment 9375624 [details] [diff] [review]
cacheinvalidation.patch

Review of attachment 9375624 [details] [diff] [review]:
-----------------------------------------------------------------

This change LGTM

(In reply to jsm from comment #9)

I have excluded OpenBSD from the error message, because the cache code is neither needed or able to compile there.

Why does OpenBSD not need to invalidate the instruction cache?

The patch in bug 1672619 changed ifdefs in some other places too. I wonder if we should just revert that patch and fix the original OpenBSD issue in a different way.

OpenBSD write protects always, so I assume that their mprotect behaves the same with regards to cache, otherwise they would have seen the same issue. I do not have a arm64 openbsd atm though..

Assignee: nobody → nicolas.b.pierron
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true

Hum … phabricator does not seems to keep the authorship set on the patch, and instead uses the one from the person who is pushing the patch to phabricator.

Apparently phabricator confuses the person who submit the patch with the patch author.
Looking at the patch which is pushed to our Try server, the authorship is retained: https://hg.mozilla.org/try/rev/c40f220781e5990dc3ef0cda92cd3660f45b8f7d

So everything is good despite phabricator reporting it wrongly.

Flags: needinfo?(nicolas.b.pierron)

Set release status flags based on info from the regressing bug 1672619

Same strategy for CPU::GetCacheType()

Do not error out in GetCacheType.

Pushed by npierron@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1b595199c4de
Use instruction cache invalidation on FreeBSD r=jandem
Status: ASSIGNED → RESOLVED
Closed: 4 months ago
Resolution: --- → FIXED
Target Milestone: --- → 124 Branch

:nbp does this need a beta uplift request for Fx123, or should it ride the trains with Fx124?

Flags: needinfo?(nicolas.b.pierron)

(In reply to Donal Meehan [:dmeehan] from comment #22)

:nbp does this need a beta uplift request for Fx123, or should it ride the trains with Fx124?

As far as I understand this could ride the train, as there is no urgency in getting these patches out, as patches are already handled by the package manager, with the same outcome.

However, if FreeBSD maintainers prefer to have it upstream to Firefox 123, we can do it.
JSM, is there a need for having these patches as part of Firefox sooner than 124?

Flags: needinfo?(nicolas.b.pierron) → needinfo?(jsm)

It would be really nice to already have it in Firefox 123. Thanks.

Flags: needinfo?(jsm)
Flags: needinfo?(jcoppeard)

Comment on attachment 9375952 [details]
Bug 1871969 - Use instruction cache invalidation on FreeBSD

Beta/Release Uplift Approval Request

  • User impact if declined: (see comment 24)
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Already tested and deployed in the form of patches to the FreeBSD community.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9375952 - Flags: approval-mozilla-beta?

Comment on attachment 9375952 [details]
Bug 1871969 - Use instruction cache invalidation on FreeBSD

Approved for 123 beta 8, thanks.

Attachment #9375952 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: