Investigate defining JS_POISON in release builds, not just #ifdef JS_CRASH_DIAGNOSTICS

RESOLVED FIXED in Firefox 66

Status

()

defect
P3
normal
RESOLVED FIXED
5 years ago
2 months ago

People

(Reporter: cpeterson, Assigned: jonco)

Tracking

(Blocks 2 bugs, {sec-want})

unspecified
mozilla66
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox66 fixed)

Details

(Whiteboard: [js:p1][adv-main66-])

Attachments

(2 attachments, 1 obsolete attachment)

Reporter

Description

5 years ago
As of bug 860254, jemalloc now poisons freed memory in both debug and release builds. bsmedberg recommends poisoning JS memory allocations, too:

https://groups.google.com/forum/#!topic/mozilla.dev.tech.js-engine.internals/dUlUxYsFz5Q
Keywords: sec-want
Reporter

Comment 1

3 years ago
We started poisoning freed Ion code in release builds with bug 977810. This bug is about enabling (just some?) JS_POISON calls in release builds.

Updated

2 years ago
Depends on: 1333005
Is the scope of this bug just changing the defines here: https://searchfox.org/mozilla-central/source/js/src/jsutil.h#357-362

Or did we also want to add more JS_POISON calls?

Assuming it's just changing these, my proposal would be to switch it to |EARLY_BETA_OR_EARLIER|, let it ride through beta, and then switch to always enabling it once beta looked good. I'm inferring from the fact that this is happilly on in nightly that there aren't any (known) critical performance issues or crashes :-)

Does that seem reasonable?
Flags: needinfo?(nicolas.b.pierron)
Reporter

Comment 3

a year ago
bsmedberg's original suggestion was to define JS_POISON in all builds and channels. Enabling JS_CRASH_DIAGNOSTICS would have more overhead than enabling JS_POISON, but perhaps JS_CRASH_DIAGNOSTICS (or a subset) could be enabled with EARLY_BETA_OR_EARLIER?

https://searchfox.org/mozilla-central/search?case=true&q=JS_CRASH_DIAGNOSTICS
Ah, ok I didn't realize JS_CRASH_DIAGNOSTIC would be a bigger. Just switching to JS_POISON always being on seems straightforward then. I'll upload a patch momentarily.
Previously JS_POISON was defined for debug and nightly builds. Now
it'll be enabled for all builds. JS_POISON makes it easier to identify UAFs and
other memory lifetime issues.
Assignee: nobody → agaynor
Poisoning sometimes affects performance a lot, see the JSGC_DISABLE_POISONING environment var. We set it on AWFY and when running Talos:

https://searchfox.org/mozilla-central/rev/cf464eabfeba64e866c1fa36b9fefd674dca9c51/testing/talos/talos/ttest.py#126

At the very least we should compare Talos runs with that enabled/disabled, and probably also try Octane in the browser or JS shell (js/src/octane/run.js runs in the shell).
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #2)
> Or did we also want to add more JS_POISON calls?

I do not think we want to add more for the moment.

> Does that seem reasonable?

As Jan mentioned in comment 6, we should be careful at not regressing performance too much, and maybe we can filter the slow poisoning from the fast one.

Otherwise this sounds good to me.
Another alternative might be to enable it on dev-edition, but not on beta, and compare the two.
Flags: needinfo?(nicolas.b.pierron)
Ok, I did a talos run:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=9251f89d196a3c0a3ca744dc9c254482ec669e5c&framework=1&showOnlyComparable=1&showOnlyImportant=1&selectedTimeRange=172800

Looks like there's a decent number of 2-4% regressions, and a lot of talos tests which are unaffected.

Is that an acceptable level of regression, or should I spend some time investigating which of the poisons can safely be enabled vs. not?
(In reply to Alex Gaynor [:Alex_Gaynor] from comment #8)
> Is that an acceptable level of regression, or should I spend some time
> investigating which of the poisons can safely be enabled vs. not?

Maybe try without these:

* https://searchfox.org/mozilla-central/rev/c621276fbdd9591f52009042d959b9e19b66d49f/js/src/gc/Allocator.cpp#692

This poisons a whole chunk (1 MB) so it's not great for caches and doing this seems less useful than poisoning when we sweep something.

* https://searchfox.org/mozilla-central/rev/c621276fbdd9591f52009042d959b9e19b66d49f/js/src/gc/Nursery.cpp#92
* https://searchfox.org/mozilla-central/rev/c621276fbdd9591f52009042d959b9e19b66d49f/js/src/gc/Nursery.cpp#104

These are a bit more useful, but I can see them causing regressions when we do lots of nursery GCs.
Jan, do you have an opinion on these results? (I just noticed that while one of the motionmark benchmarks shows a regression, another one shows a win; the joys of benchmarking).
Flags: needinfo?(jdemooij)
Sorry for the delay. If Talos and GC telemetry (when this moves to beta) are okay with it, this is probably fine.

Jon owns the GC and most of the poisoning is GC-related, so I'll forward to him. 

FWIW see bug 1467697 comment 12 for a recent profile where poisoning during nursery collection showed up in Nightly. Also I assume JS_EXTRA_POISON is temporary and will be replaced with JS_NIGHTLY_POISON/JS_DIAGNOSTIC_POISON or something that matches the current behavior.
Component: JavaScript Engine: JIT → JavaScript: GC
Flags: needinfo?(jdemooij) → needinfo?(jcoppeard)

Updated

10 months ago
Assignee: agaynor → nobody

Updated

10 months ago
Priority: -- → P3
Attachment #8983927 - Attachment is obsolete: true
Assignee

Comment 14

5 months ago

First of all, replace the JS_*_POISON macros with inline functions.

Attachment #9035369 - Flags: review?(jdemooij)
Assignee

Comment 15

5 months ago

Make most poisoning unconditional. This leaves poisoning for new chunks and the nursery the same as it is now, i.e. nightly-only.

Attachment #9035370 - Flags: review?(jdemooij)
Comment on attachment 9035369 [details] [diff] [review]
bug981991-refactor-macros

Review of attachment 9035369 [details] [diff] [review]:
-----------------------------------------------------------------

Much nicer.

::: js/src/jsutil.h
@@ +344,5 @@
> +}
> +
> +// Poison a region of memory in debug builds. Can be disabled by setting the
> +// JSGC_DISABLE_POISONING environment variable.
> +static inline void ExtraPoison(void* ptr, uint8_t value, size_t num,

Pre-existing: what do you think about renaming this to DebugPoison or DebugOnlyPoison?
Attachment #9035369 - Flags: review?(jdemooij) → review+
Comment on attachment 9035370 [details] [diff] [review]
bug981991-poison-more

Review of attachment 9035370 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks, this makes sense I think. If we do get talos/raptor regressions or see this in profiles we can reconsider.
Attachment #9035370 - Flags: review?(jdemooij) → review+

One issue is that "-t all" doesn't run all talos tests :/ "mach try fuzzy" is needed for the Raptor and jsshell-bench tests (Octane might be interesting for the GC poisoning).

Assignee

Comment 19

5 months ago

(In reply to Jan de Mooij [:jandem] from comment #16)

Pre-existing: what do you think about renaming this to DebugPoison or
DebugOnlyPoison?

Good idea, I'll go with DebugOnlyPoison.

One issue is that "-t all" doesn't run all talos tests

Ah that's annoying. Well, I've run octane locally and didn't see any regressions so I'm going to land this and we can back it out if it causes problems.

Comment 20

5 months ago
Pushed by jcoppeard@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/77dfbff37444
Replace JS_*_POISON macros with inline functions r=jandem
https://hg.mozilla.org/integration/mozilla-inbound/rev/65174e301470
Make most poisoning unconditional r=jandem

Comment 21

5 months ago
bugherder
Status: NEW → RESOLVED
Last Resolved: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
Whiteboard: [js:p1] → [js:p1][adv-main66-]
You need to log in before you can comment on or make changes to this bug.