Closed Bug 1421445 Opened 8 years ago Closed 8 years ago

AMD Bobcat 64-bit crash mitigation

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla59

Tracking Flags:

Tracking

Status

firefox57

---

wontfix

firefox58

---

fixed

firefox59

---

fixed

People

(Reporter: tcampbell, Assigned: tcampbell)

References

Details

Attachments

(3 files, 4 obsolete files)

0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch 8 years ago Ted Campbell [:tcampbell] 4.50 KB, patch		Details \| Diff \| Splinter Review
0002-Bug-1421445-Add-NOP-slide-before-JIT-code-buffers-on.patch 8 years ago Ted Campbell [:tcampbell] 2.26 KB, patch	jandem : feedback+	Details \| Diff \| Splinter Review
0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch 8 years ago Ted Campbell [:tcampbell] 6.08 KB, patch		Details \| Diff \| Splinter Review
0002-Bug-1421445-Add-NOP-slide-before-JIT-code-buffers-on.patch 8 years ago Ted Campbell [:tcampbell] 2.80 KB, patch	jandem : review+	Details \| Diff \| Splinter Review
0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch 8 years ago Ted Campbell [:tcampbell] 6.08 KB, patch	jandem : review+ lizzard : approval-mozilla-beta+	Details \| Diff \| Splinter Review
0001-Bug-1421445-Work-around-Valgrind-bug-concerning-JitC.patch 8 years ago Ted Campbell [:tcampbell] 2.04 KB, patch		Details \| Diff \| Splinter Review
0003-Bug-1421445-Insert-UD2-instructions-after-JitCode.patch 8 years ago Ted Campbell [:tcampbell] 2.84 KB, patch	jandem : review+	Details \| Diff \| Splinter Review

Ted Campbell [:tcampbell]

Assignee

Description

•

8 years ago

As an attempt to reduce crashes in Bug 1281759, I propose adding NOPs before JIT code buffers. Looking at the 64-bit crashes on FF57 / FF58, the reported fault instruction is a few bytes before the code starts in the middle of the JitCode* pointer. This patch observes that we over-allocate those buffers by sizeof(void*) due to incorrect alignment math. By stealing that data to fill with NOPs we remain size neutral and need fewer checks if CPU is affected. In the future, if this fix does not provide stability improvements, we remove the NOPs and have a size reduction. Looking at crash data, it seems that most of the WRITE faults are crashing writing to address of %rax, which is consistent with trying to execute the JitCode* pointer as code. Currently, 75% of the crashes we see for this 64-bit FF57 AMD Bobcat signature are WRITE faults.

Ted Campbell [:tcampbell]

Assignee

Comment 1

•

8 years ago

Attached patch 0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch (obsolete) — Details — Splinter Review

Add JitCodeHeader struct and fix allocation size computation to save one word per jit code buffer.

Ted Campbell [:tcampbell]

Assignee

Comment 2

•

8 years ago

Attached patch 0002-Bug-1421445-Add-NOP-slide-before-JIT-code-buffers-on.patch (obsolete) — Details — Splinter Review

Add NOP-slide before jit code buffers. This is size-neutral when including previous patch. As a result, I don't try and be too clever about only applying this to affected machines.

Ted Campbell [:tcampbell]

Assignee

Updated

•

8 years ago

Assignee: nobody → tcampbell

Status: NEW → ASSIGNED

Ted Campbell [:tcampbell]

Assignee

Updated

•

8 years ago

Attachment #8932634 - Flags: feedback?(jdemooij)

Jan de Mooij [:jandem]

Comment 3

•

8 years ago

Comment on attachment 8932634 [details] [diff] [review] 0002-Bug-1421445-Add-NOP-slide-before-JIT-code-buffers-on.patch Review of attachment 8932634 [details] [diff] [review]: ----------------------------------------------------------------- Makes sense. ::: js/src/jit/Ion.cpp @@ +739,5 @@ > return trampolineCode(p->value()); > } > > +void > +JitCodeHeader::init(JitCode* jitCode) { Nit: { on its own line.

Attachment #8932634 - Flags: feedback?(jdemooij) → feedback+

Ted Campbell [:tcampbell]

Assignee

Comment 4

•

8 years ago

Attached patch 0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch (obsolete) — Details — Splinter Review

Cleaned up assert to check end of buffer is within range.

Attachment #8932633 - Attachment is obsolete: true

Attachment #8932919 - Flags: review?(jdemooij)

Ted Campbell [:tcampbell]

Assignee

Comment 5

•

8 years ago

Attached patch 0002-Bug-1421445-Add-NOP-slide-before-JIT-code-buffers-on.patch — Details — Splinter Review

Fixed parens/whitespace.

Attachment #8932634 - Attachment is obsolete: true

Attachment #8932920 - Flags: review?(jdemooij)

Ted Campbell [:tcampbell]

Assignee

Comment 6

•

8 years ago

Attached patch 0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch — Details — Splinter Review

Oops.. s/code/codeStart in the macro.

Attachment #8932919 - Attachment is obsolete: true

Attachment #8932919 - Flags: review?(jdemooij)

Attachment #8932923 - Flags: review?(jdemooij)

Ted Campbell [:tcampbell]

Assignee

Comment 7

•

8 years ago

Try run with Windows PGO builds in progress: https://treeherder.mozilla.org/#/jobs?repo=try&revision=c1b201532226789ebad03cb8431f34538966e91a

Jan de Mooij [:jandem]

Comment 8

•

8 years ago

Comment on attachment 8932923 [details] [diff] [review] 0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch Review of attachment 8932923 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/Linker.cpp @@ +51,4 @@ > > // Bump the code up to a nice alignment. > codeStart = (uint8_t*)AlignBytes((uintptr_t)codeStart, CodeAlignment); > + MOZ_ASSERT(codeStart + masm.bytesNeeded() <= result + bytesNeeded); Good to have this assert.

Attachment #8932923 - Flags: review?(jdemooij) → review+

Jan de Mooij [:jandem]

Updated

•

8 years ago

Attachment #8932920 - Flags: review?(jdemooij) → review+

Ted Campbell [:tcampbell]

Assignee

Comment 9

•

8 years ago

Try run from the final patches (with s/code/codeStart/ fix): https://treeherder.mozilla.org/#/jobs?repo=try&revision=67a3d0645a7fa634429fcc95bf51b6c9fc3c316f

Pulsebot

Comment 10

•

8 years ago

Pushed by tcampbell@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/797d9e71b648 Don't waste space allocating jit code buffer. r=jandem https://hg.mozilla.org/integration/mozilla-inbound/rev/39f221c258fc Add NOP-slide before JIT code buffers on AMD Bobcat. r=jandem

Cosmin Sabou [:CosminS]

Comment 11

•

8 years ago

Backed out for frequent Valgrind test failures https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=39f221c258fc3cbb4162fd891ff68b6631f7ceb9&filter-searchStr=98b6ca8dacff858ad2151e9c6cc26eeff630bc64 https://treeherder.mozilla.org/logviewer.html#?job_id=148805049&repo=mozilla-inbound&lineNumber=35487 https://hg.mozilla.org/integration/mozilla-inbound/rev/33826eb2a21661bd8712411f2d32626503ced303

Flags: needinfo?(tcampbell)

Cristian Brindusan [:cbrindusan]

Comment 12

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/797d9e71b648 https://hg.mozilla.org/mozilla-central/rev/39f221c258fc

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox59: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Chris Peterson [:cpeterson]

Comment 13

•

8 years ago

Reopening bug because the patches were backed out on inbound. The patches have been merged to central, but the backout hasn't yet.

Status: RESOLVED → REOPENED

status-firefox57: --- → affected

status-firefox58: --- → affected

Resolution: FIXED → ---

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 14

•

8 years ago

Sorry about that and thanks for already reopening. https://hg.mozilla.org/mozilla-central/rev/33826eb2a216

status-firefox59: fixed → affected

Target Milestone: mozilla59 → ---

Jason Orendorff [:jorendorff]

Updated

•

8 years ago

Priority: -- → P3

Ted Campbell [:tcampbell]

Assignee

Comment 15

•

8 years ago

Attached patch 0001-Bug-1421445-Work-around-Valgrind-bug-concerning-JitC.patch (obsolete) — Details — Splinter Review

The Valgrind failure is that we pack instructions to close to the following JitCodeHeader data leading to decoding garbage and walking off the page and SEGV-ing Valgrind itself. This patch adds a word of padding at end of JitCode buffer, but only on MOZ_VALGRIND. Jan, does this seem too questionable to land? I'll follow up on the valgrind side to see if it should be fixed.

Flags: needinfo?(tcampbell)

Attachment #8935645 - Flags: review?(jdemooij)

Ted Campbell [:tcampbell]

Assignee

Comment 16

•

8 years ago

Another point that :pbone made is to check the Intel guides on recommendations for padding between code and data because they recommend not having garbage nearby.

Ted Campbell [:tcampbell]

Assignee

Comment 17

•

8 years ago

Try run in progress: https://treeherder.mozilla.org/#/jobs?repo=try&revision=d8fa6800124abee6aab034b1f07e96170d8ab45d

mirh

Comment 18

•

8 years ago

Uhm, btw, could you add a check for linux kernel <4.14, alongside cpuid check, in NeedAmdBugWorkaround function?

Ted Campbell [:tcampbell]

Assignee

Comment 19

•

8 years ago

For reference, this is the kernel patch that mirh is referring to: https://github.com/torvalds/linux/commit/bfc1168de949cd3e9ca18c3480b5085deff1ea7c Also for reference, this is what :pbone is talking about in Intel guides: > Assembly/Compiler Coding Rule 57. (M impact, L generality) If (hopefully read-only) data must > occur on the same page as code, avoid placing it immediately after an indirect jump. For example, > follow an indirect jump with its mostly likely target, and place the data after an unconditional branch. After some discussion with :sewardj, it seems the Valgrind failure deserves further review and the assessment in Comment 15 is not correct. Valgrind shouldn't be looking any further than the basic block.

Ted Campbell [:tcampbell]

Assignee

Comment 20

•

8 years ago

Comment on attachment 8935645 [details] [diff] [review] 0001-Bug-1421445-Work-around-Valgrind-bug-concerning-JitC.patch This patch is not the right approach. I have an alternate that only puts the pad at the end of ExecutableAllocator pool that passes valgrind.

Attachment #8935645 - Attachment is obsolete: true

Attachment #8935645 - Flags: review?(jdemooij)

Ted Campbell [:tcampbell]

Assignee

Comment 21

•

8 years ago

Attached patch 0003-Bug-1421445-Insert-UD2-instructions-after-JitCode.patch — Details — Splinter Review

This patch add's UD2 ops at the end of JitCode to avoid perf impact of indirect branch followed by data bytes. This also mitigates the Valgrind crash.

Attachment #8936677 - Flags: review?(jdemooij)

Ted Campbell [:tcampbell]

Assignee

Comment 22

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=48834960305b6e0a01179caa103f9a55ef62539a&group_state=expanded

Jan de Mooij [:jandem]

Comment 23

•

8 years ago

Comment on attachment 8936677 [details] [diff] [review] 0003-Bug-1421445-Insert-UD2-instructions-after-JitCode.patch Review of attachment 8936677 [details] [diff] [review]: ----------------------------------------------------------------- Excellent.

Attachment #8936677 - Flags: review?(jdemooij) → review+

Pulsebot

Comment 24

•

8 years ago

Pushed by tcampbell@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a031f41d0fb9 Don't waste space allocating jit code buffer. r=jandem https://hg.mozilla.org/integration/mozilla-inbound/rev/90a8cf71fa46 Add NOP-slide before JIT code buffers on AMD Bobcat. r=jandem https://hg.mozilla.org/integration/mozilla-inbound/rev/af47243ec0d7 Insert UD2 instructions after JitCode. r=jandem

Dorel Luca [:dluca]

Comment 25

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a031f41d0fb9 https://hg.mozilla.org/mozilla-central/rev/90a8cf71fa46 https://hg.mozilla.org/mozilla-central/rev/af47243ec0d7

Status: REOPENED → RESOLVED

Closed: 8 years ago → 8 years ago

status-firefox59: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla59

Sylvestre Ledru [:Sylvestre]

Comment 26

•

8 years ago

Is that something you would like to uplift in 58?

status-firefox57: affected → wontfix

Flags: needinfo?(tcampbell)

Ted Campbell [:tcampbell]

Assignee

Comment 27

•

8 years ago

Comment on attachment 8932923 [details] [diff] [review] 0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch Approval Request Comment [Feature/Bug causing the regression]: AMD Bobcat CPU bug [User impact if declined]: This attempts to mitigate some of the AMD Bobcat crash signatures which are the top 64-bit crash on 57 and 58. [Is this code covered by automated tests?]: Automated tests cover regressions on normal CPUs. There is no automated test for the specific CPU fault. [Has the fix been verified in Nightly?]: Nightly has not regressed with these patches. [Needs manual test from QE? If yes, steps to reproduce]: No. We will watch the crash stats after landing. [List of other uplifts needed for the feature/fix]: All three patches on this bug are needed. [Is the change risky?]: Low-moderate [Why is the change risky/not risky?]: We adjust memory layout of a key structure, which is a risk. This code is heavily covered by many existing test suites so if nightly is stable, I don't expect problems in uplift. [String changes made/needed]: None

Flags: needinfo?(tcampbell)

Attachment #8932923 - Flags: approval-mozilla-beta?

Liz Henry (:lizzard) (privacy/fingerprinting team)

Comment 28

•

8 years ago

Comment on attachment 8932923 [details] [diff] [review] 0001-Bug-1421445-Don-t-waste-space-allocating-jit-code-bu.patch Sounds promising to fix a top crash in 58, let's uplift for the 58.0b12 build next week.

Attachment #8932923 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Jason Orendorff [:jorendorff]

Updated

•

8 years ago

Depends on: 1425413

Chris Peterson [:cpeterson]

Comment 29

•

8 years ago

Ted, do you expect that we will see any change in the AMD crash volume from Nightly users if this fix is working? Or are there too few AMD Bobcat users running Nightly? In the four days since this fix landed in Nightly 59 (2017-12-14), there have been 103 EnterJit and 16 EnterBaseline crashes from Win64 Firefox users. In the four days preceding the fix, there were 104 EnterJit and zero EnterBaseline crashes.

Flags: needinfo?(tcampbell)

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 30

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/59058a3856b7 https://hg.mozilla.org/releases/mozilla-beta/rev/7fa49bea2cd6 https://hg.mozilla.org/releases/mozilla-beta/rev/377dba3f7dd6

status-firefox58: affected → fixed

Chris Peterson [:cpeterson]

Comment 31

•

8 years ago

Ted and I chatted on Slack. He said the mitigation didn't fix the AMD crashes. It just made the crashes weirder, such as impossible faults in register move instructions. He doesn't think we should back out the code change. It cleans up the JIT code and it shouldn't make these AMD crashes worse.

Flags: needinfo?(tcampbell)

mirh

Comment 32

•

8 years ago

Any update on this? Might KAISER/Spectre/Meltdown workarounds-mitigation you or OS implement help (or worsen?) the issue? Also before we all forget again, adding that little check I suggest in comment 18 wouldn't hurt.

You need to log in before you can comment on or make changes to this bug.