1155836 - Try to avoid the AMD bug in the DoGetStyle* functions

Assignee

Description

•

10 years ago

Lately, whenever bug 772330 hits, it's always in one of the nsStyleContext::DoGetStyle helpers. We still don't know what actually triggers the bug. It's probably edge cases in the chip internals that aren't architecturally exposed. I know that it can't be *entirely* caused by the DoGetStyle code, because I've found the same disassembly in both crashy and non-crashy builds. But maybe the code is *partly* responsible; maybe changing DoGetStyle might stop the crashes. These are performance-sensitive functions so we'll need to be very careful. I think the only way I can justify touching this code is if I can squeeze a tiny bit more perf out of it, and maybe as a side effect the AMD bug might go away. As a starting point: I noticed that (at least on Windows) aComputeData gets pushed on the stack. Maybe we could try templating to avoid that.

(Away)

Assignee

Comment 1

•

10 years ago

Attached patch Template on aComputeData (obsolete) — Details — Splinter Review

One possible approach

(Away)

Assignee

Comment 2

•

10 years ago

Comment on attachment 8594136 [details] [diff] [review] Template on aComputeData (Sorry for the flag, bz.) What do you think of this patch? I hear you've put some effort into optimizing these functions so I want to be careful not to regress that. What measurements were you most looking at when you were optimizing? What testing do I need to do (and on which platforms) to see whether this is an acceptable change? And if you have suggestions for different approaches, I'm all ears.

Attachment #8594136 - Flags: feedback?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Comment 3

•

10 years ago

Comment on attachment 8594136 [details] [diff] [review] Template on aComputeData This seems reasonable. The one worry is that this is inlining a bit more than it used to, so might cause a bit more code bloat; we may be able to just put the template implementation out of line to avoid that, if we have to. But I wouldn't worry about it as a first cut. In terms of what I measured when optimizing this, I think I did things like take the WHATWG HTML single-page spec, set the root to display:none, flush layout, set the root to display:block, and measure how long a style flush (as measured by "getComputedStyle(document.documentElement).color" takes, then profile that operation and look at what's going on in the style system bits. In terms of testing, I expect just measuring the performance of said style flush and making sure it doesn't noticeably regress is good enough. I don't know about platforms... I tend to assume/hope that this stuff is reasonably cross-platform, which I realize is silly given our different compilers. ;) One thing we can definitely do, with this patch, is nix the MOZ_UNLIKELY() around "!aComputeData": Now that it's a template aComputeData is a compile-time constant, so the compiler will either drop the code after that if block or drop the "return nullptr", and either way drop the if test, right?

Attachment #8594136 - Flags: feedback?(bzbarsky) → feedback+

(Away)

Assignee

Comment 4

•

10 years ago

(In reply to Not doing reviews right now from comment #3) > Comment on attachment 8594136 [details] [diff] [review] > Template on aComputeData > > This seems reasonable. The one worry is that this is inlining a bit more > than it used to, so might cause a bit more code bloat; we may be able to > just put the template implementation out of line to avoid that, if we have > to. But I wouldn't worry about it as a first cut. FWIW the RuleNode methods had already been getting inlined on MSVC. There might be a little bit of extra code from splitting the <true> and <false> versions, but there are very few PeekFoo callers so most of the <false> ones get dropped by the linker anyway. > One thing we can definitely do, with this patch, is nix the MOZ_UNLIKELY() > around "!aComputeData": Now that it's a template aComputeData is a > compile-time constant, so the compiler will either drop the code after that > if block or drop the "return nullptr", and either way drop the if test, > right? True, good catch.

(Away)

Assignee

Comment 5

•

10 years ago

MSVC seems to take this pattern way too literally (with or without the template change): const nsStyle##name_ * cachedData = mCachedResetData \ ? static_cast<nsStyle##name_*>( \ mCachedResetData->mStyleStructs[eStyleStruct_##name_]) \ : nullptr; \ if (cachedData) /* Have it cached already, yay */ \ return cachedData; \ Note that if you fail the mCachedResetData test, you'll never pass the cachedData test. But the compiler wants to be extra sure: xul!nsStyleContext::DoGetStylePosition: 1035d0f4 55 push ebp 1035d0f5 8bec mov ebp,esp 1035d0f7 8b4120 mov eax,dword ptr [ecx+20h] 1035d0fa 85c0 test eax,eax ; OK, standard null check 1035d0fc 741c je xul!nsStyleContext::DoGetStylePosition+0x26 (1035d11a) ; let's say eax is 0 and follow the 'je': 1035d11a 33c0 xor eax,eax ; seriously? 1035d11c ebe3 jmp xul!nsStyleContext::DoGetStylePosition+0xd (1035d101) ; let's follow that jmp: 1035d101 85c0 test eax,eax ; again? So I changed it to: + if (mCachedResetData) { \ + const nsStyle##name_ * cachedData = \ + static_cast<nsStyle##name_*>( \ + mCachedResetData->mStyleStructs[eStyleStruct_##name_]); \ + if (cachedData) /* Have it cached already, yay */ \ + return cachedData; \ + } \ And the cruft fell out. PGO started inlining a lot deeper, too.

(Away)

Assignee

Comment 6

•

10 years ago

Attached patch Template + rewrote null-check — Details — Splinter Review

As far as I can tell this is a wash, perf-wise. Win32 talos is all gray (as in neither red nor green) with an equal number of non-significant pluses and minuses. I tried bz's WHATWG test locally under xperf and didn't see a difference in the style system.

Attachment #8596229 - Flags: review?(dbaron)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Comment 7

•

10 years ago

Comment on attachment 8596229 [details] [diff] [review] Template + rewrote null-check r=dbaron, and sorry for the delay getting to this. (Once I read it closely it was a lot simpler than I thought it was going to be from skimming it.)

Attachment #8596229 - Flags: review?(dbaron) → review+

Pulsebot

Comment 8

•

10 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/98d3519ab098

(Away)

Assignee

Updated

•

10 years ago

Attachment #8594136 - Attachment is obsolete: true

Mats Palmgren (inactive)

Comment 9

•

10 years ago

I think 39a2 is affected by the "AMD bug", so it might be worth uplifting this to see if it helps there. (bug 1160317)

Robert Kaiser

Comment 10

•

10 years ago

Mats, all versions are randomly affected, including betas and sometimes release RCs (but we see to rebuild and finally release a version that is not affected). Be sure that we'll want to uplift to any feasible trains in any case.

Ryan VanderMeulen [:RyanVM]

Comment 11

•

10 years ago

https://hg.mozilla.org/mozilla-central/rev/98d3519ab098

Status: NEW → RESOLVED

Closed: 10 years ago

status-firefox40: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla40

(Away)

Assignee

Comment 12

•

10 years ago

jmaher helped get me some extra Talos runs on the inbound landing: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4d85a02e9c28&newProject=mozilla-inbound&newRevision=98d3519ab098 Avi, we may consider uplifting this to beta or even release. Could you let me know if there's anything in that delta that looks 'real' and what are the next steps for investigating? Thanks!

Flags: needinfo?(avihpit)

Joel Maher ( :jmaher ) (UTC -8)

Comment 13

•

10 years ago

a shortcut view: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4d85a02e9c28&newProject=mozilla-inbound&newRevision=98d3519ab098&hideMinorChanges=1 and a view of the retriggers on inbound: https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=da60d90cc685&tochange=67cb17031e9f&filter-searchStr=talos

Avi Halachmi (:avih)

Comment 14

•

10 years ago

The red and green highlights look reasonably real to me. They're not huge, but they're not negligible either. OTOH, besides the regressions, there are also improvements. I'd say that the bug this patch tries to fix is important enough to be worth landing despite the apparent regressions, especially since we want to uplift it ASAP. If it lands and uplifts though, I'd say the regressions are worth filing followup bugs to examine them more closely and hopefully fix them where possible.

Flags: needinfo?(avihpit)

Joel Maher ( :jmaher ) (UTC -8)

Comment 15

•

10 years ago

pgo runs are done and retriggered. The above link doesn't have the changes rolled out to support PGO :( The original compare talos should help us out here: https://compare-talos.paas.mozilla.org/?oldBranch=mozilla-inbound&oldRevs=4d85a02e9c28&newBranch=mozilla-inbound&newRev=98d3519ab098&submit=true

(Away)

Assignee

Comment 16

•

10 years ago

(In reply to Joel Maher (:jmaher) from comment #15) > https://compare-talos.paas.mozilla.org/?oldBranch=mozilla- > inbound&oldRevs=4d85a02e9c28&newBranch=mozilla- > inbound&newRev=98d3519ab098&submit=true I don't get any results on that page. Am I doing something wrong?

Flags: needinfo?(jmaher)

(Away)

Assignee

Comment 17

•

10 years ago

Apparently you have to capitalize Mozilla-Inbound. https://compare-talos.paas.mozilla.org/?oldBranch=Mozilla-Inbound&oldRevs=4d85a02e9c28&newBranch=Mozilla-Inbound&newRev=98d3519ab098&submit=true

Flags: needinfo?(jmaher)

David Baron :dbaron: (⌚️UTC-4, no longer working on Mozilla)

Updated

•

10 years ago

Assignee: nobody → dmajor

Component: General → CSS Parsing and Computation

(Away)

Assignee

Comment 18

•

10 years ago

The PGO numbers look a lot better (and are the ones that really matter here). The only result that mildly stands out to me is dromaeo_css on linux32. From looking at the graph of various build results before and after the change, I don't think it's an issue though. At this point I would feel comfortable requesting an uplift.

Joel Maher ( :jmaher ) (UTC -8)

Comment 19

•

10 years ago

sold!

Avi Halachmi (:avih)

Comment 20

•

10 years ago

(In reply to David Major [:dmajor] from comment #17) > Apparently you have to capitalize Mozilla-Inbound. > > https://compare-talos.paas.mozilla.org/?oldBranch=Mozilla-Inbound&oldRevs=4d85a02e9c28&newBranch=Mozilla-Inbound&newRev=98d3519ab098&submit=true (In reply to David Major [:dmajor] from comment #18) > The PGO numbers look a lot better (and are the ones that really matter > here). The only result that mildly stands out to me is dromaeo_css on > linux32. From looking at the graph of various build results before and after > the change, I don't think it's an issue though. > > At this point I would feel comfortable requesting an uplift. Agreed.

(Away)

Assignee

Comment 21

•

10 years ago

Comment on attachment 8596229 [details] [diff] [review] Template + rewrote null-check Approval Request Comment [Feature/regressing bug #]: hardware! [User impact if declined]: high volume crashes [Describe test coverage new/current, TreeHerder]: CI is green (but that doesn't test the fix itself) [Risks and why]: The main risk is that we don't know whether this will actually stop the crashes. [String/UUID change made/needed]: none

Attachment #8596229 - Flags: approval-mozilla-beta?

Attachment #8596229 - Flags: approval-mozilla-aurora?

Liz Henry (:lizzard) (relman/hg->git project)

Updated

•

10 years ago

status-firefox37: --- → affected

status-firefox38: --- → affected

status-firefox39: --- → affected

Liz Henry (:lizzard) (relman/hg->git project)

Comment 22

•

10 years ago

Comment on attachment 8596229 [details] [diff] [review] Template + rewrote null-check Approved for uplift to aurora in hopes this will fix a hardware crash.

Attachment #8596229 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 23

•

10 years ago

https://hg.mozilla.org/releases/mozilla-aurora/rev/f0af8f992aeb

status-firefox39: affected → fixed

Sylvestre Ledru [:Sylvestre]

Comment 24

•

10 years ago

Guys, we are considering to do 38.0 rc build2. We are afraid to do that because of the amd crash. If we take this patch, do you think it will mitigate the risk to have the amd bug in a build2 ? Thanks

Flags: needinfo?(dmajor)

Flags: needinfo?(dbaron)

Flags: needinfo?(bzbarsky)

(Away)

Assignee

Comment 25

•

10 years ago

Given how badly the 38 betas have been hit, there's a high likelihood that we'll see the bug again if we do nothing. If we take the patch, the likelihood is unknown. It may work or it may not. To me that sounds like a better chance than doing nothing. I would say let's take the patch, but monitor the build just as aggressively as if we hadn't.

Flags: needinfo?(dmajor)

Flags: needinfo?(dbaron)

Flags: needinfo?(bzbarsky)

Sylvestre Ledru [:Sylvestre]

Comment 26

•

10 years ago

Comment on attachment 8596229 [details] [diff] [review] Template + rewrote null-check [Triage Comment] OK, let's take it in release & the 38.0 relbranch release ( GECKO380_2015050320_RELBRANCH )

Attachment #8596229 - Flags: approval-mozilla-beta? → approval-mozilla-release+

Sylvestre Ledru [:Sylvestre]

Comment 27

•

10 years ago

(thanks btw)

Randell Jesup [:jesup] (needinfo me)

Comment 28

•

10 years ago

https://hg.mozilla.org/releases/mozilla-release/rev/79b3a4285921 https://hg.mozilla.org/releases/mozilla-release/rev/7e44bac27dd6

status-firefox38.0.5: --- → fixed

Randell Jesup [:jesup] (needinfo me)

Comment 29

•

10 years ago

I presume we aren't fixing this for 37....

status-firefox37: affected → wontfix

status-firefox38: affected → fixed

(Away)

Assignee

Comment 30

•

10 years ago

For what it's worth, so far we haven't seen these crashes during the 38.0.5 and 39.0 betas. (Although we did see one each in JS and the CSS parser)

Template on aComputeData 10 years ago (Away) 9.56 KB, patch	bzbarsky : feedback+	Details \| Diff \| Splinter Review
Template + rewrote null-check 10 years ago (Away) 10.08 KB, patch	dbaron : review+ lizzard : approval-mozilla-aurora+ Sylvestre : approval-mozilla-release+	Details \| Diff \| Splinter Review