Closed
Bug 1155836
Opened 10 years ago
Closed 10 years ago
Try to avoid the AMD bug in the DoGetStyle* functions
Categories
(Core :: CSS Parsing and Computation, defect)
Tracking
()
RESOLVED
FIXED
mozilla40
People
(Reporter: away, Assigned: away)
References
Details
Attachments
(1 file, 1 obsolete file)
10.08 KB,
patch
|
dbaron
:
review+
lizzard
:
approval-mozilla-aurora+
Sylvestre
:
approval-mozilla-release+
|
Details | Diff | Splinter Review |
Lately, whenever bug 772330 hits, it's always in one of the nsStyleContext::DoGetStyle helpers.
We still don't know what actually triggers the bug. It's probably edge cases in the chip internals that aren't architecturally exposed. I know that it can't be *entirely* caused by the DoGetStyle code, because I've found the same disassembly in both crashy and non-crashy builds.
But maybe the code is *partly* responsible; maybe changing DoGetStyle might stop the crashes. These are performance-sensitive functions so we'll need to be very careful. I think the only way I can justify touching this code is if I can squeeze a tiny bit more perf out of it, and maybe as a side effect the AMD bug might go away.
As a starting point: I noticed that (at least on Windows) aComputeData gets pushed on the stack. Maybe we could try templating to avoid that.
Comment on attachment 8594136 [details] [diff] [review]
Template on aComputeData
(Sorry for the flag, bz.)
What do you think of this patch? I hear you've put some effort into optimizing these functions so I want to be careful not to regress that. What measurements were you most looking at when you were optimizing? What testing do I need to do (and on which platforms) to see whether this is an acceptable change?
And if you have suggestions for different approaches, I'm all ears.
Attachment #8594136 -
Flags: feedback?(bzbarsky)
Comment 3•10 years ago
|
||
Comment on attachment 8594136 [details] [diff] [review]
Template on aComputeData
This seems reasonable. The one worry is that this is inlining a bit more than it used to, so might cause a bit more code bloat; we may be able to just put the template implementation out of line to avoid that, if we have to. But I wouldn't worry about it as a first cut.
In terms of what I measured when optimizing this, I think I did things like take the WHATWG HTML single-page spec, set the root to display:none, flush layout, set the root to display:block, and measure how long a style flush (as measured by "getComputedStyle(document.documentElement).color" takes, then profile that operation and look at what's going on in the style system bits.
In terms of testing, I expect just measuring the performance of said style flush and making sure it doesn't noticeably regress is good enough. I don't know about platforms... I tend to assume/hope that this stuff is reasonably cross-platform, which I realize is silly given our different compilers. ;)
One thing we can definitely do, with this patch, is nix the MOZ_UNLIKELY() around "!aComputeData": Now that it's a template aComputeData is a compile-time constant, so the compiler will either drop the code after that if block or drop the "return nullptr", and either way drop the if test, right?
Attachment #8594136 -
Flags: feedback?(bzbarsky) → feedback+
(In reply to Not doing reviews right now from comment #3)
> Comment on attachment 8594136 [details] [diff] [review]
> Template on aComputeData
>
> This seems reasonable. The one worry is that this is inlining a bit more
> than it used to, so might cause a bit more code bloat; we may be able to
> just put the template implementation out of line to avoid that, if we have
> to. But I wouldn't worry about it as a first cut.
FWIW the RuleNode methods had already been getting inlined on MSVC. There might be a little bit of extra code from splitting the <true> and <false> versions, but there are very few PeekFoo callers so most of the <false> ones get dropped by the linker anyway.
> One thing we can definitely do, with this patch, is nix the MOZ_UNLIKELY()
> around "!aComputeData": Now that it's a template aComputeData is a
> compile-time constant, so the compiler will either drop the code after that
> if block or drop the "return nullptr", and either way drop the if test,
> right?
True, good catch.
MSVC seems to take this pattern way too literally (with or without the template change):
const nsStyle##name_ * cachedData = mCachedResetData \
? static_cast<nsStyle##name_*>( \
mCachedResetData->mStyleStructs[eStyleStruct_##name_]) \
: nullptr; \
if (cachedData) /* Have it cached already, yay */ \
return cachedData; \
Note that if you fail the mCachedResetData test, you'll never pass the cachedData test. But the compiler wants to be extra sure:
xul!nsStyleContext::DoGetStylePosition:
1035d0f4 55 push ebp
1035d0f5 8bec mov ebp,esp
1035d0f7 8b4120 mov eax,dword ptr [ecx+20h]
1035d0fa 85c0 test eax,eax ; OK, standard null check
1035d0fc 741c je xul!nsStyleContext::DoGetStylePosition+0x26 (1035d11a)
; let's say eax is 0 and follow the 'je':
1035d11a 33c0 xor eax,eax ; seriously?
1035d11c ebe3 jmp xul!nsStyleContext::DoGetStylePosition+0xd (1035d101)
; let's follow that jmp:
1035d101 85c0 test eax,eax ; again?
So I changed it to:
+ if (mCachedResetData) { \
+ const nsStyle##name_ * cachedData = \
+ static_cast<nsStyle##name_*>( \
+ mCachedResetData->mStyleStructs[eStyleStruct_##name_]); \
+ if (cachedData) /* Have it cached already, yay */ \
+ return cachedData; \
+ } \
And the cruft fell out. PGO started inlining a lot deeper, too.
As far as I can tell this is a wash, perf-wise.
Win32 talos is all gray (as in neither red nor green) with an equal number of non-significant pluses and minuses. I tried bz's WHATWG test locally under xperf and didn't see a difference in the style system.
Attachment #8596229 -
Flags: review?(dbaron)
Comment on attachment 8596229 [details] [diff] [review]
Template + rewrote null-check
r=dbaron, and sorry for the delay getting to this.
(Once I read it closely it was a lot simpler than I thought it was going to be from skimming it.)
Attachment #8596229 -
Flags: review?(dbaron) → review+
Attachment #8594136 -
Attachment is obsolete: true
Comment 9•10 years ago
|
||
I think 39a2 is affected by the "AMD bug", so it might be worth uplifting this
to see if it helps there. (bug 1160317)
Comment 10•10 years ago
|
||
Mats, all versions are randomly affected, including betas and sometimes release RCs (but we see to rebuild and finally release a version that is not affected). Be sure that we'll want to uplift to any feasible trains in any case.
Comment 11•10 years ago
|
||
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Assignee | ||
Comment 12•10 years ago
|
||
jmaher helped get me some extra Talos runs on the inbound landing: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4d85a02e9c28&newProject=mozilla-inbound&newRevision=98d3519ab098
Avi, we may consider uplifting this to beta or even release. Could you let me know if there's anything in that delta that looks 'real' and what are the next steps for investigating? Thanks!
Flags: needinfo?(avihpit)
Comment 13•10 years ago
|
||
a shortcut view:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=4d85a02e9c28&newProject=mozilla-inbound&newRevision=98d3519ab098&hideMinorChanges=1
and a view of the retriggers on inbound:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=da60d90cc685&tochange=67cb17031e9f&filter-searchStr=talos
Comment 14•10 years ago
|
||
The red and green highlights look reasonably real to me. They're not huge, but they're not negligible either. OTOH, besides the regressions, there are also improvements.
I'd say that the bug this patch tries to fix is important enough to be worth landing despite the apparent regressions, especially since we want to uplift it ASAP.
If it lands and uplifts though, I'd say the regressions are worth filing followup bugs to examine them more closely and hopefully fix them where possible.
Flags: needinfo?(avihpit)
Comment 15•10 years ago
|
||
pgo runs are done and retriggered. The above link doesn't have the changes rolled out to support PGO :(
The original compare talos should help us out here:
https://compare-talos.paas.mozilla.org/?oldBranch=mozilla-inbound&oldRevs=4d85a02e9c28&newBranch=mozilla-inbound&newRev=98d3519ab098&submit=true
Assignee | ||
Comment 16•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #15)
> https://compare-talos.paas.mozilla.org/?oldBranch=mozilla-
> inbound&oldRevs=4d85a02e9c28&newBranch=mozilla-
> inbound&newRev=98d3519ab098&submit=true
I don't get any results on that page. Am I doing something wrong?
Flags: needinfo?(jmaher)
Assignee | ||
Comment 17•10 years ago
|
||
Apparently you have to capitalize Mozilla-Inbound.
https://compare-talos.paas.mozilla.org/?oldBranch=Mozilla-Inbound&oldRevs=4d85a02e9c28&newBranch=Mozilla-Inbound&newRev=98d3519ab098&submit=true
Flags: needinfo?(jmaher)
Assignee: nobody → dmajor
Component: General → CSS Parsing and Computation
Assignee | ||
Comment 18•10 years ago
|
||
The PGO numbers look a lot better (and are the ones that really matter here). The only result that mildly stands out to me is dromaeo_css on linux32. From looking at the graph of various build results before and after the change, I don't think it's an issue though.
At this point I would feel comfortable requesting an uplift.
Comment 19•10 years ago
|
||
sold!
Comment 20•10 years ago
|
||
(In reply to David Major [:dmajor] from comment #17)
> Apparently you have to capitalize Mozilla-Inbound.
>
> https://compare-talos.paas.mozilla.org/?oldBranch=Mozilla-Inbound&oldRevs=4d85a02e9c28&newBranch=Mozilla-Inbound&newRev=98d3519ab098&submit=true
(In reply to David Major [:dmajor] from comment #18)
> The PGO numbers look a lot better (and are the ones that really matter
> here). The only result that mildly stands out to me is dromaeo_css on
> linux32. From looking at the graph of various build results before and after
> the change, I don't think it's an issue though.
>
> At this point I would feel comfortable requesting an uplift.
Agreed.
Assignee | ||
Comment 21•10 years ago
|
||
Comment on attachment 8596229 [details] [diff] [review]
Template + rewrote null-check
Approval Request Comment
[Feature/regressing bug #]: hardware!
[User impact if declined]: high volume crashes
[Describe test coverage new/current, TreeHerder]: CI is green (but that doesn't test the fix itself)
[Risks and why]: The main risk is that we don't know whether this will actually stop the crashes.
[String/UUID change made/needed]: none
Attachment #8596229 -
Flags: approval-mozilla-beta?
Attachment #8596229 -
Flags: approval-mozilla-aurora?
Updated•10 years ago
|
Comment 22•10 years ago
|
||
Comment on attachment 8596229 [details] [diff] [review]
Template + rewrote null-check
Approved for uplift to aurora in hopes this will fix a hardware crash.
Attachment #8596229 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 24•10 years ago
|
||
Guys, we are considering to do 38.0 rc build2. We are afraid to do that because of the amd crash.
If we take this patch, do you think it will mitigate the risk to have the amd bug in a build2 ?
Thanks
Flags: needinfo?(dmajor)
Flags: needinfo?(dbaron)
Flags: needinfo?(bzbarsky)
Assignee | ||
Comment 25•10 years ago
|
||
Given how badly the 38 betas have been hit, there's a high likelihood that we'll see the bug again if we do nothing. If we take the patch, the likelihood is unknown. It may work or it may not. To me that sounds like a better chance than doing nothing. I would say let's take the patch, but monitor the build just as aggressively as if we hadn't.
Flags: needinfo?(dmajor)
Flags: needinfo?(dbaron)
Flags: needinfo?(bzbarsky)
Comment 26•10 years ago
|
||
Comment on attachment 8596229 [details] [diff] [review]
Template + rewrote null-check
[Triage Comment]
OK, let's take it in release & the 38.0 relbranch release ( GECKO380_2015050320_RELBRANCH )
Attachment #8596229 -
Flags: approval-mozilla-beta? → approval-mozilla-release+
Comment 27•10 years ago
|
||
(thanks btw)
Comment 28•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-release/rev/79b3a4285921
https://hg.mozilla.org/releases/mozilla-release/rev/7e44bac27dd6
status-firefox38.0.5:
--- → fixed
Comment 29•10 years ago
|
||
I presume we aren't fixing this for 37....
Assignee | ||
Comment 30•10 years ago
|
||
For what it's worth, so far we haven't seen these crashes during the 38.0.5 and 39.0 betas. (Although we did see one each in JS and the CSS parser)
You need to log in
before you can comment on or make changes to this bug.
Description
•