The PHC check has too much overhead on arm64
Categories
(Core :: Memory Allocator, enhancement, P1)
Tracking
()
| Tracking | Status | |
|---|---|---|
| firefox129 | --- | fixed |
People
(Reporter: mstange, Assigned: pbone)
References
Details
Attachments
(14 files, 3 obsolete files)
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review | |
|
48 bytes,
text/x-phabricator-request
|
Details | Review |
Profile: https://share.firefox.dev/4aOcGCP
https://speedometer-preview.netlify.app/?suite=TodoMVC-Lit-Complex-DOM&iterationCount=100 is spending too much time in MaybePageAlloc, specifically in this check:
int32_t newDelay = GAtomic::DecrementDelay();
if (newDelay != 0) {
return nullptr;
}
I think the atomic should be revisited. In particular, I don't believe this sentence:
// It's critical that sAllocDelay has ReleaseAcquire semantics, because that
// guarantees that exactly one thread will see sAllocDelay have the value 0.
// (Relaxed semantics wouldn't guarantee that.)
Relaxed atomics are still atomic.
| Assignee | ||
Comment 1•2 years ago
|
||
Thanks Markus.
I think we could use thread local variables here rather than atomics.
This bug is almost a duplicate of 1833807, but I'm going to make it "depends on" because once we do that bug we should re-test arm64.
| Assignee | ||
Comment 2•2 years ago
|
||
Working on this now.
| Assignee | ||
Comment 3•2 years ago
|
||
Currently we have 3 structures containing mutable data seperated by their
lock-safe behaviour (locked, atomic and TLS). I'd prefer to associate
structures by what they mean. This patch moves the tlsIsDisabled
variable, the only one in GTls into GMutable and removes the GTls
class.
| Assignee | ||
Comment 4•2 years ago
|
||
| Assignee | ||
Comment 5•2 years ago
|
||
| Assignee | ||
Comment 6•2 years ago
|
||
This also means that we can remove the logic that checks if it has wrapped
if PHC was disabled on the current thread since that won't happen due to
other threads. The patch also adds code that has to initialise the delay
on each thread.
| Assignee | ||
Comment 7•2 years ago
|
||
Testing the patches first on x86_64 (Ryzen 4 Threadripper) I got the following results for the subtest above (https://speedometer-preview.netlify.app/?suite=TodoMVC-Lit-Complex-DOM&iterationCount=100)
Central: 28.1 +- 0.32
Tip: 28.5 +- 0.36
Disabled: 27.8 +- 0.32
Maybe smaller numbers are better in which case my patch makes this slower?
But looking at the profiles, the overhead of this code goes from 2.2% to 1.5%, so either my patch makes this better or it hits a bottleneck elsewhere.
Central: https://share.firefox.dev/3U4JY9l
Tip: https://share.firefox.dev/4aDsQ0X
I'll run tests on my Mac Mini M1 soon.
| Assignee | ||
Comment 8•2 years ago
|
||
On my Mac M1
Central: 38.7 +- 0.40
Tip: 39.2 +- 0.55
PHC disabled: 39.8 +- 0.46
So my patch helps.
Profiles:
Central: https://share.firefox.dev/49n79kA
Tip: https://share.firefox.dev/3vCs45w
it seems to drop from 1.2% to 0.6% here (you have to find pthread_getspecific and pthread_setspecific)
I also noticed that in my "central" profile the branch outside DecrementDelay() showed up with a moderate number of samples, but in the new profile it had zero. So this has improved the cost of MaybePageAlloc(), not only DecrementDelay()
On ARM the atomic used for Now() didn't show up at all. it is Relaxed.
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
| Assignee | ||
Comment 9•1 year ago
|
||
| Assignee | ||
Comment 10•1 year ago
|
||
| Assignee | ||
Comment 11•1 year ago
|
||
| Assignee | ||
Comment 12•1 year ago
|
||
| Assignee | ||
Comment 13•1 year ago
|
||
| Assignee | ||
Comment 14•1 year ago
|
||
| Assignee | ||
Comment 15•1 year ago
|
||
| Assignee | ||
Comment 16•1 year ago
|
||
Rename this structure to PHCRegion and move its instance into PHC.
| Assignee | ||
Comment 17•1 year ago
|
||
| Assignee | ||
Comment 18•1 year ago
|
||
Updated•1 year ago
|
Updated•1 year ago
|
| Assignee | ||
Comment 19•1 year ago
|
||
This patch uses the wrong approach
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
| Assignee | ||
Comment 21•1 year ago
|
||
Updated•1 year ago
|
| Assignee | ||
Comment 22•1 year ago
|
||
Comment 23•1 year ago
|
||
Comment 24•1 year ago
|
||
| bugherder | ||
https://hg.mozilla.org/mozilla-central/rev/1f2e2a375881
https://hg.mozilla.org/mozilla-central/rev/ecae8e56c5cc
https://hg.mozilla.org/mozilla-central/rev/d1f91fdc7c68
https://hg.mozilla.org/mozilla-central/rev/6248406a1453
https://hg.mozilla.org/mozilla-central/rev/2092fb5bf23b
https://hg.mozilla.org/mozilla-central/rev/da3d695e2d65
https://hg.mozilla.org/mozilla-central/rev/6c5254f3df9f
https://hg.mozilla.org/mozilla-central/rev/d6539e603aa5
https://hg.mozilla.org/mozilla-central/rev/41762a317aae
https://hg.mozilla.org/mozilla-central/rev/6ace59776780
https://hg.mozilla.org/mozilla-central/rev/dc512f63eaa6
https://hg.mozilla.org/mozilla-central/rev/2a81039219c9
https://hg.mozilla.org/mozilla-central/rev/fa1129aa4279
https://hg.mozilla.org/mozilla-central/rev/f72b2832f50f
Updated•1 year ago
|
Description
•