clock_gettime() etc are very slow on some android devices
Categories
(Core :: Performance, defect)
Tracking
()
People
(Reporter: jnicol, Assigned: jnicol)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [sp3])
Attachments
(8 files, 1 obsolete file)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
diannaS
:
approval-mozilla-esr115+
|
Details | Review |
https://share.firefox.dev/3PPYuRI
This shows up a lot in a speedometer (around 4%), and skews comparisons with Chrome as both browsers take timestamps in different places, though appear to be similarly affected overall.
This occurs on several devices I have tested including the Samsung A51, Samsung S9, Pixel 5, OnePlus 8 Pro. On other devices (including Pixel 4, PIxel 6, Samsung S10) this barely shows up in profiles if at all. This isn't a exynos vs snapdragon type thing. Additionally, naively checking the kernel version doesn't show a pattern: if this was broken then fixed in specific versions then perhaps vendor specific backports muddy the water.
It appears that the VDSO fast path is failing for some reason and we end up doing a syscall. I'm trying to get kernel symbols included in the profile to see if that sheds any light on why.
We may need to audit our usage of timestamps on the critical path.
Updated•10 months ago
|
Updated•10 months ago
|
Assignee | ||
Comment 1•9 months ago
|
||
I think I've figured out why this is slow, on the A51 at least.
The code I'm linking to below is from the 4.14 branch of the android common kernel. The actual kernel used on the device will differ but hopefully the relevant bits are still the same.
See ARM64_ERRATUM_858921
, which affects Cortex-A73 CPUs (which the Samsung A51 has). Presumably this workaround is therefore enabled.
We can see here this creates a arch_timer_erratum_workaround
instance with the read_cntvct_el0
field set.
We can then see here that if that field is set, we set clocksource_counter.archdata.vdso_direct
to false
.
Then in update_vsyscall()
, which is where the kernel writes its time data to the vdso data page, if archdata.vdso_direct
is false then it sets vdso_data.use_syscall
to true.
Then finally, see the actual clock_gettime() implementation in the vdso. This uses the syscall_check
macro, which reads the value of vdso_data.use_syscall
and branches to the syscall fallback if it's set.
Assignee | ||
Comment 2•9 months ago
|
||
CLOCK_MONOTONIC_COARSE
is unaffected by this issue, it stays in the VDSO fast path. So at the very least we can add an implementation of TimeStamp::NowLowRes()
using this.
Additionally we can then audit our existing usage of timestamps, and see if there's any we can either remove or switch to low resolution.
Assignee | ||
Comment 3•9 months ago
•
|
||
The Pixel 5 family is also affected via the same mechanism, but due to a different CPU erratum, which affects Cortex A76 and Kyro 4G (of which the pixel 5 has the latter core).
While the pixel 6 family has a Cortex A76, it is not affected due to this commit which restricts that workaround to AArch32 mode.
There are other affected devices too with different CPUs, and sometimes 2 devices may have the same CPU and kernel version but one is affected and one is not. But I'm confident enough that this is our cause, and will chaulk that up to vendors cherry-picking different patches on to their kernels.
Assignee | ||
Comment 4•9 months ago
|
||
We have discovered that clock_gettime(CLOCK_MONOTONIC) can be slow on
certain arm64 devices due to kernel workarounds for CPU errata
avoiding the VDSO fast-path. Using CLOCK_MONOTONIC_COARSE is
unnaffected by these issues. This patch adds an implementation of
TimeStamp::NowLoRes() using the coarse clock, meaning that when lower
precision timestamps are adequate we do not pay the penalty of hitting
the slow path.
Updated•9 months ago
|
Assignee | ||
Comment 5•9 months ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive. Currently we measure the time taken to write protect JIT
code, and as this is a hot path we see these timestamps show up in
profiles. However, since bug 1835876 we no longer write protect code
in content processes. This patch therefore avoids measuring the time
when write protection is disabled.
Depends on D185004
Assignee | ||
Comment 6•9 months ago
|
||
On certain devices we have seen that taking frequent high-resolution
timestamps can be expensive, therefore it makes sense to use
low-resolution timestamps instead where possible.
js::Nursery::isUnderused() and CCGCScheduler::MaybePokeCC() check for
timeouts of > 1 second, therefore the low resolution timer is more
than adequate.
Depends on D185005
Assignee | ||
Comment 7•9 months ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive, therefore it makes sense to reduce the number of timestamps
taken in hot paths. Measuring the time taken in refresh driver phases
and in NotifyObservers() shows up in profiles. However, these values
are only used to report to telemetry and we do not actually collect
this telemetry on Android. This patch therefore avoids measuring these
durations on Android.
Depends on D185006
Assignee | ||
Comment 8•9 months ago
|
||
As taking timestamps can be expensive on certain devices we should try
to avoid taking them where possible. This patch fixes a few occasions
where we take a timestamp solely for a profiler marker even when the
current thread is not being profiled.
Depends on D185007
Assignee | ||
Comment 9•9 months ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive, therefore it makes sense to reduce the number of timestamps
taken in hot paths. HttpChannelChild::OnStopRequest() currently takes
multiple timestamps (for profiler markers and telemetry measurements)
without performing meaningful work in between. This patch replaces
these with a single timestamp used multiple times.
Depends on D185008
Assignee | ||
Comment 10•9 months ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive, therefore it makes sense to reduce the number of timestamps
taken in hot paths. nsTimerImpl::Fire() currently takes a timestamp
which only gets used in logging. This patch therefore makes it so the
timestamp is only taken when logging is enabled.
Depends on D185009
Updated•9 months ago
|
Updated•9 months ago
|
Comment 11•9 months ago
|
||
This is a more generic version of D185008.
Comment 12•9 months ago
|
||
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/dfac5b1681a1 Implement TimeStamp::NowLoRes() when CLOCK_MONOTONIC_COARSE is supported. r=glandium https://hg.mozilla.org/integration/autoland/rev/e1cf0b9086ba Avoid measuring protect time when write-protection is disabled. r=jandem https://hg.mozilla.org/integration/autoland/rev/cf8e2772f907 Use low resolution timestamp for GC timeouts. r=jonco,smaug https://hg.mozilla.org/integration/autoland/rev/65677e6aa05d Remove unused telemetry probes. r=emilio https://hg.mozilla.org/integration/autoland/rev/01be17916dd3 Avoid taking unnecessary timestamps when thread is not being profiled for markers. r=julienw https://hg.mozilla.org/integration/autoland/rev/e03410fcc17d Use single timestamp in HttpChannelChild::OnStopRequest(). r=necko-reviewers,kershaw,jesup,acreskey https://hg.mozilla.org/integration/autoland/rev/7b7e075d34e7 Avoid unnecessary timestamp in nsTimerImpl::Fire if logging is disabled. r=xpcom-reviewers,nika
Updated•9 months ago
|
Comment 13•9 months ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/dfac5b1681a1
https://hg.mozilla.org/mozilla-central/rev/e1cf0b9086ba
https://hg.mozilla.org/mozilla-central/rev/cf8e2772f907
https://hg.mozilla.org/mozilla-central/rev/65677e6aa05d
https://hg.mozilla.org/mozilla-central/rev/01be17916dd3
https://hg.mozilla.org/mozilla-central/rev/e03410fcc17d
https://hg.mozilla.org/mozilla-central/rev/7b7e075d34e7
Comment 14•9 months ago
|
||
Pushed by ealvarez@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ccf1b5218169 Ensure profiler_add_marker arguments are not evaluated when not profiled. r=jnicol,canaltinova
Comment 15•9 months ago
|
||
Backed out for bustages on ProfilerMarkers.h
Backout link: https://hg.mozilla.org/integration/autoland/rev/df7fb7f4fde75763265536c8098274ee9d22677d
Log link: https://treeherder.mozilla.org/logviewer?job_id=425518473&repo=autoland&lineNumber=27635
Assignee | ||
Comment 16•9 months ago
|
||
redirecting ni to emilio
Comment 17•9 months ago
|
||
Comment on attachment 9348145 [details]
Bug 1843534 - Ensure profiler_add_marker arguments are not evaluated when not profiled. r=canaltinova,aabh
Revision D185800 was moved to bug 1848130. Setting attachment 9348145 [details] to obsolete.
Comment 18•9 months ago
|
||
Let me move that to a separate bug to ease tracking.
Updated•8 months ago
|
Comment 19•6 months ago
•
|
||
Note, I'm going to request ESR115 uplift one particular patch on this bug, https://phabricator.services.mozilla.com/D185007 in particular, since it's just a code-removal and it makes the ESR uplifts in bug 1861259 apply more cleanly.
(I'm using moz-phab to request uplift of just that one patch; I think that'll result in a new ESR-bound phabricator revision here. Apologies if anything goes haywire, as the new uplift-approval process is quite new.)
Comment 20•6 months ago
|
||
These telemetry probes have long since expired, so remove them. This
avoids taking unnecessary timestamps in these functions, which we have
seen can be expensive on certain devices.
Original Revision: https://phabricator.services.mozilla.com/D185007
Updated•6 months ago
|
Comment 21•6 months ago
|
||
Uplift Approval Request
- Steps to reproduce for manual QE testing: N/A
- Is Android affected?: yes
- String changes made/needed: None
- Code covered by automated testing: yes
- User impact if declined: None, but it makes uplift in bug 1861259 more awkward/human-error-prone
- Explanation of risk level: This is just removing code for an unused telemetry probe.
- Risk associated with taking this patch: Essentially zero.
- Fix verified in Nightly: yes
- Needs manual QE test: no
Comment 22•6 months ago
|
||
Comment on attachment 9360719 [details]
Bug 1843534 - Remove unused telemetry probes.
Approved for 115.5esr
Comment 23•6 months ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-esr115/rev/cb731d48619e
Updated•6 months ago
|
Comment 24•6 months ago
|
||
I don't think it's correct to say that this bug is fixed in ESR 115. Only one patch was uplifted that doesn't affect behaviour.
Updated•6 months ago
|
Description
•