ipc::mscom::SpinEvent should only spin for a short time

RESOLVED FIXED in Firefox 61

Status

()

enhancement
RESOLVED FIXED
2 years ago
Last year

People

(Reporter: Jamie, Assigned: Jamie)

Tracking

Trunk
mozilla61
All
Windows
Points:
---

Firefox Tracking Flags

(firefox61 fixed)

Details

Attachments

(1 attachment)

SpinEvent was introduced to facilitate fast response (on multi-processor systems) when waiting on another thread for something that might return faster than the system's timer resolution. This is important for accessibility, since we don't want lots of calls potentially waiting a full timer tick and thus slowing down responsiveness for the user. However, in a deadlock situation such as bug 1428759, this causes the spin to dominate the processor, thus rendering the system extremely difficult to use. (Obviously, we want to fix deadlocks, but even debugging them becomes extremely difficult because of this.) Furthermore, for calls that don't return immediately, we don't want to waste lots of CPU cycles on spinning. Once you get past a certain point, a few extra ms isn't going to make that much difference to perceived responsiveness.

We should modify SpinEvent so that on multi-processor systems, it spins only for a short time; e.g. 30 ms. After that, it should fall back to waiting on an event, just as we do on uni-processor. While we could query the system to get the timer resolution, this increases complexity and I don't think there's any advantage; we just want to choose a reasonable value for a "fast" call.
Assignee: nobody → jteh
I figured I may as well deal with this while I'm able to reproduce bug 1428759, since that's what initially triggered my concern about SpinEvent.

I tested this by reproducing bug 1428759 before and after the patch. Before, my system's CPU load was 20% or higher (sometimes much higher) during the deadlock. (The load probably depends on how many calls were waiting.) After the patch, the CPU load stayed lower than 10%.
To ensure we weren't always falling back to the event, I tested with some debug output where the event was used. If we were always falling back, I expected to be spammed. Instead, I only saw a few instances where the event was used.
I also did some quick usage testing on a single core vm to make sure everything worked as expected.
Comment on attachment 8942573 [details]
Bug 1429665: ipc::mscom::SpinEvent: Only spin for a short time, falling back to an event thereafter.

https://reviewboard.mozilla.org/r/212828/#review239884

::: ipc/mscom/SpinEvent.cpp:46
(Diff revision 1)
> +{
> +  static const bool gotStatics = InitStatics();
> +  MOZ_ASSERT(gotStatics);
>  
> -  if (!sIsMulticore) {
> -    mDoneEvent.own(::CreateEventW(nullptr, FALSE, FALSE, nullptr));
> +  mDoneEvent.own(::CreateEventW(nullptr, FALSE, FALSE, nullptr));

Ideally we'd only create the event as necessary, but that introduces a lot of additional concurrency headaches that I'd rather we didn't deal with at the moment.

::: ipc/mscom/SpinEvent.cpp:70
(Diff revision 1)
> +    TimeStamp start(TimeStamp::Now());
> -  while (!mDone) {
> +    while (!mDone) {
> -    // The PAUSE instruction is a hint to the CPU that we're doing a spin
> +      // The PAUSE instruction is a hint to the CPU that we're doing a spin
> -    // loop. It is a no-op on older processors that don't support it, so
> +      // loop. It is a no-op on older processors that don't support it, so
> -    // it is safe to use here without any CPUID checks.
> +      // it is safe to use here without any CPUID checks.
> -    CPU_PAUSE();
> +      CPU_PAUSE();

Move this so that CPU_PAUSE is the last instruction inside the loop.
Attachment #8942573 - Flags: review?(aklotz) → review+
Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/cedf0be06827
ipc::mscom::SpinEvent: Only spin for a short time, falling back to an event thereafter. r=aklotz
https://hg.mozilla.org/mozilla-central/rev/cedf0be06827
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Performance wins!

== Change summary for alert #12553 (as of Fri, 06 Apr 2018 01:39:26 GMT) ==

Improvements:

  7%  Strings PerfUTF16toUTF8THThousand windows7-32 opt      261,017.23 -> 243,656.92
  6%  Strings PerfUTF16toUTF8JAHundred windows7-32 opt       34,785.42 -> 32,817.00
  4%  Strings PerfUTF16toUTF8KOHundred windows7-32 opt       31,574.17 -> 30,362.75
  4%  Strings PerfUTF16toUTF8THFifteen windows7-32 opt       6,590.67 -> 6,345.42
  4%  Strings PerfUTF16toUTF8THHundred windows7-32 opt       33,074.42 -> 31,866.92
  3%  Strings PerfUTF16toUTF8JAFifteen windows7-32 opt       6,448.15 -> 6,224.83
  3%  Strings PerfUTF8toUTF16VIFifteen windows7-32 opt       9,659.77 -> 9,412.50
  2%  Strings PerfUTF16toUTF8KOFifteen windows7-32 opt       5,752.23 -> 5,627.83

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=12553
You need to log in before you can comment on or make changes to this bug.