Stop using PRIntervalTime as the argument to CondVar::Wait and Monitor::Wait

RESOLVED FIXED in Firefox 61

Status

()

defect
RESOLVED FIXED
2 years ago
Last year

People

(Reporter: Nika, Assigned: Nika)

Tracking

unspecified
mozilla61
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(firefox61 fixed)

Details

Attachments

(2 attachments, 3 obsolete attachments)

TimeStamp/TimeDuration is less error prone and nicer to use.
MozReview-Commit-ID: BN18I8Q6c7S
Attachment #8949855 - Flags: review?(mstange)
I've looked at this before in relation to nsTimerThread code and I came away with the conclusion that its not a slam dunk.  On windows CondVar only has milli-second precision and rounds down.  If the caller is not prepared for that and passes a TimeDuration of something like 500us, then we may get unexpected behavior.

I mean, I think its worth pursuing, but we need to be careful not to round to a zero time where we used to do 1ms timeout on windows.
Comment on attachment 8949855 [details] [diff] [review]
Stop using PRIntervalTime as the argument to CondVar::Wait and Monitor::Wait

Review of attachment 8949855 [details] [diff] [review]:
-----------------------------------------------------------------

::: xpcom/threads/TimerThread.cpp
@@ +518,5 @@
>          }
> +        waitFor = TimeDuration::FromMicroseconds(microseconds);
> +        if (waitFor.IsZero()) {
> +          // round up, wait the minimum time we can wait
> +          waitFor = TimeDuration::FromMicroseconds(1);

This is wrong.  You will end up with a microsecond TimeDuration which windows CondVar rounds down to 0ms here:

https://searchfox.org/mozilla-central/rev/a5abf843f8fac8530aa5de6fb40e16547bb4a47a/mozglue/misc/ConditionVariable_windows.cpp#68-78
Attachment #8949855 - Flags: feedback-
I think the fact that the underlying primitive really only handles millisecond resolution makes using TimeDuration problematic for this API.
Err... the FromMicroseconds(1) is not wrong, but you can have waitFor.IsZero() return false when it has something like 500us in the waitFor value.  I think.  That's the problem with this code, its hard to tell and we don't have tests that catch problems here.
We need to be careful not to spin the timer thread on 0 waits as it can worsen our battery usage.  We don't have any tests that catch this.
I think the Windows rounding problem should be fixed in the Windows code. If msec is zero but !rel_time.IsZero(), then I think that code should set msec to 1. That way, actual zero TimeDurations will be treated as zero, and anything even slightly above zero will be treated as at least 1.
Comment on attachment 8949855 [details] [diff] [review]
Stop using PRIntervalTime as the argument to CondVar::Wait and Monitor::Wait

Review of attachment 8949855 [details] [diff] [review]:
-----------------------------------------------------------------

r+ once Ben's concerns are addressed in one way or another.

::: ipc/glue/MessageChannel.cpp
@@ +2298,3 @@
>  {
> +    return (aTimeout != TimeDuration::Forever()) &&
> +           (aTimeout <= (TimeStamp::Now() - aStart));

You don't really need the first half of this condition. In fact, I think you can remove the entire function.

If you don't end up removing it, please make the TimeStamp argument a const ref because it's big on Windows.

::: netwerk/dns/nsHostResolver.cpp
@@ +1084,2 @@
>  
> +        if ((now - epoch) >= timeout) {

don't need the extra parens

@@ +1086,3 @@
>              timedOut = true;
> +        } else {
> +            // It is possible that PR_WaitCondVar() was interrupted and returned

PR_WaitCondVar() -> CondVar::Wait()

::: toolkit/components/backgroundhangmonitor/BackgroundHangMonitor.cpp
@@ +97,5 @@
>  
>    // Lock for access to members of this class
>    Monitor mLock;
>    // Current time as seen by hang monitors
> +  TimeStamp mIntervalNow;

This should be renamed, maybe to mCurrentTime

@@ +187,1 @@
>    // PermaHang timeout in ticks

remove "in ticks"

@@ +189,2 @@
>    // Time at last activity
> +  TimeStamp mInterval;

rename to mLastActivity

@@ +297,3 @@
>    // Default values for the first iteration of thread loop
> +  TimeDuration waitTime = 0;
> +  TimeDuration recheckTimeout = 0;

I'd just use default initialization here, i.e. no " = 0"

@@ +341,3 @@
>  
>      // Locally hold mIntervalNow
> +    TimeStamp intervalNow = mIntervalNow;

currentTime

::: xpcom/tests/gtest/TestThreadPool.cpp
@@ +86,5 @@
>        MonitorAutoLock mon(mMonitor);
>        if (!mDone) {
>          // Wait for a reasonable timeout since we don't want to block gtests
>          // forever should any regression happen.
> +        mon.Wait(TimeDuration::FromSeconds(2));

This needs to stay 300, and there is no FromMinutes

::: xpcom/threads/CondVar.h
@@ +56,5 @@
>    /**
>     * Wait
>     * @see prcvar.h
>     **/
> +  void Wait(TimeDuration aDuration = TimeDuration::Forever());

It's possible that static analysis isn't going to like passing by value here, but since TimeDuration is 64 bytes big everywhere, passing by value is fine.
Attachment #8949855 - Flags: review?(mstange) → review+
How about this:

static DWORD
ConvertDurationToIntegerMilliseconds(TimeDuration aDuration)
{
  double milliseconds = aDuration.ToMilliseconds();
  if (milliseconds <= 0.0) {
    return 0;
  }
  if (milliseconds > UINT32_MAX) {
    return INFINITE;
  }

  DWORD msec = static_cast<DWORD>(milliseconds);
  if (msec == 0 && !aDuration.IsZero()) {
    // Ensure we don't round to zero for durations that aren't zero.
    return 1;
  }

  return msec;
}
oops, the <= should be a <
(In reply to Markus Stange [:mstange] from comment #7)
> I think the Windows rounding problem should be fixed in the Windows code. If
> msec is zero but !rel_time.IsZero(), then I think that code should set msec
> to 1. That way, actual zero TimeDurations will be treated as zero, and
> anything even slightly above zero will be treated as at least 1.

Its unclear to me what implications this might have for other code.  I think we should probably ask Nathan as he reviewed bug 956899 previously.

Nathan, do you think its safe to make this code round values that meet the condition `0 > v && v < 1ms` up to 1ms?

https://searchfox.org/mozilla-central/rev/a5abf843f8fac8530aa5de6fb40e16547bb4a47a/mozglue/misc/ConditionVariable_windows.cpp#68-78
Flags: needinfo?(nfroyd)
Attachment #8949855 - Attachment is obsolete: true
MozReview-Commit-ID: 6vJfJ9rzJac
Attachment #8949870 - Flags: review?(nfroyd)
Comment on attachment 8949869 [details] [diff] [review]
Part 1: Stop using PRIntervalTime as the argument to CondVar::Wait and Monitor::Wait

Review of attachment 8949869 [details] [diff] [review]:
-----------------------------------------------------------------

There are surprisingly many places that actually want to wait for a specified amount of time.

::: dom/workers/WorkerPrivate.cpp
@@ +4014,5 @@
>    AssertIsOnWorkerThread();
>    mMutex.AssertCurrentThreadOwns();
>  
>    // Wait for a worker event.
> +  mCondVar.Wait(aDuration);

Does anything else actually pass in a duration here?  Maybe we should just ditch the argument, unless we really did expect the previous UINT32_MAX argument to mean that we should stop waiting after a while...

::: ipc/glue/GeckoChildProcessHost.cpp
@@ +380,5 @@
>  
> +    if (timeout != TimeDuration::Forever()) {
> +      current = TimeStamp::Now();
> +      TimeDuration elapsed = current - waitStart;
> +      if (elapsed > timeout) {

I think the same comment from MessageChannel.cpp applies here as well.

::: ipc/glue/MessageChannel.cpp
@@ +2332,5 @@
>  
>      // If the timeout didn't expire, we know we received an event. The
>      // converse is not true.
> +    bool expired = timeout <= TimeStamp::Now() - waitStart;
> +    return WaitResponse(expired);

If we're going to wait for a timeout here, we could actually use the return value of Wait() to know whether we timed out or not, rather than doing a bunch of extra work...  Oh, bummer, the CondVar::Wait() API doesn't expose that information.  Maybe we should fix that?

::: xpcom/threads/CondVar.h
@@ +120,5 @@
> +{
> +#ifdef MOZILLA_INTERNAL_API
> +  AUTO_PROFILER_THREAD_SLEEP;
> +#endif //MOZILLA_INTERNAL_API
> +  mImpl.wait_for(*mLock, aDuration);

Please don't do this.  The whole point of the PR_INTERVAL_NO_TIMEOUT check in the original non-debug Wait() implementation was to be able to optimize for the case where we don't have to translate into whatever the platform condition variable wants, which can be expensive.  (ideally, the compiler would completely optimize out the check, though I'm not sure anybody verified that...)  Waiting forever is the common case, and it's worth trying to make that as fast as possible.

Ideally, we'd have a different API in CondVar that had a no-timeout Wait() and a WaitFor() that took a timeout.  But that's a fix for another time.
I think rounding up is OK.  I was going to say that nobody should be waiting for such short durations, but after looking at the previous patch, I guess we have a fair number of places that might pass in really weird numbers.
Flags: needinfo?(nfroyd)
Comment on attachment 8949870 [details] [diff] [review]
Part 2: Round submillisecond condition variable waits up to 1ms

Review of attachment 8949870 [details] [diff] [review]:
-----------------------------------------------------------------

This is fine.
Attachment #8949870 - Flags: review?(nfroyd) → review+
Asking for review again so you can take a look at the changes I made before I push it.

MozReview-Commit-ID: BN18I8Q6c7S
Attachment #8950341 - Flags: review?(nfroyd)
Attachment #8949869 - Attachment is obsolete: true
Attachment #8949870 - Attachment is obsolete: true
Comment on attachment 8950341 [details] [diff] [review]
Part 1: Stop using PRIntervalTime as the argument to CondVar::Wait and Monitor::Wait

Review of attachment 8950341 [details] [diff] [review]:
-----------------------------------------------------------------

This is OK, thank you.

::: js/src/threading/ConditionVariable.h
@@ +102,1 @@
>        ? CVStatus::Timeout : CVStatus::NoTimeout;

Should we just remove js::CVStatus as a followup?

::: mozglue/misc/ConditionVariable_windows.cpp
@@ +58,5 @@
>    bool r = SleepConditionVariableCS(&platformData()->cv_, cs, INFINITE);
>    MOZ_RELEASE_ASSERT(r);
>  }
>  
> +mozilla::CVStatus

Could you please do this change + switching to check the return value of wait_for in a few places as a separate change next time?  I should have made that clearer; I was just talking about so many things from the previous patch!

@@ +66,5 @@
>    CRITICAL_SECTION* cs = &lock.platformData()->criticalSection;
>  
>    // Note that DWORD is unsigned, so we have to be careful to clamp at 0.
>    // If rel_time is Forever, then ToMilliseconds is +inf, which evaluates as
>    // greater than UINT32_MAX, resulting in the correct INFINITE wait.

WDYT about explicitly checking for TimeDuration::Forever here before doing all the conversion math, just like the posix version does?

::: xpcom/threads/CondVar.h
@@ +116,5 @@
>  
> +// NOTE: debug impl is in BlockingResourceBase.cpp
> +#ifndef DEBUG
> +inline void
> +CondVar::Wait()

I don't know whether compilers treat a defined-in-the-class (and therefore inlineable) definition differently from a defined-outside-the-class but declared inline definition.  If they do, and this actually winds up being out-of-lined and that's different from what happens today, I think it's going to cause havoc with crash-stats.  WDYT about defining this in the class, same as is done today?
Attachment #8950341 - Flags: review?(nfroyd) → review+
Pushed by nika@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b608d2dcbb86
Part 1: Stop using PRIntervalTime as the argument to CondVar::Wait and Monitor::Wait, r=mstange, r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/c79dc40faa41
Part 2: Round submillisecond condition variable waits up to 1ms, r=froydnj
Pushed by nika@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4476e8f51fa6
Part 3: Fix some platform-specific consumers of the Monitor::Wait APIs on a CLOSED TREE, a=bustage
Pushed by nika@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b98740e7c639
Part 4: Spell CVStatus correctly on a CLOSED TREE, a=bustage
Please ignore #c24 :D
Pushed by nika@thelayzells.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b044c550a875
Part 1: Stop using PRIntervalTime as the argument to CondVar::Wait and Monitor::Wait, r=mstange, r=froydnj
https://hg.mozilla.org/integration/mozilla-inbound/rev/76ac0b53d771
Part 2: Round submillisecond condition variable waits up to 1ms, r=froydnj
https://hg.mozilla.org/mozilla-central/rev/b044c550a875
https://hg.mozilla.org/mozilla-central/rev/76ac0b53d771
Status: NEW → RESOLVED
Closed: Last year
Resolution: --- → FIXED
Target Milestone: --- → mozilla61
Flags: needinfo?(nika)
You need to log in before you can comment on or make changes to this bug.