Closed Bug 1829983 Opened 1 year ago Closed 1 year ago

Intermittent Assertion failure: bundleWakeup - mTimers[0].Timeout() <= ComputeAcceptableFiringDelay(mTimers[0].Delay(), minTimerDelay, maxTimerDelay), at /builds/worker/checkouts/gecko/xpcom/threads/TimerThread.cpp:682

Categories

(Core :: XPCOM, defect, P5)

defect

Tracking

()

RESOLVED FIXED
116 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox113 --- unaffected
firefox114 --- wontfix
firefox115 --- wontfix
firefox116 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: jlink)

References

Details

(Keywords: assertion, intermittent-failure)

Attachments

(3 files)

Filed by: smolnar [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=413776710&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/eNSO9wlnSVCOGyNXdCw0fA/runs/0/artifacts/public/logs/live_backing.log


INFO - TEST-START | /html/syntax/parsing/html5lib_domjs-unsafe.html?run_type=uri
[task 2023-04-26T00:16:50.268Z] 00:16:50     INFO - Closing window 3e466e65-511a-44fa-b993-fb8e6e733eeb
[task 2023-04-26T00:16:50.351Z] 00:16:50     INFO - PID 3232 | [Child 9772, Main Thread] WARNING: IPC message 'PSessionStore::Msg_SessionStoreUpdate' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:504
[task 2023-04-26T00:16:50.572Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.586Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.589Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.591Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.593Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.595Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.597Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.599Z] 00:16:50     INFO - PID 3232 | [Parent 6432, Main Thread] WARNING: NS_ENSURE_TRUE(she && she->mInfo->mSharedState.Get()) failed: file /builds/worker/checkouts/gecko/docshell/shistory/SessionHistoryEntry.cpp:1155
[task 2023-04-26T00:16:50.728Z] 00:16:50     INFO - PID 3232 | [Child 9772, Main Thread] WARNING: NS_ENSURE_TRUE(nsContentUtils::IsJavascriptMIMEType(type)) failed: file /builds/worker/checkouts/gecko/dom/script/ScriptLoader.cpp:908
[task 2023-04-26T00:16:50.741Z] 00:16:50     INFO - PID 3232 | [Child 9772, Main Thread] WARNING: NS_ENSURE_TRUE(nsContentUtils::IsJavascriptMIMEType(type)) failed: file /builds/worker/checkouts/gecko/dom/script/ScriptLoader.cpp:908
[task 2023-04-26T00:16:50.742Z] 00:16:50     INFO - PID 3232 | [Child 9772, Main Thread] WARNING: NS_ENSURE_TRUE(nsContentUtils::IsJavascriptMIMEType(type)) failed: file /builds/worker/checkouts/gecko/dom/script/ScriptLoader.cpp:908
[task 2023-04-26T00:16:50.742Z] 00:16:50     INFO - PID 3232 | [Child 9772, Main Thread] WARNING: NS_ENSURE_TRUE(nsContentUtils::IsJavascriptMIMEType(type)) failed: file /builds/worker/checkouts/gecko/dom/script/ScriptLoader.cpp:908
[task 2023-04-26T00:16:50.822Z] 00:16:50     INFO - PID 3232 | [Child 9772, Main Thread] WARNING: NS_ENSURE_TRUE(nsContentUtils::IsJavascriptMIMEType(type)) failed: file /builds/worker/checkouts/gecko/dom/script/ScriptLoader.cpp:908
[task 2023-04-26T00:16:50.837Z] 00:16:50     INFO - PID 3232 | Assertion failure: bundleWakeup - mTimers[0].Timeout() <= ComputeAcceptableFiringDelay(mTimers[0].Delay(), minTimerDelay, maxTimerDelay), at /builds/worker/checkouts/gecko/xpcom/threads/TimerThread.cpp:682
[task 2023-04-26T00:16:50.995Z] 00:16:50     INFO - STDOUT: Initializing stack-fixing for the first stack frame, this may take a while...
[task 2023-04-26T00:16:51.732Z] 00:16:51     INFO - Browser not responding, setting status to CRASH
[task 2023-04-26T00:16:51.747Z] 00:16:51     INFO - mozcrash Copy/paste: Z:\task_168245974338458\fetches\minidump-stackwalk\minidump-stackwalk.exe --symbols-url=https://symbols.mozilla.org/ --cyborg=C:\Users\task_168245974338458\AppData\Local\Temp\tmpmc2dxk_u\3e279984-ba40-4be1-9e51-77fb8b1dbef1.trace C:\Users\task_168245974338458\AppData\Local\Temp\tmpzhbueqii\minidumps\3e279984-ba40-4be1-9e51-77fb8b1dbef1.dmp Z:\task_168245974338458\build\symbols
[task 2023-04-26T00:17:00.476Z] 00:17:00     INFO - mozcrash Saved minidump as Z:\task_168245974338458\build\blobber_upload_dir\3e279984-ba40-4be1-9e51-77fb8b1dbef1.dmp
[task 2023-04-26T00:17:00.507Z] 00:17:00     INFO - mozcrash Saved app info as Z:\task_168245974338458\build\blobber_upload_dir\3e279984-ba40-4be1-9e51-77fb8b1dbef1.extra
[task 2023-04-26T00:17:00.695Z] 00:17:00     INFO - PROCESS-CRASH | /html/syntax/parsing/html5lib_domjs-unsafe.html?run_type=uri | application crashed [@ TimerThread::ComputeWakeupTimeFromTimers]
[task 2023-04-26T00:17:00.695Z] 00:17:00     INFO - Mozilla crash reason: MOZ_ASSERT(bundleWakeup - mTimers[0].Timeout() <= ComputeAcceptableFiringDelay(mTimers[0].Delay(), minTimerDelay, maxTimerDelay))
[task 2023-04-26T00:17:00.695Z] 00:17:00     INFO - Crash dump filename: C:\Users\task_168245974338458\AppData\Local\Temp\tmpzhbueqii\minidumps\3e279984-ba40-4be1-9e51-77fb8b1dbef1.dmp
[task 2023-04-26T00:17:00.695Z] 00:17:00     INFO - Operating system: Windows NT
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO -                   10.0.22621
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO - CPU: amd64
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO -      family 6 model 85 stepping 7
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO -      8 CPUs
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO - 
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO - Crash reason:  EXCEPTION_BREAKPOINT
INFO - Crash address: 0x7ffdcea87188
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO - Crashing instruction: `int 0x3`
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO - No memory accessed by instruction
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO - Process uptime: 23 seconds
[task 2023-04-26T00:17:00.706Z] 00:17:00     INFO - 
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO - Thread 7 Timer (crashed)
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -  0  xul.dll!TimerThread::ComputeWakeupTimeFromTimers() const [TimerThread.cpp:a5e419aac42803b83971143f8bc56c624ae2bccb : 680 + 0x0]
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rax = 0x00007ffdd81ae403    rdx = 0x0000000000000000
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rcx = 0x00007ffdf6c43e30    rbx = 0x000000003b9aca00
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rsi = 0x000000b5395aef60    rdi = 0x00000230e0a3d400
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rbp = 0x000000b5395aee20    rsp = 0x000000b5395aede0
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -       r8 = 0x000000000000000e     r9 = 0x00007ffe198fce20
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      r10 = 0x00007ffe19770000    r11 = 0x000000b5395ae7a0
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      r12 = 0x000000000000000d    r13 = 0x00000000000000ce
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      r14 = 0x00000230eb92a000    r15 = 0x000000b5395aee60
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rip = 0x00007ffdcea87188
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -     Found by: given as instruction pointer in context
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -  1  xul.dll!TimerThread::Run() [TimerThread.cpp:a5e419aac42803b83971143f8bc56c624ae2bccb : 838 + 0x2c]
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rbx = 0x00000230e0a3d4d8    rsi = 0x00000230e0a3d440
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rdi = 0x00000230e0a3d400    rbp = 0x000000000000006e
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rsp = 0x000000b5395aeee0    r12 = 0x0000000000000000
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      r13 = 0x000000b5395aef60    r14 = 0x0000000000000000
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      r15 = 0x000000b5395aef30    rip = 0x00007ffdcea87972
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -     Found by: call frame info
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -  2  xul.dll!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:a5e419aac42803b83971143f8bc56c624ae2bccb : 1233 + 0xc]
[task 2023-04-26T00:17:00.707Z] 00:17:00     INFO -      rbx = 0x00000230e0a7c218    rsi = 0x00000230e0a7c200
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -      rdi = 0x000000b5395af69f    rbp = 0x0000000000000000
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -      rsp = 0x000000b5395af480    r12 = 0x00000230e0a7c218
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -      r13 = 0xaaaaaaaaaaaaaaaa    r14 = 0x00000230e0a7c350
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -      r15 = 0x0000000000000000    rip = 0x00007ffdcea9399c
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -     Found by: call frame info
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -  3  xul.dll!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:a5e419aac42803b83971143f8bc56c624ae2bccb : 479 + 0x16]
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -      rbx = 0x0000000000000000    rsi = 0x00000230e7afe5c0
[task 2023-04-26T00:17:00.708Z] 00:17:00     INFO -      rdi = 0x000000b5395af810    rbp = 0x000000b5395af750
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      rsp = 0x000000b5395af670    r12 = 0x00000230e7d0f238
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      r13 = 0x00000230e7d0f248    r14 = 0x00000230e7afe5f0
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      r15 = 0x000000b5395af6e8    rip = 0x00007ffdcea9a745
INFO -     Found by: call frame info
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -  4  xul.dll!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [MessagePump.cpp:a5e419aac42803b83971143f8bc56c624ae2bccb : 300 + 0x9]
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      rbx = 0x00000230e0a7c200    rsi = 0x00000230e7afe5c0
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      rbp = 0x000000b5395af750    rsp = 0x000000b5395af6c0
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      r12 = 0x00000230e7d0f238    r13 = 0x00000230e7d0f248
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      r14 = 0x00000230e7afe5f0    r15 = 0x000000b5395af6e8
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      rip = 0x00007ffdcf55257e
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -     Found by: call frame info
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -  5  xul.dll!MessageLoop::RunHandler() [message_loop.cc:a5e419aac42803b83971143f8bc56c624ae2bccb : 362 + 0x7]
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      rbx = 0x00000230e7afd5e8    rsi = 0x000000b5395af810
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      rdi = 0x00000230e0a7c200    rbp = 0x000000b5395af750
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      rsp = 0x000000b5395af730    r12 = 0x00000230e7d0f238
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      r13 = 0x00000230e7d0f248    r14 = 0x000000b5395af810
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -      r15 = 0x00007ffde98bcfb8    rip = 0x00007ffdcf4d6cb0
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -     Found by: call frame info
[task 2023-04-26T00:17:00.709Z] 00:17:00     INFO -  6  xul.dll!MessageLoop::Run() [message_loop.cc:a5e419aac42803b83971143f8bc56c624ae2bccb : 344 + 0x4]
[task 2023-04-26T00:17:00.710Z] 00:17:00     INFO -      rbx = 0x00000230e7afd5e8    rsi = 0x00000230e7afd5e0
[task 2023-04-26T00:17:00.710Z] 00:17:00     INFO -      rdi = 0x00000230e0a7c200    rbp = 0x0000000000000005
[task 2023-04-26T00:17:00.710Z] 00:17:00     INFO -      rsp = 0x000000b5395af780    r12 = 0x00000230e7d0f238
[task 2023-04-26T00:17:00.710Z] 00:17:00     INFO -      r13 = 0x00000230e7d0f248    r14 = 0x000000b5395af810
[task 2023-04-26T00:17:00.710Z] 00:17:00     INFO -      r15 = 0x00007ffde98bcfb8    rip = 0x00007ffdcf4d6b6f
[task 2023-04-26T00:17:00.710Z] 00:17:00     INFO -     Found by: call frame info
[task 2023-04-26T00:17:00.710Z] 00:17:00     INFO -  7  xul.dll!nsThread::ThreadFunc(void*) [nsThread.cpp:a5e419aac42803b83971143f8bc56c624ae2bccb : 391 + 0x7]
<...>
Duplicate of this bug: 1829984
Component: DOM: Core & HTML → XPCOM
Flags: needinfo?(jlink)

Thanks for pinging me on this. I am looking into this now.

Flags: needinfo?(jlink)

Despite having looked at this for a while I don't have a definitive answer on what's happening as I haven't been able to reproduce this assert. The logic seems correct and I even asked Doug to take a quick look at it as well and it seems like the condition that is failing should indeed by an invariant for the loop.

In taking a close look at the TimeStamp class (specifically on Windows since that is fairly different from the other platforms) one thing that I have observed is that comparisons and subtractions between timestamps that are very far apart (greater than what can be represented by an int64_t) are incorrect. For example, TimeStamp{0xfffffffffffffffff} > TimeStamp{0} == false and TimeStamp{0xffffffffffffffff} - TimeStamp{0} is a negative TimeDuration (even though there is logic in there that is trying to prevent that). (This is basically what is being reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1203389, and there are a few other bugs that are related.)

Despite trying to figure out a combination of input times/durations that could make the assert fail and even fuzzing the function in question and failing to reproduce the assert, I am still wondering if somehow one of these incorrectly-handled overflow/underflow/wrap cases is still to blame.

To try and track this down I have a patch with some debug code to point out what inputs have caused this if it happens again. Taking another look at what's reported here though, I'm not sure how useful that would be! I was under the impression that this assert had been reported as failing twice but it looks like the other report (https://bugzilla.mozilla.org/show_bug.cgi?id=1829984) is actually from the same job. And it seems like, in the two weeks since the bug has been filed, there have been zero more reports of it failing.

@nika: Am I correct that there have been no further reports of this assert firing? (And is https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1829983&startday=2023-04-24&endday=2023-05-10&tree=all the right way to check that?)

Flags: needinfo?(nika)

(In reply to Justin Link from comment #4)

@nika: Am I correct that there have been no further reports of this assert firing? (And is https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1829983&startday=2023-04-24&endday=2023-05-10&tree=all the right way to check that?)

Given this is a debug assertion, we won't be getting reports from the wild of the assertion failing (if we want to get more reports like that, upgrading to a diagnostic assertion would give us checks on nightly/early beta as well, which might give more information). The intermittent failures view is a way to see any failures which the sherrifs have credited to this bug, so it's not actually searching our try run logs for the assertion failure or anything like that (I don't think we have a way to do that?).

There's a chance that this failure has been hit e.g. in a different test and is present as another bug, but I unfortunately don't know of a good way to check that.

(In reply to Justin Link from comment #4)

In taking a close look at the TimeStamp class (specifically on Windows since that is fairly different from the other platforms) one thing that I have observed is that comparisons and subtractions between timestamps that are very far apart (greater than what can be represented by an int64_t) are incorrect. For example, TimeStamp{0xfffffffffffffffff} > TimeStamp{0} == false and TimeStamp{0xffffffffffffffff} - TimeStamp{0} is a negative TimeDuration (even though there is logic in there that is trying to prevent that). (This is basically what is being reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1203389, and there are a few other bugs that are related.)

Despite trying to figure out a combination of input times/durations that could make the assert fail and even fuzzing the function in question and failing to reproduce the assert, I am still wondering if somehow one of these incorrectly-handled overflow/underflow/wrap cases is still to blame.

That would seem somewhat surprising to me, as I don't know how we would end up with a timestamp like one of those in this situation, but I suppose it's possible?

Flags: needinfo?(nika)
Assignee: nobody → jlink
Status: NEW → ASSIGNED

(In reply to Nika Layzell [:nika] (ni? for response) from comment #5)

(In reply to Justin Link from comment #4)

@nika: Am I correct that there have been no further reports of this assert firing? (And is https://treeherder.mozilla.org/intermittent-failures/bugdetails?bug=1829983&startday=2023-04-24&endday=2023-05-10&tree=all the right way to check that?)

Given this is a debug assertion, we won't be getting reports from the wild of the assertion failing (if we want to get more reports like that, upgrading to a diagnostic assertion would give us checks on nightly/early beta as well, which might give more information). The intermittent failures view is a way to see any failures which the sherrifs have credited to this bug, so it's not actually searching our try run logs for the assertion failure or anything like that (I don't think we have a way to do that?).

There's a chance that this failure has been hit e.g. in a different test and is present as another bug, but I unfortunately don't know of a good way to check that.

Okay, that's roughly what I figured. What about the Intermittent Failures Robot though? Does that not try to match up new failures with existing bugs? Or is that still relying on some human work? (FWIW I did search bugzilla for "ComputeAcceptableFiringDelay" and this is the only bug that references that term.)

(In reply to Justin Link from comment #4)

In taking a close look at the TimeStamp class (specifically on Windows since that is fairly different from the other platforms) one thing that I have observed is that comparisons and subtractions between timestamps that are very far apart (greater than what can be represented by an int64_t) are incorrect. For example, TimeStamp{0xfffffffffffffffff} > TimeStamp{0} == false and TimeStamp{0xffffffffffffffff} - TimeStamp{0} is a negative TimeDuration (even though there is logic in there that is trying to prevent that). (This is basically what is being reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1203389, and there are a few other bugs that are related.)

Despite trying to figure out a combination of input times/durations that could make the assert fail and even fuzzing the function in question and failing to reproduce the assert, I am still wondering if somehow one of these incorrectly-handled overflow/underflow/wrap cases is still to blame.

That would seem somewhat surprising to me, as I don't know how we would end up with a timestamp like one of those in this situation, but I suppose it's possible?

To be clear, the bug doesn't specifically require those values, just values that are far apart. Given the way that the code restricts how TimeStamps can be created it shouldn't be easy to end up with such far-apart times as you point out. The only way that I could see it happening would be if someone were to create a new TimeStamp by adding on a very long TimeDuration. This is a pretty normal operation to perform (without an extra-ordinarily large duration, of course) so possibly there is an infrequent situation in which a crazy large duration gets used.

I'm not sure if we will ultimately want to go ahead with it, but I've put a patch up for review which adds some additional output in the case where the assert is about to fail.

If we merge this change we could keep it around for a few months to give us some additional info in case the assert does fire again. It is debug-build-only so it wouldn't have any impact on users in the meantime.

Pushed by jlink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/08f526a6cc90
Added some debug output in case this assert fails again r=dthayer

Btw, this bug should stay open - the patch stack that has been pushed is to help debug the problem if it occurs again. There is no expectation that it has been fixed by those patches.

Pushed by jlink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b610dbc0fe43
Added some debug-only getters to TimeStamp and TimeDuration r=dthayer
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 115 Branch

(In reply to Justin Link from comment #10)

Btw, this bug should stay open - the patch stack that has been pushed is to help debug the problem if it occurs again. There is no expectation that it has been fixed by those patches.

For future reference, the leave-open keyword on the bug is the correct way to indicate this. Not setting it now since the patches already landed, but feel free to do so if/when you land more debugging patches in the future.

Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: 115 Branch → ---
Status: REOPENED → ASSIGNED
Status: ASSIGNED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---

Thanks to the debug output that we added before I have been able to reproduce the assert and I can see what is happening now.

The key observation is that, on Windows, each TimeStamp object actually has two values, (one from GTC and one optional from QPC) and which one gets used for a given operation will depend on whether QPC values are present or not. In general those two values do not differ by much.

In this particular case, the timeout of first timer in the list has a QPC value and that QPC value refers to a point in time that is slightly ahead of what the GTC value specifies. Most timers in the list also have timeouts with QPCs so we use the QPC values for most of our logic but we do end up encountering a timer without a QPC in its timeout and we end up basing our wake-up time on that timer. When we then compare our non-QPC wake-up time against the first timer in the assert outside of the loop, we now use the GTC value, which is slightly earlier than the QPC value, and find that the wake-up time is slightly later than what would be allowed by that timer's GTC value.

Another way to think about the behavior is that, by having two distinct values in a TimeStamp object, each timeout is actually specifying two different times at which the timer wants to wake up. We pick a final wake-up time that is compatible with at least one of the specified timeouts.

For this code I think that the right solution is to simply remove the assert after the loop (or possibly make it non-Windows-only). The purpose of that assert was really to double-check the logic in the loop and I haven't seen anything makes me think that there is anything wrong with the logic in the loop. The problem comes from an implementation detail of the TimeStamp class.

This dual-valued nature of the TimeStamp object has some counter-intuitive and unexpected consequences, such as non-transitivity of normally-transitive comparisons (ie, a > b > c doesn't necessarily imply a > c and a == b == c doesn't necessarily imply a == c). In this case it's not a big deal (at least not right now) but I could certainly imagine it introducing a subtle problem somewhere and I don't have any good idea on how to find those places in the case if they do exist.

Pushed by jlink@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/fcc729658406
Disabled assert that's invalid on Windows r=dthayer
Status: REOPENED → RESOLVED
Closed: 1 year ago1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: