Closed Bug 1158321 Opened 9 years ago Closed 9 years ago

Assertion failure: TimeStamp::Now() > previousVsync during shutdown when running mochitests

Categories

(Core :: Graphics, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla41
Tracking Status
firefox41 --- fixed

People

(Reporter: ehsan.akhgari, Assigned: mchang)

References

Details

(Keywords: assertion, Whiteboard: gfx-noted)

Attachments

(1 file, 3 obsolete files)

I got these assertions once, and I cannot reproduce:

Assertion failure: TimeStamp::Now() > previousVsync, at /Users/ehsan/moz/src/gfx/thebes/gfxPlatformMac.cpp:576
#01: VsyncCallback(__CVDisplayLink*, CVTimeStamp const*, CVTimeStamp const*, unsigned long long, unsigned long long*, void*) (gfxPlatformMac.cpp:576, in XUL)
#02: CVDisplayLink::performIO(CVTimeStamp*) (in CoreVideo) + 238
#03: CVDisplayLink::runIOThread() (in CoreVideo) + 627
#04: startIOThread(void*) (in CoreVideo) + 147
#05: _pthread_body (in libsystem_pthread.dylib) + 131
#06: _pthread_body (in libsystem_pthread.dylib) + 0
[NPAPI 42985] ###!!! ABORT: Aborting on channel error.: file /Users/ehsan/moz/src/ipc/glue/MessageChannel.cpp, line 1630
[Child 42984] ###!!! ABORT: Aborting on channel error.: file /Users/ehsan/moz/src/ipc/glue/MessageChannel.cpp, line 1630
#01: mozilla::ipc::MessageChannel::OnChannelErrorFromLink() (MessageChannel.cpp:1632, in XUL)
#01: mozilla::ipc::MessageChannel::OnChannelErrorFromLink() (MessageChannel.cpp:1632, in XUL)
#02: mozilla::ipc::ProcessLink::OnChannelError() (MessageLink.cpp:405, in XUL)
#02: mozilla::ipc::ProcessLink::OnChannelError() (MessageLink.cpp:405, in XUL)
#03: non-virtual thunk to mozilla::ipc::ProcessLink::OnChannelError() (MessageLink.cpp:406, in XUL)
#03: non-virtual thunk to mozilla::ipc::ProcessLink::OnChannelError() (MessageLink.cpp:406, in XUL)
#04: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) (ipc_channel_posix.cc:881, in XUL)
#04: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) (ipc_channel_posix.cc:881, in XUL)
#05: base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) (message_pump_libevent.cc:233, in XUL)
#05: base::MessagePumpLibevent::OnLibeventNotification(int, short, void*) (message_pump_libevent.cc:233, in XUL)
#06: event_persist_closure (event.c:1302, in XUL)
#06: event_persist_closure (event.c:1302, in XUL)
#07: event_process_active_single_queue (event.c:1346, in XUL)
#07: event_process_active_single_queue (event.c:1346, in XUL)
#08: event_process_active (event.c:1420, in XUL)
#08: event_process_active (event.c:1420, in XUL)
#09: event_base_loop (event.c:1621, in XUL)
#09: event_base_loop (event.c:1621, in XUL)
#10: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (message_pump_libevent.cc:350, in XUL)
#10: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) (message_pump_libevent.cc:350, in XUL)
#11: MessageLoop::RunInternal() (message_loop.cc:234, in XUL)
#11: MessageLoop::RunInternal() (message_loop.cc:234, in XUL)
#12: MessageLoop::RunHandler() (message_loop.cc:227, in XUL)
#12: MessageLoop::RunHandler() (message_loop.cc:227, in XUL)
#13: MessageLoop::Run() (message_loop.cc:200, in XUL)
#13: MessageLoop::Run() (message_loop.cc:200, in XUL)
#14: base::Thread::ThreadMain() (thread.cc:173, in XUL)
#14: base::Thread::ThreadMain() (thread.cc:173, in XUL)
#15: ThreadFunc(void*) (platform_thread_posix.cc:39, in XUL)
#16: _pthread_body (in libsystem_pthread.dylib) + 131
#17: _pthread_body (in libsystem_pthread.dylib) + 0
[Child 42984] ###!!! ABORT: Aborting on channel error.: file /Users/ehsan/moz/src/ipc/glue/MessageChannel.cpp, line 1630
Hit MOZ_CRASH() at /Users/ehsan/moz/src/memory/mozalloc/mozalloc_abort.cpp:33
#15: ThreadFunc(void*) (platform_thread_posix.cc:39, in XUL)
#16: _pthread_body (in libsystem_pthread.dylib) + 131
#17: _pthread_body (in libsystem_pthread.dylib) + 0
[NPAPI 42985] ###!!! ABORT: Aborting on channel error.: file /Users/ehsan/moz/src/ipc/glue/MessageChannel.cpp, line 1630
Hit MOZ_CRASH() at /Users/ehsan/moz/src/memory/mozalloc/mozalloc_abort.cpp:33
TEST-INFO | Main app process: killed by SIGHUP
TEST-UNEXPECTED-FAIL | dom/workers/test/serviceworkers/test_request_context.html | application terminated with exit code 1
runtests.py | Application ran for: 0:00:30.073694
zombiecheck | Reading PID log: /var/folders/ry/m4kkhrl17c76cqyfd5qv5sph0000gn/T/tmpjH3lV9pidlog
PROCESS-CRASH | dom/workers/test/serviceworkers/test_request_context.html | application crashed [None]
Crash dump filename: /var/folders/ry/m4kkhrl17c76cqyfd5qv5sph0000gn/T/tmpf6U6Ja.mozrunner/minidumps/F1E5F605-88E5-46DD-A0C1-9C56CF631E53.dmp
MINIDUMP_STACKWALK binary not found: /Users/ehsan/bin/minidump_stackwalk
Stopping web server
Stopping web socket server
Stopping ssltunnel
TEST-INFO | leakcheck | default process: leak threshold set at 0 bytes
TEST-INFO | leakcheck | plugin process: leak threshold set at 0 bytes
TEST-INFO | leakcheck | tab process: leak threshold set at 100000 bytes
TEST-INFO | leakcheck | geckomediaplugin process: leak threshold set at 20000 bytes

TEST-UNEXPECTED-FAIL | leakcheck | default process: missing output line for total leaks!
TEST-INFO | leakcheck | missing output line from log file /var/folders/ry/m4kkhrl17c76cqyfd5qv5sph0000gn/T/tmpf6U6Ja.mozrunner/runtests_leaks.log

TEST-UNEXPECTED-FAIL | leakcheck | plugin process: missing output line for total leaks!
TEST-INFO | leakcheck | missing output line from log file /var/folders/ry/m4kkhrl17c76cqyfd5qv5sph0000gn/T/tmpf6U6Ja.mozrunner/runtests_leaks_plugin_pid42985.log

TEST-UNEXPECTED-FAIL | leakcheck | tab process: missing output line for total leaks!
TEST-INFO | leakcheck | missing output line from log file /var/folders/ry/m4kkhrl17c76cqyfd5qv5sph0000gn/T/tmpf6U6Ja.mozrunner/runtests_leaks_tab_pid42984.log
runtests.py | Running tests: end.
SUITE-END | took 35s
Assignee: nobody → mchang
Status: NEW → ASSIGNED
Whiteboard: gfx-noted
FWIW I was running:

$ ./mach mochitest --e10s --auto-close dom/workers/test/serviceworkers/test_request_context.html but I doubt the test in question matters.  The --e10s part however might.
Attached patch Logging patch (obsolete) — Splinter Review
Can you please try this logging patch and see if you can still reproduce the issue? It will spam lots and lots of things, but just the couple of lines before the assert hits should be good enough. I hope it doesn't change the timing enough that you can't reproduce anymore.

Thanks!
Flags: needinfo?(cpeterson)
I hit this assertion failure pretty easily (at least 50% of the time) when running `mach mochitest dom/plugins` on a local debug build. I'm running OS X 10.10.3 on a MacBook Pro (Retina, Mid 2012). I hit this assertion whether my MacBook Pro is on battery or AC power, in case there is some CPU clock speed changes when on battery.

Samples of TimeStamp::Now() and previousVsync from two test runs:

now  = 110092742723762
prev = 110092745807560
diff = -3,083,798 (now < prev!)

now  = 111006657771499
prev = 111006659980567
diff = -2,209,068 (now < prev!)

I wonder if the problem is that previousVsync is from a different clock than TimeStamp::Now()? previousVsync is a TimeStamp converted from VsyncCallback's CVOptionFlags parameter aOutputTime->hostTime, but where does aOutputTime->hostTime come from? Some system clock? TimeStamp::Now() is clock_gettime(CLOCK_MONOTONIC).
Blocks: 1119850
Flags: needinfo?(cpeterson)
Keywords: assertion
Duration diff 0.175481, future: 16.545202
Duration diff 0.259317, future: 16.461363
Duration diff 0.303633, future: 16.417045
Duration diff 0.163172, future: 16.557504
Duration diff 0.774871, future: 15.945801
Duration diff 0.793513, future: 15.927158
Duration diff 0.192498, future: 16.528170
Duration diff 0.545942, future: 16.174724
Duration diff 0.521050, future: 16.199613
Duration diff 0.301575, future: 16.419085
Duration diff 0.323385, future: 16.397273
Duration diff 1.334701, future: 15.385955
Duration diff 0.943783, future: 15.776871
Duration diff 0.548116, future: 16.172535
Duration diff 2.066739, future: 14.653909
Duration diff 2.020680, future: 14.699967
Duration diff 1.477585, future: 15.243059
Duration diff 1.462060, future: 15.258582
Duration diff 1.259310, future: 15.461329
Duration diff 2.052767, future: 14.667870
Duration diff 1.614314, future: 15.106321
++DOMWINDOW == 95 (0x120988c00) [pid = 57432] [serial = 369] [outer = 0x12644c000]
Duration diff 0.284162, future: 16.436470
Duration diff -3.053547, future: 15.903723
Assertion failure: now > previousVsync, at /Users/chris/Code/mozilla/inbound/gfx/thebes/gfxPlatformMac.cpp:586
Attached patch Logging patch (obsolete) — Splinter Review
Oh that's interesting.. that breaks a nice assumption... Can you try this patch just to confirm what I think is happening is actually happening? Thanks!
Attachment #8612997 - Attachment is obsolete: true
Flags: needinfo?(cpeterson)
Duration diff 0.109088, future: 16.616889
Duration diff 0.106752, future: 16.616407
Duration diff 0.405896, future: 16.315854
Duration diff 0.705602, future: 16.015443
Duration diff 0.668517, future: 16.052175
Duration diff 0.505237, future: 16.215279
Duration diff 0.138762, future: 16.581664
Duration diff 0.341157, future: 16.379225
Duration diff 0.421749, future: 16.298610
Duration diff 0.227911, future: 16.492436
Duration diff 0.442846, future: 16.277495
1398 INFO TEST-PASS | dom/plugins/test/mochitest/test_src_url_change.html | One new instance should have been created. 
MEMORY STAT vsize after test: 3626053632
MEMORY STAT residentFast after test: 544444416
MEMORY STAT heapAllocated after test: 98660752
Duration diff 0.529838, future: 16.190499
Duration diff 0.782083, future: 15.938252
Duration diff 0.701349, future: 16.018984
Duration diff 0.956169, future: 15.764163
Duration diff 1.015755, future: 15.704577
Duration diff 0.895535, future: 15.824796
Duration diff 1.014130, future: 15.706199
1399 INFO TEST-OK | dom/plugins/test/mochitest/test_src_url_change.html | took 5297ms
Duration diff -0.729203, future: 16.268881
Assertion failure: now > previousVsync, at /Users/chris/Code/mozilla/inbound/gfx/thebes/gfxPlatformMac.cpp:590
Flags: needinfo?(cpeterson)
Duration diff 1.024614, future: 15.695560
Duration diff 0.906234, future: 15.813940
Duration diff 0.656712, future: 16.063462
Duration diff 0.573366, future: 16.146808
Duration diff 0.062270, future: 16.657904
Duration diff 0.667093, future: 16.053081
Duration diff 0.560997, future: 16.159177
Duration diff 0.513735, future: 16.206439
Duration diff 0.072535, future: 16.647639
Duration diff 0.093133, future: 16.627041
Duration diff 0.598898, future: 16.121276
Duration diff 0.063662, future: 16.656512
Duration diff 0.613032, future: 16.107142
Duration diff 0.225923, future: 16.494251
Duration diff 0.059968, future: 16.660206
Duration diff 0.628149, future: 16.092025
Duration diff 0.620853, future: 16.099321
Duration diff 0.865846, future: 15.854328
Duration diff 1.054227, future: 15.665947
Duration diff 0.678881, future: 16.041293
Duration diff -2.818073, future: 16.134815
Assertion failure: now > previousVsync, at /Users/chris/Code/mozilla/inbound/gfx/thebes/gfxPlatformMac.cpp:590
@Chris - Can you please verify and test that this patch works for you? Thanks for testing and logging!
Attachment #8613056 - Attachment is obsolete: true
Attachment #8613157 - Flags: feedback?(cpeterson)
Comment on attachment 8613157 [details] [diff] [review]
Normalize OS X vsync timestamp if CVDisplayLinkCallback executes early

Review of attachment 8613157 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM, but why might the VsyncCallback execute before the reported vsync time? Are we just papering over some other problem? :)

::: gfx/thebes/gfxPlatformMac.cpp
@@ +574,5 @@
>    int64_t nextVsyncTimestamp = aOutputTime->hostTime;
>    mozilla::TimeStamp nextVsync = mozilla::TimeStamp::FromSystemTime(nextVsyncTimestamp);
>  
>    mozilla::TimeStamp previousVsync = display->mPreviousTimestamp;
>    display->mPreviousTimestamp = nextVsync;

Since we're seeing some timer strangeness, maybe we should sanity check the reported vsync times with an assertion like MOZ_ASSERT(nextVsync > previousVsync)?
Attachment #8613157 - Flags: feedback?(cpeterson) → feedback+
(In reply to Chris Peterson [:cpeterson] from comment #9)
> Comment on attachment 8613157 [details] [diff] [review]
> Normalize OS X vsync timestamp if CVDisplayLinkCallback executes early
> 
> Review of attachment 8613157 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> LGTM, but why might the VsyncCallback execute before the reported vsync
> time? Are we just papering over some other problem? :)
> 

I'm not really sure why... Apple's docs don't say much. The callback just means OS X expects you to start rendering now and gives you a timestamp of when the expected vsync will occur. There isn't much documentation on when the callback executes, just that you will get called back. See https://developer.apple.com/library/mac/documentation/QuartzCore/Reference/CVDisplayLinkRef/#//apple_ref/c/func/CVDisplayLinkSetOutputCallback

> ::: gfx/thebes/gfxPlatformMac.cpp
> @@ +574,5 @@
> >    int64_t nextVsyncTimestamp = aOutputTime->hostTime;
> >    mozilla::TimeStamp nextVsync = mozilla::TimeStamp::FromSystemTime(nextVsyncTimestamp);
> >  
> >    mozilla::TimeStamp previousVsync = display->mPreviousTimestamp;
> >    display->mPreviousTimestamp = nextVsync;
> 
> Since we're seeing some timer strangeness, maybe we should sanity check the
> reported vsync times with an assertion like MOZ_ASSERT(nextVsync >
> previousVsync)?

Good idea! I added that assertion to this patch. Can you please try this and ensure that the assertion is always true? Thanks for testing btw!
Attachment #8613157 - Attachment is obsolete: true
Attachment #8613694 - Flags: feedback?(cpeterson)
Comment on attachment 8613694 [details] [diff] [review]
Normalize OS X vsync timestamp if CVDisplayLinkCallback executes early

WFM. No assertion failures.
Attachment #8613694 - Flags: feedback?(cpeterson) → feedback+
Attachment #8613694 - Flags: review?(mstange)
Attachment #8613694 - Flags: review?(mstange) → review+
https://hg.mozilla.org/mozilla-central/rev/dc416c09c90a
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla41
Depends on: 1171156
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: