Closed Bug 1458166 Opened 2 years ago Closed 2 years ago

Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::gfx::Log<3, mozilla::gfx::BasicLogger>::WriteLog] after Assertion failure: [GFX1]

Categories

(Core :: Audio/Video: Playback, defect, P2, critical)

defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox-esr52 --- wontfix
firefox-esr60 --- wontfix
firefox59 --- wontfix
firefox60 --- wontfix
firefox61 --- fixed
firefox62 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: pehrsons)

References

Details

(Keywords: crash, intermittent-failure, Whiteboard: [retriggered])

Crash Data

Attachments

(2 files)

Filed by: apavel [at] mozilla.com

https://treeherder.mozilla.org/logviewer.html#?job_id=176407987&repo=mozilla-inbound

https://queue.taskcluster.net/v1/task/Kf6NUUoOTqistrqOqz06ig/runs/0/artifacts/public/logs/live_backing.log

task 2018-05-01T06:16:42.548Z] 06:16:42     INFO - GECKO(1489) | ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2018-05-01T06:16:42.597Z] 06:16:42     INFO - GECKO(1489) | nsStringStats
[task 2018-05-01T06:16:42.597Z] 06:16:42     INFO - GECKO(1489) |  => mAllocCount:           9366
[task 2018-05-01T06:16:42.598Z] 06:16:42     INFO - GECKO(1489) |  => mReallocCount:          261
[task 2018-05-01T06:16:42.599Z] 06:16:42     INFO - GECKO(1489) |  => mFreeCount:            9366
[task 2018-05-01T06:16:42.599Z] 06:16:42     INFO - GECKO(1489) |  => mShareCount:           6199
[task 2018-05-01T06:16:42.599Z] 06:16:42     INFO - GECKO(1489) |  => mAdoptCount:            184
[task 2018-05-01T06:16:42.601Z] 06:16:42     INFO - GECKO(1489) |  => mAdoptFreeCount:        184
[task 2018-05-01T06:16:42.602Z] 06:16:42     INFO - GECKO(1489) |  => Process ID: 1599, Thread ID: 140547233118016
[task 2018-05-01T06:16:42.702Z] 06:16:42     INFO - GECKO(1489) | 1525155402689	Marionette	DEBUG	Received observer notification xpcom-will-shutdown
[task 2018-05-01T06:16:42.703Z] 06:16:42     INFO - GECKO(1489) | 1525155402689	Marionette	DEBUG	New connections will no longer be accepted
[task 2018-05-01T06:16:42.723Z] 06:16:42     INFO - GECKO(1489) | [Parent 1489, Main Thread] WARNING: NS_ENSURE_TRUE(mDB) failed: file /builds/worker/workspace/build/src/netwerk/cache/nsDiskCacheDeviceSQL.cpp, line 1424
[task 2018-05-01T06:16:42.863Z] 06:16:42     INFO - GECKO(1489) | [Parent 1489, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 805
[task 2018-05-01T06:16:42.864Z] 06:16:42     INFO - GECKO(1489) | [Child 1542, Main Thread] WARNING: NS_ENSURE_TRUE(maybeContext) failed: file /builds/worker/workspace/build/src/xpcom/threads/nsThread.cpp, line 805
[task 2018-05-01T06:16:42.864Z] 06:16:42     INFO - GECKO(1489) | ###!!! [Child][MessageChannel] Error: (msgtype=0x2D00CD,name=PContent::Msg_GraphicsError) Closed channel: cannot send/recv
[task 2018-05-01T06:16:42.865Z] 06:16:42     INFO - GECKO(1489) | [Child 1542, Main Thread] WARNING: MsgDropped in ContentChild: file /builds/worker/workspace/build/src/dom/ipc/ContentChild.cpp, line 2388
[task 2018-05-01T06:16:42.865Z] 06:16:42     INFO - GECKO(1489) | [GFX1]: Texture deallocated too late during shutdown
[task 2018-05-01T06:16:42.867Z] 06:16:42     INFO - GECKO(1489) | Assertion failure: [GFX1]: Texture deallocated too late during shutdown, at /builds/worker/workspace/build/src/gfx/2d/Logging.h:727
[task 2018-05-01T06:17:16.255Z] 06:17:16     INFO - GECKO(1489) | #01: mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log [gfx/2d/Logging.h:731]
[task 2018-05-01T06:17:16.256Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.258Z] 06:17:16     INFO - GECKO(1489) | #02: mozilla::layers::DeallocateTextureClient [gfx/layers/client/TextureClient.cpp:331]
[task 2018-05-01T06:17:16.258Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.260Z] 06:17:16     INFO - GECKO(1489) | #03: mozilla::layers::TextureClient::Destroy [gfx/layers/client/TextureClient.cpp:433]
[task 2018-05-01T06:17:16.261Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.265Z] 06:17:16     INFO - GECKO(1489) | #04: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:622]
[task 2018-05-01T06:17:16.266Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.268Z] 06:17:16     INFO - GECKO(1489) | #05: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:629]
[task 2018-05-01T06:17:16.269Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.273Z] 06:17:16     INFO - GECKO(1489) | #06: mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [gfx/layers/AtomicRefCountedWithFinalize.h:145]
[task 2018-05-01T06:17:16.275Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.278Z] 06:17:16     INFO - GECKO(1489) | #07: mozilla::layers::TextureClientHolder::Release [memory/mozalloc/mozalloc.h:180]
[task 2018-05-01T06:17:16.279Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.280Z] 06:17:16     INFO - GECKO(1489) | #08: mozilla::layers::TextureClientRecycleAllocator::RecycleTextureClient [mfbt/RefPtr.h:41]
[task 2018-05-01T06:17:16.281Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.283Z] 06:17:16     INFO - GECKO(1489) | #09: mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [gfx/layers/AtomicRefCountedWithFinalize.h:154]
[task 2018-05-01T06:17:16.283Z] 06:17:16     INFO - 
[task 2018-05-01T06:17:16.284Z] 06:17:16     INFO - GECKO(1489) | #10: nsTArray_Impl<mozilla::layers::ImageClientSingle::Buffer, nsTArrayInfallibleAllocator>::ClearAndRetainStorage [gfx/layers/client/ImageClient.h:110]

[task 2018-05-01T06:17:24.754Z] 06:17:24     INFO - PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::gfx::Log<3, mozilla::gfx::BasicLogger>::WriteLog]
[task 2018-05-01T06:17:24.755Z] 06:17:24     INFO - Crash dump filename: /tmp/tmp4OQpPb.mozrunner/minidumps/767889b0-7036-036e-0d37-3362ecfa7775.dmp
[task 2018-05-01T06:17:24.756Z] 06:17:24     INFO - Operating system: Linux
[task 2018-05-01T06:17:24.757Z] 06:17:24     INFO -                   0.0.0 Linux 4.4.0-1014-aws #14taskcluster1-Ubuntu SMP Tue Apr 3 10:27:00 UTC 2018 x86_64
[task 2018-05-01T06:17:24.758Z] 06:17:24     INFO - CPU: amd64
[task 2018-05-01T06:17:24.759Z] 06:17:24     INFO -      family 6 model 62 stepping 4
[task 2018-05-01T06:17:24.759Z] 06:17:24     INFO -      2 CPUs
[task 2018-05-01T06:17:24.760Z] 06:17:24     INFO - 
[task 2018-05-01T06:17:24.761Z] 06:17:24     INFO - GPU: UNKNOWN
[task 2018-05-01T06:17:24.762Z] 06:17:24     INFO - 
[task 2018-05-01T06:17:24.762Z] 06:17:24     INFO - Crash reason:  SIGSEGV
[task 2018-05-01T06:17:24.763Z] 06:17:24     INFO - Crash address: 0x0
[task 2018-05-01T06:17:24.764Z] 06:17:24     INFO - Process uptime: not available
[task 2018-05-01T06:17:24.764Z] 06:17:24     INFO - 
[task 2018-05-01T06:17:24.765Z] 06:17:24     INFO - Thread 0 (crashed)
[task 2018-05-01T06:17:24.766Z] 06:17:24     INFO -  0  libxul.so!mozilla::gfx::Log<3, mozilla::gfx::BasicLogger>::WriteLog [Logging.h:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 728 + 0x18]
[task 2018-05-01T06:17:24.766Z] 06:17:24     INFO -     rax = 0x0000000000000000   rdx = 0x0000000000000000
[task 2018-05-01T06:17:24.767Z] 06:17:24     INFO -     rcx = 0x00007f6bb8e642dd   rbx = 0x00007ffc8a4fe880
[task 2018-05-01T06:17:24.768Z] 06:17:24     INFO -     rsi = 0x00007f6bb9133770   rdi = 0x00007f6bb9132540
[task 2018-05-01T06:17:24.768Z] 06:17:24     INFO -     rbp = 0x00007ffc8a4fe7d0   rsp = 0x00007ffc8a4fe7d0
[task 2018-05-01T06:17:24.769Z] 06:17:24     INFO -      r8 = 0x00007f6bb9133770    r9 = 0x00007f6bba414740
[task 2018-05-01T06:17:24.770Z] 06:17:24     INFO -     r10 = 0x000000000000004c   r11 = 0x0000000000000000
[task 2018-05-01T06:17:24.771Z] 06:17:24     INFO -     r12 = 0x00007ffc8a4fe7f0   r13 = 0x00007ffc8a4fea50
[task 2018-05-01T06:17:24.771Z] 06:17:24     INFO -     r14 = 0x00007ffc8a4fe898   r15 = 0x00007f6b9d3f0970
[task 2018-05-01T06:17:24.772Z] 06:17:24     INFO -     rip = 0x00007f6ba7751ca6
[task 2018-05-01T06:17:24.773Z] 06:17:24     INFO -     Found by: given as instruction pointer in context
[task 2018-05-01T06:17:24.773Z] 06:17:24     INFO -  1  libxul.so!mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log [Logging.h:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 726 + 0x12]
[task 2018-05-01T06:17:24.773Z] 06:17:24     INFO -     rbx = 0x00007ffc8a4fe880   rbp = 0x00007ffc8a4fe820
[task 2018-05-01T06:17:24.774Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fe7e0   r12 = 0x00007ffc8a4fe7f0
[task 2018-05-01T06:17:24.775Z] 06:17:24     INFO -     r13 = 0x00007ffc8a4fea50   r14 = 0x00007ffc8a4fe898
[task 2018-05-01T06:17:24.776Z] 06:17:24     INFO -     r15 = 0x00007f6b9d3f0970   rip = 0x00007f6ba8075f4a
[task 2018-05-01T06:17:24.777Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.778Z] 06:17:24     INFO -  2  libxul.so!mozilla::layers::DeallocateTextureClient [TextureClient.cpp:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 330 + 0x2e]
[task 2018-05-01T06:17:24.779Z] 06:17:24     INFO -     rbx = 0x00007ffc8a4fe880   rbp = 0x00007ffc8a4fea30
[task 2018-05-01T06:17:24.780Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fe830   r12 = 0x00007f6b96ad3e80
[task 2018-05-01T06:17:24.781Z] 06:17:24     INFO -     r13 = 0x00007ffc8a4fea50   r14 = 0x0000000000000000
[task 2018-05-01T06:17:24.782Z] 06:17:24     INFO -     r15 = 0x00007f6b9d3f0970   rip = 0x00007f6ba81f4e99
[task 2018-05-01T06:17:24.783Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.784Z] 06:17:24     INFO -  3  libxul.so!mozilla::layers::TextureClient::Destroy [TextureClient.cpp:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 433 + 0x17]
[task 2018-05-01T06:17:24.784Z] 06:17:24     INFO -     rbx = 0x00007ffc8a4fea70   rbp = 0x00007ffc8a4feac0
[task 2018-05-01T06:17:24.785Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fea40   r12 = 0x00007ffc8a4fea48
[task 2018-05-01T06:17:24.787Z] 06:17:24     INFO -     r13 = 0x00007ffc8a4fea50   r14 = 0x00007f6b97a9c100
[task 2018-05-01T06:17:24.788Z] 06:17:24     INFO -     r15 = 0x00007f6b9d3f0970   rip = 0x00007f6ba81f540a
[task 2018-05-01T06:17:24.790Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.791Z] 06:17:24     INFO -  4  libxul.so!mozilla::layers::TextureClient::~TextureClient [TextureClient.cpp:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 628 + 0x8]
[task 2018-05-01T06:17:24.792Z] 06:17:24     INFO -     rbx = 0x00007f6b97acc1c0   rbp = 0x00007ffc8a4feae0
[task 2018-05-01T06:17:24.793Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fead0   r12 = 0x0000000000000000
[task 2018-05-01T06:17:24.793Z] 06:17:24     INFO -     r13 = 0x00007f6b97acc1d8   r14 = 0x0000000000000000
[task 2018-05-01T06:17:24.794Z] 06:17:24     INFO -     r15 = 0x00007f6b8c7cca98   rip = 0x00007f6ba81fd0a3
[task 2018-05-01T06:17:24.795Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.796Z] 06:17:24     INFO -  5  libxul.so!mozilla::layers::TextureClient::~TextureClient [TextureClient.cpp:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 629 + 0x5]
[task 2018-05-01T06:17:24.797Z] 06:17:24     INFO -     rbx = 0x00007f6b97acc1c0   rbp = 0x00007ffc8a4feb00
[task 2018-05-01T06:17:24.798Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4feaf0   r12 = 0x0000000000000000
[task 2018-05-01T06:17:24.798Z] 06:17:24     INFO -     r13 = 0x00007f6b97acc1d8   r14 = 0x0000000000000000
[task 2018-05-01T06:17:24.799Z] 06:17:24     INFO -     r15 = 0x00007f6b8c7cca98   rip = 0x00007f6ba81fd139
[task 2018-05-01T06:17:24.799Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.800Z] 06:17:24     INFO -  6  libxul.so!mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [AtomicRefCountedWithFinalize.h:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 145 + 0x10]
[task 2018-05-01T06:17:24.801Z] 06:17:24     INFO -     rbx = 0x00007f6b97acc1c8   rbp = 0x00007ffc8a4fecb0
[task 2018-05-01T06:17:24.801Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4feb10   r12 = 0x0000000000000000
[task 2018-05-01T06:17:24.802Z] 06:17:24     INFO -     r13 = 0x00007f6b97acc1d8   r14 = 0x0000000000000000
[task 2018-05-01T06:17:24.803Z] 06:17:24     INFO -     r15 = 0x00007f6b8c7cca98   rip = 0x00007f6ba8118258
[task 2018-05-01T06:17:24.804Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.805Z] 06:17:24     INFO -  7  libxul.so!mozilla::layers::TextureClientHolder::Release [TextureClientRecycleAllocator.cpp:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 20 + 0x9]
[task 2018-05-01T06:17:24.805Z] 06:17:24     INFO -     rbx = 0x00007f6b97bcf380   rbp = 0x00007ffc8a4fecd0
[task 2018-05-01T06:17:24.807Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fecc0   r12 = 0x0000000000000000
[task 2018-05-01T06:17:24.807Z] 06:17:24     INFO -     r13 = 0x00007ffc8a4fed00   r14 = 0x00007ffc8a4fecf8
[task 2018-05-01T06:17:24.808Z] 06:17:24     INFO -     r15 = 0x00007f6b8c7cca98   rip = 0x00007f6ba81f381a
[task 2018-05-01T06:17:24.809Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.809Z] 06:17:24     INFO -  8  libxul.so!mozilla::layers::TextureClientRecycleAllocator::RecycleTextureClient [TextureClientRecycleAllocator.cpp:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 271 + 0x8]
[task 2018-05-01T06:17:24.810Z] 06:17:24     INFO -     rbx = 0x00007f6b8c7cca70   rbp = 0x00007ffc8a4fed50
[task 2018-05-01T06:17:24.811Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fece0   r12 = 0x00007f6b8c7cca90
[task 2018-05-01T06:17:24.812Z] 06:17:24     INFO -     r13 = 0x00007ffc8a4fed00   r14 = 0x00007ffc8a4fecf8
[task 2018-05-01T06:17:24.813Z] 06:17:24     INFO -     r15 = 0x00007f6b8c7cca98   rip = 0x00007f6ba81f67e0
[task 2018-05-01T06:17:24.813Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.813Z] 06:17:24     INFO -  9  libxul.so!mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [AtomicRefCountedWithFinalize.h:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 152 + 0x3]
[task 2018-05-01T06:17:24.814Z] 06:17:24     INFO -     rbx = 0x00007f6b97acc1c8   rbp = 0x00007ffc8a4fef00
[task 2018-05-01T06:17:24.814Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fed60   r12 = 0x0000000000000000
[task 2018-05-01T06:17:24.816Z] 06:17:24     INFO -     r13 = 0x00007f6b97acc1d8   r14 = 0x00007f6ba81eab28
[task 2018-05-01T06:17:24.816Z] 06:17:24     INFO -     r15 = 0x00007f6bac6c70a9   rip = 0x00007f6ba811828e
[task 2018-05-01T06:17:24.817Z] 06:17:24     INFO -     Found by: call frame info
[task 2018-05-01T06:17:24.818Z] 06:17:24     INFO - 10  libxul.so!nsTArray_Impl<mozilla::layers::ImageClientSingle::Buffer, nsTArrayInfallibleAllocator>::ClearAndRetainStorage [ImageClient.h:5bf7bfe1db79fc344db9fa64dbae539771a8db75 : 110 + 0x5]
[task 2018-05-01T06:17:24.818Z] 06:17:24     INFO -     rbx = 0x00007f6b97a0a6d8   rbp = 0x00007ffc8a4fef30
[task 2018-05-01T06:17:24.819Z] 06:17:24     INFO -     rsp = 0x00007ffc8a4fef10   r12 = 0x00007f6b97a0a6d8
[task 2018-05-01T06:17:24.819Z] 06:17:24     INFO -     r13 = 0x00007f6b96b92a08   r14 = 0x00007f6b97ab3818
[task 2018-05-01T06:17:24.820Z] 06:17:24     INFO -     r15 = 0x00007f6bac6c70a9   rip = 0x00007f6ba81df7d6
[task 2018-05-01T06:17:24.821Z] 06:17:24     INFO -     Found by: call frame info
For reference bug 1414825 is the same problem but in the canvas code and was fixed.
Done some retiggers and backfills, and managed to reach until this this push (which could be the culprit): https://treeherder.mozilla.org/#/jobs?repo=autoland&filter-searchStr=test-linux64%2Fdebug-mochitest-media-e10s-1&tochange=f01311ae4951f5ffb0e26a5a30c48527697e5583&fromchange=1259c5bc20a733a19d343819d6d88c61445ec86f
The failure rate is quite different from push to push, on some pushes it fails about 10 time out of 20 retriggeres, on some others, about 4-5 time out of 20.
:jmaher Can you please take a look here?
Flags: needinfo?(jmaher)
Whiteboard: [retriggered]
Thanks Noemi- while this is not specific to a test case, it is specific to a job and happens on shutdown of the browser after running the dom/media/test/* tests.  In bug 1452536 where this looks to have started new tests were added to that directory.

I know the failure rate is variable- but we are looking at 25-50% failure rate.  I vote for backing out bug 1452536.  :aryx do you have other thoughts?

:cpearce- is this assertion easy to fix?  Any concerns with backing this out until a fix can be made?
Blocks: 1452536
Flags: needinfo?(jmaher)
Flags: needinfo?(cpearce)
Flags: needinfo?(aryx.bugmail)
There are 71 occurrences in the last 7 days, majority on linux64 debug, 1 on linux64-ccov opt, 1 on linux64-qr debug and linux64-ccov opt . Any updates on this? 

I picked random occurrences too see if this occurs in the same place and in 7/7 logs i picked, this occurred after

TEST-START | dom/media/test/test_video_dimensions.html 

Wouldn't disabling this test be an easier fix for now ?

Thanks
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #4)
> Thanks Noemi- while this is not specific to a test case, it is specific to a
> job and happens on shutdown of the browser after running the
> dom/media/test/* tests.  In bug 1452536 where this looks to have started new
> tests were added to that directory.
> 
> I know the failure rate is variable- but we are looking at 25-50% failure
> rate.  I vote for backing out bug 1452536.  :aryx do you have other thoughts?
> 
> :cpearce- is this assertion easy to fix?  Any concerns with backing this out
> until a fix can be made?

I saw this failure on my Try pushes for bug 1452536 but had assumed it was an infra problem with Try as it's in totally unrelated code to what I was changing.

So chances are, the spike is correlated with bug 1452536 landing. I have no idea why.
Flags: needinfo?(cpearce)
(In reply to Andreea Pavel [:apavel] from comment #7)
> There are 71 occurrences in the last 7 days, majority on linux64 debug, 1 on
> linux64-ccov opt, 1 on linux64-qr debug and linux64-ccov opt . Any updates
> on this? 
> 
> I picked random occurrences too see if this occurs in the same place and in
> 7/7 logs i picked, this occurred after
> 
> TEST-START | dom/media/test/test_video_dimensions.html 
> 
> Wouldn't disabling this test be an easier fix for now ?
> 
> Thanks

The tests are run in alphabetical order in this job. This is the last test in the job. The problem is caused by an earlier test causing some kind of reference cycle, which only manifests as a problem at shutdown. So this test appears in the log right before the failure is logged because it's the last test run, and the failure shows up at the end of the run.
(In reply to Andreea Pavel [:apavel] from comment #8)
> Leaving here the failure logs i looked at: 
> 
> 
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=176751547&repo=autoland&lineNumber=7913
> https://treeherder.mozilla.org/logviewer.html#?job_id=176719006&repo=mozilla-
> central&lineNumber=6297
> https://treeherder.mozilla.org/logviewer.html#?job_id=176719009&repo=mozilla-
> central&lineNumber=6321
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=176425165&repo=autoland&lineNumber=6298
> https://treeherder.mozilla.org/logviewer.html#?job_id=176429302&repo=mozilla-
> central&lineNumber=6308
> https://treeherder.mozilla.org/logviewer.
> html#?job_id=176519586&repo=autoland&lineNumber=6312
> https://treeherder.mozilla.org/logviewer.html#?job_id=176585410&repo=mozilla-
> central&lineNumber=6294

All of these logs are failures in the MediaStreamGraph shutting down. My code in bug 1452536 doesn't affect the MSG's behaviour and the MSG is not run in any of the tests. So I don't see how my code could have caused it. Did the failure go away when bug 1452536 was backed out?
Exhibit A, try push with my changes re-added:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5838b8a1b538adfc0fd63a96d2b1abe45664443d

Exhibit B, try push *without* my changes re-added:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=9d50782729a6a247f777bbd36c436e410978d16d

Both are showing this failure. It's not bug 1452536 that caused this.
jya, jib, padenot: Any changes to the MSG in the last two weeks that could have caused this failure? We see call stack like this:

https://treeherder.mozilla.org/logviewer.html#?job_id=176719006&repo=mozilla-central&lineNumber=6444


GECKO(1378) | Hit MOZ_CRASH(GFX: An assert from the graphics logger) at /builds/worker/workspace/build/src/gfx/2d/Logging.h:728
GECKO(1378) | #01: mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::~Log [gfx/2d/Logging.h:731]
GECKO(1378) | #02: mozilla::layers::DeallocateTextureClient [gfx/layers/client/TextureClient.cpp:331]
GECKO(1378) | #03: mozilla::layers::TextureClient::Destroy [gfx/layers/client/TextureClient.cpp:433]
GECKO(1378) | #04: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:622]
GECKO(1378) | #05: mozilla::layers::TextureClient::~TextureClient [gfx/layers/client/TextureClient.cpp:629]
GECKO(1378) | #06: mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [gfx/layers/AtomicRefCountedWithFinalize.h:145]
GECKO(1378) | #07: mozilla::layers::TextureClientHolder::Release [memory/mozalloc/mozalloc.h:180]
GECKO(1378) | #08: mozilla::layers::TextureClientRecycleAllocator::RecycleTextureClient [mfbt/RefPtr.h:41]
GECKO(1378) | #09: mozilla::AtomicRefCountedWithFinalize<mozilla::layers::TextureClient>::Release [gfx/layers/AtomicRefCountedWithFinalize.h:154]
GECKO(1378) | #10: nsTArray_Impl<mozilla::layers::ImageClientSingle::Buffer, nsTArrayInfallibleAllocator>::ClearAndRetainStorage [gfx/layers/client/ImageClient.h:110]
GECKO(1378) | #11: nsTArray_Impl<mozilla::layers::ImageClientSingle::Buffer, nsTArrayInfallibleAllocator>::~nsTArray_Impl [xpcom/ds/nsTArray.h:868]
GECKO(1378) | #12: mozilla::layers::ImageClientSingle::~ImageClientSingle [gfx/layers/client/ImageClient.h:50]
GECKO(1378) | #13: mozilla::layers::ImageClientSingle::~ImageClientSingle [gfx/layers/client/ImageClient.h:88]
GECKO(1378) | #14: mozilla::layers::CompositableClient::Release [gfx/layers/client/CompositableClient.h:76]
GECKO(1378) | #15: mozilla::layers::SharedPlanarYCbCrImage::~SharedPlanarYCbCrImage [gfx/layers/ipc/SharedPlanarYCbCrImage.cpp:37]
GECKO(1378) | #16: mozilla::layers::SharedPlanarYCbCrImage::~SharedPlanarYCbCrImage [gfx/layers/ipc/SharedPlanarYCbCrImage.cpp:40]
GECKO(1378) | #17: mozilla::layers::Image::Release [gfx/layers/ImageContainer.h:205]
GECKO(1378) | #18: nsTArray_Impl<mozilla::VideoChunk, nsTArrayInfallibleAllocator>::DestructRange [xpcom/ds/nsTArray.h:2072]
GECKO(1378) | #19: nsTArray_Impl<mozilla::VideoChunk, nsTArrayInfallibleAllocator>::ClearAndRetainStorage [xpcom/ds/nsTArray.h:1295]
GECKO(1378) | #20: mozilla::MediaSegmentBase<mozilla::VideoSegment, mozilla::VideoChunk>::~MediaSegmentBase [xpcom/ds/nsTArray.h:868]
GECKO(1378) | #21: mozilla::VideoSegment::~VideoSegment [dom/media/VideoSegment.cpp:115]
GECKO(1378) | #22: mozilla::StreamTracks::~StreamTracks [memory/mozalloc/mozalloc.h:180]
GECKO(1378) | #23: mozilla::MediaStream::~MediaStream [dom/media/MediaStreamGraph.cpp:1855]
GECKO(1378) | #24: mozilla::TrackUnionStream::~TrackUnionStream [dom/media/TrackUnionStream.h:18]
GECKO(1378) | #25: mozilla::MediaStream::Release [dom/media/MediaStreamGraph.h:260]
GECKO(1378) | #26: mozilla::detail::RunnableMethodImpl<mozilla::DOMMediaStream::OwnedStreamListener*, void (mozilla::DOMMediaStream::OwnedStreamListener::*)(mozilla::MediaStreamGraph*, mozilla::MediaStream*, :MediaStreamGraph*, RefPtr<mozilla::MediaStream>, int, int>::~RunnableMethodImpl [xpcom/threads/nsThreadUtils.h:1190]
GECKO(1378) | #27: mozilla::detail::RunnableMethodImpl<mozilla::DOMMediaStream::OwnedStreamListener*, void (mozilla::DOMMediaStream::OwnedStreamListener::*)(mozilla::MediaStreamGraph*, mozilla::MediaStream*, :MediaStreamGraph*, RefPtr<mozilla::MediaStream>, int, int>::~RunnableMethodImpl [xpcom/threads/nsThreadUtils.h:1190]
GECKO(1378) | #28: mozilla::Runnable::Release [xpcom/threads/nsThreadUtils.cpp:50]
GECKO(1378) | #29: mozilla::EventTargetWrapper::Runner::~Runner [xpcom/threads/AbstractThread.cpp:114]
GECKO(1378) | #30: mozilla::EventTargetWrapper::Runner::~Runner [xpcom/threads/AbstractThread.cpp:114]
GECKO(1378) | #31: mozilla::Runnable::Release [xpcom/threads/nsThreadUtils.cpp:50]
GECKO(1378) | #32: nsTArray_Impl<nsCOMPtr<nsIRunnable>, nsTArrayInfallibleAllocator>::ClearAndRetainStorage [xpcom/ds/nsTArray.h:542]
GECKO(1378) | #33: nsTArray_Impl<nsCOMPtr<nsIRunnable>, nsTArrayInfallibleAllocator>::~nsTArray_Impl [xpcom/ds/nsTArray.h:868]
GECKO(1378) | #34: mozilla::MediaStreamGraphImpl::~MediaStreamGraphImpl [dom/media/MediaStreamGraph.cpp:78]
GECKO(1378) | #35: mozilla::MediaStreamGraphImpl::Release [dom/media/MediaStreamGraph.cpp:3769]
GECKO(1378) | #36: nsTimerImpl::Callback::~Callback [xpcom/threads/nsTimerImpl.h:90]
GECKO(1378) | #37: nsTimerImpl::CancelImpl [xpcom/threads/nsTimerImpl.cpp:537]
GECKO(1378) | #38: nsTimerImpl::Cancel [xpcom/threads/nsTimerImpl.cpp:511]
GECKO(1378) | #39: TimerThread::Shutdown [xpcom/threads/TimerThread.cpp:366]
GECKO(1378) | #40: nsTimerImpl::Shutdown [xpcom/threads/nsTimerImpl.cpp:381]
GECKO(1378) | #41: mozilla::ShutdownXPCOM [xpcom/build/XPCOMInit.cpp:868]
GECKO(1378) | #42: XRE_TermEmbedding [toolkit/xre/nsEmbedFunctions.cpp:228]
GECKO(1378) | #43: mozilla::ipc::ScopedXREEmbed::Stop [ipc/glue/ScopedXREEmbed.cpp:109]
GECKO(1378) | #44: XRE_InitChildProcess [toolkit/xre/nsEmbedFunctions.cpp:724]
GECKO(1378) | #45: content_process_main [ipc/contentproc/plugin-container.cpp:52]
GECKO(1378) | #46: main [browser/app/nsBrowserApp.cpp:283]
GECKO(1378) | #47: libc.so.6 + 0x20830
Flags: needinfo?(padenot)
Flags: needinfo?(jyavenard)
Flags: needinfo?(jib)
There's a few bugs like this one already.
Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(padenot)
Flags: needinfo?(jyavenard)
Flags: needinfo?(jmaher)
Flags: needinfo?(jib)
Flags: needinfo?(aryx.bugmail)
Resolution: --- → DUPLICATE
Duplicate of bug: 1447563
They're similar but not the same. And since bug 1447563 has had patches landed already I'd prefer to not do more work on it.

The stack in comment 15 shows that a MediaStream is holding onto a track buffer that's holding onto a VideoSegment that's holding onto an image that is getting destructed too late, in this case by the MSG dtor.

But this is meant to be cleaned up by the MSG shutdown runnable, [1]. I need to look into why this didn't have the desired effect.


[1] https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/dom/media/MediaStreamGraph.cpp#1502
Assignee: nobody → apehrson
Status: RESOLVED → REOPENED
Rank: 15
Priority: P5 → P2
Resolution: DUPLICATE → ---
(In reply to Chris Pearce (:cpearce) from comment #10)
> (In reply to Andreea Pavel [:apavel] from comment #7)
> > There are 71 occurrences in the last 7 days, majority on linux64 debug, 1 on
> > linux64-ccov opt, 1 on linux64-qr debug and linux64-ccov opt . Any updates
> > on this? 
> > 
> > I picked random occurrences too see if this occurs in the same place and in
> > 7/7 logs i picked, this occurred after
> > 
> > TEST-START | dom/media/test/test_video_dimensions.html 
> > 
> > Wouldn't disabling this test be an easier fix for now ?
> > 
> > Thanks
> 
> The tests are run in alphabetical order in this job. This is the last test
> in the job. The problem is caused by an earlier test causing some kind of
> reference cycle, which only manifests as a problem at shutdown. So this test
> appears in the log right before the failure is logged because it's the last
> test run, and the failure shows up at the end of the run.

This is not necessarily true, test_video_dimensions.html does `v2.mozCaptureStreamUntilEnded()` [1] which gets you a MediaStream, and implied, a MediaStreamGraph. It's likely this is the graph exhibiting shutdown issues.


[1] https://searchfox.org/mozilla-central/rev/b28b94dc81d60c6d9164315adbd4a5073526d372/dom/media/test/test_video_dimensions.html#57
I have reproduced on a loaner, and the MediaStreams are removed from the graph before the shutdown runnable runs. This would explain why the shutdown runnable is unable to clean up.

This would have been fine in the normal case where no external references are held to the stream. In this case however, a DOMMediaStream causes the MediaStream in question to be Destroy()ed through CC, but it doesn't remove its listener from the stream. This triggers a track-ended event later on, that dispatches a runnable which holds a reference to the stream, to main thread. This runnable is what keeps the Image alive (through the MediaStream, its track buffer, and so on) past normal shutdown.

I have fixes at two separate layers lined up to catch this case.
Comment on attachment 8973634 [details]
Bug 1458166 - Clear out tracks while destroying MediaStream.

https://reviewboard.mozilla.org/r/242002/#review247894
Attachment #8973634 - Flags: review?(bvandyk) → review+
Comment on attachment 8973635 [details]
Bug 1458166 - Remove listeners before destroying MediaStreams.

https://reviewboard.mozilla.org/r/242004/#review247898
Attachment #8973635 - Flags: review?(bvandyk) → review+
Pushed by pehrsons@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/da8423563072
Clear out tracks while destroying MediaStream. r=bryce
https://hg.mozilla.org/integration/autoland/rev/8dc7d2696a29
Remove listeners before destroying MediaStreams. r=bryce
https://hg.mozilla.org/mozilla-central/rev/da8423563072
https://hg.mozilla.org/mozilla-central/rev/8dc7d2696a29
Status: REOPENED → RESOLVED
Closed: 2 years ago2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla62
Please request Beta approval on this once we're reasonably confident that it fixes the oranges.
Flags: needinfo?(apehrson)
Comment on attachment 8973634 [details]
Bug 1458166 - Clear out tracks while destroying MediaStream.

This approval request applies to all patches on this bug.

Approval Request Comment
[Feature/Bug causing the regression]: Unclear/latent bug exposed by timing changes.
[User impact if declined]: Possible shutdown crash, intermittents in automation.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: Yes, see orangefactor in bug details.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No
[Why is the change risky/not risky?]: Patches add simple cleanup code on CC/shutdown. Threading assumptions are trivial.
[String changes made/needed]: None
Flags: needinfo?(apehrson)
Attachment #8973634 - Flags: approval-mozilla-beta?
Comment on attachment 8973634 [details]
Bug 1458166 - Clear out tracks while destroying MediaStream.

Cleans up some intermittent oranges on Beta. Approved for 61.0b5.
Attachment #8973634 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Duplicate of this bug: 1378025
(In reply to OrangeFactor Robot from comment #32)
> 1 failures in 639 pushes (0.002 failures/push) were associated with this bug
> in the last 7 days.    
> 
> Repository breakdown:
> * try: 1
> 
> Platform breakdown:
> * linux64: 1
> 
> For more details, see:
> https://treeherder.mozilla.org/intermittent-failures.html#/
> bugdetails?bug=1458166&startday=2018-05-14&endday=2018-05-20&tree=trunk

Oddly enough the link says 0.
the one failure is from try
Ah, user error, thanks!

I still find it weird that it gives you a "more details" link that doesn't include any of the details mentioned.
You need to log in before you can comment on or make changes to this bug.