
Reporter

Updated

•

11 years ago

No longer blocks: core-e10s

status-firefox33: --- → affected

Updated

•

11 years ago

tracking-e10s: ? → +

Depends on: old-e10s-m2

Comment 1

•

11 years ago

Is this still reproducible? I can't repro on my linux laptop.

Reporter

Comment 2

•

11 years ago

I can no longer repro (this particular) crash when zooming the Mozilla Location Service map.

Status: NEW → RESOLVED

Closed: 11 years ago

status-firefox33: affected → unaffected

status-firefox34: --- → unaffected

Resolution: --- → WORKSFORME

Updated

•

11 years ago

Blocks: old-e10s-m2

No longer depends on: old-e10s-m2

Reporter

Comment 3

•

11 years ago

I can repro this e10s crash again. Same STR on OS X: zoom map in and out for about 10 seconds. bp-d6379d74-5c53-4e99-bc67-b9f4c2140922 bp-50b7d16f-9f7a-4c6b-85cc-068f12140923

Status: RESOLVED → REOPENED

status-firefox35: --- → affected

tracking-e10s: + → ?

Resolution: WORKSFORME → ---

Milan Sreckovic [:milan] (needinfo for best results)

Updated

•

11 years ago

tracking-e10s: ? → +

(no longer active)

Comment 4

•

11 years ago

I just got this crash as well in e10s, while watching a youtube video: https://crash-stats.mozilla.com/report/index/d437ea25-48d8-4fb1-a6e1-b21912141023

Updated

•

11 years ago

Assignee: nobody → nical.bugzilla

Comment 5

•

11 years ago

Looks like there are two ways MessageChannel::Send can return false: http://dxr.mozilla.org/mozilla-central/source/ipc/glue/MessageChannel.cpp#503 If it's an actor id mismatch, it's going to be a bit tough since I don't know how that works, and we are also creating an actor, so I don't know what the mismatch would be about. If it's because the connection closed somehow, fixing this crash shouldn't be hard, but it'll probably move the problem to why the connection closed (OOM, maybe?). Creating texture actors happens a lot, so it's not surprising if it happens to be the one message that races with losing the IPDL connection. Just speculation at this point.
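To make the two failure modes in this comment concrete, here is a minimal sketch. It does not reproduce the real mozilla::ipc::MessageChannel; `SketchChannel`, `SketchMessage`, `SendResult` and `kNullRoutingId` are hypothetical names, and treating a sentinel routing id as the "actor id mismatch" case is an assumption made only for illustration.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical result type: the real Send() only returns a bool, so the
// caller cannot tell these two failures apart.
enum class SendResult { Ok, ChannelClosed, BadRoutingId };

// Assumed sentinel meaning "this message is not addressed to any actor".
constexpr int32_t kNullRoutingId = -1;

struct SketchMessage {
  int32_t routingId;  // id of the actor the message is addressed to
};

class SketchChannel {
 public:
  explicit SketchChannel(bool connected) : mConnected(connected) {}

  void Close() { mConnected = false; }

  // Mirrors the two ways a send can fail as described in the comment:
  // a message with a bad actor id, or a channel that is no longer connected.
  SendResult Send(const SketchMessage& msg) const {
    if (msg.routingId == kNullRoutingId) {
      return SendResult::BadRoutingId;
    }
    if (!mConnected) {
      return SendResult::ChannelClosed;
    }
    return SendResult::Ok;
  }

 private:
  bool mConnected;
};
```

Distinguishing the two cases in the return value (or in a log line) is what would let a crash report say which of the two hypotheses applies.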

Benoit Jacob [:bjacob] (mostly away)

Comment 6

•

11 years ago

Attached patch don't try to create the actor if the message channel is closed (obsolete) — Details — Splinter Review

This check can't hurt, since there is no way SendPTextureConstructor can do anything but crash when the IPDL connection is closed. If it fixes the crash (or moves it to another stack), it'll tell us that we are losing the connection, which is a problem of its own; if it doesn't, then we know that it's an actor mismatch.

Attachment #8522212 - Flags: review?(bjacob)

Comment 7

•

11 years ago

Comment on attachment 8522212 [details] [diff] [review]
don't try to create the actor if the message channel is closed

Review of attachment 8522212 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/ipc/ShadowLayers.cpp
@@ +812,5 @@
> TextureFlags aFlags)
> {
> if (!HasShadowManager() ||
> + !mShadowManager->IPCOpen()
> + !mShadowManager->GetIPCChannel()->Connected()) {

Missing || so this doesn't compile.

If this fixes the bug, then doesn't this imply that this new GetIPCChannel()->Connected() condition should be tested by IPCOpen()?

Attachment #8522212 - Flags: review?(bjacob) → review-

Comment 8

•

11 years ago

Attached patch updated patch — Details — Splinter Review

Whoops, forgot to qrefresh before submitting. I agree that IPCOpen would be a better place, although I was hesitant to go that way since it is (IIRC) specific to the shutdown sequence and I wasn't sure I should touch that part. That said, I think it's pretty safe to put it there. This patch has the check in IPCOpen.
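A minimal sketch of the shape of that change, assuming nothing about the real classes beyond the method names quoted in the review (`IPCOpen()`, `GetIPCChannel()`, `Connected()`). `SketchShadowManager`, `SketchIPCChannel` and `CanCreateTextureActor` are hypothetical stand-ins, not the actual patch.

```cpp
#include <cassert>

// Hypothetical stand-in for the IPC channel owned by the actor.
struct SketchIPCChannel {
  bool connected = true;
  bool Connected() const { return connected; }
};

// Hypothetical stand-in for the shadow manager actor.
class SketchShadowManager {
 public:
  SketchIPCChannel* GetIPCChannel() { return &mChannel; }

  // Called when the actor is torn down during shutdown.
  void MarkDestroyed() { mIPCOpen = false; }

  // The connected check lives here, so every caller that already tests
  // IPCOpen() picks it up without needing a second condition.
  bool IPCOpen() const { return mIPCOpen && mChannel.Connected(); }

 private:
  bool mIPCOpen = true;
  SketchIPCChannel mChannel;
};

// Call-site guard: one condition instead of the two (and the missing ||)
// from the first version of the patch.
bool CanCreateTextureActor(SketchShadowManager* manager) {
  return manager != nullptr && manager->IPCOpen();
}
```

Keeping the check inside `IPCOpen()` avoids the class of mistake the review caught, since call sites no longer have to chain two conditions by hand.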

Attachment #8522212 - Attachment is obsolete: true

Benoit Jacob [:bjacob] (mostly away)

Updated

•

11 years ago

Attachment #8522273 - Flags: review?(bjacob)

Updated

•

11 years ago

Attachment #8522273 - Flags: review?(bjacob) → review+

https://hg.mozilla.org/integration/mozilla-inbound/rev/80f873bf8adc

Comment 9

•

11 years ago

Carsten Book [:Tomcat]

Comment 10

•

11 years ago

Sorry, had to back this out for test failures like https://treeherder.mozilla.org/ui/logviewer.html#?job_id=3899338&repo=mozilla-inbound

Flags: needinfo?(nical.bugzilla)

:Felipe Gomes (needinfo for replies!)

Comment 11

•

11 years ago

I'm getting this crash on e10s just by using and tab-switching between IRCCloud and Bugzilla.

Comment 13

•

11 years ago

I know there's already a patch in here, but I also read a comment suggesting that we might not know the underlying cause of the crash. I was able to get a stack:

* thread #1: tid = 0x70ea01, 0x000000010007431d libmozalloc.dylib`mozalloc_abort(msg=0x00007fff5fbf8088) + 93 at mozalloc_abort.cpp:37, queue = 'com.apple.main-thread', stop reason = EXC_BAD_ACCESS (code=1, address=0x0)
  * frame #0: 0x000000010007431d libmozalloc.dylib`mozalloc_abort(msg=0x00007fff5fbf8088) + 93 at mozalloc_abort.cpp:37
    frame #1: 0x00000001006884f5 XUL`Abort(aMsg=0x00007fff5fbf8088) + 21 at nsDebugImpl.cpp:469
    frame #2: 0x0000000100687f50 XUL`NS_DebugBreak(aSeverity=3, aStr=0x0000000000000000, aExpr=0x0000000000000000, aFile=0x00000001072dd314, aLine=221) + 1232 at nsDebugImpl.cpp:426
    frame #3: 0x0000000100d8a97a XUL`mozilla::Logger::~Logger(this=0x00007fff5fbf8640) + 330 at logging.cc:47
    frame #4: 0x0000000100d8a825 XUL`mozilla::Logger::~Logger(this=0x00007fff5fbf8640) + 21 at logging.cc:14
    frame #5: 0x00000001006c8a15 XUL`mozilla::LogWrapper::~LogWrapper(this=0x00007fff5fbf8640) + 21 at logging.h:59
    frame #6: 0x00000001006c89f5 XUL`mozilla::LogWrapper::~LogWrapper(this=0x00007fff5fbf8640) + 21 at logging.h:59
    frame #7: 0x0000000100d660b3 XUL`base::SharedMemory::CreateOrOpen(this=0x000000012ddaf8f8, name=0x00007fff5fbf87f0, posix_flags=514, size=274432) + 963 at shared_memory_posix.cc:221
    frame #8: 0x0000000100d65ca8 XUL`base::SharedMemory::Create(this=0x000000012ddaf8f8, cname=0x00007fff5fbf8850, read_only=false, open_existing=false, size=274432) + 184 at shared_memory_posix.cc:79
    frame #9: 0x0000000100e12bee XUL`mozilla::ipc::SharedMemoryBasic::Create(this=0x000000012ddaf8d0, aNbytes=274432) + 94 at SharedMemoryBasic_chromium.h:40
    frame #10: 0x0000000100dfa373 XUL`mozilla::ipc::CreateSegment(aNBytes=274432, aHandle=mozilla::ipc::SharedMemoryBasic::Handle at 0x00007fff5fbf8900) + 227 at Shmem.cpp:145
    frame #11: 0x0000000100dfa0d7 XUL`mozilla::ipc::Shmem::Alloc(=IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead at 0x00007fff5fbf89d8, aNBytes=262160, aType=TYPE_BASIC, aUnsafe=true, aProtect=false) + 295 at Shmem.cpp:365
    frame #12: 0x0000000100f8db5e XUL`mozilla::layers::PCompositorChild::CreateSharedMemory(this=0x00000001167df400, aSize=262160, aType=TYPE_BASIC, aUnsafe=true, aId=0x00007fff5fbf8ba4) + 110 at PCompositorChild.cpp:649
    frame #13: 0x0000000100f8dd4c XUL`_ZThn16_N7mozilla6layers16PCompositorChild18CreateSharedMemoryEmNS_3ipc12SharedMemory16SharedMemoryTypeEbPi(this=0x00000001167df410, aSize=262160, aType=TYPE_BASIC, aUnsafe=true, aId=0x00007fff5fbf8ba4) + 76 at UnifiedProtocols4.cpp:667
    frame #14: 0x00000001011a5df4 XUL`mozilla::layers::PLayerTransactionChild::CreateSharedMemory(this=0x0000000111a51f50, aSize=262160, aType=TYPE_BASIC, aUnsafe=true, aId=0x00007fff5fbf8ba4) + 84 at PLayerTransactionChild.cpp:706
    frame #15: 0x00000001011a7793 XUL`mozilla::layers::PLayerTransactionChild::AllocUnsafeShmem(this=0x0000000111a51f50, aSize=262160, aType=TYPE_BASIC, aOutMem=0x000000012b851358) + 83 at PLayerTransactionChild.cpp:958
    frame #16: 0x0000000101bc4580 XUL`mozilla::layers::ShadowLayerForwarder::AllocUnsafeShmem(this=0x0000000117b38680, aSize=262160, aType=TYPE_BASIC, aShmem=0x000000012b851358) + 192 at ShadowLayers.cpp:725
    frame #17: 0x0000000101b4f3c1 XUL`mozilla::layers::ShmemTextureClient::Allocate(this=0x000000012b8512e0, aSize=262160) + 177 at TextureClient.cpp:585
    frame #18: 0x0000000101b4fef2 XUL`mozilla::layers::BufferTextureClient::AllocateForSurface(this=0x000000012b8512e0, aSize=mozilla::gfx::IntSize at 0x00007fff5fbf8cb8, aFlags=ALLOC_DEFAULT) + 354 at TextureClient.cpp:718
    frame #19: 0x0000000101b47d37 XUL`mozilla::layers::TextureClient::CreateForDrawing(aAllocator=0x0000000117b38680, aFormat=B8G8R8X8, aSize=mozilla::gfx::IntSize at 0x00007fff5fbf8d68, aMoz2DBackend=COREGRAPHICS, aTextureFlags=IMMEDIATE_UPLOAD, aAllocFlags=ALLOC_DEFAULT) + 535 at TextureClient.cpp:395
    frame #20: 0x0000000101b6511c XUL`mozilla::layers::TextureClientPool::GetTextureClient(this=0x000000011fd69ac0) + 412 at TextureClientPool.cpp:65
    frame #21: 0x0000000101b69821 XUL`mozilla::layers::TileClient::GetBackBuffer(this=0x00007fff5fbf99f8, aDirtyRegion=0x00007fff5fbf9708, aContent=COLOR, aMode=SURFACE_COMPONENT_ALPHA, aCreatedTextureClient=0x00007fff5fbf9727, aAddPaintedRegion=0x00007fff5fbf96e0, aCanRerasterizeValidRegion=false, aBackBufferOnWhite=0x00007fff5fbf96d8) + 753 at TiledContentClient.cpp:752
    frame #22: 0x0000000101b6b13a XUL`mozilla::layers::ClientTiledLayerBuffer::ValidateTile(this=0x000000012b676c78, aTile=0x00007fff5fbf99f8, aTileOrigin=0x00007fff5fbf99f0, aDirtyRegion=0x00007fff5fbf9d68) + 522 at TiledContentClient.cpp:1105
    frame #23: 0x0000000101b88d37 XUL`mozilla::layers::TiledLayerBuffer<mozilla::layers::ClientTiledLayerBuffer, mozilla::layers::TileClient>::Update(this=0x000000012b676c78, aNewValidRegion=0x0000000126d8de00, aPaintRegion=0x00007fff5fbfa660) + 3655 at TiledLayerBuffer.h:501
    frame #24: 0x0000000101b6a68b XUL`mozilla::layers::ClientTiledLayerBuffer::PaintThebes(this=0x000000012b676c78, aNewValidRegion=0x0000000126d8de00, aPaintRegion=0x00007fff5fbfa660, aCallback=0x0000000103e0fd30, aCallbackData=0x00007fff5fbfb300)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*) + 1227 at TiledContentClient.cpp:959
    frame #25: 0x0000000101b46e34 XUL`mozilla::layers::ClientTiledPaintedLayer::RenderLayer(this=0x0000000126d8dc00) + 1204 at ClientTiledPaintedLayer.cpp:380
    frame #26: 0x0000000101b4734c XUL`_ZThn544_N7mozilla6layers23ClientTiledPaintedLayer11RenderLayerEv(this=0x0000000126d8de20) + 28 at Unified_cpp_gfx_layers2.cpp:471
    frame #27: 0x0000000101b577b5 XUL`mozilla::layers::ClientLayer::RenderLayerWithReadback(this=0x0000000126d8de20, aReadback=0x00007fff5fbfa798) + 37 at ClientLayerManager.h:370
    frame #28: 0x0000000101b62cb3 XUL`mozilla::layers::ClientContainerLayer::RenderLayer(this=0x000000011df23000) + 355 at ClientContainerLayer.h:69
    frame #29: 0x0000000101b62e1c XUL`_ZThn536_N7mozilla6layers20ClientContainerLayer11RenderLayerEv(this=0x000000011df23218) + 28 at Unified_cpp_gfx_layers2.cpp:76
    frame #30: 0x0000000101b42cba XUL`mozilla::layers::ClientLayerManager::EndTransactionInternal(this=0x0000000112b29400, aCallback=0x0000000103e0fd30, aCallbackData=0x00007fff5fbfb300, =END_DEFAULT)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) + 378 at ClientLayerManager.cpp:268
    frame #31: 0x0000000101b42e9b XUL`mozilla::layers::ClientLayerManager::EndTransaction(this=0x0000000112b29400, aCallback=0x0000000103e0fd30, aCallbackData=0x00007fff5fbfb300, aFlags=END_DEFAULT)(mozilla::layers::PaintedLayer*, gfxContext*, nsIntRegion const&, mozilla::layers::DrawRegionClip, nsIntRegion const&, void*), void*, mozilla::layers::LayerManager::EndTransactionFlags) + 91 at ClientLayerManager.cpp:298
    frame #32: 0x0000000103e96047 XUL`nsDisplayList::PaintRoot(this=0x00007fff5fbfb258, aBuilder=0x00007fff5fbfb300, aCtx=0x0000000000000000, aFlags=13) + 2839 at nsDisplayList.cpp:1444
    frame #33: 0x0000000103ec8d30 XUL`nsLayoutUtils::PaintFrame(aRenderingContext=0x0000000000000000, aFrame=0x000000011fc14458, aDirtyRegion=0x00007fff5fbfbc30, aBackstop=0, aFlags=772) + 4272 at nsLayoutUtils.cpp:3152
    frame #34: 0x0000000103f2328f XUL`PresShell::Paint(this=0x000000011dcad000, aViewToPaint=0x000000011675b970, aDirtyRegion=0x00007fff5fbfbc30, aFlags=1) + 2143 at nsPresShell.cpp:6337
    frame #35: 0x00000001039ef2ce XUL`nsViewManager::ProcessPendingUpdatesPaint(this=0x000000011dca0880, aWidget=0x0000000112b28f80) + 526 at nsViewManager.cpp:443
    frame #36: 0x00000001039eef76 XUL`nsViewManager::ProcessPendingUpdatesForView(this=0x000000011dca0880, aView=0x000000011675b970, aFlushDirtyRegion=true) + 502 at nsViewManager.cpp:384
    frame #37: 0x00000001039eff14 XUL`nsViewManager::ProcessPendingUpdates(this=0x000000011dca0880) + 132 at nsViewManager.cpp:1075
    frame #38: 0x0000000103df6245 XUL`nsRefreshDriver::Tick(this=0x000000011dc4dc00, aNowEpoch=1416343232301476, aNowTime=TimeStamp at 0x00007fff5fbfc180) + 4645 at nsRefreshDriver.cpp:1354
    frame #39: 0x0000000103dfd08c XUL`mozilla::RefreshDriverTimer::TickDriver(driver=0x000000011dc4dc00, jsnow=1416343232301476, now=TimeStamp at 0x00007fff5fbfc1b8) + 92 at nsRefreshDriver.cpp:173
    frame #40: 0x0000000103dfcf62 XUL`mozilla::RefreshDriverTimer::Tick(this=0x0000000117bf1780) + 322 at nsRefreshDriver.cpp:164
    frame #41: 0x0000000103dfce11 XUL`mozilla::RefreshDriverTimer::TimerTick(aTimer=0x000000011dc99860, aClosure=0x0000000117bf1780) + 33 at nsRefreshDriver.cpp:190
    frame #42: 0x000000010077534a XUL`nsTimerImpl::Fire(this=0x000000011dc99860) + 986 at nsTimerImpl.cpp:621
    frame #43: 0x0000000100775761 XUL`nsTimerEvent::Run(this=0x0000000117bfd980) + 209 at nsTimerImpl.cpp:714
    frame #44: 0x0000000100770246 XUL`nsThread::ProcessNextEvent(this=0x0000000111a7e040, aMayWait=false, aResult=0x00007fff5fbfc5c3) + 2086 at nsThread.cpp:830
    frame #45: 0x00000001007c700b XUL`NS_ProcessPendingEvents(aThread=0x0000000111a7e040, aTimeout=20) + 171 at nsThreadUtils.cpp:207
    frame #46: 0x0000000103a0a409 XUL`nsBaseAppShell::NativeEventCallback(this=0x00000001166fdac0) + 201 at nsBaseAppShell.cpp:98
    frame #47: 0x0000000103a84901 XUL`nsAppShell::ProcessGeckoEvents(aInfo=0x00000001166fdac0) + 433 at nsAppShell.mm:377
    frame #48: 0x00007fff8d312b31 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
    frame #49: 0x00007fff8d312455 CoreFoundation`__CFRunLoopDoSources0 + 245
    frame #50: 0x00007fff8d3357f5 CoreFoundation`__CFRunLoopRun + 789
    frame #51: 0x00007fff8d3350e2 CoreFoundation`CFRunLoopRunSpecific + 290
    frame #52: 0x00007fff93349eb4 HIToolbox`RunCurrentEventLoopInMode + 209
    frame #53: 0x00007fff93349c52 HIToolbox`ReceiveNextEventCommon + 356
    frame #54: 0x00007fff93349ae3 HIToolbox`BlockUntilNextEventMatchingListInMode + 62
    frame #55: 0x00007fff8f1df533 AppKit`_DPSNextEvent + 685
    frame #56: 0x00007fff8f1dedf2 AppKit`-[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 128
    frame #57: 0x0000000103a83557 XUL`-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:](self=0x00000001167b18f0, _cmd=0x00007fff8fa0d404, mask=18446744073709551615, expiration=0x422d63c37f00000d, mode=0x00007fff7ba161c0, flag='\x01') + 119 at nsAppShell.mm:118
    frame #58: 0x00007fff8f1d61a3 AppKit`-[NSApplication run] + 517
    frame #59: 0x0000000103a852a7 XUL`nsAppShell::Run(this=0x00000001166fdac0) + 167 at nsAppShell.mm:651
    frame #60: 0x00000001049e49b9 XUL`XRE_RunAppShell + 345 at nsEmbedFunctions.cpp:731
    frame #61: 0x0000000100df7754 XUL`mozilla::ipc::MessagePumpForChildProcess::Run(this=0x0000000111a21240, aDelegate=0x00007fff5fbfe048) + 196 at MessagePump.cpp:272
    frame #62: 0x0000000100d8ba55 XUL`MessageLoop::RunInternal(this=0x00007fff5fbfe048) + 117 at message_loop.cc:233
    frame #63: 0x0000000100d8b965 XUL`MessageLoop::RunHandler(this=0x00007fff5fbfe048) + 21 at message_loop.cc:226
    frame #64: 0x0000000100d8b90d XUL`MessageLoop::Run(this=0x00007fff5fbfe048) + 45 at message_loop.cc:200
    frame #65: 0x00000001049e4203 XUL`XRE_InitChildProcess(aArgc=3, aArgv=0x00007fff5fbff638, aGMPLoader=0x0000000000000000) + 2691 at nsEmbedFunctions.cpp:568
    frame #66: 0x00000001000018db plugin-container`content_process_main(argc=6, argv=0x00007fff5fbff638) + 299 at plugin-container.cpp:190
    frame #67: 0x00000001000019d2 plugin-container`main(argc=7, argv=0x00007fff5fbff638) + 34 at MozillaRuntimeMain.cpp:11
    frame #68: 0x0000000100001794 plugin-container`start + 52

Comment 14

•

11 years ago

Opening the following URL in an e10s build crashes quite often for me: http://12illustrations.com/post/32805685485/tmnt-by-chris-uminga If it doesn't crash at first just force-reload a few times until it does. Currently disabling a few things to see if some add-on causes it.

Comment 15

•

11 years ago

Attached patch Make sure to not end message with LayerTransactionChild if it was destroyed — Details — Splinter Review

This would be a better fix if the issue is that we are using the channel after it's shut down, and is good to take even if not.

Flags: needinfo?(nical.bugzilla)

Attachment #8528689 - Flags: review?(jmuizelaar)

Updated

•

11 years ago

Attachment #8528689 - Flags: review?(jmuizelaar) → review+

https://hg.mozilla.org/integration/mozilla-inbound/rev/f335ca3e87c5

Comment 16

•

11 years ago

Whiteboard: [leave-open]

Carsten Book [:Tomcat]

Comment 17

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/f335ca3e87c5

Comment 18

•

11 years ago

I haven't been able to reproduce this crash either with or without my (speculative) patch. Is anyone able to reproduce this on today's nightly?

Andrew McCreight (out of office until 8/21) [:mccr8]

Comment 19

•

11 years ago

Yes - I can still reproduce it using the steps in duped bug 1100902:

STR:
1.) Enable e10s and visit https://www.mozilla.org/en-US/firefox/os/
2.) Scroll up/down the page

Comment 20

•

11 years ago

I can also reproduce using the steps in comment 19.

Comment 21

•

11 years ago

I can reliably crash my Nightlies visiting https://www.mozilla.org/en-US/firefox/os/ but I can't get a custom build to fail, not even with the default profile I use for Nightly. So far I have failed to get a custom build working with the Nightly build config, but maybe that would help us reproduce locally?

Comment 22

•

11 years ago

Running:

/Applications/FirefoxNightly.app/Contents/MacOS/firefox -P default

I can't make the content process crash. If, however, I click the "Nightly" icon that's fixed in the dock and then navigate to the fxos page, I can crash it reliably. That's the same executable with the same profile...

Comment 23

•

11 years ago

Pinned my custom build to the dock and can reliably crash the content process. Nicolas, can you reproduce this with the new information provided?

Flags: needinfo?(nical.bugzilla)

Comment 24

•

11 years ago

It also seems to crash more reliably with more tabs open. So to reproduce it's best to open 50 GitHub (or something) tabs in a window and open the FxOS page as the last, and then scroll up and down.

Comment 25

•

11 years ago

It really does only happen for me when Firefox is pinned to the dock. In that case I even see artifacts when scrolling; I don't see those when executing "mach run".

Comment 26

•

11 years ago

TextureClientPool::GetTextureClient() sets the local variable |textureClient| but both of the functions can return a nullptr. With GFX_DEBUG_TRACK_CLIENTS_IN_POOL=1 we should check whether we have a textureClient at all, no? http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/TextureClientPool.cpp#107
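A minimal sketch of the null check being suggested, assuming only what the comment states: the creation call can return nullptr and the debug tracking code runs afterwards. `SketchPool`, `SketchTextureClient` and the `allocSucceeds` switch are hypothetical; they do not mirror the real TextureClientPool.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <set>

struct SketchTextureClient {};

// Hypothetical creation function that can fail, for example when the
// shared memory behind the texture cannot be allocated.
std::shared_ptr<SketchTextureClient> CreateForDrawing(bool allocSucceeds) {
  return allocSucceeds ? std::make_shared<SketchTextureClient>() : nullptr;
}

class SketchPool {
 public:
  std::shared_ptr<SketchTextureClient> GetTextureClient(bool allocSucceeds) {
    std::shared_ptr<SketchTextureClient> client = CreateForDrawing(allocSucceeds);
    // Debug bookkeeping must tolerate a failed allocation instead of
    // dereferencing or recording a null client.
    if (client) {
      mTracked.insert(client.get());
    }
    return client;  // callers still have to handle nullptr themselves
  }

  std::size_t TrackedCount() const { return mTracked.size(); }

 private:
  std::set<SketchTextureClient*> mTracked;  // addresses of tracked clients
};
```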

Comment 27

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #26)
> TextureClientPool::GetTextureClient() sets the local variable
> |textureClient| but both of the functions can return a nullptr. With
> GFX_DEBUG_TRACK_CLIENTS_IN_POOL=1 we should check whether we have a
> textureClient at all, no?
>
> http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/TextureClientPool.cpp#107

Looks like that's just another thing we might hit in a debug build but isn't the actual problem here. It's the backtrace in comment #13 we're hitting mostly.

Comment 28

•

11 years ago

Debugging further, dup() fails with "Too many open files" here: http://mxr.mozilla.org/mozilla-central/source/ipc/chromium/src/base/shared_memory_posix.cc#220
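The failure is easy to reproduce outside Firefox. The sketch below is an illustration only, not code from the tree: it lowers the soft descriptor limit, calls dup() until it fails, and reports the errno. `ErrnoWhenDescriptorsRunOut` is a hypothetical helper name; the POSIX calls (getrlimit, setrlimit, pipe, dup) are real.

```cpp
#include <cassert>
#include <cerrno>
#include <vector>
#include <sys/resource.h>
#include <unistd.h>

// Lowers the soft RLIMIT_NOFILE to softLimit, duplicates a pipe end until
// dup() fails, and returns the errno of that failure. All descriptors opened
// here are closed and the previous limit is restored (best effort).
int ErrnoWhenDescriptorsRunOut(rlim_t softLimit) {
  struct rlimit saved;
  if (getrlimit(RLIMIT_NOFILE, &saved) != 0) {
    return errno;
  }

  struct rlimit lowered = saved;
  lowered.rlim_cur = softLimit;
  if (setrlimit(RLIMIT_NOFILE, &lowered) != 0) {
    return errno;
  }

  int failure = 0;
  std::vector<int> held;
  int ends[2];
  if (pipe(ends) != 0) {
    failure = errno;
  } else {
    held.push_back(ends[0]);
    held.push_back(ends[1]);
    for (;;) {
      int copy = dup(ends[0]);
      if (copy < 0) {
        failure = errno;  // EMFILE is reported as "Too many open files"
        break;
      }
      held.push_back(copy);
    }
  }

  for (int fd : held) {
    close(fd);
  }
  setrlimit(RLIMIT_NOFILE, &saved);
  return failure;
}
```

Calling `ErrnoWhenDescriptorsRunOut(64)` in a process with only a handful of open descriptors is expected to return EMFILE, the same condition reported above.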

Comment 29

•

11 years ago

When starting Firefox from the dock, RLIMIT_NOFILE=800, which seems to be a little too small with dozens of tabs open. When starting Firefox with |mach run|, RLIMIT_NOFILE=4864. That's why we don't run into this issue in the latter case. I wonder which of those is the right rlimit? And shouldn't we handle these types of failures better? Are we creating too many file handles?
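For anyone who wants to compare the two launch paths, here is a small sketch that reads the limits and counts how many descriptors are currently open. `FdLimits`, `QueryFdLimits` and `CountOpenFds` are hypothetical helper names; the probing approach (fcntl with F_GETFD on each candidate fd) is one possible way to count, not what Firefox does.

```cpp
#include <cassert>
#include <fcntl.h>
#include <sys/resource.h>
#include <unistd.h>

struct FdLimits {
  rlim_t soft;  // the limit that dup()/open() actually enforce
  rlim_t hard;  // the ceiling the soft limit may be raised to
};

// Reads RLIMIT_NOFILE for the current process.
bool QueryFdLimits(FdLimits* out) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
    return false;
  }
  out->soft = lim.rlim_cur;
  out->hard = lim.rlim_max;
  return true;
}

// Counts open descriptors in [0, min(soft limit, probeCap)) by probing each
// number; fcntl(F_GETFD) fails with EBADF for descriptors that are not open.
long CountOpenFds(long probeCap) {
  FdLimits lim;
  if (!QueryFdLimits(&lim)) {
    return -1;
  }
  long upper = probeCap;
  if (lim.soft != RLIM_INFINITY && lim.soft < static_cast<rlim_t>(probeCap)) {
    upper = static_cast<long>(lim.soft);
  }
  long open = 0;
  for (long fd = 0; fd < upper; ++fd) {
    if (fcntl(static_cast<int>(fd), F_GETFD) != -1) {
      ++open;
    }
  }
  return open;
}
```

Logging both numbers at startup and again shortly before a crash would show how close the process gets to the soft limit under each launch method.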

Comment 30

•

11 years ago

Good catch. I couldn't reproduce this on the 10.10 Mac I borrowed. Which version are you using? 800 isn't a lot of file descriptors. We typically have one fd per texture we share with the compositor, so at least 2 per layer, more with tiling (1 or 2 per tile), and that's only gfx.
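As a rough back-of-the-envelope version of that estimate: the function below simply multiplies out the per-layer and per-tile figures given in the comment. `EstimateGfxFds` is a hypothetical helper, and the layer and tile counts in the usage note are made-up inputs, not measurements.

```cpp
#include <cassert>

// Lower-bound estimate of descriptors used by graphics alone, using the
// figures from the comment: at least 2 per untiled layer, and 1 or 2
// (texturesPerTile) for each tile of a tiled layer.
constexpr unsigned EstimateGfxFds(unsigned untiledLayers,
                                  unsigned tiles,
                                  unsigned texturesPerTile) {
  return untiledLayers * 2 + tiles * texturesPerTile;
}
```

For example, a hypothetical content process with 10 untiled layers and 200 tiles at 2 textures per tile would need `EstimateGfxFds(10, 200, 2)`, which is 420 descriptors, already more than half of an 800 limit before networking and files are counted.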

Flags: needinfo?(nical.bugzilla)

Comment 31

•

11 years ago

I used Nightly for testing, and fx-team tip for custom builds.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 32

•

11 years ago

Oh and OS X 10.9

Comment 33

•

11 years ago

This seems to be beyond graphics; let me find a better person to assign to.

Assignee: nical.bugzilla → milan

Milan Sreckovic [:milan] (needinfo for best results)

Comment 34

•

11 years ago

CatLee, can we explain comment 29 with the RLIMIT_NOFILE being different by a factor of 5 between the different builds?

Flags: needinfo?(catlee)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 35

•

11 years ago

As catlee pointed out, comment 23 explains this limit is not a part of the build.

Flags: needinfo?(catlee)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 36

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #27)
> (In reply to Tim Taubert [:ttaubert] from comment #26)
> > TextureClientPool::GetTextureClient() sets the local variable
> > |textureClient| but both of the functions can return a nullptr. With
> > GFX_DEBUG_TRACK_CLIENTS_IN_POOL=1 we should check whether we have a
> > textureClient at all, no?
> >
> > http://mxr.mozilla.org/mozilla-central/source/gfx/layers/client/TextureClientPool.cpp#107
>
> Looks like that's just another thing we might hit in a debug build but isn't
> the actual problem here...

I created bug 1109828 to take care of this.

Bill McCloskey [inactive unless it's an emergency] (:billm)

Comment 37

•

11 years ago

When running |mach run| from iTerm I see RLIMIT_NOFILE=4864. That seems to come from iTerm somehow and doesn't behave the same with Terminal.app. RLIMIT_NOFILE=800 is explained easily, as 256 is the default value on OS X. After nsSocketTransportService::DiscoverMaxCount() has been called once, we bump it up to 800 (bug 607741 + 250) and it stays there forever. If 800 is a low limit, should we consider increasing it? If the dup() call in our IPC code is the only path hitting it, should we try to raise it lazily when receiving EMFILE? Adding Patrick, who seems to know about socket/file limits.
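The "raise it lazily when receiving EMFILE" idea could look roughly like the sketch below. This is an illustration under assumptions, not a proposed patch: `RaiseSoftFdLimit` and `DupWithLazyRaise` are hypothetical names, the step size is arbitrary, and the descriptor limit is the POSIX RLIMIT_NOFILE resource.

```cpp
#include <cassert>
#include <cerrno>
#include <sys/resource.h>
#include <unistd.h>

// Raises the soft descriptor limit by up to `step`, never past the hard
// limit. Returns true only if the limit was actually raised.
bool RaiseSoftFdLimit(rlim_t step) {
  struct rlimit lim;
  if (getrlimit(RLIMIT_NOFILE, &lim) != 0) {
    return false;
  }
  if (lim.rlim_cur >= lim.rlim_max) {
    return false;  // already at the ceiling; nothing more we may take
  }
  rlim_t headroom = lim.rlim_max - lim.rlim_cur;
  lim.rlim_cur += (step < headroom) ? step : headroom;
  return setrlimit(RLIMIT_NOFILE, &lim) == 0;
}

// dup() that retries once after bumping the limit when the first attempt
// fails with EMFILE. Returns the new descriptor, or -1 with errno set.
int DupWithLazyRaise(int fd, rlim_t step) {
  int copy = dup(fd);
  if (copy < 0 && errno == EMFILE && RaiseSoftFdLimit(step)) {
    copy = dup(fd);
  }
  return copy;
}
```

The trade-off is that this only postpones the failure: once the soft limit reaches the hard limit, the caller still has to cope with dup() returning -1.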

Comment 38

•

11 years ago

I'm a little worried about just increasing the limit. On my Linux system, the hard limit (ulimit -Hn) is 4096. I don't think we're allowed to use more than that without user intervention. If a normal website requires more than 800 file descriptors, and if we can never exceed 4096, then it would be very easy for a badly behaving site to cause lots of problems. Is there a way that the graphics code could use fewer (but larger) textures?

Patrick McManus [:mcmanus]

Comment 39

•

11 years ago

Yeah, maybe we should rather work towards decreasing the number of fds. Here's some discussion/bugs from the Chromium project: https://groups.google.com/a/chromium.org/forum/#!msg/chromium-dev/lFgKOYo3iYM/dWPjnFmn-OgJ https://code.google.com/p/chromium/issues/detail?id=362603 https://code.google.com/p/chromium/issues/detail?id=372339

Comment 40

•

11 years ago

Here's the weird thing about socket numbers - the select() API will basically crash if you pass it an fd with a value > FD_SETSIZE, because the fd value is an offset into a bit array of FD_SETSIZE. On winsock that's 1024, which is controlled deep in NSPR by setting a define before including some Windows headers. Now Gecko is smart enough not to use select(), but history has told us our fds end up in places other than Gecko code: add-ons and especially LSPs (third-party network stack hooks on Windows), and this code does indeed call select() on the fd. Boom. And fds are a process-global resource, of course, so if graphics starts using a large number of them (that never see an LSP, thankfully), that can still result in networking allocating one later with a high value, which finds its way into these processes. So that's the backstory. Easy thing 1 - bump up the 800 number to whatever we can probe the OS for at max on non-Windows. I'm assuming this is a Windows bug though too, right? This won't cost RAM unless we actually use them. Other idea - bump up the NSPR limit. I do worry about the compatibility of this - FD_SETSIZE needs to be #defined universally before using the select() calls, and we have no way to control that inside a third-party piece of code. Our value is technically customized now, but it's a pretty common value. Maybe I worry too much - select() has been out of style for many years now.
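The select() hazard described here comes down to an unchecked index into a fixed-size bit array. A minimal sketch of the guard that well-behaved POSIX code needs (third-party code may not have it) follows; `SafeFdSet` is a hypothetical helper name, while fd_set, FD_SET and FD_SETSIZE are the standard definitions from sys/select.h. Note that the valid range is 0 to FD_SETSIZE - 1, so a descriptor equal to FD_SETSIZE is already out of bounds.

```cpp
#include <cassert>
#include <sys/select.h>

// Adds fd to the set only if it fits in the fixed-size bit array.
// FD_SET with fd >= FD_SETSIZE writes outside the array, which is
// undefined behaviour, so refuse instead.
bool SafeFdSet(int fd, fd_set* set) {
  if (fd < 0 || fd >= FD_SETSIZE) {
    return false;
  }
  FD_SET(fd, set);
  return true;
}
```

This is why a high descriptor count in one subsystem matters to unrelated code: descriptors are numbered per process, so heavy use by graphics pushes later socket descriptors toward or past FD_SETSIZE.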

Milan Sreckovic [:milan] (needinfo for best results)

Reporter

Comment 41

•

11 years ago

(In reply to Patrick McManus [:mcmanus] from comment #40)
> Easy thing 1 - bump up the 800 number to whatever we can probe the OS for at
> max on non-windows. I'm assuming this is a windows bug though too, right?

ttaubert and I can reproduce this crash on OS X. I don't know if it happens on Windows at all.

Comment 42

•

11 years ago

Is it curious that we have this show up under E10S, where we should have more processes and thus more overall file descriptors, or is it because IPC is now cross-process that we use more of them? I'm pretty sure we ran into something similar on B2G and had to up the max number to 1024, so that may be the first thing to try: just go from 800 to 1024 and see if it helps. Also, I wonder if a higher FD limit is just masking a problem. Perhaps increasing that number reduces the chance of us reusing the same file descriptor, and saves us from operations where we may operate on an FD we shouldn't be, or something like that.

John Daggett (:jtd)

Comment 43

•

11 years ago

Any chance of progress on this one? I typically hit this once or twice a day. "A tab crashed" but affects all tabs...

Latest crash: https://crash-stats.mozilla.com/report/index/bp-6e7c4751-4449-4ed8-8b0e-29b462141218

Happened while browsing on zillow.com, zooming out around Petaluma, CA. That'll teach me to think of any place other than Petaluma...

OSX 10.8.5, Retina MacBook Pro
Other tabs included moco gmail, bugzilla

Comment 44

•

11 years ago

Re-nomming because this makes browsing some web properties pretty much impossible with e10s enabled.

tracking-e10s: + → ?

Comment 45

•

11 years ago

[Tracking Requested - why for this release]: regression in 35. Makes e10s terrible for some use cases

tracking-e10s: ? → m5+

tracking-firefox35: --- → ?

Assignee

Updated

•

11 years ago

Assignee: milan → gwright

Assignee

Updated

•

11 years ago

Flags: needinfo?(gwright)

Assignee

Comment 46

•

11 years ago

I'm having trouble reproducing this, having tried both comment 1 and comment 18's steps. This is on OS X with a release build. I've also tried a fresh e10s profile and tested with lots of tabs open. Any other suggestions? I'll keep trying different things for now, but if anyone can come up with a semi-reliable method for reproducing, that'd be great. I'm on a 13" rMBP with Intel graphics running 10.9.5, for what it's worth.

Comment 47

•

11 years ago

So I also just tried reproducing this with no success... It's true I haven't seen this one for a while, so perhaps something else that landed in platform fixed it.

Milan Sreckovic [:milan] (needinfo for best results)

Reporter

Comment 48

•

11 years ago

I haven't seen this bug in a long time. When I filed the bug, I could easily reproduce the crash within 5 seconds, but I can no longer reproduce it. I don't know if this is relevant, but I was running OS X 10.9 when I could reproduce the bug and now I'm running OS X 10.10. We can probably close this bug as WFM.

Status: REOPENED → NEW

Keywords: reproducible

Comment 49

•

11 years ago

George, I'll let you close it as WFM as you're assigned to it.

Reporter

Comment 50

•

11 years ago

WFM!

Status: NEW → RESOLVED

Closed: 11 years ago → 11 years ago

Flags: needinfo?(gwright)

Resolution: --- → WORKSFORME

:Gavin Sharp [email: gavin@gavinsharp.com]

Assignee

Comment 51

•

11 years ago

\o/

Comment 52

•

11 years ago

Just loading Google maps still triggers this for me. I don't see it in a new profile, though. I will try to figure out what in my profile might be triggering it.

Comment 53

•

11 years ago

I also hit this a ton today again. :( George, if you're in the office tomorrow, come find me and I'll reproduce for you.

Status: RESOLVED → REOPENED

Resolution: WORKSFORME → ---

Comment 54

•

11 years ago

(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #52)
> Just loading Google maps still triggers this for me. I don't see it in a new
> profile, though. I will try to figure out what in my profile might be
> triggering it.

Guess the number of tabs in your old profile is a lot higher? See comment #39 and before.

This problem won't just go away because we're not actually crashing, we just run out of fds...

Comment 55

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #54)
> (In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #52)
> > Just loading Google maps still triggers this for me. I don't see it in a new
> > profile, though. I will try to figure out what in my profile might be
> > triggering it.
>
> Guess the number of tabs in your old profile is a lot higher? See comment
> #39 and before.
>
> This problem won't just go away because we're not actually crashing, we just
> run out of fds...

So the cause behind this crash is known / understood then?

I have hit this again, following gw280's request for me to try to reproduce with just my integrated graphics card forced on (via gfxCardStatus). I can reproduce if I have a high number of tabs loaded, and scroll quickly on https://www.mozilla.org/en-US/firefox/os/.

Is this something that just comes with the territory when using a single content process for so many tabs?

Flags: needinfo?(ttaubert)

Comment 56

•

11 years ago

(In reply to Mike Conley (:mconley) - Needinfo me! from comment #55)
> (In reply to Tim Taubert [:ttaubert] from comment #54)
> > (In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #52)
> > This problem won't just go away because we're not actually crashing, we just
> > run out of fds...
>
> So the cause behind this crash is known / understood then?

Yeah, we run out of file descriptors: RLIMIT_NOFILE=800 for Firefox on Mac. This number is much higher if you start with ./mach from iTerm, for example, so it was hard to reproduce at first.

> Is this something that just comes with the territory when using a single
> content process for so many tabs?

It seems that all the graphics layer IPC can cause us to try to use >800 fds and we don't (can't easily?) handle that without crashing. I'm not sure, but we might have 800 fds per child process, so the chance of running into it then might be lower. For power users with a ton of tabs/groups this still might be easy to hit.

Flags: needinfo?(ttaubert)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 57

•

11 years ago

That sounds bad. mconley: thanks for checking that for me. Can you give an idea of what "high number of tabs" actually means? I tried locally with around 15 and couldn't get it to crash, which led to my suspecting the way the code interacts with the NVIDIA gpu (which I don't have).

Comment 58

•

11 years ago

> Is this something that just comes with the territory when using a single
> content process for so many tabs?

Both the parent and the child need an fd for the shared memory region, so the chrome process would presumably still run out of file descriptors even with process-per-tab.

Why do we have so many shared memory regions? My understanding is that we throw away shared memory graphics buffers for tabs that aren't visible. And it seems pretty crazy that we need 800 shared memory regions for one web page. It really sounds like there's some sort of leak or inefficiency here that we need to fix. If that's the case, it might be more related to how long the browser has been running rather than the number of tabs open at the time.

Milan Sreckovic [:milan] (needinfo for best results)

Comment 59

•

11 years ago

Is this any harder to reproduce if you set preferences layers.tile-height and layers.tile-width to 512 instead of 256?

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 60

•

11 years ago

(In reply to Milan Sreckovic [:milan] from comment #59)
> Is this any harder to reproduce if you set preferences layers.tile-height
> and layers.tile-width to 512 instead of 256?

From a quick bit of testing, that does seem to make this crash go away for me. I'll leave those flipped to confirm.

(I certainly have a lot of tabs in this profile, and I usually leave it running for >24 hours.)

Milan Sreckovic [:milan] (needinfo for best results)

Comment 61

•

11 years ago

There is not much of a downside to increasing the tile size on OS X; the 256x256 size was really meant for B2G, and we never followed up on the conversations to make this change. I agree it isn't the actual solution, but if it can make the problem go away, it's an easy start.

There are also conversations about avoiding FD usage for shared textures (there is an alternative on OS X), and about special allocators so that, instead of #FD == #shared textures + other stuff, we would have more than one shared texture per FD, but that's a longer conversation. Another option is to slightly change how videos deal with shared textures.
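To put rough numbers on why a larger tile size helps, here is a small sketch of the arithmetic. It assumes one shared texture (and so one fd) per tile, as the "#FD == #shared textures + other stuff" relation above suggests; the function name TilesNeeded and the layer dimensions in the usage note are made up for illustration.

```cpp
#include <cstdint>

// Hypothetical helper: number of square tiles of side tileSize needed to
// cover a width x height layer, rounding partial tiles up.
uint32_t TilesNeeded(uint32_t width, uint32_t height, uint32_t tileSize) {
  uint32_t columns = (width + tileSize - 1) / tileSize;
  uint32_t rows = (height + tileSize - 1) / tileSize;
  return columns * rows;
}
```

For a hypothetical 2048x2048 layer, TilesNeeded(2048, 2048, 256) is 64 while TilesNeeded(2048, 2048, 512) is 16, so under the one-fd-per-tile assumption the fd count for that layer drops by a factor of four.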

Milan Sreckovic [:milan] (needinfo for best results)

Comment 62

•

11 years ago

George is following up on this.

Assignee

Comment 63

•

11 years ago

Attached patch 0001-Bug-1036682-Increase-the-tile-size-to-512x512-on-non.patch (obsolete) — Details — Splinter Review

Let's increase the tile-size for now, as I think we should do this anyway as per comment 61, to minimise the impact of this crash. We should still work on a solution to gracefully fail when we run out of FDs though.
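As a rough illustration of what failing gracefully could look like at the lowest level, the sketch below distinguishes "out of file descriptors" (EMFILE or ENFILE) from other failures when opening an fd, so a caller could fall back instead of crashing. This is only an assumption about one possible shape of such handling; the FdResult enum and TryOpen function are hypothetical and not part of any patch on this bug.

```cpp
#include <cerrno>
#include <fcntl.h>
#include <unistd.h>

// Hypothetical result type for this sketch.
enum class FdResult { Ok, OutOfFds, OtherError };

// Tries to open `path` read-only. On success stores the descriptor in *outFd.
// EMFILE (per-process limit) and ENFILE (system-wide limit) are reported
// separately so the caller can degrade instead of aborting.
FdResult TryOpen(const char* path, int* outFd) {
  int fd = open(path, O_RDONLY);
  if (fd >= 0) {
    *outFd = fd;
    return FdResult::Ok;
  }
  if (errno == EMFILE || errno == ENFILE) {
    return FdResult::OutOfFds;
  }
  return FdResult::OtherError;
}
```

A caller that gets FdResult::OutOfFds could, for example, skip allocating the new buffer and retry later, rather than treating the failure as fatal.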

Attachment #8555973 - Flags: review?(jmuizelaar)

Comment 64

•

11 years ago

Comment on attachment 8555973 [details] [diff] [review]
0001-Bug-1036682-Increase-the-tile-size-to-512x512-on-non.patch

Review of attachment 8555973 [details] [diff] [review]:
-----------------------------------------------------------------

It would be good to have some justification for this outside of just minimizing memory usage.

::: modules/libpref/init/all.js
@@ +3922,5 @@
> pref("layers.tile-height", 256);
> +#else
> +pref("layers.tile-height", 512);
> +#endif
> +

This only changes the height.

Attachment #8555973 - Flags: review?(jmuizelaar) → review-

Assignee

Comment 65

•

11 years ago

Attached patch 0001-Bug-1036682-Bump-RLIMIT_NOFILE-to-the-hard-ceiling-o.patch (obsolete) — Details — Splinter Review

Attachment #8555973 - Attachment is obsolete: true

Attachment #8558085 - Flags: review?(jmuizelaar)

Comment 66

•

11 years ago

Comment on attachment 8558085 [details] [diff] [review]
0001-Bug-1036682-Bump-RLIMIT_NOFILE-to-the-hard-ceiling-o.patch

Review of attachment 8558085 [details] [diff] [review]:
-----------------------------------------------------------------

Seems reasonable.

Attachment #8558085 - Flags: review?(jmuizelaar) → review+

Assignee

Comment 67

•

11 years ago

So on my machine at least, RLIMIT_NOFILE was 2560 before, and that explains why I was unable to reproduce this bug. The ceiling on my machine is huge (9223372036854775807) so I'm not sure if we actually want to set it to the hard ceiling. Maybe something like min(16384, nofile_max)?
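A minimal sketch of the min(16384, nofile_max) idea, assuming the goal is only ever to raise the soft limit and never to lower it. The function names CappedFdLimit and RaiseFdSoftLimit are hypothetical, and this is not the code from the attached patch.

```cpp
#include <sys/resource.h>
#include <algorithm>

// Hypothetical helper: the soft limit we would ask for, given the hard limit
// and a cap. A huge hard limit (such as RLIM_INFINITY) collapses to the cap.
rlim_t CappedFdLimit(rlim_t hardLimit, rlim_t cap) {
  return std::min(cap, hardLimit);
}

// Raises the soft RLIMIT_NOFILE to min(cap, hard limit). Leaves the limit
// alone if it is already at least that high. Returns true on success.
bool RaiseFdSoftLimit(rlim_t cap) {
  struct rlimit limits;
  if (getrlimit(RLIMIT_NOFILE, &limits) != 0) {
    return false;
  }
  rlim_t target = CappedFdLimit(limits.rlim_max, cap);
  if (target <= limits.rlim_cur) {
    return true;  // Already high enough; never lower the limit.
  }
  limits.rlim_cur = target;
  return setrlimit(RLIMIT_NOFILE, &limits) == 0;
}
```

With the hard ceiling quoted above (9223372036854775807), CappedFdLimit returns 16384; with a hard limit of 4096 it returns 4096. Whether setrlimit accepts the resulting value is up to the OS.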

Assignee

Comment 68

•

11 years ago

Attached patch 0001-First-attempt-at-a-spinner-for-excessively-long-page.patch (obsolete) — Details — Splinter Review

Take 2, with a new cap imposed of 16k

Attachment #8558085 - Attachment is obsolete: true

Attachment #8558096 - Flags: review?(jmuizelaar)

Comment 69

•

11 years ago

Comment on attachment 8558096 [details] [diff] [review]
0001-First-attempt-at-a-spinner-for-excessively-long-page.patch

Review of attachment 8558096 [details] [diff] [review]:
-----------------------------------------------------------------

Wrong patch.

Attachment #8558096 - Flags: review?(jmuizelaar) → review-

Assignee

Comment 70

•

11 years ago

Attached patch 0001-Bug-1036682-Bump-RLIMIT_NOFILE-to-the-hard-ceiling-o.patch — Details — Splinter Review

Nice job, me.

Attachment #8558096 - Attachment is obsolete: true

Attachment #8558098 - Flags: review?(jmuizelaar)

Comment 71

•

11 years ago

Comment on attachment 8558098 [details] [diff] [review]
0001-Bug-1036682-Bump-RLIMIT_NOFILE-to-the-hard-ceiling-o.patch

Review of attachment 8558098 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/thebes/gfxPlatformMac.cpp
@@ +87,5 @@
> + // buffered tiles in e10s, so let's bump the soft limit to the hard limit for the OS
> + // up to a new cap of 16k
> + struct rlimit limits;
> + if (getrlimit(RLIMIT_NOFILE, &limits) == 0) {
> + limits.rlim_cur = std::min((rlim_t)16384, limits.rlim_max);

How about rlim_t(16384) instead?

Attachment #8558098 - Flags: review?(jmuizelaar) → review+

https://hg.mozilla.org/integration/mozilla-inbound/rev/7d4064c36f78

Assignee

Comment 72

•

11 years ago

Carsten Book [:Tomcat]

Comment 73

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/7d4064c36f78

Assignee

Comment 74

•

11 years ago

I'm going to go ahead and resolve this now, and if we want to reduce the number of fds used by tiles we should create another bug.

Status: REOPENED → RESOLVED

Closed: 11 years ago → 11 years ago

Resolution: --- → FIXED

Comment 75

•

11 years ago

So when I run this in my local build, I get:

WARNING: Unable to bump RLIMIT_NOFILE to the maximum number on this OS: file ../../../mozilla/gfx/thebes/gfxPlatformMac.cpp, line 93

which presumably means the patch is not having any actual effect. Do we have any actual indication that this patch ever successfully raises the RLIMIT_NOFILE value?

Flags: needinfo?(gwright)

Comment 76

•

11 years ago

And in particular, it's failing with EINVAL. Looking around a bit, on recent Mac you can't set RLIMIT_NOFILE to anything larger than OPEN_MAX (e.g. see the "compatibility" bit in "man setrlimit" or <https://developer.apple.com/library/mac/documentation/Darwin/Reference/ManPages/man2/setrlimit.2.html>). Over here, OPEN_MAX seems to be 10240.
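A sketch of how the request could be clamped to OPEN_MAX so that setrlimit does not fail with EINVAL, and how the errno could be logged if it still fails. This is an illustration under assumptions, not the patch attached to this bug: ClampToOpenMax and RaiseFdSoftLimitToCeiling are hypothetical names, and the #ifdef is there because not every platform's <limits.h> defines OPEN_MAX.

```cpp
#include <limits.h>
#include <sys/resource.h>
#include <algorithm>
#include <cerrno>
#include <cstdio>
#include <cstring>

// Hypothetical helper: limits `wanted` to OPEN_MAX where the platform
// defines it, and returns `wanted` unchanged elsewhere.
rlim_t ClampToOpenMax(rlim_t wanted) {
#ifdef OPEN_MAX
  return std::min(wanted, static_cast<rlim_t>(OPEN_MAX));
#else
  return wanted;
#endif
}

// Raises the soft RLIMIT_NOFILE toward the hard limit, clamped to OPEN_MAX.
// Logs the errno text on failure so EINVAL is visible. Returns true on success.
bool RaiseFdSoftLimitToCeiling() {
  struct rlimit limits;
  if (getrlimit(RLIMIT_NOFILE, &limits) != 0) {
    return false;
  }
  rlim_t target = ClampToOpenMax(limits.rlim_max);
  if (target <= limits.rlim_cur) {
    return true;  // Nothing to raise.
  }
  limits.rlim_cur = target;
  if (setrlimit(RLIMIT_NOFILE, &limits) != 0) {
    std::fprintf(stderr, "setrlimit(RLIMIT_NOFILE) failed: %s\n",
                 std::strerror(errno));
    return false;
  }
  return true;
}
```

With OPEN_MAX at 10240 as reported here, a hard limit of 9223372036854775807 would be clamped to 10240 before the setrlimit call.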

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Comment 77

•

11 years ago

Also, did the bug on increasing tile size ever get filed?

Assignee

Comment 78

•

11 years ago

That's odd, because it definitely increased it on my machine (10.9); I called getrlimit afterwards and checked the value had gone up, and it most certainly did. I'll prepare a patch that accounts for OPEN_MAX.

Flags: needinfo?(gwright)

James Willcox (:snorp) (jwillcox@mozilla.com) (he/him)

Assignee

Comment 79

•

11 years ago

Attached patch OPEN_MAX-setrlimit.patch — Details — Splinter Review

Flagging snorp as the usual suspects are on a plane or something

Attachment #8560618 - Flags: review?(snorp)

Comment 80

•

11 years ago

Comment on attachment 8560618 [details] [diff] [review]
OPEN_MAX-setrlimit.patch

Review of attachment 8560618 [details] [diff] [review]:
-----------------------------------------------------------------

r+ with nit

::: gfx/thebes/gfxPlatformMac.cpp
@@ +88,1 @@
> // up to a new cap of 16k

We're not setting it to 16k anymore.

Attachment #8560618 - Flags: review?(snorp) → review+

Assignee

Comment 81

•

11 years ago

(In reply to Boris Zbarsky [:bz] from comment #77)
> Also, did the bug on increasing tile size ever get filed?

It did not. I have filed bug 1130545 now.

https://hg.mozilla.org/integration/mozilla-inbound/rev/2e998e2012dd

Assignee

Comment 82

•

11 years ago

Phil Ringnalda (:philor)

Comment 83

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/2e998e2012dd

Assignee

Comment 84

•

10 years ago

Boris, are you still seeing the issue where the number of fds isn't increased?

Flags: needinfo?(bzbarsky)

Comment 85

•

10 years ago

It's working correctly now.

Flags: needinfo?(bzbarsky)