Closed Bug 1295272 Opened 4 years ago Closed 3 years ago

[e10s] Refuse drag and drop when there isn't enough memory available to complete the operation

Categories

(Core :: DOM: Drag & Drop, defect)

50 Branch
x86_64
Windows 7
defect
Not set

Tracking

()

RESOLVED FIXED
mozilla53
Tracking Status
e10s + ---
firefox49 --- unaffected
firefox50 + wontfix
firefox51 + wontfix
firefox52 + verified
firefox53 --- verified

People

(Reporter: epinal99-bugzilla2, Assigned: eflores)

References

Details

(Keywords: feature, Whiteboard: [MemShrink:P2])

Attachments

(3 files)

It's a follow-up for the remaining issue of bug 1171307, partially fixed by bug 1272018.

STR: Be sure e10s is enabled
1) Open http://www.imagebam.com/image/b05156413493542
2) Drag the image and drop onto the image
3) Repeat 5-6 times

Result: huge memory increase leading to freeze the computer.

Before bug 1272018, STR lead to crash Firefox (bug 1171307).
Depends on: 1272018, 1171307
Flags: needinfo?(cyu)
Keywords: regression
Whiteboard: [MemShrink]
Attached file memory.txt
Just freeze
I'll look at this.
Assignee: nobody → cyu
Flags: needinfo?(cyu)
Memory is allocated in 
#0  je_malloc (size=294951183) at memory/mozjemalloc/jemalloc.c:6374
#1  0x00007fffe7535668 in mozilla::gfx::AlignedArray<unsigned char, 16>::Realloc (aZero=false, aCount=294951168, this=0x7fffb728fdc8) at gfx/2d/Tools.h:181
#2  mozilla::gfx::SourceSurfaceAlignedRawData::Init (this=this@entry=0x7fffb728fda0, aSize=..., aFormat=aFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8X8, aClearMem=aClearMem@entry=true, aClearValue=aClearValue@entry=255 '\377', aStride=aStride@entry=0) at gfx/2d/SourceSurfaceRawData.cpp:66
#3  0x00007fffe74d7879 in mozilla::gfx::Factory::CreateDataSourceSurface (aSize=..., aFormat=aFormat@entry=mozilla::gfx::SurfaceFormat::B8G8R8X8, aZero=aZero@entry=false) at gfx/2d/Factory.cpp:836
#4  0x00007fffe7501cfa in mozilla::gfx::CreateDataSourceSurfaceFromData (aSize=..., aFormat=<optimized out>, aData=<optimized out>, aDataStride=aDataStride@entry=30048) at gfx/2d/DataSurfaceHelpers.cpp:30
#5  0x00007fffe78a9b67 in nsContentUtils::DataTransferItemToImage (aItem=..., aContainer=aContainer@entry=0x7fffffffae30) at dom/base/nsContentUtils.cpp:7505
#6  0x00007fffe9099b65 in mozilla::dom::TabParent::AddInitialDnDDataTo (this=0x7fffc0074000, aDataTransfer=aDataTransfer@entry=0x7fffb7153f60) at dom/ipc/TabParent.cpp:3340
#7  0x00007fffe79fffb8 in DragDataProducer::Produce (this=this@entry=0x7fffffffb350, aDataTransfer=aDataTransfer@entry=0x7fffb7153f60, aCanDrag=aCanDrag@entry=0x7fffffffb4af, aSelection=aSelection@entry=0x7fffffffb5c0, aDragNode=aDragNode@entry=0x7fffffffb4b0) at dom/base/nsContentAreaDragDrop.cpp:430
#8  0x00007fffe7a0127e in nsContentAreaDragDrop::GetDragData (aWindow=<optimized out>, aTarget=<optimized out>, aSelectionTargetNode=aSelectionTargetNode@entry=0x7fffc7437430, aIsAltKeyPressed=<optimized out>, aDataTransfer=aDataTransfer@entry=0x7fffb7153f60, aCanDrag=aCanDrag@entry=0x7fffffffb4af, aSelection=0x7fffffffb5c0, aDragNode=0x7fffffffb4b0) at dom/base/nsContentAreaDragDrop.cpp:128
#9  0x00007fffe8845756 in mozilla::EventStateManager::DetermineDragTargetAndDefaultData (this=this@entry=0x7fffcecd5970, aWindow=<optimized out>, aSelectionTarget=0x7fffc7437430, aDataTransfer=aDataTransfer@entry=0x7fffb7153f60, aSelection=aSelection@entry=0x7fffffffb5c0, aTargetNode=0x7fffffffb5e0) at dom/events/EventStateManager.cpp:1846
#10 0x00007fffe885020f in mozilla::EventStateManager::GenerateDragGesture (this=this@entry=0x7fffcecd5970, aPresContext=aPresContext@entry=0x7fffd0e15800, aEvent=aEvent@entry=0x7fffffffbe60) at dom/events/EventStateManager.cpp:1748
#11 0x00007fffe88510f1 in mozilla::EventStateManager::PreHandleEvent (this=this@entry=0x7fffcecd5970, aPresContext=0x7fffd0e15800, aEvent=aEvent@entry=0x7fffffffbe60, aTargetFrame=0x7fffc3e37d98, aTargetContent=<optimized out>, aStatus=aStatus@entry=0x7fffffffbd44) at dom/events/EventStateManager.cpp:697
#12 0x00007fffe980dd74 in PresShell::HandleEventInternal (this=this@entry=0x7fffd0e53c00, aEvent=aEvent@entry=0x7fffffffbe60, aStatus=aStatus@entry=0x7fffffffbd44, aIsHandlingNativeEvent=aIsHandlingNativeEvent@entry=true) at layout/base/nsPresShell.cpp:8486
#13 0x00007fffe980e5a2 in PresShell::HandlePositionedEvent (this=this@entry=0x7fffd0e53c00, aTargetFrame=aTargetFrame@entry=0x7fffc3e37d98, aEvent=aEvent@entry=0x7fffffffbe60, aEventStatus=aEventStatus@entry=0x7fffffffbd44) at layout/base/nsPresShell.cpp:8327
#14 0x00007fffe9810898 in PresShell::HandleEvent (this=<optimized out>, aFrame=<optimized out>, aEvent=<optimized out>, aDontRetargetEvents=<optimized out>, aEventStatus=0x7fffffffbd44, aTargetContent=0x0) at layout/base/nsPresShell.cpp:8114
#15 0x00007fffe93945a2 in nsViewManager::DispatchEvent (this=<optimized out>, aEvent=aEvent@entry=0x7fffffffbe60, aView=<optimized out>, aStatus=aStatus@entry=0x7fffffffbd44) at view/nsViewManager.cpp:816
#16 0x00007fffe9774b50 in PresShell::DispatchSynthMouseMove (this=0x7fffd0e53c00, aEvent=0x7fffffffbe60, aFlushOnHoverChange=<optimized out>) at layout/base/nsPresShell.cpp:3707
#17 0x00007fffe97da30f in PresShell::ProcessSynthMouseMoveEvent (this=this@entry=0x7fffd0e53c00, aFromScroll=<optimized out>) at layout/base/nsPresShell.cpp:5867
#18 0x00007fffe981cf19 in PresShell::nsSynthMouseMoveEvent::WillRefresh (this=0x7fffb3735490, aTime=...) at layout/base/nsPresShell.h:681
#19 0x00007fffe96c9116 in nsRefreshDriver::Tick (this=this@entry=0x7fffd0e52c00, aNowEpoch=aNowEpoch@entry=1471418580362269, aNowTime=...) at layout/base/nsRefreshDriver.cpp:1699
#20 0x00007fffe96d1966 in mozilla::RefreshDriverTimer::TickDriver (now=..., jsnow=1471418580362269, driver=0x7fffd0e52c00) at layout/base/nsRefreshDriver.cpp:275
#21 mozilla::RefreshDriverTimer::TickRefreshDrivers (aJsNow=aJsNow@entry=1471418580362269, aNow=aNow@entry=..., aDrivers=..., this=0x7fffcd19be20) at layout/base/nsRefreshDriver.cpp:247
#22 0x00007fffe96d1a0b in mozilla::RefreshDriverTimer::Tick (this=0x7fffcd19be20, jsnow=1471418580362269, now=...) at layout/base/nsRefreshDriver.cpp:266
#23 0x00007fffe96d1c42 in mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers (aTimeStamp=..., this=0x7fffcd19be20) at layout/base/nsRefreshDriver.cpp:589
#24 mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver (this=<optimized out>, aVsyncTimestamp=...) at layout/base/nsRefreshDriver.cpp:509
#25 0x00007fffe96caa06 in mozilla::detail::RunnableMethodArguments<mozilla::TimeStamp>::applyImpl<mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver, void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp), StoreCopyPassByValue<mozilla::TimeStamp>, 0ul> (args=..., m=<optimized out>, o=<optimized out>) at objdir-nightly/dist/include/nsThreadUtils.h:729
#26 mozilla::detail::RunnableMethodArguments<mozilla::TimeStamp>::apply<mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver, void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp)> (m=<optimized out>, o=<optimized out>, this=<optimized out>) at objdir-nightly/dist/include/nsThreadUtils.h:736
#27 mozilla::detail::RunnableMethodImpl<void (mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::*)(mozilla::TimeStamp), true, false, mozilla::TimeStamp>::Run (this=<optimized out>) at objdir-nightly/dist/include/nsThreadUtils.h:764
#28 0x00007fffe6373c63 in nsThread::ProcessNextEvent (this=0x7ffff6bbf900, aMayWait=<optimized out>, aResult=0x7fffffffc3a7) at xpcom/threads/nsThread.cpp:1058
#29 0x00007fffe63b85cc in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at xpcom/glue/nsThreadUtils.cpp:290
#30 0x00007fffe699a622 in mozilla::ipc::MessagePump::Run (this=0x7fffe2e75040, aDelegate=0x7ffff6badd50) at ipc/glue/MessagePump.cpp:96
#31 0x00007fffe693820b in MessageLoop::RunInternal (this=0x7ffff6badd50) at ipc/chromium/src/base/message_loop.cc:232
#32 0x00007fffe69382b7 in MessageLoop::RunHandler (this=<optimized out>) at ipc/chromium/src/base/message_loop.cc:225
#33 MessageLoop::Run (this=<optimized out>) at ipc/chromium/src/base/message_loop.cc:205
#34 0x00007fffe93cca29 in nsBaseAppShell::Run (this=0x7fffd5a07320) at widget/nsBaseAppShell.cpp:156
#35 0x00007fffea15fb12 in nsAppStartup::Run (this=0x7fffd5a0f290) at toolkit/components/startup/nsAppStartup.cpp:284
#36 0x00007fffea20bbfd in XREMain::XRE_mainRun (this=this@entry=0x7fffffffc7b0) at toolkit/xre/nsAppRunner.cpp:4302
#37 0x00007fffea20ce3a in XREMain::XRE_main (this=this@entry=0x7fffffffc7b0, argc=argc@entry=4, argv=argv@entry=0x7fffffffdbd8, aAppData=aAppData@entry=0x7fffffffc9f0) at toolkit/xre/nsAppRunner.cpp:4429
#38 0x00007fffea20d243 in XRE_main (argc=4, argv=0x7fffffffdbd8, aAppData=0x7fffffffc9f0, aFlags=<optimized out>) at toolkit/xre/nsAppRunner.cpp:4520
#39 0x0000000000405d26 in do_main (argc=argc@entry=4, argv=argv@entry=0x7fffffffdbd8, envp=envp@entry=0x7fffffffdc00, xreDirectory=0x7ffff6b61b40) at browser/app/nsBrowserApp.cpp:259
#40 0x0000000000405354 in main (argc=4, argv=0x7fffffffdbd8, envp=0x7fffffffdc00) at browser/app/nsBrowserApp.cpp:392
(gdb) c
Continuing.

But the SourceSurfaceRawData instance is deallocated in CC. If we drag the image several times before CC is triggered, it can create temporary high memory usage to freeze the system.

Thread 1 "firefox" hit Breakpoint 3, chunk_dealloc (chunk=0x7fff84400000, size=295698432) at memory/mozjemalloc/jemalloc.c:3148
3148    {
(gdb) bt
#0  chunk_dealloc (chunk=0x7fff84400000, size=295698432) at memory/mozjemalloc/jemalloc.c:3148
#1  0x00000000004215b9 in huge_dalloc (ptr=ptr@entry=0x7fff84400000) at memory/mozjemalloc/jemalloc.c:5403
#2  0x000000000042494c in je_free (ptr=0x7fff84400000) at memory/mozjemalloc/jemalloc.c:6694
#3  0x00007fffe74d9f8e in mozilla::gfx::AlignedArray<unsigned char, 16>::Dealloc (this=0x7fffb728fdc8, this=0x7fffb728fdc8) at gfx/2d/Tools.h:158
#4  mozilla::gfx::AlignedArray<unsigned char, 16>::~AlignedArray (this=0x7fffb728fdc8, __in_chrg=<optimized out>) at gfx/2d/Tools.h:137
#5  mozilla::gfx::SourceSurfaceAlignedRawData::~SourceSurfaceAlignedRawData (this=0x7fffb728fda0, __in_chrg=<optimized out>) at gfx/2d/SourceSurfaceRawData.h:112
#6  mozilla::gfx::SourceSurfaceAlignedRawData::~SourceSurfaceAlignedRawData (this=0x7fffb728fda0, __in_chrg=<optimized out>) at gfx/2d/SourceSurfaceRawData.h:114
#7  0x00007fffe7726ef0 in mozilla::detail::RefCounted<mozilla::gfx::SourceSurface, (mozilla::detail::RefCountAtomicity)0>::Release (this=0x7fffb728fda8) at objdir-nightly/dist/include/mozilla/RefCounted.h:135
#8  mozilla::RefPtrTraits<mozilla::gfx::SourceSurface>::Release (aPtr=0x7fffb728fda0) at objdir-nightly/dist/include/mozilla/RefPtr.h:40
#9  RefPtr<mozilla::gfx::SourceSurface>::ConstRemovingRefPtrTraits<mozilla::gfx::SourceSurface>::Release (aPtr=0x7fffb728fda0) at objdir-nightly/dist/include/mozilla/RefPtr.h:387
#10 RefPtr<mozilla::gfx::SourceSurface>::~RefPtr (this=0x7fffb378ce80, __in_chrg=<optimized out>) at objdir-nightly/dist/include/mozilla/RefPtr.h:78
#11 gfxSurfaceDrawable::~gfxSurfaceDrawable (this=0x7fffb378ce60, __in_chrg=<optimized out>) at gfx/thebes/gfxDrawable.h:75
#12 gfxSurfaceDrawable::~gfxSurfaceDrawable (this=0x7fffb378ce60, __in_chrg=<optimized out>) at gfx/thebes/gfxDrawable.h:75
#13 0x00007fffe77c21b6 in gfxDrawable::Release (this=0x7fffb378ce60) at gfx/thebes/gfxDrawable.h:23
#14 0x00007fffe7816ee3 in mozilla::RefPtrTraits<gfxDrawable>::Release (aPtr=<optimized out>) at objdir-nightly/dist/include/mozilla/RefPtr.h:40
#15 RefPtr<gfxDrawable>::ConstRemovingRefPtrTraits<gfxDrawable>::Release (aPtr=<optimized out>) at objdir-nightly/dist/include/mozilla/RefPtr.h:387
#16 RefPtr<gfxDrawable>::~RefPtr (this=0x7fffb7126358, __in_chrg=<optimized out>) at objdir-nightly/dist/include/mozilla/RefPtr.h:78
#17 mozilla::image::DynamicImage::~DynamicImage (this=0x7fffb7126340, __in_chrg=<optimized out>) at image/DynamicImage.h:67
#18 mozilla::image::DynamicImage::~DynamicImage (this=0x7fffb7126340, __in_chrg=<optimized out>) at image/DynamicImage.h:67
#19 mozilla::image::DynamicImage::Release (this=0x7fffb7126340) at image/DynamicImage.cpp:114
#20 0x00007fffe6320fec in nsDiscriminatedUnion::Cleanup (this=this@entry=0x7fffb716cfb8) at xpcom/ds/nsVariant.cpp:1618
#21 0x00007fffe6323769 in nsVariantCC::cycleCollection::Unlink (this=<optimized out>, p=0x7fffb716cfb0) at xpcom/ds/nsVariant.cpp:2220
#22 0x00007fffe62e208c in nsCycleCollector::CollectWhite (this=this@entry=0x7fffe1643000) at xpcom/base/nsCycleCollector.cpp:3339
#23 0x00007fffe62e9f7b in nsCycleCollector::Collect (this=0x7fffe1643000, aCCType=ManualCC, aBudget=..., aManualListener=0x0, aPreferShorterSlices=false) at xpcom/base/nsCycleCollector.cpp:3685
#24 0x00007fffe62eab6d in nsCycleCollector_collect (aManualListener=aManualListener@entry=0x0) at xpcom/base/nsCycleCollector.cpp:4156
#25 0x00007fffe7a62036 in nsJSContext::CycleCollectNow (aListener=aListener@entry=0x0, aExtraForgetSkippableCalls=aExtraForgetSkippableCalls@entry=0) at dom/base/nsJSEnvironment.cpp:1420
#26 0x00007fffe7a63599 in nsJSContext::CycleCollectNow (aExtraForgetSkippableCalls=0, aListener=0x0) at objdir-nightly/dist/include/nsTString.h:452
#27 nsJSEnvironmentObserver::Observe (this=<optimized out>, aSubject=<optimized out>, aTopic=<optimized out>, aData=<optimized out>) at dom/base/nsJSEnvironment.cpp:338
#28 0x00007fffe63272d6 in nsObserverList::NotifyObservers (this=<optimized out>, aSubject=aSubject@entry=0x0, aTopic=aTopic@entry=0x7fffeb8ee6d9 "memory-pressure", someData=someData@entry=0x7fffeba1c39c u"heap-minimize") at xpcom/ds/nsObserverList.cpp:112
#29 0x00007fffe63273a8 in nsObserverService::NotifyObservers (this=0x7fffe2e9f520, aSubject=0x0, aTopic=0x7fffeb8ee6d9 "memory-pressure", aSomeData=0x7fffeba1c39c u"heap-minimize") at xpcom/ds/nsObserverService.cpp:305
#30 0x00007fffe62fc66b in (anonymous namespace)::MinimizeMemoryUsageRunnable::Run (this=0x7fffb7d579a0) at xpcom/base/nsMemoryReporterManager.cpp:2544
#31 0x00007fffe6373c63 in nsThread::ProcessNextEvent (this=0x7ffff6bbf900, aMayWait=<optimized out>, aResult=0x7fffffffc3a7) at xpcom/threads/nsThread.cpp:1058
#32 0x00007fffe63b85cc in NS_ProcessNextEvent (aThread=<optimized out>, aMayWait=<optimized out>) at xpcom/glue/nsThreadUtils.cpp:290
#33 0x00007fffe699a622 in mozilla::ipc::MessagePump::Run (this=0x7fffe2e75040, aDelegate=0x7ffff6badd50) at ipc/glue/MessagePump.cpp:96
#34 0x00007fffe693820b in MessageLoop::RunInternal (this=0x7ffff6badd50) at ipc/chromium/src/base/message_loop.cc:232
#35 0x00007fffe69382b7 in MessageLoop::RunHandler (this=<optimized out>) at ipc/chromium/src/base/message_loop.cc:225
#36 MessageLoop::Run (this=<optimized out>) at ipc/chromium/src/base/message_loop.cc:205
#37 0x00007fffe93cca29 in nsBaseAppShell::Run (this=0x7fffd5a07320) at widget/nsBaseAppShell.cpp:156
#38 0x00007fffea15fb12 in nsAppStartup::Run (this=0x7fffd5a0f290) at toolkit/components/startup/nsAppStartup.cpp:284
#39 0x00007fffea20bbfd in XREMain::XRE_mainRun (this=this@entry=0x7fffffffc7b0) at toolkit/xre/nsAppRunner.cpp:4302
#40 0x00007fffea20ce3a in XREMain::XRE_main (this=this@entry=0x7fffffffc7b0, argc=argc@entry=4, argv=argv@entry=0x7fffffffdbd8, aAppData=aAppData@entry=0x7fffffffc9f0) at toolkit/xre/nsAppRunner.cpp:4429
#41 0x00007fffea20d243 in XRE_main (argc=4, argv=0x7fffffffdbd8, aAppData=0x7fffffffc9f0, aFlags=<optimized out>) at toolkit/xre/nsAppRunner.cpp:4520
#42 0x0000000000405d26 in do_main (argc=argc@entry=4, argv=argv@entry=0x7fffffffdbd8, envp=envp@entry=0x7fffffffdc00, xreDirectory=0x7ffff6b61b40) at browser/app/nsBrowserApp.cpp:259
#43 0x0000000000405354 in main (argc=4, argv=0x7fffffffdbd8, envp=0x7fffffffdc00) at browser/app/nsBrowserApp.cpp:392
(gdb)
Unassign from the bug since the fix looks to be in gfx.
Assignee: cyu → nobody
Milan, are you aware of this bug?
Flags: needinfo?(milan)
I am now :)

Just to understand better - there is no actual memory leak, but the way drag and drop now works, we can ask for too much memory and overwhelm the system before CC kicks in.  In the past, this workflow would just crash, but since we started using shared memory for drag and drop (bug 1272018), this happens instead.  Is that about right?

Does the frozen system ever come back?  Like, if you give it 10 minutes, does it right itself up?

Not quite sure what we can do in graphics here, it seems like things are behaving as drag and drop is asking for?
Flags: needinfo?(milan)
Cervantes, can you field some of Milan's questions from comment 6?
Flags: needinfo?(cyu)
(In reply to Milan Sreckovic [:milan] from comment #6)
> I am now :)
> 
> Just to understand better - there is no actual memory leak, but the way drag
> and drop now works, we can ask for too much memory and overwhelm the system
> before CC kicks in.  In the past, this workflow would just crash, but since
> we started using shared memory for drag and drop (bug 1272018), this happens
> instead.  Is that about right?
> 
Yes, the workflow crashes the tab on 48 because we fail the transmission in:
http://hg.mozilla.org/releases/mozilla-release/file/tip/ipc/chromium/src/chrome/common/ipc_channel_win.cc#l361

And the use of shared memory prevents the crash so we have the chance to observe this problem.

> Does the frozen system ever come back?  Like, if you give it 10 minutes,
> does it right itself up?
> 
Tested on my laptop with 16GB of memory (this is my only Windows workstation):
32 bit: the working set of the browser process ramps up the ~1.9 GB and stays the same during test. Working set drops back to ~450 MB after the test stops. No slowdown or freeze observed because there is still lots of free memory. I think CC kicks in to free the memory.
64 bit: the working set can ramp up so much as to use up all physical memory. The system becomes less responsive, freezes a little, but memory usage goes back down quickly.
Flags: needinfo?(cyu)
Tracked since it's a recent regression in Fx50.
Flags: needinfo?(milan)
[Tracking Requested - why for this release]:

[Tracking Requested - why for this release]:

A few notes.  I don't think we should fix this by going back to the original behaviour.  The standard case is better, we can handle larger images, and what used to be a crash even in normal cases is now a freeze when stress tested beyond hardware's capability.

To really resolve this fully, we would need some UX direction. We could track the amount of memory, see that a drag/drop would get us beyond comfortable range, and refuse to perform the operation, perhaps notifying the user.  Or we could try delaying the operation until the memory usage drops down; either way, the fix would be to not do what the test is doing.

I would probably lean towards the first one - just refuse to do the operation, and refuse it fairly quietly (something in the browser console is good enough.)  The user would just repeat the operation and by then the memory usage would go down.  But that makes this bug a "feature", and not really a regression - a more frequent crash is now a less freeze, often temporary.
Flags: needinfo?(milan)
Keywords: regressionfeature
Summary: [e10s] Huge memory use leading to freeze computer when using drag and drop with a large image → [e10s] Refuse drag and drop when there isn't enough memory available to complete the operation
Is there a good reason for the thing (whatever it is) that retains this large amount of memory to be cycle collected? Can we have it eagerly release the image as soon as the DnD ends rather than waiting for the whole thing to be cycle collected?
Neil, is this your area of expertise? What do you think of comment 12?
Flags: needinfo?(enndeakin)
Whiteboard: [MemShrink] → [MemShrink:P2]
What is keeping the data alive past the end of the drag?
Flags: needinfo?(enndeakin)
(In reply to Neil Deakin from comment #14)
> What is keeping the data alive past the end of the drag?

From comment 6 (and comment 12) it sounds like it's intended to be cycle collected, but CC doesn't kick in quick enough for the amount of data we're leaving around. If the question is what cycle it's in, I guess Milan could elaborate.
I believe the cycle collection decision was made in bug 1210591 - perhaps that was OK when we weren't using shared memory, but we should do something different now?

For those that can reproduce, what happens if you change https://dxr.mozilla.org/mozilla-central/source/dom/ipc/TabParent.cpp#3320 from nsVariantCC to nsVariant?
Flags: needinfo?(cyu)
Flags: needinfo?(continuation)
Changing nsVariantCC to nsVariant will only make things stay alive for longer.
Flags: needinfo?(continuation)
(In reply to Andrew McCreight [:mccr8] from comment #17)
> Changing nsVariantCC to nsVariant will only make things stay alive for
> longer.

That would be a wrong thing to try then :)
Flags: needinfo?(cyu)
Resetting the flags from Comment 10.
Hi :milan,
Is there anything you can help here?
Flags: needinfo?(milan)
Outside of the bug 1301301 perhaps improving the cycle collection times in this scenario as well (I don't really know one way or another), and not knowing much about CC in the first place, the only idea I have is from comment 11.  Mike, do you think UX needs to get involved, or is this simple enough of a workflow (drag and drop fails because low memory situation message in the browser console) to just run with it?
Flags: needinfo?(milan) → needinfo?(mconley)
See Also: → 1301301
Bug 1301301 isn't related. That is about a cycle collection taking too long to run, while in this bug the problem is that we don't cycle collect fast enough.

What would work better is for there to be some specific point in the code where we could say "okay, we're done with this drag and drop data!" and then destroy it, rather than waiting on cycle collection to run. Or at least dropping the giant data buffers it holds onto. I don't know enough about the lifetime of the drag and drop data to say if that is possible.
See Also: 1301301
> while in this bug the problem is that we don't cycle collect fast enough.
What I mean, is that we don't trigger a CC soon enough.
(In reply to Andrew McCreight [:mccr8] from comment #22)
> What would work better is for there to be some specific point in the code
> where we could say "okay, we're done with this drag and drop data!" and then
> destroy it, rather than waiting on cycle collection to run. Or at least
> dropping the giant data buffers it holds onto. I don't know enough about the
> lifetime of the drag and drop data to say if that is possible.

The operating system no longer needs the drag data at the point when nsBaseDragService::EndDragSession(true) is called.
(In reply to Milan Sreckovic [:milan] from comment #21)
> Outside of the bug 1301301 perhaps improving the cycle collection times in
> this scenario as well (I don't really know one way or another), and not
> knowing much about CC in the first place, the only idea I have is from
> comment 11.  Mike, do you think UX needs to get involved, or is this simple
> enough of a workflow (drag and drop fails because low memory situation
> message in the browser console) to just run with it?

Reading this bug, I don't believe UX needs to get involved. It sounds to me like we just need to free up some memory as soon as the drag has ended (comment 22), and it sounds like that point is when nsBaseDragService::EndDragSession(true) is called (comment 24).
Flags: needinfo?(mconley)
Assignee: nobody → edwin
Hi Edwin, Milan, is this something that we might have a fix for soon? Or is it already too late for 50?
Flags: needinfo?(milan)
Flags: needinfo?(edwin)
Just to understand what's going on:
1. We have a local image on the "source" side.
2. We copy it into shared memory
3. We send the shared memory info over IPC
4. On the "destination" side, we copy out of shared memory into a local image

It looks like we're asking for shared memory to get deallocated, so this is a question of deallocating the local image on the destination side sooner, not with CC, but explicitly, after EndDrawSession happens?
Haven't looked at this yet; it will miss 50.
Flags: needinfo?(edwin)
Status: NEW → ASSIGNED
Attached patch 1295272.patchSplinter Review
There isn't much we can do about the cycle-collected objects themselves here (nsIVariant, DataTransfer*).

This patch just clears every nsIVariant contained by a KIND_OTHER DataTransferItem in EndDragSession.
Attachment #8811337 - Flags: review?(enndeakin)
Comment on attachment 8811337 [details] [diff] [review]
1295272.patch

>+      nsCOMPtr<nsIVariant> variant = item->DataNoSecurityCheck();

Since DataNoSecurityCheck() calls FillInExternalData this will retrieve drag data from another application only to immediately delete it. However, since we only care here about drags from Firefox itself, we can return early at the beginning of DiscardInternalTransferData when mSourceNode is null. As in:

  if (mDataTransfer && mSourceNode) {

> DiscardInternalTransferData();

Do we need to do this for both parent and child processes?
Attachment #8811337 - Flags: review?(enndeakin) → review+
(In reply to Neil Deakin from comment #31)
> > DiscardInternalTransferData();
> 
> Do we need to do this for both parent and child processes?

Just parent. I'll change that before landing (as well as your first comment).
Pushed by eflores@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/6a359b09d915
Discard internal DataTransferItem data in nsBaseDragService::EndDragSession - r=enndeakin
https://hg.mozilla.org/mozilla-central/rev/6a359b09d915
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Hi Loic,
could you help verify if this issue is fixed as expected on the latest Nightly build? Thanks!
Flags: needinfo?(epinal99-bugzilla2)
Hi :edwin, 
Is this worth uplifting to Beta51 and Aurora52?
Flags: needinfo?(edwin)
(In reply to Gerry Chang [:gchang] from comment #35)
> Hi Loic,
> could you help verify if this issue is fixed as expected on the latest
> Nightly build? Thanks!

Yes, it's fixed. There is an increase of the memory use during drag&drop (~200 MB for the image in comment #0) but after the drop, the memory is back to normal. No more freeze of the computer.
Flags: needinfo?(epinal99-bugzilla2)
(In reply to Gerry Chang [:gchang] from comment #36)
> Hi :edwin, 
> Is this worth uplifting to Beta51 and Aurora52?

The issue exists in 50, but given that this is more of a stress test than a realistic situation, I'm happy to wait it out and have this ride the trains.
Flags: needinfo?(milan)
53 doesn't go to release until April and 52 is our next ESR. Especially as we look to increase our e10s population size, can we consider at least uplifting to Aurora? We've got a long cycle to flush out any issues.
Comment on attachment 8811337 [details] [diff] [review]
1295272.patch

Approval Request Comment
[Feature/Bug causing the regression]: 1272018
[User impact if declined]: Long delays, users perhaps killing Firefox because of them
[Is this code covered by automated tests?]:
[Has the fix been verified in Nightly?]:
[Needs manual test from QE? If yes, steps to reproduce]: Yes, please. See comment 0.
[List of other uplifts needed for the feature/fix]:
[Is the change risky?]: There is a mild risk.
[Why is the change risky/not risky?]: We are deleting objects that could be used by others.  We skip those used by JS, so we don't think this will happen, but if that logic is faulty things may start disappearing.
[String changes made/needed]:
Flags: needinfo?(milan)
Attachment #8811337 - Flags: approval-mozilla-aurora?
Comment on attachment 8811337 [details] [diff] [review]
1295272.patch

let's take this in aurora and watch for any fallout
Attachment #8811337 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
I've managed to reproduce this issue on an old Nightly build from 2016-08-15. (following STR from comment 0)

I can confirm the fix on latest Nightly 53.0a1 (2017-01-12) and latest Aurora 52.0a2 (2017-01-12) using Windows 7 x64, x86/Windows 10 x64.
Duplicate of this bug: 1336417
Flags: needinfo?(edwin.bugs)
You need to log in before you can comment on or make changes to this bug.