Closed Bug 1476195 Opened 6 years ago Closed 5 years ago

Dragstart on a child process produces zombie drag session in the parent process and breaks subsequent drag-and-drops on macOS

Categories

(Core :: DOM: Copy & Paste and Drag & Drop, defect, P3)

x86_64
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla71
Tracking Status
firefox64 --- wontfix
firefox65 --- wontfix
firefox71 --- fixed

People

(Reporter: yuki, Assigned: yuki)

References

Details

Attachments

(5 files, 2 obsolete files)

# Abstract

When a discarded tab is restored by an event handler for "mousedown" events on a draggable element in a remote sidebar content, "dragstart" event is dispatched but no further "dragover" events is dispatched. Moreover, when this problem happens, "dragover" event is not dispatched anymore until I drag something UI element of Firefox itself (for example, a native tab of Firefox's tab bar.) Currently I successfully reproduce this problem only on macOS (Sierra 10.12.6).

Original report for Tree Style Tab addon is:
https://github.com/piroor/treestyletab/issues/1778

# Steps to reproduce

 1. Go to "about:config" and set "extensions.webextensions.remote" to "true".
    (This problem requires enabled e10s for addons.)
 2. Go to "about:preferences" and set "General" => "Startup" =>
    "Restore previous session" checked.
 3. Open multiple tabs with webpages.
 4. Restart Nightly.
 5. Go to "about:debugging" and load the attached "sidebar-example.xpi" as
    an temporary addon.
 6. Then a sidebar panel is loaded. There are some items for tabs:
    * items with black color are already loaded tabs.
    * items with red color are discarded (pending) tabs.
    * "mousedown" on each item makes corresponding tab active.
    * these items are draggable, and "dragover" event changes the background
      color of a hovered tab green.
 7. Try to drag an item with red color. Please note that you need to produce
    "mousedown" and "dragstart" at very near time.

# Actual result

No "dragging" happens. No hovered item is shown with green background color, and moreover, you cannot start any further dragging in the sidebar panel even if you retry to drag any other item.

# Expected result

The dragged item is "dragged" and hovered item is shown with green background color. And I can drag any other item again.
In the screencast, all of "DRAG START", "DRAG OVER" and "DRAG END" logs appeared on the success case but only "DRAG START" logs are appeared on the failure case. (Please note that the number of logs are just incremented because there are no other logs.)
Priority: -- → P3
I started Firefox with "MOZ_LOG=timestamp,sync,nsDragService:5,nsCocoaWidgets:5" and tried to logging. On success case, many logs like "I/nsCocoaWidgets ChildView doDragAction: entered" and "I/nsCocoaWidgets ChildView draggingUpdated: entered" are reported. On the other hand, the failure case reports no log for those modules.
I'm trying to research with a custom debug build on macOS, and
https://dxr.mozilla.org/mozilla-central/rev/085cdfb90903d4985f0de1dc7786522d9fb45596/dom/events/EventStateManager.cpp#1939
I've confirmed that the dragstart event is not canceled at here in both success and failure cases.
(the "startEvent" had mUserCancelled=false and mDefaultPreventedOnContent=false in both cases.)
Hmm...
After more debugging, I've realized that the difference of the success case and the failure case. When I successfully started dragging from the sidebar, "count" on this line:
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2085
became 1 (or more). On the other hand, when I couldn't start dragging, the "count" was always 0 despite I certainly called "dataTransfer.setData()" of the dragstart event. So, when this problem happens, the EventStateManager looks to be receiving wrong drag session.
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#1991
=> https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/base/nsContentAreaDragDrop.cpp#116
=> https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/base/nsContentAreaDragDrop.cpp#542
=> https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3374

On the failure case, Firefox tried to get dragged data by TabParent::AddInitialDnDDataTo() but it returned nothing. The drag data must be prepared by TabParent::RecvInvokeDragSession(), so I set a breakpoint there.

https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315

Then, I realized that the function was not called on the failure case. Thus it looks that the child process failed to send drag data to the parent process or the parent failed to receive drag data sent from the child.
Update:
https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/PBrowserParent.cpp#4339
On success case "InvokeDragSession" IPC message was logged but on failure case it was not. If "InvokeDragSession" IPC message was sent from the child process, it should also be logged, because other IPC messages from the child process were still logged. So I think that "InvokeDragSession" IPC message was not sent from the child process.

The "InvokeDragSession" IPC message looks sent from: 
https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/PContent.cpp#435
=> https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/PBrowserChild.cpp#2378
=> https://dxr.mozilla.org/mozilla-central/source/widget/nsDragServiceProxy.cpp#95
=> https://dxr.mozilla.org/mozilla-central/source/widget/nsBaseDragService.cpp#259
=> https://dxr.mozilla.org/mozilla-central/source/widget/nsBaseDragService.cpp#298
=> https://dxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager.cpp#2162
So, if InvokeDragSessionWithImage() was not called, no "InvokeDragSession" IPC message should be sent from the child process. As the next step I'm trying to research that the method is actually called or not.
Update:
With more logging, I've realized that the child process didn't send "InvokeDragSession" IPC message on the failure case. At the time EventStateManager::DoDefaultDragStart() was returned here before drag session is invoked:
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2077
I think that the operation "Immediate dragstart in a remote sidebar content after restoration of discarded (pending) tab" produced a zombie drag session.
The condition "dragging of a pendingtab" looks unnecessary. I've successfully reproduced the problem around a regular tab, on a debug build of Firefox for macOS. The debug build is very very slow and easily I can move mouse before the clicked tab is activated. Thus I'm guessing that the actual trigger is maybe "switching of active child process while starting of a new drag session".
I've researched more.

On a success case, the "dragstert" action in the child process was processed and an "InvokeDragSession" IPC message was sent. The event state manager of the parent process received it and invoked the drag session successfully.
Moreover, its PreHandleEvent() method was called for an "eDragOver" event, then its DispatchCrossProcessEvent() and MaybeInvokeDragSession() methods were called and following drag-and-drop events were dispatched as expected.

----

On the other hand, there are different two code paths on failure cases: the trigger case and others.

The trigger case: the "dragstert" action in the child process was processed and an "InvokeDragSession" IPC message was sent. The event state manager of the parent process received it and at least tried to invoke the drag session. After that, due to something unknown reasons, in this case PreHandleEvent() method was not called. As the result following drag-and-drop events were not dispatched.
I'm gessing that a zombie drag session was left on the child process on this case.

And, other failure cases: the "dragstert" action in the child process was processed, but it didn't send an "InvokeDragSession" IPC message due to a zombie drag session. As the result following drag-and-drop events were not dispatched.
On the trigger case, the "InvokeDragSession" IPC message from the child process was handled by the parent process, but the operation was aborted at this line:

https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#300

> nsresult
> nsDragService::InvokeDragSessionImpl(nsIArray* aTransferableArray,
>                                      nsIScriptableRegion* aDragRgn,
>                                      uint32_t aActionType)
> {
>   NS_OBJC_BEGIN_TRY_ABORT_BLOCK_NSRESULT;
> 
>   if (!gLastDragView) {
>     // gLastDragView is only set during -[ChildView mouseDragged:].
>     // InvokeDragSessionImpl is only called while Gecko processes a mouse move
>     // event. So if we get here with gLastDragView being null, that means that
>     // the mouse button has already been released, and mouseMoved is on the
>     // stack instead of mouseDragged. In that case we need to abort the drag
>     // because the OS won't know where to drop whatever's being dragged, and we
>     // might end up with a stuck drag & drop session.
>     return NS_ERROR_FAILURE;
>   }

After some tries I've realized that "gLastDragView" became unexpectedly null while dragging, when the mouse cursor was moved out from the element dragstart event was fired on.
The "gLastDragView" is updated here:
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
and it is cleared just 10 lines after. There is no other code which can touch to the global variable.
So I tried to inspect more details with custom MOZ_LOG()s, and finally I got these results:

----
Success case: 

1. The child process sends "InvokeDragSession" IPC message by "dragService->InvokeDragSessionWithImage()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2162
2. The parent process receives the message by "TabParent::RecvInvokeDragSession()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315
3. Cocoa's "mouseDragged" event handler is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4718
4. Cocoa's "mouseDragged" event handler sets gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
5. "nsDragService::InvokeDragSessionImpl()" is called and the drag session is invoked.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286
6. Cocoa's "mouseDragged" event handler clears gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738

----
Failure case (trigger case) 1:

1. The child process sends "InvokeDragSession" IPC message by "dragService->InvokeDragSessionWithImage()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2162
2. The parent process receives the message by "TabParent::RecvInvokeDragSession()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315
3. "nsDragService::InvokeDragSessionImpl()" is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286
4. The drag session is aborted due to missing gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#300
5. Cocoa's "mouseDragged" event handler is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4718
6. Cocoa's "mouseDragged" event handler sets gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
7. Cocoa's "mouseDragged" event handler clears gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738

----
Failure case (trigger case) 2:

1. Cocoa's "mouseDragged" event handler is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4718
2. Cocoa's "mouseDragged" event handler sets gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
3. Cocoa's "mouseDragged" event handler clears gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738
4. The child process sends "InvokeDragSession" IPC message by "dragService->InvokeDragSessionWithImage()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2162
5. The parent process receives the message by "TabParent::RecvInvokeDragSession()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315
6. "nsDragService::InvokeDragSessionImpl()" is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286
7. The drag session is aborted due to missing gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#300

----
All operations listed above are executed on the main thread of the parent process, however the order of those operations look quite fragile. I'm not clear around cocoa so I couldn't find out why such odd results are produced. Any further idea from cocoa experts?
Flags: needinfo?(mstange)
That does sound rather brittle. Do you know what the callstacks are when nsDragService::InvokeDragSessionImpl is called, both for the case where it's called during the mouseDragged handler and for the case where it's not?

I think Neil Deakin remembers how this is supposed to work but he's away until Aug 23.
Flags: needinfo?(mstange)
(In reply to Markus Stange [:mstange] from comment #13)
> That does sound rather brittle. Do you know what the callstacks are when
> nsDragService::InvokeDragSessionImpl is called, both for the case where it's
> called during the mouseDragged handler and for the case where it's not?

Thanks, I've attached stack traces for both cases. On the success case "nsDragService::InvokeDragSessionImpl" was called from "mouseDragged". On the failure case it was called on a different code path. Do they help further debugging?
Flags: needinfo?(enndeakin)
OK, I got it. The original code of invoking drag session with "sLastDragView" looks designed for completely synchronous drag events. On the other hand, actually the drag event was processed in a following event loop at the failure case, thus "sLastDragView" were already cleared. It must be cleared after the operation invoking drag session finish.
Flags: needinfo?(enndeakin)
I tried to write a patch for this bug, but I realized that it is not easy (for me).

At first I planned to move the code "sLastDragView = nil" from mouseDragged (in https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738 ) to nsDragService::InvokeDragSessionImpl() ( https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286 ). It solves the problem I saw, but it may introduce other new problems due to unexpectedly held sLastDragView, because nsDragService::InvokeDragSessionImpl() is not guaranteed that called while dragging.

Thus it looks a better solution that holding sLastDragView's information as a part of Gecko's drag event itself and deliver it to nsDragService::InvokeDragSessionImpl() through multiple event loops. However, because most codes around drag events are cross-platform, I couldn't imagine how to include it to an event information with no exposing to cross-platform implementations. How should I do that?

And, there is another concern about automated test. How to write tests for such platform native drag and drop actions on macOS?
Flags: needinfo?(enndeakin)
Would it be sufficient to just move the clearing of gLastDragView to mouseUp?

You might also ask Stephen Pohl about it since he worked on some similar issues with gLastDragView.

We don't have a means of testing the native parts of drag and drop automatically. It will need to be tested manually.
Flags: needinfo?(enndeakin)
(In reply to Neil Deakin from comment #20)
> Would it be sufficient to just move the clearing of gLastDragView to mouseUp?

Thank you for the advice! Then I've researched about this viewpoint, and I've got stuck from some problems:

1. Cocoa's "mouseUp" handler isn't called for releasing of the mouse button outside the window, thus it looks impossible to detect such an action without any invoked drag session.

2. The gLastDragView is referred only by nsDragService::InvokeDragSessionImpl, so I thought that I can clear gLastDragView with nil before all cases which can reach to nsDragService::InvokeDragSessionImpl. But there are too many callers (especially EventStateManager::PreHandleEvent) and I've given up to list them up.

I gave up to list all possible callers of the nsDragService::InvokeDragSessionImpl up, but logically it should be called from any flow triggered by cocoa's mouseDown handler, and now I think clearing gLastDragView on mouseDown instead of mouseDragged is enough. Only one concern is: gLastDragView will keep something view for a long time needlessly. How about this idea?
Flags: needinfo?(spohl.mozilla.bugs)
Assignee: nobody → yuki
Status: NEW → ASSIGNED
Stephen, could you review my approach on the patch https://bugzilla.mozilla.org/attachment.cgi?id=9006440 and my previous comment https://bugzilla.mozilla.org/show_bug.cgi?id=1476195#c21 ?
Sorry about the delay here. I've been sidetracked for quite a bit. Will respond as soon as possible.
Comment on attachment 9006440 [details] [diff] [review]
invoke drag session from asynchronously processed drag event

Review of attachment 9006440 [details] [diff] [review]:
-----------------------------------------------------------------

Based on your very detailed writeup (thank you for that!) I agree with you that this seems like a sensible approach. This could fix other issues involving broken drag sessions as well, such as bug 1338748.

Before we land this patch we should make sure to change the following comments to reflect the new reality:
https://dxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.mm#3295
https://dxr.mozilla.org/mozilla-central/source/widget/cocoa/nsDragService.mm#293

::: widget/cocoa/nsChildView.mm
@@ +4555,5 @@
>  - (void)mouseDown:(NSEvent*)theEvent
>  {
>    NS_OBJC_BEGIN_TRY_ABORT_BLOCK;
>  
> +  // It was held on mouseDragged, and it may not be cleared if no drag session was invoked.

nit: s/It/gLastDragView/

@@ +4727,5 @@
>    if (mTextInputHandler->OnHandleEvent(theEvent)) {
>      return;
>    }
>  
> +  // For asynchronous event handling, it should not be cleared immediately.

nit: It would probably be best to drop this comment as I don't expect a reader to gain much value from it if she/he encounters this comment outside of the context of this bug.
Attachment #9006440 - Flags: feedback+
Flags: needinfo?(spohl.mozilla.bugs)
Stephen, thank you for the feedback! I've updated some more comments.
Attachment #9006440 - Attachment is obsolete: true
Attachment #9017397 - Flags: review?(mstange)
Attachment #9017397 - Flags: review?(mstange) → review?(aswan)
Piro, I dont know if aswan is familiar with cocoa stuff, but in any case he's burning down issue for 64.  A core graphics peer really should be reviewing this.
Component: General → Drag and Drop
Flags: needinfo?(yuki)
Product: WebExtensions → Core
Sorry, I simply chose him from the list of "suggested reviewers" popup. At first I chose mstange because he was the common reviewer of recent patches for the file. But he was not listed in the "suggested reviewers" and not respond, so I thought that I might did something wrong step. Did I do correct approach at the first time and I should reset the reviewer to mstang?
Flags: needinfo?(yuki)
Summary: Immediate dragstart in a remote sidebar content after restoration of discarded (pending) tab prevents dispatching of further dragover event → Dragstart on a child process produces zombie drag session in the parent process and breaks subsequent drag-and-drops on macOS
Comment on attachment 9017397 [details] [diff] [review]
invoke drag session from asynchronously processed drag event

mstang said he'd do it next week
Attachment #9017397 - Flags: review?(aswan) → review?(mstange)
Thanks!

Any progress of reviewing?

Flags: needinfo?(mstange)

I heard that deprecation of nsPresShell is in progress. Does the change affect to this patch?

Flags: needinfo?(masayuki)

(In reply to YUKI "Piro" Hiroshi from comment #32)

I heard that deprecation of nsPresShell is in progress. Does the change affect to this patch?

No, I'm gonna remove only nsIPresShell, developers should treat mozilla::PresShell directly instead.

Flags: needinfo?(masayuki)
Attachment #9017397 - Attachment is obsolete: true
Attachment #9017397 - Flags: review?(mstange)

The patch looks unmergable for the latest mozilla-central, thus I've recreated it and attached with the Phabricator.

(In reply to YUKI "Piro" Hiroshi from comment #31)

Any progress of reviewing?

Sorry for the terribly, incredibly long wait here. I've finally given this a proper look and understand both the problem and the fix now. Unfortunately the drag and drop code and in particular its interaction with remote frames is not well documented at all, so this took me a while.

The idea in your patch is fine: Making gLastDragView usable outside of mouseDragged. You introduced a few mistakes in the move to Phabricator (such that the patch won't actually work anymore), but I can fix those. There's one other change I'm going to make: Instead of resetting gLastDragView only in mouseDown and in InvokeDragSessionImpl, I'm also going to set it to nil in mouseUp, otherwise we'd break the patch for bug 1329997: We can't allow starting a drag session if the mouse button has already been released.

Flags: needinfo?(mstange)
Attachment #9074422 - Attachment description: Bug 1476195 - invoke drag session from asynchronously processed drag event, r=mstange → Bug 1476195 - Allow invoking drag sessions outside of mouseDragged. r=mstange,spohl

I've updated the patch. I've also requested review from spohl again because I made some changes.

Keywords: checkin-needed

Pushed by spohl@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/49bb3a94f090
Allow invoking drag sessions outside of mouseDragged. r=mstange,spohl

Keywords: checkin-needed

Is this is the same issue as 1580688 and 1421333?

Flags: needinfo?(yuki)

The video attached at the bug 1421333 looks same to this bug. Descriptions in the sample HTML attached to the bug 1580688 also looks to say same phenomenon. Thus I think both these two bugs should be fixed by this patch.

Flags: needinfo?(yuki)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla71

Please see bug 1581285. I suspect that some of the bugs marked a duplicate of this one are not duplicates, or the fix here did not cover all the bases. There is still a problem with drag/drop in 71.0a1. Thanks!

Do you think we could put this into beta?

Flags: needinfo?(yuki)

I think this change is small enough, and safe to apply to beta. Moreover, there are multiple duplicated bugs. It indicates that more people are affected, and uplifting this to beta may help dogfooding for them on macOS.

Flags: needinfo?(yuki)
QA Whiteboard: [qa-71b-p2]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: