Dragstart on a child process produces zombie drag session in the parent process and breaks subsequent drag-and-drops on macOS

ASSIGNED
Assigned to

Status

()

P3
normal
ASSIGNED
8 months ago
a month ago

People

(Reporter: yuki, Assigned: yuki, NeedInfo)

Tracking

Trunk
x86_64
macOS
Points:
---

Firefox Tracking Flags

(firefox64 affected, firefox65 affected)

Details

Attachments

(5 attachments, 1 obsolete attachment)

(Assignee)

Description

8 months ago
# Abstract

When a discarded tab is restored by an event handler for "mousedown" events on a draggable element in a remote sidebar content, "dragstart" event is dispatched but no further "dragover" events is dispatched. Moreover, when this problem happens, "dragover" event is not dispatched anymore until I drag something UI element of Firefox itself (for example, a native tab of Firefox's tab bar.) Currently I successfully reproduce this problem only on macOS (Sierra 10.12.6).

Original report for Tree Style Tab addon is:
https://github.com/piroor/treestyletab/issues/1778

# Steps to reproduce

 1. Go to "about:config" and set "extensions.webextensions.remote" to "true".
    (This problem requires enabled e10s for addons.)
 2. Go to "about:preferences" and set "General" => "Startup" =>
    "Restore previous session" checked.
 3. Open multiple tabs with webpages.
 4. Restart Nightly.
 5. Go to "about:debugging" and load the attached "sidebar-example.xpi" as
    an temporary addon.
 6. Then a sidebar panel is loaded. There are some items for tabs:
    * items with black color are already loaded tabs.
    * items with red color are discarded (pending) tabs.
    * "mousedown" on each item makes corresponding tab active.
    * these items are draggable, and "dragover" event changes the background
      color of a hovered tab green.
 7. Try to drag an item with red color. Please note that you need to produce
    "mousedown" and "dragstart" at very near time.

# Actual result

No "dragging" happens. No hovered item is shown with green background color, and moreover, you cannot start any further dragging in the sidebar panel even if you retry to drag any other item.

# Expected result

The dragged item is "dragged" and hovered item is shown with green background color. And I can drag any other item again.
(Assignee)

Comment 2

8 months ago
In the screencast, all of "DRAG START", "DRAG OVER" and "DRAG END" logs appeared on the success case but only "DRAG START" logs are appeared on the failure case. (Please note that the number of logs are just incremented because there are no other logs.)
Priority: -- → P3
(Assignee)

Comment 3

8 months ago
I started Firefox with "MOZ_LOG=timestamp,sync,nsDragService:5,nsCocoaWidgets:5" and tried to logging. On success case, many logs like "I/nsCocoaWidgets ChildView doDragAction: entered" and "I/nsCocoaWidgets ChildView draggingUpdated: entered" are reported. On the other hand, the failure case reports no log for those modules.
(Assignee)

Comment 4

8 months ago
I'm trying to research with a custom debug build on macOS, and
https://dxr.mozilla.org/mozilla-central/rev/085cdfb90903d4985f0de1dc7786522d9fb45596/dom/events/EventStateManager.cpp#1939
I've confirmed that the dragstart event is not canceled at here in both success and failure cases.
(the "startEvent" had mUserCancelled=false and mDefaultPreventedOnContent=false in both cases.)
Hmm...
(Assignee)

Comment 5

7 months ago
After more debugging, I've realized that the difference of the success case and the failure case. When I successfully started dragging from the sidebar, "count" on this line:
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2085
became 1 (or more). On the other hand, when I couldn't start dragging, the "count" was always 0 despite I certainly called "dataTransfer.setData()" of the dragstart event. So, when this problem happens, the EventStateManager looks to be receiving wrong drag session.
(Assignee)

Comment 6

7 months ago
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#1991
=> https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/base/nsContentAreaDragDrop.cpp#116
=> https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/base/nsContentAreaDragDrop.cpp#542
=> https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3374

On the failure case, Firefox tried to get dragged data by TabParent::AddInitialDnDDataTo() but it returned nothing. The drag data must be prepared by TabParent::RecvInvokeDragSession(), so I set a breakpoint there.

https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315

Then, I realized that the function was not called on the failure case. Thus it looks that the child process failed to send drag data to the parent process or the parent failed to receive drag data sent from the child.
(Assignee)

Comment 7

7 months ago
Update:
https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/PBrowserParent.cpp#4339
On success case "InvokeDragSession" IPC message was logged but on failure case it was not. If "InvokeDragSession" IPC message was sent from the child process, it should also be logged, because other IPC messages from the child process were still logged. So I think that "InvokeDragSession" IPC message was not sent from the child process.

The "InvokeDragSession" IPC message looks sent from: 
https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/PContent.cpp#435
=> https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/PBrowserChild.cpp#2378
=> https://dxr.mozilla.org/mozilla-central/source/widget/nsDragServiceProxy.cpp#95
=> https://dxr.mozilla.org/mozilla-central/source/widget/nsBaseDragService.cpp#259
=> https://dxr.mozilla.org/mozilla-central/source/widget/nsBaseDragService.cpp#298
=> https://dxr.mozilla.org/mozilla-central/source/dom/events/EventStateManager.cpp#2162
So, if InvokeDragSessionWithImage() was not called, no "InvokeDragSession" IPC message should be sent from the child process. As the next step I'm trying to research that the method is actually called or not.
(Assignee)

Comment 8

7 months ago
Update:
With more logging, I've realized that the child process didn't send "InvokeDragSession" IPC message on the failure case. At the time EventStateManager::DoDefaultDragStart() was returned here before drag session is invoked:
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2077
I think that the operation "Immediate dragstart in a remote sidebar content after restoration of discarded (pending) tab" produced a zombie drag session.
(Assignee)

Comment 9

7 months ago
The condition "dragging of a pendingtab" looks unnecessary. I've successfully reproduced the problem around a regular tab, on a debug build of Firefox for macOS. The debug build is very very slow and easily I can move mouse before the clicked tab is activated. Thus I'm guessing that the actual trigger is maybe "switching of active child process while starting of a new drag session".
(Assignee)

Comment 10

7 months ago
I've researched more.

On a success case, the "dragstert" action in the child process was processed and an "InvokeDragSession" IPC message was sent. The event state manager of the parent process received it and invoked the drag session successfully.
Moreover, its PreHandleEvent() method was called for an "eDragOver" event, then its DispatchCrossProcessEvent() and MaybeInvokeDragSession() methods were called and following drag-and-drop events were dispatched as expected.

----

On the other hand, there are different two code paths on failure cases: the trigger case and others.

The trigger case: the "dragstert" action in the child process was processed and an "InvokeDragSession" IPC message was sent. The event state manager of the parent process received it and at least tried to invoke the drag session. After that, due to something unknown reasons, in this case PreHandleEvent() method was not called. As the result following drag-and-drop events were not dispatched.
I'm gessing that a zombie drag session was left on the child process on this case.

And, other failure cases: the "dragstert" action in the child process was processed, but it didn't send an "InvokeDragSession" IPC message due to a zombie drag session. As the result following drag-and-drop events were not dispatched.
(Assignee)

Comment 11

7 months ago
On the trigger case, the "InvokeDragSession" IPC message from the child process was handled by the parent process, but the operation was aborted at this line:

https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#300

> nsresult
> nsDragService::InvokeDragSessionImpl(nsIArray* aTransferableArray,
>                                      nsIScriptableRegion* aDragRgn,
>                                      uint32_t aActionType)
> {
>   NS_OBJC_BEGIN_TRY_ABORT_BLOCK_NSRESULT;
> 
>   if (!gLastDragView) {
>     // gLastDragView is only set during -[ChildView mouseDragged:].
>     // InvokeDragSessionImpl is only called while Gecko processes a mouse move
>     // event. So if we get here with gLastDragView being null, that means that
>     // the mouse button has already been released, and mouseMoved is on the
>     // stack instead of mouseDragged. In that case we need to abort the drag
>     // because the OS won't know where to drop whatever's being dragged, and we
>     // might end up with a stuck drag & drop session.
>     return NS_ERROR_FAILURE;
>   }

After some tries I've realized that "gLastDragView" became unexpectedly null while dragging, when the mouse cursor was moved out from the element dragstart event was fired on.
(Assignee)

Comment 12

7 months ago
The "gLastDragView" is updated here:
https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
and it is cleared just 10 lines after. There is no other code which can touch to the global variable.
So I tried to inspect more details with custom MOZ_LOG()s, and finally I got these results:

----
Success case: 

1. The child process sends "InvokeDragSession" IPC message by "dragService->InvokeDragSessionWithImage()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2162
2. The parent process receives the message by "TabParent::RecvInvokeDragSession()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315
3. Cocoa's "mouseDragged" event handler is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4718
4. Cocoa's "mouseDragged" event handler sets gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
5. "nsDragService::InvokeDragSessionImpl()" is called and the drag session is invoked.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286
6. Cocoa's "mouseDragged" event handler clears gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738

----
Failure case (trigger case) 1:

1. The child process sends "InvokeDragSession" IPC message by "dragService->InvokeDragSessionWithImage()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2162
2. The parent process receives the message by "TabParent::RecvInvokeDragSession()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315
3. "nsDragService::InvokeDragSessionImpl()" is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286
4. The drag session is aborted due to missing gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#300
5. Cocoa's "mouseDragged" event handler is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4718
6. Cocoa's "mouseDragged" event handler sets gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
7. Cocoa's "mouseDragged" event handler clears gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738

----
Failure case (trigger case) 2:

1. Cocoa's "mouseDragged" event handler is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4718
2. Cocoa's "mouseDragged" event handler sets gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4728
3. Cocoa's "mouseDragged" event handler clears gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738
4. The child process sends "InvokeDragSession" IPC message by "dragService->InvokeDragSessionWithImage()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/events/EventStateManager.cpp#2162
5. The parent process receives the message by "TabParent::RecvInvokeDragSession()"
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/dom/ipc/TabParent.cpp#3315
6. "nsDragService::InvokeDragSessionImpl()" is called.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286
7. The drag session is aborted due to missing gLastDragView.
   https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#300

----
All operations listed above are executed on the main thread of the parent process, however the order of those operations look quite fragile. I'm not clear around cocoa so I couldn't find out why such odd results are produced. Any further idea from cocoa experts?
Flags: needinfo?(mstange)
That does sound rather brittle. Do you know what the callstacks are when nsDragService::InvokeDragSessionImpl is called, both for the case where it's called during the mouseDragged handler and for the case where it's not?

I think Neil Deakin remembers how this is supposed to work but he's away until Aug 23.
Flags: needinfo?(mstange)
(Assignee)

Comment 17

7 months ago
(In reply to Markus Stange [:mstange] from comment #13)
> That does sound rather brittle. Do you know what the callstacks are when
> nsDragService::InvokeDragSessionImpl is called, both for the case where it's
> called during the mouseDragged handler and for the case where it's not?

Thanks, I've attached stack traces for both cases. On the success case "nsDragService::InvokeDragSessionImpl" was called from "mouseDragged". On the failure case it was called on a different code path. Do they help further debugging?
Flags: needinfo?(enndeakin)
(Assignee)

Comment 18

7 months ago
OK, I got it. The original code of invoking drag session with "sLastDragView" looks designed for completely synchronous drag events. On the other hand, actually the drag event was processed in a following event loop at the failure case, thus "sLastDragView" were already cleared. It must be cleared after the operation invoking drag session finish.
Flags: needinfo?(enndeakin)
(Assignee)

Comment 19

7 months ago
I tried to write a patch for this bug, but I realized that it is not easy (for me).

At first I planned to move the code "sLastDragView = nil" from mouseDragged (in https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsChildView.mm#4738 ) to nsDragService::InvokeDragSessionImpl() ( https://dxr.mozilla.org/mozilla-central/rev/4e56a2f51ad739ca52046723448f3129a58f1666/widget/cocoa/nsDragService.mm#286 ). It solves the problem I saw, but it may introduce other new problems due to unexpectedly held sLastDragView, because nsDragService::InvokeDragSessionImpl() is not guaranteed that called while dragging.

Thus it looks a better solution that holding sLastDragView's information as a part of Gecko's drag event itself and deliver it to nsDragService::InvokeDragSessionImpl() through multiple event loops. However, because most codes around drag events are cross-platform, I couldn't imagine how to include it to an event information with no exposing to cross-platform implementations. How should I do that?

And, there is another concern about automated test. How to write tests for such platform native drag and drop actions on macOS?
Flags: needinfo?(enndeakin)

Comment 20

7 months ago
Would it be sufficient to just move the clearing of gLastDragView to mouseUp?

You might also ask Stephen Pohl about it since he worked on some similar issues with gLastDragView.

We don't have a means of testing the native parts of drag and drop automatically. It will need to be tested manually.
Flags: needinfo?(enndeakin)
(Assignee)

Comment 21

7 months ago
(In reply to Neil Deakin from comment #20)
> Would it be sufficient to just move the clearing of gLastDragView to mouseUp?

Thank you for the advice! Then I've researched about this viewpoint, and I've got stuck from some problems:

1. Cocoa's "mouseUp" handler isn't called for releasing of the mouse button outside the window, thus it looks impossible to detect such an action without any invoked drag session.

2. The gLastDragView is referred only by nsDragService::InvokeDragSessionImpl, so I thought that I can clear gLastDragView with nil before all cases which can reach to nsDragService::InvokeDragSessionImpl. But there are too many callers (especially EventStateManager::PreHandleEvent) and I've given up to list them up.

I gave up to list all possible callers of the nsDragService::InvokeDragSessionImpl up, but logically it should be called from any flow triggered by cocoa's mouseDown handler, and now I think clearing gLastDragView on mouseDown instead of mouseDragged is enough. Only one concern is: gLastDragView will keep something view for a long time needlessly. How about this idea?
Flags: needinfo?(spohl.mozilla.bugs)
(Assignee)

Updated

7 months ago
Assignee: nobody → yuki
Status: NEW → ASSIGNED
(Assignee)

Comment 23

6 months ago
Stephen, could you review my approach on the patch https://bugzilla.mozilla.org/attachment.cgi?id=9006440 and my previous comment https://bugzilla.mozilla.org/show_bug.cgi?id=1476195#c21 ?
Sorry about the delay here. I've been sidetracked for quite a bit. Will respond as soon as possible.
Comment on attachment 9006440 [details] [diff] [review]
invoke drag session from asynchronously processed drag event

Review of attachment 9006440 [details] [diff] [review]:
-----------------------------------------------------------------

Based on your very detailed writeup (thank you for that!) I agree with you that this seems like a sensible approach. This could fix other issues involving broken drag sessions as well, such as bug 1338748.

Before we land this patch we should make sure to change the following comments to reflect the new reality:
https://dxr.mozilla.org/mozilla-central/source/widget/cocoa/nsChildView.mm#3295
https://dxr.mozilla.org/mozilla-central/source/widget/cocoa/nsDragService.mm#293

::: widget/cocoa/nsChildView.mm
@@ +4555,5 @@
>  - (void)mouseDown:(NSEvent*)theEvent
>  {
>    NS_OBJC_BEGIN_TRY_ABORT_BLOCK;
>  
> +  // It was held on mouseDragged, and it may not be cleared if no drag session was invoked.

nit: s/It/gLastDragView/

@@ +4727,5 @@
>    if (mTextInputHandler->OnHandleEvent(theEvent)) {
>      return;
>    }
>  
> +  // For asynchronous event handling, it should not be cleared immediately.

nit: It would probably be best to drop this comment as I don't expect a reader to gain much value from it if she/he encounters this comment outside of the context of this bug.
Attachment #9006440 - Flags: feedback+
Flags: needinfo?(spohl.mozilla.bugs)
(Assignee)

Comment 26

5 months ago
Stephen, thank you for the feedback! I've updated some more comments.
Attachment #9006440 - Attachment is obsolete: true
Attachment #9017397 - Flags: review?(mstange)
(Assignee)

Updated

5 months ago
status-firefox64: --- → affected
status-firefox65: --- → affected
(Assignee)

Updated

5 months ago
Attachment #9017397 - Flags: review?(mstange) → review?(aswan)
Piro, I dont know if aswan is familiar with cocoa stuff, but in any case he's burning down issue for 64.  A core graphics peer really should be reviewing this.
Component: General → Drag and Drop
Flags: needinfo?(yuki)
Product: WebExtensions → Core
(Assignee)

Comment 28

5 months ago
Sorry, I simply chose him from the list of "suggested reviewers" popup. At first I chose mstange because he was the common reviewer of recent patches for the file. But he was not listed in the "suggested reviewers" and not respond, so I thought that I might did something wrong step. Did I do correct approach at the first time and I should reset the reviewer to mstang?
Flags: needinfo?(yuki)
(Assignee)

Updated

5 months ago
Summary: Immediate dragstart in a remote sidebar content after restoration of discarded (pending) tab prevents dispatching of further dragover event → Dragstart on a child process produces zombie drag session in the parent process and breaks subsequent drag-and-drops on macOS
Comment on attachment 9017397 [details] [diff] [review]
invoke drag session from asynchronously processed drag event

mstang said he'd do it next week
Attachment #9017397 - Flags: review?(aswan) → review?(mstange)
(Assignee)

Comment 30

5 months ago
Thanks!
(Assignee)

Comment 31

a month ago

Any progress of reviewing?

Flags: needinfo?(mstange)
You need to log in before you can comment on or make changes to this bug.