Closed Bug 1235162 Opened 8 years ago Closed 7 years ago

Crash when dragging objects in OS X (objc_msgSend | CFMessagePortIsValid)

Categories

(Core :: Widget: Cocoa, defect, P1)

41 Branch
x86_64
macOS
defect

Tracking

()

VERIFIED FIXED
mozilla53
Tracking Status
firefox48 --- wontfix
firefox49 --- wontfix
firefox-esr45 --- wontfix
firefox50 --- wontfix
firefox51 --- wontfix
firefox52 --- wontfix
firefox53 --- verified

People

(Reporter: abr, Assigned: spohl)

References

Details

(Keywords: crash, Whiteboard: tpi:+)

Crash Data

Attachments

(1 file, 2 obsolete files)

This bug was filed from the Socorro interface and is 
report bp-f64cf86a-ff34-4c65-8ecb-2a6a52151225.
=============================================================

Dragging objects (images, links, text, tabs) on OS X intermittently crashes with invalid or null addresses. From the Socorro comments on similar stacks, this appears to be happening quite frequently to some users. Several such crash comments mention SOHONotes by name:

https://crash-stats.mozilla.com/report/list?product=Firefox&range_unit=days&range_value=28&signature=objc_msgSend+|+CFMessagePortIsValid#tab-comments
Hardware: Unspecified → x86_64
Version: unspecified → 41 Branch
Happens on at least 48 and 49, guessing newer are affected as well. Jimm, this one seems to be one of those crashes that don't necessarily affect tons of users, but for those who are affected it hurts a lot. Can we get someone on this one?
Flags: needinfo?(jmathies)
Sure we can take a look. Currently #34 on the crash list for OSX across all versions.
Component: DOM: Events → Widget: Cocoa
Flags: needinfo?(jmathies)
Priority: -- → P1
Whiteboard: tpi:+
I had a look at this today and wanted to capture a few things in a comment. Markus, could you have a look at this and tell me if you have any thoughts what might be the cause here? Thanks in advance!

Using Socorro[1], I was able to determine that this affects OSX 10.9.5 and above. The first affected Firefox build is 41.0b9 with buildid 20150910171927. Version 50.0a2 with buildid 20160815004002 is still affected, so this continues to be a valid issue.

I had a look at the submissions between 41.0b8 and 41.0b9[2], but there is nothing that jumps out at me. I checked every diff and nothing touched nsChildView.mm or nsDragService.mm, which would be the most likely places where this crash might have been introduced.
I even had a look at submissions between 41.0b7 and 41.0b8[3], thinking that maybe the bug was introduced then but nobody happened to crash. Again, nothing that looked suspicious to me.

I was able to generate crash stacks that looked similar to the ones in Socorro by passing invalid pointers to [NSView dragImage]'s |event|, |pasteboard| or |source| parameters, but they were all missing a topmost AppKit stack frame of -[NSCoreDragManager _dragUntilMouseUp:accepted:]. This makes me think that one of those three pointers becomes invalid after we initiate the drag. I suspect that the type of the parameter for |NSCoreDragManager _dragUntilMouseUp| could tell us which one of the three it is, but the only public documentation that I could find for this method shows a type of (id)[4].

We had a similar report back in bug 388522 and we fixed this by explicitly retaining the NSView* and NSEvent* before calling |dragImage|. So I traced our retain/release sequence on |gLastDragView| and |gLastDragMouseDownEvent| in nsDragService.mm and nsChildView.mm and again, everything looks valid and I can't see any double-releases or similar.

I noticed that we don't add our image to the NSPasteboard before we pass the NSPasteboard instance to |dragImage| like Apple's example online shows[5]. I tried to do this as follows:

>  NSPasteboard* pboard = [NSPasteboard pasteboardWithName:NSDragPboard];
>  [pboard declareTypes:[NSArray arrayWithObject:NSPasteboardTypeTIFF] owner:nil];
>  [pboard setData:[image TIFFRepresentation] forType:NSPasteboardTypeTIFF];

However, if I then pass |pboard| to |dragImage| I can no longer drag links to the desktop or drag tabs and reorder them in the tab bar. I can still drag a tab out of the tab group and create a new window though.

The last thing I wanted to note is that |dragImage:at:offset:event:pasteboard:source:slideBack:| is deprecated and Apple recommends the use of |beginDraggingSessionWithItems:event:source:| instead.

So, my questions are:
1. Do you agree that one of the three pointers (NSView*, NSEvent*, NSPasteboard*) has become invalid during a drag when this crash occurs? If so, which one seems most likely and any idea how to guard against that?
2. Is it possible that our failure to add the image to the NSPasteboard* could cause this issue? If so, do you happen to know why we lose the ability to drag links to the desktop or reorder tabs if we do so?
3. The new |beginDraggingSessionWithItems:event:source:| continues to rely on NSView*, NSEvent* and NSPasteboard* that we've been using so far, so I had low confidence that a move to this new method would solve this issue. Do you agree?


[1] https://crash-stats.mozilla.com/search/?signature=~objc_msgSend%20%7C%20CFMessagePortIsValid&date=%3E%3D2015-08-20&_sort=build_id&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=platform_version#crash-reports
[2] https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=4ed5ddd4ab04&tochange=5a3f1e91bfef
[3] https://hg.mozilla.org/releases/mozilla-beta/pushloghtml?fromchange=65541f722678&tochange=4ed5ddd4ab04
[4] https://github.com/pombredanne/osx_headers/blob/c7e6776dfa795f6653e8a2ddab22f061eafba784/Frameworks/AppKit/NSCoreDragManager.h#L25
[5] https://developer.apple.com/library/mac/documentation/Cocoa/Reference/ApplicationKit/Classes/NSView_Class/index.html#//apple_ref/doc/uid/20000014-SW135
Flags: needinfo?(mstange)
Crash volume for signature 'objc_msgSend | CFMessagePortIsValid':
 - nightly (version 51): 0 crashes from 2016-08-01.
 - aurora  (version 50): 10 crashes from 2016-08-01.
 - beta    (version 49): 34 crashes from 2016-08-02.
 - release (version 48): 281 crashes from 2016-07-25.
 - esr     (version 45): 402 crashes from 2016-05-02.

Crash volume on the last weeks (Week N is from 08-22 to 08-28):
            W. N-1  W. N-2  W. N-3
 - nightly       0       0       0
 - aurora        6       3       0
 - beta         11      16       4
 - release      78      80      27
 - esr          32      39      35

Affected platform: Mac OS X

Crash rank on the last 7 days:
           Browser     Content   Plugin
 - nightly
 - aurora  #342
 - beta    #1864
 - release #193
 - esr     #263
(In reply to Stephen A Pohl [:spohl] from comment #6)
> I had a look at this today and wanted to capture a few things in a comment.

This is some really thorough investigation, thank you for this.

> Using Socorro[1], I was able to determine that this affects OSX 10.9.5 and
> above.

I see reports from all versions of 10.9, and all versions of 10.10. I do not see a single report from 10.11. So it seems like this is a bug that was fixed in 10.11.

Here's a report from 10.9.0, for example:
https://crash-stats.mozilla.com/report/index/6b835ca1-3948-4659-9ca3-8c44d2160902

> I was able to generate crash stacks that looked similar to the ones in
> Socorro by passing invalid pointers to [NSView dragImage]'s |event|,
> |pasteboard| or |source| parameters, but they were all missing a topmost
> AppKit stack frame of -[NSCoreDragManager _dragUntilMouseUp:accepted:]. This
> makes me think that one of those three pointers becomes invalid after we
> initiate the drag. I suspect that the type of the parameter for
> |NSCoreDragManager _dragUntilMouseUp| could tell us which one of the three
> it is, but the only public documentation that I could find for this method
> shows a type of (id)[4].

That parameter is the event. -[NSCoreDragManager _dragUntilMouseUp:accepted:] calls "window" and "locationInWindow" on it.

> We had a similar report back in bug 388522 and we fixed this by explicitly
> retaining the NSView* and NSEvent* before calling |dragImage|. So I traced
> our retain/release sequence on |gLastDragView| and |gLastDragMouseDownEvent|
> in nsDragService.mm and nsChildView.mm and again, everything looks valid and
> I can't see any double-releases or similar.

The only thing I see is that we leak the fallback grey box drag image... but that's probably unrelated and shouldn't happen in regular circumstances.

> I noticed that we don't add our image to the NSPasteboard before we pass the
> NSPasteboard instance to |dragImage| like Apple's example online shows[5]. I
> tried to do this as follows:
> 
> >  NSPasteboard* pboard = [NSPasteboard pasteboardWithName:NSDragPboard];
> >  [pboard declareTypes:[NSArray arrayWithObject:NSPasteboardTypeTIFF] owner:nil];
> >  [pboard setData:[image TIFFRepresentation] forType:NSPasteboardTypeTIFF];
> 
> However, if I then pass |pboard| to |dragImage| I can no longer drag links
> to the desktop or drag tabs and reorder them in the tab bar.

The pasteboard has already been filled with information about the dragged element at that point. Your code here overwrites it with the preview image. But that image is only there to make dragging look good; it's not actually "the thing you're dragging". For example, if you select pieces of text and drag them, then the pasteboard contains that text and na bit of metadata about it, and "image" contains a rendered version of the text to show during the drag.

> I can still
> drag a tab out of the tab group and create a new window though.

That's probably because the dragged tab is additionally tracked internally to Firefox and Firefox doesn't rely on the pasteboard contents.

> The last thing I wanted to note is that
> |dragImage:at:offset:event:pasteboard:source:slideBack:| is deprecated and
> Apple recommends the use of |beginDraggingSessionWithItems:event:source:|
> instead.

It would be good to switch, then. :-)

> So, my questions are:
> 1. Do you agree that one of the three pointers (NSView*, NSEvent*,
> NSPasteboard*) has become invalid during a drag when this crash occurs?

Not sure yet. If you can send me the AppKit binary from 10.10.5, I could take a closer look.

> 2. Is it possible that our failure to add the image to the NSPasteboard*
> could cause this issue?

It really shouldn't, and doing so is not the right solution.

> 3. The new |beginDraggingSessionWithItems:event:source:| continues to rely
> on NSView*, NSEvent* and NSPasteboard* that we've been using so far, so I
> had low confidence that a move to this new method would solve this issue. Do
> you agree?

I've only glanced at the API briefly and I don't see how it lets you influence the drag image. One of us needs to look into this more closely.
Flags: needinfo?(mstange)
Setting the assignee field to reflect reality in this bug.
Assignee: nobody → spohl.mozilla.bugs
Blocks: 1301349
I'm still looking at this, but I wanted to double-check the versions of OSX that this affects:

(In reply to Markus Stange [:mstange] from comment #8)
> (In reply to Stephen A Pohl [:spohl] from comment #6)
> > Using Socorro[1], I was able to determine that this affects OSX 10.9.5 and
> > above.
> 
> I see reports from all versions of 10.9, and all versions of 10.10. I do not
> see a single report from 10.11. So it seems like this is a bug that was
> fixed in 10.11.
> 
> Here's a report from 10.9.0, for example:
> https://crash-stats.mozilla.com/report/index/6b835ca1-3948-4659-9ca3-
> 8c44d2160902

The summary in [1] seems to show 2.8% of crashes occurring on 10.11. [2] seems to be an example of it. Am I using crash-stats incorrectly?

[1] https://crash-stats.mozilla.com/signature/?date=%3E%3D2016-08-12T03%3A59%3A19.626548&date=%3C2016-09-09T03%3A59%3A19.626548&product=Firefox&signature=objc_msgSend%20%7C%20CFMessagePortIsValid&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=platform_version&_sort=-date&page=1#summary
[2] https://crash-stats.mozilla.com/report/index/176c47fc-9943-43f1-8e06-425f32160909
Flags: needinfo?(mstange)
No, you're completely right, I must have missed it. The crash frequency is much lower than on the other platforms though... not sure what we can conclude from that.
Flags: needinfo?(mstange)
Attached patch wip (obsolete) — Splinter Review
I have been digging into this for some time now but I'm still fighting with a few remaining issues. Markus, would you mind taking a look and provide initial feedback? Here's what this patch does so far:

1. Explicitly declares ChildView to be implementing the NSDraggingSource, NSDraggingDestination and NSPasteboardItemDataProvider protocols. This is required by the new dragging API or we can't pass an instance of ChildView as source to NSView's beginDraggingSessionWithItems.
2. Changed deprecated UTI types to the new types. NSView's beginDraggingSessionWithItems does not work with the old UTI types. See "open issues" for remaining work regarding UTI types.
3. Implemented NSPasteboardItemDataProvider's provideDataForType.
4. Dragging within Firefox works as expected, including tab dragging. However, we currently only show a thumbnail image in non-e10s mode for tabs.

Open issues:
1. Need to find the proper new values for kCorePboardType_url, kCorePboardType_urld and kCorePboardType_urln.
2. Need to verify if we have to loop over the transferables to declare all the types that we support, or whether we can just declare all of them regardless of the transferables (see TODO in patch).
3. Need to find a solution for showing the tab thumbnail image in e10s mode. By constructing the thumbnail image in the imageComponentsProvider, I can properly generate the thumbnail image for e10s. However, it will be positioned incorrectly. Possibly due to [gLastDragMouseDownEvent locationInWindow] not being correct anymore.
4. Dropping outside of Firefox does not yet work. Possibly due to incorrect UTI types.
5. Trying to drop to the desktop doesn't work. Usually after 1-3 attempts, the obj-c exception below is shown in the console. Dragging is completely broken after this. This might be the same issue as the original crash reported in this bug, but I don't have any way to prove this. Based on comment 8, it sounds like "-[NSCoreDragManager _dragUntilMouseUp:accepted:] calls "window" and "locationInWindow" on it.", "it" being the event passed to beginDraggingSessionWithItems. Could it be that locationInWindow is no longer correct? NaN could be due to a division by zero, but I'm not sure where this might be happening.
2016-09-30 13:41:56.097 firefox[5123:7627632] CALayer position contains NaN: [nan nan]
2016-09-30 13:41:56.100 firefox[5123:7627632] (
	0   CoreFoundation                      0x00007fff8ed324f2 __exceptionPreprocess + 178
	1   libobjc.A.dylib                     0x00007fff9291873c objc_exception_throw + 48
	2   CoreFoundation                      0x00007fff8ed994bd +[NSException raise:format:] + 205
	3   QuartzCore                          0x00007fff930979b6 _ZN2CA5Layer12set_positionERKNS_4Vec2IdEEb + 152
	4   QuartzCore                          0x00007fff93097916 -[CALayer setPosition:] + 44
	5   HIToolbox                           0x00007fff91ec0711 _ZL20UpdateLayerPositionsP14OpaqueCoreDrag + 630
	6   HIToolbox                           0x00007fff920e50f8 _Z29UnflockAnimationTimerCallbackP16__CFRunLoopTimerPv + 759
	7   CoreFoundation                      0x00007fff8ecafb94 __CFRUNLOOP_IS_CALLING_OUT_TO_A_TIMER_CALLBACK_FUNCTION__ + 20
	8   CoreFoundation                      0x00007fff8ecaf823 __CFRunLoopDoTimer + 1075
	9   CoreFoundation                      0x00007fff8ecaf37a __CFRunLoopDoTimers + 298
	10  CoreFoundation                      0x00007fff8eca6871 __CFRunLoopRun + 1841
	11  CoreFoundation                      0x00007fff8eca5ed8 CFRunLoopRunSpecific + 296
	12  CoreFoundation                      0x00007fff8ed0b925 CFMessagePortSendRequest + 949
	13  HIServices                          0x00007fff969057dc SendDragIPCMessage + 530
	14  HIServices                          0x00007fff96902c61 CoreDragMessageHandler + 1321
	15  CoreFoundation                      0x00007fff8ed47628 __CFMessagePortPerform + 584
	16  CoreFoundation                      0x00007fff8ecaf019 __CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE1_PERFORM_FUNCTION__ + 41
	17  CoreFoundation                      0x00007fff8ecaef89 __CFRunLoopDoSource1 + 473
	18  CoreFoundation                      0x00007fff8eca69bb __CFRunLoopRun + 2171
	19  CoreFoundation                      0x00007fff8eca5ed8 CFRunLoopRunSpecific + 296
	20  CoreFoundation                      0x00007fff8ed0b925 CFMessagePortSendRequest + 949
	21  HIServices                          0x00007fff969057dc SendDragIPCMessage + 530
	22  HIServices                          0x00007fff968fdb29 SendDropMessage + 64
	23  HIServices                          0x00007fff968fc807 DragInApplication + 505
	24  HIServices                          0x00007fff968fb6cf CoreDragStartDragging + 705
	25  AppKit                              0x00007fff8c48b369 -[NSCoreDragManager _dragUntilMouseUp:accepted:] + 1010
	26  AppKit                              0x00007fff8c73adde _handleBeginDraggingSession + 97
	27  CoreFoundation                      0x00007fff8ecc7067 __CFRUNLOOP_IS_CALLING_OUT_TO_AN_OBSERVER_CALLBACK_FUNCTION__ + 23
	28  CoreFoundation                      0x00007fff8ecc6fd7 __CFRunLoopDoObservers + 391
	29  CoreFoundation                      0x00007fff8eca65da __CFRunLoopRun + 1178
	30  CoreFoundation                      0x00007fff8eca5ed8 CFRunLoopRunSpecific + 296
	31  HIToolbox                           0x00007fff91e81935 RunCurrentEventLoopInMode + 235
	32  HIToolbox                           0x00007fff91e8176f ReceiveNextEventCommon + 432
	33  HIToolbox                           0x00007fff91e815af _BlockUntilNextEventMatchingListInModeWithFilter + 71
	34  AppKit                              0x00007fff8c1cfdf6 _DPSNextEvent + 1067
	35  AppKit                              0x00007fff8c1cf226 -[NSApplication _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 454
	36  XUL                                 0x0000000104495fe2 -[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] + 82
)
Attachment #8796642 - Flags: feedback?(mstange)
What you said all sounds good. I haven't looked at the patch itself. I don't have any ideas on the crash. I'm keeping the "feedback?" in order to look at it more closely another time.
This seems to have stalled out. Bump?
Comment on attachment 8796642 [details] [diff] [review]
wip

Not stalled, actively working on it. Most issues in comment 12 are now fixed but I haven't been able to isolate the crash yet. Will post the next patch once this is done.

Clearing f? on the old patch.
Attachment #8796642 - Flags: feedback?(mstange)
Attached patch Patch (obsolete) — Splinter Review
All known issues have been addressed. This patch switches from the deprecated dragimage:at:offset:event:pasteboard:source:slideBack API to the new beginDraggingSessionWithItems:event:source: API.

Try run in comment 16.
Attachment #8796642 - Attachment is obsolete: true
Attachment #8816580 - Flags: review?(mstange)
Neil, just a heads up that this is incoming. I don't expect there to be too many conflicts between this and bug 1309596 though.
Comment on attachment 8816580 [details] [diff] [review]
Patch

Review of attachment 8816580 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good, except for the loop issue mentioned below.
This patch would have been a little easier to review if you had kept the original order and not mixed in style changes like aArgument renames and line breaks, so please try to avoid mixing these types of changes in the future.

::: widget/cocoa/nsChildView.mm
@@ +5981,5 @@
> +    unsigned int typeCount = [pasteboardOutputDict count];
> +    NSMutableArray* types = [NSMutableArray arrayWithCapacity:typeCount + 1];
> +    [types addObjectsFromArray:[pasteboardOutputDict allKeys]];
> +    [types addObject:kWildcardPboardType];
> +    for (unsigned int k = 0; k < typeCount; k++) {

This loop broke when you moved the code from SetUpDragClipboard. The loop body here is doing the same thing every iteration.
You'll want something like NSString* currentType = [types objectAtIndex:k] and then use currentType instead of aType.
Attachment #8816580 - Flags: review?(mstange) → review+
Attached patch PatchSplinter Review
(In reply to Markus Stange [:mstange] from comment #19)
> Comment on attachment 8816580 [details] [diff] [review]
> Patch
> 
> Review of attachment 8816580 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Looks good, except for the loop issue mentioned below.
> This patch would have been a little easier to review if you had kept the
> original order and not mixed in style changes like aArgument renames and
> line breaks, so please try to avoid mixing these types of changes in the
> future.

Sorry, I worked on this patch for so long that I made most of these changes for better readability while working on the code. I will make sure to split similar changes out into separate patches in the future.

> ::: widget/cocoa/nsChildView.mm
> @@ +5981,5 @@
> > +    unsigned int typeCount = [pasteboardOutputDict count];
> > +    NSMutableArray* types = [NSMutableArray arrayWithCapacity:typeCount + 1];
> > +    [types addObjectsFromArray:[pasteboardOutputDict allKeys]];
> > +    [types addObject:kWildcardPboardType];
> > +    for (unsigned int k = 0; k < typeCount; k++) {
> 
> This loop broke when you moved the code from SetUpDragClipboard. The loop
> body here is doing the same thing every iteration.
> You'll want something like NSString* currentType = [types objectAtIndex:k]
> and then use currentType instead of aType.

Oops, thank you! Fixed. Carrying over r+.
Attachment #8816580 - Attachment is obsolete: true
Attachment #8819906 - Flags: review+
Comment on attachment 8819906 [details] [diff] [review]
Patch

I need to take another look at this. Clearing r+ for now.
Attachment #8819906 - Attachment is obsolete: true
Attachment #8819906 - Flags: review+
Comment on attachment 8819906 [details] [diff] [review]
Patch

We may be able to simplify some of this in the future, but this requires more testing and should probably happen in separate bugs if we deem it necessary.
Attachment #8819906 - Attachment is obsolete: false
Attachment #8819906 - Flags: review+
https://hg.mozilla.org/mozilla-central/rev/82ca4696b342
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Stephen, is it something we should uplift? Thanks
(In reply to Sylvestre Ledru [:sylvestre] from comment #26)
> Stephen, is it something we should uplift? Thanks

I suggest we let this ride the trains for the following reasons:
1. Based on comment 3 and 4, this seems to affect a small number of users.
2. This patch is replacing a deprecated API, which is much more invasive than a simple crash fix.
3. Drag & drop functionality is important and if there are any regressions, it would be good to get them fixed in nightly rather than later.
Flags: needinfo?(spohl.mozilla.bugs)
Depends on: 1325770
Blocks: 1325906
Depends on: 1329997
Flags: qe-verify+
There are no crashes in Socorro on Fx > 52, in the last 3 months.
Marking as verified.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Depends on: 1358075
Depends on: 1328994
Depends on: 1361116
No longer depends on: 1328994
Depends on: 1358755
You need to log in before you can comment on or make changes to this bug.