Crash in IPCError-browser | SHMEM_CREATED_MESSAGE Payload error: message could not be deserialized

RESOLVED FIXED in Firefox 55

Status

()

defect
P1
critical
RESOLVED FIXED
2 years ago
11 months ago

People

(Reporter: ds, Assigned: spohl)

Tracking

({crash, regression})

54 Branch
mozilla60
Unspecified
macOS
Points:
---

Firefox Tracking Flags

(firefox-esr52 unaffected, firefox53 unaffected, firefox54 wontfix, firefox55 fixed, firefox58 wontfix, firefox59 wontfix, firefox60 fixed)

Details

(Whiteboard: tpi:+, crash signature, URL)

Attachments

(4 attachments, 2 obsolete attachments)

(Reporter)

Description

2 years ago
This bug was filed from the Socorro interface and is 
report bp-1575434e-83d5-440f-be91-3ba110170505.
=============================================================

Attempting to drag the image at the given URL reliably results in this crash. Dragging other random images seems okay, so it may have something to do with the large size of this particular one.
(Assignee)

Comment 1

2 years ago
Able to reproduce. Looking into it.
Assignee: nobody → spohl.mozilla.bugs
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
(Assignee)

Comment 2

2 years ago
The code in question was introduced in bug 1272018, but this started crashing as soon as bug 1268616 was fixed. :erahm and/or :nical, would you be able to have a look and redirect this as appropriate? Thank you!
Assignee: spohl.mozilla.bugs → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(erahm)
I can't reproduce on my desktop but from a quick look at the code I am pretty sure this is happening because we try to allocate a large shmem in here https://hg.mozilla.org/releases/mozilla-aurora/annotate/94b7e538af7d/dom/base/nsContentUtils.cpp#l8018 but we don't check the return value of mAllocator->AllocShmem(..). I would not put much trust in the state of the Shmem struct if AllocShmem returns false.
Flags: needinfo?(nical.bugzilla)
Assignee: nobody → nical.bugzilla
(Assignee)

Comment 4

2 years ago
(In reply to Nicolas Silva [:nical] from comment #3)
> I can't reproduce on my desktop

Just to clarify: I was able to reproduce on OSX 10.11 and macOS 10.12 on two separate MacBooks. Thank you for looking into it!
This patch wraps the Shmem in a Maybe<T> to make extra sure we don't forget to check that it was properly created before using it.
Attachment #8865926 - Flags: review?(bas)
Comment on attachment 8865926 [details] [diff] [review]
Don't use the shmem if allocation failed.

Review of attachment 8865926 [details] [diff] [review]:
-----------------------------------------------------------------

I doubt this is security sensitive.
Attachment #8865926 - Flags: review?(bas) → review+

Comment 7

2 years ago
Pushed by nsilva@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/54af3ab740d0
Don't use a Shmem we failed to allocate in nsContentUtils. r=Bas
Flags: needinfo?(erahm)
This affect beta 54. Once we have verified the fix on nightly we should request the uplift.

Comment 9

2 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/54af3ab740d0
Status: NEW → RESOLVED
Last Resolved: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55

Comment 10

2 years ago
no more reports are recorded after 55.0a1 build 20170514030207 - could you request beta uplift?
Flags: needinfo?(nical.bugzilla)
Keywords: crash, regression

Comment 11

2 years ago
in the meantime there was one crash from a nightly build with the patch: bp-8e69da37-8957-4b37-938c-f70450170524
Comment on attachment 8865926 [details] [diff] [review]
Don't use the shmem if allocation failed.

Approval Request Comment
[Feature/Bug causing the regression]:
[User impact if declined]: crashes when darg-n-dropping large images
[Is this code covered by automated tests?]: no.
[Has the fix been verified in Nightly?]: The crash volume has decreased a lot, although there is still one crash that was recorded with similar signature.
[Needs manual test from QE? If yes, steps to reproduce]: no.
[List of other uplifts needed for the feature/fix]: none.
[Is the change risky?]: no.
[Why is the change risky/not risky?]: Simple change which has been baking in nightly for a while.
[String changes made/needed]: none.
Flags: needinfo?(nical.bugzilla)
Attachment #8865926 - Flags: approval-mozilla-beta?
(In reply to [:philipp] from comment #11)
> in the meantime there was one crash from a nightly build with the patch:
> bp-8e69da37-8957-4b37-938c-f70450170524

Interesting. For this one, I can think of two reasons:
 - The size calculations above caused some overflow/underflow which meant allocation succeeded but the surface size is larger than the allocated size.
 - We failed to allocate the shmem but reported a successful allocation (if the shmem allocation has a bug like this, it is pretty bad).
Comment on attachment 8865926 [details] [diff] [review]
Don't use the shmem if allocation failed.

not that many crashes, and we're late in the beta 54 cycle
Attachment #8865926 - Flags: approval-mozilla-beta? → approval-mozilla-beta-
I'm still getting this crash, even with today's mozilla-central nightly.

STR

1) Visit https://upload.wikimedia.org/wikipedia/commons/6/61/Verbesina_encelioides_panoramic.jpg and wait for the image to finish loading.

2) Try to drag the image to the left (left-mouse-down and hold, drag left).

3) Crash.

I tested on an OS X 10.11.6 desktop, fully updated.  It isn't a laptop, and I was using a real mouse.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Dragging in other directions has the same result -- the tab crashes.
My STR from comment #15 also works (the tab crashes) with a trackpad (on a laptop), on both OS X 10.11.6 and 10.12.5.
The crashes don't happen if, after the image has finished loading, I first click on it once (to enlarge it).  Before that click, the cursor is a magnifying glass with a plus ("+") in it.  Afterwards the cursor is a magnifying glass with a minus ("-") in it.  I suspect these crashes now happen only with images that have two different (displayable) sizes, and then only at the smaller of the two sizes.
> and then only at the smaller of the two sizes

Oops, not true.  They happen (using my testcase, at least) at both sizes.
I still crash even with this bug's original testcase, https://i.redd.it/o84qgvq9sjvy.jpg.

Could some other change have landed in the meantime to unfix this bug?
Just to make sure people understand, I'm click-dragging on the image itself (not on any of its scrollbars).  No, this isn't supposed to work.  But it shouldn't crash, either :-)

Updated

2 years ago
Flags: needinfo?(nical.bugzilla)

Updated

2 years ago
Priority: -- → P2
Whiteboard: tpi:+
Steven, could you send me a link to a crash report of the bug still reproducing for you? There is an awful lot of places in the browser where dealing with very large allocations can potentially cause issues.
Flags: needinfo?(nical.bugzilla) → needinfo?(smichaud)
Sure.  Here are a couple made on my OS X 10.11.6 desktop (using a real mouse), and a couple more made on a MacBook Pro running macOS 10.12.5 (using the track pad).  I tested with today's mozilla-central nightly, using a fresh profile.

bp-560fb8ad-79c7-4c75-9f0e-8ca190170608
bp-c42c7e5c-95b3-4dc8-89bd-fe7c80170608
bp-32184de4-c731-4703-8633-b5a210170608
bp-be8bf134-78d3-4b44-8a97-59a870170608

Are you able to reproduce these crashes?  Do you have a Mac?  If not, you should at least borrow one.

Note that, as I mentioned above, you need to click on the image itself and try to drag it (not on any of its scrollbars).  Both this bug's original URL and the one I mentioned in comment #15 "work" for me (I crash with them):

https://i.redd.it/o84qgvq9sjvy.jpg
https://upload.wikimedia.org/wikipedia/commons/6/61/Verbesina_encelioides_panoramic.jpg
Flags: needinfo?(smichaud)
> bp-560fb8ad-79c7-4c75-9f0e-8ca190170608
> bp-c42c7e5c-95b3-4dc8-89bd-fe7c80170608
> bp-32184de4-c731-4703-8633-b5a210170608
> bp-be8bf134-78d3-4b44-8a97-59a870170608

I just looked at these and realized they're not much use -- they don't show any XUL symbols!!

That must be another bug.  I don't know if it happened recently.  It's been a while since I've played with Mozilla bug reports :-)
(In reply to Steven Michaud [:smichaud] (Retired) from comment #24)

That's bug 1371017.
Posted image Large image
Here's an example image you can drag to reproduce.
We came across this crash today while doing triage of a bug on Wikipedia:
https://en.wikipedia.org/wiki/Template:Iraqi_insurgency_detailed_map
(Assignee)

Comment 31

a year ago
(In reply to Adam Stevenson [:adamopenweb] from comment #30)
> We came across this crash today while doing triage of a bug on Wikipedia:
> https://en.wikipedia.org/wiki/Template:Iraqi_insurgency_detailed_map

I'm able to reproduce reliably with this image and a local build off of current trunk. Note however that this only crashes on a hidpi screen, such as a MacBook retina display. :nical or :glandium, any thoughts here? Frame 1 is https://hg.mozilla.org/mozilla-central/annotate/2320b7fd9266/dom/base/nsContentUtils.cpp#l8486

(lldb) bt
* thread #1, queue = 'com.apple.main-thread', stop reason = signal SIGTERM
  * frame #0: 0x00007fffc6f65ec9 libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 41
    frame #1: 0x00000001139d494d XUL`(anonymous namespace)::GetSurfaceDataShmem::ReturnType (aSurface=0x0000000189dddf40, aLength=0x00007fff5e63eff8, aStride=0x00007fff5e63eff4, aContext=(mAllocator = 0x0000000101e75778))::GetSurfaceDataImpl<(anonymous namespace)::GetSurfaceDataShmem>(mozilla::gfx::DataSourceSurface*, unsigned long*, int*, (anonymous namespace)::GetSurfaceDataShmem) at nsContentUtils.cpp:8486
    frame #2: 0x00000001139d4167 XUL`nsContentUtils::GetSurfaceData(aSurface=0x0000000189dddf40, aLength=0x00007fff5e63eff8, aStride=0x00007fff5e63eff4, aAllocator=0x0000000101e75020) at nsContentUtils.cpp:8515
    frame #3: 0x00000001139d2d49 XUL`nsContentUtils::TransferableToIPCTransferable(aTransferable=0x000000014526ba40, aIPCDataTransfer=0x000000018ad8b498, aInSyncMessage=false, aChild=0x0000000101e75020, aParent=0x0000000000000000) at nsContentUtils.cpp:8262
    frame #4: 0x00000001139d22a7 XUL`nsContentUtils::TransferablesToIPCTransferables(aTransferables=0x000000018ad641e0, aIPC=0x00007fff5e63f6a8, aInSyncMessage=false, aChild=0x0000000101e75020, aParent=0x0000000000000000) at nsContentUtils.cpp:8043
    frame #5: 0x00000001165a1af1 XUL`nsDragServiceProxy::InvokeDragSessionImpl(this=0x000000018af10030, aArrayTransferables=0x000000018ad641e0, aRegion=0x0000000000000000, aActionType=7) at nsDragServiceProxy.cpp:59
    frame #6: 0x000000011653983a XUL`nsBaseDragService::InvokeDragSession(this=0x000000018af10030, aDOMNode=0x0000000101e624a0, aPrincipalURISpec=0x00007fff5e63fe68, aTransferableArray=0x000000018ad641e0, aDragRgn=0x0000000000000000, aActionType=7, aContentPolicyType=37) at nsBaseDragService.cpp:247
    frame #7: 0x0000000116539c73 XUL`nsBaseDragService::InvokeDragSessionWithImage(this=0x000000018af10030, aDOMNode=0x0000000101e624a0, aPrincipalURISpec=0x00007fff5e63fe68, aTransferableArray=0x000000018ad641e0, aRegion=0x0000000000000000, aActionType=7, aImage=0x0000000000000000, aImageX=0, aImageY=0, aDragEvent=0x0000000101e32660, aDataTransfer=0x000000018ab1a0b0) at nsBaseDragService.cpp:285
    frame #8: 0x00000001151be4b6 XUL`mozilla::EventStateManager::DoDefaultDragStart(this=0x00000001452d6f00, aPresContext=0x0000000189d8d800, aDragEvent=0x00007fff5e63fd58, aDataTransfer=0x0000000106f1f200, aDragTarget=0x0000000101e62420, aSelection=0x0000000000000000, aPrincipalURISpec=0x00007fff5e63fe68) at EventStateManager.cpp:2146
    frame #9: 0x00000001151b66ae XUL`mozilla::EventStateManager::GenerateDragGesture(this=0x00000001452d6f00, aPresContext=0x0000000189d8d800, aEvent=0x00007fff5e640820) at EventStateManager.cpp:1935
    frame #10: 0x00000001151b5005 XUL`mozilla::EventStateManager::PreHandleEvent(this=0x00000001452d6f00, aPresContext=0x0000000189d8d800, aEvent=0x00007fff5e640820, aTargetFrame=0x000000018bcfe798, aTargetContent=0x0000000101e62420, aStatus=0x00007fff5e640ed8) at EventStateManager.cpp:723
    frame #11: 0x0000000116aa59fb XUL`mozilla::PresShell::HandleEventInternal(this=0x000000010df36000, aEvent=0x00007fff5e640820, aStatus=0x00007fff5e640ed8, aIsHandlingNativeEvent=false) at PresShell.cpp:7739
    frame #12: 0x0000000116aa5014 XUL`mozilla::PresShell::HandleEventWithTarget(this=0x000000010df36000, aEvent=0x00007fff5e640820, aFrame=0x000000018bcfe798, aContent=0x0000000101e62420, aStatus=0x00007fff5e640ed8, aIsHandlingNativeEvent=true, aTargetContent=0x00007fff5e640bf8) at PresShell.cpp:7560
    frame #13: 0x00000001152547cb XUL`mozilla::PointerEventHandler::DispatchPointerFromMouseOrTouch(aShell=0x000000010df36000, aFrame=0x000000018bcfe798, aContent=0x0000000101e62420, aEvent=0x00007fff5e641018, aDontRetargetEvents=false, aStatus=0x00007fff5e640ed8, aTargetContent=0x00007fff5e640bf8) at PointerEventHandler.cpp:532
    frame #14: 0x0000000116aa420d XUL`mozilla::PresShell::HandleEvent(this=0x000000010df36000, aFrame=0x000000018b639020, aEvent=0x00007fff5e641018, aDontRetargetEvents=false, aEventStatus=0x00007fff5e640ed8) at PresShell.cpp:7339
    frame #15: 0x0000000116533b40 XUL`nsViewManager::DispatchEvent(this=0x0000000104a0ce40, aEvent=0x00007fff5e641018, aView=0x000000010e972000, aStatus=0x00007fff5e640ed8) at nsViewManager.cpp:812
    frame #16: 0x0000000116533813 XUL`nsView::HandleEvent(this=0x000000010e972000, aEvent=0x00007fff5e641018, aUseAttachedEvents=false) at nsView.cpp:1139
    frame #17: 0x0000000116571744 XUL`mozilla::widget::PuppetWidget::DispatchEvent(this=0x0000000107770800, aEvent=0x00007fff5e641018, aStatus=0x00007fff5e640f94) at PuppetWidget.cpp:410
    frame #18: 0x00000001134769c1 XUL`mozilla::layers::APZCCallbackHelper::DispatchWidgetEvent(aEvent=0x00007fff5e641018) at APZCCallbackHelper.cpp:499
    frame #19: 0x0000000116095525 XUL`mozilla::dom::TabChild::DispatchWidgetEventViaAPZ(this=0x000000010de2c800, aEvent=0x00007fff5e641018) at TabChild.cpp:1798
    frame #20: 0x0000000116094a68 XUL`mozilla::dom::TabChild::HandleRealMouseButtonEvent(this=0x000000010de2c800, aEvent=0x000000018ae7b200, aGuid=0x00007fff5e641190, aInputBlockId=0x00007fff5e641188) at TabChild.cpp:1738
    frame #21: 0x0000000116056ec0 XUL`mozilla::dom::TabChild::ProcessPendingCoalescedMouseDataAndDispatchEvents(this=0x000000010de2c800) at TabChild.cpp:1580
    frame #22: 0x0000000116056c3d XUL`mozilla::dom::CoalescedMouseMoveFlusher::WillRefresh(this=0x00000001066bafd0, aTime=(mValue = 53322539026879)) at CoalescedMouseData.cpp:77
    frame #23: 0x0000000116a47455 XUL`nsRefreshDriver::Tick(this=0x0000000189d8e000, aNowEpoch=1518794983143588, aNowTime=(mValue = 53322539026879)) at nsRefreshDriver.cpp:1886
    frame #24: 0x0000000116a4e9cf XUL`mozilla::RefreshDriverTimer::TickDriver(driver=0x0000000189d8e000, jsnow=1518794983143588, now=(mValue = 53322539026879)) at nsRefreshDriver.cpp:340
    frame #25: 0x0000000116a4e7d3 XUL`mozilla::RefreshDriverTimer::TickRefreshDrivers(this=0x000000010dfed160, aJsNow=1518794983143588, aNow=(mValue = 53322539026879), aDrivers=0x000000010dfed190) at nsRefreshDriver.cpp:310
    frame #26: 0x0000000116a4e68f XUL`mozilla::RefreshDriverTimer::Tick(this=0x000000010dfed160, jsnow=1518794983143588, now=(mValue = 53322539026879)) at nsRefreshDriver.cpp:332
    frame #27: 0x0000000116a5137d XUL`mozilla::VsyncRefreshDriverTimer::RunRefreshDrivers(this=0x000000010dfed160, aTimeStamp=(mValue = 53322539026879)) at nsRefreshDriver.cpp:773
    frame #28: 0x0000000116a5041c XUL`mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver(this=0x00000001065fcb00, aVsyncTimestamp=(mValue = 53322539026879)) at nsRefreshDriver.cpp:686
    frame #29: 0x0000000116a50059 XUL`mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsync(this=0x00000001065fcb00, aVsyncTimestamp=(mValue = 53322539026879)) at nsRefreshDriver.cpp:587
    frame #30: 0x0000000116ef0a1e XUL`mozilla::layout::VsyncChild::RecvNotify(this=0x0000000101e023d0, aVsyncTimestamp=0x00007fff5e641d28) at VsyncChild.cpp:68
    frame #31: 0x0000000112448b6b XUL`mozilla::layout::PVsyncChild::OnMessageReceived(this=0x0000000101e023d0, msg__=0x000000010df40170) at PVsyncChild.cpp:155
    frame #32: 0x00000001122e83c6 XUL`mozilla::ipc::PBackgroundChild::OnMessageReceived(this=0x0000000106628800, msg__=0x000000010df40170) at PBackgroundChild.cpp:1812
    frame #33: 0x0000000111f22543 XUL`mozilla::ipc::MessageChannel::DispatchAsyncMessage(this=0x0000000106628938, aMsg=0x000000010df40170) at MessageChannel.cpp:2110
    frame #34: 0x0000000111f20bcb XUL`mozilla::ipc::MessageChannel::DispatchMessage(this=0x0000000106628938, aMsg=0x000000010df40170) at MessageChannel.cpp:2040
    frame #35: 0x0000000111f215f3 XUL`mozilla::ipc::MessageChannel::RunMessage(this=0x0000000106628938, aTask=0x000000010df40110) at MessageChannel.cpp:1886
    frame #36: 0x0000000111f21d08 XUL`mozilla::ipc::MessageChannel::MessageTask::Run(this=0x000000010df40110) at MessageChannel.cpp:1919
    frame #37: 0x000000011140f973 XUL`nsThread::ProcessNextEvent(this=0x0000000101e325b0, aMayWait=false, aResult=0x00007fff5e642d83) at nsThread.cpp:1040
    frame #38: 0x000000011143146c XUL`NS_ProcessPendingEvents(aThread=0x0000000101e325b0, aTimeout=10) at nsThreadUtils.cpp:459
    frame #39: 0x000000011659a67e XUL`nsBaseAppShell::NativeEventCallback(this=0x000000010660c280) at nsBaseAppShell.cpp:98
    frame #40: 0x000000011663b101 XUL`nsAppShell::ProcessGeckoEvents(aInfo=0x000000010660c280) at nsAppShell.mm:436
    frame #41: 0x00007fffb15ec321 CoreFoundation`__CFRUNLOOP_IS_CALLING_OUT_TO_A_SOURCE0_PERFORM_FUNCTION__ + 17
    frame #42: 0x00007fffb15cd21d CoreFoundation`__CFRunLoopDoSources0 + 557
    frame #43: 0x00007fffb15cc716 CoreFoundation`__CFRunLoopRun + 934
    frame #44: 0x00007fffb15cc114 CoreFoundation`CFRunLoopRunSpecific + 420
    frame #45: 0x00007fffb0b2cebc HIToolbox`RunCurrentEventLoopInMode + 240
    frame #46: 0x00007fffb0b2ccf1 HIToolbox`ReceiveNextEventCommon + 432
    frame #47: 0x00007fffb0b2cb26 HIToolbox`_BlockUntilNextEventMatchingListInModeWithFilter + 71
    frame #48: 0x00007fffaf0c3a54 AppKit`_DPSNextEvent + 1120
    frame #49: 0x00007fffaf83f7ee AppKit`-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 2796
    frame #50: 0x0000000116639ae4 XUL`::-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:](self=0x000000010660c700, _cmd="nextEventMatchingMask:untilDate:inMode:dequeue:", mask=18446744073709551615, expiration=4001-01-01 00:00:00 UTC, mode="kCFRunLoopDefaultMode", flag=YES) at nsAppShell.mm:158
    frame #51: 0x00007fffaf0b83db AppKit`-[NSApplication run] + 926
    frame #52: 0x000000011663bd27 XUL`nsAppShell::Run(this=0x000000010660c280) at nsAppShell.mm:715
    frame #53: 0x0000000119ad6660 XUL`XRE_RunAppShell() at nsEmbedFunctions.cpp:892
    frame #54: 0x0000000111f27401 XUL`mozilla::ipc::MessagePumpForChildProcess::Run(this=0x0000000101e5e3d0, aDelegate=0x00007fff5e644a80) at MessagePump.cpp:269
    frame #55: 0x0000000111dfce25 XUL`MessageLoop::RunInternal(this=0x00007fff5e644a80) at message_loop.cc:326
    frame #56: 0x0000000111dfcd85 XUL`MessageLoop::RunHandler(this=0x00007fff5e644a80) at message_loop.cc:319
    frame #57: 0x0000000111dfcd2d XUL`MessageLoop::Run(this=0x00007fff5e644a80) at message_loop.cc:299
    frame #58: 0x0000000119ad5c7d XUL`XRE_InitChildProcess(aArgc=16, aArgv=0x00007fff5e644d40, aChildData=0x00007fff5e644cc8) at nsEmbedFunctions.cpp:718
    frame #59: 0x0000000119ae3427 XUL`mozilla::BootstrapImpl::XRE_InitChildProcess(this=0x0000000101e041b0, argc=19, argv=0x00007fff5e644d40, aChildData=0x00007fff5e644cc8) at Bootstrap.cpp:69
    frame #60: 0x00000001015bb35c plugin-container`content_process_main(bootstrap=0x0000000101e041b0, argc=19, argv=0x00007fff5e644d40) at plugin-container.cpp:63
    frame #61: 0x00000001015bb415 plugin-container`main(argc=20, argv=0x00007fff5e644d40) at MozillaRuntimeMain.cpp:25
    frame #62: 0x00000001015bb2d4 plugin-container`start + 52
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(mh+mozilla)
For what it's worth, my STR from comment #15 still "works":

bp-5cfb0ce8-cfa8-46b9-833e-100300180216

This is on a desktop Mac, with a non-HDPI display.
I wonder if the `bufLen` code [1] should being using a CheckedInt (particularly size.width * BytesPerPixel)?

[1] https://searchfox.org/mozilla-central/rev/74b7ffee403c7ffd05b8b476c411cbf11d134eb9/dom/base/nsContentUtils.cpp#8429
More information about the crash would be useful, like register contents and disassembly of the code that crashes.
Flags: needinfo?(mh+mozilla) → needinfo?(spohl.mozilla.bugs)
(Assignee)

Comment 35

a year ago
(lldb) register read
General Purpose Registers:
       rax = 0x00000001a1301000
       rbx = 0x0000000118a595f0  XUL`nsBaseDragService::InvokeDragSession(nsIDOMNode*, nsTSubstring<char> const&, nsIArray*, nsIScriptableRegion*, unsigned int, unsigned int) at nsBaseDragService.cpp:227
       rcx = 0x0000000008340a40
       rdx = 0x0000000008bdea40
       rdi = 0x00000001a1b9f000
       rsi = 0x000000013579e000
       rbp = 0x00007fff5c33c980
       rsp = 0x00007fff5c33c980
        r8 = 0x0000000000000000
        r9 = 0x0000000008bdea41
       r10 = 0x0000000000000000
       r11 = 0x000000006c401000
       r12 = 0x00000001073f0700
       r13 = 0x00007fffb3f48642  CoreFoundation`__CF120290
       r14 = 0x0000000000000000
       r15 = 0x0000000000000088
       rip = 0x00007fffae8abec9  libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell + 41
    rflags = 0x0000000000010202
        cs = 0x000000000000002b
        fs = 0x0000000000000000
        gs = 0x0000000000000000

(lldb) disassemble --frame
libsystem_platform.dylib`_platform_memmove$VARIANT$Haswell:
    0x7fffae8abea0 <+0>:   pushq  %rbp
    0x7fffae8abea1 <+1>:   movq   %rsp, %rbp
    0x7fffae8abea4 <+4>:   movq   %rdi, %r11
    0x7fffae8abea7 <+7>:   subq   %rsi, %r11
    0x7fffae8abeaa <+10>:  movq   %rdi, %rax
    0x7fffae8abead <+13>:  cmpq   %rdx, %r11
    0x7fffae8abeb0 <+16>:  jb     0x7fffae8abecd            ; <+45>
    0x7fffae8abeb2 <+18>:  cmpq   $0x60, %rdx
    0x7fffae8abeb6 <+22>:  jbe    0x7fffae8abee7            ; <+71>
    0x7fffae8abeb8 <+24>:  cmpq   $0x4000, %rdx             ; imm = 0x4000 
    0x7fffae8abebf <+31>:  jb     0x7fffae8abf80            ; <+224>
    0x7fffae8abec5 <+37>:  movq   %rdx, %rcx
    0x7fffae8abec8 <+40>:  cld    
->  0x7fffae8abec9 <+41>:  rep    
    0x7fffae8abeca <+42>:  movsb  (%rsi), %es:(%rdi)
    0x7fffae8abecb <+43>:  popq   %rbp
    0x7fffae8abecc <+44>:  retq   
    0x7fffae8abecd <+45>:  cmpq   %rdi, %rsi
    0x7fffae8abed0 <+48>:  je     0x7fffae8abecb            ; <+43>
    0x7fffae8abed2 <+50>:  addq   %rdx, %rsi
    0x7fffae8abed5 <+53>:  addq   %rdx, %rdi
    0x7fffae8abed8 <+56>:  cmpq   $0x60, %rdx
    0x7fffae8abedc <+60>:  jb     0x7fffae8ac03e            ; <+414>
    0x7fffae8abee2 <+66>:  jmp    0x7fffae8ac060            ; <+448>
    0x7fffae8abee7 <+71>:  cmpq   $0x10, %rdx
    0x7fffae8abeeb <+75>:  jbe    0x7fffae8abf37            ; <+151>
    0x7fffae8abeed <+77>:  movups -0x10(%rsi,%rdx), %xmm4
    0x7fffae8abef2 <+82>:  subq   $0x20, %rdx
    0x7fffae8abef6 <+86>:  jbe    0x7fffae8abf25            ; <+133>
    0x7fffae8abef8 <+88>:  vmovups (%rsi), %ymm1
    0x7fffae8abefc <+92>:  vmovups %ymm1, (%rdi)
    0x7fffae8abf00 <+96>:  addq   $0x20, %rsi
    0x7fffae8abf04 <+100>: addq   $0x20, %rdi
    0x7fffae8abf08 <+104>: subq   $0x20, %rdx
    0x7fffae8abf0c <+108>: jb     0x7fffae8abf22            ; <+130>
    0x7fffae8abf0e <+110>: vmovups (%rsi), %ymm2
    0x7fffae8abf12 <+114>: vmovups %ymm2, (%rdi)
    0x7fffae8abf16 <+118>: addq   $0x20, %rsi
    0x7fffae8abf1a <+122>: addq   $0x20, %rdi
    0x7fffae8abf1e <+126>: subq   $0x20, %rdx
    0x7fffae8abf22 <+130>: vzeroupper 
    0x7fffae8abf25 <+133>: addq   $0x10, %rdx
    0x7fffae8abf29 <+137>: jle    0x7fffae8abf31            ; <+145>
    0x7fffae8abf2b <+139>: movups (%rsi), %xmm3
    0x7fffae8abf2e <+142>: movups %xmm3, (%rdi)
    0x7fffae8abf31 <+145>: movups %xmm4, (%rdi,%rdx)
    0x7fffae8abf35 <+149>: popq   %rbp
    0x7fffae8abf36 <+150>: retq   
    0x7fffae8abf37 <+151>: subq   $0x8, %rdx
    0x7fffae8abf3b <+155>: jb     0x7fffae8abf4d            ; <+173>
    0x7fffae8abf3d <+157>: movq   (%rsi), %rcx
    0x7fffae8abf40 <+160>: movq   (%rsi,%rdx), %r8
    0x7fffae8abf44 <+164>: movq   %rcx, (%rdi)
    0x7fffae8abf47 <+167>: movq   %r8, (%rdi,%rdx)
    0x7fffae8abf4b <+171>: popq   %rbp
    0x7fffae8abf4c <+172>: retq   
    0x7fffae8abf4d <+173>: addq   $0x8, %rdx
    0x7fffae8abf51 <+177>: je     0x7fffae8abf78            ; <+216>
    0x7fffae8abf53 <+179>: xorq   %r8, %r8
    0x7fffae8abf56 <+182>: movb   (%rsi,%r8), %cl
    0x7fffae8abf5a <+186>: movb   %cl, (%rdi,%r8)
    0x7fffae8abf5e <+190>: subq   $0x1, %rdx
    0x7fffae8abf62 <+194>: je     0x7fffae8abf78            ; <+216>
    0x7fffae8abf64 <+196>: movb   0x1(%rsi,%r8), %cl
    0x7fffae8abf69 <+201>: movb   %cl, 0x1(%rdi,%r8)
    0x7fffae8abf6e <+206>: addq   $0x2, %r8
    0x7fffae8abf72 <+210>: subq   $0x1, %rdx
    0x7fffae8abf76 <+214>: jne    0x7fffae8abf56            ; <+182>
    0x7fffae8abf78 <+216>: popq   %rbp
    0x7fffae8abf79 <+217>: retq   
    0x7fffae8abf7a <+218>: nopw   (%rax,%rax)
    0x7fffae8abf80 <+224>: vmovups (%rsi), %ymm0
    0x7fffae8abf84 <+228>: addq   $0x20, %rdi
    0x7fffae8abf88 <+232>: andq   $-0x20, %rdi
    0x7fffae8abf8c <+236>: movq   %rdi, %rcx
    0x7fffae8abf8f <+239>: subq   %rax, %rcx
    0x7fffae8abf92 <+242>: addq   %rcx, %rsi
    0x7fffae8abf95 <+245>: subq   %rcx, %rdx
    0x7fffae8abf98 <+248>: vmovups (%rsi), %ymm1
    0x7fffae8abf9c <+252>: vmovups %ymm0, (%rax)
    0x7fffae8abfa0 <+256>: vmovups 0x20(%rsi), %ymm2
    0x7fffae8abfa5 <+261>: addq   $0x40, %rsi
    0x7fffae8abfa9 <+265>: subq   $0x80, %rdx
    0x7fffae8abfb0 <+272>: jbe    0x7fffae8ac000            ; <+352>
    0x7fffae8abfb2 <+274>: testq  $0x1f, %rsi
    0x7fffae8abfb9 <+281>: je     0x7fffae8abfe0            ; <+320>
    0x7fffae8abfbb <+283>: vmovaps %ymm1, (%rdi)
    0x7fffae8abfbf <+287>: vmovaps %ymm2, 0x20(%rdi)
    0x7fffae8abfc4 <+292>: addq   $0x40, %rdi
    0x7fffae8abfc8 <+296>: vmovups (%rsi), %ymm1
    0x7fffae8abfcc <+300>: vmovups 0x20(%rsi), %ymm2
    0x7fffae8abfd1 <+305>: addq   $0x40, %rsi
    0x7fffae8abfd5 <+309>: subq   $0x40, %rdx
    0x7fffae8abfd9 <+313>: ja     0x7fffae8abfbb            ; <+283>
    0x7fffae8abfdb <+315>: jmp    0x7fffae8ac000            ; <+352>
    0x7fffae8abfdd <+317>: nopl   (%rax)
    0x7fffae8abfe0 <+320>: vmovaps %ymm1, (%rdi)
    0x7fffae8abfe4 <+324>: vmovaps %ymm2, 0x20(%rdi)
    0x7fffae8abfe9 <+329>: addq   $0x40, %rdi
    0x7fffae8abfed <+333>: vmovaps (%rsi), %ymm1
    0x7fffae8abff1 <+337>: vmovaps 0x20(%rsi), %ymm2
    0x7fffae8abff6 <+342>: addq   $0x40, %rsi
    0x7fffae8abffa <+346>: subq   $0x40, %rdx
    0x7fffae8abffe <+350>: ja     0x7fffae8abfe0            ; <+320>
    0x7fffae8ac000 <+352>: vmovups (%rsi,%rdx), %ymm3
    0x7fffae8ac005 <+357>: vmovups 0x20(%rsi,%rdx), %ymm4
    0x7fffae8ac00b <+363>: vmovaps %ymm1, (%rdi)
    0x7fffae8ac00f <+367>: vmovaps %ymm2, 0x20(%rdi)
    0x7fffae8ac014 <+372>: vmovups %ymm3, 0x40(%rdi,%rdx)
    0x7fffae8ac01a <+378>: vmovups %ymm4, 0x60(%rdi,%rdx)
    0x7fffae8ac020 <+384>: popq   %rbp
    0x7fffae8ac021 <+385>: vzeroupper 
    0x7fffae8ac024 <+388>: retq   
    0x7fffae8ac025 <+389>: nopw   %cs:(%rax,%rax)
    0x7fffae8ac030 <+400>: subq   $0x8, %rsi
    0x7fffae8ac034 <+404>: movq   (%rsi), %rcx
    0x7fffae8ac037 <+407>: subq   $0x8, %rdi
    0x7fffae8ac03b <+411>: movq   %rcx, (%rdi)
    0x7fffae8ac03e <+414>: subq   $0x8, %rdx
    0x7fffae8ac042 <+418>: jae    0x7fffae8ac030            ; <+400>
    0x7fffae8ac044 <+420>: addq   $0x8, %rdx
    0x7fffae8ac048 <+424>: je     0x7fffae8ac05c            ; <+444>
    0x7fffae8ac04a <+426>: subq   $0x1, %rsi
    0x7fffae8ac04e <+430>: movb   (%rsi), %cl
    0x7fffae8ac050 <+432>: subq   $0x1, %rdi
    0x7fffae8ac054 <+436>: movb   %cl, (%rdi)
    0x7fffae8ac056 <+438>: subq   $0x1, %rdx
    0x7fffae8ac05a <+442>: jne    0x7fffae8ac04a            ; <+426>
    0x7fffae8ac05c <+444>: popq   %rbp
    0x7fffae8ac05d <+445>: retq   
    0x7fffae8ac05e <+446>: nop    
    0x7fffae8ac060 <+448>: vmovups -0x20(%rsi), %ymm0
    0x7fffae8ac065 <+453>: movq   %rdi, %r11
    0x7fffae8ac068 <+456>: subq   $0x1, %rdi
    0x7fffae8ac06c <+460>: andq   $-0x20, %rdi
    0x7fffae8ac070 <+464>: movq   %r11, %rcx
    0x7fffae8ac073 <+467>: subq   %rdi, %rcx
    0x7fffae8ac076 <+470>: subq   %rcx, %rsi
    0x7fffae8ac079 <+473>: subq   %rcx, %rdx
    0x7fffae8ac07c <+476>: vmovups -0x20(%rsi), %ymm1
    0x7fffae8ac081 <+481>: vmovups %ymm0, -0x20(%r11)
    0x7fffae8ac087 <+487>: vmovups -0x40(%rsi), %ymm2
    0x7fffae8ac08c <+492>: subq   $0x40, %rsi
    0x7fffae8ac090 <+496>: subq   $0x80, %rdx
    0x7fffae8ac097 <+503>: jbe    0x7fffae8ac102            ; <+610>
    0x7fffae8ac099 <+505>: testq  $0x1f, %rsi
    0x7fffae8ac0a0 <+512>: je     0x7fffae8ac0e0            ; <+576>
    0x7fffae8ac0a2 <+514>: vmovaps %ymm1, -0x20(%rdi)
    0x7fffae8ac0a7 <+519>: vmovaps %ymm2, -0x40(%rdi)
    0x7fffae8ac0ac <+524>: subq   $0x40, %rdi
    0x7fffae8ac0b0 <+528>: vmovups -0x20(%rsi), %ymm1
    0x7fffae8ac0b5 <+533>: vmovups -0x40(%rsi), %ymm2
    0x7fffae8ac0ba <+538>: subq   $0x40, %rsi
    0x7fffae8ac0be <+542>: subq   $0x40, %rdx
    0x7fffae8ac0c2 <+546>: ja     0x7fffae8ac0a2            ; <+514>
    0x7fffae8ac0c4 <+548>: jmp    0x7fffae8ac102            ; <+610>
    0x7fffae8ac0c6 <+550>: nopw   %cs:(%rax,%rax)
    0x7fffae8ac0d5 <+565>: nopw   %cs:(%rax,%rax)
    0x7fffae8ac0e0 <+576>: vmovaps %ymm1, -0x20(%rdi)
    0x7fffae8ac0e5 <+581>: vmovaps %ymm2, -0x40(%rdi)
    0x7fffae8ac0ea <+586>: subq   $0x40, %rdi
    0x7fffae8ac0ee <+590>: vmovaps -0x20(%rsi), %ymm1
    0x7fffae8ac0f3 <+595>: vmovaps -0x40(%rsi), %ymm2
    0x7fffae8ac0f8 <+600>: subq   $0x40, %rsi
    0x7fffae8ac0fc <+604>: subq   $0x40, %rdx
    0x7fffae8ac100 <+608>: ja     0x7fffae8ac0e0            ; <+576>
    0x7fffae8ac102 <+610>: subq   %rdx, %rsi
    0x7fffae8ac105 <+613>: vmovups -0x20(%rsi), %ymm3
    0x7fffae8ac10a <+618>: vmovups -0x40(%rsi), %ymm4
    0x7fffae8ac10f <+623>: vmovaps %ymm1, -0x20(%rdi)
    0x7fffae8ac114 <+628>: vmovaps %ymm2, -0x40(%rdi)
    0x7fffae8ac119 <+633>: vmovups %ymm3, 0x20(%rax)
    0x7fffae8ac11e <+638>: vmovups %ymm4, (%rax)
    0x7fffae8ac122 <+642>: popq   %rbp
    0x7fffae8ac123 <+643>: vzeroupper 
    0x7fffae8ac126 <+646>: retq   
    0x7fffae8ac127 <+647>: nop    
    0x7fffae8ac128 <+648>: nop    
    0x7fffae8ac129 <+649>: nop    
    0x7fffae8ac12a <+650>: nop    
    0x7fffae8ac12b <+651>: nop    
    0x7fffae8ac12c <+652>: nop    
    0x7fffae8ac12d <+653>: nop    
    0x7fffae8ac12e <+654>: nop    
    0x7fffae8ac12f <+655>: nop    
    0x7fffae8ac130 <+656>: nop    
    0x7fffae8ac131 <+657>: nop    
    0x7fffae8ac132 <+658>: nop    
    0x7fffae8ac133 <+659>: nop    
    0x7fffae8ac134 <+660>: nop    
    0x7fffae8ac135 <+661>: nop    
    0x7fffae8ac136 <+662>: nop    
    0x7fffae8ac137 <+663>: nop    
    0x7fffae8ac138 <+664>: nop    
    0x7fffae8ac139 <+665>: nop    
    0x7fffae8ac13a <+666>: nop    
    0x7fffae8ac13b <+667>: nop    
    0x7fffae8ac13c <+668>: nop    
    0x7fffae8ac13d <+669>: nop    
    0x7fffae8ac13e <+670>: nop    
    0x7fffae8ac13f <+671>: nop
Flags: needinfo?(spohl.mozilla.bugs) → needinfo?(mh+mozilla)
So it wants to do a memmove of 146MB and fails after 9M. What is the size it's supposed to be moving? What is the size of the in and out buffers? (if they were malloc'ed, you can call Debug::jemalloc_ptr_info, see https://hg.mozilla.org/mozilla-central/rev/b379388eaa92)
Flags: needinfo?(mh+mozilla) → needinfo?(spohl.mozilla.bugs)
(Assignee)

Comment 37

a year ago
The size it wants to move is indeed 146,664,000 (plus one to null terminate). I'm able to confirm that on lo-dpi, it is exactly 4 times less, as expected. It looks like we're dealing with shared memory here. I wasn't able to use Debug::jemalloc_ptr_info to get the buffer sizes (size was always returned as 0) and I haven't finished tracing back how/where the shared memory segments are created. While stepping through the code however I noticed Shmem::AssertInvariants(). There, we access the first and last byte of the shared memory segment to ensure that the segment is owned by the current process. This does indeed pass. However, just now when I tried accessing every byte of map.mData I crashed when accessing byte number 8,397,268. When I tried accessing every byte of GetSurfaceDataContext::GetBuffer(surfaceData), I crashed at byte number 4,583,822.

Setting n-i in case this is helpful. I will look into the creation of these buffers next.
Flags: needinfo?(spohl.mozilla.bugs)
Flags: needinfo?(nical.bugzilla)
Flags: needinfo?(mh+mozilla)
You should probably take a look at the vm_map output. In all likeliness, those memory segments are not entirely valid (and the check for the end happens to pass because something is mapped there). jemalloc_ptr_info probably returns 0 because that's unknown to jemalloc.
Flags: needinfo?(mh+mozilla)
(Assignee)

Comment 39

a year ago
Quick update:

Here is some seemingly relevant console output that I was able to extract from all the other console spew:

[Parent 65082, Main Thread] WARNING: Failed to map shared memory (146673664 bytes) into 103, port 21cb. (os/kern) invalid argument (4)
: file /Users/spohl/Documents/mozilla-central/ipc/glue/SharedMemoryBasic_mach.mm, line 589
###!!! [Parent][DispatchAsyncMessage] Error: SHMEM_CREATED_MESSAGE Payload error: message could not be deserialized
###!!! [Parent][MessageChannel] Error: (msgtype=0x150081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv
###!!! [Parent][MessageChannel] Error: (msgtype=0x150081,name=PBrowser::Msg_Destroy) Channel error: cannot send/recv

I'm now working off of the assumption that:
1. The child requests shared memory to write surface data to from the parent process. [1]
2. The parent fails to map the requested shared memory, i.e. the call to mach_vm_map fails. [2]
3. The child either doesn't wait to receive the error response from the parent, or fails to properly handle the error and ends up using an invalid buffer.

The questions I need to confirm are:
1. Does the child indeed rely on the parent for shared memory?
2. Should the mach_vm_map call succeed? Or should we expect that a size of 146,673,664 bytes can indeed fail? If failure is expected, what is the correct way to handle this situation? Should the drag session be interrupted?
3. How are mach_vm_map failures currently handled and propagated to the child? Is the child waiting properly for the response?

This would cover the problematic buffer obtained from GetSurfaceDataContext::GetBuffer(surfaceData). I will still have to figure out why map.mData is seemingly not pointing to a valid buffer either.

[1] https://dxr.mozilla.org/mozilla-central/rev/7208b6a7b11c3ed8c87a7f17c9c30a8f9583e791/dom/base/nsContentUtils.cpp#8504
[2] https://dxr.mozilla.org/mozilla-central/rev/7208b6a7b11c3ed8c87a7f17c9c30a8f9583e791/ipc/glue/SharedMemoryBasic_mach.mm#585-591
(Assignee)

Comment 40

a year ago
I've been able to trace this further, but I'm stumped at the moment:
1. The child creates the shared memory segment with the proper size and maps it.[1]
2. To map the segment, the child calls mach_vm_map.[2] This call succeeds.
3. If I read every byte of the shared memory segment in the child at this point, the process does not crash.
4. The child then sends a ShmemCreated message to the parent.[3]
5. The parent receives the ShmemCreated message and attempts to read and map the segment.[4]
6. This time, on the parent side, the mach_vm_map[2] call fails. The parent attempts to notify the child of this fact[5], but the child is not waiting for this message and instead attempts to memcpy the surface data into the shared memory buffer.[6]
7. On the child side, the shared memory buffer is no longer valid at this point and manually reading every byte of it, or attempting to memcpy surface data into it[6] will end up crashing the child process.

So it looks like the shared memory segment is getting corrupted between creating/mapping it and attempting to write to it. The one thing that happens between these two steps is the fact that we send the ShmemCreated message to the parent, which attempts to read/map the segment, which fails.

[1] https://dxr.mozilla.org/mozilla-central/source/ipc/glue/Shmem.cpp#89
[2] https://dxr.mozilla.org/mozilla-central/source/ipc/glue/SharedMemoryBasic_mach.mm#585-586
[3] https://dxr.mozilla.org/mozilla-central/source/ipc/glue/ProtocolUtils.cpp#724-729
[4] https://dxr.mozilla.org/mozilla-central/source/ipc/glue/Shmem.cpp#117
[5] https://dxr.mozilla.org/mozilla-central/source/obj-x86_64-pc-linux-gnu/ipc/ipdl/PContentParent.cpp#7230
[6] https://dxr.mozilla.org/mozilla-central/source/dom/base/nsContentUtils.cpp#8506
Flags: needinfo?(mh+mozilla)
(In reply to Stephen A Pohl [:spohl] from comment #40)
> I've been able to trace this further, but I'm stumped at the moment:
> 1. The child creates the shared memory segment with the proper size and maps
> it.[1]
> 2. To map the segment, the child calls mach_vm_map.[2] This call succeeds.
> 3. If I read every byte of the shared memory segment in the child at this
> point, the process does not crash.
> 4. The child then sends a ShmemCreated message to the parent.[3]
> 5. The parent receives the ShmemCreated message and attempts to read and map
> the segment.[4]
> 6. This time, on the parent side, the mach_vm_map[2] call fails.

The return value from mach_vm_map would be interesting to know. Like whether it's e.g. KERN_NO_SPACE or KERN_INVALID_OBJECT.

> The parent
> attempts to notify the child of this fact[5], but the child is not waiting
> for this message and instead attempts to memcpy the surface data into the
> shared memory buffer.[6]

Theoretically, it shouldn't matter, right?

> 7. On the child side, the shared memory buffer is no longer valid at this
> point and manually reading every byte of it, or attempting to memcpy surface
> data into it[6] will end up crashing the child process.

That is definitely weird. Do you know if something else broke the mapping? Does the vm_map tool say it's still mapped properly?

> So it looks like the shared memory segment is getting corrupted between
> creating/mapping it and attempting to write to it. The one thing that
> happens between these two steps is the fact that we send the ShmemCreated
> message to the parent, which attempts to read/map the segment, which fails.

... which /shouldn't/ matter. Could it be that something in the child ends up getting the error from the parent and unmap the segment from another thread *while* the memcpy is happening? That would also explain the non-page aligned offsets in comment 37.
Flags: needinfo?(mh+mozilla)
Stephen, I can't help but think that https://github.com/steven-michaud/HookCase might be useful here. With it you can log calls to mach_vm_map() and friends across all of Firefox's processes and threads. If you try it, let me know your results.

I just released a Version 2.0 that's compatible with macOS High Sierra, and with the latest KPTI-supporting updates for Sierra and El Capitan. It also fixes some bugs.
It should be possible to do the same with dtrace.
Yes for mach_vm_map(). But HookCase is considerably more flexible, and can let you find much more information and play many more debugging tricks.
I couldn't resist trying my own hand at this -- HookCase seemed to be such a perfect fit for it.

And I think I've figured it out.  The crashes happen (eventually) when, in the child, in SharedMemoryBasic::Create(), mach_make_memory_entry_64() creates a smaller object than 'size'.  But the child ignores this difference.  And when the parent calls mach_vm_map(), it sets the 'size' parameter to the original value with which SharedMemoryBasic::Create() was called in the child.  But this 'size' is too large, and the parent's call to mach_vm_map() returns '4' == KERN_INVALID_ARGUMENT.

I've attached the source code for the hook library that I used.  It was a bit more complicated than I'd hoped.  All the relevant "mach" methods are called *very* often in contexts other than the SharedMemoryBasic class.  So I also needed to hook some of that class's methods to allow my logging from the "mach" methods only to happen in the correct context.  These SharedMemoryBasic methods aren't exported, but HookCase can deal with that.  I tested with yesterday's mozilla-central nightly.

There's no way you could have accomplished the same thing using dtrace.
Attachment #8955031 - Attachment mime type: text/x-troff-mm → text/text
Attachment #8955031 - Attachment mime type: text/text → text/plain
(Assignee)

Comment 46

a year ago
Posted patch Patch (obsolete) — Splinter Review
Wow! Thank you, Steven! :-)

Taking a quick step back here, when we initiate a drag session we convert transferables to IPC transferables. The conversion at issue here is for the "application/x-moz-nativeimage" flavor of transferables. It looks like this flavor can be very large, especially on hidpi screens.

So, in SharedMemoryBasic::Create, we call mach_vm_allocate with a size of 146,673,664. This call succeeds. We then call mach_make_memory_entry_64. This call succeeds as well, but the size returned is smaller: 134,217,728 to be exact.

This patch checks for a size discrepancy and bails if there is one. I've also added a mach_vm_deallocate call that appears to have been missing when mach_make_memory_entry_64 failed.

I will also file a bug to better check shmem allocations in debug builds. I'm afraid that we might see some fallout from it, so I prefer keeping it separate from the crash fix here.
Assignee: nical.bugzilla → spohl.mozilla.bugs
Status: REOPENED → ASSIGNED
Attachment #8955558 - Flags: review?(mh+mozilla)
(Assignee)

Updated

a year ago
Priority: P2 → P1
(Assignee)

Comment 47

a year ago
Posted patch Patch (obsolete) — Splinter Review
Forgot to `hg qrefresh`. This adds a previously missing surface unmap call when we bailed early that I happened to come across.
Attachment #8955558 - Attachment is obsolete: true
Attachment #8955558 - Flags: review?(mh+mozilla)
Attachment #8955560 - Flags: review?(mh+mozilla)
(Assignee)

Updated

a year ago
Depends on: 1442721
Comment on attachment 8955560 [details] [diff] [review]
Patch

Review of attachment 8955560 [details] [diff] [review]:
-----------------------------------------------------------------

::: ipc/glue/SharedMemoryBasic_mach.mm
@@ +556,3 @@
>      LOG_ERROR("Failed to make memory entry (%zu bytes). %s (%x)\n",
>                size, mach_error_string(kr), kr);
> +    mach_vm_deallocate(mach_task_self(), address, round_page(size));

Should mPort be freed here, or is it okay to leave that to the destructor?
Attachment #8955560 - Flags: review?(mh+mozilla) → review+
(Assignee)

Comment 49

a year ago
Posted patch PatchSplinter Review
Addressed comment below. Carrying over r+.

(In reply to Mike Hommey [:glandium] from comment #48)
> Comment on attachment 8955560 [details] [diff] [review]
> Patch
> 
> Review of attachment 8955560 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> ::: ipc/glue/SharedMemoryBasic_mach.mm
> @@ +556,3 @@
> >      LOG_ERROR("Failed to make memory entry (%zu bytes). %s (%x)\n",
> >                size, mach_error_string(kr), kr);
> > +    mach_vm_deallocate(mach_task_self(), address, round_page(size));
> 
> Should mPort be freed here, or is it okay to leave that to the destructor?

Let's be explicit about it by calling CloseHandle() here. Thanks for the heads up!
Attachment #8955560 - Attachment is obsolete: true
Attachment #8956536 - Flags: review+
(Assignee)

Comment 50

a year ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/7a6bc227dc03ad3cfa268cb4b6349d191b46baf5
Bug 1362303: Avoid crashes when dragging on macOS due to failed allocations of large shmem segments. r=glandium

Comment 51

a year ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/7a6bc227dc03
Status: ASSIGNED → RESOLVED
Last Resolved: 2 years agoa year ago
Resolution: --- → FIXED
Target Milestone: mozilla55 → mozilla60
You need to log in before you can comment on or make changes to this bug.