Closed Bug 1569930 Opened 6 months ago Closed 5 months ago

Crash in drawSnapshot with Fission enabled [@ mozilla::gfx::CrossProcessPaint::ResolveInternal]

Categories

(Core :: Graphics, defect, P3, critical)

70 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla70
Fission Milestone M4
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- unaffected
firefox70 --- fixed

People

(Reporter: whimboo, Assigned: mattwoodrow)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(2 files, 1 obsolete file)

Attached file marionette testcase (obsolete)

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 Firefox/70.0 ID:20190729095501

Running the attached Marionette test results in a crash of Firefox Nightly.

$ mach marionette-test %path_to_file%

This looks like a regression from bug 1561395. Ryan, could you please have a look?

Crash details:

Operating system: Mac OS X
                  10.14.5 18F132
CPU: amd64
     family 6 model 142 stepping 10
     8 CPUs

GPU: UNKNOWN

Crash reason:  EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Crash address: 0xb0
Process uptime: 1 seconds

Thread 0 (crashed)
 0  XUL!mozilla::gfx::CrossProcessPaint::ResolveInternal(mozilla::dom::WindowGlobalParent*, nsRefPtrHashtable<nsUint64HashKey, mozilla::gfx::SourceSurface>*) [CrossProcessPaint.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 343 + 0x0]
    rax = 0x4ae757e67bc600e3   rdx = 0x00007ffeedc01838
    rcx = 0x0000000000000017   rbx = 0x00007ffeedc01838
    rsi = 0x0000000000000000   rdi = 0x00000001225c8100
    rbp = 0x00007ffeedc015e0   rsp = 0x00007ffeedc01400
     r8 = 0x00000001021007d8    r9 = 0x0000000000000008
    r10 = 0x0000000000000048   r11 = 0x000000000000001d
    r12 = 0x0000000000000000   r13 = 0x0000000000000000
    r14 = 0x0000000127266000   r15 = 0x00000001225c8100
    rip = 0x0000000103ba4c2f
    Found by: given as instruction pointer in context
 1  XUL!mozilla::gfx::CrossProcessPaint::ResolveInternal(mozilla::dom::WindowGlobalParent*, nsRefPtrHashtable<nsUint64HashKey, mozilla::gfx::SourceSurface>*) [CrossProcessPaint.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 364 + 0xd]
    rbp = 0x00007ffeedc017d0   rsp = 0x00007ffeedc015f0
    rip = 0x0000000103ba4e1d
    Found by: previous frame's frame pointer
 2  XUL!mozilla::gfx::CrossProcessPaint::ReceiveFragment(mozilla::dom::WindowGlobalParent*, mozilla::gfx::PaintFragment&&) [CrossProcessPaint.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 258 + 0x70]
    rbp = 0x00007ffeedc01920   rsp = 0x00007ffeedc017e0
    rip = 0x0000000103ba43d6
    Found by: previous frame's frame pointer
 3  XUL!mozilla::MozPromise<mozilla::gfx::PaintFragment, mozilla::ipc::ResponseRejectReason, true>::ThenValue<mozilla::dom::WindowGlobalParent::DrawSnapshotInternal(mozilla::gfx::CrossProcessPaint*, mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > const&, float, unsigned int)::$_4, mozilla::dom::WindowGlobalParent::DrawSnapshotInternal(mozilla::gfx::CrossProcessPaint*, mozilla::Maybe<mozilla::gfx::IntRectTyped<mozilla::gfx::UnknownUnits> > const&, float, unsigned int)::$_5>::DoResolveOrRejectInternal(mozilla::MozPromise<mozilla::gfx::PaintFragment, mozilla::ipc::ResponseRejectReason, true>::ResolveOrRejectValue&) [MozPromise.h:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 721 + 0x1b]
    rbp = 0x00007ffeedc01940   rsp = 0x00007ffeedc01930
    rip = 0x000000010588480a
    Found by: previous frame's frame pointer
 4  XUL!mozilla::MozPromise<mozilla::gfx::PaintFragment, mozilla::ipc::ResponseRejectReason, true>::ThenValueBase::ResolveOrRejectRunnable::Run() [MozPromise.h:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 397 + 0x1e]
    rbp = 0x00007ffeedc01960   rsp = 0x00007ffeedc01950
    rip = 0x00000001033b2b44
    Found by: previous frame's frame pointer
 5  XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 1224 + 0x6]
    rbp = 0x00007ffeedc01e50   rsp = 0x00007ffeedc01970
    rip = 0x00000001028510a3
    Found by: previous frame's frame pointer
 6  XUL!NS_ProcessPendingEvents(nsIThread*, unsigned int) [nsThreadUtils.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 434 + 0xe]
    rbp = 0x00007ffeedc01ea0   rsp = 0x00007ffeedc01e60
    rip = 0x000000010284eb32
    Found by: previous frame's frame pointer
 7  XUL!nsBaseAppShell::NativeEventCallback() [nsBaseAppShell.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 87 + 0x14]
    rbp = 0x00007ffeedc01ed0   rsp = 0x00007ffeedc01eb0
    rip = 0x0000000105bd61f7
    Found by: previous frame's frame pointer
 8  XUL!nsAppShell::ProcessGeckoEvents(void*) [nsAppShell.mm:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 440 + 0x8]
    rbp = 0x00007ffeedc01f20   rsp = 0x00007ffeedc01ee0
    rip = 0x0000000105c5a30c
    Found by: previous frame's frame pointer
 9  CoreFoundation!-[_CFXNotificationRegistrar match:object:observer:enumerator:] + 0x816
    rbp = 0x00007ffeedc01f30   rsp = 0x00007ffeedc01f30
    rip = 0x00007fff2ef22083
    Found by: previous frame's frame pointer
10  CoreFoundation!-[_CFXNotificationRegistrar match:object:observer:enumerator:] + 0x7bc
    rbp = 0x00007ffeedc01f60   rsp = 0x00007ffeedc01f40
    rip = 0x00007fff2ef22029
    Found by: previous frame's frame pointer
11  CoreFoundation!__CFStringDecodeByteStream3 + 0x84b
    rbp = 0x00007ffeedc01fd0   rsp = 0x00007ffeedc01f70
    rip = 0x00007fff2ef059eb
    Found by: previous frame's frame pointer
12  CoreFoundation!+[NSDate allocWithZone:] + 0x21
    rbp = 0x00007ffeedc02ce0   rsp = 0x00007ffeedc01fe0
    rip = 0x00007fff2ef04fb5
    Found by: previous frame's frame pointer
13  CoreFoundation!__CFRunLoopRun + 0xb37
    rbp = 0x00007ffeedc02d70   rsp = 0x00007ffeedc02cf0
    rip = 0x00007fff2ef048be
    Found by: previous frame's frame pointer
14  HIToolbox!RunCurrentEventLoopInMode + 0x124
    rbp = 0x00007ffeedc02dc0   rsp = 0x00007ffeedc02d80
    rip = 0x00007fff2e1f096b
    Found by: previous frame's frame pointer
15  HIToolbox!ReceiveNextEventCommon + 0x25b
    rbp = 0x00007ffeedc02e40   rsp = 0x00007ffeedc02dd0
    rip = 0x00007fff2e1f06a5
    Found by: previous frame's frame pointer
16  HIToolbox!_BlockUntilNextEventMatchingListInModeWithFilter + 0x40
    rbp = 0x00007ffeedc02e60   rsp = 0x00007ffeedc02e50
    rip = 0x00007fff2e1f0436
    Found by: previous frame's frame pointer
17  AppKit!_DPSNextEvent + 0x3c5
    rbp = 0x00007ffeedc03270   rsp = 0x00007ffeedc02e70
    rip = 0x00007fff2c58a987
    Found by: previous frame's frame pointer
18  AppKit!-[NSApplication(NSEvent) _nextEventMatchingEventMask:untilDate:inMode:dequeue:] + 0x551
    rbp = 0x00007ffeedc034f0   rsp = 0x00007ffeedc03280
    rip = 0x00007fff2c58971f
    Found by: previous frame's frame pointer
19  XUL!-[GeckoNSApplication nextEventMatchingMask:untilDate:inMode:dequeue:] [nsAppShell.mm:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 169 + 0x2c]
    rbp = 0x00007ffeedc03560   rsp = 0x00007ffeedc03500
    rip = 0x0000000105c59357
    Found by: previous frame's frame pointer
20  AppKit!-[NSApplication run] + 0x2bb
    rbp = 0x00007ffeedc03630   rsp = 0x00007ffeedc03570
    rip = 0x00007fff2c58383c
    Found by: previous frame's frame pointer
21  XUL!nsAppShell::Run() [nsAppShell.mm:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 703 + 0x19]
    rbp = 0x00007ffeedc03670   rsp = 0x00007ffeedc03640
    rip = 0x0000000105c5abb9
    Found by: previous frame's frame pointer
22  XUL!nsAppStartup::Run() [nsAppStartup.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 276 + 0xa]
    rbp = 0x00007ffeedc03690   rsp = 0x00007ffeedc03680
    rip = 0x000000010713ca0e
    Found by: previous frame's frame pointer
23  XUL!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 4771 + 0xdaf]
    rbp = 0x00007ffeedc03810   rsp = 0x00007ffeedc036a0
    rip = 0x00000001072b0b05
    Found by: previous frame's frame pointer
24  XUL!mozilla::BootstrapImpl::XRE_main(int, char**, mozilla::BootstrapConfig const&) [Bootstrap.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 45 + 0xf1]
    rbp = 0x00007ffeedc03950   rsp = 0x00007ffeedc03820
    rip = 0x00000001072bc6bd
    Found by: previous frame's frame pointer
25  firefox!main [nsBrowserApp.cpp:50df4b75c9b6c7fec8c8c4685fd188634d193e75 : 295 + 0x1c5]
    rbp = 0x00007ffeedc03db0   rsp = 0x00007ffeedc03960
    rip = 0x0000000101ffc163
    Found by: previous frame's frame pointer
26  libdyld.dylib!start + 0x1
    rbp = 0x00007ffeedc03dc8   rsp = 0x00007ffeedc03dc0
    rip = 0x00007fff5ae2f3d5
    Found by: previous frame's frame pointer
27  libdyld.dylib!start + 0x1
    rbp = 0x00007ffeedc03dc8   rsp = 0x00007ffeedc03dc8
    rip = 0x00007fff5ae2f3d5
    Found by: stack scanning
Flags: needinfo?(rhunt)
Flags: needinfo?(matt.woodrow)
Priority: -- → P3

Sadly, there is still no easy way to instruct Marionette to attach a debugger. As a workaround, I suggest adding the following two lines after the call to navigate():

import time
time.sleep(30)  # leaves time to attach a debugger

During that time, look up the pid of the Firefox process started by Marionette and attach the debugger to it.

Note that when I remove the iframe from the example HTML page, the crash doesn't occur anymore.

Attached file marionette testcase

Updated Marionette testcase, which should result in a 100% crash rate. The previous test was racy because example.org loads quickly, and when waiting after the navigation everything had already loaded.

With the updated test, we call drawSnapshot() while the iframe is still loading and may be changing its current window global, which might result in it being seen as null.

Attachment #9081600 - Attachment is obsolete: true
Blocks: 1570147
Flags: needinfo?(rhunt)

I can reproduce this now, thanks!

This is an interesting problem, since we're trying to take a snapshot of content with an OOP iframe and the iframe goes away mid-snapshot.

Should we reject the promise, and not return any snapshot, or should we carry on but return a snapshot with a blank area where the iframe is?

My instinct here is to reject the promise, since otherwise there's no way to detect that your snapshot was incomplete. You can then repeat requesting snapshots until you get a complete one.

Maybe in the future we could add a flag for 'best effort' that'll just draw what it can and not worry about bits that get lost.
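
The retry approach described above could look roughly like this on the caller's side. This is a minimal sketch, not Marionette or Gecko code: `take_snapshot` is a hypothetical callable standing in for whatever requests the snapshot, and `SnapshotIncomplete` is a hypothetical exception standing in for the rejected promise.

```python
import time


class SnapshotIncomplete(Exception):
    """Hypothetical error standing in for a rejected drawSnapshot() promise."""


def snapshot_with_retry(take_snapshot, attempts=5, delay=0.1, sleep=time.sleep):
    """Call take_snapshot() until it succeeds or the attempts run out.

    take_snapshot is a hypothetical callable supplied by the caller. It is
    expected to raise SnapshotIncomplete when a frame went away mid-snapshot.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for attempt in range(attempts):
        try:
            return take_snapshot()
        except SnapshotIncomplete as error:
            last_error = error
            if attempt < attempts - 1:
                sleep(delay)  # give the frame time to settle before retrying
    # Every attempt failed: surface the last failure to the caller.
    raise last_error
```

Bounding the number of attempts keeps a page whose frames never settle from hanging the caller; the last failure is re-raised so the caller still learns that no complete snapshot was obtained.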

I actually wonder why the OOP frame goes away. Is that expected due to a process change while the iframe has not finished loading?

Another thing here could also be that Marionette in this case doesn't wait long enough for the navigation to complete. We have event listeners attached to the top-level browsing context in our framescript, which wait for the DOMContentLoaded and pageshow events. Maybe with OOP frames we fire those events too early? As we have seen above, with an additional sleep the crash doesn't happen.

Let's get some feedback from Nika and/or Mike.

Flags: needinfo?(nika)
Flags: needinfo?(mconley)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #5)

> I actually wonder why the OOP frame goes away. Is that expected due to a process change while the iframe has not finished loading?

It could be due to any number of reasons. Frames can go away at any time. I'm guessing this example could be due to a process switch, but it could also be caused by JS code removing the iframe element or inserting new ones.

> Another thing here could also be that Marionette in this case doesn't wait long enough for the navigation to complete. We have event listeners attached to the top-level browsing context in our framescript, which wait for the DOMContentLoaded and pageshow events. Maybe with OOP frames we fire those events too early? As we have seen above, with an additional sleep the crash doesn't happen.

We don't currently delay load events long enough. This is being worked on by :jwatt in bug 1559841.

Flags: needinfo?(nika)
Flags: needinfo?(mconley)

(In reply to :Nika Layzell (ni? for response) from comment #6)

> (In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #5)

> > I actually wonder why the OOP frame goes away. Is that expected due to a process change while the iframe has not finished loading?

> It could be due to any number of reasons. Frames can go away at any time. I'm guessing this example could be due to a process switch, but it could also be caused by JS code removing the iframe element or inserting new ones.

The example used in the Marionette test is just a data URL containing an iframe with http://example.org as its source. There is no JS fiddling with the iframe.

> > Another thing here could also be that Marionette in this case doesn't wait long enough for the navigation to complete. We have event listeners attached to the top-level browsing context in our framescript, which wait for the DOMContentLoaded and pageshow events. Maybe with OOP frames we fire those events too early? As we have seen above, with an additional sleep the crash doesn't happen.

> We don't currently delay load events long enough. This is being worked on by :jwatt in bug 1559841.

Ah, that explains it. Thanks. I assume that means the navigation request from Marionette returns once the outer data URL has loaded, at which point the iframe might not even have started loading.
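
The attached testcase is not reproduced in this thread. As a rough illustration of the page described above (a data URL whose document contains only an iframe pointing at http://example.org), such a URL could be built as follows. The function name and the exact markup are assumptions made for this sketch, not the contents of the attachment.

```python
from urllib.parse import quote


def make_test_url(iframe_src="http://example.org"):
    """Return a data: URL for a page that embeds iframe_src in an iframe.

    The markup is a guess at a minimal page of the kind described in the
    bug; the real attachment may differ.
    """
    html = f'<iframe src="{iframe_src}"></iframe>'
    # Percent-encode the markup so it is safe inside a data: URL.
    return "data:text/html;charset=utf-8," + quote(html)
```

Because the outer document is inline, navigating to it can complete almost immediately, while the cross-origin iframe still has to be fetched.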

Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b8c5fc82d21d
Handle races in CrossProcessPaint without crashing, and instead report it back to the caller. r=rhunt
Status: NEW → RESOLVED
Closed: 5 months ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla70
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
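
The commit message above describes the fix as reporting the race back to the caller instead of crashing. The patch itself is C++ and is not shown in this thread, so the following is only a simplified Python model of that idea, with a hypothetical data layout: `fragments` maps a frame id to a dict listing the ids of the frames it depends on, and a frame that went away is simply absent from the map.

```python
def resolve(root_id, fragments):
    """Resolve root_id and, recursively, every frame it depends on.

    Returns the frame ids in paint order (dependencies first), or None if
    any required frame is missing. The data layout is an assumption of this
    sketch, and the dependencies are assumed to form a tree (no cycles).
    """
    fragment = fragments.get(root_id)
    if fragment is None:
        # The frame went away mid-snapshot: report failure to the caller
        # instead of using a missing entry.
        return None
    order = []
    for dependency_id in fragment["dependencies"]:
        resolved = resolve(dependency_id, fragments)
        if resolved is None:
            return None  # propagate the failure upwards
        order.extend(resolved)
    order.append(root_id)
    return order
```

A caller that receives None knows the snapshot is incomplete and can reject its own promise or request another snapshot.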

Retroactively moving fixed bugs whose summaries mention "Fission" (or other Fission-related keywords) but are not assigned to a Fission Milestone to an appropriate Fission Milestone.

This will generate a lot of bugmail, so you can filter your bugmail for the following UUID and delete them en masse:

0ee3c76a-bc79-4eb2-8d12-05dc0b68e732

Fission Milestone: --- → M4