Closed Bug 1411401 Opened 7 years ago Closed 7 years ago

MinGW build does not run without `gfx.direct2d.disabled` and `layers.acceleration.disabled` due to weird MinGW stuff

Categories

(Core :: Graphics, enhancement, P1)

enhancement

Tracking

()

RESOLVED FIXED
Tracking Status
firefox57 --- wontfix
firefox58 --- affected

People

(Reporter: tjr, Assigned: tjr)

References

(Blocks 1 open bug)

Details

(Whiteboard: [tor][gfx-noted])

Attachments

(1 file)

I don't know if I'll be able to summarize this well, but the gist of it is that there's a calling convention mismatch between the MS ABI and MinGW. Background: https://sourceforge.net/p/mingw-w64/mailman/mingw-w64-public/?style=threaded&viewmonth=201412&viewday=23 <- This thread is the best resource https://lists.torproject.org/pipermail/tor-talk/2013-June/028564.html https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64384 One culprit we know causing this problem is ID2D1RenderTarget::GetSize. Because there are only a few uses of this in the codebase, we're going to add a wrapper function that can normalize things for MinGW.
Priority: -- → P3
Whiteboard: [tor] → [tor][gfx-noted]
There is a solution for that in Wine widl: https://source.winehq.org/git/wine.git/commitdiff/b42a15513eaa973b40ab967014b311af64acbb98 It's not perfect as discussed here: https://www.winehq.org/pipermail/wine-devel/2017-July/118470.html but it should do the trick. mingw-w64 d2d headers are not generated from .idl files, but we could have something similar there. As a side note, that's one of those bugs what would be irrelevant in clang with mingw-w64.
If I understand correctly, our options are: 1: write and carry a simple-ish local patch for MinGW. That patch will add the object pointer as a method argument. Then we patch our cairo code to work with that new function definition when compiling for MinGW. 2: develop and carry a complicated patch for gcc 3: patch our cairo code to do something crazy I went for the third option (attached). Browser still crashes, because I'm not sure I did it correctly or if there's another crasher after the GetSize one; but if I understand the problem correctly, something similar to this might work, without the need to patch MinGW or gcc?
Actually, I think this could be fixed in upstream mingw-w64, without changes to cairo nor local patches. The solution would be like something that Wine does, here is example of Wine widl generated GetSize: #ifdef WIDL_EXPLICIT_AGGREGATE_RETURNS virtual D2D1_SIZE_F* STDMETHODCALLTYPE GetSize( D2D1_SIZE_F *__ret) = 0; D2D1_SIZE_F STDMETHODCALLTYPE GetSize( ) { D2D1_SIZE_F __ret; return *GetSize(&__ret); } #else virtual D2D1_SIZE_F STDMETHODCALLTYPE GetSize( ) = 0; #endif We could do the same for mingw-w64. Inline wrapper hides ABI incompatibility and avoids requirement to change callers. (This solution will not work for code that implements those interfaces by inheriting interface by a class, but there is probably no such case in cairo anyway).
Does taking the address of this virtual function with "&" do the right thing? May want to dive into the disassembly. I'm worried we might have to do something gross like going into the object's vtable by hand. Although if Jacek has a cleaner way then all the better. :-)
(In reply to Jacek Caban from comment #4) > We could do the same for mingw-w64. Inline wrapper hides ABI incompatibility > and avoids requirement to change callers. Gave it a shot here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7287b5aa27b5cac6496f71059341805baa766cac&selectedJob=139625965 I'm going to need to work with dmajor to learn how to debug this though (as it still crashes).
There are more similar functions that need fixup. Grepping thought header from Wine, I can see that those may be needed (I didn't check which ones are used in cairo): ID2D1Bitmap::GetSize ID2D1Bitmap::GetPixelSize ID2D1Bitmap::GetPixelFormat ID2D1SolidColorBrush::GetColor ID2D1LinearGradientBrush::GetStartPoint ID2D1LinearGradientBrush::GetEndPoint ID2D1RadialGradientBrush::GetCenter ID2D1RadialGradientBrush::GetGradientOriginOffset ID2D1Layer::GetSize ID2D1RenderTarget::GetPixelFormat ID2D1RenderTarget::GetSize ID2D1RenderTarget::GetPixelSize I hope that helps.
(In reply to Tom Ritter [:tjr] from comment #8) > New try run: Does it work?
The new build seems to be asserting here: https://dxr.mozilla.org/mozilla-central/source/js/src/jsfun.h#142 I guess that's progress?
I'm not sure if it matters for these builds, but just in case, could you add this patch to your stack until Microsoft updates their debugger? It will help in case we ever need to record these builds with Time Travel Debugging. https://hg.mozilla.org/try/rev/b1ceb5a9a7f1071f9f710f9e9ba5bae14e696fa2
Hm. I tried debugging in gdb, but hit a lot of walls: https://pastebin.mozilla.org/9073549 I'm asking in #mingw-w64 for help. I'm not sure if using cygwin's gdb is doomed to failure; not sure about the architecture mismatches and if those are an issue. I'm not sure why gdb doesn't think it was compiled with debugging information. And I'm not sure about https://searchfox.org/mozilla-central/source/xpcom/build/PoisonIOInterposerBase.cpp#226 and why this is causes asserts when running under gdb :)
> I did a new push for the sake of freshness: > https://treeherder.mozilla.org/#/ > jobs?repo=try&revision=97c14f720b24b14970f71be1c76570d027c5f566 Aaaand, of course it fails in some new and exciting way. I don't see any instructions nearby in memory that would help me place it. If only I had some .pdb's :-(
Attachment #8921747 - Attachment is obsolete: true
Blocks: 1330608
Priority: P3 → P1
I spent some more time on this today. My latest build is https://treeherder.mozilla.org/#/jobs?repo=try&revision=d87c79b35f208c0cb9e77c98152896aa9fe355fc It does not contain the sandbox which is now enabled. I can't debug it in gdb on Windows, I'm hitting https://searchfox.org/mozilla-central/source/xpcom/build/PoisonIOInterposerBase.cpp#226. I can actually get the browser to get pretty far along, although it will pop a dialog box saying instruction at XXX cannot reference memory at YYY (and then continue). It will then crash a little while later at this seemingly inexplicable assembly code: > 052D8C80 mov eax,0 > 052D8C85 mov dword ptr [eax],69h I think Tor has some experience debugging on Windows with gdb - do either of you recognize this PoisonIOInterposerBase thing?
Flags: needinfo?(richard)
Flags: needinfo?(gk)
(In reply to Tom Ritter [:tjr] from comment #15) > It will then crash a little while later at this seemingly > inexplicable assembly code: > > 052D8C80 mov eax,0 > > 052D8C85 mov dword ptr [eax],69h This looks like MOZ_NoReturn, which is the ending point of MOZ_ASSERT/MOZ_CRASH/etc: https://searchfox.org/mozilla-central/rev/74b7ffee403c7ffd05b8b476c411cbf11d134eb9/mfbt/Assertions.h#214 So it looks like you crashed on line 69h==105 of some file. I don't suppose there's any nearby assembly that might point to a __FILE__ string, or an assert text?
(In reply to David Major [:dmajor] from comment #16) > (In reply to Tom Ritter [:tjr] from comment #15) > > It will then crash a little while later at this seemingly > > inexplicable assembly code: > > > 052D8C80 mov eax,0 > > > 052D8C85 mov dword ptr [eax],69h > > This looks like MOZ_NoReturn, which is the ending point of > MOZ_ASSERT/MOZ_CRASH/etc: > https://searchfox.org/mozilla-central/rev/ > 74b7ffee403c7ffd05b8b476c411cbf11d134eb9/mfbt/Assertions.h#214 > > So it looks like you crashed on line 69h==105 of some file. I don't suppose > there's any nearby assembly that might point to a __FILE__ string, or an > assert text? More Assembly: > 05178C64 mov dword ptr [esp+8],69h > 05178C6C mov dword ptr [esp+4],0A18403Ch > 05178C74 mov dword ptr [esp],0A1840D4h > 05178C7B call 02B43240 > 05178C80 mov eax,0 > 05178C85 mov dword ptr [eax],69h 0A18403Ch is the string in this assertion https://searchfox.org/mozilla-central/rev/74b7ffee403c7ffd05b8b476c411cbf11d134eb9/layout/svg/nsSVGIntegrationUtils.cpp#68 0A1840D4h is the string in this assertion which has a line number match! https://searchfox.org/mozilla-central/rev/74b7ffee403c7ffd05b8b476c411cbf11d134eb9/layout/svg/nsSVGIntegrationUtils.cpp#106 I will have to dig in to that...
Great! (FWIW this is my usual method of debugging builds that I don't have symbols for.)
We talked about it at the meeting yesterday. Clearing ni. Tom, if you need anything in addition to that please ping us again.
Flags: needinfo?(richard)
Flags: needinfo?(gk)
Okay this seems to have magically fixed itself, kinda. (For the record, up until now the sandbox had always been disabled, it was only recently we enabled it.) I stubbed out the Poison Interposer, and that helped the debug build. ================================================================================================ https://treeherder.mozilla.org/#/jobs?repo=try&revision=29e0334d2c02f4bfcf4c2a30cbf61bb7a4b252f8 Sandbox Disabled, Poison Disabled ------------------------------------------------------------------------------------------------ debug Sometimes crashes, sometimes doesn't. See these messages either time. Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to initialize GPU process (t=9.91905) [GFX1-]: Failed to initialize GPU process [GFX2-]: Could not acquire an MLGDevice! Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to initialize GPU process (t=9.91905) |[1][GFX1-]: [D3D11] failed to get compositor device. (t=9.98167) [GFX1-]: [D3D11] failed to get compositor device. Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to initialize GPU process (t=9.91905) |[1][GFX1-]: [D3D11] failed to get compositor device. (t=9.98167) |[2][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=9.98167) [GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE ------------------------------------------------------------------------------------------------ opt - Has not crashed. ================================================================================================ https://treeherder.mozilla.org/#/jobs?repo=try&revision=157fa82a6236b31ead51d00d8ea695a3db14b294 Sandbox Disabled, Poison Enabled debug - Crashes on startup with Asserts in PoisonIOInterposerBase opt - Has not crashed ================================================================================================ https://treeherder.mozilla.org/#/jobs?repo=try&revision=dffb8d630f3a3a7324830c0ae33aafd01c330b40 Sandbox Enabled, Poison Disabled - This one is the most interesting ------------------------------------------------------------------------------------------------ debug - Crashes Early on (before I've gone through the profile selection window) I see [GFX2-]: Failure initializing MLGDeviceD3D11: Could not create copy blend state (80070057) [GFX2-]: Failure initializing MLGDeviceD3D11: Could not create copy blend state (80070057) [GFX2-]: Could not acquire an MLGDevice! After the profile selection window I see [GFX2-]: Failure initializing MLGDeviceD3D11: Could not create copy blend state (80070057) [GPU 9384, Main Thread] WARNING: Shutting down GPU process early due to a crash!: file /builds/worker/workspace/build/src/gfx/ipc/GPUParent.cpp, line 439 Running it in gdb I see (before I select a profile in the profile window): Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create remote compositor (t=9.01102) [GFX1-]: Failed to create remote compositor [GFX2-]: Could not acquire an MLGDevice! Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create remote compositor (t=9.01102) |[1][GFX1-]: [D3D11] failed to get compositor device. (t=9.07364) [GFX1-]: [D3D11] failed to get compositor device. Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create remote compositor (t=9.01102) |[1][GFX1-]: [D3D11] failed to get compositor device. (t=9.07364) |[2][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=9.07364) [GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed to create remote compositor (t=9.01102) |[1][GFX1-]: [D3D11] failed to get compositor device. (t=9.07364) |[2][GFX1-]: [D3D11] Failed to init compositor with reason: FEATURE_FAILURE_D3D11_NO_DEVICE (t=9.07364) |[3][GFX1-]: Receive IPC close with reason=AbnormalShutdown (t=10.7841) [GFX1-]: Receive IPC close with reason=AbnormalShutdown I also some of [GPU 3028, Main Thread] WARNING: NS_ENSURE_TRUE(InitStaticMembers()) failed: file /builds/worker/workspace/build/src/modules/libpref/Preferences.cpp, line 3929 (and other lines) If I set security.sandbox.content.level to 0: I see the same GPU errors, but it seems to run. to 1: It doesn't run. Nothing that jumps out at me. ------------------------------------------------------------------------------------------------ opt - Definetly a subprocess crash or error happening. No newtab page, just a white content window. Attempting to debug this doesn't yield a crash I can find (using simple gdb and another try with 'set follow-fork-mode child') ================================================================================================ https://treeherder.mozilla.org/#/jobs?repo=try&revision=130bc92976e2a20026c08213cfa45ec054fbc095 Sandbox Enabled Poison Enabled debug - Crashes on startup with Asserts in PoisonIOInterposerBase opt - same as no-poison opt ================================================================================================ Conclusions - I should stub out the Poison Interposer in MinGW builds (both debug and opt) - Something is definitely going screwy with the Sandbox. I sent in a new try run here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=52164b634bc2801d32628654e71d938807c7ebd6 It has the time traveling debugging patch, enables and sets the content sandbox to level 1 by default, and stubs out the poison interposer.
Blocks: 1441558
Blocks: 1443838
Depends on: 1443856
Blocks: 1434316
Blocks: 1460668
No longer depends on: 1443856
If I see this correctly then we should get this patch with the mingw-w64 bump to commit ee9fc3d0b8c8868280e38384edd968067b8e71fc. However, even building with that one crashes my Tor Browser when I set both prefs to "false". It seems it is not crashing anymore as early as loading the first window, but still, as soon, as I try to load actual content the browser stops working.
(In reply to Georg Koppen from comment #22) > If I see this correctly then we should get this patch with the mingw-w64 > bump to commit ee9fc3d0b8c8868280e38384edd968067b8e71fc. However, even > building with that one crashes my Tor Browser when I set both prefs to > "false". It seems it is not crashing anymore as early as loading the first > window, but still, as soon, as I try to load actual content the browser > stops working. No; this patch had never been upstreamed into MinGW due to some structural changes that were requested. Upstreaming is possible; I'm hoping Jacek will be able to clean it up for me as part of the mingw-clang work he's going to do.
The commit that Georg mentioned is actually upstreamed version of your patch. It's not enabled by default, it requires explicitly defining WIDL_EXPLICIT_AGGREGATE_RETURNS (eg. -DWIDL_EXPLICIT_AGGREGATE_RETURNS in CXXFLAGS). It would be possible to define that on mingw-w64 side, but it's not clear to me if we want that. As for clang, it shouldn't be needed. I'd expect clang to do the right thing here without hacks.
Awesome. I'm going to mark this as fixed since it's going to be rolled up into the mozconfigs added in Bug 1434316
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: