Closed Bug 1515702 Opened 7 years ago Closed 6 years ago

Cannot run my debug build without --disable-e10s because of startup crash after landing bug 1485016

Categories

(Core :: Security, defect)

x86_64
Windows 10
defect
Not set
blocker

Tracking

()

RESOLVED FIXED
Tracking Status
firefox65 --- unaffected
firefox66 + fixed
firefox67 --- fixed

People

(Reporter: masayuki, Assigned: tjr)

Details

(Keywords: crash, regression)

My mozconfig file is like this:

> export CC="clang-cl.exe"
> export CXX="clang-cl.exe"
> export LINKER="lld-link.exe"
>
> mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../fx64-dbg
> mk_add_options MOZ_MAKE_FLAGS="-j16"
> mk_add_options AUTOCLOBBER=1
>
> ac_add_options --target=x86_64-pc-mingw32
> ac_add_options --host=x86_64-pc-mingw32
>
> ac_add_options --enable-debug
> ac_add_options --enable-dmd
> ac_add_options --enable-profiling

I built today's m-c with |./mach build| and tried to run it with |./mach run -P debug --no-remote 2>&1 | tee|. However, it crashes at startup. I sometimes see this assertion:

> [24012, Main Thread] ###!!! ASSERTION: Should have proccessed it by now: '!aState.mHavePendingPopupgroup', file m:/src/layout/base/nsCSSFrameConstructor.cpp, line 9501
> #01: nsCSSFrameConstructor::ConstructBlock (m:\src\layout\base\nsCSSFrameConstructor.cpp:10565)
> #02: nsCSSFrameConstructor::ConstructNonScrollableBlock (m:\src\layout\base\nsCSSFrameConstructor.cpp:4554)
> #03: nsCSSFrameConstructor::ConstructFrameFromItemInternal (m:\src\layout\base\nsCSSFrameConstructor.cpp:3614)
> #04: nsCSSFrameConstructor::ConstructFramesFromItem (m:\src\layout\base\nsCSSFrameConstructor.cpp:5674)
> #05: nsCSSFrameConstructor::ConstructFramesFromItemList (m:\src\layout\base\nsCSSFrameConstructor.cpp:9493)
> #06: nsCSSFrameConstructor::ConstructAnonymousContentForCanvas (m:\src\layout\base\nsCSSFrameConstructor.cpp:2777)
> #07: nsCSSFrameConstructor::ConstructDocElementFrame (m:\src\layout\base\nsCSSFrameConstructor.cpp:2517)
> #08: nsCSSFrameConstructor::ContentRangeInserted (m:\src\layout\base\nsCSSFrameConstructor.cpp:6960)
> ###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recvtor.cpp:6880)
> #10: mozilla::PresShell::Initialize (m:\src\layout\base\PresShell.cpp:1770)
> #11: nsContentSink::StartLayout (m:\src\dom\base\nsContentSink.cpp:1210)
> #12: nsHtml5TreeOpExecutor::StartLayout (m:\src\parser\html\nsHtml5TreeOpExecutor.cpp:646)
> #13: nsHtml5TreeOperation::Perform (m:\src\parser\html\nsHtml5TreeOperation.cpp:1109)
> #14: nsHtml5TreeOpExecutor::RunFlushLoop (m:\src\parser\html\nsHtml5TreeOpExecutor.cpp:462)
> #15: nsHtml5ExecutorFlusher::Run (m:\src\parser\html\nsHtml5StreamParser.cpp:120)
> #16: nsThread::ProcessNextEvent (m:\src\xpcom\threads\nsThread.cpp:1144)
> #17: NS_ProcessNextEvent (m:\src\xpcom\threads\nsThreadUtils.cpp:468)
> #18: mozilla::ipc::MessagePump::Run (m:\src\ipc\glue\MessagePump.cpp:88)
> #19: MessageLoop::RunHandler (m:\src\ipc\chromium\src\base\message_loop.cc:308)
> #20: MessageLoop::Run (m:\src\ipc\chromium\src\base\message_loop.cc:290)
> #21: nsBaseAppShell::Run (m:\src\widget\nsBaseAppShell.cpp:139)
> #22: nsAppShell::Run (m:\src\widget\windows\nsAppShell.cpp:409)
> #23: nsAppStartup::Run (m:\src\toolkit\components\startup\nsAppStartup.cpp:272)
> #24: XREMain::XRE_mainRun (m:\src\toolkit\xre\nsAppRunner.cpp:4616)
> #25: XREMain::XRE_main (m:\src\toolkit\xre\nsAppRunner.cpp:4754)
> #26: XRE_main (m:\src\toolkit\xre\nsAppRunner.cpp:4839)
> #27: NS_internal_main (m:\src\browser\app\nsBrowserApp.cpp:293)
> #28: wmain (m:\src\toolkit\xre\nsWindowsWMain.cpp:129)
> #29: __scrt_common_main_seh (f:\dd\vctools\crt\vcstartup\src\startup\exe_common.inl:283)
> #30: BaseThreadInitThunk[C:\WINDOWS\System32\KERNEL32.DLL +0x17e94]
> #31: RtlUserThreadStart[C:\WINDOWS\SYSTEM32\ntdll.dll +0x6a251]

Or just:

> ###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv

https://hg.mozilla.org/mozilla-central/rev/61ae84746b34 is the first revision where I can reproduce this bug.

If I run my debug build with --disable-e10s, the crash is gone.
Flags: needinfo?(tom)
Hm. We're going to back that out for now for other reasons, so we'll be able to investigate this more closely before it re-lands. The stack you pasted is not a direct CFG error, of course... For now you can pass --disable-hardening to disable CFG, and that will be the equivalent of backing out the commit.
Flags: needinfo?(tom)
I see this too. Visual studio says the crash is in RtlpHandleInvalidUserCallTarget.
I also see this issue. Added --disable-hardening to .mozconfig and now the issue is gone.
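For anyone else applying the same workaround, the mozconfig change described above would look roughly like the fragment below. This is only a sketch: the flag itself is the one named in comment #1, but where it sits in the file and the neighbouring --enable-debug line are assumptions taken from the reporter's mozconfig, not a required layout.

```shell
# Assumed excerpt of a mozconfig; only the --disable-hardening line is the workaround.
ac_add_options --enable-debug

# Temporary workaround from comment #1: disables hardening, and with it CFG.
# Remove it again once the toolchain problem is sorted out.
ac_add_options --disable-hardening
```

A rebuild with |./mach build| is needed afterwards for the option to take effect.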
(In reply to Tom Ritter [:tjr] (Away until 1/2) from comment #1)
> For now you can pass --disable-hardening to disable CFG and that will be the
> equivalent of backing out the commit.

Thanks, I confirmed that this works around the issue.
Could someone who's hitting this set a breakpoint on ntdll!RtlpHandleInvalidUserCallTarget and report what code rcx points to? (Presumably it's some address in some Firefox DLL...)
I'm hitting this with 32 bit builds. Let me know if I need to do something else or if more info would be helpful.

What I got poking around in windbg:

Stopped at a breakpoint on ntdll!RtlpHandleInvalidUserCallTarget:

registers:
eax=02c778aa ebx=06e9f884 ecx=163bc55a edx=00000000 esi=163bc55a edi=06e9f9a8
eip=77e16004 esp=06e9f7b0 ebp=06e9f860 iopl=0 nv up ei pl nz na pe nc
cs=0023 ss=002b ds=002b es=002b fs=0053 gs=002b efl=00000206

That ecx address is for:
xul!std::basic_streambuf<char,std::char_traits<char> >::_Lock:

Stacktrace:
00 06e9f7ac 77dbf041 ntdll!RtlpHandleInvalidUserCallTarget
01 06e9f834 6c227b5d ntdll!LdrpValidateUserCallTargetBitMapRet+0x44
02 06e9f860 6c24870e MSVCP140!std::basic_ostream<char,std::char_traits<char> >::sentry::sentry+0x33 [f:\dd\vctools\crt\crtw32\stdhpp\ostream @ 120]
03 06e9f8ac 10e38c8c MSVCP140!std::basic_ostream<char,std::char_traits<char> >::operator<<+0x1e [f:\dd\vctools\crt\crtw32\stdhpp\ostream @ 423]
04 06e9fa5c 10e33baa xul!mozilla::layers::AsyncPanZoomController::RequestContentRepaint+0x4fc [c:\Projects\mozilla-central\gfx\layers\apz\src\AsyncPanZoomController.cpp @ 3763]
05 06e9fac0 10e5c225 xul!mozilla::layers::AsyncPanZoomController::RequestContentRepaint+0x13a [c:\Projects\mozilla-central\gfx\layers\apz\src\AsyncPanZoomController.cpp @ 3706]
06 06e9facc 1034eb65 xul!mozilla::detail::RunnableMethodImpl<mozilla::layers::AsyncPanZoomController *,void (mozilla::layers::AsyncPanZoomController::*)(mozilla::layers::RepaintRequest::ScrollOffsetUpdateType) __attribute__((thiscall)),1,mozilla::RunnableKind::Standard,mozilla::layers::RepaintRequest::ScrollOffsetUpdateType>::Run+0x15 [c:\Projects\mozilla-builds\obj-ff-dbg-opt\dist\include\nsThreadUtils.h @ 1161]
07 06e9fb00 1034f4e3 xul!MessageLoop::RunTask+0xa5 [c:\Projects\mozilla-central\ipc\chromium\src\base\message_loop.cc @ 442]
08 06e9fb1c 1034f7ca xul!MessageLoop::DeferOrRunPendingTask+0xb3 [c:\Projects\mozilla-central\ipc\chromium\src\base\message_loop.cc @ 449]
09 06e9fb74 1033bc88 xul!MessageLoop::DoWork+0x15a [c:\Projects\mozilla-central\ipc\chromium\src\base\message_loop.cc @ 522]
0a 06e9fbac 1033c7b8 xul!base::MessagePumpForUI::DoRunLoop+0x78 [c:\Projects\mozilla-central\ipc\chromium\src\base\message_pump_win.cc @ 204]
0b 06e9fbd0 1034e8b1 xul!base::MessagePumpWin::Run+0x48 [c:\Projects\mozilla-central\ipc\chromium\src\base\message_pump_win.h @ 80]
0c 06e9fbf4 1034e7cc xul!MessageLoop::RunInternal+0x71 [c:\Projects\mozilla-central\ipc\chromium\src\base\message_loop.cc @ 314]
0d 06e9fc28 10357193 xul!MessageLoop::RunHandler+0x6c [c:\Projects\mozilla-central\ipc\chromium\src\base\message_loop.cc @ 308]
0e 06e9fd18 1033d0ab xul!base::Thread::ThreadMain+0x2a3 [c:\Projects\mozilla-central\ipc\chromium\src\base\thread.cc @ 192]
0f 06e9fd20 74bf8484 xul!`anonymous namespace'::ThreadFunc+0xb [c:\Projects\mozilla-central\ipc\chromium\src\base\platform_thread_win.cc @ 31]
10 06e9fd34 0f836b94 KERNEL32!BaseThreadInitThunk+0x24
11 06e9fd74 77da3ab8 mozglue!patched_BaseThreadInitThunk+0xb4 [c:\Projects\mozilla-central\mozglue\build\WindowsDllBlocklist.cpp @ 715]
12 06e9fdbc 77da3a88 ntdll!__RtlUserThreadStart+0x2f
13 06e9fdcc 00000000 ntdll!_RtlUserThreadStart+0x1b
Thanks Bryce! This...

> That ecx address is for:
> xul!std::basic_streambuf<char,std::char_traits<char> >::_Lock:

Sounds an awful lot like bug 1485016 comment 8. But our flags for Decimal.cpp look OK this time. I wonder if it's due to your clang version. Could you try using the TW64(clang-cl) toolchain from this job?

https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=585f7d2135eeaddf35d239c909ef43ffa37fa860

(Normally this would just be "run mach bootstrap", but the compiler change has been backed out -- and so has the CFG change, so don't pull m-c before you test this please.)
Extracted the `clang.tar.bz2` from the job over my Users/Bryce/.mozbuild/clang directory and that does appear to have fixed the issue. I don't think I'd bootstrapped in the last couple of days, so I'm going to try another build with the most recent bootstrap clang and see how I do. Will report back in ~hour after the rebuild is done.
Bootstrapping while on central commit a1e34d928bad and doing a full rebuild results in me not encountering the issue either. Based on the above, it sounds like my issue is related to a stale clang toolchain.
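Since the problem came down to a stale toolchain, a quick way to see which bootstrapped clang-cl a local build would pick up is sketched below. The helper name report_clang is hypothetical, and the layout it assumes (a bin/ directory under the toolchain directory, e.g. ~/.mozbuild/clang as in the comment above) is an assumption about a typical bootstrap, not something mach guarantees.

```shell
# Hypothetical helper: print the first line of `clang-cl --version` for the
# toolchain found under the given directory (assumed layout: <dir>/bin/clang-cl).
report_clang() {
    dir="$1"
    # Windows toolchains ship clang-cl.exe; also accept a plain clang-cl.
    for name in clang-cl.exe clang-cl; do
        if [ -x "$dir/bin/$name" ]; then
            "$dir/bin/$name" --version | head -n 1
            return 0
        fi
    done
    # Nothing found: point at the fix that worked in this bug.
    echo "no clang-cl found under $dir; try ./mach bootstrap" >&2
    return 1
}
```

Running |report_clang "$HOME/.mozbuild/clang"| before and after |./mach bootstrap| should show whether the toolchain actually changed; a full rebuild is still needed afterwards.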
Could you take another look at this, :tjr?
Flags: needinfo?(tom)
Should this bug be closed now since the patch that caused it was backed out in bug 1485016?
No, it's one of the things we're going to need to investigate and fix for relanding 1485016
Flags: needinfo?(tom)
(In reply to Tom Ritter [:tjr] from comment #12)
> No, it's one of the things we're going to need to investigate and fix for
> relanding 1485016

Based on Bryce's investigations above it sounds like this was just a matter of compiler version. We had to work through a ton of clang updates to get green on automation, so it makes sense that local developer builds would need to bootstrap and pick up the updated clang or else hit the same problems.

(Maybe we should more loudly PSA this and/or enforce it in configure.)

Bug 1485016 re-landed without issue, so I'm calling this fixed.

Assignee: nobody → tom
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED