Closed
Bug 1424505
Opened 7 years ago
Closed 7 years ago
Crash in I_RpcNegotiateTransferSyntax
Categories
(Core :: IPC: MSCOM, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla61
People
(Reporter: jseward, Assigned: Jamie)
References
Details
(Keywords: crash, regression, Whiteboard: [necko-triaged])
Crash Data
Attachments
(10 files)
4.01 KB,
patch
|
Jamie
:
review+
RyanVM
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
346.60 KB,
application/x-xpinstall
|
Details | |
4.96 KB,
patch
|
jimm
:
review+
|
Details | Diff | Splinter Review |
3.92 KB,
patch
|
bugzilla
:
review+
|
Details | Diff | Splinter Review |
1.85 KB,
text/plain
|
Details | |
3.59 KB,
text/plain
|
Jamie
:
feedback+
|
Details |
2.88 KB,
patch
|
Jamie
:
review+
RyanVM
:
approval-mozilla-beta+
RyanVM
:
approval-mozilla-release+
|
Details | Diff | Splinter Review |
942 bytes,
text/plain
|
Details | |
1.03 KB,
text/plain
|
Details | |
59 bytes,
text/x-review-board-request
|
eeejay
:
review+
|
Details |
This bug was filed from the Socorro interface and is
report bp-3dba594f-d3ce-47b4-a785-6a92f0171209.
=============================================================
This is topcrash #16 in the windows nightly 20171207220423, with
14 crashes from 8 different installations. Unfortunately, there
are no frames in Gecko code AFAICS, so I can't guess what part of
the system this has to do with. dmajor, do you know?
It's thread #72 in the process, but doesn't have a name. #72 is nestled in
between a mixture of "DNS Resolver #<n>", "StreamTrans #<n>", and "SSL Cert
<n>". Maybe it is related to one of those?
Top 10 frames of crashing thread:
0 rpcrt4.dll I_RpcNegotiateTransferSyntax
1 rpcrt4.dll LRPC_BIND_CCALL::BaseBind
2 rpcrt4.dll LRPC_BIND_CCALL::Bind
3 rpcrt4.dll LRPC_BASE_BINDING_HANDLE::DriveStateForwardAsyncCallback
4 rpcrt4.dll LrpcBHDriveStateForwardRoutine
5 rpcrt4.dll CheckLrpcBinding
6 rpcrt4.dll LRPC_BASE_RETRY_THUNK::CompleteOnProvider
7 rpcrt4.dll LRPC_CLIENT_IO::ThreadPoolInvokeCallback
8 ntdll.dll TppSimplepExecuteCallback
9 ntdll.dll TpPostTask
=============================================================
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(dmajor)
> Unfortunately, there
> are no frames in Gecko code AFAICS, so I can't guess what part of
> the system this has to do with. dmajor, do you know?
It looks like a dedicated Windows RPC thread (but I have no idea what it's doing).
These are 100% accessibility active... maybe Aaron can find some clues.
Flags: needinfo?(dmajor) → needinfo?(aklotz)
Updated•7 years ago
|
Priority: -- → P2
Whiteboard: [necko-triaged]
Comment 2•7 years ago
|
||
this issue seems to be regressing in volume in firefox 59. all the crashes are from 64bit builds on windows 7.
Accessibility client facet
1 osk.exe|6.1.7601.18512 99 38.52 %
2 osk.exe|6.1.7600.16385 30 11.67 %
3 nvda.exe|2017.3.0.14335 3 1.17 %
4 osk.exe|6.1.7601.23403 3 1.17 %
status-firefox57:
--- → affected
status-firefox58:
--- → affected
status-firefox59:
--- → affected
tracking-firefox59:
--- → ?
Keywords: regression
Hardware: Unspecified → x86_64
Comment 4•7 years ago
|
||
Too late to fix in 58 though.
Comment 5•7 years ago
|
||
#3 overall browser crash in early Beta 3 returns.
Updated•7 years ago
|
Component: Networking → IPC: MSCOM
Flags: needinfo?(aklotz)
Comment 6•7 years ago
|
||
December 26. A Tuesday. My guess is a Patch Tuesday push caused this spike.
90% Windows 7 SP1, 10% Windows 7 RTM.
I didn't land anything during the break. I don't think that Jamie did either (but ni? to confirm).
I can probably open a discussion on our MSFT list.
Flags: needinfo?(jteh)
Comment 7•7 years ago
|
||
the graph of this signature is just starting on december 26, because we purged all crash data before that date in bug 1427111 (that's the same with all crash signature), so i don't think it's due to any patch day. the signature starts increasing in volume during the 59 cycle on nightly & now that it has gone into beta there as well. this indicates that something on our part made the crash more common.
Flags: needinfo?(aklotz)
Comment 9•7 years ago
|
||
Aaron, did you have any luck contacting MSFT? This is over 10% of our browser crashes on Beta at the moment.
Flags: needinfo?(aklotz)
Comment 10•7 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #9)
> Aaron, did you have any luck contacting MSFT? This is over 10% of our
> browser crashes on Beta at the moment.
Awaiting response.
Flags: needinfo?(aklotz)
Comment 11•7 years ago
|
||
Note: Even though this crash shows up in Socorro as rpcrt4!I_RpcNegotiateTransferSyntax, when you examine the actual dump files, the stack is as follows:
rpcrt4!LRPC_CASSOCIATION::Bind+0x3d2
rpcrt4!LRPC_BIND_CCALL::BaseBind+0x85
rpcrt4!LRPC_BIND_CCALL::Bind+0x1a
rpcrt4!LRPC_BASE_BINDING_HANDLE::DriveStateForwardAsyncCallback+0x4a
rpcrt4!LrpcBHDriveStateForwardRoutine+0x5c
rpcrt4!CLIENT_IO_PROVIDER::Complete+0x1ce
rpcrt4!LRPC_BASE_RETRY_THUNK::CompleteOnProvider+0x22
rpcrt4!LRPC_CLIENT_IO::ThreadPoolInvokeCallback+0x16
ntdll!TppSimplepExecuteCallback+0x91
ntdll!TppWorkerThread+0x893
Communication with MSFT is obviously using the "real" stack.
Updated•7 years ago
|
status-firefox60:
--- → affected
tracking-firefox60:
--- → +
Comment 12•7 years ago
|
||
This is very high volume, enough that we should block the release on it.
It seems to affect users from Italy (based on the swearing in the crash-stats comments)
Is there anything we can do to mitigate the problem until we hear from MSFT?
Flags: needinfo?(aklotz)
Comment 13•7 years ago
|
||
The vast majority of crashes are triggered by osk.exe. Based on some local testing, that program still works even when a11y is force-disabled, so I'd suggest trying it out on the UIA blocklist.
Flags: needinfo?(aklotz)
Attachment #8952478 -
Flags: review?(jteh)
Assignee | ||
Comment 14•7 years ago
|
||
Comment on attachment 8952478 [details] [diff] [review]
Potential work around: block osk.exe from UIA instantiation
Review of attachment 8952478 [details] [diff] [review]:
-----------------------------------------------------------------
::: accessible/windows/msaa/CompatibilityUIA.cpp
@@ +133,5 @@
> return false;
> }
>
> +static const char* gBlockedUiaClients[] = {
> + "osk.exe"
You noted that the on-screen keyboard seems to work even when blocked. Just to clarify, did you check this on both Windows 7 and Windows 10? I'm just concerned they might be different; a lot of Microsoft a11y stuff changed significantly between 7 and 10.
::: accessible/windows/msaa/LazyInstantiator.cpp
@@ +232,5 @@
> nsCOMPtr<nsIFile> clientExe;
> if (!a11y::GetInstantiator(getter_AddRefs(clientExe))) {
> return true;
> }
> +
nit: I'm all for white space fixes in the surrounding area :), but this is the only change in this file, which makes the patch look like it touches this file when it doesn't really.
Attachment #8952478 -
Flags: review?(jteh) → review+
Comment 15•7 years ago
|
||
Yep, works on Win10.
Updated•7 years ago
|
Keywords: leave-open
Comment 16•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/c0260c67218a901452602a1fbf2b6ebfd66abaa3
Bug 1424505: Block osk.exe from UIA instantiation; r=Jamie
Comment 17•7 years ago
|
||
bugherder |
Comment 19•7 years ago
|
||
Comment on attachment 8952478 [details] [diff] [review]
Potential work around: block osk.exe from UIA instantiation
Approval Request Comment
[Feature/Bug causing the regression]: a11y+e10s on Windows
[User impact if declined]: Crashes
[Is this code covered by automated tests?]: No
[Has the fix been verified in Nightly?]: It's in Nightly, but it needs to spend some time "in the wild" for us to know whether or not it is effective.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: Low
[Why is the change risky/not risky?]: Simple patch that activates existing mechanisms.
[String changes made/needed]: None
Flags: needinfo?(aklotz)
Attachment #8952478 -
Flags: approval-mozilla-beta?
Comment 20•7 years ago
|
||
uplift |
Comment on attachment 8952478 [details] [diff] [review]
Potential work around: block osk.exe from UIA instantiation
Approved and uplifted for 59b12. Thanks!
https://hg.mozilla.org/releases/mozilla-beta/rev/abc5f0d35082
Attachment #8952478 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 21•7 years ago
|
||
Noting that since we didn't see a lot of crashes here in nightly, we wanted to rush this fix directly to beta. That way we can see clear result over the weekend.
Updated•7 years ago
|
Comment 22•7 years ago
|
||
it doesn't look like the workaround was effective - the crash signature is still around 10% of browser crashes in early data for 59.0b12 and osk.exe is recorded as a11y clients in most of these reports.
Flags: needinfo?(aklotz)
Updated•7 years ago
|
Assignee: nobody → jmathies
Priority: P2 → P1
Comment 23•7 years ago
|
||
Based on crash stats:
- accessibility on
- UIAUTOMATION api use
- common clients - osk.exe|6.1.7600.16385, osk.exe|6.1.7601.18512
- low install count: 350 installs / 1293 crashes = 3 crashes per install
- 64-bit only
https://crash-stats.mozilla.com/search/?signature=%3DI_RpcNegotiateTransferSyntax&product=Firefox&date=%3E%3D2018-02-01T16%3A50%3A00.000Z&date=%3C2018-02-26T16%3A50%3A30.000Z&_sort=-date&_facets=signature&_facets=version&_facets=accessibility&_facets=accessibility_client&_facets=accessibility_in_proc_client&_facets=cpu_count&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-accessibility_client
Assignee | ||
Comment 24•7 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #23)
> - UIAUTOMATION api use
> - common clients - osk.exe|6.1.7600.16385, osk.exe|6.1.7601.18512
Do these stats only include builds with the osk.exe UIA blocking (59b12 and whatever Nightly)? If so, does this mean that patch isn't working for some reason?
I wonder if UIA just repeatedly tries to go in-proc for some reason. From what we've seen, it resorts to out-proc if it can't go in-proc. However, if it *did* keep trying for some reason, it would exceed kMaxUiaAttemps (currently 5) in UiaHookProc and then it wouldn't get blocked after that. (The reason for kMaxUiaAttempts is performance, so we can't just get rid of it.)
Comment 25•7 years ago
|
||
That's a good point. Perhaps we should only charge an attempt against kMaxUiaAttempts if the result was None as opposed to Some(true) or Some(false).
Flags: needinfo?(aklotz)
Assignee | ||
Comment 26•7 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #25)
> That's a good point. Perhaps we should only charge an attempt against
> kMaxUiaAttempts if the result was None as opposed to Some(true) or
> Some(false).
The only problem is that if UIA just keeps re-trying and getting blocked every time, each re-try will hurt performance. I guess it depends how many times this thing re-tries, and unfortunately, we don't have an answer to that question.
Comment 27•7 years ago
|
||
(In reply to James Teh [:Jamie] from comment #24)
> (In reply to Jim Mathies [:jimm] from comment #23)
> > - UIAUTOMATION api use
> > - common clients - osk.exe|6.1.7600.16385, osk.exe|6.1.7601.18512
>
> Do these stats only include builds with the osk.exe UIA blocking (59b12 and
> whatever Nightly)? If so, does this mean that patch isn't working for some
> reason?
My stats were across all builds and channels. Here's just 59b12, which still has the crash in it for both osk versions -
https://crash-stats.mozilla.com/search/?signature=%3DI_RpcNegotiateTransferSyntax&version=59.0b12&product=Firefox&date=%3E%3D2018-02-01T10%3A50%3A00.000Z&date=%3C2018-02-26T10%3A50%3A30.000Z&_sort=-date&_facets=signature&_facets=version&_facets=accessibility&_facets=accessibility_client&_facets=accessibility_in_proc_client&_facets=cpu_count&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-accessibility_client
So basically the patch that landed here doesn't appear to be blocking this client.
Updated•7 years ago
|
Comment 28•7 years ago
|
||
STR:
1) Install the free desktop version of Sticky Password
2) Launch Sticky Password
3) launch a instance of Firefox
4) In the Sticky Password main app window, select menu on the upper right, then select settings
5) In the web accounts, Find Firefox and click Install
6) After the Sticky Addon is installed and running, go to a search engine and submit a search
result: this crash
https://crash-stats.mozilla.com/report/index/97eb55b8-f09a-4e5e-9f7d-7f4770180228
https://crash-stats.mozilla.com/report/index/71b1fb38-ce3b-46e9-aca9-15d460180228
https://crash-stats.mozilla.com/report/index/cbcb9fdf-cc6d-421b-8e78-7221a0180228
Comment 29•7 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #28)
> STR:
>
> 1) Install the free desktop version of Sticky Password
> 2) Launch Sticky Password
> 3) launch a instance of Firefox
> 4) In the Sticky Password main app window, select menu on the upper right,
> then select settings
> 5) In the web accounts, Find Firefox and click Install
> 6) After the Sticky Addon is installed and running, go to a search engine
> and submit a search
>
> result: this crash
>
> https://crash-stats.mozilla.com/report/index/97eb55b8-f09a-4e5e-9f7d-
> 7f4770180228
> https://crash-stats.mozilla.com/report/index/71b1fb38-ce3b-46e9-aca9-
> 15d460180228
> https://crash-stats.mozilla.com/report/index/cbcb9fdf-cc6d-421b-8e78-
> 7221a0180228
#5 - correction, choose the 'web browsers' menu item, not 'web accounts'.
Comment 30•7 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #11)
> Note: Even though this crash shows up in Socorro as
> rpcrt4!I_RpcNegotiateTransferSyntax, when you examine the actual dump files,
> the stack is as follows:
>
> rpcrt4!LRPC_CASSOCIATION::Bind+0x3d2
> rpcrt4!LRPC_BIND_CCALL::BaseBind+0x85
> rpcrt4!LRPC_BIND_CCALL::Bind+0x1a
> rpcrt4!LRPC_BASE_BINDING_HANDLE::DriveStateForwardAsyncCallback+0x4a
> rpcrt4!LrpcBHDriveStateForwardRoutine+0x5c
> rpcrt4!CLIENT_IO_PROVIDER::Complete+0x1ce
> rpcrt4!LRPC_BASE_RETRY_THUNK::CompleteOnProvider+0x22
> rpcrt4!LRPC_CLIENT_IO::ThreadPoolInvokeCallback+0x16
> ntdll!TppSimplepExecuteCallback+0x91
> ntdll!TppWorkerThread+0x893
>
> Communication with MSFT is obviously using the "real" stack.
Caught this finally after multiple tries -
rpcrt4.dll!LRPC_CASSOCIATION::Bind(struct _RPC_CLIENT_INTERFACE *,int,class RPCP_ALPC_MESSAGE_ATTRIBUTES *,class LRPC_BASE_CCALL *,unsigned long,int,unsigned __int64,int,class LRPC_BINDING * *,struct tagLRPC_BIND_DATA *) Unknown
rpcrt4.dll!LRPC_BIND_CCALL::BaseBind(int,unsigned long,int,unsigned __int64,int) Unknown
rpcrt4.dll!LRPC_BIND_CCALL::Bind(int,unsigned long,int) Unknown
rpcrt4.dll!LRPC_BASE_BINDING_HANDLE::DriveStateForwardAsyncCallback(int,unsigned long,class LRPC_BIND_CCALL *,long,int) Unknown
rpcrt4.dll!LrpcBHDriveStateForwardRoutine(class LRPC_CLIENT_IO *) Unknown
rpcrt4.dll!CLIENT_IO_PROVIDER::Complete(class MUTEX *,class LRPC_CLIENT_IO *,int,int,long) Unknown
rpcrt4.dll!LRPC_BASE_RETRY_THUNK::CompleteOnProvider(class LRPC_CLIENT_IO *,int) Unknown
rpcrt4.dll!LRPC_CLIENT_IO::ThreadPoolInvokeCallback(struct _TP_CALLBACK_INSTANCE *,void *) Unknown
ntdll.dll!TppSimplepExecuteCallback() Unknown
ntdll.dll!TppWorkerThread() Unknown
00000000000f00a3() Unknown
0000000000000008() Unknown
0000000100010007() Unknown
00000000003bdc70() Unknown
00000000127da460() Unknown
Comment 31•7 years ago
|
||
Not having much luck reproducing in a local release build.
Comment 32•7 years ago
|
||
STR updates:
pre-setup:
1) Make sure the pre-allocated process manager is enabled via prefs (dom.ipc.processPrelaunch.enabled)
2) Make sure you have dom.ipc.processCount set to a value > 1
3) Disable session restore, and set your browser up to open a new tab page on launch
Setting a new profile as your default for this may help.
str:
1) Install the free desktop version of Sticky Password
2) Launch the desktop client for Sticky Password
3) In the Sticky Password main app window, select menu on the upper right, then select settings
5) Choose the 'web browsers' menu item and click the Install button for Firefox
This step will either install an extension directly or navigate Firefox to the web site to install the Sticky Passwords web extension. Go ahead and install this.
6) After the Sticky Addon is installed, restart Firefox
7) Launch Firefox (new tab page should be open)
8) Attach a debugger
9) Navigate the open new tab page to any remote web site (triggering child process launch)
*crash*
This works in local builds as well.
While debugging, I managed to catch this next exception right before the crash we end up reporting to crash reporter. This exception was caught and dealt with internally.
Aaron, any idea what's going on here?
ntdll.dll!RtlPcToFileHeader() Unknown
KernelBase.dll!BasepGetModuleHandleExW() Unknown
KernelBase.dll!GetModuleHandleExW() Unknown
UIAutomationCore.dll!GetProviderInfo(struct IUnknown *) Unknown
UIAutomationCore.dll!Error::GeneralProviderError(struct IUnknown *,unsigned short const *,long,struct HWND__ *,int,int) Unknown
UIAutomationCore.dll!AccNavUtils::HitTestOrFocus(struct IAccessible *,int,struct tagPOINT,struct HWND__ *,struct AccPair *) Unknown
UIAutomationCore.dll!MsaaProxy::GetFocus(struct IRawElementProviderFragment * *) Unknown
UIAutomationCore.dll!UiaNode::ProviderDrillForPointOrFocus(int,int,int,double,double,struct IRawElementProviderSimple * *,class IUiaNode * *,struct UiaCacheRequest *,struct UiaCacheResponse *,int *) Unknown
UIAutomationCore.dll!UiaNode::CrossProcess_DrillForPointOrFocus(int,int,int,double,double,struct UiaCacheRequest *,struct UiaCacheResponse *) Unknown
UIAutomationCore.dll!RemoteUiaNodeStub::Incoming_DrillForPointOrFocus(class UiaNode *,class ITargetContextInvoker *,class IServerConnection *,class MessageParser &,class MessageBuilder &) Unknown
UIAutomationCore.dll!RemoteUiaNodeStub::OnMessage(struct IUnknown *,class ITargetContextInvoker *,class IServerConnection *,enum Protocol_MethodId,class MessageParser &,class MessageBuilder &) Unknown
UIAutomationCore.dll!InvokeOnCorrectContext_Callback(void *) Unknown
UIAutomationCore.dll!ProcessIncomingRequest(class MessageParser &,class MessageBuilder &,class IServerConnection *) Unknown
UIAutomationCore.dll!HookBasedServerConnectionManager::HookCallback(void *,unsigned long,void * *,unsigned long *,void * *) Unknown
UIAutomationCore.dll!HookUtil<&HookBasedClientConnection::HookCallback(void *,unsigned long,void * *,unsigned long *,void * *),0>::CallOut(void *,unsigned long,void * *,unsigned long *,void * *) Unknown
UIAutomationCore.dll!HandleHookMessage(struct tagCWPSTRUCT *,unsigned long,void (*)(void *,unsigned long,void * *,unsigned long *,void * *),void (*)(int,void *)) Unknown
UIAutomationCore.dll!HookUtil<&HookBasedClientConnection::HookCallback(void *,unsigned long,void * *,unsigned long *,void * *),0>::CallWndProc(int,unsigned __int64,__int64) Unknown
user32.dll!DispatchHookW() Unknown
user32.dll!fnHkINLPCWPSTRUCTW() Unknown
user32.dll!__fnDWORD() Unknown
ntdll.dll!KiUserCallbackDispatcherContinue() Unknown
user32.dll!ZwUserPeekMessage() Unknown
user32.dll!PeekMessageW() Unknown
msctf.dll!CThreadInputMgr::PeekMessageW(struct tagMSG *,struct HWND__ *,unsigned int,unsigned int,unsigned int,int *) Unknown
> xul.dll!mozilla::widget::WinUtils::PeekMessageW(tagMSG * aMsg, HWND__ * aWnd, unsigned int aFirstMessage, unsigned int aLastMessage, unsigned int aOption) Line 699 C++
xul.dll!nsAppShell::ProcessNextNativeEvent(bool mayWait) Line 504 C++
xul.dll!nsBaseAppShell::DoProcessNextNativeEvent(bool mayWait) Line 140 C++
xul.dll!nsBaseAppShell::OnProcessNextEvent(nsIThreadInternal * thr, bool mayWait) Line 290 C++
xul.dll!nsThread::ProcessNextEvent(bool aMayWait, bool * aResult) Line 955 C++
xul.dll!NS_ProcessNextEvent(nsIThread * aThread, bool aMayWait) Line 517 C++
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate) Line 97 C++
xul.dll!MessageLoop::RunHandler() Line 320 C++
xul.dll!MessageLoop::Run() Line 300 C++
xul.dll!nsBaseAppShell::Run() Line 159 C++
xul.dll!nsAppShell::Run() Line 403 C++
xul.dll!nsAppStartup::Run() Line 289 C++
xul.dll!XREMain::XRE_mainRun() Line 4686 C++
xul.dll!XREMain::XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4814 C++
xul.dll!XRE_main(int argc, char * * argv, const mozilla::BootstrapConfig & aConfig) Line 4906 C++
firefox.exe!do_main(int argc, char * * argv, char * * envp) Line 232 C++
firefox.exe!NS_internal_main(int argc, char * * argv, char * * envp) Line 306 C++
firefox.exe!wmain(int argc, wchar_t * * argv) Line 132 C++
firefox.exe!__scrt_common_main_seh() Line 253 C++
kernel32.dll!BaseThreadInitThunk() Unknown
ntdll.dll!RtlUserThreadStart() Unknown
Flags: needinfo?(aklotz)
Comment 33•7 years ago
|
||
My best guess is that UIAutomationCore.dll!AccNavUtils::HitTestOrFocus probably calls AccessibleWrap::accHitTest and/or AccessibleWrap::get_accFocus, and then something bad happened.
I wonder what triggers that UIAutomationCore.dll!Error::GeneralProviderError code path. Was that because of an error code or does UIA wrap its calls into us with a SEH handler or something...
ni? Jamie since he has some UIA contacts. Maybe they can give us a hint?
Flags: needinfo?(aklotz) → needinfo?(jteh)
Comment 34•7 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #33)
> I wonder what triggers that UIAutomationCore.dll!Error::GeneralProviderError
> code path. Was that because of an error code or does UIA wrap its calls into
> us with a SEH handler or something...
Looks like it's due to FAILED(HRESULT).
From what I see it looks like GeneralProviderError just logs information about the provider to ETW.
Comment 35•7 years ago
|
||
More detail on how to reproduce.
Setup:
1) Make sure the pre-allocated process manager is enabled via prefs (dom.ipc.processPrelaunch.enabled)
2) Make sure you have dom.ipc.processCount set to a value 1
3) Disable session restore, and set your browser up to open a new tab page on launch
4) Install the free desktop version of Sticky Password and set up a local password database.
5) In the Sticky Password main app window, select menu on the upper right, then select settings
6) Choose the 'web browsers' menu item and click the Install button for Firefox
This step will either install an extension directly or navigate Firefox to the web site to install the Sticky Passwords web extension. Go ahead and install this. Note if you have multiple browsers installed, appears the app tries to pick the default install for launch.
7) After the Sticky Addon is installed, close Firefox
STR:
1) Launch Firefox -
check: new tab page should be open
check: Sticky Password extension toolbar button should indicate an 'inactive' session via a grey icon.
2) Attach a debugger to all Firefox processes (helps if you use one that auto attaches to new child processes).
3) Click on the Sticky Password button in the Firefox toolbar
check: desktop client should launch
check: toolbar icon should change to a 'active' (colorful) icons
4) Navigate the open new tab page to any remote web site (triggering child process launch)
*crash*
Assignee | ||
Comment 36•7 years ago
|
||
I sent an email to my UIA contact at Microsoft, but haven't seen a response yet.
I tried reproducing this myself in case there's some additional info I can offer. Unfortunately, the Sticky Password app is horribly inaccessible and I can't even get to the menu to install the browser extension. Can someone perhaps provide the direct URL for the extension?
Flags: needinfo?(jteh)
Updated•7 years ago
|
Crash Signature: [@ I_RpcNegotiateTransferSyntax] → [@ I_RpcNegotiateTransferSyntax]
[@ rpcrt4.dll@0x499ca]
[@ rpcrt4.dll@0x49518]
Comment 37•7 years ago
|
||
Comment 38•7 years ago
|
||
Comment 39•7 years ago
|
||
(In reply to [:philipp] from comment #38)
> Created attachment 8956535 [details]
> sticky passwords webextension
This doesn't install correctly, fyi.
Comment 40•7 years ago
|
||
So far I've managed to reproduce this about three times. Currently trying again today.
Aaron, would you mind taking a crack at reproducing as well please. This might block 59.
Flags: needinfo?(aklotz)
Comment 41•7 years ago
|
||
Yeah, I'll dust off a Win7 box and see what I can find...
Flags: needinfo?(aklotz)
Comment 42•7 years ago
|
||
I have everything set up via Comment 35 but I have yet to reproduce the crash...
Comment 43•7 years ago
|
||
Also note as an addendum to Comment 35: Sticky Password does not play well with multiple instances of Firefox (any channel) or multiple profiles.
I could not get it to recognize the extension installation without removing all other instances of Firefox and Nightly except for the one I was working with, and I had to set a default profile.
Comment 44•7 years ago
|
||
Wild thought: I wonder if this is similar to bug 1294903, where file picker stuff was running in the COM MTA and the MTA was unexpectedly shutdown while code was still running. That bug was also specific to Windows 7.
In the case of the parent process, the MTA is not currently guaranteed to remain running for the entire lifetime of the COM process. In particular, I see a lot of CoInitializeEx/CoUninitialize pairs on background threads that will come and go. If the timing is just right, the MTA could be completely torn down.
This could explain the difficulty of reproducing this problem: The various players need to race such that something is running in the MTA concurrently with the last reference to the MTA being CoUninitialize'd.
The easiest way to land an experimental patch would be to instantiate an instance of mozilla::mscom::EnsureMTA sometime after XPCOM and the observer service are fired up (EnsureMTA depends on those components). That would guarantee that at least one thread is maintaining the MTA for the entire time that the process is running.
Comment 45•7 years ago
|
||
I think we should try this. Feel free to try locally first.
Flags: needinfo?(jmathies)
Attachment #8956659 -
Flags: review?(jmathies)
Updated•7 years ago
|
Attachment #8956659 -
Flags: review?(jmathies) → review+
Comment 46•7 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #45)
> Created attachment 8956659 [details] [diff] [review]
> Keep MTA continuously running in the background for entire life of parent
> process
>
> I think we should try this. Feel free to try locally first.
Running a local build with this patch. Unfortunately I've not had much luck reproducing the crash in a local build but FWIW so far no crash. Will keep testing.
Flags: needinfo?(jmathies)
Comment 47•7 years ago
|
||
Hey Aaron, lets get this into nightly, I should be able to confirm pretty easily then.
Flags: needinfo?(aklotz)
Comment 48•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/35129c889938eeefd8ccc81cf78a223e045e7f00
Bug 1424505: Ensure at least one consistent reference to the COM MTA at all times on Windows 7; r=jimm
Comment 49•7 years ago
|
||
The above push is inside a NIGHTLY_BUILD include guard until we've had a chance to assess its impact on Nightly.
Flags: needinfo?(aklotz)
Comment 50•7 years ago
|
||
bugherder |
Comment 51•7 years ago
|
||
(In reply to Cosmin Sabou [:cosmin_s] from comment #50)
> https://hg.mozilla.org/mozilla-central/rev/35129c889938
This didn't fix the issue, still crashing this morning.
Comment 52•7 years ago
|
||
Lets try this. In testing I found that the pre-allocated process manager [1] seemed to exacerbate this. That class has one unique feature, it recycles [2] content processes. My theory is that this is probably bad practice when we have accessibility clients reaching into these processes via COM doing weird unexpected things. Disabling the ppm fixed the crash locally as well, which seems to confirm this.
Note I think this could create negative side effects due to higher content process turnover rates.
[1] https://searchfox.org/mozilla-central/search?q=PreallocatedProcessManager&path=
[2] https://searchfox.org/mozilla-central/rev/588d8120aa11738657da93e09a03378bcd1ba8ec/dom/ipc/ContentParent.cpp#1809
Updated•7 years ago
|
Attachment #8957390 -
Flags: review?(aklotz)
Comment 53•7 years ago
|
||
Assignee | ||
Comment 54•7 years ago
|
||
I eventually managed to get this installed, but can't repro the crash. :(
(In reply to Jim Mathies [:jimm] from comment #52)
> it recycles
> [2] content processes. My theory is that this is probably bad practice when
> we have accessibility clients reaching into these processes via COM doing
> weird unexpected things.
The thing is that those clients should only be able to make accessibility calls, and if we had a bug in our a11y code somewhere, I'd expect that to crash the content process, not the parent.
Are you able to tell me the last COM call you see with COM logging enabled just before the crash occurs?
The other thing to try might be to log every single call that hits the Handler. Unfortunately, that would be manual work - we don't have a framework for that - but it might be enlightening. It's possible a call hits the Handler but never ends up hitting the content process, at which point we'd know what call was causing the problem and be able to narrow it down at least.
Comment 55•7 years ago
|
||
This tab process had been recycled about three times before it crashed loading a bugzilla page.
Comment 56•7 years ago
|
||
Doesn't look like much in that log. Jamie, any thoughts?
Flags: needinfo?(jteh)
Updated•7 years ago
|
Attachment #8957412 -
Attachment mime type: text/x-log → text/plain
Assignee | ||
Comment 57•7 years ago
|
||
Nope. :( Just a few QueryInterfaces on two distinct objects, probably just for retrieval of the top level document. (We do quite a few QueryInterfaces internally to cache common interfaces.)
Jim, can you set a parent process breakpoint for AccessibleHandler!mozilla::a11y::AccessibleHandler::ResolveIA2, perhaps just outputting the first few frames of the stack rather than breaking? Something like:
bp AccessibleHandler!mozilla::a11y::AccessibleHandler::ResolveIA2 "kp 5; g"
Most handler calls end up calling that function, so it's a good way to catch most of them. The last one before the crash may (or may not) be interesting.
Flags: needinfo?(jteh)
Comment 58•7 years ago
|
||
Comment on attachment 8957390 [details] [diff] [review]
disable process recycling v.1
Review of attachment 8957390 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/ipc/ContentParent.cpp
@@ +2916,5 @@
> }
> #ifdef ACCESSIBILITY
> else if (aData && !strcmp(aTopic, "a11y-init-or-shutdown")) {
> if (*aData == '1') {
> + // Shut down the preallocated process manager to avoid recycled
Given that this bug is specific to Win7, should we gate this on !IsWin8OrNewer()?
Attachment #8957390 -
Flags: review?(aklotz) → review+
Comment 59•7 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #58)
> Comment on attachment 8957390 [details] [diff] [review]
> disable process recycling v.1
>
> Review of attachment 8957390 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: dom/ipc/ContentParent.cpp
> @@ +2916,5 @@
> > }
> > #ifdef ACCESSIBILITY
> > else if (aData && !strcmp(aTopic, "a11y-init-or-shutdown")) {
> > if (*aData == '1') {
> > + // Shut down the preallocated process manager to avoid recycled
>
> Given that this bug is specific to Win7, should we gate this on
> !IsWin8OrNewer()?
Considered this. My guess is this is showing up on 8 and up just under different signatures so I didn't gate.
Comment 60•7 years ago
|
||
That's a fair point.
Updated•7 years ago
|
Attachment #8957639 -
Attachment description: accessible handler tracve 1 → accessible handler trace 1
Updated•7 years ago
|
Attachment #8957639 -
Attachment is patch: false
Comment 62•7 years ago
|
||
Spoke with Aaron, the plan is to land the work around for confirmation over the weekend. If we come up with something else early next week we can back it out.
Comment 63•7 years ago
|
||
Pushed by jmathies@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/cb808626c8df
Avoid recycling content process when accessibility is active. r=aklotz
Comment 64•7 years ago
|
||
We can leave this as something to try and fix in a dot release, but at this point, I'm not going to block the 59.0 release.
Our overall crash rate for the 59 release is low. The crash rate for this signature is much higher on 59 than on 58, but for Win 7 only, and happens repeatedly for particular users, rather than affecting all Win7 users. So let's aim for a 59.0.x dot release to give us time for testing.
Aaron, is there any workaround for users we can add in SUMO or can you suggest wording for a "known issue" release note?
Flags: needinfo?(aklotz)
Comment 65•7 years ago
|
||
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #64)
> We can leave this as something to try and fix in a dot release, but at this
> point, I'm not going to block the 59.0 release.
>
> Our overall crash rate for the 59 release is low. The crash rate for this
> signature is much higher on 59 than on 58, but for Win 7 only, and happens
> repeatedly for particular users, rather than affecting all Win7 users. So
> let's aim for a 59.0.x dot release to give us time for testing.
SGTM!
>
> Aaron, is there any workaround for users we can add in SUMO or can you
> suggest wording for a "known issue" release note?
Yes. If the users are not using a legit a11y client (and most of our crash reports indicate this to be the case), they can disable accessibility in their preferences.
Flags: needinfo?(aklotz)
Comment 66•7 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/d028bee2fa0edb25eb14789709f2f7e3f95a5859
Bug 1424505: Backed out changeset 35129c889938 since it was speculative and didn't work. r=backout
Comment 67•7 years ago
|
||
Adding to release notes for 59 as a known issue:
On-screen keyboard on some Windows 7 devices may cause a crash. Users can work around this issue by turning off accessibility features in Preferences, in the Privacy and Security Permissions section
Joni, I know it's last minute, but can you add a page in SUMO for us to link to? That way, I can write a shorter release note and link to the page for the details.
relnote-firefox:
--- → 59+
Flags: needinfo?(jsavage)
Comment 68•7 years ago
|
||
Actually, never mind, I'll link to this existing page: https://support.mozilla.org/en-US/kb/accessibility-services#w_how-do-i-disable-firefox-accessibility-service
Flags: needinfo?(jsavage)
Comment 69•7 years ago
|
||
Comment 70•7 years ago
|
||
bugherder |
Comment 71•7 years ago
|
||
(In reply to Noemi Erli[:noemi_erli] from comment #70)
> https://hg.mozilla.org/mozilla-central/rev/cb808626c8df
Rats, still hit this today.
The stacks in the report below are pretty good, the main thread is clearly digging for focus related information. The COM thread crashes in the background.
https://crash-stats.mozilla.com/report/index/99ab4ca0-cbcf-4aad-8549-c3f190180311#allthreads
Comment 72•7 years ago
|
||
One of the things I've noticed about the sticky passwords app, they mess with focus by placing non-firefox process UI on top of the firefox window in cases where they generate drop down menus and such. This tends to trigger top level window focus changes.
Assignee | ||
Comment 73•7 years ago
|
||
Took a look at the trace and the crash dump. Both are definitely related to calls to the content process to answer a focus query. However, the trace seems to suggest it was an accParent call after an accFocus call, where the dump suggests the crash occurred on the accFocus call.
Jim, in your STR (comment 35), you noted:
> 4) Navigate the open new tab page to any remote web site (triggering child process launch)
However, from what I can see, the new tab page is itself loaded in a content process. In fact, the same content process gets used for me (though I guess it probably doesn't with the process recycling disabled).
Anyway, it would seem that this is somehow related to a focus query hitting a remote document when it is either in the process of dying or in the process of being created. The former seems more likely, though I don't really understand why it would happen. Even if the document was shut down just before or while we were marshaling it (which is possible because shut down happens in the main thread while marshaling occurs in an MTA thread), we're still holding a reference, so it shouldn't be destroyed... and any calls to fill the handler cache should return CO_E_OBJNOTCONNECTED.
Assignee | ||
Comment 74•7 years ago
|
||
Comment on attachment 8957639 [details]
accessible handler trace 1
Thanks. See comment 73.
Attachment #8957639 -
Flags: feedback?(jteh) → feedback+
Comment 75•7 years ago
|
||
(In reply to James Teh [:Jamie] from comment #73)
> Took a look at the trace and the crash dump. Both are definitely related to
> calls to the content process to answer a focus query. However, the trace
> seems to suggest it was an accParent call after an accFocus call, where the
> dump suggests the crash occurred on the accFocus call.
>
> Jim, in your STR (comment 35), you noted:
>
> > 4) Navigate the open new tab page to any remote web site (triggering child process launch)
>
> However, from what I can see, the new tab page is itself loaded in a content
> process. In fact, the same content process gets used for me (though I guess
> it probably doesn't with the process recycling disabled).
>
Doesn't make any difference I guess, process recycling doesn't seem to be involved.
Shorter STR that I use now:
1) have the sp apps installed but not logged in
2) open firefox, open a new tab page, close other tabs
3) click the sb toolbar icon, this should launch the desktop app
4) log into the desktop app
5) Navigate the new tab page to a bugzilla bug page
*crash*
Updated•7 years ago
|
Crash Signature: [@ I_RpcNegotiateTransferSyntax]
[@ rpcrt4.dll@0x499ca]
[@ rpcrt4.dll@0x49518] → [@ I_RpcNegotiateTransferSyntax]
[@ rpcrt4.dll@0x499ca]
[@ rpcrt4.dll@0x49518]
[@ rpcrt4.dll@0x45e6a]
Comment 76•7 years ago
|
||
different approach:
this signature is clearly regressing in volume during the 59 cycle and is pretty common on nightly too - unfortunately we've purged all crash reports submitted before december 26, so it's not possible to get a regression range by looking at crash stats.
however this bug was filed as part of the nightly crash triage roster for build 20171207220423 and i'd assume the signature was caught just shortly after something regressed it (because it's occurring rather frequently).
in the week before build 20171207220423, there were just four patches landing in Core::Disability Access AP
https://bugzilla.mozilla.org/buglist.cgi?f1=cf_status_firefox58&list_id=14059141&o1=notequals&resolution=FIXED&chfieldto=2017-12-07&chfield=target_milestone&query_format=advanced&chfieldfrom=2017-12-01&chfieldvalue=mozilla59&bug_status=RESOLVED&bug_status=VERIFIED&v1=fixed&component=Disability%20Access%20APIs&product=Core
because comment #73 mentioned a focus query, perhaps bug 1421144 is related?
Assignee | ||
Comment 78•7 years ago
|
||
It's related in the sense that before bug 1421144 landed, it wasn't possible for an accessibility client to query the focus on demand. If a client missed the event (or if one wasn't fired as is the case in some obscure circumstances), tough luck; it couldn't retrieve the focus. So, before that point, the call to accFocus would simply return nothing. Now, if a remote document is focused, it correctly returns the focused remote accessible.
I'd already started to investigate this avenue, which is why I was speculating about a focus query hitting a dying document in comment 73:
> Anyway, it would seem that this is somehow related to a focus query hitting a remote document when it is either in the process of dying or in the process of being created. The former seems more likely, though I don't really understand why it would happen. Even if the document was shut down just before or while we were marshaling it (which is possible because shut down happens in the main thread while marshaling occurs in an MTA thread), we're still holding a reference, so it shouldn't be destroyed... and any calls to fill the handler cache should return CO_E_OBJNOTCONNECTED.
Flags: needinfo?(jteh)
Comment 79•7 years ago
|
||
I discussed this with Jamie and Jim on IRC. In the interest of hopefully stopping the bleeding on 59, I created a Try push of bug 1421144 backed out on top of mozilla-release so we can see if that helps. If it does, that would at least give us a path forward for 59.0.2 while we work to determine the root cause for 60+. For anybody that can consistently reproduce these crashes, feedback on the below builds would be greatly appreciated!
Win32:
https://queue.taskcluster.net/v1/task/ZfFIt3G8Rfq7AZaSMDysYQ/runs/0/artifacts/public/build/target.zip
Win64:
https://queue.taskcluster.net/v1/task/Nz7NUddlSfqnAguwd-jh1A/runs/0/artifacts/public/build/target.zip
Comment 80•7 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #79)
> I discussed this with Jamie and Jim on IRC. In the interest of hopefully
> stopping the bleeding on 59, I created a Try push of bug 1421144 backed out
> on top of mozilla-release so we can see if that helps. If it does, that
> would at least give us a path forward for 59.0.2 while we work to determine
> the root cause for 60+. For anybody that can consistently reproduce these
> crashes, feedback on the below builds would be greatly appreciated!
>
> Win32:
> https://queue.taskcluster.net/v1/task/ZfFIt3G8Rfq7AZaSMDysYQ/runs/0/
> artifacts/public/build/target.zip
>
> Win64:
> https://queue.taskcluster.net/v1/task/Nz7NUddlSfqnAguwd-jh1A/runs/0/
> artifacts/public/build/target.zip
Surfing in this build now, no crashes yet. Will report back in a bit.
Comment 81•7 years ago
|
||
> > Win64:
> > https://queue.taskcluster.net/v1/task/Nz7NUddlSfqnAguwd-jh1A/runs/0/
> > artifacts/public/build/target.zip
>
> Surfing in this build now, no crashes yet. Will report back in a bit.
Newer hopefully more reliable STR:
Setup:
1) Install Sticky Passwords desktop app and the Firefox WebExtension (see comment 32)
2) Launch Firefox and open this bug page
3) Log into Sticky Passwords in Firefox
4) Click on the toolbar icon and select bookmarks
5) Add a Sticky Passwords bookmark to this bugzilla page
6) Lock the Sticky Passwords desktop app by right clicking on the desktop tray icon and select Lock
7) Configure Firefox to open a single new tab page, disable session restore and all pinned tabs
Repro:
1) Open Firefox
2) Open a simple remote web page (www.techmeme.com)
3) Open a new tab page tab
4) Close the remote tab
6) click on the SB toolbar button, select bookmarks, and navigate to the bugzilla bookmark
result: *crash*
I've reproduced using this in nightly and tested with the try builds here. So far I haven't reproduced in the try build.
Comment 82•7 years ago
|
||
a user at https://www.reddit.com/r/firefox/comments/85zfc8/5901_64bit_win_7_random_crashes_since_last_week/ also reported back that the instability stopped when using the try build from comment 79
Comment 83•7 years ago
|
||
Let's move forward with the revert option for 59 then. Jamie, can you please give this your official blessing per our IRC conversation yesterday?
Attachment #8961084 -
Flags: review?(jteh)
Attachment #8961084 -
Flags: approval-mozilla-release?
Comment 84•7 years ago
|
||
Just realized I've been surfing with this build now for about eight hours. No issues.
Comment 85•7 years ago
|
||
(In reply to Noemi Erli[:noemi_erli] from comment #70)
> https://hg.mozilla.org/mozilla-central/rev/cb808626c8df
Just a note to self, I need to back this change out.
Flags: needinfo?(jmathies)
Assignee | ||
Comment 86•7 years ago
|
||
Comment on attachment 8961084 [details] [diff] [review]
revert bug 1421144
Consider it blessed. I realise no one is suggesting otherwise, but just to be super clear, this must only be considered a short-term bandaid. Reverting this long-term is not an acceptable option and I'm probably going to need some help fixing this properly, since I still can't reproduce it myself.
Attachment #8961084 -
Flags: review?(jteh)
Attachment #8961084 -
Flags: review+
Assignee | ||
Updated•7 years ago
|
Attachment #8961084 -
Flags: review+
Comment 87•7 years ago
|
||
Comment on attachment 8961084 [details] [diff] [review]
revert bug 1421144
Yeah, I'm only treating this as a stop-gap for 59 for now. We can decide what to do for 60+ later in the cycle depending on how the investigation goes.
Attachment #8961084 -
Flags: approval-mozilla-release? → approval-mozilla-release+
This should be tracked as a blocking bug.
Comment 89•7 years ago
|
||
backout bugherder uplift |
Comment 90•7 years ago
|
||
I'm going to track this as blocking for 60 too, to make sure we address this in some way before we ship.
status-firefox61:
--- → affected
Comment 92•7 years ago
|
||
So far, it appears that the backout had the intended affect on the crash rate for 59.0.2.
Comment 93•7 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #85)
> (In reply to Noemi Erli[:noemi_erli] from comment #70)
> > https://hg.mozilla.org/mozilla-central/rev/cb808626c8df
>
> Just a note to self, I need to back this change out.
Try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1735c496b8abd67254a6d125596f1bb2bed57588
Comment 94•7 years ago
|
||
Should we land the stop-gap on 60 as well if we don't have a resolution by, say, a week from now?
Comment 95•7 years ago
|
||
Hello, new non-tech person here. I followed the link from this page:
https://www.mozilla.org/en-US/firefox/59.0.2/releasenotes/
...regarding issues with the extension "Sticky Password". Hoping to help, here's a link from the forum on the StickyPassword.com site that may be related:
https://www.stickypassword.com/forum/viewtopic.php?f=133&t=91447
...especially/possibly the final comment (as of this writing).
That's it -- thanks to all for your significant efforts.
Comment 96•7 years ago
|
||
backout uplift |
Per IRC discussion with Jim, we're going to backout rev cb808626c8df from all affected channels (Beta/Trunk) and also bug 1421144 from Beta to work around the crashes before 60 goes to release given where we are in the cycle.
https://hg.mozilla.org/mozilla-central/rev/d5bf9378b0802f7b10c551594bb6bd516f5040c3
https://hg.mozilla.org/releases/mozilla-beta/rev/8e7c044b2e71af8b7dba06ced0298fe50446c575
https://hg.mozilla.org/releases/mozilla-beta/rev/7d6400c2038cf35259e42d30541654a94e7a7b9a
tracking-firefox61:
--- → blocking
Updated•7 years ago
|
Attachment #8961084 -
Flags: approval-mozilla-beta+
Assignee | ||
Comment 97•7 years ago
|
||
Given that this crash only occurs on Windows 7, would it be possible to get bug 1421144 reinstated for 60 but disabled for Windows 7? (I would provide a trivial patch on top of bug 1421144.) That way, users on later versions of Windows would get the fix for bug 1421144, but we wouldn't crash on Windows 7. Especially given that 60 is an ESR, it'd be nice if users not running Windows 7 didn't have to wait for another release to get this fix.
Jim, what do you think?
Comment 98•7 years ago
|
||
(In reply to James Teh [:Jamie] from comment #97)
> Given that this crash only occurs on Windows 7, would it be possible to get
> bug 1421144 reinstated for 60 but disabled for Windows 7? (I would provide a
> trivial patch on top of bug 1421144.) That way, users on later versions of
> Windows would get the fix for bug 1421144, but we wouldn't crash on Windows
> 7. Especially given that 60 is an ESR, it'd be nice if users not running
> Windows 7 didn't have to wait for another release to get this fix.
>
> Jim, what do you think?
Sure assuming that fix we keep backing out was valuable vs. trivial.
What we really need do here is figure out a way to debug this. My attempts to reproduce are mixed but if I try long enough I can. Problem is there's not much you can tell from the actual crash.
I wonder about these users who appear to be on Windows 7, with active on-screen keyboards. What type of user has this setup?
Assignee | ||
Comment 99•7 years ago
|
||
(In reply to Jim Mathies [:jimm] from comment #98)
> > Given that this crash only occurs on Windows 7, would it be possible to get
> > bug 1421144 reinstated for 60 but disabled for Windows 7?
> Sure assuming that fix we keep backing out was valuable vs. trivial.
It fixes a bug that users won't encounter that often (users don't tend to use the System menu that much), but it's rather annoying/confusing when they do hit it. Having said that, I don't know of a user aside from me who has reported this, so that might suggest it's not worth the effort for 60. I still think it needs to be fixed going forward, though.
The fact that this is so difficult to reproduce is super frustrating. I'm going to put some thought into how we can possibly reproduce this without Sticky Password, I think.
Assignee | ||
Comment 100•7 years ago
|
||
I'm trying to find a way to repro this without Sticky Password, since that's out of our control. Bug 1421144 introduced code which makes accFocus work for remote documents, and it is this which seems to expose these crashes on Windows 7. So, I just hacked together this test code which runs accFocus on the root accessible every 50 ms.
Unfortunately, I still can't reproduce this. In addition, my vm was single core, so I tried changing it to have 4 cores, but still couldn't reproduce it.
Compile with:
cl accFocusLoop.cpp /EHsc oleacc.lib
Then, run:
accFocusLoop hwnd
where hwnd is the HWND of the Firefox main window (in decimal).
Jim, I'm curious as to whether you can repro with this test code?
Comment 101•7 years ago
|
||
We need to make a decision this week about what to do for 61+ here. We've gone with the backout for the previous 2 cycles and can't keep doing this every cycle.
Flags: needinfo?(jteh)
Assignee | ||
Comment 102•7 years ago
|
||
Similar to the accFocus test, but this one uses the UI Automation client API to retrieve the focus in a loop. We believe this is triggered by a UIA client, so I thought this worth a shot. Compile with:
cl uiaFocusLoop.cpp /EHsc
Unfortunately, no joy: I still can't reproduce it. :(
I did notice that UIA doesn't behave correctly at all for accFocus in remote documents in Windows 7. It just returns the top level accessible, as if the IAccessible::accFocus call failed. It works fine for documents hosted in the parent process, though. This is certainly suspicious. Perhaps this has something to do with RPC_E_CANTCALLOUT_ININPUTSYNCCALL, since we don't enable that hack for UIA. It works correctly in Windows 10, though.
I also tried having Firefox return RPC_E_DISCONNECTED whenever it had to do a remote accFocus, since that error code could show up if a remote document dies while we're handling accFocus and I thought perhaps UIA couldn't handle it properly. Nope; still no crash.
Flags: needinfo?(jteh)
Assignee | ||
Updated•7 years ago
|
Assignee: jmathies → jteh
Comment hidden (mozreview-request) |
Comment 104•7 years ago
|
||
mozreview-review |
Comment on attachment 8972245 [details]
Bug 1424505: Don't try to retrieve the accessible focus from remote documents on Windows 7.
https://reviewboard.mozilla.org/r/240916/#review246740
Looks OK. Sorry if this impacts Windows 7 users.
Attachment #8972245 -
Flags: review?(eitan) → review+
Comment 105•7 years ago
|
||
Pushed by jteh@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/68bfd841265f
Don't try to retrieve the accessible focus from remote documents on Windows 7. r=eeejay
Comment 106•7 years ago
|
||
bugherder |
Comment 107•7 years ago
|
||
Looks like Jamie's patch is holding up well for 61/62. This bug has become a bit of a mess - should we consider closing this one out and moving the follow-up work to a new one?
Assignee | ||
Comment 109•7 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #107)
> Looks like Jamie's patch is holding up well for 61/62. This bug has become a
> bit of a mess - should we consider closing this one out and moving the
> follow-up work to a new one?
I guess that makes sense. It'd essentially be the same bug, just with less urgency. :)
Filed bug 1461237.
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(jteh)
Resolution: --- → FIXED
Updated•7 years ago
|
Target Milestone: --- → mozilla61
Updated•7 years ago
|
Keywords: leave-open
Comment 110•7 years ago
|
||
I think we can call this verified despite the low ongoing crash rate given how it's still a couple orders of magnitude lower than it was previously.
Status: RESOLVED → VERIFIED
Flags: needinfo?(jmathies)
You need to log in
before you can comment on or make changes to this bug.
Description
•