Closed Bug 1347451 Opened 7 years ago Closed 7 years ago

Intermittent test_navigation.py TestBackForwardNavigation.test_frameset_after_navigating_in_frame | application crashed [@ mozilla::a11y::ProxyTextChangeEvent(mozilla::a11y::ProxyAccessible *,nsString const &,int,unsigned int,bool,bool)]

Categories

(Core :: Disability Access APIs, defect, P5)

53 Branch
Unspecified
Windows
defect

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 + fix-optional
firefox54 + fix-optional
firefox55 - fix-optional
firefox56 - fix-optional
firefox57 - fix-optional

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(4 keywords, Whiteboard: [stockwell unknown])

Attachments

(1 file, 1 obsolete file)

Here the five first frames of the crash:

16:29:57     INFO -  Crash reason:  EXCEPTION_BREAKPOINT
16:29:57     INFO -  Crash address: 0x7fcea3a7ba2
16:29:57     INFO -  Process uptime: 243 seconds
16:29:57     INFO -  Thread 0 (crashed)
16:29:57     INFO -   0  xul.dll!mozilla::a11y::ProxyTextChangeEvent(mozilla::a11y::ProxyAccessible *,nsString const &,int,unsigned int,bool,bool) [Platform.cpp:ab9bdbdc7780 : 127 + 0x1b]
16:29:57     INFO -      rax = 0x0000000000000000   rdx = 0x000000e5c3c5c958
16:29:57     INFO -      rcx = 0x00000000ffffffff   rbx = 0x000000000000007f
16:29:57     INFO -      rsi = 0x0000000000000007   rdi = 0x0000000000000000
16:29:57     INFO -      rbp = 0x0000000000000000   rsp = 0x000000e5c3c5c940
16:29:57     INFO -       r8 = 0x000000e5c3c5c950    r9 = 0x000000e5c3c5c948
16:29:57     INFO -      r10 = 0x0000000000000000   r11 = 0x000000e5c3c57e90
16:29:57     INFO -      r12 = 0x000000e5c3c5cac8   r13 = 0x000000e5c3c5cac0
16:29:57     INFO -      r14 = 0x000000e5c3c5ca60   r15 = 0x000000e5c3c5cb28
16:29:57     INFO -      rip = 0x000007fcea3a7ba2
16:29:57     INFO -      Found by: given as instruction pointer in context
16:29:57     INFO -   1  xul.dll!mozilla::a11y::DocAccessibleParent::RecvTextChangeEvent(unsigned __int64 const &,nsString const &,int const &,unsigned int const &,bool const &,bool const &) [DocAccessibleParent.cpp:ab9bdbdc7780 : 322 + 0x41]
16:29:57     INFO -      rbx = 0x000000000000007f   rbp = 0x0000000000000000
16:29:57     INFO -      rsp = 0x000000e5c3c5c980   r12 = 0x000000e5c3c5cac8
16:29:57     INFO -      r13 = 0x000000e5c3c5cac0   r14 = 0x000000e5c3c5ca60
16:29:57     INFO -      r15 = 0x000000e5c3c5cb28   rip = 0x000007fcea410992
16:29:57     INFO -      Found by: call frame info
16:29:57     INFO -   2  xul.dll!mozilla::a11y::PDocAccessibleParent::OnMessageReceived(IPC::Message const &) [PDocAccessibleParent.cpp:ab9bdbdc7780 : 416 + 0x3c]
16:29:57     INFO -      rbx = 0x000000000000007f   rbp = 0x0000000000000000
16:29:57     INFO -      rsp = 0x000000e5c3c5ca00   r12 = 0x000000e5c3c5cac8
16:29:57     INFO -      r13 = 0x000000e5c3c5cac0   r14 = 0x000000e5c3c5ca60
16:29:57     INFO -      r15 = 0x000000e5c3c5cb28   rip = 0x000007fce764c0d2
16:29:57     INFO -      Found by: call frame info
16:29:57     INFO -   3  xul.dll!mozilla::dom::PContentParent::OnMessageReceived(IPC::Message const &) [PContentParent.cpp:ab9bdbdc7780 : 2919 + 0x9]
16:29:57     INFO -      rbx = 0x000000000000007f   rbp = 0x0000000000000000
16:29:57     INFO -      rsp = 0x000000e5c3c5cb20   r12 = 0x000000e5c3c5cac8
16:29:57     INFO -      r13 = 0x000000e5c3c5cac0   r14 = 0x000000e5c3c5ca60
16:29:57     INFO -      r15 = 0x000000e5c3c5cb28   rip = 0x000007fce75bf8ab
16:29:57     INFO -      Found by: call frame info
16:29:57     INFO -   4  xul.dll!mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const &) [MessageChannel.cpp:ab9bdbdc7780 : 1872 + 0xa]
16:29:57     INFO -      rbx = 0x000000000000007f   rbp = 0x0000000000000000
16:29:57     INFO -      rsp = 0x000000e5c3c5edd0   r12 = 0x000000e5c3c5cac8
16:29:57     INFO -      r13 = 0x000000e5c3c5cac0   r14 = 0x000000e5c3c5ca60
16:29:57     INFO -      r15 = 0x000000e5c3c5cb28   rip = 0x000007fce7221cfd
16:29:57     INFO -      Found by: call frame info
16:29:57     INFO -   5  xul.dll!mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message &&) [MessageChannel.cpp:ab9bdbdc7780 : 1807 + 0x5]
16:29:57     INFO -      rbx = 0x000000000000007f   rbp = 0x0000000000000000
16:29:57     INFO -      rsp = 0x000000e5c3c5ee30   r12 = 0x000000e5c3c5cac8
16:29:57     INFO -      r13 = 0x000000e5c3c5cac0   r14 = 0x000000e5c3c5ca60
16:29:57     INFO -      r15 = 0x000000e5c3c5cb28   rip = 0x000007fce722277f
16:29:57     INFO -      Found by: call frame info

https://dxr.mozilla.org/mozilla-central/rev/6d38ad302429c98115c354d643e81987ecec5d3c/accessible/windows/msaa/Platform.cpp#127

> MOZ_ASSERT(wrapper);

The gecko.log shows the following:

[Parent 3664] ###!!! ASSERTION: no proxy for event!: 'Error', file c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/accessible/ipc/DocAccessibleParent.cpp, line 216
[Parent 3664] ###!!! ASSERTION: no proxy for event!: 'Error', file c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/accessible/ipc/DocAccessibleParent.cpp, line 216
IPDL protocol error: Handler returned error code!

###!!! [Parent][DispatchAsyncMessage] Error: PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler returned false (indicating failure)

IPDL protocol error: Handler returned error code!

###!!! [Parent][DispatchAsyncMessage] Error: PBrowser::Msg_PDocAccessibleConstructor Processing error: message was deserialized, but the handler returned false (indicating failure)

Assertion failure: wrapper, at c:/builds/moz2_slave/autoland-w64-d-000000000000000/build/src/accessible/windows/msaa/Platform.cpp:127
#01: DumpFrameArray[C:\slave\test\build\application\firefox\xul.dll +0x38f0992]
#02: soundtouch::SoundTouch::operator=[C:\slave\test\build\application\firefox\xul.dll +0xb2c0d2]
#03: soundtouch::SoundTouch::operator=[C:\slave\test\build\application\firefox\xul.dll +0xa9f8ab]
#04: soundtouch::SoundTouch::operator=[C:\slave\test\build\application\firefox\xul.dll +0x701cfd]
#05: soundtouch::SoundTouch::operator=[C:\slave\test\build\application\firefox\xul.dll +0x70277f]
#06: soundtouch::SoundTouch::operator=[C:\slave\test\build\application\firefox\xul.dll +0x70e93a]
#07: soundtouch::SoundTouch::operator=[C:\slave\test\build\application\firefox\xul.dll +0x70d9e9]
#08: ???[C:\slave\test\build\application\firefox\xul.dll +0x20f7bf]
#09: ???[C:\slave\test\build\application\firefox\xul.dll +0x20e88f]
#10: soundtouch::SoundTouch::operator=[C:\slave\test\build\application\firefox\xul.dll +0x70d3f7]
#11: ???[C:\slave\test\build\application\firefox\xul.dll +0x6dfd82]
#12: ???[C:\slave\test\build\application\firefox\xul.dll +0x6dfa46]
#13: mozilla_dump_image[C:\slave\test\build\application\firefox\xul.dll +0x298182b]
#14: mozilla_dump_image[C:\slave\test\build\application\firefox\xul.dll +0x29f06c4]
#15: workerlz4_maxCompressedSize[C:\slave\test\build\application\firefox\xul.dll +0x3b4d45a]
#16: workerlz4_maxCompressedSize[C:\slave\test\build\application\firefox\xul.dll +0x3bf35c4]
#17: workerlz4_maxCompressedSize[C:\slave\test\build\application\firefox\xul.dll +0x3bf00fe]
#18: workerlz4_maxCompressedSize[C:\slave\test\build\application\firefox\xul.dll +0x3bef934]
#19: ???[C:\slave\test\build\application\firefox\firefox.exe +0x19b6]
#20: ???[C:\slave\test\build\application\firefox\firefox.exe +0x143e]
#21: ???[C:\slave\test\build\application\firefox\firefox.exe +0x1d7b]
#22: TargetNtUnmapViewOfSection[C:\slave\test\build\application\firefox\firefox.exe +0x40c2d]
#23: BaseThreadInitThunk[C:\windows\system32\KERNEL32.DLL +0x167e]
#24: RtlUserThreadStart[C:\windows\SYSTEM32\ntdll.dll +0x1c3f1]
Severity: normal → critical
Keywords: crash
OS: Unspecified → Windows
[Tracking Requested - why for this release]:
The Marionette unit test which is triggering this crash has now been uplifted to all supported branches. And as it looks like we have no crash for mozilla-release and mozilla-esr52 so far. So this is a regression started in Firefox 53 which is currently on Beta.

https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1347451&endday=2017-03-17&startday=2017-03-06&tree=all

Marionette unit test:
https://dxr.mozilla.org/mozilla-central/rev/ff04d410e74b69acfab17ef7e73e7397602d5a68/testing/marionette/harness/marionette_harness/tests/unit/test_navigation.py#108

Trevor, do you have an idea what could have been landed which caused such a crash?
Tracking 53/54/55+ for this crash regression.
no, we just need to debug it.
Flags: needinfo?(tbsaunde+mozbugs)
Alexander, can you help find an owner to take this on since it's a regression in 53 ? Thanks!
Flags: needinfo?(surkov.alexander)
(In reply to Trevor Saunders (:tbsaunde) from comment #4)
> no, we just need to debug it.

Trevor, do you have cycles to take a look?

btw, are you sure that bug 1348351 wasn't marked dupe by mistake? bug 1348351 is about AddChildDoc crash, while this one is about text events stuff.
Flags: needinfo?(surkov.alexander)
the first reported instance was March 14th, doing some retriggers to determine if there is an obvious root cause.  As for the frequency, this is increasing, at 39 times in the last week.
Whiteboard: [stockwell needswork]
I am also trying to come up with a browser-chrome test in a11y that captures the crash atm.
David, can you help here? I think it is too late to fix this for beta 53. We could still take a patch for 54.
Flags: needinfo?(dbolter)
NI Yura because he owns test strategy and what we should do with this intermittent. Yura was is the plan and status (re: comment 10). Please coordinate with Alex on this bug's ownership (in prep for your PTO).

NI Trevor for pending question on comment 8.
Flags: needinfo?(yzenevich)
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(dbolter)
(In reply to David Bolter [:davidb] from comment #14)
> NI Yura because he owns test strategy and what we should do with this
> intermittent. Yura was is the plan and status (re: comment 10). Please
> coordinate with Alex on this bug's ownership (in prep for your PTO).

its really not complicated, we just need to debug it and fix it.  It'd be nice to have a test that doesn't involve marionette for it though.  It'd be nice if marionette stopped using a11y, but that's a different issue.

> NI Trevor for pending question on comment 8.

I've been working on it slowely.
Flags: needinfo?(tbsaunde+mozbugs)
Trevor as long as we do not have a debugger argument for Marionette you could do the following workaround:

* Run marionette with --jsdebugger
* Do not click the button in the modal dialog which opens in the browser before the test starts
* Retrieve the pid of Firefox, and attach gdb/lldb manually to the running process
* Click the button in the dialog to let the test start.

Does it help you to get the crash in your debugger? I hope, but if not the debugger support is not that far away. I will work on it once I was able to land the patch on bug 958475.
(In reply to Henrik Skupin (:whimboo) from comment #16)
> Trevor as long as we do not have a debugger argument for Marionette you
> could do the following workaround:

I was under the impression this was windows specific so was looking at it with printf on try.
It is. But I thought to remember that you wanted to use a debugger. So whatever works best for you. If you haven't gotten the details yet about the assertion stack, it is also still something on my list to do for Marionette tests.
(In reply to Henrik Skupin (:whimboo) from comment #18)
> It is. But I thought to remember that you wanted to use a debugger. So
> whatever works best for you. If you haven't gotten the details yet about the
> assertion stack, it is also still something on my list to do for Marionette
> tests.

Henrik, is there a way to figure out what line the test might have triggered the crash?
Flags: needinfo?(yzenevich)
Flags: needinfo?(yzenevich)
Attached patch 1347451 b-c a11y test (obsolete) — Splinter Review
This is a test that AFAIK does the same thing as test_frameset_after_navigating_in_frame test in test_navigation.py. However, this b-c test doesn't seem to crash the browser intermitently, unlike the marionette test:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8b3175aacd13525d4a21ecc945582011fed2cc47
Flags: needinfo?(yzenevich)
Fixed ESlint
Attachment #8855606 - Attachment is obsolete: true
(In reply to Yura Zenevich [:yzen] from comment #19)
> Henrik, is there a way to figure out what line the test might have triggered
> the crash?

sure. Treeherder also has a gecko.log for those jobs. When you search for the test name you will see all the tracing output from Marionette. This is the data which gets send to the browser, and which is replied to the Python client. The last message before the assertion should give an indication what it is. 

Sorry that I cannot point out the real line of code right now, but I'm on a limited mobile connection and those log files are huge! If you could paste the content around the assertion here, I can certainly help.
Version: unspecified → 53 Branch
Whiteboard: [stockwell needswork] → [stockwell unknown]
Since this is a test crash, marking this fix-optional for 54.
We could still fix this for 55, but let's figure that's optional since Yura is on leave, and aim for 56.
I don't feel the need to track this for 55. If a fix is ready and deemed low risk, please nominate for uplift to Beta55.
We are only 1 week from RC. Mark 56 fix-optional but still happy to have the fix in 56.
https://wiki.mozilla.org/Bugmasters#Intermittent_Test_Failure_Cleanup
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → INCOMPLETE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: