Closed Bug 1090921 Opened 10 years ago Closed 8 years ago

Intermittent test_speech_queue.html | This test left crash dumps behind, but we weren't expecting it to! | Main app process exited normally | application crashed [@ MessageLoop::DeletePendingTasks()] | Assertion failure: work_queue_.empty(), at /builds/

Categories

(Core :: IPC, defect, P3)

x86
macOS
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
e10s - ---
firefox36 --- wontfix
firefox37 --- affected
firefox38 --- affected
firefox39 --- affected
firefox-esr31 --- unaffected
b2g-v2.1 --- unaffected
b2g-v2.2 --- affected
b2g-v2.5 --- affected
b2g-master --- affected

People

(Reporter: cbook, Assigned: billm)

References

()

Details

(Keywords: assertion, crash, intermittent-failure, Whiteboard: [e10s])

Attachments

(4 files)

Rev5 MacOSX Mountain Lion 10.8 mozilla-central debug test mochitest-3

https://treeherder.mozilla.org/ui/logviewer.html#?job_id=550346&repo=mozilla-central

14:14:59 INFO - Assertion failure: work_queue_.empty(), at /builds/slave/m-cen-osx64-d-0000000000000000/build/ipc/chromium/src/base/message_loop.cc:410 

14:15:18 INFO - 255 INFO TEST-UNEXPECTED-ERROR | /tests/dom/media/webspeech/synth/test/test_speech_queue.html | This test left crash dumps behind, but we weren't expecting it to! 

14:49:11 WARNING - PROCESS-CRASH | Main app process exited normally | application crashed [@ MessageLoop::DeletePendingTasks()]
14:49:11 INFO - Crash dump filename: /var/folders/qg/zn406_js1y7777hyb4dqlbth00000w/T/tmpelH1T_.mozrunner/minidumps/B78298A1-C95A-42D0-B2AD-C661A843FF04.dmp
14:49:11 INFO - Operating system: Mac OS X
14:49:11 INFO - 10.8.0 12A269
14:49:11 INFO - CPU: amd64
14:49:11 INFO - family 6 model 42 stepping 7
14:49:11 INFO - 8 CPUs
14:49:11 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
14:49:11 INFO - Crash address: 0x0
14:49:11 INFO - Thread 0 (crashed)
14:49:11 INFO - 0 XUL!MessageLoop::DeletePendingTasks() [message_loop.cc:53d84829b2b8 : 410 + 0x0]
14:49:11 INFO - rbx = 0x00007fff7b240c68 r12 = 0xffffffffffffffff
14:49:11 INFO - r13 = 0x0000000000000002 r14 = 0x00007fff5fbfd9a8
14:49:11 INFO - r15 = 0x0000000000000000 rip = 0x000000010087dc3f
14:49:11 INFO - rsp = 0x00007fff5fbfd870 rbp = 0x00007fff5fbfd8d0
14:49:11 INFO - Found by: given as instruction pointer in context
14:49:11 INFO - 1 XUL!MessageLoop::~MessageLoop() [message_loop.cc:53d84829b2b8 : 178 + 0x7]
14:49:11 INFO - rbx = 0x0000000000000001 r12 = 0xffffffffffffffff
14:49:11 INFO - r13 = 0x0000000000000002 r14 = 0x00007fff5fbfd9a8
14:49:11 INFO - r15 = 0x0000000000000000 rip = 0x000000010087d878
14:49:11 INFO - rsp = 0x00007fff5fbfd8e0 rbp = 0x00007fff5fbfd970
14:49:11 INFO - Found by: call frame info
14:49:11 INFO - 2 XUL!XRE_InitChildProcess [nsEmbedFunctions.cpp:53d84829b2b8 : 557 + 0x4]
14:49:11 INFO - rbx = 0x00007fff5fbfd900 r12 = 0x000000010692f000
14:49:11 INFO - r13 = 0x0000000000000002 r14 = 0x0000000000000000
14:49:11 INFO - r15 = 0x00007fff5fbfeed0 rip = 0x0000000102dcc546
14:49:11 INFO - rsp = 0x00007fff5fbfd980 rbp = 0x00007fff5fbfee90
14:49:11 INFO - Found by: call frame info
14:49:11 INFO - 3 plugin-container!main [plugin-container.cpp:53d84829b2b8 : 158 + 0x9]
14:49:11 INFO - rbx = 0x000000000000000a r12 = 0x0000000000000000
14:49:11 INFO - r13 = 0x0000000000000000 r14 = 0x00007fff5fbfeed0
14:49:11 INFO - r15 = 0x0000000000000000 rip = 0x0000000100000e0b
14:49:11 INFO - rsp = 0x00007fff5fbfeea0 rbp = 0x00007fff5fbfeeb0
14:49:11 INFO - Found by: call frame info
14:49:11 INFO - 4 plugin-container!start + 0x33
This is also affecting the Marionette unit tests; however, these crashes are currently not being reported. When we landed bug 1038868 we started reporting these crashes as failures, but it was backed out because of the resulting spike in reported crashes.
Kairo: do you know if we also see these crashes on real-world websites?
Severity: normal → critical
Flags: needinfo?(kairo)
(In reply to Carsten Book [:Tomcat] from comment #18)
> Kairo: do you know if we also see these crashes on real-world websites?

https://crash-stats.mozilla.com/search/?signature=~MessageLoop%3A%3ADeletePendingTasks has one crash in the last week with this signature.
Flags: needinfo?(kairo)
It's quite unclear whether that crash signature on 34 has anything at all to do with this; the stack backtraces are rather different, and since this was an assertion, there's no reason to be sure it would crash in an opt build.
Bug 1038868 apparently makes this into a really bad fail (by exposing that it's failing).  That one happens in an entirely different Mn test, not related to speech or media.  These seem to be shutdown-timing problems with IPC shutdown specific to Mac.  CC-ing IPC, shutdown and Mac folk, since this doesn't seem to be media-related at all (or at most indirectly).
Component: WebRTC → IPC
I've also hit this in Nightly by opening a non-e10s window, then closing the e10s window. Same stack.
Note: while the test failures are Mac, this failure was on Linux Inbound
I can also reproduce this by opening Nightly (now e10s by default), visiting a site, and then closing Nightly.
This happens almost every time I shut down the browser in my debug builds...
Whiteboard: [e10s]
Requesting e10s tracking on this: per comment 31 and various comments in #e10s, this is a bad shutdown crash that gets in the way of running and testing debug builds.
tracking-e10s: --- → ?
Assignee: nobody → wmccloskey
Is there any progress on this? It has been blocking us from landing bug 1038868 for over two months now; that bug will improve our ability to check and report crashes in Firefox OS on devices.
Flags: needinfo?(wmccloskey)
Dave, if you have an STR that I can reproduce, it would help a lot here. I don't see the crashes that other people have seen.

That said, I probably won't get to this for a few more weeks.
Flags: needinfo?(wmccloskey)
It happened reliably on shutdown with builds from a month ago (see e.g. comment 31 and comment 41), so that's a place to start looking at least.
It (still) happens all the time for me when shutting down my inbound debug builds after doing webrtc testing. The webrtc tests likely have nothing to do with it, and I think I crash similarly without them. I'm sure there's a timing component to this, which is why it's intermittent.
For me, this happens all the time, while trying to run marionette-unit tests on Firefox debug builds on MacOSX10.9.5:
MINIDUMP_STACKWALK=/Users/mwargers/mercurial/tools/breakpad/osx/minidump_stackwalk python runtests.py --binary=/Users/mwargers/mozilla-central/obj-x86_64-apple-darwin13.4.0/dist/NightlyDebug.app/Contents/MacOS/firefox-bin --type=browser --symbols-path=/Users/mwargers/mozilla-central/obj-x86_64-apple-darwin13.4.0/dist/firefox-38.0a1.en-US.mac64.crashreporter-symbols.zip  tests/unit-tests.ini

With a release build, I don't get this crash.
Attached file: stack.txt
Ok, I crash every time, while trying to run layout/base/tests/marionette/test_selectioncarets.py
I attached the stack that I get while running that. Unfortunately, it doesn't seem like a useful stack to me.
I'm surprised these tests seem to work just fine on treeherder, while they are crashing for me, locally.
(In reply to Martijn Wargers [:mwargers] (QA) from comment #125)
> I'm surprised these tests seem to work just fine on treeherder, while they
> are crashing for me, locally.

This is *very* timing-sensitive, so it's not that surprising.  I get failures just trying to exit local debug builds.

Any ETA? I presume it's still blocking an important landing.
Flags: needinfo?(wmccloskey)
I'm hoping to look into some IPC bugs in a week or two. I don't know if that will fix this crash or not though.
Flags: needinfo?(wmccloskey)
Attached patch: add debugging output
I'm pretty sure this is caused by MessageChannel enqueuing its OnMaybeDequeueOne task too late, but I want to make sure of that before I go on a crusade.
Attachment #8577004 - Flags: review?(dvander)
Attachment #8577004 - Flags: review?(dvander) → review+
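The suspected failure mode can be illustrated with a toy model. This is a minimal sketch, not the Gecko code: `ToyLoop`, `PostTask`, `RunUntilIdle`, `QueueIsEmpty`, and `ShutdownIsClean` are hypothetical names invented for illustration, and the real `MessageLoop` teardown is more involved. The sketch only shows how a task enqueued after the loop has already gone idle would violate a teardown invariant like `work_queue_.empty()`.

```cpp
#include <cassert>
#include <functional>
#include <queue>
#include <string>
#include <utility>

// Toy stand-in for a message loop (hypothetical; not the real MessageLoop).
class ToyLoop {
 private:
  struct Task {
    std::string name;           // Used only for diagnostics.
    std::function<void()> fn;   // The work to run.
  };
  std::queue<Task> queue_;

 public:
  // Enqueue a named task at the back of the queue.
  void PostTask(std::string name, std::function<void()> fn) {
    queue_.push(Task{std::move(name), std::move(fn)});
  }

  // Run queued tasks in order until nothing is left.
  void RunUntilIdle() {
    while (!queue_.empty()) {
      Task t = std::move(queue_.front());
      queue_.pop();
      t.fn();
    }
  }

  // The invariant that teardown expects to hold.
  bool QueueIsEmpty() const { return queue_.empty(); }
};

// Simulate shutdown: drain the loop, optionally let a "channel" enqueue one
// more task afterwards (the late enqueue), then check the teardown invariant.
bool ShutdownIsClean(bool channel_posts_late) {
  ToyLoop loop;
  loop.PostTask("normal work", [] {});
  loop.RunUntilIdle();
  if (channel_posts_late) {
    loop.PostTask("late dequeue task", [] {});
  }
  return loop.QueueIsEmpty();
}
```

In this model, `ShutdownIsClean(false)` holds and `ShutdownIsClean(true)` does not; whether the late task is enqueued before or after the final drain is purely a matter of ordering, which would fit the timing-sensitive behaviour reported in this bug.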
Unexpected task! Unknown:Unknown:-1

That is disappointing. Maybe things have been shut down to the point where the data is no longer available.
(In reply to Martijn Wargers [:mwargers] (QA) from comment #124)
> Ok, I crash every time, while trying to run
> layout/base/tests/marionette/test_selectioncarets.py
> I attached the stack that I get while running that. Unfortunately, it
> doesn't seem like a useful stack to me.

I don't seem to get this one anymore, running ./mach marionette-test layout/base/tests/marionette/test_selectioncarets.py with a recently updated debug build.
QA Whiteboard: QAExclude
I can see the same crash when closing firefox after opening ./toolkit/devtools/server/tests/browser/animation.html
Attached patch: task_tracking.diff
For the crash in comment #180, the attached patch is needed to output the BirthPlace.

The birth place was http://hg.mozilla.org/mozilla-central/file/617dbce26726/ipc/glue/MessageChannel.cpp#l737
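The idea of tracking where a task was posted can be sketched as follows. This is an illustrative sketch under stated assumptions, not the contents of task_tracking.diff: `Birth`, `TOY_FROM_HERE`, `TrackedTask`, and `DescribeLeftovers` are hypothetical names, and the output format is only loosely modelled on the "Unexpected task!" line quoted earlier in this bug.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Where a task was posted (hypothetical type, invented for this sketch).
struct Birth {
  const char* file;  // nullptr when the origin was not recorded.
  int line;          // -1 when the origin was not recorded.
};

// Capture the current source location at the call site (hypothetical macro).
#define TOY_FROM_HERE (Birth{__FILE__, __LINE__})

// A task that remembers its birth place.
struct TrackedTask {
  Birth birth;
};

// Build one diagnostic line per task that is still queued at teardown.
std::string DescribeLeftovers(const std::vector<TrackedTask>& leftovers) {
  std::string out;
  for (const TrackedTask& t : leftovers) {
    out += "Unexpected task! ";
    out += t.birth.file ? t.birth.file : "Unknown";
    out += ":" + std::to_string(t.birth.line) + "\n";
  }
  return out;
}
```

A caller would create tasks with `TrackedTask{TOY_FROM_HERE}` at the point of posting. A task whose origin was never recorded prints as `Unknown:-1`, which is one way a report could end up without usable location data.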
I can reproduce this on OS X debug builds fairly easily by loading content in one tab, opening a new tab, and closing the old tab before the new one finishes loading.
See Also: → 1236350
Bulk assigning P3 to all open intermittent bugs without a priority set in Firefox components per bug 1298978.
Priority: -- → P3
This seems to have stopped. We did a lot of work on this area of the code, so I'm not surprised.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Removing leave-open keyword from resolved bugs, per :sylvestre.
Keywords: leave-open