Closed Bug 1388764 Opened 7 years ago Closed 4 years ago

Intermittent PROCESS-CRASH | <test> | application crashed [@ mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop()]

Categories

(Core :: IPC, defect, P3)

RESOLVED FIXED
85 Branch
Tracking Status
firefox-esr78 --- disabled
firefox56 --- wontfix
firefox57 --- disabled
firefox58 --- disabled
firefox83 --- disabled
firefox84 --- disabled
firefox85 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: gbrown)

Details

(Keywords: crash, intermittent-failure, Whiteboard: [stockwell fixed:other])

Attachments

(2 files)

See Also: → 1397739
This seems to have spiked pretty badly recently. Can you take a look, Bill?
Flags: needinfo?(wmccloskey)
See Also: → 1398070
Whiteboard: [stockwell needswork]
this seems to fail often on reftest[-no-accel] chunk 3 on linux debug; possibly we can narrow down the failure to a specific test case? Is this known already?
Flags: needinfo?(gbrown)
I don't think that is known already. It might be helpful. Note that bug 1398070 looks related, so :mrbkap might have additional info/thoughts.
Flags: needinfo?(gbrown) → needinfo?(mrbkap)
I see this on Ru3 all the time, so I ran Ru in 32 chunks instead of 8, saw it in Ru12, and retriggered 10 times:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c55ce497f35cb425f0a5ce3003581f2be08cb190

For Ru12 we are in layout/reftests/bugs/reftest.list, specifically in the range:
layout/reftests/bugs/411367-1.html ... layout/reftests/bugs/518172-1a.html

I then narrowed it down to a range of 487 tests in bugs/reftest.list and pushed with 24 chunks. I retriggered those chunks 20 times each:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=99ce41a541c4b5344b5eeee160506ea16318c6e6
^ and no failures.

I also pushed those same 487 tests in 2 chunks with 40 retriggers each:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4e4b672476e874d3e28aaf9cf1d7b459bf0987ee

Given this, I wonder if our failures are related to loading the full set of tests while only running a few. On this theory, I have pushed with 128 chunks (yeah, insane):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad1968069713b79961b46f39554eeb3253feae4e

I will selectively retrigger jobs in that push to get full coverage of Ru12 and see if that helps.
I forgot to mention the 2 chunks with 40 retriggers resulted in no failures
On my 128-chunk run, I see a failure on Ru49 (120 tests):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ad1968069713b79961b46f39554eeb3253feae4e

The first test in Ru49 is:
layout/reftests/bugs/518172-1b.html

In the earlier 32-chunk push we failed in Ru12, where the end of the range was:
layout/reftests/bugs/518172-1a.html

One could deduce that 518172-*.html is problematic. In addition, I see this in Ru50 and Ru48, which are different chunks, but only 1 instance per chunk vs 5 instances in Ru49. Bug 518172 is related to -moz-transform, if that helps.
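The chunk-narrowing approach in the comments above can be illustrated with a small sketch. It assumes chunks are contiguous, near-equal slices of an ordered test list; the real reftest harness may assign tests to chunks differently, and the test names below are made up for illustration.

```python
def split_into_chunks(tests, total_chunks):
    """Split an ordered test list into contiguous, near-equal chunks.

    Assumption: this is NOT the harness's actual algorithm, only a
    simple model of "run the same list in N chunks".
    """
    base, extra = divmod(len(tests), total_chunks)
    chunks = []
    start = 0
    for i in range(total_chunks):
        # The first `extra` chunks each take one additional test.
        size = base + (1 if i < extra else 0)
        chunks.append(tests[start:start + size])
        start += size
    return chunks


def chunk_of(tests, total_chunks, name):
    """Return the 1-based number of the chunk containing `name`, or None."""
    chunks = split_into_chunks(tests, total_chunks)
    for number, chunk in enumerate(chunks, start=1):
        if name in chunk:
            return number
    return None


# Hypothetical test list and suspect test, for illustration only.
tests = [f"test-{i:04d}.html" for i in range(1000)]
suspect = "test-0380.html"

for total in (8, 32, 128):
    print(f"{total} chunks: suspect runs in chunk {chunk_of(tests, total, suspect)}")
```

Raising the chunk count shrinks the set of tests that run alongside the suspect, which is why a failure that follows one test across chunk layouts points at that test rather than at its neighbours.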
One thing to note is that bug 1398070 seems to apply to the parent process while these assertions are triggering in the child process. I don't see the protocol in question in these crash logs, so we'll have to do some additional logging somewhere to get to the bottom of this.
billm's recent checkin probably fixed this.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(mrbkap)
Yeah, I suspect this isn't fixed, as I see 14 failures today: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1388764&endday=2017-09-26&startday=2017-09-26&tree=trunk Many of them are from after bug 1398070 was in-tree.
Bug 1398070 just disabled the crash for opt; we still have debug failures. :mrbkap, can you help us get a fix for this soon? I don't know if you are the right person to do this, but at a failure rate of 50+/day, this is making windows7 reftests have little value.
Flags: needinfo?(mrbkap)
Based on looking at 3 logs, this crash is happening in an NPAPI process spawned during layout/reftests/bugs/508908-1.xul.

To figure that out, I looked at the line in the log after "plugin process: missing output line for total leaks!", which gives the pid of the process that crashed:

missing output line from log file c:\users\genericworker\appdata\local\temp\tmphxulg2.mozrunner\runreftest_leaks_plugin_pid4260.log

Then I went and looked in the log for that pid.
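The log lookup described above could be scripted roughly as follows. This is a sketch under assumptions: only the "missing output line from log file ... runreftest_leaks_plugin_pid<N>.log" line comes from this bug; the other sample lines and the idea that a process's output can be found by searching for its pid as a standalone number are hypothetical.

```python
import re

# Matches the leak-log file name quoted in this bug, capturing the pid.
LEAK_LOG_RE = re.compile(r"runreftest_leaks_plugin_pid(\d+)\.log")


def crashed_plugin_pids(log_lines):
    """Collect pids named in 'missing output line' leak-log messages."""
    pids = []
    for line in log_lines:
        if "missing output line" not in line:
            continue
        match = LEAK_LOG_RE.search(line)
        if match and match.group(1) not in pids:
            pids.append(match.group(1))
    return pids


def lines_for_pid(log_lines, pid):
    """Return log lines mentioning `pid` as a whole number (not a substring)."""
    pattern = re.compile(r"(?<!\d)" + re.escape(pid) + r"(?!\d)")
    return [line for line in log_lines if pattern.search(line)]


sample_log = [
    # Line taken from the comment above.
    r"missing output line from log file c:\users\genericworker\appdata\local"
    r"\temp\tmphxulg2.mozrunner\runreftest_leaks_plugin_pid4260.log",
    # Hypothetical lines, invented for illustration.
    "[Child 4260] hypothetical plugin process output",
    "[Child 42601] hypothetical unrelated process output",
]

for pid in crashed_plugin_pids(sample_log):
    print(f"pid {pid}:")
    for line in lines_for_pid(sample_log, pid):
        print("  " + line)
```

Running this over several failing logs and comparing which test was active when the pid first appears is one way to confirm that the crashes share a single spawning test.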
I looked at 4 more logs, and they all followed that pattern. Maybe you could disable that test and file a followup bug about it in the Plugins component?
Flags: needinfo?(jmaher)
I verified on try that disabling that test seems to solve our problems- thanks for pointing it out!
Flags: needinfo?(jmaher)
Assignee: nobody → jmaher
Status: NEW → ASSIGNED
Attachment #8912358 - Flags: review?(gbrown)
Attachment #8912358 - Flags: review?(gbrown) → review+
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/56f8de0910b9 disable layout/reftests/bugs/508908-1.xul to avoid crash in mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop(). r=gbrown
Flags: needinfo?(mrbkap)
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla58
The mochitest issues are happening on trunk still too.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
It looks like the crashing NPAPI plugin is being created in test_iframe_sandbox_plugins.html now. I guess there's a bug in the test plugin, somehow.
Kyle, we had to disable layout/reftests/bugs/508908-1.xul already in this bug due to the high frequency of the crashes it was hitting. It looks like test_iframe_sandbox_plugins.html is going to be next on the list for the same reason. How terrible will it be if we aren't running that test on debug builds on 57/58? Do you have cycles to look into why the test plugin is behaving so badly?
Flags: needinfo?(kyle)
Yeah, go ahead and disable for right now. I'll assign this to myself and try to look at it soon, though I've had problems replicating locally so far.
Flags: needinfo?(kyle)
Thanks, Kyle.
Assignee: jmaher → kyle
Keywords: leave-open
Target Milestone: mozilla58 → ---
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/0d0caaa6c2f0 Skip test_iframe_sandbox_plugins.html on debug builds for frequent mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop() shutdown crashes. r=qdot
This is indeed making debug mochitests happier. https://hg.mozilla.org/releases/mozilla-beta/rev/de6b8900aa7b
It's less than permafail now, but now dom/html/test/test_object_plugin_nav.html is hitting this crash on at least Windows still :(. Guess I'll try working on a more thorough disabling patch on Try.
Disabling test_object_plugin_nav.html on Windows debug is enough to make mochitest-2 happy across the board.
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b87f7d3fddff Skip test_object_plugin_nav.html on Windows debug builds for frequent mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop() shutdown crashes. r=qdot
For whatever reason, test_object_plugin_nav.html is still causing issues on Linux debug for at least my 58-as-Beta Try simulations. Will extend the disabling from comment 57 to cover Linux as well.
Flags: needinfo?(ryanvm)
Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/a5059c4e527e Skip test_object_plugin_nav.html on Linux debug builds for frequent mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop() shutdown crashes. r=qdot
Flags: needinfo?(ryanvm)
Crash Signature: [@ mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop()] → [@ mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop()] [@ mozilla::ipc::MessageChannel::Clear()]
Crash Signature: [@ mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop()] [@ mozilla::ipc::MessageChannel::Clear()] → [@ mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop()] [@ mozilla::ipc::MessageChannel::Clear()] [@ mozilla::ipc::MessageChannel::Close()]

Moving these bugs (intermittent test failures with crashes) out of P5.

Priority: P5 → --
Priority: -- → P3
Assignee: kyle → nobody
Summary: Intermittent PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop()] → Intermittent PROCESS-CRASH | <test> | application crashed [@ mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop()]

(In reply to Intermittent Failures Robot from comment #147)

1 failures in 4810 pushes (0.0 failures/push) were associated with this bug in the last 7 days.

This failure was mis-classified; it does not appear to be related to this bug (no MessageChannel crash).

(In reply to Pulsebot from comment #52)

Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0d0caaa6c2f0
Skip test_iframe_sandbox_plugins.html on debug builds for frequent
mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop() shutdown
crashes. r=qdot

test_iframe_sandbox_plugins.html was removed entirely by

https://hg.mozilla.org/mozilla-central/rev/4d3ed7f582f503fb36d3eda50fcc55a3e1d8269e.

(In reply to Pulsebot from comment #57)

Pushed by ryanvm@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b87f7d3fddff
Skip test_object_plugin_nav.html on Windows debug builds for frequent
mozilla::ipc::MessageChannel::WillDestroyCurrentMessageLoop() shutdown
crashes. r=qdot

test_object_plugin_nav.html was renamed to test_object_nav.html, also by

https://hg.mozilla.org/mozilla-central/rev/4d3ed7f582f503fb36d3eda50fcc55a3e1d8269e.

Re-enable tests previously disabled by this bug, for intermittent crashes.

Assignee: nobody → whole.grains
Keywords: leave-open
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/303700ad84c3 Re-enable test_object_nav.html and 508908-1.xhtml; r=jmaher
Whiteboard: [stockwell disabled] → [stockwell fixed:other]
Status: REOPENED → RESOLVED
Closed: 7 years ago → 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch