Closed
Bug 1334097
Opened 7 years ago
Closed 7 years ago
Crash in OOM | unknown | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | mozilla::ipc::MessageChannel::MaybeUndeferIncall
Categories
(Core :: IPC, defect)
Tracking
()
RESOLVED
FIXED
mozilla55
People
(Reporter: philipp, Assigned: billm)
References
Details
(Keywords: crash, regression)
Crash Data
Attachments
(2 files)
7.55 KB,
patch
|
bugzilla
:
review+
gchang
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
7.28 KB,
patch
|
jcristau
:
approval-mozilla-esr52+
|
Details | Diff | Splinter Review |
This bug was filed from the Socorro interface and is report bp-2f8dd5ce-2d43-4ee4-8f6c-dd1c92170126. ============================================================= Crashing Thread (0) Frame Module Signature Source 0 mozglue.dll mozalloc_abort(char const* const) memory/mozalloc/mozalloc_abort.cpp:33 1 mozglue.dll mozalloc_handle_oom(unsigned int) memory/mozalloc/mozalloc_oom.cpp:46 2 mozglue.dll moz_xmalloc memory/mozalloc/mozalloc.cpp:85 3 xul.dll mozilla::ipc::MessageChannel::MaybeUndeferIncall() ipc/glue/MessageChannel.cpp:1854 4 xul.dll `anonymous namespace'::UnhookNeuteredWindows ipc/glue/WindowsMessageLoop.cpp:645 5 xul.dll mozilla::ipc::MessageChannel::Call(IPC::Message*, IPC::Message*) ipc/glue/MessageChannel.cpp:1324 6 xul.dll mozilla::plugins::PPluginScriptableObjectChild::CallNPN_Evaluate(nsCString const&, mozilla::plugins::Variant*, bool*) obj-firefox/ipc/ipdl/PPluginScriptableObjectChild.cpp:104 7 xul.dll mozilla::plugins::PluginScriptableObjectChild::Evaluate(_NPString*, _NPVariant*) dom/plugins/ipc/PluginScriptableObjectChild.cpp:1189 8 npswf32_24_0_0_194.dll F_2019925551_______________________________________________________________________________ F_1128892918________________________________________________________________________:48888 plugin crashes with this signature are regressing since firefox 52 builds in all versions of windows and in code that apparently got added with bug 1312960. Correlations for Firefox Aurora (100.0% in signature vs 14.05% overall) Module "plugin-container.exe" = true (100.0% in signature vs 22.11% overall) Module "comdlg32.dll" = true (100.0% in signature vs 26.38% overall) startup_crash = null (95.00% in signature vs 19.43% overall) Module "winspool.drv" = true [100.0% vs 22.07% if platform_pretty_version = Windows 10] (100.0% in signature vs 16.47% overall) Module "dinput8.dll" = true [100.0% vs 22.28% if platform_pretty_version = Windows 10] (100.0% in signature vs 16.61% overall) Module "dsound.dll" = true [100.0% vs 22.41% if platform_pretty_version = Windows 10] (100.0% in signature vs 22.81% overall) Module "icm32.dll" = true [100.0% vs 31.83% if platform_pretty_version = Windows 10] (100.0% in signature vs 30.89% overall) Module "urlmon.dll" = true [100.0% vs 42.64% if platform_pretty_version = Windows 7] (30.00% in signature vs 86.75% overall) plugin_version = null (70.00% in signature vs 13.25% overall) plugin = true (70.00% in signature vs 13.25% overall) process_type = plugin (70.00% in signature vs 11.62% overall) Module "NPSWF32_24_0_0_194.dll" = true [83.33% vs 11.10% if platform_version = 6.1.7601 Service Pack 1] (45.00% in signature vs 90.97% overall) e10s_enabled = 1 (50.00% in signature vs 10.72% overall) plugin_version = 24.0.0.194
Comment 1•7 years ago
|
||
There are not a ton of these crashes (I only see 54). About half are in the plugin process. I looked at a few that weren't, and they all still had Flash on the stack. None of them have an OOM allocation size. I assume that's expected for the plugin process, but I'm not sure why that is the case for the others. It might be because the process annotation isn't being applied properly. I see this crash which does not have process type plugin, but the stack does not look like a main process stack to me: bp-fe8a5291-5352-4bf0-8710-29b072170120
Updated•7 years ago
|
Comment 2•7 years ago
|
||
Bill, any thoughts on what might be going on here?
Flags: needinfo?(wmccloskey)
Comment 3•7 years ago
|
||
It's possible to tell the allocation size by loading the minidump, but you're right that currently OOM size annotations don't work in plugin processes because we try to send that data through the IPC layer instead of the out-of-band annotation layer.
Assignee | ||
Comment 4•7 years ago
|
||
Well, we're allocating a small, fixed-sized object. There's not a lot we can do about that. My patch switched us from using a vector to a linked list. A linked list should actually reduce the chance of OOM since it doesn't need contiguous memory.
Flags: needinfo?(wmccloskey)
Comment 5•7 years ago
|
||
[Tracking Requested - why for this release]: #23 top crash on 52.0b in last 7 days. The crash signature appeared from 52.0a2 and seen on 53 and 54 as well. It seems not a general OOM case.
tracking-firefox52:
--- → ?
Assignee | ||
Comment 6•7 years ago
|
||
(In reply to Astley Chen [:astley] (UTC+8) from comment #5) > It seems not a general OOM case. Why do you say that?
Comment 7•7 years ago
|
||
We used to see crashes on the line after it (mPending.insertBack(task);), so it might be possible this is some kind of signature change. I don't think anybody checked the process types on those crashes. I'd guess that maybe this could happen if the main process is hanging really badly, and the plugin process keeps trying to send messages, so the message queue backs up.
Updated•7 years ago
|
Flags: needinfo?(kchen)
Comment 8•7 years ago
|
||
The crash rate seems mitigated after new flash version 24.0.0.221 was out on 2/14, but that might not be the case though. Besides, this crash is out of Top50 crasher on 52.0b. Keep monitoring. [1] https://helpx.adobe.com/flash-player/release-note/fp_24_air_24_release_notes.html
tracking-firefox52:
? → ---
Reporter | ||
Comment 9•7 years ago
|
||
the number of submitted crash reports only has gone down after the infobar submission mechanism has been switched off in 52.0b6 (before that 85% of reports came through the infobar) - so the real world impact of this will stay the same
Comment 10•7 years ago
|
||
(In reply to [:philipp] from comment #9) > the number of submitted crash reports only has gone down after the infobar > submission mechanism has been switched off in 52.0b6 (before that 85% of > reports came through the infobar) - so the real world impact of this will > stay the same Thanks for the heads-up. Where could I get this info(infobar is on or off in specific build) ?
Reporter | ||
Comment 11•7 years ago
|
||
the infobar mechanism is active in beta builds which have the EARLY_BETA_OR_EARLIER flag set. there is some documentation about it at https://wiki.mozilla.org/Release_Management/Beta_Release_Checklist#Set_EARLY_BETA_OR_EARLIER - but i don't know how to easily check/verify that. it's possible to exclude all reports coming through the infobar in supersearch though in order to get a consistent picture: https://crash-stats.mozilla.com/search/?signature=%3DOOM%20|%20unknown%20|%20mozalloc_abort%20|%20mozalloc_handle_oom%20|%20moz_xmalloc%20|%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AMaybeUndeferIncall&submitted_from_infobar=!__true__&date=%3E%3D2016-08-21T08%3A33%3A55.000Z&_sort=-date&_facets=signature&_facets=version&_facets=user_comments&_facets=install_time&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-version
Updated•7 years ago
|
Comment 12•7 years ago
|
||
Even excluding those infobar submission reports, using the query from comment 11, I'm seeing very high crash volume in ESR52. And, around 100 crashes a week from various beta 53 versions. Bill is there any thing actionable here? Can you find someone to take another look at the ESR crashes? If you facet on platform you get nearly all Windows NT: https://crash-stats.mozilla.com/search/?signature=%3DOOM%20%7C%20unknown%20%7C%20mozalloc_abort%20%7C%20mozalloc_handle_oom%20%7C%20moz_xmalloc%20%7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AMaybeUndeferIncall&submitted_from_infobar=%21__true__&date=%3E%3D2016-08-21T08%3A33%3A55.000Z&date=%3C2017-04-14T01%3A18%3A00.000Z&_sort=-date&_facets=signature&_facets=version&_facets=user_comments&_facets=install_time&_facets=platform&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-platform
Assignee | ||
Comment 13•7 years ago
|
||
This does seem like a problem. The more I think about it, OOMs in the plugin process don't make a lot of sense. The plugin is almost always Flash, and it runs all its code in a separate process. Our plugin process is just a broker. So it should be very small. It seems like we must be leaking messages somewhere. I'll look into this next week.
Assignee | ||
Comment 14•7 years ago
|
||
I think I know what's happening here. We've always had a problem where we busy wait during Call invocations. If there's a deferred messages, we undefer it [1]. Then we don't block on the condition var because mPending is non-empty [2]. But then when we dispatch the message, we defer it again because nothing has really changed [3]. I think my patch made this worse. Before, we would constantly shift the message back and forth between mPending and mDeferred, but that was it. My patch causes us to continually allocate new MessageTasks and post them to the event queue. I'll have to think about how to fix this. I've wanted to fix the busy waiting for a while, but I've been scared to do so. We don't exactly have great tests for plugins, and I don't understand all the code perfectly. On the other hand, I think I did the re-posting thing because it's needed to preserve the correct message ordering. So if we don't fix the busy waiting, there isn't an obvious solution. [1] http://searchfox.org/mozilla-central/rev/d8ac097a1de2544f0e51d186bc850662c5b7430a/ipc/glue/MessageChannel.cpp#1460 [2] http://searchfox.org/mozilla-central/rev/d8ac097a1de2544f0e51d186bc850662c5b7430a/ipc/glue/MessageChannel.cpp#1463 [3] http://searchfox.org/mozilla-central/rev/d8ac097a1de2544f0e51d186bc850662c5b7430a/ipc/glue/MessageChannel.cpp#1957
Assignee | ||
Comment 15•7 years ago
|
||
Actually, the busy waiting has not been around forever as I assumed. It started happening in bug 1170231. I'll have to read that over very carefully to try to understand what it was trying to fix.
Assignee | ||
Updated•7 years ago
|
Flags: needinfo?(wmccloskey)
Flags: needinfo?(kchen)
Assignee | ||
Comment 16•7 years ago
|
||
See comment 14.
Assignee: nobody → wmccloskey
Attachment #8860506 -
Flags: review?(aklotz)
Assignee | ||
Comment 17•7 years ago
|
||
Oh, I went through the test case in bug 1170231 and it passes. Adding back the old condition caused it to deadlock. So I think this patch successfully avoids the deadlock while also avoiding busy waiting.
Comment 18•7 years ago
|
||
Adding tracking for 54 since we are possibly close to a fix. Thanks Bill!
status-firefox55:
--- → affected
tracking-firefox54:
--- → +
Comment 19•7 years ago
|
||
Comment on attachment 8860506 [details] [diff] [review] Avoid busy waiting caused by MaybeUndeferIncall Review of attachment 8860506 [details] [diff] [review]: ----------------------------------------------------------------- Nice!
Attachment #8860506 -
Flags: review?(aklotz) → review+
Comment 20•7 years ago
|
||
Pushed by wmccloskey@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/78480dd41c9d Avoid busy waiting caused by MaybeUndeferIncall (r=aklotz)
Comment 21•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/78480dd41c9d
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 22•7 years ago
|
||
This grafts cleanly to Beta, so it's good for uplift if/when you feel comfortable nominating it. For ESR52, it'll need a bit of rebasing. I'm thinking this is something we may want to consider backporting all the way back to ESR52 if it's reasonable to do so, though.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 23•7 years ago
|
||
[Approval Request Comment] If this is not a sec:{high,crit} bug, please state case for ESR consideration: User impact if declined: We think this will fix the #1 plugin crash in ESR. Fix Landed on Version: 55 Risk to taking this patch (and alternatives if risky): At worst, this could cause more plugin instability. It doesn't affect our regular IPC paths. String or UUID changes made by this patch: None See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(wmccloskey)
Attachment #8863044 -
Flags: approval-mozilla-esr52?
Assignee | ||
Comment 24•7 years ago
|
||
Comment on attachment 8860506 [details] [diff] [review] Avoid busy waiting caused by MaybeUndeferIncall Approval Request Comment [Feature/Bug causing the regression]: bug 1312960 [User impact if declined]: OOM crashes in plugin process [Is this code covered by automated tests?]: somewhat [Has the fix been verified in Nightly?]: no [Needs manual test from QE? If yes, steps to reproduce]: no STR [List of other uplifts needed for the feature/fix]: none [Is the change risky?]: not very. it only affects plugins. [Why is the change risky/not risky?]: [String changes made/needed]: none
Attachment #8860506 -
Flags: approval-mozilla-beta?
Assignee | ||
Comment 25•7 years ago
|
||
Also, I don't see any crashes on Nightly since the fix landed. However, the crash rate on Nightly was pretty low before my patch, so it's not obviously fixed.
Comment 26•7 years ago
|
||
Comment on attachment 8860506 [details] [diff] [review] Avoid busy waiting caused by MaybeUndeferIncall Fix a crash. Beta54+. Should be in 54 beta 4.
Attachment #8860506 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 27•7 years ago
|
||
uplift |
https://hg.mozilla.org/releases/mozilla-beta/rev/4a587e536117
Comment 28•7 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #24) > [User impact if declined]: OOM crashes in plugin process > [Is this code covered by automated tests?]: somewhat > [Has the fix been verified in Nightly?]: no > [Needs manual test from QE? If yes, steps to reproduce]: no STR Setting qe-verify- since there's nothing actionable for QE here.
Flags: qe-verify-
Updated•7 years ago
|
Comment 29•7 years ago
|
||
Comment on attachment 8863044 [details] [diff] [review] esr patch fix top plugin crash in esr52.2
Attachment #8863044 -
Flags: approval-mozilla-esr52? → approval-mozilla-esr52+
Comment 30•7 years ago
|
||
bugherder uplift |
https://hg.mozilla.org/releases/mozilla-esr52/rev/fe2a2c7e88cb
You need to log in
before you can comment on or make changes to this bug.
Description
•