Closed Bug 1344008 Opened 8 years ago Closed 8 years ago

Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryHistogram::Accumulate

Categories

(Toolkit :: Telemetry, defect)

52 Branch
x86
Windows
defect
Not set
critical

Tracking

()

RESOLVED DUPLICATE of bug 1369041
Tracking Status
firefox-esr45 --- unaffected
firefox51 --- unaffected
firefox52 --- wontfix
firefox-esr52 --- affected
firefox53 --- wontfix
firefox54 --- fix-optional
firefox55 --- affected

People

(Reporter: philipp, Unassigned)

Details

(Keywords: crash, regression)

Crash Data

This bug was filed from the Socorro interface and is report bp-ab9b726b-05b8-4fb1-bd28-bd1b82170302.
=============================================================
Crashing Thread (0)
Frame  Module  Signature  Source
0  mozglue.dll  mozalloc_abort(char const* const)  memory/mozalloc/mozalloc_abort.cpp:33
1  mozglue.dll  mozalloc_handle_oom(unsigned int)  memory/mozalloc/mozalloc_oom.cpp:46
2  mozglue.dll  moz_xrealloc  memory/mozalloc/mozalloc.cpp:107
3  xul.dll  nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned int, unsigned int)  obj-firefox/dist/include/nsTArray-inl.h:183
4  xul.dll  nsTArray_Impl<mozilla::FramePropertyTable::PropertyValue, nsTArrayInfallibleAllocator>::AppendElement<mozilla::FramePropertyTable::PropertyValue, nsTArrayInfallibleAllocator>(mozilla::FramePropertyTable::PropertyValue&&)  obj-firefox/dist/include/nsTArray.h:2073
5  xul.dll  TelemetryHistogram::Accumulate(mozilla::Telemetry::ID, unsigned int)  toolkit/components/telemetry/TelemetryHistogram.cpp:2212
6  xul.dll  nsCycleCollector::CleanupAfterCollection()  xpcom/base/nsCycleCollector.cpp:3570
7  xul.dll  nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool)  xpcom/base/nsCycleCollector.cpp:3678
8  xul.dll  nsCycleCollector_collect(nsICycleCollectorListener*)  xpcom/base/nsCycleCollector.cpp:4144
9  xul.dll  nsJSContext::CycleCollectNow(nsICycleCollectorListener*, int)  dom/base/nsJSEnvironment.cpp:1440
10  xul.dll  nsJSEnvironmentObserver::Observe(nsISupports*, char const*, char16_t const*)  dom/base/nsJSEnvironment.cpp:338
11  xul.dll  nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*)  xpcom/ds/nsObserverList.cpp:112
12  xul.dll  nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*)  xpcom/ds/nsObserverService.cpp:281
13  xul.dll  nsThread::DoMainThreadSpecificProcessing(bool)  xpcom/threads/nsThread.cpp:1454
14  xul.dll  nsThread::ProcessNextEvent(bool, bool*)  xpcom/threads/nsThread.cpp:1172
15  xul.dll  mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*)  ipc/glue/MessagePump.cpp:96
16  xul.dll  mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*)  ipc/glue/MessagePump.cpp:301
17  xul.dll  MessageLoop::RunHandler()  ipc/chromium/src/base/message_loop.cc:225
18  xul.dll  MessageLoop::Run()  ipc/chromium/src/base/message_loop.cc:205
19  xul.dll  nsBaseAppShell::Run()  widget/nsBaseAppShell.cpp:156
20  xul.dll  nsAppShell::Run()  widget/windows/nsAppShell.cpp:262
21  xul.dll  XRE_RunAppShell  toolkit/xre/nsEmbedFunctions.cpp:866
22  xul.dll  mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*)  ipc/glue/MessagePump.cpp:269
23  xul.dll  MessageLoop::RunHandler()  ipc/chromium/src/base/message_loop.cc:225
24  xul.dll  MessageLoop::Run()  ipc/chromium/src/base/message_loop.cc:205
25  xul.dll  XRE_InitChildProcess  toolkit/xre/nsEmbedFunctions.cpp:698
26  firefox.exe  content_process_main(int, char** const)  ipc/contentproc/plugin-container.cpp:197
27  firefox.exe  wmain  toolkit/xre/nsWindowsWMain.cpp:115
28  firefox.exe  __scrt_common_main_seh  f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
29  kernel32.dll  BaseThreadInitThunk
30  ntdll.dll  __RtlUserThreadStart
31  ntdll.dll  _RtlUserThreadStart
this oom crash signature is regressing in firefox 52 builds on windows. just by looking at how the signature progressed in the nightly & aurora channels it looks like a patch that landed on nightly mid-january addressed that issue (53.0a1 build 20170115030210 was the last version which submitted a report with this signature).
Flags: needinfo?(gfritzsche)
This is an OOM content process crash. Presumably this goes through:
- internal_Accumulate(): https://hg.mozilla.org/releases/mozilla-release/annotate/2183f7cb4f88/toolkit/components/telemetry/TelemetryHistogram.cpp#l1413
- internal_RemoteAccumulate(): https://hg.mozilla.org/releases/mozilla-release/annotate/2183f7cb4f88/toolkit/components/telemetry/TelemetryHistogram.cpp#l1373
... which is where the OOMing array alloc probably happens. Those allocations are, however, bounded by a constant limit. The report says the OOM allocation size is 512 KB, which doesn't seem that excessive.
In mid-January that IPC code was refactored into a different file: https://hg.mozilla.org/mozilla-central/rev/850f95f34e6d
As the logic didn't really change in the process, I'd assume that either:
- we still see this under a different signature, e.g. involving TelemetryIPCAccumulator [1]
- something else is behind the OOM spikes and was fixed separately
Chris, any thoughts?
1: https://crash-stats.mozilla.com/search/?signature=~TelemetryIPCAccumulator&date=%3E%3D2017-02-24T10%3A59%3A00.000Z&date=%3C2017-03-03T10%3A59%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Flags: needinfo?(gfritzsche) → needinfo?(chutten)
thanks for the hint - so the crashes are probably continuing as [@OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryIPCAccumulator::AccumulateChildHistogram] in 53 and later.
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryHistogram::Accumulate] → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryHistogram::Accumulate] [@OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArr…
I'm afraid I don't have much to add here... though... CycleCollector's finishing up and we're trying to allocate. What are the chances that we're in a memory pressure situation and then double the size of our storage before CC can release to us the space?
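To make the "double the size of our storage" concern concrete, here is a minimal sketch of the arithmetic. It assumes a simple doubling growth policy and a reallocation that cannot extend in place, so the old and new blocks coexist during the copy. `GrowthStep` and `NextGrowth` are hypothetical names for illustration, not nsTArray code, and the 256 KiB starting size is only inferred from the reported 512 KB failing allocation under that doubling assumption.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical model, not the real nsTArray growth code: a buffer that
// doubles its capacity when full, reallocated without in-place extension.
struct GrowthStep {
  size_t oldBytes;   // size of the block being grown
  size_t newBytes;   // size requested from the allocator
  size_t peakBytes;  // old + new, both live while elements are copied
};

// Compute the request size and transient peak for one doubling step.
inline GrowthStep NextGrowth(size_t currentBytes) {
  GrowthStep step;
  step.oldBytes = currentBytes;
  step.newBytes = currentBytes * 2;
  step.peakBytes = step.oldBytes + step.newBytes;
  return step;
}
```

Under these assumptions, growing a 256 KiB buffer requests 524,288 bytes and briefly needs 786,432 bytes in total, which could fail in a fragmented or nearly exhausted address space even though the request itself looks modest.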
Flags: needinfo?(chutten)
Georg, any thoughts for how to move forward with this?
Flags: needinfo?(gfritzsche)
Benjamin, do you know who could provide input on this crash? It's unclear to us what we're dealing with here.
Flags: needinfo?(gfritzsche) → needinfo?(benjamin)
There are some very large allocations going on: https://crash-stats.mozilla.com/search/?signature=%3DOOM%20%7C%20large%20%7C%20mozalloc_abort%20%7C%20mozalloc_handle_oom%20%7C%20moz_xrealloc%20%7C%20nsTArray_base%3CT%3E%3A%3AEnsureCapacity%3CT%3E%20%7C%20nsTArray_Impl%3CT%3E%3A%3AAppendElement%3CT%3E%20%7C%20TelemetryIPCAccumulator%3A%3AAccumulateChildHistogram&date=%3E%3D2017-03-16T12%3A11%3A00.000Z&date=%3C2017-03-23T12%3A11%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=oom_allocation_size#crash-reports
Looking at https://crash-stats.mozilla.com/report/index/c407c872-66cb-4533-8379-7440c2170323 in particular: gHistogramAccumulations is being appended-to and is quite large. https://hg.mozilla.org/releases/mozilla-beta/annotate/a5ccc32310c9/toolkit/components/telemetry/TelemetryIPCAccumulator.cpp#l47
The allocation size is 524,288 bytes. I don't know what this means precisely. I see two options:
- We already hit the kHistogramAccumulationsArrayHighWaterMark check and have submitted an event to the main thread to send it, but haven't actually run that event yet, because the main thread is doing stuff or seriously backed-up.
- We managed to "skip over" the kHistogramAccumulationsArrayHighWaterMark check and since it's an == not a >= we're never going to hit it. This seems impossible to me because we're in a mutex, but if there is anything that accesses gHistogramAccumulations outside of the lock this is possible.
Flags: needinfo?(benjamin)
The "==" was deliberate to avoid dispatching to main for the first and every subsequent overtopping of that wall. I too am dubious that it might be being manipulated outside of the mutex. But if the main thread's not serving requests in a timely fashion there's nothing stopping it from continuing to fill endlessly, waiting for the IPC timer to fire. (in a theoretical case anyway. So long as it isn't completely hung, the IPCTimerFired will be serviced, eventually)
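The behaviour discussed here can be sketched as follows. This is a simplified model, not the actual TelemetryIPCAccumulator code: `ChildAccumulator`, `Accumulation`, and the flush-request counter are hypothetical stand-ins, and the counter merely represents dispatching a runnable to the main thread. It shows that an `==` check requests a flush exactly once, while the pending array keeps growing until something actually drains it.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <mutex>
#include <vector>

// Hypothetical stand-in for one queued histogram accumulation.
struct Accumulation {
  uint32_t id;
  uint32_t sample;
};

// Simplified, assumed model of a child-process accumulation buffer.
class ChildAccumulator {
 public:
  explicit ChildAccumulator(size_t highWaterMark)
      : mHighWaterMark(highWaterMark), mFlushRequests(0) {}

  // Append under the lock. A flush is requested only when the length is
  // exactly the high-water mark, so later appends do not re-dispatch.
  void Accumulate(uint32_t id, uint32_t sample) {
    std::lock_guard<std::mutex> lock(mMutex);
    mPending.push_back(Accumulation{id, sample});
    if (mPending.size() == mHighWaterMark) {
      ++mFlushRequests;  // stands in for dispatching to the main thread
    }
  }

  // Stands in for the main thread (or the IPC timer) draining the buffer.
  // Returns how many accumulations were taken.
  size_t Flush() {
    std::vector<Accumulation> batch;
    {
      std::lock_guard<std::mutex> lock(mMutex);
      batch.swap(mPending);
    }
    return batch.size();
  }

  size_t PendingCount() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mPending.size();
  }

  size_t FlushRequests() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mFlushRequests;
  }

 private:
  std::mutex mMutex;
  std::vector<Accumulation> mPending;
  const size_t mHighWaterMark;
  size_t mFlushRequests;
};
```

In this model, ten appends against a high-water mark of four produce a single flush request and leave ten entries pending until `Flush()` runs, which mirrors the "fill endlessly until the main thread services it" scenario described above. Nothing in the sketch caps the array size; that design choice is part of what is being illustrated.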
Georg, does Benjamin's answer help? Is this something you want to take on maybe for the 55 timeframe? It isn't a high volume crash. Wontfix for 53 though.
Crash Signature: nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryIPCAccumulator::AccumulateChildHistogram] → nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryIPCAccumulator::AccumulateChildHistogram] [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::…
Sorry for the long turn-around. Chris, this looks like a dupe of bug 1369041?
Flags: needinfo?(gfritzsche) → needinfo?(chutten)
If this is due to Telemetry IPC as surmised and not FrameProperties' mProperties nsTArray as the original sig seems to blame, then yes.
Flags: needinfo?(chutten)
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE