Closed
Bug 1344008
Opened 8 years ago
Closed 8 years ago
Crash in OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryHistogram::Accumulate
Categories
(Toolkit :: Telemetry, defect)
RESOLVED
DUPLICATE
of bug 1369041
Release | Tracking | Status
---|---|---
firefox-esr45 | --- | unaffected
firefox51 | --- | unaffected
firefox52 | --- | wontfix
firefox-esr52 | --- | affected
firefox53 | --- | wontfix
firefox54 | --- | fix-optional
firefox55 | --- | affected
People
(Reporter: philipp, Unassigned)
Details
(Keywords: crash, regression)
Crash Data
This bug was filed from the Socorro interface and is
report bp-ab9b726b-05b8-4fb1-bd28-bd1b82170302.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 mozglue.dll mozalloc_abort(char const* const) memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom(unsigned int) memory/mozalloc/mozalloc_oom.cpp:46
2 mozglue.dll moz_xrealloc memory/mozalloc/mozalloc.cpp:107
3 xul.dll nsTArray_base<nsTArrayInfallibleAllocator, nsTArray_CopyWithMemutils>::EnsureCapacity<nsTArrayInfallibleAllocator>(unsigned int, unsigned int) obj-firefox/dist/include/nsTArray-inl.h:183
4 xul.dll nsTArray_Impl<mozilla::FramePropertyTable::PropertyValue, nsTArrayInfallibleAllocator>::AppendElement<mozilla::FramePropertyTable::PropertyValue, nsTArrayInfallibleAllocator>(mozilla::FramePropertyTable::PropertyValue&&) obj-firefox/dist/include/nsTArray.h:2073
5 xul.dll TelemetryHistogram::Accumulate(mozilla::Telemetry::ID, unsigned int) toolkit/components/telemetry/TelemetryHistogram.cpp:2212
6 xul.dll nsCycleCollector::CleanupAfterCollection() xpcom/base/nsCycleCollector.cpp:3570
7 xul.dll nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*, bool) xpcom/base/nsCycleCollector.cpp:3678
8 xul.dll nsCycleCollector_collect(nsICycleCollectorListener*) xpcom/base/nsCycleCollector.cpp:4144
9 xul.dll nsJSContext::CycleCollectNow(nsICycleCollectorListener*, int) dom/base/nsJSEnvironment.cpp:1440
10 xul.dll nsJSEnvironmentObserver::Observe(nsISupports*, char const*, char16_t const*) dom/base/nsJSEnvironment.cpp:338
11 xul.dll nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) xpcom/ds/nsObserverList.cpp:112
12 xul.dll nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) xpcom/ds/nsObserverService.cpp:281
13 xul.dll nsThread::DoMainThreadSpecificProcessing(bool) xpcom/threads/nsThread.cpp:1454
14 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1172
15 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:96
16 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:301
17 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:225
18 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:205
19 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156
20 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:262
21 xul.dll XRE_RunAppShell toolkit/xre/nsEmbedFunctions.cpp:866
22 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:269
23 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:225
24 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:205
25 xul.dll XRE_InitChildProcess toolkit/xre/nsEmbedFunctions.cpp:698
26 firefox.exe content_process_main(int, char** const) ipc/contentproc/plugin-container.cpp:197
27 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115
28 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
29 kernel32.dll BaseThreadInitThunk
30 ntdll.dll __RtlUserThreadStart
31 ntdll.dll _RtlUserThreadStart
This OOM crash signature is regressing in Firefox 52 builds on Windows.
Judging by how the signature progressed in the Nightly and Aurora channels, a patch that landed on Nightly in mid-January appears to have addressed the issue (53.0a1 build 20170115030210 was the last version that submitted a report with this signature).
Flags: needinfo?(gfritzsche)
Comment 1•8 years ago
This is an OOM content process crash. Presumably this goes through:
- internal_Accumulate(): https://hg.mozilla.org/releases/mozilla-release/annotate/2183f7cb4f88/toolkit/components/telemetry/TelemetryHistogram.cpp#l1413
- internal_RemoteAccumulate(): https://hg.mozilla.org/releases/mozilla-release/annotate/2183f7cb4f88/toolkit/components/telemetry/TelemetryHistogram.cpp#l1373
... which is where the OOMing array alloc probably happens.
Those allocations are, however, bounded by a constant limit.
The report says the OOM allocation size is 512 KB, which doesn't seem that excessive?
In mid-January that IPC code was refactored into a different file:
https://hg.mozilla.org/mozilla-central/rev/850f95f34e6d
As the logic didn't really change in the process, I'd assume that either:
- we still see this under a different signature, e.g. involving TelemetryIPCAccumulator [1]
- something else is behind the OOM spikes and was fixed separately
Chris, any thoughts?
1: https://crash-stats.mozilla.com/search/?signature=~TelemetryIPCAccumulator&date=%3E%3D2017-02-24T10%3A59%3A00.000Z&date=%3C2017-03-03T10%3A59%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-signature
Flags: needinfo?(gfritzsche) → needinfo?(chutten)
Reporter
Comment 2•8 years ago
Thanks for the hint. So the crashes are probably continuing as [@OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryIPCAccumulator::AccumulateChildHistogram] in 53 and later.
Crash Signature: [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryHistogram::Accumulate] → [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryHistogram::Accumulate]
[@OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArr…
Comment 3•8 years ago
I'm afraid I don't have much to add here... though...
CycleCollector's finishing up and we're trying to allocate. What are the chances that we're in a memory pressure situation and then double the size of our storage before CC can release the space to us?
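To put rough numbers on that question, here is a minimal sketch. It assumes the array grows by doubling and that the allocator cannot extend the block in place, so the old and new blocks are live at the same time; neither assumption is confirmed in this bug, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical helper: size of the buffer that was already full, given the
// size of the failing request and ASSUMING the request is a plain doubling.
inline std::size_t PreviousBytes(std::size_t requestBytes) {
  assert(requestBytes % 2 == 0);
  return requestBytes / 2;
}

// Hypothetical helper: worst-case bytes live at once for this one array if
// the block cannot grow in place. The old block has to stay valid until its
// contents have been copied into the new, doubled block.
inline std::size_t PeakBytesDuringGrowth(std::size_t oldBytes) {
  return oldBytes + oldBytes * 2;
}
```

Under those assumptions, the 512 KB (524,288 byte) request from the report would mean a 256 KB buffer was already full, and the transient peak for this one array would be 768 KB.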
Flags: needinfo?(chutten)
Comment 4•8 years ago
Georg, any thoughts on how to move forward with this?
Flags: needinfo?(gfritzsche)
Comment 5•8 years ago
Benjamin, do you know who could provide input on this crash?
It's unclear to us what we're dealing with here.
Flags: needinfo?(gfritzsche) → needinfo?(benjamin)
Comment 6•8 years ago
There are some very large allocations going on: https://crash-stats.mozilla.com/search/?signature=%3DOOM%20%7C%20large%20%7C%20mozalloc_abort%20%7C%20mozalloc_handle_oom%20%7C%20moz_xrealloc%20%7C%20nsTArray_base%3CT%3E%3A%3AEnsureCapacity%3CT%3E%20%7C%20nsTArray_Impl%3CT%3E%3A%3AAppendElement%3CT%3E%20%7C%20TelemetryIPCAccumulator%3A%3AAccumulateChildHistogram&date=%3E%3D2017-03-16T12%3A11%3A00.000Z&date=%3C2017-03-23T12%3A11%3A00.000Z&_sort=-date&_facets=signature&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=oom_allocation_size#crash-reports
Looking at https://crash-stats.mozilla.com/report/index/c407c872-66cb-4533-8379-7440c2170323 in particular:
gHistogramAccumulations (https://hg.mozilla.org/releases/mozilla-beta/annotate/a5ccc32310c9/toolkit/components/telemetry/TelemetryIPCAccumulator.cpp#l47) is being appended to and is quite large: the allocation size is 524,288 bytes. I don't know what this means precisely. I see two options:
- We already hit the kHistogramAccumulationsArrayHighWaterMark check and have submitted an event to the main thread to send it, but haven't actually run that event yet, because the main thread is doing stuff or seriously backed up.
- We managed to "skip over" the kHistogramAccumulationsArrayHighWaterMark check and, since it's an == and not a >=, we're never going to hit it. This seems impossible to me because we hold a mutex, but if anything accesses gHistogramAccumulations outside of the lock, this is possible.
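The "skip over" case can be illustrated with a small model. This is only a sketch, not the code from TelemetryIPCAccumulator.cpp: the function name is hypothetical, the mark is passed in because the real value of kHistogramAccumulationsArrayHighWaterMark is not quoted here, and it assumes the check runs after each single-element append.

```cpp
#include <cassert>
#include <cstddef>

// Counts how many times an equality check `size == mark` fires while
// `appends` elements are added one at a time to an array that already
// holds `startSize` elements. Each hit stands in for dispatching a
// "send to parent" event to the main thread.
inline std::size_t CountEqualityDispatches(std::size_t startSize,
                                           std::size_t appends,
                                           std::size_t mark) {
  assert(mark > 0);
  std::size_t size = startSize;
  std::size_t dispatches = 0;
  for (std::size_t i = 0; i < appends; ++i) {
    ++size;  // models AppendElement
    if (size == mark) {
      ++dispatches;
    }
  }
  return dispatches;
}
```

Starting from an empty array, the check fires exactly once as the size crosses the mark. If the array is somehow already at or above the mark when the guarded path runs (for example, because something grew it outside the lock), the equality check never fires, no matter how many more elements are appended.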
Flags: needinfo?(benjamin)
Updated•8 years ago
status-firefox55:
--- → affected
Comment 7•8 years ago
The "==" was deliberate to avoid dispatching to main for the first and every subsequent overtopping of that wall. I too doubt that it is being manipulated outside of the mutex.
But if the main thread's not serving requests in a timely fashion, there's nothing stopping the array from continuing to fill endlessly while waiting for the IPC timer to fire (in a theoretical case, anyway; so long as the main thread isn't completely hung, the IPCTimerFired will be serviced eventually).
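The trade-off described here can be sketched as follows. This is an illustrative model only: the Policy enum, the Result struct, and the function name are hypothetical, and the model assumes the main thread never drains the buffer during the run.

```cpp
#include <cassert>
#include <cstddef>

enum class Policy { Equal, GreaterOrEqual };

struct Result {
  std::size_t dispatches;  // events posted to the main thread
  std::size_t finalSize;   // elements still buffered at the end
};

// Appends `appends` elements one at a time while nothing drains the buffer,
// and counts how often each comparison policy would dispatch.
inline Result SimulateStalledMainThread(Policy policy, std::size_t mark,
                                        std::size_t appends) {
  assert(mark > 0);
  Result r{0, 0};
  for (std::size_t i = 0; i < appends; ++i) {
    ++r.finalSize;  // models AppendElement
    const bool fire = (policy == Policy::Equal) ? (r.finalSize == mark)
                                                : (r.finalSize >= mark);
    if (fire) {
      ++r.dispatches;
    }
  }
  return r;
}
```

In this model the buffer ends up the same size under both policies, since neither bounds growth while the main thread is stalled. The difference is only in the number of dispatches: one with ==, and one per append at or above the mark with >=.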
Comment 8•8 years ago
Georg, does Benjamin's answer help? Is this something you want to take on maybe for the 55 timeframe? It isn't a high volume crash.
Wontfix for 53 though.
Flags: needinfo?(gfritzsche)
Reporter
Updated•8 years ago
Crash Signature: nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryIPCAccumulator::AccumulateChildHistogram] → nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::AppendElement<T> | TelemetryIPCAccumulator::AccumulateChildHistogram]
[@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xrealloc | nsTArray_base<T>::EnsureCapacity<T> | nsTArray_Impl<T>::…
Comment 9•8 years ago
Sorry for the long turn-around.
Chris, this looks like a dupe of bug 1369041?
Flags: needinfo?(gfritzsche) → needinfo?(chutten)
Comment 10•8 years ago
If this is due to Telemetry IPC as surmised and not FrameProperties' mProperties nsTArray as the original sig seems to blame, then yes.
Flags: needinfo?(chutten)
Updated•8 years ago
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE