Crash in [@ OOM | large | mozalloc_abort | mozalloc_handle_oom | moz_xmalloc | ChunkedJSONWriteFunc::AllocChunk]
Categories
(Core :: Gecko Profiler, defect, P3)
Tracking
()
People
(Reporter: gsvelto, Assigned: mozbugz)
References
(Blocks 1 open bug)
Details
(Keywords: crash)
Crash Data
Attachments
(18 files)
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review | |
48 bytes,
text/x-phabricator-request
|
Details | Review |
This bug is for crash report bp-2b10eba0-0e04-4b10-9aa9-9c77b0200203.
Top 10 frames of crashing thread:
0 mozglue.dll mozalloc_abort memory/mozalloc/mozalloc_abort.cpp:33
1 mozglue.dll mozalloc_handle_oom memory/mozalloc/mozalloc_oom.cpp:51
2 mozglue.dll moz_xmalloc memory/mozalloc/mozalloc.cpp:54
3 xul.dll ChunkedJSONWriteFunc::AllocChunk tools/profiler/core/ProfileJSONWriter.cpp:84
4 xul.dll SpliceableChunkedJSONWriter::SpliceableChunkedJSONWriter tools/profiler/public/ProfileJSONWriter.h:115
5 xul.dll profiler_get_profile tools/profiler/core/platform.cpp:3799
6 xul.dll mozilla::ProfilerChild::GrabShutdownProfile tools/profiler/gecko/ProfilerChild.cpp:114
7 xul.dll mozilla::ChildProfilerController::ShutdownProfilerChild tools/profiler/gecko/ChildProfilerController.cpp:93
8 xul.dll mozilla::detail::RunnableMethodImpl<RefPtr<mozilla::AbstractCanonical<RefPtr<AudioDeviceInfo> > >, void xpcom/threads/nsThreadUtils.h:1215
9 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1220
This is an OOM crash happening in the profiler code when a content process is shutting down. The allocation triggering the crash is rather large (2MiB) which is surprising because it seems that the code is just writing out JSON code.
Maybe this can involve a lot of JSON code? Is it possible to fail gracefully here instead of going OOM?
If this is not actionable please go ahead and close this as WONTFIX.
Assignee | ||
Comment 1•5 years ago
|
||
The Profiler can indeed generate big JSON strings, and at that time they coexist with the big Profiler raw buffer, increasing the odds of OOMs!
But we should definitely be able to handle most OOMs. I'll keep this bug (focusing on JSON generation) as a dependency of bug 1558402.
In the meantime, you may be able to lessen the occurrences by reducing the Profiler buffer size (controlled through the add-on or Nightly's built-in popup), of course at the cost of recording a smaller window of time.
Reporter | ||
Comment 2•5 years ago
|
||
Super-silly question, could this JSON be streamed out incrementally without using large buffers? We had a couple more bugs like this one which we fixed by streaming out data to disk but not being familiar with the code here I don't know if it's possible.
Assignee | ||
Comment 3•5 years ago
|
||
That JSON is later sent to the web client js code (from https://profiler.firefox.com) in the same Firefox, so I'm afraid it will stick around in memory anyway -- and it can be multi-MB big!
But there are probably options that could help. As you suggested, we could stream the generated JSON to disk, then clean up the Profiler data, and later stream the JSON out of disk to the Profiler client. There are big plans for the future! π
Updated•4 years ago
|
Updated•4 years ago
|
Comment 4•2 years ago
|
||
I crashed on this today (https://crash-stats.mozilla.org/report/index/a742e636-1fe9-4fb0-9706-061800220727). So it looks like we still have ways to crash when capturing big profiles. It's strange that we crashed trying to allocate a 2.1MB block when 1.67GB was free. Maybe memory fragmentation?
Assignee | ||
Comment 5•2 years ago
•
|
||
In recent bugs I've (hopefully) dealt with situations where we hit an arbitrary Firefox limit (e.g.: string or IPC maximum size), but yes there is still the possibility of actual OOMs, where we just can't allocate system memory as needed.
Regardless of the strangeness of not being able to allocate 2MB out of 1.6GB free, we need to handle these runtime errors.
Dependent bug 1558402 should deal with these in general, but this one here gives a good list of crash sites that we could handle first, and see effects in crash-stats.
Reporter | ||
Comment 6•2 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #4)
I crashed on this today (https://crash-stats.mozilla.org/report/index/a742e636-1fe9-4fb0-9706-061800220727). So it looks like we still have ways to crash when capturing big profiles. It's strange that we crashed trying to allocate a 2.1MB block when 1.67GB was free. Maybe memory fragmentation?
Windows clearly bumped against the page file limit (the last error value is ERROR_COMMITMENT_LIMIT
). What might have happened is the following:
- We hit the commit-space limit, an allocation returned NULL
- We crashed Firefox
- Windows automatically enlarged the page file in the meantime
- The crash report generator picked up the new, larger page file that resulted in the large amount of free commit space
Bug 1716727 should allow avoiding this type of crash.
Comment hidden (Intermittent Failures Robot) |
Assignee | ||
Comment 8•2 years ago
|
||
The FailureLatch interface, and some implementation helpers and classes, will
be used to record the first failure (if any) during some long process.
Updated•2 years ago
|
Assignee | ||
Comment 9•2 years ago
|
||
This removes some redundant initializers in constructors (present and future).
Also some [[nodiscard]]s were added where appropriate, and empty lines for
clarity.
Depends on D155647
Assignee | ||
Comment 10•2 years ago
|
||
Note that at this point, it's using the FailureLatchInfallibleSource singleton,
so operations are still effectively infallible, i.e. they will terminate the
program.
But users already handle future fallible outcomes.
Depends on D155648
Assignee | ||
Comment 11•2 years ago
|
||
Instead of redundantly explaining each static_cast
in different
SpliceableChunkedJSONWriter functions, safe accesses are now isolated to only
two functions, making the public function implementations simpler, including
changes in the following patch.
Depends on D155649
Assignee | ||
Comment 12•2 years ago
|
||
For now, all users specify an infallible latch.
Depends on D155650
Assignee | ||
Comment 14•2 years ago
|
||
When something goes wrong, the most likely cause is running out of memory, so
we clear our data to try and free some memory ASAP, to hopefully reduce the
likelihood of a terminating OOM elsewhere.
Depends on D155652
Assignee | ||
Comment 15•2 years ago
|
||
This will help with forwarding the chosen FailureLatch to the code and
structures gathering per-thread samples and markers.
Depends on D155653
Assignee | ||
Comment 16•2 years ago
|
||
Because JITFrameInfo objects are creating during profiling, they need to have
their own fallible FailureLatchSource, so that any error can be safely handled
even before we know which FailureLatch will be used in the final JSON
generation work.
Depends on D155654
Assignee | ||
Comment 22•2 years ago
|
||
Switch from the FailureLatchInfallibleSource singleton to a local fallible FailureLatchSource.
Depends on D155660
Assignee | ||
Comment 23•2 years ago
|
||
Use a fallible FailureLatchSource, and record the failure message (if any).
Depends on D155661
Assignee | ||
Comment 24•2 years ago
|
||
Use a fallible FailureLatchSource, and record the failure message (if any).
Depends on D155662
Assignee | ||
Comment 25•2 years ago
|
||
There is no easy way to inject failures like OOM during the JSON generation
process, except from markers, which have direct access to the
SpliceableJSONWriter that may now use a fallible FailureLatchSource.
Depends on D155663
Comment hidden (Intermittent Failures Robot) |
Comment 27•2 years ago
|
||
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c769477c263b FailureLatch - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/cef60c38bc1e Add default initializers for ChunkedJSONWriteFunc member variables - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/7b8313ab7356 Make ChunkedJSONWriteFunc a FailureLatch, with optionally-fallible operations - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/e97b56983a0e Refactor SpliceableChunkedJSONWriter's access to its ChunkedJSONWriteFunc - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/9a2ac25ecd49 Make Spliceable{,Chunked}JSONWriter a FailureLatch - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/d53fee4addea Make UniqueJSONStrings a FailureLatch - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/031562a9d9fe Make UniqueJSONStrings operations fallible - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/a3e3adcadcd2 Make ProcessStreamingContext a FailureLatch - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/d24fe096ec92 JITFrameInfo carries its own FailureLatchSource - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/57886f5885d4 Make UniqueStacks a FailureLatch - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/65e8fde135ac Make UniqueStacks operations fallible - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/bc03754da595 Make JITFrameInfo operations fallible - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/5c940a7feae6 Handle failures and exit early in main JSON-generation function - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/0326bea6ec5c Pass a FailureLatch to EntryGetter, to potentially exit early on failure - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/e0fe782f6ad4 Handle failures in nsProfiler - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/6cf4c6aa5296 Handle failures when generating shutdown profile - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/76697984e917 Handle failures when generating child profile - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/cb7dec6613b9 Add gtest to test failure injected by marker - r=canaltinova
Comment 28•2 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c769477c263b
https://hg.mozilla.org/mozilla-central/rev/cef60c38bc1e
https://hg.mozilla.org/mozilla-central/rev/7b8313ab7356
https://hg.mozilla.org/mozilla-central/rev/e97b56983a0e
https://hg.mozilla.org/mozilla-central/rev/9a2ac25ecd49
https://hg.mozilla.org/mozilla-central/rev/d53fee4addea
https://hg.mozilla.org/mozilla-central/rev/031562a9d9fe
https://hg.mozilla.org/mozilla-central/rev/a3e3adcadcd2
https://hg.mozilla.org/mozilla-central/rev/d24fe096ec92
https://hg.mozilla.org/mozilla-central/rev/57886f5885d4
https://hg.mozilla.org/mozilla-central/rev/65e8fde135ac
https://hg.mozilla.org/mozilla-central/rev/bc03754da595
https://hg.mozilla.org/mozilla-central/rev/5c940a7feae6
https://hg.mozilla.org/mozilla-central/rev/0326bea6ec5c
https://hg.mozilla.org/mozilla-central/rev/e0fe782f6ad4
https://hg.mozilla.org/mozilla-central/rev/6cf4c6aa5296
https://hg.mozilla.org/mozilla-central/rev/76697984e917
https://hg.mozilla.org/mozilla-central/rev/cb7dec6613b9
Updated•2 years ago
|
Description
•