Make it possible for the front-end to determine that a frame crash was caused by an OOM
Categories
(Firefox :: Tabbed Browser, task, P3)
Tracking
Version | Tracking | Status |
---|---|---|
firefox76 | --- | fixed |
People
(Reporter: gsvelto, Assigned: Yoric)
References
(Blocks 1 open bug)
Details
Attachments
(4 files, 1 obsolete file)
Currently in the case of an OOM crash of a content process we'll show users the "This tab crashed" interface and ask them to submit a crash report. In a post-Fission world where OOM crashes are more likely it might be less disruptive to silently reload the crashed tab (or display an infobar mentioning that the tab was reloaded because it consumed too much memory).
I'm filing this bug to investigate what would be required to inform the tab that its content process disappeared due to an OOM.
Assignee | ||
Comment 1•5 years ago
|
||
Are you talking about the process being killed by the OS (as in Linux oom-killer) or about a content process committing ritual self-immolation because it cannot allocate memory?
For the former, there doesn't seem to be a simple way to detect it:
- Linux distributions write this in logs, but not all in the same file and not always immediately;
- I haven't found anything for Windows or macOS yet;
- neither our IPC code nor Chrome (I've checked in their code) seems to detect this.
For the latter, it might be simpler.
Reporter | ||
Comment 2•5 years ago
|
||
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #1)
Are you talking about the process being killed by the OS (as in Linux oom-killer) or about a content process committing ritual self-immolation because it cannot allocate memory?
Both. We can always detect OOM crashes on Windows, but only some of the time on Linux/macOS.
For the former, there doesn't seem to be a simple way to detect it:
- Linux distributions write this in logs, but not all in the same file and not always immediately;
- I haven't found anything for Windows or macOS yet;
- neither our IPC code nor Chrome (I've checked in their code) seems to detect this.
For the latter, it might be simpler.
We're going to cheat :-) When doing large allocations - and on Windows when any allocation fails - we set the OOMAllocationSize crash annotation to a value that is different from 0. The idea is to use that to detect OOMs. When we hit a content process crash we generate the minidump here:
There the CrashReporterHost object contains the crash annotations (we set them here), so we could use that to inspect OOMAllocationSize and use it to inform the front-end that this was an OOM.
We're passing a property bag to the front-end here and we could add an entry to flag OOM crashes.
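A minimal sketch of the check described above, assuming the crash annotations are available as a plain key/value map. The helper names and the shape of the returned object are hypothetical stand-ins, not the real CrashReporterHost or property bag API. Only the OOMAllocationSize annotation name and the isLikelyOOM field name come from this bug.

```javascript
// Hypothetical helper: decides whether a crash looks like an OOM from its
// crash annotations. `annotations` is assumed to be a Map of string keys to
// string values; the real crash reporter data structures are not modelled.
function isLikelyOOMFromAnnotations(annotations) {
  const raw = annotations.get("OOMAllocationSize");
  if (raw === undefined) {
    return false;
  }
  const size = Number(raw);
  // The annotation is set to a non-zero value on large or failed allocations.
  return Number.isFinite(size) && size !== 0;
}

// Hypothetical stand-in for the property bag handed to the front-end:
// a plain object carrying the dump ID plus the OOM flag.
function buildCrashProperties(annotations, dumpID) {
  return { dumpID, isLikelyOOM: isLikelyOOMFromAnnotations(annotations) };
}

const example = buildCrashProperties(
  new Map([["OOMAllocationSize", "4096"]]),
  "example-dump-id"
);
console.log(example.isLikelyOOM);
```

A missing annotation and a value of 0 are both treated as "not an OOM" here, which matches the idea that only a non-zero value is meaningful.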
This will always work on Windows, but only sometimes on macOS and Linux. On Linux we can improve the situation, but it's more work so I'll leave it for another bug. If you're curious about investigating that too, OOM crashes will often look like this:
https://crash-stats.mozilla.com/report/index/b44f5e06-568c-492b-b2d5-0ad6f0191119
The crash reason is SIGBUS and more often than not the crashing address is a multiple of the page size. That's because we were trying to page in something and the kernel couldn't find a free page so it killed us. SIGBUS can also be raised in other scenarios - but they're rare - and the crash address might not be a multiple of the page size if the access happened in the middle of an object, but IMHO it's a good starting point.
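The Linux heuristic described in this comment could be sketched roughly as follows. The function name, the argument shapes and the default page size of 4096 bytes are assumptions for illustration. A real crash report would need to supply the actual page size of the machine.

```javascript
// Hypothetical heuristic: a SIGBUS at a page-aligned address often means the
// kernel could not find a free page while paging something in.
// `crashAddress` and `pageSize` are BigInts so that 64-bit addresses are
// represented exactly.
function looksLikePagingOOM(crashReason, crashAddress, pageSize = 4096n) {
  if (crashReason !== "SIGBUS") {
    return false;
  }
  return crashAddress % pageSize === 0n;
}

console.log(looksLikePagingOOM("SIGBUS", 0x7f0000001000n));
```

As noted above, this is only a starting point: an access in the middle of an object gives a non-aligned address, and SIGBUS can have other causes.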
Assignee | ||
Comment 3•5 years ago
|
||
and on Windows when any allocation fails
Why only on Windows?
The crash reason is SIGBUS and more often than not the crashing address is a multiple of the page size. That's because we were trying to page in something and the kernel couldn't find a free page so it killed us.
So it's a problem of us not always using fallible allocations? Does this mean that we should patch jemalloc to have a flag "allocation crashed recently"?
We're passing a property bag to the front-end here and we could add an entry to flag OOM crashes.
Well, this certainly simplifies the situation :)
Do we have infrastructure to test OOMs somewhere?
Assignee | ||
Comment 4•5 years ago
|
||
Also, what you're describing is case 2 (self-immolation) and case 2bis (we should have self-immolated but we forgot so the OS did it for us), but not case 1 (oom-killer), right?
Reporter | ||
Comment 5•5 years ago
|
||
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #4)
Also, what you're describing is case 2 (self-immolation) and case 2bis (we should have self-immolated but we forgot so the OS did it for us), but not case 1 (oom-killer), right?
On Windows failed allocations always return NULL so we always self-immolate when doing infallible allocations. On Linux most allocations succeed (e.g. mmap() almost never fails) so it's the OOM killer that's killing us most of the time. The core difference between the two is that Linux allows processes to overcommit memory while Windows doesn't. So on Linux an allocation might succeed, but then we would crash when touching the allocated memory because the kernel isn't able to back it with actual physical memory. That makes it harder to detect OOMs.
Assignee | ||
Comment 6•5 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #5)
On Windows failed allocations always return NULL so we always self-immolate when doing infallible allocations.
Ok. Is that the most common situation we need to deal with? Rather than whatever oom-killer-ish technique Windows may be employing when the swap grows too large and it needs to start killing random processes?
That makes it harder to detect OOMs [on Linux and probably macOS].
Oh. I thought that Windows also allowed overcommitting. Good point. I don't see any simple way around this.
Perhaps for macOS and Linux we could somehow detect we're encountering a page fault in a page that we have allocated (and not deallocated) with jemalloc? Surely jemalloc must have some list of the pages it owns. Anyway, that sounds like a followup bug.
Comment 7•5 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #2)
https://crash-stats.mozilla.com/report/index/b44f5e06-568c-492b-b2d5-0ad6f0191119
The crash reason is SIGBUS and more often than not the crashing address is a multiple of the page size. That's because we were trying to page in something and the kernel couldn't find a free page so it killed us. SIGBUS can also be raised in other scenarios - but they're rare
Getting ENOSPC on a mapped file from /dev/shm (e.g., from glibc's shm_open) isn't especially rare, unfortunately, and it's likely the cause if the crash is in graphics code touching texture memory; see bug 1245239 and connected bugs. That's sort of an OOM, in that it's a resource shortage and not a memory unsafety bug, but it doesn't necessarily mean the entire system is out of memory.
Is there evidence that some of our SIGBUS crashes are from non-file-backed memory?
Reporter | ||
Comment 8•5 years ago
|
||
Oh yes, I forgot about bug 1245239. Many common SIGBUS crashes happen when copying, setting or scanning memory at page-aligned addresses, see these ones for example:
[__memmove_avx_unaligned_erms | mozilla::AudioStream::GetUnprocessed]
Or these:
[__memcpy_sse2_unaligned_erms | mozilla::layers::MappedYCbCrChannelData::CopyInto]
I don't think these involve shared segments.
Either way, the goal here is to make this kind of content process crash almost invisible to the user. Crashes caused by shared-memory exhaustion might also fit in this category, WDYT?
Reporter | ||
Comment 9•5 years ago
|
||
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #6)
Ok. Is that the most common situation we need to deal with? Rather than whatever oom-killer-ish technique Windows may be employing when the swap grows too large and it needs to start killing random processes?
Yes, in fact what we consider our OOM crash rate is measured this way. It's basically the number of crashes with the OOMAllocationSize crash annotation set over the total number of crashes.
Oh. I thought that Windows also allowed overcommitting. Good point. I don't see any simple way around this.
Perhaps for macOS and Linux we could somehow detect we're encountering a page fault in a page that we have allocated (and not deallocated) with jemalloc? Surely jemalloc must have some list of the pages it owns. Anyway, that sounds like a followup bug.
Maybe but it's not urgent either. Windows is by far the biggest source of OOM crashes so if we could implement this within the existing machinery it would be more than sufficient. We can improve macOS & Linux OOM crash detection at a later stage.
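The OOM crash rate defined earlier in this comment is a simple ratio. A rough sketch, assuming each crash is represented as a plain object whose keys are annotation names (that representation is hypothetical):

```javascript
// Hypothetical computation of the OOM crash rate: crashes whose
// OOMAllocationSize annotation is set to a non-zero value, divided by the
// total number of crashes.
function oomCrashRate(crashes) {
  if (crashes.length === 0) {
    return 0;
  }
  const oomCount = crashes.filter(
    crash => Number(crash.OOMAllocationSize) > 0
  ).length;
  return oomCount / crashes.length;
}

const sample = [
  { OOMAllocationSize: "1024" },
  {}, // no annotation: not counted as an OOM
  { OOMAllocationSize: "0" },
  { OOMAllocationSize: "8" },
];
console.log(oomCrashRate(sample)); // prints 0.5
```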
Assignee | ||
Comment 10•5 years ago
|
||
I'll start working on it as soon as I have received my Windows machine.
Assignee | ||
Comment 11•5 years ago
|
||
(also, my jemalloc-related idea doesn't work)
Assignee | ||
Comment 12•5 years ago
|
||
Ok, if I read the code correctly, ContentCrashHandlers.jsm is in charge of displaying about:tabcrashed whenever a <xul:browser> has crashed. Also, this same module already has access to the nsIPropertyBag in which OOMAllocationSize is stored.
Sounds like I should be able to replace about:tabcrashed with a lazily-loaded <xul:browser> for the current page.
Not sure how I can test it, though.
Reporter | ||
Comment 13•5 years ago
|
||
You can try CrashTestUtils, it has a crash-as-an-oom function. We're using it in xpcshell tests, see this one for example.
Assignee | ||
Comment 14•5 years ago
|
||
We'll use this method to expose additional information to the front-end for recovering from OOM.
Assignee | ||
Comment 15•5 years ago
|
||
Depends on D54129
Assignee | ||
Comment 16•5 years ago
|
||
As Fission currently causes lots of OOM under Windows, to make life tolerable, we need OOM-crashed tabs to reload automatically. This patch attempts to piggyback upon the existing tabbrowser infrastructure to make this happen.
I'd like the input from someone who knows tabbrowser.discardBrowser better than me, though!
Depends on D54130
Assignee | ||
Comment 17•5 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #13)
You can try CrashTestUtils, it has a crash-as-an-oom function. We're using it in xpcshell tests, see this one for example.
Thanks, I'll try this. I need to also find how to test that a tab was lazified.
Assignee | ||
Comment 18•5 years ago
|
||
Ok, I should be able to test with browser.linkedTab.
Assignee | ||
Comment 19•5 years ago
|
||
Gabriele, if we immediately restore the currently focused tab, there's a decent chance that we'd go into some kind of infinite OOM loop. That wouldn't be good. How do you want to handle that? One option is to just expose the crash information and let fx-team handle this side, as they have UX resources at hand.
Reporter | ||
Comment 20•5 years ago
|
||
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #19)
Gabriele, if we immediately restore the currently focused tab, there's a decent chance that we'd go into some kind of infinite OOM loop.
Good point. We should be able to mostly prevent that by adjusting the memory priority of the other content processes (the ones handling background tabs); I'll file a bug for that. That being said we might still be in a scenario where there's only one tab and it's opening a huge website so we keep reloading and crashing. In fact I'm not even sure if reloading the currently focused tab is a good idea; background tabs can certainly be reloaded with minimal disruption but for the foreground one we might want to pay attention.
That wouldn't be good. How do you want to handle that? One option is to just expose the crash information and let fx-team handle this side, as they have UX resources at hand.
+1 to hand this over to the fx-team; they can certainly find a better solution given the necessary tools.
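One possible guard against the reload loop discussed above, purely as an illustration. Nothing in this sketch exists in Firefox: the action names, the `decideCrashAction` function, the per-tab history of reload timestamps and the default limits are all assumptions, and the actual policy was left to the fx-team.

```javascript
// Hypothetical policy sketch; none of these names exist in Firefox.
const ACTION_CRASH_PAGE = "show-crash-page";
const ACTION_LAZY_RELOAD = "lazy-reload";
const ACTION_RELOAD_NOW = "reload-now";

// `crash` carries the OOM flag and whether the tab was in the foreground.
// `reloadHistory` lists timestamps (ms) of earlier automatic reloads of the
// same tab after an OOM.
function decideCrashAction(crash, reloadHistory, nowMs, options = {}) {
  const { maxReloads = 1, windowMs = 60000 } = options;
  if (!crash.isLikelyOOM) {
    return ACTION_CRASH_PAGE;
  }
  if (!crash.isForeground) {
    // Background tabs can be restored on demand with minimal disruption.
    return ACTION_LAZY_RELOAD;
  }
  const recentReloads = reloadHistory.filter(
    timestamp => nowMs - timestamp <= windowMs
  ).length;
  // Stop reloading the foreground tab once it keeps crashing in a short span.
  return recentReloads < maxReloads ? ACTION_RELOAD_NOW : ACTION_CRASH_PAGE;
}
```

With the defaults, a foreground tab is reloaded once and falls back to the crash page if it runs out of memory again within a minute.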
Reporter | ||
Comment 21•5 years ago
|
||
CC'ing :mconley because he might want to know about this.
Comment 22•5 years ago
|
||
Putting on the radar for Fission front-end work.
We'll probably want to get some UX help to figure how to communicate the various failure cases to the user.
Assignee | ||
Comment 23•5 years ago
|
||
So, what I'll do in this bug is send a new observable event with the id of the crashing tab so that the front-end can decide whether to display a message, add a lazy reload or reload immediately.
:mconley, any favored name for this observable?
Comment 24•5 years ago
|
||
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #23)
So, what I'll do in this bug is send a new observable event with the id of the crashing tab so that the front-end can decide whether to display a message, add a lazy reload or reload immediately.
Hm, what ID do you mean?
:mconley, any favored name for this observable?
We already have several signals for when an oop frame crashes, for example:
- oop-frameloader-crashed observer notification
- oop-browser-crashed event
- ipc:content-shutdown observer notification
I'm reluctant to add another unless we really need it. Can we piggyback off one of these instead?
Reporter | ||
Comment 25•5 years ago
|
||
Adding a field to ipc:content-shutdown saying that the crash was an OOM should be trivial.
Assignee | ||
Comment 26•5 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) (Wayyyy behind on needinfos) from comment #24)
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #23)
So, what I'll do in this bug is send a new observable event with the id of the crashing tab so that the front-end can decide whether to display a message, add a lazy reload or reload immediately.
Hm, what ID do you mean?
I meant the tab id.
:mconley, any favored name for this observable?
We already have several signals for when an oop frame crashes, for example:
- oop-frameloader-crashed observer notification
- oop-browser-crashed event
- ipc:content-shutdown observer notification
I'm reluctant to add another unless we really need it. Can we piggyback off one of these instead?
Ok, I've rethought it a bit. It's now the field isLikelyOOM of the subject of the ipc:content-shutdown notification.
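A rough sketch of how a consumer might read that field. The property bag here is a small stand-in class written for this example, not the real XPCOM object, and `classifyShutdown` is a hypothetical name. Only the topic name and the isLikelyOOM field come from this bug.

```javascript
// Stand-in for the property bag that is the subject of the notification.
// It only mimics key lookup and is not the real XPCOM interface.
class FakePropertyBag {
  constructor(entries) {
    this._map = new Map(Object.entries(entries));
  }
  hasKey(name) {
    return this._map.has(name);
  }
  get(name) {
    return this._map.get(name);
  }
}

// Hypothetical observer body: classifies a notification by topic and by the
// isLikelyOOM field of its subject.
function classifyShutdown(subject, topic) {
  if (topic !== "ipc:content-shutdown") {
    return "ignored";
  }
  const isOOM = subject.hasKey("isLikelyOOM") && subject.get("isLikelyOOM") === true;
  return isOOM ? "oom-crash" : "other-shutdown";
}

const subject = new FakePropertyBag({ dumpID: "example", isLikelyOOM: true });
console.log(classifyShutdown(subject, "ipc:content-shutdown"));
```

Treating a missing field as "not an OOM" keeps the consumer working against builds that do not set it.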
Assignee | ||
Comment 27•5 years ago
|
||
BrowserTestUtils.crashFrame now accepts additional options, with an argument crashType that may take "CRASH_OOM" or "CRASH_INVALID_POINTER_DEREF"|null to specify the nature of the crash. The names are taken from CrashTestUtils.jsm, but this module cannot be imported as such because it has non-trivial binary dependencies.
Depends on D54130
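The crashType option described in this comment could be validated along these lines. This is a standalone sketch, not the code in BrowserTestUtils: `normalizeCrashType` is a hypothetical name, and the fallback of null to "CRASH_INVALID_POINTER_DEREF" is an assumption based on how the two values are grouped in the description.

```javascript
// The two names come from the patch description; everything else is assumed.
const KNOWN_CRASH_TYPES = ["CRASH_OOM", "CRASH_INVALID_POINTER_DEREF"];

// Hypothetical normalizer for an options object with a `crashType` entry.
function normalizeCrashType(options = {}) {
  const { crashType = null } = options;
  if (crashType === null) {
    // Assumed default when no crash type is requested.
    return "CRASH_INVALID_POINTER_DEREF";
  }
  if (!KNOWN_CRASH_TYPES.includes(crashType)) {
    throw new Error(`Unknown crashType: ${crashType}`);
  }
  return crashType;
}

console.log(normalizeCrashType({ crashType: "CRASH_OOM" }));
```

Rejecting unknown strings early makes a typo in a test fail immediately instead of silently producing the wrong kind of crash.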
Assignee | ||
Comment 28•5 years ago
|
||
Depends on D54700
Comment 29•5 years ago
|
||
Comment 30•5 years ago
|
||
Backed out 4 changesets (bug 1589493) for chrome failures at dom/ipc/tests/test_process_error_oom.xul
Backout: https://hg.mozilla.org/integration/autoland/rev/ab306a7d27cd4fe35f985d089a0bd35e18faa26c
Failure push: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=9b97128e83d8f0b2a1e586b9de280e57a55a549d
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=278639151&repo=autoland&lineNumber=3670
[task 2019-11-28T15:34:54.902Z] 15:34:54 INFO - TEST-START | dom/ipc/tests/test_process_error_oom.xul
[task 2019-11-28T15:34:54.909Z] 15:34:54 INFO - GECKO(520) | ++DOMWINDOW == 43 (188E0800) [pid = 4000] [serial = 43] [outer = 275135E0]
[task 2019-11-28T15:34:54.950Z] 15:34:54 INFO - GECKO(520) | ++DOCSHELL 19C90000 == 11 [pid = 4000] [id = {d6ca31f0-ed4a-49ca-a2d9-50f70f0d8a9c}]
[task 2019-11-28T15:34:54.950Z] 15:34:54 INFO - GECKO(520) | ++DOMWINDOW == 44 (24E6D700) [pid = 4000] [serial = 44] [outer = 00000000]
[task 2019-11-28T15:34:54.950Z] 15:34:54 INFO - GECKO(520) | ++DOMWINDOW == 45 (19C91C00) [pid = 4000] [serial = 45] [outer = 24E6D700]
[task 2019-11-28T15:34:54.965Z] 15:34:54 INFO - GECKO(520) | Chrome file doesn't exist: Z:\task_1574954351\build\tests\mochitest\chrome\dom\ipc\tests\process_error.xul
[task 2019-11-28T15:34:55.020Z] 15:34:55 INFO - GECKO(520) | ++DOMWINDOW == 46 (1E55F400) [pid = 4000] [serial = 46] [outer = 24E6D700]
[task 2019-11-28T15:34:55.075Z] 15:34:55 INFO - GECKO(520) | [Parent 4000, Main Thread] WARNING: '!tsi', file z:/build/build/src/dom/base/Document.cpp, line 1426
[task 2019-11-28T15:34:55.075Z] 15:34:55 INFO - GECKO(520) | [Parent 4000, Main Thread] WARNING: Not same origin error!: file z:/build/build/src/dom/base/nsJSEnvironment.cpp, line 523
[task 2019-11-28T15:34:55.075Z] 15:34:55 INFO - GECKO(520) | JavaScript error: chrome://browser/content/aboutNetError.js, line 270: ReferenceError: RPMGetFormatURLPref is not defined
[task 2019-11-28T15:34:55.730Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 45 (2735FC00) [pid = 4000] [serial = 38] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.730Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 44 (27580000) [pid = 4000] [serial = 37] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.730Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 43 (1EA77800) [pid = 4000] [serial = 16] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.730Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 42 (1EA76000) [pid = 4000] [serial = 15] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.730Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 41 (1EA78C00) [pid = 4000] [serial = 17] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.731Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 40 (20805400) [pid = 4000] [serial = 28] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.731Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 39 (2001F000) [pid = 4000] [serial = 26] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.731Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 38 (1E2E1800) [pid = 4000] [serial = 9] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.731Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 37 (1E2DF000) [pid = 4000] [serial = 7] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.738Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 36 (1EA7A000) [pid = 4000] [serial = 18] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:55.738Z] 15:34:55 INFO - GECKO(520) | --DOMWINDOW == 35 (24CC1C00) [pid = 4000] [serial = 32] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:34:56.113Z] 15:34:56 INFO - GECKO(520) | --DOMWINDOW == 34 (0BFD03A0) [pid = 4000] [serial = 1] [outer = 00000000] [url = chrome://gfxsanity/content/sanityparent.html]
[task 2019-11-28T15:34:56.113Z] 15:34:56 INFO - GECKO(520) | --DOMWINDOW == 33 (0BFD0940) [pid = 4000] [serial = 8] [outer = 00000000] [url = chrome://gfxsanity/content/sanitytest.html]
[task 2019-11-28T15:34:56.113Z] 15:34:56 INFO - GECKO(520) | --DOMWINDOW == 32 (1EA9A5E0) [pid = 4000] [serial = 14] [outer = 00000000] [url = moz-extension://75c27674-9b83-422f-bafc-a70ea4a39ceb/_generated_background_page.html]
[task 2019-11-28T15:35:00.052Z] 15:35:00 INFO - GECKO(520) | --DOMWINDOW == 31 (1E2E2C00) [pid = 4000] [serial = 29] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:35:00.052Z] 15:35:00 INFO - GECKO(520) | --DOMWINDOW == 30 (1F0F2400) [pid = 4000] [serial = 24] [outer = 00000000] [url = moz-extension://75c27674-9b83-422f-bafc-a70ea4a39ceb/_generated_background_page.html]
[task 2019-11-28T15:35:00.052Z] 15:35:00 INFO - GECKO(520) | --DOMWINDOW == 29 (19C92400) [pid = 4000] [serial = 2] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:35:00.052Z] 15:35:00 INFO - GECKO(520) | --DOMWINDOW == 28 (1ED85800) [pid = 4000] [serial = 20] [outer = 00000000] [url = chrome://gfxsanity/content/sanitytest.html]
[task 2019-11-28T15:35:04.332Z] 15:35:04 INFO - GECKO(520) | --DOMWINDOW == 27 (19C91C00) [pid = 4000] [serial = 45] [outer = 00000000] [url = about:blank]
[task 2019-11-28T15:35:04.332Z] 15:35:04 INFO - GECKO(520) | --DOMWINDOW == 26 (188D9C00) [pid = 4000] [serial = 42] [outer = 00000000] [url = chrome://mochikit/content/tests/SimpleTest/iframe-between-tests.html]
[task 2019-11-28T15:36:49.336Z] 15:36:49 INFO - GECKO(520) | [Parent 4000, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file z:/build/build/src/widget/windows/WinUtils.cpp, line 1346
[task 2019-11-28T15:38:49.328Z] 15:38:49 INFO - GECKO(520) | [Parent 4000, Jump List] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80520012: file z:/build/build/src/widget/windows/WinUtils.cpp, line 1346
[task 2019-11-28T15:40:21.211Z] 15:40:21 INFO - TEST-INFO | started process screenshot
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - TEST-INFO | screenshot: exit 0
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_process_error_oom.xul | Test timed out.
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - SimpleTest.ok@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:277:18
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - reportError@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:121:22
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:142:18
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
[task 2019-11-28T15:40:21.278Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
[task 2019-11-28T15:40:21.279Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
[task 2019-11-28T15:40:21.279Z] 15:40:21 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:170:15
Reporter | ||
Comment 31•5 years ago
|
||
We've been unlucky: bug 1595908 converted the .xul tests under dom/ipc to .xhtml before we landed this, so the new test (which is still .xul) started failing once it landed.
Assignee | ||
Comment 32•5 years ago
|
||
Ah, I was wondering what went wrong. I'll try and update this today.
Comment 33•5 years ago
|
||
Comment 34•5 years ago
|
||
Comment 35•5 years ago
|
||
Backed out 4 changesets for linting failure at builds/worker/checkouts/gecko/dom/ipc/tests/test_process_error_oom.xhtml:12:11
Backout link: https://hg.mozilla.org/integration/autoland/rev/5fa805b486fb2639f652aa9c27c081188ac0b9e1
Failure logs:
Assignee | ||
Comment 36•5 years ago
|
||
Not sure the Geckoview test is related. As far as I can tell, my code is entirely dead in Geckoview (unless we stumble upon an OOM, but that's another problem).
The failure seems to show up only under Windows, so I'll probably wait until I have a Windows machine to reproduce it. It was ordered 2 weeks ago, so it should arrive eventually :)
Comment 37•5 years ago
|
||
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #36)
Not sure the Geckoview test is related. As far as I can tell, my code is entirely dead in Geckoview (unless we stumble upon an OOM, but that's another problem).
The failure seems to show up only under Windows, so I'll probably wait until I have a Windows machine to reproduce it. It was ordered 2 weeks ago, so it should arrive eventually :)
Can you provide a status update? I opened a feature request a few days ago with a similar goal (https://bugzilla.mozilla.org/show_bug.cgi?id=1611631). We use Firefox as part of a digital signage solution and are hit with "Gah. Your tab just crashed.", due to the OOM killing the content process (probably due to some leaky JavaScript).
At the moment there seems to be no way we can detect a crashed content process, so some sort of built-in solution would be very useful.
Assignee | ||
Comment 40•5 years ago
|
||
(In reply to Kristian Klausen from comment #37)
(In reply to David Teller [:Yoric] (please use "needinfo") from comment #36)
Not sure the Geckoview test is related. As far as I can tell, my code is entirely dead in Geckoview (unless we stumble upon an OOM, but that's another problem).
The failure seems to show up only under Windows, so I'll probably wait until I have a Windows machine to reproduce it. It was ordered 2 weeks ago, so it should arrive eventually :)
Can you provide a status update? I opened a feature request a few days ago with a similar goal (https://bugzilla.mozilla.org/show_bug.cgi?id=1611631). We use Firefox as part of a digital signage solution and are hit with "Gah. Your tab just crashed.", due to the OOM killing the content process (probably due to some leaky JavaScript).
At the moment there seems to be no way we can detect a crashed content process, so some sort of built-in solution would be very useful.
The machine finally arrived. I'll try and get on it next week.
Comment 41•5 years ago
|
||
Comment 42•5 years ago
|
||
Backed out 4 changesets (Bug 1589493) for causing lint failure in test_process_error_oom.xhtml
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&selectedJob=290407975&resultStatus=testfailed%2Cbusted%2Cexception&revision=9dbe0bdd321bad5adde689db741b95964d33de6a
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290407975&repo=autoland&lineNumber=291
Backout: https://hg.mozilla.org/integration/autoland/rev/5df7ef660fc08196eaef5d18d942e171414bbab4
Comment 43•5 years ago
|
||
Comment 44•5 years ago
|
||
Backed out 4 changesets (bug 1589493) for multiple failures and crashes on CrashReporterHost
Backout: https://hg.mozilla.org/integration/autoland/rev/603d321001279e54fdd86ef256e27fb014ba695a
Failure push: https://treeherder.mozilla.org/#/jobs?repo=autoland&revision=59fc685edca22b1f6ad0ec117511b2f54df34113
Failure log:
crashtest: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290587324&repo=autoland&lineNumber=16871
chrome failures: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290593741&repo=autoland&lineNumber=2742
reftest: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290590933&repo=autoland&lineNumber=38539
mda: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290591305&repo=autoland&lineNumber=2310
Comment 45•5 years ago
|
||
Comment 46•5 years ago
|
||
Backed out 4 changesets (Bug 1589493) for causing mochitest failure at dom/ipc/tests/test_process_error_oom.xhtml
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290735880&repo=autoland&lineNumber=2646
[task 2020-02-27T10:27:42.315Z] 10:27:42 INFO - TEST-INFO | screentopng: exit 0
[task 2020-02-27T10:27:42.316Z] 10:27:42 INFO - Buffered messages logged at 10:22:43
[task 2020-02-27T10:27:42.317Z] 10:27:42 INFO - TEST-PASS | dom/ipc/tests/test_process_error.xhtml | Expected the right browsing context id on the oop-browser-crashed event.
[task 2020-02-27T10:27:42.318Z] 10:27:42 INFO - TEST-PASS | dom/ipc/tests/test_process_error.xhtml | Received correct observer topic.
[task 2020-02-27T10:27:42.319Z] 10:27:42 INFO - TEST-PASS | dom/ipc/tests/test_process_error.xhtml | Subject implements nsIPropertyBag2.
[task 2020-02-27T10:27:42.320Z] 10:27:42 INFO - TEST-PASS | dom/ipc/tests/test_process_error.xhtml | dumpID is present and not an empty string
[task 2020-02-27T10:27:42.320Z] 10:27:42 INFO - Buffered messages finished
[task 2020-02-27T10:27:42.321Z] 10:27:42 INFO - TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_process_error.xhtml | Test timed out.
[task 2020-02-27T10:27:42.321Z] 10:27:42 INFO - SimpleTest.ok@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:299:16
[task 2020-02-27T10:27:42.321Z] 10:27:42 INFO - reportError@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:128:22
[task 2020-02-27T10:27:42.321Z] 10:27:42 INFO - TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:150:18
[task 2020-02-27T10:27:42.321Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.321Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.322Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.322Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.322Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.322Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.322Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - TestRunner.runTests/<@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:420:16
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - promise callback*TestRunner.runTests@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:407:48
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - RunSet.runtests@chrome://mochikit/content/tests/SimpleTest/setup.js:218:14
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - RunSet.runall@chrome://mochikit/content/tests/SimpleTest/setup.js:197:12
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - hookupTests@chrome://mochikit/content/tests/SimpleTest/setup.js:294:12
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - parseTestManifest@chrome://mochikit/content/manifestLibrary.js:50:13
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - getTestManifest/req.onload@chrome://mochikit/content/manifestLibrary.js:61:28
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - EventHandlerNonNull*getTestManifest@chrome://mochikit/content/manifestLibrary.js:57:3
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - hookup@chrome://mochikit/content/tests/SimpleTest/setup.js:270:20
[task 2020-02-27T10:27:42.323Z] 10:27:42 INFO - linkAndHookup@chrome://mochikit/content/harness.xhtml:45:3
[task 2020-02-27T10:27:42.324Z] 10:27:42 INFO - parseTestManifest@chrome://mochikit/content/manifestLibrary.js:50:13
[task 2020-02-27T10:27:42.324Z] 10:27:42 INFO - getTestManifest/req.onload@chrome://mochikit/content/manifestLibrary.js:61:28
[task 2020-02-27T10:27:42.324Z] 10:27:42 INFO - EventHandlerNonNull*getTestManifest@chrome://mochikit/content/manifestLibrary.js:57:3
[task 2020-02-27T10:27:42.324Z] 10:27:42 INFO - getTestList@chrome://mochikit/content/chrome-harness.js:258:18
[task 2020-02-27T10:27:42.325Z] 10:27:42 INFO - loadTests@chrome://mochikit/content/harness.xhtml:24:14
Comment 47•5 years ago
|
||
Comment 48•5 years ago
|
||
Backed out 4 changesets (Bug 1589493) for causing mochitest failures at dom/ipc/tests/test_process_error.xhtml
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=290766439&repo=autoland&lineNumber=2643
[task 2020-02-27T14:07:24.814Z] 14:07:24 INFO - TEST-PASS | dom/ipc/tests/test_process_error.xhtml | dumpID is present and not an empty string
[task 2020-02-27T14:07:24.815Z] 14:07:24 INFO - Buffered messages finished
[task 2020-02-27T14:07:24.815Z] 14:07:24 INFO - TEST-UNEXPECTED-FAIL | dom/ipc/tests/test_process_error.xhtml | Test timed out.
[task 2020-02-27T14:07:24.815Z] 14:07:24 INFO - SimpleTest.ok@chrome://mochikit/content/tests/SimpleTest/SimpleTest.js:299:16
[task 2020-02-27T14:07:24.815Z] 14:07:24 INFO - reportError@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:128:22
[task 2020-02-27T14:07:24.815Z] 14:07:24 INFO - TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:150:18
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.816Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - setTimeout handler*TestRunner._checkForHangs@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:184:15
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - TestRunner.runTests/<@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:420:16
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - promise callback*TestRunner.runTests@chrome://mochikit/content/tests/SimpleTest/TestRunner.js:407:48
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - RunSet.runtests@chrome://mochikit/content/tests/SimpleTest/setup.js:218:14
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - RunSet.runall@chrome://mochikit/content/tests/SimpleTest/setup.js:197:12
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - hookupTests@chrome://mochikit/content/tests/SimpleTest/setup.js:294:12
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - parseTestManifest@chrome://mochikit/content/manifestLibrary.js:50:13
[task 2020-02-27T14:07:24.817Z] 14:07:24 INFO - getTestManifest/req.onload@chrome://mochikit/content/manifestLibrary.js:61:28
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - EventHandlerNonNull*getTestManifest@chrome://mochikit/content/manifestLibrary.js:57:3
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - hookup@chrome://mochikit/content/tests/SimpleTest/setup.js:270:20
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - linkAndHookup@chrome://mochikit/content/harness.xhtml:45:3
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - parseTestManifest@chrome://mochikit/content/manifestLibrary.js:50:13
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - getTestManifest/req.onload@chrome://mochikit/content/manifestLibrary.js:61:28
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - EventHandlerNonNull*getTestManifest@chrome://mochikit/content/manifestLibrary.js:57:3
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - getTestList@chrome://mochikit/content/chrome-harness.js:258:18
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - loadTests@chrome://mochikit/content/harness.xhtml:24:14
[task 2020-02-27T14:07:24.818Z] 14:07:24 INFO - EventListener.handleEvent*@chrome://mochikit/content/harness.xhtml:48:12
[task 2020-02-27T14:07:25.677Z] 14:07:25 INFO - GECKO(2732) | MEMORY STAT vsizeMaxContiguous not supported in this build configuration.
[task 2020-02-27T14:07:25.677Z] 14:07:25 INFO - GECKO(2732) | MEMORY STAT | vsize 3028MB | residentFast 343MB | heapAllocated 116MB
Comment 49•5 years ago
|
||
Comment 50•5 years ago
|
||
Backed out 4 changesets (bug 1589493) for mochitest multiple failures in test_process_error_oom.xhtml
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291259175&repo=autoland&lineNumber=49001
Backout: https://hg.mozilla.org/integration/autoland/rev/ab2dc21cce26e91ccddc8d8030cead53d0200d06
Assignee | ||
Comment 51•5 years ago
|
||
Hmm... I can't reproduce the issue, either locally or on try.
Comment 52•5 years ago
|
||
Comment 53•5 years ago
|
||
Backed out 4 changesets for causing failures in test_process_error_oom.xhtml
Backout link: https://hg.mozilla.org/integration/autoland/rev/34693216604b848f77a737ecf36001bd2251ebad
Failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=291404853&repo=autoland&lineNumber=2664
Push with failures: https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&resultStatus=testfailed%2Cbusted%2Cexception&revision=d914e968de2c8fc2f4f4bb59154e8a9e8cf995c0
Assignee | ||
Comment 54•5 years ago
|
||
Ok, I finally know why the tests don't always pass.
In some configurations, moz_xmalloc isn't a public symbol. I'll try and find a way around this.
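For readers unfamiliar with this kind of failure, here is a minimal sketch of how code can check whether a symbol is exported before relying on it. It is not taken from the patch: it uses Python's ctypes on a POSIX system, the helper names has_public_symbol and first_exported are hypothetical, and falling back to malloc is only an illustration, not necessarily the workaround that landed.

```python
import ctypes


def has_public_symbol(library, name):
    """Return True if the loaded library exports a symbol called `name`."""
    try:
        getattr(library, name)  # ctypes resolves the symbol lazily on attribute access
        return True
    except AttributeError:
        # Raised when the dynamic loader cannot find the symbol.
        return False


def first_exported(library, candidates):
    """Return the first name in `candidates` that the library exports, else None."""
    for name in candidates:
        if has_public_symbol(library, name):
            return name
    return None


# Assumption: POSIX only. CDLL(None) opens the current process, standing in
# for whichever binary the real test would inspect.
process = ctypes.CDLL(None)

# "moz_xmalloc" is not exported by a plain Python process, so the lookup
# falls through to the standard "malloc".
chosen = first_exported(process, ["moz_xmalloc", "malloc"])
```

A test that probes for the symbol first can skip or take another path in builds where it is hidden, rather than timing out.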
Comment 55•5 years ago
|
||
Comment 56•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/73e3711e7849
https://hg.mozilla.org/mozilla-central/rev/3e2d218c4f0d
https://hg.mozilla.org/mozilla-central/rev/5afbdf2538dc
https://hg.mozilla.org/mozilla-central/rev/6a351aef2167