
Trigger onerror when out of memory instead of crashing with OOM

Status: NEW
Assignee: Unassigned
Product: Core
Component: ImageLib
Reported: 5 years ago
Last modified: 3 years ago
Reporter: djf
Blocks: 1 bug
Version: 18 Branch
Hardware: ARM
OS: Gonk (Firefox OS)
Points: ---
Firefox Tracking Flags: b2g-v2.0 affected
Whiteboard: [MemShrink:P2], [2.0-flame-test-run-2]

Description (Reporter, 5 years ago)
The Firefox OS gallery app needs to be able to decode large images on a low-memory device without virtual memory, and it needs to do so without crashing.  If there is not enough memory to decode an image, I'd expect the img element's onerror handler to be triggered.  Instead, the gallery app dies with an OOM error.

Perhaps this has more to do with process and memory management in the Firefox OS kernel, so feel free to reassign this bug to a different product and component.
Updated (Reporter, 5 years ago)
Blocks: 854783

Comment 1 (Justin Lebar [:jlebar])

I don't know that this is possible in our system.  We can try to approximate this, but I'm not convinced we can get close.

There are three essential problems.

1.  One is that it's impossible to read the amount of free memory on the system without race conditions.  That is, as soon as you read the amount of free memory on the system, your process or some other process might allocate some memory, making your measurement invalid.

2.  But suppose that doesn't happen, and the gallery app sees that it has enough memory to decode an image, and does so.  At this point, any other module in Gecko might try to allocate memory and OOM us.  So we'd have to leave a large bit of memory unused.  How large?  It's very hard to say, and depends completely on the application.

3. But suppose the gallery app doesn't OOM itself.  There's nothing preventing the main process from needing to allocate some memory and OOM'ing the gallery.
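The race in point 1 is easy to see concretely. A minimal Linux-only sketch; `read_mem_free_kb` is a hypothetical helper for illustration, not anything Gecko provides:

```c
#include <stdio.h>

/* Read MemFree (in kB) from /proc/meminfo; -1 on failure.
 * The value is stale the instant it is returned: any process,
 * including our own, may allocate between reading it and acting
 * on it -- a classic time-of-check/time-of-use race. */
long read_mem_free_kb(void) {
    FILE *f = fopen("/proc/meminfo", "r");
    if (!f)
        return -1;
    char line[256];
    long kb = -1;
    while (fgets(line, sizeof line, f)) {
        if (sscanf(line, "MemFree: %ld", &kb) == 1)
            break;
    }
    fclose(f);
    return kb;
}
```

Two consecutive calls can already disagree, and nothing stops another process from allocating between the check and the image decode that relies on it.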

I think something like this would be nice to have, but I also am not convinced it's possible to approximate the ideal close enough for it to be useful.
I think a much better solution would be "decode pieces of large images when you're zoomed all the way in, so we don't have to use so much memory".  I don't know if that's possible either.
Whiteboard: [MemShrink]
Comment 3 (Reporter, 5 years ago)
(In reply to Justin Lebar [:jlebar] from comment #1)
> I don't know that this is possible in our system.  We can try to approximate
> this, but I'm not convinced we can get close.
> 
> There are three essential problems.
> 
> 1.  One is that it's impossible to read the amount of free memory on the
> system without race conditions.  That is, as soon as you read the amount of
> free memory on the system, your process or some other process might allocate
> some memory, making your measurement invalid.
> 

I'm probably misunderstanding the system architecture, but I assume that somewhere (imgFrame::Init()?) there is a malloc() call of some sort that is returning null when there is not enough memory to decode the image. Why can't that be a non-fatal error instead of causing a crash?

Is the kernel killing the process before it even gets to that stage?
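The check the reporter has in mind is trivial to write; the catch, as the reply below explains, is that on Linux the NULL branch is rarely taken for plausible allocation sizes. A hedged sketch:

```c
#include <stdlib.h>

/* Returns 1 if an allocation of `n` bytes succeeds, 0 otherwise.
 * Illustrative only: because of overcommit, Linux malloc typically
 * only returns NULL when virtual address space runs out, not when
 * physical memory does. */
int try_alloc(size_t n) {
    void *p = malloc(n);
    if (!p)
        return 0;   /* the non-fatal path the reporter expects */
    free(p);
    return 1;
}
```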
Comment 4

> I assume that somewhere (imgFrame::Init()?) there is a malloc() call of some sort that is returning
> null when there is not enough memory to decode the image.

There is in fact a malloc() call.  But I doubt it's returning null.

On Linux, mmap will happily hand out more memory than is available on the system.  When an app writes to that memory, the OS takes a page fault and "wires" some RAM to back that page.  Processes get killed due to OOM on Linux when this process fails.  On Android, there's an additional mechanism, the low-memory killer (LMK).  This one kills processes before the system is completely out of memory.  But it's the same mechanism -- you die when we take a page fault, wire a page, and cause the amount of free RAM to decrease below a threshold.

I think the only time we commonly return null from malloc on Linux is when we run out of virtual memory.  I don't think that's what's happening here.  But you could check; it's a simple matter of looking at the logcat + dmesg to see if we were killed due to the LMK.
Whiteboard: [MemShrink] → [MemShrink:P2]
Comment 5 (Justin Lebar [:jlebar])

Maybe we could allocate image memory in a "discardable" memory pool in the kernel.
Whiteboard: [MemShrink:P2] → [MemShrink]
Whiteboard: [MemShrink] → [MemShrink:P2]
(In reply to Justin Lebar [:jlebar] from comment #5)
> Maybe we could allocate image memory in a "discardable" memory pool in the
> kernel.

Bug 748598 is related.

Updated

3 years ago
QA Whiteboard: [QAnalyst-Triage?]
status-b2g-v2.0: --- → affected
Flags: needinfo?(ktucker)
Whiteboard: [MemShrink:P2] → [MemShrink:P2], [2.0-flame-test-run-2]
QA Whiteboard: [QAnalyst-Triage?] → [QAnalyst-Triage+]
Flags: needinfo?(ktucker)

Updated

3 years ago
QA Whiteboard: [QAnalyst-Triage+] → [QAnalyst-Triage+][lead-review+]