Closed Bug 1277397 Opened 8 years ago Closed 1 year ago

High memory usage triggered by a jpg

Categories

(Core :: Graphics: ImageLib, defect, P3)


Tracking


RESOLVED FIXED
113 Branch
Tracking Status
firefox-esr52 --- wontfix
firefox-esr78 --- wontfix
firefox-esr91 --- wontfix
firefox-esr102 --- wontfix
firefox56 --- wontfix
firefox57 --- wontfix
firefox58 --- wontfix
firefox59 --- wontfix
firefox89 --- wontfix
firefox90 --- wontfix
firefox91 --- wontfix
firefox111 --- wontfix
firefox112 --- wontfix
firefox113 --- fixed

People

(Reporter: tsmith, Assigned: tnikkel)

References

(Blocks 2 open bugs)

Details

(Keywords: crash, csectype-oom, testcase, Whiteboard: [MemShrink:P2][gfx-noted])

Attachments

(3 files)

Attached image test_case.jpg
This test case triggers a large amount of memory to be consumed, and then the browser process continues to grow until the system OOM killer terminates it. I've seen it reach ~10GB before the process was killed.
See Also: → 1262549
Whiteboard: [MemShrink]
The size indicated in the SOF (start of frame) chunk is 60000 x 60000 pixels. So we probably try to allocate a surface that size.
Flags: needinfo?(seth)
Whiteboard: [MemShrink] → [MemShrink][gfx-noted]
(In reply to Timothy Nikkel (:tnikkel) from comment #1)
> The size indicated in the SOF (start of frame) chunk is 60000 x 60000
> pixels. So we probably try to allocate a surface that size.

Yep, that'd require at least 14GB. Given Linux's overcommit behavior, we probably succeed in allocating that much and then trigger an OOM later as the decoder writes data into the surface.

However, what I'd like to know is: why are we even *attempting* to allocate a surface that large? |image.mem.surfacecache.max_size_kb|'s default value limits us to at most 1GB of memory for images. Decoding should've failed.
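The "at least 14GB" figure above can be checked with quick arithmetic; the 4-bytes-per-pixel (BGRA) surface format is an assumption for illustration:

```cpp
#include <cstdint>

// Bytes needed for a width x height surface at the given bytes-per-pixel.
// 64-bit math throughout: the product overflows a 32-bit size.
static uint64_t SurfaceBytes(uint64_t aWidth, uint64_t aHeight,
                             uint64_t aBytesPerPixel) {
  return aWidth * aHeight * aBytesPerPixel;
}

// 60000 x 60000 (the testcase's SOF dimensions) at an assumed 4 bytes
// per pixel (BGRA) comes to 14,400,000,000 bytes, i.e. ~14.4 GB.
```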
Flags: needinfo?(seth)
Yep, that's what happens on my system. We're able to display it when zoomed out because of downscale-during-decode, but decoding fails when zoomed in. We never even attempt to allocate a 60000x60000 surface.

Tyson, can you reproduce this in safe mode? Have you changed any image-related prefs in about:config? Can you verify that you're on Linux? (I assumed you were because you mentioned the OOM killer.)
Flags: needinfo?(twsmith)
See Also: → 1281725
Sounds like we need the SurfaceCache limit in bug 1281725 enforced for JPEG too. It may be good to put that logic somewhere in ImgLib that's shared across all formats.
Whiteboard: [MemShrink][gfx-noted] → [MemShrink:P2][gfx-noted]
(In reply to Jet Villegas (:jet) from comment #4)
> Sounds like we need the SurfaceCache limit in bug 1281725 enforced for JPEG
> too. It may be good to put that logic somewhere in ImgLib that's shared
> across all formats.

In this case my guess is that libjpeg-turbo itself is allocating the memory, so we can't trivially share the fix. libpng's deinterlacing system is also specific to libpng, I'm afraid.
(In reply to Seth Fowler [:seth] [:s2h] from comment #3)
> Tyson, can you reproduce this in safe mode? 
Yes, I can repro in safe mode.

> Have you changed any image-related prefs in about:config?
Nope, I can repro with a fresh profile.

> Can you verify that you're on Linux? (I assumed you were because you mentioned the OOM killer.)
Yes I was running Linux.
Flags: needinfo?(twsmith)
Blocks: grizzly
Our fuzzers are still hitting this issue.
Hi,

I've been poking around with this bug for the last few days, mostly trying to find out where that huge amount of memory is being allocated.
Bear in mind I've never touched ImgLib before.

As far as I can tell, the memory gets allocated well before the JPEG gets decoded. Presumably something is reading the JPEG size (from the header, maybe?) and allocating some buffer somewhere.

I've been searching through nsJPEGDecoder->DecoderFactory->RasterImage->ImageFactory->imgRequest trying to find something that allocates memory, reads image size or anything relevant.
I checked out (via a lot of reading and a lot of breakpoint setting) everything that looked like it might allocate the memory, or return the information used to allocate it, but I haven't found anything yet.
No matter where I set a breakpoint the memory has already been allocated.
Safe to say the memory allocation happens way before libjpeg gets involved.

I feel like I'm looking in the wrong place here, any pointers or ideas of where to look would be useful.
If you break in nsJPEGDecoder::ReadJPEGData I would think that no frames or buffers would have been allocated by the first call of that function.
Nope. 
`watch -n 1 'free -h'` still shows memory usage rising even when breaking on entry to nsJPEGDecoder::ReadJPEGData.
The memory keeps rising, with gdb just sitting at the entry to the method, until it fills all the RAM and swap and the process is killed by the OOM killer.
Oh, are you attached to the parent process? The image decoding will happen in the child process, so you won't hit the breakpoint for the correct jpeg decoder. You can either attach to the child process or flip the pref browser.tabs.remote.autostart to go into single process mode.
Thanks for the tip.
It might be useful to put that tip in the MDN documentation somewhere, any thoughts on where that might live?


Comment 5 is correct: it is libjpeg-turbo allocating the memory, on the call to jpeg_consume_input.
Here: https://dxr.mozilla.org/mozilla-central/source/image/decoders/nsJPEGDecoder.cpp#474

I've added a check in the ReadJPEGData method to see if decoding the image will allocate more memory than SurfaceCache::MaximumCapacity(), similar to the check in nsPNGDecoder.

The patch compiles for me on Linux and successfully stops the test case image from loading, and therefore from filling up all my RAM.

With the patch, FF simply doesn't load the test image.
I'm not sure how to report a useful error back to the user.
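A minimal sketch of the kind of guard described above. The names and the 1 GiB cap here are illustrative; the real cap comes from SurfaceCache::MaximumCapacity(), which is driven by the image.mem.surfacecache.max_size_kb pref:

```cpp
#include <cstdint>

// Illustrative stand-in for SurfaceCache::MaximumCapacity().
static uint64_t MaximumCapacityBytes() {
  return 1024ull * 1024 * 1024;  // 1 GiB, matching the default pref value
}

// Returns true if decoding a width x height surface would exceed the
// cache cap. 64-bit math so the product cannot overflow even for the
// largest dimensions libjpeg can report (JDIMENSION is 32-bit).
static bool ExceedsSurfaceCacheCapacity(uint32_t aWidth, uint32_t aHeight) {
  const uint64_t bytes = uint64_t(aWidth) * uint64_t(aHeight) * 4;  // BGRA
  return bytes > MaximumCapacityBytes();
}
```

Note the widening to uint64_t before multiplying: a straight `width * height * 24` on 32-bit unsigned fields can silently wrap and defeat the check.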
Comment on attachment 8935421 [details]
Bug 1277397 - High memory usage triggered by a jpg

https://reviewboard.mozilla.org/r/206316/#review212062

::: image/decoders/nsJPEGDecoder.cpp:384
(Diff revision 1)
>      }
>  
> +   
> +    //If the memory needed by the image is bigger than the SurfaceCache size limit,
> +    //don't start decompressing. Bail out.
> +    if ( ((mInfo.image_width * mInfo.image_height) *24) >

I think that libjpeg-turbo only makes this large allocation if mInfo.buffered_image is set to true below, so I think we only need this check in that case. This will allow us to decode large images when we are downscaling them to a smaller size, because we only need enough space for the downscaled image.
(In reply to Timothy Nikkel (:tnikkel) from comment #14)
> Comment on attachment 8935421 [details]
> Bug 1277397 - High memory usage triggered by a jpg
> 
> https://reviewboard.mozilla.org/r/206316/#review212062
> 
> ::: image/decoders/nsJPEGDecoder.cpp:384
> (Diff revision 1)
> >      }
> >  
> > +   
> > +    //If the memory needed by the image is bigger than the SurfaceCache size limit,
> > +    //don't start decompressing. Bail out.
> > +    if ( ((mInfo.image_width * mInfo.image_height) *24) >
> 
> I think that libjpegturbo only makes this large allocation if
> mInfo.buffered_image is set to true below. So I think we only need this
> check if that is true. This will allow us to decode large images if we are
> downscaling them to a smaller size because we only need enough space for the
> downscaled size image.

Hm. Given that this new check is above where mInfo.buffered_image is determined, it would be nice if we changed mDecodeStyle at this point instead. I think it is better for the user to decode sequentially rather than progressively, if the alternative is not decoding at all, no?
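A hypothetical sketch of the fallback suggested here: when the whole-image buffers would exceed the cap, drop from progressive (buffered-image) decoding to sequential rather than refusing to decode. All names are illustrative, not ImageLib's actual API:

```cpp
#include <cstdint>

enum class DecodeStyle { PROGRESSIVE, SEQUENTIAL };

struct DecoderState {
  uint32_t imageWidth = 0;
  uint32_t imageHeight = 0;
  DecodeStyle decodeStyle = DecodeStyle::PROGRESSIVE;
};

// libjpeg-turbo only allocates whole-image coefficient arrays when
// buffered-image (progressive display) mode is requested, so forcing
// sequential decoding sidesteps the huge allocation.
static void MaybeForceSequential(DecoderState& aState, uint64_t aCapBytes) {
  const uint64_t bytes =
      uint64_t(aState.imageWidth) * uint64_t(aState.imageHeight) * 4;
  if (bytes > aCapBytes) {
    aState.decodeStyle = DecodeStyle::SEQUENTIAL;
  }
}
```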
Oops - I didn't see comment 51 before I pushed that most recent change.

The above moves the check to after we assign mInfo.buffered_image and still succeeds in refusing to load the test image.
I haven't managed to get the test image to load at all, actually.
I'm not sure how to get FF to downscale it, since the zoom level resets every time I reload the page, and zooming out after the image has already failed to load doesn't load it.
Since I can't check that the image does load when we're downscaling, I'm concerned that the above patch doesn't work as we'd like.
I'd appreciate it if someone could check that.

I agree that the best option would be to at least try to load the image (sequentially).
I'll get on with implementing that and see what happens.
The above patch checks the size of the image and sets mDecodeStyle to SEQUENTIAL.
Assuming the test case image is a plain white image (that's what I can see), this patch allows the image to load, and FF does not use a huge amount of memory doing it.

Worth noting that when opening the test case, FF complains of 'Corrupt JPEG data: 18 extraneous bytes before marker 0xdb'.
GIMP also complains of the same thing and flat-out can't open the image.

Added crashtest in bug 1633121.

Hey Tyson,
Can you still reproduce this issue, or should we close it?

Flags: needinfo?(twsmith)

I can still reproduce this with m-c 20210705-4d0245a42361.

I used an ASan build and set ASAN_OPTIONS=max_allocation_size_mb=512:allocator_may_return_null=false to get the following stack trace:

==3898==ERROR: AddressSanitizer: requested allocation size 0x3b99d037 (0x3b99e038 after adjustments for alignment, red zones etc.) exceeds maximum supported size of 0x20000000 (thread T8 (TaskCon~read #1))
    #0 0x55d3cbb76abd in malloc /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3
    #1 0x7f50e4a9ee8d in alloc_large /builds/worker/checkouts/gecko/media/libjpeg/jmemmgr.c:391:29
    #2 0x7f50e4a9f3eb in alloc_barray /builds/worker/checkouts/gecko/media/libjpeg/jmemmgr.c:527:28
    #3 0x7f50e4aa0220 in realize_virt_arrays /builds/worker/checkouts/gecko/media/libjpeg/jmemmgr.c:741:26
    #4 0x7f50e4a5c1b9 in master_selection /builds/worker/checkouts/gecko/media/libjpeg/jdmaster.c:573:3
    #5 0x7f50e4a5c1b9 in jinit_master_decompress /builds/worker/checkouts/gecko/media/libjpeg/jdmaster.c:736:3
    #6 0x7f50e4a28901 in jpeg_start_decompress /builds/worker/checkouts/gecko/media/libjpeg/jdapistd.c:49:5
    #7 0x7f50df2b0dcf in mozilla::image::nsJPEGDecoder::ReadJPEGData(char const*, unsigned long) /builds/worker/checkouts/gecko/image/decoders/nsJPEGDecoder.cpp:400:11
    #8 0x7f50df34f2e9 in operator() /builds/worker/checkouts/gecko/image/decoders/nsJPEGDecoder.cpp:186:34
    #9 0x7f50df34f2e9 in mozilla::Maybe<mozilla::Variant<mozilla::image::TerminalState, mozilla::image::Yield> > mozilla::image::StreamingLexer<mozilla::image::nsJPEGDecoder::State, 16ul>::ContinueUnbufferedRead<mozilla::image::nsJPEGDecoder::DoDecode(mozilla::image::SourceBufferIterator&, mozilla::image::IResumable*)::$_5>(char const*, unsigned long, unsigned long, mozilla::image::nsJPEGDecoder::DoDecode(mozilla::image::SourceBufferIterator&, mozilla::image::IResumable*)::$_5) /builds/worker/checkouts/gecko/image/StreamingLexer.h:555:9
    #10 0x7f50df2ad610 in UnbufferedRead<(lambda at /builds/worker/checkouts/gecko/image/decoders/nsJPEGDecoder.cpp:183:21)> /builds/worker/checkouts/gecko/image/StreamingLexer.h:501:12
    #11 0x7f50df2ad610 in Lex<(lambda at /builds/worker/checkouts/gecko/image/decoders/nsJPEGDecoder.cpp:183:21)> /builds/worker/checkouts/gecko/image/StreamingLexer.h:469:26
    #12 0x7f50df2ad610 in mozilla::image::nsJPEGDecoder::DoDecode(mozilla::image::SourceBufferIterator&, mozilla::image::IResumable*) /builds/worker/checkouts/gecko/image/decoders/nsJPEGDecoder.cpp:182:17
    #13 0x7f50df157697 in mozilla::image::Decoder::Decode(mozilla::image::IResumable*) /builds/worker/checkouts/gecko/image/Decoder.cpp:177:19
    #14 0x7f50df164d19 in mozilla::image::DecodedSurfaceProvider::Run() /builds/worker/checkouts/gecko/image/DecodedSurfaceProvider.cpp:123:34
    #15 0x7f50df191dcc in mozilla::image::DecodingTask::Run() /builds/worker/checkouts/gecko/image/DecodePool.cpp:146:12
    #16 0x7f50dc1d6254 in mozilla::TaskController::RunPoolThread() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:287:33
    #17 0x7f50f95b23fe in _pt_root /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:201:5
    #18 0x7f50fd6d1608 in start_thread /build/glibc-eX1tMB/glibc-2.31/nptl/pthread_create.c:477:8

==3898==HINT: if you don't care about these errors you may set allocator_may_return_null=1
SUMMARY: AddressSanitizer: allocation-size-too-big /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cpp:145:3 in malloc
Thread T8 (TaskCon~read #1) created by T0 (Web Content) here:
    #0 0x55d3cbb60e6c in pthread_create /builds/worker/fetches/llvm-project/llvm/projects/compiler-rt/lib/asan/asan_interceptors.cpp:205:3
    #1 0x7f50f95a2474 in _PR_CreateThread /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:458:14
    #2 0x7f50f959394e in PR_CreateThread /builds/worker/checkouts/gecko/nsprpub/pr/src/pthreads/ptthread.c:533:12
    #3 0x7f50dc1d6dbb in mozilla::TaskController::InitializeThreadPool() /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:181:10
    #4 0x7f50dc1d832a in mozilla::TaskController::AddTask(already_AddRefed<mozilla::Task>&&) /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:357:7
    #5 0x7f50de15a1ee in DispatchOffThreadTask() /builds/worker/checkouts/gecko/js/xpconnect/src/XPCJSContext.cpp:1182:26
    #6 0x7f50e83f8583 in js::GlobalHelperThreadState::submitTask(js::GCParallelTask*, js::AutoLockHelperThreadState const&) /builds/worker/checkouts/gecko/js/src/vm/HelperThreads.cpp:2034:3
    #7 0x7f50e8d3749b in startWithLockHeld /builds/worker/checkouts/gecko/js/src/gc/GCParallelTask.cpp:36:23
    #8 0x7f50e8d3749b in js::GCParallelTask::start() /builds/worker/checkouts/gecko/js/src/gc/GCParallelTask.cpp:46:3
    #9 0x7f50e8d3ecf6 in js::gc::GCRuntime::beginPreparePhase(JS::GCReason, js::gc::AutoGCSession&) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:4440:14
    #10 0x7f50e8d545f5 in js::gc::GCRuntime::incrementalSlice(js::SliceBudget&, mozilla::Maybe<JS::GCOptions> const&, JS::GCReason) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:7135:12
    #11 0x7f50e8d5945b in js::gc::GCRuntime::gcCycle(bool, js::SliceBudget const&, mozilla::Maybe<JS::GCOptions> const&, JS::GCReason) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:7656:3
    #12 0x7f50e8d5ab82 in js::gc::GCRuntime::collect(bool, js::SliceBudget const&, mozilla::Maybe<JS::GCOptions> const&, JS::GCReason) /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:7865:9
    #13 0x7f50e8d22eea in gc /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:7945:3
    #14 0x7f50e8d22eea in js::gc::GCRuntime::freezeSelfHostingZone() /builds/worker/checkouts/gecko/js/src/gc/GC.cpp:1610:3
    #15 0x7f50e860fc3f in InitSelfHostingFromStencil(JSContext*, JS::Handle<js::GlobalObject*>, js::frontend::CompilationInput&, js::frontend::CompilationStencil const&) /builds/worker/checkouts/gecko/js/src/vm/SelfHosting.cpp:2688:21
    #16 0x7f50e86104e4 in JSRuntime::initSelfHosting(JSContext*, mozilla::Span<unsigned char const, 18446744073709551615ul>, bool (*)(JSContext*, mozilla::Span<unsigned char const, 18446744073709551615ul>)) /builds/worker/checkouts/gecko/js/src/vm/SelfHosting.cpp:2743:14
    #17 0x7f50e83ff64d in JS::InitSelfHostedCode(JSContext*, mozilla::Span<unsigned char const, 18446744073709551615ul>, bool (*)(JSContext*, mozilla::Span<unsigned char const, 18446744073709551615ul>)) /builds/worker/checkouts/gecko/js/src/vm/Initialization.cpp:249:12
    #18 0x7f50de159085 in XPCJSContext::Initialize() /builds/worker/checkouts/gecko/js/xpconnect/src/XPCJSContext.cpp:1380:8
    #19 0x7f50de15ad42 in XPCJSContext::NewXPCJSContext() /builds/worker/checkouts/gecko/js/xpconnect/src/XPCJSContext.cpp:1416:23
    #20 0x7f50de1edb68 in nsXPConnect::InitJSContext() /builds/worker/checkouts/gecko/js/xpconnect/src/nsXPConnect.cpp:84:25
    #21 0x7f50dc266a8e in NS_InitXPCOM /builds/worker/checkouts/gecko/xpcom/build/XPCOMInit.cpp:491:5
    #22 0x7f50e7dc2997 in XRE_InitEmbedding2(nsIFile*, nsIFile*, nsIDirectoryServiceProvider*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:189:8
    #23 0x7f50dd4646aa in mozilla::ipc::ScopedXREEmbed::Start() /builds/worker/checkouts/gecko/ipc/glue/ScopedXREEmbed.cpp
    #24 0x7f50e33c6c6b in mozilla::dom::ContentProcess::Init(int, char**) /builds/worker/checkouts/gecko/dom/ipc/ContentProcess.cpp:202:13
    #25 0x7f50e7dc322c in XRE_InitChildProcess(int, char**, XREChildData const*) /builds/worker/checkouts/gecko/toolkit/xre/nsEmbedFunctions.cpp:708:21
    #26 0x55d3cbbaa74d in content_process_main(mozilla::Bootstrap*, int, char**) /builds/worker/checkouts/gecko/browser/app/../../ipc/contentproc/plugin-container.cpp:57:28
    #27 0x55d3cbbaab7d in main /builds/worker/checkouts/gecko/browser/app/nsBrowserApp.cpp:327:18
    #28 0x7f50fd19e0b2 in __libc_start_main /build/glibc-eX1tMB/glibc-2.31/csu/../csu/libc-start.c:308:16
QA Whiteboard: [qa-not-actionable]
Version: 46 Branch → unspecified
Severity: critical → S2

High memory usage is only triggered by files specifically constructed to cause such a problem, so lowering severity -> S3.

Severity: S2 → S3
See Also: → CVE-2023-32209

This means that for trivial images whose header specifies large dimensions but which contain no image data, we don't use a lot of memory.

Assignee: nobody → tnikkel
Status: NEW → ASSIGNED
Pushed by tnikkel@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/f2bfdba84905
Set a max memory limit for libjpeg turbo to use. r=gfx-reviewers,bradwerth
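The landed fix caps libjpeg-turbo's own allocator: the library's memory manager (jpeg_memory_mgr) exposes a max_memory_to_use field that its internal allocation routines respect. Below is a sketch of the idea using mock structures; the real decoder assigns through the jpeg_decompress_struct's mem pointer from jpeglib.h before decompression starts, and the 1 GiB figure here is illustrative, not the value the patch uses:

```cpp
// Mock stand-ins for jpeg_decompress_struct / jpeg_memory_mgr; in the
// real decoder this field lives on libjpeg's memory manager.
struct MockMemoryMgr {
  long max_memory_to_use = 0;  // byte ceiling; 0 means "no limit" here
};

struct MockDecompressInfo {
  MockMemoryMgr* mem = nullptr;
};

// Give libjpeg's memory manager a hard ceiling so that oversized
// coefficient-array allocations fail cleanly at decode time instead of
// growing until the OS OOM killer steps in.
static void CapLibjpegMemory(MockDecompressInfo& aInfo, long aLimitBytes) {
  aInfo.mem->max_memory_to_use = aLimitBytes;
}
```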
Status: ASSIGNED → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 113 Branch