Possible uninitialized memory leak when rendering corrupted JPEGs

RESOLVED FIXED in Firefox 48

Status

()

Core
ImageLib
RESOLVED FIXED
3 years ago
6 months ago

People

(Reporter: Michal Zalewski, Assigned: tnikkel)

Tracking

({sec-other})

31 Branch
mozilla48
sec-other
Points:
---

Firefox Tracking Flags

(firefox34 wontfix, firefox35 wontfix, firefox36 wontfix, firefox37 affected, firefox38 affected, firefox39 affected, firefox48 fixed, firefox-esr31 ?)

Details

(Whiteboard: [adv-main48-])

Attachments

(15 attachments)

90.53 KB, text/plain
Details
2.24 KB, patch
Details | Diff | Splinter Review
431 bytes, application/x-shellscript
Details
43.87 KB, image/gif
Details
43.87 KB, image/gif
Details
324.00 KB, image/png
Details
4.77 KB, application/x-gzip
Details
112.57 KB, patch
Details | Diff | Splinter Review
2.90 KB, patch
Details | Diff | Splinter Review
14.32 KB, patch
Details | Diff | Splinter Review
3.91 KB, patch
Details | Diff | Splinter Review
2.79 KB, patch
Details | Diff | Splinter Review
38.15 KB, text/plain
Details
9.03 KB, patch
seth
: review+
Details | Diff | Splinter Review
5.13 KB, patch
Details | Diff | Splinter Review
(Reporter)

Description

3 years ago
Hey,

This image appears to decode to several different variants:

http://lcamtuf.coredump.cx/ffjpeg/

More specifically, the image file is:

http://lcamtuf.coredump.cx/ffjpeg/id:000511,src:000172,+cov.jpg

...the JS simply loads and renders it repeatedly on <canvas>, then compares .toDataURL() for each attempt. I haven't really investigated yet, but it may imply the use of uninitialized memory. I'm pretty sure this is distinct from bug 1045977 and possibly caused by libjpeg-turbo.
Component: Security → ImageLib
Product: Firefox → Core
Gary, do you have a Valgrind browser build around and can run this test case?  Thanks.
Flags: needinfo?(gary)
Unfortunately, nope, I don't have one ready off-hand, plus I'm swamped these few weeks. :(
Flags: needinfo?(gary)
(In reply to Michal Zalewski from comment #0)

I didn't get any reports of badness when displaying this.

> http://lcamtuf.coredump.cx/ffjpeg/id:000511,src:000172,+cov.jpg

Nor when *loading* this

> http://lcamtuf.coredump.cx/ffjpeg/

But when I come to scroll down the completed page, there are indeed
reports of undefined value usage.  Log to follow.
Created attachment 8471779 [details]
spew-24-mc-1050342

Log file.  Errors towards the end.
Thanks for the log, Julian.  This looks very similar to bug 1045977, in that it is writing on memory allocated from a VolatileBuffer, so it may be fixed by the same patch.
Depends on: 1045977
QA Contact: kamiljoz
Kamil: please verify that the fix in bug 1045977 actually fixes this one
Flags: needinfo?(kamiljoz)
Keywords: csectype-disclosure, sec-high
Whiteboard: probable dupe of bug 1045977
Using m-c of just now (200945:0b9dd32d1e16) I can no longer get
any Valgrind complaints from http://lcamtuf.coredump.cx/ffjpeg/

That said, I still see differences from V vs native execution,
which are somewhat troubling IMO:

* on V, there's only 1 variant reported.  Natively, 3 are.

* on V, many of the boxes never get filled in -- they are
  just grey frames.  Natively, all of them are.

Comment 8

3 years ago
The variants look a bit suspicious to me - looks like we might be looking at some data structure. It should get decoded to grey frames.. Bug 1045977 probably helped but it looks like something else going on here.
Ok, thanks!
Whiteboard: probable dupe of bug 1045977
So I tried the link from comment #0 on several different machines and got anywhere between 1-3 variants on all the different platforms that I tried. I used the latest m-c build:
- http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-08-27-03-02-02-mozilla-central/

* Windows 7 Home Premium SP1 x64 -> received 1, 2 and 3 variants
* OSX 10.9.4 x64 -> received 2 and 3 variants (never received 1 variant)
* Ubuntu 13.10 x64 (VM) -> received 1, 2 and 3 variants
* Windows 8.1 x64 (VM) -> usually receive 1 variant but I get 2 variants here and there but it's really rare

I'm not sure what the expected results are supposed to be?? Should we be getting a single variant on all the different platforms?
Flags: needinfo?(kamiljoz)
Kamil, I think you need a build with --enable-valgrind; none of the tbpl builds as far as I know have that enabled during configure, so you might have to compile one yourself.

https://developer.mozilla.org/en-US/docs/Mozilla/Testing/Valgrind (not sure how updated this is - also, Valgrind on SVN *might* work on Mac OS X 10.9.x)

Comment 12

3 years ago
(In reply to Kamil Jozwiak [:kjozwiak] from comment #10)
> I'm not sure what the expected results are supposed to be?? Should we be
> getting a single variant on all the different platforms?

Yeah, it should be a single variant.

Comment 13

3 years ago
Been poking at this for a few hours and not getting very far.

However, I've seen one interesting thing - the output depends on whether we hit the network cache. If the image is loaded from the server, you can see the usual trash in the output. But, if you go to about:memory and hit minimize memory usage, the image will turn gray.
Assignee: nobody → mwu

Comment 14

3 years ago
I don't know if I know jpeg decoding well enough to really tackle this bug quickly.

Comment 15

3 years ago
Unassigning myself since I don't think I can easily figure this one out.
Assignee: mwu → nobody
status-firefox32: --- → ?
status-firefox33: --- → ?
status-firefox34: --- → affected
status-firefox35: --- → affected
status-firefox-esr31: --- → ?
Flags: needinfo?(milan)
Milan, can you identify someone who can work on this?
Yes, eventually; I'm assuming the urgency matches that of sec-high, rather than sec-critical, but I realized I don't know what "csectype-disclosure" to the urgency.
Flags: needinfo?(milan)
Thanks for the info Gary, I used the MDN and installed valgrind on my Ubuntu 13.10 x64 VM. I basically got the same behavior as Julian mentioned in comment #7. When I visit the link using valgrind, I always get a single variant with the "The image has just one variant. You're probably OK." popup. However, visiting the same link natively will display several variants.

Tested the above using changeset 843332cc69af [m-c]
"csectype-disclosure" is an information disclosure issue. It doesn't bear on the urgency specifically. Treating this as a sec-high rated issue is enough.
I can't reproduce this any more on my local machine.  Kamil, can you try to reproduce again?  Maybe one of the other image patches fixed this.
Flags: needinfo?(kamiljoz)
(Reporter)

Comment 21

3 years ago
It does repro for me on 33.0.2
I only managed to reproduce the issue once on fx 34.0 while running fx through valgrind. However, when downloading the builds and running them natively, I reproduced the issue on every single build several times. Results:

Building & Running through valgrind: (loaded the website at 25 times on each of the following builds)

* fx 36.0a1 using changeset 675913ddbb55: no variants
* fx 35.0a2 using changeset ee017c79f5a8: no variants
* fx 34.0 using changeset 73905ff57286: received 2 variants
* fx 33.0.2 using changeset 7dc4a9d1b3e6: no variants
* fx 33.1 using changeset 2c6590150a85: no variants

Native builds:

* fx 36.0a1: received 2 variants several times
** http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-10-31-06-18-04-mozilla-central/

* fx 35.0a2: received 2, 3 variants several times
** http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2014-10-31-00-40-03-mozilla-aurora/

* fx 34.0b5: received 2 variants several times
** http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/34.0b5-candidates/build2/

* fx 33.0.2: received 2 variants several times
** http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/33.0.2/

* fx 33.1: received 2 variants several times
** http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/33.1-candidates/build2/
Flags: needinfo?(kamiljoz)
Group: gfx-core-security
Seth, do you have some idea who could look at this?  Thanks.
Flags: needinfo?(seth)
Michael, can you take a look at this?
Flags: needinfo?(mwu)
status-firefox32: ? → ---
status-firefox33: ? → ---
status-firefox34: affected → wontfix
status-firefox35: affected → wontfix
status-firefox36: --- → affected
status-firefox37: --- → affected
status-firefox38: --- → affected

Comment 25

2 years ago
(In reply to Al Billings [:abillings] from comment #24)
> Michael, can you take a look at this?

I looked at it before and gave up because I can't untangle the jpeg decoder well enough to figure out what's going on. See also comment 15.
Flags: needinfo?(mwu)
Still an issue post-bug 1061240?
Seth, want me to take a look at this again? See if I can reproduce it with newer builds.
(In reply to Kamil Jozwiak [:kjozwiak] from comment #27)
> Seth, want me to take a look at this again? See if I can reproduce it with
> newer builds.

Yes, that'd be great! It sounds from the discussion in this bug like this issue was actually *inside* libjpeg-turbo, so we may well have gotten a fix for free by updating.
Looks like the issue is still happening. I didn't go through each channel via valgrind as that would take a lot longer and reproducing it that way is harder than using native builds (see comment #22). Either way, I got 2 or more variants on each channel:

* fx 38.0a1: received 2 variants twice (really hard to reproduce on m-c)
** http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2015-01-22-03-02-02-mozilla-central/

* fx 37.0a2: received 2 variants multiple times (easily reproducible)
** http://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/2015-01-22-00-40-04-mozilla-aurora/

* fx 36.0b2: received 2 variants multiple times (easily reproducible)
** http://ftp.mozilla.org/pub/mozilla.org/firefox/candidates/36.0b2-candidates/build1/

* fx 35.0: received 2 variants multiple times (easily reproducible)
** http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest/

* fx 31.4.0 ESR: received 2 variants multiple times (easily reproducible)
** http://ftp.mozilla.org/pub/mozilla.org/firefox/releases/latest-esr/
Thanks for looking into it. It sounds like we still need to look into this. I wonder if we can construct a testcase that reproduces against *just* libjpeg-turbo, which would greatly reduce the amount of code that's under suspicion here.
(Reporter)

Comment 31

2 years ago
I couldn't get inconsistent output or any obvious ASAN violations with libjpeg-turo's djpeg, sorry :-( My best guess is that the problem is specific to how browsers use the library (dunno, progressive rendering, whatnot), but I honestly don't know what's up. We could always try to rope in DRC, but he may be disinclined to investigate if we don't have a clean repro against libjpeg-turbo.
(Reporter)

Comment 32

2 years ago
OK, better late than never. I believe I have a much simpler repro; this usually triggers two variants on my box:

http://lcamtuf.coredump.cx/jpeg_simpler/

The original image:

http://lcamtuf.coredump.cx/jpeg_simpler/image_orig.jpg

Fault-triggering variant:

http://lcamtuf.coredump.cx/jpeg_simpler/image.jpg

The modification amounts to a single bit flip in byte 172:

ff c0 00 11 08 01 00 00 98 03 01 11    00  02 11    01
^^^^^ ^^^^^ ^^ ^^^^^ ^^^^^ ^^ ^^ ^^    ^^  ^^ **    ^^
SOF0  Lf    P  YRes  XRes  Nf Ci Hi/Vi Tqi Ci Hi/Vi Tqi

The byte being flipped is marked with **. It's the horizontal sampling factor for component 2. Between the cases, 0x11 becomes 0x31 - so Hi changes from 0x1 to 0x3.

However, I am not sure what's the significance of sampling factor 3, or why does it apparently make jpeg-turbo use uninitialized memory in the browser, without being reproducible with djpeg under ASAN / Valgrind.

Perhaps worth asking DRC.
(Reporter)

Comment 33

2 years ago
Other potentially useful tidbits:

- Factors 2 and 4 are not affected, higher factors don't work,

- Changing horizontal and vertical sampling factors seems to have comparable results,

- This does not appear to be reproducible for sampling factors for component 1; but both chroma components (2 & 3) are affected in the same way.
(Reporter)

Comment 34

2 years ago
Ping?

Also adding DRC in case he has any thoughts.

Comment 35

2 years ago
So if I'm understanding correctly, something (do we know what, or is it still unclear?) is flipping the sampling factor byte and causing it to say 3:1 instead of 1:1, and this is causing libjpeg-turbo to read uninitialized memory?  If so, then this is not a surprise, because you're asking us to decode an image with a different amount of data than what is actually in the image, and as far as I know, there isn't any way to verify that the sampling factor reported in the header actually matches the data.  I think the underlying issue is-- why does that byte get twiddled in the first place?

I'm afraid that I don't have much to offer here, but if you come up with a patch to guard against this at the libjpeg-turbo level, I'm happy to review it.
(Reporter)

Comment 36

2 years ago
The issue is that *something* is using uninitialized memory when trying to decode a JPEG with a 3:1 chroma sampling factor instead of 2:1; the bit is intentionally flipped by a fuzzer, but the browser should not allow uninitialized memory to be used for rendering, because when this image is read back (e.g., via the <canvas> API), it could leak secrets from browser's memory.

The reason why I think it has to do with libjpeg-turbo is that the behavior is exactly the same in Chrome and Firefox, despite fairly different image parsing "glue". That said, it's complicated because the issue doesn't obviously reproduce with djpeg or so, and probably has to do with how the browsers integrate with the library.

I was mostly just hoping that some plausible scenario would come to you right away; the interesting observations is that this only affects the chroma components, both for X and Y sampling factors; and that it seems to be specific to the 3:1 factor, not showing up for 4:1.
status-firefox36: affected → wontfix
status-firefox39: --- → affected

Comment 37

2 years ago
(In reply to Michal Zalewski from comment #36)
> The reason why I think it has to do with libjpeg-turbo is that the behavior
> is exactly the same in Chrome and Firefox, despite fairly different image
> parsing "glue". That said, it's complicated because the issue doesn't
> obviously reproduce with djpeg or so, and probably has to do with how the
> browsers integrate with the library.

That's a reasonable hypothesis, but I can't reproduce it with djpeg either, and I also tried using 'djpeg -memsrc', which causes the image to be decoded entirely in memory.
(Reporter)

Comment 38

2 years ago
Cc:ing Noel. The approach taken for Chrome at this point is just to reject JPEGs with sampling factor of 3.

Comment 39

2 years ago
Yeah, 3 is an invalid sampling factor for JPEG. Another remedy is to map 3 -> 1. That produces one variant of the decoded image pixels always (that's the goal), but it only fixes one manifestation of this bug (comment #32). The other image

  http://lcamtuf.coredump.cx/ffjpeg/id:000511,src:000172,+cov.jpg

still produces 2 variants for me, but here the JPEG decoder does complain - "Corrupt JPEG data: premature end of data segment" after seeing enough data to detect it. Yet some pixels have been drawn to the screen by that point. That's the clue and comment #13, the bug occurs when the image is loaded from the network.

The browser paints the image when it's in view, that causes a decode, but the decode suspends when it runs out the network data. The browser paints whatever was decoded so far. When more network data arrives, the process (network/paint/decode/suspend) continues, eventually failing when enough network data has arrived to detect corruption.

I note that browsers have used this network packet driven paint approach since forever (well, at least since 1998 :) So why don't we see this bug in djpeg? DRC & Michal mentioned:

> That said, it's complicated because the issue doesn't obviously reproduce with djpeg 
> ...and I also tried using 'djpeg -memsrc', which causes the image to be decoded entirely in memory.

No bug there. You have effectively received the image in "one packet" and the output pixels are always the same. Bust that image up into packets and pump those packets into djpeg somehow, and I think the bug (two or more variants) will manifest in djpeg.

Perhaps curl http://lcamtuf.coredump.cx/ffjpeg/id:000511,src:000172,+cov.jpg and trace the response at HTTP level to work out how the content (encoded JPEG) arrives. How many packets? What was the content size in each? Use those numbers as a guide to drive djpeg similarly, packet-wise. Output the decoded pixels to a file when djpeg hits the error "Corrupt JPEG data: ..." and md5sum the file. Wrap it all in a shell script that explores variations in the packet sizes if needed. I expect djpeg will eventually produce two different md5sum.

I don't hack on djpeg myself, but others here are encouraged. Easter: and I'm gone for 5 days so no time for me to do this interesting simulate-the-network experiment.

Comment 40

2 years ago
How is 3 an invalid sampling factor?  It is not a common sampling factor, but sampling factors can be any integer >= 1.

Comment 41

2 years ago
As a standards matter "Table B.2 – Frame header parameter sizes and values", ITU T.81 [1], 1-4 only are permitted. As a practical matter, is it true that Y >= Cb == Cr (the image is YCbCr but has 1 >= 3 == 1 in the horizontal) and for 8x8 DCT (JPEG in the wild it seems) that the rational horz / vert sampling factors H1,V1 H2,V2 H3,V3 of a three component image can be tabulated assuming integer division and storage? See for example, "Figure 3. Compressed JPEG Exif file subsampling encoding" [2], and curious: no 3's there.

[1] http://www.w3.org/Graphics/JPEG/itu-t81.pdf
[2] http://dougkerr.net/Pumpkin/articles/Subsampling.pdf

Seems to me that 3 is redundant / questionable. Seems to me that under integer division and rounding that the dependent calculations needed for JPEG decoding, and anything that depends on those computed values (perhaps memory allocations), as in these example component calculations from jdinput.c:

    /* Size in DCT blocks */
    compptr->width_in_blocks = (JDIMENSION)
      jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,
		    (long) (cinfo->max_h_samp_factor * DCTSIZE));

    /* Size in samples */
    compptr->downsampled_width = (JDIMENSION)
      jdiv_round_up((long) cinfo->image_width * (long) compptr->h_samp_factor,
		    (long) cinfo->max_h_samp_factor);

    /* Overall image size in MCUs */
    cinfo->MCUs_per_row = (JDIMENSION)
      jdiv_round_up((long) cinfo->image_width,
		    (long) (cinfo->max_h_samp_factor*DCTSIZE));

might not go so well when h_sample_factor or max_h_sample_factor is 3, and if 8x8 DCT is the norm for JPEG in the wild, then what purpose does 3 serve that sane sampling values (4,2,1) do not? Michal also advised me elsewhere regarding comment #32 that:

> Other potentially useful tidbits:
> - Factors 2 and 4 are not affected
  Nod, factor 3 looks to be a problem-child
> - Higher factors don't work.
  Yeah, libjpeg decoders fail if they see sampling factors <= 0 or > 4.
> - Changing horizontal and vertical sampling factors seems to have comparable results.
  Nod, applies to all sampling factor types.
> - This does not appear to be reproducible for sampling factors for component 1, but both
>   chroma components (2 & 3, Cb & Cr) are affected in the same way.
  Nod, but component 1 (Y) could be 3 too.

And then I also don't see anywhere in the jpeg library code where the order relation Y >= Cb == Cr is checked or where individual values are sanitized. So I deem factor 3 invalid for 8x8 DCT images, map 3 -> 1 in Chrome to prevent the decoder reading too much data, and retest -- no bug reproduction in Chrome anymore, stable decoded pixels in all cases for http://lcamtuf.coredump.cx/jpeg_simpler

Factor 3 limiting seems stable, 3 is not practically valid and my guess is hte Internetz won't cry if I hack Factor 3 in this way. Factor 3 is so uncommon that I've never seen one until now.

But not so quick... what about the other images reported on this bug? Also Chrome has been fuzzing JPEG using a large image corpus containing valid and invalid JPEG for ages, 24x7. Surely its statistical bit-flippers would have flicked a sampling factor from 1 -> 3 by now, but no reports. I've recently added a special fuzzer to focus on the three images and test-case I have for this bug, but no reports so far. I suspect that our fuzzing infrastructure does not simulate the network. And as mentioned, the network is required to manifest this bug. A packet-driven djpeg might confirm that.

Comment 42

2 years ago
Actually, 3:1:1 video does exist in Sony HDCAMs, so a chroma subsampling factor of 3 is not unheard-of.  The libjpeg API was created to be very flexible in this regard, and there is nothing to prevent one from generating a JPEG file in which the luma is subsampled instead of the chroma.

cjpeg -sample 1x1,3x1,3x1

will generate such a "1:3:3" JPEG file, which is extremely unorthodox and probably even stupid, but it's currently displayable by every browser on the planet.  A more practical application would be to subsample only one chroma component and not the other.  For instance, if you have an image with sharp features mainly in the red hue, you could subsample the Cb component but not the Cr component, thus saving a significant amount of space but still preserving the sharp red features.

Comment 43

2 years ago
In other words, I don't think it's valid to assume that the chroma subsampling factors are equal.  libjpeg makes no requirement for this, and in fact, the cjpeg tool allows you to very easily set these factors independently of one another.

Comment 44

2 years ago
Created attachment 8588564 [details] [diff] [review]
svn diff to libjpeg-turbo trunk 1.4.80

Put 3 aside for a moment and let's focus on the bug. I checked out libjpeg-turbo trunk on built in on OSX, then patched djpeg to add some packet drive, svn diff attached.

Comment 45

2 years ago
Created attachment 8588566 [details]
test script using patched djpeg

Next I wrote a script to use the -packet argument to examine various packet lengths to use when reading input with djpeg, test script attached.

Comment 46

2 years ago
Next, saved the image from comment #32 http://lcamtuf.coredump.cx/jpeg_simpler/image.jpg to my Desktop and ran the patched djpeg over it: produced two output images (different m5sums)

% ./test.sh ~/Desktop/image.jpg
8ecda51fbf7f1e568d1ad6a53819aa8d
9653746e31812d63df6c7eddbb50ab4c

Comment 47

2 years ago
The packet boundary is at 1287 for image.jpg, the two output variants are:

% ./djpeg -packet 1287 -gif ~/Desktop/image.jpg > image1.gif
% ./djpeg -packet 1288 -gif ~/Desktop/image.jpg > image2.gif

Comment 48

2 years ago
Created attachment 8588571 [details]
image1.gif

Comment 49

2 years ago
Created attachment 8588573 [details]
image2.gif

Comment 50

2 years ago
Created attachment 8588581 [details]
firefox-36-0-4-jpeg-simpler-variants c.f. image1.gif image2.gif

And image{1,2}.gif appear to be the variants reported for my Firefox 36.0.4 when running http://lcamtuf.coredump.cx/jpeg_simpler
(Reporter)

Comment 51

2 years ago
Neat, good hunch!

Comment 52

2 years ago
Also, I fetched the lib jpeg6b source from http://sourceforge.net/projects/libjpeg and compiled it on OSX, and patched its djpeg similarly to Comment #45, then tested with the -packet argument per Comment #46:

% ./test.sh ~/Desktop/image.jpg
9653746e31812d63df6c7eddbb50ab4c

Only one variant. This bugs seems to be a libjpeg-turbo regression compared to lib jpeg6b.
Thanks for investigating this, Noel!

Comment 54

2 years ago
OK, now for your next trick, explain why this occurs only with libjpeg-turbo. I need to sit down and study it further, but I'm having a little trouble following what has been said so far. I need to look at the code  and see exactly what you're trying to make libjpeg-turbo do, but at first glance, I don't understand why Mozilla would cause a JPEG to be decompressed in a different manner than djpeg. Please assume that I know nothing about the Mozilla code, not because I'm an idiot but because I'm not a Mozilla developer, so if someone could translate for me, it would be much appreciated.
(In reply to DRC from comment #54)
> I don't understand why Mozilla would cause a JPEG to be
> decompressed in a different manner than djpeg.

Noel can probably answer better, but it was my impression from the comments above that the bug has to do with the data being delivered in a series of smaller chunks vs. in a single large, complete chunk.

Comment 56

2 years ago
Yeah, I get that, but this seems like minimally a very odd use of the libjpeg API to me.  Generally an application will fill the input buffer in fill_input_buffer(), unless there is not enough data left in the image to fill it.  I don't understand at all:

-- how libjpeg-turbo could ever be expected to decode the JPEG "properly" when the header is corrupted
-- how the corruption of this particular byte in the header is interacting with the Huffman decoder to cause issues when the buffer is not completely filled in fill_input_buffer()
-- why the issues only occur with a specific "packet size"
-- how the buffer is being over-read (or is it?)

It does, however, appear to be specific to the accelerated Huffman decoder, so it seems like the ultimate solution would be to detect the circumstances under which the accelerated Huffman decoder is causing problems and disable it for that specific MCU.

Comment 57

2 years ago
Created attachment 8589631 [details]
broken-images.tar.gz 4 test images

One thing that might help is having the set of 4 images I mentioned that are involved in this bug (attached in a tgz)

8 -r--r------ 1 noel 3196 Apr  8 23:28 jpeg-image-000.jpg (aka image.jpg)
8 -r--r------ 1 noel 1775 Apr  8 23:28 jpeg-image-001.jpg
8 -r--r------ 1 noel 1235 Apr  8 23:28 jpeg-image-002.jpg
8 -r--r------ 1 noel 1771 Apr  8 23:28 jpeg-image-003.jpg

Comment 58

2 years ago
(In reply to Seth Fowler [:seth] from comment #55)
> 
> Noel can probably answer better, but it was my impression from the comments
> above that the bug has to do with the data being delivered in a series of
> smaller chunks vs. in a single large, complete chunk.

Correct impression and the right problem statement: chunked vs. one chunk data deliver is the cause.

(In reply to DRC from comment #56)
> Yeah, I get that, but this seems like minimally a very odd use of the
> libjpeg API to me ...

Not odd to me.

> I don't understand at all:
> -- how libjpeg-turbo could ever be expected to decode the JPEG "properly"
> when the header is corrupted
> -- how the corruption of this particular byte in the header is interacting
> with the Huffman decoder to cause issues when the buffer is not completely
> filled in fill_input_buffer()
> -- why the issues only occur with a specific "packet size"
> -- how the buffer is being over-read (or is it?)

A way to address these concerns is to show how libjpeg6b handles it.  We have 4 test images, and the details of how to get the libjpeg6b source, built it, and patch it's djpeg for testing (comment #52).

% pwd && time for i in jpeg-image-00*.jpg; do
>   echo $i && ./test.sh $i
> done

/Users/noel/libjpeg6b/jpeg-6b

jpeg-image-000.jpg
9653746e31812d63df6c7eddbb50ab4c
jpeg-image-001.jpg
7b18edfc052d20964736badfd978c245
jpeg-image-002.jpg
c1a1c9b829214e4f9735dc252a660398
jpeg-image-003.jpg
125a9f6524b1d938feee07e2364de7cb

real	0m40.374s
user	0m22.888s
sys	0m21.200s

libjpeg6b seems stable (always decoded the same) no matter what packet size we use.

Comment 59

2 years ago
And we can patch the libjpeg-turbo trunk 1.4.80 djpeg similarly (comment #44) and build and test.

% pwd && time for i in jpeg-image-00*.jpg; do
>  echo $i && ./test.sh $i;
> done

/Users/noel/libjpeg-turbo-code

jpeg-image-000.jpg
8ecda51fbf7f1e568d1ad6a53819aa8d
9653746e31812d63df6c7eddbb50ab4c
jpeg-image-001.jpg
7b18edfc052d20964736badfd978c245
a4186ed043ba1ba0f2219747db72435d
jpeg-image-002.jpg
cd442a5e64b4f064f7dcbfbfa95fbe71
jpeg-image-003.jpg
107fdcb0fbde3068e6a5562b6803b941
117c31eda47a53a4cde0ff8f83ccea6f
48710495cc692b22afac2c60c2a13add
496122309d8fa29b170fb79f7ee940d4
56a45e37e87c510137869b1a397c5a9c
7447f17b90ed5f96215e81ec17310bc2
7cb97c71ec80ba28b044675184486273
81e09f566e17877fff67ff81df704cfe
9091c8496bd97249eb88afe13a485c65
a1f403d485df31396c5789a814370532

real	1m46.331s
user	0m54.370s
sys	1m1.685s

libjpeg_turbo seems generally unstable (this bug) in that the decoded output depends on packet size (and is slower for this image test as an aside).

Comment 60

2 years ago
Created attachment 8589643 [details] [diff] [review]
libjpeg6b-to-libjpegturbo.1.4.80-decoder.diff

So something changed in libjpeg_turbo compared to libjpeg6b. I computed the diff of the decoder jd*.c files that are common to both libraries. The 3252 line diff attached, and it makes for interesting reading: I've stared at it...

Comment 61

2 years ago
Created attachment 8589689 [details] [diff] [review]
libjpeg-turbo-1-4-80-disable-decode-mcu-fast.svn.diff

... and in all those changes, the ones in jdhuff.c looked most interesting to me, and specifically, I noted that the way libjpeg6b decoded MCU has been changed in libjpeg_turbo. I disabled that code (svn diff to libjpeg_turbo trunk 1.4.80 attached), rebuilt and retested.

% pwd && time for i in jpeg-image-00*.jpg; do
>   echo $i && ./test.sh $i;
> done

/Users/noel/libjpeg-turbo-code

jpeg-image-000.jpg
9653746e31812d63df6c7eddbb50ab4c (matches libjpeg6)
jpeg-image-001.jpg
7b18edfc052d20964736badfd978c245 (matches libjpeg6)
jpeg-image-002.jpg
cd442a5e64b4f064f7dcbfbfa95fbe71
jpeg-image-003.jpg
7447f17b90ed5f96215e81ec17310bc2

Stable decodes regardless of packet size, and the md5 sums even match libjpeg6 in two cases (good).

Still wondering about what else changed that caused this divergence in decoded output compared to libjpeg6 in the two other cases, but I hope my current result has shown that libjpeg_turbo can be as robust as libjpeg6b in decoding corrupted data to answer your questions.

I guess the washup will be deciding if/how to roll back the turbo changes to better align with libjpeg6b - it's the gold standard of the web [1] and is robust in the face of error - seem an error was introduced with the MCU decoding changes.

These are my results.

[1] http://seclists.org/fulldisclosure/2013/Nov/83

Comment 62

2 years ago
Apologies, I left out the timing result in #61

real	1m41.280s
user	0m52.495s
sys	0m59.457s

Comment 63

2 years ago
Noel, libjpeg-turbo uses an optimized Huffman decoding algorithm when possible. This increases performance for the most common use cases in libjpeg-turbo (particularly for memory-to-memory decoding.) You cannot just run the time command on djpeg and call it a benchmark. djpeg has I/O overhead that tends to mask the real effect of changes to the decoder. Your use case is an odd use case for libjpeg-turbo. I am unaware of another application that uses the library in that way. But regardless, you have not told me anything I didn't already know. As I said, disabling the accelerated Huffman decoder works around the issue, but that is not a general-purpose solution. I really need to understand the mechanism behind this failure so the issue can be either fixed or selectively worked around without compromising performance.

Comment 64

2 years ago
(In reply to noel gordon from comment #61)
> Still wondering about what else changed that caused this divergence in
> decoded output compared to libjpeg6 in the two other cases, but I hope my
> current result has shown that libjpeg_turbo can be as robust as libjpeg6b in
> decoding corrupted data to answer your questions.
> 
> I guess the washup will be deciding if/how to roll back the turbo changes to
> better align with libjpeg6b - it's the gold standard of the web [1] and is
> robust in the face of error - seem an error was introduced with the MCU
> decoding changes.

The "gold standard"?  Noel, jpeg-6b has not been maintained since 1998.  Two people decided to move the project forward-- myself and Guido Volbeding.  He moved the project forward in a way that introduced new uses for the DCT and a new, unofficial (rejected by the ITU-T), incompatible extension to the JPEG format, as well as backward-incompatible changes to the API/ABI.  I moved it forward in a way that increased performance by as much as 5x and maintained backward API/ABI compatibility.  I was not ever attempting to compete with the IJG, but Linux distributions and browser developers (including Mozilla) preferred the libjpeg-turbo approach, which is why it has become the de facto standard library for JPEG encoding/decoding.

I would like to fix or work around this issue, but you have not adequately answered my questions regarding the underlying cause.  You seem to be instead pointing fingers at me and declaring that the solution is to abandon libjpeg-turbo and revert to a 17-year-old code base instead.  Understand that that isn't an option, and move forward with your argument.  Understand that there is a good reason why libjpeg-turbo does what it does.  If it needs to be extended to include this case, then fine.  Let's extend it, but first we need to understand the exact mechanism behind the issue.  Spewing reams of computer output at me isn't helping.  I need an English explanation, please.
(Reporter)

Comment 65

2 years ago
Hey DRC,

To take a step back, the problem is that browsers use libjpeg-turbo to render images that are untrusted and potentially malicious, so the way they mesh with the rendering library and the library itself must be very robust.

Memory corruption bugs when decoding malformed images have obvious security consequences by corrupting the state of the program and potentially making it behave in a way chosen by the attacker (e.g., by jumping to an unintended location, flipping the wrong byte).

But another class of issues are bugs like Heartbleed, where the program does not crash, but outputs data that contains bits of uninitialized memory, which may contain secrets handled earlier by the process. If the attacker an read back the decoded buffer containing uninitialized memory, they may be able to steal said secrets. For Heartbleed, that would be cryptographic keys, passwords, and other people's HTTP requests. In a browser, the memory can contain bits of HTTP cookies, passwords, and so on.

The problem we're seeing here is that feeding a malformed JPEG with an unorthodox sampling factor apparently causes libjpeg-turbo use uninitialized memory, as evidenced by the same data decoding to different images across runs. This doesn't reproduce with stock djpeg, but Noel figured that this has to do with the tendency for browsers to call libjpeg-turbo with partial buffers to progressively render the image (he or Seth can explain the implementation better than I can). He modified djpeg to mimic that, and also tracked down the issue to a specific file which, if reverted to jpeg6b, makes the problem go away.

I don't think anyone is seriously arguing for reverting to jpeg6b, but I guess there are two ways forward:

1) Figure out which change, exactly, leads to this behavior and how to correct the issue. This is complicated by the fact that MSAN and ASAN don't pick up the error automatically. I think Noel is sort of hoping that you would have some ideas and suggestions based on his investigation so far.

2) Decide that the way Firefox and Chrome (and probably all other browsers) use libjpeg-turbo is wrong and give them a different API or ask them to work around the issue in some way.

Comment 66

2 years ago
OK, fair enough, and thanks for the explanation.  That is exactly what I needed.  I will need to examine the issue further to give any sort of feedback regarding (1), but referring to the Huffman source (jdhuff.c), there are already certain circumstances under which we disable the accelerated Huffman decoder, and I think the ideal situation here would be to figure out how to detect this circumstance as well and similarly disable the accelerated Huffman decoder.  As for (2), the obvious way to work around this would be to either wait until you have received enough data to fill the buffer before actually filling it, or to decode the image entirely in memory without using buffered I/O.  Whether or not either of those approaches make sense for Mozilla is not for me to decide, but I suspect that both of them would perform better.

But just to clarify (and I'm not stating this as a fact-- I'm asking whether this is correct), it isn't just the fact that the sampling factor is unorthodox that is causing the issue.  It's the fact that the sampling factor doesn't match the actual image data (because the header is corrupt.)

Stand by.  I'll try to look at this today.

Comment 67

2 years ago
I modified Noel's test script so that it runs djpeg in valgrind, and valgrind detects no buffer overruns or anything of that nature, regardless of the packet size.  In looking back at this thread, it doesn't seem that anyone has yet established that there is an actual buffer overrun or other security issue here.  There is just an assumption that such an issue might exist because there are multiple variants.  Correct me if I'm wrong.

Comment 68

2 years ago
You might have missed Michal's comment 65 "This is complicated by the fact that MSAN and ASAN don't pick up the error automatically." This may be because of bugs in those tools (TBD). Given that caveat, it is fair to say that "no one has established that there is an actual buffer overrun or other security issue" at this time. It's the primary reason we are having this discussion at all - to establish that as a fact or rule it out.
(In reply to DRC from comment #67)
Per comment 4, Valgrind detects no buffer overruns, but it does detect
a use of uninitialised memory, and says where that comes from.  The
allocation point is on this path

 mozilla::image::imgFrame::Init
 -> mozilla::image::AllocateBufferForImage
    -> mozilla::VolatileBuffer::Init
       -> moz_posix_memalign
          -> posix_memalign
             -> memalign

and the usage point is on this path

 nsImageFrame::PaintImage
 -> nsLayoutUtils::DrawSingleImage
    -> DrawImageInternal
       -> mozilla::image::RasterImage::Draw
          -> mozilla::image::RasterImage::DrawWithPreDownscaleIfNeeded
             -> mozilla::image::imgFrame::Draw

Speaking from a position of total ignorance of how the library is
integrated in Firefox, I'd get the impression from this that Fx
allocated a buffer for the image to be decoded into, called the
decoder to get it done, but for whatever reason, the decoder didn't
entirely fill the buffer.

Comment 70

2 years ago
But at least in my testing, valgrind did not detect uninitialized memory at the low level. This should be easy for someone  else to confirm using Noel's script and patch. Also, I have no idea what MSAN and ASAN are.

Comment 71

2 years ago
If you have run my scripts and patch, then have you confirmed any of my findings such as #51 and #61? I yet to see anyone on this confirm my results.

Comment 72

2 years ago
I'm yet to see (sry)... Maybe Michal. Have you reproduced at all?

Comment 73

2 years ago
Well it is getting late for me...

(In reply to DRC from comment #64)

> I would like to fix or work around this issue, but you have not adequately
> answered my questions regarding the underlying cause.

If you've run my scripts and patch you might have independently confirmed my results, which appear to me to point to the cause if I'm not mistaken.

> You seem to be instead pointing fingers at me and declaring that the solution is to abandon
> libjpeg-turbo and revert to a 17-year-old code base instead.

I've pointed fingers at broken code only, nothing more thanks. The intimation otherwise does not sit well with me. Moreover, I said "better align with libjpeg6b". I did not suggest reverting to using libjpeg6b. I will assume you misunderstood me and leave it at that, unless you don't understand what I've just said.

> Understand that that isn't an option, and move forward with your argument.  Understand
> that there is a good reason why libjpeg-turbo does what it does.

Understand that if what libjpeg_turbo does exposes Firefox and Chrome users to the issues Michal mentioned, then what is has done has no good reason. Easy to be fast and wrong: better to be fast and secure.

> If it needs to be extended to include this case, then fine.  Let's extend it, but
> first we need to understand the exact mechanism behind the issue.  Spewing
> reams of computer output at me isn't helping.

Software engineering is a business requiring lots of reading, of code mainly and dealing with its many details. This spew (your words) is provided free of charge and you may use how you see fit, and might help to resolve this bug.

> I need an English explanation, please.

+    // HACK(noel): disable the libjpeg_turbo decode_mcu_fast code path.
+    // https://bugzilla.mozilla.org/show_bug.cgi?id=1050342
+    usefast = 0;

appears to be most succinct explanation I have.

Comment 74

2 years ago
(In reply to Michal Zalewski from comment #65)

> 1) Figure out which change, exactly, leads to this behavior and how to
> correct the issue. This is complicated by the fact that MSAN and ASAN don't
> pick up the error automatically. I think Noel is sort of hoping that you
> would have some ideas and suggestions based on his investigation so far.

From the simple explanation one can figure out the "which change, exactly" and propose a possibly correction.
(In reply to Julian Seward [:jseward] from comment #69)
> (In reply to DRC from comment #67)
> Per comment 4, Valgrind detects no buffer overruns, but it does detect
> a use of uninitialised memory, and says where that comes from.

Wasn't that bug 1045977? You said in comment 7 that you couldn't reproduce that anymore after bug 1045977 landed.

Comment 76

2 years ago
Created attachment 8590446 [details] [diff] [review]
libjpeg-turbo-1-4-80.proposal.svn.diff

Running the usual script with this patch produced:

/Users/noel/libjpeg-turbo-code
jpeg-image-000.jpg
9653746e31812d63df6c7eddbb50ab4c
jpeg-image-001.jpg
7b18edfc052d20964736badfd978c245
jpeg-image-002.jpg
c1a1c9b829214e4f9735dc252a660398
jpeg-image-003.jpg
87ce646a1feac48c880653755199a7bd

Stable decodes, md5 match libjpeg6 in all cases.

A proposal mind, performance impact here and so on. Main benefit is it identifies, via the retracted code, where the MSAN and ASAN efforts should be focussed and instrumented.

(Apologies for the cross, Seth).
(In reply to Seth Fowler [:seth] from comment #75)
> Wasn't that bug 1045977? You said in comment 7 that you couldn't reproduce
> that anymore after bug 1045977 landed.

Ach, my bad.  You're right.  Ignore comment #69.  Sorry for the noise.

Comment 78

2 years ago
(In reply to noel gordon from comment #74)
> From the simple explanation one can figure out the "which change, exactly"
> and propose a possibly correction.

Yes, and I believe I already noted that disabling the accelerated Huffman codec caused your test to produce only one variant, but again, without any evidence that these multiple variants actually translate to uninitialized memory usage or other issues at the low level, it would be ill-advised to regress decoding performance by 20%.
(In reply to DRC from comment #70)
> Also, I have no idea what MSAN and ASAN are.

They are memory testing tools similar in intent to valgrind, except they're "compiled in" and the resulting builds perform better. The performance differences don't really matter if you're unit-testing a library like JPEG, but if you're running the entire browser in a multi-hour fuzzing cycle it's significant.

https://code.google.com/p/address-sanitizer/wiki/AddressSanitizer
https://code.google.com/p/memory-sanitizer/wiki/MemorySanitizer

Comment 80

2 years ago
Just to get me up to speed, are MSAN and ASAN detecting issues here, or is that still undetermined?

Comment 81

2 years ago
Still to be determined, I think. Folks would need the djpeg -packet patch to test. Perhaps we should submit a separate patch for that upstream so folks who want to test can pull it from upstream? ...

Comment 82

2 years ago
Created attachment 8594511 [details] [diff] [review]
libjpeg-turbo-1-4-80.djpeg.svn.diff

Here is the djpeg patch I'm using.

Comment 83

2 years ago
Noel, I am the maintainer of libjpeg-turbo, so it's doubtful that anyone else from upstream would want to test this other than me.  Before I'm willing to do anything about this at all, it needs to be established that the issue represents a real security threat.

Comment 84

2 years ago
Well let's find the bug then, which we've only been able to reproduce using a patched djpeg.

(In reply to DRC from comment #78)
> Yes, and I believe I already noted that disabling the accelerated Huffman codec caused your test to produce only one variant.

This because setting usefast = 0 makes the code use decode_mcu_slow() only, which produces output that matches the original IJG decode_mcu() code, since, apart from setup, these routines look identical to me.

> but again, without any evidence that these multiple variants actually translate to uninitialized memory > usage or other issues at the low level, it would be ill-advised to regress decoding performance by 20%.

Not sure where that performance number comes from, but I agree there'd be some performance hit if one just backed off to using usefast = 0.

Seems this issue must be due/in decode_mcu_fast() somewhere ...

Comment 85

2 years ago
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/libjpeg_turbo/jdhuff.h&l=212

212 #define HUFF_DECODE_FAST(s,nb,htbl) \
213  FILL_BIT_BUFFER_FAST; \
214  s = PEEK_BITS(HUFF_LOOKAHEAD); \
215  s = htbl->lookup[s]; \
216  nb = s >> HUFF_LOOKAHEAD; \
217  /* Pre-execute the common case of nb <= HUFF_LOOKAHEAD */ \
218  DROP_BITS(nb); \
219  s = s & ((1 << HUFF_LOOKAHEAD) - 1); \
220  if (nb > HUFF_LOOKAHEAD) { \
221    /* Equivalent of jpeg_huff_decode() */ \
222    /* Don't use GET_BITS() here because we don't want to modify bits_left */ \
223    s = (get_buffer >> bits_left) & ((1 << (nb)) - 1); \
224    while (s > htbl->maxcode[nb]) { \
225      s <<= 1; \
226      s |= GET_BITS(1); \
227      nb++; \
228    } \

Something is missing here compared to jpeg_huff_decode(): variable nb could be > 16 and jpeg_huff_decode checks for that case and explains why.  Without that check, the following array is accessed with an nb > 16 compared to decode_mcu_slow() path.

229    s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) & 0xFF ]; \

Why was the & 0xFF added here?

I see that htbl->valoffset[17] has been set to 0 when initializing the huff tables, is that correct?

If so, then simplifying, this array access expression is logically s = array[ s & 0xff ].  I think "s" stands for "symbol", and valid symbols for an AC table are <= 255.

Are DC tables handled by this code too? DC tables have valid symbols <= 15 in my cursory reading of the T.81.

230  }

Comment 86

2 years ago
While I'm awaiting answers to Comment #85, I did try to rewrite the code as:

   228    } \
   +      if (nb > 16)
   +         goto bailout;
   229    s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ]; \
   230  }

and adjust decode_mdc_slow() to provide a "bailout" label to make it just fall back to slow decoding.

The result is stable decodes that match the slow path (good), but I'm measuring a small perf hit on Mac (not so good).

Comment 87

2 years ago
But that's the thing-- we haven't established that this is a "bug."  We've just established that a malformed JPEG can produce two variants in the decoded junk pixels, but that doesn't prove that uninitialized memory is actually being used, and I would think that valgrind would show something if it was.  In fact, I can explain why two variants are being generated, and it's my opinion that this behavior is harmless.

The accelerated Huffman codec avoids branches for performance reasons, and the code is carefully designed (in terms of instruction order) to perform well with the GCC and Visual C++ optimizers.  And FYI-- the performance numbers I'm quoting are the result of my extensive benchmarking with libjpeg-turbo.  I refer you to the performance study on libjpeg-turbo.org for further information.  I founded the libjpeg-turbo project and have been the principal developer and the maintainer of it in the 5 years hence, so I know from experience what effect fast vs. slow Huffman decoding has on overall performance.  The 20% drop in performance is documented in README-turbo.txt.

The fast Huffman decoder does produce slightly different behavior if an invalid Huffman code is encountered.  In the specific case that we're referring to here, packet sizes >= 800 will cause nb to be 17 for certain MCUs.  The slow Huffman decoder specifically detects that situation and inserts a 0 into the output.  But to avoid a branch, the fast Huffman decoder doesn't do that.  Rather, it handles the case in a different way.  If nb == 17, then htbl->valoffset[nb] will be 0, but s is likely to be a bogus (and large) value, so that's why the & 0xFF is there -- to ensure that (s + htbl->valoffset[nb]) <= 255 before using it to index htbl->pub->huffval[].

In this specific case, when the packet size >= 1286, some values of s are such that, when ANDed with 255, the resulting value points to a non-zero entry in the huffval[] table.  So the output byte is not 0 as it would be if the slow Huffman decoder was used, but I posit that this is not incorrect behavior-- it is just different behavior than what libjpeg traditionally had.

You could just as easily do the following to make the fast Huffman decoder produce one variant:

    if (nb > 16)  \
      s = 0;  \
    else  \
      s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ];  \

But of course that is introducing a branch as well, so it will also affect performance.  You could also clamp (s + htbl->valoffset[nb]) to 255 rather than ANDing it with 255 in order to get one variant, but clamping would involve a branch.

Comment 88

2 years ago
In short, I really do not believe that this is a security issue at all.  I think it's harmless behavior.

Comment 89

2 years ago
(In reply to DRC from comment #87)
> But that's the thing-- we haven't established that this is a "bug."  We've
> just established that a malformed JPEG can produce two variants in the

Two or more variants are produced, we agree, but not about whether it's a bug -- I'd prefer the browser to render the same way in all circumstances. That's just my opinion, Seth may have a different view. Anyway, bug or not, there's an issue filed in chrome and mozilla and ...

> decoded junk pixels, but that doesn't prove that uninitialized memory is
> actually being used, and I would think that valgrind would show something if
> it was.

valgrind might not work well unless the code under test is being packet-driven. Variants themselves don't prove uninitialized memory sure, but I don't know this code -- we're just trying to figure out if variant production causes an uninitialized memory access -- so my apologies if I have to ask dumb questions about this code. There were few details in the change description associated with the change for me to work it out eg., why the change was made, what were the pros / cons, was the change peer reviewed in public, how was it tested, and so on ... hence the questions and my reading the code ...

> In fact, I can explain why two variants are being generated,

to work out for myself that variant production was due to this missing piece in HUFF_DECODE_FAST
  
   if (nb > 16)  \

The change descriptions did not mention the possibility for variant production, or that that was a change in behavior relative to libjpeg6b.  Maybe an oversight, but I'd add such details to a change description if I were the one doing the changes (such things would be important to me to describe).  But if you knew how to explain it, might have been good to mention that much earlier in this bug.

> and it's my opinion that this behavior is harmless.

Be clear that I can have no opinion one way or the other.  All I need do is confirm it to be harmless, by analysis, and all my opinions aside.

> The accelerated Huffman codec avoids branches for performance reasons, and
> the code is carefully designed (in terms of instruction order) to perform
> well with the GCC and Visual C++ optimizers.  And FYI-- the performance
> numbers I'm quoting are the result of my extensive benchmarking with
> libjpeg-turbo.  I refer you to the performance study on libjpeg-turbo.org
> for further information.

Are you referring to http://www.libjpeg-turbo.org/About/Performance and five test images?  Five test image would not be an extensive test-case for me as a browser vendor. Perhaps you are referring to some other wider performance study on the libjpeg-turbo.org site?

> I founded the libjpeg-turbo project and have been
> the principal developer and the maintainer of it in the 5 years hence, so I
> know from experience what effect fast vs. slow Huffman decoding has on
> overall performance.  The 20% drop in performance is documented in
> README-turbo.txt.

I see nothing like a 20% difference in desktop browser: it's more like 5% and I can provide data, Seth may have similar data.  Some images decode slower with fast Huffman enabled, compare to not having it enabled, sometimes by as much as -20%.  The quoted 20% does not appear to me to be the common case in my testing.

Comment 90

2 years ago
(In reply to DRC from comment #87)

> The fast Huffman decoder does produce slightly different behavior if an
> invalid Huffman code is encountered.

Agree, I'm seeing a change in behavior relative to libjpeg6b.  Aside: if you do plan to move away from libjpeg6b compat, chromium-team would like a heads-up about that.

> In the specific case that we're
> referring to here, packet sizes >= 800 will cause nb to be 17 for certain
> MCUs.

Yes nb = 17 for some MCU, not sure about the "packet sizes >= 800" restriction yet.  You have to decode the MCU sometime (sufficient data) and you'll see its nb = 17 producing MCU eventually, regardless of packet size.  I noted nb = 17 happening for memory-to-memory decodes during experiments (the single packet case) for example.

>  The slow Huffman decoder specifically detects that situation and
> inserts a 0 into the output.

Yes, and issues a warning message too [1].  No warning issued in HUFF_DECODE_FAST, right?  A browser might use the warning to completely stop the decode in its tracks.  Seems we don't have that option on the fast path Huffman decoding though.

[1] https://github.com/mozilla/mozjpeg/blob/master/jdhuff.c#L476

> But to avoid a branch, the fast Huffman decoder doesn't do that.
> Rather, it handles the case in a different way.

Ok it was done for speed, and  

> If nb == 17, then htbl->valoffset[nb] will be 0, but s is likely to be a
> bogus (and large) value, so that's why the & 0xFF is there -- to ensure that
> (s + htbl->valoffset[nb]) <= 255 before using it to index
> htbl->pub->huffval[].

yes thanks for explaining the & 255 thing.  Keeps you within htbl->pub->huffval[] bounds.

Makes me wonder how that array it's allocated or whether its content is initialized from 0..255?

Comment 91

2 years ago
Created attachment 8594857 [details] [diff] [review]
libjpeg-turbo-1-4-80.huffman-table-access.svn.diff

So a quick patch to help understand how these htbl->pub->huffval[] things are being accessed during fast huffman decodes with HUFF_DECODE_FAST.

Comment 92

2 years ago
Created attachment 8594859 [details]
huffman-table-access-results.txt

Test fter building libjpeg-turbo-1-4-80.huffman-table-access.svn.diff

for i in jpeg-image-00*.jpg; do
  echo $i 1>&2 && ./djpeg -gif -debug "$i" > image.gif
done 2>huffman-table-access-results.txt

huffman-table-access-results.txt attached. Clamped "& 255" table accesses noted, for nb=17 only, for AC and DC tables, outside the table symbol range.

Comment 93

2 years ago
(In reply to DRC from comment #87)

The empty huffval[] entries are memset to 0 on creation [1]. Looks fine to me.

[1] https://github.com/mozilla/mozjpeg/blob/master/jdmarker.c#L475

> In this specific case, when the packet size >= 1286, some values of s are
> such that, when ANDed with 255, the resulting value points to a non-zero
> entry in the huffval[] table.  So the output byte is not 0 as it would be if
> the slow Huffman decoder was used.

Agree, and can confirm this with the last patch comment #91, remove the & 255 restriction on |where| and print out the value accessed from the table.  It's mostly 0, but sometimes not for some packets sizes, resulting in two or more variants.

The table memory accesses are _always safe_ due to the & 255.

> but I posit that this is not incorrect
> behavior-- it is just different behavior than what libjpeg traditionally had.

Yes, just different from libjpeg6b, sadly.

> You could just as easily do the following to make the fast Huffman decoder
> produce one variant:
> 
>     if (nb > 16)  \
>       s = 0;  \
>     else  \
>       s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ];  \

Possible, but it should also produce a warning message in that s=0; clause too for compat with jpeg_huff_decode (), right?

I figured it safer for using apps to fall back to the slow path and make it produce the warning message instead, this so the code did not produce two warnings (one on the fast path and then again on the slow path due to having to fall back to slow decoding later in the MCU block).

> But of course that is introducing a branch as well, so it will also affect
> performance.  You could also clamp (s + htbl->valoffset[nb]) to 255 rather
> than ANDing it with 255 in order to get one variant, but clamping would
> involve a branch.

Yes the extra branching leads to a perf hit, but would regain compat.  I can report that on WIN32 the cost of the simple fallback (to account to issuing one warning message only)

   +      if (nb > 16)
   +         goto bailout;

is neutral perf wise. On my macbook air, the cost is ~1% loss.

Comment 94

2 years ago
I expect is I measure on my macpro (not measured yet), that the perf hit would also be neutral.

(In reply to DRC from comment #88)
> In short, I really do not believe that this is a security issue at all.  I
> think it's harmless behavior.

Concur, not a security issue, just a compat issue.
(Reporter)

Comment 95

2 years ago
Interesting, thanks for the investigation - and sorry for the false alarm, 9 out of 10 issues with inconsistent decoding reported by AFL turned out to be security bugs due to uninit memory, and so it seemed like a natural assumption here.
Thanks for all of the analysis.
Flags: needinfo?(seth)
Keywords: csectype-disclosure, sec-high → sec-other

Comment 97

2 years ago
> Yes, just different from libjpeg6b, sadly.

> Possible, but it should also produce a warning message in that s=0; clause
> too for compat with jpeg_huff_decode (), right?

You're treating libjpeg v6b as a rather arbitrary standard, which it isn't.  The real question is not whether libjpeg-turbo is behaving in a manner that is 100% identical to libjpeg v6b.  The question is whether we're decoding the JPEG image correctly.  After all, the newer versions of libjpeg do not behave identically to libjpeg v6b either.  They don't even generate the same bitwise output as libjpeg v6b (we do, assuming that the image you're decoding is valid. :-) 

> Yes the extra branching leads to a perf hit, but would regain compat.  I can
> report that on WIN32 the cost of the simple fallback (to account to issuing
> one warning message only)
> 
>    +      if (nb > 16)
>    +         goto bailout;
> 
> is neutral perf wise. On my macbook air, the cost is ~1% loss.

If Mozilla considers this compat issue important enough to fix, then I will perform additional testing to verify the performance hit.  This represents a significant effort on my part, however.  In order to regression-test performance in libjpeg-turbo, I test an array of images on multiple CPU architectures and O/S's using 32-bit and 64-bit code.  I do not want to go to that trouble unless this fix is deemed important.  Based on my past experience, I strongly suspect that the above is likely to affect performance measurably in at least some situations.

Comment 98

2 years ago
(to clarify, I mean additional testing to verify the performance hit from your proposed patch.)
(In reply to DRC from comment #87)
> You could just as easily do the following to make the fast Huffman decoder
> produce one variant:
> 
>     if (nb > 16)  \
>       s = 0;  \
>     else  \
>       s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ];  \

Given that the branch is entirely predictable for non-malformed images
(IIUC), it is going to be essentially free in the common case.  Especially
if you put a branch hint on it, so that the common case (the else) is
not disadvantaged w.r.t. register allocation and code layout.

Maybe worth doing just to avoid inconsistent decoding?  I think that
apparent non-determinism makes people nervous, even if it turns out in
this case to be harmless.
Thanks Noel and DRC for your detailed investigation and explanations!

(In reply to Julian Seward [:jseward] from comment #99)
> Maybe worth doing just to avoid inconsistent decoding?  I think that
> apparent non-determinism makes people nervous, even if it turns out in
> this case to be harmless.

Yeah. The thing is, it *should* make people nervous, since it could easily indicate that we're reading uninitialized memory, which would be a serious security bug.

It'd be highly preferable to have this fixed, not because I think this particular bug is serious - it's clear that this behavior is harmless at this point - but because I'm concerned that keeping this behavior could lead us to miss a real, more serious bug. We should make sure that inconsistent decoding continues to be a useful signal that a security bug may exist. Given how widely used and important libjpeg-turbo is, we should make sure that catching security issues is as easy as possible.

(In reply to Julian Seward [:jseward] from comment #99)
> Given that the branch is entirely predictable for non-malformed images
> (IIUC), it is going to be essentially free in the common case.  Especially
> if you put a branch hint on it, so that the common case (the else) is
> not disadvantaged w.r.t. register allocation and code layout.

This is my expectation as well. DRC, it'd be great if you could validate this against your performance test suite.
Reading this, it seems that there is agreement that this is not a security bug. Does this mean that we can open this bug up to the public?

Comment 102

2 years ago
OK, I will test Noel's proposed patch for upstream inclusion and get back to you. It may take a few weeks, as I'm out of the country right now.

Comment 103

2 years ago
(In reply to Al Billings [:abillings] from comment #101)
> Reading this, it seems that there is agreement that this is not a security
> bug. Does this mean that we can open this bug up to the public?

Not sure yet.  Three images have been explained in terms of HUFF_DECODE_FAST and the "if (nb > 16)" test needed to stabilize decodes (good).

But I troubled by the trace in Comment #92 huffman-table-access-results.txt which says that one of the images, jpeg-image-002.jpg, does not even get Huffman decoding. The image is from:

  http://lcamtuf.coredump.cx/chjpeg/foo.html

and loading that test, I see that image drawn, then not in Firefox, and sometimes drawn, or not in Chrome.  I have no clue why yet.

(In reply to Andrew McCreight [:mccr8] from comment #96)
> Thanks for all of the analysis.

No problem, seems we might need some more Andrew.

(In reply to Michal Zalewski from comment #95)
> Interesting, thanks for the investigation - and sorry for the false alarm, 9
> out of 10 issues with inconsistent decoding reported by AFL turned out to be
> security bugs due to uninit memory, and so it seemed like a natural
> assumption here.

Multiple renderings are too often a sign of a security problem, and it's natural to assume the relation. jpeg-image-002.jpg is an interesting case in point, and unexplained. I can't get to it right now - maybe you could give that image some extra special attention and let me know your findings? I'm not sure we're out of the woods here yet.

Comment 104

2 years ago
(In reply to DRC from comment #97)
> > Yes, just different from libjpeg6b, sadly.
> 
> > Possible, but it should also produce a warning message in that s=0; clause
> > too for compat with jpeg_huff_decode (), right?
> 
> You're treating libjpeg v6b as a rather arbitrary standard, which it isn't.

It's the standard we have.  I'd prefer not to break existing applications.  Another example: the code that FILL_BIT_BUFFER_FAST depends on, GET_BYTE, does not handle a run of multiple 0xFF in an MCU, whereas the slow path code clearly does [1].  There maybe JPEG in the wild like that, since that code is old, and I'd prefer not to break those JPEG either.  Add the issue of emitting multiple warnings, and my conclusion is to just fall back and let the slow path code handle all these cases.

   if (nb > 16) \
      goto bailout; \
   s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ];

Given that code, I'd add https://github.com/mozilla/mozjpeg/blob/master/jdhuff.c#L476 (Comment #90) and 
https://github.com/mozilla/mozjpeg/blob/master/jdmarker.c#L475 (Comment #93), and note the sum was CVE-2013-6630 [2], the initial report of which echoes the discussion herein.

[1] https://github.com/mozilla/mozjpeg/blob/master/jdhuff.c#L327
[2] https://code.google.com/p/chromium/issues/detail?id=299835

Just something to watch for generally when doing changes to the common IJG (libjpeg6b) code in turbo: it has been very well vetted / scrutinized since 1998.  I think fall back to the slow path will avoid compat issues (at some cost to perf, TBD).

> The real question is not whether libjpeg-turbo is behaving in a manner that
> is 100% identical to libjpeg v6b.  The question is whether we're decoding
> the JPEG image correctly.

Having the same input data decode to different images doesn't seem correct to me.

> After all, the newer versions of libjpeg do not
> behave identically to libjpeg v6b either.  They don't even generate the same
> bitwise output as libjpeg v6b (we do, assuming that the image you're
> decoding is valid. :-)

Indeed, this path is not an option (rejected at the ITU-T, and hence not an option for Chrome either).  If turbo moves away libjpeg6b compat, in any of it's meanings, then chrome-team would like to know since we'd be left with fewer options given the ITU-T result.  Compat is important to us for many reasons other than pixels (see above: the standard we have), but drawing the same pixels for the same input data would be good start ;)

Nod to Seth & Juliean Comment #100, variant pixels make us nervous for same reasons mentioned.  Compat makes us less nervous, and I don't think we should give it up if the perf cost is about neutral.

Comment 105

2 years ago
Julian even, apols. Unfinished business: jpeg-image-002.jpg.

Comment 106

2 years ago
(In reply to noel gordon from comment #104)
> It's the standard we have.  I'd prefer not to break existing applications. 
> Another example: the code that FILL_BIT_BUFFER_FAST depends on, GET_BYTE,
> does not handle a run of multiple 0xFF in an MCU, whereas the slow path code
> clearly does [1].  There maybe JPEG in the wild like that, since that code
> is old, and I'd prefer not to break those JPEG either.  Add the issue of
> emitting multiple warnings, and my conclusion is to just fall back and let
> the slow path code handle all these cases.
> 
>    if (nb > 16) \
>       goto bailout; \
>    s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ];
> 
> Given that code, I'd add
> https://github.com/mozilla/mozjpeg/blob/master/jdhuff.c#L476 (Comment #90)

Am I missing something?  libjpeg-turbo already has that.  That line was inherited from jpeg-6b.

> and 
> https://github.com/mozilla/mozjpeg/blob/master/jdmarker.c#L475 (Comment
> #93), and note the sum was CVE-2013-6630 [2], the initial report of which
> echoes the discussion herein.

We also already have that, since version 1.3.1.

> Just something to watch for generally when doing changes to the common IJG
> (libjpeg6b) code in turbo: it has been very well vetted / scrutinized since
> 1998.  I think fall back to the slow path will avoid compat issues (at some
> cost to perf, TBD).

I have no problem with falling back to the slow path for corrupted JPEGs.  I have a big problem with falling back to the slow path for valid JPEGs, because the performance cost is not TBD-- it's 20%.  There are some cases in which we have to fall back to the slow path-- JPEGs with restart markers, for instance-- but I want to use the fast path whenever possible.

> Having the same input data decode to different images doesn't seem correct
> to me.

When the input data is bogus, the output data is undefined.  Placing a 0 byte in the output is simply what libjpeg has historically done, but that doesn't mean that the output pixels are any more or less valid than the ones libjpeg-turbo generates.  I would say they're equally invalid.  I do, however, agree with the argument that decoding to a single variant makes it easier to detect actual uninitialized memory issues.  In other words, libjpeg-turbo is basically generating a false positive here, and that's reason enough to change its behavior-- but I am also unwilling to do that upstream if it compromises performance by a significant amount, because this is an issue that does not affect the general use cases of libjpeg-turbo.  Thus, I need to vet your proposed solution in terms of performance, and it may be necessary for me to come back with a counter-proposal.  Stand by on that.

> Indeed, this path is not an option (rejected at the ITU-T, and hence not an
> option for Chrome either).  If turbo moves away libjpeg6b compat, in any of
> it's meanings, then chrome-team would like to know since we'd be left with
> fewer options given the ITU-T result.  Compat is important to us for many
> reasons other than pixels (see above: the standard we have), but drawing the
> same pixels for the same input data would be good start ;)

There may be a compelling reason at some point in the future for us to break bitwise compatibility with libjpeg v6b.  For instance, there are no real technical reasons anymore why we should provide three different DCT algorithms-- it's possible to implement a single integer algorithm that is as fast as the "fast" DCT but as accurate as the floating point DCT.  However, this is not something that would happen anytime soon, because it would require changes to all of the SIMD routines (and since I didn't write some of those, I am not in any mood to rewrite them unless someone pays me to.)  I would not ever implement SmartScale unless it was accepted as an official standard, and even then, it's very unclear as to what advantages it really offers (I refer you to the article on libjpeg-turbo.org for my opinions and research regarding that.)  If SmartScale ever was implemented, it would only be enabled if libjpeg-turbo was built with the libjpeg v8 compatibility option.

Comment 107

2 years ago
(In reply to DRC from comment #106)
> (In reply to noel gordon from comment #104)
> > the slow path code handle all these cases.
> > 
> >    if (nb > 16) \
> >       goto bailout; \
> >    s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) ];
> > 
> > Given that code, I'd add
> > https://github.com/mozilla/mozjpeg/blob/master/jdhuff.c#L476 (Comment #90)
> 
> Am I missing something?  libjpeg-turbo already has that.  That line was
> inherited from jpeg-6b.

Perhaps re-read the first entry in https://crbug.com/299835

The jdhuff.c#L476 line was mentioned (the fast path didn't use it, jpeg-6b did), jdmarker.c#L475 was the fix (in version 1.3.1.), and the report mentioned anther line as part of the problem

  #define HUFF_DECODE_FAST(s,nb,htbl) \
      ...
      s = htbl->pub->huffval[ (int) (s + htbl->valoffset[nb]) & 0xFF ]; \

We are familiar with these lines also in the current bug I think.

I like speed too - my point was I'd temper my desire for speed by being mindful of security when changing the IJG parts of the code.  It's well vetted code, so I'd take extra care when changing it, find peer reviewers, add tests etc, at least that's how I'd approach it :)  This was only friendly advice btw.

> > Just something to watch for generally when doing changes to the common IJG
> > (libjpeg6b) code in turbo: it has been very well vetted / scrutinized since
> > 1998.  I think fall back to the slow path will avoid compat issues (at some
> > cost to perf, TBD).
> 
> I have no problem with falling back to the slow path for corrupted JPEGs.  I
> have a big problem with falling back to the slow path for valid JPEGs,
> because the performance cost is not TBD-- it's 20%.

Markers drop you off fast path, always did.  Byte stuffing?  Nope.  So it's only markers I believe.  (One benefit of this bug is I have gained a better understand of this code).

> There are some cases in
> which we have to fall back to the slow path-- JPEGs with restart markers,
> for instance-- but I want to use the fast path whenever possible.

Understood, markers again. /me wants to use the fast path whenever possible too, and keep it secure and correct as best we can.

> > Having the same input data decode to different images doesn't seem correct
> > to me.
> 
> When the input data is bogus, the output data is undefined.  Placing a 0
> byte in the output is simply what libjpeg has historically done, but that
> doesn't mean that the output pixels are any more or less valid than the ones
> libjpeg-turbo generates.  I would say they're equally invalid.

I think they put a 0 byte in the output because, per the libjpeg code comment, it was the "safest" thing to do, with no added explanation about why it was so "safe".

> I do, however, agree with the argument that decoding to a single variant makes it
> easier to detect actual uninitialized memory issues.

Concur, it certainly does help with that case.

> In other words,
> libjpeg-turbo is basically generating a false positive here, and that's
> reason enough to change its behavior

Yeap.

> -- but I am also unwilling to do that
> upstream if it compromises performance by a significant amount, because this
> is an issue that does not affect the general use cases of libjpeg-turbo.

I believe the code can be both fast and correct.

> Thus, I need to vet your proposed solution in terms of performance, and it
> may be necessary for me to come back with a counter-proposal.  Stand by on
> that.

Happy to discuss a counter-proposal.  I have a few as well.

> > Indeed, this path is not an option (rejected at the ITU-T, and hence not an
> > option for Chrome either).  If turbo moves away libjpeg6b compat, in any of
> > it's meanings, then chrome-team would like to know since we'd be left with
> > fewer options given the ITU-T result.  Compat is important to us for many
> > reasons other than pixels (see above: the standard we have), but drawing the
> > same pixels for the same input data would be good start ;)
> 
> There may be a compelling reason at some point in the future for us to break
> bitwise compatibility with libjpeg v6b.

I was thinking that you might sometimes have the bitwise issue already, in that using accelerated h/w (SIMD/SSE/NEON/etc) maybe causes pixel changes?  So I only said "drawing the same pixels for the same input data", because maybe something in the accelerated h/w implementation details might not permit you to control, or easily correct, the pixel diffs.

> For instance, there are no real
> technical reasons anymore why we should provide three different DCT
> algorithms-- it's possible to implement a single integer algorithm that is
> as fast as the "fast" DCT but as accurate as the floating point DCT.

Interesting.

> However, this is not something that would happen anytime soon, because it
> would require changes to all of the SIMD routines (and since I didn't write
> some of those, I am not in any mood to rewrite them unless someone pays me
> to.)

Understood: time/money/resources etc.

> I would not ever implement SmartScale unless it was accepted as an
> official standard, and even then, it's very unclear as to what advantages it
> really offers (I refer you to the article on libjpeg-turbo.org for my
> opinions and research regarding that.)

I would not use it unless it passed peer review at the ITU.  I think I have read your article, thanks for writing it, and others about the issues surrounding the ITU decision. (Still looking on your site re: the perf stuff I asked about).

> If SmartScale ever was implemented,
> it would only be enabled if libjpeg-turbo was built with the libjpeg v8
> compatibility option.

SGTM.

Comment 108

2 years ago
(In reply to noel gordon from comment #104)

Now on to other business.  Anyone's antenna twitch on reading this bit?

> depends on, GET_BYTE,
> does not handle a run of multiple 0xFF in an MCU, whereas the slow path code
> clearly does [1].

The slow path never reads past the end of the input data buffer.  But GET_BYTE does not check for the end of the input buffer.  Could one set things up, such that the 'data' in an MCU, or the last MCU, of the compressed image input could be used to convince GET_BYTE to read past the end of the input buffer?

Interesting 'data' would be sequences of 0xFF00 0xFF00, sequences of maximum length code words, or indeed,
mixtures of the same.

Comment 109

2 years ago
Noel, just FYI, it is exceedingly difficult to process your replies, because you tend to dissect every sentence I write and reply to every sentence with a paragraph. Please endeavor to reply "in kind." It is not necessary to use five times as many words in your reply than what you are replying to. At some point, that crosses the line and becomes trolling. If I were to do the same thing, then this comment would be 25 pages long right now. I appreciate your input, but a discussion needs to involve input from others as well.

Comment 110

2 years ago
Don't mean to make if difficult for you to parse, but I want to answer each of your questions and concerns. An in-line reply mightn't work for you: a single line reply might seem like I'm ignoring you.

Comment 111

2 years ago
> depends on, GET_BYTE, does not handle a run of multiple 0xFF in an MCU, whereas the slow path code
> clearly does [1].

Correction (my bad).  0xFFFF is treated as a marker and dealt with on the slow path always.  Fine by me.

Comment 112

2 years ago
OK, cool.  So the action item is to me to evaluate your proposed fix, which I'll do as soon as I'm back in my lab.

Comment 113

2 years ago
Action item #108 begs your attention too. RE the lab: when do you expect to be there?

Comment 114

2 years ago
I am traveling on business and will return to my lab in 10 days' time.  I am working on this bug pro bono, so you're going to have to be patient.  The testing in question is going to require some hours of my time.  I have to be exceedingly careful about these things, because if I make a change that adversely affects performance in some way, it will affect virtually every Linux distro on the planet.

Regarding GET_BYTE(), please refer to the logic in decode_mcu().  The slow path is always used if there is less than (128 * blocks_in_MCU) bytes in the input buffer.  I assert that the algorithm is such that the fast decode path will never read more than this number of bytes when decoding a single MCU, but if you disagree, then let's discuss.

This code has undergone a tremendous amount of scrutiny by Mozilla and others over the years.  It has existed in the wild for 5 years, and whereas a few issues have been discovered with it during that time, the frequency of these bug reports is decreasing.  I appreciate your eagerness, but this really is quite stable code at this point.

Comment 115

2 years ago
re: GET_BYTE()a and (128 * blocks_in_MCU) limit.

For grayscale JPEG, blocks_in_MCU = 1, so fast path decoding is disabled at least < 128 bytes from the end of the input data stream. So by having exactly 128 bytes at the end of the compressed image, the fast path can be activated.  My various attempts at appending maximal length codeword sequences and/or 0xFF00 byte stuffed data into the last 128 bytes of the input stream of a grayscale JPEG image did make the fast path read, but only 96 bytes of input at most. I have no evidence at this time that FILL_BIT_BUFFER_FAST is unsafe (reads past the input data buffer).

Comment 116

2 years ago
re: jpeg-image-002.jpg

This image has never exhibited mutliple variants. Instead, it is either drawn or not drawn, which is different issue, perhaps a rendering bug, but yhere is no compat drawing that image in any browser (garbage in, garbage out), and there is no security issue with that image that I could find.

Comment 117

2 years ago
I measured the perf of the fall back fix in Chrome on four different machines using a corpus containing 142 JPEG images representative of the browser use-case and work load (including those JPEG in comment #57).  The results are about neutral.  I've rolled a fix into Chrome [1] and closed the chrome bug.

[1] https://crrev.com/c038962a97c41089b6ef09f498bfd934be95b5cc

Comment 118

2 years ago
I tested Noel's proposed patch, and whereas it is performance-neutral with 64-bit code, it unfortunately introduces a 5-8% regression with 32-bit code on both Intel and AMD processors.  I suspect Mozilla probably doesn't care about that regression, in which case you guys should feel free to check this in downstream, but unfortunately I can't include it upstream at this time, since the patch isn't actually fixing a bug but is rather just modifying the behavior of libjpeg-turbo vis-a-vis corrupt images to match what Mozilla/Chrome expect.  I don't have time to investigate the source of the performance regression or to tinker with other ways of doing this.
OK. We'll roll Noel's fix into our in-tree version of libjpeg-turbo as well.

On the off chance that this causes a regression, I'd prefer not to make the change so close to a release. So we'll make the change in Gecko 41 once the branch happens.

Comment 120

2 years ago
[1] https://crrev.com/c038962a97c41089b6ef09f498bfd934be95b5cc has been in Chrome stable 43 for a month. No bleeps from the PLT, we use the 32-bit jpeg decode code path only, and are not seeing anything like the regression reported on #118. So @ #119, yeah some independent measurements would be good data to have.

Comment 121

2 years ago
(In reply to Al Billings [:abillings] from comment #101)
> Reading this, it seems that there is agreement that this is not a security
> bug. Does this mean that we can open this bug up to the public?

Three of test cases on this bug were fixed by my patch.  But one test case was not, namely,

  http://lcamtuf.coredump.cx/ffjpeg

and it had multiple renderings in Firefox and Chrome for me.

I investigated: it's a different issue, it's not a security issue, it's a rendering bug, and I fixed it on https://code.google.com/p/chromium/issues/detail?id=500567 and it's open source.

I see no security issues in any of test cases added to this bug.  I have no objections to this bug being made public.

Updated

2 years ago
Group: core-security
(Assignee)

Comment 122

a year ago
Created attachment 8726999 [details] [diff] [review]
patch for m-c

This imports Noel's patch to blink to our copy of libjpeg-turbo, and updates the libjpeg-turbo update script to automatically apply this patch diff.
Assignee: nobody → tnikkel
Attachment #8726999 - Flags: review?(seth)
(Assignee)

Comment 123

a year ago
(In reply to noel gordon from comment #121)
> (In reply to Al Billings [:abillings] from comment #101)
> > Reading this, it seems that there is agreement that this is not a security
> > bug. Does this mean that we can open this bug up to the public?
> 
> Three of test cases on this bug were fixed by my patch.  But one test case
> was not, namely,
> 
>   http://lcamtuf.coredump.cx/ffjpeg
> 
> and it had multiple renderings in Firefox and Chrome for me.
> 
> I investigated: it's a different issue, it's not a security issue, it's a
> rendering bug, and I fixed it on
> https://code.google.com/p/chromium/issues/detail?id=500567 and it's open
> source.
> 
> I see no security issues in any of test cases added to this bug.  I have no
> objections to this bug being made public.

I'm assuming this was the jpeg-image-002.jpg issue that you mentioned earlier? The image on http://lcamtuf.coredump.cx/ffjpeg matches jpeg-image-003.jpg from the "broken-images.tar.gz 4 test images" attachment, so I'm a little confused.

I'll file another bug to import this change to our jpeg decoder. Thanks for your work on this Noel!
Flags: needinfo?(noel.gordon)

Comment 124

a year ago
(In reply to Timothy Nikkel (:tnikkel) from comment #123)
> I'm assuming this was the jpeg-image-002.jpg issue that you mentioned earlier?

Nope, jpeg-image-002.jpg came from http://lcamtuf.coredump.cx/chjpeg/foo.html (comment #103).

> The image on http://lcamtuf.coredump.cx/ffjpeg matches jpeg-image-003.jpg ...

Yes, and it's the image I was referring to in comment #118.  It is a progressive JPEG image, and a separate
patch was submitted to Chrome to handle that case, issue https://crbug.com/500567
Flags: needinfo?(noel.gordon)
(Assignee)

Comment 125

a year ago
(In reply to noel gordon from comment #124)
> (In reply to Timothy Nikkel (:tnikkel) from comment #123)
> > I'm assuming this was the jpeg-image-002.jpg issue that you mentioned earlier?
> 
> Nope, jpeg-image-002.jpg came from
> http://lcamtuf.coredump.cx/chjpeg/foo.html (comment #103).

The image I get from that page is 1448 bytes, and doesn't match anything in the broken-images.tar.gz attachment.

Comment 126

a year ago
Agree, the jpeg-image-002.jpg image should be 1448 bytes, with md5sum 221a3c9434b546a4c436f84746eeb87e, and the image in the tar.gz does not match, my mistake.  Please use the one from the test page.
Comment on attachment 8726999 [details] [diff] [review]
patch for m-c

Review of attachment 8726999 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.
Attachment #8726999 - Flags: review?(seth) → review+
https://hg.mozilla.org/mozilla-central/rev/d8190cbce7b0
Status: NEW → RESOLVED
Last Resolved: a year ago
status-firefox48: --- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
(Assignee)

Comment 129

a year ago
This landed on inbound before but didn't get recorded here.
https://hg.mozilla.org/integration/mozilla-inbound/rev/d8190cbce7b0
Group: gfx-core-security → core-security-release
(Assignee)

Comment 130

a year ago
(In reply to noel gordon from comment #124)
> > The image on http://lcamtuf.coredump.cx/ffjpeg matches jpeg-image-003.jpg ...
> 
> Yes, and it's the image I was referring to in comment #118.  It is a
> progressive JPEG image, and a separate
> patch was submitted to Chrome to handle that case, issue
> https://crbug.com/500567

I ported this patch over directly. It didn't fix the issue at http://lcamtuf.coredump.cx/ffjpeg. In fact it made it easier for me to reproduce the problem. Bug 1257101 has the patch to fix that page. I'll keep the patch around though in case we run into more jpeg image inconsistencies.
(Assignee)

Comment 131

a year ago
Created attachment 8731087 [details] [diff] [review]
patch from https://crbug.com/500567 ported
(Assignee)

Comment 132

a year ago
(In reply to Timothy Nikkel (:tnikkel) from comment #131)
> Created attachment 8731087 [details] [diff] [review]
> patch from https://crbug.com/500567 ported

This seems to fix this testcase

http://lcamtuf.coredump.cx/chjpeg/index2a.html

which I found while going through

https://bugs.chromium.org/p/chromium/issues/detail?id=398235

The testcase isn't fixed by any other recent patches in this area.

Updated

11 months ago
Whiteboard: [adv-main48-]
Group: core-security-release
You need to log in before you can comment on or make changes to this bug.