Closed Bug 1204394 Opened 4 years ago Closed 4 years ago

Use StreamingLexer in the BMP decoder

Categories

(Core :: ImageLib, defect)

defect
Not set

Tracking

()

RESOLVED FIXED
mozilla44
Tracking Status
firefox44 --- fixed

People

(Reporter: seth, Assigned: njn, Mentored)

References

Details

(Whiteboard: [gfx-noted])

Attachments

(2 files, 4 obsolete files)

It'd be nice to rework the BMP decoder to use StreamingLexer. A particular advantage of this is that currently, the ICO decoder has to actually synthesize a fake header to pass to BMP decoder, but if the BMP decoder used the StreamingLexer / method-per-state approach that the ICO decoder uses now, we could avoid that complexity easily and just start decoding in a different state for BMPs embedded in ICOs.

This would be a good first bug for a new contributor with good C++ skills.
Whiteboard: [gfx-noted]
Assignee: nobody → n.nethercote
The are some 4-space indents in this file which mostly uses 2-space indents.
Attachment #8670619 - Flags: review?(seth)
This is far enough along that some feedback would be helpful. In particular, I
have comments marked with "njn" in places where I'm unsure about things.

It's definitely an improvement over the old code, about 280 lines shorter and
easier to read.

I'm still iffy about test coverage. This format has lots of possible
combinations and I don't think they all have test coverage. (Indeed, I have
code to crash on some combinations that I haven't yet encountered.) I also
haven't thought carefully yet about avoiding problems (e.g. buffer overflows)
with malformed files.

There's some lack of clarity about what versions of these files are supported.
The types mention OS2, WIN_V3, WIN_V5. But the code seems to indicate the
following:

- BITMAPCOREHEADER / OS21XBITMAPHEADER: supported

- OS22XBITMAPHEADER: unsupported? not entirely sure

- BITMAPINFOHEADER: supported

- BITMAPV2INFOHEADER: supported

- BITMAPV3INFOHEADER / BITMAPV4HEADER / BITMAPV5HEADER: partly supported -- the
  extra fields these versions have beyond what BITMAPV2INFOHEADER has are
  ignored, though.

The patch is a nightmare to read. That seems unavoidable, unfortunately. It's
probably easiest to review just by opening the old nsBMPEncoder.cpp and the new
one side by side.
Attachment #8670621 - Flags: review?(seth)
> There's some lack of clarity about what versions of these files are
> supported.

Ah, the version naming used at https://en.wikipedia.org/wiki/BMP_file_format is a bit confusing. The naming used at http://www.fileformat.info/format/bmp/egff.htm is clearer and matches up better with what's in the code.
Attachment #8670621 - Flags: review?(seth)
I talked with seth on IRC and no longer need feedback on part 2.
Updated version. Looking much better, but still a bit more work to do.
Attachment #8670621 - Attachment is obsolete: true
seth: I found http://entropymine.com/jason/bmpsuite/bmpsuite/html/bmpsuite.html which is a great suite of test BMP images, and AFAICT it covers lots of dark corners that our existing tests do not. (I haven't found any 16 bpp BMP files in our tests, for example.) Do you think it would be reasonable to import it into reftests?

The code that generates the images is GPLv3, so hopefully this is legally ok (and I'm happy to check this).
Attachment #8670619 - Attachment is obsolete: true
Attachment #8670619 - Flags: review?(seth)
> The code that generates the images is GPLv3, so hopefully this is legally ok
> (and I'm happy to check this).

Looking more closely, https://github.com/jsummers/bmpsuite/blob/master/readme.txt says "Image files generated by this program are not covered by this license, and are in the public domain."
Apologies for the hugeness of this patch. The StreamingLexer/WriteInternal
changes were large enough that I figured there was no point separating the
other changes. Applying the patch and reading the resulting code might be
easiest, at least for large chunks of it.
Attachment #8671748 - Flags: review?(seth)
Attachment #8671235 - Attachment is obsolete: true
This test suite has greatly increased my confidence in the new nsBMPDecoder
code.
Attachment #8671749 - Flags: review?(seth)
I was wondering, does our ICO decoder use the BMP decoder when the ICO file contains BMP data? I was thinking it would be nice if the test suite applies to both.
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #10)
> I was wondering, does our ICO decoder use the BMP decoder when the ICO file
> contains BMP data?

It does. See comment 0, for example, though I haven't made that suggested change here.
Try looks good:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=86f1a22a193f
https://treeherder.mozilla.org/#/jobs?repo=try&revision=fdcc39cacded

(Yes, I could have avoided the Linux overlap in the second job, but I didn't think of it until afterwards. Oh well.)
Comment on attachment 8671749 [details] [diff] [review]
(part 2) - Add bmpsuite to the BMP reftests

Review of attachment 8671749 [details] [diff] [review]:
-----------------------------------------------------------------

This is great, Nick. Thanks for putting this together!

::: image/test/reftest/bmp/bmpsuite/README.mozilla
@@ +22,5 @@
> +
> +- The first line starts with "BMP:" and is the output of the MOZ_LOG call in
> +  nsBMPDecoder.cpp. It has basic image info.
> +
> +- Next is a quote from the bmpsuite docs, which describes the particulars of

Do we know the license of the bmpsuite docs themselves?

::: image/test/reftest/bmp/bmpsuite/g/reftest.list
@@ +76,5 @@
> +# "A 16-bit image with the default color format: 5 bits each for red, green, and
> +# blue, and 1 unused bit. The whitest colors should (I assume) be displayed as
> +# pure white: (255,255,255), not (248,248,248)."
> +# [XXX: we do the cheap scaling that gives (248,248,248) for white, which
> +# explains the overly generous fuzzy values. Chromium does proper scaling.]

We should probably file a followup for this.

@@ +112,5 @@
> +# "A 32-bit image with a BITFIELDS segment. As usual, there are 8 bits per color
> +# channel, and 8 unused bits. But the color channels are in an unusual order,
> +# so the viewer must read the BITFIELDS, and not just guess."
> +# [XXX: we get this completely wrong because we don't handle BITFIELDS at all
> +# in 32bpp BMPs. Chromium gets this right.]

This too.

::: image/test/reftest/bmp/bmpsuite/q/reftest.list
@@ +73,5 @@
> +# "A 16-bit image with an alpha channel. There are 4 bits for each color
> +# channel, and 4 bits for the alpha channel. It’s not clear if this is valid,
> +# but I can’t find anything that suggests it isn’t."
> +# [XXX: we don't even try to do transparency for 16bpp. Chromium gets the
> +# transparency right.]

This too.
Attachment #8671749 - Flags: review?(seth) → review+
Comment on attachment 8671748 [details] [diff] [review]
(part 1) - Using StreamingLexer in the BMP decoder

Review of attachment 8671748 [details] [diff] [review]:
-----------------------------------------------------------------

Consider all the changes outside of nsBMPDecoder.(cpp|h) r+'d. I'll make a second pass over this patch later to review those changes.
Blocks: 1213613
(In reply to Seth Fowler [:seth] [:s2h] from comment #13)
> Comment on attachment 8671749 [details] [diff] [review]
> (part 2) - Add bmpsuite to the BMP reftests
> 
> Review of attachment 8671749 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> This is great, Nick. Thanks for putting this together!
> 
> ::: image/test/reftest/bmp/bmpsuite/README.mozilla
> @@ +22,5 @@
> > +
> > +- The first line starts with "BMP:" and is the output of the MOZ_LOG call in
> > +  nsBMPDecoder.cpp. It has basic image info.
> > +
> > +- Next is a quote from the bmpsuite docs, which describes the particulars of
> 
> Do we know the license of the bmpsuite docs themselves?

I guess it's GPLv3 because it's within the code which is GPLv3. You could probably argue that these quotes are fair use.

I've also copied some of the reference PNGs. I also generated a couple of them myself (using Firefox's new "screenshot a DOM element" feature).

I will ask Gerv about all these.


> ::: image/test/reftest/bmp/bmpsuite/g/reftest.list
> @@ +76,5 @@
> > +# "A 16-bit image with the default color format: 5 bits each for red, green, and
> > +# blue, and 1 unused bit. The whitest colors should (I assume) be displayed as
> > +# pure white: (255,255,255), not (248,248,248)."
> > +# [XXX: we do the cheap scaling that gives (248,248,248) for white, which
> > +# explains the overly generous fuzzy values. Chromium does proper scaling.]
> 
> We should probably file a followup for this.
> This too.
> This too.

I filed bug 1213613 for all of them. They're all related so I will likely fix them all in one bug.
Gerv: I have a question about including third-party data and code into our codebase.

bmpsuite (https://github.com/jsummers/bmpsuite/) is a really nice test suite for BMP images and I want to import parts of it. (See the attached "part 2" patch.)

bmpsuite consists of (a) code to generate the test BMP images, (b) some reference PNG images used to compare against the test BMP images, and (c) some documentation.

bmpsuite's readme.txt says it is GPLv3 but specifically says that the images it code generates are in the public domain.

So, the parts of interest in my patch...

(0a) I've documented that we're using content from bmpsuite in the README.mozilla file, and I linked to the github page, so attribution is present.

(0b) I didn't include the bmpsuite code for generating the BMP images. (I just ran that code locally to generate them.)

(1) I included the generated BMP images, which are public domain and so should be fine.

(2) I included some quotes from the documentation ((c) above). I'd suggest they are fair use but Plan B would be to remove them and just point to the online documentation if necessary, which would be a mild inconvenience. Or I could rewrite the descriptions to not be direct quotes.

(3) I included some of the reference PNG images ((b) above). Plan B would be to regenerate these via screen shots of our renderings of the BMP images, if necessary. If I did that there's a good chance they'd be exactly the same as the original, or at least, extremely similar.

I'm confident that (1) is fine. What about (2) and (3)? If GPLv3 is compatible with MPL2 then we should be good. If not, things are a little trickier and Plan B might be necessary. What do you think?

Thank you.
Flags: needinfo?(gerv)
Comment on attachment 8671748 [details] [diff] [review]
(part 1) - Using StreamingLexer in the BMP decoder

Review of attachment 8671748 [details] [diff] [review]:
-----------------------------------------------------------------

This came out really well, Nick! Thanks for putting so much effort into this!

r+ with the additional |mCurrentRow == 0| checks mentioned below and a name change for |EndOfRow()|. The other stuff is optional but recommended. =)

::: image/decoders/nsBMPDecoder.cpp
@@ +249,5 @@
> +  mBitFields.blueLeftShift = 8 - length;
> +}
> +
> +void
> +nsBMPDecoder::EndOfRow()

This name sounds like a getter to me. Maybe "FinishRow()"?

@@ +528,5 @@
>  
> +LexerTransition<nsBMPDecoder::State>
> +nsBMPDecoder::ReadColorTable(const char* aData, size_t aLength)
> +{
> +  MOZ_ASSERT_IF(aLength != 0, mColors);

Maybe also MOZ_ASSERT_IF(aLength != 0, mNumColors > 0)?

@@ +572,5 @@
> +  const uint8_t* p = reinterpret_cast<const uint8_t*>(aData);
> +  uint32_t* d = mDownscaler
> +              ? reinterpret_cast<uint32_t*>(mDownscaler->RowBuffer())
> +              : reinterpret_cast<uint32_t*>(mImageData) +
> +                  PIXEL_OFFSET(mCurrentRow, 0);

It might be good to abstract this into a private method |GetRowBuffer()| which would do this calculation based upon |mCurrentRow| and |mCurrentPos| and return the appropriate pointer. There are two other places where we do this computation, so it'd be nice to share the code. We do this in some of the other decoders - for example, nsGIFDecoder2.

|ReadPixelRow()| leaves |mCurrentPos| at 0, so even though it doesn't use |mCurrentPos| in its calculation, it doesn't seem like we need to handle it specially.

@@ +681,5 @@
> +    if (pixelsNeeded) {
> +      uint32_t* d = mDownscaler
> +        ? reinterpret_cast<uint32_t*>(mDownscaler->RowBuffer()) + mCurrentPos
> +        : reinterpret_cast<uint32_t*>(mImageData) +
> +            PIXEL_OFFSET(mCurrentRow, mCurrentPos);

This is another place where a new |GetRowBuffer()| method could be called.

@@ +701,5 @@
> +
> +  if (byte2 == RLE::ESCAPE_EOL) {
> +    EndOfRow();
> +    mCurrentPos = 0;
> +    return Transition::To(State::RLE_SEGMENT, RLE_SEGMENT_LENGTH);

We should probably be checking if |mCurrentRow == 0| here and transitioning to State::SUCCESS if so, rather than relying on the check at the the top of this method.

@@ +765,5 @@
> +      mDownscaler->CommitRow();
> +    }
> +  }
> +
> +  return Transition::To(State::RLE_SEGMENT, RLE_SEGMENT_LENGTH);

Again, seems like we should be checking if |mCurrentRow == 0| here and transitioning to State::SUCCESS.

@@ +785,5 @@
> +  // which contains the color index of a single pixel.
> +  uint32_t* dst = mDownscaler
> +    ? reinterpret_cast<uint32_t*>(mDownscaler->RowBuffer()) + mCurrentPos
> +    : reinterpret_cast<uint32_t*>(mImageData) +
> +        PIXEL_OFFSET(mCurrentRow, mCurrentPos);

This is another place where a new |GetRowBuffer()| method could be called.

::: image/decoders/nsBMPDecoder.h
@@ +104,3 @@
>  
> +    uint32_t mNumColors;      // The number of used colors, i.e. the number of
> +                              //   entries in mColors, if it's present.

Micronit: Are you intentionally using hanging indent here?

@@ +117,5 @@
>  
> +    int32_t mCurrentRow;      // Index of the row of the image that's currently
> +                              // being decoded: [height,1].
> +    int32_t mCurrentPos;      // Index into the current line; only used when
> +                              //   doing RLE decoding.

(Also here?)
Attachment #8671748 - Flags: review?(seth) → review+
If you can set up the decoder to run stand-alone, then fuzzing it with something like American Fuzzy Lop ( https://en.wikipedia.org/wiki/American_fuzzy_lop_%28fuzzer%29 ) might be useful.
Judging from the many existing cases in the Mozilla codebase, importing GPLv3
code is fine. I've updated the README.mozilla file in this patch to
specifically describe the GPLv3 things we're including (some documentation
quotes and some reference PNG images) and I also added a COPYING.txt file to
the directory. So I think this should be good enough.
Attachment #8671749 - Attachment is obsolete: true
> @@ +572,5 @@
> > +  const uint8_t* p = reinterpret_cast<const uint8_t*>(aData);
> > +  uint32_t* d = mDownscaler
> > +              ? reinterpret_cast<uint32_t*>(mDownscaler->RowBuffer())
> > +              : reinterpret_cast<uint32_t*>(mImageData) +
> > +                  PIXEL_OFFSET(mCurrentRow, 0);
> 
> It might be good to abstract this into a private method |GetRowBuffer()|
> which would do this calculation based upon |mCurrentRow| and |mCurrentPos|
> and return the appropriate pointer. There are two other places where we do
> this computation, so it'd be nice to share the code. We do this in some of
> the other decoders - for example, nsGIFDecoder2.

Good idea. That'll let me remove the LINE and PIXEL_OFFSET macros, too.
 
> |ReadPixelRow()| leaves |mCurrentPos| at 0, so even though it doesn't use
> |mCurrentPos| in its calculation, it doesn't seem like we need to handle it
> specially.

I'll add an assertion that mCurrentPos is zero in ReadPixelRow().


> > +    uint32_t mNumColors;      // The number of used colors, i.e. the number of
> > +                              //   entries in mColors, if it's present.
> 
> Micronit: Are you intentionally using hanging indent here?

I am -- it's better than the weird "<" prefix used in the old comments -- but it's not really necessary so I'll remove it.


I've addressed the other comments as you suggested.
Comment on attachment 8672833 [details] [diff] [review]
(part 2) - Add bmpsuite to the BMP reftests

Carrying over r+ from Seth, adding sr? request for Gerv.
Attachment #8672833 - Flags: superreview?(gerv)
Attachment #8672833 - Flags: review+
Blocks: 1214476
Sorry I'm late to the party here.

(In reply to Nicholas Nethercote [:njn] from comment #19)
> Judging from the many existing cases in the Mozilla codebase, importing GPLv3
> code is fine. 

Not sure where you got that idea - we definitely have no GPLv3 code in the main Firefox or B2G codebases. We may have the odd bit in independent tools. But there's certainly not "many cases" AFAIK.

> I've updated the README.mozilla file in this patch to
> specifically describe the GPLv3 things we're including (some documentation
> quotes and some reference PNG images) and I also added a COPYING.txt file to
> the directory. So I think this should be good enough.

This stuff is all standalone and for tests only, so that's OK, as long as it's appropriately documented. So that sounds fine.

Gerv
Flags: needinfo?(gerv)
(In reply to Gervase Markham [:gerv] from comment #22)
> >
> > Judging from the many existing cases in the Mozilla codebase, importing GPLv3
> > code is fine. 
> 
> Not sure where you got that idea - we definitely have no GPLv3 code in the
> main Firefox or B2G codebases. We may have the odd bit in independent tools.
> But there's certainly not "many cases" AFAIK.

If you search for "version 3 of the License" (https://dxr.mozilla.org/mozilla-central/search?q=%22version+3+of+the+License%22&redirect=false&case=true) you get 41 hits. But they do look to mostly (all?) be in tests or the build system, i.e. not part of released code.

Thank you for the approval.
https://hg.mozilla.org/mozilla-central/rev/0c84122fa187
https://hg.mozilla.org/mozilla-central/rev/609fa3c01584
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla44
(In reply to Nicholas Nethercote [:njn] from comment #24)
> If you search for "version 3 of the License"
> (https://dxr.mozilla.org/mozilla-central/
> search?q=%22version+3+of+the+License%22&redirect=false&case=true) you get 41
> hits. But they do look to mostly (all?) be in tests or the build system,
> i.e. not part of released code.

Just FYI, what's fooling you there is that many of these are autoconf files. While these are technically GPLv3, they have a broad exception to them which allows you to distribute them under any other license.

See https://dxr.mozilla.org/mozilla-central/source/build/autoconf/config.guess for an example; see the para starting "As a special exception...".

True GPLv3 files are much rarer.

Gerv
Attachment #8672833 - Flags: superreview?(gerv) → superreview+
Depends on: 1237171
Depends on: 1267224
Blocks: 1284031
See Also: → 1422979
You need to log in before you can comment on or make changes to this bug.