GIF Decoder: replace gathering buffer with dynamic malloc to fixed 256 bytes hold

VERIFIED FIXED

Status

()

defect
VERIFIED FIXED
15 years ago
12 years ago

People

(Reporter: alfredkayser, Assigned: alfredkayser)

Tracking

({memory-footprint, perf})

Trunk
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.8b4 -
blocking1.8b5 -

Firefox Tracking Flags

(Not tracked)

Details

Attachments

(1 attachment, 4 obsolete attachments)

As a followup to bug 274391, comment #27:

The GIF decoder uses a dynamically allocated and resizing 'gathering' buffer to
collect the 'blocks' of the gif file. Profiling learned that for 
GIF files bigger than 4K, each 4K crossing results in the allocation and
resizing of a gathering buffer to get the bytes at the end of the prev. 4k
together with the first bytes of the next 4K into a gathering buffer.

In a large animated GIF of 2MB, this results in about 1000 Allocs/reallocs...
Using a fixed hold buffer of 256 bytes, these allocs and reallocs can all be
prevented. In a gif file all items are at most 256 bytes, except for the colormaps.

The global_colormap can be directly copied to the available colormap in
gifstruct, and the local_colormap directly into its own allready dynamically
allocated colormap.

So, by replacing the dynamical gathering buffer by a fixed 256 byte buffer we
can save a lot allocs and reallocs. Also by preventing to switching to the
'gather_state' between each state, the state engine is twice as fast, as we only
need to 'gather' the block at the end and start of each 4K (passed to gif_write
by the streamers).

Removing some other redundant states, as well as some redundant variables
(clear_code is a static value, backgroundcolor is not needed, version can be
PRPackedBool), about 28 bytes from the gif_struct is saved.

Patch to be attached soon.
Blocks: 196295
Posted patch First edition (obsolete) — Splinter Review
This patch replaces the dynamically allocated 'gather' buffer with a 256 bytes
hold buffer allocated as part of the gif_struct. 
Global and local colormaps are directly copied into the resp. global_colormap
of the gif_struct, and the dynamically allocated local_colormap.
This also remove the now redundant 'gather' state, so that the decoder now
directly switches to the next state, instead of going through the 'gather'
state between each state.

Also removed the not used member mBackgroundColor.

The state engine is now managed by two variable: 'state' the state the decoder
want to go to (or is active in), and the 'hold_more' integer indicating how
many bytes should be available for the next state. So GETN(13, gif_init);
requests to go to the state 'gif_init', ensuring that at least 13 bytes are
available. 
In most cases this is in the current input buffer. 
If the input buffer is not enough the remain bytes are stored in the 'hold',
and at entry of the next gif_write, first the hold is completed to satisfy
'hold_more', to go the next state. 
This way, at most 256 bytes are required in the hold buffer, as each next state
block is never bigger than 256 bytes. (except for colormaps, but they are
copied directly in place).

This saves a lot of allocs (for a 80K gif image, about 80 allocs!), makes the
state engine itself faster, copies colormaps directly, and less code as well.
Note, also fixed the order of the members and initialisation of nsGIFDecoder2
(to remove the needless compiler warnings).
Also aligned comments of the members of gif_struct, that's why they all appear
in the diff.
Attachment #177387 - Flags: review?(pavlov)
Blocks: 124608
Blocks: 92580
Blocks: 105370
I think the only 1 bucket is needed after bug 274391 converted 3 mallocs into 1,
kGifAllocatorNBucket should be set to 1 at:
http://lxr.mozilla.org/mozilla/source/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp#60
(I'm commenting here because the other bug is fixed)
Yes, nBuckets can be reduced to 1, because it seems that there is always only 1
gif decoder active, but further proof is required.

It would be even better if the combination nsGIFdecoder and mGIFstruct can be
cached as a whole, particulary if normally at most 1 is active.
This seems to be related to the way the image decoders are 'created' in imgLoad.
Keywords: mlk
Keywords: mlkfootprint, perf
No longer blocks: 196295
As bug 196295 is progressing now, let's have this one after that one.

This bug covers replacement of dynamic gathering buffer with one static array
(256 bytes) in GIFstruct. This will then also remove some redundant states
(deadcode), as well as the then redundant 'gif_write_ready' function.

Also there some redundant and/or dead members of GIF2 class and GIFstruct that
be can cleaned (tpixel, is_transparent, control_extension, interlaced, etc..)
Depends on: 196295
Comment on attachment 177387 [details] [diff] [review]
First edition

Removing review request, awaiting a new patch after bug 196295 has landed,
which makes this one much easier to review!
Attachment #177387 - Flags: review?(pavlov)
Attachment #177387 - Flags: review?(tor)
What happened to the ordering relative to bug 196295?
This bug is not really dependent on 196295.
I thought to wait reviews of this one to after that one.
But seeing that 196295 is not progressing, I wanted to get at least
the review started of this one. 
If 196295 gets checked in meanwhile, I will update this patch accordingly...
Comment on attachment 177387 [details] [diff] [review]
First edition

Reading through the patch, it seems be a collection of number of different
cleanups/changes.  Please address one problem at a time.
Attachment #177387 - Flags: review?(tor) → review-
Attachment #177387 - Attachment is obsolete: true
Summary of patch (V2):
* Much smaller because of the removed other (very minor) optimizations
* Replaced hold, gather_head, gather_request_size, gathered, post_gather_state
with hold_more, hold_size, and hold[MAX_HOLD_SIZE].
* Logic is hold buffer is now:
  - Hold never holds more than 256 bytes (as all blocks, except colormaps are <=256)
  - hold_more is the number of bytes to get into the hold
  - hold_size is number of bytes allready in hold.
* The hold buffer is only required between gif_writes, because in the loop in
gif_write, as long as the input buffer as more bytes than needed (len =>
gs->hold_more) we consume directly from input buffer. If not enough is left in
input buffer, it is pushed into the hold, awaiting the next call to gif_write.
Then first the hold is completed with 'hold_more' for the current state.
* The colormaps are bigger than 256 but can directly be copied from input into
the corr. colormap. Except if input doesn't contain enough, than the rest is
copied, and the next call to gif_write first completes the colormap copy.
* This also means that the 'gif_write_ready' is no longer needed.
* Because of the gather into hold change, also the start state needed to be
changed, merging also the three states (gif_init, gif_type, gif_version) into
one (gif_type) to check the first 6 bytes.
* States gif_gather, gif_delay and gif_stop_animating are no longer required.
Attachment #187492 - Flags: review?(tor)
Comment on attachment 187492 [details] [diff] [review]
V2: Removed other cleanups; only do the gather buffer fix

>+  const PRUint8 *q = buf;
> 
>-  const PRUint8 *q, *p = buf, *ep = buf + len;
>+  // Add what we have sofar to the block
>+  // If last call to me, left something in the hold first complete current block
>+  // Or if we are filling the colormaps, 
>+  PRUint8* p = nsnull;
>+  switch (gs->state)
>+  {
>+    case gif_global_colormap: p = gs->global_colormap; break;
>+    case gif_image_colormap: p = gs->local_colormap; break;
>+    default: if (gs->hold_size) p = gs->hold; break;
>+  }

Please split the case statements on seperate lines.

>+  if (p) {
>+    // Add what we have sofar to the block
>+    PRUint32 l = len < gs->hold_more ? len : gs->hold_more;

Use PR_MIN.

>+    memcpy(p+gs->hold_size, buf, l);
>+    buf += l;
>+    len -= l;
>+    gs->hold_size += l;
>+    gs->hold_more -= l;
> 
>-  q = nsnull;                   /* Initialize to shut up gcc warnings */
>+    if (gs->hold_more > 0) {
>+      // Block in hold is not complete yet
>+      return PR_SUCCESS;
>+    }
>+
>+    gs->hold_size = gs->hold_more = 0;

Only need to zero hold_size, since hold_more will already be zero.
>+    q = gs->hold;
>+  }
>+
>+  // Invariant: q is start of current to be processed block
>+  // buf points to start of next block
>+  // len is number of bytes left in buffer (from position p)
>+  // gs->hold_more is number of bytes required from buffer for this block (0 when current block in 'hold')
>+  for (;len >= gs->hold_more; q=buf) {   
>+    // Eat the current block from the buffer, q keeps pointed at current block
>+    buf += gs->hold_more; 
>+    len -= gs->hold_more;

Something about this logic doesn't seem to add up.  If you're working out of
the hold buffer, len is going to be what was passed in minus hold_more, but you
only want to process hold_size (before you zeroed it) bytes from the hold
buffer.
Attachment #187492 - Flags: review?(tor) → review-
Changes:
> Please split the case statements on seperate lines: Ok, replaced by 'if'
statements, is shorter but as readable.
> Use PR_MIN: Ok
> Only need to zero hold_size, since hold_more will already be zero: Ok
> Something about this logic doesn't seem to add up:
It does, but I improved the logic as follows:
* First complete block in hold if required
* If still not complete, adjust hold_size and hold_more and go to next
gif_write
* Otherwise, go to for loop, with 'hold_more' being number of bytes to be
consumed from 'buf' for current block, and with 'q' pointing to start of
current block (either hold, colormap or buf), and 'buf' to input buffer at
start of bytes to be consumed...
* At entrance of for loop, adjust 'buf' and 'len' with 'hold_more' to really
consume the bytes from input buffer, and then process current block pointed to
by 'q'. At next loop, set 'q' to the adjusted 'buf', etc...

Note, 'hold' and 'hold_size' are only used from end of gif_write to start of
next gif_write.
Attachment #187492 - Attachment is obsolete: true
Attachment #187940 - Flags: review?(tor)
Oops, don't fix too much. Colormap copying reverted to v2, and fully tested to
work, including 'truecolor.gif'.
Attachment #187940 - Attachment is obsolete: true
Attachment #187941 - Flags: review?(tor)
Attachment #187940 - Flags: review?(tor)
Comment on attachment 187941 [details] [diff] [review]
v3b: Oops, colormap copying malfunction detected and fixed

This buffer/loop control fails the test of being easy to understand for someone
new to the code, but I'm not sure exactly how to fix that.  Renaming hold_size
might help ("held" or "held_bytes", maybe?).

Some minor nits: there's a couple misspellings of "already" in comments, and
some of the new code is missing spaces around operators.

With those fixed, r=tor.
Attachment #187941 - Flags: review?(tor) → review+
we need to triple check this patch for security holes before it lands
To help understanding the logic of the 'hold' buffer:
* Renamed hold_more to bytes_to_consume: # of bytes for current state to
consume from input buffer for the current state.
* Renamed hold_size to bytes_in_hold: # of bytes already accumulated into the
hold

Fixed spelling errors.
Tested with my set of large and animated gif's.

As for the triple security check, it is important, but I believe my way of
using
a fixed hold buffer is safer than using a dynamically allocated gathering
buffer.  For the fixed hold buffer, it is important to make sure that never
more bytes are put into it than allowed.  Check all the 'GETN' statements where
the 'bytes_to_consume' value is set. 
Three cases exists:
* fixed size, for fixed sized elements (1, 2, 7, etc).
* lzw, comment, blocks etc: length value is always limited to one byte so max.
255.
* colormaps, max. length is always (2<<7)*3 = 768 bytes, copied into the resp.
colormap buffers. global_colormap is fixed at 768, and local_colormap is
realloc to the actual length.
Attachment #187941 - Attachment is obsolete: true
Comment on attachment 189901 [details] [diff] [review]
V4: Comments from Tor incorporated

Stuart, can you do the superreview as well as the security check?

Thanks in advance, Alfred
Attachment #189901 - Flags: superreview?(pavlov)
Attachment #189901 - Flags: superreview?(pavlov) → superreview+
Having received both r and sr, the next step is to commit it to the three(s).

Who can do this for me?
At this point in the cycle you also need approval, which means we need to judge
if it's safe to take this for gecko 1.8.
Attachment #189901 - Flags: approval1.8b4?
Comment on attachment 189901 [details] [diff] [review]
V4: Comments from Tor incorporated

The risk/reward factor isn't good for getting this in 1.8... let's land it
early in 1.9.
Attachment #189901 - Flags: approval1.8b4? → approval1.8b4-
Who can check this for me into the 'trunk' three for 1.9?
checked in to trunk
Checking in modules/libpr0n/decoders/gif/GIF2.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/GIF2.cpp,v  <--  GIF2.cpp
new revision: 1.53; previous revision: 1.52
done
Checking in modules/libpr0n/decoders/gif/GIF2.h;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/GIF2.h,v  <--  GIF2.h
new revision: 1.21; previous revision: 1.20
done
Checking in modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp;
/cvsroot/mozilla/modules/libpr0n/decoders/gif/nsGIFDecoder2.cpp,v  <-- 
nsGIFDecoder2.cpp
new revision: 1.66; previous revision: 1.65
done
Thanks!
Alfred, is there more work to do in this bug?  If not, can you mark this fixed?
 Thanks.
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
As this is checked into trunk and verified to work, can I be so bold to request
it to also check it in into the branch?
Flags: blocking1.8b5?
Flags: blocking1.8b4?
Flags: blocking1.8b5?
Flags: blocking1.8b5-
Flags: blocking1.8b4?
Flags: blocking1.8b4-
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee: stuart → alfredkayser
Status: REOPENED → NEW
Status: NEW → RESOLVED
Closed: 14 years ago14 years ago
Resolution: --- → FIXED
Verified FIXED as best I can tell.  This has been in the trunk for a few weeks
now, without crashers or other strange bugs.

Build 2005-09-11-19 on Windows XP SeaMonkey trunk.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.