Closed Bug 166862 Opened 19 years ago Closed 9 years ago

Valid (but huge) images crash mozilla in nsImageGTK::Init

Categories

(Core Graveyard :: Image: Painting, defect)

x86
Linux
defect
Not set
critical

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: tenthumbs, Unassigned)

References

Details

(Keywords: crash)

Attachments

(5 files, 4 obsolete files)

I hope this is the right component.

This is a companion to bug 153621.

All the common image formats (JPEG, GIF, and PNG) allow at least 16 bit
dimensions, but nsImageGTK assumes 1) that all calculations will fit into
int32's and 2) that all allocations will succeed. Neither is true on Linux 
so it's easy to crash mozilla.

In nsImageGTK::Init there is the code

  // create the memory for the image
  ComputeMetrics();

  mImageBits = (PRUint8*) new PRUint8[mSizeImage];


ComputeMetrics is in nsImageGTK.h and it does a nice job but it doesn't 
check for overflows so mSizeImage can be negative or the right value mod 
2^32. Neither case is good.

Mozilla doesn't use C++ exceptions so there's no way to trap operator new[].
If the underlying allocation fails you get system-dependent behavior. On 
Linux, the default exception handlers in libstdc++ just abort the process. 
Not good either.

There's also the question of whether mozilla should try to load a huge image
in the first place. The maximum contiguous space theoretically available to
a typical 32-bit Linux process is somewhat less than 2GB so loading the 
largest images is impossible. You probably don't want to load anything 
anywhere near that big so there has to be some limit imposed. Personally, I 
think there should be a pref.

How to fix this? The first thing is to change ComputeMetrics so it doesn't
overflow. It should return a value indicating success or failure. Since most
images aren't big, two execution paths should work:

   if (mWidth < 8192 && mHeight < 8192)
   {
       ... do fast integer calculations and test if small enough
   }
   else
   {
       ... do slow double precision floating point calculations and test
   }

Of course new[] has to be replaced by some flavor of malloc. This should be 
easy but I don't know for sure.

Because I can crash mozilla at will with valid data, setting severity to
critical.
Keywords: crash
>Of course new[] has to be replaced by some flavor of malloc. This should be 
>easy but I don't know for sure.

hm, not sure if that is correct. I think we're somehow telling the compiler to
return NULL, or maybe we have overridden operator new. anyway, I think the
result should be null-checked.

are you sure that mozilla terminates because of an uncaught exception?
> hm, not sure if that is correct. I think we're somehow telling the
> compiler to return NULL, or maybe we have overridden operator new.
> anyway, I think the result should be null-checked.

On Linux new[] is handled in libstdc++ and it _always_ installs default 
exception handlers. The compiler has no control over the defaults.

In the situation here, operator new[] calls operator new which uses the
system malloc. If that returns NULL then the default handler aborts the
process. You are supposed to change this behavior with C++ try/catch
exception handling. Mozilla doesn't want to do that so it can't use new.

You also can't do null checking because you never see a null return.


> are you sure that mozilla terminates because of a uncaught exception?

Absolutely, at least for bad_alloc. Here's the last part of the stack trace
when I point mozilla at a 20000x20000 PNG.

#0 0x4055af41 in __kill () from /lib/libc.so.6
#1 0x40280d79 in pthread_kill (thread=1024, signo=6) at signals.c:65
#2 0x402811c7 in raise (sig=6) at signals.c:226
#3 0x4055c223 in abort () at ../sysdeps/generic/abort.c:88
#4 0x40513ec4 in __cxxabiv1::__terminate(void (*)()) (handler=0x4055c160
   <abort>) at
   /mnt1/GCC/3.1/gcc-3.1.1/libstdc++-v3/libsupc++/eh_terminate.cc:47
#5 0x40513f11 in std::terminate() () at
   /mnt1/GCC/3.1/gcc-3.1.1/libstdc++-v3/libsupc++/eh_terminate.cc:57
#6 0x40514081 in __cxa_throw () at
   /mnt1/GCC/3.1/gcc-3.1.1/libstdc++-v3/libsupc++/eh_throw.cc:77
#7 0x4051426f in operator new(unsigned) (sz=1079119808) at
   /mnt1/GCC/3.1/gcc-3.1.1/libstdc++-v3/libsupc++/new_op.cc:54
#8 0x4051436c in operator new[](unsigned) (sz=1200000000)
   at /mnt1/GCC/3.1/gcc-3.1.1/libstdc++-v3/libsupc++/new_opv.cc:36
#9 0x41243374 in nsImageGTK::Init(int, int, int, nsMaskRequirements)
   (this=0x8593010, aWidth=20000, aHeight=20000, aDepth=24,
   aMaskRequirements=nsMaskRequirements_kNoMask) at nsImageGTK.cpp:196

My system didn't have 1200000000 bytes available, so malloc failed and the 
process bombed out immediately.

Note that Windows behavior is different. There, almost all requests succeed 
but fulfilling them can be a problem.
Attached file test code
Here's a simple exception handler test for Linux. If you run it with no
arguments, you get the default bad_alloc handler and it should abort. If there
are any arguments, it sets its own handler and should succeed. If you compile
it with -fno-exceptions then it crashes in either case.
Attached patch msImageGTK patch (obsolete) — Splinter Review
This is a patch for discussion. Mozilla no longer aborts on huge images but now
claims the images are corrupt. It's a start.
Since it's not possible to allocate a contiguous 2GB block on a typical Linux, I
decided to make that the maximum for an image. There probably should be a pref
for a smaller limit. I wouldn't want mozilla to allocate 200MB for an image just
because some idiot web page wanted to.
Comment on attachment 101151 [details] [diff] [review]
msImageGTK patch

+    {	   // slow way; could use long longs.

why not use PRUint64.

also, why fail if the image is more than 4 GB. 64 bit platforms can handle >4GB
allocations just fine. I'm against introducing arbitrary limits...
Intel x86 chips have few registers so their 64-bit support is marginal. The 
fpu stack has a lot more. These days there's little performance difference. 
It's not all that important here, anyway.

4GB is the theoretical maximum for a 32-bit address space. It's almost always 
less. A typical Linux has only a 3GB process space. Since almost all mozillas 
run on 32-bit systems I don't think it's worth allowing such sizes.
Right now I think it's more useful to find a better error for this case. 
Perhaps something like: "%s cannot be loaded because of a system allocation 
failure."
>4GB is the theoretical maximum for a 32-bit address space

this is why I said "on 64 bit platforms".

I would not do this check at all, and wait until later if the malloc fails.
Note that the goodies like mSizeImage are all int32's so they can't hold more
than 2^31-1. That's the constraint I apply in ComputeMetrics; I just do it in
two stages. If you want to rewrite imagelib to use larger size variables, be
my guest. That's far beyond my abilities.
Attached patch improved patch (obsolete) — Splinter Review
Better error return value. Mozilla still says the image has errors but that's
coming from content and I don't have a clue how to fix that.
Attachment #101151 - Attachment is obsolete: true
Attached patch the right improved patch (obsolete) — Splinter Review
It helps a lot if I use the right patch.
Attachment #101397 - Attachment is obsolete: true
Comment on attachment 101406 [details] [diff] [review]
the right improved patch

a few comments:
+  // Right now 2GB-1 sizes are allowed
I don't think that comment makes sense at the place where it is. because at
that place, no such check is done.

Personally, I prefer if (!mImageBits) to if (mImageBits == nsnull)... but, this
doesn't really matter

finally, please use PRInt64/PRUint64 rather than double for the calculation.
see http://www.mozilla.org/projects/nspr/reference/html/Prtyp.html#14775
using a double for an integer type is just a dirty hack, imho.
> I don't think that comment makes sense at the place where it is. because 
> at that place, no such check is done.

I clarified it. The whole point is that a check should be done there.

> Personally, I prefer if (!mImageBits) to if (mImageBits == nsnull)... 
> but, this doesn't really matter

I always treat typedef'ed and #define'ed items as opaque quantities so I 
always compare to them explicitly. If nsnull were always 0 then it 
wouldn't be necessary or desirable. It's not essential that nsnull = 0 so 
your form isn't guaranteed to be identical to mine.

> finally, please use PRInt64/PRUint64 rather than double for the 
> calculation.

Why not. At one point I was worried about overflow but now that it's 
apparent that that's not going to happen, int64 is fine. It's slower than 
the floating point, though. Besides, the last patch was broken.

I'll attach a new patch but it's still not ready. There's the pref to 
limit the size. There's also the question of what to return if 
ComputeMetrics fails. Right now it returns NS_ERROR_OUT_OF_MEMORY which 
isn't really right because ComputeMetrics fails for administrative reasons 
not some system fault. I think I know how to create a new error number but 
I don't know how to create a string nor do I know how to report it to the 
user.

Returning NS_ERROR_OUT_OF_MEMORY for the malloc failures isn't really 
right either. It's not that the system is out of memory, it's that there 
isn't enough memory for the particular image. Same issues as previously, 
creating a new error and reporting it.

Finally, it's still an abomination that mozilla claims that the image is 
broken. An lxr search for "InvalidImage" shows that both layout and content 
have their sticky fingers all over this. In fact, it's content that rather 
arrogantly assumes that any load failure must be the image's fault.

All of this needs to be dealt with.

BTW, the other platforms all have the equivalent of ComputeMetrics and 
they all have the same overflow problems in the calculations. Someone 
should deal with that.

Attached patch patch 3 — Splinter Review
Attachment #101406 - Attachment is obsolete: true
firstly, no, that's not how you use PRInt64. please read the URL I gave you. you
have to use macros for things like multiplication.

next, you can use PR_INT32_MAX instead of 2147483647

+      rb64= tmp64 >> 5 + (tmp64 & 0x1F) ?  1 : 0;
this could really do with a few parentheses
> firstly, no, that's not how you use PRInt64. please read the URL I gave you.

Actually, that's the wrong URL, you should have said prlong.html. I did look
at that and it's very icky. Is there any platform using gtk where the compiler
doesn't understand int64?

> +      rb64= tmp64 >> 5 + (tmp64 & 0x1F) ?  1 : 0;
> this could really do with a few parentheses

yes, that should be
        rb64= (tmp64 >> 5) + ((tmp64 & 0x1F) ?  1 : 0);

Right now I don't have a lot of time for this and I'm spending that time on 
trying to figure out how to undo mozilla's theory that the image is corrupt. 
Any help would be appreciated.
"mass" re-assigning of bugs from pav to myself
Assignee: pavlov → jdunn
*** Bug 219632 has been marked as a duplicate of this bug. ***
Attached patch patch 4 — Splinter Review
this is tenthumbs' patch, using 64-bit macros in all their glory.
this also handles the "fast" case first since it will be (by far) the most
common.
Hold up on this. There seems to be similar code in 
<http://lxr.mozilla.org/seamonkey/source/gfx/src/shared/gfxImageFrame.cpp#45> 
so some of this may not be necessary.
+      PRInt64 tmp64a, tmp64b, tmp64c, tmp64d;

sorry for not thinking about this earlier... but there is nsInt64, a C++ wrapper
around PRInt64 with overloaded operators, which might make this code easier to read.

I won't complain if you leave it as-is, though.

Attached patch patch5 (obsolete) — Splinter Review
The 64 bit macros are incomprehensible, at least to me. :-) I see I wasn't
thinking when I wrote this. Here's an equivalent patch with normal variables. I
also spotted some more new[] calls and replaced them.

I'm not sure all the testing is really necessary any more but it won't hurt
much. It bothers me that we have to replace new[] but I guess it's necessary.
> The 64 bit macros are incomprehensible, at least to me. :-) 

there's always nsInt64...

>It bothers me that we have to replace new[] but I guess it's necessary.

the entire codebase assumes that new won't throw an exception in OOM... we
should probably use new (std::nothrow), I guess...
Attached patch patch6 — Splinter Review
Let's try the patch that works this time.
Attachment #141459 - Attachment is obsolete: true
> we should probably use new (std::nothrow), I guess...

Will that work without exception handling?
as that flag tells new _not_ to throw an exception, I sure hope so.
This test program:

#include <new>
#include <cstdio>

using namespace std;

int main (void)
{
    char *cp1;
    char *cp2;

    cp1 = new (std::nothrow) char[1024 * 1024 * 1024];
    printf ("cp1 = %p\n", cp1);
    cp2 = new (std::nothrow) char[1024 * 1024 * 1024];
    printf ("cp2 = %p\n", cp2);

    return 0;
}

works with gcc 3.3.2 and gcc2.95.3 on Linux (cp2 is null). I have no idea if
this works with all the compilers mozilla uses. If it does, we could convert
mozilla (or at least parts of it) to this form.
gfxImageFrame.cpp checks that a conservative estimate of the memory
requirements (4*width*height) will not overflow a signed 32-bit integer.

The more serious and harder to solve problem on gtk is that the X Protocol
has numerous 16-bit signed and unsigned limits on coordinates and
dimensioning, and neither gdk nor our gfx code checks and works around
these problems for images.
I said this in another bug but there are three main issues:

1) exhausting the system virtual memory,
2) exhausting the process's free space, and
3) exhausting another process's space.

They interact with each other. This bug started out as a way to address case
2. I had to change ComputeMetrics only because it could overflow and there
were no other tests at the time. It could be simpler if it is absolutely
guaranteed that it won't overflow.

Someone somewhere has to try to protect against case 3. We could restrict
the largest dimension to 32767 but I'm sure someone will complain about
their 40000x2 image not working.
*** Bug 238401 has been marked as a duplicate of this bug. ***
Attached patch patch7 — Splinter Review
Basically patch6 with new (std::nothrow) instead of PR_malloc.

Mozilla won't crash but still reports images as broken when they're not.
are you sure (std::nothrow) is supported by all of our unix compilers? i was
working on a patch to macroize new to do this stuff, but never finished it (the
macroization was not fun)
I'm willing to guarantee that this _will_ break some compiler, but we'll never 
know unless we try. It does work in gcc 2.95.3 and 3.3.3 which is good. 
However, theoretically I should match new (std::nothrow) [] with a suitable 
delete []. I must be missing something but I can't find a syntax gcc likes, 
so this might break in some future version.

The real question is whether the standards require new[] to throw exceptions. 
If so, then mozilla has a big problem. If not, then we can provide an ad hoc 
solution (as we do here). This is one of the few places where new[] has a 
significant chance of failure.

Note that I think it is an abomination that mozilla reports these images as 
broken. This patch will just create a lot of bugzilla noise. Mozilla really 
really needs to report the right errors.
i've used bezilla, i've already tried to cross that bridge.
mozilla has a big problem.
Blocks: 235196
*** Bug 317112 has been marked as a duplicate of this bug. ***
Assignee: jdunn → pavlov
QA Contact: tpreston
Assignee: pavlov → nobody
QA Contact: image.gfx
Duplicate of this bug: 406966
Duplicate of this bug: 421876
Product: Core → Core Graveyard
Found this bug when redirected from a duplicate bug.

Firefox 3.6.3 running on Slackware 13.0 (full patches) can be made to grind to a halt and lock the system when trying to load this image:

http://upload.wikimedia.org/wikipedia/commons/f/f0/Iridescent_Glory_of_Nearby_Helix_Nebula.jpg

No core dumps obtained.

It seems illogical that a Core 2 Duo 1.66GHz laptop with 2GB RAM running Xfce can be brought to a halt just by loading this image (8.6MB).

Will try to run a strace.
Strace impossible: even in safe mode, the image loads and network traffic drops off, indicating it's fully downloaded (watching the Xfce netload plugin), but firefox seems to have a runaway memory allocation issue with the image.  The image loads fine in GIMP (albeit using > 700MB RAM) and Gwenview (uses < 100MB RAM).
We don't have any crash reports in nsImageGTK at all in the last 4 weeks, and this bug is in a graveyard component, so I think this is not valid any more.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID