Mozilla should reset the X error handler to trap overallocation failures in gtk/gtk2 libraries [@ nsImageGTK::DrawCompositeTile ]

RESOLVED INCOMPLETE

Status

()

defect
--
critical
RESOLVED INCOMPLETE
16 years ago
8 years ago

People

(Reporter: s.a.moeller, Unassigned)

Tracking

({crash, testcase})

Trunk
x86
Linux
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9 -
wanted1.9 +
blocking1.8.1 -
blocking1.8.0.1 -
blocking1.8.0.2 -
blocking1.8.0.4 -

Firefox Tracking Flags

(Not tracked)

Details

(crash signature, )

Attachments

(6 attachments, 4 obsolete attachments)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4) Gecko/20030624

The website mentioned above uses a malformed javascript which makes Mozilla crash:
w=1024;
h=768;
window.open('','galwin','width='+w+40+',height='+h+40);

No talkback here.

Reproducible: Always

Steps to Reproduce:
See testcase.
Posted file Testcase
WFM 20030627 PC/WinXP
This is the error message I get:

Gtk-WARNING **: gtk_widget_size_allocate(): attempt to allocate widget with
width 36906 and height 11306
Gdk-ERROR **: BadAlloc (insufficient resources for operation)
  serial 16837 error_code 11 request_code 53 minor_code 0
wfm using Firebird 20030626 on WinXP.
Keywords: crash, stackwanted
Crashes my Linux trunk build. It looks like GTK gets very upset and aborts the
process.
Posted file stacktrace
Loading the attachment.
Interesting. This happens on gtk1 and gtk2 builds since in gdk.c we set
the error handler using XSetErrorHandler() to an internal handler which
prints out the error message and exits. Unfortunately this means mozilla
can never catch this error and infact can't get to it's errorhandling 
code

  XImage *ximage = XGetImage(dpy, drawable,
                             readX, readY, readWidth, readHeight,
  				AllPlanes, ZPixmap);

  NS_ASSERTION((ximage!=NULL), "XGetImage() failed");
  if (!ximage)
    return;

The only way i see round this is to reset the errorhandler after the
gdk_init/gtk_init to either a mozilla handler to handle X errror or set
it to NULL so no handler gets called and we can get to the errorpath 
if (!ximage). I think the former proposition is better.
I've changed the summary, because

a) The true cause of this bug is not the javascript being malformed (mixing up
arithmetical operation and string concatenation) but the large number resulting
from that malformed term.

b) This can lead to much more serious problems. If the width and heigth
parameters are extremely large (~100000, as in my testcase), Mozilla simply
crashes (not too dangerous). But with somewhat smaller values Mozilla seems to
allocate endless amounts of memory, rendering the whole system unusable. 

I've tried

  window.open('','win','width=10000,height=10000');

and the only way to get out of it was to kill the X server.
Summary: javascript window.open() with invalid width and height parameter crashes Mozilla → javascript window.open() with very large width and height parameter crashes Mozilla (or the whole system)
Stefan that's not a problem with mozilla that a client should not
be allowed to allocate a large amount of memory. You can get round that by
setting you limit/ulimit correctly in the shell. And anyhow, as i said
in my previous comment - mozilla will not be able to catch this error
of allocation failure since it is handled by the UI library (gtk/gtk2)
which calls exit instead of letting mozilla handle the error correctly.

So what i'm saying is that even limiting your stack in the 'mozilla' script
so the Xserver doesn't consume all the memory, we *still* will not be
able to catch the allocation error and handling it differently unless we
reset the errorhandler with our own one.
A more descriptive Summary would be

 Mozilla should reset the X error handler to trap overallocation failures in
gtk/gtk2 libraries
does it still happen with latest nightly build ?
Blocks: gtk2
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: stackwantedtestcase
Summary: javascript window.open() with very large width and height parameter crashes Mozilla (or the whole system) → Mozilla should reset the X error handler to trap overallocation failures in gtk/gtk2 libraries [@ nsImageGTK::DrawCompositeTile ]
still crashes with linux trunk 2004011007
Guys, this patch has been sitting in my folder for a while, so here it is in
all its glory which fixes both this bug and bug 267126 - the latter brought it
back to my memory banks.

Basically we use the gdk helper functions to catch the error instead of just
killing mozilla. We can then provide any form of dialog to warn the user that
the page will not be rendered as expected due to memory allocation faliure as a
result of bad html/javascript code. Currently (this patch) no user feedback is
given. We can do one of two things. 

1. Warn that the page will not be rendered as expected and carry on as best we
can perhaps resizing image/window to sensible values not larger than screen
real estate, or

2. Abort the page loading totally - i.e. don't display a partially (in)correct
page, and provide user feedback to that effect ?

Comments before submitting for sr/r ?
gtk
Assignee: general → blizzard
Component: DOM: Level 0 → GFX: Gtk
QA Contact: ashshbhatt → ian
Mitch: can you create a unified diff patch (diff -u), preferably from CVS?
Andrew, i'll do this later today. I want to clean it up slightly and provide a
feedback dialog. Will attach a unified cvs diff as requested.
Here is a slightly modified testcase to trigger this bug (for testing
purposes). It will not work on anything but mozilla/firefox since it uses a URI
resource:/res/broken-image.gif as a test image so it could be a standalone
testcase. 

Patch to follow.
Posted patch New patch for review (obsolete) — Splinter Review
unified patch against today's cvs.
Attachment #164302 - Attachment is obsolete: true
Comment on attachment 164836 [details] [diff] [review]
New patch for review

Mitch: you can get a diff from CVS with "cvs diff -u filename.cpp"
Attachment #164836 - Flags: review?(blizzard)
Posted patch As requested by Andrew. (obsolete) — Splinter Review
P.s. I changed my mind about the user dialog feedback (comment #16 and #13)
since we would be in a catch-22 situation if i tried to poup a dialog if we are
already into memory allocation failure hell. So the patch only does fprintf to
stderr now.
Attachment #164836 - Attachment is obsolete: true
*** Bug 267126 has been marked as a duplicate of this bug. ***
dveditz, this may be overflow but i can't get a stack.

testcase on the web is in bug 267126
Georgi how many times do i need to explain. There is no stack ! Mozilla is
exiting as designed (by the X library default handler). What are you expecting ? 
Comment on attachment 164836 [details] [diff] [review]
New patch for review

I'm not doing reviews at the moment, this should probably go to roc.
Attachment #164836 - Flags: review?(blizzard) → review?(roc)
I think the correct way to fix this is to put in a higher-level check on window
sizes to stop pages trying to create too-large windows.
Comment on attachment 164860 [details] [diff] [review]
As requested by Andrew.

roc: there's no guarantee that any given size will work, the xserver could just plain not have memory.
Attachment #164860 - Flags: review?(roc)
What are the performance implications of all those gdk_flush calls, if I might ask?  Those sound like a perf nightmare, especially with a non-local X server...
Posted patch updated for merge conflicts (obsolete) — Splinter Review
Attachment #164860 - Attachment is obsolete: true
Attachment #202942 - Flags: review?(roc)
Attachment #164860 - Flags: review?(roc)
*** Bug 314873 has been marked as a duplicate of this bug. ***
well, i'm not sure, it'd be nice if someone who has a profiler available could test it and find out (preferably 4 profiles: unpatched local server, unpatched remote server, patched local server, unpatched remote server).

i'm expecting they're fairly bad. in which case i'm willing to add code which only calls flush if we can detect that the server is local or a certain pref/env var is set. i'll only do this if someone is willing to accept that change. basically it would mean that users who don't want to crash, or who won't get much of a perf penalty would be protected by this code and people who would get a performance hit would have to opt for it or retain their current crashable state.

i think that minimo would want this :).
The other thing is that this code is going away in 1.9 so it may no be worth spending time on.
*** Bug 310917 has been marked as a duplicate of this bug. ***
*** Bug 314403 has been marked as a duplicate of this bug. ***
(In reply to comment #28)
> Created an attachment (id=202942) [edit]
> updated for merge conflicts
> 

This does not seem to fix bug 319235.
*** Bug 311215 has been marked as a duplicate of this bug. ***
*** Bug 315591 has been marked as a duplicate of this bug. ***
this handles the problem from bug 319235.  It also seems that gdk_draw_pixmap is not going to have the problem we're seeing here since it shouldn't allocate memory and there seems to be only one crashe in b.m.o with gdk_draw_pixmap and it's a real crash.
Attachment #202942 - Attachment is obsolete: true
Attachment #202942 - Flags: review?(roc)
*** Bug 317555 has been marked as a duplicate of this bug. ***
*** Bug 317555 has been marked as a duplicate of this bug. ***
*** Bug 321232 has been marked as a duplicate of this bug. ***
Comment on attachment 205126 [details] [diff] [review]
include other gdk_pixmap_new callers

let's play with this on trunk for a while. if we don't like it, we can arrange to not have the flushes or not have the flushes used for local connections before we release.
Attachment #205126 - Flags: superreview?(roc)
Attachment #205126 - Flags: review?(roc)
Blocks: 321581
Blocks: 321486
The issue is bad enough to a humble block status request.
Flags: blocking1.8.1?
Flags: blocking1.8.0.1?
Flags: blocking1.8.0.1? → blocking1.8.0.1-
*** Bug 322523 has been marked as a duplicate of this bug. ***
*** Bug 322599 has been marked as a duplicate of this bug. ***
+          if (NS_OK == rv)

These should be NS_SUCCEEDED.

The performance impact of these changes could be huge, even on local displays but worse on remote displays. I think we should measure Trender before they get landed --- ask vlad for help with that. This patch is really only useful if we plan to land it on some 1.8.x branch, since this code will not be used when we ship Gecko 1.9.
Blocks: 319235
*** Bug 325857 has been marked as a duplicate of this bug. ***
Flags: blocking1.8.0.3?
Flags: blocking1.8.0.2?
No reviewed, trunk-tested, patch, looks like it's missing 1.8.0.2
Flags: blocking1.8.0.2? → blocking1.8.0.2-
*** Bug 329094 has been marked as a duplicate of this bug. ***
I tested the patch of comment #42 but firefox still crashes for me with 1000x1 gif images:

X error code 11 was trapped [BadAlloc (insufficient resources for operation)]. Your page will not display correctly.

Gdk-ERROR **: The program 'firefox-bin' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadDrawable (invalid Pixmap or Window parameter)'.
  (Details: serial 11197 error_code 9 request_code 72 minor_code 0)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
aborting...
Anybody working on this?.

Current builds top crashes seems related to this. At least in my system.

Any way I can help?. I don't want to touch Mozilla code };-), but I can reproduce the issue fairly easily and have a lot of experience debugging.
it'd be helpful if you built two builds, one w/ the patch, one w/o and then arranged some perf tests. use two clean profiles and see how long basic things take:
1. starting the browser
2. opening a new tab
3. opening a new window
4. loading cnn.com (and other sites, list them...)
5. opening a sheet (if you're using firefox, the customize toolbar widget)
6. opening a dialog (if you're using suite, the open location and find dialogs)
7. loading pages that have lots of graphics
8. loading pages that have a number of plugins
9. loading pages with lots of animations (if you're using suite, the debug menu has some nice samples for animations).

if you can setup your system to use jprof or valgrind in some perf counting mode, that'd be good, since i'm not sure we want to just rely on wall clock watching.

that'd be a good start. there are also official page load test pages, if you can find someone to give you a list of pages that they approximate, that'd be good....

i'm not sure what kind of numbers we need to move this forward, if you can convince roc to specify numbers, that'd be great. but it's probably better for you to just give us raw numbers and see where we go.
How likely is it that something will land on the trunk with enough bake time to be considered for 1.8.0.3?
My guess is "not at all"; see comment 46.
Not blocking 1.8.0-anything without a well-tested trunk landing as a prerequisite
Flags: blocking1.8.0.3? → blocking1.8.0.3-
Firefoxe exits accessing some web sites. Same behavior with Thunderbird exiting when attempting to read some html messages.

Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.8.0.2) Gecko/20060419 Fedora/1.5.0.2-1.2.fc5 Firefox/1.5.0.2 pango-text
Thunderbird version 1.5 (20060313)
*** Bug 336434 has been marked as a duplicate of this bug. ***
*** Bug 336434 has been marked as a duplicate of this bug. ***
*** Bug 338333 has been marked as a duplicate of this bug. ***
No longer blocks: 267126
*** Bug 341289 has been marked as a duplicate of this bug. ***
*** Bug 40931 has been marked as a duplicate of this bug. ***
Not going to block 1.8.1 for this.  We'll reconsider if a suitable patch appears.
Flags: blocking1.8.1? → blocking1.8.1-
*** Bug 353517 has been marked as a duplicate of this bug. ***
Flags: blocking1.9? → blocking1.9-
Whiteboard: [wanted-1.9]
*** Bug 342278 has been marked as a duplicate of this bug. ***
Duplicate of this bug: 371003
Same problem on GFX thebes API:
AMD 500MHz, 128RAM, NoSwap.

Current Trunk Cairo Build:
./run-mozilla.sh ./TestGtkEmbed http://www.nih.gov/about/almanac/images/Johnson08091965a.jpg
...................
progress_change_cb cur 655360 max 4463349
progress_change_cb cur 720896 max 4463349
The program 'gecko' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadAlloc (insufficient resources for operation)'.
  (Details: serial 1848 error_code 11 request_code 53 minor_code 0)
...................


Current Trunk GTK2 Build:
progress_change_cb cur 4420749 max 4463349
progress_change_cb cur 4463349 max 4463349
./run-mozilla.sh: line 144:  2496 Killed                  "$prog" ${1+"$@"}

.............


Opera load this image without any problem and there are still a lot of memory...

MemTotal:       127136 kB
MemFree:          3732 kB
Buffers:          1244 kB
Cached:          37028 kB
SwapCached:          0 kB
Active:          92772 kB
Inactive:        20908 kB
HighTotal:           0 kB

Mem:    127136k total,   124460k used,     2676k free,     1820k buffers
Swap:        0k total,        0k used,        0k free,    41076k cached
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND                                     
 2508 romaxa    15   0 74520  54m 9444 S  0.3 43.8   0:09.94 opera                                       
 2062 root      15   0 26628 8636 1680 R  2.3  6.8   1:07.99 Xorg                                        
 2165 romaxa    15   0 36700 8116 2168 S  1.3  6.4   0:16.35 gnome-terminal                              
 2164 romaxa    15   0 17912 5092 2372 S  0.0  4.0   0:09.35 metacity         
progress_change_cb cur 720896 max 4463349  - gfx/src/thebes
progress_change_cb cur 4420749 max 4463349 - gfx/src/gtk

According this log current thebes build Thebes build require much more memory for loading this picture.... :(
http://www.nih.gov/about/almanac/images/Johnson08091965a.jpg
Normal PC:
Official FF Build thebes:
VmPeak:   197308 kB
VmSize:   120240 kB
VmHWM:     99080 kB
VmRSS:     22772 kB
VmData:    87996 kB  -  !

Not official FF Build Gtk2:
VmPeak:  120992 kB
VmSize:   62724 kB
VmHWM:   81804 kB
VmRSS:   23376 kB
VmData:   32964 kB   - !
Please file a separate bug for the memory usage issue?  cc me and dbaron, as well as vlad, pavlov, and roc, and request blocking1.9?
Blocks: 390768
Comment on attachment 205126 [details] [diff] [review]
include other gdk_pixmap_new callers

This patch is obsolete. Probably should file a new bug for this issue on Thebes.
Attachment #205126 - Flags: superreview?(roc)
Attachment #205126 - Flags: superreview-
Attachment #205126 - Flags: review?(roc)
Attachment #205126 - Flags: review-
Flags: wanted1.9+
Whiteboard: [wanted-1.9]
Has a new bug been filed for thebes ?
Product: Core → Core Graveyard
Duplicate of this bug: 491812
FYI: right now there is exists some "joke" in the wild which crashes Firefox users running under X Window system. At least, it works under x64 Linux.

To reproduce:
 Try to load URL: http://lady.mail.ru/img/bg_l.png

Result:
 browser crashes with X window system error.

Output:
The program 'firefox' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadAlloc (insufficient resources for operation)'.
  (Details: serial 345764 error_code 11 request_code 53 minor_code 0)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
Ubuntu Bug:
https://bugs.launchpad.net/bugs/374524

Same info as Comments #73 and #74.
Attachment 376627 [details] doesn't crash for me with Firefox 3.5.2.
Attachment 376627 [details] and http://lady.mail.ru/img/bg_l.png
 dose crash Firefox 3.5.3 for me.  However I have to do a "save this link as", or if the png opens in a new tab to a "save this page as".  Simplly pasting the URL into the browser doesn't crash firefox.

This is what I get when firefox crashes. 

process 21567: type array 97 not a basic type
process 21567: type array 97 not a basic type
process 21567: type invalid 0 not a basic type
process 21567: type invalid 0 not a basic type
process 21567: type struct 114 not a basic type
Assignee: blizzard → nobody
Component: GFX: Gtk → Graphics
Product: Core Graveyard → Core
QA Contact: ian → thebes
John Vivirito added the following comment to Launchpad bug #42683:

This bug is fairly old.
Can anyone confirm this still happens with latest Firefox-3.5 or 3.6?

-- 
http://launchpad.net/bugs/42683
NetBSD/i386 4.0, Firefox 3.5.2 here.  Attachment 376627 [details] does not crash FF, but it doesn't display either (URL in the address bar changes, but window contents stay the same).

However, FF does crash eventually -- especially during long browsing session.  A page with lots of large inline images seems to trigger these crashes (some pages on skyscrapercity.com, for example).  I'll try to reproduce this.
Here's the backtrace from recent crash.  I had set a breakpoint at gdk_x_error() and run FF with --sync.

Gdk-ERROR **: The program 'firefox-bin' received an X Window System error.
This probably reflects a bug in the program.
The error was 'BadAlloc (insufficient resources for operation)'.
  (Details: serial 199057 error_code 11 request_code 53 minor_code 0)
  (Note to programmers: normally, X errors are reported asynchronously;
   that is, you will receive the error a while after causing it.
   To debug your program, run it with the --sync command line
   option to change this behavior. You can then get a meaningful
   backtrace from your debugger if you break on the gdk_x_error() function.)
aborting...

Program received signal SIGABRT, Aborted.
0xba25f22f in kill () from /usr/lib/libc.so.12
(gdb) bt
#0  0xba25f22f in kill () from /usr/lib/libc.so.12
#1  0xba2fb338 in abort () from /usr/lib/libc.so.12
#2  0xba5f5b1d in g_logv () from /usr/pkg/lib/libglib-2.0.so.0
#3  0xba5f5b3b in g_log () from /usr/pkg/lib/libglib-2.0.so.0
#4  0xba8cafa9 in gdk_error_trap_push () from /usr/pkg/lib/libgdk-x11-2.0.so.0
#5  0xba471c73 in _XError () from /usr/pkg/lib/libX11.so.6
#6  0xba4736ee in _XReply () from /usr/pkg/lib/libX11.so.6
#7  0xba46da95 in XSync () from /usr/pkg/lib/libX11.so.6
#8  0xba46dc19 in _XSyncFunction () from /usr/pkg/lib/libX11.so.6
#9  0xba45203b in XCreatePixmap () from /usr/pkg/lib/libX11.so.6
#10 0xba8cbe5a in gdk_pixmap_foreign_new () from /usr/pkg/lib/libgdk-x11-2.0.so.0
#11 0xba89ba8d in gdk_pixmap_new () from /usr/pkg/lib/libgdk-x11-2.0.so.0
#12 0xbb94a74a in gfxPlatformGtk::CreateOffscreenSurface () from /usr/pkg/lib/firefox/libxul.so
#13 0xbb93ab33 in gfxPlatform::OptimizeImage () from /usr/pkg/lib/firefox/libxul.so
#14 0xbb815be6 in non-virtual thunk to nsPrintSession::Release() () from /usr/pkg/lib/firefox/libxul.so
#15 0xbfbfd7a8 in ?? ()
#16 0x08630600 in ?? ()
#17 0x0a175400 in ?? ()
#18 0x00000001 in ?? ()
#19 0x0abdfa9c in ?? ()
#20 0x0a0cc980 in ?? ()
#21 0x0b49ee00 in ?? ()
#22 0x0b49eed0 in ?? ()
#23 0x0b49eecc in ?? ()
#24 0xfffeefb4 in ?? ()
#25 0xbfbfd7e8 in ?? ()
#26 0xbbb9ed20 in ?? () from /usr/pkg/lib/firefox/libxul.so
#27 0x096ff000 in ?? ()
#28 0x0c212b00 in ?? ()
#29 0xbfbfd7d8 in ?? ()
#30 0xbb9224cb in NS_InvokeByIndex_P () from /usr/pkg/lib/firefox/libxul.so
#31 0xbb9224cb in NS_InvokeByIndex_P () from /usr/pkg/lib/firefox/libxul.so
#32 0xbb0b53c9 in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#33 0xbb0c1a7e in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#34 0xbb0c35ac in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#35 0xbb0bd8f1 in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#36 0xbb0b7a5f in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#37 0xbafcb0cc in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#38 0xbb02a3d1 in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#39 0xbafb21c5 in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#40 0xbafb2c0d in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#41 0xbb8dca3f in NS_StringContainerInit2_P () from /usr/pkg/lib/firefox/libxul.so
#42 0xbb8f449c in NS_GetComponentManager_P () from /usr/pkg/lib/firefox/libxul.so
#43 0xbb8bc5d3 in GetSecurityContext () from /usr/pkg/lib/firefox/libxul.so
#44 0xbb802c8f in JSD_DebuggerOnForUser () from /usr/pkg/lib/firefox/libxul.so
#45 0xbb69f9b0 in DumpJSEval () from /usr/pkg/lib/firefox/libxul.so
#46 0xbaf38fca in XRE_main () from /usr/pkg/lib/firefox/libxul.so
#47 0x0804900a in _rtld_setup ()
#48 0x08048d44 in ___start ()
#49 0x08048ca7 in _start ()
If anyone wonders, BadAlloc happened because Xorg process had hit 'data seg size' limit...
Crash Signature: [@ nsImageGTK::DrawCompositeTile ]
Unable to reproduce so far on Firefox 6.0 under Ubuntu 11.10 alpha.
sam tygier added the following comment to Launchpad bug report 42683:

no crash for me with testcase https://bug210931.bugzilla.mozilla.org/attachment.cgi?id=126641 using official firefox 6 on ubuntu 11.04.

-- 
http://launchpad.net/bugs/42683
Two comments stating that the bug is not reproducible. Can we get an official confirm for this bug?
I imagine this was fixed way back when we started using Cairo. Please reopen if that's not the case.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
It was, indeed.  But lowering 'data seg size' limit exposes other bugs, somewhere in the JS engine.
You need to log in before you can comment on or make changes to this bug.