Closed Bug 586909 Opened 12 years ago Closed 12 years ago

Corruption and lockups in the UI when (new regression)

Categories: Core :: Graphics, defect
Platform: x86, Windows 7
Type: defect; Priority: Not set; Severity: major
Status: RESOLVED FIXED
Tracking Status: blocking2.0 --- beta4+
People: Reporter: sciguyryan; Assigned: bas.schouten
Keywords: regression
Attachments: 15 files, 1 obsolete file

I'm using an Intel i7-930 processor and an Nvidia GeForce GTS 250 GPU with the 258.96 drivers.

This is a recent regression (in the last 2-3 builds): it used to happen, then stopped, and now it's back. It usually happens when quickly switching tabs or quickly reading many mail items in GMail.

The attachment shows the result. Shortly afterwards it can crash (though it fails to submit a report), or the UI locks up and has to be closed manually.
blocking2.0: --- → ?
I don't see any attachment I'm afraid.
(In reply to comment #1)
> I don't see any attachment I'm afraid.

Sorry, my mistake.
We need a regression window on this. I suspect the GPU memcpy work. The main reason is that we used to have a corruption problem on NVidia when we used GPU memcpy for scrolling; with retained layers we stopped doing that, and the GPU memcpy patch re-introduced it. That's very circumstantial evidence, though, and I can't reproduce this on either my ATI or my GT230M.
The corruption we're talking about here is the chrome background being black?
(In reply to comment #5)
> The corruption we're talking about here is the chrome background being black?

Correct, although it also affects items in the browser content, such as images on pages.

This is happening quite frequently, but there are no exact steps to reproduce it; it's pretty much hit-and-miss, though it usually happens with the method I described in the original post.
(In reply to comment #6)

Since I am unable to reproduce is there any chance you'd be willing/able to track down the regression changeset?
(In reply to comment #7)
> 
> Since I am unable to reproduce is there any chance you'd be willing/able to
> track down the regression changeset?

I can try, but it is very hit and miss. So far it hasn't happened at all in the last few hours, and yesterday it was crashing every half hour or so. I'll see if we can get some more helpers from Mozillazine.

I have some crash dumps that didn't get submitted; let me know if they could be useful.
Coming up with good steps to reproduce would also be very valuable.
(In reply to comment #9)
> Coming up with good to steps to reproduce would also be very valuable.

So far I cannot get one. It seems to just happen. Usually it happens when quickly opening and deleting many GMail items then switching to another tab.

Other than that I cannot get exact steps, nor do the crash reports show anything useful.
Maybe what I'm experiencing is https://bugzilla.mozilla.org/show_bug.cgi?id=553269, which is different from this bug because I've had it since day one, but I'll add it here too:

I can't run the browser for more than 1/2 hours with D2D enabled: it stops rendering anything, Aero Peek previews turn black, parts of the GUI turn black or gray, and then it crashes. Breakpad has never caught any of these crashes. In the Windows logs I have:

Faulting application name: firefox.exe, version: 2.0.0.3874, time stamp: 0x4c644470
Faulting module name: mozalloc.dll, version: 2.0.0.3874, time stamp: 0x4c61860d
Exception code: 0x80000003
Fault offset: 0x00001a19

Before it crashes, both VRAM and Vmem usage are very high: VRAM is around 430 MB (I have a 512 MB 8800 GT), and Vmem is usually at 500-600 MB.

I can't reproduce it at will, but opening a lot of high-res images speeds up the process. It will crash sooner or later; I've never gotten it to survive until the next nightly.

The crash has now gotten nastier: the whole Windows GUI freezes after Firefox crashes, and I'm forced to log off and on again to revive the window manager. In the past it would just die with an error in a D3D9x or D3D10x DLL without affecting the OS.
Also I sent a copy of the crash files to Bas and they are completely blank - so no help from there either.
(In reply to comment #11)
> Maybe what i'm experiencing is
> https://bugzilla.mozilla.org/show_bug.cgi?id=553269 and it's different from
> this bug because i get that since day one but i'll add it here too:

> I can't reproduce at will , a way to speedup the process is open alot highres
> images , it will crash sooner or later because i never got it to survive the
> next nightly.  
> 
> Now the crash got nastier because the whole windows gui freezes after ff crash
> and i'm forced to log off/on to revive the window manager. In the past it would
> just die with and error in D3D9x or D3D10x dll without messing with the OS

What's interesting is the high VRAM usage combined with the large number of high-res images. This suggests your problem might be the NVidia driver or DirectX not handling swapping in and out of VRAM properly.
Attached file Gpu-z Log
I was running GPU-Z with logging enabled during the whole Firefox session yesterday; I'll attach the log. Firefox was started at 21:50:03 and died at 23:1
I can reproduce it the following way:

1. go to http://images.google.com
2. use a search term that will give lots of hi-res pictures (for example, 'charlize theron', some nsfw images may appear)
3. in the options sidebar select Larger than -> 2MP
4. ctrl + click 20 thumbnails
5. ctrl + tab through all the tabs to make sure the images are loaded into memory
6. if it hasn't crashed yet, go to (4) using different thumbnails

I can crash it consistently this way, but it's not easy at all to know exactly when.
It only becomes evident something is wrong when some elements start to disappear, including from the chrome. A crash is imminent after this.

Mobile Intel 4 Series, latest official drivers.
Forgot to mention, the windows gui is stable through the whole process.
Only minefield is affected.
(In reply to comment #16)
> Forgot to mention, the windows gui is stable through the whole process.
> Only minefield is affected.

That sounds right. Other apps are not affected by it here either. Could be the same issue.
It crashed for me too after opening enough images. This time the GUI wasn't affected, but there was still no Breakpad report and the same signature in the Windows log:

  Fault Module Name:	mozalloc.dll
(In reply to comment #15)
> I can reproduce it the following way:
> 
> 1. go to http://images.google.com
> 2. use a search term that will give lots of hi-res pictures (for example,
> 'charlize theron', some nsfw images may appear)
> 3. in the options sidebar select Larger than -> 2MP
> 4. ctrl + click 20 thumbnails
> 5. ctrl + tab through all the tabs to make sure the images are loaded into
> memory
> 6. if it hasn't crashed yet, go to (4) using different thumbnails
> 
> I manage to consistently crash this way, but it's not easy at all to know
> exactly when.
> It only becomes evident something is wrong when some elements start to
> disappear, including from the chrome. A crash is imminent after this.
> 
> Mobile Intel 4 Series, latest official drivers.

This works fine for me, although memory usage goes up to 500 MB after opening 24 tabs with these high-res images and playing around with them.

But I guess that's somewhat expected :-).

I wish I knew what exception 0x80000003 was.
Google turns up http://support.microsoft.com/kb/230176, which has

An application error has occurred and an application error log is being generated.

<Program.exe>
Exception: hardcoded-breakpoint (0x80000003), Address: 0x77f76274 

Looks like we're hitting the __debugbreak() in mozalloc_abort().
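For readers less familiar with the pattern: the later stack trace in this bug (mozalloc_handle_oom calling mozalloc_abort) suggests an infallible allocation path, where a failed allocation is turned into a deliberate crash instead of returning null. The sketch below models that behaviour under stated assumptions. SketchInfallibleMalloc and SketchAbort are hypothetical names, not the mozalloc source, and the sketch calls std::abort() where, per the comment above, the real abort path hits __debugbreak() (an INT 3), which Windows reports as exception code 0x80000003.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstdio>
#include <cstdlib>

// Exception code Windows uses for a hit breakpoint instruction (INT 3),
// i.e. the "hardcoded-breakpoint (0x80000003)" quoted from the KB article.
constexpr std::uint32_t kStatusBreakpoint = 0x80000003u;

// Hypothetical stand-in for mozalloc_abort(). Per the discussion above, the
// real function ends in __debugbreak() on Windows; std::abort() keeps this
// sketch portable.
[[noreturn]] void SketchAbort(const char* msg) {
  std::fprintf(stderr, "%s\n", msg);
  std::abort();
}

// Hypothetical infallible allocator: callers never receive nullptr, because
// an allocation failure is converted into an immediate, deliberate crash.
void* SketchInfallibleMalloc(std::size_t size) {
  void* p = std::malloc(size);
  if (p == nullptr && size != 0) {
    SketchAbort("out of memory");
  }
  return p;
}
```

With this design an out-of-memory condition surfaces as a breakpoint exception at the abort site rather than as a null-pointer crash somewhere downstream, which fits the event log entries reported above.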
Oh, and it's bad if breakpad isn't catching this.  It's supposed to.
(In reply to comment #21)
> Oh, and it's bad if breakpad isn't catching this.  It's supposed to.

I should point out that it _has_ popped up once or twice but the files were completely blank. So something probably isn't working well there.
If the minidump files are empty there's not much we can do about this. We rely on the DbgHelp library to produce the minidumps. (The only thing that would probably help is rearchitecting Breakpad to write the minidumps from another process even when the main process crashes.)
So allocation is failing. Could you check how much memory is in use when you're experiencing this? We care about the Working Set, Private Working Set, Commit Size, and Paged and Non-Paged Pool.

If none of those are approaching very high values, it would seem that fragmentation is causing us to run out of address space.
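To make the fragmentation point concrete: a 32-bit process can have plenty of free address space in total and still fail a single large allocation if no one free region is big enough. The toy model below treats free address space as a list of contiguous region sizes. AddressSpaceModel is a hypothetical name, not a Windows or Mozilla API, and the region sizes used with it are invented for illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

constexpr std::size_t kMiB = 1024 * 1024;

// Toy model of a process's free virtual address space: each entry is the
// size in bytes of one contiguous free region. Illustrative only.
struct AddressSpaceModel {
  std::vector<std::size_t> freeRegions;

  // Sum of all free regions, roughly what a "free address space" total shows.
  std::size_t TotalFree() const {
    std::size_t total = 0;
    for (std::size_t region : freeRegions) total += region;
    return total;
  }

  // Size of the biggest single contiguous hole.
  std::size_t LargestFree() const {
    if (freeRegions.empty()) return 0;
    return *std::max_element(freeRegions.begin(), freeRegions.end());
  }

  // One allocation must fit in one contiguous region, so only the largest
  // hole matters, not the total.
  bool CanAllocate(std::size_t request) const {
    return request <= LargestFree();
  }
};
```

For example, with free regions of 200, 150 and 100 MiB the model reports 450 MiB free in total, yet a single 320 MiB request fails because no region is large enough.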
Per comment #15:
NOTE: Firefox had crashed, so this memory usage is from loading those images, Bugzilla, and mozillaZine only, nothing else before that.

1. I opened 10 such images as well as one 10,000 by 8,000 image
1a. Private Working Set: 599 MB
1b. Commit Size: 616 MB

2. After tabbing through all the images
2a. Private Working Set: 605 MB
2b. Commit Size: 622 MB

3. After the above, plus maximizing the super large image
3a. Private Working Set: 650 MB
3b. Commit Size: 666 MB

4. Scrolling the super large image
4a. Crash

5. After the crash, with just the super large image (zooming, scrolling, etc.)
5a. The image scrolls slowly, EXTREMELY slowly (!!!)
5b. Commit Size: 444 MB
5c. Private Working Set: 426 MB

So:
1. Scrolling images = HORRIBLE performance (maxes out the CPU) and a possible crash.
2. Minimize / maximize = also slow, with a possible crash.
3. Memory usage = a lot.
4. I got crashes at 70-80% RAM usage.

Crash,
http://crash-stats.mozilla.com/report/index/bp-6236260d-a774-4d8d-9a00-00dc52100813

Super Large Image,
http://www.biomedcentral.com/content/supplementary/1471-2164-11-5-s3.jpeg

HP Mini 311, NV ION, latest WHQL, Win 7
When it started misbehaving (without crash):

working set: 1.585.000K
private ws: 1.325.000K
commit size: 1.379.000K
paged pool: 1.248K
NP pool: 233k

Physical mem in use: 77%
(all approximate because they were still fluctuating slightly)

In a previous run I tried to do the same but from about:memory, and just before going back to spawning charlize the values there were (in decimal MB):

allocated: 863
mapped: 922
committed: 882
dirty: 2
privatebytes: 1.295
workingset: 1.543
d2d/surfacecache: 298

I have Visual Studio 2008 and 2010 installed, by the way. Windows offered to debug the crash once, but it had no symbols (it was breaking on an INT 3). I can look for more information there if needed.

Oh, and I saw breakpad reacting for the first time but it gave an error on send. Despite this, it's not in the local pending list and about:crashes thinks it was submitted. Its id was bp-07ce98d3-1224-48d6-86b2-c9adf2100813.

Win7 x64, firefox in wow64
(In reply to comment #26)

The INT 3 would likely be an allocation failure. The working set and related numbers make it plausible that this is an OOM caused by heap fragmentation.
Could you compare how GDI behaves under the same conditions, memory-usage-wise (i.e. just without D2D)?
(In reply to comment #25)

Hrm, scrolling the super large image is fine for me, although I'm surprised it's -that- slow for you. I guess the Atom processor in that box isn't particularly fast, but it shouldn't perform that badly.

I wonder why we're -still- not getting symbols for d2d1.dll.
(In reply to comment #28)
> Hrm, scrolling the superlarge image is fine for me. Although I'm surprised it's
> -that- slow for you. I guess the ATOM processor in that box isn't particularly
> fast, but it shouldn't be performing that badly.
> 
> I wonder why we're -still- not getting symbols for d2d1.dll

Same here. It doesn't lag or crash for me; possibly a different bug? I got the crash report box again, with another blank crash report.
I see regular GUI corruption with D2D on, but I have not been crashing. Symptoms usually start with disappearing back/forward buttons, then black taskbar previews, and the glitches get progressively worse until I'm forced to restart the browser. This has been happening for a long time, though, and is not a new problem.

The superlarge image doesn't crash here, on Intel Q6600 and ATI 5870 with 10.7 drivers. It does use as much CPU as it can get but scrolling stays pretty good.
(In reply to comment #30)
> I see regular GUI corruption with D2D on but I have not been crashing. Symptoms
> usually start with disappearing back/forward buttons, then black taskbar
> previews, the glitches get progressively worse until I'm forced to restart the
> browser. This has been happening for a long time though and is not a new
> problem.
> 

That is most certainly the same bug I was originally reporting. It happened a while ago but then stopped; in recent builds it's back, and worse.
Does everyone experiencing this see the same sort of problems if they open lots of blank tabs (just tab through them so they've been displayed at least once)?
(In reply to comment #32)
> Does everyone experiencing this see the same sort of problems if they open lots
> of blank tabs (just tab through them so they've been displayed at least once)?

That gives the same effect yes. Nice spotting.
And, just so we're all on the same page, at no point does Firefox crash, even if you've got lots and lots of empty tabs open? How big does Firefox get in memory, as shown by Task Manager?
(In reply to comment #34)
> And, just so we're all on the same page, at no point does Firefox crash, even
> if you've got lots and lots of empty tabs open? How big does Firefox get in
> memory, as shown by Task Manager?

If I ignore the corruption, my Firefox locks up and sometimes crashes, but I don't see issues with memory and whatnot (though I usually can't check, since once it crashes it's too late).
I have a Windows debug dump file that I created while experiencing the issue. It is 800 MB; if you want it, I'll put it somewhere it can be downloaded.
Can someone try to find a regression window for when this got worse?
(In reply to comment #34)
> And, just so we're all on the same page, at no point does Firefox crash, even
> if you've got lots and lots of empty tabs open? How big does Firefox get in
> memory, as shown by Task Manager?

I reproduced the issue under those conditions, and the memory readings were normal except for the working set, which I saw grow to 1.7 GB.
One thing I noticed during this test is that tabbing through tabs doesn't seem to affect memory usage when D2D is off. With D2D on, 1-3 MB was added to the working set each time I tabbed. Every newly created tab also took a bigger memory hit.

I was also able to crash it by keeping a separate window open (with just the Minefield start page). That window is broken too, and sometimes it crashes when you try to interact with it (resizing, clicking everywhere, etc.). The crash report fails to submit.
We have a patch that we think will fix this, currently in our project branch builds. We'll let you know as soon as it's done.
Assignee: nobody → bas.schouten
Status: NEW → ASSIGNED
Attachment #465920 - Flags: review?(vladimir)
blocking2.0: ? → beta4+
This also clears the layer managers. They hold on to the context as their default context, and they can hold on to ThebesLayers with ThebesLayerBuffers, which hold on to D2D surfaces as well.
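The patch itself isn't reproduced in this bug, but the comment above, the attachment title ("Discard D2D surfaces upon hiding windows v2") and the review excerpt (nsWindow::ClearD2DSurfaceCallback, whose BOOL CALLBACK (HWND, LPARAM) signature matches a Win32 window-enumeration callback) suggest the shape of the fix: when a window is hidden, drop its D2D surface and also clear its layer manager, because the layer manager's default context and layer buffers can keep surfaces alive. The portable sketch below models only that ownership with std::shared_ptr. GpuSurface, LayerManagerModel and WindowModel are hypothetical types, not the actual widget or layers code.

```cpp
#include <cassert>
#include <memory>
#include <vector>

// Hypothetical stand-in for a GPU-backed (D2D) surface holding VRAM.
struct GpuSurface {};

// Hypothetical layer manager: like the one described above, it can keep
// surfaces alive through its default context and its layer buffers.
struct LayerManagerModel {
  std::shared_ptr<GpuSurface> defaultContextTarget;
  std::vector<std::shared_ptr<GpuSurface>> layerBuffers;

  void Clear() {
    defaultContextTarget.reset();
    layerBuffers.clear();
  }
};

// Hypothetical window with child windows.
struct WindowModel {
  bool visible = true;
  std::shared_ptr<GpuSurface> surface;
  LayerManagerModel layerManager;
  std::vector<WindowModel*> children;

  // Release this window's surface and everything its layer manager holds,
  // then do the same for each child (standing in for enumerating child HWNDs).
  void ClearSurfaces() {
    surface.reset();
    layerManager.Clear();
    for (WindowModel* child : children) child->ClearSurfaces();
  }

  void Hide() {
    visible = false;
    ClearSurfaces();
  }
};
```

In the sketch, resetting only a window's own `surface` is not enough while a layer buffer still references the same surface; the surface is freed only once the layer managers are cleared as well, which is the point the comment makes.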
Attachment #465920 - Attachment is obsolete: true
Attachment #465951 - Flags: review?(tellrob)
CC-ing roc so he knows about the LayerManager stuff I did.
Comment on attachment 465951 [details] [diff] [review]
Discard D2D surfaces upon hiding windows v2

>+#ifdef CAIRO_HAS_D2D_SURFACE
>+BOOL CALLBACK nsWindow::ClearD2DSurfaceCallback(HWND aWnd, LPARAM aMsg)
>+{
>+    nsWindow *window = (nsWindow*)::GetPropW(aWnd, GetNSWindowPropName());

nsWindow *window = nsWindow::GetNSWindowPtr(aWnd);

>+#ifdef CAIRO_HAS_D2D_SURFACE
>+  void                    ClearD2DSurface();
>+#endif

I don't think this needs to be public.

r+ with those fixed.
Attachment #465951 - Flags: review?(tellrob) → review+
It would be nice if someone would use:
https://developer.mozilla.org/En/How_to_get_a_stacktrace_with_Windbg

to get a stack trace. I'd suggest grabbing this 32-bit version of WinDbg (it's smaller than downloading an ISO):
http://msdl.microsoft.com/download/symbols/debuggers/dbg_x86_6.11.1.404.msi
As per Comment 44, I followed the STR from Comment 25 and Comment 32.

Opened up 20+ tabs of the super large image, quickly created a few dozen new tabs and held CTRL+TAB. I stopped occasionally to reload the super large images.

After 5 minutes or so I saw disappearing buttons and a black status bar, then a hang.
(In reply to comment #45)

Hrm, is that with the latest nightly? I can't reproduce; it just becomes quite slow for me briefly. But it does that with GDI as well; double-checking.
(In reply to comment #46)
> (In reply to comment #45)
> > Created attachment 466051 [details] [details]
> > Stack Trace of a hang after following STR
> > 
> > As per Comment 44, I followed the STR from Comment 25 and Comment 32.
> > 
> > Opened up 20+ tabs of the super large image, quickly created a few dozen new
> > tabs and held CTRL+TAB. I stopped occasionally to reload the super large
> > images.
> > 
> > After 5 mins or so saw disappearing buttons and black status bar, then hang.
> 
> Hrm, is that with the latest nightly? I can't reproduce, it just becomes quite
> slow for me briefly. But it does that on GDI as well, double checking.

I just got it to freeze for about a minute with GDI, after having about 40-50 tabs open with large images. I'm pretty sure D2D would do the same in that case, though; I'm mainly interested in where D2D's memory grows significantly faster than GDI's.
And, to be clear, your testing should be done with the nightly of 2010-08-14 or later.
Attachment #466051 - Attachment mime type: application/octet-stream → text/plain
Attached file Stacktrace
Hang stacktrace xul!_cairo_surface_fallback_paint+1ff
(In reply to comment #47)
>I'm mainly interested in where D2D increases in
> memory significantly faster than GDI does.

The only memory difference I see when opening a few tabs with the super large images is malloc allocated, mapped, and committed (but not dirty) being around 1 GB (e.g. 1,026,197,262) with D2D, compared to less than 50 MB (e.g. 42,146,668) with GDI.

Am I correct in thinking that bug 550475 will have an effect on the malloc size, but that it won't be causing the corruption and lockups?
(In reply to comment #50)
> (In reply to comment #47)
> >I'm mainly interested in where D2D increases in
> > memory significantly faster than GDI does.
> 
> The only memory difference I see when opening a few of tabs with the super
> large images is the malloc allocated,mapped, committed (but not dirty) being
> around 1GB (e.g. 1,026,197,262) with d2d compared to less than 50MB (e.g.
> 42,146,668) with GDI.
> 
> Am I correct in thinking that Bug 550475 will have an affect on the malloc size
> but that it won't be causing the corruption and lockups?

1 GB? Are you talking 'a few' tabs or 'a lot' of tabs? And what does the D2D surface cache look like in about:memory? And the other surface statistics?
(In reply to comment #51)
>
> 1 GB? Are you talking 'a few' tabs or 'a lot' of tabs? And what does the D2D
> surface cache look like in about:memory? And the other surface statistics?

3 tabs of the super large image, each clicked on to expand to full size. Mallocs and gfx/surface/image will go up 300MB for each image, as shown in images/content/used/uncompressed.

This is on the 14 Aug nightly with a clean profile. For the record, I'm on Win7 x64 and also have an Nvidia 8800GTS with 258.96 drivers, which I've read have 'issues'.

Memory mapped: 1,032,847,360
Memory in use: 1,026,121,908
	malloc/allocated 1,026,128,908
	malloc/mapped 1,032,847,360
	malloc/committed 1,029,611,520
	malloc/dirty 1,716,224
	win32/privatebytes 1,072,992,256
	win32/workingset 1,040,535,552 
	xpconnect/js/gcchunks 4,194,304
	gfx/d2d/surfacecache 136,016 
	images/chrome/used/raw 0
	images/chrome/used/uncompressed 172,348 
	images/chrome/unused/raw 0 
	images/chrome/unused/uncompressed 0 
	images/content/used/raw 0
	images/content/used/uncompressed 332,806,548 
	images/content/unused/raw 0 
	images/content/unused/uncompressed 0 
	storage/sqlite/pagecache 1,972,880
	storage/sqlite/other 634,296 
	layout/all 584,856
	layout/bidi 0 
	gfx/surface/image 998,598,940
	gfx/surface/win32 0
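As a rough check on the "300 MB per image" figure: a decoded bitmap costs width × height × bytes per pixel. Assuming 4 bytes per pixel (Cairo's ARGB32 and RGB24 image formats both store 32 bits per pixel) and the approximate 10,000 by 8,000 size given in comment #25, one decoded copy is 320,000,000 bytes, about 305 MiB, and three copies come to 960,000,000 bytes. That is in the same range as the gfx/surface/image total above, which is itself almost exactly three times images/content/used/uncompressed. DecodedBytes and ToMiB are hypothetical helper names used only for this arithmetic.

```cpp
#include <cassert>
#include <cstdint>

// Assumption: 4 bytes per pixel, as for Cairo's 32-bit image formats.
constexpr std::uint64_t kBytesPerPixel = 4;

// Bytes needed to hold one fully decoded width x height image.
constexpr std::uint64_t DecodedBytes(std::uint64_t width, std::uint64_t height) {
  return width * height * kBytesPerPixel;
}

// Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes).
constexpr double ToMiB(std::uint64_t bytes) {
  return static_cast<double>(bytes) / (1024.0 * 1024.0);
}

// One copy of the approximately 10,000 x 8,000 test image from comment #25.
constexpr std::uint64_t kOneCopy = DecodedBytes(10000, 8000);
static_assert(kOneCopy == 320000000ULL, "10,000 * 8,000 * 4 bytes");
```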
And here's the widget-mode at 0 results:

Memory mapped: 33,554,432
Memory in use: 26,660,952
  malloc/allocated 26,667,920
  malloc/mapped 33,554,432
  malloc/committed 32,256,000
  malloc/dirty 2,793,472
  win32/privatebytes 1,073,713,152
  win32/workingset 1,093,201,920
  xpconnect/js/gcchunks 4,194,304
  gfx/d2d/surfacecache 0
  images/chrome/used/raw 0
  images/chrome/used/uncompressed 185,412
  images/chrome/unused/raw 0
  images/chrome/unused/uncompressed 0
  images/content/used/raw 0
  images/content/used/uncompressed 332,806,500
  images/content/unused/raw 0
  images/content/unused/uncompressed 0
  storage/sqlite/pagecache 1,939,976
  storage/sqlite/other 634,296
  layout/all 555,170
  layout/bidi 0
  gfx/surface/win32 998,604,012
  gfx/surface/image 10,108
(In reply to comment #53)

Thanks a -lot-, these are extremely useful metrics. In the DirectWrite case we seem to be using a -load- of Win32 surfaces, and in the D2D case a -load- of image surfaces. At this point I have no idea why, but this is going to be very useful.
Attached file Memory Data
This was some memory data I managed to collect before the browser locked up.

Not sure how much this will help track down the lockup, but I hope you find it useful.
Attached image Crash debugging screen
After a few hours of normal browsing I noticed some widgets disappearing briefly, so I immediately attached VS2010 to Minefield and tried to get a stack trace. Even when the chrome and content stop responding, you can still use Alt+Space to resize, move, or minimize the window. Eventually it crashes.

The attached picture is what I saw when VS caught an Access Violation exception. The Locals window is shown expanded in the hope that you find it useful ("script stack space quota is exhausted" seems interesting). In the Output window there were a lot of lines with the following text:

First-chance exception at 0x7594b727 (KernelBase.dll) in firefox.exe: Microsoft C++ exception: _com_error at memory location 0x003bbe34..
First-chance exception at 0x7594b727 (KernelBase.dll) in firefox.exe: Microsoft C++ exception: _com_error at memory location 0x003bbf84..

My memory levels weren't that strange... I also cropped TaskManager with the relevant info and pasted it in the middle of the attached image.

Keep in mind this was during a normal browsing session of mine. All my extensions were running.
That crash isn't really related to your D2D issue. And we really can't work with pictures of crashes; you can select the stack frames and copy them.
(In reply to comment #56)

You should probably watch for -handled- exceptions of the serious kind (access violations, etc.), since we have indications that Windows eats some of our exceptions and leaves the event loop in an invalid state; see bug 587406.

Watch for any handled exceptions other than the _com_error exception you quoted above. You can break on different kinds of exceptions via Debug->Exceptions in VS, under 'Win32 Exceptions'.
This is actually caused by bug 587406.
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → DUPLICATE
Duplicate of bug: 587406
Sorry for the bug spam, but this isn't really a dupe. Breakpad not being able to catch the crashes is surely a duplicate of that bug, but the main bug is about UI corruption and crashes with D2D on, probably caused by some kind of leak. There was a patch with r+ too.
I see, sorry, I missed that. Then it doesn't fix the problem, because it crashed in the same way 5 minutes ago with this build:

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100816 Minefield/4.0b4pre Firefox/3.6.7
That's bug 587406. You ran out of memory but Firefox didn't crash.
(In reply to comment #62)
> I see sorry i missed that , then it doesn't fix the problem because it crashed
> in the same way 5 mins ago with this build:
> 
> Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100816
> Minefield/4.0b4pre Firefox/3.6.7

Please note that the steps to reproduce on this bug are a little broken: they crash GDI as well, at least for me (with GDI it's slightly harder to trigger, since it doesn't have to copy data into VRAM). Please check with normal usage. Also note there isn't any leak involved here, just high memory usage because all the large images are loaded in memory.
To be honest, I don't crash only when following the STR with 20 tabs of high-res images open; in that case the high memory usage is normal.

I crash even when closing tabs. Say I go to deviantART, open 4-5 tabs, look at the pictures, save what I like, then close the tab and open a new one. If I'm very bored and do that for a while, diligently closing every tab, it will eventually crash even with a single tab open; Vmem grows and grows during that process until it finally dies. Hopefully with https://bugzilla.mozilla.org/show_bug.cgi?id=587406 fixed you'll see a flood of crash reports, and it will be easier to understand what's going on.
Oh wow, that sounds bad, yes, but it's probably a separate bug. I am very much looking forward to getting those crash reports.

Thanks for dealing with this :)
(In reply to comment #54)

Is this stack trace of "(1624.17ec): C++ EH exception - code e06d7363 (first chance)" being triggered helpful? When the large test image is loaded repeatedly, the exception keeps getting thrown.
FWIW I just caught this one on mozalloc:

mozalloc.dll!mozalloc_abort(const char * const msg)  Line 77
mozalloc.dll!mozalloc_handle_oom()  Line 54 + 0xa bytes
xul.dll!_cairo_d2d_create_brush_for_pattern(_cairo_d2d_surface * d2dsurf, const _cairo_pattern * pattern, bool unique)  Line 1885 + 0xd bytes
xul.dll!_cairo_d2d_mask(void * surface, _cairo_operator op, const _cairo_pattern * source, const _cairo_pattern * mask, _cairo_clip * clip)  Line 2991 + 0xf bytes
xul.dll!_cairo_surface_mask(_cairo_surface * surface, _cairo_operator op, const _cairo_pattern * source, const _cairo_pattern * mask, _cairo_clip * clip)  Line 2056 + 0xb bytes
xul.dll!_cairo_gstate_mask(_cairo_gstate * gstate, _cairo_pattern * mask)  Line 1052 + 0x31 bytes
xul.dll!_moz_cairo_paint_with_alpha(_cairo * cr, double alpha)  Line 2157 + 0xe bytes
xul.dll!gfxContext::Paint(double alpha)  Line 748 + 0x11 bytes
xul.dll!nsCanvasRenderingContext2D::DrawImage(nsIDOMElement * imgElt, float a1, float a2, float a3, float a4, float a5, float a6, float a7, float a8, unsigned char optional_argc)  Line 3562
xul.dll!nsIDOMCanvasRenderingContext2D_DrawImage(JSContext * cx, unsigned int argc, unsigned __int64 * vp)  Line 3379
xul.dll!js::Interpret(JSContext * cx)  Line 4701
xul.dll!js::InvokeCommon<int (__cdecl*)(JSContext *,JSObject *,unsigned int,js::Value *,js::Value *)>(JSContext * cx, JSFunction * fun, JSScript * script, int (JSContext *, JSObject *, unsigned int, js::Value *, js::Value *)* native, const js::CallArgs & argsRef, unsigned int flags)  Line 572 + 0x6 bytes
xul.dll!js::Invoke(JSContext * cx, const js::CallArgs & args, unsigned int flags)  Line 694 + 0xd bytes
xul.dll!js::InternalInvoke(JSContext * cx, const js::Value & thisv, const js::Value & fval, unsigned int flags, unsigned int argc, js::Value * argv, js::Value * rval)  Line 734 + 0xf bytes
xul.dll!nsJSContext::CallEventHandler(nsISupports * aTarget, void * aScope, void * aHandler, nsIArray * aargv, nsIVariant * * arv)  Line 2248 + 0x51 bytes
xul.dll!nsGlobalWindow::RunTimeout(nsTimeout * aTimeout)  Line 8553
xul.dll!nsGlobalWindow::TimerCallback(nsITimer * aTimer, void * aClosure)  Line 8899
xul.dll!nsTimerImpl::Fire()  Line 425 + 0x7 bytes
xul.dll!nsTimerEvent::Run()  Line 519
xul.dll!nsThread::ProcessNextEvent(int mayWait, int * result)  Line 548
xul.dll!mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate * aDelegate)  Line 143
xul.dll!MessageLoop::RunInternal()  Line 219 + 0x9 bytes
xul.dll!MessageLoop::RunHandler()  Line 203
xul.dll!EmitPropOp(JSContext * cx, JSParseNode * pn, JSOp op, JSCodeGenerator * cg, int callContext)  Line 2795 + 0xed bytes
xul.dll!nsAppShell::Run()  Line 249

It was an unhandled exception and Breakpad caught it afterwards:
http://crash-stats.mozilla.com/report/index/05302e94-f4b5-4b7d-9508-30fbb2100816

Memory usage was well below 1 gig but I can't recall the details.
(In reply to comment #68)
> Memory usage was well below 1 gig but I can't recall the details.

Can anyone come up with a scenario where this happens when memory usage is below 1 GB, other than running out of address space due to extreme fragmentation?
(In reply to comment #69)
> Can anyone come up with a scenario where this happens when memory usage is
> below 1 GB, other than running out of address space due to extreme
> fragmentation.

http://crash-stats.mozilla.com/report/index/ca896cb3-c37d-4bd7-b0fd-482a52100816

This is the best I can do for now; I will experiment further and possibly try to get a stack trace. That's with one about:memory tab and one tab showing the super large image. I pressed CTRL+TAB to switch between them for a couple of minutes, interspersed with closing the image tab and then undoing the tab close a couple of times. The memory working set was ~500MB.
This is a screenshot from about 30 seconds before it crashed (http://yfrog.com/f/7fmemusagej/). I had to take a screenshot because it wasn't painting any menus, and that's a clear sign of an imminent crash. I have 4GB of RAM, and if it's running out of memory or fragmenting it badly, it's doing something wrong for sure.
(In reply to comment #69)
> Can anyone come up with a scenario where this happens when memory usage is
> below 1 GB, other than running out of address space due to extreme fragmentation.

I have the graphical corruption that is a precursor to this problem at 288 MB of usage. It seems to have crashed while I was typing this. Using Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b4pre) Gecko/20100817 Minefield/4.0b4pre
Okay guys. I think this should be un-duped from bug 587406.

I updated to today's nightly, and not only did the fix for bug 587406 not fix this, it also didn't help with crash reporting: I still get an error saying that the report cannot be sent and that Firefox is still running.

It is happening so often that I can no longer browse properly with D2D turned on, so I'm afraid I've had to disable it until someone comes up with a fix.
(In reply to comment #73)
> Okay guys. I think this should be un-duped from bug 587406.
> 
> I updated to todays nightly and not only did bug 587406 not fix it, but it
> still didn't help with crash reporting - I still get an error saying that the
> report cannot be sent and that Firefox is still running.
> 
> It is happening so often that I cannot browse decently with D2D turned on
> anymore so I've had to kill it until someone can come up with a fix for this
> I'm afraid.

How high is the memory now when you crash?
Attached image Crash Box
This is the crash box I get every time I try to send one for this bug. Bas, the blank reports that I sent you before were the result of this.
(In reply to comment #74)
> How high is the memory now when you crash?

Not spectacularly large. I reported the usual memory data in attachment 466087. That was shortly before it crashed.
Attached image Normal
Another interesting thing I noticed: right before everything went haywire I had this sort of corruption in the main window, in this case GMail, which is usually the first to get hit.
Attached image Before Crash
As you can see, the checkboxes in the content get corrupted; shortly afterwards the back/forward buttons disappear, and then everything crashes.
I really have no idea how you'd get an abort_oom with so little memory usage :(.
(In reply to comment #79)
> I really have no idea how you'd get an abort_oom with so little memory usage
> :(.

Are you sure this is a memory issue? I am not hitting a crash that can be reported, or at least I cannot get a crash to report. They keep coming up blank.

Maybe it isn't a memory issue?
An interesting STR for what might be a memory/TabView problem. This happens with d2d on or off (dw at false and widget-mode at 0).

1. Open about:memory + any website (e.g. GMail) + a super large image (to make the memory increase easy to observe). Check gfx/surface/image and images/content/used/uncompressed: they're at 333MB.
2. Close the super large image and check about:memory: gfx/surface/image and images/content/used/uncompressed are not cleared.
3. Close and reopen Firefox and observe the two tabs you had open; go into TabView and the super large image is back.


Compare with:
1. Open about:memory and only one other tab with the big image, then close the large image: gfx/surface/image and images/content/used/uncompressed are released and drop right down to ~250,000 or so.
2. Close and reopen Firefox, and open TabView to find only the one tab that should be open.

For GDI substitute gfx/surface/win32 for gfx/surface/image

So it would seem that they're getting cleared properly only when there are no other tabs open. I can't tell whether this is just a TabView problem, or whether TabView is exacerbating a reluctance of gfx/surface/image and images/content/used/uncompressed to release memory, which over a browsing session causes the memory fragmentation that is causing the crashes.
I should point out that I've never hit a crash with a large image yet, this seems only to happen at random times, usually when browsing GMail.
I haven't touched TabView at all and I don't crash with D2D off; with D2D off I can open super large images and browse for hours without any crash. So what you see is probably a matter for another bug :)
(In reply to comment #83)
> I haven't touched tabview at all and i don't crash with D2D off, with D2D off i
> can open super large images and browse for hours without any crash. So what you
> see it's prolly a matter for another bug :)

I filed the original bug, and I'm still not sure what causes it, so the memory issue should probably be filed under a separate bug, if this one gets re-opened in any event...
Ryan, running tonight's nightly do you get corruption _before_ the crash? or does Firefox simply crash right away?
(In reply to comment #85)
> Ryan, running tonight's nightly do you get corruption _before_ the crash? or
> does Firefox simply crash right away?

It can be both, but the examples I describe in this bug are specifically from before the crash. For simplicity, the other issue can be dealt with later; the corruption always occurs first. If I ignore it and don't restart, it always crashes shortly thereafter.

Usually after switching a tab or opening a link.
Confirming that behaviour: once the corruption starts, the browser is doomed; it will crash pretty soon, with the corruption getting worse and worse. Sometimes the whole GUI stops responding before it crashes; sometimes the crash comes less than 20-30 seconds after the corruption appears. Closing tabs does not help.
Another sign is that menus stop painting (only a transparent rectangle is drawn), the file picker shows up blank, etc.
(In reply to comment #87)
> Confirming that behaviour, when the corruption starts to happen the browser is
> doomed it will crash pretty soon with the corruption getting worse and worse.
> Sometimes the whole gui stops responding before it crashes sometimes it happens
> in less than 20/30secs from the corruption , closing tabs will not help . 
> Another different sign is that menus stops painting only a transparent
> rectangle is drawn , the file picker will show up blank etc

Bingo. That is exactly what I see. I take it that yours appears seemingly randomly also?
Fairly random, after browsing around for a while. It has never happened after less than an hour of browsing, but beyond that it's random: one hour, two hours, etc.
Blocks: 588166
Erm, why does this block bug 588166?
If anyone can come up with some tryserver builds to test possible fixes for this bug please post and let us know - I'll be happy to test them out for you. Or if there is anything you'd like me to try or whatnot.

For the time being I'm going to have to keep D2D disabled, as the browser is virtually unusable.
No longer blocks: 588166
Looks like breakpad finally fired and got something

http://crash-stats.mozilla.com/report/index/4ce96883-3927-49da-94bd-433092100818
Attachment #466355 - Attachment mime type: application/octet-stream → text/plain
I'm thinking this may not be a graphics issue (or at least not a D2D-specific one), since I can still hit it with D2D turned off, although it does seem less common.

Something else is going on here - does anyone know who we can CC to see if they can see where this issue is coming from?
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
(In reply to comment #92)
> Looks like breakpad finally fired and got something
> 
> http://crash-stats.mozilla.com/report/index/4ce96883-3927-49da-94bd-433092100818

Were you using an hourly build? We don't have any symbols for that crash, so it's not useful.
Unfortunately yes; the nightly is not out yet.
I agree that this seems to not be a duplicate of bug 587406. My only request is that people run the nightlies (once they're out!), especially now that D2D is preffed on by default, so we can get meaningful crash reports.
(In reply to comment #96)
> I agree that this seems to not be a duplicate of bug 587406. My only request is
> that people run the nightlies (once they're out!), especially now that D2D is
> preffed on by default, so we can get meaningful crash reports.

I still have had no luck getting a working crash report here, it keeps failing to submit.

Does anyone have a clue why it keeps generating blank reports? It would be supremely helpful in tracing this down since it seems a few people are hitting this issue and that it probably _is not_ D2D related at this point.
Attached file Stack trace
After Bug 587856 landed I tried to see if this bug might be triggered by using as much surfacevram as possible. I went to digg.com and loaded a couple of pages of stories and comments into background tabs (~50 tabs), then checked about:memory.

Malloc and win32 were high (1GB+), content/used/uncompressed and gfx/surface/image were at 250MB, and surfacevram was at ~40MB. Closing tabs brings memory down to startup levels, except for malloc/mapped and the win32 entries, which tend to stay in the 400-500MB region with all tabs closed.

After reopening tabs so that I have 50 or so, I find that when I switch through them and check d2d/surfacevram, it rises a bit (10-30MB or so) after I view each new tab, and eventually rises above gfx/surface/image.

CTRL+TAB through the tabs is OK until the UI starts going black, etc., after 20-30 tabs (it varies). At that point I quickly switched back to about:memory: d2d/surfacevram was 428MB on my 512MB card. Firefox then hung.

I've attached a stack trace, and here is the crash report that was generated after I detached the debugger: bp-69940a29-677e-48a1-9492-165b52100818 (I hope it works).
The report is blank too. I had 4 crashes today and 3 were blank; the only one with data was from an hourly build. I guess the blank reports are a matter for bug 587406 (https://bugzilla.mozilla.org/show_bug.cgi?id=587406).
Attachment #467236 - Attachment mime type: application/octet-stream → text/plain
What everyone here is seeing is the same thing I reported here:
https://bugzilla.mozilla.org/show_bug.cgi?id=587379

Have you tried turning off taskbar previews? It seems to help, but after 8 hours today I hit the wall: it started with lag when switching tabs and typing in edit/text boxes, followed by UI blackouts and full-screen white-outs, until I had to force-kill the non-responding browser. I have yet to actually crash; it only locks up to the point where it needs to be killed.

I'm just going to go ahead and dup my bug to this one.
Duplicate of this bug: 587379
(In reply to comment #97)
> Does anyone have a clue why it keeps generating blank reports? It would be
> supremely helpful in tracing this down since it seems a few people are hitting
> this issue and that it probably _is not_ D2D related at this point.

There's probably heap corruption happening, and since we write the minidumps in the same process, things are too screwed up to write a minidump, so we fail.
(In reply to comment #102)
> There's probably heap corruption happening, and since we write the minidumps in
> the same process, things are too screwed up to write a minidump, so we fail.

I get this all the time. Perhaps it would be useful if a report would be generated, that says that the minidump failed or something?
(In reply to comment #103)
> I get this all the time. Perhaps it would be useful if a report would be
> generated, that says that the minidump failed or something?

I thought we had a bug open on that, but I can't find one. The crash reporter should definitely have a better error message for "failed to produce a minidump".
It started happening to me. It usually takes about an hour. First the navigation buttons start to disappear as I switch tabs, then it crashes.
I have an Nvidia GeForce 8400GS 512MB with the 258.96 drivers on Windows 7 32-bit.
Here's a screenshot from just before the crash: http://img696.imageshack.us/img696/9940/memoryusage.jpg

Mozilla/5.0 (Windows NT 6.1; rv:2.0b5pre) Gecko/20100818 Minefield/4.0b5pre
Sorry, I forgot: every time the crash happens, the crash reporter is unable to send the report.
We have new memory usage reduction patches! See Bug 588690. Still, it's worrisome that we can't get a decent crash report, and those memory stats don't really look like an OOM.
(In reply to comment 107)

That's great news about the memory usage. I would like to direct people who are suffering from this issue but have low surfacevram use to Bug 588166, as while trying to get STR here I got a few JS-related crashes that produced similar effects, such as a disappearing forward/back button.
(In reply to comment #108)
> (In reply to comment 107)
> 
> That's great news about the memory usage. I would like to direct people that
> are suffering this issue, but have low surfacevram use, to Bug 588166 as while
> trying get STR here I got a few JS related crashes that would produce similar
> effects like a disappearing forward/back button.

I was the original reporter of this bug, and I was quite clear that my issue was never a memory issue. I'll take a look at the other suggested bug, though it may not be the same, since I have never got a single stack trace from this one.
I've hit this 32 times in the last 4 days - that is pretty severe for any crash and we need to get a track on this.

Does anyone have any ideas?
Summary: [D2D] Corruption and lockups in the UI when D2D is enabled (new regression) → Corruption and lockups in the UI when (new regression)
I was browsing lots of high res images when I got this issue again. The Minefield status bar and Windows' taskbar previews went black. Oddly, if I clicked on a taskbar preview the entire window popped up on my other monitor, but I don't think it's related to the initial UI corruption. 

It didn't actually crash, though. After continuing to mess around with it (in an attempt to get it to crash) the browser just became non-functional. Menus wouldn't render properly, similar to the link alt-text bug a while back. Eventually it basically stopped painting anything. It appeared to be frozen, and I didn't manage to get it to crash.

This happened right after I closed most of my tabs and then opened whose status bar went black. I immediately checked about:memory. Got the following:
 
malloc/committed 198,135,808
malloc/dirty 3,571,712
win32/privatebytes 296,767,488
win32/workingset 625,233,920
xpconnect/js/gcchunks 28,311,552
storage/sqlite/pagecache 6,049,640
storage/sqlite/other 1,258,144
gfx/d2d/surfacecache 68,941,172
gfx/d2d/surfacevram 114,889,052
images/chrome/used/raw 0
images/chrome/used/uncompressed 229,328
images/chrome/unused/raw 0
images/chrome/unused/uncompressed 0
images/content/used/raw 0
images/content/used/uncompressed 69,460,436
images/content/unused/raw 0
images/content/unused/uncompressed 0
layout/all 2,799,433
layout/bidi 0
gfx/surface/image 69,321,436
gfx/surface/win32 0
content/canvas/2d_pixel_bytes 0

I had Process Explorer open at the time by coincidence. Not sure if it's relevant but the "virtual size" reached 1.2 GB with 3-4 tabs open after closing about 10-15.
Attached file Stack trace
Attached is a stack trace of the crash during "normal" browsing, meaning I have no reproducible steps, sorry...
Unless I'm reading this wrong, it points to Yarr as the culprit (or did Yarr just happen to be what crashed after the OOM?), which makes me a little hesitant about posting this stack trace, though it had all the "usual" signs: the status bar and other parts of the screen went black, and it crashed afterwards. To make sure I'm not at fault here, I'll try to reproduce it again tomorrow morning.
This is not caused by any single issue. I think our exception handling on win64 is still iffy.
Attached file another one
I see... 
Well, here's another stack trace, this time showing it going OOM on a cairo D2D path... hopefully it'll help isolate the D2D side of the bug :)
(In reply to comment #114)
> Created attachment 467639 [details]
> another one
> 
> I see... 
> Well, here's another stack trace- this time showing it going OOM on a Cairo d2d
> path... hopefully it'll help isolate the d2d side of the bug :)

I think the D2D side of this bug is mainly that we run out of memory more quickly when using D2D :-)
Yeah, I think there's either still an OOM issue, or a heap corruption issue. Either way you wind up actually crashing in some random place that tries to allocate memory.
(In reply to comment #115)
> I think the D2D side of this bug is mainly that we run out of memory more
> quickly when using D2D :-)

Hehe fair enough guys :)
And I crashed with today's nightly too: no Breakpad report, and the faulting module was mozalloc.dll as usual. This is weird; my system definitely wasn't OOM. Windows wasn't swapping and everything else was working fine. If the system were really memory-starved, other programs would start to crash or act weird too. Also, my system has 4GB of memory with an 8GB page file, so Firefox had plenty of memory to play with.
(In reply to comment #118)
> And i crashed with today nightly too , no breakpad faulting module mozalloc.dll
> as usually. This is weird my system wasn't OOM for sure , windows wasn't
> swapping , everything else was working fine. If the system was really memory
> starved other programs would start to crash or act weird too also my system has
> 4g of memory with a 8g page file, Firefox had plenty of mem to play with

32-bit programs can only use 1.5-2.0 GB of memory, no matter how much you have on your system.
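To make that figure concrete, here is a rough sketch of the address-space arithmetic. It assumes Windows' default 2 GiB user-mode address space for a 32-bit process, and that a 32-bit binary marked LARGEADDRESSAWARE gets the full 4 GiB when running under WOW64. Whether these builds set that flag is not stated here, and `user_address_space` and `headroom` are hypothetical helpers, not Firefox code. Loaded DLLs and fragmentation shrink the usable space further, so a large allocation can fail well before the total is reached.

```python
# Sketch of the 32-bit address-space arithmetic (hypothetical helpers).
# Assumptions: default 2 GiB user / 2 GiB kernel split; a LARGEADDRESSAWARE
# 32-bit binary under WOW64 gets the full 4 GiB. Boot options such as /3GB
# on 32-bit Windows are ignored.
GIB = 2 ** 30


def user_address_space(large_address_aware: bool, wow64: bool) -> int:
    """Bytes of user-mode virtual address space for a 32-bit process."""
    total = 2 ** 32  # a 32-bit pointer can address 4 GiB in total
    if large_address_aware and wow64:
        return total  # the whole 4 GiB is user-mode under WOW64
    return total // 2  # default: the upper half is reserved for the kernel


def headroom(virtual_size_bytes: int, large_address_aware: bool, wow64: bool) -> int:
    """Address space left once the process's current virtual size is reserved."""
    return user_address_space(large_address_aware, wow64) - virtual_size_bytes


# A default (non-LAA) process with a 1.5 GiB virtual size.
print(headroom(3 * GIB // 2, large_address_aware=False, wow64=True) / GIB)  # 0.5
```

With a virtual size around 1.5 GiB, only about 0.5 GiB of address space remains, and that remainder may be too fragmented to hold one large contiguous allocation.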
Have there been any improvements on this since our latest memory usage improvements?
(In reply to comment #120)
> Has there been any improvements on this since our latest memory usage
> improvements?

I've just tried CTRL+TAB-ing through about 40 digg.com links: surfacevram was reported at 16MB on this Nvidia 8400M GS 128MB, and the working set was at a nice 400MB, so big thumbs up from me.
Might seem a silly question, but would this have anything to do with bug 563088, which recently landed? IIRC, people were complaining that this corruption happened a lot with hi-res images, myself included.
(In reply to comment #122)
> Might seem a silly question but would this have anything to do with bug 563088
> which recently landed? IIRC, people were complaining this corruption happened a
> lot with hi-res images including myself.

Bug 563088 will improve general memory usage too, yes.
Memory usage has improved, but it still crashes after a while for me; I just crashed with today's nightly. When it crashed, memory usage was nothing spectacular: the working set was at 521MB, the only high value, and gfx/d2d/surfacevram was low, around 70MB. The way it crashes, the blank reports, etc. seem to point to heap/memory corruption rather than a real OOM issue.
The blank report might also just be because of screwyness with WOW64; I suspect strongly that any crash at all can result in that behaviour, not just D2D-related crashes. Maybe we should reopen bug 587406?

Anyways, I'm going to resolve this fixed since I am crossing every appendage that the graphics side of it is.
Status: REOPENED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
https://bugzilla.mozilla.org/show_bug.cgi?id=587406 is definitely not fixed; I never got any useful report even after it was supposed to be fixed. Should we file a new one for this nasty recurring crash, without any of the blocking D2D stuff?
I'm getting crashes which could be related to this bug: the back/forward buttons disappear, then appear again, and the status bar turns black. Then I get recurrent freezes of a few seconds with one CPU core fully used, and I soon get a crash.

But all the crashes in about:crashes link to "http://crash-stats.mozilla.com/about/throttling": when I click on them, the page keeps loading indefinitely (not even a network timeout).

I wasn't able to look at about:memory before crashing, but in Process Explorer working set and private bytes were at about 400MB and virtual size just below 1GB.
It just happened again: the back/forward buttons flashed when switching tabs, then disappeared, and the status bar turned black. I just had time to switch to the "about:memory" tab and refresh before the crash.

win32/privatebytes    440946688
win32/workingset      449507328
gfx/d2d/surfacecache    5286312
gfx/d2d/surfacevram    60107724

When submitting the crash I saw an error message in the dialog saying there had been a problem submitting the crash report.
Loïc, what build are you using? Very recent builds (tonight's) should be better on surfacevram.
Just got this problem again. Something's leaking big time. Virtual size hit 1.5 GB.

malloc/allocated 108,538,334
malloc/mapped 316,669,952
malloc/committed 132,714,496
malloc/dirty 2,392,064
win32/privatebytes 211,869,696
win32/workingset 827,916,288
xpconnect/js/gcchunks 36,700,160
storage/sqlite/pagecache 6,619,160
storage/sqlite/other 1,247,888
gfx/d2d/surfacecache 367,872
gfx/d2d/surfacevram 39,279,364
images/chrome/used/raw 0
images/chrome/used/uncompressed 220,092
images/chrome/unused/raw 0
images/chrome/unused/uncompressed 0
images/content/used/raw 183,972
images/content/used/uncompressed 225,400
images/content/unused/raw 1,139,504
images/content/unused/uncompressed 1,902,279
layout/all 1,657,591
layout/bidi 0
gfx/surface/image 848,412
gfx/surface/win32 0
content/canvas/2d_pixel_bytes 720,000

I'm using changeset d0b284052d29 from a few hours ago (or less); I'm pretty sure any recent patch related to memory usage is in this build.
(In reply to comment #129)
> Loïc, what build are you using? Very recent builds (tonight's) should be better
> on surfacevram.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b5pre) Gecko/20100821 Minefield/4.0b5pre

Assuming about:memory is correct, surfacevram = 37.5MB does not seem very high (HD5870 1GB).
(In reply to comment #129)
> Loïc, what build are you using? Very recent builds (tonight's) should be better
> on surfacevram.

Mozilla/5.0 (Windows NT 6.1; WOW64; rv:2.0b5pre) Gecko/20100821 Minefield/4.0b5pre

Assuming about:memory is correct, surfacevram = 57MB does not seem very high (HD5870 1GB).
(In reply to comment #129)
> Loïc, what build are you using? Very recent builds (tonight's) should be better
> on surfacevram.

It could be fine. If he has 2 windows open on a 1920x1200 screen it could be explained. Or 3 if he had a smaller screen.

What we can't explain is why thewolf86's working set is so huge. There's nothing else in there that can account for that. And considering it's not in the private working set, it must be shared with some other process. If it's not plugin memory that it's shared with, it might have something to do with a problem in the graphics driver. Make sure that's up to date (note your current version), and see if this is still happening.
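As a rough check on the "2 windows on a 1920x1200 screen" estimate, the sketch below converts the surfacevram figure reported earlier (60,107,724 bytes) into a count of full-screen surfaces. It assumes 4 bytes per pixel and no row padding; `surface_bytes` and `equivalent_surfaces` are hypothetical helpers, and how many window-sized surfaces D2D actually keeps per window is not stated in this thread.

```python
# Sketch: express a surfacevram figure as a number of full-screen 32bpp surfaces.
# Assumptions: 4 bytes per pixel, no row padding. The number of buffers D2D
# keeps per window is NOT modelled here.
BYTES_PER_PIXEL = 4


def surface_bytes(width: int, height: int) -> int:
    """Bytes for an unpadded 32bpp surface of the given size."""
    return width * height * BYTES_PER_PIXEL


def equivalent_surfaces(vram_bytes: int, width: int, height: int) -> float:
    """How many width x height surfaces would add up to vram_bytes."""
    return vram_bytes / surface_bytes(width, height)


full_screen = surface_bytes(1920, 1200)
print(f"one 1920x1200 surface: {full_screen / 2**20:.1f} MiB")  # 8.8 MiB
print(f"{equivalent_surfaces(60_107_724, 1920, 1200):.1f} such surfaces")  # 6.5
```

That is about 6.5 full-screen surfaces, so two windows would only account for it if each kept roughly three window-sized buffers around; that per-window count is an assumption, not something measured here.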
(In reply to comment #133)
> (In reply to comment #129)
> > Loïc, what build are you using? Very recent builds (tonight's) should be better
> > on surfacevram.
> 
> It could be fine. If he has 2 windows open on a 1920x1200 screen it could be
> explained. Or 3 if he had a smaller screen.
> 

8 windows (18 tabs in total) on a 1680x1050 screen (not all of them full screen).

I displayed all the windows, and surfacevram was at 140MB. Then I tried to display the tabs one by one, and one window froze (there was no corruption; I wasn't able to change the active tab, but another window was fine), then it crashed.

On restart, surfacevram was 190MB, and after displaying all tabs, it's down to 155MB.
So it's higher than my first crash (57MB), where all windows but 2 were minimized.
I don't think the real amount of allocated memory is the problem, at least for me.
My video card is an Nvidia 8800GT. I'm running driver version 258.96, which is the latest WHQL release. I'll try to see if this happens with plugins disabled, though Flash is the only one I use consistently.
The Hi-res STR is now giving me this constantly:
http://crash-stats.mozilla.com/report/index/eb431b16-4153-4e85-97c0-49b752100821

Previously, I looked into the WOW64 crash report issues and installed the following hotfix: http://support.microsoft.com/kb/976038

Strangely enough I had a debugger attached but it didn't fire when it crashed.
Oops! With "constantly" I meant "consistently".
Sorry for the spam.
Another one, more like what thewolf86 had:
Scrolling up and down on this page increased working set and virtual size.

Just before the crash:
Virtual memory was about 1.6GB
malloc/allocated                243,102,446
malloc/mapped                   287,309,824
malloc/committed                282,066,944
malloc/dirty                      3,338,240
win32/privatebytes              358,002,688
win32/workingset              1,022,160,896
xpconnect/js/gcchunks            29,360,128
storage/sqlite/pagecache         20,911,584
storage/sqlite/other              1,102,048
gfx/d2d/surfacecache              1,607,124
gfx/d2d/surfacevram             159,223,744
images/chrome/used/raw                    0
images/chrome/used/uncompressed     208,924
images/chrome/unused/raw                  0
images/chrome/unused/uncompressed         0
images/content/used/raw             612,932
images/content/used/uncompressed  2,632,151
images/content/unused/raw                 0
images/content/unused/uncompressed        0
layout/all                       32,991,602
layout/bidi                               0
gfx/surface/image                 2,928,816
gfx/surface/win32                         0
content/canvas/2d_pixel_bytes     4,060,980

I had no corruption until I wanted to check memory: I switched to the about:memory tab, refreshed, and the status bar turned black. It crashed less than 30s later.

After restarting I couldn't reproduce the problem, but now (after spending some time writing this post), it's happening again: virtual size increased up to 1.3GB, then both working set and virtual size started increasing together.
> Previously, I looked into the WOW64 crash report issues and installed the
> following hotfix: http://support.microsoft.com/kb/976038

Hmm, one of those 'by request' hotfixes ... wonder if it'll be included in the next service pack. Did you also enable the registry key? (either globally or for the firefox process) That looks to be a requirement for the hotfix to do anything.
Sadly I did not create the registry entries. I followed the explanation from this page (.NET but it all boils down to WndProc) and somehow skipped the full steps to install :/. I did include a simple firefox.exe.manifest file specifying win7 support.

The hotfix is straightforward to get; it's not one of those where you have to call support. The request link is at the top of the page, and they instantly send an email with a direct link.

I may have jumped the gun, though. Maybe the changes in bug 587406 are what made the crash reporter report the crash properly?
I looked at the memory map with VMMap:
All of the increase I see when scrolling this page is in "Shareable" memory.
I see isolated 4KB pages allocated, but most of the usage comes from 4480KB blocks.
1.4GB virtual
800MB Shareable, including 150x4480KB blocks

I can't find who is allocating them: a conditional breakpoint on VirtualAllocEx does not catch them.
4480KB, that's almost certainly image data. Shareable memory, hrm. Doesn't ring a bell right now, but it's certainly helpful information! D2D doesn't allocate shareable memory but DX might. I'm not sure.
It seems to be my window size: the blocks are about the same size as the window, and they're bigger now with the window maximized.
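A quick back-of-the-envelope check of the "window size" observation, as a Python sketch. It assumes the blocks hold uncompressed 32-bit image data (4 bytes per pixel) with no row padding; that assumption is not confirmed anywhere in this thread, and 1280 x 896 below is a hypothetical window size chosen only because it fits exactly. It also shows how much of the 800MB Shareable total the 150 reported 4480KB blocks account for.

```python
# Sketch: relate VMMap's 4480KB "Shareable" blocks to a window-sized surface.
# Assumption (not confirmed in the thread): 4 bytes per pixel, no row padding.
BYTES_PER_PIXEL = 4
KB = 1024

def surface_kb(width: int, height: int, bpp: int = BYTES_PER_PIXEL) -> float:
    """Size in KB of an uncompressed width x height surface."""
    return width * height * bpp / KB

def pixels_for_kb(size_kb: int, bpp: int = BYTES_PER_PIXEL) -> int:
    """Number of pixels a block of size_kb holds at bpp bytes per pixel."""
    return size_kb * KB // bpp

BLOCK_KB = 4480     # block size reported from VMMap
BLOCK_COUNT = 150   # number of such blocks reported

print(pixels_for_kb(BLOCK_KB))      # 1146880 pixels per block
print(surface_kb(1280, 896))        # hypothetical window size -> 4480.0
print(BLOCK_COUNT * BLOCK_KB / KB)  # 656.25 (MB held by the 150 blocks)
```

So the large blocks alone would account for roughly 656MB of the 800MB Shareable total.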
(In reply to comment #143)
> It seems to be my window size, it's about the same size, and it's bigger now
> with the window maximized.

I wonder if something is wrong with the swap chain not being completely destroyed? Maybe that's considered shareable memory somehow? On the other hand I'd expect that to just be created in VRAM.
(In reply to comment #144)
> I wonder if something is wrong with the swap chain not being completely
> destroyed? Maybe that's considered shareable memory somehow? On the other hand
> I'd expect that to just be created in VRAM.

Most of these blocks are not in the working set, but only committed.
I see one block of the same type allocated even with simple applications (not always, perhaps optimizations) probably for desktop composition.
(In reply to comment #145)
> Most of these blocks are not in the working set, but only committed.
> I see one block of the same type allocated even with simple applications (not
> always, perhaps optimizations) probably for desktop composition.

I wonder why we're apparently accumulating them.
Another way:
Resizing a window with a single Google tab leaks a few backbuffers, but lots of 4KB Shareable blocks (resizing, even slowly, creates about 250 4KB blocks per second).
Because of the 32KB alignment, you can easily run out of address space: I just crashed with a 360MB working set and 760MB of virtual space, but the 21000 4KB blocks were in fact taking 660MB of address space, and the largest free block was only a few megabytes.
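The address-space arithmetic above can be checked with a short Python sketch. It takes the 32KB alignment, the 4KB block size and the 21000-block count as reported in this comment (they are observations from this bug, not general Windows constants) and compares the memory actually committed with the address space the blocks tie up.

```python
# Sketch: why many small committed blocks can exhaust a 32-bit address space.
# Each block commits 4KB but occupies a whole 32KB aligned slot
# (the 32KB figure is the one observed in this bug).
KB = 1024
MB = 1024 * KB

def round_up(size: int, alignment: int) -> int:
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment

def footprint(block_count: int, block_size: int, alignment: int) -> tuple[int, int]:
    """Return (committed bytes, address-space bytes) for block_count blocks."""
    committed = block_count * block_size
    reserved = block_count * round_up(block_size, alignment)
    return committed, reserved

committed, reserved = footprint(21000, 4 * KB, 32 * KB)
print(committed / MB)  # 82.03125
print(reserved / MB)   # 656.25
```

Only about 82MB is committed, yet the blocks consume about 656MB of address space, which matches the roughly 660MB reported and explains why the largest free region shrinks to a few megabytes.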
(In reply to comment #147)
> Another way :
> Resizing a window with a single Google tab is leaking a few backbuffers, but
> lots of 4KB Shareable blocks (resizing, even slowly, creates about 250 4KB
> blocks per second).
> Because of the 32KB alignment, you can easily run out of address space : I just
> crashed with a 360MB working set, 760MB virtual space, but the 21000 4KB blocks
> were in fact taking 660MB of address space, and largest free block was only a
> few megabytes.

So the question is why do you get these blocks, but I don't. And more importantly, -why- is it keeping them around. According to D3D we're not leaking any D3D10 resources. And your surfacevram seems to indicate we don't have any excessive surfaces allocated.
(In reply to comment #148)
> So the question is why do you get these blocks, but I don't. And more
> importantly, -why- is it keeping them around. According to D3D we're not
> leaking any D3D10 resources. And your surfacevram seems to indicate we don't
> have any excessive surfaces allocated.

How do you count surfacevram? Is it possible that you decrement surfacevram when you release the surfaces, but a reference count is still keeping them in memory?
How could I get information about the D3D10 resources? I might be able to see the leaks.
(In reply to comment #149)
> How do you count surfacevram ? Is it possible you decrement surfacevram since
> you release the surfaces, but reference count is keeping them in memory ?
> How could I get information about the D3D10 resources ? I might be able to see
> the leaks.
Incrementing it on creation and decrementing it on release is indeed what we do.

It's possible, but it's a smart pointer and we only hold a single reference. But something internal -could- be keeping it alive, but then D3D10 should be reporting it in its leaked interfaces report when you shut it down.
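A toy Python model of the scenario raised in comment #149. All class and method names here are hypothetical; this is not Mozilla or D3D10 code. A byte counter updated on our own create/release calls drops to zero, while an extra reference held elsewhere keeps the object alive. As noted in the reply, a real leak of this kind should still show up in D3D10's leaked-interfaces report at shutdown.

```python
# Sketch (hypothetical names): a counter updated on *our* create/release
# calls can read zero while another reference keeps the memory alive.

class Surface:
    """Toy COM-style object with an explicit reference count."""
    def __init__(self, size: int):
        self.size = size
        self.refcount = 1   # the creator holds the first reference
        self.freed = False

    def add_ref(self) -> None:
        self.refcount += 1

    def release(self) -> None:
        self.refcount -= 1
        if self.refcount == 0:
            self.freed = True  # memory is only returned at refcount zero

class Tracker:
    """Counts bytes the way a creation/release counter would."""
    def __init__(self):
        self.surface_vram = 0

    def create(self, size: int) -> Surface:
        self.surface_vram += size
        return Surface(size)

    def release(self, s: Surface) -> None:
        self.surface_vram -= s.size  # decremented on our release...
        s.release()                  # ...even if someone else still holds it

tracker = Tracker()
s = tracker.create(4480 * 1024)
s.add_ref()             # stands in for a reference held outside our code
tracker.release(s)
print(tracker.surface_vram, s.refcount, s.freed)  # 0 1 False
```

Only when the other holder calls `s.release()` does `s.freed` become `True`; the counter cannot see that gap.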
(In reply to comment #150)

> It's possible, but it's a smart pointer and we only hold a single reference.
> But something internal -could- be keeping it alive, but then D3D10 should be
> reporting it in its leaked interfaces report when you shut it down.

With the DirectX debug runtime:
I sometimes get an error saying that ID3D10Device::CreateTexture2D has been called with Width = 0 and Height = 0.

Some messages show up many times, but they don't seem harmful:
D3D10: INFO: ID3D10Device::PSSetShaderResources: A currently bound PixelShader ShaderResourceView is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #43: PSSETSHADERRESOURCES_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::OMSetRenderTargets: A currently bound RenderTargetView is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #49: OMSETRENDERTARGETS_UNBINDDELETINGOBJECT ]

No error messages on exit, only a few info messages:
D3D10: INFO: ID3D10Device::PSSetShaderResources: A currently bound PixelShader ShaderResourceView is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #43: PSSETSHADERRESOURCES_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::PSSetShaderResources: A currently bound PixelShader ShaderResourceView is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #43: PSSETSHADERRESOURCES_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::OMSetBlendState: The currently bound BlendState is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #47: OMSETBLENDSTATE_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::RSSetState: The currently bound RasterizerState is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #46: RSSETSTATE_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::IASetVertexBuffers: A currently bound VertexBuffer is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #31: IASETVERTEXBUFFERS_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::IASetInputLayout: The currently bound InputLayout is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #30: IASETINPUTLAYOUT_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::PSSetSamplers: A currently bound PixelShader Sampler is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #45: PSSETSAMPLERS_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::VSSetConstantBuffers: A currently bound VertexShader ConstantBuffer is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #35: VSSETCONSTANTBUFFERS_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::VSSetShader: The currently bound VertexShader is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #33: VSSETSHADER_UNBINDDELETINGOBJECT ]
D3D10: INFO: ID3D10Device::PSSetShader: The currently bound PixelShader is being deleted; so naturally, will no longer be bound. [ STATE_SETTING INFO #42: PSSETSHADER_UNBINDDELETINGOBJECT ]
Shouldn't this be reopened pending further investigation, or maybe a new bug filed that blocks this one?
(In reply to comment #152)
> i finally got a working crash report after following the steps in
> http://support.microsoft.com/kb/976038
> 
> http://crash-stats.mozilla.com/report/index/e09e1727-4a49-4c15-8db4-b03972100822

You are using Win7 SP1 beta like me.
I just uninstalled it, and the virtual memory leaks are still the same, so our problems do not come from that.
Attached file PIX log
I tried running under the PIX utility from the DirectX SDK (Resource_Usage statistics), but it crashes.

A warning I don't get when not running under PIX:
D3D10: WARNING: ID3D10Buffer::SetPrivateData: Existing private data of same name with different size found! [ STATE_SETTING WARNING #55: SETPRIVATEDATA_CHANGINGPARAMS ]

Last log entry:
D3D10: CORRUPTION: ID3D10Device::CopySubresourceRegion: First parameter is corrupt or NULL [ MISCELLANEOUS CORRUPTION #13: CORRUPTED_PARAMETER1 ]
(In reply to comment #155)
> Created attachment 468189 [details]
> PIX log
> 
> I tried running under PIX utility from DirectX SDK (Resource_Usage statistics),
> but it crashes.
> 
> A warning I do not have when not under PIX :
> D3D10: WARNING: ID3D10Buffer::SetPrivateData: Existing private data of same
> name with different size found! [ STATE_SETTING WARNING #55:
> SETPRIVATEDATA_CHANGINGPARAMS ]
> 
> Last log entry:
> D3D10: CORRUPTION: ID3D10Device::CopySubresourceRegion: First parameter is
> corrupt or NULL [ MISCELLANEOUS CORRUPTION #13: CORRUPTED_PARAMETER1 ]

The latter is believed to be a PIX bug; I actually have a compile-time define in my local build that avoids the CopySubresourceRegion call when I'm using PIXWin.
Blocks: 589809
I've cloned bug 589809 for the unfixed issue with similar symptoms.
Issue is resolved - clearing old keywords - qa-wanted clean-up