Closed Bug 619933 Opened 9 years ago Closed 9 years ago

Spike in crashes [@ gfxASurface::AddRef() ]

Categories

(Core :: Graphics, defect, critical)

x86
Windows 7
defect
Not set
critical

RESOLVED WORKSFORME
Tracking Status
blocking2.0 --- -

People

(Reporter: scoobidiver, Unassigned)

Details

(Keywords: crash, intermittent-failure, regression)

Crash Data

There is a spike in crashes in the 4.0b9pre/20101217 build.
It is the #1 top crasher in this build.
It happens close to startup.

Signature	gfxASurface::AddRef()
UUID	49dd00bc-2d71-4336-9bff-9e9b02101217
Time 	2010-12-17 09:05:11.109179
Uptime	6
Last Crash	242 seconds (4.0 minutes) before submission
Install Age	605 seconds (10.1 minutes) since version was first installed.
Product	Firefox
Version	4.0b9pre
Build ID	20101217030324
Branch	2.0
OS	Windows NT
OS Version	6.1.7600
CPU	x86
CPU Info	GenuineIntel family 6 model 15 stepping 13
Crash Reason	EXCEPTION_ACCESS_VIOLATION_READ
Crash Address	0x11
User Comments	
App Notes 	AdapterVendorID: 10de, AdapterDeviceID: 01d3
Tcpip MSAFD [TCP/IP] : 2 : 1 :
Tcpip MSAFD [UDP/IP] : 2 : 2 : %SystemRoot%\system32\mswsock.dll
Tcpip MSAFD [RAW/IP] : 2 : 3 :
MSAFD Tcpip [TCP/IPv6] : 2 : 1 : %SystemRoot%\system32\mswsock.dll
MSAFD Tcpip [UDP/IPv6] : 2 : 2 :
MSAFD Tcpip [RAW/IPv6] : 2 : 3 : %SystemRoot%\system32\mswsock.dll
Provedor de Serviço de TCPv6 de RSVP : 2 : 1 :
Provedor de Serviço de TCP de RSVP : 2 : 1 : %SystemRoot%\system32\mswsock.dll
Provedor de Serviço de UDPv6 de RSVP : 2 : 2 :
Provedor de Serviço de UDP de RSVP : 2 : 2 : %SystemRoot%\system32\mswsock.dll
MSAFD NetBIOS [\Device\NetBT_Tcpip_{9FC979E1-1D96-4507-AFD4-541E532859C8}] SEQPACKET 1 : 2 : 5 :
MSAFD NetBIOS [\Device\NetBT_Tcpip_{9FC979E1-1D96-4507-AFD4-541E532859C8}] DATAGRAM 1 : 2 : 2 : %SystemRoot%\system32\mswsock.dll
MSAFD NetBIOS [\Device\NetBT_Tcpip6_{79376D00-EC87-4567-A99E-B977DE93CA15}] SEQPACKET 3 : 2 : 5 :
MSAFD NetBIOS [\Device\NetBT_Tcpip6_{79376D00-EC87-

Frame 	Module 	Signature 	Source
0 	xul.dll 	gfxASurface::AddRef 	gfx/thebes/gfxASurface.cpp:82
1 	xul.dll 	nsRefPtr<gfxSubimageSurface>::nsRefPtr<gfxSubimageSurface> 	obj-firefox/dist/include/nsAutoPtr.h:993
2 	xul.dll 	mozilla::layers::BasicCairoImage::GetAsSurface 	gfx/layers/basic/BasicImages.cpp:94
3 	xul.dll 	mozilla::layers::ImageLayerD3D9::RenderLayer 	gfx/layers/d3d9/ImageLayerD3D9.cpp:237
4 	xul.dll 	mozilla::layers::ContainerLayerD3D9::RenderLayer 	gfx/layers/d3d9/ContainerLayerD3D9.cpp:249
5 	xul.dll 	mozilla::layers::ContainerLayerD3D9::RenderLayer 	gfx/layers/d3d9/ContainerLayerD3D9.cpp:249
6 	xul.dll 	mozilla::layers::LayerManagerD3D9::Render 	gfx/layers/d3d9/LayerManagerD3D9.cpp:299
7 	xul.dll 	mozilla::layers::LayerManagerD3D9::EndTransaction 	gfx/layers/d3d9/LayerManagerD3D9.cpp:156
8 	xul.dll 	nsDisplayList::PaintForFrame 	layout/base/nsDisplayList.cpp:477
9 	xul.dll 	nsLayoutUtils::PaintFrame 	layout/base/nsLayoutUtils.cpp:1433
10 	xul.dll 	PresShell::Paint 	layout/base/nsPresShell.cpp:6093

The regression range is:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=a5413c3c1013&tochange=d1da1005b6d6

More reports at:
http://crash-stats.mozilla.com/report/list?product=Firefox&query_search=signature&query_type=exact&query=&range_value=4&range_unit=weeks&hang_type=any&process_type=any&plugin_field=&plugin_query_type=&plugin_query=&do_query=1&admin=&signature=gfxASurface%3A%3AAddRef%28%29
blocking2.0: --- → ?
#1 top crasher with 1230 crashes/buildday (#2 has only 125 crashes/buildday)
This was still the #1 crash over the weekend.
What's really strange here is that BasicCairoImage::GetAsSurface can never be called through this codepath! Very strange. Unless we're somehow passing in a BasicCairoImage here rather than a D3D9 cairo image. If that is the case this will be resolved by the patches on bug 615316.
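To illustrate the scenario described above, here is a minimal sketch of a checked downcast that would turn "wrong image type reaches the D3D9 layer" into a recoverable condition instead of a call into the wrong class. All names (Backend, Image, BasicImage, D3D9Image, AsD3D9Image) are hypothetical stand-ins and do not come from the Mozilla tree; this is not necessarily what the patches on bug 615316 do.

```cpp
// Hypothetical backend tag; the real layers code may identify images differently.
enum class Backend { Basic, D3D9 };

// Hypothetical base class: every image records which backend created it.
struct Image {
  explicit Image(Backend b) : backend(b) {}
  virtual ~Image() = default;
  const Backend backend;
};

// Stand-in for an image created by the software (basic) backend.
struct BasicImage : Image {
  BasicImage() : Image(Backend::Basic) {}
};

// Stand-in for an image created by the D3D9 backend.
struct D3D9Image : Image {
  D3D9Image() : Image(Backend::D3D9) {}
};

// Returns the image as a D3D9Image only if the D3D9 backend created it.
// Returns nullptr otherwise, so the caller can fall back or skip rendering
// rather than treating a foreign object as a D3D9 image.
inline D3D9Image* AsD3D9Image(Image* img) {
  if (!img || img->backend != Backend::D3D9) {
    return nullptr;
  }
  return static_cast<D3D9Image*>(img);
}
```

A render path written this way would check the result of AsD3D9Image() before touching any D3D9-specific state.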
Blocks: 604101
Depends on: 615316
We need to get this fixed as soon as possible. It's a startup crash and it's pretty high volume for the trunk.
(In reply to comment #5)
> A culprit could be bug 604101:
> http://hg.mozilla.org/mozilla-central/rev/c5aeb066cbb5

That can't be the culprit, that's GL-only code, and the crash is in Direct3D code.
Are you sure that's the regression range?  I don't see anything in there that could do this.
http://tinderbox.mozilla.org/showlog.cgi?log=Firefox/1292882127.1292882507.5493.gz
Rev3 WINNT 5.1 mozilla-central talos tp4 on 2010/12/20 13:55:27 
s: talos-r3-xp-016

NOISE: Cycle 1: loaded http://localhost/page_load_test/tp4/www.youtube.com/www.youtube.com/index.html (next: http://localhost/page_load_test/tp4/www.msn.com/www.msn.com/index.html)
NOISE: 
NOISE: __FAILbrowser non-zero return code (253)__FAIL
NOISE: Cycle 1: loaded http://localhost/page_load_test/tp4/www.youtube.com/www.youtube.com/index.html (next: http://localhost/page_load_test/tp4/www.msn.com/www.msn.com/index.html)
NOISE: 
NOISE: __FAILbrowser non-zero return code (253)__FAIL
NOISE: Found crashdump: c:\docume~1\cltbld\locals~1\temp\tmpj33nmh\profile\minidumps\ca0b5e6f-381d-449e-a065-0c1b6d25fc39.dmp
Operating system: Windows NT
                  5.1.2600 Service Pack 2
CPU: x86
     GenuineIntel family 6 model 23 stepping 10
     2 CPUs

Crash reason:  EXCEPTION_ACCESS_VIOLATION
Crash address: 0x11

Thread 0 (crashed)
 0  xul.dll!gfxASurface::AddRef() [gfxASurface.cpp:608fc8fa26dc : 82 + 0x0]
    eip = 0x1019331e   esp = 0x0012cd78   ebp = 0x0012cd98   ebx = 0x02b032d4
    esi = 0x00000001   edi = 0x05b732c0   eax = 0x0012cd94   ecx = 0x00000001
    edx = 0x02b01024   efl = 0x00010202
    Found by: given as instruction pointer in context
 1  xul.dll!nsRefPtr<gfxSubimageSurface>::nsRefPtr<gfxSubimageSurface>(gfxSubimageSurface *) [nsAutoPtr.h:608fc8fa26dc : 993 + 0x4]
    eip = 0x10508541   esp = 0x0012cd80   ebp = 0x0012cd98
    Found by: call frame info with scanning
(etc)
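The register dump above looks more consistent with a bogus, non-null object pointer than with a null one: ecx and esi are both 0x00000001 and the crash address is 0x11. With the usual 32-bit Windows member-call convention, `this` is passed in ecx, so a read of a member at offset 0x10 through this == 0x1 would fault at 0x11. The 0x10 offset is inferred from those two numbers only; it is an assumption, not a verified layout of gfxASurface, and the helper names below are hypothetical.

```cpp
#include <cstdint>

// Hypothetical helper: the address touched by a member access is the object
// pointer plus the member's byte offset within the object.
constexpr std::uintptr_t FaultAddress(std::uintptr_t thisPtr,
                                      std::uintptr_t memberOffset) {
  return thisPtr + memberOffset;
}

// Hypothetical inverse: given the faulting address and a candidate `this`
// value, which member offset would have been read?
constexpr std::uintptr_t ImpliedOffset(std::uintptr_t faultAddr,
                                       std::uintptr_t thisPtr) {
  return faultAddr - thisPtr;
}

// Values taken from the minidump: ecx = 0x1, crash address = 0x11.
static_assert(ImpliedOffset(0x11, 0x1) == 0x10,
              "this == 0x1 implies a read at offset 0x10");

// A null `this` with the same assumed offset would have faulted at 0x10.
static_assert(FaultAddress(0x0, 0x10) == 0x10,
              "null this would not produce 0x11 at this offset");
```

If that reading is right, the pointer being AddRef'd was the value 1 rather than null, which would fit either a corrupted pointer or a field being misread as a pointer.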
Blocks: 438871
Whiteboard: [orange]
I pushed more fixes to bug 615316. I think we have a good chance of those fixing this issue, although without knowing what caused a stack trace looking like this to appear in the first place it's hard to say for sure. Let's see what tomorrow's nightly brings.
> Are you sure that's the regression range?
I checked again. Yes, I am.

Other changeset culprits are (it is not bug 611433 as noted in changelog):
http://hg.mozilla.org/mozilla-central/rev/a977bda3a362
http://hg.mozilla.org/mozilla-central/rev/8dbba1a4f83f
(In reply to comment #10)
> I pushed more fixes to bug 615316. I think we have a good chance of those
> fixing this issue, although without knowing what caused a stack trace looking
> like this to appear in the first place it's hard to say for sure. Let's see
> what tomorrow's nightly brings.

Many crashes are still reported on 20101220030345, but I think that build predates the check-in. Only one crash on 20101221030401 so far.
Since the landing of the D3D patch of bug 615316, it has dropped to only the #23 top crasher in today's build.

Stack traces are slightly different from the one in comment 0:
Frame 	Module 	Signature 	Source
0 	xul.dll 	gfxASurface::AddRef 	gfx/thebes/gfxASurface.cpp:82
1 	xul.dll 	nsRefPtr<gfxWindowsSurface>::nsRefPtr<gfxWindowsSurface> 	obj-firefox/dist/include/nsAutoPtr.h:993
2 	xul.dll 	mozilla::layers::BasicCairoImage::GetAsSurface 	gfx/layers/basic/BasicImages.cpp:94
3 	xul.dll 	mozilla::layers::ImageLayerD3D9::RenderLayer 	gfx/layers/d3d9/ImageLayerD3D9.cpp:352
...
(In reply to comment #11)
> Other changeset culprits are (it is not bug 611433 as noted in changelog):
> http://hg.mozilla.org/mozilla-central/rev/a977bda3a362
> http://hg.mozilla.org/mozilla-central/rev/8dbba1a4f83f

Can't be those either, they're Mac-only.

In that regression range, the following files were changed in gfx; basically layers, GL, and mac/Quartz stuff.

--- a/gfx/layers/ImageLayers.h
--- a/gfx/layers/d3d10/LayerManagerD3D10.cpp
--- a/gfx/layers/d3d9/ImageLayerD3D9.cpp
--- a/gfx/layers/d3d9/LayerManagerD3D9.cpp
--- a/gfx/layers/d3d9/LayerManagerD3D9.h
--- a/gfx/layers/d3d9/Nv3DVUtils.cpp
--- a/gfx/layers/d3d9/Nv3DVUtils.h
all 3D video changes; D3D10 change is comment only.

--- a/gfx/layers/opengl/CanvasLayerOGL.cpp
--- a/gfx/layers/opengl/CanvasLayerOGL.h
--- a/gfx/layers/opengl/ImageLayerOGL.cpp
--- a/gfx/layers/opengl/ImageLayerOGL.h
--- a/gfx/layers/opengl/LayerManagerOGL.cpp
--- a/gfx/layers/opengl/LayerManagerOGL.h
--- a/gfx/layers/opengl/ThebesLayerOGL.cpp
--- a/gfx/thebes/GLContext.cpp
--- a/gfx/thebes/GLContext.h
--- a/gfx/thebes/GLContextProviderCGL.mm
--- a/gfx/thebes/GLContextProviderEGL.cpp
--- a/gfx/thebes/GLContextProviderGLX.cpp
--- a/gfx/thebes/GLContextProviderWGL.cpp
--- a/gfx/thebes/GLContextSymbols.h
--- a/gfx/thebes/GLDefs.h
OpenGL only; unlikely to ever be used in these cases

--- a/gfx/cairo/cairo/src/cairo-quartz-private.h
--- a/gfx/cairo/cairo/src/cairo-quartz-surface.c
--- a/gfx/cairo/cairo/src/cairo-quartz.h
--- a/gfx/cairo/cairo/src/cairo-rename.h
--- a/gfx/thebes/gfxQuartzSurface.cpp
--- a/gfx/thebes/gfxQuartzSurface.h
mac-only/irrelevant

Given that, I think we have random heap scribbling, maybe?  But it's Windows-only, so we can't really valgrind (maybe wine can help, I know jseward was doing some valgrinding that way).
Note that I don't see how bug 615316 can cause any of this; the call stacks look fine, so we're using a correct image container and doing fallback. That bug is essentially an optimization, or a fix for a case where nothing was being rendered. Crashing in AddRef should not be a symptom of bug 615316.

Is it possible to get access to that dump from the tinderbox?  Would be useful to know if the pointer is null or what.
Also this could maybe be caused by someone using a gfx*Surface directly on the stack instead of by reference.
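As a sketch of why a stack-allocated surface is dangerous with intrusive reference counting: Release() ends in `delete this` once the count reaches zero, which is only valid for heap objects. One way to rule the mistake out at compile time is to make the constructor and destructor non-public. The Surface and RefPtr classes below are hypothetical stand-ins, not gfxASurface or nsRefPtr, and this is an illustration of the hazard rather than a diagnosis of this crash.

```cpp
// Hypothetical reference-counted surface; NOT the real gfxASurface.
class Surface {
public:
  // Heap allocation is the only way to obtain a Surface.
  static Surface* Create() { return new Surface(); }

  void AddRef() { ++mRefCnt; }

  void Release() {
    if (--mRefCnt == 0) {
      delete this;  // Only valid because every Surface lives on the heap.
    }
  }

  int RefCount() const { return mRefCnt; }

  // Number of Surface objects currently alive (for the demonstration only).
  static int& LiveCount() {
    static int live = 0;
    return live;
  }

private:
  Surface() { ++LiveCount(); }
  ~Surface() { --LiveCount(); }  // Private: `Surface s;` will not compile.
  int mRefCnt = 0;
};

// Minimal owning pointer, similar in spirit to nsRefPtr (hypothetical).
template <typename T>
class RefPtr {
public:
  explicit RefPtr(T* p) : mPtr(p) { if (mPtr) mPtr->AddRef(); }
  RefPtr(const RefPtr& other) : mPtr(other.mPtr) { if (mPtr) mPtr->AddRef(); }
  RefPtr& operator=(const RefPtr&) = delete;
  ~RefPtr() { if (mPtr) mPtr->Release(); }
  T* get() const { return mPtr; }

private:
  T* mPtr;
};
```

Without the private destructor, wrapping the address of a stack Surface in a RefPtr would compile, and the last Release() would call delete on stack memory.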
Correlating the time of day when the reports are received with the URL list shows that this seems to affect Russian and Brazilian users more than most.

new mo betta urls for testing
 101 \N
  95 http://www.yandex.ru/
  72 http://mail.ru/
  43 http://www.globo.com/
  42 http://search.localstrike.com.ar/
  41 http://www.uol.com.br/
  39 http://www.mail.ru/
  36 http://www.ukr.net/
  36 http://www.rambler.ru/
  36 http://www.mail.ru/cnt/5089
  29 http://search.conduit.com/?ctid=XXXXXXX&SearchSource=13
  28 http://www.plusnetwork.com/
  27 http://www.mail.ru/cnt/5087
  24 jar:file:///C:/Program%20Files/Minefield/omni.jar!/chrome/browser/content/browser/aboutSessionRestore.xhtml
  24 jar:file:///C:/Arquivos%20de%20programas/Minefield/omni.jar!/chrome/browser/content/browser/aboutSessionRestore.xhtml
  24 http://www.yandex.ru/?clid=XXXXX
  19 http://www.google.com.br/
  18 http://www.yandex.ru/?clid=XXXXX
  17 http://www.terra.com.br/portal/
  16 http://mail.ru/cnt/XXX
  15 http://www.mail.ru/cnt/XXXX
  15 http://search.conduit.com/?ctid=XXXXXXXXX&SearchSource=13
  14 about:blank
  13 http://www.orkut.com/Logout?msg=0&hl=pt-BR

It also might be a bit easier to reproduce on XP if we can find a site from the list above.

os breakdown
gfxASurface::AddRef..Total 1791
Win5.1  0.77
Win6.0  0.09
Win6.1  0.13

Volume is down, but it seems to be down even in builds where we have seen a lot of crashes.

gfxASurface::AddRef.. crashes per day, broken down by build (each entry is "count build")

date      total  breakdown by build
20101220  1791   897 4.0b9pre2010121903, 668 4.0b9pre2010122003,
                 133 4.0b9pre2010121803, 88 4.0b9pre2010121703,
                 2 4.0b72010110414, 1 3.6.132010120307,
                 1 3.62010011514, 1 3.1b32009030423
20101221  1139   837 4.0b9pre2010122003, 175 4.0b9pre2010121903,
                 67 4.0b9pre2010121703, 51 4.0b9pre2010121803,
                 7 4.0b9pre2010122103, 1 4.0b8pre2010121403,
                 1 4.0b72010110414
20101222  377    199 4.0b9pre2010122003, 112 4.0b9pre2010121903,
                 27 4.0b9pre2010121803, 15 4.0b9pre2010121703,
                 11 4.0b9pre2010122203, 10 4.0b9pre2010122103,
                 1 4.0b72010110414, 1 3.6.132010120307,
                 1 3.62010011514
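As a quick sanity check on the numbers above, the per-build counts can be summed and compared against the reported daily totals. The sketch below only re-adds the figures from the table; the array and function names are made up for the illustration.

```cpp
#include <cstddef>

// Per-build crash counts copied from the table above, one array per day.
constexpr int kDec20[] = {897, 668, 133, 88, 2, 1, 1, 1};
constexpr int kDec21[] = {837, 175, 67, 51, 7, 1, 1};
constexpr int kDec22[] = {199, 112, 27, 15, 11, 10, 1, 1, 1};

// Recursive sum, written this way so it is a valid C++11 constant expression.
constexpr int Sum(const int* values, std::size_t count) {
  return count == 0 ? 0 : values[0] + Sum(values + 1, count - 1);
}

// Convenience overload that deduces the array length.
template <std::size_t N>
constexpr int Total(const int (&values)[N]) {
  return Sum(values, N);
}

// The per-build breakdowns add up to the daily totals in the table.
static_assert(Total(kDec20) == 1791, "2010-12-20 breakdown matches total");
static_assert(Total(kDec21) == 1139, "2010-12-21 breakdown matches total");
static_assert(Total(kDec22) == 377, "2010-12-22 breakdown matches total");
```

The breakdowns are internally consistent, so the drop from 1791 to 377 over two days is not an artifact of missing builds in the list.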
There was a huge spike, but it seems to have almost gone away, strangely. For now, this doesn't block.
blocking2.0: ? → -
There have been no crashes in 4.0b9pre for the last week. I am closing it as WFM.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
Crash Signature: [@ gfxASurface::AddRef() ]
Whiteboard: [orange]