Closed Bug 469564 Opened 16 years ago Closed 16 years ago

Firefox hangs (99% CPU usage) when opening images in own page at deviantART

Categories

(Core :: Graphics: ImageLib, defect, P1)

x86
Windows XP
defect

RESOLVED DUPLICATE of bug 468160

People

(Reporter: morac, Assigned: joe)

Details

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b2
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1b2) Gecko/20081201 Firefox/3.1b2

After upgrading from Firefox 3.0.4 to Firefox 3.1b2 I'm seeing frequent hangs when opening images in their own page at http://www.deviantart.com and then either closing the page or switching to a different tab.  It's not 100% reproducible, but if I try I can get it to occur within a minute.

I originally noticed the problem would occur sometimes after I opened an image in its own tab, saved it and then closed the tab.  I later determined that it occurs even without saving; leaving the tab with the image (by closing it or switching tabs) is what triggers the hang.

I reproduced the problem running the Firefox 3.2a1pre 20081213 nightly in "safe-mode" with all extensions and plugins disabled.

So far the only site I've seen this at is www.deviantart.com.  I tried using Google Image Search to find a bunch of large images, opening the results in new tabs and then clicking the link at the top of each tab to load the full image, but I couldn't get the browser to hang.  I did get the browser to use 99% CPU for a short period while an image was loading, but once it loaded the CPU usage dropped back down to 0.

Reproducible: Didn't try

Steps to Reproduce:
1. Go to http://www.deviantart.com
2. In the gallery middle click on a bunch of the images to open their pages in new tabs.
3. In the new tabs click on the download link on the left (if available).
4. Switch to the next tab.
5. If the browser hasn't hung at this point repeat steps 3 and 4 until it does or you run out of tabs.  If you run out of tabs just close all existing opened tabs and begin again at step 1.  Sometimes clearing the browser cache speeds up getting the hang results.
Actual Results:  
Firefox will hang and begin to use 99% of the CPU.  It will not lock up the system so other processes will be able to still get CPU time, but the browser itself basically stops responding to the system.

Expected Results:  
Firefox should not hang.  It does not in Firefox 3.0.4.

I used Microsoft's Process Explorer (formerly from Sysinternals) to look at the process and saw that the main Firefox.exe thread was the one sucking up the CPU time.  The stack seemed to change each time I looked at it, but here are a few of them:

ntkrnlpa.exe+0x6a7bf
ntkrnlpa.exe!PsDereferencePrimaryToken+0x362
ntkrnlpa.exe!KiDeliverApc+0xb3
hal.dll+0x2c35
xul.dll!gfxSkipCharsIterator::gfxSkipCharsIterator+0x4789

ntkrnlpa.exe+0x6a7bf
ntkrnlpa.exe!PsDereferencePrimaryToken+0x362
ntkrnlpa.exe!KiDeliverApc+0xb3
hal.dll+0x2c35
xul.dll!gfxFontStyle::~gfxFontStyle+0x1a25

ntkrnlpa.exe!KiUnexpectedInterrupt+0x8d
ntkrnlpa.exe!PsDereferencePrimaryToken+0x362
ntkrnlpa.exe!KiDeliverApc+0xb3
hal.dll!HalClearSoftwareInterrupt+0x341
xul.dll!gfxSkipCharsIterator::gfxSkipCharsIterator+0x4799


As Firefox didn't actually crash, no crash report was generated, but I did manage to get a Dr. Watson dump, which I'll attach.

I'll note that I do have Google Desktop installed, but during testing the component files it installs in the Firefox components directory were not present and I had closed Google Desktop.
Crash dump and stack listings generated by Dr. Watson under Windows XP SP3.
http://developer.mozilla.org/en/docs/How_to_get_a_stacktrace_with_WinDbg

so, process explorer is compatible w/ the windbg symbol server. if you want to use it instead of windbg, the same paths can be used. not getting a valid stack based on a symbol server is mostly a waste of my time.

Loading Dump File [C:\TEMP\wza58e\Firefox 3.1b2 first.dmp]
User Mini Dump File: Only registers, stack and portions of memory are available

Comment: 'Dr. Watson generated MiniDump'
Symbol search path is: SRV*c:\symbols*http://symbols.mozilla.org/firefox;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows XP Version 2600 (Service Pack 3) UP Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Sun Dec 14 20:13:36.000 2008 (GMT+2)
System Uptime: not available
Process Uptime: 0 days 0:05:29.000

xul!std::vector<nsRefPtr<imgCacheEntry>,std::allocator<nsRefPtr<imgCacheEntry> > >::empty+0xd
xul!imgCacheQueue::Pop+0x14
xul!imgLoader::PutIntoCache+0x2bb775
xul!imgLoader::sCacheQueue
WARNING: Frame IP not in any known module. Following frames may be wrong.
0x12f708
0x365b9d0
0x3b694c8

not a very interesting stack
10000000 10a17000   xul      T (private pdb symbols)  c:\symbols\xul.pdb\E219B8913AE742F0A4188063610CA0492\xul.pdb

yes, the symbol server served symbols.

i get the same stack for second.dmp

314 already_AddRefed<imgCacheEntry> imgCacheQueue::Pop()
315 {
316   if (mQueue.empty())

not quite sure what 316 means in the stack, the same line appears in the 1.9.1 branch. - joe seems to own that file.
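
For anyone following along without the tree checked out, here is a minimal sketch of what a Pop() with an empty check like the one on line 316 might look like. It is not the imagelib code: ToyEntry and ToyCacheQueue are hypothetical stand-ins, and the heap ordering by size is an assumption (the std::vector frame in the dump only suggests a vector-backed queue). The point it illustrates is that Pop() returns null on an empty queue, so any caller has to check the result.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical stand-in for imgCacheEntry; only a byte size is modelled.
struct ToyEntry {
  std::size_t dataSize;
};

// Hypothetical stand-in for imgCacheQueue: a vector kept in heap order so
// that the largest entry is popped first. The ordering is an assumption.
class ToyCacheQueue {
public:
  void Push(std::shared_ptr<ToyEntry> entry) {
    mSize += entry->dataSize;
    mQueue.push_back(std::move(entry));
    std::push_heap(mQueue.begin(), mQueue.end(), Compare);
  }

  // Returns nullptr when the queue is empty; callers must check for that.
  std::shared_ptr<ToyEntry> Pop() {
    if (mQueue.empty())
      return nullptr;
    // Move the largest entry to the back of the vector, then take it out.
    std::pop_heap(mQueue.begin(), mQueue.end(), Compare);
    std::shared_ptr<ToyEntry> entry = std::move(mQueue.back());
    mQueue.pop_back();
    assert(mSize >= entry->dataSize);
    mSize -= entry->dataSize;
    return entry;
  }

  std::size_t GetSize() const { return mSize; }
  std::size_t GetNumElements() const { return mQueue.size(); }

private:
  // "Less than" comparison, which makes the std heap functions a max-heap.
  static bool Compare(const std::shared_ptr<ToyEntry>& a,
                      const std::shared_ptr<ToyEntry>& b) {
    return a->dataSize < b->dataSize;
  }

  std::vector<std::shared_ptr<ToyEntry>> mQueue;
  std::size_t mSize = 0;  // total bytes of all queued entries
};
```

In this toy version the total size and the element list are updated together in Push() and Pop(), so they cannot drift apart; whether the real queue has that property is exactly what the dumps cannot tell us.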

Comment: 'Dr. Watson generated MiniDump'
Symbol search path is: SRV*c:\symbols*http://symbols.mozilla.org/firefox;SRV*c:\symbols*http://msdl.microsoft.com/download/symbols
Executable search path is: 
Windows XP Version 2600 (Service Pack 3) UP Free x86 compatible
Product: WinNt, suite: SingleUserTS
Debug session time: Sun Dec 14 20:43:11.000 2008 (GMT+2)
System Uptime: not available
Process Uptime: 0 days 0:03:08.000

xul!nsRefPtr<imgCacheEntry>::~nsRefPtr<imgCacheEntry>
xul!imgLoader::PutIntoCache+0x2ca6b5
WARNING: Frame IP not in any known module. Following frames may be wrong.
0x3082aa0

these are mostly useless.
Assignee: nobody → joe
Component: General → ImageLib
QA Contact: general → imagelib
I tried installing the Win Debugger and stopped Firefox 3.1b2 while it wasn't responding.  I can't make heads or tails out of any of the stuff since I've never done debugging on Windows, but I'll attach the stack traces I got in an attachment.  About the only thing I can see is that it seems to be stuck in imgLoader::PutIntoCache.

If there's something else I can do once it hangs that will make debugging this more useful, please tell me.

Also I'm assuming you can't recreate this?
Using the steps I posted in the original post, I reproduced this problem on a completely different computer (my work PC) running "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2a1pre) Gecko/20081215 Minefield/3.2a1pre"

On this PC, Firefox hangs and the CPU usage goes to 50%, presumably because this PC has a dual core processor while the PC on which I originally saw the problem has a single core processor.  This PC does not have Google Desktop on it, so that eliminates any possibility that Google Desktop caused it.

So far I've recreated the problem on two different PCs:

1. Dell Inspiron 9300 laptop with 2 GB of RAM, an Intel Pentium M 1.6 GHz processor and a nVidia GeForce GO 6800 graphics chip.  The laptop runs Windows XP SP3.  Running as administrator.

2. IBM ThinkCentre desktop with 1 GB of RAM, an Intel Pentium 4 3.0 GHz processor and a built-in Intel 82945G integrated graphics chip.  The desktop runs Windows XP SP2.  Running as restricted user.
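
The 99% versus 50% readings are what you would expect from a single spinning thread, assuming the task manager reports usage as a share of all cores. A rough sketch of that arithmetic (ApproxCpuPercent is a hypothetical helper, not anything from Firefox or Windows):

```cpp
#include <algorithm>
#include <cassert>

// Approximate whole-machine CPU percentage reported for a process that has
// `busyThreads` threads spinning on a machine with `cores` logical cores.
// Assumes each busy thread saturates at most one core.
double ApproxCpuPercent(unsigned busyThreads, unsigned cores) {
  assert(cores > 0);
  unsigned running = std::min(busyThreads, cores);
  return 100.0 * running / cores;
}
```

With one busy thread this gives roughly 100% on the single-core laptop and 50% on the dual-core desktop, which is consistent with the same main-thread loop on both machines.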
This is bad.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: blocking1.9.1?
Yes, very.
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P1
In thinking about this, I believe this is caused by bug 89419. I'm going to finish fixing that, and then recheck this bug.
Was bug 89419 a problem in Firefox 3.0.4?  The reason I ask is that I can't reproduce this issue in Firefox 3.0.4.
Strictly speaking yes, but the fix for bug 89419 fixes bugs that were exposed by it when I rewrote part of imagelib. Since that rewrite didn't exist in Firefox 3, this bug won't exist there either.
I'm getting lots of hangs on Vista 64 with xul!imgCacheQueue::Pop at the top of the stack (not always with 100% CPU usage). It seems to happen more or less randomly, but I could reproduce it with the reporter's steps.

I tried the steps with a debug build, and I'm getting flooded with assertions in imglib (not sure if this is related):

###!!! ASSERTION: Queue and cache sizes out of sync!: 'queuesize == cachesize', file c:/dev/moz/central/mozilla/modules/libpr0n/src/imgLoader.cpp, line 494
imglib2!imgLoader::LoadImage+0x0000000000000018 (c:\dev\moz\central\mozilla\modules\libpr0n\src\imgloader.cpp, line 959)
gklayout!nsContentUtils::LoadImage+0x0000000000000152 (f:\dev\moz\central\mozilla\content\base\src\nscontentutils.cpp, line 2417)
gklayout!nsImageLoadingContent::LoadImage+0x00000000000002FE (f:\dev\moz\central\mozilla\content\base\src\nsimageloadingcontent.cpp, line 605)
gklayout!nsImageLoadingContent::LoadImage+0x0000000000000169 (f:\dev\moz\central\mozilla\content\base\src\nsimageloadingcontent.cpp, line 509)
gklayout!nsHTMLImageElement::BindToTree+0x0000000000000101 (f:\dev\moz\central\mozilla\content\html\content\src\nshtmlimageelement.cpp, line 562)
gklayout!nsGenericElement::doInsertChildAt+0x000000000000045B (f:\dev\moz\central\mozilla\content\base\src\nsgenericelement.cpp, line 3249)
gklayout!nsGenericElement::InsertChildAt+0x0000000000000055 (f:\dev\moz\central\mozilla\content\base\src\nsgenericelement.cpp, line 3194)
gklayout!nsINode::AppendChildTo+0x000000000000002A (f:\dev\moz\central\ff_debug\dist\include\content\nsinode.h, line 338)
gklayout!SinkContext::Node::Add+0x00000000000000C6 (f:\dev\moz\central\mozilla\content\html\document\src\nshtmlcontentsink.cpp, line 909)
gklayout!SinkContext::AddLeaf+0x000000000000005E (f:\dev\moz\central\mozilla\content\html\document\src\nshtmlcontentsink.cpp, line 1164)
gklayout!SinkContext::AddLeaf+0x000000000000025F (f:\dev\moz\central\mozilla\content\html\document\src\nshtmlcontentsink.cpp, line 1098)
gklayout!HTMLContentSink::AddLeaf+0x0000000000000057 (f:\dev\moz\central\mozilla\content\html\document\src\nshtmlcontentsink.cpp, line 2413)
gkparser!CNavDTD::AddLeaf+0x000000000000005D (f:\dev\moz\central\mozilla\parser\htmlparser\src\cnavdtd.cpp, line 3046)

And then:

###!!! ASSERTION: Queue and tracker sizes out of sync!: 'queuesize == trackersize', file c:/dev/moz/central/mozilla/modules/libpr0n/src/imgLoader.cpp, line 495
imglib2!imgLoader::LoadImage+0x0000000000000018 (c:\dev\moz\central\mozilla\modules\libpr0n\src\imgloader.cpp, line 959)
gklayout!nsContentUtils::LoadImage+0x0000000000000152 (f:\dev\moz\central\mozilla\content\base\src\nscontentutils.cpp, line 2417)
gklayout!nsImageLoadingContent::LoadImage+0x00000000000002FE (f:\dev\moz\central\mozilla\content\base\src\nsimageloadingcontent.cpp, line 605)
gklayout!nsImageLoadingContent::LoadImage+0x0000000000000169 (f:\dev\moz\central\mozilla\content\base\src\nsimageloadingcontent.cpp, line 509)
gklayout!nsHTMLImageElement::BindToTree+0x0000000000000101 (f:\dev\moz\central\mozilla\content\html\content\src\nshtmlimageelement.cpp, line 562)
gklayout!nsGenericElement::doInsertChildAt+0x000000000000045B (f:\dev\moz\central\mozilla\content\base\src\nsgenericelement.cpp, line 3249)
gklayout!nsGenericElement::InsertChildAt+0x0000000000000055 (f:\dev\moz\central\mozilla\content\base\src\nsgenericelement.cpp, line 3194)
gklayout!nsINode::AppendChildTo+0x000000000000002A (f:\dev\moz\central\ff_debug\dist\include\content\nsinode.h, line 338)

at this point the console keeps scrolling and no interaction with the browser is possible.
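
To make the possible link between "sizes out of sync" and a hang concrete, here is a minimal sketch of how an eviction loop can spin once size bookkeeping drifts away from the real contents. Everything in it is hypothetical (ToyQueue, trackedBytes, EvictUntilUnderLimit); it is not the imgLoader code, and it assumes the mismatch is between a tracked byte total and the entries actually queued. The iteration cap exists only so the demonstration terminates.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical model: a queue that tracks total bytes separately from its
// element list, so the two can disagree if some code path updates only one.
struct ToyQueue {
  std::deque<std::size_t> entries;  // each element is one entry's byte size
  std::size_t trackedBytes = 0;     // bookkeeping that can drift out of sync

  void Push(std::size_t bytes) {
    entries.push_back(bytes);
    trackedBytes += bytes;
  }

  // Returns false when there is nothing left to pop.
  bool Pop() {
    if (entries.empty())
      return false;
    assert(trackedBytes >= entries.front());
    trackedBytes -= entries.front();
    entries.pop_front();
    return true;
  }
};

// Evicts until the tracked size drops below maxBytes. Returns the number of
// loop iterations, capped at maxIterations so this demo cannot hang.
std::size_t EvictUntilUnderLimit(ToyQueue& queue, std::size_t maxBytes,
                                 std::size_t maxIterations) {
  std::size_t iterations = 0;
  while (queue.trackedBytes >= maxBytes && iterations < maxIterations) {
    queue.Pop();  // result ignored: an empty queue makes no progress
    ++iterations;
  }
  return iterations;
}
```

If trackedBytes is inflated (for example, by an entry that was counted but never queued), the queue empties while the loop condition stays true, and without the cap the loop would burn a full core forever with Pop() at the top of the stack. That matches the symptom here, though it is only one way such a hang could arise.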
Sylvain, are you using a 1.9.1 or mozilla-central build? I checked in the fix to bug 89419 on mozilla-central yesterday, so I'm very interested to hear if you can reproduce this with the latest nightly or an otherwise up-to-date build.
my debug build is a few days old from mozilla-central, so it does not contain your latest push from bug 89419, I'll update to see if it fixes the assertions (found bug 460652 about it).

I could reproduce the hang with a mozilla-central build from today, 20081223 (http://hg.mozilla.org/mozilla-central/rev/b2479ac7eab7). So looks like bug 89419 didn't fix it :-(.

some windbg !analyze -v -hang output:

FOLLOWUP_IP: 
xul!imgLoader::PutIntoCache+29756a
00000000`67e08cea 8b4614          mov     eax,dword ptr [rsi+14h]

SYMBOL_STACK_INDEX:  0

FOLLOWUP_NAME:  MachineOwner

MODULE_NAME: xul

IMAGE_NAME:  xul.dll

DEBUG_FLR_IMAGE_TIMESTAMP:  4950e371

SYMBOL_NAME:  xul!imgLoader::PutIntoCache+29756a

STACK_COMMAND:  ~0s ; kb

FAILURE_BUCKET_ID:  X64_HANG_xul!imgLoader::PutIntoCache+29756a

BUCKET_ID:  X64_HANG_xul!imgLoader::PutIntoCache+29756a


~0s ; kb

xul!imgLoader::PutIntoCache+0x29756a:
00000000`67e08cea 8b4614          mov     eax,dword ptr [rsi+14h] ds:00000000`68507308=005fa7a0
RetAddr           : Args to Child                                                           : Call Site
08081460`0b7df740 : 0b7df740`00000000 00000000`00000000 07b90148`00000000 00010005`000001b0 : xul!imgLoader::PutIntoCache+0x29756a
0b7df740`00000000 : 00000000`00000000 07b90148`00000000 00010005`000001b0 0015ef20`0000003f : 0x8081460`0b7df740
00000000`00000000 : 07b90148`00000000 00010005`000001b0 0015ef20`0000003f 04bba36c`00000000 : 0xb7df740`00000000
07b90148`00000000 : 00010005`000001b0 0015ef20`0000003f 04bba36c`00000000 0b7df740`04bba3b4 : 0x0

(I'm wondering why I don't have symbols for the frames below that).
If you're comfortable working with patches, I'd love for you to grab the patch to bug 468160 and try to reproduce. I haven't been able to with that patch applied, which makes sense because leaks can lead to precisely this type of symptom.

If not, no worries: I plan on landing bug 468160 when the tree reopens today.
yeah, that seems to help. I couldn't reproduce a hang with it.
Sounds good. I'm going to call this a duplicate of bug 468160, which I've just pushed a fix for.

If this resurfaces, please reopen this bug!
Status: NEW → RESOLVED
Closed: 16 years ago
Resolution: --- → DUPLICATE
I can confirm that the fix to bug 468160 appears to solve the problem (at least I couldn't reproduce the hang in the latest nightly).  I'm wary of this being marked as "resolved", though, since it's currently a 1.9.1 blocker and bug 468160 is not yet fixed on the 1.9.1 branch.

I would wait until bug 468160 gets a 1.9.1 blocking status before marking this resolved, otherwise it might get missed for Firefox 3.1.
Fixed always applies to trunk (in this case Mozilla-central), so I'm going to disagree with you there.

Also, I very much doubt this will get forgotten for 3.1, as there's very little chance bug 468160 will get blocking minus.
(clearing nom as it's a straight dupe)
Flags: blocking1.9.1+