Closed Bug 384836 Opened 13 years ago Closed 13 years ago

Hangs with new textframe

Categories

(Core :: Graphics, defect, major)

x86
Windows XP
defect
Not set
major

VERIFIED FIXED

People

(Reporter: RyanVM, Assigned: roc)

Details

(Keywords: hang)

Attachments

(1 file)

With the 2007-06-17 textframe nightly, I'm getting reliable hangs with the CPU going up to 100% on the Washington Post front page. If it doesn't hang on first load, reload a few times and it should hang fairly shortly thereafter. I've also seen similar hangs when reading stories at cnn.com.

I've confirmed that the hang only occurs with textframe-enabled builds, that it regressed with the 06-17 nightly, and that it happens on a clean profile. I haven't been able to confirm whether or not the hang is cross-platform.

Regression range: http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=MozillaTinderboxAll&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2007%2F06%2F16+04%3A35&maxdate=2007-06-17+04%3A23&cvsroot=%2Fcvsroot
Flags: blocking1.9?
Blocks: 367177
Keywords: hang
hangs at http://news.yahoo.com/i/738 might be related to this bug

(In reply to comment #1)
> hangs at http://news.yahoo.com/i/738 might be related to this bug
> 
I can reproduce this hang every time...unless I set View -> Page Style to "No style" first.

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070618 Minefield/3.0a6pre ID:2007061807

Unable to reproduce the hang at washingtonpost.com, though.
I can't reproduce hangs at either page. Can someone get me a stack? (I have a fix in my tree that could be fixing this...)
(In reply to comment #2)
> (In reply to comment #1)
> > hangs at http://news.yahoo.com/i/738 might be related to this bug
> > 
> I can reproduce this hang every time...unless I set View -> Page Style to "No
> style" first.
> 
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070618
> Minefield/3.0a6pre ID:2007061807
> 
> Unable to reproduce the hang at washingtonpost.com, though.
> 

With a trunk hourly build after the textframe landing:

Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070620 Minefield/3.0a6pre ID:2007062013

I can no longer reproduce the hangs at http://news.yahoo.com/i/738 .
> 
> With a trunk hourly build after the textframe landing:
> 
> Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a6pre) Gecko/20070620
> Minefield/3.0a6pre ID:2007062013
> 
> I can no longer reproduce the hangs at http://news.yahoo.com/i/738 .
> 

Oops, I spoke too soon. It hung the first time I tried to *reload* that page.
It used to hang on initial load for me with the fx-exp builds.
FWIW, if I do
View -> Page Style -> No style
prior to loading/reloading http://news.yahoo.com/i/738 , then there are no hangs (same behavior as with the fx-exp textframe builds).
There are tons of CSS errors there.
I still can't reproduce and my Windows build is still building ... anyone want to try reducing this?
Confirming that disabling CSS avoids the hang. I also can't reproduce the hang in Linux.

This page hangs for me 100% of the time, though I've already gotten a WFM from aja on it, so I guess YMMV.
http://www.washingtonpost.com/wp-dyn/content/article/2007/06/16/AR2007061601015.html
(In reply to comment #8)
> This page hangs for me 100% of the time, though I've already gotten a WFM from
> aja on it, so I guess YMMV.
> http://www.washingtonpost.com/wp-dyn/content/article/2007/06/16/AR2007061601015.html
> 

also hangs for me 100%

Keywords: qawanted
One more bit of info. Every time I get one of these hangs, the status line says
"Waiting for some.remote.server" where it's e.g. an ad server or an image / css server.
Rob Arnold says it's an infinite loop in InitTextRunUniscribe...
Hmmm...

1522             while (FAILED(item->Shape())) {
1523                 PR_LOG(gFontLog, PR_LOG_DEBUG, ("shaping failed"));
1524                 // we know we have the glyphs to display this font already
1525                 // so Uniscribe just doesn't know how to shape the script.
1526                 // Render the glyphs without shaping.
1527                 item->DisableShaping();
1528             }

Does shaping fail if you disable it? That would definitely cause an infinite loop.
no
(In reply to comment #10)
> One more bit of info. Every time I get one of these hangs, the status line says
> "Waiting for some.remote.server" where it's e.g. an ad server or an image / css
> server.

I can confirm that I've seen this frequently when Firefox is "transferring data from pagead2.googlesyndication.com" 

FF hangs with these symptoms every time I visit http://www.smartmoney.com. There's a box on the right-hand side that displays a plot of the Dow for the day, and the browser hangs before this gets displayed.
(In reply to comment #11)
> Rob Arnold says it's an infinite loop in InitTextRunUniscribe...
> 

My stacktrace from a 20070621 nightly hang supports this theory:
 	ntdll.dll!_ExpInterlockedPopEntrySListEnd@0()  + 0x9 bytes	
 	ntdll.dll!_RtlpAllocateFromHeapLookaside@4()  + 0x1d bytes	
 	ntdll.dll!_RtlAllocateHeap@12()  + 0xd7 bytes	
>	msvcr80.dll!malloc(unsigned int size=36)  Line 163 + 0x63 bytes	C
 	xul.dll!UniscribeItem::GenerateAlternativeString()  Line 1375	C++
 	xul.dll!gfxWindowsFontGroup::InitTextRunUniscribe(gfxContext * aContext=0x00000000, gfxTextRun * aRun=0x0cf76d68, const unsigned short * aString=0x0012cdb0, unsigned int aLength=1)  Line 1527 + 0x10 bytes	C++

(yay symbol server!)
We worked this out on IRC. The problem is that nsTransformedTextRun holds an nsRefPtr<gfxContext> but the context can go stale anyway if the reference is held longer than the duration of a paint event, which is what's happening here. So we're using a bad HDC which is triggering errors from Shape(), throwing us into an infinite loop as we retry Shape().

The immediate fix is to stop holding that nsRefPtr and pass the gfxContext in where we need it. That looks fairly simple, I'll do that now. We should also have an assertion that verifies there are no outstanding references to a gfxContext that is about to go stale. We should also fix the shaping loop so we don't go into an infinite loop if we keep failing.
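To illustrate the last point, here is a minimal sketch of a shaping fallback that cannot spin forever. It is not the code that was checked in: `FakeShapingItem`, `ShapeResult`, and `ShapeWithFallback` are hypothetical stand-ins invented for this example, and the stale HDC is modeled by a simple flag.

```cpp
#include <cassert>

// Hypothetical stand-in for UniscribeItem; NOT the real Gecko class.
struct FakeShapingItem {
    bool contextIsStale = false;    // models a bad HDC: every Shape() call fails
    bool scriptIsShapeable = true;  // false models a script Uniscribe cannot shape
    bool shapingEnabled = true;
    int shapeCalls = 0;

    // Returns true on success. With a stale context it fails whether or not
    // shaping is enabled, which is the case that made the original loop spin.
    bool Shape() {
        ++shapeCalls;
        if (contextIsStale) {
            return false;
        }
        return !shapingEnabled || scriptIsShapeable;
    }

    void DisableShaping() { shapingEnabled = false; }
};

enum class ShapeResult { Shaped, Unshaped, Failed };

// Try to shape once; on failure, fall back to unshaped rendering exactly once.
// If that also fails, report the failure to the caller instead of retrying.
ShapeResult ShapeWithFallback(FakeShapingItem& item) {
    if (item.Shape()) {
        return ShapeResult::Shaped;
    }
    item.DisableShaping();
    if (item.Shape()) {
        return ShapeResult::Unshaped;
    }
    return ShapeResult::Failed;
}
```

With a stale context, `ShapeWithFallback` makes two `Shape()` calls and returns `ShapeResult::Failed`, so the caller has to decide what to do next, rather than looping.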
Attached patch: fix (?)
Okay, this should fix the bug by always passing in a current gfxContext instead of trying to hang on to one. Stuart, could you review the gfx changes, and Simon the layout changes? Someone could also test this...
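As a rough illustration of the approach (not the actual patch), the sketch below shows an object that never stores a context and instead requires the caller to pass the current one on every call. `FakeContext`, `FakeTextRun`, and this `SetLineBreaks` signature are hypothetical names made up for the example; the `valid` flag stands in for whether the underlying device is still usable.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>

// Hypothetical stand-in for gfxContext. `valid` models whether the
// underlying device (e.g. an HDC) is still usable.
struct FakeContext {
    bool valid = true;
};

// Hypothetical text run that never keeps a context between calls;
// callers must supply the current one each time.
class FakeTextRun {
public:
    explicit FakeTextRun(std::string text) : mText(std::move(text)) {}

    // Uses only the context passed in for this call. Returns false,
    // without retrying, when that context is missing or unusable.
    bool SetLineBreaks(FakeContext* aContext) {
        if (!aContext || !aContext->valid) {
            return false;
        }
        mShapedLength = mText.size();
        return true;
    }

    std::size_t ShapedLength() const { return mShapedLength; }

private:
    std::string mText;
    std::size_t mShapedLength = 0;
    // Deliberately no context member: nothing here can outlive a paint event.
};
```

Because the text run has no context member, a context that goes stale after a paint event cannot be reused by accident; a later call simply gets whatever context the caller hands it.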
Assignee: nobody → roc
Status: NEW → ASSIGNED
Attachment #269757 - Flags: superreview?(pavlov)
Attachment #269757 - Flags: review?(smontagu)
Comment on attachment 269757
fix (?)

I'm not super happy with needing to pass the context in to the setlinebreak functions, but I don't see a good simple alternative. We should probably look at this further to see if we can simplify our use of contexts and make it clearer when they are needed.
Attachment #269757 - Flags: superreview?(pavlov) → superreview+
No hangs since applying the patch.
Attachment #269757 - Flags: review?(smontagu) → review+
checked in.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Duplicate of this bug: 385681
Could this have caused a Tp/Tp2 regression (bug 385957)?
Keywords: qawanted
I backed this out; it or bug 385686 caused the Tp2 regression in bug 385957; we should reland them separately when the tree is quiet.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
relanded... watching Tp
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
Flags: in-testsuite?
Verified FIXED using Mozilla/5.0 (Windows; U; Windows NT 6.0; en-US; rv:1.9a9pre) Gecko/2007102005 Minefield/3.0a9pre; I haven't seen this since.
Status: RESOLVED → VERIFIED
Flags: blocking1.9?