Closed Bug 381512 Opened 17 years ago Closed 17 years ago

[FIXr]Seamonkey goes into 99% CPU utilization when I access a PDF document

Component: Core :: Layout
Platform: x86 Linux
Type: defect
Priority: Not set
Severity: major
Status: RESOLVED FIXED
Target Milestone: mozilla1.9alpha8
People: Reporter: charmer-bugzilla.mozilla.org; Assigned: bzbarsky
Keywords: hang, regression
Attachments: 2 files

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a5pre) Gecko/20070521 SeaMonkey/1.5a
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a5pre) Gecko/20070521 SeaMonkey/1.5a

If I click on a link to a PDF document, SeaMonkey goes into some sort of CPU-consuming loop and fails to display the document (at least within a reasonable period of time).

If I open the PDF document in a separate tab, I can close the tab, but the high CPU consumption continues and the browser becomes very, very sluggish.

This has been true for at least ten days now (since the 20070509 drop).  I keep downloading more recent drops in the hope that it's been fixed, but it hasn't.

I don't understand why it hasn't been reported by anyone else.

Reproducible: Always

Steps to Reproduce:
1. start browser
2. open the link:  http://www.isaac.cs.berkeley.edu/isaac/mobicom.pdf (or any other PDF document either as a local file or clicking on a link)
3. wait
Actual Results:  
CPU consumption goes up to 100% and stays there.  Document fails to open.

Expected Results:  
I expected the document to appear in the browser window.

This is on my SUSE Linux Enterprise Desktop 10 machine.  The nppdf.so library being used comes from the acroread-7.0.9-1.2 RPM.

running strace on the only seamonkey process that is actively consuming CPU when this happens shows a repeated pattern of system calls:

poll([{fd=3, events=POLLIN}, {fd=17, events=POLLIN}, {fd=21, events=POLLIN|POLLPRI}, {fd=23, events=POLLIN|POLLPRI}, {fd=24, events=POLLIN|POLLPRI}, {fd=25, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN}], 7, 0) = 0
ioctl(49, FIONREAD, [0])                = 0
ioctl(49, FIONREAD, [0])                = 0
select(52, [49 51], [], [51], {0, 0})   = 0 (Timeout)
gettimeofday({1179791935, 9743}, NULL)  = 0
ioctl(3, FIONREAD, [0])                 = 0
ioctl(49, FIONREAD, [0])                = 0
gettimeofday({1179791935, 9791}, NULL)  = 0

[repeats again ...]
poll([{fd=3, events=POLLIN}, {fd=17, events=POLLIN}, {fd=21, events=POLLIN|POLLPRI}, {fd=23, events=POLLIN|POLLPRI}, {fd=24, events=POLLIN|POLLPRI}, {fd=25, events=POLLIN|POLLPRI}, {fd=8, events=POLLIN}, {fd=49, events=POLLIN}], 8, 0) = 0
gettimeofday({1179791935, 10455}, NULL) = 0
gettimeofday({1179791935, 10483}, NULL) = 0
ioctl(3, FIONREAD, [0])                 = 0
ioctl(49, FIONREAD, [0])                = 0
gettimeofday({1179791935, 10533}, NULL) = 0
Assignee: general → nobody
Blocks: 378975
Status: UNCONFIRMED → NEW
Component: General → Layout
Ever confirmed: true
Keywords: regression
Product: Mozilla Application Suite → Core
QA Contact: general → layout
Version: unspecified → Trunk
Verified the same issue with Firefox trunk builds.  I traced the regression to the checkin for bug 378975.
Flags: blocking1.9?
Keywords: hang
I get the same hang with:

data:text/html,<object width="500" height="500" data="http://www.isaac.cs.berkeley.edu/isaac/mobicom.pdf"></object>

with builds predating bug 378975.  When we send data to the plug-in via WriteProc, it turns around and tries to start a load of the same file in the content area with this stack:

#6  0x015e1bb4 in NPN_GetURL () from /usr/lib/mozilla/plugins/nppdf.so
#7  0x015dff8b in NPP_Write () from /usr/lib/mozilla/plugins/nppdf.so
#8  0x015e21ec in Private_Write () from /usr/lib/mozilla/plugins/nppdf.so

It then claims to have consumed 2 bytes and returns.  So we keep calling WriteProc until we've passed it all the data, with it consuming 2 bytes at a time and for each two bytes trying to start a load.  That's about 50,000 NPN_GetURL calls for the PDF linked in the URL field.
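The arithmetic behind that call count can be sketched as a small C model. This is a hypothetical illustration, not Gecko's stream code: `fake_plugin_write` stands in for the Linux Acrobat NPP_Write when it has no window (it always reports 2 bytes consumed), and `count_write_calls` is an assumed host loop that keeps offering the remaining data. Sizes passed to it are illustrative, not the measured size of the linked PDF.

```c
#include <stddef.h>

/* Hypothetical stand-in for the windowless Acrobat NPP_Write: it would fire
 * an NPN_GetURL and then return 2, which the host reads as a byte count. */
static int fake_plugin_write(size_t len)
{
    (void)len;  /* the plug-in ignores how much data it was offered */
    return 2;
}

/* Hypothetical host loop: keep calling the write hook until every byte has
 * been "consumed". Returns the number of write calls made, i.e. how many
 * loads the plug-in tries to start. */
static size_t count_write_calls(size_t total_bytes)
{
    size_t offset = 0;
    size_t calls = 0;

    while (offset < total_bytes) {
        size_t remaining = total_bytes - offset;
        int consumed = fake_plugin_write(remaining);
        calls++;
        if (consumed <= 0)
            break;  /* negative = error; zero would stall (simplified here) */
        if ((size_t)consumed > remaining)
            consumed = (int)remaining;
        offset += (size_t)consumed;
    }
    return calls;
}
```

For a roughly 100,000-byte document, `count_write_calls(100000)` makes 50,000 calls, the same order of magnitude as the NPN_GetURL count reported in the comment.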

In the case of a full-page plug-in the situation is worse because once the GetURL is processed we'll just load the same URI... and go back to the beginning of the whole thing.

At a guess the problem is passing data to the plug-in before calling SetWindow() on it.  I'm not sure whether this is in fact something we're not allowed to do.  If so, we need to either call SetWindow earlier (before DidReflow()) or flush out layout any time we instantiate a plug-in.

Bill, do you see the problem with the <object> testcase?  If so, did it appear with the fix for bug 1156?
Blocks: 1156
(In reply to comment #3)

> Bill, do you see the problem with the <object> testcase?  If so, did it appear
> with the fix for bug 1156?
> 

I probably won't have time to do this until tomorrow.

However, I did have time to download the Windows Adobe Reader 7.0.9 and verified that it does not have the problem.  So this is a Linux-only issue, not an Adobe Reader version issue.

CCing some Adobe folks in the hope that they will know why the plug-in is exhibiting this odd behavior and perhaps what invariants they're relying on.
Flags: blocking1.9? → blocking1.9+
(In reply to comment #3)
> I get the same hang with:
> 
> data:text/html,<object width="500" height="500"
> data="http://www.isaac.cs.berkeley.edu/isaac/mobicom.pdf"></object>
 
> Bill, do you see the problem with the <object> testcase?  If so, did it appear
> with the fix for bug 1156?
> 

I do see the issue with the object testcase.  I can also confirm that although full page pdfs started to exhibit this behavior with the code checked in for bug 378975, the object case issue pre-dates that. 

I have been unable to determine whether this occurred with the checkin for bug 1156 because the URL you gave does not work with builds from that period.  It does not attempt to load the PDF at all and just displays a blank page.  I will try to find an actual web page with an embedded PDF to use for testing.
OK, I took your object example, wrapped some HTML around it, and saved it on my hard drive.  Using that, I verified that the regression was NOT caused by bug 1156.  So far I have determined that it regressed between the 2005-09-22 nightly and the 2006-07-01 nightly.  Bug 1156 was checked in on 2005-09-21.
OK.  Probably one of the followups, then.  Not really worth worrying about which one, I suspect.

Thank you for testing this!  It's much appreciated.
This appears to be the regression window for the object case.  I cannot seem to get a successful build from a pull by date/time from the tree, so I can't really narrow it down further.

http://bonsai.mozilla.org/cvsquery.cgi?treeid=default&module=all&branch=HEAD&branchtype=match&dir=&file=&filetype=match&who=&whotype=match&sortby=Date&hours=2&date=explicit&mindate=2005-10-07+&maxdate=2005-10-08+05%3A00&cvsroot=%2Fcvsroot
Huh.  Nothing in that range jumps out at me...

Let's wait to see what we hear from Adobe, I guess...
I double checked.  Definitely works with 2005100704 nightly and fails with 2005100804.
Could be bug 297832, I suppose...  Or bug 311223, or bug 309118.  Or even bug 289352.
Liz McQuarrie of Adobe writes:

---------------------------------------------------------------------------
I looked at the code specific to Linux (UNIX) in our mozilla/firefox
plug-in and it certainly seems it requires NPP_SetWindow before
NPP_Write is called.  It looks something like this:

int32 NP_LOADDS NPP_Write(NPP instance, NPStream *stream, int32 offset,
int32 len, void *buffer)

#if UNIX_ENV
	int windowError = 0;
	if(!m->window)
		windowError = 1;

	if(windowError == 0)
	{
	 <stuff omitted here>
	}
 
	if(windowError == 1)
	{
		DebugWriteToFile(("WARNING !!! WARNING !!! No window for the stream which we are getting data for."));
		AddCorruptServerInfo(stream->url);
		NPN_GetURL(instance, stream->url, "_current");
		return NPERR_INVALID_INSTANCE_ERROR;
	}
#endif
---------------------------------------------------------------------------

I guess we should make sure we flush out layout before starting to pump data into the plug-in....
Oh, and the reason to flush would be that SetWindow() is called from reflow.
That said, shouldn't we be detecting the error the plug-in claims to return in this case?  Or is WriteReady not propagating that out to us?
I just checked the code, and unfortunately, it *always* says it is ready when WriteReady is called, which is obviously not the case for our Linux plug-in.  I will enter a bug against future versions, but for compatibility with the 7.X and 8.X Linux Reader, you will not be able to depend on the response from WriteReady.

Sorry!

(P.S.  We have had to work around many issues with mozilla code through the years, so please forgive us this OOOPS! :-)
Liz, thank you for digging into this!

Just to summarize what's going on:

1)  We changed behavior so that we sometimes call NPP_WriteReady/NPP_Write
    before NPP_SetWindow.
2)  The Acrobat plug-in always says it's ready for data.
3)  The Acrobat plug-in expects to have gotten NPP_SetWindow before NPP_Write
4)  When this has not happened, it starts a load and returns
    NPERR_INVALID_INSTANCE_ERROR from NPP_Write.
5)  NPERR_INVALID_INSTANCE_ERROR == 2
6)  The return value of NPP_Write should be the number of bytes consumed (with
    a negative number indicating errors), so we think the plug-in consumed
    two bytes.
7)  We end up in the situation in comment 3.
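Steps 5 and 6 hinge on NPP_Write overloading one integer with two meanings. The sketch below is a hypothetical host-side helper (the name `classify_write_result` and the enum are illustrative, not Gecko code) that reads the return value the way the summary describes: non-negative is a byte count, negative is an error. The constant value matches step 5.

```c
#include <stdint.h>

/* NPError value from step 5 (NPERR_INVALID_INSTANCE_ERROR == 2). */
#define NPERR_INVALID_INSTANCE_ERROR 2

typedef enum { WRITE_CONSUMED, WRITE_ERROR } write_result_kind;

/* Hypothetical helper: interpret NPP_Write's return value. Only a negative
 * value signals an error; anything else is taken as bytes consumed. */
static write_result_kind classify_write_result(int32_t rv, int32_t *consumed)
{
    if (rv < 0) {
        *consumed = 0;
        return WRITE_ERROR;   /* host should tear down the stream */
    }
    *consumed = rv;           /* an NPError code lands here as a byte count */
    return WRITE_CONSUMED;
}
```

Passing `NPERR_INVALID_INSTANCE_ERROR` yields `WRITE_CONSUMED` with `*consumed == 2`. An NPError code returned from NPP_Write is indistinguishable from a small byte count unless the plug-in negates it.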

Sounds like we need to undo the behavior change in step 1 here, at least for Acrobat.  Probably easiest to try to undo it for all plug-ins, especially because others may have been depending on the old behavior as well.
Assignee: nobody → bzbarsky
Status: NEW → ASSIGNED
Attachment #274198 - Flags: superreview?(jst)
Attachment #274198 - Flags: review?(cbiesinger)
Summary: Seamonkey goes into 99% CPU utilization when I access a PDF document → [FIX]Seamonkey goes into 99% CPU utilization when I access a PDF document
Attachment #274198 - Flags: superreview?(jst) → superreview+
OK, are you guys sure that you get no NPP_SetWindow call at all before WriteReady/Write? Because there's a call to SetWindow in the plugin host that should get called here before you get data, and if that doesn't happen I'd like to find out why.

(Now, that call sometimes passes a rect with 0 width/height to the plugin. Is it possible that the call happens and that the plugin just doesn't like the empty rectangle?)
OK, figured it out... the code has a check on Linux so that it doesn't call NPP_SetWindow with a size <= 0, and we're hitting that check, so the PDF plug-in indeed never gets a SetWindow call.
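A minimal C model of that guard, assuming the check rejects a rect whose width or height is <= 0. The types `frame_rect` and `plugin_state` and the function `host_maybe_set_window` are hypothetical; `has_window` plays the role of the `m->window` test in the Acrobat code quoted earlier.

```c
#include <stdbool.h>

/* Hypothetical frame geometry; 0x0 until the frame has been reflowed. */
typedef struct {
    int width;
    int height;
} frame_rect;

/* Hypothetical plug-in state: mirrors the plug-in's "do I have a window?" check. */
typedef struct {
    bool has_window;
} plugin_state;

/* Assumed shape of the Linux guard: skip SetWindow for an empty rect.
 * Returns true if a SetWindow call would have been delivered. */
static bool host_maybe_set_window(plugin_state *plugin, frame_rect r)
{
    if (r.width <= 0 || r.height <= 0)
        return false;            /* guard hit: plug-in never sees SetWindow */
    plugin->has_window = true;   /* stands in for a real NPP_SetWindow call */
    return true;
}
```

Before reflow the frame's rect is still 0x0, so the call is skipped, and any data written afterwards reaches a plug-in that has no window.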
Comment on attachment 274198
This seems to be the right thing

nsPluginDocument.cpp:
+  shell->FlushPendingNotifications(Flush_Layout);
+
   nsIFrame* frame = shell->GetPrimaryFrameFor(embed);

Could the flushing cause the embed to go away? (Perhaps you should hold a reference to it?)

layout/generic/nsObjectFrame.cpp
+  // XXXbz when this code moves out of reflow, see whether the layout
+  // flushes in nsPluginStreamListener::OnStartRequest and
+  // nsObjectLoadingContent::OnStartRequest() can be removed.

I don't think they could, because you still need to ensure that the frame knows its correct size...
Attachment #274198 - Flags: review?(cbiesinger) → review+
> Could the flushing cause the embed to go away?

No.  We hold a strong ref to the doc, which holds a strong ref to the plugin content.  I suppose we could use a stack ref here just to be safe, though.

> I don't think they could, because you still need to ensure that the frame knows
> its correct size...

Good point.  I'll remove the comment.
Summary: [FIX]Seamonkey goes into 99% CPU utilization when I access a PDF document → [FIXr]Seamonkey goes into 99% CPU utilization when I access a PDF document
Targeting for M8 per conversation with biesi.
Target Milestone: --- → mozilla1.9 M8
Comment on attachment 274198
This seems to be the right thing

This patch makes our behavior a little closer to that of 1.8: we now make sure we've reflowed the nsObjectFrame before pumping data into the plug-in, at least in cases where the data comes from a channel we opened outside the plug-in code.  Risk is probably moderate, but without this there are regressions with several plug-ins (Acrobat Reader, Java, WMV) on various platforms.
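The ordering the patch aims for can be sketched as a hypothetical event log in C. Nothing here is Gecko API: `flush_layout` stands in for a layout flush (reflow sets the frame size and, as noted earlier in the bug, SetWindow is delivered from reflow), and `deliver_first_data` flushes before writing. Declining to write when there is still no window is an assumed policy for illustration, not something the patch is stated to do.

```c
#include <stdbool.h>
#include <stddef.h>

enum step { STEP_FLUSH_LAYOUT, STEP_SET_WINDOW, STEP_WRITE };

/* Hypothetical host state for one plug-in stream. */
typedef struct {
    enum step log[8];
    size_t n;
    int width;        /* frame size; 0x0 until layout is flushed */
    int height;
    bool window_set;
} host_state;

static void record(host_state *hs, enum step s)
{
    if (hs->n < sizeof hs->log / sizeof hs->log[0])
        hs->log[hs->n++] = s;
}

/* Stand-in for a layout flush: reflow sizes the frame, and SetWindow is
 * delivered from reflow once the size is non-empty. */
static void flush_layout(host_state *hs, int width, int height)
{
    record(hs, STEP_FLUSH_LAYOUT);
    hs->width = width;
    hs->height = height;
    if (hs->width > 0 && hs->height > 0 && !hs->window_set) {
        hs->window_set = true;
        record(hs, STEP_SET_WINDOW);
    }
}

/* Flush first, then write; refuse to write without a window (assumed policy). */
static bool deliver_first_data(host_state *hs, int width, int height)
{
    flush_layout(hs, width, height);
    if (!hs->window_set)
        return false;
    record(hs, STEP_WRITE);
    return true;
}
```

With a 500x500 object the log reads flush, SetWindow, write, which is the invariant the Linux Acrobat plug-in relies on.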
Attachment #274198 - Flags: approval1.9?
Comment on attachment 274198
This seems to be the right thing

Never mind the approval; this is already blocking+.
Attachment #274198 - Flags: approval1.9?
Checked in, with the changes per comment 21.
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED