Closed Bug 87861 Opened 23 years ago Closed 23 years ago

Mac does not idle.

Categories

(Core :: Networking: File, defect, P2)

PowerPC
Mac System 8.5
defect

Tracking

()

VERIFIED FIXED
mozilla0.9.4

People

(Reporter: sfraser_bugs, Assigned: sfraser_bugs)

References

Details

Attachments

(3 files)

The counter for in-use file transports is going negative. This causes the Mac to 
think that the browser is continually busy, which causes permanent high CPU load.

This is a bad thing on both Mac OS 9 and X (it drains my PowerBook batteries and 
causes the machine to run hot, for one thing).
cc: gagan. I think this is serious enough that we need to fix it on the branch 
(and it should be an easy fix)
The problem seems to be that the in-use count is decremented even for files that 
were not opened correctly. Here's part of the log:

22[4be7398]: nsFileTransport: OPEN_FOR_READ [this=4e01a04 jar:resource:///chrome/
modern.jar!/skin/modern/navigator/back-hover.gif] status=80520006
22[4be7398]: nsFileTransport: Leaving Process [this=4e01a04 state=4 status=
80520006]
22[4be7398]: nsFileTransport: Inside Process [this=4e01a04 state=4 status=
80520006]
22[4be7398]: nsFileTransport: END_READ [this=4e01a04 jar:resource:///chrome/
modern.jar!/skin/modern/navigator/back-hover.gif, 0] status=80520006

So there is no START_READ call (which increments the in-use count), but END_READ 
is still called, and decrements the count.

It seems that jar:resource:///chrome/modern.jar!/skin/modern/navigator/back-
hover.gif is missing.
Attached patch Potential patchSplinter Review
This patch ensures that the in-use count is only decremented in END_READ/
END_WRITE if START-READ/START_WRITE was called. It also adds a debug-only printf 
when there was an error reading the file, so that skin developers can know when 
they screwed up.
this bug does sound serious to me. nominating for the branch. simon you want to 
own this? 
Keywords: nsBranch
Priority: -- → P2
Whiteboard: need sr/a
Target Milestone: --- → mozilla0.9.3
sr=darin, though there appears to be an indentation problem here:

+        if (mStartWriteCalled)
         PR_AtomicDecrement(&mService->mInUseTransports);

why is the state going into END_READ without passing througt START_READ?
Because of this:

      case OPEN_FOR_READ: { 
        mStatus = mStreamIO->Open(&mContentType, &mTotalAmount);
       ...
        mXferState = NS_FAILED(mStatus) ? END_READ : START_READ;
How about incrementing the count in OPEN_FOR_* instead?  
That might work, as long as we can guarantee that we get matched increment/
decrements.
we have bigger problems if OnStartRequest is not paired with OnStopRequest, so
you should be able to match up the increment/decrement with the call sites of those.
New patch attached, that moves the increment of the in-use count into 
OPEN_FOR_READING/OPEN_FOR_WRITING.
looks good to me.  sr/r=dougt.

simon, I have a branch here if you would like me to seek approval and check it 
in.
Dougt: please do. However, I'm seeing another bug where the in-use file transport 
count goes bad, this time when the user clicks on a javascript:history.back() 
link. Let's wait until I've debugged that one.
Another cause of bad file transport counts is in bug 87901.
For javascript:history.back(), we seem to have a case where we can go from 
OPEN_FOR_READ to CLOSING without an END_READ. This happens if mRunState is set to 
CANCELED.

So we need a better patch.
simon: even if the file transport is canceled, it should still fire OnStopRequest.
if not, then that's a very serious bug.

just a quick reading of the code seems to suggest that a call to Cancel will only
cause mXferState to become CLOSING if it was not READING or WRITING.  so, how can
we be in the middle of reading or writing?  oh.. wait.. ic why!  it's because
we could be in the OPEN_FOR_READ or START_READ states.  the cancel logic needs to
take into account these other states.

         // Change transfer state if canceled.
         //
         if (mRunState == CANCELED) {
-            if (mXferState == READING)
+            if (mXferState == OPEN_FOR_READ ||
+                mXferState == START_READ ||
+                mXferState == READING)
                 mXferState = END_READ;
-            else if (mXferState == WRITING)
+            else if (mXferState == OPEN_FOR_WRITE ||
+                     mXferState == START_WRITE ||
+                     mXferState == WRITING)
                 mXferState = END_WRITE;
             else
                 mXferState = CLOSING;
That patch still doesn't help. Here's an edited part of the log, with the patch.

0[7094efc]: nsFileTransport: AsyncRead [this=7847d98 javascript:history.back()] 
mOffset=0 mTransferAmount=-1
39[74f9c68]: nsFileTransport: Inside Run
39[74f9c68]: nsFileTransport: Inside Process [this=7847d98 state=1 status=0]
0[7094efc]: nsFileTransport: Cancel [this=7847d98 javascript:history.back()]
39[74f9c68]: nsFileTransport: OPEN_FOR_READ [this=7847d98 
javascript:history.back()] status=805303f5
39[74f9c68]: nsFileTransport: Leaving Process [this=7847d98 state=4 status=
805303f5]
39[74f9c68]: nsFileTransport: Inside Process [this=7847d98 state=9 status=
804b0002]
39[74f9c68]: nsFileTransport: CLOSING [this=7847d98 javascript:history.back()] 
status=804b0002
39[74f9c68]: nsFileTransport: Leaving Process [this=7847d98 state=0 status=
804b0002]
39[74f9c68]: nsFileTransport: Leaving Run [xferState=0 runState=2]

So we do an AsyncRead on the javascript: url. Cancel() gets called on it (I've no 
idea why), setting mStatus. Then, when we hit the OPEN_FOR_READ state, we set the 
state to END_READ. Then, the next time through Process(), we change this state to 
CLOSING without ever having processed the END_READ state.
 // Change transfer state if canceled.
         //
         if (mRunState == CANCELED) {
-            if (mXferState == READING)
+            if (mXferState == OPEN_FOR_READ ||
+                mXferState == START_READ ||
+                mXferState == READING ||
+                mXferState == END_READ)
                 mXferState = END_READ;
-            else if (mXferState == WRITING)
+            else if (mXferState == OPEN_FOR_WRITE ||
+                     mXferState == START_WRITE ||
+                     mXferState == WRITING ||
+                     mXferState == END_WRITE)
                 mXferState = END_WRITE;
             else
                 mXferState = CLOSING;

Yup, that's better, and fixes this javascript: problem.

However, I can still get into a state where we mess up the in-use count by 
running the iBench tests, and cancelling half-way through.
ugh
OK, cancelling the iBench test in mid-flow sometimes (not always) can mess up the 
nsSocketTransport in-use count. Maybe we need some similar fixes there.
it sounds like there is a thread-safety problem then.  can you get a socket
transport log?  NSPR_LOG_MODULES=nsSocketTransport:5
the 0.9.2 boat is leaving in a few minutes.  If there is any hope to get
something on this branch, we need to act now.  Simon, does the patch darin and
you came up with solve "most" of the problems?  if so, lets go for it...
r/sr=darin
Let's get that patch in. I'll file a separate bug on the socket transport issues, 
since it's likely to be a different kind of problem (it's actually 
mSelectFDSetCount that we track in the socket transport service).
looks good. r=.  Thanks darin and simon for nailing this problem.


Chris, do you want to approve this for the branch?  I can check this in to the
trunk when it opens.
It looks like we don't care too much about having this for 0.9.2, but we 
certainly want it right afterwards.
Whiteboard: need sr/a → ready to checkin to branch
Checked in patch:

Checking in nsFileTransport.cpp;
/cvsroot/mozilla/netwerk/base/src/nsFileTransport.cpp,v  <--  
nsFileTransport.cpp
new revision: 1.109; previous revision: 1.108
done
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Blocks: 67409
Reopening for branch consideration.
Status: RESOLVED → REOPENED
Keywords: vtrunk
Resolution: FIXED → ---
You can use Peek-A-Boo to view application CPU usage on Mac. Get it from:

http://www.clarkwoodsoftware.com/peekaboo/

Mozilla (branch) will show a red bar when using lots of CPU. If this bug is 
fixed, CPU usage should be normal.
This doesn't appear to be fixed on the trunk.  CPU usage is still almost pegged 
on my os9 machine.
I installed the same build as tever, and we're seeing different behaviour on two 
machines. It sucks CPU on his, and not on mine. So I think the patch is working, 
but there is something else happening.
Things work on Sairuh's Mac OS 9.0 machine, so tever's Mac is weird somehow.
Lets not put this on the trunk, but only the branch if you get approval from 
pdt2.
Huh? I thought you already checked it into the trunk? --

> Checked in patch:
> 
> Checking in nsFileTransport.cpp;
> /cvsroot/mozilla/netwerk/base/src/nsFileTransport.cpp,v  <--  
> nsFileTransport.cpp
> new revision: 1.109; previous revision: 1.108
> done
Whiteboard: ready to checkin to branch → nsbranch+,pdt+ ready to checkin to branch
crap.  that comment was for another bug.  sorry simon,
checked into branch:

Checking in nsFileTransport.cpp;
/cvsroot/mozilla/netwerk/base/src/nsFileTransport.cpp,v  <--  
nsFileTransport.cpp
new revision: 1.107.26.2; previous revision: 1.107.26.1
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Tom - pls verify.  Do you still see the strange behavior on your Mac?  If so,
perhaps try sairuh's again?
Keywords: vtrunk
yes, ns61 still has my cpu usage at about 90% when idle - I will have to check
sairah's machine tomorrow.  07/16/01 branch.
I checked this on Prashant's os9 system and ns6 didn't seem to be loading the
cpu that much.  On mine, it does.   (7/17 branch)  
checked this on Kritzer's os9.1 system.  The peek-a-boo utility is showing the
cpu is pegged.  So, I think there is still a problem.  re-open
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
gordon, pink, do you have any cycles to look at this today?
bulk move to 0.9.4
Target Milestone: mozilla0.9.3 → mozilla0.9.4
I don't doubt that the mac is not idling, but I do not see the file transport
count going negative....  adjusting summary.  
Summary: In use file transport count is going negative → Mac does not idle.
Whiteboard: nsbranch+,pdt+ ready to checkin to branch → nsbranch+,pdt+
back to simon for investigation.
Assignee: dougt → sfraser
Status: REOPENED → NEW
We need to test this again on tever's machine. I think it's fixed everywhere 
else.
Status: NEW → ASSIGNED
it happens on my own and Chris Kritzer's machine.  2 out of 5 that I've looked 
at so far.
Can you please run Interarchy, and look for active mail or news connections, on 
the machines that appear not to idle.
I am not seeing any active mail or news connections using interarchy.  What I am
seeing though is that peek-a-boo now shows interarchy as the application hogging
the cpu.   Interarchy is using approximately 95% of the cpu for monitoring
network connections regardless of whether it is the app in focus.  When I shut
down interarchy, NS6 will then show that it is hogging 95% of the cpu while it
idles.  
I looked at tever's G3 (Mac OS 9.0) and ccritzer's G4 (9.1), and both show 
different idling behaviour to my Macs. However, debugging with MacsBug show that 
we are calling WaitNextEvent with a sleep time of 2 ticks, which indicates that 
Mozilla thinks that it's idle, so my code is working. I do not know why Peek-a-
boo shows such high CPU usage.
Do you know of any other tools for monitoring cpu usage for the mac?  I am not
sure if I am trusting peek-a-boo.
removing nsBranch and pdt keywords, nsBranch+ is assigned by mgr and pdt+ is
assigned by PDT members
Whiteboard: nsbranch+,pdt+
MacsBug says that the code is working, and our CPU usage is low on most
machines. The code is working as designed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
QA Contact: tever → benc
VERIFIED:
based on data provided above.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: