Closed
Bug 146884
Opened 23 years ago
Closed 23 years ago
Mozilla sometimes consumes 100% CPU when pipelining is enabled...
Categories
(Core :: Networking, defect, P1)
Tracking
()
VERIFIED
FIXED
mozilla1.0.1
People
(Reporter: mozilla-bugs, Assigned: darin.moz)
References
Details
(Keywords: hang, regression, topembed+, Whiteboard: [pipelining][adt2 RTM][verified-trunk])
Attachments
(1 file, 2 obsolete files)
4.05 KB,
patch
|
darin.moz
:
review+
rpotts
:
superreview+
asa
:
approval+
|
Details | Diff | Splinter Review |
I've seen this happen twice when doing essentially the same thing in both cases
(in a secure banking site followed a link while some frame was still saing
"Please wait"). Mozilla would hang, one thread would start using all the CPU
time and the windows would not even redraw (and, of course, would not react to
anything I might do).
I attached gdb to Mozilla when it happened the second time, and let it run,
interrupting it several times and getting the stack trace for the thread eating
all the CPU. Here are the stacks I saw:
Thread 3 (Thread 1026 (LWP 12749)):
#0 0x4017065e in nsPipe::AddRef () from /usr/lib/libxpcom.so
#1 0x401714f6 in nsPipe::nsPipeOutputStream::AddRef () from /usr/lib/libxpcom.so
#2 0x40171630 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#3 0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#4 0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#5 0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#6 0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7 0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#8 0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#9 0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#10 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#12 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#13 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284
Thread 3 (Thread 1026 (LWP 12749)):
#0 0x40225b50 in __pthread_unlock (lock=0x447fbd0c) at spinlock.c:260
#1 0x4020a037 in PR_Unlock () from /usr/lib/libnspr4.so
#2 0x4020a5e1 in PR_ExitMonitor () from /usr/lib/libnspr4.so
#3 0x4018afbf in nsAutoMonitor::Exit () from /usr/lib/libxpcom.so
#4 0x40171773 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#5 0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#6 0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7 0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#8 0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#9 0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#10 0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#12 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#13 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#14 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#15 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284
#0 0x40223e59 in pthread_self () at pthread.c:701
#1 0x40209fc8 in PR_Lock () from /usr/lib/libnspr4.so
#2 0x4020a57e in PR_EnterMonitor () from /usr/lib/libnspr4.so
#3 0x40171659 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#4 0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#5 0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#6 0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7 0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#8 0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#9 0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#10 0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#12 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#13 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#14 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284
Thread 3 (Thread 1026 (LWP 12749)):
#0 0x40225a9b in __pthread_lock (lock=0x447fbd1c, self=0x0)
at ../linuxthreads/sysdeps/i386/i686/pt-machine.h:57
#1 0x40222d0e in __pthread_mutex_lock (mutex=0x447fbd0c) at mutex.c:101
#2 0x40209fbc in PR_Lock () from /usr/lib/libnspr4.so
#3 0x4020a57e in PR_EnterMonitor () from /usr/lib/libnspr4.so
#4 0x40171659 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#5 0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#6 0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7 0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#8 0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#9 0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#10 0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#12 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#13 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#14 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#15 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284
Reporter | ||
Comment 1•23 years ago
|
||
P.S. BuildID 2002052314 (trunk) on RedHat Linux 7.2
Assignee | ||
Comment 3•23 years ago
|
||
reporter: looks like you have HTTP pipelining enabled. does this ever happen
when pipelining is disabled?
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [pipelining]
Target Milestone: --- → mozilla1.0.1
Reporter | ||
Comment 4•23 years ago
|
||
Yes, I probably do have it enabled. I am not sure what happens if it is not
enabled - the only way of reproducing the bug that I know of involves messing
with my internet banking in some weird ways, which is something I'd rather not do.
Assignee | ||
Comment 5•23 years ago
|
||
aleksey: have you encountered this bug with recent nightly trunk builds?
starting with 1.1alpha?
Assignee | ||
Comment 6•23 years ago
|
||
downgrading for now since i don't really have enough information to do anything
about this bug and it doesn't seem like it is happening that often or to many
folks. it's also possible that one or more of the recent pipelining bug fixes
also fixed this.
Severity: critical → major
Target Milestone: mozilla1.0.1 → ---
Reporter | ||
Comment 7•23 years ago
|
||
I can still reproduce this using BuildID 2002061717 on RedHat Linux 7.2
It now looks somewhat different.
1) In the offending window I see the "Please wait" message (that would normally
be replaced with a message telling me the transaction is successful) and *no
indicators* of network activity - both the "M" icon and status/progress bar look
as if the page have finished loading.
2) Mozilla does not hang, but goes into a state where any further network
activity (I tried HTTP and IMAP) is impossible:
- I tried to view a message (from IMAP) - the progress indicators looked as if
it started loading the message, but nothing ever happens.
- Next I tried to middle-click on a link (e.g. "open link in the new window"). I
did get a new window with all the progress indicators being "alive", but again,
nothing was ever loaded.
As before, gdb showed that the thread consuming most of the CPU was in
WriteSegments function all the time.
When I tried to quit Mozilla, it hanged for good.
Assignee | ||
Comment 8•23 years ago
|
||
aleksey: wow! ok are there any steps i might be able to use try to reproduce the
problem? are you able to reproduce it readily simply by visiting a particular
site? are you connecting to the internet using a proxy server? if so, do you
happen to know what kind of proxy server? thx!
Reporter | ||
Comment 9•23 years ago
|
||
I use squid on localhost, but only for http/ftp, while the site in question is
https. I can reproduce this simply by trying to transfer money between savings
and checking on a particular online banking site (small local bank).
BTW, the reason it looked differently yesterday may be simply because I did
something slightly different - when I saw "Please wait" I didn't try to follow a
link in the offending window, but went to another window (trying to read mail).
Possibly it hangs only when I attempt to do something in the offending window
(including trying to close it) and if I leave that window alone, it blocks
network activity in other windows.
Yeah, I wish I had a test case that didn't involve messing with my bank account...
Comment 10•23 years ago
|
||
I'm also seeing this bug with current trunk builds on win2k. No proxy.
Steps to reproduce (sometimes)
1. Go to http://www.anandtech.com
2. Click on one of the articles in the main column
3. Browse through the article
4. Try & stop some of the pages in the article before they load completely
I was able to reproduce this bug 2/3 of the time following the steps above.
OS: Linux → All
Assignee | ||
Comment 11•23 years ago
|
||
walk84: what kind of internet connection do you have? modem?
gagan, dave: this is that pipelining bug that i mentioned to you today.
Keywords: topembed
Comment 12•23 years ago
|
||
darin, yes. 56k.
Comment 13•23 years ago
|
||
*** Bug 153772 has been marked as a duplicate of this bug. ***
Comment 14•23 years ago
|
||
just saw this at http://oneandone.co.uk/xml/init?k_id=3978017 , with trunk
2002070604 on win2k. pipelining enabled, using 64k ISDN.
I also reproduced it 2 times out of 2 with the steps in comment 10 (in fact, i
didn't have to fiddle with anything, it happened while loading an article).
disabled pipelining and it worked 2 times out of 2.
Updated•23 years ago
|
Keywords: mozilla1.1
Comment 15•23 years ago
|
||
*** Bug 154373 has been marked as a duplicate of this bug. ***
Updated•23 years ago
|
Keywords: regression,
relnote
Comment 16•23 years ago
|
||
*** Bug 152643 has been marked as a duplicate of this bug. ***
Comment 17•23 years ago
|
||
*** Bug 158473 has been marked as a duplicate of this bug. ***
Comment 18•23 years ago
|
||
*** Bug 156142 has been marked as a duplicate of this bug. ***
Comment 19•23 years ago
|
||
*** Bug 159212 has been marked as a duplicate of this bug. ***
Comment 20•23 years ago
|
||
*** Bug 159327 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•23 years ago
|
Target Milestone: --- → mozilla1.1beta
Assignee | ||
Comment 21•23 years ago
|
||
i was able to consistently reproduce this bug on
http://www.battle.net/war3/nightelf (from one of the dupes), and the problem
turned out to be in the socket transport. we simply weren't clearing the
socket error state before calling OnDataAvailable. in some cases,
nsHttpPipeline::OnDataReadable would return NS_OK without ever trying to read
from the socket. as a result, the socket transport would sit in a loop trying
to call OnDataAvailable because it erroneously believed that the socket read
resulted in a WOULD_BLOCK error despite the fact that read was never called on
the socket. whew!
anyways, this simple patch fixes the problem. however, it doesn't seem to
address the stack trace initially reported. i'm still thinking about that one.
Assignee | ||
Comment 22•23 years ago
|
||
fixing the component and summary
Component: XPCOM → Networking
Summary: Mozilla hangs in nsPipe::nsPipeOutputStream::WriteSegments with 100% CPU usage... → Mozilla sometimes consumes 100% CPU when pipelining is enabled...
Reporter | ||
Comment 24•23 years ago
|
||
Not sure about the new summary - for me it indeed used to hang Mozilla or at
least make all feature network activity in all windows (including Mail) impossible.
Assignee | ||
Comment 25•23 years ago
|
||
aleksey: hmm... have you seen that problem recently?
Reporter | ||
Comment 26•23 years ago
|
||
No, but I have not tried reproducing it recently. I might try once your patch
goes in.
Comment 27•23 years ago
|
||
Site mentioned in comment 21 will completely hang my computer (not just Mozilla
but the OS) for 20-30 seconds - although it does still buffer some keystrokes
for later processsing. It does this about once every 5 minutes, starting about
5 minutes after loading the site. 100% reproducible for me. (Without
pipelining everything is fine.)
Assignee | ||
Comment 28•23 years ago
|
||
Aleksey: sounds to me as if the problem you were seeing might be explained by
this bit of code inside nsHttpPipeline::OnDataWritable():
while (1) {
rv = mTransactionQ[i]->OnDataWritable(outputStream);
if (rv == NS_BASE_STREAM_CLOSED)
break; // advance to next transaction
if (NS_FAILED(rv))
return rv; // something bad happened!!
// else, there's more to write (the transaction may be
// writing in small chunks).
}
if mTransactionQ[i]->OnDataWritable(...) returned NS_OK without actually writing
anything to outputStream, then we could get stuck in this loop. moreover, this
loop runs on the socket transport thread (a background thread), meaning that the
UI would still function, but you'd be unable to use the network (as you mentioned).
i'm thinking of adding an additional check to this loop to verify that data was
actually written to the stream. if no data is written and NS_OK is returned,
then i'll assume the transaction is done writing and advance to the next
transaction. that should make this bit of code more robust, but it does seem
like a bit of a hack. nsHttpTransaction::OnDataWritable is designed to return
NS_BASE_STREAM_CLOSED when it cannot write any data. for some reason, it is not
doing that.
Assignee | ||
Comment 29•23 years ago
|
||
ok, this patch combines both fixes.
Attachment #92856 -
Attachment is obsolete: true
Comment 30•23 years ago
|
||
Comment on attachment 92856 [details] [diff] [review]
v1 patch
nit:
In the comment: not GetWouldBlock(), but GotWouldBlock.
other than that, seams reasonable. r=dougt
Attachment #92856 -
Flags: review+
Comment 31•23 years ago
|
||
Comment on attachment 92941 [details] [diff] [review]
v2 patch
r=dougt
Attachment #92941 -
Flags: review+
Assignee | ||
Comment 32•23 years ago
|
||
Attachment #92941 -
Attachment is obsolete: true
Assignee | ||
Comment 33•23 years ago
|
||
Comment on attachment 92946 [details] [diff] [review]
v2.1 patch - revised spelling mistake
forwarding r=dougt
Attachment #92946 -
Flags: review+
Comment 34•23 years ago
|
||
Comment on attachment 92946 [details] [diff] [review]
v2.1 patch - revised spelling mistake
sr=rpotts@netscape.com
Attachment #92946 -
Flags: superreview+
Comment 35•23 years ago
|
||
Comment on attachment 92946 [details] [diff] [review]
v2.1 patch - revised spelling mistake
a=asa (on behalf of drivers) for checkin to 1.1
Attachment #92946 -
Flags: approval+
Assignee | ||
Comment 36•23 years ago
|
||
fixed-on-trunk
Keywords: adt1.0.1,
mozilla1.0.1
Whiteboard: [pipelining] → [pipelining][adt2 RTM]
Comment 38•23 years ago
|
||
*** Bug 156405 has been marked as a duplicate of this bug. ***
Assignee | ||
Comment 39•23 years ago
|
||
resolving FIXED to get QA's attention.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Comment 40•23 years ago
|
||
Darin, aren't you looking for the fixed1.0.1 keyword?
Comment 41•23 years ago
|
||
*** Bug 143131 has been marked as a duplicate of this bug. ***
Comment 42•23 years ago
|
||
Could everyone who is able to reproduce this check that it is fixed in the
nightly and add their comments? I've seen a few of the 100% cpu hangs but they
usually occur after I leave my machine running all night (with mail open I
think). I haven't been able come up with a reliable test case. All of the
sites in the dupes work for me. Thanks
Anthony, the fixed1.0.1 is used to signify that the fix is in the branch and
ready for testing.
Comment 43•23 years ago
|
||
It's not happened for me since this was checked in. Bug 160222 was reported
today, but that was with an older build. Seems to me it was clobbered fine.
Comment 44•23 years ago
|
||
I've not experience it as well since it was checked in to the trunk.
Comment 45•23 years ago
|
||
*** Bug 153772 has been marked as a duplicate of this bug. ***
Comment 46•23 years ago
|
||
OK thanks. I haven't seen this for a few days either so marking as verified.
Status: RESOLVED → VERIFIED
Whiteboard: [pipelining][adt2 RTM] → [pipelining][adt2 RTM][verified-trunk]
Comment 47•23 years ago
|
||
adt1.0.1- per the ADT. let's get it in the next release.
should you disagree with the minus, pls remove the minus, leaving adt1.0.1 in
the keywords, then send an email to Drivers' and ADT for approval.
Comment 48•23 years ago
|
||
*** Bug 160222 has been marked as a duplicate of this bug. ***
Comment 49•23 years ago
|
||
I really disagree with ADT's decision.. This fix should be included in the next
Netscape release. Just look at all the dupes.
Comment 50•23 years ago
|
||
The should be only adt+ if NS has also the preference to enable pipelining..
Comment 51•23 years ago
|
||
please checkin to the MOZILLA_1_0_BRANCH branch. once there, remove the
"mozilla1.0.1+" keyword and add the "fixed1.0.1" keyword.
Keywords: mozilla1.0.1 → mozilla1.0.1+
Comment 52•23 years ago
|
||
Adding topembed+ per email from Jud Valeski.
Removing minus from adt1.0.1, and renominating this one for 1.0.1, as it is
needed by a major embeddor.
Comment 53•23 years ago
|
||
Adding adt1.0.1+ on behalf of the adt for checkin to the 1.0 branch. When you
check this into the branch, please change the mozilla1.0.1+ keyword to fixed1.0.1
Assignee | ||
Updated•23 years ago
|
Keywords: mozilla1.0.1+ → fixed1.0.1
Assignee | ||
Comment 54•23 years ago
|
||
fixed on the 1.0 branch
Comment 55•23 years ago
|
||
tever: tom, can you pls verify this as fixed on the 1.0 branch, then replace
"fixed1.0.1" with "verified1.0.1"? thanks!
Updated•23 years ago
|
Keywords: fixed1.0.1 → verified1.0.1
You need to log in
before you can comment on or make changes to this bug.
Description
•