If you think a bug might affect users in the 57 release, please set the correct tracking and status flags for Release Management.

Mozilla sometimes consumes 100% CPU when pipelining is enabled...

VERIFIED FIXED in mozilla1.0.1

Status

()

Core
Networking
P1
major
VERIFIED FIXED
16 years ago
15 years ago

People

(Reporter: Aleksey Nogin, Assigned: Darin Fisher)

Tracking

({hang, regression, topembed+})

Trunk
mozilla1.0.1
x86
All
hang, regression, topembed+
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [pipelining][adt2 RTM][verified-trunk])

Attachments

(1 attachment, 2 obsolete attachments)

(Reporter)

Description

16 years ago
I've seen this happen twice when doing essentially the same thing in both cases
(in a secure banking site followed a link while some frame was still saing
"Please wait"). Mozilla would hang, one thread would start using all the CPU
time and the windows would not even redraw (and, of course, would not react to
anything I might do).

I attached gdb to Mozilla when it happened the second time, and let it run,
interrupting it several times and getting the stack trace for the thread eating
all the CPU. Here are the stacks I saw:

Thread 3 (Thread 1026 (LWP 12749)):
#0  0x4017065e in nsPipe::AddRef () from /usr/lib/libxpcom.so
#1  0x401714f6 in nsPipe::nsPipeOutputStream::AddRef () from /usr/lib/libxpcom.so
#2  0x40171630 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#3  0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#4  0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#5  0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#6  0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7  0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#8  0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#9  0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#10 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#12 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#13 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284


Thread 3 (Thread 1026 (LWP 12749)):
#0  0x40225b50 in __pthread_unlock (lock=0x447fbd0c) at spinlock.c:260
#1  0x4020a037 in PR_Unlock () from /usr/lib/libnspr4.so
#2  0x4020a5e1 in PR_ExitMonitor () from /usr/lib/libnspr4.so
#3  0x4018afbf in nsAutoMonitor::Exit () from /usr/lib/libxpcom.so
#4  0x40171773 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#5  0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#6  0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7  0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#8  0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#9  0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#10 0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#12 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#13 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#14 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#15 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284


#0  0x40223e59 in pthread_self () at pthread.c:701
#1  0x40209fc8 in PR_Lock () from /usr/lib/libnspr4.so
#2  0x4020a57e in PR_EnterMonitor () from /usr/lib/libnspr4.so
#3  0x40171659 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#4  0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#5  0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#6  0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7  0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#8  0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#9  0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#10 0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#12 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#13 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#14 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284


Thread 3 (Thread 1026 (LWP 12749)):
#0  0x40225a9b in __pthread_lock (lock=0x447fbd1c, self=0x0)
    at ../linuxthreads/sysdeps/i386/i686/pt-machine.h:57
#1  0x40222d0e in __pthread_mutex_lock (mutex=0x447fbd0c) at mutex.c:101
#2  0x40209fbc in PR_Lock () from /usr/lib/libnspr4.so
#3  0x4020a57e in PR_EnterMonitor () from /usr/lib/libnspr4.so
#4  0x40171659 in nsPipe::nsPipeOutputStream::WriteSegments () from
/usr/lib/libxpcom.so
#5  0x40171b1e in nsPipe::nsPipeOutputStream::WriteFrom () from /usr/lib/libxpcom.so
#6  0x407ceec9 in nsHttpTransaction::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#7  0x407e0448 in nsHttpPipeline::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#8  0x407ce3a1 in nsHttpConnection::OnDataWritable () from
/usr/lib/mozilla/components/libnecko.so
#9  0x4079cb11 in nsSocketWriteRequest::OnWrite () from
/usr/lib/mozilla/components/libnecko.so
#10 0x407994a3 in nsSocketTransport::doReadWrite () from
/usr/lib/mozilla/components/libnecko.so
#11 0x4079857d in nsSocketTransport::Process () from
/usr/lib/mozilla/components/libnecko.so
#12 0x4079d538 in nsSocketTransportService::Run () from
/usr/lib/mozilla/components/libnecko.so
#13 0x4018d1b3 in nsThread::Main () from /usr/lib/libxpcom.so
#14 0x4020f69c in _pt_root () from /usr/lib/libnspr4.so
#15 0x40221c6f in pthread_start_thread (arg=0x40b16be0) at manager.c:284
(Reporter)

Updated

16 years ago
Keywords: hang
(Reporter)

Comment 1

16 years ago
P.S. BuildID 2002052314 (trunk) on RedHat Linux 7.2

Comment 2

16 years ago
maybe darin can look at this.
Assignee: dougt → darin
(Assignee)

Comment 3

16 years ago
reporter: looks like you have HTTP pipelining enabled.  does this ever happen
when pipelining is disabled?
Status: NEW → ASSIGNED
Priority: -- → P1
Whiteboard: [pipelining]
Target Milestone: --- → mozilla1.0.1
(Reporter)

Comment 4

16 years ago
Yes, I probably do have it enabled. I am not sure what happens if it is not
enabled - the only way of reproducing the bug that I know of involves messing
with my internet banking in some weird ways, which is something I'd rather not do.
(Assignee)

Comment 5

16 years ago
aleksey: have you encountered this bug with recent nightly trunk builds? 
starting with 1.1alpha?
(Assignee)

Comment 6

16 years ago
downgrading for now since i don't really have enough information to do anything
about this bug and it doesn't seem like it is happening that often or to many
folks.  it's also possible that one or more of the recent pipelining bug fixes
also fixed this.
Severity: critical → major
Target Milestone: mozilla1.0.1 → ---
(Reporter)

Comment 7

16 years ago
I can still reproduce this using BuildID 2002061717 on RedHat Linux 7.2

It now looks somewhat different.
1) In the offending window I see the "Please wait" message (that would normally
be replaced with a message telling me the transaction is successful) and *no
indicators* of network activity - both the "M" icon and status/progress bar look
as if the page have finished loading.

2) Mozilla does not hang, but goes into a state where any further network
activity (I tried HTTP and IMAP) is impossible:
- I tried to view a message (from IMAP) - the progress indicators looked as if
it started loading the message, but nothing ever happens.
- Next I tried to middle-click on a link (e.g. "open link in the new window"). I
did get a new window with all the progress indicators being "alive", but again,
nothing was ever loaded.

As before, gdb showed that the thread consuming most of the CPU was in
WriteSegments function all the time.

When I tried to quit Mozilla, it hanged for good.
(Assignee)

Comment 8

16 years ago
aleksey: wow! ok are there any steps i might be able to use try to reproduce the
problem?  are you able to reproduce it readily simply by visiting a particular
site?  are you connecting to the internet using a proxy server?  if so, do you
happen to know what kind of proxy server?  thx!
(Reporter)

Comment 9

16 years ago
I use squid on localhost, but only for http/ftp, while the site in question is
https. I can reproduce this simply by trying to transfer money between savings
and checking on a particular online banking site (small local bank).

BTW, the reason it looked differently yesterday may be simply because I did
something slightly different - when I saw "Please wait" I didn't try to follow a
link in the offending window, but went to another window (trying to read mail).
Possibly it hangs only when I attempt to do something in the offending window
(including trying to close it) and if I leave that window alone, it blocks
network activity in other windows.

Yeah, I wish I had a test case that didn't involve messing with my bank account...

Comment 10

16 years ago
I'm also seeing this bug with current trunk builds on win2k. No proxy.

Steps to reproduce (sometimes)
1. Go to http://www.anandtech.com
2. Click on one of the articles in the main column
3. Browse through the article
4. Try & stop some of the pages in the article before they load completely 

I was able to reproduce this bug 2/3 of the time following the steps above.
OS: Linux → All
(Assignee)

Comment 11

16 years ago
walk84: what kind of internet connection do you have?  modem?

gagan, dave: this is that pipelining bug that i mentioned to you today.
Keywords: topembed

Comment 12

16 years ago
darin, yes. 56k.

Updated

16 years ago
Blocks: 144480

Updated

16 years ago
No longer blocks: 144480
*** Bug 153772 has been marked as a duplicate of this bug. ***

Comment 14

15 years ago
just saw this at http://oneandone.co.uk/xml/init?k_id=3978017 , with trunk
2002070604 on win2k. pipelining enabled, using 64k ISDN.

I also reproduced it 2 times out of 2 with the steps in comment 10 (in fact, i
didn't have to fiddle with anything, it happened while loading an article).
disabled pipelining and it worked 2 times out of 2.

Updated

15 years ago
Keywords: mozilla1.1

Comment 15

15 years ago
*** Bug 154373 has been marked as a duplicate of this bug. ***
Keywords: regression, relnote
*** Bug 152643 has been marked as a duplicate of this bug. ***

Comment 17

15 years ago
*** Bug 158473 has been marked as a duplicate of this bug. ***

Comment 18

15 years ago
*** Bug 156142 has been marked as a duplicate of this bug. ***

Comment 19

15 years ago
*** Bug 159212 has been marked as a duplicate of this bug. ***
*** Bug 159327 has been marked as a duplicate of this bug. ***
(Assignee)

Updated

15 years ago
Target Milestone: --- → mozilla1.1beta
(Assignee)

Comment 21

15 years ago
Created attachment 92856 [details] [diff] [review]
v1 patch

i was able to consistently reproduce this bug on
http://www.battle.net/war3/nightelf (from one of the dupes), and the problem
turned out to be in the socket transport.  we simply weren't clearing the
socket error state before calling OnDataAvailable.  in some cases,
nsHttpPipeline::OnDataReadable would return NS_OK without ever trying to read
from the socket.  as a result, the socket transport would sit in a loop trying
to call OnDataAvailable because it erroneously believed that the socket read
resulted in a WOULD_BLOCK error despite the fact that read was never called on
the socket.  whew!

anyways, this simple patch fixes the problem.  however, it doesn't seem to
address the stack trace initially reported.  i'm still thinking about that one.
(Assignee)

Comment 22

15 years ago
fixing the component and summary
Component: XPCOM → Networking
Summary: Mozilla hangs in nsPipe::nsPipeOutputStream::WriteSegments with 100% CPU usage... → Mozilla sometimes consumes 100% CPU when pipelining is enabled...
(Assignee)

Comment 23

15 years ago
QA -> tever
QA Contact: scc → tever
(Reporter)

Comment 24

15 years ago
Not sure about the new summary - for me it indeed used to hang Mozilla or at
least make all feature network activity in all windows (including Mail) impossible.
(Assignee)

Comment 25

15 years ago
aleksey: hmm... have you seen that problem recently?
(Reporter)

Comment 26

15 years ago
No, but I have not tried reproducing it recently. I might try once your patch
goes in.

Comment 27

15 years ago
Site mentioned in comment 21 will completely hang my computer (not just Mozilla
but the OS) for 20-30 seconds - although it does still buffer some keystrokes
for later processsing.  It does this about once every 5 minutes, starting about
5 minutes after loading the site.  100% reproducible for me.  (Without
pipelining everything is fine.)
(Assignee)

Comment 28

15 years ago
Aleksey: sounds to me as if the problem you were seeing might be explained by
this bit of code inside nsHttpPipeline::OnDataWritable():

    while (1) {
        rv = mTransactionQ[i]->OnDataWritable(outputStream);
        if (rv == NS_BASE_STREAM_CLOSED)
            break; // advance to next transaction
        if (NS_FAILED(rv))
            return rv; // something bad happened!!
        // else, there's more to write (the transaction may be
        // writing in small chunks).
    }

if mTransactionQ[i]->OnDataWritable(...) returned NS_OK without actually writing
anything to outputStream, then we could get stuck in this loop.  moreover, this
loop runs on the socket transport thread (a background thread), meaning that the
UI would still function, but you'd be unable to use the network (as you mentioned).

i'm thinking of adding an additional check to this loop to verify that data was
actually written to the stream.  if no data is written and NS_OK is returned,
then i'll assume the transaction is done writing and advance to the next
transaction.  that should make this bit of code more robust, but it does seem
like a bit of a hack.  nsHttpTransaction::OnDataWritable is designed to return
NS_BASE_STREAM_CLOSED when it cannot write any data.  for some reason, it is not
doing that.
(Assignee)

Comment 29

15 years ago
Created attachment 92941 [details] [diff] [review]
v2 patch

ok, this patch combines both fixes.
Attachment #92856 - Attachment is obsolete: true

Comment 30

15 years ago
Comment on attachment 92856 [details] [diff] [review]
v1 patch

nit:
In the comment: not GetWouldBlock(), but GotWouldBlock.  

other than that, seams reasonable. r=dougt
Attachment #92856 - Flags: review+

Comment 31

15 years ago
Comment on attachment 92941 [details] [diff] [review]
v2 patch

r=dougt
Attachment #92941 - Flags: review+
(Assignee)

Comment 32

15 years ago
Created attachment 92946 [details] [diff] [review]
v2.1 patch - revised spelling mistake
Attachment #92941 - Attachment is obsolete: true
(Assignee)

Comment 33

15 years ago
Comment on attachment 92946 [details] [diff] [review]
v2.1 patch - revised spelling mistake

forwarding r=dougt
Attachment #92946 - Flags: review+

Comment 34

15 years ago
Comment on attachment 92946 [details] [diff] [review]
v2.1 patch - revised spelling mistake

sr=rpotts@netscape.com
Attachment #92946 - Flags: superreview+

Comment 35

15 years ago
Comment on attachment 92946 [details] [diff] [review]
v2.1 patch - revised spelling mistake

a=asa (on behalf of drivers) for checkin to 1.1
Attachment #92946 - Flags: approval+
(Assignee)

Comment 36

15 years ago
fixed-on-trunk
Keywords: adt1.0.1, mozilla1.0.1
Whiteboard: [pipelining] → [pipelining][adt2 RTM]
(Assignee)

Comment 37

15 years ago
-> moz 1.0.1
Target Milestone: mozilla1.1beta → mozilla1.0.1

Comment 38

15 years ago
*** Bug 156405 has been marked as a duplicate of this bug. ***
(Assignee)

Comment 39

15 years ago
resolving FIXED to get QA's attention.
Status: ASSIGNED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED

Comment 40

15 years ago
Darin, aren't you looking for the fixed1.0.1 keyword?

Comment 41

15 years ago
*** Bug 143131 has been marked as a duplicate of this bug. ***

Comment 42

15 years ago
Could everyone who is able to reproduce this check that it is fixed in the
nightly and add their comments?  I've seen a few of the 100% cpu hangs but they
usually occur after I leave my machine running all night (with mail open I
think).  I haven't been able come up with a reliable test case.  All of the
sites in the dupes work for me.  Thanks

Anthony, the fixed1.0.1 is used to signify that the fix is in the branch and
ready for testing.

Comment 43

15 years ago
It's not happened for me since this was checked in. Bug 160222 was reported
today, but that was with an older build. Seems to me it was clobbered fine.

Comment 44

15 years ago
I've not experience it as well since it was checked in to the trunk.

Comment 45

15 years ago
*** Bug 153772 has been marked as a duplicate of this bug. ***

Comment 46

15 years ago
OK thanks.  I haven't seen this for a few days either so marking as verified.  
Status: RESOLVED → VERIFIED
Whiteboard: [pipelining][adt2 RTM] → [pipelining][adt2 RTM][verified-trunk]

Comment 47

15 years ago
adt1.0.1- per the ADT. let's get it in the next release. 

should you disagree with the minus, pls remove the minus, leaving adt1.0.1 in
the keywords, then send an email to Drivers' and ADT for approval.
Keywords: adt1.0.1 → adt1.0.1-

Comment 48

15 years ago
*** Bug 160222 has been marked as a duplicate of this bug. ***
I really disagree with ADT's decision.. This fix should be included in the next
Netscape release. Just look at all the dupes.
The should be only adt+ if NS has also the preference to enable pipelining..

Comment 51

15 years ago
 please checkin to the MOZILLA_1_0_BRANCH branch. once there, remove the
"mozilla1.0.1+" keyword and add the "fixed1.0.1" keyword. 
Keywords: mozilla1.0.1 → mozilla1.0.1+

Comment 52

15 years ago
Adding topembed+ per email from Jud Valeski. 

Removing minus from adt1.0.1, and renominating this one for 1.0.1, as it is
needed by a major embeddor.
Blocks: 143047
Keywords: adt1.0.1-, relnote, topembed → adt1.0.1, topembed+

Comment 53

15 years ago
Adding adt1.0.1+ on behalf of the adt for checkin to the 1.0 branch.  When you
check this into the branch, please change the mozilla1.0.1+ keyword to fixed1.0.1
Keywords: adt1.0.1 → adt1.0.1+
(Assignee)

Updated

15 years ago
Keywords: mozilla1.0.1+ → fixed1.0.1
(Assignee)

Comment 54

15 years ago
fixed on the 1.0 branch

Comment 55

15 years ago
tever: tom, can you pls verify this as fixed on the 1.0 branch, then replace
"fixed1.0.1" with "verified1.0.1"? thanks!

Updated

15 years ago
Keywords: fixed1.0.1 → verified1.0.1
You need to log in before you can comment on or make changes to this bug.