Closed Bug 80457 Opened 25 years ago Closed 25 years ago

crash or hang loading page in nsHttpChunkedDecoder

Categories: Core :: Networking, defect
Platform: x86 / All
Type: defect
Priority: Not set
Severity: blocker

Status: VERIFIED FIXED

People: Reporter: jg, Assigned: darin.moz

Details: 4 keywords
Attachments: 5 files

With a CVS build from today, after removing dist/bin/component.reg and creating a new profile, I'm hanging while loading the checkins page. This probably hangs the product on many other pages too.
Ugh, OK, trying to reassign again -- cache messed up. Darin is apparently being paged about this, so reassigning to him. Adding the smoketest keyword, since at least one other person is seeing this and it's highly likely to be hit during smoketesting.
Assignee: neeti → darin
RH7.1 without compat-egcs (using gcc 2.96-81), so perhaps a fringe case, but for me Mozilla crashes when loading the checkin page. It doesn't crash on the bug page. Pulled at 05/12/2001 01:32 Bonsai time, clobbered and built.
Sorry, correction: that was WITH compat-egcs (6.2).
Yeah, the page crashed for me too with an overnight build (checked out from CVS shortly after the carpools). So I decided to pull and build again today to see if that fixed it, but now it hangs. I've given it to darin since cls is seeing hangs/crashes on this page too, and his backtrace indicated networking. cc him.
Hmm. Crashed twice in a row with that backtrace. Deleted the cache and tried again: same crash. But now, after first surfing a lot of other sites, the page loaded fine once. There's other trouble today, though: I can't send attachments in Bugzilla, and I crash when I reply to newsgroup postings. Will repull.
I ought to mention that yesterday I upgraded gcc from 2.95.3 to 2.95.4. I'm also building with -O2.
Rebuilt: I still crash, but the backtrace changed. Now it looks more network-related. Since we don't use egcs, this might not be a blocker; someone on egcs should confirm this. As far as I understand it, Mozilla isn't guaranteed to even build with gcc 2.95 and above, let alone run, since it's designed for an old compiler.
investigating...
Attached file new backtrace
Darin: are you able to reproduce? Anyone tried with a nightly?
I crash trying to load this page on Windows 98 with the latest nightly Windows Installer build (the one posted May 12 at 11:32). Sorry, I don't have the build ID; I reverted to a previous build because of the crashing. This is on an old profile.
I first noticed this with a 2001051104 release nightly that was put on the FTP server between 5pm and 6pm on May 11. The build with the exact same Build ID, but put on the server much earlier the same day, DID NOT experience this crash. Tested on Windows 98 SE. This might not be related to this bug at all, but in the first 2001051104 build context menus were completely broken, while the later 2001051104 build had them semi-functional.
r=dr
Bug 80481 reports a crash on the same page on Windows; several Talkback reports are referenced there. Dup?
And bug 80478 seems to be a dup: same backtrace as in the "new attachment" here.
FWIW, this patch fixes the frequent crashes I was seeing yesterday loading tinderbox build logs, but it doesn't fix the fact that the logs don't finish loading.
I've seen a lot of pages that seemingly never finish loading lately: even when all the content is on the page, the throbber keeps going. Is that what you see, or is content actually missing?
Content is missing whenever I select "View Full Log" after clicking on one of the "L" links in http://tinderbox.mozilla.org/showbuilds.cgi?tree=SeaMonkey .
R.K.Aa: The problem of pages seemingly never finishing loading is covered by multiple bugs in Bugzilla, and in fact by TWO tracking bugs, so it is a separate issue. My build from yesterday morning (prior to the tree closure) worked fine iirc, or at least I would have reported this then. So this is definitely a recent regression.
With the patch, I'm seeing a hang loading the Tinderbox logs with the disk cache turned off (via prefs), and on top of that the log didn't finish loading. Turning the disk cache back on doesn't seem to make a difference.
Does the "hang" you're seeing involve full CPU usage? If so, it may be fixed by the event suppression patch to nsSocketTransport that Darin emailed me yesterday evening.
I just built (Linux) from the tip with the memory and disk caches turned on, and the Bonsai page now _crashes_ for me quite a way into the load, I'd say about five pages' worth with more still coming. Re: did it hang with 100% CPU before? No, it didn't. Mozilla froze and took no CPU time at all. ps ax | grep mozilla showed two entries where I normally expect five or more, which may or may not be significant (right now I have five in total).
Keywords: crash
modifying summary so people find this bug
Summary: hang loading page → crash loading bonsai queries
I am seeing this under Windows XP build 2001051208. Change OS to ALL?
OS: Linux → All
*** Bug 80481 has been marked as a duplicate of this bug. ***
I've had a lot of crashes associated with this bug and generated a lot of Talkback reports (see bug 80481). I don't know if it adds anything to the discussion, but I just had the crash again and it was a little different. All my previous crashes happened in the same DLL (I wish I could remember which). This latest one happened in a different DLL (one of the JavaScript ones). Anyway, in case it's helpful, its Talkback ID is TB30371522G.
*** Bug 80472 has been marked as a duplicate of this bug. ***
Per bug 80472, this is not confined to Bonsai; re-summarising for better searches (sorry for the spam :( ). Is anyone seeing this on Mac?
Summary: crash loading bonsai queries → crash or hang loading page
Semi-verified with Linux tar.gz 2001051206: http://www.isnnews.net and http://www.linuxgames.com crash as well. I'm not COMPLETELY certain this is the same bug, but this trouble came up at or around the same time as the blocker, so I'd put it at an 80% chance. (I can't do a stack trace right now to fully confirm; GDB isn't cooperating.) What's odd is that sometimes the processes don't completely die on segfault; one or two idle processes remain and have to be killed manually. I don't get that...
I'm (still) seeing this, probably because the fix isn't in :) Adding cc. Exact same stack trace for me as the "new backtrace" attachment, by the way. Also, I can confirm David Caswell's observation that some processes don't die cleanly after the segfault, fwiw.
Summary: crash or hang loading page → crash or hang loading page in nsHttpChunkedDecoder
Some additional info that might or might not be useful: in the debugger, after the initial crash (which yielded the backtrace indicating nsHttpChunkedDecoder), I did a "next" and received a *new* segfault with a different trace:

#0 0x40276033 in PR_SetThreadPrivate (index=1, priv=0x0) at prtpd.c:165
#1 0x40113c3b in nsAutoLockBase::~nsAutoLockBase (this=0xbedffa3c, __in_chrg=0) at nsAutoLock.cpp:262
#2 0x4015e06a in nsAutoLock::~nsAutoLock (this=0xbedffa3c, __in_chrg=2) at ../../../../dist/include/nsAutoLock.h:141
#3 0x401182aa in nsThreadPool::GetRequest (this=0x81632d0, currentThread=0x82a5a50) at nsThread.cpp:596
#4 0x40119080 in nsThreadPoolRunnable::Run (this=0x82a5a38) at nsThread.cpp:834
#5 0x40116bf5 in nsThread::Main (arg=0x82a5a50) at nsThread.cpp:106
#6 0x402903ee in _pt_root (arg=0x82a5ac8) at ptthread.c:198
#7 0x402b0c8e in pthread_start_thread_event (arg=0xbedffc00) at manager.c:274

It looks like |self| is junk:

(gdb) print self
$1 = (PRThread *) 0x33090909
(gdb) print self->privateData
Error: Cannot access memory at address 0x3309092d
(gdb) print self->tpdLength
Error: Cannot access memory at address 0x33090929

This might explain why some processes don't die: perhaps our threads are freaking out before they can be killed...? I don't know. Hope this helps :)
FYI, I get another new segfault in our event queue, following the last one in threads. gdb just died on me, but I get the feeling we have some underlying thread-safety issues, which are probably the subject of a different bug.
*** Bug 80483 has been marked as a duplicate of this bug. ***
*** Bug 80480 has been marked as a duplicate of this bug. ***
r=bbaetz
*** Bug 80605 has been marked as a duplicate of this bug. ***
darin: I note these lines in the patch:

- p = PL_strstr(buf, "\r\n");
+ p = PL_strnstr(buf, "\r\n", count);

Can we always be sure that the data has CRLF line breaks here? Isn't it the case that some pages may be served with just CR or just LF line breaks?
Just pulled and built after applying the patch: the crash is gone when loading Bonsai, and Tinderbox "L" links now finish loading. Loading the URL from bug 80472, I still see another crash, stream/plugin related. Backtrace in bug 80504.
simon: I was going by the spec here, assuming servers would send \r\n, but perhaps that is too optimistic of me. It looks like the old code accepted either \n or \r\n. I'll submit a new patch that does the same.
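One plausible reading of the PL_strstr to PL_strnstr change is that the decoder's buffer holds exactly `count` bytes read from the network and is not guaranteed to be NUL-terminated, so an unbounded strstr-style scan can keep reading past the data actually received until it happens to hit a zero byte. The sketch below illustrates a bounded search in plain C. `bounded_strstr` is a hypothetical helper written for this example; it is not NSPR's PL_strnstr and not the code from the patch, and it may differ from NSPR in edge cases (for example, empty needles or embedded NUL bytes).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not NSPR's PL_strnstr): return a pointer to the first
   occurrence of `needle` that lies entirely within the first `count` bytes of
   `buf`, or NULL if there is none. `buf` need not be NUL-terminated, because
   no byte at or beyond buf[count] is ever read. `needle` must be an ordinary
   NUL-terminated string. */
static const char *bounded_strstr(const char *buf, size_t count,
                                  const char *needle)
{
    assert(needle != NULL);
    size_t n = strlen(needle);

    if (n == 0)
        return buf;          /* same convention as strstr */
    if (buf == NULL || n > count)
        return NULL;         /* needle cannot fit in the window */

    /* Only try start positions whose match would end by buf[count - 1]. */
    for (size_t i = 0; i + n <= count; i++) {
        if (memcmp(buf + i, needle, n) == 0)
            return buf + i;
    }
    return NULL;
}
```

For example, with a 6-byte buffer holding `ab\r\ncd` and no trailing NUL, searching the first 4 bytes finds the CRLF at offset 2, while searching only the first 3 bytes finds nothing, because the `\n` lies outside the window.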
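Accepting either `\r\n` or a bare `\n` at the end of a chunk-size line, as darin says the old code did, can be sketched as follows. Assumptions: the input is a buffer of `count` bytes that may not be NUL-terminated, the chunk size is hexadecimal, and anything after a `;` (a chunk extension in HTTP/1.1) is ignored. `parse_chunk_size_line` and its 1/0/-1 return convention are hypothetical, written only for this illustration; this is not the nsHttpChunkedDecoder code from the patch.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Mozilla's nsHttpChunkedDecoder code.
   Parses one chunk-size line from `buf`, reading no more than `count` bytes.
   Returns  1 and fills *size / *consumed when a full line is present,
            0 when the line terminator has not arrived yet (need more data),
           -1 when the line is malformed. */
static int parse_chunk_size_line(const char *buf, size_t count,
                                 unsigned long *size, size_t *consumed)
{
    assert(size != NULL && consumed != NULL);

    /* Bounded search for LF: never reads past buf[count - 1]. */
    const char *lf = count ? memchr(buf, '\n', count) : NULL;
    if (lf == NULL)
        return 0;

    size_t line_len = (size_t)(lf - buf);
    /* Accept "\r\n" as the spec requires, but also a bare "\n". */
    if (line_len > 0 && buf[line_len - 1] == '\r')
        line_len--;

    unsigned long value = 0;
    size_t i = 0;
    for (; i < line_len && buf[i] != ';'; i++) {   /* stop at an extension */
        char c = buf[i];
        int digit;
        if (c >= '0' && c <= '9')      digit = c - '0';
        else if (c >= 'a' && c <= 'f') digit = c - 'a' + 10;
        else if (c >= 'A' && c <= 'F') digit = c - 'A' + 10;
        else return -1;                            /* not a hex digit */
        if (value > (ULONG_MAX - (unsigned long)digit) / 16)
            return -1;                             /* would overflow */
        value = value * 16 + (unsigned long)digit;
    }
    if (i == 0)
        return -1;                                 /* no digits at all */

    *size = value;
    *consumed = (size_t)(lf - buf) + 1;            /* include the LF itself */
    return 1;
}
```

Returning 0 for an incomplete line lets a caller wait for more data instead of scanning past the bytes it actually has. Like the old behaviour darin describes, the sketch does not treat a bare `\r` as a line ending, so simon's "just CR" case is reported as incomplete rather than accepted.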
sr=jst
With this patch, bug 80472 and Bonsai are fixed (win32).
fix checked in
Status: NEW → RESOLVED
Closed: 25 years ago
Resolution: --- → FIXED
Tinderbox and Bonsai pages work with the patch on the tip. VERIFIED on Linux.
Status: RESOLVED → VERIFIED