Closed
Bug 337550
Opened 18 years ago
Closed 18 years ago
Network connection dies after browser has been idle
Categories
(Camino Graveyard :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: phiw2, Unassigned)
References
Details
(Keywords: hang, regression)
Attachments
(1 file)
2.70 KB,
text/plain
User-Agent: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en; rv:1.9a1) Gecko/20060510 Camino/1.2+
Build Identifier: Mozilla/5.0 (Macintosh; U; PPC Mac OS X Mach-O; en; rv:1.9a1) Gecko/20060510 Camino/1.2+
With the most recent Trunk builds - using 2006051020 (1.2+) - the browser fails to connect to websites after it has been idle for a moment. It shows 'loading' in the statusbar, but the site is never loaded.
Other symptoms observed: clicking on a link does nothing anymore.
This has happened in two ways:
load page, then go read your email, come back to the browser, and it is dead
load page, let it sit there while I was looking at another browser on another computer.
First noticed with my own build (checkout start: Thu May 11 08:33:57 JST 2006), and now with the 2006051020 (1.2+) 'Maya' build.
Suspicion is on bug 326273.
Camino atm, can't check with Firefox - the equivalent FX tinderbox builds simply crash at start-up (bug 337481).
My network connection is alright, as I can connect with other browsers.
Reproducible: Always
Comment 1•18 years ago
This WFM with that same Maya build. Is it possible that this is just DNS stuff?
Reporter | ||
Comment 2•18 years ago
It happens with any site, including those loaded from my own dev. server.
Loading the same site, at the same moment nearly, in any other browser works perfectly.
Reporter | ||
Comment 3•18 years ago
Surprisingly, I haven't been able to reproduce this on OS X 10.3.9.
But it reproduces on two machines running 10.4.6.
I see this in today's (200605011-01) trunk nightly on 10.3.9. Even stuff that doesn't have to hit the network (Bookmarks) just goes "Loading..." forever.
One time the console.log printed this message:
libxpt: bad magic header in input file; found '', expected 'XPCOM\nTypeLib\r\n\032'
It doesn't show up every time, though.
We need to verify the regression ranges (was yesterday's nightly before , but since basically the only thing that landed on the trunk yesterday was bug 326273....
Reporter | ||
Comment 6•18 years ago
The last build that works for me was this 'Maya' build: 2006051008 (1.2+)
Ah, I got it to happen in the build I was using yesterday, too; apparently I had never paused long enough when I was doing stuff in the trunk builds yesterday--it seems the length of the pause required to cause this can sometimes be quite short and other times quite long :/
But the build philippe notes as the last working one is the last one before ThreadManager landed.
Darin, mento, any idea what might be causing Camino to "lose" network connectivity?
Blocks: nsIThreadManager
Comment 8•18 years ago
Just to add, seeing the same thing on Intel.
Comment 9•18 years ago
I saw a weird hang yesterday in my trunk build, is this a hang?
Keywords: hang
Comment 10•18 years ago
It sounds like something is preventing the processing of 'gecko' events. I'd start by investigating the changes made in widget/src/cocoa/.
Comment 11•18 years ago
Now seeing the same thing on 10.4.6, Camino Version 2006051122 (1.2+)
Comment 12•18 years ago
In debugging bug 337841, I'm seeing cases where we get stuck in the native run loop when we really should be getting called away from it to process Gecko events. In that case, Camino's UI would continue to run but Cocoafox's would not. That's consistent with the behavior I'm seeing.
Comment 13•18 years ago
> In debugging bug 337841
Make that bug 337481. Thanks for looking into this Mark!
Comment 14•18 years ago
I want to say I just ran into that on my Firefox trunk build too. I had to restart my computer before I could connect to any websites again. Maybe this isn't Mac-only?
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9a1) Gecko/20060511 Firefox/3.0a1
Comment 15•18 years ago
similar symptoms:
Bugzilla Bug 272787
After a random amount of time, unable to connect to anything
Comment 16•18 years ago
I think that bug is actually the same as this.
After a while of inactivity, I can't go to any page, even local files won't load.
This is really a smoketest blocker...
Comment 17•18 years ago
272787 is likely unrelated. Re comments 12, 14, 15, this most likely shares the same cause as [part of] bug 337481, although I haven't tried debugging this one in isolation yet. The problem does not appear to be in the platform-specific appshell proper, but an unfortunate result of the way a performance/stability/cleanliness-of-code enhancement I included in the Mac appshells interacts with an odd condition that seems to be preventing Gecko events from signaling the native loop. The root cause of the bug is actually most likely cross-platform, but the appshells for other platforms are more tolerant because they don't have the same optimization/don't need it, and handling any single system event like a mouse-move will unhang the app and let the main thread's Gecko event queue drain.
A VERY rough diagram of the xp runloop, without all of the anti-starvation measures is:
while TRUE
  process a gecko event
  if there are no more gecko events
    process a native event and block if none are available
  else
    process a native event and don't block
When allowed to block, the Mac "process a native event" code for Carbon and Cocoa calls right into system routines that run an event loop, so the Mac implementations plus the system internals look like this:
if can block
  do while running
    get os event from queue blocking until event is available
    dispatch
else
  get os event from queue, don't block if none is available
  if got an event
    dispatch
This differs from other platforms, which don't have the |do while running| clause and instead block waiting for a single event, dispatch it, and return.
The design of the new system is such that when a Gecko event occurs, if the system is blocked waiting on an event, it should be interrupted and return control back to Gecko. On the Mac, that means stopping the |do while running| loop. This ordinarily works, but apparently, it's sometimes failing. The failure doesn't seem to be in the platform appshells - it seems that the platform appshells just aren't being notified. This may be as simple as making certain ops atomic, which is something that Darin and I covered earlier in development, but the affected code has changed slightly now.
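For illustration only, here is a minimal Python sketch of the "interrupt the blocked wait when a Gecko event is posted" idea described above. It is not the appshell code: the `WakeableLoop` class and its method names are hypothetical, and it assumes the fix amounts to checking the pending-event count and blocking as one atomic step, so that a post can never slip in between the check and the wait.

```python
import threading


class WakeableLoop:
    """Toy model of a blocking wait that a posted event can always interrupt."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = 0  # posted events that have not been consumed yet

    def post_event(self):
        # Increment and notify under the same lock the waiter uses, so the
        # waiter either sees the new count before blocking or is woken up.
        with self._cond:
            self._pending += 1
            self._cond.notify()

    def wait_for_event(self, timeout=None):
        # wait_for re-checks the predicate while holding the lock, which
        # closes the window where a wake-up could otherwise be lost.
        with self._cond:
            got_one = self._cond.wait_for(lambda: self._pending > 0, timeout)
            if got_one:
                self._pending -= 1
            return got_one


loop = WakeableLoop()
loop.post_event()                       # posted before anyone is waiting
early = loop.wait_for_event(timeout=1)  # still observed, not lost

poster = threading.Timer(0.05, loop.post_event)
poster.start()                          # posted while the waiter is blocked
late = loop.wait_for_event(timeout=2)
poster.join()

idle = loop.wait_for_event(timeout=0.05)  # nothing posted: times out
print(early, late, idle)
```

If the count were checked outside the lock and the wait entered afterwards, a post landing in between would go unnoticed, which is the kind of missed notification the comment speculates about.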
On non-Mac platforms, there's no |do while running| loop, so a failure to interrupt the call blocked waiting for a native event, while still wrong, doesn't hang all Gecko events on the main thread. It just takes a single native event to get things flowing again. This is almost definitely causing a perf regression too (bug 337689?).
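To make the difference concrete, the following toy Python model (hypothetical function names, not the real appshell code) compares the two styles of "process a native event". A finite queue stands in for the OS event stream, and `stop_requested` stands in for the signal that a Gecko event is waiting; passing a function that always returns False models the lost wake-up.

```python
from collections import deque


def process_native_single(native_events):
    """Non-Mac style: dispatch at most one native event, then return."""
    if native_events:
        native_events.popleft()  # "dispatch"
        return 1
    return 0


def process_native_looping(native_events, can_block, stop_requested):
    """Mac style: when allowed to block, keep dispatching until stopped."""
    handled = 0
    if can_block:
        # The |do while running| clause: only a stop request ends it.
        while native_events and not stop_requested():
            native_events.popleft()
            handled += 1
    elif native_events:
        native_events.popleft()
        handled = 1
    return handled


mouse_moves = 5
# How many native events go by before control returns to the Gecko side?
single = process_native_single(deque(range(mouse_moves)))
lost = process_native_looping(deque(range(mouse_moves)), True, lambda: False)
woken = process_native_looping(deque(range(mouse_moves)), True, lambda: True)
print(single, lost, woken)  # prints: 1 5 0
```

In this model the single-event variant hands control back after one mouse-move, while the looping variant with a lost wake-up consumes every native event without returning. That matches the description of a UI that keeps responding natively while Gecko events never run.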
Because Camino has a native Cocoa FE and the native event loop is still spinning, Camino appears to be running, but tasks that Gecko handles on Gecko events (like network chatter) won't work. In the Fox and other apps, the XUL UI depends much more heavily on Gecko, so when you get stuck in the native loop and can't break free to process Gecko events, the app's UI will be more solidly wedged.
Updated•18 years ago
Severity: critical → blocker
Comment 18•18 years ago
mProcessingNextNativeEvent may be a problem. Using RunWasCalled the way we do in the Mac appshells may be a problem.
Comment 19•18 years ago
The patches in bug 337824 fix this bug.
Comment 20•18 years ago
Today's Camino nightly exits with no crash log, after it is idle for a minute or so. Very consistent.
There are many log entries like this in console.log:
2006-05-17 16:00:26.570 Camino[3864] *** _NSAutoreleaseNoPool(): Object 0x6632b00 of class BrowserWindowController autoreleased with no pool in place - just leaking
2006-05-17 16:00:26.570 Camino[3864] *** _NSAutoreleaseNoPool(): Object 0x665cc60 of class TopLevelWindowData autoreleased with no pool in place - just leaking
2006-05-17 16:00:26.570 Camino[3864] *** _NSAutoreleaseNoPool(): Object 0x6656160 of class NSCFString autoreleased with no pool in place - just leaking
Comment 21•18 years ago
Attached is the complete list of messages in console.log that appear when Camino exits.
Comment 22•18 years ago
One bug per bug report, please!
This bug (comment 0) is fixed by the checkin of bug 326273.
Comment 21 sounds like bug 338249.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
Comment 24•18 years ago
I just experienced this again in my trunk build, after having used it without any problems for several hours.
Comment 25•18 years ago
Maybe there are still occasional problems getting the browser to leave its blocking wait? Håkan, since you were running for several hours, I assume you were using my test "stop" patch from bug 338249?
Comment 26•18 years ago
(In reply to comment #25)
> Maybe there are still occasional problems getting the browser to leave its
> blocking wait? Håkan, since you were running for several hours, I assume you
> were using my test "stop" patch from bug 338249?
>
Yeah, I think so.