Closed Bug 82720 Opened 24 years ago Closed 24 years ago

Pages do not load or display completely [M092]

Categories

(Core :: Layout, defect, P1)

RESOLVED WORKSFORME

People

(Reporter: zackw, Assigned: darin.moz)

Details

(Keywords: helpwanted, regression, testcase, Whiteboard: [PDT+] fixed on branch, remainder on trunk)

Attachments

(9 files)

Most pages on the site referenced above do not load completely. Parts of the page will appear, others are left blank or replaced by random HTTP headers. Hitting reload sometimes causes more of the page to load, sometimes causes less of it to load. The problem is worst with the mailing list archives (e.g. http://subversion.tigris.org/servlets/SummarizeList?listName=dev) but can be seen with any of the pages. If you don't see it immediately, hit reload a couple of times. Idle speculation: problems with chunked transfers and/or keep-alive?
reporter: Have you enabled keep-alive? If yes, can you try to disable it? (Preferences/debug/networking). And please add the build ID to every bug report!
Yes, I had keep-alive enabled, but not pipelining. Turning off keep-alive makes mozilla behave *predictably* - ie. it always loads the same amount of each page - but it still fails to load pages from this site completely. Pipelining on/off doesn't seem to make a difference. 'View Source' suggests that each page is getting cut off after some part of it loads. sorry about the build-id, it's 2001052308
WinMe 2001062104 I'm seeing this on many different sites. A page will start loading, and then just stop halfway through. Reload will get the whole page. The last time I saw it was when retrieving the order confirmation page on www.newegg.com . I needed to press reload to get the whole page. (and confirm button!) Disabling keep-alive has no effect on the problem.
Status: UNCONFIRMED → NEW
Ever confirmed: true
tor can't see this when he enables logging. On 0.9.1 (which is all I have ATM), I can packet sniff. We're double loading because of the charset problem, but we do appear to be loading the entire document each time. I'm seeing the headers (presumably from the 2nd load) at the bottom of the page, and most of the content is missing. It doesn't happen all the time, though.
Managed to catch an aborted load while logging. The page that failed to load completely is http://www.enid-rayeadams.com/photos.htm (second page in the log). I've inserted a comment in the log showing when activity had ceased and I hit reload. The connection is running through a junkbuster proxy, for what it's worth.
i wonder if this has to do with bug 87047.
I don't think so - turned off the proxy and it still does an incomplete load for the failed page in my log (stopped at the same point, incidentally).
-> moz 0.9.3
Priority: -- → P1
Target Milestone: --- → mozilla0.9.3
There is another site where you can see this problem: http://www.ragreiner.de I tested with the following Mozilla versions: 0.9.2 for Linux and Netscape 6.1 PR1 for Windows; the problem exists in both versions. (NS4, MSIE and Konqueror display all pages on this site without problems.) "View" -> "Source" shows the COMPLETE source text of the page, so the problem does not seem to be caused by incomplete loading of the page over the network.
Target Milestone: mozilla0.9.3 → mozilla1.0
moving out milestone. Will try to get this in for 0.9.3 if a tested patch is submitted
*** Bug 87556 has been marked as a duplicate of this bug. ***
There are a whole lot of pages which have this problem. Some appear fixed (possibly with the charset patch, since they all had charset tags). this one occurs all the time, and on other pages (eg bugzilla) it just happens sometimes. I have no idea where to start looking, unfortunately, and neither I nor darin saw anything sticking out in that log.
benc: I don't suppose you can come up with a simple test case? :)
Keywords: qawanted
*** Bug 83175 has been marked as a duplicate of this bug. ***
I see the HTTP log, but I think this may actually *not* be an HTTP bug. Some info from bug 83175, which I just duped:
1. This seems to be fixed on the trunk -- it is still present on the 0.9.2 branch.
2. This happens in MailNews (with HTML emails) as frequently as it does in the browser. This leads me to believe that it isn't an HTTP bug.
3. I don't use a proxy, and I do see this bug. Others who do use proxies also see this bug. Proxy should be unrelated.
4. Kent Dorsey posted some info on when this bug disappeared on the trunk, although we haven't correlated the fix to any particular patch.
I *strongly* recommend a little bit of investigation to see what patch needs to be applied to the branch to fix this, or why a patch that may have gone into both trees didn't fix them both. This shouldn't need a patch by itself... We will look like rather large morons if we ship a web browser that doesn't fully load webpages.
Assignee: neeti → asa
Severity: normal → blocker
Component: Networking: HTTP → Browser-General
QA Contact: benc → doronr
Whiteboard: [critical for 0.9.2.1] Fixed on trunk, but not sure why. Need investigation.
I don't fix bugs. Moving a bug to Browser-General and assigning it to me pushes it further away from a fix, not closer. Browser-General is reserved for bugs which don't fit into any of the real components. If it stays assigned to me I'm going to test it on the trunk and mark it Worksforme if it can't be reproduced.
dr - I agree. My bet for the fix is vidur's meta charset reload patch, based on looking at URLs this problem was reported on in the beta feedback. Since that patch fixed the ibench results, it's probably nsbranch. There are two problems - partial page loads (like this case), and sites where the page just comes up blank. The second case is what was probably fixed with the charset patch. The subversion URL here only displays part of the page though, even on the trunk. Could we have a charset converter bug of some sort?
Oookay... ->vidur.
Assignee: asa → vidur
Component: Browser-General → Layout
Target Milestone: mozilla1.0 → ---
I don't see any problems with the subversion page with the Linux 2001070208 (trunk). And the bug which was seen as fixed on the trunk was with only partial loading of a page not the blank page displayed.
Maybe I've got the fix on the trunk which fixed this. It could be the fix for the bug 82418 by Darin Fisher. If you look at the 'now marked as duplicate' bug 83175, there is report that the fix was made sometime in 2001-06-26, which is exactly the same time when the fix for bug 82418 was checked in.
I'm still seeing the bug on the trunk (2001070408, Linux) with the same test URL (subversion.tigris.org). It's not as bad as it used to be - what mostly happens is it displays part of the page, then I hit reload and it displays the whole thing. Caching issue?
Additional to what I said just now. Once a page is in cache, it is displayed completely. However, the first time I visit any page at subversion, it is incomplete. The View Source listing matches the rendered version; that is, if the rendering is incomplete, so is the source listing. TCP tracing indicates that hitting reload causes additional network traffic. I will post an attachment with the trace.
The trunk does not fix subversion.tigris.org loading problems for me (similar problems can be found on javasoft.com as well). My loading problems at subversion are similar in nature, but not the same per page, as the ones noted here. However, the trunk does fix the test case entered by me for the now duplicate bug 83175 as of the build dates mentioned in that bug. As previously noted, the bug is not consistently reproducible using the test case in bug 83175 with the Classic skin, though it is consistent with Modern across many builds. Why would switching skins affect the reproducibility? I would think this would point to a memory corruption type of bug, except that it is *consistently* reproducible across many builds. Is this one bug or many? Is there one underlying cause or many? Food for thought.
*** Bug 89352 has been marked as a duplicate of this bug. ***
Summary: Most pages on this site fail to load completely → Pages do not load or display completely [M092]
vidur, have you had a chance to look at this? looks like a possible ugly stopper... I'm not sure I see it on my latest build
chofmann: Your latest *branch* build, I hope... Not sure I made that entirely clear to everybody.
Whiteboard: [critical for 0.9.2.1] Fixed on trunk, but not sure why. Need investigation. → [critical for 0.9.2.1][PDT] Fixed on trunk, but not sure why. Need investigation.
This bug is easier to reproduce with .php pages. Try looking at webpages that have forums made in PHP (the Slackware forums did this for me today).
*** Bug 89472 has been marked as a duplicate of this bug. ***
marking PDT+. The + was missing from the PDT marking already on the status whiteboard.
Whiteboard: [critical for 0.9.2.1][PDT] Fixed on trunk, but not sure why. Need investigation. → [critical for 0.9.2.1][PDT+] Fixed on trunk, but not sure why. Need investigation.
*** Bug 89597 has been marked as a duplicate of this bug. ***
This bug is still reproducible using bug 83175 test case under windows 2000 server, modern skin, and windows talkback 0.9.2 build id 2001070710. Also, it occurs when looking at my amazon wishlist which contains over 200 items; a single re-load renders it correctly.
The meta charset fix isn't on the branch. Comments above imply that this problem is seen on the branch as well as the trunk. If so, this definitely isn't mine.
Vidur: This problem is *not* seen on the trunk. It is *only* on the branch.
Then it definitely isn't related to the meta charset sniffing fix (for bug 81253) which is only checked into the trunk. Who'd like to own this hot potato next?
Vidur: If I'm interpreting things correctly, bbaetz suspects your meta charset patch is what fixed this bug on the trunk. That's why you've got this bug for the moment (it was previously assigned to network/cache, then layout)...
There seems to be a communication failure here. One possibility is that a fix has *not* been merged into the branch that is needed to correct this defect. The fix for bug 81253 may be needed if the code that it patches *has been merged* into the branch, but the fix *has not been merged*. Is this the case? If so, then someone should test out a merge into a private build based on the 0.9.2 branch. If the original code that was patched has not been merged into the branch, then there is another possibility that both *still may need to be merged*. In that scenario, the check-in dates would need to be checked against the information in bug 83175 that lists the trunk build where the defect disappeared. I am beginning to believe this page loading functional defect is caused by multiple underlying code defects that manifest in similar, overlapping ways.
Kent: the patch from bug 81253 has *not* been merged into the branch. Rumor also has it that a checkin by dougt@netscape.com caused a regression with symptoms similar to the ones described in this bug on the branch. It was backed out the evening of Friday, July 6th. I'll post if I get more specifics.
->ftang (who vidur says knows something about dougt's regression... fun fun!)
Assignee: vidur → ftang
It seems dougt's fix in 82418 also fixes this problem.
the checkin by dougt is detailed in bug 89643 and bug 89472. It meant that pages would spin "forever" waiting to load _images_, but otherwise all of the content was loaded (i.e., nothing was truncated). It is a completely different bug than this one.
reassign to dougt. dougt, is this a dup of 82418?
Assignee: ftang → dougt
Doug, we were also looking for the appropriate owner for this bug. If it's not you, can you forward it instead of kicking it back? Thanks!
With yesterday's 0.9.2 commercial branch build, this wfm.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
As of build id 2001071003 from 0.9.2 latest mozilla-win32-talkback.zip, this is only partially fixed for me. I saw the same partial fix when isolating the trunk build that fixed bug 83175 in the trunk. See attached screenshot kd-partial-load-2001071003.gif to show the improvement when running the test case from bug 83175. The top banner and leftmost table rendered, but the main screen section does not display the technicolor div text, whereas before only the top banner and top link in the leftmost table were rendered.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
trunk build isolation notes and test case and skin observation taken from bug 83175. note entry under anomaly concerning partial fix:
----
darin: mission accomplished (now retiring from testing this bug :)

2001-06-26-06-trunk/mozilla-win32.zip - Build ID: 2001062604 - bug is reproducible
2001-06-26-14-trunk/mozilla-win32.zip - Build ID: 2001062611 - bug is reproducible (with one anomaly)
2001-06-27-06-trunk/mozilla-win32-talkback.zip - Build ID: 2001062704 - bug is no longer reproducible

note: verified in multiple tests of each build in various orders (working backward and forward through builds).

anomaly: during the first test of 2001-06-26-14-trunk, the bug was still reproducible, with the sole difference that the left table (site navigation links) rendered fully. during all other tests of this build and preceding builds, only the top banner and the first link of the left table (site navigation links) were rendered, as illustrated in the partial load window capture attachment.

good luck!
----
The following sequence of commands can be used to consistently reproduce this bug under Windows 2000 Server on one of my machines. Hope this helps (and works for others).

1. Start Mozilla (my settings start with a mail window).
2. Clear memory and disk caches.
2.a. Select "Edit / Preferences / Advanced / Cache / Clear Memory Cache"
2.b. Select "Edit / Preferences / Advanced / Cache / Clear Disk Cache"
3. Open a browser window.
Note: Steps 2 and 3 can be reversed; browser can be set to open with blank page or live web page.
4. Select "Debug / Viewer Demos / #9 Frames" (loads from resource)
5. Select "Debug / XBL Demos / #1 Technicolor DIV" (loads from www.mozilla.org)
6. The page will not completely load.
7. Go to non-www.mozilla.org web site.
8. Repeat step 2 to clear caches.
9. Repeat steps 4 and 5 to reproduce again.

Did not test further for other possibilities. My cache settings are 4096 memory, 5000 disk, Check every time.
----
More interesting tidbits: The bug is not always reproducible when using the Classic theme, but is always reproducible using Modern.
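The build isolation above was done by hand, testing nightlies in both directions. As a rough sketch only: if one assumes the bug reproduces on every build before some checkin and on none after it (an assumption this thread itself puts in doubt, given the sporadic anomaly), the search is a binary search over the build list. The function name `first_fixed_build` and the `reproduces` callback are hypothetical; only the three build IDs come from the notes above.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <vector>

// Returns the index of the first build on which the bug no longer
// reproduces, or builds.size() if it reproduces on all of them.
// Assumes builds are sorted oldest to newest and that the bug never
// comes back once it stops reproducing (a simplifying assumption).
inline std::size_t first_fixed_build(
    const std::vector<std::string>& builds,
    const std::function<bool(const std::string&)>& reproduces) {
    std::size_t lo = 0, hi = builds.size();
    while (lo < hi) {
        std::size_t mid = lo + (hi - lo) / 2;
        if (reproduces(builds[mid])) {
            lo = mid + 1;  // still broken here: the fix is in a later build
        } else {
            hi = mid;      // works here: the fix is this build or earlier
        }
    }
    return lo;
}

// Replays the three build IDs from the isolation notes. The lambda stands
// in for a human running the test case on each build.
inline std::size_t demo_first_fixed() {
    std::vector<std::string> builds = {"2001062604", "2001062611", "2001062704"};
    auto reproduces = [](const std::string& id) { return id < "2001062704"; };
    std::size_t idx = first_fixed_build(builds, reproduces);
    assert(idx == 2);  // 2001062704 is the first build that works
    return idx;
}
```

With a sporadic bug, each `reproduces` call would need several attempts per build before answering "no", which is what the "verified in multiple tests of each build" note describes.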
On the bright side: The partial load of URLs listed here *does* seem to be fixed, including large amazon wishlist. The test case still fails, which leads me to believe there is more than one patch merge involved in completely fixing this bug.
I still don't see the problem. :-/
over to gagan. something came up that requires my attention.
Assignee: dougt → gagan
Status: REOPENED → NEW
New observation: This bug is now just as easily reproduced with Classic skin in the build previously cited. Also, leftmost table is rendered with only top link present (ordinary state of the world when this bug exists). The anomaly of rendering more (but not all) of the leftmost table, as illustrated in previous attachment, is sporadic.
i find it very hard to understand how this can be a necko issue if different skins affect the behavior.
As I said, the skins are not affecting the behavior, after all. However, even if it were, wouldn't the different memory footprints when using two different skins conceivably affect the manifestation of certain types of bugs?
I hope we are not hunting that kind of bug.. :-/
I think this should be a Mojo stop-ship. I'm running into this over and over again on my latest win32 branch nightly, both on dial-up and on the fast work connection. I haven't found a workaround short of restarting (reloading and shift+reloading don't seem to do it). I often see it on mozilla.org and bugzilla.mozilla.org. I believe it has something to do with the cache. Usually when I see this problem, the statusbar says "Read [path ending in '/cache', or containing 'cache' somewhere]".
OS: Linux → All
Hardware: PC → All
Has anyone seen this on the branch since the charset stuff landed on Friday? I think that it is a cache problem of some sort which the double-load just triggers, and that that patch is only hiding the problem, though. No evidence for that :)
Yes, I'm still seeing this over and over again in 2001071403 (branch).
Kent's cache steps are good for verifying its a cache problem, but they don't isolate every type of cache problem. Blake: does this happen after you blow away the "Cache" and "NewCache" directories while mozilla is off -OR- if you create a new profile?
I saw this today with win2k build 20010715 (trunk). Reload / Shift+reload doesn't fix it (it stops at the same part of the page). I saw in the status bar: Reading ....\Profile\cache\... I deleted the cache via prefs but that doesn't fix it (the page stopped loading at a different part of the page after this). I fixed it by restarting mozilla.
OK, then. So are people seeing half loaded pages, or just blank screens? If you do view source, does the source load from the network? Is there some meta charset stuff in the page? If its repeatable, what happens if you disable the cache (using the debug pref pane - I don't think setting the cache size to 0 has the same effect)
This bug now WORKSFORME in Build ID 2001071505 Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.2) Gecko/20010715 under Windows 2000 Server using Modern skin.
I'm not seeing it recently...
I see there's a WFM comment on 7/15, is this gone from the branch then?
I'm still seeing this on the branch (commercial) using 2001-07-15-08 on Windows98. Type "quote ibm" in the URL bar, and the resulting stock quote page won't finish loading. The status bar reads: Read C:\PROGRAM FILES\NETSCAPE\NETSCAPE 6\res\arrow.gif
Using sol's test case, which tries to load http://personalfinance.netscape.com/finance/quotes/quotes.tmpl?symbol=ibm, nearly all the content on the page loads, but the page never finishes loading and the throbber continues to throb. Sometimes it makes it no further than "...connecting to personalfinance.netscape.com", sometimes "...transferring data from ads.web.aol.com", sometimes "resolving host search.netscape.com", sometimes "Read: C:\..\Netscape 6\res\arrow.gif". Branch build from Friday: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.2) Gecko/20010713 Netscape6/6.1
This bug has been about portions of pages failing to load (as in, not at all), as opposed to pages that don't fire onload, or get stuck loading an image in the page. The 'not stopping throbber' that Sol noted seems to be caused by the graph in that particular page. Filed as bug 91025. Now, back to our regularly scheduled bug -- can anyone still reproduce the problem of partial loads of page content with either (1) the current branch, or (2) the current trunk.
Yes, seen it at least four times on today's branch build.
Make that seven (this bug seems to come in bursts). -_-
We don't need no stinking pronouns! When you say "it", that means ...
It means the page didn't load completely. Actually, come to think of it, I saw it probably three times in a row on http://maps/ . I hit reload, and it took me to one of the subframes (which I found really bizarre). I tried to load maps/ again, and it showed empty frames.
I've also run into this bug a couple of times on today's branch build. And every time it happens, the statusbar says "Read [path-containing-my-profile's-NewCache-directory]"
back to me for more investigation...
Assignee: gagan → dougt
I have tried everything listed in this bug but still can not reproduce this problem. Does ANYONE have a debug build of either the branch or trunk that they can reproduce this on? If you do, can I try reproducing at your desk?
Is it possible another recent fix has also fixed this bug?
Whiteboard: [critical for 0.9.2.1][PDT+] Fixed on trunk, but not sure why. Need investigation. → [critical for 0.9.2.1][PDT+] can't dup on branch?
No because blake can still reproduce pages not loading on a build from two days ago. There is a real bug here, but we don't know what the conditions are which trigger the problem.
The subversion.tigris.org page no longer shows this problem in 0.9.2, even though it used to. The xbl demo in the 07/10/01 13:14 attachment is repeatable in 0.9.2, but not in a current trunk build. However, it has a charset tag.... dougt - what if you back out vidur's fix, and then look at that page (and I have a list of others from the beta feedback which are no longer reproducible, but were with 0.9.2)? That way we could see if that patch actually fixed it, or just hid the problem in most cases. If the latter, that would allow us to at least reproduce the problem and maybe fix it. Yes, I'm clutching at straws. I spent most of Monday morning trying to reproduce this, and failing. Can people who see this bug please give a URL, even if it's not repeatable?
I have pinged the two nscp'ers that reported problems here. I hope to get some driving time on their machines.
ignore that last patch. (it should have been bar, not foo... :-) )
'fixed' on trunk:
Checking in nsCacheService.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsCacheService.cpp,v <-- nsCacheService.cpp
new revision: 1.53; previous revision: 1.52
done
Checking in nsMemoryCacheDevice.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsMemoryCacheDevice.cpp,v <-- nsMemoryCacheDevice.cpp
new revision: 1.34; previous revision: 1.33
done
Let's see if anyone can reproduce this problem on tomorrow's build.
worksforme on win2k sp2 with build 2001071704
Fix checked in on branch:
Checking in nsCacheService.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsCacheService.cpp,v <-- nsCacheService.cpp
new revision: 1.49.12.2; previous revision: 1.49.12.1
done
Checking in nsMemoryCacheDevice.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsMemoryCacheDevice.cpp,v <-- nsMemoryCacheDevice.cpp
new revision: 1.31.28.1; previous revision: 1.31
done
Updated whiteboard. Thanks for the patch!!! I see checkin comments for both trunk and branch. Shouldn't this be closed fixed so QA knows to attack it?
Whiteboard: [critical for 0.9.2.1][PDT+] can't dup on branch? → [PDT+] fixed on branch & trunk
Another testcase would be http://www.world-direct.com/mozilla-table-bug-testcase/index2.html just reload the page a few times and you see the "effect". as reported in bug 90482. There are a few others around reporting the same behaviour like 82946.
A more simplified testcase is available at http://www.world-direct.com/mozilla-table-bug-testcase/
Keywords: testcase
WFM on Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.2+) Gecko/20010718 in W98: the page renders completely without any text or tables missing, and it still renders completely when I reload. The test case provided in the comment below works for me too; see my comment on bug 90482.
QA - we are going to need some really extensive testing on this bug.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Further testcases in http://bugzilla.mozilla.org/show_bug.cgi?id=82946 Although it seems we really need to decide which bugs relate to each other and make a concrete split. Reopening since this one needs extensive testing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
using build 2001071903 and the problem is still there. In both testcases I used http://www.world-direct.com/miss-tirol/index.asp?ueber http://www.world-direct.com/mozilla-table-bug-testcase/
Markus this bug is fixed, I'm closing it again. The testcase in the above url works, your bug is a different one
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Okay, kiddos, we're back in the game. I just saw it on BugZilla in today's commercial branch build.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Chad, define "it"?
*** Bug 91501 has been marked as a duplicate of this bug. ***
http://www.defleppard.com/defleppard/defleppard.html This is the url that NEVER loads for me. May make a nice testcase.
Sorry... What I meant was I went to bugzilla.mozilla.org and nothing on the page showed up. The status bar says "Read F:\private\aegis\blahblah.slt\NewCache\9CA9FF21d01". View | Source shows the full HTML of the page.
Not loading a Def Leppard page, I say that is a feature! :-) Seriously, that is a separate bug. Please file it and assign it to pavlov@netscape.com. The problem with that page is the two images. Or at least that is what I am seeing.
Chad, could you look at the contents of F:\private\aegis\blahblah.slt\NewCache\9CA9FF21d01?
Yes, we *definitely* still have a problem here. I just had partial page loads three times in a row on bugzilla.mozilla.org. It happened often enough that I was getting frustrated. Then I tried to go to lxr.mozilla.org and saw the problem there too. Doing a debug build now to see if I can reproduce a partial page load in that. Francisco: Thanks for the testcase. I can't reproduce partial page loads there. It seems every person who can see this bug has a set of pages where it's most common (now the trick is finding a set that the developers can see!). Doug: 9CA9FF... has the full contents of http://bugzilla.mozilla.org/ (HTML)
Chad, how exactly did you produce this result?
Just browsing the pages... There isn't a set of reproducible steps. I started off on subversion.tigris.org, then went to Slashdot, then flipcode, then bugzilla.mozilla.org, and then lxr. Those are the pages where I see it most often, and if I just casually browse on them enough, I'm destined to see it again! An interesting note is that I didn't hit reload on these pages. The bug shows up on an initial load (typing in the URL bar or clicking on a link). I just cleared my cache, so I'll see if I ever see the bug on uncached pages.
Sometime between 0.9.1 and 0.9.2 I saw those partial loads everyday somewhere (google.org search results was most visible/reproducible). Now (current trunk) I have not seen ANY of those partial loads for 2+ weeks (and I browse a _lot_ on google, www.ceskenoviny.cz, slashdot, bugzilla and _many_ other sites). I cannot reproduce ANY of testcases/url mentioned in this bug. could the reason be that I ALWAYS have [ ] Enable Disk Cache (not checked) [x] Enable Mem Cache (checked) ??? The mem cache is set to 16MB. I use squid as a http proxy (so I don't use mozilla's disk cache). My system is uptodate Red Hat 7.1 Linux and I use current mozilla CVS snapshots.
Answering myself: I cannot reproduce it even with Enable Disk Cache [x] and Disk cache set to 10MB. Sometimes the pages here at bugzilla.mozilla.org load _very_ slowly (1-2 minutes now) but loads well and all everytime (no partial loads).
Added release note to 90577.
For what it's worth, I too have been seeing this problem all day on the branch as I was running through top100 sites.
i think that i might have a clue about what's going on... i believe that all of these partial page loads are due to the HTTPChannel being Cancel()ed while there are outstanding OnDataAvailable() and the OnStopRequest() events in the PLEventQ.

It appears that nsHTTPTransaction::Cancel(...) is "not quite right"... The mStatus of the HTTPTransaction is set when the socket transport thread finishes. However, if the HTTPChannel is later cancelled on the UI thread before OnStopRequest() event has been processed, the *error* cancel code will be lost because it is not stored in the nsHTTPTransaction...

I've noted where the problem is in the Cancel implementation in nsHTTPTransaction...

----------------------------------------------------------
// called from any thread
NS_IMETHODIMP
nsHttpTransaction::Cancel(nsresult status)
{
    LOG(("nsHttpTransaction::Cancel [this=%x status=%x]\n", this, status));

    // ignore cancelation if the transaction already has an error status.
    if (NS_FAILED(mStatus)) {
        LOG(("ignoring cancel since transaction has already failed "
             "[this=%x mStatus=%x]\n", this, mStatus));
        return NS_OK;
    }

    // if the transaction is already "done" then there is nothing more to do.
    // ie., our consumer _will_ eventually receive their OnStopRequest.
    PRInt32 priorVal = PR_AtomicSet(&mTransactionDone, 1);
    if (priorVal == 1) {
        LOG(("ignoring cancel since transaction is already done [this=%x]\n", this));
***** ---> At this point any NS_FAILED status code that was passed in is lost *****
        return NS_OK;
    }

    // the status must be set immediately as the cancelation may only take
    // action asynchronously.
    mStatus = status;

    return nsHttpHandler::get()->CancelTransaction(this, status);
}

So, it looks like if the HTTPChannel is cancelled, the error code "may not" propagate up to nsHTTPChannel::OnStopRequest(...). This means that if a cache entry is being built, it may be marked valid incorrectly!!
I believe that this bug has become harder to reproduce since vidur landed the charset sniffing code because that code dramatically reduced the number of channels that get cancelled!! I tried to test this theory by pulling the 0.9.1 branch, which exhibited this problem a lot, and fixed nsHTTPTransaction::Cancel(...) to *always* set the mStatus... All of the partial page loads seemed to "go away"... I'm attaching a patch which will propagate the error code...
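To make the ordering problem concrete, here is a minimal, self-contained sketch. It is a toy model, not the Necko code: `ToyTransaction`, `cancel_lossy`, `cancel_propagating`, and the `Status` constants are all hypothetical stand-ins, and `std::atomic` replaces `PR_AtomicSet`. It only shows that when the "already done" check runs before the status is stored, a cancel arriving after the socket thread finishes leaves the consumer believing the load succeeded.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for nsresult values; not the real Mozilla constants.
using Status = std::int32_t;
constexpr Status kOk = 0;
constexpr Status kAborted = -1;

inline bool failed(Status s) { return s < 0; }

// Toy model of a transaction; not the real nsHttpTransaction.
struct ToyTransaction {
    std::atomic<int> done{0};
    std::atomic<Status> status{kOk};

    // Stands in for the socket transport thread finishing its work.
    void finish() { done.exchange(1); }

    // Same ordering as the quoted Cancel(): the "already done" check runs
    // before the status is recorded, so a late cancel code is dropped.
    void cancel_lossy(Status s) {
        if (failed(status.load())) return;
        if (done.exchange(1) == 1) return;  // s is lost on this path
        status.store(s);
    }

    // Records the status first, then marks the transaction done.
    void cancel_propagating(Status s) {
        if (failed(status.load())) return;
        status.store(s);
        done.exchange(1);
    }

    // What an OnStopRequest-style consumer would conclude from the status.
    bool cache_entry_marked_valid() const { return !failed(status.load()); }
};

// Returns true when a cancel that arrives after completion still leaves
// the (possibly partial) cache entry marked valid.
inline bool late_cancel_leaves_entry_valid(bool use_lossy) {
    ToyTransaction t;
    t.finish();
    if (use_lossy) {
        t.cancel_lossy(kAborted);
    } else {
        t.cancel_propagating(kAborted);
    }
    return t.cache_entry_marked_valid();
}

inline void self_check() {
    assert(late_cancel_leaves_entry_valid(true));    // status dropped
    assert(!late_cancel_leaves_entry_valid(false));  // status kept
}
```

Calling `self_check()` exercises both orderings. The sketch says nothing about what the real CancelTransaction path does once the status is stored; it only models the status bookkeeping.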
I think that i should mention that the code in nsHttpTransaction does NOT LOOK THREADSAFE AT ALL!! member variables are accessed and set on multiple threads with NO locking at all. in particular mStatus is not protected, so the actual status code could get munged... and it appears that the way mTransactionDone is used provides small windows of vulnerability - ie. race conditions - between the UI thread and the socket transport thread. I think that the code works "as well as it does" because there are *only two* threads which simultaneously call the methods... -- rick
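One generic way to close that kind of window is to guard the status with a lock and let the first failure win. The sketch below is an illustration of that pattern under stated assumptions, not a proposal for how nsHttpTransaction is or should be written: `GuardedStatus`, `set_if_not_failed`, and `count_winners` are hypothetical names, and `std::mutex` stands in for whatever locking primitive the real code would use.

```cpp
#include <cassert>
#include <cstdint>
#include <mutex>
#include <thread>

using Status = std::int32_t;  // hypothetical stand-in for nsresult
constexpr Status kOk = 0;

// A status value that can be set from several threads. The first failure
// code recorded is kept; later attempts are ignored.
class GuardedStatus {
public:
    // Stores s unless a failure was already recorded. Returns true if stored.
    bool set_if_not_failed(Status s) {
        std::lock_guard<std::mutex> lock(mu_);
        if (status_ < 0) return false;
        status_ = s;
        return true;
    }

    Status get() const {
        std::lock_guard<std::mutex> lock(mu_);
        return status_;
    }

private:
    mutable std::mutex mu_;
    Status status_ = kOk;
};

// Two threads try to cancel with different failure codes. The check and the
// store happen under one lock, so exactly one of them is recorded.
inline int count_winners() {
    GuardedStatus st;
    bool a = false, b = false;
    std::thread t1([&] { a = st.set_if_not_failed(-1); });
    std::thread t2([&] { b = st.set_if_not_failed(-2); });
    t1.join();
    t2.join();
    assert(st.get() < 0);  // one of the two failure codes was stored
    return (a ? 1 : 0) + (b ? 1 : 0);
}
```

Under these assumptions `count_winners()` returns 1 regardless of which thread runs first, because the "already failed?" test and the assignment cannot interleave.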
Blake, Chad, Please apply Ricks change and try to reproduce this bug.
Running through the list of top100 sites (as defined by http://netmation.com/list100.htm) with this patch applied, I encountered 0 failures of this kind. Yesterday, without this patch, about 16 of the 100 sites failed to load, or finish loading properly.
rpotts: yes, that sounds right (wrt vidur's fix). Wrt thread safety though, there are lots of assumptions which aren't really documented, and I don't think it's as bad as you think. I applied the patch, and didn't see anything, although since this was hard to reproduce that doesn't mean much. Could we be cancelled twice with different error codes (so that this does the wrong thing)? If not, then the assignment to mStatus should be moved above the PR_AtomicSet. I'm very worried about moving this onto the branch. The last time darin fixed a bug here, he was chasing crashers for a full week. And he only gets back on Monday... Can someone run jrgm's tests over the low bandwidth simulator, several times? On all three platforms? (The reason that we can be cancelled on the sockettransport thread is if we send a request, and the socket closes before we get a response. This happens when the server times out a keep-alive connection before it gets our request. We should not be being cancelled from both threads at the same time, I think, because the other thread can't cancel us at that stage. I need to check that with darin though...)
I misunderstood what was happening, and this fix does make sense. Sorry. Move the mStatus = status line above the if, and r=bbaetz
Still PDT+, please check in if you get fully comfortable and fully reviewed.
Fixed on branch and trunk. Rick, we owe you!!
Status: REOPENED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
woohoo!
Bad news. I've been running with this for an hour or so, and while I haven't seen the partial page load bug, I have seen the one where pages just don't come up at all. View Source shows <html><body></body></html>. I have http://dict.org/ in my personal toolbar, and after clicking on it enough, I saw a blank page rather than what should be there. This is not the same thing as the other day's "Click reload really fast and watch as the page refuses to load until you restart the browser." Should this be filed under a new bug?
hey chad, i bet this is another bug... with similar symptoms :-) why don't you open up a new bug for it and we'll take another whack :-)
Unfortunately I'm still seeing the problem in a new branch nightly, as is kerz. I'm still seeing both partial and complete load failures. As usual, the statusbar says "Read [my NewCache dir]". I don't think I'm seeing it as often, however. I guess we're stuck just relnoting this for rtm.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [PDT+] fixed on branch & trunk → [PDT+]
*** Bug 91912 has been marked as a duplicate of this bug. ***
welcome back darin :-)
Assignee: dougt → darin
Status: REOPENED → NEW
I would like to take the PDT+ off this bug because we've fixed what was necessary for the branch. Does anyone know that doing so would be blatantly stupid? (A necessary part of claiming it's stupid would be having a patch that addresses the remaining problems :-)
bug #91710 is a splinter bug from this one - with a proposed patch... we might want to evaluate that one... do we have any URLs that still show the partial page reload bug? if so, please let me know so i can start tracking those problems down - again ;-)
I still see it on many mozilla.org pages - lxr.mozilla.org, bugzilla.mozilla.org, etc (not that it's specific to those, those are just all I visit mostly). Can you please e-mail PDT and have them consider 91710? I think this bug is going to kill us if we ship with it.
Just FYI, this problem is very prominent in BeOS builds. With a trunk BeOS build from this morning, I am completely unable to render the bugzilla query page. With a 0.9.2 build from this afternoon, the query page renders and is usable but the Bug List page is truncated. If I turn off the mem & disk cache in both builds and hit reload repeatedly, sometimes the page renders correctly, other times it's truncated and sometimes it just shows the page source instead of rendering it.
Blake, I use bugzilla a zillion times every day from several different win32 boxes and I've never seen this problem. If you create a new profile and try these things again does it still happen? Do we have a ton of people complaining about these symptoms? I still think the remainder of this bug isn't PDT+ because most people don't see it and (with some exceptions) those who do have to kinda work at it.
Whiteboard: [PDT+] → [PDT+] fixed on branch, remainder on trunk
Just for the record, I see this on Tinderbox like once every 2 weeks or so.. Reload always clears this for me (NT & Win2k).
*** Bug 92139 has been marked as a duplicate of this bug. ***
Bug 92139 has a relatively reproducible (for me) testcase. I can even reproduce it in my debug build, if anyone wants to have a shot.
I have noticed the incidence of this problem has increased in 2001072308 trunk for Mac from previous builds.
darin's patch to 91710 does _not_ fix this bug. I can still reproduce it with ease on the URL in bug 92139.
More random information (just want to get this one fixed)... I *can* reproduce this bug when I disable both mem and disk caches in the Debug | Networking preferences. I also set the disk cache size to 0k.
using build 2001072503 this still happens on our intranet site.
Is 92402 a duplicate of this?
I submitted a bug (92472) which seems to be a duplicate of this. Another page that shows this problem for me is http://www.dilbert.com/ After it fails to load completely, the browser cannot load any additional web pages, no matter how simple, until I restart the browser. This is in 2001072608. -Zorin
This bug may be related to bug 92611, since the symptoms seem the same from what I've read. Call it another datapoint to check.
*** Bug 92611 has been marked as a duplicate of this bug. ***
I had no problem with this bug recently but today it is here again. No problem with yesterday's build 20010729xx, but today with build 2001073003 a lot of pages don't load completely.
It may be related to bug 77072 and bug 92354.
I've observed this bug on a number of occasions, but it seems to be much more frequent on OSF1/alpha than Linux/intel. Originally saw it in 0.9.2 on the alpha. I'm now seeing it in build 2001073008 on alpha, but not on a contemporary Linux build. Strange. Although I guess if there is a threads-related issue here, that might explain why the severity changes between platforms (and maybe even quite subtle details of system config).
Just an update: I was able to tickle this bug under Mozilla 0.9.3 Linux. I was unable, however, to tickle it under Mozilla 0.9.3 win32. It may be a platform-specific issue after all. Mozilla for Win32 seems to run like a dream. Hopefully this will be fixed soon so I can return to using Mozilla on my favorite OS. }:) -John
Seeing this repeatedly on win2k. The screenshots used build 2001080508.
Removing nsenterprise nomination; moving to nsBranch.
Keywords: nsenterprise → nsBranch
Blocks: 99142
Darin - Is this resolved for 0.9.4? Should we close this one out?
i believe this has been fixed... as there haven't been any similar reports for ages... marking FIXED.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → FIXED
This is not fixed. I'm still having this problem with both the latest nightly and with 0.9.4. Test case page is still www.dilbert.com. Page starts to load, then stops loading. There is then a long wait, and either it never finishes loading, or it eventually loads after about 2-3 minutes. If you try to leave dilbert.com while it's stuck, other pages won't load at all, even local pages. If I'm doing something wrong, someone please let me know! I just wanted to provide input since the original problem doesn't seem to be fixed. -John
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hmm. After experimenting further, I find that it seems to be something that's blocking during the page load. Eventually the page finishes loading, even if it takes five minutes. Whatever it is that's blocking also keeps other pages, in other browser windows, from opening. However, when that blocking function eventually times out, the browser starts working again. This may be a threading issue. Hopefully I'm not the only one experiencing this. }:) -John
john: what platform are you noticing this on?
I'm resolving this bug again as WFM because the original problem IS fixed. Your problem with www.dilbert.com should be entered as a new bug, but first please search for an existing duplicate. I think I've seen this bug already reported.
Status: REOPENED → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
FWIW, I still see this problem frequently under beos when visiting either the tinderbox or any bugzilla page. Should I just open up a new bug on that?
I think it should be reported as a new bug because it's probably a platform-specific problem: the problems on other platforms have already been fixed by the fix in this bug.
cls: please open a new bug. thx!
Filed bug 100508 on the BeOS problem.
I still see this bug regularly on Milestone Mozilla 0.9.6 on Windows NT with the modern skin (build 2001112009). The page starts loading but does not complete. Repeatedly pressing reload will eventually load the page correctly. Notorious pages where this happens for me are the readers' comment pages on slashdot. The link above: http://www.linuxnews.pl is one that just didn't work either.
Sander: Please try a fresh profile. You can manage/create profiles with "mozilla.exe -profilemanager".
Keywords: qawanted