Closed Bug 82720 Opened 23 years ago Closed 23 years ago

Pages do not load or display completely [M092]

Categories

(Core :: Layout, defect, P1)

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: zackw, Assigned: darin.moz)

Details

(Keywords: helpwanted, regression, testcase, Whiteboard: [PDT+] fixed on branch, remainder on trunk)

Attachments

(9 files)

Most pages on the site referenced above do not load completely.  Parts of
the page will appear, others are left blank or replaced by random HTTP headers.
Hitting reload sometimes causes more of the page to load, sometimes causes
less of it to load.  The problem is worst with the mailing list archives
(e.g. http://subversion.tigris.org/servlets/SummarizeList?listName=dev) but
can be seen with any of the pages.  If you don't see it immediately, hit
reload a couple of times.

Idle speculation: problems with chunked transfers and/or keep-alive?
reporter: Have you enabled keep-alive?
If yes, can you try to disable it? (Preferences/debug/networking).

And please add the build ID to every bug report!
Yes, I had keep-alive enabled, but not pipelining.  Turning off keep-alive
makes mozilla behave *predictably* - ie. it always loads the same amount of
each page - but it still fails to load pages from this site completely.
Pipelining on/off doesn't seem to make a difference.

'View Source' suggests that each page is getting cut off after some part of
it loads.

sorry about the build-id, it's 2001052308
WinMe 2001062104

I'm seeing this on many different sites.    A page will start loading, and then
just stop halfway through.   Reload will get the whole page.    The last time I
saw it was when retrieving the order confirmation page on www.newegg.com .   I
needed to press reload to get the whole page.  (and confirm button!)

Disabling keep-alive has no effect on the problem.
Status: UNCONFIRMED → NEW
Ever confirmed: true
tor can't see this when he enables logging.

On 0.9.1 (which is all I have ATM), I can packet sniff. We're double loading
because of the charset problem, but we do appear to be loading the entire
document each time. I'm seeing the headers (presumably from the 2nd load) at the
bottom of the page, and most of the content is missing.

It doesn't happen all the time, though.
Managed to catch an aborted load while logging.  The page that failed to
load completely is http://www.enid-rayeadams.com/photos.htm (second page
in the log).  I've inserted a comment in the log showing when activity
had ceased and I hit reload.

The connection is running through a junkbuster proxy, for what it's worth.
i wonder if this has to do with bug 87047.
I don't think so - turned off the proxy and it still does an incomplete load
for the failed page in my log (stopped at the same point, incidentally).
-> moz 0.9.3
Priority: -- → P1
Target Milestone: --- → mozilla0.9.3
There is another site where you can see this problem:
http://www.ragreiner.de
I tested with the following Mozilla versions: 0.9.2 for Linux and Netscape 6.1
PR1 for Windows, the problems exists in both versions.
(NS4, MSIE and Konqueror display all pages on this site without problems.)

"View" -> "Source" shows the COMPLETE source text of the page, so the problem
does not seem to be caused by incomplete loading of the page over the network.
Target Milestone: mozilla0.9.3 → mozilla1.0
moving out milestone.  Will try to get this in for 0.9.3 if a tested patch is
submitted
*** Bug 87556 has been marked as a duplicate of this bug. ***
There are a whole lot of pages which have this problem. Some appear fixed
(possibly with the charset patch, since they all had charset tags). this one
occurs all the time, and on other pages (eg bugzilla) it just happens sometimes.
I have no idea where to start looking, unfortunately, and neither I nor darin
saw anything sticking out in that log.
benc: I don't suppose you can come up with a simple test case? :)
Keywords: qawanted
*** Bug 83175 has been marked as a duplicate of this bug. ***
I see the HTTP log, but I think this may actually *not* be an HTTP bug. Some
info from bug 83175, which I just duped:

  1. This seems to be fixed on the trunk -- it is still present on the 0.9.2
     branch.
  2. This happens in MailNews (with HTML emails) as frequently as it does in
     the browser. This leads me to believe that it isn't an HTTP bug.
  3. I don't use a proxy, and I do see this bug. Others who do use proxies
     also see this bug. Proxy should be unrelated.
  4. Kent Dorsey posted some info on when this bug disappeared on the trunk,
     although we haven't correlated the fix to any particular patch.

I *strongly* recommend a little bit of investigation to see what patch needs to
be applied to the branch to fix this, or why a patch that may have gone into
both trees didn't fix them both. This shouldn't need a patch by itself... We
will look like rather large morons if we ship a web browser that doesn't fully
load webpages.
Assignee: neeti → asa
Severity: normal → blocker
Component: Networking: HTTP → Browser-General
QA Contact: benc → doronr
Whiteboard: [critical for 0.9.2.1] Fixed on trunk, but not sure why. Need investigation.
I don't fix bugs. Moving a bug to Browser-General and assigning it to me pushes
it further away from a fix, not closer.  Browser-General is reserved for bugs
which don't fit into any of the real components.  If it stays assigned to me I'm
going to test in on the trunk and mark it Worksforme if it can't be reproduced.
dr - I agree. My bet for the fix is vidur's meta charset reload patch, based on
looking at urls this problem was reported on in the beta feedback. Since that
patch fixed the ibench results, it's probably nsbranch.

there are two problems - partial page loads (like this case), and sites where
the page just comes up blank. the second case is what was probably fixed with
the charset patch.

The subversion url here only displays part of the page though, even on the
trunk. Could we have a charset converter bug of some sort?
Oookay... ->vidur.
Assignee: asa → vidur
Component: Browser-General → Layout
Target Milestone: mozilla1.0 → ---
I don't see any problems with the subversion page with the Linux 2001070208 (trunk).
And the bug which was seen as fixed on the trunk was with only partial loading
of a page not the blank page displayed.
Maybe I've got the fix on the trunk which fixed this. It could be the fix for
the bug 82418 by Darin Fisher.

If you look at the 'now marked as duplicate' bug 83175, there is a report that the
fix was made sometime on 2001-06-26, which is exactly the same time when the fix
for bug 82418 was checked in.
I'm still seeing the bug on the trunk (2001070408, Linux) with the same test
URL (subversion.tigris.org).  It's not as bad as it used to be - what mostly
happens is it displays part of the page, then I hit reload and it displays
the whole thing.  Caching issue?
Additional to what I said just now.

Once a page is in cache, it is displayed completely.  However, the first
time I visit any page at subversion, it is incomplete.
The View Source listing matches the rendered version; that is, if the
rendering is incomplete, so is the source listing.

TCP tracing indicates that hitting reload causes additional network traffic.
I will post an attachment with the trace.
The trunk does not fix subversion.tigris.org loading problems for me (similar
problems can be found on javasoft.com as well).  My loading problems at
subversion are similar in nature, but not the same per page, as the ones noted here.

However, the trunk does fix the test case entered by me for the now duplicate
bug 83175 as of the build dates mentioned in that bug.

As previously noted, the bug is not consistently reproducible using the test
case in bug 83175 with the Classic skin, though it is consistent with Modern
across many builds.  Why would switching skins affect the reproducibility?  I
would think this would point to a memory corruption type of bug, except that it
is *consistently* reproducible across many builds.
Is this one bug or many?  Is there one underlying cause or many?  Food for thought.
*** Bug 89352 has been marked as a duplicate of this bug. ***
Resummarizing. Pages this has been seen on so far include:

  http://subversion.tigris.org
  http://www.ceskenoviny.cz
  http://forum.uznam.net.pl
  http://www.linuxnews.pl
  http://www.ragreiner.de
  http://www.bbc.co.uk/cgi-bin/quiz.pl?QUIZDATA=ultimate_question.dat&WEBSITE=nature
  http://www.freeswan.org
  http://www.google.com
  http://www.nytimes.com (see attachment 39609 [details])
  MailNews occasionally

Stop the Madness!
Summary: Most pages on this site fail to load completely → Pages do not load or display completely [M092]
vidur,  have you had a chance to look at this?  
looks like a possible ugly stopper...
I'm not sure I see it on my latest build
chofmann: Your latest *branch* build, I hope... Not sure I made that entirely
clear to everybody.
Whiteboard: [critical for 0.9.2.1] Fixed on trunk, but not sure why. Need investigation. → [critical for 0.9.2.1][PDT] Fixed on trunk, but not sure why. Need investigation.
This bug is easier to reproduce with .php pages.
Try looking at web pages that have forums in them made in PHP
(the slackware forums did this for me today).
*** Bug 89472 has been marked as a duplicate of this bug. ***
marking PDT+.  The + was missing from the PDT marking already on the status
whiteboard.
Whiteboard: [critical for 0.9.2.1][PDT] Fixed on trunk, but not sure why. Need investigation. → [critical for 0.9.2.1][PDT+] Fixed on trunk, but not sure why. Need investigation.
*** Bug 89597 has been marked as a duplicate of this bug. ***
This bug is still reproducible using bug 83175 test case under windows 2000
server, modern skin, and windows talkback 0.9.2 build id 2001070710.  Also, it
occurs when looking at my amazon wishlist which contains over 200 items; a
single re-load renders it correctly.
The meta charset fix isn't on the branch. Comments above imply that this problem
is seen on the branch as well as the trunk. If so, this definitely isn't mine.
Vidur: This problem is *not* seen on the trunk. It is *only* on the branch.
Then it definitely isn't related to the meta charset sniffing fix (for bug
81253) which is only checked into the trunk. Who'd like to own this hot potato next?
Vidur: If I'm interpreting things correctly, bbaetz suspects your meta charset
patch is what fixed this bug on the trunk. That's why you've got this bug for
the moment (it was previously assigned to network/cache, then layout)...
There seems to be a communication failure here.  One possibility is that a fix
has *not* been merged into the branch that is needed to correct this defect. 
The fix for bug 81253 may be needed if the code that it patches *has been
merged* into the branch, but the fix *has not been merged*.  Is this the case?

If so, then someone should test out a merge into a private build based on the
0.9.2 branch.  If the original code that was patched has not been merged into
the branch, then there is another possibility that both *still may need to be
merged*.  In that scenario, the check-in dates would need to be checked against
the information in bug 83175 that lists the trunk build where the defect
disappeared.

I am beginning to believe this page loading functional defect is caused by
multiple underlying code defects that manifest in similar, overlapping ways.
Kent: the patch from bug 81253 has *not* been merged into the branch. 

Rumor also has it that a checkin by dougt@netscape.com caused a regression with
symptoms similar to the ones described in this bug on the branch. It was backed
out the evening of Friday, July 6th. I'll post if I get more specifics.
->ftang (who vidur says knows something about dougt's regression... fun fun!)
Assignee: vidur → ftang
It seems dougt's fix in 82418 also fixes this problem.
the checkin by dougt is detailed in bug 89643 and bug 89472. It meant that 
pages would spin "forever" waiting to load _images_, but otherwise all of the
content was loaded (i.e., nothing was truncated). It is a completely different
bug than this one.
reassign to dougt. dougt, is this a dup of 82418?
Assignee: ftang → dougt
Doug, we were also looking for the appropriate owner for this bug.  If it's not
you, can you forward it instead of kicking it back?  Thanks!
With yesterday's 0.9.2 commercial branch build, this wfm.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
As of build id 2001071003 from 0.9.2 latest mozilla-win32-talkback.zip, this is
only partially fixed for me.  I saw the same partial fix when isolating trunk
build that fixed bug 83175 in the trunk.  See attached screenshot
kd-partial-load-2001071003.gif to show the improvement when running the test
case from bug 83175.  The top banner and leftmost table rendered, but the main
screen section does not display the technicolor div text, whereas before only the
top banner and top link in the leftmost table were rendered.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
trunk build isolation notes and test case and skin observation taken from bug
83175.  note entry under anomaly concerning partial fix:

----

darin: mission accomplished (now retiring from testing this bug :)

2001-06-26-06-trunk/mozilla-win32.zip - Build ID: 2001062604 - bug is reproducible

2001-06-26-14-trunk/mozilla-win32.zip - Build ID: 2001062611 - bug is
reproducible (with one anomaly)

2001-06-27-06-trunk/mozilla-win32-talkback.zip Build ID: 2001062704 - bug is no
longer reproducible

note: verified in multiple tests of each build in various orders (working
backward and forward through builds).

anomaly: during the first test of 2001-06-26-14-trunk, the bug was still
reproducible, with the sole difference that the left table (site navigation
links) rendered fully.  during all other tests of this build and preceding
builds, only the top banner and the first link of the left table (site
navigation links) were rendered, as illustrated in the partial load window
capture attachment.

good luck!

----

The following sequence of commands can be used to consistently reproduce this
bug under Windows 2000 Server on one of my machines.  Hope this helps (and works
for others).

1. Start Mozilla (my settings start with a mail window).
2. Clear memory and disk caches.
2.a. Select "Edit / Preferences / Advanced / Cache / Clear Memory Cache"
2.b. Select "Edit / Preferences / Advanced / Cache / Clear Disk Cache"
3. Open a browser window.
Note: Steps 2 and 3 can be reversed; browser can be set to open with blank page
or live web page.
4. Select "Debug / Viewer Demos / #9 Frames" (loads from resource)
5. Select "Debug / XBL Demos / #1 Technicolor DIV" (loads from www.mozilla.org)
6. The page will not completely load.
7. Go to a non-www.mozilla.org web site.
8. Repeat step 2 to clear caches.
9. Repeat steps 4 and 5 to reproduce again.

Did not test further for other possibilities.  My cache settings are 4096
memory, 5000 disk, Check every time.

----

More interesting tidbits: The bug is not always reproducible when using the
Classic theme, but is always reproducible using Modern.

On the bright side: The partial load of URLs listed here *does* seem to be
fixed, including large amazon wishlist.  The test case still fails, which leads
me to believe there is more than one patch merge involved in completely fixing
this bug.
I still don't see the problem. :-/
over to gagan.  something came up that requires my attention.
Assignee: dougt → gagan
Status: REOPENED → NEW
New observation: This bug is now just as easily reproduced with Classic skin in
the build previously cited.  Also, leftmost table is rendered with only top link
present (ordinary state of the world when this bug exists).  The anomaly of
rendering more (but not all) of the leftmost table, as illustrated in previous
attachment, is sporadic.
i find it very hard to understand how this can be a necko issue if different
skins affect the behavior.
As I said, the skins are not affecting the behavior, after all. However, even if
it were, wouldn't the different memory footprints when using two different skins
conceivably affect the manifestation of certain types of bugs?
I hope we are not hunting that kind of bug.. :-/
I think this should be a Mojo stop-ship. I'm running into this over and over
again on my latest win32 branch nightly, both on dial-up and on the fast work
connection.  I haven't found a workaround short of restarting (reloading and
shift+reloading don't seem to do it).  I often see it on mozilla.org and
bugzilla.mozilla.org.

I believe it has something to do with the cache.  Usually when I see this
problem, the statusbar says "Read [path ending in '/cache', or containing
'cache' somewhere]".
OS: Linux → All
Hardware: PC → All
Has anyone seen this on the branch since the charset stuff landed on Friday?

I think that it is a cache problem of some sort which the double-load just
triggers, and that that patch is only hiding the problem, though. No evidence
for that :)
Yes, I'm still seeing this over and over again in 2001071403 (branch).
Kent's cache steps are good for verifying it's a cache problem, but they don't
isolate every type of cache problem.

Blake: does this happen after you blow away the "Cache" and "NewCache"
directories while mozilla is off -OR- if you create a new profile?
I saw this today with win2k build 20010715.. (trunk)

Reload /Shift+reload doesn't fix it (Stops at the same part of the page).
I saw in the status bar: Reading ....\Profile\cache\...
I deleted the cache via prefs but that doesn't fix it.
(The page stopped loading on another part of the page after this).

I fixed it with restarting mozilla.

OK, then. So are people seeing half loaded pages, or just blank screens?

If you do view source, does the source load from the network? Is there some meta
charset stuff in the page? If it's repeatable, what happens if you disable the
cache (using the debug pref pane - I don't think setting the cache size to 0 has
the same effect)
This bug now WORKSFORME in Build ID 2001071505 Mozilla/5.0 (Windows; U; Windows
NT 5.0; en-US; rv:0.9.2) Gecko/20010715 under Windows 2000 Server using Modern skin.
I'm not seeing it recently...
I see there's a WFM comment on 7/15, is this gone from the branch then?
I'm still seeing this on the branch (commercial) using 2001-07-15-08 on Windows98.

Type "quote ibm" in the URL bar, and the resulting stock quote page won't finish
loading.

The status bar reads:

     Read C:\PROGRAM FILES\NETSCAPE\NETSCAPE 6\res\arrow.gif
using sol's test case that tries to load
http://personalfinance.netscape.com/finance/quotes/quotes.tmpl?symbol=ibm
nearly all the content on the page loads, but
the page never finishes loading and the throbber continues to throb.

sometimes I see it make it no further than
 "...connecting to personalfinance.netscape.com"
sometimes
 "...transferring data from ads.web.aol.com"
sometimes
 "resolving host search.netscape.com"
sometimes
  "Read: C:\..\Netscape 6\res\arrow.gif"

branch build from friday
Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.2) Gecko/20010713 Netscape6/6.1 



This bug has been about portions of pages failing to load (as in, not at all), 
as opposed to pages that don't fire onload, or get stuck loading an image in
the page.

The 'not stopping throbber' that Sol noted seems to be caused by the graph
in that particular page. Filed as bug 91025.

Now, back to our regularly scheduled bug -- can anyone still reproduce the 
problem of partial loads of page content with either (1) the current branch, 
or (2) the current trunk. 
Yes, seen it at least four times on today's branch build.
Make that seven (this bug seems to come in bursts).  -_-
We don't need no stinking pronouns! 

When you say "it", that means ...
It means the page didn't load completely.

Actually, come to think of it, I saw it probably three times in a row on
http://maps/ .  I hit reload, and it took me to one of the subframes (which I found
really bizarre).  I tried to load maps/ again, and it showed empty frames.
I've also run into this bug a couple of times on today's branch build. And every
time it happens, the statusbar says "Read
[path-containing-my-profile's-NewCache-directory]"
back to me for more investigation... 
Assignee: gagan → dougt
I have tried everything listed in this bug but still can not reproduce this problem.

Does ANYONE have a debug build of either the branch or trunk that they can
reproduce this on?  If you do, can I try reproducing at your desk?  

Is it possible another recent fix has also fixed this bug?
Whiteboard: [critical for 0.9.2.1][PDT+] Fixed on trunk, but not sure why. Need investigation. → [critical for 0.9.2.1][PDT+] can't dup on branch?
No because blake can still reproduce pages not loading on a build from two days
ago.  There is a real bug here, but we don't know what the conditions are which
trigger the problem.
the subversion.tigris.org page no longer shows this problem in 0.9.2, even
though it used to.

The xbl demo in the 07/10/01 13:14 attachment is repeatable in 0.9.2, but not in
a current trunk build. However, it has a charset tag....

dougt - what if you back out vidur's fix, and then look at that page (and I have
a list of others from the beta feedback which are no longer reproducible, but
were with 0.9.2)? That way we could see if that patch actually fixed it, or just
hid the problem in most cases. If the latter, that would allow us to at least
reproduce the problem and maybe fix it. Yes, I'm clutching at straws. I spent
most of Monday morning trying to reproduce this, and failing.

Can people who see this bug please give a URL, even if its not repeatable?
I have pinged the two nscp'ers that reported problems here.  I hope to get some
driving time on their machines.  
ignore that last patch.  (it should have been bar, not foo... :-) )
'fixed' on trunk:

Checking in nsCacheService.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsCacheService.cpp,v  <--  nsCacheService.cpp
new revision: 1.53; previous revision: 1.52
done

Checking in nsMemoryCacheDevice.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsMemoryCacheDevice.cpp,v  <--  nsMemoryCacheDevice.cpp
new revision: 1.34; previous revision: 1.33
done

Let's see if anyone can reproduce this problem on tomorrow's build.
worksforme on win2k sp2 with build 2001071704
Fix checked in on branch:

Checking in nsCacheService.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsCacheService.cpp,v  <--  nsCacheService.cpp
new revision: 1.49.12.2; previous revision: 1.49.12.1
done
Checking in nsMemoryCacheDevice.cpp;
/cvsroot/mozilla/netwerk/cache/src/nsMemoryCacheDevice.cpp,v  <--  nsMemoryCacheDevice.cpp
new revision: 1.31.28.1; previous revision: 1.31
done
Updated whiteboard.  Thanks for the patch!!!

I see checkin comments for both trunk and branch.  Shouldn't this be closed
fixed so QA knows to attack it?
Whiteboard: [critical for 0.9.2.1][PDT+] can't dup on branch? → [PDT+] fixed on branch & trunk
Another testcase would be
http://www.world-direct.com/mozilla-table-bug-testcase/index2.html
just reload the page a few times and you see the "effect".

as reported in bug 90482.
There are a few others around reporting the same behaviour like 82946.

A more simplified testcase is available at
http://www.world-direct.com/mozilla-table-bug-testcase/

Keywords: testcase
WFM on Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.2+) Gecko/20010718 in W98.
The page renders completely without any text or tables missing; even if I reload,
the page still renders completely.
The test case provided in the comment below works for me too; see my comment on bug
90482.
QA - we are going to need some really extensive testing on this bug.  
Status: NEW → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Further testcases in
http://bugzilla.mozilla.org/show_bug.cgi?id=82946
Although it seems we really need to decide which bugs relate to each other
and make a concrete split.

Reopening since this one needs extensive testing.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
using build 2001071903 and the problem is still there. 
In both testcases I used
http://www.world-direct.com/miss-tirol/index.asp?ueber
http://www.world-direct.com/mozilla-table-bug-testcase/
Markus this bug is fixed, I'm closing it again. The testcase in the above url
works, your bug is a different one
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Okay, kiddos, we're back in the game.  I just saw it on BugZilla in today's
commercial branch build.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Chad, define "it"?
*** Bug 91501 has been marked as a duplicate of this bug. ***
http://www.defleppard.com/defleppard/defleppard.html
This is the url that NEVER loads for me
May make a nice testcase
Sorry...  What I meant was I went to bugzilla.mozilla.org and nothing on the
page showed up.  The status bar says "Read
F:\private\aegis\blahblah.slt\NewCache\9CA9FF21d01".  View | Source shows the
full HTML of the page.
not loading a Def Leppard page, I say that is a feature! :-)  Seriously, that is
a separate bug.  Please file it and assign it to pavlov@netscape.com.  The
problem with that page is the two images.  Or at least that is what I am
seeing.
Chad, could you look at the contents of
F:\private\aegis\blahblah.slt\NewCache\9CA9FF21d01?  

Yes, we *definitely* still have a problem here.  I just had partial page loads
three times in a row on bugzilla.mozilla.org.  It happened often enough that I
was getting frustrated.  Then I tried to go to lxr.mozilla.org and saw the
problem there too.  Doing a debug build now to see if I can reproduce a partial
page load in that.

Francisco:  Thanks for the testcase.  I can't reproduce partial page loads
there.  It seems every person who can see this bug has a set of pages where it's
most common (now the trick is finding a set that the developers can see!).

Doug:  9CA9FF... has the full contents of http://bugzilla.mozilla.org/ (HTML)
Chad, how exactly did you produce this result?
Just browsing the pages...  There isn't a set of reproducible steps.  I started
off on subversion.tigris.org, then went to Slashdot, then flipcode, then
bugzilla.mozilla.org, and then lxr.  Those are the pages where I see it most
often, and if I just casually browse on them enough, I'm destined to see it
again!  An interesting note is that I didn't hit reload on these pages.  The bug
shows up on an initial load (typing in the URL bar or clicking on a link).

I just cleared my cache, so I'll see if I ever see the bug on uncached pages.
Sometime between 0.9.1 and 0.9.2 I saw those partial loads everyday somewhere
(google.org search results was most visible/reproducible). Now (current trunk) I
have not seen ANY of those partial loads for 2+ weeks (and I browse a _lot_ on
google, www.ceskenoviny.cz, slashdot, bugzilla and _many_ other sites).

I cannot reproduce ANY of testcases/url mentioned in this bug.

could the reason be that I ALWAYS have
[ ] Enable Disk Cache (not checked)
[x] Enable Mem Cache (checked)
 ???

The mem cache is set to 16MB.

I use squid as a http proxy (so I don't use mozilla's disk cache). My system is
up-to-date Red Hat 7.1 Linux and I use current mozilla CVS snapshots.
Answering myself:
I cannot reproduce it even with Enable Disk Cache [x] and Disk cache set to 10MB.

Sometimes the pages here at bugzilla.mozilla.org load _very_ slowly (1-2 minutes
now) but they load fully every time (no partial loads).
Added release note to 90577.
For what it's worth, I too have been seeing this problem all day on the branch
as I was running through top100 sites.
i think that i might have a clue about what's going on...  i believe that all of
these partial page loads are due to the HTTPChannel being Cancel()ed while there
are outstanding OnDataAvailable() and the OnStopRequest() events in the PLEventQ.

It appears that nsHttpTransaction::Cancel(...) is "not quite right"...  The
mStatus of the nsHttpTransaction is set when the socket transport thread finishes.
However, if the HTTPChannel is later cancelled on the UI thread before the
OnStopRequest() event has been processed, the *error* cancel code will be lost
because it is not stored in the nsHttpTransaction...

I've noted where the problem is in the Cancel implementation in nsHttpTransaction...
----------------------------------------------------------
// called from any thread
NS_IMETHODIMP
nsHttpTransaction::Cancel(nsresult status)
{
    LOG(("nsHttpTransaction::Cancel [this=%x status=%x]\n", this, status));

    // ignore cancelation if the transaction already has an error status.
    if (NS_FAILED(mStatus)) {
        LOG(("ignoring cancel since transaction has already failed "
             "[this=%x mStatus=%x]\n", this, mStatus));
        return NS_OK;
    }

    // if the transaction is already "done" then there is nothing more to do.
    // ie., our consumer _will_ eventually receive their OnStopRequest.
    PRInt32 priorVal = PR_AtomicSet(&mTransactionDone, 1);
    if (priorVal == 1) {
        LOG(("ignoring cancel since transaction is already done [this=%x]\n",
             this));
        // *****
        // ---> At this point any NS_FAILED status code that was passed in is lost
        // *****
        return NS_OK;
    }

    // the status must be set immediately as the cancelation may only take
    // action asynchronously.
    mStatus = status;

    return nsHttpHandler::get()->CancelTransaction(this, status);
}


So, it looks like if the HTTPChannel is cancelled, the error code "may not"
propagate up to nsHTTPChannel::OnStopRequest(...).  This means that if a cache
entry is being built, it may be marked valid incorrectly!!
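
The ordering problem described above can be shown with a small standalone model. This is only a sketch, not the Mozilla code: `TransactionModel`, `Status`, `kAborted` and the helper names are all hypothetical, `std::atomic` stands in for `PR_AtomicSet`, and the call to `CancelTransaction` is left out. It runs on a single thread, so it shows only the lost status code, not any race.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for nsresult values; not the real Mozilla codes.
using Status = std::int32_t;
constexpr Status kOk = 0;
constexpr Status kAborted = -1;

inline bool Failed(Status s) { return s < 0; }

// Toy model holding only the two members discussed above.
struct TransactionModel {
    std::atomic<int> done{0};  // plays the role of mTransactionDone
    Status status = kOk;       // plays the role of mStatus

    // The socket side marks the transaction done once all data is read.
    void FinishReading() { done.store(1); }

    // Same ordering as the quoted Cancel(): a late cancel returns early.
    void CancelDroppingLateStatus(Status s) {
        if (Failed(status)) return;
        if (done.exchange(1) == 1) return;  // the failure code is dropped here
        status = s;
    }

    // Alternative ordering: record the status before the "done" check.
    void CancelRecordingStatus(Status s) {
        if (Failed(status)) return;
        status = s;
        done.exchange(1);
    }
};

// Returns true if a consumer would see success after finish-then-cancel,
// i.e. the case where a cache entry could be marked valid by mistake.
inline bool LooksValidAfterLateCancel(bool recordStatus) {
    TransactionModel t;
    t.FinishReading();
    assert(t.done.load() == 1);
    if (recordStatus) {
        t.CancelRecordingStatus(kAborted);
    } else {
        t.CancelDroppingLateStatus(kAborted);
    }
    return !Failed(t.status);
}
```

In this model, `LooksValidAfterLateCancel(false)` reports success even though a cancel was requested, while `LooksValidAfterLateCancel(true)` reports the failure.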

I believe that this bug has become harder to reproduce since vidur landed the
charset sniffing code because that code dramatically reduced the number of
channels that get cancelled!!

I tried to test this theory by pulling the 0.9.1 branch, which exhibited this
problem a lot, and fixed nsHttpTransaction::Cancel(...) to *always* set the
mStatus...  All of the partial page loads seemed to "go away"...

I'm attaching a patch which will propagate the error code...
I think that i should mention that the code in nsHttpTransaction does NOT LOOK
THREADSAFE AT ALL!!

member variables are accessed and set on multiple threads with NO locking at all.

in particular mStatus is not protected, so the actual status code could get
munged...  and it appears that the way mTransactionDone is used provides small
windows of vulnerability - ie. race conditions - between the UI thread and the
socket transport thread.

I think that the code works "as well as it does" because there are *only two*
threads which simultaneously call the methods...

-- rick
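
One way to close the window on an unprotected status member is to make "first failure wins" a single atomic step. The sketch below is an illustration only and is not what the attached patch does: `StatusCell` and its methods are hypothetical names, and it uses `std::atomic` from modern C++ rather than the NSPR primitives the tree used at the time. The walk-through is sequential; the same compare-and-exchange call is what would decide the winner if the two calls came from different threads.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

using Status = std::int32_t;
constexpr Status kOk = 0;

// Hypothetical holder for a status that several threads may try to set.
class StatusCell {
public:
    // Stores s only if the cell still holds kOk; returns true if it stored.
    bool SetFirstFailure(Status s) {
        Status expected = kOk;
        return status_.compare_exchange_strong(expected, s);
    }

    Status Get() const { return status_.load(); }

private:
    std::atomic<Status> status_{kOk};
};

// Sequential walk-through: the first failure sticks, the second is refused.
inline Status FirstFailureWins() {
    StatusCell cell;
    bool first = cell.SetFirstFailure(-1);
    bool second = cell.SetFirstFailure(-2);
    assert(first && !second);
    return cell.Get();
}
```

Whether "first failure wins" is the right policy for a transaction is a separate question; the sketch only shows that the check and the store can be made indivisible.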
Blake, Chad, Please apply Ricks change and try to reproduce this bug.
Running through the list of top100 sites (as defined by
http://netmation.com/list100.htm) with this patch applied, I encountered 0
failures of this kind.

Yesterday, without this patch, about 16 of the 100 sites failed to load, or
finish loading properly.
rpotts: yes, that sounds right (wrt vidur's fix).

Wrt thread safety though, there are lots of assumptions which aren't really
documented, and I don't think it's as bad as you think.

I applied the patch, and didn't see anything, although since this was hard to
reproduce that doesn't mean much.
Could we be cancelled twice with different error codes (so that this does the
wrong thing)? If not, then the assignment to mStatus should be moved above the
PR_AtomicSet.

I'm very worried about moving this onto the branch. The last time darin fixed a
bug here, he was chasing crashers for a full week. And he only gets back on
Monday...

Can someone run jrgm's tests over the low bandwidth simulator, several times? On
all three platforms?

(The reason that we can be cancelled on the sockettransport thread is if we send
a request, and the socket closes before we get a response. This happens when the
server times out a keep-alive connection before it gets our request. We should
not be being cancelled from both threads at the same time, I think, because the
other thread can't cancel us at that stage. I need to check that with darin
though...)
I misunderstood what was happening, and this fix does make sense. Sorry.

Move the mStatus = status line above the if, and r=bbaetz
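A rough sketch of the ordering being discussed, written with standard C++ atomics rather than NSPR. This is an illustration under assumptions, not the actual patch: `CancellableTransaction`, `Cancel`, `Done`, `Status`, and `kOk` are hypothetical names, `std::atomic::exchange` plays the role that PR_AtomicSet plays in the real code (both return the prior value), and the "first error code wins" rule is one possible answer to the cancelled-twice question, not something the thread settled on.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// Assumption: 0 stands in for "no error".
constexpr std::uint32_t kOk = 0;

// Hypothetical stand-in for the transaction (NOT nsHttpTransaction).
class CancellableTransaction {
public:
    // Returns true if this call is the one that marked the transaction done.
    bool Cancel(std::uint32_t status) {
        // Store the status first. The compare-exchange only succeeds while
        // status_ still holds kOk, so a second cancel cannot overwrite the
        // error code recorded by the first one.
        std::uint32_t expected = kOk;
        status_.compare_exchange_strong(expected, status);

        // Only then publish the "done" flag. A thread that later observes
        // Done() == true will also observe the status stored above.
        int prior = done_.exchange(1);
        return prior == 0;
    }

    bool Done() const { return done_.load() == 1; }
    std::uint32_t Status() const { return status_.load(); }

private:
    std::atomic<std::uint32_t> status_{kOk};
    std::atomic<int> done_{0};
};
```

Writing the status before setting the flag matters because the other thread uses the flag as its signal that the status is ready to read.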
Still PDT+, please check in if you get fully comfortable and fully reviewed.
Fixed on branch and trunk. 

Rick, we owe you!!
Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
woohoo!
Bad news.

I've been running with this for an hour or so, and while I haven't seen the
partial page load bug, I have seen the one where pages just don't come up at
all.  View Source shows <html><body></body></html>.

I have http://dict.org/ in my personal toolbar, and after clicking on it enough,
I saw a blank page rather than what should be there.  This is not the same thing
as the other day's "Click reload really fast and watch as the page refuses to
load until you restart the browser."

Should this be filed under a new bug?
hey chad,

i bet this is another bug... with similar symptoms :-) why don't you open up a
new bug for it and we'll take another whack :-)
Unfortunately I'm still seeing the problem in a new branch nightly, as is kerz.
I'm still seeing both partial and complete load failures.  As usual, the
statusbar says "Read [my NewCache dir]"  I don't think I'm seeing it as often,
however.  I guess we're stuck just relnoting this for rtm.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: [PDT+] fixed on branch & trunk → [PDT+]
*** Bug 91912 has been marked as a duplicate of this bug. ***
  welcome back darin :-) 
Assignee: dougt → darin
Status: REOPENED → NEW
I would like to take the PDT+ off this bug because we've fixed what was
necessary for the branch.  Does anyone know that doing so would be blatantly
stupid?  (A necessary part of claiming it's stupid would be having a patch that
addresses the remaining problems :-)
bug #91710 is a splinter bug from this one - with a proposed patch...  we might
want to evaluate that one...

do we have any URLs that still show the partial page reload bug?  if so, please
let me know so i can start tracking those problems down - again ;-)
I still see it on many mozilla.org pages - lxr.mozilla.org,
bugzilla.mozilla.org, etc (not that it's specific to those, those are just all I
visit mostly).

Can you please e-mail PDT and have them consider 91710?  I think this bug is
going to kill us if we ship with it.
Just FYI, this problem is very prominent in BeOS builds.  With a trunk BeOS
build from this morning, I am completely unable to render the bugzilla query
page.  With a 0.9.2 build from this afternoon, the query page renders and is
usable but the Bug List page is truncated.  If I turn off the mem & disk cache
in both builds and hit reload repeatedly, sometimes the page renders correctly,
other times it's truncated and sometimes it just shows the page source instead
of rendering it.  
Blake, I use bugzilla a zillion times every day from several different win32
boxes and I've never seen this problem.  If you create a new profile and try
these things again does it still happen?  Do we have a ton of people complaining
about these symptoms?

I still think the remainder of this bug isn't PDT+ because most people don't see
it and (with some exceptions) those who do have to kinda work at it.
Whiteboard: [PDT+] → [PDT+] fixed on branch, remainder on trunk
Just for the record, I see this on Tinderbox like once every 2 weeks or so..
Reload always clears this for me (NT & Win2k).
*** Bug 92139 has been marked as a duplicate of this bug. ***
Bug 92139 has a relatively reproducible (for me) testcase.  I can even reproduce
it in my debug build, if anyone wants to have a shot.
I have noticed the incidence of this problem has increased in 2001072308 trunk
for Mac from previous builds.
darin's patch to 91710 does _not_ fix this bug.  I can still reproduce it with
ease on the URL in bug 92139.
More random information (just want to get this one fixed)...  I *can* reproduce
this bug when I disable both mem and disk caches in the Debug | Networking
preferences.  I also set the disk cache size to 0k.
using build 2001072503 this still happens on our intranet site.
Is 92402 a duplicate of this?
I submitted a bug (92472) which seems to be a duplicate of this.

Another page that shows this problem for me is http://www.dilbert.com/

After it fails to load completely, the browser cannot load any additional web
pages, no matter how simple, until I restart the browser.

This is in 2001072608.

-Zorin
This bug may be related to bug 92611, since the symptoms seem the same from
what I've read.  Call it another datapoint to check.

*** Bug 92611 has been marked as a duplicate of this bug. ***
I had no problem with this bug recently but today it is here again. No problem
with yesterday's build 20010729xx but today with build 2001073003 a lot of pages
don't load completely.
It may be related to bug 77072 and bug 92354.
I've observed this bug on a number of occasions, but it seems to be much more
frequent on OSF1/alpha than Linux/intel.  Originally saw it in 0.9.2 on the
alpha.  I'm now seeing it in build 2001073008 on alpha, but not on a
contemporary Linux build.  Strange.  Although I guess if there is a
threads-related issue here, that might explain why the severity changes between
platforms (and maybe even quite subtle details of system config).
Just an update:

I was able to tickle this bug under Mozilla 0.9.3 Linux

I was unable, however, to tickle it under Mozilla 0.9.3 win32. It may be a
platform-specific issue after all. Mozilla for Win32 seems to run like a dream.

Hopefully this will be fixed soon so I can return to using Mozilla on my
favorite OS. }:)

-John
Seeing this repeatedly on win2k.
For the screenshots I used build 2001080508.
Removing nsenterprise nomination; moving to nsBranch.
Keywords: nsenterprise → nsBranch
Blocks: 99142
Darin - Is this resolved for 0.9.4? Should we close this one out?
i believe this has been fixed... as there haven't been any similar reports for
ages... marking FIXED.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
This is not fixed. I'm still having this problem with both the latest nightly
and with 0.9.4.

Test case page is still www.dilbert.com. Page starts to load, then stops
loading. There is then a long wait, and either it never finishes loading, or it
eventually loads after about 2-3 minutes.

If you try to leave dilbert.com while it's stuck, other pages won't load at all,
even local pages.

If I'm doing something wrong, someone please let me know! I just wanted to
provide input since the original problem doesn't seem to be fixed.

-John
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Hmm. After experimenting further, I find that it seems to be something that's
blocking during the page load. Eventually the page finishes loading, even if it
takes five minutes.

Whatever it is that's blocking also keeps other pages, in other browser windows,
from opening. However, when that blocking function eventually times out, the
browser starts working again.

This may be a threading issue. Hopefully I'm not the only one experiencing this. }:)

-John
john: what platform are you noticing this on?
I resolve this bug again as WFM because the original problem IS fixed. Your
problem with www.dilbert.com should be entered as a new bug. Please file it, but
first search for an existing report that it would duplicate. I think I've seen
this bug already reported.

Status: REOPENED → RESOLVED
Closed: 23 years ago
Resolution: --- → WORKSFORME
FWIW, I still see this problem frequently under beos when visiting either the
tinderbox or any bugzilla page.  Should I just open up a new bug on that?
I think it should be reported as a new bug because it's probably a
platform-specific problem: the problems that were seen on other platforms have
already been fixed by the fix in this bug.
cls: please open a new bug.  thx!
Filed bug 100508 on the BeOS problem.
I still see this bug regularly on Milestone Mozilla 0.9.6 on Windows NT with
modern skin. (build 2001112009)

-Page starts loading but does not complete. Repeatedly pressing reload will
eventually load the page correctly.

Notorious pages where this happens for me are the reader comment pages on Slashdot.
The link above, http://www.linuxnews.pl, is one that just didn't work either.
Sander: Please try a fresh profile. You can manage/create profiles with
"mozilla.exe -profilemanager".
Keywords: qawanted