Closed Bug 45143 Opened 24 years ago Closed 23 years ago

META HTTP-EQUIV="Refresh" not re-directing properly

Categories

(Core :: Networking, defect, P3)

defect

Tracking

()

VERIFIED FIXED
mozilla0.9

People

(Reporter: stephe, Assigned: jag+mozilla)

References

()

Details

(Keywords: testcase, Whiteboard: would like for 0.9)

Attachments

(10 files)

On the this web site, there are links that look like

http://www.britannica.com/bcom/redirect/1,5793,,00.html?url=%2Fnews

If I click on this link using Mozilla, I am brought to an error page that says
in part: "The requested object does not exist on Britannica." If I click on the
same link in NN4, it works fine.

In fact I can take the above link and paste into the URL bar and get the same
error. Or I can place the link in a small html page and get the same error when
I click on the link.

This seems to have something to do with the redirection that is goin on.  In NN4
when I click on the link, it brings up the page successfully.  If I copy that
url and paste it into Mozilla's url bar, Mozilla can load the page as well.

This web site did not have this problem at some point in the near past so either
Mozilla has regressed or the web site was redesigned, but it does not look as if
it was.

I tried this using the latest nightly build of Mozilla, 071020, on NT4 and Linux.
It loaded for me just fine with build 2000071108 on Linux.  Can you try this
again with a newer build?
linux, build ID 2000071108
Windows 95, build ID 2000071008

I'm seeing this problem too.

The offending bit is this (you get it when you fetch 
http://www.britannica.com/news/):

<META HTTP-EQUIV="Refresh" CONTENT="0; URL=/brit/0,8532,25,00.html">

which will result in a redirect to "http://www.britannica.com/brit/0"
Status: UNCONFIRMED → NEW
Ever confirmed: true
Does anyone know why we use ; *and* , as token separator in
nsHTMLContentSink.cpp? This breaks this URL with , in it.
ouch! I entered comments into this bug using mozilla awhile ago yet they were
never posted :-/. There are sites out there that use ',' as a seperator and
there are sitest that use ';'. We're screwed on this one.
*** Bug 45250 has been marked as a duplicate of this bug. ***
Not necessarily.

When the value of http-equiv is "refresh" Mozilla could just split (not
tokenize) on the first , or ; it finds, use whatever's before it as the wait
time, and everything after the "url=", not including any trailing invalid
characters, as the destination.

That should cover most if not all cases in current practice.
But bug 45250 doesn't have anything to do with "refresh".
I think bug 45250 is a different bug...

http://lxr.mozilla.org/seamonkey/source/layout/html/document/src/nsHTMLContentSink.cpp#4130

That's where the meta http-equiv=refresh is dealt with.

Also see my comments in bug 45250 which I just reopened.
We could look for the first available tokenizer char ; or ,. Then we could do
the next searches only with that char first found. That would also work in most
cases, but would fail if there is only a url in the refresh header and it
contains a ,. Anyway, in my opinion, if we have a site that conforms to the
standards and we break it, because we want to fix other sites that violate those
standards, the path is clear: We support the standards supporting site 100%, may
the other site be part of the top 100 or not. 
Added testcase since britannica has changed and now works in Mozilla (yes, also
those older versions with which it didn't work).

I think I've found the fix for this... I'll attach it in a while.
Keywords: testcase
Actually... Could someone point at the CNN pages which use a ',' seperator?
I've tried to use the ',' seperator in a testcase and NS4.73 just choked on it.

So, the obvious fix is to just not use ',' as a seperator. However, I'll attach
a patch which needs review and some testing (can't build here), and barring any
mistakes fixes all but one case: content="12,foo.html", where "12,foo.html" is a
page, instead of indicating a 12 second wait with a redirect to foo.html. The
workaround for that case is content="0,12,foo.html" or
content="url=12,foo.html".

However, I'd rather ',' not be a valid seperator at all.

Adding patch, requesting review.
Keywords: patch, review
I'll attach a new patch which is a bit cleaner and deals with most cases.
Basically it tries to get the number of seconds (optional) and uri (optional),
allowing both ',' and ';' as separators, and throws away anything that follows
the uri and can't/shouldn't be part of it.
Target Milestone: --- → Future
*** Bug 49447 has been marked as a duplicate of this bug. ***
Blocks: 49476
This looks good to me. The problem of bug 49447 works with this. The dot problem
is also taken care of.
*** Bug 42531 has been marked as a duplicate of this bug. ***
Assigning this bug to myself and I'll try to get this patch r=, a= and checked
in.
Assignee: gagan → disttsc
Got an r=valeski, keyword review -> approval.
Keywords: reviewapproval
Accepting bug (sshhh, bugzilla), got a fix, rewriting it slightly per waterson's
request.
Status: NEW → ASSIGNED
latest patch looks good to me, r=valeski.
The last patch looks good to me. harishd: could you be 2nd reviewer on it?
valeski has already given it one look through. a=waterson, so long as r=harishd.
*** Bug 57637 has been marked as a duplicate of this bug. ***
Attaching a new patch, based on comments from harishd... Added comments, changed
the parameter naming style and looked at the return values and how I've changed
them.

I completely rewrote the parser logic to use iterators, it's a lot cleaner (and
hopefully understandable) this way (I hope).

Here are a few questions:

1) Can someone explain to me when ProcessMETATag is supposed to pass on errors
it encounters, and when not (other than "we know stuff might go wrong so we'll
try this alternative")?

2) Currently we skip all of the "header" processing code if we don't have a
docShell. Is this correct behaviour, or do we only need to do that for those who
really need the docShell?

3) I've moved |mDocument->SetHeaderData(fieldAtom, content);| up so it'll always
be set. It used to always be set for all but "refresh" and "set-cookie", where
it would only be set if we didn't |return rv;| early. What is the
correct/desired behaviour here? Move it down and make it only be set if
NS_SUCCEEDED(rv)? Make it only be set for certain header types?

Also see the // -jag- comments for the above three questions

Btw, currently there are a number of |if (NS_FAILED(rv)) return rv;| which
prevent us from reaching |NS_RELEASE(it);|. I've turned all that logic around,
making it fall through to the NS_RELEASE on error, and continue if all's ok.
Oh, and I forgot... What kind of whitespace can I be expecting in the content
attribute? Is |nsCRT::IsSpace| (' ', '\n', '\r', '\t') enough, or do I need the
'\b'?

scc, is there something nsReadableUtils.h could provide? |skipChars(i1, i2,
charSet)| or perhaps a more specialized |skipWhitespace(i1, i2)|?
r=valeski
Yay... Found another spot where this code lives.
http://lxr.mozilla.org/seamonkey/source/docshell/base/nsDocShell.cpp#3783
*** Bug 59023 has been marked as a duplicate of this bug. ***
cc'ing rpotts. Can you help us answer some of the questions in my 2000-10-26
09:25 comment?
r=harishd
hey peter,

I do not think that '\b' needs to be included as a whitespace character.  I'd 
stick with isAsciiSpace(...).

I took a look at your patch and have a few suggestions:
In ProgressMetaTag(...)
1. I would change 'it' to a nsCOMPtr<nsIHTMLContent> to avoid having to do an 
explicit NS_RELEASE
2. I would also change fieldAtom to an COMPtr.  Remember, to use the 
dont_AddRef(...) modifier as follows:
    nsCOMPtr<nsIAtom> fieldAtom( dont_AddRef(NS_NewAtom(header)) );
to keep the refcounting correct :-)

By using COMPtrs you don't need to worry about any early return paths that exist 
now, or could be introduced in the future :-)

3. I would only call mDocument->SetHeaderData(...) if the data is correct.  This 
preserves the same semantics as the old code - and makes some sense :-)

ProcessMETARefresh(...) is a really scary method.  Mostly because it is a 
complete rewrite of the HTTP-EQUIV=REFRESH parsing.  I know that whenever we 
change this code some part of meta-refresh breaks :-(

1. Get rid of the '\b' whitespace checks.

2. Should seconds be cleared if we drop into the 'back to square 1' case?  It 
seems like this case assumes that no time was specified.  But you could have 
some lame, relative URL that began with a digit :-)

3. It looks like the new code will also accept 'HTTP-EQUIV=REFRESH CONTENT="5 
URL=...".  Where the semicolon (or comma) is completely missing.  Is this 
correct? Or are we being too forgiving? (I don't know the answer :-)

4. Can we fix the spelling of 'reefer' :-)  I have no idea what 'reefer' is 
meant to be.  It looks like it's been in the code way too long!

-- rick
dude! I put "reefer" in as a subliminal suggestion that it might be easier to
work on this code after a joint ;-).
> 1. I would change 'it' to a nsCOMPtr<nsIHTMLContent> to avoid having to do an 
> explicit NS_RELEASE
> 2. I would also change fieldAtom to an COMPtr.  Remember, to use the 
> dont_AddRef(...) modifier as follows:
>     nsCOMPtr<nsIAtom> fieldAtom( dont_AddRef(NS_NewAtom(header)) );
> to keep the refcounting correct :-)

Yup, sounds good, will do that.

> 3. I would only call mDocument->SetHeaderData(...) if the data is correct.
> This preserves the same semantics as the old code - and makes some sense :-)

Yes that makes sense. However, currently it seems to happen regardlessly for
"default-style", "link", "content-base" and "window-target". Only "refresh" and
"set-cookie" possibly return (in the old code) before reaching
SetHeaderData([0]). So I'm curious, is this what we want?

Also, (did I mention that here yet? Nope, doesn't look like I did) SetHeaderData
uses the first parameter as a unique ID (a key) to store the data, so with
multiple set-cookie headers, you lose data. Apparantly that is currently not a
problem because of |cookieServ->SetCookieString| which also stores it, then adds
whatever's in the header data store and isn't duplicate (or so I was told).

> ProcessMETARefresh(...) is a really scary method.  Mostly because it is a 
> complete rewrite of the HTTP-EQUIV=REFRESH parsing.  I know that whenever we 
> change this code some part of meta-refresh breaks :-(

Well, we can change this to do exactly what NS 4 does if that's deemed The Right
Thing.

> 1. Get rid of the '\b' whitespace checks.

Will do.

> 2. Should seconds be cleared if we drop into the 'back to square 1' case? 
> It seems like this case assumes that no time was specified.  But you could
> have some lame, relative URL that began with a digit :-)

Urgh, you're right. I did that in one of my previous patches, I forgot it in the
rewrite... Will do.

> 3. It looks like the new code will also accept 'HTTP-EQUIV=REFRESH CONTENT="5
> URL=...".  Where the semicolon (or comma) is completely missing.  Is this
> correct? Or are we being too forgiving? (I don't know the answer :-)

Can we be too forgiving? This wasn't exactly by design (oops) but I'm starting
to like it :-) I could however just skip everything if I don't find a ';' or ','
there (add an |else { iter = doneIterating; }| to do that).

This kind of feedback is what I was looking for :-) Thanks Rick

So that answered my question #3 from a while back, leaving #1 (ProcessMETATag
return value) and #2 (when should no docShell cause us to skip code?)
we need to be forgiving.
*** Bug 60223 has been marked as a duplicate of this bug. ***
For clarity, I'll repeat the unanswered questions (just in case someone actually
reads their bug mail):

1) Can someone explain to me when ProcessMETATag is supposed to pass on errors
it encounters, and when not (other than "we know stuff might go wrong so we'll
try this alternative")?

2) Currently we skip all of the "header" processing code if we don't have a
docShell. Is this correct behaviour, or do we only need to do that for those who
really need the docShell?
*** Bug 66265 has been marked as a duplicate of this bug. ***
*** Bug 66283 has been marked as a duplicate of this bug. ***
Nominating for mozilla0.9 (and recommend nsbeta1 although as I don't work for
Netscape I'm not setting that) this really needs to be fixed before any more end
user moz spinoffs (NS6.x, beonex, etc) hit the market.

Like I said in my bug http://bugzilla.mozilla.org/show_bug.cgi?id=49447, I was
working at the Guardian at the time and I was creating a simple series of
sidebars for their network of sites in the hope that they'd help draw interest
in Netscape 6 once it was launched. Well, because they use the StoryServer
content management system and by default it uses commas in the URLs this caused
problems with the sidebars as the wouldn't work as they periodically refreshed
using meta tags.

So because this bug wasn't fixed for NS 6.0 it meant the sidebars never
happened, so Netscape lost out on some free publicity, I don't work there at the
moment so even when this bug is fixed I dunno if someone will revive the idea of
the news sidebar panels for the site.

But anyway, this created a bad impression of Netscape 6.0 and really needs to be
sorted for the next release.
Keywords: mozilla0.9
Well, I've got a patch, all I need is answers to those two questions.

Otherwise I'll just try to keep things the way they were as much as possible and
open a new bug on those questions. Going for mozilla 0.9.
Target Milestone: Future → mozilla0.9
Okay, I'm going to update that patch and try and get some traction. Updating
summary to better reflect the bug.
Summary: Mozilla will not follow certain links → META HTTP-EQUIV="Refresh" not re-directing properly
*** Bug 73276 has been marked as a duplicate of this bug. ***
Hi I´ve been having a problem with mozilla that might be related to this. If you
put <meta http-equiv=refresh content=´0; http://www.somepage.com/´> it actually
works when it shouldn´t. I only realized I was not putting the url= after I
tested it with ns4.7 and ie. I don´t think mozilla should accept invalid input
(at least when no other browsers do it).

There is another problem. If I write <meta http-equiv=refresh content=´0;
something?aa=otherthing´> then mozilla tries to go the the page named
´otherthing´. It catches the part after the = even when there was no url=.

Tested with mozilla 20010319.
> There is another problem. If I write <meta http-equiv=refresh content=´0;
> something?aa=otherthing´> then mozilla tries to go the the page named
> ´otherthing´. It catches the part after the = even when there was no url=.

Once I get my patch updated (and hopefully checked in soon) you should find that
this will work as expected. I do think however I'll allow "url=" to be absent,
even if no other browser supports it.
yea, we need to be able to handle absent "url=" tokens.
*** Bug 73375 has been marked as a duplicate of this bug. ***
I've decided to mark nsbeta1 despite the fact I normally avoid nominating for
Netscape's release schedule because if this isn't fix it's going to make
Netscape 6.5 look really bad for the reasons I keep outlining both here and in
Bug 49447 particularly as there's a patch here and just a couple of questions to
be resolved. Someone familiar with this code should be able to answer these
remaining questions fairly quickly.
Keywords: nsbeta1
Hardware: PC → All
I have to go away for the evening, but I'll run it through my test-suite (as
soon as I recover that, I managed to trash the partition which has it) when I
come back. Putting up the untested patch so anyone interested can run it through
their tests.
*** Bug 73791 has been marked as a duplicate of this bug. ***
My bad - Bug 73791 is not a dup
*** Bug 74054 has been marked as a duplicate of this bug. ***
So, by moving the code into nsDocShell as part of nsIRefreshURI I'm able to get
rid of the code duplication and update the code in nsDocShell (which has kinda
been neglected while its cousin was being fixed).

It fixes all of the dupes of this bugs and other uris I could find, though it'd
be very nice if someone could run this through a test suite.
r/sr=rpotts.  This looks really good !

I only have a couple of questions/comments:
- why is '\b' considered whitespace?  It doesn't look like it was in the
  old parsing code...

- in nsDocShell::RefreshURIFromHeader(...) I think that you should return
  'rv' at the bottom (rather than NS_OK)...  This allows the failure codes
  to be returned correctly...  Then, you can also remove the check for NS_FAILED
  after the call to RefreshURI(...)
I'll remove this assertion (it could also trigger if someone did
content="3000000000; url=bla.html"):
+        NS_ASSERTION(seconds >= 0,"Parse error in content attribute of (meta)
http-equiv=refresh");

- why is '\b' considered whitespace?  It doesn't look like it was in the
  old parsing code...

I think I copied it from this code:
http://lxr.mozilla.org/seamonkey/source/string/obsolete/nsString2.cpp#40

I'm not sure how necessary it is though...

- in nsDocShell::RefreshURIFromHeader(...) I think that you should return
  'rv' at the bottom (rather than NS_OK)...  This allows the failure codes
  to be returned correctly...  Then, you can also remove the check for NS_FAILED
  after the call to RefreshURI(...)

Yeah, I've been struggling with the return value, but I guess if we can't create
a new URI, something's wrong, and we should return early from
HTMLContentSink::ProcessHeaderData. Note though that that's a change from the
current code, which doesn't. Also note that nsDocShell::SetupRefreshURI
completely throws away the return value, should that too be fixed?
Whiteboard: would like for 0.9
go ahead and check in when your ready - approved by drivers for 0.9
I also fixed two minor bugs:

1) in case content="15.html", it would optimistically parse that as 15 seconds,
encounter the '.', decide the number was part of a url, and re-parse it as a
url. But it didn't reset seconds to 0, so you'd refresh after 15 seconds instead
of immediately. This bug was actually pointed out by rpotts 2000-11-07 12:16
comment. I'm pretty sure I fixed it back then, I guess I lost it somewhere in
the bitrotting. Whoops :-)

2) in case content="-1; url=foopy.html", it would try to go to url "-1;". I'm
now allowing for one of '-', '+' before the number, and setting any negative
numbers to 0 later on (yeah, I could just directly set to 0, but who knows, we
may invent time travel someday!).
looks good...  (r/sr=rpotts)

-- rick
r=brendan@mozilla.org if you need it, I read the patch, it's all good.

/be
Checked in, marking fixed. Thanks everyone!
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Keywords: approval
Resolution: --- → FIXED
*** Bug 79080 has been marked as a duplicate of this bug. ***
Status: RESOLVED → VERIFIED
verified:
WinNT4 2001052204
Mac os9 2001051504
Linux rh6 2001052213
not sure if this is a regression, or a new manifestation, but 
http://www.peoplesclinic.org/ continuously reloads.
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
jud: which build and what platform?

I can't see it on the 200105304 build on Windows ...
Don't see this on Linux 2001052908 FWIW
I was seeing this w/ a 0.9 binary on win98.
12 dups. Marking mostfreq.
Keywords: mostfreq
valeski, can you reproduce this with a recent nightly build?
qa to me.
+qawanted, verifyme

If this is regressing, a build AND a testcase URL would be most helpful.

This is an area of HTML I am not especially familiar with.
Keywords: qawanted, verifyme
QA Contact: tever → benc
from my previous comment:

"not sure if this is a regression, or a new manifestation, but 
http://www.peoplesclinic.org/ continuously reloads."

I reopened this because jag recently checked in some META REFRESH changes. It
*may* be a separate bug.

steps to repro:
1. visit the url outlined above.
2. watch it continuously reload.

again, I did this on win98 w/ a 0.9 build off of ftp.mozilla.org. I don't have a
win98 box avail anymore to verify it w/ newer builds.

Mass move to jaggernaut@netscape.com
Assignee: jag → jaggernaut
Status: REOPENED → NEW
I can see this with the 0.9 build, can't see it with any recent nightly, and 
from what I can tell those pages don't use META REFRESH but javascript to do the 
refresh. Marking this bug fixed again.
Status: NEW → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
not seeing this on todays 0.9.1, also passing META Refresh test cases

verified: Win98 2001060409
Status: RESOLVED → VERIFIED
QA Contact: benc → tever
Keywords: qawanted
Keywords: verifyme
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: