Closed Bug 278404 Opened 20 years ago Closed 20 years ago

&prod causes ∏ to be displayed

Categories

(SeaMonkey :: General, defect)

1.7 Branch
x86
All
defect
Not set
normal

Tracking

(Not tracked)

VERIFIED DUPLICATE of bug 155047

People

(Reporter: dan, Unassigned)

Details

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.7.6) Gecko/20050112
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-GB; rv:1.7.6) Gecko/20050112

When the term &prod is used in a dynamic URL it displays the ∏ symbol. 
Surely it shouldn't do that unless there really is a semicolon in the URL?

Reproducible: Always
Your URL should use &amp;prod.

It's true that Mozilla doesn't require the ; anymore, but that was done for
IE-compatibility unfortunately.
Summary: &prod causes ∏ to be displayed → &prod causes ∏ to be displayed
I don't understand what you mean by IE-compatibility?? IE (v6) displays that
page correctly!?
Version: unspecified → 1.7 Branch
Confirmed with Mozilla Suite 1.8a6 release build/Win-2K.
> Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.8a6) Gecko/20050111

This is *NOT* a problem with the real URL in the anchor tag.
The status bar displays this link as expected ("&prod=" is properly displayed as
"&prod=").
The problem in the test case (the URL: field of this bug) is with the text string "&prod" in
the [URL-format-plain-text] part of the following HTML source format.
> <a href="...">[URL-format-plain-text]</a>

I know that "&" in HTML text is recommended to be written as "&amp;".
But I think the text string "&prod=" should not be interpreted as "&prod;=".

Jo Hermans, do you know the number of the bug that introduced the "IE-compatibility"
you mention?
Status: UNCONFIRMED → NEW
Ever confirmed: true
The bug also appears in standards compliance mode (where, I think, dirty
workarounds that implement IE bugs, like document.all, should disappear).
(By the way, I see it on Linux too, so OS->All?)
confirmed, I see this with Mozilla on Linux and on Solaris so changing OS to "All"
OS: Windows 2000 → All
invalid, see bug 155047 comment 2

also see http://www.w3.org/TR/html4/charset.html#entities
"Note. In SGML, it is possible to eliminate the final ";" after a character
reference in some cases (e.g., at a line break or immediately before a tag). In
other circumstances it may not be eliminated (e.g., in the middle of a word). We
strongly suggest using the ";" in all cases to avoid problems with user agents
that require this character to be present."

the = sign is one of those cases.

*** This bug has been marked as a duplicate of 155047 ***
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → DUPLICATE
(In reply to comment #6)

> the = sign is one of those cases.

I agree that "=" can be one of the cases where the ";" of a character reference can be
eliminated, if it occurs in data inside a tag itself, such as attribute=value.
But this bug's case is "[preceding_text]&[entity_name]=[trailing_text]" in the plain
text data portion of the HTML source.
I think the following description in the "Note" in "5.3 Character references" of the
HTML 4.01 specification should apply, although "=" can usually act as a word
separator in natural language.
> In other circumstances it may not be eliminated
> (e.g., in the middle of a word).
I believe this description does not mean "; can be eliminated at the end of any
word".
And I believe the following description means "; is usually required, except in some
special cases such as just before CR+LF or a tag start character."
> We strongly suggest using the ";" in all cases to avoid problems with user
> agents that require this character to be present."

I feel that the current logic applies "; can be eliminated" too broadly.
Christian Biesinger, what do you think? 




ok, HTML 4.01 normatively references SGML. since SGML is an ISO standard, it's
unfortunately not free...
(In reply to comment #8)
> ok, HTML 4.01 normatively references SGML. since SGML is an ISO standard, it's
> unfortunately not free...

Is it caused by the use of SGML?
I cannot believe it.
Bug 155047 comment #0 says :
> It happens for: &amp;nbsp, &amp;pound, &amp;yen, &amp;deg, &amp;cent, &amp;#123
> but not for: &amp;plus, &amp;period, &amp;equals, &amp;dollar
If this were due to SGML's use or definition, I believe the same logic would be
(should be) applied to both "&amp;yen" and "&amp;dollar".
But it is not.

Christian Biesinger, why can ";" after "&amp;yen" be eliminated even though ";"
after "&amp;dollar" should not be eliminated?
Verified.  SGML clearly spells out what "can be eliminated" means -- in brief,
any character that's not a valid entity name character indicates the stop of the
entity.

Note that any page that doesn't escape the '&' is depending on the wholly buggy
behavior of HTML browsers which show the entity name when they don't know the
entity... a real SGML processor would simply treat the document as being in
error at that point instead.
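
The rule described above can be sketched in a few lines of Python. This is only an illustration, not Gecko's parser: the name `expand_entities` is hypothetical, name characters are simplified to ASCII letters and digits, and the entity table is Python's `html.entities.name2codepoint` (the HTML 4 set). Unknown names are left as typed, mimicking the lenient browser behavior mentioned above.

```python
import re
from html.entities import name2codepoint

# Assumption: a name is an ASCII letter followed by letters/digits.
# The trailing ";" is optional; any other character ends the name.
ENTITY_RE = re.compile(r"&([A-Za-z][A-Za-z0-9]*);?")


def expand_entities(text):
    """Expand named entity references, with or without a closing ';'."""

    def replace(match):
        name = match.group(1)
        if name in name2codepoint:
            # Known entity: substitute the character it stands for.
            return chr(name2codepoint[name])
        # Unknown entity: keep the source text unchanged.
        return match.group(0)

    return ENTITY_RE.sub(replace, text)


if __name__ == "__main__":
    print(expand_entities("show.cgi?id=1&prod=2"))      # "=" ends the name "prod"
    print(expand_entities("show.cgi?id=1&amp;prod=2"))  # escaped ampersand
    print(expand_entities("show.cgi?id=1&dollar=2"))    # "dollar" is not in the table
```

Under these assumptions, "&prod=2" comes out as "∏=2", while "&amp;prod=2" comes out as the intended "&prod=2".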
Status: RESOLVED → VERIFIED
> why ";" after "&amp;yen" can be eliminated even though ";" after "&amp;dollar"
> should not be eliminated?

Because HTML defines an entity named "yen", but not an entity named "dollar". 
"&dollar;" (with the ';') will also just show as plaintext.  See the part about
handling unknown entities in comment 10.

And please test things before making claims about when ';' can be eliminated
(that is, put the ';' in, and see what happens).
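
A quick way to check which names are involved, assuming Python's `html.entities.name2codepoint` table (which covers the HTML 4 entity set) is an acceptable stand-in for the browser's own table. The helper name `is_html4_entity` is made up for this sketch.

```python
from html.entities import name2codepoint


def is_html4_entity(name):
    """Return True if `name` is a named entity in the HTML 4 table."""
    return name in name2codepoint


if __name__ == "__main__":
    # Names taken from the lists quoted from bug 155047, plus "prod".
    for name in ("nbsp", "pound", "yen", "prod", "plus", "equals", "dollar"):
        print(name, is_html4_entity(name))
```

Names that are not in the table have nothing to expand to, with or without the ';', which matches the explanation above.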
(In reply to comment #11)
Sorry for my bad question based on an undefined entity name.
My concern is the case of "&" with a valid entity name followed by "=" without ";".
I understand that using "&amp;" is always recommended.
But I also think that accepting the omission of ";" should be based on SGML, since
Mozilla uses SGML.

http://www.isgmlug.org/sgmlhelp/g-sg17.htm says :
>Once an entity has been declared, it may be referenced anywhere within a document. 
>This is done by supplying its name prefixed with the ampersand character and
>followed by the semicolon.
>The semicolon may be omitted if the entity reference is followed by a space
>or record end.

This document is not the exact SGML standard definition, but I think this description
reflects the basic concept of entity references in an SGML document.
(Sorry, but I still don't know where the official SGML standard definition is.)
I think the most natural reading of "record end" in HTML is a line
break (end of line) and, additionally, the tag start character ("<" in HTML).

I think the following in the HTML specification corresponds to "record end" in SGML,
> it is possible to eliminate the final ";" after a character reference in some
> cases (e.g., at a line break or immediately before a tag)
and the following corresponds to "space" in SGML.
> In other circumstances it may not be eliminated
> (e.g., in the middle of a word).
(In other words, "If not followed by a space, ';' is required".)

"=" is apparently not a "space".
Boris Zbarsky, is "=" in text between <a> and </a> in HTML source a "record end"
in SGML?
"record end", in this context, is what I said -- anything that's not a valid
entity name character.
Oh, I see.
I can now explain to any users who complain why "&" should be written as "&amp;" :-)
Boris, thanks for teaching me about the SGML spec.
OK, wow! That certainly was a learning experience for me :-)

I now need to go and raise a bug on the forum software I was using for not
translating "&" into "&amp;" and go through all this again!

Keep up the great work people and sorry for wasting time with a duplicate bug :-\
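
For the forum-software side, the fix amounts to escaping the URL before writing it into the page. A minimal sketch in Python, assuming the software builds the link markup by hand; `render_link` and the example URL are hypothetical, and `html.escape` is the standard-library call that turns "&" into "&amp;".

```python
from html import escape


def render_link(url):
    """Build an anchor whose href and visible text are both HTML-escaped."""
    safe = escape(url, quote=True)  # "&" -> "&amp;", also escapes <, >, and quotes
    return f'<a href="{safe}">{safe}</a>'


if __name__ == "__main__":
    print(render_link("http://example.com/show.cgi?id=1&prod=2"))
```

With the ampersand escaped in the source, the browser decodes "&amp;" back to "&" and the link text reads "&prod=2" as intended.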