Closed Bug 124904 Opened 23 years ago Closed 22 years ago

Gecko parser still recognises meta tag's content even when it's surrounded by comments

Categories

(Core :: DOM: HTML Parser, defect, P1)

x86
Windows 2000
defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: hansoodev, Assigned: harishd)

References

Details

(Keywords: topembed+, Whiteboard: Fix checked into 0.9.4 and 0.9.9 branch)

Attachments

(4 files, 6 obsolete files)

The page already contains meta tag about charset info for the whole page.
Later on in that page, there is another meta tag section defining another 
charset info, but this time, it's surrounded by the comment marks.
( e.g. <!--    and  -->  )

In this case, Gecko overwrites the charset info with later one, which is not 
supposed to be interpreted by the browser engine.


EX )
< other meta_tag's contents;    charset=A >
   .
   .
   .
Begin_comment  < other meta_tag's contents;    charset=B >   End_comment


In this case, the final charset is set to B, resulting some of characters on 
the page not readable.


I am also including the test case files, one of which include the extra comment 
with extra meta tag content, and one withou it.

Thanks.
In This case, since ISO-88590-1 is overwritten by UTF-8,
all accented characters are replaced with ? marks.
In This case, ISO-88590-1 *IS* the encoding detected
and 
all accented characters are displayed correctly.
wfm 2002021108 win98
Mozilla sets the char encoding to iso-8859-1 in both testcases.
Roy,

Would you put appropriate "priority", "severity", and "target mailstone"
for me ?

I will also update how much this needs to be fixed on my end.
However, as the testing cases show since there is a work around,
it should not be high priority. (?)

Thanks.

-------------------------------------------------------

Gilles Durys

I have tested with NS 6.2.
---> Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011128 
Netscape6/6.2.1 

The result are the same as what I have described.

It looks like it is fixed in trunk, then.
However, I wonder the fix has been applied to most of branches.

Thank you for your input.
Assignee: harishd → yokoyama
Parser owner:  Correct me if I am wrong; but I recall 
that we have fixed this problem.  Do we have it in 0_9_4_BRANCH?
Assignee: yokoyama → harishd
Reporter: Can you test this with a mozilla build (0.9.8 or later) ?
If this works, then this bug is "worksforme" because this is not a Netscape bug 
database..
This is not fixed yet! I'm sure there is another open bug on this.  I don't
think this bug exists in 0.9.4 since the code on the branch is a bit different
than the tip.
Harish,

I think the bug exists in both 9.4 and 9.4.2 branches.

Is there any update on this bug ?

Thank you very much.

HK
Accepting and targeting to 1.0
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: -- → P1
Target Milestone: --- → mozilla1.0
FYI: With the patch mozilla did not hang / crash. However, need to make sure
that I did not regress anything ( related to charset ). Yokoyama, do you know
of any test suite that I can use to test the patch before landing? Thanks.
+  while (tagStart != end) {
+    if (FindCharInReadable('<', ++tagStart, end)) {

++tagStart should probably go to the end of the while-loop since on our first
iteration we could, potentially, miss the first tag. 
Attached patch patch v1.3 (obsolete) — Splinter Review
harishd: to start, we can use attached test files by Hansoo
         Generic i18n test pages can be found at Test Data/Double Byte under
         http://babel.mcom.com/index_new.html
Adding KWs
Keywords: edt0.9.4, topembed
Comment on attachment 71544 [details] [diff] [review]
patch v1.3.1 [ diff with "uw" options - for reviewers ]

1. Make GetNextChar() into something like

 return (aStart != aEnd) ? *(++aStart) : '\0';

2. while (!foundMDC || !foundMatch)
can become
while (!foundMDC)

as foundMDC will turn true only if foundMatch is true.
Attachment #71544 - Flags: review+
Yes, return (++aStart != aEnd) ? *aStart : nsnull; is a bug will fix that.
Comment on attachment 71544 [details] [diff] [review]
patch v1.3.1 [ diff with "uw" options - for reviewers ]

>Index: nsParser.cpp

>+            foundMatch = (foundMatch) ? PR_FALSE : PR_TRUE; // toggle until we've matching "--"

how about "foundMatch = !foundMatch;" instead?

otherwise, sr=darin
Attachment #71544 - Flags: superreview+
darin: dp suggested the same thing ( on AIM ). Will do that. 
darin: dp suggested the same thing ( on AIM ). Will do that. 
requesting edt0.9.4+
Attached patch Patch 1.4 (obsolete) — Splinter Review
I've made the changes suggested by Darin and dp.  Attaching a new patch. 
Harish, please take a look and give it a green light for checkin. Thanks!
Attachment #71428 - Attachment is obsolete: true
Attachment #71429 - Attachment is obsolete: true
Attachment #71431 - Attachment is obsolete: true
Attachment #71543 - Attachment is obsolete: true
Attachment #71544 - Attachment is obsolete: true
Good to go. Except for the missing ';' near foundMatch = !foundMatch. :-)

Thanx Nisheeth.
Fix has been checked in to the 0.9.4 branch on behalf of Harish (This is
Nisheeth typing on Harish's computer).

Harish will check this in on the trunk later.
Whiteboard: Fix checked into 0.9.4 branch
Sorry about the missing semicolon!  This patch fixes that problem.  This is
what got checked into the 0.9.4 branch.
Attachment #71801 - Attachment is obsolete: true
Keywords: fixed0.9.4
QA Contact: moied → teruko
Keywords: edt0.9.4edt0.9.4+
Harish...

I tried the patch posted... it works...

Thank you for your hard work.

FYI, I did not do any regression tests since not sure which ones to be done.

HK ( Hansoo Kim )
Hansoo: Just make sure that international pages get the correct charset. 

Try: http://komodo.mozilla.org/buster
> Hansoo: Just make sure that international pages get the correct charset. 

> Try: http://komodo.mozilla.org/buster

I don't know if this URL helps completely. This page loads target sites 
in one of the frames, but the top frame does not have any met-charset info. 
So, if you look at the Character Coding menu,it will always be set to your 
default encoding. 
You need to check the view frame info to get the info on charset for
loaded pages. I prefer a more straightforward page loading for this
type of test.

Interestingly, Hansoo's test page does not fail if one of
multi-byte auto detectors is turned on. All, Russian, Ukrainian
don't work but all others can avoid this problem. I guess we are
eliminating UTF-8 from the possibillity in these auto-detectors?
If harishd is going to check in the fix into the trunk, let's make
sure that auto-detect cases are also covered.
This patch in 094 is causing www.disney.com to hang. (exact url is 
http://disney.go.com/park/homepage/today/flash/index.html)

The page contains serveral meta tags which mess up the new parsing codes. 

Please confirm.
will take a look
Comment on attachment 72654 [details] [diff] [review]
patch v1.4.2 [ fix the hang ]

r=heikki
Attachment #72654 - Flags: review+
Comment on attachment 72654 [details] [diff] [review]
patch v1.4.2 [ fix the hang ]

sr=darin
Attachment #72654 - Flags: superreview+
approved by marek
Fixed on the branch.
How is this different from bug 105725?
*** Bug 105725 has been marked as a duplicate of this bug. ***
Keywords: nsbeta1
Comment on attachment 72654 [details] [diff] [review]
patch v1.4.2 [ fix the hang ]

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #72654 - Flags: approval+
Comment on attachment 71807 [details] [diff] [review]
Patch Version 1.4.1 that got checked into 0.9.4 branch

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #71807 - Flags: approval+
FIXED on the TRUNK & BRANCH.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
This was a very late "trunk" checkin. We need to push this fix to 099 ASAP.
Reopening until 099 checkin occurs.
Status: RESOLVED → REOPENED
Keywords: edt0.9.9
Resolution: FIXED → ---
Landed into 0.9.9 branch. Marking FIXED.
Status: REOPENED → RESOLVED
Closed: 22 years ago22 years ago
Resolution: --- → FIXED
Whiteboard: Fix checked into 0.9.4 branch → Fix checked into 0.9.4 and 0.9.9 branch
Verified the check-in with bonsai...

Thank you, Harish...
Keywords: fixed0.9.9
Verified as fixed in 04-05 0.99 build.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: