Gecko parser still recognises meta tag's content even when it's surrounded by comments

VERIFIED FIXED in mozilla1.0

Status

()

Core
HTML: Parser
P1
normal
VERIFIED FIXED
16 years ago
4 years ago

People

(Reporter: Hansoo Kim, Assigned: harishd)

Tracking

({topembed+})

Trunk
mozilla1.0
x86
Windows 2000
topembed+
Points:
---

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: Fix checked into 0.9.4 and 0.9.9 branch)

Attachments

(4 attachments, 6 obsolete attachments)

(Reporter)

Description

16 years ago
The page already contains meta tag about charset info for the whole page.
Later on in that page, there is another meta tag section defining another 
charset info, but this time, it's surrounded by the comment marks.
( e.g. <!--    and  -->  )

In this case, Gecko overwrites the charset info with later one, which is not 
supposed to be interpreted by the browser engine.


EX )
< other meta_tag's contents;    charset=A >
   .
   .
   .
Begin_comment  < other meta_tag's contents;    charset=B >   End_comment


In this case, the final charset is set to B, resulting some of characters on 
the page not readable.


I am also including the test case files, one of which include the extra comment 
with extra meta tag content, and one withou it.

Thanks.
(Reporter)

Comment 1

16 years ago
Created attachment 68918 [details]
Page with Extra meta tag's content as a comment

In This case, since ISO-88590-1 is overwritten by UTF-8,
all accented characters are replaced with ? marks.
(Reporter)

Comment 2

16 years ago
Created attachment 68919 [details]
Page *WITHOUT*  Extra meta tag's content as a comment

In This case, ISO-88590-1 *IS* the encoding detected
and 
all accented characters are displayed correctly.

Comment 3

16 years ago
wfm 2002021108 win98
Mozilla sets the char encoding to iso-8859-1 in both testcases.
(Reporter)

Comment 4

16 years ago
Roy,

Would you put appropriate "priority", "severity", and "target mailstone"
for me ?

I will also update how much this needs to be fixed on my end.
However, as the testing cases show since there is a work around,
it should not be high priority. (?)

Thanks.

-------------------------------------------------------

Gilles Durys

I have tested with NS 6.2.
---> Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:0.9.4) Gecko/20011128 
Netscape6/6.2.1 

The result are the same as what I have described.

It looks like it is fixed in trunk, then.
However, I wonder the fix has been applied to most of branches.

Thank you for your input.
Assignee: harishd → yokoyama

Comment 5

16 years ago
Parser owner:  Correct me if I am wrong; but I recall 
that we have fixed this problem.  Do we have it in 0_9_4_BRANCH?
Assignee: yokoyama → harishd
Reporter: Can you test this with a mozilla build (0.9.8 or later) ?
If this works, then this bug is "worksforme" because this is not a Netscape bug 
database..
(Assignee)

Comment 7

16 years ago
This is not fixed yet! I'm sure there is another open bug on this.  I don't
think this bug exists in 0.9.4 since the code on the branch is a bit different
than the tip.
(Reporter)

Comment 8

16 years ago
Harish,

I think the bug exists in both 9.4 and 9.4.2 branches.

Is there any update on this bug ?

Thank you very much.

HK
(Assignee)

Comment 9

16 years ago
Accepting and targeting to 1.0
Status: UNCONFIRMED → ASSIGNED
Ever confirmed: true
Priority: -- → P1
Target Milestone: --- → mozilla1.0
(Assignee)

Comment 10

16 years ago
Created attachment 71428 [details] [diff] [review]
patch v1.0  [ Needs additional testing ]

FYI: With the patch mozilla did not hang / crash. However, need to make sure
that I did not regress anything ( related to charset ). Yokoyama, do you know
of any test suite that I can use to test the patch before landing? Thanks.
(Assignee)

Comment 11

16 years ago
Created attachment 71429 [details] [diff] [review]
patch v1.1 [ same as above but without nsDTDUtils changes ]
(Assignee)

Comment 12

16 years ago
Created attachment 71431 [details] [diff] [review]
patch v1.2 [ tiny bit simplified :-/ ]
(Assignee)

Comment 13

16 years ago
+  while (tagStart != end) {
+    if (FindCharInReadable('<', ++tagStart, end)) {

++tagStart should probably go to the end of the while-loop since on our first
iteration we could, potentially, miss the first tag. 
(Assignee)

Comment 14

16 years ago
Created attachment 71543 [details] [diff] [review]
patch v1.3
(Assignee)

Comment 15

16 years ago
Created attachment 71544 [details] [diff] [review]
patch v1.3.1 [ diff with "uw" options - for reviewers ]

Comment 16

16 years ago
harishd: to start, we can use attached test files by Hansoo
         Generic i18n test pages can be found at Test Data/Double Byte under
         http://babel.mcom.com/index_new.html

Comment 17

16 years ago
Adding KWs
Keywords: edt0.9.4, topembed

Comment 18

16 years ago
See Bugscape Bug:
http://bugscape.mcom.com/show_bug.cgi?id=12252

Comment 19

16 years ago
Comment on attachment 71544 [details] [diff] [review]
patch v1.3.1 [ diff with "uw" options - for reviewers ]

1. Make GetNextChar() into something like

 return (aStart != aEnd) ? *(++aStart) : '\0';

2. while (!foundMDC || !foundMatch)
can become
while (!foundMDC)

as foundMDC will turn true only if foundMatch is true.
Attachment #71544 - Flags: review+
(Assignee)

Comment 20

16 years ago
Yes, return (++aStart != aEnd) ? *aStart : nsnull; is a bug will fix that.

Comment 21

16 years ago
Comment on attachment 71544 [details] [diff] [review]
patch v1.3.1 [ diff with "uw" options - for reviewers ]

>Index: nsParser.cpp

>+            foundMatch = (foundMatch) ? PR_FALSE : PR_TRUE; // toggle until we've matching "--"

how about "foundMatch = !foundMatch;" instead?

otherwise, sr=darin
Attachment #71544 - Flags: superreview+
(Assignee)

Comment 22

16 years ago
darin: dp suggested the same thing ( on AIM ). Will do that. 
(Assignee)

Comment 23

16 years ago
darin: dp suggested the same thing ( on AIM ). Will do that. 

Comment 24

16 years ago
requesting edt0.9.4+
(Assignee)

Comment 25

16 years ago
Created attachment 71801 [details] [diff] [review]
Patch 1.4

I've made the changes suggested by Darin and dp.  Attaching a new patch. 
Harish, please take a look and give it a green light for checkin. Thanks!
(Assignee)

Updated

16 years ago
Attachment #71428 - Attachment is obsolete: true
Attachment #71429 - Attachment is obsolete: true
Attachment #71431 - Attachment is obsolete: true
Attachment #71543 - Attachment is obsolete: true
Attachment #71544 - Attachment is obsolete: true
(Assignee)

Comment 26

16 years ago
Good to go. Except for the missing ';' near foundMatch = !foundMatch. :-)

Thanx Nisheeth.
(Assignee)

Comment 27

16 years ago
Fix has been checked in to the 0.9.4 branch on behalf of Harish (This is
Nisheeth typing on Harish's computer).

Harish will check this in on the trunk later.
Whiteboard: Fix checked into 0.9.4 branch
(Assignee)

Comment 28

16 years ago
Created attachment 71807 [details] [diff] [review]
Patch Version 1.4.1 that got checked into 0.9.4 branch

Sorry about the missing semicolon!  This patch fixes that problem.  This is
what got checked into the 0.9.4 branch.
Attachment #71801 - Attachment is obsolete: true

Updated

16 years ago
Keywords: fixed0.9.4
QA Contact: moied → teruko

Updated

16 years ago
Keywords: edt0.9.4 → edt0.9.4+
(Reporter)

Comment 29

16 years ago
Harish...

I tried the patch posted... it works...

Thank you for your hard work.

FYI, I did not do any regression tests since not sure which ones to be done.

HK ( Hansoo Kim )
(Assignee)

Comment 30

16 years ago
Hansoo: Just make sure that international pages get the correct charset. 

Try: http://komodo.mozilla.org/buster

Comment 31

16 years ago
> Hansoo: Just make sure that international pages get the correct charset. 

> Try: http://komodo.mozilla.org/buster

I don't know if this URL helps completely. This page loads target sites 
in one of the frames, but the top frame does not have any met-charset info. 
So, if you look at the Character Coding menu,it will always be set to your 
default encoding. 
You need to check the view frame info to get the info on charset for
loaded pages. I prefer a more straightforward page loading for this
type of test.

Comment 32

16 years ago
Interestingly, Hansoo's test page does not fail if one of
multi-byte auto detectors is turned on. All, Russian, Ukrainian
don't work but all others can avoid this problem. I guess we are
eliminating UTF-8 from the possibillity in these auto-detectors?
If harishd is going to check in the fix into the trunk, let's make
sure that auto-detect cases are also covered.

Comment 33

16 years ago
This patch in 094 is causing www.disney.com to hang. (exact url is 
http://disney.go.com/park/homepage/today/flash/index.html)

The page contains serveral meta tags which mess up the new parsing codes. 

Please confirm.
(Assignee)

Comment 34

16 years ago
will take a look
(Assignee)

Comment 35

16 years ago
Created attachment 72654 [details] [diff] [review]
patch v1.4.2 [ fix the hang ]
Comment on attachment 72654 [details] [diff] [review]
patch v1.4.2 [ fix the hang ]

r=heikki
Attachment #72654 - Flags: review+

Comment 37

16 years ago
Comment on attachment 72654 [details] [diff] [review]
patch v1.4.2 [ fix the hang ]

sr=darin
Attachment #72654 - Flags: superreview+

Comment 38

16 years ago
approved by marek
(Assignee)

Comment 39

16 years ago
Fixed on the branch.
How is this different from bug 105725?
(Assignee)

Comment 41

16 years ago
*** Bug 105725 has been marked as a duplicate of this bug. ***
(Assignee)

Updated

16 years ago
Keywords: nsbeta1

Comment 42

16 years ago
Comment on attachment 72654 [details] [diff] [review]
patch v1.4.2 [ fix the hang ]

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #72654 - Flags: approval+

Comment 43

16 years ago
Comment on attachment 71807 [details] [diff] [review]
Patch Version 1.4.1 that got checked into 0.9.4 branch

a=asa (on behalf of drivers) for checkin to the 1.0 trunk
Attachment #71807 - Flags: approval+
(Assignee)

Comment 44

16 years ago
FIXED on the TRUNK & BRANCH.
Status: ASSIGNED → RESOLVED
Last Resolved: 16 years ago
Resolution: --- → FIXED

Comment 45

16 years ago
This was a very late "trunk" checkin. We need to push this fix to 099 ASAP.
Reopening until 099 checkin occurs.
Status: RESOLVED → REOPENED
Keywords: edt0.9.9
Resolution: FIXED → ---

Updated

16 years ago
Keywords: edt0.9.9, topembed → edt0.9.9+, topembed+
(Assignee)

Comment 46

16 years ago
Landed into 0.9.9 branch. Marking FIXED.
Status: REOPENED → RESOLVED
Last Resolved: 16 years ago16 years ago
Resolution: --- → FIXED
Whiteboard: Fix checked into 0.9.4 branch → Fix checked into 0.9.4 and 0.9.9 branch
(Reporter)

Comment 47

16 years ago
Verified the check-in with bonsai...

Thank you, Harish...

Updated

16 years ago
Keywords: fixed0.9.9

Comment 48

16 years ago
Verified as fixed in 04-05 0.99 build.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.