Closed Bug 133853 Opened 22 years ago Closed 22 years ago

why always add <HTML> token into Tokenizer even there is an <HTML> token already?

Categories

(Core :: DOM: HTML Parser, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jerry.tan, Assigned: jerry.tan)

References

()

Details

(Keywords: fixedOEM)

Attachments

(1 file, 2 obsolete files)

http://lxr.mozilla.org/seamonkey/source/htmlparser/src/CNavDTD.cpp#504
it always add <HTML> token into Tokenizer.
I think if the Tokenizer has <HTML> Token already ,
no need to add <HTML> token into it.
it will cause handleToken function to be called one time more.

I think that if Tokenizer's first Token isnot  <HTML> Token, then add one.
  if it is, just skip.
though handle extra token is not expensive, still think if there is an <HTML> 
already, no need to add it into Tokenizer.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Attached patch patch (obsolete) — Splinter Review
in my patch, I just judge the first one token, if it is <HTML>, skip, if not
,add <HTML> token.

some web page will use <DOCTYPE ...> in their first line, to calculate all
Token in tokenzier whether it is <html> or not is expensive, so I skip this
type of html files.
Comment on attachment 77387 [details] [diff] [review]
patch 

>+          eHTMLTags theTag=(eHTMLTags)(mTokenizer->GetTokenAt(0))->GetTypeID();

This will crash if the tokenizer was empty or if the top token was null for
some reason.
Attachment #77387 - Flags: needs-work+
Giving bug back to Jerry since he's working on the bug ;)
Assignee: harishd → jerry.tan
Attached patch patch 1.01 (obsolete) — Splinter Review
updated one.
if FirstToken is null, then add <html> token,
if not null and not html token, add <html> token.
Attachment #77387 - Attachment is obsolete: true
Comment on attachment 82785 [details] [diff] [review]
patch 1.01

>+          //if the content model is empty or not begin with <html>(see bug 133853), then add <html>...
But DOCTYPE, etc., can appear above HTML tag. 

>+        if(mTokenizer) {
>+         CToken* theFirstToken=mTokenizer->GetTokenAt(0);
This check is not necessary ( since there is a similar check on the top of the
function ).
Fix the indentation. Also, remove tabs if there are any.

>+      	  CToken* theFirstToken=mTokenizer->GetTokenAt(0);

Add a whitespace before and after '='. ( This applies to rest of your patch ).
Also, is "theFirstToken" variable necessary?. You can use the existing
variable, thetoken. no?

>+            if(theTag!=eHTMLTag_html) {
Add necessary whitespaces. That is, if (theTag != eHTMLTag_html).
Attachment #82785 - Flags: needs-work+
Attached patch patch 1.02Splinter Review
Attachment #82785 - Attachment is obsolete: true
Comment on attachment 84141 [details] [diff] [review]
patch 1.02

r=harishd
Attachment #84141 - Flags: review+
jst ,can you give sr?
Comment on attachment 84141 [details] [diff] [review]
patch 1.02

sr=jst
Attachment #84141 - Flags: superreview+
verified.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Blocks: </html>
Whiteboard: branchOEM
Whiteboard: branchOEM → branchOEM+
Whiteboard: branchOEM+ → branchOEM+ fixedOEM
Whiteboard: branchOEM+ fixedOEM → fixedOEM
Keywords: fixedOEM
Whiteboard: fixedOEM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: