Closed Bug 111576 Opened 23 years ago Closed 22 years ago

Incorrect line numbers in HTML files

Categories

(Other Applications Graveyard :: Venkman JS Debugger, defect, P2)

x86
All

Tracking

(Not tracked)

VERIFIED FIXED
mozilla1.0

People

(Reporter: ramon, Assigned: harishd)

References


Details

(Keywords: topembed+, Whiteboard: [native][ADT3][fixed on the trunk and branch])

Attachments

(7 files, 13 obsolete files)

980 bytes, text/html
168 bytes, text/html
164 bytes, text/html
79.37 KB, image/png
386 bytes, text/html
102.37 KB, patch (caillon: review+, caillon: superreview+, chofmann: approval+)
106.44 KB, patch
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9.6) Gecko/20011120
BuildID:    2001112009

On at least www.dilbert.com, selecting a given function in the functions
drop-down results in incorrect highlighting of the function and incorrect
placement of breakpoints.

Reproducible: Always
Steps to Reproduce:
1. Open the JS debugger
2. go to http://www.dilbert.com
3. In the debugger, click on any function in the function tree.

Actual Results:  The selected function appears highlighted 4 or 5 lines before
its real start in the text pane. Breakpoint indicators are also misplaced.

Expected Results:  Highlight it correctly.
I see this everywhere. I'm using Linux build 2001121008. It makes the debugger
unusable for me.
Let's use this bug for the HTML problems, and bug 98255 for XUL files, in case
they work out to be different problems.

Setting P1, and reassigning to harishd.

Assignee: rginda → harishd
Severity: normal → major
OS: Windows 98 → All
Summary: incorrect line numbering of functions in javascript debugger → Incorrect line numbers in HTML files
Status: UNCONFIRMED → NEW
Ever confirmed: true
Priority: -- → P1
ginda: Dude, you can't set the priority on my bugs :-) 
sorry, I'm trying to set priorities on debugger bugs.  I had planned to use P1
for all bugs requiring native changes, but then I ran into problems like this. 
I'll use [native] in the status whiteboard.  Set the priority to whatever you'd
like :)
Whiteboard: [native]
QA Contact: rginda → caillon
Status: NEW → ASSIGNED
Priority: P1 → P2
Target Milestone: --- → mozilla0.9.9
With this patch applied, www.dilbert.com still shows incorrect line numbers. 
The source looks like this...

18: <SCRIPT LANGUAGE="JAVASCRIPT">
19:
20:
21: <!--
22: function rand(n) {
    ...

The debugger thinks |rand| started right after the script tag on line 19, when
it actually starts at line 22.  (The opening "{" should always mark the start
line of a function.)
Save www.dilbert.com and then force an error on the line where the rand function
is. Do you see an error reported at the expected line number? I do.
Attached file testcase
It seems that if I work on a testcase long enough it starts to work, so I'm going to
post this now and try to reduce it further afterward.

This file has an assignment to an undeclared variable called |line_13|, which
happens to be on line 13 according to my editor.  When I start ./mozilla
file:/home/rginda/src/HTML/tests/linenumbers.html, I end up with
"file:///home/rginda/src/HTML/tests/linenumbers.html line 11: assignment to
undeclared variable line_13".
Slightly smaller testcase; line 9 is mistaken for line 6.
If I take out the spaces between <head> and <script>, the line numbers are correct.
Attached patch patch v1.1 (obsolete) — Splinter Review
Attached patch patch v1.2 (obsolete) — Splinter Review
http://my.netscape.com/index2.psp

I see script as content in that page after this patch, not sure if it was there
before or not.
I'm starting another build to see if the problem is still there.  It'll take
another hour or two, at least.
It looks like this patch introduces a regression, I'll attach a screenshot.
Could have something to do with the <!-- --> comments in this script.
Not going to make it to 0.9.9.

--> 1.0
Target Milestone: mozilla0.9.9 → mozilla1.0
Moving Netscape owned 0.9.9 and 1.0 bugs that don't have an nsbeta1, nsbeta1+,
topembed, topembed+, Mozilla0.9.9+ or Mozilla1.0+ keyword.  Please send any
questions or feedback about this to adt@netscape.com.  You can search for
"Moving bugs not scheduled for a project" to quickly delete this bugmail.
Target Milestone: mozilla1.0 → mozilla1.2
This is very important for the JS debugger. Nominating for nsbeta1
Keywords: nsbeta1
Target Milestone: mozilla1.2 → mozilla1.0
I'm applying this to a fresh tree now, results soon.
There seem to be conflicts with the trunk in nsViewSourceHTML.cpp; I think I can
fix them, but a patch against the trunk would be helpful.
Updating my tree. Will post another patch once it's done.
I have been embedding the Mozilla parser for two years and have always had
problems with line numbering (see bug #...).
Last June I wrote a patch (preprocessing HTML files) that made
nsIParserNode.GetSourceLineNumber() work properly. It doesn't work well with
0.9.6, but I hope these suggestions may be useful.
I removed all whitespace preceding newlines:

<img src="sample.gif" alt="sample">  \n  ->  <img src="sample.gif" alt="sample">\n

I moved all newlines between element attributes to after the element end:
<img src="sample.gif"\n alt="sample"> -> <img src="sample.gif" alt="sample">\n
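
For illustration only, here is a minimal standalone sketch of that kind of
preprocessing (plain C++ with std::string; this is not the actual patch, and it
ignores '>' characters inside quoted attribute values):

    #include <string>

    // Illustrative preprocessing sketch (not Enrico's actual patch):
    // 1) drop whitespace that immediately precedes a newline, and
    // 2) move newlines found inside a tag to just after the closing '>'.
    std::string NormalizeNewlines(const std::string& html)
    {
        std::string out;
        out.reserve(html.size());
        bool inTag = false;
        int pendingNewlines = 0;          // newlines seen inside the current tag

        for (char c : html) {
            if (c == '<') inTag = true;

            if (c == '\n') {
                // strip trailing spaces/tabs before the newline
                while (!out.empty() && (out.back() == ' ' || out.back() == '\t'))
                    out.pop_back();
                if (inTag) { ++pendingNewlines; continue; }   // defer it
            }

            out.push_back(c);

            if (c == '>' && inTag) {
                inTag = false;
                out.append(pendingNewlines, '\n');            // re-emit after the tag
                pendingNewlines = 0;
            }
        }
        out.append(pendingNewlines, '\n');
        return out;
    }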
Excuse me for this kind of spam, but I submitted my last comment (comment #25) with
the bug number missing. I was talking about an old bug report: bug #52915.

I will update my patch for 0.9.9, and I hope to be more useful this time! :-)

Enrico: I removed line number information from the nsParserNode ( Note: I've
been thinking of removing parser nodes altogether ) since it was used only by
the SCRIPT tag. Also, the line number information was readily available in
CNavDTD and my current patch exploits that.
patch 1.3.1 applies cleanly and works like a charm so far, awesome!  I'll stop
by tomorrow or early next week to review the changes.
Comment on attachment 72986 [details] [diff] [review]
patch v1.3.1 [ merged to the tip ]

I'm not all that happy about the name GetSource2, why the name change from
AppendSource()?

- In nsToken.h:

+    virtual PRInt32 GetNewlineCount() { return mNewlineCount; }

I don't see us overloading this anywhere, any reason to not make this
non-virtual?

- The nsLineBreakConverter stuff needs some fixing before this can go in.

- In COtherElements.h:

-      theNode->SetSkippedContent(mText);
+      //theNode->SetSkippedContent(mText); XXX why do we need this?

can we take this out, or do we need to figure out why we need it?

- In CParserContext.cpp, no newline at end of file.

- All the stubbed out CollectSkippedContent() methods should at least truncate
the out string, and initialize the aLineNo variable to something, probably 0.

- In nsHTMLTokenizer.cpp:

+      PRInt32 newline = 0;
+      result = aScanner.SkipWhitespace(newline);
+      NS_ASSERTION(newline == 0, "CAttribute::Consume() failed to collect all
the attributes!");

That assertion doesn't seem to make much sense to me.

- In nsParser::Tokenize(), fairly early on:

       if (theTokenizer->GetCount() == 0) {
	 mFlags &= ~NS_PARSER_FLAG_FLUSH_TOKENS; // reset since the tokens have
been flushed.
+	 result = Tokenize(aIsFinalChunk);
       }

This seems odd to me, why recurse back into Tokenize()? If this is correct,
then this needs a comment explaining why this is done.

- In nsScanner::SkipWhitespace():

+  while (!done && current != mEndPosition) {
     switch(theChar) {
-      case '\n': mNewlinesSkipped++;
+      case '\n':
+      case '\r': aNewlinesSkipped++;

Won't this count a normal windows newline (i.e. \r\n) as two newlines? That's
not what we want here, is it?

- In nsScanner::SkipWhitespace():

+	   skipped = PR_TRUE;
+	   PRUnichar thePrevChar = theChar;
+	   theChar = *(++current);

Here you're accessing *current after incrementing it w/o checking if you stepped
past the end of the string. Not good.

- In nsScanner::ReadWhitespace(), same thing. Oh, and that code looks a lot
like the code in nsScanner::SkipWhitespace(), could you split out that switch
statement into its own function?

- In the next version of nsScanner::ReadWhitespace(), same thing.

Other than that, this looks ok, but I want to see a new patch before sr'ing.
>I'm not all that happy about the name GetSource2, why the name change from
>AppendSource()?

YYY->AppendSource(XXX) would mean XXX is appended to YYY. With nsToken's
AppendSource, YYY gets appended to XXX! That is, we're trying to fill XXX, so
it's a get operation, and hence the name change. IMO, GetSource2 is a much
better name than the incorrect AppendSource().
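
For what it's worth, a tiny standalone sketch of the direction-of-append
distinction (std::string stand-ins, hypothetical struct; the method eventually
landed under the name AppendSourceTo, see below):

    #include <string>

    struct Token {
        std::string mSource;

        // Misleading old name: despite "AppendSource", the token's own source
        // is appended *to the caller's string*, i.e. this is really a getter.
        void AppendSource(std::string& aDest) const { aDest += mSource; }

        // A clearer name for the same operation.
        void AppendSourceTo(std::string& aDest) const { aDest += mSource; }
    };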

>+    virtual PRInt32 GetNewlineCount() { return mNewlineCount; }
>
>I don't see us overloading this anywhere, any reason to not make this
>non-virtual?

Yeeaaah...maybe. OK, will do :)

>- The nsLineBreakConverter stuff needs some fixing before this can go in.

Have opened a bug on scc/jag.

>- In COtherElements.h:
>
>-      theNode->SetSkippedContent(mText);
>+      //theNode->SetSkippedContent(mText); XXX why do we need this?
>
>can we take this out, or do we need to figure out why we need it?

AFAIK, the only consumer of COtherElements is COtherDTD. COtherDTD is used by
our editor for strict containment checks, and I don't think this code will ever
get used. Will find out before landing.

>- In CParserContext.cpp, no newline at end of file.

Will do.

>- All the stubbed out CollectSkippedContent() methods should at least truncate
>the out string, and initialize the aLineNo variable to something, probably 0.

Yes, yes...I have already done that on my tree. 

>- In nsHTMLTokenizer.cpp:

>+      NS_ASSERTION(newline == 0, "CAttribute::Consume() failed to collect all
>the attributes!");
>That assertion doesn't seem to make much sense to me.

D'oh! The assertion should read "CAttribute::Consume() failed to collect all
the newlines!". Will correct that.

>- In nsParser::Tokenize(), fairly early on:
>
>       if (theTokenizer->GetCount() == 0) {
>	 mFlags &= ~NS_PARSER_FLAG_FLUSH_TOKENS; // reset since the tokens have been flushed.
>+	 result = Tokenize(aIsFinalChunk);
>       }
>This seems odd to me, why recurse back into Tokenize()? If this is correct,
>then this needs a comment explaining why this is done.

The recursion is necessary because after all the tokens are flushed from the
tokenizer, we should resume tokenization for the rest of the document. Without
the recursion it would take a few more steps for the parser to realize that
it's ok to continue tokenization.
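
As a rough standalone sketch of that control flow (hypothetical MiniParser
type, not the real nsParser code): once all queued tokens have been flushed,
tokenization restarts immediately instead of waiting for the parser to be
driven through several more steps.

    #include <string>
    #include <vector>

    // Hypothetical stand-in for the parser state, for illustration only.
    struct MiniParser {
        std::string input;           // unconsumed document data
        std::vector<char> tokens;    // tokens waiting to be handed to the sink
        bool mustFlush = false;

        // Tokenize a chunk: one "token" per character, and request a flush
        // every 4 tokens, just so the control flow below has work to do.
        void ConsumeSomeInput() {
            while (!input.empty() && tokens.size() < 4) {
                tokens.push_back(input.front());
                input.erase(0, 1);
            }
            mustFlush = tokens.size() >= 4;
        }
        void FlushTokens() { tokens.clear(); }   // hand everything to the sink
    };

    // The "resume after flush" pattern.
    void Tokenize(MiniParser& p, bool isFinalChunk)
    {
        p.ConsumeSomeInput();
        if (p.mustFlush) {
            p.FlushTokens();
            if (p.tokens.empty()) {
                p.mustFlush = false;               // reset; the tokens have been flushed
                if (!p.input.empty())
                    Tokenize(p, isFinalChunk);     // resume with the rest of the document
            }
        }
    }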

>- In nsScanner::SkipWhitespace():
>+  while (!done && current != mEndPosition) {
>     switch(theChar) {
>-      case '\n': mNewlinesSkipped++;
>+      case '\n':
>+      case '\r': aNewlinesSkipped++;
>
>Won't this count a normal windows newline (i.e. \r\n) as two newlines? That's
>not what we want here, is it?

No, it won't. There is code a few lines below that accounts for \r\n. All
this code does is count a singled-out \r as a newline.

>- In nsScanner::SkipWhitespace():
>
>+	   skipped = PR_TRUE;
>+	   PRUnichar thePrevChar = theChar;
>+	   theChar = *(++current);
>
>Here you're accessing *current after incrementing it w/o checking if you stepped
>past the end of the string. Not good.

We enter the while-loop after checking for end [ while (!done && current !=
mEndPosition) ]. So, it should be safe. However, there is code a couple of lines 
below where I do increment without checking for end and that needs to be fixed.
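
To make the two points above concrete, a standalone sketch (plain C++, not
nsScanner::SkipWhitespace) that counts a lone \r as one newline, collapses a
\r\n or \n\r pair into a single newline, and checks the bound before every
dereference:

    #include <string>

    // Illustrative only: skip leading whitespace starting at |pos|, counting
    // logical newlines.  A lone '\r' counts as one newline; a "\r\n" or
    // "\n\r" pair also counts as one.
    size_t SkipWhitespace(const std::string& buf, size_t pos, int& newlines)
    {
        while (pos < buf.size()) {
            char c = buf[pos];
            if (c == '\n' || c == '\r') {
                ++newlines;
                ++pos;
                // Collapse a CRLF (or LFCR) pair into a single newline,
                // checking the bound before dereferencing.
                if (pos < buf.size() && (buf[pos] == '\n' || buf[pos] == '\r') && buf[pos] != c)
                    ++pos;
            } else if (c == ' ' || c == '\t') {
                ++pos;
            } else {
                break;   // first non-whitespace character
            }
        }
        return pos;
    }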

>- In nsScanner::ReadWhitespace(), same thing. Oh, and that code looks a lot
>like the code in nsScanner::SkipWhitespace(), could you split out that switch
>statement into its own function?

Mmmaaaay be....but not this time :-X

>I want to see a new patch before sr'ing.

Definitely.
Attached patch patch v1.4 (obsolete) — Splinter Review
Depends on: 130665
ADT3 per ADT triage.  Is the js debugger planned as part of Mach V?
Whiteboard: [native] → [native][ADT3]
The debugger will be an optional component on Mach V.  This bug also affects the
line numbers of javascript errors for scripts embedded in HTML (with or without
the debugger installed.)  Without this patch, we report the incorrect line
number in errors and exceptions.

It is very important to the debugger project, and I believe Mach V and Mozilla
in general, to have this bug fixed in the 1.0 timeframe.
Attached patch patch v1.5 (obsolete) — Splinter Review
Changed GetSource2 to AppendSourceTo [ after talking to jst ]
Added GetSourceLineNumber to nsIDTD [ as requested ( email ) by Enrico ]
caillon and I both have a problem viewing bugzilla patches while running
with this patch.  Clicking on a text/plain link in bugzilla causes mozilla to
spin, but nothing ever loads.
Good catch. Will fix it.
Attached patch patch v1.6 (obsolete) — Splinter Review
Fixed problem with text/plain, text/css, text/javascript and viewing XML/XUL
source.
I'm using this patch as my daily browser on Linux now, and will be for the next
few days.  If I see anything else strange, I'll report back here.

Harish's testcase (and the others attached here) all work for me.
- CollectSkippedContent() in COtherDTD, nsExpatDriver, and CViewSourceHTML
should truncate the out string parameter.

- In nsScanner::SkipWhitespace():

+          skipped = PR_TRUE;
+          PRUnichar thePrevChar = theChar;
+          theChar = *(++current);

You're accessing |current| here after incrementing it w/o ever checking if it
points to the end of the string after you incremented it. Check for that before
referencing *current.

- In both versions of nsScanner::ReadWhitespace(), same as above.

Other than that, sr=jst

Attachment #75015 - Flags: superreview+
http://www.orinocowireless.com/template.html?section=m87&envelope=127&action=folder

Under the heading "Buy Online" I see the text 'lass="kop"' with this patch.  It
doesn't seem to show up without the patch.  Another strange thing to note is
that in View Source, the entire </DIV class="kop"> is purple.  Probably
something to do with attributes in closing tags.
Attached patch patch v1.7 (obsolete) — Splinter Review
Fixes the problem pointed out by rginda.
patch v1.7 fixes the problems with attributes in closing tags.
Attached patch patch v1.8 (obsolete) — Splinter Review
Removed GetSourceLineNumber from nsIDTD ( that got added in my previous patch )
and moved it to nsIParserNode. 

FYI: The patch still has a problem with newlines within quoted attribute values
( a finding from Enrico ). 

IMO, the correct way to fix the line number problem is to have a counter in the
scanner that can keep track of all the newlines encountered as we scan the
document. Unfortunately, it will not be a trivial fix and will not be feasible
in the 1.0 time frame. Will work on that post 1.0.
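
A rough sketch of that post-1.0 idea (hypothetical, standalone C++; this is
not what the current patch does): let the scanner itself carry the running
line count, so every consumer asks the scanner instead of each token or DTD
re-counting newlines.

    #include <string>

    // Hypothetical sketch of a scanner-level line counter.  Every character
    // goes through GetChar(), so the newline count can never drift out of
    // sync with what has actually been consumed.
    class MiniScanner {
    public:
        explicit MiniScanner(std::string data) : mData(std::move(data)) {}

        bool GetChar(char& out) {
            if (mPos >= mData.size()) return false;
            out = mData[mPos++];
            if (out == '\n') ++mLine;       // count as the character is consumed
            return true;
        }

        int CurrentLine() const { return mLine; }   // 1-based line of the next char

    private:
        std::string mData;
        size_t mPos = 0;
        int mLine = 1;
    };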
Comment on attachment 75481 [details] [diff] [review]
patch v1.8

Be consistent with whitespace. Check that space-after-comma
usage is consistent, and things like that.

>Index: htmlparser/public/nsHTMLTokens.h
>===================================================================
>Index: htmlparser/public/nsIDTD.h
>===================================================================
>Index: htmlparser/public/nsIParserNode.h
>===================================================================
>Index: htmlparser/public/nsToken.h
>===================================================================
>+    PRInt32 GetNewlineCount() 
>+    { 
>+      PRInt32 tmp = mNewlineCount;
>+      if (tmp) {
>+        mNewlineCount = 0; // Reset ( since we should never count this more than once ) 
>+      }
>+      return tmp; 
>+    }
>+    void SetLineNumber(PRInt32 aLineNumber) 
>+    { 
>+      mLineNumber = mLineNumber == 0 ? aLineNumber : mLineNumber;
>+    }

These seem really strange, and the approach seems kind of fragile. So these
(at least GetNewlineCount) are used when we parse a tag but put it aside,
waiting for a proper insertion point. We count the line before putting it
aside, and should not count it again after we get it from the displaced
stack.

A solution I would like slightly better would be to have the newline count
be set to 0 when the token is placed in the displaced stack (there
are actually several stacks for other kinds of things as well).

This is probably something we want to think about a bit more.
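
A tiny sketch of that alternative (hypothetical standalone types): zero the
token's newline count at the moment it is pushed onto the displaced-token
stack, so GetNewlineCount() can stay a plain, side-effect-free getter.

    #include <stack>

    struct MiniToken {
        int mNewlineCount = 0;
        int GetNewlineCount() const { return mNewlineCount; }   // no side effect
    };

    struct DisplacedStack {
        std::stack<MiniToken*> mTokens;
        void Push(MiniToken* t) {
            t->mNewlineCount = 0;   // already counted once before being set aside
            mTokens.push(t);
        }
    };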

>Index: htmlparser/src/CNavDTD.cpp
>===================================================================

Will do later.

>Index: htmlparser/src/CNavDTD.h
>===================================================================
>Index: htmlparser/src/COtherDTD.cpp
>===================================================================
>Index: htmlparser/src/COtherDTD.h
>===================================================================
>Index: htmlparser/src/COtherElements.h
>===================================================================
>Index: htmlparser/src/CParserContext.cpp
>===================================================================
>+             mMimeType.EqualsWithConversion(kXIFTextContentType))

Get rid of the XIF type and all references to it, anywhere you see it!

>Index: htmlparser/src/CParserContext.h
>===================================================================
>Index: htmlparser/src/nsDTDUtils.cpp
>===================================================================
>Index: htmlparser/src/nsDTDUtils.h
>===================================================================
>Index: htmlparser/src/nsExpatDriver.cpp
>===================================================================
>Index: htmlparser/src/nsHTMLTokenizer.cpp
>===================================================================

I will still need to look at what comes before the part below.

>+    if(kGreaterThan != aChar) {
>+      result = ConsumeAttributes(aChar, aToken, aScanner);

Does this still work if we have "/>"?

>Index: htmlparser/src/nsHTMLTokenizer.h
>===================================================================
>Index: htmlparser/src/nsHTMLTokens.cpp
>===================================================================

Will do later.

>Index: htmlparser/src/nsLoggingSink.cpp
>===================================================================
>Index: htmlparser/src/nsLoggingSink.h
>===================================================================
>Index: htmlparser/src/nsParser.cpp
>===================================================================

Will do later.

>Index: htmlparser/src/nsParser.h
>===================================================================
>-    virtual nsITokenizer* GetTokenizer(void);
>+    nsresult GetTokenizer(nsITokenizer*& aTokenizer);

It might be a good idea to leave this as virtual even with the new
signature. Some frankenbrowsers (like DocZilla) might be 
depending on this. Most of the methods are virtual on 
the parser anyway.

>Index: htmlparser/src/nsParserNode.cpp
>===================================================================
> PRInt32 nsCParserNode::GetSourceLineNumber(void) const {
>-  return mLineNumber;
>+  return mToken ? mToken->GetLineNumber() : -1;

I would guess 0 would be better... Check what the old code did.
Also check the new code; if the new code does nothing special with a -1
return from here, I would think 0 would make us behave less badly.

>Index: htmlparser/src/nsParserNode.h
>===================================================================
>Index: htmlparser/src/nsScanner.cpp
>===================================================================

Will do later.

>Index: htmlparser/src/nsScanner.h
>===================================================================
>Index: htmlparser/src/nsToken.cpp
>===================================================================
>Index: htmlparser/src/nsViewSourceHTML.cpp
>===================================================================
>Index: htmlparser/src/nsViewSourceHTML.h
>===================================================================
>Index: content/html/document/src/nsHTMLContentSink.cpp
>===================================================================

Will do later.

>Index: content/html/document/src/nsHTMLFragmentContentSink.cpp
>===================================================================

Will do later.
Comment on attachment 75481 [details] [diff] [review]
patch v1.8

>Index: htmlparser/src/CNavDTD.cpp
>===================================================================
>+        NS_ASSERTION(mSkippedContent.GetSize() == 0, "all the skipped content tokens did not get handled");
>+        mSkippedContent.Empty();

Were you hitting this with the new code? Old code? Or are these just better
safeguards you noticed we could have? What happens if we don't clear
mSkippedContent?

>+        result = CollectSkippedContent(eHTMLTag_xmp, theString, lineNo);

Shouldn't |eHTMLTag_xmp| be |aChildTag| (maybe we treat plaintext and
xmp the same, but still)?

>+CNavDTD::CollectSkippedContent(

Truncate aContent at the beginning of this method rather than later.

>+  NS_ASSERTION(aTag >= eHTMLTag_unknown && aTag <= NS_HTML_TAG_MAX, "tag array out of bounds");

You might want to start using NS_WARN|ABORT_IF_FALSE, ABORT for fatal,
WARN in cases where we can recover to some extent at least
but might still be worth a bug report some day. This is not a requirement,
though.

>+    // This tag doesn't support skipped content.
>+    aLineNo = -1;

I am a bit uncertain about this line number for unsupported. Some callers
don't seem to use it, some typecast it to unsigned - if that ends up reaching
the user they'll see a completely bogus number. Is that the intent? Or
should the caller behave differently for -1?

>+  for (PRInt32 i = 0; i< tagCount; i++){

Move the declaration of i outside.

>+              aContent.Append(PRUnichar('&'));
>+              aContent.Append(theNextToken->GetStringValue());

Hmm... would it be more or less efficient to write that as

aContent.Append(NS_LITERAL_STRING("&") +
theNextToken->GetStringValue());

Try to pay attention to multiple Appends: if you are appending
strings, using operator+ should always be better. In this case
I am not sure, so we'd better check with the string guys.
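
For comparison, the two shapes under discussion, sketched with std::string
(the real code uses nsString and NS_LITERAL_STRING, and which form actually
wins was left to be checked with the string folks):

    #include <string>

    // Two appends: may grow the destination buffer twice.
    void AppendEntityTwoSteps(std::string& content, const std::string& value)
    {
        content.append("&");
        content.append(value);
    }

    // Concatenate first, then append once: the total length is known up
    // front, at the cost of building a temporary.
    void AppendEntityOneStep(std::string& content, const std::string& value)
    {
        content.append("&" + value);
    }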

>+    // XXX - ConvertStringLineBreaks should be modified to accept nsAString
>+    // Don't checkin without fixing this part
>+
>+    //nsLinebreakConverter::ConvertStringLineBreaks(aContent,
>+    //     nsLinebreakConverter::eLinebreakAny, nsLinebreakConverter::eLinebreakContent);

Seems like there is something to do here.

>+  // Note: TEXTAREA content is PCDATA and hence the newlines are already accounted for.
>+  mLineNumber += (aTag != eHTMLTag_textarea) ? aContent.CountChar(kNewLine) : 0;

Hmm... I am having trouble finding where this happened in the old code...
Comment on attachment 75481 [details] [diff] [review]
patch v1.8

>Index: htmlparser/src/nsHTMLTokenizer.cpp
>===================================================================
>+    if(kGreaterThan != aChar) {
>+      result = ConsumeAttributes(aChar, aToken, aScanner);

For completeness' sake, repeating the question here:
Does this still work if we have "/>"?

TODO:
nsHTMLTokens.cpp
nsParser.cpp
nsScanner.cpp
nsHTMLContentSink.cpp
nsHTMLFragmentContentSink.cpp
>Index: htmlparser/public/nsHTMLTokens.h
>===================================================================
>Index: htmlparser/public/nsIDTD.h
>===================================================================
>Index: htmlparser/public/nsIParserNode.h
>===================================================================
>Index: htmlparser/public/nsToken.h
>===================================================================
>+    PRInt32 GetNewlineCount() 
>+    { 
>+      PRInt32 tmp = mNewlineCount;
>+      if (tmp) {
>+        mNewlineCount = 0; // Reset ( since we should never count this more
than once ) 
>+      }
>+      return tmp; 
>+    }

>A solution I would like slightly better would be to have the newline count
>be set to 0 when the token is placed in the displaced stack (there
>are actually several stacks for other kinds of things as well).

Yeah, that sounds like a better approach. Will give it a shot, and if it doesn't
pose any more complications then I will adopt your idea.

>+             mMimeType.EqualsWithConversion(kXIFTextContentType))

>Get rid of the XIF type and all references to it, anywhere you see!

Will do :)

>+    if(kGreaterThan != aChar) {
>+      result = ConsumeAttributes(aChar, aToken, aScanner);

>Does this still work if we have "/>"?

I've added this piece of code to consume attributes in end tags ( go figure! ) -
it would matter only for view source. In normal mode these attributes will be
ignored. Anyway, the answer to your question is that this case never worked
before but will work now with the patch.

>+        result = CollectSkippedContent(eHTMLTag_xmp, theString, lineNo);

>Shouldn't |eHTMLTag_xmp| be |aChildTag| (maybe we treat plaintext and
>xmp the same, but still)?

Yes, yes it should be. Good catch.

>+    // XXX - ConvertStringLineBreaks should be modified to accept nsAString
>+    // Don't checkin without fixing this part
>+
>+    //nsLinebreakConverter::ConvertStringLineBreaks(aContent,
>+    //     nsLinebreakConverter::eLinebreakAny,
>nsLinebreakConverter::eLinebreakContent);

scc has provided a patch for this in bug 130665. Will post another patch
integrating scc's patch and your suggestions.

>+  // Note: TEXTAREA content is PCDATA and hence the newlines are already accounted for.
>+  mLineNumber += (aTag != eHTMLTag_textarea) ? aContent.CountChar(kNewLine) : 0;

This wasn't there before. I've added this because for PCDATA the line numbers
are already counted, and the only PCDATA that calls CollectSkippedContent() is
TEXTAREA, hence the specific code :-/

Comment on attachment 75481 [details] [diff] [review]
patch v1.8

>Index: htmlparser/src/nsHTMLTokens.cpp
>===================================================================
>Index: htmlparser/src/nsParser.cpp
>===================================================================
>+      mUnusedInput.Truncate(0); 

You could take out the parameter here, it defaults just ok.

>+nsresult nsParser::GetTokenizer(nsITokenizer*& aTokenizer) {
>+  nsresult result = NS_OK;
>+  if(mParserContext) {
>+    result = mParserContext->GetTokenizer(aTokenizer);
>   }
>-  return theTokenizer;
>+  return result;
> }

You should change this function so that the caller does not need to
pass in a tokenizer variable already initialized. It is really easy
to forget that kind of unusual requirement. Dunno if this compiles
with that kind of signature, but in essence the first thing in
this function should be:

aTokenizer = nsnull;
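
A minimal sketch of that shape (standalone C++ with hypothetical types,
nsresult replaced by int): clear the out-parameter first, so a caller can
never read a stale pointer when there is no parser context.

    // Illustrative out-parameter pattern, not the actual nsParser code.
    struct Tokenizer {};

    struct ParserContext {
        Tokenizer mTokenizer;
        int GetTokenizer(Tokenizer*& aTokenizer) { aTokenizer = &mTokenizer; return 0; }
    };

    struct Parser {
        ParserContext* mParserContext = nullptr;

        int GetTokenizer(Tokenizer*& aTokenizer) {
            aTokenizer = nullptr;              // never leave the out-param uninitialized
            int result = 0;                    // "NS_OK"
            if (mParserContext)
                result = mParserContext->GetTokenizer(aTokenizer);
            return result;
        }
    };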

>+        // Resume tokenization for the rest of the document 
>+        // since all the tokens in the tokenizer got flushed.
>+        result = Tokenize(aIsFinalChunk); 

How did this work before, since I have trouble finding where this happened
originally?

>Index: htmlparser/src/nsScanner.cpp
>===================================================================
>+          theChar = ++current != mEndPosition ? *current : '\0';
>+            theChar = ++current != mEndPosition ? *current : '\0'; // CRLF == LFCR => LF
>+            theChar = ++current != end ? *current : '\0';
>+              theChar = ++current != end ? *current : '\0'; // CRLF == LFCR => LF
>+            theChar = ++current != end ? *current : '\0';
>+              theChar = ++current != end ? *current : '\0'; // CRLF == LFCR => LF

Add parentheses around the part before '?' so this becomes a
bit easier to read.

Also, it looks like there is a lot in common between the SkipWhitespace()
and ReadWhitespace() functions, so in the future at least it would
probably make sense to put the common part into a subroutine.

>Index: content/html/document/src/nsHTMLContentSink.cpp
>===================================================================
>-                           aResult, skippedContent, !!mInsideNoXXXTag);
>+                           aResult, &skippedContent, !!mInsideNoXXXTag);

Uuh! Get rid of the double bangs. 

And just to be on the safe side,
check if you see any places where someone is comparing to PR_FALSE or
PR_TRUE - those are basically bugs and should be fixed. The preferred
way to test booleans is

if (foo)

and 

if (!foo)

>Index: content/html/document/src/nsHTMLFragmentContentSink.cpp
>===================================================================

Done!
Attachment #75481 - Flags: needs-work+
>+      mUnusedInput.Truncate(0); 
>You could take out the parameter here, it defaults just ok.

Old code. Just indentation.

>+        // Resume tokenization for the rest of the document 
>+        // since all the tokens in the tokenizer got flushed.
>+        result = Tokenize(aIsFinalChunk); 

>How did this work before, since I have trouble finding where this happened
>originally?

The recursion is necessary because after all the tokens are flushed from the
tokenizer, we should resume tokenization for the rest of the document. Without
the recursion it would take a few more steps for the parser to realize that
it's ok to continue tokenization.

>Also, it looks like there is a lot in common between the SkipWhitespace()
>and ReadWhitespace() functions, so in the future at least it would
>probably make sense to put the common part into a subroutine.

SkipW...() and ReadW...() have been like that forever. Yes, it would be good
to have a common routine for both methods, but I'm running out of time
to do all the cleanup work. Will defer your suggestion to post-1.0, when I
attack the line number problem the correct way.

>Index: content/html/document/src/nsHTMLContentSink.cpp
>===================================================================
>-                           aResult, skippedContent, !!mInsideNoXXXTag);
>+                           aResult, &skippedContent, !!mInsideNoXXXTag);

>Uuh! Get rid of the double bangs. 

This is old code. Ask jst why we need two bangs :)
Comment on attachment 76235 [details] [diff] [review]
patch v1.9 [ includes heikki's suggestions and scc's patch ]

In CParserContext::GetTokenizer() you should remove the NS_ENSURE_SUCCESS line
so that we will set aTokenizer to nsnull before leaving the function.

With that, r=heikki
Attachment #76235 - Flags: review+
I've built v1.9.1 on linux, and copied it over to my P2/350, 128M machine at
home, running through SERA.

Pageload results before are:
Test id: 3CA26BEA25
Avg. Median : 4380 msec		Minimum     : 740 msec
Average     : 4817 msec		Maximum     : 18848 msec

Pageload results after are:
Test id: 3CA4281D04
Avg. Median : 4500 msec		Minimum     : 769 msec
Average     : 4788 msec		Maximum     : 15141 msec

I'm running this patch as my daily build on my laptop, and will post a linux
test build shortly.
Linux test build is here: www.hacksrus.com/~ginda/mozilla-after.tar.gz

And here's another set of "after" data:
Test id: 3CA4CCD10A
Avg. Median : 4419 msec		Minimum     : 775 msec
Average     : 4434 msec		Maximum     : 13611 msec
Does this mean that we've slowed down by ~3% or we've improved by ~0.6%? 
Here is another set of "before" data:
Test id: 3CA4D540AE
Avg. Median : 4393 msec		Minimum     : 773 msec
Average     : 4421 msec		Maximum     : 13741 msec

Averaging the two runs of each leaves us with:
Avg. Median : 1.6% slower       Minimum     : 2.0% slower
Average     : 0.2% faster       Maximum     : 9.5% faster

Looks to me like we're at least ~1.5% slower.  I'm not sure I trust the Maximum
on the first "before" run, either.  Seems a bit too large, could be due to some
noise.

We're much more "correct" though.  Without this patch, errors reported in top
level script will be off by a few lines in many common situations.

Harish, would you like to search out a perf gain, or should I take it to drivers
as is?
Robert: Frankly, I don't have the time to investigate the perf. problem :-(. We
need to run quantify or some other tool to start the investigation.
Here are three runs of each on my P3/850Mhz 384M laptop...

before:

Test id: 3CA4EE963E
Avg. Median : 1908 msec		Minimum     : 506 msec
Average     : 2433 msec		Maximum     : 10586 msec

Test id: 3CA4F22524
Avg. Median : 1899 msec		Minimum     : 505 msec
Average     : 1907 msec		Maximum     : 5495 msec

Test id: 3CA4F5CAE5
Avg. Median : 1901 msec		Minimum     : 502 msec
Average     : 1902 msec		Maximum     : 5497 msec

after:

Test id: 3CA4FD8612
Avg. Median : 1911 msec		Minimum     : 508 msec
Average     : 2431 msec		Maximum     : 10545 msec

Test id: 3CA504C767
Avg. Median : 1899 msec		Minimum     : 495 msec
Average     : 1910 msec		Maximum     : 5504 msec

Test id: 3CA507A831
Avg. Median : 1901 msec		Minimum     : 497 msec
Average     : 1907 msec		Maximum     : 5499 msec
Here are numbers from a 450Mhz P3 with 128M...

before:
Test id: 3CA8FA460E
Avg. Median : 2779 msec		Minimum     : 422 msec
Average     : 2866 msec		Maximum     : 9562 msec

Test id: 3CA90237B3
Avg. Median : 2805 msec		Minimum     : 405 msec
Average     : 2834 msec		Maximum     : 9089 msec

Test id: 3CA9056F72
Avg. Median : 2808 msec		Minimum     : 408 msec
Average     : 2839 msec		Maximum     : 8630 msec

Test id: 3CA90937FB
Avg. Median : 2809 msec		Minimum     : 414 msec
Average     : 2836 msec		Maximum     : 9266 msec

after:
Test id: 3CA90E3FCE
Avg. Median : 2809 msec		Minimum     : 410 msec
Average     : 2882 msec		Maximum     : 9318 msec

Test id: 3CA9116CD5
Avg. Median : 2808 msec		Minimum     : 409 msec
Average     : 2831 msec		Maximum     : 9195 msec

Test id: 3CA914AD5E
Avg. Median : 2811 msec		Minimum     : 417 msec
Average     : 2830 msec		Maximum     : 8644 msec

Test id: 3CA9197110
Avg. Median : 2814 msec		Minimum     : 405 msec
Average     : 2843 msec		Maximum     : 8929 msec
Comment on attachment 76584 [details] [diff] [review]
patch v1.9.1 [ merged to the tip ]

a=roc+moz

BTW, in the future it would be helpful if you obsolete old patches and carry
forward r/sr, or better yet, get the reviewer to do it themselves
Attachment #76584 - Flags: approval+
This bug causes us to report incorrect line numbers for multiline tags and
comments. As such it causes serious defects in the JS debugger and any other
component/product/tool depending on line number information.

This patch has been in the works for over a month and a half, and has gotten
extensive testing.

Performance tests indicate this might cause a 0.4% perf hit in page load, although
this falls well within the noise. The patch reduces bloat by not creating a DTD
for every document.write(), and we avoid allocating several string objects on
the heap.
Keywords: adt1.0.0
adt1.0.0- (on ADT's behalf).
Keywords: adt1.0.0 → adt1.0.0-
I would urge ADT to include this in Mozilla 1.0. The bug affects not
only the JS Debugger, but the JS Console as well. When the bug is biting,
line number information in the JS Console is incorrect, just as it is
in the JS Debugger. Note the JS Console is a feature many users rely on
and is present in every commercial build.

I am a frequent user of the JS Debugger; it is one of the strongest
and most useful tools we have. But if the line numbers provided to 
the debugger are incorrect, it becomes frustrating to set breakpoints.
You have to manually adjust for this line number bug. For example,
suppose you see function f() visually indicated at line 50 in the code,
yet you have to set a breakpoint at, say, line 45, a line ABOVE the
function! You end up having to step through the code line by line, five
lines too high! What you see visually on each line has no bearing on 
where you are actually stopped in the code.

I would imagine many people give up at this point. It is a disservice
to our users not to fix this; it is a problem of basic importance.
The debugger is a great selling point of our browser, and a great
advantage over IE. Let's support it fully.
In addition to Phil's comments, I too would urge ADT to reconsider.  It has a
huge benefit for both JS Console and JS Debugger by fixing a bug which makes us
look stupid, quite frankly.

Please keep in mind that much of the web developer community uses proprietary,
non-supported APIs (document.layers, document.all).  Many will need to learn
some or most of the W3C DOM API in order for their code to work properly in our
browser.  Leaving this bug unfixed will greatly increase the level of difficulty
to do that.  A developer starting out with coding to the standard APIs is bound
to make some mistakes at first.  If we can't help them figure out what they're
doing wrong, that imposes an enormous burden on any developer who is
transitioning to code for the standardized APIs we support.  Why on earth would
someone even bother to learn the W3C APIs and support us if we can't report
proper line numbers for their errors?  Coding could conceivably become damn near
impossible for the new user.

It also has adverse effects on a developer who already has worked with MSIE's
limited standards support.  If a certain line of code, say line 200, doesn't get
called in another browser, but does in ours, and line 200 is in error, the
developer may assume that his standards compliant code on say line 100 (which is
supported by other browsers and ours as well) is causing the error since we
happen to report the error on line 100, when in fact the error is on line 200. 
But since he can't figure out why, he assumes that Mozilla doesn't support that
piece, it is too hard to code for, it sucks, etc. and then he will continue to
develop for other proprietary APIs (such as MSIE's) as opposed to the W3C DOM
API and/or fail to support us properly by imposing blocks against our products
and possibly recommending against usage of Mozilla.

The bottom line is that reporting the correct line number will save a lot of
user confusion, increase usability for developers by easing the transition from
other APIs, help evangelism and advocacy of the product, and it will really help
make Mozilla/Netscape/Gecko-based browsers not look stupid.

Re-nominating for adt1.0.0
Blocks: advocacybugs
Keywords: adt1.0.0- → adt1.0.0
I second the opinions already expressed concerning the need for correct line
number reporting in the JS Console and JS Debugger.

Many times when using Mozilla, the JS Console and the JS Debugger I personally
have to perform detective work to determine where a reported error actually
occurred. It is a PITA for me.  From my experience with web developers, they will
not appreciate the incorrect reporting of such a basic piece of data as the
location of an exception.

I urge the ADT to reconsider or at least to justify not accepting this important
improvement to two of the primary tools web developers have in developing
dynamic content.
I can use this for improving web content debugging and analysis...
topembed+
Keywords: topembed+
Attached patch Updated patchSplinter Review
Attachment 76584 [details] [diff] no longer applies to the tip. This patch is the same as
the previous one; it just applies cleanly. Marking all other patches obsoleted by this
one.
Attachment #67798 - Attachment is obsolete: true
Attachment #70103 - Attachment is obsolete: true
Attachment #70109 - Attachment is obsolete: true
Attachment #72635 - Attachment is obsolete: true
Attachment #72986 - Attachment is obsolete: true
Attachment #74185 - Attachment is obsolete: true
Attachment #74189 - Attachment is obsolete: true
Attachment #74863 - Attachment is obsolete: true
Attachment #75015 - Attachment is obsolete: true
Attachment #75111 - Attachment is obsolete: true
Attachment #75481 - Attachment is obsolete: true
Attachment #76235 - Attachment is obsolete: true
Attachment #76584 - Attachment is obsolete: true
Comment on attachment 78184 [details] [diff] [review]
Updated patch

affixing r=heikki, sr=jst.
Attachment #78184 - Flags: superreview+
Attachment #78184 - Flags: review+
Attachment #78184 - Flags: approval+
Comment on attachment 78184 [details] [diff] [review]
Updated patch

a=chofmann
***** NOT TO BE CHECKED IN UNTIL BRANCHED ******
Btw, I'll post another patch with a fix ( trivial ) for a potential hang.
One more reason why ADT should really consider this again: this directly
affects both Netscape engineers' productivity ($) as well as a lot of the AOL
Time Warner and partner web developers... ($$$). Giving the correct line number
will save them from a few seconds to perhaps several days of time when they need
to look at a JavaScript error message. Just think of yourself if you have ever
written code: what if the compiler gave the wrong lines for warnings and errors, or
you had to anticipate how many lines the debugger is off while stepping through the
source code... I personally spent almost a day trying to figure out why a
certain unreproducible topcrasher was happening on a certain line until someone
mentioned to me that Talkback always reports the error one line off...
Re-Considering: Removing adt1.0.0 keyword. Pls renominate once it has been on
the trunk a couple of days, and been tested. 
Keywords: adt1.0.0
Fix landed ( 04/10 ) on the trunk.
Whiteboard: [native][ADT3] → [native][ADT3][fixed on the trunk]
Bug 137315 filed on the fact that CDATA tokens and attribute tokens are still
handled incorrectly.
Fixed on branch.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Whiteboard: [native][ADT3][fixed on the trunk] → [native][ADT3][fixed on the trunk and branch]
Removing the item for this bug from the release notes since the bug is marked
fixed. If this is in error, please make a note in the release notes bug for
the current milestone.
Verified fixed on linux trunk and branch bits from yesterday.
Status: RESOLVED → VERIFIED
Product: Core → Other Applications
Product: Other Applications → Other Applications Graveyard