Closed Bug 320883 Opened 19 years ago Closed 19 years ago

Extra newline when textfile has CRLF at offset 1024+1025

Tracking

()

Status:

VERIFIED FIXED

Milestone:

mozilla1.9alpha1

People

(Reporter: bugzilla, Assigned: mrbkap)

References

(
URL
)

Details

(Keywords: fixed1.8.1, Whiteboard: [patch])

Attachments

(1 file, 1 obsolete file)

Treat CR-LF pairs as pairs 19 years ago Blake Kaplan (:mrbkap) (inactive) 2.88 KB, patch	bzbarsky : review-	Details \| Diff \| Splinter Review
Updated patch 19 years ago Blake Kaplan (:mrbkap) (inactive) 3.16 KB, patch	bzbarsky : review+ brendan : superreview+ jst : approval-branch-1.8.1+	Details \| Diff \| Splinter Review

Peter Brodersen

Reporter

Description

•

19 years ago

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8) Gecko/20051111 Firefox/1.5

When a file of type text/plain has a "Windows" newline (CRLF) at byte offset 1024+1025 two newlines are shown.

I assume Firefox reads chunks of 1024 bytes at a time. In that case the CR is not seen in connection with the LF and both are rendered as a newline.


Reproducible: Always

Steps to Reproduce:
1. Create a text/plain file with 1023 bytes of text
2. Add CR+LF
3. Add some more text

Actual Results:  
The CRLF will show as two newlines.

Expected Results:  
Only one newline should be shown.

This occurs when the resource is requested over HTTP. It does not occur when the file is accessed through the file system.

E.g. visit http://stock.ter.dk/chunks.txt , save the resource, open the resource via the local file system.

zug_treno

Comment 1

•

19 years ago

Maybe related to Core bug 197075?

Phil Ringnalda (:philor)

Updated

•

19 years ago

Assignee: nobody → mrbkap

Component: General → HTML: Parser

Product: Firefox → Core

QA Contact: general → parser

Version: unspecified → Trunk

Jesse Ruderman

Comment 2

•

19 years ago

*** Bug 323613 has been marked as a duplicate of this bug. ***

Iain Arnell

Comment 3

•

19 years ago

Also being reported heavily at Project Gutenberg Distributed Proofreaders.  http://www.pgdp.net/phpBB2/viewtopic.php?t=18705
Another testcase at http://www.xs4all.nl/~iarnell/test1024.txt

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 4

•

19 years ago

This is easy to fix, but it brings into question the newline handling code in CTextToken::Consume. It currently leaves CR-LF pairs in instead of stripping the CR out (which seems wrong to me). I'll attach a patch that leaves their behavior alone, but fixes the testcase here.

Status: UNCONFIRMED → ASSIGNED

Ever confirmed: true

Priority: -- → P4

Target Milestone: --- → mozilla1.9alpha

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 5

•

19 years ago

Attached patch Treat CR-LF pairs as pairs (obsolete) — Details — Splinter Review

If we find a carriage return, then we need to check the next character. If we're at the end of a packet boundary, then we'll need to wait longer. Otherwise, we don't need to worry about more data coming, and we can treat it as a standalone CR.

Attachment #209368 - Flags: review?(bzbarsky)

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 6

•

19 years ago

Comment on attachment 209368 [details] [diff] [review]
Treat CR-LF pairs as pairs

>Index: parser/htmlparser/src/nsHTMLTokens.cpp
>       result = aScanner.Peek(aChar);
> ...
>+        aScanner.Peek(aChar);

This second Peek is superfluous, please ignore it.

Boris Zbarsky [:bzbarsky]

Comment 7

•

19 years ago

> This is easy to fix, but it brings into question the newline handling code in
> CTextToken::Consume.

This is probably handled by us stripping \r in the content sink....  Perhaps we should consider fixing the code here and removing the stripping there?  Or would that lead to less-efficient string-copying than we have now?

Boris Zbarsky [:bzbarsky]

Comment 8

•

19 years ago

Comment on attachment 209368 [details] [diff] [review]
Treat CR-LF pairs as pairs

>Index: parser/htmlparser/src/nsHTMLTokens.cpp
>+        aScanner.Peek(aChar);

You're removing this, right?

>+              if (aScanner.IsIncremental()) {
>+                return result;

So... don't we need to reset the scanner position?  Or does something else handle it for us?

I have to admit; I was looking at nsHTMLTokenizer::ConsumeTex and I'm not sure it deals with a failure return at all well.  In particular, it will IF_FREE aToken, set it to null, then set it to theToken, and AddToken theToken.... But theToken has nothing useful in it (we just bailed out of this code).  So what's up with that?

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 9

•

19 years ago

(In reply to comment #7)
> should consider fixing the code here and removing the stripping there?  Or
> would that lead to less-efficient string-copying than we have now?

That's my fear. I think I'd rather leave the current code as-is, since it's the content sink that does the stripping.

(In reply to comment #8)
> >+        aScanner.Peek(aChar);
> You're removing this, right?

Yes.

> >+              if (aScanner.IsIncremental()) {
> >+                return result;
> 
> So... don't we need to reset the scanner position?  Or does something else
> handle it for us?

Yes, see the code in nsParser::Tokenize that calls nsITokenizer::ConsumeToken. If the parser returns an error, then it rewinds the scanner to the last mark. Marks are set at the end of each successfully consumed token.

> But theToken has nothing useful in it (we just bailed out of this code).
> So what's up with that?

D'oh, my |return result| should be a |break;|, so we'll bail out of the while loop and bind our substring to something useful. Want a new patch?

Boris Zbarsky [:bzbarsky]

Comment 10

•

19 years ago

> If the parser returns an error, then it rewinds the scanner to the last mark.

Ok.

> so we'll bail out of the while loop and bind our substring to something useful.

But if the scanner is going to be rewound... should we really be binding here?  That will convert the error code to NS_OK on ConsumeText and we'll get doubled text, no?

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 11

•

19 years ago

(In reply to comment #10)
> > so we'll bail out of the while loop and bind our substring to something useful.
> But if the scanner is going to be rewound... should we really be binding here? 
> That will convert the error code to NS_OK on ConsumeText and we'll get doubled
> text, no?

No, because CTextToken::Consume only returns to nsHTMLTokenizer::ConsumeText, which will convert the return value immediately, so the parser will never see it, and it will mark the scanner right before the carriage return. Therefore, when the next chunk of data comes in, we'll start reading at the carriage return and do the Right Thing (TM).

Boris Zbarsky [:bzbarsky]

Comment 12

•

19 years ago

Comment on attachment 209368 [details] [diff] [review]
Treat CR-LF pairs as pairs

Ah, I see.  Can you post a patch with the various things fixed so I can do some more mental modeling of parser?  ;)

Attachment #209368 - Flags: review?(bzbarsky) → review-

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 13

•

19 years ago

Attached patch Updated patch — Details — Splinter Review

I made one large change to this patch: I no longer set result to NS_OK in CTextToken::Consume. This is a micro optimization so that we don't go around the loop again if we found EOF.

Attachment #209368 - Attachment is obsolete: true

Attachment #209480 - Flags: review?(bzbarsky)

Boris Zbarsky [:bzbarsky]

Updated

•

19 years ago

Attachment #209480 - Flags: review?(bzbarsky) → review+

Blake Kaplan (:mrbkap) (inactive)

Assignee

Updated

•

19 years ago

Attachment #209480 - Flags: superreview?(brendan)

Blake Kaplan (:mrbkap) (inactive)

Assignee

Updated

•

19 years ago

Whiteboard: [patch]

Brendan Eich [:brendan]

Comment 14

•

19 years ago

Comment on attachment 209480 [details] [diff] [review]
Updated patch

Sure, rs=me.

/be

Attachment #209480 - Flags: superreview?(brendan) → superreview+

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 15

•

19 years ago

Fix checked into trunk.

Status: ASSIGNED → RESOLVED

Closed: 19 years ago

Resolution: --- → FIXED

Iain Arnell

Comment 16

•

19 years ago

Thanks. Fix works for me on trunk. Any chance of getting this into 1.8 as well? Reports from distributed proofreaders show this is occurring on firefox 1.5 (and 1.0.7) too.

Peter Brodersen

Reporter

Comment 17

•

18 years ago

This issue is still present in Firefox 2.0a1 (PortableFirefox)

Blake Kaplan (:mrbkap) (inactive)

Assignee

Updated

•

18 years ago

Attachment #209480 - Flags: approval-branch-1.8.1?(jst)

Boris Zbarsky [:bzbarsky]

Comment 18

•

18 years ago

Yes; it hasn't been checked on branch.   See the flags on the patch attachment.

Johnny Stenback (:jst)

Updated

•

18 years ago

Attachment #209480 - Flags: approval-branch-1.8.1?(jst) → approval-branch-1.8.1+

Blake Kaplan (:mrbkap) (inactive)

Assignee

Comment 19

•

18 years ago

Fix checked into the MOZILLA_1_8_BRANCH. Marking this verified on trunk per comment 16.

Status: RESOLVED → VERIFIED

Keywords: fixed1.8.1

You need to log in before you can comment on or make changes to this bug.