Closed Bug 562590 Opened 15 years ago Closed 8 years ago

Bad byte right before EOF is not turned into the REPLACEMENT CHARACTER

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla56

Tracking Flags:

Tracking

Status

firefox56

---

fixed

People

(Reporter: hsivonen, Assigned: hsivonen)

References

(Blocks 1 open bug,
URL
)

Details

(Whiteboard: [good first bug])

Attachments

(1 file)

Bug 562590 - Make incomplete byte sequences near HTML EOF emit a REPLACEMENT CHARACTER. 8 years ago Henri Sivonen (:hsivonen) 59 bytes, text/x-review-board-request	emk : review+	Details

Henri Sivonen (:hsivonen)

Assignee

Description

•

15 years ago

Steps to reproduce (both the old parser and the HTML5 parser have this bug): 1) Load data:text/html;charset=utf-8,x%c2 2) Compare with data:text/html;charset=utf-8,x%c2x Actual results: A bad byte is turned into the REPLACEMENT CHARACTER only if there's another byte after it. Expected results: Expected a bad byte to turn into the REPLACEMENT CHARACTER even when followed immediately by the EOF.

Anne (:annevk)

Updated

•

12 years ago

Blocks: encoding

Joshua Bell

Comment 1

•

12 years ago

Henri, Anne - any updated thoughts here? Over on Chromium we're trying to decide how to tackle this. bug: http://crbug.com277035 patch: http://crrev.com/23532016 Chromium flushes the decoder at the end of fetches (page load, XHR) *but* the UTF-16 decoder is buggy and the flush ends up as a no-op. So... for UTF-16, Chromium and Gecko happen to be bug-compatible here. :( If there's no intent to fix this issue, Chromium could disable flushing at the end of fetches to become compatible with Gecko; that would merit an addition to the Encoding and/or Fetch specs. I'd prefer that we correctly flush in all cases, though, but that's a bigger change for Gecko. (There's no impact on the Encoding script API here, so far as I can tell; the behavior there can be independent of what's done for fetching.)

Joshua Bell

Comment 2

•

12 years ago

Sorry, typo in the bug URL: http://crbug.com/277035

Anne (:annevk)

Comment 3

•

12 years ago

Joshua, I think we want to do the right thing here. Emitting U+FFFD if there's an incomplete sequence when you hit EOF. I believe this is what the combination of Encoding, Fetch, and HTML already defines. I take it that by flushing you mean that Chromium already does the correct thing?

Joshua Bell

Comment 4

•

12 years ago

(In reply to Anne (:annevk) from comment #3) > I take it that by flushing you mean that Chromium already does the correct > thing? Yes (mostly). By flushing I meant ensuring the EOF propagates from "fetch" through "encoding", emitting U+FFFD. From discussion (email?) I was led to believe that Gecko is missing the plumbing there. Chromium does propagate the EOF, it just has a bug in the UTF-16 decoder that causes the it to be ignored, leading to "bug compatible' behavior with Gecko.

Anne (:annevk)

Comment 5

•

12 years ago

I recommend fixing the bug in Chromium and shaming Gecko (or better, provide us with a patch :-)).

Andrea Marchesini [:baku]

Updated

•

11 years ago

Blocks: 1032511

Henri Sivonen (:hsivonen)

Assignee

Comment 6

•

10 years ago

The HTML parser should remember if the last call to nsIUnicodeDecoder::Convert returned NS_OK_UDEC_MOREINPUT or NS_OK. At end of input from network, if the remembered result is NS_OK_UDEC_MOREINPUT, append one REPLACEMENT CHARACTER to output. Bug 912470 will update the API doc to actually say this.

Henri Sivonen (:hsivonen)

Assignee

Updated

•

10 years ago

Whiteboard: [good first bug]

Henri Sivonen (:hsivonen)

Assignee

Comment 7

•

8 years ago

I intend to actually fix this now that bug 1261841 put in place nicer infra.

Assignee: nobody → hsivonen

Status: NEW → ASSIGNED

Comment hidden (mozreview-request)

Henri Sivonen (:hsivonen)

Assignee

Updated

•

8 years ago

Attachment #8880350 - Flags: review?(VYV03354)

Masatoshi Kimura [:emk]

Comment 11

•

8 years ago

mozreview-review

Comment on attachment 8880350 [details] Bug 562590 - Make incomplete byte sequences near HTML EOF emit a REPLACEMENT CHARACTER. https://reviewboard.mozilla.org/r/151716/#review159298

Attachment #8880350 - Flags: review?(VYV03354) → review+

Mozilla Autoland

Comment 12

•

8 years ago

We're sorry, Autoland could not rebase your commits for you automatically. Please manually rebase your commits and try again. hg error in cmd: hg rebase -s 678da39b4708 -d 1bc50cd63349: rebasing 405139:678da39b4708 "Bug 562590 - Make incomplete byte sequences near HTML EOF emit a REPLACEMENT CHARACTER. r=emk" (tip) merging testing/web-platform/meta/MANIFEST.json warning: conflicts while merging testing/web-platform/meta/MANIFEST.json! (edit, then use 'hg resolve --mark') unresolved conflicts (see hg resolve, then hg rebase --continue)

Comment hidden (mozreview-request)

Pulsebot

Comment 14

•

8 years ago

Pushed by hsivonen@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f0a1809a7ee1 Make incomplete byte sequences near HTML EOF emit a REPLACEMENT CHARACTER. r=emk

Carsten Book [:Tomcat]

Comment 15

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f0a1809a7ee1

Status: ASSIGNED → RESOLVED

Closed: 8 years ago

status-firefox56: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla56

You need to log in before you can comment on or make changes to this bug.