Closed Bug 174351 Opened 22 years ago Closed 15 years ago

Encoding errors aren't treated as fatal XML errors

Categories

(Core :: XML, defect, P3)

Tracking

RESOLVED FIXED
mozilla1.9.2a1

People

(Reporter: hsivonen, Assigned: m_kato)


Attachments

(1 file, 8 obsolete files)

Build ID: 2002100108

Reproducible: Always.

Steps to reproduce:
1) Load http://bugzilla.mozilla.org/buglist.cgi?bug_id=136650&format=rdf

Actual results:
The document contains a byte 0xC4. This byte shows up as a question mark.
However, the byte is surrounded by bytes whose most significant bit is zero.
Such a byte sequence is not legal UTF-8. Neither the document nor the HTTP
headers declare a character encoding and there's no BOM, so the document should
be treated as UTF-8-encoded.
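To illustrate the structural rule being violated: a lead byte such as 0xC4 opens a multi-byte sequence and must be followed by continuation bytes of the form 10xxxxxx. A simplified validity check (it deliberately ignores overlong forms and surrogates, so it is an illustration, not a complete validator) might look like:

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Simplified UTF-8 structural check: verifies lead/continuation byte
// patterns only. It does NOT reject overlong forms or surrogates.
bool IsValidUtf8(const std::string& bytes) {
  size_t i = 0;
  while (i < bytes.size()) {
    uint8_t b = static_cast<uint8_t>(bytes[i]);
    size_t len;
    if (b < 0x80)                len = 1;  // ASCII
    else if ((b & 0xE0) == 0xC0) len = 2;  // 110xxxxx: 2-byte lead (e.g. 0xC4)
    else if ((b & 0xF0) == 0xE0) len = 3;  // 1110xxxx: 3-byte lead
    else if ((b & 0xF8) == 0xF0) len = 4;  // 11110xxx: 4-byte lead
    else return false;                     // stray continuation byte
    if (i + len > bytes.size()) return false;  // truncated sequence
    for (size_t j = 1; j < len; ++j)
      if ((static_cast<uint8_t>(bytes[i + j]) & 0xC0) != 0x80)
        return false;  // lead byte not followed by a continuation byte
    i += len;
  }
  return true;
}
```

On the document in question, the lone 0xC4 between ASCII bytes fails the continuation check, which is exactly the fatal-error condition the spec quote below describes.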

The XML spec (2nd ed) says:
"It is a fatal error if an XML entity is determined (via default, encoding
declaration, or higher-level protocol) to be in a certain encoding but contains
octet sequences that are not legal in that encoding. It is also a fatal error if
an XML entity contains no encoding declaration and its content is not legal
UTF-8 or UTF-16."

Mozilla is letting a fatal XML error pass, which is bad since enforcing the
well-formedness constraints is important in order to keep XML from degenerating
into tag soup. Presumably the data goes through a lenient encoding converter
before reaching the XML parser.

Expected results:
Expected Mozilla to display an error report page similar to those displayed when
there's a tag mismatch.

Additional information:
Bug 174340 filed about Bugzilla serving the badly encoded document.
Yeah... we may need to conditionally instantiate a fault-intolerant nsScanner...
OS: MacOS X → All
Hardware: Macintosh → All
QA Contact: petersen → rakeshmishra
Blocks: 182751
Target Milestone: --- → mozilla1.4beta
Target Milestone: mozilla1.4beta → ---
QA Contact: rakeshmishra → ashishbhatt
No longer blocks: 182751
Summary: Fatal UTF-8 errors aren't treated as fatal errors → Fatal UTF-8 errors aren't treated as fatal XML errors
*** Bug 230821 has been marked as a duplicate of this bug. ***
Summary: Fatal UTF-8 errors aren't treated as fatal XML errors → Encoding errors aren't treated as fatal XML errors
Flags: blocking1.7a+
Flags: blocking1.6+
Flags: blocking1.4.2+
only drivers set blocking flags
Flags: blocking1.7a+
Flags: blocking1.6+
Flags: blocking1.4.2+
The converters now insert a placeholder character when the input contains a
nonsensical byte. This bug could be fixed in a somewhat dirty way by using a
placeholder character that is illegal in XML.
> The converters now insert a placeholder character when the input contains a
> nonsensical byte.

They do?  They shouldn't be....  It's the responsibility of the converter's
caller to deal with nonsensical bytes; in the case of HTML and XML parsing, see
http://lxr.mozilla.org/seamonkey/source/htmlparser/src/nsScanner.cpp#364, I believe.
The problem, of course, is that the scanner is instantiated before we have any
idea whether we're parsing XML or HTML or what.
>> The converters now insert a placeholder character when the input contains a
>> nonsensical byte.
>
>They do?

Yes, often. All the table-based converters pad the tables with U+FFFD
REPLACEMENT CHARACTER, and silently convert undefined bytes to it, and plenty of
places in the algorithmic converters insert the same character without returning
failure.
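A minimal sketch of the two policies at issue, silent substitution versus signaling an error to the caller (the enum names echo the kOnError_* constants discussed later in this bug, but this is illustrative code, not the actual Mozilla converters):

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Illustrative sketch, not the real converters: contrast the "recover"
// policy (emit U+FFFD and continue) with the "signal" policy (stop at the
// first undecodable byte so the caller can treat it as fatal).
enum ErrBehavior { kOnError_Recover, kOnError_Signal };

// Toy decoder: treats any byte >= 0x80 as undecodable. Returns false only
// when kOnError_Signal is set and an undecodable byte is hit; aOut holds
// whatever was decoded up to that point.
bool Decode(const std::vector<uint8_t>& aSrc, ErrBehavior aBehavior,
            std::vector<char16_t>* aOut) {
  for (uint8_t b : aSrc) {
    if (b < 0x80) {
      aOut->push_back(b);
    } else if (aBehavior == kOnError_Recover) {
      aOut->push_back(0xFFFD);  // silent substitution, as described above
    } else {
      return false;  // signal the error; caller decides it is fatal
    }
  }
  return true;
}
```

With kOnError_Recover the caller never learns an error occurred, which is why an XML parser downstream of such a converter cannot raise the fatal error the spec requires.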

>They shouldn't be....

File bugs. :)
In any case, the UTF-8 converter doesn't do that, which is the part most
relevant to XML...

I've filed bug 231267
smontagu: Would changing 0xFFFD to 0x001A (SUBSTITUTE) in the converters and the
scanner have ill effects on other parts of Mozilla?
Fixing nsScanner::Append's handling of the error case when the parser context's
doctype is eXML should fix the UTF-8 problem.
BTW, making the scanner know whether it's HTML or XML isn't hard (I just did it
in my tree attempting to fix this).  But I don't see an obvious way to go from
nsScanner::Append to nsExpatDriver::HandleError.
Blocks: 234628
No longer blocks: 234628
Keywords: helpwanted
*** Bug 267189 has been marked as a duplicate of this bug. ***
We really should fix this since we're creating interoperability problems by allowing it.
Flags: wanted1.9.1+
Peter, do you think you'd have cycles for this? If not let me know. I doubt Heikki is planning to work on this any time soon :)
Assignee: hjtoi-bugzilla → peterv
I'll take a look at this.
Status: NEW → ASSIGNED
Priority: -- → P3
QA Contact: ashishbhatt → xml
Target Milestone: --- → mozilla1.9.1
Attached patch a patch for xml error detection (obsolete) — Splinter Review
Assignee: peterv → m_kato
Status: ASSIGNED → NEW
Status: NEW → ASSIGNED
Peter, any updates on this?
Schrep: Makoto Kato took this one.
Makoto Kato: are you still working on this?
(In reply to comment #19)
> Schrep: Makoto Kato took this one.
> Makoto Kato: are you still working on this?
> 

I will submit new patch and send review today.
Attachment #327210 - Attachment is obsolete: true
Attachment #331104 - Flags: superreview?(mrbkap)
Attachment #331104 - Flags: review?(mrbkap)
Error reporting doesn't really work with this patch: it points to the wrong position in the source. You would need to add code to HandleError to get the correct position in the file from the scanner. Another, simpler approach would be to allow consumers of the encoders to set the replacement character they want to use, and then the nsExpatDriver could set it to an invalid XML character (0xFFFF) through aParserContext.mScanner in WillBuildModel. That way Expat will catch the error and the error message would point to the right position. I think the only downside here is that the error message would be "not well-formed", which isn't technically correct. XML_ERROR_INCORRECT_ENCODING is the wrong error code to use in any case.
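The 0xFFFF trick works because XML 1.0's Char production excludes U+FFFE and U+FFFF while permitting the usual replacement character U+FFFD, so Expat itself will flag the substituted character as not well-formed. A minimal check of that production (illustrative only, not the actual Expat code):

```cpp
#include <cassert>
#include <cstdint>

// XML 1.0 "Char" production:
//   Char ::= #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD]
//            | [#x10000-#x10FFFF]
// U+FFFD (the converters' usual replacement) is legal, while U+FFFE and
// U+FFFF are not, so substituting 0xFFFF forces a well-formedness error.
bool IsXmlChar(uint32_t c) {
  return c == 0x9 || c == 0xA || c == 0xD ||
         (c >= 0x20 && c <= 0xD7FF) ||
         (c >= 0xE000 && c <= 0xFFFD) ||
         (c >= 0x10000 && c <= 0x10FFFF);
}
```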
Attached patch patch v3 (obsolete) — Splinter Review
Attachment #331104 - Attachment is obsolete: true
Attachment #331314 - Flags: review?(peterv)
Attachment #331104 - Flags: superreview?(mrbkap)
Attachment #331104 - Flags: review?(mrbkap)
Peter, could you review my patch v3?
Attachment #331314 - Flags: superreview?(peterv)
Comment on attachment 331314 [details] [diff] [review]
patch v3

I don't think we want to change the replacement character in the scanner from 0xFFFD to 0xFFFF, that would also affect HTML. As I mentioned before (in comment 22), I think there should be a way for the nsExpatDriver to set the error character through the scanner. I also think the scanner should then set the error character on the encoder, but I'm not a peer for the I18N library so you'd have to check with Simon Montagu or Jungshik Shin if that's the right thing to do.
Attachment #331314 - Flags: superreview?(peterv)
Attachment #331314 - Flags: review?(peterv)
Attachment #331314 - Flags: review-
Attached patch patch v4 (obsolete) — Splinter Review
Attachment #331314 - Attachment is obsolete: true
Attachment #332357 - Flags: review?(smontagu)
Note that v4 doesn't do what I proposed either.
Comment on attachment 332357 [details] [diff] [review]
patch v4

The changes to nsJapaneseToUnicode.cpp regress bug 108136.

> NS_IMETHODIMP nsTableDecoderSupport::ConvertNoBuff(const char * aSrc, 
>                                                    PRInt32 * aSrcLength, 
>                                                    PRUnichar * aDest, 
>                                                    PRInt32 * aDestLength)
> {
>   return nsUnicodeDecodeHelper::ConvertByTable(aSrc, aSrcLength,
>                                                aDest, aDestLength,
>                                                mScanClass, 
>-                                               mShiftInTable, mMappingTable);
>+                                               mShiftInTable, mMappingTable,
>+                                               mErrBehavior == kOnError_Signal ? PR_TRUE : PR_FALSE);
> }

"? PR_TRUE : PR_FALSE" is redundant here and elsewhere

> nsresult nsUnicodeDecodeHelper::ConvertByFastTable(

>+  for (; src<srcEnd;) {
>+    *dest = aFastTable[*src++];
>+    if (*dest == 0xfffb && aErrorSignal) {

Typo for 0xfffd?
Attachment #332357 - Flags: review?(smontagu) → review-
Attached patch patch v5 (obsolete) — Splinter Review
Attachment #332357 - Attachment is obsolete: true
Attachment #335358 - Flags: review?(smontagu)
Attachment #335358 - Flags: review?(smontagu) → review-
Comment on attachment 335358 [details] [diff] [review]
patch v5

I'm sorry: I made a mistake in my earlier comment "The changes to nsJapaneseToUnicode.cpp regress bug 108136." I should have said that they regressed bug 116882.
Attached patch patch v6 (obsolete) — Splinter Review
Attachment #335358 - Attachment is obsolete: true
Attachment #337431 - Flags: review?(smontagu)
Attachment #337431 - Flags: review?(smontagu) → review+
Attachment #337431 - Flags: superreview?(peterv)
Blocks: 289947
This should probably be a separate bug, but I wanted to report it here just in case there is some reason this case should not be triggering an XML Parsing Error message.

With this patch applied it is no longer possible to view the following URL:

http://wiki.mozilla-russia.org/index.php/Mozilla%20Firefox

Attempting to view this page results in the rather ugly, non-user-friendly XML Parsing Error message:

XML Parsing Error: not well-formed
Location: http://wiki.mozilla-russia.org/index.php/Mozilla%20Firefox
Line Number 26, Column 699:

An error message in this form should never be presented to a user because of errors in content. For chrome errors I guess it is OK, but for content errors, if browser.xul.error_pages.enabled is true, a XUL error page should be displayed.

I think we are probably correct in following the spec in not displaying the page in this case, because the validator at http://validator.w3.org/ refuses to attempt to validate the page because of this parsing error. It displays:

Sorry, I am unable to validate this document because on line 26  it contained one or more bytes that I cannot interpret as utf-8  (in other words, the bytes found are not valid values in the specified Character Encoding). Please check both the content of the file and the character encoding indication.

The error was: utf8 "\xD0" does not map to Unicode 

The issue with this particular error is that this page currently displays with the latest release version of every browser I have tried despite this error.  It is all well and good to say we are the only ones following the spec correctly, but we all know that argument is relatively worthless in the real world.

The other thing that is an issue about this particular case is that this is a wiki, so you can't just file a tech evangelism bug and get the page fixed on the server. Unless the wiki software is doing proper character encoding tests when people post, new pages will just be added that trigger this issue.

So, I think what is needed is more user-friendly XUL error pages for all XML parsing errors that occur in content and, for this error in particular, the page needs a "Try to Display Anyway" button that tries to parse and display the page using the previous, more lenient rules.
IE treats this as not well-formed (not for XHTML, which it lacks support for, but for generic XML and feeds), as does WebKit trunk (as part of the Acid3 fixes). As someone who primarily uses WebKit, I've not seen anything break as a result.

(In reply to comment #32)
> The other thing that is an issue about this particular case is that this is a
> wiki, so you can't just file a tech evangelism bug and get the page fixed on
> the server. Unless the wiki software is doing proper character encoding tests
> when people post, new pages will just be added that trigger this issue.

It is perfectly possible to write code that deals with it. The wiki, if it is going to output XHTML, should deal with it. If it can't be bothered checking everything in great detail, it should just use HTML (if it does use XHTML, it should probably use a serializer and thus not have to worry about such issues). Even if this is not checked, you'd then have to make sure you didn't have things like U+FFFF (a permanent noncharacter, equally breaking a well-formedness constraint).
Bill, the wiki in question is going out of its way to send different markup to IE and Firefox, and the markup it sends to Firefox is buggy.  It should just stop doing that (or fix its bug).

There are existing bugs on improving the error one gets on invalid XML.
Comment on attachment 337431 [details] [diff] [review]
patch v6

Doesn't do what I proposed, but more importantly:

>+      if (aScanner.HasInvalidCharacter())
>+        length = aScanner.FirstInvalidCharacterPosition();

is wrong given

>+      PRInt32 FirstInvalidCharacterPosition()
>+      {
>+        return Distance(mCurrentPosition, mFirstInvalidPosition);
>+      }

mFirstInvalidPosition can be in a different buffer from mCurrentPosition; length must only be the length of the buffer or less.
Attachment #337431 - Flags: superreview?(peterv) → superreview-
Attached patch patch v7 (obsolete) — Splinter Review
Attachment #337431 - Attachment is obsolete: true
At the first call of nsScanner::Append(), we don't know whether the current markup is HTML or XML.
If we set the invalid character (0xffff) for XML, we need to patch the current buffer (or reload?).
Kato, I think you should re-request review for this.
Attachment #344880 - Flags: superreview?(peterv)
Comment on attachment 344880 [details] [diff] [review]
patch v7

Peter, does this patch address your concerns?
Comment on attachment 344880 [details] [diff] [review]
patch v7

>diff --git a/intl/uconv/public/nsIUnicodeDecoder.h b/intl/uconv/public/nsIUnicodeDecoder.h

>+  NS_IMETHOD SetInputErrorBehavior(PRInt32 aBehavior) = 0;

This should be just virtual void.

>+  NS_IMETHOD GetCharacterForUnMapped(PRUnichar *aMap) = 0;

This should return PRUnichar.

>diff --git a/intl/uconv/ucvja/nsJapaneseToUnicode.cpp b/intl/uconv/ucvja/nsJapaneseToUnicode.cpp

>@@ -210,6 +221,10 @@
>    }
>    *aDestLen = dest - aDest;
>    return NS_OK;
>+error_invalidchar:
>+   *aDestLen = dest-aDest;

Spaces around -

>@@ -303,6 +325,8 @@

>                  // others 
>+                if (mErrBehavior == kOnError_Signal)
>+                  goto error_invalidchar;

Spacing is off.

>@@ -394,6 +426,10 @@
>    }
>    *aDestLen = dest - aDest;
>    return NS_OK;
>+error_invalidchar:
>+   *aDestLen = dest-aDest;

Spaces around -

>diff --git a/parser/htmlparser/src/nsScanner.cpp b/parser/htmlparser/src/nsScanner.cpp

>+        // if we failed, we consume one byte, replace it with unmapped

... with the unmapped

>+    while (--aErrorPos >= 0)
>+      ++mFirstInvalidPosition;

mFirstInvalidPosition.advance(aErrorPos)?

>+void nsScanner::UseDecoderUnmappedChar(PRBool aUseDecoderChar)

>+    if (mHasInvalidCharacter)
>+      ReplaceCharacter(mFirstInvalidPosition, 0xffff);
>+    mHasInvalidCharacter = PR_FALSE;

I'm not sure why this gets set to PR_FALSE again?

>diff --git a/parser/htmlparser/src/nsScanner.h b/parser/htmlparser/src/nsScanner.h

>+      PRPackedBool    mUseDecoderUnmappedChar;

Would it make more sense to make this PRUnichar mUnmappedCharOverride, with default value 0x0, set it to 0xFFFF for XML?
Attachment #344880 - Flags: superreview?(peterv) → superreview-
Attachment #361753 - Flags: superreview?
Attachment #361753 - Flags: superreview? → superreview?(peterv)
Comment on attachment 361753 [details] [diff] [review]
Patch v8 - addressing sr comments

>diff --git a/intl/uconv/public/nsIUnicodeDecoder.h b/intl/uconv/public/nsIUnicodeDecoder.h

>+  /**
>+   * Specify what to do when a character cannot be mapped into unicode
>+   */
>+  virtual void SetInputErrorBehavior(PRInt32 aBehavior) = 0;

Need to document the values (refer to the enum or something).

>+
>+  /**
>+   * return the UNICODE charater for unmapped character

chara*c*ter

>diff --git a/intl/uconv/util/nsUCSupport.cpp b/intl/uconv/util/nsUCSupport.cpp

>@@ -138,16 +151,23 @@ NS_IMETHODIMP nsBufferDecoderSupport::Co

>+    if (res == NS_ERROR_ILLEGAL_INPUT) {
>+      if (mErrBehavior == kOnError_Signal) {

Merge those ifs.

>+        break;
>+      }
>+    }

>diff --git a/parser/htmlparser/src/nsParser.cpp b/parser/htmlparser/src/nsParser.cpp

>+    mParserContext->mScanner->OverrideReplacementCharacter(0xffff);

Please move this into nsExpatDriver::WillBuildModel instead.

>diff --git a/parser/htmlparser/src/nsScanner.cpp b/parser/htmlparser/src/nsScanner.cpp

>@@ -179,18 +183,30 @@ nsresult nsScanner::SetDocumentCharset(c

>+     mUnicodeDecoder->SetInputErrorBehavior(nsIUnicodeDecoder::kOnError_Signal);
>+     if (mReplacementCharacter != 0xffff)

I think this should be |mReplacementCharacter == 0xfffd|

>+        // 0xffff is invalid character for XML (expat)
>+        if (mReplacementCharacter != 0xffff) {

Same here, == 0xfffd (and drop the comment).

>+          if (errorPos == -1)

and merge the if's

>+            errorPos = totalChars;
>+        }

>diff --git a/parser/htmlparser/src/nsScanner.h b/parser/htmlparser/src/nsScanner.h

>+       * Override replacement character used by IUnicodeDecoder.
>+       * Default behavior is that it uses IUnicodeDecoder's mapping.

nsIUnicodeDecoder

>+      nsScannerIterator            mFirstInvalidPosition; // The position we detects invalid character

The position of the first invalid character that was detected
Attachment #361753 - Flags: superreview?(peterv)
Attachment #361753 - Flags: superreview+
Attachment #361753 - Flags: review+
(In reply to comment #42)
> I think this should be |mReplacementCharacter == 0xfffd|

Hmm, I guess that won't work. It's a bit annoying to hardcode this 0xffff in so many places. That's why I proposed 0x0 as default, you could then just check for == 0x0 where you currently check != 0xffff and instead of setting mReplacementCharacter to GetCharacterForUnMapped just call GetCharacterForUnMapped in the one place that it's read, making

        unichars[unicharLength++] = mReplacementCharacter;

into something like

        unichars[unicharLength++] = mReplacementCharacter == 0x0 ?
                                    mUnicodeDecoder->GetCharacterForUnMapped() :
                                    mReplacementCharacter;

I don't think we should be worried about the overhead of repeatedly calling GetCharacterForUnMapped in case of multiple errors.
http://hg.mozilla.org/mozilla-central/rev/5953efc48779
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Attachment #344880 - Attachment is obsolete: true
Attachment #361753 - Attachment is obsolete: true
Keywords: helpwanted
Target Milestone: mozilla1.9.1 → mozilla1.9.2a1
Flags: wanted1.9.1+ → wanted1.9.1-
Sayrer, why did you change this to wanted1.9.1-?
(In reply to comment #46)
> Sayrer, why did you change this to wanted1.9.1-?

It's going to break a bunch of pages and feeds. Seems a little late for the 3.1 branch, don't you think?
That said, both Safari 4.0 and Opera 10.0 will exhibit this behavior as part of their Acid3 compliance.

What's the argument for excluding this while 3.1 is still in beta?

It would certainly be interesting to see the real world metrics of what would break.
We have the fix on trunk and it will be in the next release. Getting one more point in Acid3 for 3.1 is not important.
Fair enough :)

I know Acid3 isn't important by itself :p - the fact that Safari, Opera, and Chrome will or do exhibit this behavior is surely relevant though?
Unless I'm missing something, IE has been doing this properly since XML was introduced.
Julian, IE doesn't get application/xhtml+xml content (which is possibly mis-encoded) served to it, so its situation is quite different.
And just to make it more clear, this is likely to break at least some pages.  Maybe they're already broken in Safari and Opera.  Maybe they browser-sniff.  But I can pretty much guarantee that the number of pages that used to work and now won't will be nonzero.  That's fine, but I don't think we want to be taking that sort of change this late in the cycle...  _if_ this were to land in b3 I _might_ be ok with that.  Anything later than that is unacceptable.  My gut feeling is that this is not exactly urgent given the history of the bug, and a 6-month lag in getting the fix out won't kill anyone.
I assume that the last few comments are using "pages break" in the sense "broken pages used to be unreadable, but now aren't displayed at all"?

FWIW, we already know of at least one page that breaks in that sense: http://wiki.mozilla-russia.org/index.php/Mozilla%20Firefox referenced in comment 32.
> "broken pages used to be unreadable, but now aren't displayed at all"?

Or "page was readable except for one single mis-encoded character but now isn't displayed at all".
The reality is that errors will only get fixed in most sites when content breaks in the browser, not beforehand.

In that regard, it makes no real difference if it is now or six months down the line.

For that reason, if it were my choice, I would seek to get this committed for Beta 3.

If standards compliance is claimed in markup, it should be held to that contract.

Leaving this just prolongs interoperability issues and, as new content is created, just exacerbates the issue.
(In reply to comment #57)
> The reality is that errors will only get fixed in most sites when content
> breaks in the browser, not before hand.
> 
> In that regard, it makes no real difference if it is now or six months down the
> line.

Don't underestimate the power of nightly users and beta users to notify sites to get their content fixed.  A few extra months of that can make a huge difference in web compatibility.

(As for me, I think this would have been good in the previous beta, but I think now's really too late to do it for this release cycle.  Standards compatibility is a long-haul process; not every win has to be made NOW, for THIS RELEASE, to slightly overemphasize some people's demands a little.)
Nick, we have people whose job it is to contact sites that break (which we find out about from our nightly users) and get them to fix their stuff.  It's a lot easier for that to work if the sites have as much lead time as possible between things changing on our end and us shipping.

I'm basically with Jeff on the scheduling here.
It's also going to affect us more than, say, Safari, because they ignore these errors in their feed parser. The current arrangement allowed us to get a lot of XML edge cases right that Safari doesn't but IE does, and also avoid a lot of fatal errors that come down to "smart quotes" and things.

I'm a little annoyed that this is going to make a bunch of feed work for me when we don't get a lot of parsing bugs reported there as it stands.
Now with a Beta4 planned, might this Bug be a case of:
"- complete: no follow up fixes or initial implementations that commit us to
extending the schedule further
 - proven: patches with tests, security reviews and performance impact
analysis, baked on trunk for a good amount of time
 - valuable: we need to understand the benefit of taking the change, and why
it's needed for 3.1 instead of later
 - removable: backout-ready, if anything goes wrong"

Or still no way for 1.9.1?
I would argue that this is not 1.9.1 material, even given more beta cycles. I'd like this to be baked on the trunks for many months before even considering this for a final release, which would mean it'll be in 1.9.2 (or whatever the number of the next release will be).
I think this also needs a more user-friendly (or at least better-looking) display of the error message, now that this error can be triggered by content rather than just chrome errors. Perhaps I will file a followup bug on that.
Depends on: 498607
Depends on: 545061
Depends on: 651074
You need to log in before you can comment on or make changes to this bug.