Closed Bug 253954 Opened 21 years ago Closed 20 years ago

Not possible to let nsDOMParser detect charset from xml entity decl

Categories

(Core :: XML, defect)

x86
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: vlad, Assigned: vlad)

References

Details

(Keywords: fixed-aviary1.0)

Attachments

(1 file, 2 obsolete files)

The patch for bug 240717 that adds support for setting the charset has a bad interaction with my patch in bug 250119 -- the new ParseFromUTF8String call just creates a ByteArrayInputStream and passes that down, expecting the parser to figure out the charset from the decl. Now that charset is handled, it ends up getting forced to UTF8 and the charset in the declaration is ignored. ParseFromUTF8String is a bit of a misnomer, maybe ParseFromBinaryString would be better; however, would an acceptable fix be to change nsDOMParser::ParseFromStream such that it doesn't set a charset if an empty string is passed in for the charset (which ParseFromUTF8String would do)?
> Now that charset is handled, it ends up getting forced to UTF8 and the charset > in the declaration is ignored. That's correct, given that you're claiming to the parser that you have out-of-band information that the data is UTF8 (that's what passing "UTF-8" as the charset parameter to ParseFromStream means). I note that you changed the charset param from being "string" to being "UTF8String" but left the "null if not specified" language, by the way... that's probably wrong. > ParseFromUTF8String is a bit of a misnomer, maybe ParseFromBinaryString would > be better Then it needs a different signature in the IDL. Right now the implementation exactly matches the API. > such that it doesn't set a charset if an empty string is passed in for the > charset That's what already happens (an empty result from GetContentCharset is what you get if you never set the charset; if that happens, the parser will use the XML decl and so forth).
Attached patch domparser-parsebuffer-0.patch (obsolete) — Splinter Review
Here's an attempt, after the discussion on irc. Removed the FromUTF8String interface (I'm the only caller), added a FromBuffer interface that takes a byte array and a length. It doesn't enforce a charset, but instead lets the xml parser figure it out itself. The patch also removes the erroneous string->AUTF8String changes for charset and contentType I did before -- no need for those, that was a bit of overzealous de-string-ifying; also fixes C++ callers of these interfaces.
Assignee: hjtoi-bugzilla → vladimir
Status: NEW → ASSIGNED
Comment on attachment 155132 [details] [diff] [review] domparser-parsebuffer-0.patch >Index: xmlextras/base/public/nsIDOMParser.idl >+ * The buffer passed in as ACString is parsed into a DOM document. This isn't using ACString... >Index: xmlextras/base/src/nsDOMParser.cpp >+nsDOMParser::ParseFromBuffer(const PRUint8 *buf, nsnull, not NULL >+ return ParseFromStream(stream, NULL, bufLen, contentType, _retval); nsnull. > nsDOMParser::ParseFromStream(nsIInputStream *stream, This function never null-checks contentType. At least some uses of it (eg nsDependentCString on it) will have issues (like crash) if it's null... >Index: webservices/soap/src/nsSOAPMessage.cpp parser->ParseFromString(nsString(*kEmptySOAPDocStr[aVersion]).get(), I know you didn't write this code, but is there a reason it's not using nsDependentString?
Attachment #155132 - Flags: review?(bzbarsky) → review-
Attached patch domparser-parsebuffer-1.patch (obsolete) — Splinter Review
Attempt #2. I'm not sure whether you meant you wanted that line in SOAPMessage to use nsDependentString or not (looking at the code, I don't think it'll hurt, but strings scare me).. ? If so, I'll make that change.
Attachment #155132 - Attachment is obsolete: true
Comment on attachment 155144 [details] [diff] [review] domparser-parsebuffer-1.patch > I'm not sure whether you meant you wanted that line in SOAPMessage to > use nsDependentString or not If it's safe (that is, if that pointer is never null), then yes. >Index: xmlextras/base/src/nsDOMParser.cpp >+ if (streamBuf == nsnull) >+ return NS_ERROR_OUT_OF_MEMORY; Actually, need to set *_retval to null here too... >+ NS_ENSURE_ARG(contentType); Document the IDL to make it clear that the type must NOT be null. r=bzbarsky with those changes.
Attachment #155144 - Flags: review?(bzbarsky) → review+
Attached patch final patchSplinter Review
final patch, incorporating above comments
Attachment #155144 - Attachment is obsolete: true
Comment on attachment 155146 [details] [diff] [review] final patch (asking for approval-aviary because this is blocking livemark parsing for non-utf8 charsets)
Attachment #155146 - Flags: superreview?(shaver)
Attachment #155146 - Flags: approval-aviary?
(In reply to comment #5) > >+ if (streamBuf == nsnull) > >+ return NS_ERROR_OUT_OF_MEMORY; > > Actually, need to set *_retval to null here too... in theory, callers are not supposed to look at out params if a function fails...
Comment on attachment 155146 [details] [diff] [review] final patch I generally eschew NULLing *retval upon failure, but that _does_ seem to be the dominant style in this code. sr+a=shaver, please make it to aviary now, and the trunk after it opens for the 1.8a4 dev cycle.
Attachment #155146 - Flags: superreview?(shaver)
Attachment #155146 - Flags: superreview+
Attachment #155146 - Flags: approval-aviary?
Attachment #155146 - Flags: approval-aviary+
Hello, please remember that all aviary Gecko patches need to land on the 1.7 branch too. Thanks! Also, I notice that you're changing IDL files but you forgot to change the UUID. Please fix that everywhere. Thanks!
in on aviary (with changed IID); not on trunk yet. Is this something that really needs to go on the 1.7 branch?
Whiteboard: needs-trunk-checkin
Read my lips: No Gecko Forking.
Attachment #155146 - Flags: approval1.7.3? → approval1.7.3+
vlad: If you want any reviews from me in the future, yes this needs to go on 1.7 branch. You have a=, so I'm not sure why it's a problem to land it there.
(In reply to comment #13) > vlad: If you want any reviews from me in the future, yes this needs to go on 1.7 > branch. > > You have a=, so I'm not sure why it's a problem to land it there. It's not a problem at all (you'll note that I requested and got a= for 1.7 after my question, though.. I guess you can't note that because the dates aren't there :) -- I wasn't aware that firefox 1.0 and 1.7 platforms were to stay in sync. I'm aware of that now, and will check in as soon as I can. No panic! No panic! :)
in on aviary, 1.7, and trunk.
Status: ASSIGNED → RESOLVED
Closed: 20 years ago
Keywords: fixed-aviary1.0
Resolution: --- → FIXED
Whiteboard: needs-trunk-checkin
Note: this is causing issues for people who were using domparser in 1.7.[1-3]... See bug 267351.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: