Closed
Bug 197342
Opened 21 years ago
Closed 21 years ago
attributes are serialized with double quote even when value contains double quotes
Categories
(Core :: DOM: Serializers, defect)
Core
DOM: Serializers
Tracking
()
RESOLVED
FIXED
People
(Reporter: bmo, Assigned: bmo)
Details
Attachments
(2 files, 5 obsolete files)
1.95 KB,
text/html
|
Details | |
4.04 KB,
patch
|
Details | Diff | Splinter Review |
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3) Gecko/20030312 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.0; en-US; rv:1.3) Gecko/20030312 In nsXMLContentSerializer::SerializeAttr, all attributes are output using double quote delimiters, however in the HTML spec, both single quote and double quote delimiters are valid. This simple output of all attributes surrounded by double quotes results in incorrect HTML when the attribute value contains double quotes. For instance, if the original source specified the attribute as: <a onclick='goToLocation("new");'>new</a> When we run through the DOM tree and serialize this tag it will be written out as: <a onclick="goToLocation("new");">new</a> Which is invalid HTML and will cause javascript errors and parsing errors in anything that looks at this HTML in the future. Reproducible: Always Steps to Reproduce:
I'm not sure that this is the best possible solution to the problem, but it is infinitely better then the current state of the code and seems to fix the error for all html that I've seen so far. lxr link for this file is: http://lxr.mozilla.org/seamonkey/source/content/base/src/nsXMLContentSerializer.cpp#403
Comment 3•21 years ago
|
||
What about an attribute whose value contains both types of quotes? eg: <input value="I'm not "happy""> How about we do this right and just escape the '"' chars in the value?
In your example, this value will be represented in memory as [I'm not "happy"] right? When I wrote the fix I wasn't sure if we could have this because I knew we couldn't have both types of unescaped quotes in the value in the HTML, I forgot about escaped quotes. Escaping the quotes won't cause any problems in javascript or elsewhere because the HTML parser should always unescape the string before using them, right? We could scan the value string first to determine if we have only 1 type of quote, and if so just pick the correct value delimiter, otherwise if we have both types then go ahead and replace one of the types with the " representation. This would mean we save a bit of buffer manipulation if we only have a single quote type in the value. i.e. scan value string, count all single & double quote chars if single quote count = 0 or double quote count = 0 set quote type to count = 0 type else set quote type to double create new value string by escaping all double quotes output value string I can create a new patch like this if you agree.
Comment 5•21 years ago
|
||
> this value will be represented in memory as [I'm not "happy"] right?
It should be. You could check (with your patch, of course). ;)
Your approach sounds good.
In the calling function nsHTMLContentSerializer::SerializeAttributes (see http://lxr.mozilla.org/seamonkey/source/content/base/src/nsHTMLContentSerializer.cpp#406) the value of aDoEscapeEntities is set to PR_FALSE for attribute values containing javascript. Any idea why this would be? Any reason why we can't just set this to PR_TRUE for all attributes including JS?
Comment 7•21 years ago
|
||
If you click the "Blame" link in the top-right corner of that page, it gives you a listing of the file with each line of code "blamed" to the checkin that last touched it... In this case, see bug 68167 and bug 92271
The problem discussed in bug 68167 seems to be more indesirable formatting than an error in the HTML. It appears that this bug was that apostrophe entity references were used unnecessarily. The following sample works perfectly in Moz 1.3: <a href=" javascript: alert('href'); alert('quoting problem'); alert("string"); ">blah</a> Bug 92271 isn't directly related to this problem as it is about URL escaping. According to the W3 spec, it is recommended for attribute values containing quotes and ampersands to have them turned into character entities references. http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.3.2.2 [partial quote] When script or style data is the value of an attribute (either style or the intrinsic event attributes), authors should escape occurrences of the delimiting single or double quotation mark within the value according to the script or style language convention. Authors should also escape occurrences of "&" if the "&" is not meant to be the beginning of a character reference. [endquote] The main problem at the moment seems to be where to introduce the fix. Is it valid to modify nsXMLContentSerializer::SerializeAttr and use a character entity for a quote when necessary, thus ignoring the value of aDoEscapeEntities?
Comment 9•21 years ago
|
||
ccing brade, seeing as she added this code....
Assignee | ||
Comment 10•21 years ago
|
||
I've attached an updated patch. This follows the following pseudocode... if ( translating to entities ) { add unmodified string (use current code) } else { determine which quote characters exist in the string (apos, quot) select a value delimiter character (see table in attached patch) if both apos and quot are used then { translate quot chars to entities add modified string } else { add unmodified string } } Comments on design?
Attachment #117189 -
Attachment is obsolete: true
Comment 11•21 years ago
|
||
To Brodie, btw... ;)
Assignee: harishd → bugzilla
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 12•21 years ago
|
||
The design seems pretty reasonable to me... One comment on the code -- NS_LL is not exactly portable as an nsAStrin; if you need literal strings use NS_LITERAL_STRING, if you need to convert char to unichar use PRUnichar(char), in the |case| statements you can just use chars.
Comment 13•21 years ago
|
||
my only suggestion is to do some testing in Composer and make sure things still roundtrip correctly (see testcases in various bugs including bug 68167)
Assignee | ||
Comment 14•21 years ago
|
||
The line that adds the '=' sign had been mistakenly deleted at the time I made the last patch. This fixes that problem.
Attachment #117455 -
Attachment is obsolete: true
Assignee | ||
Comment 15•21 years ago
|
||
Testcase to check this bug and the results of this patch. Using composer, open the file and switch to html view. The changes to the html source as detailed in the paragraph should have occurred. Brade: Note that this does not provide true roundtrip to the original source. Instead the code is simplified and made consistent while still being valid attributes according to the standard. For example, what won't roundtrip is: <a href="alert("Hello")">hello</a> instead, you will get the simpler but still valid version: <a href='alert("Hello")'>hello</a> The only way that I can see to get real roundtrip is to store the actual string from the source file at parse time (along with the delimiter type). Boris: I've changed the code to eliminate NS_LL where possible. The only place that I couldn't find a replacement was the call to nsString::ReplaceString(). This requires an nsString or PRUnichar* parameter. NS_LITERAL_STRING doesn't provide a compatible type for this. I'm not sure what you mean about the case statements - it is comparisons between PRUnichar types here not char, I still need to cast as far as I understand.
Comment 16•21 years ago
|
||
> This requires an nsString or PRUnichar* parameter.
Yeah, but NS_LL is neither of those, on Linux. In fact, it's a no-op (unless
you're building with GCC 3.x and have --fshort-wchar enabled). You can get a
PRUnichar* out of NS_LITERAL_STRING by using .get() like:
NS_LITERAL_STRING("\"").get()
Assignee | ||
Comment 17•21 years ago
|
||
I have built and tested this patch in the composer on Windows 2000 and Redhat Linux 2.2.16-22 without a problem (changes to code on round-trip as described in the testcase). Could I get review and s-review please.
Attachment #117561 -
Attachment is obsolete: true
Attachment #118387 -
Flags: superreview?
Attachment #118387 -
Flags: review?
Attachment #118387 -
Flags: superreview?(kin)
Attachment #118387 -
Flags: superreview?
Attachment #118387 -
Flags: review?(akkana)
Attachment #118387 -
Flags: review?
Assignee | ||
Comment 18•21 years ago
|
||
Since the 1.4 version of composer is going to have a few extra features, and 1.4 is trying to be a stable and fairly bugfree platform for development, it would be good to get this fix into the 1.4 version as well since this is fixing problems in the composer. Nominating this for 1.4b. This requires reviews and a checkin.
Flags: blocking1.4b?
Comment 19•21 years ago
|
||
Comment on attachment 118387 [details] [diff] [review] patch with requested changes >+ // depending on if the attribute value contains quotes or apostrophes we "depending on whether" r=bzbarsky; please mail kin@netscape.com in person and prod lightly for review?
Attachment #118387 -
Flags: review?(akkana) → review+
Attachment #118387 -
Flags: superreview?(kin) → superreview?(heikki)
Comment on attachment 118387 [details] [diff] [review] patch with requested changes >Index: nsXMLContentSerializer.cpp >=================================================================== >- AppendToString(NS_LITERAL_STRING("\""), aStr); >+ AppendToString(NS_LITERAL_STRING("\""), aStr); There is a version of AppendToString that takes PRUnichar, so you could do: + AppendToString(PRUnichar('"'), aStr); >+ // character entity references, ignoring the value of aDoEscapeEntities. >+ // http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.3.2.2 for >+ // the standard on character entity references in values etc. Missing something above? Should the URL line start with "See "? >+ // Delimiter and escaping is according to the following table >+ // bIncludesDouble bIncludesSingle Delimiter Escape Quotes >+ // FALSE FALSE " FALSE >+ // FALSE TRUE " FALSE >+ // TRUE FALSE ' FALSE >+ // TRUE TRUE " TRUE Reading this comment I wondered why you escape both single and double quotes in the last case, but reading the code that was not the case. So maybe you should change the column heading to be "Escape Double Quotes". You might also want to move this table further down, where the actual escaping happens. >+ for (;iCurr != iEnd; ++iCurr) { >+ switch (*iCurr) { >+ case PRUnichar('\''): >+ bIncludesSingle = PR_TRUE; >+ if (bIncludesDouble) { >+ // no point searching further if we have both >+ iCurr = iEnd; >+ --iCurr; // for loop will increment before test >+ } >+ break; >+ case PRUnichar('\"'): Do you need to escape the double quote like that inside the apostrophes? >+ bIncludesDouble = PR_TRUE; >+ if (bIncludesSingle) { >+ iCurr = iEnd; >+ --iCurr; >+ } >+ } >+ } I would recomment you change the switch statement to if statement. Then you could simply break when you have found that the string contains both single and double quote, and save the manipulations with iCurr that is not used after that point. This is probably not perf critical piece of code but if it was that loop with iterators could be written slightly more efficiently by following the string guide: http://www.mozilla.org/projects/xpcom/string-guide.html#Looping_Iterators >+ PRUnichar cDelimiter = >+ (bIncludesDouble && !bIncludesSingle) ? PRUnichar('\'') : PRUnichar('\"'); Again is it necessary to escape the double quote inside single quotes? Nothing that you absolutely must fix before checking in, but I would appreciate it if you could make those changes.
Attachment #118387 -
Flags: superreview?(heikki) → superreview+
Assignee | ||
Comment 21•21 years ago
|
||
I've updated the patch to include the comments from Heikki Toivonen. All of the requested changes from comment 19 and comment 20 have been included. I have built and tested this on Win32 and Linux with todays daily trunk.
Attachment #118387 -
Attachment is obsolete: true
Comment 22•21 years ago
|
||
Brodie, please let me know if you need this checked in, ok?
Assignee | ||
Comment 23•21 years ago
|
||
I don't have CVS checkin privs, so I will need it checked in for me. Because I changed the string iterator loop substantially to do the faster looping, would you have a squiz at that before you check it in though. I am confident that it is correct (I basically cut and pasted the sample), but I would prefer someone else to have a look.
Comment 24•21 years ago
|
||
Checked in (yes, the iterator stuff looks correct). Thanks for the patch and your patience!
Status: NEW → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment 25•21 years ago
|
||
What was the reason for not doing the simpler "Always use double-quote as delimiter, escape double-quotes if they exist"? You'd save one pass over the attribute string that way, your code becomes a little simpler, your output has consistent attribute value delimiters[0], at the cost of slightly larger and a little less human readable xml source. [0] I don't have to look at the delimiter to know which of ' and " are safe to use directly, and (to me) it looks a little prettier to have consistent delimiters
Assignee | ||
Comment 26•21 years ago
|
||
There didn't seem any compelling reason to go in either direction. Granted it probably is a bit over-engineered, but the primary advantage of doing it this way is that we avoid shuffling the buffer around to insert the entity string if it isn't necessary. Neither this way or always using quote delimiters is the best solution for roundtrip editing. Like I said in comment 15, we can't roundtrip back to the quoting method used by the user because we never store that information. Consequently this best fit is as good as any.
Comment 27•21 years ago
|
||
clearing 1.4b blocking request, since marked as fixed.
Flags: blocking1.4b?
You need to log in
before you can comment on or make changes to this bug.
Description
•