Closed Bug 165686 Opened 23 years ago Closed 23 years ago

.innerHTML doesn't recover  

Categories

(Core :: DOM: Serializers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.3alpha

People

(Reporter: michael, Assigned: rbs)

References

()

Details

Attachments

(2 files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826 When selecting HTML containing escaped entities and chosing view selection source, these entities are displayed as they were in original source (correct). But: - This seems not to be true for quot and nbsp - unescaped html symbols are converted to escaped entities (amp, lt, gt...) just compare view source and view selection source to validate. Reproducible: Always Steps to Reproduce: 1. Open a HTML file containing unescaped HTML symbols (example URL) -or- 2. Open a HTML file containing escaped qout or nbsp Actual Results: In both cases View Source and View Selection Source differ. Expected Results: View Selection Source should behave exactly like View Source Using german OS (english Mozilla version)
to rbs.
Assignee: doron → rbs
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Windows XP → All
Hardware: PC → All
There might be yet another effect of the fact that view selection source is actually showing the DOM source in which " is already transformed to " (see also bug 164906). However, it is not clear to me why   isn't recovered. Let's ask akkana since the serialized source that is displayed comes from .innerHTML (i.e., nsGenericHTMLElement::GetInnerHTML which in turn calls nsIDocumentEncoder::EncodeToString(). Noted in passing that GetInnerHTML isn't setting the charset in its encoder). Simpler testcases to enter in the URL bar for investigation of this bug: data:text/html;charset=utf-8," data:text/html;charset=utf-8, 
-> updating title and passing on to akkana. As seen here the view selection source comes from .innerHTML: http://lxr.mozilla.org/seamonkey/source/xpfe/browser/resources/content/viewPartialSource.js#184 and this boils down to nsIDocumentEncoder::EncodeToString(). Reporter, it might be best to produce a testcase using a JS alert + .innerHTML to make a stronger case for your bug. I doubt that it is going to be possible to recover " but   might be recoverable.
Assignee: rbs → akkana
Component: ViewSource → DOM to Text Conversion
Summary: view selection source displays html-chars from the source as entitites → .innerHTML doesn't recover " and  
Serializer bug. ->owner of that component.
Assignee: akkana → harishd
QA Contact: pmac → sujay
Comment#2 seems to be right saying >>There might be yet another effect of the fact that view selection source is >>actually showing the DOM source in which " is already transformed to " >>.(see also bug 164906). View Selection source seems to be taking the modified DOM Tree where entities are properly replaced. While generating the source now it operates on modified entities like ", & and space. As there is no translation for " and ' '(space) we see it as "not recovering" quotes and nbsp. But we do see & getting replaced with "&". Just for the sake of confirming the above observation, we can put """ and " " in the arrays named "kEntities" and "kAttrEntities" in nsXMLContentSerializer.cpp and see the space and quote getting replaced like amp.
For the '"' case, if the user gave '"' you wan to recover '"' -- not '"', right? Otherwise, if you recover '"' as '"', somebdoy might report the bug in a similar way that you are doing here. It seems to me that the '"' issue is a dead-end.
Until we start supporting entity reference nodes....
*** Bug 155635 has been marked as a duplicate of this bug. ***
I looked at what is happening in the debugger and I am enclosing a stack trace that might be useful. 1. set a break point in nsHTMLContentSerializer::AppendToString() 2. enter |data:text/html;charset=utf-8, "| in the URL bar 3. select | "| (i.e., space and quote) in the content window 4. right click and pick |View Selection Source| in the context menu -->observation: The clause |if (aTranslateEntities && !mInCDATA)| isn't true in nsHTMLContentSerializer::AppendToString(), and so the code is skipping directly to the last statement in that function, i.e., |aOutputStr.Append(aStr)|. Stack trace: ------------ nsHTMLContentSerializer::AppendToString(const nsAString & {...}, nsAString & {...}, int 0, int 1) line 889 nsHTMLContentSerializer::AppendElementStart(nsHTMLContentSerializer * const 0x03f60c28, nsIDOMElement * 0x03f4188c, int 1, nsAString & {...}) line 475 + 39 bytes nsDocumentEncoder::SerializeNodeStart(nsIDOMNode * 0x03f4188c, int 0, int -1, nsAString & {...}) line 320 nsDocumentEncoder::SerializeToStringRecursive(nsIDOMNode * 0x03f4188c, nsAString & {...}) line 381 + 20 bytes nsDocumentEncoder::SerializeRangeNodes(nsIDOMRange * 0x03f5c6e0, nsIDOMNode * 0x03f4188c, nsAString & {...}, int 1) line 690 + 19 bytes nsDocumentEncoder::SerializeRangeNodes(nsIDOMRange * 0x03f5c6e0, nsIDOMNode * 0x03f4cb9c, nsAString & {...}, int 0) line 768 + 35 bytes nsDocumentEncoder::SerializeRangeToString(nsIDOMRange * 0x03f5c6e0, nsAString & {...}) line 892 + 30 bytes nsDocumentEncoder::EncodeToString(nsDocumentEncoder * const 0x03f57f98, nsAString & {...}) line 944 + 24 bytes nsGenericHTMLElement::GetInnerHTML(nsAString & {...}) line 899 nsGenericHTMLElementTearoff::GetInnerHTML(nsGenericHTMLElementTearoff * const 0x03f4e620, nsAString & {...}) line 213 + 27 bytes XPTC_InvokeByIndex(nsISupports * 0x03f4e620, unsigned int 8, unsigned int 1, nsXPTCVariant * 0x0012d214) line 106 XPCWrappedNative::CallMethod(XPCCallContext & {...}, XPCWrappedNative::CallMode CALL_GETTER) line 2016 + 42 bytes XPCWrappedNative::GetAttribute(XPCCallContext & {...}) line 1891 + 14 bytes XPC_WN_GetterSetter(JSContext * 0x033f9548, JSObject * 0x03f11048, unsigned int 0, long * 0x03f492c8, long * 0x0012d504) line 1315 + 12 bytes js_Invoke(JSContext * 0x033f9548, unsigned int 0, unsigned int 2) line 839 + 23 bytes js_InternalInvoke(JSContext * 0x033f9548, JSObject * 0x03f11048, long 66130080, unsigned int 0, unsigned int 0, long * 0x00000000, long * 0x0012e308) line 931 + 20 bytes js_GetProperty(JSContext * 0x033f9548, JSObject * 0x03f11048, long 55491040, long * 0x0012e308) line 2548 + 45 bytes js_Interpret(JSContext * 0x033f9548, long * 0x0012e4c0) line 2634 + 2033 bytes js_Invoke(JSContext * 0x033f9548, unsigned int 1, unsigned int 2) line 856 + 13 bytes js_InternalInvoke(JSContext * 0x033f9548, JSObject * 0x03a40ff0, long 62920544, unsigned int 0, unsigned int 1, long * 0x0012e71c, long * 0x0012e5ec) line 931 + 20 bytes JS_CallFunctionValue(JSContext * 0x033f9548, JSObject * 0x03a40ff0, long 62920544, unsigned int 1, long * 0x0012e71c, long * 0x0012e5ec) line 3431 + 31 bytes nsJSContext::CallEventHandler(nsJSContext * const 0x03b32c38, void * 0x03a40ff0, void * 0x03c01760, unsigned int 1, void * 0x0012e71c, int * 0x0012e720, int 0) line 1041 + 33 bytes nsJSEventListener::HandleEvent(nsJSEventListener * const 0x03e8ed80, nsIDOMEvent * 0x03f2dcf8) line 182 + 77 bytes nsEventListenerManager::HandleEventSubType(nsListenerStruct * 0x03e8ee40, nsIDOMEvent * 0x03f2dcf8, nsIDOMEventTarget * 0x03e29ce0, unsigned int 1, unsigned int 7) line 1219 + 20 bytes nsEventListenerManager::HandleEvent(nsEventListenerManager * const 0x03dc8fa0, nsIPresContext * 0x03cd7890, nsEvent * 0x0012ee58, nsIDOMEvent * * 0x0012ee14, nsIDOMEventTarget * 0x03e29ce0, unsigned int 7, nsEventStatus * 0x0012ee80) line 1901 + 36 bytes GlobalWindowImpl::HandleDOMEvent(GlobalWindowImpl * const 0x03e29cd0, nsIPresContext * 0x03cd7890, nsEvent * 0x0012ee58, nsIDOMEvent * * 0x0012ee14, unsigned int 7, nsEventStatus * 0x0012ee80) line 770 DocumentViewerImpl::LoadComplete(DocumentViewerImpl * const 0x03daf360, unsigned int 0) line 938 + 47 bytes nsDocShell::EndPageLoad(nsIWebProgress * 0x03d7622c, nsIChannel * 0x03c7eff8, unsigned int 0) line 4224 nsWebShell::EndPageLoad(nsIWebProgress * 0x03d7622c, nsIChannel * 0x03c7eff8, unsigned int 0) line 812 nsDocShell::OnStateChange(nsDocShell * const 0x03d7604c, nsIWebProgress * 0x03d7622c, nsIRequest * 0x03c7eff8, unsigned int 131088, unsigned int 0) line 4156 nsDocLoaderImpl::FireOnStateChange(nsIWebProgress * 0x03d7622c, nsIRequest * 0x03c7eff8, int 131088, unsigned int 0) line 1235 nsDocLoaderImpl::doStopDocumentLoad(nsIRequest * 0x03c7eff8, unsigned int 0) line 870 nsDocLoaderImpl::DocLoaderIsEmpty() line 768 nsDocLoaderImpl::DocLoaderIsEmpty() line 771 nsDocLoaderImpl::OnStopRequest(nsDocLoaderImpl * const 0x03f194f4, nsIRequest * 0x03f19d78, nsISupports * 0x00000000, unsigned int 0) line 699 nsLoadGroup::RemoveRequest(nsLoadGroup * const 0x03f19738, nsIRequest * 0x03f19d78, nsISupports * 0x00000000, unsigned int 0) line 694 + 35 bytes nsStreamIOChannel::OnStopRequest(nsStreamIOChannel * const 0x03f19d7c, nsIRequest * 0x03f19e6c, nsISupports * 0x00000000, unsigned int 0) line 486 nsOnStopRequestEvent::HandleEvent() line 213 nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x03f1aeac) line 116 PL_HandleEvent(PLEvent * 0x03f1aeac) line 644 + 10 bytes PL_ProcessPendingEvents(PLEventQueue * 0x00ec5d78) line 574 + 9 bytes nsEventQueueImpl::ProcessPendingEvents(nsEventQueueImpl * const 0x00ec8f88) line 388 + 12 bytes nsWindow::DispatchPendingEvents() line 3661 nsWindow::ProcessMessage(unsigned int 512, unsigned int 0, long 17170915, long * 0x0012fc48) line 4004 nsWindow::WindowProc(HWND__ * 0x0006090a, unsigned int 512, unsigned int 0, long 17170915) line 1338 + 27 bytes USER32! 77e148dc() USER32! 77e14aa7() USER32! 77e14bf7() nsAppShellService::Run(nsAppShellService * const 0x0189b158) line 472 main1(int 2, char * * 0x00284f60, nsISupports * 0x00276eb8) line 1541 + 32 bytes main(int 2, char * * 0x00284f60) line 1902 + 37 bytes mainCRTStartup() line 338 + 17 bytes
-->observation2: Stack trace was for the case when serialiazing the tag; Noted that aTranslateEntities is switched to true when serializing the textual content.
Ready for r/sr ...
Attachment #106652 - Flags: superreview?(bzbarsky)
Attachment #106652 - Flags: review?(akkana)
Attachment #106652 - Flags: superreview?(bzbarsky) → superreview+
This is okay with me, but I'd like to see jst review it, since he wrote the code in question and probably knows more about its intent; and I want to make sure Naoki knows about this and that there won't be any I18n issues with the nbsp entity-izing and the other entitization that comes from setting the OutputEncodeEntities flag.
Attachment #106652 - Flags: review?(akkana) → review?(jst)
I think I saw similar patch somewhere but I forgot the bug number. Anyway, for  , that does not have to be encoded if the charset of the document can encode it. Why do we need a special case for  ?
re: comment 14 >Naoki knows about this and that there won't be any I18n issues with the nbsp >entity-izing and the other entitization that comes from setting the >OutputEncodeEntities flag. I had earlier noted in passing (comment 2) that the charset isn't set in the encoder when it is initialized from nsGenericHTMLElement::GetInnerHTML(). Hence the code never entity-tize the HTML entities because mIsLatin1 is always false. The current patch retains that behavior but special-case nbsp by treating it separately. Fortuitously, the combination of these two things (no charset specified and special-casing of nbsp) causes innerHTML to mimic what IE does. re: comment 15 > Why do we need a special case for  ? See comments in the bug and the testcase in attachment 106653 [details]. To see the bug, if you click 'add' a number of times and click 'source' you see ' ' with IE 6 (SP1), but nothing with Mozilla. With the patch, you see the same result as IE, and even when selecting some pieces of the added text and doing |View Selection Source| on that, you get those &nbsp as well -- which is what is desired by this bug.
Nhotta, probably you are referring to bug# 171405. Please see comment# 7 & 5 there.
Another related bug is bug 169590.
Summary: .innerHTML doesn't recover " and   → .innerHTML doesn't recover  
If everyone wants IE behavior then I think that is fine. For the charset check, we may remove the check if we provide a way to control entity creation for the user (bug 169590). I assume that if we do that way then the result of the test case would be, é " which is diffrent from IE's result. Would it be a problem to show é? Should we only show   and not other entities?
Comment on attachment 106652 [details] [diff] [review] patch to recover   (like IE) r=jst
Attachment #106652 - Flags: review?(jst) → review+
-> Taking, will check in when the tree opens. Re: comment 19 'nbsp' was the most problematic because it was lost. If the charset check is removed, and all entities are converted, some people might complain about incompatibilities -- as was the case in bug 65324. Adding a UI pref to control that sounds like "yet another UI pref". Plus, the explanatory text for that is likely going to be obscure. My suggestion would be to remove the charset, yes, but also add two levels of entity-zation, namely, remove nsIDocumentEncoder::OutputEncodeEntities and instead define: nsIDocumentEncoder::OutputEncodeBasicEntities (just for 'nbsp', 'amp', 'gt', ...) for callers such as nsGenericHTMLElement::GetInnerHTML() to use for compatibility with IE6's innerHTML or for interoperability with other 4xp products that don't support 'α' and friends. nsIDocumentEncoder::OutputEncodeHTMLEntities for other callers who want a blanket convertion of all HTML entities as requested in bug 169590.
Assignee: harishd → rbs
Target Milestone: --- → mozilla1.3alpha
That sounds good. I still think the option needs to be user configuable because it is user's preference (e.g. é or é which one to use in ISO-8859-1), UI might not be necessary. Anyway, that is not about this bug.
What would the pref be used for? Will that be for 'Save As...' ?
Yes, that is where we usually get feedback/complaint. Some people want entities some do not (but most people do not care).
Fix checked in.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Follow-up: user options/prefs were added in bug 169590.
Why, why a browser should convert " " to   ? I'm trying to find any explanation, any specific nbsp behavior in specifications, but I can't. So, could you explain guys why do it? And why don't you convert another specific whitespaces, such as " ", " ", " ", " ", " ", " ", " ", " ", " ", " " to entities?
> Why, why a browser should convert " " to   ? Initially, because (especially at the time) in the vast majority of cases it started as   in the HTML. Most people didn't even have editors where you could put a Unicode non-breaking space in easily. At this point, because the spec at http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#escapingString says so. And it does that because web sites depend on that behavior, so it can't be changed without breaking the world. > And why don't you convert another specific whitespaces, such as " ", " ", " ", " ", " > ", " ", " ", " ", " ", " " to entities? Because for the most part there were no entities for these in wide use in the 90s, so browsers didn't do anything special for them, so sites depend on browsers not doing anything special for them.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: