Last Comment Bug 165686 - .innerHTML doesn't recover  
: .innerHTML doesn't recover  
Status: RESOLVED FIXED
:
Product: Core
Classification: Components
Component: Serializers (show other bugs)
: Trunk
: All All
: -- normal (vote)
: mozilla1.3alpha
Assigned To: rbs
: sujay
Mentors:
http://redsunrising.de/bugzilla.mozil...
: 155635 (view as bug list)
Depends on:
Blocks:
  Show dependency treegraph
 
Reported: 2002-08-30 03:41 PDT by michael
Modified: 2013-01-08 12:43 PST (History)
10 users (show)
See Also:
Crash Signature:
(edit)
QA Whiteboard:
Iteration: ---
Points: ---
Has Regression Range: ---
Has STR: ---


Attachments
patch to recover   (like IE) (1.88 KB, patch)
2002-11-18 06:47 PST, rbs
jst: review+
bzbarsky: superreview+
Details | Diff | Review
testcase to try in IE and Mozilla (537 bytes, text/html)
2002-11-18 06:55 PST, rbs
no flags Details

Description michael 2002-08-30 03:41:20 PDT
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.1) Gecko/20020826

When selecting HTML containing escaped entities and chosing view selection 
source, these entities are displayed as they were in original source (correct).
But:
- This seems not to be true for quot and nbsp
- unescaped html symbols are converted to escaped entities (amp, lt, gt...)

just compare view source and view selection source to validate.

Reproducible: Always

Steps to Reproduce:
1. Open a HTML file containing unescaped HTML symbols (example URL)
-or-
2. Open a HTML file containing escaped qout or nbsp

Actual Results:  
In both cases View Source and View Selection Source differ.

Expected Results:  
View Selection Source should behave exactly like View Source

Using german OS (english Mozilla version)
Comment 1 Boris Zbarsky [:bz] 2002-08-30 04:23:15 PDT
to rbs.
Comment 2 rbs 2002-08-30 17:57:12 PDT
There might be yet another effect of the fact that view selection source is
actually showing the DOM source in which " is already transformed to " (see
also bug 164906).

However, it is not clear to me why   isn't recovered. Let's ask akkana
since the serialized source that is displayed comes from .innerHTML (i.e.,
nsGenericHTMLElement::GetInnerHTML which in turn calls
nsIDocumentEncoder::EncodeToString(). Noted in passing that GetInnerHTML isn't
setting the charset in its encoder).

Simpler testcases to enter in the URL bar for investigation of this bug:

data:text/html;charset=utf-8,"
data:text/html;charset=utf-8, 
Comment 3 rbs 2002-09-01 22:40:23 PDT
-> updating title and passing on to akkana.

As seen here the view selection source comes from .innerHTML:
http://lxr.mozilla.org/seamonkey/source/xpfe/browser/resources/content/viewPartialSource.js#184
and this boils down to nsIDocumentEncoder::EncodeToString().

Reporter, it might be best to produce a testcase using a JS alert + .innerHTML
to make a stronger case for your bug. I doubt that it is going to be possible to
recover " but   might be recoverable.
Comment 4 Akkana Peck 2002-09-17 18:32:39 PDT
Serializer bug.  ->owner of that component.
Comment 5 Tanu Mutreja 2002-09-18 06:34:45 PDT
Comment#2 seems to be right saying
>>There might be yet another effect of the fact that view selection source is
>>actually showing the DOM source in which " is already transformed to " 
>>.(see also bug 164906).

View Selection source seems to be taking the modified DOM Tree where entities 
are properly replaced. While generating the source now it operates on modified 
entities like ", & and space. As there is no translation for " and ' '(space) 
we see it as "not recovering" quotes and nbsp. But we do see & getting replaced 
with "&".

Just for the sake of confirming the above observation, we can put """ 
and " " in the arrays named "kEntities" and "kAttrEntities" in  
nsXMLContentSerializer.cpp and see the space and quote getting replaced like 
amp.
Comment 6 rbs 2002-09-18 15:06:09 PDT
For the '"' case, if the user gave '"' you wan to recover '"' -- not
'"', right? Otherwise, if you recover '"' as '"', somebdoy might
report the bug in a similar way that you are doing here.

It seems to me that the '"' issue is a dead-end.
Comment 7 Boris Zbarsky [:bz] 2002-09-20 18:09:12 PDT
Until we start supporting entity reference nodes....
Comment 8 rbs 2002-10-08 19:36:50 PDT
*** Bug 155635 has been marked as a duplicate of this bug. ***
Comment 9 rbs 2002-11-18 05:38:17 PST
I looked at what is happening in the debugger and I am enclosing a stack trace
that might be useful. 
1. set a break point in nsHTMLContentSerializer::AppendToString()
2. enter |data:text/html;charset=utf-8, "| in the URL bar
3. select | "| (i.e., space and quote) in the content window 
4. right click and pick |View Selection Source| in the context menu

-->observation: The clause |if (aTranslateEntities && !mInCDATA)| isn't true in 
nsHTMLContentSerializer::AppendToString(), and so the code is skipping directly
to the last statement in that function, i.e., |aOutputStr.Append(aStr)|.

Stack trace:
------------

nsHTMLContentSerializer::AppendToString(const nsAString & {...}, nsAString &
{...}, int 0, int 1) line 889
nsHTMLContentSerializer::AppendElementStart(nsHTMLContentSerializer * const
0x03f60c28, nsIDOMElement * 0x03f4188c, int 1, nsAString & {...}) line 475 + 39
bytes
nsDocumentEncoder::SerializeNodeStart(nsIDOMNode * 0x03f4188c, int 0, int -1,
nsAString & {...}) line 320
nsDocumentEncoder::SerializeToStringRecursive(nsIDOMNode * 0x03f4188c, nsAString
& {...}) line 381 + 20 bytes
nsDocumentEncoder::SerializeRangeNodes(nsIDOMRange * 0x03f5c6e0, nsIDOMNode *
0x03f4188c, nsAString & {...}, int 1) line 690 + 19 bytes
nsDocumentEncoder::SerializeRangeNodes(nsIDOMRange * 0x03f5c6e0, nsIDOMNode *
0x03f4cb9c, nsAString & {...}, int 0) line 768 + 35 bytes
nsDocumentEncoder::SerializeRangeToString(nsIDOMRange * 0x03f5c6e0, nsAString &
{...}) line 892 + 30 bytes
nsDocumentEncoder::EncodeToString(nsDocumentEncoder * const 0x03f57f98,
nsAString & {...}) line 944 + 24 bytes
nsGenericHTMLElement::GetInnerHTML(nsAString & {...}) line 899
nsGenericHTMLElementTearoff::GetInnerHTML(nsGenericHTMLElementTearoff * const
0x03f4e620, nsAString & {...}) line 213 + 27 bytes
XPTC_InvokeByIndex(nsISupports * 0x03f4e620, unsigned int 8, unsigned int 1,
nsXPTCVariant * 0x0012d214) line 106
XPCWrappedNative::CallMethod(XPCCallContext & {...}, XPCWrappedNative::CallMode
CALL_GETTER) line 2016 + 42 bytes
XPCWrappedNative::GetAttribute(XPCCallContext & {...}) line 1891 + 14 bytes
XPC_WN_GetterSetter(JSContext * 0x033f9548, JSObject * 0x03f11048, unsigned int
0, long * 0x03f492c8, long * 0x0012d504) line 1315 + 12 bytes
js_Invoke(JSContext * 0x033f9548, unsigned int 0, unsigned int 2) line 839 + 23
bytes
js_InternalInvoke(JSContext * 0x033f9548, JSObject * 0x03f11048, long 66130080,
unsigned int 0, unsigned int 0, long * 0x00000000, long * 0x0012e308) line 931 +
20 bytes
js_GetProperty(JSContext * 0x033f9548, JSObject * 0x03f11048, long 55491040,
long * 0x0012e308) line 2548 + 45 bytes
js_Interpret(JSContext * 0x033f9548, long * 0x0012e4c0) line 2634 + 2033 bytes
js_Invoke(JSContext * 0x033f9548, unsigned int 1, unsigned int 2) line 856 + 13
bytes
js_InternalInvoke(JSContext * 0x033f9548, JSObject * 0x03a40ff0, long 62920544,
unsigned int 0, unsigned int 1, long * 0x0012e71c, long * 0x0012e5ec) line 931 +
20 bytes
JS_CallFunctionValue(JSContext * 0x033f9548, JSObject * 0x03a40ff0, long
62920544, unsigned int 1, long * 0x0012e71c, long * 0x0012e5ec) line 3431 + 31 bytes
nsJSContext::CallEventHandler(nsJSContext * const 0x03b32c38, void * 0x03a40ff0,
void * 0x03c01760, unsigned int 1, void * 0x0012e71c, int * 0x0012e720, int 0)
line 1041 + 33 bytes
nsJSEventListener::HandleEvent(nsJSEventListener * const 0x03e8ed80, nsIDOMEvent
* 0x03f2dcf8) line 182 + 77 bytes
nsEventListenerManager::HandleEventSubType(nsListenerStruct * 0x03e8ee40,
nsIDOMEvent * 0x03f2dcf8, nsIDOMEventTarget * 0x03e29ce0, unsigned int 1,
unsigned int 7) line 1219 + 20 bytes
nsEventListenerManager::HandleEvent(nsEventListenerManager * const 0x03dc8fa0,
nsIPresContext * 0x03cd7890, nsEvent * 0x0012ee58, nsIDOMEvent * * 0x0012ee14,
nsIDOMEventTarget * 0x03e29ce0, unsigned int 7, nsEventStatus * 0x0012ee80) line
1901 + 36 bytes
GlobalWindowImpl::HandleDOMEvent(GlobalWindowImpl * const 0x03e29cd0,
nsIPresContext * 0x03cd7890, nsEvent * 0x0012ee58, nsIDOMEvent * * 0x0012ee14,
unsigned int 7, nsEventStatus * 0x0012ee80) line 770
DocumentViewerImpl::LoadComplete(DocumentViewerImpl * const 0x03daf360, unsigned
int 0) line 938 + 47 bytes
nsDocShell::EndPageLoad(nsIWebProgress * 0x03d7622c, nsIChannel * 0x03c7eff8,
unsigned int 0) line 4224
nsWebShell::EndPageLoad(nsIWebProgress * 0x03d7622c, nsIChannel * 0x03c7eff8,
unsigned int 0) line 812
nsDocShell::OnStateChange(nsDocShell * const 0x03d7604c, nsIWebProgress *
0x03d7622c, nsIRequest * 0x03c7eff8, unsigned int 131088, unsigned int 0) line 4156
nsDocLoaderImpl::FireOnStateChange(nsIWebProgress * 0x03d7622c, nsIRequest *
0x03c7eff8, int 131088, unsigned int 0) line 1235
nsDocLoaderImpl::doStopDocumentLoad(nsIRequest * 0x03c7eff8, unsigned int 0)
line 870
nsDocLoaderImpl::DocLoaderIsEmpty() line 768
nsDocLoaderImpl::DocLoaderIsEmpty() line 771
nsDocLoaderImpl::OnStopRequest(nsDocLoaderImpl * const 0x03f194f4, nsIRequest *
0x03f19d78, nsISupports * 0x00000000, unsigned int 0) line 699
nsLoadGroup::RemoveRequest(nsLoadGroup * const 0x03f19738, nsIRequest *
0x03f19d78, nsISupports * 0x00000000, unsigned int 0) line 694 + 35 bytes
nsStreamIOChannel::OnStopRequest(nsStreamIOChannel * const 0x03f19d7c,
nsIRequest * 0x03f19e6c, nsISupports * 0x00000000, unsigned int 0) line 486
nsOnStopRequestEvent::HandleEvent() line 213
nsARequestObserverEvent::HandlePLEvent(PLEvent * 0x03f1aeac) line 116
PL_HandleEvent(PLEvent * 0x03f1aeac) line 644 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00ec5d78) line 574 + 9 bytes
nsEventQueueImpl::ProcessPendingEvents(nsEventQueueImpl * const 0x00ec8f88) line
388 + 12 bytes
nsWindow::DispatchPendingEvents() line 3661
nsWindow::ProcessMessage(unsigned int 512, unsigned int 0, long 17170915, long *
0x0012fc48) line 4004
nsWindow::WindowProc(HWND__ * 0x0006090a, unsigned int 512, unsigned int 0, long
17170915) line 1338 + 27 bytes
USER32! 77e148dc()
USER32! 77e14aa7()
USER32! 77e14bf7()
nsAppShellService::Run(nsAppShellService * const 0x0189b158) line 472
main1(int 2, char * * 0x00284f60, nsISupports * 0x00276eb8) line 1541 + 32 bytes
main(int 2, char * * 0x00284f60) line 1902 + 37 bytes
mainCRTStartup() line 338 + 17 bytes
Comment 10 rbs 2002-11-18 06:42:06 PST
-->observation2: Stack trace was for the case when serialiazing the tag; Noted
that aTranslateEntities is switched to true when serializing the textual content.
Comment 11 rbs 2002-11-18 06:47:10 PST
Created attachment 106652 [details] [diff] [review]
patch to recover    (like IE)
Comment 12 rbs 2002-11-18 06:55:36 PST
Created attachment 106653 [details]
testcase to try in IE and Mozilla
Comment 13 rbs 2002-11-18 06:57:54 PST
Ready for r/sr ...
Comment 14 Akkana Peck 2002-11-18 18:08:58 PST
This is okay with me, but I'd like to see jst review it, since he wrote the code
in question and probably knows more about its intent; and I want to make sure
Naoki knows about this and that there won't be any I18n issues with the nbsp
entity-izing and the other entitization that comes from setting the
OutputEncodeEntities flag.  
Comment 15 nhottanscp 2002-11-18 18:25:07 PST
I think I saw similar patch somewhere but I forgot the bug number.
Anyway, for  , that does not have to be encoded if the charset of the
document can encode it.
Why do we need a special case for  ?
Comment 16 rbs 2002-11-18 18:42:59 PST
re: comment 14
>Naoki knows about this and that there won't be any I18n issues with the nbsp
>entity-izing and the other entitization that comes from setting the
>OutputEncodeEntities flag.  

I had earlier noted in passing (comment 2) that the charset isn't set in the
encoder when it is initialized from nsGenericHTMLElement::GetInnerHTML().
Hence the code never entity-tize the HTML entities because mIsLatin1 is always
false.

The current patch retains that behavior but special-case nbsp by treating it
separately. Fortuitously, the combination of these two things (no charset
specified and special-casing of nbsp) causes innerHTML to mimic what IE does.

re: comment 15
> Why do we need a special case for  ?

See comments in the bug and the testcase in attachment 106653 [details].

To see the bug, if you click 'add' a number of times and click 'source' you see
' ' with IE 6 (SP1), but nothing with Mozilla.

With the patch, you see the same result as IE, and even when selecting some
pieces of the added text and doing |View Selection Source| on that, you get
those &nbsp as well -- which is what is desired by this bug.
Comment 17 Tanu Mutreja 2002-11-18 20:15:06 PST
Nhotta, probably you are referring to bug# 171405. Please see comment# 7 & 5 
there.
Comment 18 rbs 2002-11-18 21:21:47 PST
Another related bug is bug 169590.
Comment 19 nhottanscp 2002-11-19 10:10:07 PST
If everyone wants IE behavior then I think that is fine.
For the charset check, we may remove the check if we provide a way to control
entity creation for the user (bug 169590).
I assume that if we do that way then the result of the test case would be,
é " which is diffrent from IE's result. Would it be a problem to
show é? Should we only show   and not other entities?
 
Comment 20 Johnny Stenback (:jst, jst@mozilla.com) 2002-11-19 10:46:40 PST
Comment on attachment 106652 [details] [diff] [review]
patch to recover    (like IE)

r=jst
Comment 21 rbs 2002-11-19 14:49:49 PST
-> Taking, will check in when the tree opens.

Re: comment 19

'nbsp' was the most problematic because it was lost.

If the charset check is removed, and all entities are converted, some people
might complain about incompatibilities -- as was the case in bug 65324. Adding a
UI pref to control that sounds like "yet another UI pref". Plus, the explanatory
text for that is likely going to be obscure.

My suggestion would be to remove the charset, yes, but also add two levels of
entity-zation, namely, remove nsIDocumentEncoder::OutputEncodeEntities and
instead define:

nsIDocumentEncoder::OutputEncodeBasicEntities (just for 'nbsp', 'amp', 'gt', ...)
for callers such as nsGenericHTMLElement::GetInnerHTML() to use for
compatibility with IE6's innerHTML or for interoperability with other 4xp
products that don't support 'α' and friends.

nsIDocumentEncoder::OutputEncodeHTMLEntities
for other callers who want a blanket convertion of all HTML entities as
requested in bug 169590.
Comment 22 nhottanscp 2002-11-19 15:22:14 PST
That sounds good. I still think the option needs to be user configuable because
it is user's preference (e.g. é or é which one to use in ISO-8859-1), UI
might not be necessary. Anyway, that is not about this bug.
Comment 23 rbs 2002-11-19 15:37:37 PST
What would the pref be used for? Will that be for 'Save As...' ?
Comment 24 nhottanscp 2002-11-19 15:47:35 PST
Yes, that is where we usually get feedback/complaint. Some people want entities
some do not (but most people do not care).
Comment 25 rbs 2002-11-19 17:49:25 PST
Fix checked in.
Comment 26 rbs 2003-02-09 16:31:47 PST
Follow-up: user options/prefs were added in bug 169590.
Comment 27 danya.postfactum 2013-01-08 12:39:56 PST
Why, why a browser should convert " " to   ? I'm trying to find any explanation, any specific nbsp behavior in specifications, but I can't. So, could you explain guys why do it? And why don't you convert another specific whitespaces, such as  " ", " ", " ", " ", " ", " ", " ", " ", " ", " " to entities?
Comment 28 Boris Zbarsky [:bz] 2013-01-08 12:43:59 PST
> Why, why a browser should convert " " to   ?

Initially, because (especially at the time) in the vast majority of cases it started as   in the HTML.  Most people didn't even have editors where you could put a Unicode non-breaking space in easily.

At this point, because the spec at http://www.whatwg.org/specs/web-apps/current-work/multipage/the-end.html#escapingString says so.  And it does that because web sites depend on that behavior, so it can't be changed without breaking the world.

> And why don't you convert another specific whitespaces, such as  " ", " ", " ", " ", "
> ", " ", " ", " ", " ", " " to entities?

Because for the most part there were no entities for these in wide use in the 90s, so browsers didn't do anything special for them, so sites depend on browsers not doing anything special for them.

Note You need to log in before you can comment on or make changes to this bug.