DOM Source Shows Entity References As the Translated Characters




8 years ago


(Reporter: david, Unassigned)







8 years ago
User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv: Gecko/20100701 SeaMonkey/2.0.6
Build Identifier: 

The display for DOM Source is not the actual source file. Entity references (e.g., those for the bullet • and the copyright sign ©) are translated into their characters. Since the display for Page Source reflects the actual source file, the original entity references are evidently available to DOM Source as well.

Reproducible: Always

Steps to Reproduce:
1.  Go to the cited URI.  Scroll to the bottom of the page.  

2.  Open the context menu, and select View Page Source.

3.  Use the cursor to select the last three lines, from Site Map to the end of the copyright notice.  

4.  Open the context menu, and select View Selection Source.
Actual Results:  
At step 2, the markup at the bottom of the page source shows the entity reference for the bullet (•) between the anchor elements for Site Map and Index of HTML Files. Closer to the bottom, the anchor element for copyright contains the entity reference for the copyright sign (©).

At step 4, the bullet reference displays as an actual bullet (•), and the copyright reference displays as a circled-C copyright symbol (©).

Expected Results:  
DOM Source (step 4) should display the actual source as was done in step 2.  

Interestingly, non-breaking spaces remain as &nbsp; references in the DOM source. Thus, DOM Source is inconsistent in how it handles entity references.

If DOM Source is indeed supposed to show translated characters, then View Selection Source should be based on View Page Source and not on DOM Source.

You can't base "view selection source" on the page source, since the selected content is often not present in the original source on modern sites; it's dynamically generated.

"View selection source" just serializes the DOM. Since the DOM doesn't contain entity references, only characters, the best we can do for any given character is to either output it as-is or entity-encode it, no matter where it came from originally. The current decision is to output ASCII as-is (even if it may have originally come from an entity reference), to entity-encode a small set of characters that are basically never literal characters in the source (like &nbsp;), and to leave the rest as-is. In fact, we just use .innerHTML to serialize the source, so you get exactly the behavior innerHTML has. There's really no way to do this "right" without keeping track of entity references in the DOM somehow, and there are no plans to do that.
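To illustrate the comment above, here is a minimal Python sketch, not Gecko's actual code. It assumes Python's html.unescape as a stand-in for the parser's handling of entity references, and it hand-codes the text-node escaping rules of the HTML fragment serialization algorithm that innerHTML follows: only &, U+00A0, <, and > are escaped. The helper names parse_text and serialize_text and the sample markup are hypothetical. Attribute values and the contents of elements such as script are escaped differently and are not covered here.

```python
import html


def parse_text(source: str) -> str:
    """Hypothetical stand-in for the parser: entity references
    (named, decimal, or hex) become plain characters in the DOM."""
    return html.unescape(source)


def serialize_text(text: str) -> str:
    """Escape a text node per the HTML fragment serialization rules
    (text-node mode only). & must be replaced first so the entities
    introduced below are not themselves re-escaped."""
    return (text.replace("&", "&amp;")
                .replace("\u00a0", "&nbsp;")
                .replace("<", "&lt;")
                .replace(">", "&gt;"))


# Hypothetical markup resembling the footer described in the report.
source = "Site Map&nbsp;&bull;&nbsp;Index &copy; 2010"
round_trip = serialize_text(parse_text(source))

# &nbsp; comes back as a reference, but &bull; and &copy; come back
# as literal characters: the inconsistency noted in the report.
print(round_trip == "Site Map&nbsp;\u2022&nbsp;Index \u00a9 2010")  # True

# Several source spellings of the bullet collapse to one DOM character,
# so the serializer cannot tell which spelling the page originally used.
spellings = ["&bull;", "&#8226;", "&#x2022;", "\u2022"]
print(len({parse_text(s) for s in spellings}))  # 1
```

The second check is the crux of the WONTFIX: once parsing has produced a single character, the information about how it was written in the source is gone unless the DOM stores it separately.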
Last Resolved: 8 years ago
Resolution: --- → WONTFIX