Closed Bug 314181 Opened 19 years ago Closed 18 years ago

Changing encodings in "view selection source" causes artifacts at selection boundary.

Categories

(Toolkit :: View Source, defect)

x86
Windows XP
defect
Not set
normal

RESOLVED FIXED

People

(Reporter: grendelkhan, Assigned: smontagu)

Details

(Keywords: fixed1.8.1)

Attachments

(2 files)

User-Agent:       Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7
Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.12) Gecko/20050915 Firefox/1.0.7

When I use "view selection source" on a UTF-8 encoded file, then change the encoding to ISO-8859-1, a long string of characters appears at the selection boundaries. This happens whether or not I deselect the text in the "View Source" window.

Reproducible: Always

Steps to Reproduce:
1. Load http://www.cl.cam.ac.uk/~mgk25/ucs/examples/digraphs.txt, a UTF-8 encoded document.
2. Select some text. (It may or may not contain non-ASCII characters.)
3. Right-click, "View Selection Source".
4. Select from the menu: View->Character Encoding->Western (ISO-8859-1).

Actual Results:  
The multibyte UTF-8 sequences expand to their ISO-8859-1 equivalents. The string "â€‹â€‹â€‹â€‹â€‹" appears at the beginning and the end of the original selection.

Expected Results:  
The multibyte UTF-8 sequences expand to their ISO-8859-1 equivalents, with no additional text inserted.

This happens with both HTML and plain-text files encoded as UTF-8. It also seems to happen when going from UTF-8 to any other encoding, even UTF-7. (I tried all of them except UTF-16 and UTF-32, which change the original text so much that I can't really tell what it should look like.) Switching back to UTF-8 removes the added strings, and the document looks just as it should.
I can see this on Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a1) Gecko/20051028 Firefox/1.6a1
The code that's appearing is a whole bunch of "ZERO WIDTH SPACE"s. Each block of the Latin-1 gobbledygook (â€‹) is the bytes 0xE2 0x80 0x8B, which is the UTF-8 encoding of U+200B, ZERO WIDTH SPACE.

http://www.fileformat.info/info/unicode/char/200b/index.htm
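
The byte-level claim above can be checked with a short Python sketch. It assumes the viewer treats "Western (ISO-8859-1)" as Windows-1252, as browsers commonly do; that is an assumption here, not something confirmed in this bug. A strict ISO-8859-1 decode is shown alongside for comparison.

```python
# U+200B ZERO WIDTH SPACE and its UTF-8 byte sequence.
zwsp = "\u200b"
utf8_bytes = zwsp.encode("utf-8")
print(" ".join(f"{b:02x}" for b in utf8_bytes))  # e2 80 8b

# Assumption: the viewer decodes "ISO-8859-1" as Windows-1252 (cp1252),
# which maps 0x80 and 0x8B to printable characters.
mojibake = utf8_bytes.decode("cp1252")
print([f"U+{ord(c):04X}" for c in mojibake])

# Strict ISO-8859-1 maps 0x80 and 0x8B to invisible C1 control characters
# instead, so only the first of the three characters would be visible.
strict = utf8_bytes.decode("latin-1")
print([f"U+{ord(c):04X}" for c in strict])
```

Either way, one invisible character in the UTF-8 view becomes three characters once the bytes are reinterpreted in a single-byte encoding.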
For where those characters are coming from see http://lxr.mozilla.org/seamonkey/source/toolkit/components/viewsource/content/viewPartialSource.js#52

The real question is why view selection source even allows you to change the encoding: it is always encoded in UTF-8 whatever the original page encoding was.
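
A minimal sketch of the mechanism described above, in Python rather than the actual viewPartialSource.js code. The marker string `MARK` and the helpers `mark_selection` and `view_as` are hypothetical names for illustration; the real marker definition is whatever the linked source line contains.

```python
# Hypothetical selection marker: a run of zero-width spaces (assumed length 5).
MARK = "\u200b" * 5

def mark_selection(text, start, end):
    """Wrap text[start:end] in invisible markers, as a stand-in for the real code."""
    return text[:start] + MARK + text[start:end] + MARK + text[end:]

def view_as(text, encoding):
    """Serialize as UTF-8, then reinterpret the same bytes under `encoding`."""
    return text.encode("utf-8").decode(encoding, errors="replace")

source = "na\u00efve caf\u00e9"
marked = mark_selection(source, 0, 5)

# Viewed as UTF-8 the markers are present but invisible.
as_utf8 = view_as(marked, "utf-8")
print(as_utf8.replace(MARK, "") == source)

# Viewed as Windows-1252 (assumed stand-in for "Western"), each marker
# character turns into three visible characters at the selection boundaries.
as_western = view_as(marked, "cp1252")
print(as_western)
```

This also matches the observation that switching back to UTF-8 makes the artifacts disappear: the bytes never change, only their interpretation does.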
Assignee: nobody → smontagu
Status: UNCONFIRMED → ASSIGNED
Attachment #201318 - Flags: review?
Attachment #201318 - Flags: review? → review?(mconnor)
This patch is longer, because I also had to cut and paste the menu from viewSourceOverlay.xul into viewSource.xul and viewPartialSource.xul.
Attachment #201326 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #201326 - Flags: review?(db48x)
Simon Montagu said "it is always encoded in UTF-8 whatever the original page encoding was."

Wait, I don't understand. Aside from the \x200B's appearing at the selection boundaries, I've found changing encodings in view-source to be tremendously useful. (I work on Project Gutenberg HTML files, which can be several megabytes in size, on a relatively slow computer. The ability to view only a fragment of the source, and check the encoding, is invaluable to me.)

Does this mean that view selection source was never meant to be able to change the viewing encoding, and that if I like that ability, I should file a request-for-enhancement bug?
Comment on attachment 201326 [details] [diff] [review]
Same thing for suite

Alternatively you could go the onLoadViewPartialSource route of disabling the menuitem.
Attachment #201326 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview+
Attachment #201326 - Flags: review?(db48x) → review+
(In reply to comment #6)
> Wait, I don't understand. Aside from the \x200B's appearing at the selection
> boundaries, I've found changing encodings in view-source to be tremendously
> useful. (I work on Project Gutenberg HTML files, which can be several megabytes
> in size, on a relatively slow computer. The ability to view only a fragment of
> the source, and check the encoding, is invaluable to me.)

Can you provide a step-by-step example of what you are doing and how it helps you? It's possible that you could get the same results with a tool like http://www.macchiato.com/unicode/convert.html.
Simon Montagu said:

> Can you provide a step-by-step example of what 
> you are doing and how it helps you?

Certainly.

I find a bit of text I want to see converted to Latin-1. I highlight it, right-click, view selection source. I get only the containing block-level element, so for a paragraph, it's small, quick and manageable--no scrolling back down to find what I was just looking at. I switch the encoding, and non-ASCII characters pop out at me.

The tools you linked to are lovely, and I'm sure they provide the same ability and then some. It's just convenient having it integrated like it is now.
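
For readers without the integrated menu, the workflow above can be roughly approximated outside the browser. This is only a sketch: `non_ascii_report` and `as_latin1_view` are hypothetical helper names, the sample paragraph is made up, and Windows-1252 is assumed as the "Latin-1" view.

```python
import unicodedata

def non_ascii_report(fragment):
    """Return (index, character, code point, Unicode name) for each non-ASCII character."""
    return [
        (i, ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for i, ch in enumerate(fragment)
        if ord(ch) > 0x7F
    ]

def as_latin1_view(fragment):
    """Show how the fragment's UTF-8 bytes look when read as Windows-1252."""
    return fragment.encode("utf-8").decode("cp1252", errors="replace")

# Made-up sample paragraph containing an accented letter and curly quotes.
paragraph = "<p>Caf\u00e9 \u201cquoted\u201d text</p>"

report = non_ascii_report(paragraph)
for row in report:
    print(row)

# Non-ASCII characters "pop out" as multi-character sequences in this view.
print(as_latin1_view(paragraph))
```

The report gives exact positions and names, while the reinterpreted view mimics what switching the encoding in the source window makes visible at a glance.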
Attachment #201318 - Flags: review?(mconnor)
Attachment #201318 - Flags: review+
Attachment #201318 - Flags: approval-branch-1.8.1+
Checked in toolkit patch to trunk and MOZILLA_1_8_BRANCH, and xpfe patch to trunk.
Status: ASSIGNED → RESOLVED
Closed: 18 years ago
Keywords: fixed1.8.1
Resolution: --- → FIXED
Attachment #201326 - Flags: approval-branch-1.8.1?(neil)
Attachment #201326 - Flags: approval-branch-1.8.1?(neil) → approval-branch-1.8.1+
Checked in xpfe patch to MOZILLA_1_8_BRANCH also
Product: Firefox → Toolkit