Closed Bug 295865 Opened 20 years ago Closed 17 years ago

can't select string into primary selection with (certain) special characters from win-1252-encoded text

Categories

(Core :: Internationalization, defect)

x86
Linux
defect
Not set
normal

Tracking

RESOLVED INVALID

People

(Reporter: sakke, Assigned: smontagu)


Attachments

(1 file)

User-Agent:       Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050514 Firefox/1.0.4
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.7.8) Gecko/20050514 Firefox/1.0.4

For example, in http://www.civil-war.net/cw_images/files/prisoner.htm
trying to paint (to select) the following line:
587. The Execution of William Johnson at Jordan's Farm Near Petersburg, VA, July
20, 1864

...the selection is painted, but it never goes to the Primary Selection and therefore
cannot be "pasted" or "inserted" into another X app. The problem seems to lie in the
dash, or hyphen (whichever you prefer): I can select the aforementioned string
in parts as long as I don't paint this character.

The same sort of problem occurs on some win-1252-encoded pages that contain
characters outside the a-z, 0-9 range. This problem might exist with UTF-8
too, but I won't be able to verify that until I come across an example.
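To make the Windows-1252 angle concrete, here is a minimal Python sketch (Python is only an illustration; nothing in this report involves it). It assumes the troublesome dash is byte 0x96, which is the en dash in Windows-1252. Decoding that byte as Windows-1252 yields U+2013, a code point outside ISO-8859-1, while reading the same byte as ISO-8859-1 would give an invisible C1 control character instead. `DASH_BYTE` and `decode_as` are hypothetical names.

```python
# Sketch: decode the single byte 0x96 under two different charset labels.
DASH_BYTE = b"\x96"  # assumed: the en dash as stored in a Windows-1252 page


def decode_as(data: bytes, encoding: str) -> str:
    """Decode raw page bytes using the given charset name."""
    return data.decode(encoding)


as_cp1252 = decode_as(DASH_BYTE, "cp1252")
as_latin1 = decode_as(DASH_BYTE, "latin-1")

print(f"cp1252:  U+{ord(as_cp1252):04X}")  # U+2013 (EN DASH)
print(f"latin-1: U+{ord(as_latin1):04X}")  # U+0096 (a C1 control character)
```

Once the page text is decoded correctly, the selection holds U+2013, and whether a paste succeeds depends on whether the receiving application's charset can represent it.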

This problem seems to plague at least 1.0.4, 1.0.3 and 1.0.2, possibly 1.0.1. I
remember first seeing this odd behaviour sometime around the fourth quarter of
2004. Previously there was no problem, since all odd characters were replaced
with question marks, and that did not break the Primary Selection.



Reproducible: Always

Steps to Reproduce:
1. Open the page http://www.civil-war.net/cw_images/files/prisoner.htm
2. Try to select the line that begins "587. The Execution of William Johnson at..."
3. Paste/insert it into another X app

Actual Results:  
Nothing

Expected Results:  
The selected text is inserted
Not likely to be a GRE issue...
Assignee: dougt → smontagu
Component: Embedding: GRE Core → Internationalization
QA Contact: amyy
Attached file Reduced testcase
This seems to depend on what you're pasting into. Pasting into xterm works but
the dash character is skipped. Pasting into KDE's konsole works; the dash is
replaced by a '?', possibly due to the font I'm using.

With gnome-terminal it depends on the character encoding that gnome-terminal is
set to (Tools->Set Character Encoding). Set to ISO-8859-1, the paste fails and
gnome-terminal writes a message to stderr:

    ** (gnome-terminal:7657): WARNING **: Error (Invalid or incomplete
    multibyte or wide character) converting data for child, dropping.

When I change gnome-terminal to use UTF-8, the paste succeeds including the
dash character.
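The three outcomes above (dash skipped, dash turned into '?', paste dropped with a conversion error) line up with the three standard ways an encoder can treat a character that its target charset lacks. The Python sketch below is only an analogy, not how xterm, konsole or gnome-terminal are actually implemented; `sample` is a made-up string and `to_latin1` a hypothetical helper.

```python
from typing import Optional

EN_DASH = "\u2013"
sample = f"1864 {EN_DASH} 1865"  # hypothetical selection containing U+2013


def to_latin1(text: str, errors: str) -> Optional[bytes]:
    """Encode for an ISO-8859-1 consumer; return None if the conversion fails."""
    try:
        return text.encode("latin-1", errors=errors)
    except UnicodeEncodeError:
        return None


print(to_latin1(sample, "ignore"))   # b'1864  1865'  (dash skipped)
print(to_latin1(sample, "replace"))  # b'1864 ? 1865' (dash becomes '?')
print(to_latin1(sample, "strict"))   # None          (whole conversion rejected)
```

With a UTF-8 target, the same call succeeds in every mode, which matches the observation that switching gnome-terminal to UTF-8 makes the paste work.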
(In reply to comment #2)
> Created an attachment (id=192630)
> Reduced testcase

In all three cases (xterm, for which I used 'uxterm', a shell-script wrapper around
xterm; konsole; and gnome-terminal), it works fine for me as long as all of them run
in a UTF-8 locale.
 
> This seems to depend on what you're pasting into. Pasting into xterm works but
> the dash character is skipped. 

That's because either your xterm was launched in a locale that doesn't cover
U+2013 or your xterm font setting doesn't cover it; the former is more
likely. In an xterm launched in a UTF-8 locale, I used 'luit' to switch to a legacy
locale whose codeset can't represent U+2013, and U+2013 was skipped.

> Pasting into KDE's konsole works; the dash is
> replaced by a '?', possibly due to the font I'm using.

Either because of that or because you're running Konsole in a locale whose
codeset doesn't cover U+2013. 

> With gnome-terminal it depends on the character encoding that gnome-terminal is
> set to (Tools->Set Character Encoding). Set to ISO-8859-1, the paste fails and
> gnome-terminal writes a message to stderr:
 
>     ** (gnome-terminal:7657): WARNING **: Error (Invalid or incomplete
>     multibyte or wide character) converting data for child, dropping.

This is to be expected, because U+2013 is not covered by ISO-8859-1.

> When I change gnome-terminal to use UTF-8, the paste succeeds including the
> dash character.

It also works fine if the character encoding is set to any of the encodings covering
U+2013 (e.g. Windows-1257/1253/1251, not to mention Windows-1252 and UTF-8). The same
is true of Konsole, which also has a character encoding menu.
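Which charsets can carry U+2013 is easy to check mechanically. The Python sketch below (an illustration only; `CHARSETS` and `encode_or_none` are hypothetical names) tries to encode the en dash in the encodings named in this comment plus ISO-8859-1, the codeset of a plain en_US locale, and prints the resulting bytes.

```python
from typing import Optional

EN_DASH = "\u2013"
# Charsets named in this comment, plus ISO-8859-1 (the codeset of plain en_US).
CHARSETS = ["utf-8", "cp1251", "cp1252", "cp1253", "cp1257", "iso-8859-1"]


def encode_or_none(char: str, encoding: str) -> Optional[bytes]:
    """Return the encoded bytes, or None if the charset cannot represent char."""
    try:
        return char.encode(encoding)
    except UnicodeEncodeError:
        return None


for name in CHARSETS:
    data = encode_or_none(EN_DASH, name)
    shown = data.hex(" ") if data is not None else "not representable"
    print(f"{name:<11}{shown}")
```

UTF-8 needs three bytes (e2 80 93), each of the Windows code pages listed uses the single byte 0x96, and ISO-8859-1 has no slot for the character at all.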


Reporter: what's your locale (the output of the 'locale' command)? What X
applications did you try to paste into? All that said, not being able to
paste at all (as opposed to U+2013 being turned into a question mark) should be
fixed if it actually happens, but I can't reproduce the problem described in
comment #0.
(In reply to comment #3)
> (In reply to comment #2)
>
> Reporter: what's your locale (the output of 'locale' command). What X
> applications did you try to paste into? Despite all these, not being able to
> paste at all (as opposed to U+2013 being turned into a question mark) should be
> fixed if that actually happens, but I can't reproduce your problem described in
> comment #0.
> 
Sorry for the delay, I've been away from the box.

My locale is en_US. In the beginning the problem occurred with any X app; now, after
some upgrades (newer Xorg, newer rxvt, newer GNU Emacs, a lot of others), the
problem seems to be limited to rxvt: at least xterm, KDE apps, Emacs and OO happily
accept the primary selection (some dropping bad characters), which probably means
it was partially a problem with X. Nevertheless, the primary selection used to work
perfectly, as described in comment #0. That seems kind of odd, though, since during
the rather long time it took me to actually file a bug about this, there have been
a few new sub-versions of Xorg from my distro and a few official minor versions too.

I don't know, guys; I'll leave it unconfirmed and let you decide the resolution. I do
realize I'm just one user and there are more important fixes to do, and I now
have a workaround of sorts, since the newer xterm works at least partially.
(In reply to comment #4)

Thanks for your reply. Unfortunately, I don't have rxvt, so I can't try it. If
rxvt is the only application that doesn't accept a paste from Firefox, rxvt is
likely to blame.

> My locale is en_US. I
> newer xterm works at least partially. 

I strongly recommend you use en_US.UTF-8 instead of en_US (whose codeset is
ISO-8859-1). Then even xterm will happily accept any character representable
in UTF-8 (virtually all of them), although some may appear as hollow boxes
because the font(s) in use don't cover them.

(In reply to comment #5)
> (In reply to comment #4)
> 
> Thanks for your reply. Somehow, I don't have rxvt so that I can't try it. If
> it's only rxvt which doesn't accept 'paste' from firefox, it's likely to be rxvt
> that is to blame.

Currently, probably; previously, no.
 
> > My locale is en_US. I
> > newer xterm works at least partially. 
> 
> I strongly recommend you use en_US.UTF-8 instead of en_US (whose codeset is
> ISO-8859-1). 

As much as I'd like to use UTF-8, I can't; it's just not supported widely enough.
Otherwise I'd need to be able to switch between the two all the time, as with
keyboard layouts.
(In reply to comment #6)

> > I strongly recommend you use en_US.UTF-8 instead of en_US (whose codeset is
> > ISO-8859-1). 
> 
> As much as I'd like to use UTF8, I can't. It's just not supported widely enough.
> Or i'd need to be able to switch between the two all the time, like with
> keyboard layouts.  

I thought we were living in 2005 (not 1998) ;-) The keyboard layout has little to do
with the codeset of your locale.


(In reply to comment #7)
> (In reply to comment #6)
> I thought we live in 2005 (not in 1998) ;-) The keyboard layout has little to do
> with the codeset of your locale.

OK, we are getting WAY off topic here. I used switching keyboard layouts as an
example of how I would prefer to be able to switch between locales. Let's cut
the ****, shall we.
After reading all comments above, I'm resolving this bug as INVALID. You obviously can't expect a pasted string to "look right" if it contains characters which the receiving application is not set up to represent. For instance, any Unicode codepoint above U+00FF cannot be displayed in any application using ISO-8859-1.
Status: UNCONFIRMED → RESOLVED
Closed: 17 years ago
Resolution: --- → INVALID
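The resolution's point about ISO-8859-1 can be checked directly: its repertoire is exactly the first 256 Unicode code points, U+0000 to U+00FF. A short Python sketch, offered only as an illustration (`latin1_can_encode` is a hypothetical helper, and the scan is limited to the Basic Multilingual Plane to keep it fast):

```python
def latin1_can_encode(code_point: int) -> bool:
    """True if the code point has a single-byte ISO-8859-1 encoding."""
    try:
        chr(code_point).encode("latin-1")
        return True
    except UnicodeEncodeError:
        return False


# Scan the Basic Multilingual Plane (U+0000 to U+FFFF) to keep the check quick.
covered = [cp for cp in range(0x10000) if latin1_can_encode(cp)]
print(f"{len(covered)} code points, U+{covered[0]:04X} to U+{covered[-1]:04X}")
```

U+2013 falls outside that range, so no ISO-8859-1 application can receive it intact.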
