Closed Bug 57720 Opened 24 years ago Closed 23 years ago

Armenian character encoding picked if language dropdown doesn't completely populate [and thus can't login to bugzilla] [form submission encoding wrong]

Categories

(Core :: Internationalization, defect, P3)

x86
All
defect

Tracking

()

VERIFIED FIXED
mozilla0.9.1

People

(Reporter: sgifford+mozilla-old, Assigned: jbetak)

Details

(Keywords: intl)

Attachments

(5 files)

I'm using a recent (10/23/2000) CVS build of Mozilla; I've seen this since at
least Friday (10/20/2000).

When I fill out a form that includes parentheses or a period, the data is
percent-encoded incorrectly.  All other characters are encoded correctly.  I'm
seeing this:

  Char    Correct Encoding    Mozilla Encoding
     (    %28                 %A5
     )    %29                 %A4
     .    %2E                 %A9

Specifically, submitting this text in a <textarea>:

         !"#$%&'()*+,-./0123456789:;<=>?@ABCDEFGHIJKLMNOPQRSTUVWXYZ[\]
        ^_`abcdefghijklmnopqrstuvwxyz{|}

(with no CRs or LFs) results in this encoded query string:

    QUERY_STRING="textarea-input=+%21%22%23%24%25%26%27%A5%A4*%2B%2C-%A9
        %2F0123456789%3A%3B%3C%3D%3E%3F%40ABCDEFGHIJKLMNOPQRSTUVWXYZ%5B
        %5C%5D%5E_%60abcdefghijklmnopqrstuvwxyz%7B%7C%7D"

This encoding problem makes it impossible for me to use Bugzilla from Mozilla,
since the period in my email address is misencoded.
From walking through the debugger, I think this happens in the network code.
Assignee: rods → gagan
Attached file simple testcase
I just don't see a problem with Friday's trunk build or today's branch build.

Output from running testcase:

Remote Host = '205.217.229.106'
User Agent = 'Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20000518'
Request Method = 'GET'
Content Length = ''

textarea
.()
ENV = 'SERVER_SOFTWARE->Apache/1.3.9 (Unix) PHP/4.0.2 PHP/3.0.16 mod_perl/1.21 
FrontPage/4.0.4.3
GATEWAY_INTERFACE->CGI/1.1
DOCUMENT_ROOT->/home/serve/pollmann
UNIQUE_ID->OfSwPs8ImBkAAA2FhtU
REMOTE_ADDR->205.217.229.106
SERVER_PROTOCOL->HTTP/1.1
REQUEST_METHOD->GET
REMOTE_HOST->205.217.229.106
QUERY_STRING->textarea=.%28%29
HTTP_USER_AGENT->Mozilla/5.0 (Windows; U; WinNT4.0; en-US; m18) Gecko/20000518
PATH->/usr/local/bin:/usr/bin:/bin
HTTP_ACCEPT->*/*
HTTP_CONNECTION->keep-alive
REMOTE_PORT->4827
SERVER_ADDR->207.8.157.240
HTTP_ACCEPT_LANGUAGE->en
HTTP_KEEP_ALIVE->300
SCRIPT_NAME->/echo.cgi
HTTP_ACCEPT_ENCODING->gzip,deflate,compress,identity
SCRIPT_FILENAME->/home/serve/pollmann/echo.cgi
SERVER_NAME->pollmann.net
REQUEST_URI->/echo.cgi?textarea=.%28%29
SERVER_PORT->80
HTTP_HOST->pollmann.net
SERVER_ADMIN->webmaster@POLLMANN.NET
'
The problem appears to be at nsFormFrame.cpp:975.  When I submit

	.()

then on the line that says:

	inBuf  = UnicodeToNewBytes(aString.GetUnicode(), aString.Length(), encoder);

after calling, inBuf contains
	
	(gdb) p inBuf
	$29 = 0x8661510 "йед"
	(gdb) x/3x inBuf
	0x8661510:	0xa9	0xa5	0xa4

The incorrect encoding I'm seeing is %A9%A5%A4, which corresponds exactly to
these bytes.
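
For readers following along, here is a minimal sketch of form-style percent-encoding applied to bytes that have already been through the charset converter. It is not Mozilla's code: `percentEncodeBytes` is a hypothetical helper, and the set of characters left unescaped is an assumption chosen to match the QUERY_STRING output quoted earlier in this report. It only illustrates that once the converter hands back 0xA9 0xA5 0xA4, the escaping step faithfully produces %A9%A5%A4.

```javascript
// Sketch only: form-style percent-encoding over raw bytes.
// Assumption: alphanumerics and * - . _ stay literal, space becomes "+".
function percentEncodeBytes(bytes) {
  const keep = /[A-Za-z0-9*\-._]/;
  let out = "";
  for (const b of bytes) {
    const ch = String.fromCharCode(b);
    if (b === 0x20) {
      out += "+";
    } else if (b < 0x80 && keep.test(ch)) {
      out += ch;
    } else {
      // Everything else is written as %XX with two uppercase hex digits.
      out += "%" + b.toString(16).toUpperCase().padStart(2, "0");
    }
  }
  return out;
}

// ".()" as plain ASCII bytes, as in the working testcase output.
console.log(percentEncodeBytes([0x2e, 0x28, 0x29])); // .%28%29
// The bytes gdb showed in inBuf.
console.log(percentEncodeBytes([0xa9, 0xa5, 0xa4])); // %A9%A5%A4
```

In other words, the escaping itself looks fine; the bytes are already wrong by the time they reach it.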

I haven't been able to drill it down any deeper than that; I'll try when I get
home.

I'm compiling with gcc 2.96, a somewhat experimental version; if you're not
seeing this, perhaps it's a compiler issue.  What are you building with?
Not in networking; this is form data. -> pollmann(?). Please reassign if it's someone else.
Assignee: gagan → pollmann
Traced it further to umap.c:112.  If I have a dot on the way in:

	(gdb) p in
	$178 = 46

I get this cell from the mapping table:

	(gdb) p *uCell
	$179 = {fmt = {format0 = {srcBegin = 40, srcEnd = 46, destBegin = 0}, 
	    format1 = {srcBegin = 40, srcEnd = 46, mappingOffset = 0}, format2 = 	{
	      srcBegin = 40, srcEnd = 46, destBegin = 0}}}

which maps to the wrong character:

	(gdb) p *out
	$180 = 169

This seems to happen no matter what my encoding, but I'll play with it a little
more and see if I can figure out which map this is coming from.
This mapping seems to be coming from the Armenian character set, the first one
alphabetically:

	$ cat armscii.uf
	[ ... ]
	Begin of Item 0002
	 Format 1
	  srcBegin = 0028
	  srcEnd = 002E
	  mappingOffset = 0000
	 Mapping  = 
	  00A5 00A4 002A 002B 002C 002D 00A9 
	End of Item 0002 
	[ ... ]

Which leads to 3 questions:

1. Is this the correct behavior for the Armenian character set?

2. Why did my copy of Mozilla decide this was the best character set for it?

3. Why is it still using it when I have changed my default to ISO-8859-1?

Can somebody change their character set to Armenian, and see if they see this
bug also?

Thanks!
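
To make the table lookup concrete, here is a small sketch of how a Format 1 item like the one quoted from armscii.uf appears to be applied. The helper `mapFormat1` and the object layout are hypothetical, and treating `mappingOffset` as an index into the item's own Mapping list is an assumption; the real logic lives in umap.c.

```javascript
// Item 0002, transcribed from the armscii.uf excerpt in this report.
const item0002 = {
  srcBegin: 0x0028,
  srcEnd: 0x002e,
  mappingOffset: 0x0000,
  mapping: [0x00a5, 0x00a4, 0x002a, 0x002b, 0x002c, 0x002d, 0x00a9],
};

// Hypothetical lookup: returns the mapped value, or null when the input
// code point is outside the item's [srcBegin, srcEnd] range.
function mapFormat1(item, codePoint) {
  if (codePoint < item.srcBegin || codePoint > item.srcEnd) return null;
  return item.mapping[item.mappingOffset + codePoint - item.srcBegin];
}

for (const ch of "().") {
  const mapped = mapFormat1(item0002, ch.charCodeAt(0));
  console.log(ch, "->", "0x" + mapped.toString(16).toUpperCase());
}
// ( -> 0xA5
// ) -> 0xA4
// . -> 0xA9
```

The five characters in between (`*`, `+`, `,`, `-`) map to themselves, which matches the observation that only the parentheses and the period were affected.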
Removing all traces of "armscii" from my prefs.js, the one bookmark that
contained "armscii", and clearing my cache (which also contained the word
"armscii" for some reason) cleared it up.

Probably some previous version of Mozilla set my default charset to Armenian,
and then something in this build made that be a problem?  I don't know; it still
seems strange to me that the default behavior for Armenian would be for forms to
not work, but then maybe Armenia isn't a real interactive sort of place . . .

:-)

I'm moving this to Internationalization.  Sorry for filling up your mailboxes
with this gibberish.
Component: Form Submission → Internationalization
This looks like an Internationalization bug; changing assigned to Intl. owner.

Intl. owner -- brief summary.

Problem 1. My browser somehow became set to Armenian (ARMSCII).  That is the
first on my list alphabetically, which seems likely to have something to do with
it.

Problem 2. When that happened, form submission broke completely (see details
above).  I don't know if this is a problem, or if this would actually work if I
was really using an Armenian character set.

Thanks,

-----Scott.
Assignee: pollmann → nhotta
QA Contact: vladimire → teruko
Problem 1
If this is still reproducible with current build by creating a new profile, it's
a bug. Otherwise, it can be treated as worksforme.

Problem 2
It looks like that's a valid behavior. I searched for "ARMSCII-8" using yahoo
and found a link.

http://moon.yerphi.am/~hovik/ArmSCII/armcs-006.html
I couldn't get it to go back to Armenian without manually setting it.  I'll mark
this WORKSFORME.

Dammit...It took me almost 12 hours to track it down to that Armenian thing, and
all I get is a lousy WORKSFORME...


:-)
Status: UNCONFIRMED → RESOLVED
Closed: 24 years ago
Resolution: --- → WORKSFORME
Scott, sounds like you went to a lot of trouble tracking this one down - we
really appreciate it!  Perhaps if someone finds a similar sort of problem they
will search through Bugzilla, find this report and be able to quickly understand
what is going on.  Thanks for taking the time to investigate!
Already came in handy - see bug 57946!
Summary: Incorrect form/CGI encoding of parentheses and period (%28, %29, %2E) → non Western Character Coding -> Incorrect form/CGI encoding of parentheses and period (%28, %29, %2E)
I'm going to reopen this.  I saw this on the branch today with a brand new
profile in 2000110609mn6.  I would have never caught this had someone not told
me about it.
Status: RESOLVED → UNCONFIRMED
Resolution: WORKSFORME → ---
From some brief testing, it appears that this happens if you change the default
font.  I changed it to arial, then noted that the character coding changed to
armenian again.
Status: UNCONFIRMED → NEW
Ever confirmed: true
I can reproduce this.
I changed the Western Sans Serif font from "Arial" to "Arial Black".
At this point the charset did not change.
But after I went to www.yahoo.com (or opened a new browser window), the charset
menu changed to Armenian.

Erik, could you take a look at this?
Assignee: nhotta → erik
I'll take it.
Assignee: erik → cata
In my last comment, I was able to reproduce this with an existing profile. But I
have not been able to reproduce it with a new profile so far.
Also, after I tested with a new profile and then came back to the existing
profile, I could not reproduce it anymore. Very strange.

Damn! I can't reproduce... Not with an old profile, nor with a new one. I went
to Edit>Preferences>Fonts & did Naoki's steps and the default charset stays
Western (ISO-8859-1).

I am using Win2k, with a build from a pull from the branch ~1wk old. What are 
you guys using?
I am using today's branch build on Windows 2000.
I was able to reproduce once but cannot reproduce it anymore.
OK, I've found a way to reproduce the reported behavior of the default
charset being set to Armenian ARMSCII-8 (instead of Western ISO-8859-1).
I've reproduced this on US Win95 and US W2K.  I did this with a new profile on
W2K.

Will Scott Gifford, Steve Elmer and Jaime Rodriguez please try to confirm
whether this is possibly what they did?

The first time you open the Edit|Preferences...|Languages panel, you will notice
that the item listed in the Default Character Coding dropdown menu flashes
from an initial value of Armenian (ARMSCII-8) to Western (ISO-8859-1).
That is because it is dynamically building (via RDF) the contents of
the dropdown.

If you have never opened the Languages pref panel and you click on
the Languages category in the left-hand side, and then very quickly click on
a different category (e.g., Fonts, History), you seem to interrupt the
building of the menu contents and it will be left as Armenian (ARMSCII-8).
If you then click OK in another pref panel, then it will change your
character coding default (intl.charset.default in prefs.js) to armscii-8.

For the 3 folks that this has happened to, could you have done the above?
I.e., clicked on Languages and then quickly clicked on Fonts?
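
The sequence described above can be modelled with a toy sketch. None of these names are Mozilla APIs: `openLanguagesPane`, `clickOk` and the `charsets` list are hypothetical, and only the first list entry (armscii-8) and the pref name come from this report. The point is simply that if the step that selects the saved pref never runs, OK writes back whatever the widget shows by default.

```javascript
// Toy model of the pref window; all names here are illustrative.
// Only the first entry mirrors the report; the rest are placeholders.
const charsets = ["armscii-8", "ISO-8859-1", "UTF-8"];

function openLanguagesPane(prefs, interrupted) {
  // The dropdown shows its first item until initialization selects
  // the value stored in the pref.
  const pane = { selected: charsets[0] };
  if (!interrupted) {
    pane.selected = prefs["intl.charset.default"];
  }
  return pane;
}

function clickOk(prefs, pane) {
  // OK saves whatever the widget currently shows.
  prefs["intl.charset.default"] = pane.selected;
}

function simulate(interrupted) {
  const prefs = { "intl.charset.default": "ISO-8859-1" };
  clickOk(prefs, openLanguagesPane(prefs, interrupted));
  return prefs["intl.charset.default"];
}

console.log(simulate(false)); // ISO-8859-1
console.log(simulate(true));  // armscii-8
```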
It is possible that I accidentally did this, although I don't specifically
recall it.

It sounds like bug reports of this are trickling in occasionally.  It might be
smart to fix the bug that you just discovered, and see if that makes them stop
trickling in.

It also might be smart to add a little bit of code to detect a browser set to
Armenian on the Mozilla login pages (which is where I and other reporters first
encountered this bug), warn the user, and log the incident, so that it would be
possible to see how often this is really occurring.  I wasn't able to find a way
to get the current character set from JavaScript or a CGI script; if somebody
can point me in the right direction, I can try to come up with a patch and send
it to the WebTools folks.
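
One possible shape for such a check, as a rough sketch only: since the charset itself is not visible to the server, a script could look at the still-encoded e-mail field for the telltale %A9 where a period should be. `looksLikeArmsciiEmail` is a hypothetical helper, not part of Bugzilla, and the heuristic is an assumption. It would also fire for any other input that produces byte 0xA9 and no period, so it is better suited to logging than to blocking a login.

```javascript
// Hypothetical server-side heuristic; not part of Bugzilla.
// rawValue is the field value as received, before percent-decoding.
function looksLikeArmsciiEmail(rawValue) {
  const hasDot = /\.|%2E/i.test(rawValue); // a real or escaped period
  const hasA9 = /%A9/i.test(rawValue);     // what ARMSCII-8 sends for "."
  return hasA9 && !hasDot;
}

console.log(looksLikeArmsciiEmail("user%40example%A9com")); // true
console.log(looksLikeArmsciiEmail("user%40example.com"));   // false
```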

The other theory being floated around over in bug #57946 is that a bug existed
in some previous Mozilla build that set the encoding to Armenian, and left it
there.  I had been downloading and trying new versions on a pretty regular basis
when I encountered this bug, so that is certainly also a possibility.

When my CVS build finishes building, I'll play around and see if I can reproduce
this in any of the ways described above.

Thanks to all for their attention to this!
I came up with the "other" theory, but think my new theory is better since I
can actually reproduce it.

To do your check, you'd need to look at the pref "intl.charset.default", but
you'd need a signed script to access the prefs.
This is the XUL that appears to be interrupted (line 87), leaving the popup
menu selection set to Armenian (ARMSCII-8).  Why isn't the creation of the
popup atomic?  It seems that if it's interrupted, then nothing should be set.

http://lxr.mozilla.org/seamonkey/source/xpfe/components/prefwindow/resources/content/pref-languages.xul#82

 82       <menulist id="DefaultCharsetList" ref="NC:DecodersRoot" datasources="rdf:charset-menu"
 83           pref="true" preftype="string" prefstring="intl.charset.default"
 84           prefattribute="data" wsm_attributes="data">
 85           <template>
 86             <menupopup>
 87               <menuitem value="rdf:http://home.netscape.com/NC-rdf#Name" data="..." uri="..."/>
 88             </menupopup>
 89           </template>
 90       </menulist>
Adding Ben, Hyatt and Waterson to Cc list. See the previous comment about
atomicity of a XUL popup operation.
I don't think that the menu is "partially built", unless you're notifying us of
languages asynchronously via the OnAssert() callback? I'd guess maybe the pref
panel's onload and the menu's oncreate handler are fighting...
Keywords: intl
Since the comments indicate that the summary of this bug is actually not a bug,
I am updating the summary to reflect what the problem really is. Marking 
mostfreq since this is often filed, but usually marked WORKSFORME or DUPLICATE 
of other WORKSFORME bugs.
Summary: non Western Character Coding -> Incorrect form/CGI encoding of parentheses and period (%28, %29, %2E) → Armenian character encoding picked if language dropdown doesn't completely populate [and thus can't login to bugzilla] [form submission encoding wrong]
*** Bug 63180 has been marked as a duplicate of this bug. ***
Keywords: mostfreq
*** Bug 61231 has been marked as a duplicate of this bug. ***
cc'ing self
/me ponders how to describe this in one line on the mostfreq page :-)

Gerv
Gerv: no kidding. I would suggest "Can't log in to Bugzilla (it says invalid
e-mail address)" and, totally separately (but with the same bug #) "Mozilla
switches to Armenian for no reason".
Changing OS to 'All' since this bug's been confirmed on several platforms:
Windows 98, Windows 95, Windows 2000 and Linux.
OS: Linux → All
Somehow this might have been fixed; at least it hasn't shown up for me anymore.
It disappeared a few weeks ago.

Is anyone else still seeing this? Else I think we should mark it WFM.
Sorry to spam everyone, but there are people who are still seeing this problem.
I have had at least one person come on to IRC complaining he can't submit stuff.
I would leave this one open for now.
*** Bug 65894 has been marked as a duplicate of this bug. ***
Ksosez: tell your IRC friend that a fresh, new profile will do it. It worked
fine for me.

I really think we should close this now.
Moving all of cata's bugs to ftang.
Assignee: cata → ftang
*** Bug 66625 has been marked as a duplicate of this bug. ***
Marking it as WORKSFORME.
Status: NEW → RESOLVED
Closed: 24 years ago → 24 years ago
Resolution: --- → WORKSFORME
Sorry, my mistake. We probably should still look at this one. Reassigning to nhotta.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
nhotta- can you fix this one?
Assignee: ftang → nhotta
Status: REOPENED → NEW
Target Milestone: --- → mozilla0.9.1
Changed QA contact to andreasb@netscape.com.
QA Contact: teruko → andreasb
Reassign to jbetak.
Assignee: nhotta → jbetak
Status: NEW → ASSIGNED
Target Milestone: mozilla0.9.1 → mozilla0.9
I've been unsuccessfully trying to reproduce this with old (from Sept-Oct/2000) 
and new builds. 

Although this is really frustrating, it just occurred to me that we might be 
able to diminish future risk by speeding up the drop-down menu build-up. I'm 
examining some pre-initialization / caching possibilities. Alternatively, I'll 
try to see, whether some changes have been made to the oncreate and onload XUL 
event handlers. The onload pref panel handler shouldn't be interrupted by 
the "create" event as waterson indicated.
 
ccing Joe Hewitt. 

Joe, could you please have a glance at this?

The underlying cause for this problem seems to be in the widget state manager. 
Our situation is likely to be similar to the scenario in bug 62101. Our pref 
pane contains a long drop-down box, which gets filled from an RDF source and 
acts almost like a timer similar to what you described in 62101.

What we have been seeing is that when switching between pref panes, sometimes 
our list doesn't get fully loaded and the wsm then caches the first entry from 
the list. When subsequently OK is clicked to store a change on some other pref 
pane, this unwanted pref state propagates to the user profile and continues to 
live there unnoticed until it wreaks havoc.

From September 8 to February 14, this first drop-down list entry was a rather 
arcane character set, causing Mozilla to fail prominently in Bugzilla account 
logins. I was able to track down bugs caused by this between October 6 and 
January 18.

I'm tempted to believe that your change to savePageData on January 16 might 
have helped to alleviate our situation and would like to solicit your opinion.
I finally succeeded in making both PR3 and 0922BASE debug builds.

It seems that with the PR3 build, I can obtain an empty intl.charset.default 
preference by following Bob Jung's steps, which might or might not bring us 
closer to final resolution. 

In the debug builds I'm hitting an assertion "Failed to locate XBL binding" in
nsXBLService.cpp on line 641 when abruptly leaving and returning to the
language preference panel. It only goes away when I restart the browser. Only
then will the charset drop-down box in the language pref panel build properly
again.
marking dependency on bug 62101. I was able to consistently reproduce this 
problem with both the Netscape_20000922_BASE and Netscape_PR3_RELEASE builds. 

It seems that initially when pref panes are rapidly flipped, 
intl.charset.default is filled with a blank as described in 62101. When 
flipping pref panes again, upon reentry to the languages pane, the build-up of 
the charset drop-down listbox is disrupted and intl.charset.default is filled 
with the first list entry, which happens to be "Armenian (ARMSCII-8)". Debug 
builds throw two assertions, depending on the release:

1) nsXULDocument.cpp, line 5577 - "no script global object" 
mScriptGlobalObject != nsnull 

2) "failed to locate XBL binding" in nsXBLService.cpp on line 641

When patching nsWidgetStateManager.js with Joe Hewitt's change from Jan 18, the 
whole process stops, although the assertions are still being thrown. 

nsWidgetStateManager.prototype = 
  {
    get contentArea()
      { 
        return window.frames[ this.contentID ]; 
      },

    savePageData: 
      function ( aPageTag )
        {
+          if (!(aPageTag in this.dataManager.pageData))
+            return;
Depends on: 62101
Well, as much as I wanted to close this bug, I can't. Since I was not able to 
reproduce this with 100% consistency, I decided to do some more testing over 
the weekend. I'm getting better at it and just reproduced this bug on a tip 
build. 

Instead of "Armenian (ARMSCII)" we are now picking up "Arabic (IBM-864)", which 
is now the first item in the list of default character codings. I'm attaching a 
console output from my Netscape_20000922_BASE build. It's spiced up with some 
debug comments. 

It appears that the culprit is asynchronous load of external JS files from 
individual pref panes. When "dancing around" the preference window tree, the 
asynchronous load can fail due to the timing (racing?) conditions and the 
initialization JS code for the pref pane doesn't get executed. After a failed 
JS load, we start hitting the asserts.

Given such circumstances the character coding list will never be set to the 
current value of the preference. It will be initialized with the first entry in 
the RDF source instead. This value is subsequently cached by the wsm, which 
appeared to be fully functional at all times.
I think we are getting closer to a resolution. Splitting up the
pref-languages.js file and placing all of the initialization code into XUL
makes this dialog much more robust.

Frank suggested using a flag on the XUL file to indicate that the JS file has
been fully loaded. We could pursue a route similar to pref-fonts.js, where the
pref saving is handled by a callback function. There we could verify that the
dialog was initialized properly before agreeing to save any new preference
values.
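
A minimal sketch of that idea, under stated assumptions: the pane object, `createPane`, `initPane` and `savePane` are all hypothetical names and do not come from pref-languages.js or pref-fonts.js. The flag is set only at the very end of initialization, and the save callback refuses to write anything while it is still false.

```javascript
// Hypothetical pane model; names are illustrative only.
function createPane() {
  // Before initialization the widget shows the first list entry.
  return { initialized: false, selected: "armscii-8" };
}

function initPane(pane, prefs) {
  pane.selected = prefs["intl.charset.default"];
  pane.initialized = true; // set last, once initialization has completed
}

function savePane(pane, prefs) {
  // Refuse to save from a pane whose initialization never finished.
  if (!pane.initialized) return false;
  prefs["intl.charset.default"] = pane.selected;
  return true;
}

const demoPrefs = { "intl.charset.default": "ISO-8859-1" };

// Initialization was interrupted: the save is rejected, pref untouched.
const halfBuilt = createPane();
console.log(savePane(halfBuilt, demoPrefs), demoPrefs["intl.charset.default"]);
// false ISO-8859-1

// Initialization completed and the user picked a new value: save goes through.
const ready = createPane();
initPane(ready, demoPrefs);
ready.selected = "UTF-8";
console.log(savePane(ready, demoPrefs), demoPrefs["intl.charset.default"]);
// true UTF-8
```

This does not remove the underlying race; it only keeps a half-built pane from writing its default into the profile.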

A side note: when I increase the size of pref-search.js and place a delay in 
its initialization function, this bug can also be reproduced with the "Internet 
Search" preference panel. It seems to be an underlying issue with the 
preference code, which cannot handle multiple rapid requests without 
compromising data integrity. I'll file a bug for that against the "Preferences"
component.

Why would the bug be limited to Preferences?  Seems like any XUL that is
loading external JS files might be affected.
Bob, 

you are most likely right. I'd have to investigate some more, but my first 
reaction would be that preferences are just more susceptible due to transient 
state information caching. I might be wrong, but other parts of the UI might 
not cache and store transient state information in quite the same way.
moving to 0.9.1, hope to have this resolved within a week
Target Milestone: mozilla0.9 → mozilla0.9.1
QA Contact: andreasb → ylong
attaching a preliminary patch
The new patch offers a generic solution for improperly generalized pref panes. 
Ben insists on overseeing all nsPrefWindow.js changes, so I'm not sure whether 
we can get this in before he gets back from his vacation.

I'm still refining the previous patch to pref-languages.js. I could move it to 
bug 41245, since it's a big rework of the current code and I'd favor a quicker 
resolution for this bug.

Would anyone care to review?
No longer blocks: 41891
Whiteboard: need suprereview 2001-05-08 11:53
we really need to get this into 0.9.1 and 6.5 Beta. Please don't make me beg 
for r and sr. 

Ben?
sr=alecf on this band-aid fix
we really need to get to the bottom of this race condition though, is there a
bug around on that somewhere?
thanks for your help alecf - I'll try to come up with a reasonable test scenario
and file a bug against Preferences (XP Toolkit?). Please note that the race
conditions were extremely difficult to reproduce and the inflow of complaints
stopped around January 16.

Since I was able to reproduce this bug in a recent build, I wouldn't feel
comfortable without some damage-control for Beta1.
Whiteboard: need suprereview 2001-05-08 11:53
marking dependency on follow-up bug 80868, marking fixed - thanks everyone!

ylong: this one might be tough to verify, please talk to bobj or myself should
you run into trouble...

Status: ASSIGNED → RESOLVED
Closed: 24 years ago → 23 years ago
Depends on: 80868
Resolution: --- → FIXED
Actually, it's really hard to reproduce even on an old build. I have to try very
hard before I finally see the problem, but at least I haven't seen it once on
the 05-15 trunk yet.
I'll mark it as verified; please reopen it if someone can still reproduce it.
Status: RESOLVED → VERIFIED
*** Bug 83411 has been marked as a duplicate of this bug. ***
*** Bug 83413 has been marked as a duplicate of this bug. ***
*** Bug 89107 has been marked as a duplicate of this bug. ***
*** Bug 91201 has been marked as a duplicate of this bug. ***
*** Bug 102179 has been marked as a duplicate of this bug. ***
*** Bug 103882 has been marked as a duplicate of this bug. ***
*** Bug 100915 has been marked as a duplicate of this bug. ***