Closed Bug 179026 Opened 22 years ago Closed 22 years ago

URL parameter containing non-ASCII characters is not parsed correctly

Categories

(Core :: Internationalization, defect)

x86
Windows 2000
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla1.3alpha

People

(Reporter: momoi, Assigned: darin.moz)

References

()

Details

(Keywords: regression, topembed)

Attachments

(1 file)

** Observed with 2002-11-06 1.0.2 branch and trunk builds **
** This problem does not occur with 1.0.1 branch builds such as
   Netscape 7-rtm. **

1. Go to the above chat site in China.
2. Click on any chat channels using the latest trunk or 
   1.0.2 branch build.
3. It cannot find the chat room in question unless the
   URL parameter for channel contains an ASCII character.

This does happen with 1.0.1 branch builds such as Mozilla 1.0 or
Netscape releases.

These channels are accessed by an HREF URL such as the 
following: (View this under Chinese (GB2312) encoding.

HREF="javachat/login.php?channel=»¨¼¾Óê¼¾"

The URL is sent to the server side script with a channel
name appended at the end. 

With the latest trunk and 1.0.2 branch builds, no
character after "channel=" seems to be processed.
So the resulting URL is:

http://chat.sohu.com/javachat/login.php?channel=

with no channel name.

On this page there are a few channels that can be found and
all of them contain at least 1 ASCII character as part of
the channel name, e.g.:

HREF="subclass.php?classname=boys"

HREF="javachat/login.php?channel=³õÁµÔÚ14"

Note the 2nd example. If the channel name is displayed
in the status bar, that channel can be found when the 
link is clicked on. If not, it cannot be.

The status bar displays the channel name correctly in
the 2nd case above -- it seems that ASCII characters "14"
seems to make this possible.

The opening of chat channels at this chat site critically
depends on processing the channel parameter correctly. 

This is a regression from 1.0.1 branch releases.
This is needed for embedding customers and should also be fixed
ASAP for the 1.0.2. branch. Nominating fro nsbeta1 and topembed.

Note that this may be related to Bug 169388.
Severity is critical.



Severity: normal → critical
Keywords: nsbeta1, topembed
nhotta
Assignee: smontagu → nhotta
The Chinese channel name doesn't show in URL and staus bar is a regression.

But by clicking any channel name, after I login as a random guest name, I always
get the same chat page no matter which channel name that I clicked:
http://chat.sohu.com/javachat/main.php
this is the same result with IE.

So, looks like it doesn't matter with whether the channel name is displayed or
not in URL bar, I'll always get the same chat page at last.  IE, N7.0RTM and
latest branch or trunk build has the same result.
Keywords: regression
Sorry, there was a mistaken in my previous comment:
In 11-07 branch or trunk build, when I click the Chinese channel name, I'll have
the page:
http://chat.sohu.com/javachat/main.php
But with error message: "²ÎÊý´íÎó£¬Ã»ÓÐÖ¸¶¨·¿¼äÃû»òÕßÓû§êdzƣ¬Çë·µ»ØÖØнøÈë".

If I enter channel name "³õÁµÔÚ14", then I'll see the chat room same as in IE.

Target Milestone: --- → mozilla1.3alpha
The regression started to happen from 10/31 1.0 branch build. The last good
build is 10/30 branch build. Starting from 10/31, the escaped channel name is
not shown in url bar when clicking on a channel, and "wrong parameter" error
occurs in this case.
I reverted a change for nsURLParsers.cpp checked in 10/30 and the problem was
fixed. 
The problem was that non ASCII characters after '?' were truncated.
http://chat.sohu.com/javachat/login.php?channel=%B4%BA%CC%EC%BB%A8%BB%E1%BF%AA

I think this is caused by bug 169902.
Reassign to dbradley%netscape.com.

Assignee: nhotta → dbradley
Keywords: adt1.0.2
This looks like it worked before the 169902 fix by pure accident. p happens to
be a char rather than unsigned char so the previous *p > 0 test (I'm assuming a
test for reaching the end of the string) stopped truncation at an intl char with
the high bit turned on. Now that there's a different test for the string
termination all intl chars appear to be less than a control character.

Are URLs guaranteed to be UTF8? If some other multibyte character set is allowed
I don't think you can just strip bytes since a second byte might fall in the
control character range. or maybe they all avoid that range and you're safe.
Attached patch v1 patchSplinter Review
cast to unsigned char...
dveditz: yes, the input to these URL parsing functions must be UTF-8, so this
filtering is valid (if done correctly).

-> me
Assignee: dbradley → darin
When saw this bug, I was wondering the same thing if URL's were always UTF8.

-    for (p = spec + specLen - 1; (*p > 0) && (*p <= ' '); --p);
+    for (p = spec + specLen - 1; (*p <= ' ') && (p != spec); --p);

Yeah, I think I misunderstood the *p > 0 as looking for a terminator. It really
wasn't. The comment could have been a little more explicit about *p > 0

I'm not well versed on UTF8, but I'm wondering what constitues a control
character. I'm just a little worried that (unsigned char)*p <= ' ' may not be
enough. Since we're going character by character through this, we may be in a
sequence of valid non-whitespace characters that evaluate to true, depending on
where p is positioned.
dbradley: w/ UTF-8, each non-ASCII character has the high-bit (the 8-th bit)
set.  as a result, the expression:
  
  ((unsigned char) *p <= ' ')

should not evaluate to true if the byte (*p) is part of a UTF-8 multibyte
character.  (this is one of the properties of UTF-8 that make it so attractive.)
Comment on attachment 105740 [details] [diff] [review]
v1 patch

r=dbradley

Ok, that makes sense then. It needs to get checked into the branch as well.
Attachment #105740 - Flags: review+
*** Bug 177730 has been marked as a duplicate of this bug. ***
*** Bug 179143 has been marked as a duplicate of this bug. ***
Comment on attachment 105740 [details] [diff] [review]
v1 patch

sr=dveditz
Attachment #105740 - Flags: superreview+
Status: NEW → ASSIGNED
Keywords: mozilla1.0.2
This is a pretty severe bug that will break many sites. Definitely needs to make
it into blackbird.
I see the patch was checked in to the trunk.
I tested my local build and the problem for http://chat.sohu.com/ is fixed.
On the 1.0.2 branch we'd like the bogus 169902 patch backed out instead. It
looks like the real bug in 169902 was not applicable to the branch (no
FilterString) and only the  hitchhiker from comment 3 was checked into the branch.

Although it should be equivalent, we're more comfortable with the code we know
worked until 10/31.
adt plussing to back out 169902 patch as dveditz suggests in comment 18
Keywords: adt1.0.2adt1.0.2+
The 11-11 trunk build seems already has this fixed:
1. http://chat.suho.com, the chinese character parameter channel name is escaped
in URL bar, and I don't get the wrong parameter error message any more.
2. The cases montioned in dup. bug 179143 were fixed: local and ftp file system
are displayed/opened properly.
3. Can open the URL that including czech char in the test case of dup. bug 177730 

I have tested on WinXP-SC, linux RH7.2-JA, Mac 10.1.5 and Mac 9.2.1.
marking FIXED... dbradley: thanks for checking in the patch!
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
Marking as verified per comment #20.
Status: RESOLVED → VERIFIED
Comment on attachment 105740 [details] [diff] [review]
v1 patch

a=asa for checkin to the 1.2 branch (on behalf of drivers)
Attachment #105740 - Flags: approval+
The only issue with backing out on the 1.0 branch, is this will reintroduce the
slight risk of a buffer under run that existed before. As long as everyone is
fine with that, I'll back it out today.
i'd prefer not re-introducing the buffer under run.  what's so risky about the
current patch?  seems a lot safer than a buffer under run.
From adt: ok, check in the new patch (after getting driver's approval).
What's the impact of the current fix, i.e. are there any specific areas QA
should focus on testing in order to make sure that there are no regressions
introduced?  Thanks.
a=rjesup@wgate.com for 1.0 branch; change mozilla1.0.2+ to fixed1.0.2 when
checked in
fixed1.0.2
fixed1.2

revision 1.19.2.1
date: 2002/11/11 20:16:23;  author: darin%netscape.com;  state: Exp;  lines: +1 -1
fixes bug 179026 "URL parameter containing non-ASCII characters is not
parsed correctly" r=dbradley sr=dveditz a=asa
Keywords: fixed1.2
Verified fixed on 11-12-13 branch build, change "fixed1.0.2" to "verified1.0.2".
Verified fixed in 11-26 1.2 build / WinME.  Change "fixed1.2" to "verified1.2".
Keywords: fixed1.2verified1.2
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: