html content serializer and nsISaveAsCharset does not handle surrogate correctly

VERIFIED FIXED in mozilla1.2alpha

Status

()

Core
Internationalization
P3
normal
VERIFIED FIXED
16 years ago
15 years ago

People

(Reporter: Frank Tang, Assigned: Shanjian Li)

Tracking

({intl, topembed})

Trunk
mozilla1.2alpha
x86
Windows 2000
intl, topembed
Points:
---
Dependency tree / graph

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [eta:8/5/2002], URL)

Attachments

(1 attachment)

(Reporter)

Description

16 years ago
I think our nsISaveAsCharset and/or our html content sink code does not handle
surrogate pair correctly

reproduce procedure:
1. open http://people.netscape.com/ftang/testscript/gb18030/gbtext.cgi?page=596
this page have some surrogate characters. (you probably will see all question
mark display there. it is ok)
2. select the fourth line, and copy
3. open "File:New:Blank page to edit"
4. paste the surrogate characters in
5. "File:Save" and save into one file
6. browser that file, 
7. view source of that file, 
you will notice that it use two ncr to encode one unicode characters,
for example you will got
<pre>&#55360;&#56324;&#55360;&#56325;&#55360;&#56326;&#55360;&#56327;&#55360;&#56328;&#55360;&#56329;&#55360;&#56330;&#55360;&#56331;&#55360;&#56332;&#55360;&#56333;</pre>

if you copy the 4th line

expect behavior, I should see
<pre>&#131076;&#131077;&#131078;&#131079;&#131080;&#131081;&#131082;&#131083;&#131084;&#131085;</pre>
(Reporter)

Comment 1

16 years ago
I think this will cause some problme about copy and paste through html. Could be
the cause of the bug ji filed about copy and paste surrogate characters from
mozilla to microsoft WordXP
(Reporter)

Comment 2

16 years ago
probably caused by
intl/unicharutil/src/nsSaveAsCharset.cpp
270 NS_IMETHODIMP
271 nsSaveAsCharset::DoConversionFallBack(PRUnichar inCharacter, char
*outString, PRInt32 bufferLength)
272 {
..
310   case attr_FallbackDecimalNCR:
311     rv = ( PR_snprintf(outString, bufferLength, "&#%u;", inCharacter) > 0) ?
NS_OK : NS_ERROR_FAILURE;
312     break;
313   case attr_FallbackHexNCR:
314     rv = (PR_snprintf(outString, bufferLength, "&#x%x;", inCharacter) > 0) ?
NS_OK : NS_ERROR_FAILURE;
315     break;
(Reporter)

Comment 3

16 years ago
we probably need to change the implementation of 178
nsSaveAsCharset::DoCharsetConversion(const PRUnichar *inString, char **outString)
, and change the interface of 

148 nsSaveAsCharset::HandleFallBack(PRUnichar character, char **outString,
PRInt32 *bufferLength, 
149                                 PRInt32 *currentPos, PRInt32 estimatedLength)


to
148 nsSaveAsCharset::HandleFallBack(PRUint32 inUCS4, char **outString, PRInt32
*bufferLength, 
149                                 PRInt32 *currentPos, PRInt32 estimatedLength)

and 

271 nsSaveAsCharset::DoConversionFallBack(PRUnichar inCharacter, char
*outString, PRInt32 bufferLength)

to
271 nsSaveAsCharset::DoConversionFallBack(PRUint32 inUCS4, char *outString,
PRInt32 bufferLength)

(Reporter)

Comment 4

16 years ago
P3 bug, 
Keywords: intl
Priority: -- → P3

Updated

16 years ago
QA Contact: ruixu → ylong
(Reporter)

Comment 5

16 years ago
I think this problem cause copy and paste problem with OfficeXP when we paste it
in as html.
Also, we will cause problem when we save the page. 
shanjian- please look at this when you have time. We may need this for rtm. (30%
of chance)
Target Milestone: --- → mozilla1.1alpha

Comment 6

16 years ago
nsISaveAsCharset is also used for message send, gfx seems to be using it too
(for transliteration?)
(Assignee)

Comment 7

16 years ago
Created attachment 90894 [details] [diff] [review]
patch
(Assignee)

Updated

16 years ago
Status: NEW → ASSIGNED
(Reporter)

Comment 8

16 years ago
this might cause 131603
Blocks: 157673
(Reporter)

Comment 9

16 years ago
Comment on attachment 90894 [details] [diff] [review]
patch

r=ftang the buffer size is guaranteed to be 256 by the caller and the code use
the right function to ensure no buffer overrun also.
Attachment #90894 - Flags: review+
(Reporter)

Comment 10

16 years ago
let's get this into trunk asap
nsbeta1+ for m1.2final
Keywords: nsbeta1+
Whiteboard: [eta:8/5/2002]
Target Milestone: mozilla1.1alpha → mozilla1.2alpha
(Assignee)

Comment 11

16 years ago
blizzard, could you sr?
Attachment #90894 - Flags: superreview+
Comment on attachment 90894 [details] [diff] [review]
patch

sr=blizzard on these bits.

Any callers that need to be fixed from the API changes?
(Assignee)

Comment 13

15 years ago
fix checked in.
Status: ASSIGNED → RESOLVED
Last Resolved: 15 years ago
Resolution: --- → FIXED

Comment 14

15 years ago
This commit have added a "may be used uninitialized" warning to brad TBox:

+intl/unicharutil/src/nsSaveAsCharset.cpp:254
+ `PRUint32 unMappedChar' might be used uninitialized in this function

See also bug 59652 for more on these warnings.

Comment 15

15 years ago
P.S. The warning was fixed in bug 165908.

Comment 16

15 years ago
batch: adding topembed per Gecko2 document
http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
Keywords: topembed

Comment 17

15 years ago
Verified fixed in 09-12 trunk build / WinXP-SC.
Status: RESOLVED → VERIFIED

Updated

15 years ago
Blocks: 176349
(Reporter)

Updated

15 years ago
Depends on: 180372
(Reporter)

Updated

15 years ago
No longer blocks: 157673
You need to log in before you can comment on or make changes to this bug.