Closed Bug 137657 Opened 22 years ago Closed 22 years ago

html content serializer and nsISaveAsCharset does not handle surrogate correctly

Categories

(Core :: Internationalization, defect, P3)

x86
Windows 2000
defect

Tracking

()

VERIFIED FIXED
mozilla1.2alpha

People

(Reporter: ftang, Assigned: shanjian)

References

()

Details

(Keywords: intl, topembed, Whiteboard: [eta:8/5/2002])

Attachments

(1 file)

I think our nsISaveAsCharset and/or our html content sink code does not handle
surrogate pair correctly

reproduce procedure:
1. open http://people.netscape.com/ftang/testscript/gb18030/gbtext.cgi?page=596
this page have some surrogate characters. (you probably will see all question
mark display there. it is ok)
2. select the fourth line, and copy
3. open "File:New:Blank page to edit"
4. paste the surrogate characters in
5. "File:Save" and save into one file
6. browser that file, 
7. view source of that file, 
you will notice that it use two ncr to encode one unicode characters,
for example you will got
<pre>&#55360;&#56324;&#55360;&#56325;&#55360;&#56326;&#55360;&#56327;&#55360;&#56328;&#55360;&#56329;&#55360;&#56330;&#55360;&#56331;&#55360;&#56332;&#55360;&#56333;</pre>

if you copy the 4th line

expect behavior, I should see
<pre>&#131076;&#131077;&#131078;&#131079;&#131080;&#131081;&#131082;&#131083;&#131084;&#131085;</pre>
I think this will cause some problme about copy and paste through html. Could be
the cause of the bug ji filed about copy and paste surrogate characters from
mozilla to microsoft WordXP
probably caused by
intl/unicharutil/src/nsSaveAsCharset.cpp
270 NS_IMETHODIMP
271 nsSaveAsCharset::DoConversionFallBack(PRUnichar inCharacter, char
*outString, PRInt32 bufferLength)
272 {
..
310   case attr_FallbackDecimalNCR:
311     rv = ( PR_snprintf(outString, bufferLength, "&#%u;", inCharacter) > 0) ?
NS_OK : NS_ERROR_FAILURE;
312     break;
313   case attr_FallbackHexNCR:
314     rv = (PR_snprintf(outString, bufferLength, "&#x%x;", inCharacter) > 0) ?
NS_OK : NS_ERROR_FAILURE;
315     break;
we probably need to change the implementation of 178
nsSaveAsCharset::DoCharsetConversion(const PRUnichar *inString, char **outString)
, and change the interface of 

148 nsSaveAsCharset::HandleFallBack(PRUnichar character, char **outString,
PRInt32 *bufferLength, 
149                                 PRInt32 *currentPos, PRInt32 estimatedLength)


to
148 nsSaveAsCharset::HandleFallBack(PRUint32 inUCS4, char **outString, PRInt32
*bufferLength, 
149                                 PRInt32 *currentPos, PRInt32 estimatedLength)

and 

271 nsSaveAsCharset::DoConversionFallBack(PRUnichar inCharacter, char
*outString, PRInt32 bufferLength)

to
271 nsSaveAsCharset::DoConversionFallBack(PRUint32 inUCS4, char *outString,
PRInt32 bufferLength)

P3 bug, 
Keywords: intl
Priority: -- → P3
QA Contact: ruixu → ylong
I think this problem cause copy and paste problem with OfficeXP when we paste it
in as html.
Also, we will cause problem when we save the page. 
shanjian- please look at this when you have time. We may need this for rtm. (30%
of chance)
Target Milestone: --- → mozilla1.1alpha
nsISaveAsCharset is also used for message send, gfx seems to be using it too
(for transliteration?)
Attached patch patchSplinter Review
Status: NEW → ASSIGNED
this might cause 131603
Blocks: 157673
Comment on attachment 90894 [details] [diff] [review]
patch

r=ftang the buffer size is guaranteed to be 256 by the caller and the code use
the right function to ensure no buffer overrun also.
Attachment #90894 - Flags: review+
let's get this into trunk asap
nsbeta1+ for m1.2final
Keywords: nsbeta1+
Whiteboard: [eta:8/5/2002]
Target Milestone: mozilla1.1alpha → mozilla1.2alpha
blizzard, could you sr?
Attachment #90894 - Flags: superreview+
Comment on attachment 90894 [details] [diff] [review]
patch

sr=blizzard on these bits.

Any callers that need to be fixed from the API changes?
fix checked in.
Status: ASSIGNED → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED
This commit have added a "may be used uninitialized" warning to brad TBox:

+intl/unicharutil/src/nsSaveAsCharset.cpp:254
+ `PRUint32 unMappedChar' might be used uninitialized in this function

See also bug 59652 for more on these warnings.
P.S. The warning was fixed in bug 165908.
batch: adding topembed per Gecko2 document
http://rocknroll.mcom.com/users/marek/publish/Gecko/Gecko2Tasks.html
Keywords: topembed
Verified fixed in 09-12 trunk build / WinXP-SC.
Status: RESOLVED → VERIFIED
Blocks: grouper
Depends on: 180372
No longer blocks: 157673
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: