Optimize UTF8 to UTF16 conversion

RESOLVED FIXED

Status

()

Core
String
RESOLVED FIXED
8 years ago
6 years ago

People

(Reporter: vlad, Assigned: vlad)

Tracking

(Blocks: 5 bugs)

Trunk
Points:
---
Dependency tree / graph
Bug Flags:
blocking1.9.2 -
wanted1.9.2 +

Firefox Tracking Flags

(Not tracked)

Details

(Whiteboard: [ts])

Attachments

(9 attachments, 2 obsolete attachments)

1.90 KB, patch
Details | Diff | Splinter Review
897.29 KB, text/plain
Details
4.99 KB, patch
Details | Diff | Splinter Review
14.50 KB, text/plain
Details
1.99 KB, patch
Details | Diff | Splinter Review
9.64 KB, patch
Details | Diff | Splinter Review
6.60 KB, patch
jst
: review+
Details | Diff | Splinter Review
4.90 KB, patch
Details | Diff | Splinter Review
487 bytes, patch
vlad
: review+
Details | Diff | Splinter Review
ConvertUTF8_to_UTF16 shows up at about 5-8% of warm start time.  Our implementation is straightforward C code; I bet we can find a much faster implementation using SSE, NEON, whatever's available.
Whiteboard: [ts]
There is u8u16, but I don't think it's license is GPL-compatible.
Quick and dirty test:

inBufSize: 2155589
Convert_1: 4.78326 ms
Convert_dumb: 2.2116 ms
Convert_dumb2: 2.8301 ms
Convert_u8u16: 1.90388 ms

the input is all ASCII, about 2MB worth.

* Convert_1 is our current routine. 
* Convert_dumb just does u8->u16 straight; it's not a true utf8->utf16 conversion.
* Convert_dumb2 walks the input string and looks for any high bits; if no high bits are set, it does a straight u8->u16 conversion, otherwise it calls Convert_1 (so it basically times the overhead of checking for any high bits set).
* Convert_u8u16 uses u8u16.

Convert_dumb could be accelerated even more using SSE; my impl is just a straight C impl.

I'd suggest that we take the dumb2 approach in our current code and keep searching for something faster.
Er, sorry:

Convert_1: 7.22484 ms
Convert_dumb: 2.18996 ms
Convert_dumb2: 2.84106 ms
Convert_u8u16: 1.8791 ms

The 4.7ms number was with some hacks applied.
Created attachment 390671 [details] [diff] [review]
sample patch

Convert_1: 6.70116 ms
Convert_2: 2.73848 ms
Convert_dumb: 2.12909 ms
Convert_dumb2: 2.73505 ms
Convert_u8u16: 1.77669 ms

Here's an impl of Convert_2.  Note that I'm trying to optimize for ascii here, and also that this isn't touching utf8->utf16 that happens with js, and maybe not for xbl/xml.  Not sure.

This needs some SSE2 love, and it also should probably do something with the work it did to figure out which initial parts of a string have no non-ascii chars by doing the fast copy for that segment...
blocking1.9.1: needed → ---
Flags: blocking1.9.2?
Maybe mmoy can help?

Comment 6

8 years ago
I ran a comparison of char to short using c code and sse2 intrinsics and came up with 625 for c and 484 for intrinsics. This was with the best case of input and output aligned 16. To get this to work in the real world, there would have to be some special code at the beginning and end to handle the alignment stuff and this would consume some cycles.

I can code up something to do the high-bit testing too just to get comparison times.

A few questions:

- Can I assume that in/out are aligned on dwords?
- What is the typical number of characters to be processed?
(In reply to comment #6)
> I ran a comparison of char to short using c code and sse2 intrinsics and came
> up with 625 for c and 484 for intrinsics. This was with the best case of input
> and output aligned 16. To get this to work in the real world, there would have
> to be some special code at the beginning and end to handle the alignment stuff
> and this would consume some cycles.
> 
> I can code up something to do the high-bit testing too just to get comparison
> times.

Great!

> A few questions:
> 
> - Can I assume that in/out are aligned on dwords?

I /think/ so.  If not, we may be able to change things around so that it's always true.

> - What is the typical number of characters to be processed?

Good question.  Someone should do some instrumentation and figure that out... would probably be helpful to know distribution of run lengths, and also for each run how many ascii chars before you hit a char with the high bit set.

Updated

8 years ago
Blocks: 506431
(In reply to comment #0)
> ConvertUTF8_to_UTF16 shows up at about 5-8% of warm start time.  Our
> implementation is straightforward C code; I bet we can find a much faster
> implementation using SSE, NEON, whatever's available.

How much data is being converted? 5% of 900ms is 45ms which would be like 12MB of data, which seems like a lot of text... Is all of that unique text or are we converting the same text more than once?
> How much data is being converted?

Well, let's see.  Quickly instrumenting the sync case in nsXBLService::FetchBindingDocument shows us loading about 630KB of UTF-8 encoded XBL at startup.

Doing the same for nsStreamLoader::OnDataAvailable (which should really only be hit for CSS and maybe scripts) I see us loading about 1.1MB of data that way.

Then there are the actual XUL documents, of course, not to mention whatever internal conversions we might be doing.
Created attachment 391128 [details]
The text that we convert on startup

About 900k worth, nothing obviously duplicated. I don't know if any of the stuff is unnecessary, but there sure is a lot of it.
Attachment #391128 - Attachment mime type: application/octet-stream → text/plain

Comment 11

8 years ago
I instrumented an image with the patch and it came up with 748671 bytes converted. But doing a lot of other things generates conversions too. As far as startup time goes, is precomputing the converted text a possibility?
Possibly; I looked into doing a quick-and-dirty iconv conversion of the relevant files, but that caused other problems since that switches the default charset of a bunch of documents (with a utf16 bom).  There's also additional IO cost.  But if someone can think of a way to do it without breaking anything, it'd be good to evaluate that cost.
However, a lot of this is JS modules and XBL, which we want to fastload...

Comment 14

8 years ago
I have an SSE2 implementation of the one to two byte conversion working but it uses MOVDQU to write the output doublequadwords. MOVDQU is an unaligned write so there's a performance penalty there. I did some additional instrumenting and it looks like both source and destination are doubleword aligned which means that I can add code to do aligned writes with a little effort. I have to add some code to turn this on for Mac OS X and Windows too.

Comment 15

8 years ago
Created attachment 394142 [details] [diff] [review]
Original patch with SSE2 one byte to two byte code

This patch has been tested on Mac OS X and Windows and it appears to work (returns correct results). There is special code to handle the cases where the offset between the source and destination is 0 or 8 after the source has been double quadword aligned. These two cases are seen about 99% of the time. Other alignments are handled by the movdqu (unaligned move) default code.

Could someone run a performance test to determine if this code helps?
Warm start on my fast win7 box went from 343-344ms down to 332-334ms.  Not a huge boost, but noticeable.  I also added a block at the start of the high-bit-set checking to align the pointer to 4 bytes, so that we don't blow up on ARM.

However... after just doing a bunch more startups, I started seeing some 342's again.  Now I'm starting to doubt my measurement accuracy; I'll try this again on my slower Mac tomorrow.

Also, u8u16 is really crazy fast!  I'll plug your code into the hack benchmark that I wrote tomorrow as well to see how it compares.
On my mac -- goes from 825-843 to 780-790.  I'll take 5-6%!
You could also probably use _mm_movemask_pi8 instead of & 0x80808080 to pick up another couple percent.
Created attachment 395742 [details]
standalone benchmark

Just added the sse code in to the benchmark -- it gets us in the right ballpark, g++ -O2:

                orig: 8.06701 ms
                 chk: 2.87505 ms
          dumb_ascii: 2.26799 ms
                 sse: 2.86604 ms
           sse_ascii: 2.23546 ms
               u8u16: 1.8919 ms

"chk" is basically the same as the sse case, where it checks for high byte, but then instead of using sse it just does a simple byte-to-short copy loop.  The two _ascii variants just do straight copies and don't do the high bit check.  Kinda unfortunate that the sse variant is basically the same speed.  But, it doesn't hurt.

So, what I'm trying to say: let's get this patch in (but with the additional alignment fix).

I've also attached the benchmark; apologies for horrible code.  Some more bits:

g++ -O2 -mtune=prescott (gcc 4.0.1):

                orig: 6.72993 ms
                 chk: 2.87276 ms
          dumb_ascii: 2.23084 ms
                 sse: 2.90849 ms
           sse_ascii: 2.23746 ms
               u8u16: 1.8933 ms

similar for g++-4.2 with -mtune=core2.

g++ -O3 -mtune=prescott (gcc 4.0.1):

                orig: 7.47574 ms
                 chk: 2.87053 ms
          dumb_ascii: 2.23784 ms
                 sse: 2.90917 ms
           sse_ascii: 2.24879 ms
               u8u16: 1.91424 ms

(yes, slower than -O2, even worse slowdown with g++-4.2)

g++ -Os -mtune=prescott:

                orig: 9.78382 ms
                 chk: 2.86463 ms
          dumb_ascii: 2.25211 ms
                 sse: 3.0451 ms
           sse_ascii: 2.33603 ms
               u8u16: 2.01193 ms
Created attachment 395743 [details] [diff] [review]
patch with 4-byte alignment on check

Just the 4-byte alignment at the start in this patch.  Let's get this in, and someone can write the followup bug for linux (really, the only thing that's missing is having sse2_available equivalent defined somewhere on linux).
Attachment #395743 - Flags: superreview?
Attachment #395743 - Flags: review?(mmoy)
Attachment #395743 - Attachment is patch: true
Attachment #395743 - Attachment mime type: application/octet-stream → text/plain
Attachment #395743 - Flags: superreview? → superreview?(jst)
And while I'm here, here's the benchmark for real utf8 data (longer than the earlier set, so they're not directly comparable... this is 3145728 bytes, original was 2155589):

g++ -O2:

                orig: 15.4942 ms
                 chk: 15.4288 ms
          dumb_ascii: 3.77464 ms
                 sse: 15.4654 ms
           sse_ascii: 3.78217 ms
               u8u16: 7.89827 ms

g++ -O2 -mtune=prescott:

                orig: 14.4634 ms
                 chk: 14.2659 ms
          dumb_ascii: 3.78667 ms
                 sse: 14.2653 ms
           sse_ascii: 3.77649 ms
               u8u16: 7.92226 ms
Replacing "& 0x80808080" with "_mm_movemask_epi8(*(__m128i *) src)" gives me the following results on the benchmark (I used my own pure ASCII text, so obviously the results aren't directly comparable, and I commented out u8u16 since it doesn't seem to want to build on x64)

Before
kshuey@linux-5wgr:~/Downloads> ./t.o
inBufSize: 512262
                orig: 5.40125 ms
                 chk: 2.95589 ms
          dumb_ascii: 2.64377 ms
                 sse: 1.31553 ms
           sse_ascii: 0.82156 ms

After
kshuey@linux-5wgr:~/Downloads> ./t.o
inBufSize: 512262
                orig: 5.54781 ms
                 chk: 2.9853 ms
          dumb_ascii: 2.5621 ms
                 sse: 0.994294 ms
           sse_ascii: 0.81584 ms


That works out to a relative savings of 25%.  Given your number of 5-6% of startup time overall that shaves off a little over 1% more.
A few comments:

A code comment that said something like "Use the real converter if there are any bytes with the high bit set, otherwise just expand ASCII to UTF-16" might have saved me a few minutes of head-scratching.

The "return NS_OK" at the end doesn't seem like it's equivalent to what the other code does; don't you need to return NS_OK_UDEC_MOREOUTPUT in some cases?

Finally, the UTF8-UTF16 conversion code in xpcom/string has been optimized a good bit more than this code (although it's also a little less tolerant of errors, since it's designed for "sanitized" data... although I think that's actually mostly been fixed recently).  I suspect ConvertReal could be made a good bit faster; one thing I might try would be loading the member variables into locals at the start of the function and saving them back into the member variables at the end.

Comment 24

8 years ago
Shouldn't the |#endif ! | be |#endif // | instead? Why are you using two different *_SSE2 macros when you are really only using them together?

Are you planning to address Linux (or in general, other GCC platforms) in a similar way (in another bug)? I hope I adapted your benchmark correctly for my Linux x86_64 system (taking the MAC_SSE2 route without u8u16), I get these numbers for comparison

inBufSize: 918825
                orig: 3.12586 ms
                 chk: 2.68073 ms
          dumb_ascii: 0.815924 ms
                 sse: 2.68309 ms
           sse_ascii: 0.446675 ms
I think that your patch should not define MAC_SSE2.  gcc defines __SSE2__ for MacOS X and x86_64, so you should use __SSE2__ instead of your MacOS's define.
So I did some testing of vlad's benchmark from attachment 395742 [details] using attachment 391128 [details] as lipsum.txt.

My understanding is that the changes in this patch are supposed to move us from "orig" to "sse".

On my laptop (Linux, x86-64, Thinkpad), I see:

                orig: 2.97091 ms
                 chk: 2.73285 ms
          dumb_ascii: 0.91355 ms
                 sse: 2.74453 ms
           sse_ascii: 0.83265 ms

on mrbkap's laptop (x86 Mac), I actually see chk and sse being slightly slower than orig (I guess the cost of the extra pass varies).

Now, I wrote a patch to the original code to:
 * not use member variables during the conversion
 * add a nested inner loop

and on my laptop I now get:

                orig: 1.34414 ms

(I think these numbers only really have two significant figures, though.)


So I think maybe it would be better to optimize the original code and avoid the two-pass approach.
Created attachment 395941 [details] [diff] [review]
patch to standalone benchmark, optimizing original code
Except I realize I'm supposed to be testing with pure ASCII data, since vlad's test doesn't do any chunking.  With that, on mrbkap's laptop (with SSE), I see:

original:


inBufSize: 2300000
                orig: 8.06885 ms
                 chk: 2.68913 ms
          dumb_ascii: 2.11991 ms
                 sse: 2.10505 ms
           sse_ascii: 1.70828 ms


with my patch:

inBufSize: 2300000
                orig: 2.92254 ms
                 chk: 2.52946 ms
          dumb_ascii: 2.0997 ms
                 sse: 1.9211 ms
           sse_ascii: 1.64253 ms
At the risk of further cluttering the bug with benchmarks and attachments ...

I've been working on further optimizing the sse variation by eliminating the dual loop structure (i.e. combining the scanning loop and the processing loop).

What I came up with is "sseoneloop".  It scans and converts at the same time as long as it finds only ASCII.  Once it finds non-ASCII it calls ConvertReal.  Right now it calls ConvertReal to do the whole thing so theoretically it could be worse than orig (if you had a bunch of ASCII text with one multibyte character at the very end, runtime would be approximately sseoneloop + orig) but it's fairly simple to just call ConvertReal on what remains (given that you already have a nice character boundary).

I'll clean up the code and attach it in patch form in a bit.  As for numbers:

orig, chk, dumb_ascii, and sse_ascii are all what they are in vlad's benchmark.  dbaron is dbaron's code.  sseoneloop is as above.

Benchmark results: x86-64 linux
inBufSize: 3073578
                orig: 30.6071 ms
                 chk: 18.392 ms
          dumb_ascii: 14.9552 ms
              dbaron: 20.4106 ms
                 sse: 8.57576 ms
          sseoneloop: 4.90909 ms
           sse_ascii: 5.37327 ms

I find it interesting that all of your numbers seem to come out with sse about equal to chk when every time I run it it runs almost twice as fast as chk.  Also, I don't believe that there are even two significant figures in these results.  From what I've seen the results fluctuate in a roughly one millisecond interval.
Created attachment 396042 [details] [diff] [review]
Patch to benchmark adding Covert_dbaron and Convert_sseoneloop

This patch adds dbaron's code to the benchmark and adds sseoneloop as described in my previous comment.
If you're on a 32 bit system you'll probably want to add u8u16 back in after you apply that patch, as I forgot to unremove it before diffing.
And with -O2

inBufSize: 3073578
                orig: 14.5715 ms
                 chk: 6.72878 ms
          dumb_ascii: 5.47907 ms
              dbaron: 6.89872 ms
                 sse: 5.41505 ms
          sseoneloop: 0.82828 ms
           sse_ascii: 4.41576 ms

And -O3

inBufSize: 3073578
                orig: 14.3186 ms
                 chk: 6.87903 ms
          dumb_ascii: 5.33305 ms
              dbaron: 6.39502 ms
                 sse: 4.88152 ms
          sseoneloop: 0.706781 ms
           sse_ascii: 3.99238 ms
Created attachment 396067 [details] [diff] [review]
Patch adding dbaron and sseoneloop to standalone benchmark

Ignore the last optimized benchmarks, I was missing a piece of the routine (it was just throwing away its output and the compiler knew that).

Corrected benchmarks:
-O0
inBufSize: 3073578
                orig: 30.5634 ms
                 chk: 18.0327 ms
          dumb_ascii: 14.9367 ms
              dbaron: 20.4797 ms
                 sse: 8.27054 ms
          sseoneloop: 5.60232 ms
           sse_ascii: 5.32999 ms

-O2
inBufSize: 3073578
                orig: 14.182 ms
                 chk: 5.98159 ms
          dumb_ascii: 5.13692 ms
              dbaron: 6.23665 ms
                 sse: 4.77082 ms
          sseoneloop: 3.89594 ms
           sse_ascii: 3.86942 ms

-O3
inBufSize: 3073578
                orig: 14.1013 ms
                 chk: 6.66146 ms
          dumb_ascii: 4.79984 ms
              dbaron: 6.18853 ms
                 sse: 4.75576 ms
          sseoneloop: 3.81217 ms
           sse_ascii: 3.801 ms

Sorry for the incorrect numbers earlier.  I should have realized it's absurd for it to perform faster than the version that doesn't even check to see if there are multibyte characters.  The overhead of checking for those characters is incredibly small though, only 10 microseconds at -O3 according to this.
Attachment #396042 - Attachment is obsolete: true
Kyle, thanks for your help!  It looks like a combination of dbaron's work (for when we have multibyte chars) and yours (for optimizing the common case of where we don't) is what we want.  I'll cook up a patch for the actual code later on today.
Attachment #395743 - Attachment is obsolete: true
Attachment #395743 - Flags: superreview?(jst)
Attachment #395743 - Flags: review?(mmoy)
(In reply to comment #34)
> Kyle, thanks for your help!  It looks like a combination of dbaron's work (for
> when we have multibyte chars) and yours (for optimizing the common case of
> where we don't) is what we want.  I'll cook up a patch for the actual code
> later on today.

No problem.  Sounds great.  If you don't get around to it and want me to turn it into a patch just let me know.  I probably couldn't get around to it until tomorrow, but would be happy to do it if wanted.
Created attachment 396576 [details] [diff] [review]
final patch

Ok, here is what I think is a good final patch.  It includes dbaron's optimization as the core piece, and then two customized ascii run converters, one for SSE2, and one for ARM.

Here's x86, with the SSE code...

ascii:
                orig: 10.4181 ms (ok)
              dbaron: 4.79316 ms (ok)
               final: 2.34829 ms (ok)

utf8 content (wikipedia utf8 page, so short html runs and long utf8 runs):

                orig: 15.1058 ms (ok)
              dbaron: 12.4791 ms (ok)
               final: 13.4203 ms (ok)

so a little slower than the simple loop, due to a few extra bits of code for short runs.

ARM:

ascii:
                orig: 164.045 ms (ok) 006d 006f 0064 006f
               final: 80.1377 ms (ok) 006d 006f 0064 006f

utf8:

                orig: 177.403 ms (ok) 006f 006e 0069 0063
             dbaron2: 117.875 ms (ok) 006f 006e 0069 0063

(3MB in all cases)
Attachment #396576 - Flags: superreview?(jst)
Attachment #396576 - Flags: review?(jst)
Assignee: nobody → vladimir
OS: Mac OS X → All
Hardware: x86 → All

Comment 37

8 years ago
Created attachment 397227 [details] [diff] [review]
Make the UCS conversion faster for the non-ascii part

Some nits:
Instead of  if ((*src & 0x80) == 0), one can also do if (*src >= 0).
Also instead of if ((*in & 0xE0) == 0xC0), testing bits with if (*in & 0x40) is also faster.
The first is sometimes (but not always!) done by a smart optimizing compiler, but the second not.

Attached patch reduces the loop part for non-ascii from 88 ASM lines to 71 lines, for example from:
; Line 152
	mov	al, cl
	and	al, 224					; 000000e0H
	cmp	al, 192					; 000000c0H
	jne	SHORT $LN25@Convert
; Line 155
	and	ecx, 31					; 0000001fH
	shl	ecx, 6
	mov	DWORD PTR [edx+12], ecx
; Line 156
	mov	BYTE PTR [edx+16], 1
; Line 157
	mov	BYTE PTR [edx+17], 2
	jmp	$LN30@Convert

to:
; Line 161
	test	cl, 32					; 00000020H
	jne	SHORT $LN25@Convert
; Line 164
	and	ecx, 31					; 0000001fH
	shl	ecx, 6
; Line 165
	mov	BYTE PTR [edx+17], 2
	jmp	SHORT $LN45@Convert
Vlad, shouldn't it be "include <emmintrin.h>"?

Also, are we handling support for this on linux at a later date? (presumably since there's no easy way to get at __sse2_available)

Other than that looks good for me.  I built firefox with it on windows 7 and I think it loads noticeably faster though I haven't done any quantitative measurements.

Comment 39

8 years ago
We have a few functions already in the tree that determine SSE2 functionality on Linux, e.g. CheckForSSE2() in jstracer.cpp and sse2_available() in qcms, so Linux on x86 shouldn't be a problem.

As I hinted in comment 24, at least the Linux x86_64 case could very well be covered by MAC_SSE2.
Yeah, the issue with linux isn't getting it to work, it's figuring out where to put the detection code.  I could just copy sse2_available somewhere, but was hoping to end up with a libxul-global spot... but maybe that's not feasible.

I missed comment 24 -- I'll change the ifdefs to __SSE2__.  The further optimizations in comment 37 we should also take, though merging the patches is going to become tricky; Alfred, can you remerge after I land the ascii bit?
(In reply to comment #37)
> Instead of  if ((*src & 0x80) == 0), one can also do if (*src >= 0).

You can't do this; you're assuming than |char| == |signed char|, but |char| can be either |signed char| or |unsigned char|.  (And I thought it was usually |unsigned char|.)

> Also instead of if ((*in & 0xE0) == 0xC0), testing bits with if (*in & 0x40) is
> also faster.

But not equivalent, since you're not checking that that the bit below is unset (i.e., you want to require (*in & 0x40) && !(*in & 0x20)), which probably leads to security bugs.
(In reply to comment #41)
> But not equivalent, since you're not checking that that the bit below is unset
> (i.e., you want to require (*in & 0x40) && !(*in & 0x20)), which probably leads
> to security bugs.

Er, actually, given the order you check it, it is ok, so never mind.
(In reply to comment #38)
> Also, are we handling support for this on linux at a later date? (presumably
> since there's no easy way to get at __sse2_available)

It actually looks like there are a few tricky things about this, but I filed bug 513422 on some changes that I think will get us a bit closer.

Updated

8 years ago
Attachment #396576 - Flags: superreview?(jst)
Attachment #396576 - Flags: superreview+
Attachment #396576 - Flags: review?(jst)
Attachment #396576 - Flags: review+
Comment on attachment 396576 [details] [diff] [review]
final patch

+#if defined(MAC_SSE2) || defined(WIN_SSE2)
+
+static inline void
+Convert_ascii_run (const char *&src,
+                   PRUnichar *&dst,
+                   PRInt32 len)
+{
+  if (__sse2_available && len > 15) {

Reversing the order of those two checks would avoid checking if sse2 is available for short strings. Not a big deal, but would save us a few instructions when converting short strings.

r+sr=jst
http://hg.mozilla.org/mozilla-central/rev/00d2664f70d4

Checked in with the check reversed as suggested.
Status: NEW → RESOLVED
Last Resolved: 8 years ago
Resolution: --- → FIXED
backed out due to maybe-orange, and I gotta sleep so can't watch the second cycle.  will reland tomorrow if it ends up being not this patch's fault
Status: RESOLVED → REOPENED
Resolution: FIXED → ---

Comment 47

8 years ago
re #40: I will look at my part after this is landed safely.
re #41: we can always force the char pointer to be signed char within the function to be sure about the matching. Secondly testing &0x40 is indeed not the same, but in my patch first the high bit is checked and the next bit.

Comment 48

8 years ago
Created bug 514140 for the non-ascii part
Blocks: 514140
(In reply to comment #46)
> backed out due to maybe-orange, and I gotta sleep so can't watch the second
> cycle.  will reland tomorrow if it ends up being not this patch's fault

These were all known intermittent oranges.
Maybe a silly question at this point in the game given that you just relanded this, but what optimizations option are we using on this now?
Whatever the default is, which I think is -Os -- we should probably switch to at least -O2 here, but that can be a separate bug.
Status: REOPENED → RESOLVED
Last Resolved: 8 years ago8 years ago
Resolution: --- → FIXED
Created attachment 398357 [details] [diff] [review]
VC7.1 bustage fix
Attachment #398357 - Flags: review?(vladimir)
Attachment #398357 - Flags: review?(vladimir) → review+
Comment on attachment 398357 [details] [diff] [review]
VC7.1 bustage fix

Pushed changeset 8bb68d6639b7 to mozilla-central.
Depends on: 514624

Comment 54

8 years ago
Note, it seems that nsUnicharInputStream doesn't use this converter, 
but the converter from  ConvertUTF8toUTF16 (xpcom/string/public/nsUTF8Utils.h)
Also the  CountValidUTF8Bytes function is not really needed as one can easily make the safe assumption that destLen = srcLen + 1
(see http://mxr.mozilla.org/mozilla-central/source/intl/uconv/src/nsUTF8ToUnicode.cpp#113).
The easiest way would be to change nsUnicharInputStream to use the nsUTF8ToUnicode converter, instead the one in nsUTF8Utils.h.

ConvertUTF8toUTF16 is also used in:
# xpcom/string/src/nsReadableUtils.cpp
# toolkit/xre/nsWindowsRestart.cpp

So, we should these also redirect use the optimized nsUTF8toUnicode converter, and remove the ConvertUTF8toUTF16 version completely.
That would mean moving nsUTF8toUnicode out of intl and into xpcom.  All the things you list live in xpcom and can't depend on intl.

I would be all for not having multiple UTF8-to-UTF16 conversion codepaths around...
(In reply to comment #55)

> I would be all for not having multiple UTF8-to-UTF16 conversion codepaths
> around...

A hearty "me too"!! Note however that there are some functional differences between the two converters: ConvertUTF8toUTF16 is less fault-tolerant than nsUTF8toUnicode, and can only safely be used when the input is known to be valid UTF-8. 

See also bug 497204

Updated

8 years ago
Blocks: 459117

Updated

8 years ago
Blocks: 515969

Updated

8 years ago
Blocks: 515956
Not blocking, but if a roll-up patch were to be nominated, we should take it.
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
file 548664 for neon support
Blocks: 548664
Depends on: 585538
You need to log in before you can comment on or make changes to this bug.