Closed Bug 506430 — Opened 15 years ago, Closed 15 years ago

Optimize UTF8 to UTF16 conversion

Categories: Core :: XPCOM, defect
Status: RESOLVED FIXED
People: Reporter: vlad, Assigned: vlad
References: Blocks 1 open bug
Whiteboard: [ts]
Attachments: 9 files, 2 obsolete files
1. patch, 1.90 KB
2. text/plain, 897.29 KB
3. patch, 4.99 KB
4. text/plain, 14.50 KB
5. patch, 1.99 KB
6. patch, 9.64 KB
7. patch, 6.60 KB — jst: review+, jst: superreview+
8. patch, 4.90 KB
9. patch, 487 bytes — vlad: review+
ConvertUTF8_to_UTF16 shows up at about 5-8% of warm start time. Our implementation is straightforward C code; I bet we can find a much faster implementation using SSE, NEON, whatever's available.
Assignee — Updated 15 years ago
Whiteboard: [ts]
There is u8u16, but I don't think its license is GPL-compatible.
Assignee — Comment 2 • 15 years ago
Quick and dirty test:

inBufSize: 2155589
Convert_1: 4.78326 ms
Convert_dumb: 2.2116 ms
Convert_dumb2: 2.8301 ms
Convert_u8u16: 1.90388 ms

The input is all ASCII, about 2MB worth.

* Convert_1 is our current routine.
* Convert_dumb just does u8->u16 straight; it's not a true utf8->utf16 conversion.
* Convert_dumb2 walks the input string and looks for any high bits; if no high bits are set, it does a straight u8->u16 conversion, otherwise it calls Convert_1 (so it basically times the overhead of checking for any high bits set).
* Convert_u8u16 uses u8u16.

Convert_dumb could be accelerated even more using SSE; my impl is just a straight C impl. I'd suggest that we take the dumb2 approach in our current code and keep searching for something faster.
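The "dumb2" idea described above can be sketched in plain C. This is a hypothetical illustration, not the actual tree code; the function name and signature are invented for the sketch:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of the dumb2 approach: scan for any byte with the high bit set;
 * if none, the input is pure ASCII and each byte widens directly to one
 * 16-bit code unit. Returns the number of UTF-16 units written, or -1 if
 * non-ASCII input was found (the caller would then fall back to the full
 * UTF-8 decoder, i.e. Convert_1 in the benchmark's terms). */
ptrdiff_t convert_ascii_fast(const char *src, size_t len, uint16_t *dst)
{
    for (size_t i = 0; i < len; i++)
        if ((unsigned char)src[i] & 0x80)
            return -1;              /* non-ASCII: use the real converter */
    for (size_t i = 0; i < len; i++)
        dst[i] = (unsigned char)src[i];
    return (ptrdiff_t)len;
}
```

The scan pass is what dumb2's timings charge as overhead; the copy pass is the same work Convert_dumb does.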
Assignee — Comment 3 • 15 years ago
Er, sorry:

Convert_1: 7.22484 ms
Convert_dumb: 2.18996 ms
Convert_dumb2: 2.84106 ms
Convert_u8u16: 1.8791 ms

The 4.7ms number was with some hacks applied.
Assignee — Comment 4 • 15 years ago
Convert_1: 6.70116 ms
Convert_2: 2.73848 ms
Convert_dumb: 2.12909 ms
Convert_dumb2: 2.73505 ms
Convert_u8u16: 1.77669 ms

Here's an impl of Convert_2. Note that I'm trying to optimize for ascii here, and also that this isn't touching utf8->utf16 that happens with js, and maybe not for xbl/xml. Not sure.

This needs some SSE2 love, and it also should probably do something with the work it did to figure out which initial parts of a string have no non-ascii chars, by doing the fast copy for that segment...
Assignee — Updated 15 years ago
blocking1.9.1: needed → ---
Flags: blocking1.9.2?
Comment 5 • 15 years ago
Maybe mmoy can help?
Comment 6 • 15 years ago
I ran a comparison of char to short using C code and SSE2 intrinsics and came up with 625 for C and 484 for intrinsics. This was with the best case of input and output aligned 16. To get this to work in the real world, there would have to be some special code at the beginning and end to handle the alignment stuff, and this would consume some cycles.

I can code up something to do the high-bit testing too, just to get comparison times.

A few questions:
- Can I assume that in/out are aligned on dwords?
- What is the typical number of characters to be processed?
Assignee — Comment 7 • 15 years ago
(In reply to comment #6)
> I can code up something to do the high-bit testing too just to get comparison
> times.

Great!

> A few questions:
>
> - Can I assume that in/out are aligned on dwords?

I /think/ so. If not, we may be able to change things around so that it's always true.

> - What is the typical number of characters to be processed?

Good question. Someone should do some instrumentation and figure that out... would probably be helpful to know distribution of run lengths, and also for each run how many ascii chars before you hit a char with the high bit set.
Comment 8 • 15 years ago
(In reply to comment #0)
> ConvertUTF8_to_UTF16 shows up at about 5-8% of warm start time. Our
> implementation is straightforward C code; I bet we can find a much faster
> implementation using SSE, NEON, whatever's available.

How much data is being converted? 5% of 900ms is 45ms, which would be like 12MB of data, which seems like a lot of text... Is all of that unique text, or are we converting the same text more than once?
Comment 9 • 15 years ago
> How much data is being converted?
Well, let's see. Quickly instrumenting the sync case in nsXBLService::FetchBindingDocument shows us loading about 630KB of UTF-8 encoded XBL at startup.
Doing the same for nsStreamLoader::OnDataAvailable (which should really only be hit for CSS and maybe scripts) I see us loading about 1.1MB of data that way.
Then there are the actual XUL documents, of course, not to mention whatever internal conversions we might be doing.
Comment 10 • 15 years ago
About 900k worth, nothing obviously duplicated. I don't know if any of the stuff is unnecessary, but there sure is a lot of it.
Updated 15 years ago

Attachment #391128 - Attachment mime type: application/octet-stream → text/plain
Comment 11 • 15 years ago
I instrumented an image with the patch and it came up with 748671 bytes converted. But doing a lot of other things generates conversions too. As far as startup time goes, is precomputing the converted text a possibility?
Assignee — Comment 12 • 15 years ago
Possibly; I looked into doing a quick-and-dirty iconv conversion of the relevant files, but that caused other problems since that switches the default charset of a bunch of documents (with a utf16 bom). There's also additional IO cost. But if someone can think of a way to do it without breaking anything, it'd be good to evaluate that cost.
Assignee — Comment 13 • 15 years ago
However, a lot of this is JS modules and XBL, which we want to fastload...
Comment 14 • 15 years ago
I have an SSE2 implementation of the one to two byte conversion working but it uses MOVDQU to write the output doublequadwords. MOVDQU is an unaligned write so there's a performance penalty there. I did some additional instrumenting and it looks like both source and destination are doubleword aligned which means that I can add code to do aligned writes with a little effort. I have to add some code to turn this on for Mac OS X and Windows too.
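The one-to-two-byte widening described here can be sketched with SSE2 intrinsics. This is a minimal illustration of the technique, not the attached patch: it uses unaligned loads/stores throughout and assumes `len` is a multiple of 16, whereas the real code adds head/tail handling and aligned-store special cases:

```c
#include <emmintrin.h>
#include <stdint.h>
#include <stddef.h>

/* Widen 16 ASCII bytes to 16 UTF-16 code units per iteration. Interleaving
 * the data bytes with zero bytes zero-extends each byte to 16 bits (on a
 * little-endian target the data byte lands in the low half of each u16). */
void widen_ascii_sse2(const unsigned char *src, uint16_t *dst, size_t len)
{
    __m128i zero = _mm_setzero_si128();
    for (size_t i = 0; i < len; i += 16) {
        __m128i bytes = _mm_loadu_si128((const __m128i *)(src + i));
        _mm_storeu_si128((__m128i *)(dst + i),
                         _mm_unpacklo_epi8(bytes, zero));   /* bytes 0..7  */
        _mm_storeu_si128((__m128i *)(dst + i + 8),
                         _mm_unpackhi_epi8(bytes, zero));   /* bytes 8..15 */
    }
}
```

Replacing `_mm_storeu_si128` with `_mm_store_si128` on a 16-byte-aligned destination is the aligned-write win the comment is after.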
Comment 15 • 15 years ago
This patch has been tested on Mac OS X and Windows and it appears to work (returns correct results). There is special code to handle the cases where the offset between the source and destination is 0 or 8 after the source has been double quadword aligned. These two cases are seen about 99% of the time. Other alignments are handled by the movdqu (unaligned move) default code. Could someone run a performance test to determine if this code helps?
Assignee — Comment 16 • 15 years ago
Warm start on my fast win7 box went from 343-344ms down to 332-334ms. Not a huge boost, but noticeable. I also added a block at the start of the high-bit-set checking to align the pointer to 4 bytes, so that we don't blow up on ARM.

However... after just doing a bunch more startups, I started seeing some 342's again. Now I'm starting to doubt my measurement accuracy; I'll try this again on my slower Mac tomorrow.

Also, u8u16 is really crazy fast! I'll plug your code into the hack benchmark that I wrote tomorrow as well to see how it compares.
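The align-then-check-a-word-at-a-time shape mentioned here (align `src` to 4 bytes so whole-word loads are legal on ARM, then test four bytes per load against 0x80808080) might look like this. A hypothetical sketch with invented names, not the patch itself:

```c
#include <stdint.h>
#include <stddef.h>

/* Returns nonzero if any byte in [src, src+len) has its high bit set.
 * Bytes before the first 4-byte boundary are checked individually, so the
 * uint32_t loads in the main loop are always aligned. */
int any_high_bit(const char *src, size_t len)
{
    const unsigned char *p = (const unsigned char *)src;
    const unsigned char *end = p + len;

    /* unaligned prefix, one byte at a time */
    while (p < end && ((uintptr_t)p & 3)) {
        if (*p & 0x80) return 1;
        p++;
    }
    /* aligned body, four bytes per load */
    for (; p + 4 <= end; p += 4)
        if (*(const uint32_t *)p & 0x80808080u) return 1;
    /* tail */
    for (; p < end; p++)
        if (*p & 0x80) return 1;
    return 0;
}
```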
Assignee — Comment 17 • 15 years ago
On my mac -- goes from 825-843 to 780-790. I'll take 5-6%!
You could also probably use _mm_movemask_pi8 instead of & 0x80808080 to pick up another couple percent.
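A sketch of that kind of movemask-based high-bit test, using the SSE2 variant `_mm_movemask_epi8` (which collects the top bit of each of 16 bytes into an integer). Names here are illustrative, not from the patch:

```c
#include <emmintrin.h>
#include <stddef.h>

/* Returns nonzero if any byte has its high bit set. 16 bytes are tested
 * per iteration; a nonzero movemask means the chunk contains a non-ASCII
 * byte. Bytes past the last full 16-byte chunk are checked one at a time. */
int has_high_bit(const unsigned char *src, size_t len)
{
    size_t i = 0;
    for (; i + 16 <= len; i += 16) {
        __m128i chunk = _mm_loadu_si128((const __m128i *)(src + i));
        if (_mm_movemask_epi8(chunk) != 0)
            return 1;
    }
    for (; i < len; i++)
        if (src[i] & 0x80)
            return 1;
    return 0;
}
```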
Assignee — Comment 19 • 15 years ago
Just added the sse code in to the benchmark -- it gets us in the right ballpark.

g++ -O2:
orig: 8.06701 ms
chk: 2.87505 ms
dumb_ascii: 2.26799 ms
sse: 2.86604 ms
sse_ascii: 2.23546 ms
u8u16: 1.8919 ms

"chk" is basically the same as the sse case, where it checks for high byte, but then instead of using sse it just does a simple byte-to-short copy loop. The two _ascii variants just do straight copies and don't do the high bit check. Kinda unfortunate that the sse variant is basically the same speed. But, it doesn't hurt.

So, what I'm trying to say: let's get this patch in (but with the additional alignment fix). I've also attached the benchmark; apologies for horrible code.

Some more bits:

g++ -O2 -mtune=prescott (gcc 4.0.1):
orig: 6.72993 ms
chk: 2.87276 ms
dumb_ascii: 2.23084 ms
sse: 2.90849 ms
sse_ascii: 2.23746 ms
u8u16: 1.8933 ms

similar for g++-4.2 with -mtune=core2.

g++ -O3 -mtune=prescott (gcc 4.0.1):
orig: 7.47574 ms
chk: 2.87053 ms
dumb_ascii: 2.23784 ms
sse: 2.90917 ms
sse_ascii: 2.24879 ms
u8u16: 1.91424 ms

(yes, slower than -O2, even worse slowdown with g++-4.2)

g++ -Os -mtune=prescott:
orig: 9.78382 ms
chk: 2.86463 ms
dumb_ascii: 2.25211 ms
sse: 3.0451 ms
sse_ascii: 2.33603 ms
u8u16: 2.01193 ms
Assignee — Comment 20 • 15 years ago
Just the 4-byte alignment at the start in this patch. Let's get this in, and someone can write the followup bug for linux (really, the only thing that's missing is having sse2_available equivalent defined somewhere on linux).
Attachment #395743 - Flags: superreview?
Attachment #395743 - Flags: review?(mmoy)
Assignee — Updated 15 years ago

Attachment #395743 - Attachment is patch: true
Attachment #395743 - Attachment mime type: application/octet-stream → text/plain
Attachment #395743 - Flags: superreview? → superreview?(jst)
Assignee — Comment 21 • 15 years ago
And while I'm here, here's the benchmark for real utf8 data (longer than the earlier set, so they're not directly comparable... this is 3145728 bytes, original was 2155589):

g++ -O2:
orig: 15.4942 ms
chk: 15.4288 ms
dumb_ascii: 3.77464 ms
sse: 15.4654 ms
sse_ascii: 3.78217 ms
u8u16: 7.89827 ms

g++ -O2 -mtune=prescott:
orig: 14.4634 ms
chk: 14.2659 ms
dumb_ascii: 3.78667 ms
sse: 14.2653 ms
sse_ascii: 3.77649 ms
u8u16: 7.92226 ms
Replacing "& 0x80808080" with "_mm_movemask_epi8(*(__m128i *) src)" gives me the following results on the benchmark (I used my own pure ASCII text, so obviously the results aren't directly comparable, and I commented out u8u16 since it doesn't seem to want to build on x64).

Before:
kshuey@linux-5wgr:~/Downloads> ./t.o
inBufSize: 512262
orig: 5.40125 ms
chk: 2.95589 ms
dumb_ascii: 2.64377 ms
sse: 1.31553 ms
sse_ascii: 0.82156 ms

After:
kshuey@linux-5wgr:~/Downloads> ./t.o
inBufSize: 512262
orig: 5.54781 ms
chk: 2.9853 ms
dumb_ascii: 2.5621 ms
sse: 0.994294 ms
sse_ascii: 0.81584 ms

That works out to a relative savings of 25%. Given your number of 5-6% of startup time overall, that shaves off a little over 1% more.
A few comments:

A code comment that said something like "Use the real converter if there are any bytes with the high bit set, otherwise just expand ASCII to UTF-16" might have saved me a few minutes of head-scratching.

The "return NS_OK" at the end doesn't seem like it's equivalent to what the other code does; don't you need to return NS_OK_UDEC_MOREOUTPUT in some cases?

Finally, the UTF8-UTF16 conversion code in xpcom/string has been optimized a good bit more than this code (although it's also a little less tolerant of errors, since it's designed for "sanitized" data... although I think that's actually mostly been fixed recently). I suspect ConvertReal could be made a good bit faster; one thing I might try would be loading the member variables into locals at the start of the function and saving them back into the member variables at the end.
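The member-variables-into-locals suggestion can be illustrated with a toy class. This is purely a hypothetical sketch of the transformation, with placeholder loop work standing in for the real UTF-8 decoding; the member names only imitate the converter's style:

```cpp
#include <cstddef>

// Members updated inside a hot loop can force memory stores on every
// iteration, since the compiler often can't prove the output buffer
// doesn't alias *this. Hoisting them into locals lets them live in
// registers; they're written back once at the end.
struct Converter {
    int mState = 0;
    unsigned mUcs4 = 0;

    void Convert(const unsigned char *src, std::size_t len) {
        int state = mState;           // hoist members into locals...
        unsigned ucs4 = mUcs4;
        for (std::size_t i = 0; i < len; i++) {
            // ...loop body uses only the locals (placeholder work here,
            // standing in for the actual decode steps)...
            state ^= src[i];
            ucs4 += src[i];
        }
        mState = state;               // ...and write them back once
        mUcs4 = ucs4;
    }
};
```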
Comment 24 • 15 years ago
Shouldn't the |#endif ! | be |#endif // | instead? Why are you using two different *_SSE2 macros when you are really only using them together? Are you planning to address Linux (or in general, other GCC platforms) in a similar way (in another bug)?

I hope I adapted your benchmark correctly for my Linux x86_64 system (taking the MAC_SSE2 route without u8u16); I get these numbers for comparison:

inBufSize: 918825
orig: 3.12586 ms
chk: 2.68073 ms
dumb_ascii: 0.815924 ms
sse: 2.68309 ms
sse_ascii: 0.446675 ms
Comment 25 • 15 years ago
I think that your patch should not define MAC_SSE2. gcc defines __SSE2__ for Mac OS X and x86_64, so you should use __SSE2__ instead of your Mac-only define.
So I did some testing of vlad's benchmark from attachment 395742 [details] using attachment 391128 [details] as lipsum.txt. My understanding is that the changes in this patch are supposed to move us from "orig" to "sse".

On my laptop (Linux, x86-64, Thinkpad), I see:
orig: 2.97091 ms
chk: 2.73285 ms
dumb_ascii: 0.91355 ms
sse: 2.74453 ms
sse_ascii: 0.83265 ms

On mrbkap's laptop (x86 Mac), I actually see chk and sse being slightly slower than orig (I guess the cost of the extra pass varies).

Now, I wrote a patch to the original code to:
* not use member variables during the conversion
* add a nested inner loop

and on my laptop I now get:
orig: 1.34414 ms

(I think these numbers only really have two significant figures, though.)

So I think maybe it would be better to optimize the original code and avoid the two-pass approach.
Except I realize I'm supposed to be testing with pure ASCII data, since vlad's test doesn't do any chunking. With that, on mrbkap's laptop (with SSE), I see:

original:
inBufSize: 2300000
orig: 8.06885 ms
chk: 2.68913 ms
dumb_ascii: 2.11991 ms
sse: 2.10505 ms
sse_ascii: 1.70828 ms

with my patch:
inBufSize: 2300000
orig: 2.92254 ms
chk: 2.52946 ms
dumb_ascii: 2.0997 ms
sse: 1.9211 ms
sse_ascii: 1.64253 ms
At the risk of further cluttering the bug with benchmarks and attachments... I've been working on further optimizing the sse variation by eliminating the dual loop structure (i.e. combining the scanning loop and the processing loop). What I came up with is "sseoneloop". It scans and converts at the same time as long as it finds only ASCII. Once it finds non-ASCII it calls ConvertReal. Right now it calls ConvertReal to do the whole thing, so theoretically it could be worse than orig (if you had a bunch of ASCII text with one multibyte character at the very end, runtime would be approximately sseoneloop + orig), but it's fairly simple to just call ConvertReal on what remains (given that you already have a nice character boundary). I'll clean up the code and attach it in patch form in a bit.

As for numbers: orig, chk, dumb_ascii, and sse_ascii are all what they are in vlad's benchmark. dbaron is dbaron's code. sseoneloop is as above.

Benchmark results, x86-64 linux:
inBufSize: 3073578
orig: 30.6071 ms
chk: 18.392 ms
dumb_ascii: 14.9552 ms
dbaron: 20.4106 ms
sse: 8.57576 ms
sseoneloop: 4.90909 ms
sse_ascii: 5.37327 ms

I find it interesting that all of your numbers seem to come out with sse about equal to chk, when every time I run it it runs almost twice as fast as chk. Also, I don't believe that there are even two significant figures in these results. From what I've seen, the results fluctuate in a roughly one-millisecond interval.
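The single-loop scan-and-convert idea reduces to the following scalar shape (a hypothetical sketch; the real "sseoneloop" does this 16 bytes at a time with SSE, and the names here are invented):

```c
#include <stdint.h>
#include <stddef.h>

/* Convert and scan in one pass: copy ASCII bytes straight to UTF-16 until
 * a byte with the high bit set appears. Returns the number of bytes
 * consumed; the caller hands the remainder — which starts on a character
 * boundary — to the full converter (ConvertReal in the patch's terms). */
size_t convert_ascii_prefix(const char *src, size_t len, uint16_t *dst)
{
    size_t i = 0;
    while (i < len && !((unsigned char)src[i] & 0x80)) {
        dst[i] = (unsigned char)src[i];
        i++;
    }
    return i;   /* i == len means the whole input was ASCII */
}
```

Compared with the two-pass "dumb2" shape, the input is touched only once in the all-ASCII case, which is where the benchmark gains come from.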
This patch adds dbaron's code to the benchmark and adds sseoneloop as described in my previous comment.
If you're on a 32 bit system you'll probably want to add u8u16 back in after you apply that patch, as I forgot to unremove it before diffing.
And with -O2:

inBufSize: 3073578
orig: 14.5715 ms
chk: 6.72878 ms
dumb_ascii: 5.47907 ms
dbaron: 6.89872 ms
sse: 5.41505 ms
sseoneloop: 0.82828 ms
sse_ascii: 4.41576 ms

And -O3:

inBufSize: 3073578
orig: 14.3186 ms
chk: 6.87903 ms
dumb_ascii: 5.33305 ms
dbaron: 6.39502 ms
sse: 4.88152 ms
sseoneloop: 0.706781 ms
sse_ascii: 3.99238 ms
Ignore the last optimized benchmarks, I was missing a piece of the routine (it was just throwing away its output and the compiler knew that). Corrected benchmarks:

-O0:
inBufSize: 3073578
orig: 30.5634 ms
chk: 18.0327 ms
dumb_ascii: 14.9367 ms
dbaron: 20.4797 ms
sse: 8.27054 ms
sseoneloop: 5.60232 ms
sse_ascii: 5.32999 ms

-O2:
inBufSize: 3073578
orig: 14.182 ms
chk: 5.98159 ms
dumb_ascii: 5.13692 ms
dbaron: 6.23665 ms
sse: 4.77082 ms
sseoneloop: 3.89594 ms
sse_ascii: 3.86942 ms

-O3:
inBufSize: 3073578
orig: 14.1013 ms
chk: 6.66146 ms
dumb_ascii: 4.79984 ms
dbaron: 6.18853 ms
sse: 4.75576 ms
sseoneloop: 3.81217 ms
sse_ascii: 3.801 ms

Sorry for the incorrect numbers earlier. I should have realized it's absurd for it to perform faster than the version that doesn't even check to see if there are multibyte characters. The overhead of checking for those characters is incredibly small though, only 10 microseconds at -O3 according to this.
Attachment #396042 - Attachment is obsolete: true
Assignee — Comment 34 • 15 years ago
Kyle, thanks for your help! It looks like a combination of dbaron's work (for when we have multibyte chars) and yours (for optimizing the common case of where we don't) is what we want. I'll cook up a patch for the actual code later on today.
Assignee — Updated 15 years ago

Attachment #395743 - Attachment is obsolete: true
Attachment #395743 - Flags: superreview?(jst)
Attachment #395743 - Flags: review?(mmoy)
(In reply to comment #34)
> Kyle, thanks for your help! It looks like a combination of dbaron's work (for
> when we have multibyte chars) and yours (for optimizing the common case of
> where we don't) is what we want. I'll cook up a patch for the actual code
> later on today.

No problem. Sounds great. If you don't get around to it and want me to turn it into a patch, just let me know. I probably couldn't get around to it until tomorrow, but would be happy to do it if wanted.
Assignee — Comment 36 • 15 years ago
Ok, here is what I think is a good final patch. It includes dbaron's optimization as the core piece, and then two customized ascii run converters, one for SSE2 and one for ARM.

Here's x86, with the SSE code...

ascii:
orig: 10.4181 ms (ok)
dbaron: 4.79316 ms (ok)
final: 2.34829 ms (ok)

utf8 content (wikipedia utf8 page, so short html runs and long utf8 runs):
orig: 15.1058 ms (ok)
dbaron: 12.4791 ms (ok)
final: 13.4203 ms (ok)

so a little slower than the simple loop, due to a few extra bits of code for short runs.

ARM:

ascii:
orig: 164.045 ms (ok) 006d 006f 0064 006f
final: 80.1377 ms (ok) 006d 006f 0064 006f

utf8:
orig: 177.403 ms (ok) 006f 006e 0069 0063
dbaron2: 117.875 ms (ok) 006f 006e 0069 0063

(3MB in all cases)
Attachment #396576 - Flags: superreview?(jst)
Attachment #396576 - Flags: review?(jst)
Assignee — Updated 15 years ago
Assignee: nobody → vladimir
OS: Mac OS X → All
Hardware: x86 → All
Comment 37 • 15 years ago
Some nits: Instead of if ((*src & 0x80) == 0), one can also do if (*src >= 0). Also, instead of if ((*in & 0xE0) == 0xC0), testing bits with if (*in & 0x40) is also faster. The first is sometimes (but not always!) done by a smart optimizing compiler, but the second is not.

Attached patch reduces the loop part for non-ascii from 88 ASM lines to 71 lines, for example from:

; Line 152
	mov	al, cl
	and	al, 224		; 000000e0H
	cmp	al, 192		; 000000c0H
	jne	SHORT $LN25@Convert
; Line 155
	and	ecx, 31		; 0000001fH
	shl	ecx, 6
	mov	DWORD PTR [edx+12], ecx
; Line 156
	mov	BYTE PTR [edx+16], 1
; Line 157
	mov	BYTE PTR [edx+17], 2
	jmp	$LN30@Convert

to:

; Line 161
	test	cl, 32		; 00000020H
	jne	SHORT $LN25@Convert
; Line 164
	and	ecx, 31		; 0000001fH
	shl	ecx, 6
; Line 165
	mov	BYTE PTR [edx+17], 2
	jmp	SHORT $LN45@Convert
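The ordered single-bit tests work because each test only needs to distinguish the remaining candidates once the earlier bits are known. A sketch of lead-byte classification in that style (an illustration of the idea, not code from the patch; it assumes well-formed input, so 0xF8-0xFF bytes are not rejected):

```c
/* Classify a UTF-8 byte with single-bit tests checked in order, instead
 * of masked comparisons like (b & 0xE0) == 0xC0. Returns the sequence
 * length a lead byte announces, or 0 for a continuation byte. */
int utf8_sequence_length(unsigned char b)
{
    if (!(b & 0x80)) return 1;      /* 0xxxxxxx: ASCII */
    if (!(b & 0x40)) return 0;      /* 10xxxxxx: continuation byte */
    if (!(b & 0x20)) return 2;      /* 110xxxxx: 2-byte lead */
    if (!(b & 0x10)) return 3;      /* 1110xxxx: 3-byte lead */
    return 4;                       /* 11110xxx: 4-byte lead */
}
```

Testing &0x20 alone is only safe here because the &0x80 and &0x40 branches have already run, which is exactly the point made in comment 42 below.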
Vlad, shouldn't it be "include <emmintrin.h>"? Also, are we handling support for this on linux at a later date? (presumably since there's no easy way to get at __sse2_available) Other than that looks good for me. I built firefox with it on windows 7 and I think it loads noticeably faster though I haven't done any quantitative measurements.
Comment 39 • 15 years ago
We have a few functions already in the tree that determine SSE2 functionality on Linux, e.g. CheckForSSE2() in jstracer.cpp and sse2_available() in qcms, so Linux on x86 shouldn't be a problem. As I hinted in comment 24, at least the Linux x86_64 case could very well be covered by MAC_SSE2.
Assignee — Comment 40 • 15 years ago
Yeah, the issue with linux isn't getting it to work, it's figuring out where to put the detection code. I could just copy sse2_available somewhere, but was hoping to end up with a libxul-global spot... but maybe that's not feasible.

I missed comment 24 -- I'll change the ifdefs to __SSE2__. The further optimizations in comment 37 we should also take, though merging the patches is going to become tricky; Alfred, can you remerge after I land the ascii bit?
(In reply to comment #37)
> Instead of if ((*src & 0x80) == 0), one can also do if (*src >= 0).

You can't do this; you're assuming that |char| == |signed char|, but |char| can be either |signed char| or |unsigned char|. (And I thought it was usually |unsigned char|.)

> Also instead of if ((*in & 0xE0) == 0xC0), testing bits with if (*in & 0x40) is
> also faster.

But not equivalent, since you're not checking that the bit below is unset (i.e., you want to require (*in & 0x40) && !(*in & 0x20)), which probably leads to security bugs.
(In reply to comment #41)
> But not equivalent, since you're not checking that the bit below is unset
> (i.e., you want to require (*in & 0x40) && !(*in & 0x20)), which probably leads
> to security bugs.

Er, actually, given the order you check it, it is ok, so never mind.
(In reply to comment #38)
> Also, are we handling support for this on linux at a later date? (presumably
> since there's no easy way to get at __sse2_available)

It actually looks like there are a few tricky things about this, but I filed bug 513422 on some changes that I think will get us a bit closer.
Updated 15 years ago

Attachment #396576 - Flags: superreview?(jst) → superreview+
Attachment #396576 - Flags: review?(jst) → review+
Comment 44 • 15 years ago
Comment on attachment 396576 [details] [diff] [review]
final patch

+#if defined(MAC_SSE2) || defined(WIN_SSE2)
+
+static inline void
+Convert_ascii_run (const char *&src,
+                   PRUnichar *&dst,
+                   PRInt32 len)
+{
+  if (__sse2_available && len > 15) {

Reversing the order of those two checks would avoid checking if sse2 is available for short strings. Not a big deal, but would save us a few instructions when converting short strings.

r+sr=jst
Assignee — Comment 45 • 15 years ago
http://hg.mozilla.org/mozilla-central/rev/00d2664f70d4

Checked in with the check reversed as suggested.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Assignee — Comment 46 • 15 years ago
backed out due to maybe-orange, and I gotta sleep so can't watch the second cycle. will reland tomorrow if it ends up being not this patch's fault
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 47 • 15 years ago
Re comment #40: I will look at my part after this is landed safely.

Re comment #41: we can always force the char pointer to signed char within the function to be sure about the matching. Secondly, testing &0x40 is indeed not the same, but in my patch the high bit is checked first, and then the next bit.
Comment 49 • 15 years ago
(In reply to comment #46)
> backed out due to maybe-orange, and I gotta sleep so can't watch the second
> cycle. will reland tomorrow if it ends up being not this patch's fault

These were all known intermittent oranges.
Maybe a silly question at this point in the game given that you just relanded this, but what optimizations option are we using on this now?
Assignee — Comment 51 • 15 years ago
Whatever the default is, which I think is -Os -- we should probably switch to at least -O2 here, but that can be a separate bug.
Status: REOPENED → RESOLVED
Closed: 15 years ago → 15 years ago
Resolution: --- → FIXED
Comment 52 • 15 years ago

Attachment #398357 - Flags: review?(vladimir)
Assignee — Updated 15 years ago

Attachment #398357 - Flags: review?(vladimir) → review+
Comment 53 • 15 years ago
Comment on attachment 398357 [details] [diff] [review]
VC7.1 bustage fix

Pushed changeset 8bb68d6639b7 to mozilla-central.
Depends on: 514624
Comment 54 • 15 years ago
Note: it seems that nsUnicharInputStream doesn't use this converter, but rather ConvertUTF8toUTF16 (xpcom/string/public/nsUTF8Utils.h).

Also, the CountValidUTF8Bytes function is not really needed, as one can easily make the safe assumption that destLen = srcLen + 1 (see http://mxr.mozilla.org/mozilla-central/source/intl/uconv/src/nsUTF8ToUnicode.cpp#113).

The easiest way would be to change nsUnicharInputStream to use the nsUTF8ToUnicode converter instead of the one in nsUTF8Utils.h. ConvertUTF8toUTF16 is also used in:
# xpcom/string/src/nsReadableUtils.cpp
# toolkit/xre/nsWindowsRestart.cpp

So we should redirect these to use the optimized nsUTF8ToUnicode converter as well, and remove the ConvertUTF8toUTF16 version completely.
Comment 55 • 15 years ago
That would mean moving nsUTF8toUnicode out of intl and into xpcom. All the things you list live in xpcom and can't depend on intl. I would be all for not having multiple UTF8-to-UTF16 conversion codepaths around...
Comment 56 • 15 years ago
(In reply to comment #55)
> I would be all for not having multiple UTF8-to-UTF16 conversion codepaths
> around...

A hearty "me too"!!

Note however that there are some functional differences between the two converters: ConvertUTF8toUTF16 is less fault-tolerant than nsUTF8toUnicode, and can only safely be used when the input is known to be valid UTF-8. See also bug 497204.
Comment 58 • 15 years ago
Not blocking, but if a roll-up patch were to be nominated, we should take it.
Flags: wanted1.9.2+
Flags: blocking1.9.2?
Flags: blocking1.9.2-
Comment 59 • 14 years ago
Filed bug 548664 for NEON support.
Updated 3 years ago
Component: String → XPCOM