Closed
Bug 928351
Opened 11 years ago
Closed 11 years ago
Mixing char16_t and wchar_t broke mingw builds
Categories
(Core :: XPCOM, defect)
Tracking
()
RESOLVED
FIXED
mozilla28
People
(Reporter: jacek, Assigned: jacek)
References
(Blocks 1 open bug)
Details
(Whiteboard: [qa-])
Attachments
(2 files, 6 obsolete files)
129.54 KB,
patch
|
Details | Diff | Splinter Review | |
28.78 KB,
patch
|
Details | Diff | Splinter Review |
From bug 895047:
G++ follows standards so it has char16_t, which is different from wchar_t. It would require having tons of casts between win32 API, which uses wchar_t extensivelly, and the rest of code using char16_t.
Looking at the solution for MSVC from the patch:
# define char16_t wchar_t
I'm pretty sure that's not a good idea to maintain. Trying the same with g++ results in having tons of redefinitions for types specialized both for wchar_t and char16_t. This will also cause similar problems on MSVC once (if) it will have proper char16_t support.
Assignee | ||
Comment 2•11 years ago
|
||
Compiling with the same define that is used on MSVC results in errors like:
In file included from /usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/stl_algobase.h:61:0,
from /usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/char_traits.h:39,
from /usr/local/i686-w64-mingw32/include/c++/4.8.2/string:40,
from /home/jacek/mozilla/mozilla-central/mfbt/decimal/Decimal.h:44,
from /home/jacek/mozilla/mozilla-central/mfbt/decimal/Decimal.cpp:31:
/usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/cpp_type_traits.h:184:12: error: redefinition of ‘struct std::__is_integer<wchar_t>’
struct __is_integer<char16_t>
^
/usr/local/i686-w64-mingw32/include/c++/4.8.2/bits/cpp_type_traits.h:175:12: error: previous definition of ‘struct std::__is_integer<wchar_t>’
struct __is_integer<wchar_t>
^
Relevant parts of cpp_type_traits.h are:
# ifdef _GLIBCXX_USE_WCHAR_T
template<>
struct __is_integer<wchar_t>
{
enum { __value = 1 };
typedef __true_type __type;
};
# endif
#if __cplusplus >= 201103L
template<>
struct __is_integer<char16_t>
{
enum { __value = 1 };
typedef __true_type __type;
};
Note that with char16_t defined to wchar_t, this is indeed a duplication.
Component: String → General
Comment 3•11 years ago
|
||
(Please let's keep this in Core::String.)
Can't we just #include <bits/c++config.h> and then #undef _GLIBCXX_USE_WCHAR_T?
Component: General → String
Assignee | ||
Comment 4•11 years ago
|
||
I tried that and got similar error about atomic<> redeclaration:
/// Explicit specialization for wchar_t.
template<>
struct atomic<wchar_t> : public atomic_wchar_t
{
typedef wchar_t __integral_type;
typedef atomic_wchar_t __base_type;
atomic() noexcept = default;
~atomic() noexcept = default;
atomic(const atomic&) = delete;
atomic& operator=(const atomic&) = delete;
atomic& operator=(const atomic&) volatile = delete;
constexpr atomic(__integral_type __i) noexcept : __base_type(__i) { }
using __base_type::operator __integral_type;
using __base_type::operator=;
};
/// Explicit specialization for char16_t.
template<>
struct atomic<char16_t> : public atomic_char16_t
{
typedef char16_t __integral_type;
typedef atomic_char16_t __base_type;
atomic() noexcept = default;
~atomic() noexcept = default;
atomic(const atomic&) = delete;
atomic& operator=(const atomic&) = delete;
atomic& operator=(const atomic&) volatile = delete;
constexpr atomic(__integral_type __i) noexcept : __base_type(__i) { }
using __base_type::operator __integral_type;
using __base_type::operator=;
};
Comment 5•11 years ago
|
||
(In reply to comment #4)
> I tried that and got similar error about atomic<> redeclaration:
Shoot... Is that guarded by a macro? Who #includes <atomic>?
Assignee | ||
Comment 6•11 years ago
|
||
Before I could reproduce the atomic error today, I hit hash<> redeclaration similar to previous ones, which is also not guarded by a macro. It's very early in the build process, so many more problems will probably show later. I do not think that trying to hack around all those is possible without changes to GCC that would disable char16_t type or make it an alias to char16_t (for which a patch could be rejected by GCC anyway, because that's not what spec says).
How about simply using a new type like char16 or Char16_t in Mozilla? I understand that using standard types is nice, but, as you can see, the standard type does not meet requirements. Those tricks for MSVC while may work, change char16_t to something it's not. This way, char16_t is not really char16_t (not in Windows builds) and calling it char16_t is missleading. Could we be explicit about this being a new type?
Comment 7•11 years ago
|
||
I'm not sure I understand. We do the #define for MSVC because it doesn't yet have a standard char16_t. But since mingw does already have that, why wouldn't we use it?
Assignee | ||
Comment 8•11 years ago
|
||
Windows API uses wchar_t in many places and we'd need casts for all of them (it's mostly for pointers, you can't pass char16_t* as wchar_t* argument).
Comment 9•11 years ago
|
||
We discussed this on IRC today at length. I suggested trying to redefine WCHAR to char16_t in mingw builds, and Jacek said he'll try doing that.
Assignee | ||
Comment 10•11 years ago
|
||
I tried that and it's not enough. WCHAR is used in platform headers, but crt ones use wchar_t, so there are new conflicts (because wchar_t != WCHAR). I keep trying to work around that.
Assignee | ||
Comment 11•11 years ago
|
||
After more investigation, I decided that trying to make wchar_t=char16_t is not possible and it's better to try make the code right. Changing ns*String classes to accept both wchar_t and char16_t helps a lot. This, combined with a helper char16ptr_t class that I introduced to have a type that can be implicitly casted to both const char16_t* and const wchar_t* (on other build configs that's just |typedef char16_t char16ptr_t|), makes required changes significally smaller. Still, there are tons of places that need simple changes and casts. I have most of the tree compiling already, but it's still quite a lot of work to do.
Comment 12•11 years ago
|
||
(In reply to comment #11)
> After more investigation, I decided that trying to make wchar_t=char16_t is not
> possible and it's better to try make the code right. Changing ns*String classes
> to accept both wchar_t and char16_t helps a lot. This, combined with a helper
> char16ptr_t class that I introduced to have a type that can be implicitly
> casted to both const char16_t* and const wchar_t* (on other build configs
> that's just |typedef char16_t char16ptr_t|), makes required changes
> significally smaller. Still, there are tons of places that need simple changes
> and casts. I have most of the tree compiling already, but it's still quite a
> lot of work to do.
Can you please upload your WIP patch so that we can give you feedback on it?
Thanks!
Comment 13•11 years ago
|
||
For bug 514173 I'd rather not have to deal with anything other than typeof(MOZ_UTF16("")) == nsAString::char_type[1] etc.
Assignee | ||
Comment 14•11 years ago
|
||
Sure, I'm attaching the parts that do tricks I mentioned earlier. There are still tons of casts needed and this could probably be improved. char16ptr_t may be an unfortunate name... This type should act as close to |const char26_t*| as possible, but accept coth wchar_t and char16_t. Also, it could be useful to have another such type for non-const pointers. There are still some constructs where it can't be a transparent replacement for |const char16_t|... I have more changes in my tree, including tons of type changes, but this is the most important part to limit number of required casts. Feedback is appreciated.
Comment 15•11 years ago
|
||
Comment on attachment 822322 [details] [diff] [review]
WIP
Review of attachment 822322 [details] [diff] [review]:
-----------------------------------------------------------------
Your patch doesn't show where else char16ptr_t is used, but why can't we just modify nsTString.h and nsTSubstring.h to accept wchar_t as well as char16_t on mingw?
::: mfbt/Char16.h
@@ +68,5 @@
>
> +#ifdef MOZ_USE_CHAR16_WRAPPER
> + class char16ptr_t {
> + private:
> + const char16_t *mPtr;
Nit: we don't use this kind mMember naming convention most of the times.
@@ +69,5 @@
> +#ifdef MOZ_USE_CHAR16_WRAPPER
> + class char16ptr_t {
> + private:
> + const char16_t *mPtr;
> + typedef decltype(sizeof(int)) size_type;
Why this and not size_t?
@@ +74,5 @@
> + public:
> + char16ptr_t(const char16_t *ptr) : mPtr(ptr) {}
> + char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> + constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}
Can't you just make the ptr argument to the first ctor default to nullptr?
@@ +76,5 @@
> + char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> + constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}
> +
> + operator bool() const {
This is bad, see <http://www.artima.com/cppsource/safebool.html>.
@@ +82,5 @@
> + }
> + operator const char16_t *() const {
> + return mPtr;
> + }
> + operator const void *() const {
This is not needed.
@@ +88,5 @@
> + }
> + operator const wchar_t *() const {
> + return reinterpret_cast<const wchar_t*>(mPtr);
> + }
> + explicit operator char16_t*() const {
Why are the non-const pointer conversion operators needed?
@@ +94,5 @@
> + }
> + explicit operator wchar_t*() const {
> + return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
> + }
> + explicit operator void*() const {
This shouldn't be needed.
@@ +97,5 @@
> + }
> + explicit operator void*() const {
> + return const_cast<char16_t*>(mPtr);
> + }
> + bool operator==(const char16ptr_t &x) const {
I'm not sure why you're adding all of these other operators.
@@ +120,5 @@
> + return mPtr[i];
> + }
> + };
> +
> +#else
We should try to not put anything into this branch of the #if.
Comment 16•11 years ago
|
||
Comment on attachment 822322 [details] [diff] [review]
WIP
Review of attachment 822322 [details] [diff] [review]:
-----------------------------------------------------------------
::: mfbt/Char16.h
@@ +88,5 @@
> + }
> + operator const wchar_t *() const {
> + return reinterpret_cast<const wchar_t*>(mPtr);
> + }
> + explicit operator char16_t*() const {
If I'm not mistaken, MSVC doesn't support explicit operators.
@@ +112,5 @@
> + }
> + char16ptr_t operator+(size_type add) const {
> + return char16ptr_t(mPtr+add);
> + }
> + long operator-(const char16ptr_t &other) const {
That is ptrdiff_t, not long.
::: xpcom/string/public/nsTString.h
@@ +72,5 @@
> self_type& operator=( char_type c ) { Assign(c); return *this; }
> self_type& operator=( const char_type* data ) { Assign(data); return *this; }
> self_type& operator=( const self_type& str ) { Assign(str); return *this; }
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> + self_type& operator=( char16ptr_t data ) { Assign(static_cast<char16_t>(data)); return *this; }
Should it not be static_cast<char16_t*>?
I also feel like the declaration should be operator=( const char16ptr_t data );
Comment 17•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #15)
> @@ +69,5 @@
> > +#ifdef MOZ_USE_CHAR16_WRAPPER
> > + class char16ptr_t {
> > + private:
> > + const char16_t *mPtr;
> > + typedef decltype(sizeof(int)) size_type;
>
> Why this and not size_t?
I imagine stddef.h is not included when Char16.h is...
Comment 18•11 years ago
|
||
(In reply to Joshua Cranmer [:jcranmer] from comment #16)
> ::: mfbt/Char16.h
> @@ +88,5 @@
> > + }
> > + operator const wchar_t *() const {
> > + return reinterpret_cast<const wchar_t*>(mPtr);
> > + }
> > + explicit operator char16_t*() const {
>
> If I'm not mistaken, MSVC doesn't support explicit operators.
This code is only compiled on MinGW.
Updated•11 years ago
|
OS: All → Windows XP
Assignee | ||
Comment 19•11 years ago
|
||
Thanks for your comments. I'm traveling until tomorrow, I will be back to this later this week.
Assignee | ||
Comment 20•11 years ago
|
||
Addresses most feedback and more changes.
Assignee: nobody → jacek
Attachment #822322 -
Attachment is obsolete: true
Assignee | ||
Comment 21•11 years ago
|
||
Contains different changes across the tree (also a few unrelated mingw fixes). With this patch I'm able to build usable Firefox.
Assignee | ||
Comment 22•11 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo? me!) [Away 10/29-11/6] from comment #15)
> Comment on attachment 822322 [details] [diff] [review]
> WIP
Thanks for the review.
> Review of attachment 822322 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> Your patch doesn't show where else char16ptr_t is used, but why can't we
> just modify nsTString.h and nsTSubstring.h to accept wchar_t as well as
> char16_t on mingw?
The main motivation was to make get() to return something that may used both by code expecting char16_t and wchar_t. This is not possible without a new type. This is also useful in a few other places.
> ::: mfbt/Char16.h
> @@ +68,5 @@
> >
> > +#ifdef MOZ_USE_CHAR16_WRAPPER
> > + class char16ptr_t {
> > + private:
> > + const char16_t *mPtr;
>
> Nit: we don't use this kind mMember naming convention most of the times.
OK, fixed.
> @@ +69,5 @@
> > +#ifdef MOZ_USE_CHAR16_WRAPPER
> > + class char16ptr_t {
> > + private:
> > + const char16_t *mPtr;
> > + typedef decltype(sizeof(int)) size_type;
>
> Why this and not size_t?
For that we'd need to include stddef.h here.
> @@ +74,5 @@
> > + public:
> > + char16ptr_t(const char16_t *ptr) : mPtr(ptr) {}
> > + char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> > +
> > + constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}
>
> Can't you just make the ptr argument to the first ctor default to nullptr?
Without this constructor, using nullptr is ambiguous as it could be casted both to const wchar_t* and const char16_t*.
> @@ +76,5 @@
> > + char16ptr_t(const wchar_t *ptr) : mPtr(reinterpret_cast<const char16_t*>(ptr)) {}
> > +
> > + constexpr char16ptr_t(decltype(nullptr)) : mPtr(nullptr) {}
> > +
> > + operator bool() const {
>
> This is bad, see <http://www.artima.com/cppsource/safebool.html>.
This class is not meant to be safe of anything like that, this is meant to be as similar to |typedef const char16_t *char16ptr_t| as possible so that there are as few mingw-specific changes as possible. This operator is part of that.
> @@ +82,5 @@
> > + }
> > + operator const char16_t *() const {
> > + return mPtr;
> > + }
> > + operator const void *() const {
>
> This is not needed.
Without that, assigning char16ptr_t to const void* is ambiguous (it could be casted both via const char16_t* and const wchar_t*).
> @@ +88,5 @@
> > + }
> > + operator const wchar_t *() const {
> > + return reinterpret_cast<const wchar_t*>(mPtr);
> > + }
> > + explicit operator char16_t*() const {
>
> Why are the non-const pointer conversion operators needed?
Because you may do casts like: (char16_t*)strptr and it's done in a lot of places in the code. We want to support that here, even if it doesn't look nice. I should probably put quite a few comments explaining that...
> @@ +94,5 @@
> > + }
> > + explicit operator wchar_t*() const {
> > + return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
> > + }
> > + explicit operator void*() const {
>
> This shouldn't be needed.
It is to make (void*) casts not ambiguous.
> @@ +97,5 @@
> > + }
> > + explicit operator void*() const {
> > + return const_cast<char16_t*>(mPtr);
> > + }
> > + bool operator==(const char16ptr_t &x) const {
>
> I'm not sure why you're adding all of these other operators.
Because such operators are allowed on pure pointers.
> @@ +120,5 @@
> > + return mPtr[i];
> > + }
> > + };
> > +
> > +#else
>
> We should try to not put anything into this branch of the #if.
If we have the typedef for other build configs, we may sasely use char16ptr_t wherever we like and it will be no-op on anything other than mingw. See the patch I attached.
Assignee | ||
Comment 23•11 years ago
|
||
(In reply to Joshua Cranmer [:jcranmer] from comment #16)
> @@ +112,5 @@
> > + }
> > + char16ptr_t operator+(size_type add) const {
> > + return char16ptr_t(mPtr+add);
> > + }
> > + long operator-(const char16ptr_t &other) const {
>
> That is ptrdiff_t, not long.
Yes, except we can't use ptrdiff_t here. I used decltype((char*)0-(char*)0), which is ugly but does the job. I'd be glad to have better suggestion.
> ::: xpcom/string/public/nsTString.h
> @@ +72,5 @@
> > self_type& operator=( char_type c ) { Assign(c); return *this; }
> > self_type& operator=( const char_type* data ) { Assign(data); return *this; }
> > self_type& operator=( const self_type& str ) { Assign(str); return *this; }
> > +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> > + self_type& operator=( char16ptr_t data ) { Assign(static_cast<char16_t>(data)); return *this; }
>
> Should it not be static_cast<char16_t*>?
>
> I also feel like the declaration should be operator=( const char16ptr_t data
> );
Yes, good catch, thanks.
Assignee | ||
Comment 24•11 years ago
|
||
It's time to try to have it landed. This is the most tricky part. I will also attach accessible/ part as an example of how remaining changes looks like and hold other changes until those are done.
Attachment #825938 -
Attachment is obsolete: true
Attachment #827945 -
Flags: review?(ehsan)
Assignee | ||
Comment 25•11 years ago
|
||
Attachment #827946 -
Flags: review?(ehsan)
Assignee | ||
Updated•11 years ago
|
Attachment #827945 -
Attachment description: patch.diff → Char16.h and xpcom/strings/ part.
Assignee | ||
Comment 26•11 years ago
|
||
Fixed a peoblem found on try. This version succeeded:
https://tbpl.mozilla.org/?tree=Try&rev=48e1eba111d3
Attachment #827945 -
Attachment is obsolete: true
Attachment #827945 -
Flags: review?(ehsan)
Attachment #828631 -
Flags: review?(ehsan)
Assignee | ||
Updated•11 years ago
|
Attachment #827946 -
Flags: review?(ehsan)
Comment 27•11 years ago
|
||
Comment on attachment 828631 [details] [diff] [review]
Char16.h and xpcom/strings/ part.
Review of attachment 828631 [details] [diff] [review]:
-----------------------------------------------------------------
r=me, but I would like Benjamin to take a look at this as well.
::: mfbt/Char16.h
@@ +68,5 @@
>
> +#ifdef MOZ_USE_CHAR16_WRAPPER
> +# include <string>
> + /**
> + * Win32 API is extensively uses wchar_t, which is represented by a separated
Nit: s/is //
@@ +73,5 @@
> + * builtin type than char16_t per spec. It's not the case for MSVC, but GCC
> + * follows the spec. We want to mix wchar_t and char16_t on Windows builds.
> + * This class is supposed to make it easier. It stores char16_t const pointer,
> + * but provides implicit casts for wchar_t as well. On other platforms, we
> + * simply use |typedef const char16_t char16ptr_t|. Here, we want to make
Nit: char16_t*.
@@ +79,5 @@
> + * are allowed by the typedef.
> + */
> + class char16ptr_t {
> + private:
> + const char16_t *ptr;
Your indentation is off here, see mfbt/STYLE.
Also, please add a static_assert here checking to make sure the sizes of the two types (wchar_t/char16_t) is the same.
@@ +84,5 @@
> + public:
> + char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> + char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> + constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
Please add a comment mentioning why this is necessary.
Attachment #828631 -
Flags: review?(ehsan)
Attachment #828631 -
Flags: review?(benjamin)
Attachment #828631 -
Flags: review+
Assignee | ||
Comment 28•11 years ago
|
||
Thanks for the review. Here the patch updated with comments.
Attachment #828631 -
Attachment is obsolete: true
Attachment #828631 -
Flags: review?(benjamin)
Attachment #829746 -
Flags: review?(benjamin)
Comment 29•11 years ago
|
||
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.
I'm basing this on only using char16ptr_t as the result of nsString::get(). I couldn't work out whether what you achieve by using char16ptr_t in other places. In particular for assigning to strings I assumed a char16_t*/wchar_t* overload would suffice.
>+ /* Without this, nullptr assignment would be ambiguous */
>+ constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
I don't know why this might be needed. I couldn't find a use in the mozilla-central tree.
>+ operator const void *() const {
>+ return ptr;
>+ }
I see someone passing get() to memcpy. It seems a reasonable thing to do.
>+ operator bool() const {
>+ return ptr != nullptr;
>+ }
get() never returns nullptr, except in the case of an nsXPIDLString or its subclass nsAdoptingString, but in these cases there are better ways of testing for null.
>+ operator std::wstring() const {
>+ return std::wstring(static_cast<const wchar_t*>(*this));
>+ }
Not sure why this is needed, as std::wstring can't be constructed from a const char16_t*.
>+ /* Explicit cast operators to allow things like (char16_t*)str */
>+ explicit operator char16_t*() const {
>+ return const_cast<char16_t*>(ptr);
>+ }
>+ explicit operator wchar_t*() const {
>+ return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
>+ }
In theory, nobody should be casting away const from get(). I even found someone casting away const from a .get() and writing to the string. (Fortunately the string was probably not shared, but that's still incorrect Mozilla code that you're wallpapering over.)
>+ explicit operator const char*() const {
>+ return reinterpret_cast<const char*>(ptr);
>+ }
>+ explicit operator const unsigned char*() const {
>+ return reinterpret_cast<const unsigned char*>(ptr);
>+ }
>+ explicit operator unsigned char*() const {
>+ return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
>+ }
>+ explicit operator void*() const {
>+ return const_cast<char16_t*>(ptr);
>+ }
These look plain wrong, although I found people casting a .get() to various byte types.
>+ /* Some operators used on pointers */
>+ char16_t operator[](size_t i) const {
>+ return ptr[i];
>+ }
Doesn't make sense. Code should just index the nsString directly.
>+ bool operator==(const char16ptr_t &x) const {
>+ return ptr == x.ptr;
>+ }
>+ bool operator==(decltype(nullptr)) const {
>+ return !ptr;
>+ }
>+ bool operator!=(const char16ptr_t &x) const {
>+ return ptr != x.ptr;
>+ }
>+ bool operator!=(decltype(nullptr)) const {
>+ return ptr != nullptr;
>+ }
Doesn't make sense to compare a transient pointer with anything.
>+ char16ptr_t operator+(size_t add) const {
>+ return char16ptr_t(ptr+add);
>+ }
It's unclear whether Mozilla code should be doing this. They should possibly be using Substring or Rebind instead. For instance:
> CopyUTF16toUTF8(fileName.get() + i + 1, fileExt);
should be
> CopyUTF16toUTF8(Substring(fileName, i + 1), fileExt);
(Note that nsExternalHelperAppService::GetTypeFromFile is badly written in general.)
while
> nsDependentString withoutHash(colorStr.get() + 1, colorStr.Length() - 1);
> if (NS_HexToRGB(withoutHash, &color)) {
should be
> if (NS_HexToRGB(Substring(colorStr, 1))) {
For cases I couldn't easily fix I worked around it by using BeginReading() instead.
>+ ptrdiff_t operator-(const char16ptr_t &other) const {
>+ return ptr-other.ptr;
>+ }
>+};
>+
>+inline decltype((char*)0-(char*)0) operator-(const char16_t *x, const char16ptr_t y) {
>+ return x - static_cast<const char16_t*>(y);
>+}
I'm not sure what the use case is for these. I couldn't find any in the mozilla-central tree.
Assignee | ||
Comment 30•11 years ago
|
||
Thanks for the review.
(In reply to neil@parkwaycc.co.uk from comment #29)
> Comment on attachment 829746 [details] [diff] [review]
> Char16.h and xpcom/strings/ part.
>
> I'm basing this on only using char16ptr_t as the result of nsString::get().
> I couldn't work out whether what you achieve by using char16ptr_t in other
> places. In particular for assigning to strings I assumed a
> char16_t*/wchar_t* overload would suffice.
This is also useful in other places. There are quite a few cases where wchar_t is mixed with char16_t and for those simple type change is the easiest solution. Sorry I didn't make it clear earlier. I skipped some comments that were based on that assumption.
> >+ /* Without this, nullptr assignment would be ambiguous */
> >+ constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
> I don't know why this might be needed. I couldn't find a use in the
> mozilla-central tree.
That's for something like this:
char16ptr_t str = nullptr;
> >+ operator std::wstring() const {
> >+ return std::wstring(static_cast<const wchar_t*>(*this));
> >+ }
> Not sure why this is needed, as std::wstring can't be constructed from a
> const char16_t*.
It can be constructed on MSVC (because there is no char16_t is the same as wchar_t) and that's what I'm trying to emulate here. IPC code from chromium uses wstring in many places.
> >+ /* Explicit cast operators to allow things like (char16_t*)str */
> >+ explicit operator char16_t*() const {
> >+ return const_cast<char16_t*>(ptr);
> >+ }
> >+ explicit operator wchar_t*() const {
> >+ return const_cast<wchar_t*>(static_cast<const wchar_t*>(*this));
> >+ }
> In theory, nobody should be casting away const from get(). I even found
> someone casting away const from a .get() and writing to the string.
> (Fortunately the string was probably not shared, but that's still incorrect
> Mozilla code that you're wallpapering over.)
There are some places where such casts are made due to lack proper const support in Win32 API. See this for example (there are more cases even in the same file):
https://github.com/mozilla/mozilla-central/blob/master/embedding/components/printingui/src/win/nsPrintDialogUtil.cpp#L809
> >+ explicit operator const char*() const {
> >+ return reinterpret_cast<const char*>(ptr);
> >+ }
> >+ explicit operator const unsigned char*() const {
> >+ return reinterpret_cast<const unsigned char*>(ptr);
> >+ }
> >+ explicit operator unsigned char*() const {
> >+ return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
> >+ }
> >+ explicit operator void*() const {
> >+ return const_cast<char16_t*>(ptr);
> >+ }
> These look plain wrong, although I found people casting a .get() to various
> byte types.
There are such casts for different Win32 API. For example, strings are casted to BYTE* (aka. unsigned char*) for Reg* arguments. All of above I wrote after hitting compile error on a cast, they are not imagined cases.
Comment 31•11 years ago
|
||
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.
Asking another review from Jeff on the MFBT part (mostly for MFBT style comments and such.)
Attachment #829746 -
Flags: review?(jwalden+bmo)
Comment 32•11 years ago
|
||
(In reply to Jacek Caban from comment #30)
> (In reply to comment #29)
> > (From update of attachment 829746 [details] [diff] [review])
> > I'm basing this on only using char16ptr_t as the result of nsString::get().
> > I couldn't work out whether what you achieve by using char16ptr_t in other
> > places. In particular for assigning to strings I assumed a
> > char16_t*/wchar_t* overload would suffice.
>
> This is also useful in other places. There are quite a few cases where
> wchar_t is mixed with char16_t and for those simple type change is the
> easiest solution. Sorry I didn't make it clear earlier. I skipped some
> comments that were based on that assumption.
Ah, so you're basically using it as a cast function? I hope that doesn't confuse people more than a bunch of reinterpret_cast<wchar_t*>()s would.
> > >+ operator std::wstring() const {
> > >+ return std::wstring(static_cast<const wchar_t*>(*this));
> > >+ }
> > Not sure why this is needed, as std::wstring can't be constructed from a
> > const char16_t*.
>
> It can be constructed on MSVC (because there is no char16_t is the same as
> wchar_t) and that's what I'm trying to emulate here. IPC code from chromium
> uses wstring in many places.
Obviously MSVC can construct a std::wstring from a char16ptr_t because it is a typedef for const wchar_t*. I'm trying to work out why gcc would have trouble constructing a std::wstring from a char16ptr_t. I admit I wasn't using mingw but a basic test of gcc seemed to work for me.
Comment 33•11 years ago
|
||
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.
Why do we need the new string constructors? If you have a char16ptr_t and you're constructing a nsString, won't the operators take care of the conversion?
This feel fragile: most people aren't going to know when they need to add these special char16ptr_t methods, and so this is going to break regularly :-(
Assignee | ||
Comment 34•11 years ago
|
||
(In reply to neil@parkwaycc.co.uk from comment #32)
> > This is also useful in other places. There are quite a few cases where
> > wchar_t is mixed with char16_t and for those simple type change is the
> > easiest solution. Sorry I didn't make it clear earlier. I skipped some
> > comments that were based on that assumption.
> Ah, so you're basically using it as a cast function?
Yes.
> I hope that doesn't confuse people more than a bunch of reinterpret_cast<wchar_t*>()s would.
The hope is that it will be needed much less places than reinterpret_casts would.
> > > >+ operator std::wstring() const {
> > > >+ return std::wstring(static_cast<const wchar_t*>(*this));
> > > >+ }
> > > Not sure why this is needed, as std::wstring can't be constructed from a
> > > const char16_t*.
> >
> > It can be constructed on MSVC (because there is no char16_t is the same as
> > wchar_t) and that's what I'm trying to emulate here. IPC code from chromium
> > uses wstring in many places.
>
> Obviously MSVC can construct a std::wstring from a char16ptr_t because it is
> a typedef for const wchar_t*. I'm trying to work out why gcc would have
> trouble constructing a std::wstring from a char16ptr_t. I admit I wasn't
> using mingw but a basic test of gcc seemed to work for me.
That's because C++ allows only one implicit cast. In case of mingw, char16ptr_t is a class, so there are two casts needed: char16ptr_t -> const wchar_t* -> std::wstring. Defining the cast inside char16ptr_t works around that.
Assignee | ||
Comment 35•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #33)
> Comment on attachment 829746 [details] [diff] [review]
> Char16.h and xpcom/strings/ part.
>
> Why do we need the new string constructors? If you have a char16ptr_t and
> you're constructing a nsString, won't the operators take care of the
> conversion?
They are needed for wchar_t* support. However, they can't take const wchar_t* as arugument, because that would make constructing from char16ptr_t ambiguous.
> This feel fragile: most people aren't going to know when they need to add
> these special char16ptr_t methods, and so this is going to break regularly
> :-(
Yes, I know. But I can't see any solution that would require less mingw-specific things. FWIW, since I got the tree compiling (about two weeks ago), I only had to rebase patches. There were no new char16_t-related problems introduced.
Comment 36•11 years ago
|
||
Comment on attachment 829746 [details] [diff] [review]
Char16.h and xpcom/strings/ part.
Review of attachment 829746 [details] [diff] [review]:
-----------------------------------------------------------------
::: mfbt/Char16.h
@@ +77,5 @@
> + * simply use |typedef const char16_t* char16ptr_t|. Here, we want to make
> + * the class as similar to this typedef, including providing some casts that
> + * are allowed by the typedef.
> + */
> +class char16ptr_t {
Brace on new line.
@@ +81,5 @@
> +class char16ptr_t {
> + private:
> + const char16_t* ptr;
> + static_assert(sizeof(char16_t) == sizeof(wchar_t), "char16_t and wchar_t sizes differ");
> + public:
Blank line before the public: line, please.
@@ +83,5 @@
> + const char16_t* ptr;
> + static_assert(sizeof(char16_t) == sizeof(wchar_t), "char16_t and wchar_t sizes differ");
> + public:
> + char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> + char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
*/& goes by type name in all places in this file:
char16ptr_t(const char16_t* ptr) : ptr(ptr) {}
typedef const char16_t* char16ptr_t;
operator const char16_t*() const
static_cast<const char16_t*>(...)
bool operator==(const char16ptr_t& x)
@@ +85,5 @@
> + public:
> + char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> + char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> + /* Without this, nullptr assignment would be ambiguous */
Period at end of sentence, please.
@@ +86,5 @@
> + char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> + char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> +
> + /* Without this, nullptr assignment would be ambiguous */
> + constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
I guess we're just assuming native nullptr and decltype, no need for emulation of either? Just flagging as something to keep in mind, my memory says we can get away with these if we're assuming basically gcc relatively new here.
@@ +98,5 @@
> + operator const void *() const {
> + return ptr;
> + }
> + operator bool() const {
> + return ptr != nullptr;
Hmm, interesting. I guess operator bool() isn't as completely broken as I thought! (To the peanut gallery, this disambiguates the ambiguous conversion to bool through multiple different pointer types. And char16ptr_t() + 1 uses the pointer conversions, so no bizarro summing up to an integer.)
@@ +101,5 @@
> + operator bool() const {
> + return ptr != nullptr;
> + }
> + operator std::wstring() const {
> + return std::wstring(static_cast<const wchar_t*>(*this));
Is the * before this right? I'd think it wouldn't be, or that *this would at least be ambiguous about which conversion operator it chose to invoke prior to dereferencing.
@@ +120,5 @@
> + }
> + explicit operator unsigned char*() const {
> + return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
> + }
> + explicit operator void*() const {
These four const char*, const unsigned char*, unsigned char*, void* explicit operators are blargh. How many instances were we actually hitting where these were necessary? If it's relatively "few" I would probably go more for uglifying casts at those sites, or in some utility method in some central header that did the relevant sequence of casts instead. (This would be followup-land, to be clear.)
I'd also put a comment before these explaining why you have this:
"""
Some Windows API calls accept BYTE* but require that data actually be WCHAR*. Supporting this requires explicit operators to support the requisite explicit casts.
"""
@@ +132,5 @@
> + bool operator==(const char16ptr_t &x) const {
> + return ptr == x.ptr;
> + }
> + bool operator==(decltype(nullptr)) const {
> + return !ptr;
ptr == nullptr is probably slightly clearer, paralleling the operator being defined, as it does.
@@ +141,5 @@
> + bool operator!=(decltype(nullptr)) const {
> + return ptr != nullptr;
> + }
> + char16ptr_t operator+(size_t add) const {
> + return char16ptr_t(ptr+add);
Spaces around +.
@@ +144,5 @@
> + char16ptr_t operator+(size_t add) const {
> + return char16ptr_t(ptr+add);
> + }
> + ptrdiff_t operator-(const char16ptr_t &other) const {
> + return ptr-other.ptr;
Spaces around -.
@@ +148,5 @@
> + return ptr-other.ptr;
> + }
> +};
> +
> +inline decltype((char*)0-(char*)0) operator-(const char16_t *x, const char16ptr_t y) {
inline decltype(static_cast<char*>(0) - static_cast<char*>(0))
operator-(const char16_t* x, const char16ptr_t y)
{
...
}
::: xpcom/string/public/nsTDependentString.h
@@ +53,5 @@
> }
>
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> + explicit
> + nsTDependentString_CharT( const wchar_t *data )
I'd have thought this would take char16ptr_t like all the others, but this isn't *wrong*, at least, so okay. :-)
::: xpcom/string/public/nsTString.h
@@ +40,5 @@
> }
>
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> + explicit
> + nsTString_CharT( const wchar_t* data, size_type length = size_type(-1) )
Again I'd think char16ptr_t, but again, not *wrong*.
@@ +82,5 @@
> * returns the null-terminated string
> */
>
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> + const char16ptr_t get() const
The const here doesn't hurt, but it doesn't help, either.
@@ +599,5 @@
> }
>
> // return nullptr if we are voided
> +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> + const char16ptr_t get() const
Again const hurt/help stuff.
::: xpcom/string/public/nsTSubstring.h
@@ +374,5 @@
> +
> + void Assign (char16ptr_t data, size_type length)
> + {
> + Assign(static_cast<const char16_t*>(data), length);
> + }
I think you also need
void Assign(char16ptr_t data, size_type length, const fallible_t&) NS_WARN_UNUSED_RESULT
{
return Assign(static_cast<const char16_t*>(data), length, fallible_t());
}
Attachment #829746 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 37•11 years ago
|
||
Attachment #827946 -
Attachment is obsolete: true
Attachment #829746 -
Attachment is obsolete: true
Attachment #829746 -
Flags: review?(benjamin)
Assignee | ||
Comment 38•11 years ago
|
||
Thanks for the review. The attached version addresses all comments.
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #36)
> @@ +86,5 @@
> > + char16ptr_t(const char16_t *ptr) : ptr(ptr) {}
> > + char16ptr_t(const wchar_t *ptr) : ptr(reinterpret_cast<const char16_t*>(ptr)) {}
> > +
> > + /* Without this, nullptr assignment would be ambiguous */
> > + constexpr char16ptr_t(decltype(nullptr)) : ptr(nullptr) {}
>
> I guess we're just assuming native nullptr and decltype, no need for
> emulation of either? Just flagging as something to keep in mind, my memory
> says we can get away with these if we're assuming basically gcc relatively
> new here.
Yes, I believe we may assume that here.
> @@ +101,5 @@
> > + operator bool() const {
> > + return ptr != nullptr;
> > + }
> > + operator std::wstring() const {
> > + return std::wstring(static_cast<const wchar_t*>(*this));
>
> Is the * before this right? I'd think it wouldn't be, or that *this would
> at least be ambiguous about which conversion operator it chose to invoke
> prior to dereferencing.
The cast happens after dereferencing and it will use the cast operator, which is defined on char16ptr_t instance.
> @@ +120,5 @@
> > + }
> > + explicit operator unsigned char*() const {
> > + return const_cast<unsigned char*>(reinterpret_cast<const unsigned char*>(ptr));
> > + }
> > + explicit operator void*() const {
>
> These four const char*, const unsigned char*, unsigned char*, void* explicit
> operators are blargh. How many instances were we actually hitting where
> these were necessary? If it's relatively "few" I would probably go more for
> uglifying casts at those sites, or in some utility method in some central
> header that did the relevant sequence of casts instead. (This would be
> followup-land, to be clear.)
>
> I'd also put a comment before these explaining why you have this:
>
> """
> Some Windows API calls accept BYTE* but require that data actually be
> WCHAR*. Supporting this requires explicit operators to support the
> requisite explicit casts.
> """
There quite a few cases, but I don't have numbers. I will file a followup bug once this is landed.
> ::: xpcom/string/public/nsTDependentString.h
> @@ +53,5 @@
> > }
> >
> > +#if defined(CharT_is_PRUnichar) && defined(MOZ_USE_CHAR16_WRAPPER)
> > + explicit
> > + nsTDependentString_CharT( const wchar_t *data )
>
> I'd have thought this would take char16ptr_t like all the others, but this
> isn't *wrong*, at least, so okay. :-)
Good catch, I've changed this.
Assignee | ||
Comment 39•11 years ago
|
||
I landed it on m-i:
https://hg.mozilla.org/integration/mozilla-inbound/rev/30fe34b9ebf7
I will be cleaning up other parts as dependent bugs soon (as the time permits).
Comment 40•11 years ago
|
||
(In reply to Jacek Caban from comment #38)
> The cast happens after dereferencing and it will use the cast operator,
> which is defined on char16ptr_t instance.
Er, right. Obviously. I'm just thinko-ing here, clearly!
Comment 41•11 years ago
|
||
Thank you *so much* for working on this, Jacek!
Comment 42•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla28
Updated•11 years ago
|
Whiteboard: [qa-]
Assignee | ||
Comment 43•11 years ago
|
||
I landed a trivial fixup revealed by parts of codebase that use -Wshadow:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b8c2a48d08b
Comment 44•11 years ago
|
||
Updated•4 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•